Posts

Post marked as solved
1 Reply
I wasn't able to find a way to do exactly what I wanted (building a per-tile list of primitives), but a reasonable alternative is to build a per-fragment list of primitives using raster order groups. It works very well for my purpose. The approach, in case anyone is interested, is described in detail here: https://developer.apple.com/documentation/metal/metal_sample_code_library/implementing_order-independent_transparency_with_image_blocks
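To give a rough idea of the pattern: the Apple sample above keeps its lists in image blocks (tile memory), but here is a minimal sketch that keeps a per-pixel primitive list in device memory instead. PixelList, its capacity, and the width binding are all illustrative, and [[primitive_id]] as a fragment input is only available on some GPU families.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical fixed-capacity list of primitive IDs for one pixel.
struct PixelList {
    uint count;
    uint primitives[15];
};

// raster_order_group(0) serialises overlapping fragments at the same pixel,
// so the read-modify-write below needs no atomics.
fragment void collect_primitives(float4 pos              [[position]],
                                 uint prim               [[primitive_id]],
                                 constant uint& width    [[buffer(1)]],
                                 device PixelList* lists [[buffer(0), raster_order_group(0)]])
{
    device PixelList& list = lists[uint(pos.y) * width + uint(pos.x)];
    if (list.count < 15)
        list.primitives[list.count++] = prim;
}
```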
Post not yet marked as solved
1 Reply
I think it's texture.sample(sampler, uv, level(0.0)).
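In context, a minimal fragment function using it might look like this (FragmentIn and the binding indices are placeholders):

```metal
#include <metal_stdlib>
using namespace metal;

struct FragmentIn {
    float2 uv;
};

fragment float4 sample_base_mip(FragmentIn in        [[stage_in]],
                                texture2d<float> tex [[texture(0)]],
                                sampler s            [[sampler(0)]])
{
    // level(0.0) forces a read from the base mip level instead of
    // letting the hardware derive an LOD from the UV derivatives.
    return tex.sample(s, in.uv, level(0.0));
}
```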
Post not yet marked as solved
6 Replies
I would still consider mesh shaders even if regenerating the geometry every frame seems wasteful. You will likely end up with considerably simpler code, and unless your terrain generation is extremely complicated I kind of doubt that you will see any performance degradation (don't forget that the mesh shader also performs the function of the vertex shader — with the benefit of having access to neighbouring vertex data "for free"). Tessellation, for example, also regenerates tons of geometry every frame, and yet it's a popular technique for improving performance (because storing and moving all that geometry in memory ends up being more expensive than regenerating it on the fly).
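To give a flavour of it, here is a minimal sketch of a mesh shader that regenerates one terrain quad per threadgroup each frame. The height function and grid mapping are placeholders, and a real shader would apply a view-projection transform; the host would launch this over the terrain grid with something like drawMeshThreadgroups.

```metal
#include <metal_stdlib>
using namespace metal;

struct TerrainVertex {
    float4 position [[position]];
};

// Up to 4 vertices and 2 triangles per threadgroup: one grid cell.
using QuadMesh = mesh<TerrainVertex, void, 4, 2, topology::triangle>;

// Hypothetical procedural height function standing in for the terrain generator.
static float height(float2 xz) { return sin(xz.x) * cos(xz.y); }

[[mesh]]
void terrain_mesh(QuadMesh output,
                  uint tid   [[thread_index_in_threadgroup]],
                  uint2 cell [[threadgroup_position_in_grid]])
{
    if (tid < 4) {
        // Regenerate this corner's vertex on the fly; the neighbouring
        // corners are produced by the other threads of the threadgroup.
        float2 xz = float2(cell) + float2(tid & 1, tid >> 1);
        TerrainVertex v;
        v.position = float4(xz.x, height(xz), xz.y, 1.0);
        output.set_vertex(tid, v);
    }
    if (tid == 0) {
        output.set_primitive_count(2);
        // Two triangles covering the quad: 0-1-2 and 2-1-3.
        const uchar indices[6] = {0, 1, 2, 2, 1, 3};
        for (ushort i = 0; i < 6; ++i)
            output.set_index(i, indices[i]);
    }
}
```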
Post not yet marked as solved
6 Replies
Consider looking into mesh shaders, they are designed for this kind of thing.
Post not yet marked as solved
3 Replies
I am not quite sure what exactly the problem is that you are trying to solve, hence it's difficult to give recommendations. Generally, if you have truly dynamic geometry (which is kind of difficult for me to imagine - why would your geometry change that radically every frame?), you can either compute new vertex data on the CPU (I think Apple recommends triple buffering) and issue the appropriate draw call, or use mesh shaders and generate the geometry directly on the GPU.
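As a rough illustration of the CPU-side triple-buffering approach (sketched with the metal-cpp wrapper; regenerateVertices and the buffer set-up are placeholders, not part of any real API):

```cpp
#include <Metal/Metal.hpp>
#include <dispatch/dispatch.h>

// Classic triple buffering: the CPU writes into one buffer while the GPU
// may still be reading the two previous ones.
constexpr int kMaxFramesInFlight = 3;
dispatch_semaphore_t gate = dispatch_semaphore_create(kMaxFramesInFlight);
MTL::Buffer* vertexBuffers[kMaxFramesInFlight];  // assumed pre-allocated, shared storage
int frameIndex = 0;

void regenerateVertices(void* destination);  // placeholder for your CPU-side generator

void drawFrame(MTL::CommandQueue* queue)
{
    // Block if all three buffers are still in flight on the GPU.
    dispatch_semaphore_wait(gate, DISPATCH_TIME_FOREVER);
    frameIndex = (frameIndex + 1) % kMaxFramesInFlight;
    MTL::Buffer* buffer = vertexBuffers[frameIndex];

    // Safe to overwrite: the GPU is no longer reading this buffer.
    regenerateVertices(buffer->contents());

    MTL::CommandBuffer* cmd = queue->commandBuffer();
    // ... encode the draw calls that read `buffer` ...
    cmd->addCompletedHandler([](MTL::CommandBuffer*) {
        dispatch_semaphore_signal(gate);
    });
    cmd->commit();
}
```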
Post not yet marked as solved
5 Replies
Since there seems to be an interest in these questions a year after they were asked... Metal 3 has added support for GPU buffer addresses and resource handles, as well as mesh shaders. This ticks off two big feature requests from the list and allows flexible encoding of resources on the CPU (or the GPU!) without any encoder APIs.

The 500k resource limit has been taken a bit out of context. DX12 tier 2 guarantees the availability of 1 million descriptors (or binding points). But Metal does not have a concept of descriptors. The hardware still uses descriptors of course, but these details are hidden from the user and managed by the Metal driver. A GPU resource ID or a GPU buffer address is not a descriptor, but either a direct address or an offset into a hidden descriptor table. This changes the balance of things somewhat. There is no fixed limit to the number of "texture binding points" you can use with Argument Buffers, for example — the only limit is the size of the buffer itself. And Metal does not need data buffer descriptors in the first place — it uses direct GPU address pointers instead. So if you are porting a DX12 game or emulating a DX12 environment, you can trivially create a Metal buffer with a million texture binding points — this is fully supported and will work. What's more, you can do resource type erasure by binding this buffer to multiple typed targets simultaneously (e.g. to use the same buffer to bind different types of texture resources).

Metal Argument Buffers are basically syntactic sugar over the popular bindless model — it's just that in DX12 you'd use the descriptor pool and integer indices to select resources from the pool, while in Metal the descriptor pool is hidden and the index is hidden behind the texture2d etc. object. But any time you use this texture2d object, the Metal shader (at least on Apple Silicon hardware) actually translates it into something like pool[texture_id] (for more low-level info see here: https://github.com/dougallj/applegpu/issues/37). In fact, Apple Silicon GPUs work very similarly to current Nvidia hardware in this respect.

Instead, the 500k limit appears to be the maximal number of resources your application can create. Every time you create a texture object, the Metal driver adds a descriptor to the hidden descriptor pool, and if you try to create a very large number of textures, you will experience a major slowdown. I have no idea whether this is a hardware limitation or a driver implementation limit. But since it's fairly unlikely that a game will actually need half a million textures (not with current GPU memory sizes anyway), I don't see this limitation being relevant in practice for the next few years.
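To make this concrete, here is a minimal sketch of the "million texture binding points" pattern with Metal 3 argument buffers. The names, binding indices, and table size are all illustrative, and tier 2 Argument Buffers support is assumed.

```metal
#include <metal_stdlib>
using namespace metal;

struct FragmentIn {
    float2 uv;
};

// The "descriptor heap" is just a device buffer of texture handles.
// Its size is bounded only by the buffer allocation, and it is
// indexed dynamically like an ordinary array.
fragment float4 shade(FragmentIn in                           [[stage_in]],
                      device const texture2d<float>* textures [[buffer(0)]],
                      constant uint& textureIndex             [[buffer(1)]])
{
    constexpr sampler s(filter::linear, address::repeat);
    return textures[textureIndex].sample(s, in.uv);
}
```

On the host side (sketched with the metal-cpp wrapper), the table is filled by writing GPU resource IDs straight into the buffer, with no encoder API involved:

```cpp
// Assumes `table` is a shared-storage MTL::Buffer sized for `count` entries.
auto* ids = static_cast<MTL::ResourceID*>(table->contents());
for (size_t i = 0; i < count; ++i)
    ids[i] = textures[i]->gpuResourceID();
// The textures must also be made resident for the draw,
// e.g. via useResource() or useHeap() on the render command encoder.
```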
Post not yet marked as solved
1 Reply
It's mostly about who owns the buffer memory: you or the Metal runtime. The no-copy method won't copy the data, but you are responsible for the memory allocation, and there are some additional requirements your allocation has to fulfil (such as being page-aligned). So it's not like you can take any CPU memory allocation and turn it into a Metal buffer. The bytes method will make a new allocation and copy the data, but this is fast enough in practice, so if all you need is to upload some data to the GPU in advance it can be a reasonable default choice. I think for a lot of practical applications both these variants are a bit niche, as it's not that often that you start with some block of CPU data that you want to expose to Metal. The most frequent case is that you allocate a buffer and then upload the data into the buffer memory directly (e.g. from a file or some other data-producing algorithm). Apple just gives you a bunch of options so that you can pick what best suits your use case.
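A sketch of the three options, assuming the metal-cpp wrapper (the data pointer and sizes are illustrative, and the exact deallocator overload may differ between wrapper versions):

```cpp
#include <Metal/Metal.hpp>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

void makeBuffers(MTL::Device* device, const void* data, size_t length)
{
    // 1) Copying variant: Metal owns a fresh allocation and copies `data` into it.
    MTL::Buffer* copied = device->newBuffer(data, length, MTL::ResourceStorageModeShared);

    // 2) Plain allocation, often the simplest: create an empty buffer, then
    //    write into the mapped memory directly (e.g. streaming from a file).
    MTL::Buffer* fresh = device->newBuffer(length, MTL::ResourceStorageModeShared);
    std::memcpy(fresh->contents(), data, length);

    // 3) No-copy variant: you own the allocation, which must be page-aligned;
    //    the deallocator hands ownership back to you when the buffer dies.
    void* pages = nullptr;
    size_t pageSize = static_cast<size_t>(getpagesize());
    size_t alignedLength = (length + pageSize - 1) & ~(pageSize - 1);
    posix_memalign(&pages, pageSize, alignedLength);
    std::memcpy(pages, data, length);
    MTL::Buffer* wrapped = device->newBuffer(
        pages, alignedLength, MTL::ResourceStorageModeShared,
        [](void* pointer, NS::UInteger) { std::free(pointer); });
}
```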
Post not yet marked as solved
15 Replies
I did get your comment. And I am saying that learning OpenGL in 2023 is a waste of your time. Metal is the simplest modern GPU API currently on the market and is very suitable for a beginner learning 3D graphics programming. So if you own a Mac and want to learn desktop GPU programming, it's pretty much a no-brainer. Once you know the ropes you can easily graduate to less user-friendly APIs like Vulkan or DirectX. If you want to make web applications instead, WebGL is the way to go.
Post not yet marked as solved
15 Replies
I have difficulty imagining a better API for starting to learn graphics programming than Metal. It's streamlined, compact, has very little conceptual overhead, and should be familiar to anyone who knows some C++. There are plenty of good tutorials around, and you can focus on the important things instead of fighting the API. Learning Metal will also teach you the best practices for programming modern GPUs. Vulkan and DX12 are both much more complex and idiosyncratic, both in the API surface and in the mental model. If you are familiar with structs and pointers, you already understand the Metal resource binding model. With other APIs you have to learn opaque multi-level concepts full of weird limitations and awkward glue. And OpenGL? Why would you waste your time on an effectively obsolete API? Sure, ancient GL with its immediate mode and fixed-function pipeline can appear simpler, but GPU programming moved beyond that many years ago. If you really want to learn modern GPU programming, you might as well actually learn modern GPU programming.
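The structs-and-pointers claim is fairly literal. A minimal sketch (all names and binding indices are illustrative):

```metal
#include <metal_stdlib>
using namespace metal;

// An argument buffer is declared as an ordinary struct, and a bound buffer
// is just a pointer into GPU memory. There are no opaque descriptor sets.
struct SceneArguments {
    device const float4x4* transforms;  // a pointer to other GPU memory
    texture2d<float> albedo;            // a texture handle
    sampler linearSampler;              // a sampler handle
};

vertex float4 transform_vertex(uint vid                       [[vertex_id]],
                               device const float3* positions [[buffer(0)]],
                               constant SceneArguments& scene [[buffer(1)]])
{
    return scene.transforms[0] * float4(positions[vid], 1.0);
}
```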
Post marked as solved
3 Replies
I don't know whether the Wikipedia description is accurate. An Apple GPU core is a hardware device that has its own register file and cache (probably its own separate L2 cache). Logically, it also contains four 32-wide (1024-bit) SIMD ALUs - we don't know what the hardware actually looks like; it could be multiple smaller ALUs operating in lockstep. Since a threadgroup is guaranteed to share threadgroup memory, all threads in a threadgroup will execute on the same GPU core (with the caveat that the driver might decide to move the threadgroup to a different core for various reasons). As to the difference in terminology, logically, there is none. Nvidia marketing focuses on "CUDA cores" (a single lane of an ALU), Apple marketing focuses on "cores". There are obviously differences in hardware architecture. From what I understand, Apple's cores are minimal hardware units that can be individually replicated. The closest Nvidia equivalent is probably the GPC. Functionally, however, an Apple core is not unlike Nvidia's SM. Most of these differences are likely due to the different approaches to rendering.
Post not yet marked as solved
2 Replies
I don't think the available documentation is that precise. You can try using clang builtins like __builtin_addc() etc. and see if they work, but frankly, I would be very surprised if Apple GPUs supported these operations. However, Apple Silicon does support 64-bit integers; maybe you can utilise them somehow? Still, double precision at only a 4x performance penalty might be too ambitious a goal...
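If it helps: a carry chain can be emulated portably in MSL without relying on any builtin, at the cost of a couple of extra instructions per limb. A sketch (add_carry and add128 are illustrative names, not library functions):

```metal
#include <metal_stdlib>
using namespace metal;

// Portable add-with-carry: the carry-out of a + b is 1 exactly when the
// 32-bit sum wraps around below a.
inline uint2 add_carry(uint a, uint b, uint carry_in)
{
    uint s = a + b;
    uint c = (s < a) ? 1u : 0u;
    s += carry_in;
    c += (s < carry_in) ? 1u : 0u;
    return uint2(s, c);  // (sum, carry_out)
}

// 128-bit addition built from 32-bit limbs; x[0] is the least significant.
inline void add128(thread uint x[4], const thread uint y[4])
{
    uint carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint2 r = add_carry(x[i], y[i], carry);
        x[i] = r.x;
        carry = r.y;
    }
}
```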
Post not yet marked as solved
7 Replies
@philipturner I think your frustration is justified. Apple's developer relations are unfortunately a mess. And this is not a criticism of the fine people who do all the hard work in the background and occasionally help us on these forums, but of the obvious lack of structure and ownership in these matters. The lack of updates to the Metal Feature Set tables is just one symptom of a wider systemic problem. For example, the Metal Shading Language specification is very difficult to use as a reference tool due to subpar formatting and a lack of hyperlinks. The API documentation is also lacklustre, incomplete, and difficult to navigate. Forum communication is almost non-existent. It would be great if Apple considered creating a role dedicated to improving these aspects, because right now it seems like something nobody really feels responsible for.
Post not yet marked as solved
2 Replies
Apple GPUs do not support FP64 and neither does Metal.
Post marked as solved
2 Replies
Thanks for the clarification! The question originated with some members of the community claiming that Metal cannot effectively emulate the Vulkan or DX12 binding model, since DX12 guarantees that you can have at least a million binding points. This has apparently prompted a popular programming technique in the DX12 world where one simply allocates a million binding points for textures and then uses dynamic indexing to pick the relevant ones. There were concerns that you can't do this with Metal (in the context of frameworks such as MoltenVK), but it seems that you indeed can.
Post marked as solved
2 Replies
Thanks, I also just happened to watch the relevant WWDC session. Great work! Metal is shaping up to be an incredible API, with unmatched power and ergonomics.