Reply to Tile shading pipeline without fragment shader?
I was not able to find how to do what I wanted exactly (building a per-tile list of primitives), but a reasonable alternative is to build a per-fragment list of primitives using raster order groups. Works very well for my purpose. The approach, in case anyone is interested, is described in detail here: https://developer.apple.com/documentation/metal/metal_sample_code_library/implementing_order-independent_transparency_with_image_blocks
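For anyone who wants the flavour without opening the sample: the linked code uses image blocks, but the core idea of using a raster order group to serialise overlapping fragments so you can safely build a per-pixel list can be sketched with a device-memory linked list like this (all names are mine, not the sample's):

```
#include <metal_stdlib>
using namespace metal;

struct FragIn {
    float4 position [[position]];
    float4 color;
};

struct Node {            // one list entry per fragment
    float4 color;
    float  depth;
    uint   next;         // index of the next node, or 0xFFFFFFFF
};

fragment void build_lists(FragIn in [[stage_in]],
                          device atomic_uint& nodeCount [[buffer(0)]],
                          device Node*        nodes     [[buffer(1)]],
                          // The raster order group serialises accesses to the
                          // head pointers for fragments covering the same pixel.
                          device uint*        heads     [[buffer(2), raster_order_group(0)]],
                          constant uint2&     viewport  [[buffer(3)]])
{
    uint pixel = uint(in.position.y) * viewport.x + uint(in.position.x);

    // Grab a fresh node from the global pool (bounds check omitted).
    uint slot = atomic_fetch_add_explicit(&nodeCount, 1u, memory_order_relaxed);
    nodes[slot].color = in.color;
    nodes[slot].depth = in.position.z;

    // Push onto this pixel's list; safe because of the raster order group.
    nodes[slot].next = heads[pixel];
    heads[pixel]     = slot;
}
```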
Oct ’23
Reply to Pointers in MSL
I second the above. You are writing to a null pointer, which is undefined behavior. There is no standard way to allocate device memory from within a Metal shader. Use local storage (a value instead of a pointer) if you only need the context for the duration of the shader invocation, or implement your own bump allocator on top of a pre-allocated data buffer if you need the context to outlive the invocation.
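To sketch the second option (all names here are made up, not from any Apple sample): reserve one large MTLBuffer up front, keep an atomic cursor next to it, and let each thread claim its slice with an atomic add.

```
#include <metal_stdlib>
using namespace metal;

struct Context {          // hypothetical per-thread payload
    float3 accum;
    uint   count;
};

kernel void produce(device uchar*       heap       [[buffer(0)]],  // big pre-allocated buffer
                    device atomic_uint& heapCursor [[buffer(1)]],  // initialised to 0 by the CPU
                    constant uint&      heapSize   [[buffer(2)]],
                    uint tid [[thread_position_in_grid]])
{
    // Bump-allocate one Context from the shared heap.
    uint offset = atomic_fetch_add_explicit(&heapCursor, (uint)sizeof(Context),
                                            memory_order_relaxed);
    if (offset + sizeof(Context) > heapSize) {
        return;  // heap exhausted; real code would record the overflow somewhere
    }

    device Context* ctx = (device Context*)(heap + offset);
    ctx->accum = float3(tid);
    ctx->count = 1;
    // `offset` can be written to another buffer so a later pass can find this Context.
}
```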
Jul ’23
Reply to Generating vertex data in compute shader
I would still consider mesh shaders even if regenerating the geometry every frame seems wasteful. You will likely end up with considerably simpler code, and unless your terrain generation is extremely complicated I kind of doubt that you will see any performance degradation (don't forget that the mesh shader also performs the function of the vertex shader — with the benefit of having access to neighbouring vertex data "for free"). Tessellation, for example, also regenerates tons of geometry every frame, and yet it's a popular technique for improving performance (because storing and moving all that geometry in memory ends up being more expensive than regenerating it on the fly).
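For what it's worth, a bare-bones mesh function for a height-field patch could look roughly like the sketch below; the patch size, the placeholder height function and the missing transform/CPU setup are all assumptions of mine, not a recipe:

```
#include <metal_stdlib>
using namespace metal;

struct TerrainVertex {
    float4 position [[position]];   // multiply by your MVP matrix in real code
    float3 normal;
};

// 8x8 vertices per patch -> 64 vertices, 2 * 7 * 7 = 98 triangles.
using PatchMesh = metal::mesh<TerrainVertex, void, 64, 98, metal::topology::triangle>;

// Dispatched with 64 threads per mesh threadgroup, one thread per vertex.
[[mesh]] void terrain_mesh(PatchMesh output,
                           uint  tid   [[thread_index_in_threadgroup]],
                           uint2 patch [[threadgroup_position_in_grid]])
{
    constexpr uint side = 8;
    if (tid == 0) {
        output.set_primitive_count(2 * (side - 1) * (side - 1));
    }

    // Sample a placeholder height function at this vertex's grid position.
    uint2  v  = uint2(tid % side, tid / side);
    float2 xz = float2(patch * (side - 1) + v);
    float  h  = sin(xz.x * 0.1) * cos(xz.y * 0.1);   // stand-in for real terrain noise

    TerrainVertex vert;
    vert.position = float4(xz.x, h, xz.y, 1.0);
    vert.normal   = float3(0.0, 1.0, 0.0);           // derive from the height field in real code
    output.set_vertex(tid, vert);

    // Two triangles per interior cell of the grid.
    if (v.x < side - 1 && v.y < side - 1) {
        uint i = (v.y * (side - 1) + v.x) * 6;
        output.set_index(i + 0, tid);
        output.set_index(i + 1, tid + 1);
        output.set_index(i + 2, tid + side);
        output.set_index(i + 3, tid + 1);
        output.set_index(i + 4, tid + side + 1);
        output.set_index(i + 5, tid + side);
    }
}
```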
Jul ’23
Reply to Bindless/GPU-Driven approach with dynamic scenes?
I am not quite sure what exactly the problem is that you are trying to solve, hence it's difficult to give recommendations. Generally, if you have truly dynamic geometry (which is kind of difficult for me to imagine - why would your geometry change that radically every frame?), you can either compute new vertex data on the CPU (I think Apple recommends triple buffering) and send the appropriate draw call, or use mesh shaders and generate the geometry directly on the GPU.
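The CPU path usually looks something like the sketch below (a rough metal-cpp shape of the triple-buffering idea; the names and the omitted encoding/completion-handler plumbing are placeholders):

```
#include <Metal/Metal.hpp>      // metal-cpp
#include <dispatch/dispatch.h>

// Three rotating vertex buffers so the CPU never overwrites data the GPU
// is still reading from a previous frame.
constexpr int kMaxFramesInFlight = 3;

struct FrameRing {
    MTL::Buffer*         vertexBuffers[kMaxFramesInFlight] = {};
    dispatch_semaphore_t framesInFlight = dispatch_semaphore_create(kMaxFramesInFlight);
    int                  frameIndex     = 0;
};

void encodeFrame(FrameRing& ring /*, scene, command buffer, ... */)
{
    // Block until one of the three buffers is free again.
    dispatch_semaphore_wait(ring.framesInFlight, DISPATCH_TIME_FOREVER);

    MTL::Buffer* current = ring.vertexBuffers[ring.frameIndex];
    // 1. Write this frame's vertex data into current->contents().
    // 2. Encode the draw call that reads `current`.
    // 3. In the command buffer's completed handler, call
    //    dispatch_semaphore_signal(ring.framesInFlight).

    ring.frameIndex = (ring.frameIndex + 1) % kMaxFramesInFlight;
}
```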
Jun ’23
Reply to Some feature requests for Metal
Since there seems to be an interest in these questions a year after they were asked... Metal 3 has added support for GPU buffers and resource handles, as well as mesh shaders. This ticks off two big feature requests from the list and allows flexible encoding of resources on the CPU (or the GPU!) without any encoder APIs.

The 500k resource limit has been taken a bit out of context. DX12 tier 2 guarantees availability of 1 million descriptors (or binding points). But Metal does not have a concept of descriptors. The hardware still uses descriptors of course, but these details are hidden from the user and managed by the Metal driver. A GPU resource ID or a GPU buffer address is not a descriptor, but either a direct address or an offset into a hidden descriptor table. This changes the balance of things somewhat. There is no fixed limit to the number of "texture binding points" you can use with argument buffers, for example — the only limit is the size of the buffer itself. And Metal does not need data buffer descriptors in the first place — it uses direct GPU address pointers instead. So if you are porting a DX12 game or emulating a DX12 environment, you can trivially create a Metal buffer with a million texture binding points — this is fully supported and will work. What's more, you can do resource type erasure by binding this buffer to multiple typed targets simultaneously (e.g. to use the same buffer to bind different types of texture resources).

Metal argument buffers are basically syntactic sugar over the popular bindless model — it's just that in DX12 you'd use the descriptor pool and integer indices to select resources from the pool, while in Metal the descriptor pool is hidden and the index is hidden behind the texture2d etc. object. But any time you use this texture2d object, the Metal shader (at least on Apple Silicon hardware) actually translates it to something like pool[texture_id] (for more low-level info see here: https://github.com/dougallj/applegpu/issues/37). In fact, the Apple Silicon GPU works very similarly to current hardware from Nvidia.

The 500k limit instead appears to be the maximum number of resources your application can use. Every time you create a texture object, the Metal driver adds a descriptor to the hidden descriptor pool. If you try to create a lot of textures, you will experience a major slowdown. I have no idea whether this is a hardware limitation or a driver implementation limit. And since it's fairly unlikely that a game will actually need half a million textures (not with current GPU memory sizes anyway), I don't see this limitation being relevant in practice for the next few years.
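To make the "syntactic sugar over bindless" point concrete, here is a rough MSL sketch (the struct layout and names are mine, not from any Apple sample): on the shader side the argument buffer is just a struct you index through a plain pointer, and on the CPU side it is an ordinary MTLBuffer filled with gpuResourceID handles (with the textures made resident via useResource/useHeap).

```
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

// Hypothetical "material pool": the CPU writes texture handles and floats
// into a plain MTLBuffer; the shader sees it as an array of structs.
struct Material {
    texture2d<float> albedo;
    texture2d<float> normal;
    float            roughness;
};

fragment float4 shade(VertexOut in [[stage_in]],
                      device const Material* materials     [[buffer(0)]],
                      constant uint&         materialIndex [[buffer(1)]])
{
    constexpr sampler s(filter::linear, address::repeat);
    device const Material& m = materials[materialIndex];   // plain pointer indexing
    return m.albedo.sample(s, in.uv);
}
```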
May ’23
Reply to Metal makeBuffer Question
It's mostly about who owns the buffer memory: you or the Metal runtime. The no-copy method won't copy the data, but you are responsible for the memory allocation, plus there are some additional requirements your allocation has to fulfil (such as being page-aligned). So it's not like you can take any CPU memory allocation and turn it into a Metal buffer. The bytes method will make a new allocation and copy the data, but this is fast enough in practice, so if all you need is to upload some data to the GPU in advance, it can be a reasonable default choice. I think for a lot of practical applications both of these method variants are a bit niche, as it's not that often that you start with a block of CPU data that you want to expose to Metal. The most frequent case is that you allocate a buffer and then upload the data into the buffer memory directly (e.g. from a file or some other data-producing algorithm). Apple just gives you a bunch of options so that you can pick what best suits your use case.
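For illustration, here is a rough metal-cpp sketch of the two common paths (the Swift spellings are makeBuffer(bytes:length:options:) and makeBuffer(length:options:)); the no-copy variant is only described in a comment:

```
#include <Metal/Metal.hpp>   // metal-cpp; needs the usual *_PRIVATE_IMPLEMENTATION defines in one .cpp
#include <cstring>
#include <vector>

void uploadExamples(MTL::Device* device)
{
    std::vector<float> cpuData(1024, 1.0f);
    const size_t size = cpuData.size() * sizeof(float);

    // Path 1: Metal owns the allocation and copies your bytes into it.
    MTL::Buffer* copied = device->newBuffer(cpuData.data(), size,
                                            MTL::ResourceStorageModeShared);

    // Path 2 (the usual one): allocate an empty buffer, then write into it directly,
    // e.g. straight from a file or a generator, skipping the intermediate CPU array.
    MTL::Buffer* direct = device->newBuffer(size, MTL::ResourceStorageModeShared);
    std::memcpy(direct->contents(), cpuData.data(), size);

    // Path 3 (no-copy, not shown): pass your own page-aligned allocation to the
    // bytesNoCopy variant; Metal never copies it, but you manage its lifetime.

    copied->release();
    direct->release();
}
```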
Mar ’23
Reply to OpenGL on future iPhones and Macs?
I did get your comment. And I am saying that learning OpenGL in 2023 is a waste of your time. Metal is the simplest modern GPU API currently on the market and is very suitable for a beginner learning 3D graphics programming. So if you own a Mac and want to learn desktop GPU programming, it's pretty much a no-brainer. Once you know the ropes you can very easily graduate to less user-friendly APIs like Vulkan or DirectX. If you want to make web applications instead, WebGL is the way to go.
Feb ’23
Reply to OpenGL on future iPhones and Macs?
I have difficulty imagining a better API for starting to learn graphics programming than Metal. It's streamlined, compact, has very little conceptual overhead and should be familiar to anyone who knows some C++. There are plenty of good tutorials around and you can focus on the important things instead of fighting the API. Learning Metal will also teach you the best practices of programming modern GPUs. Vulkan and DX12 are both much more complex and idiosyncratic, both in the API surface and the mental model. If you are familiar with structs and pointers, you already understand the Metal resource binding model. With other APIs you have to learn opaque multi-level concepts full of weird limitations and awkward glue. And OpenGL? Why would you waste your time on a long-obsolete API? Sure, ancient GL with its immediate mode and fixed-function pipeline can appear simpler, but GPU programming moved beyond that many years ago. If you really want to learn modern GPU programming, you might as well actually learn modern GPU programming.
Feb ’23
Reply to Are threadgroups executed by cores or execution units on Apple GPUs?
I don't know whether the Wikipedia description is accurate. An Apple GPU core is a hardware device that has its own register file and cache (probably its own separate L2 cache). Logically, it also contains four 32-wide (1024-bit) SIMD ALUs (we don't know what the hardware actually looks like; it could be multiple smaller ALUs operating in lockstep). Since a threadgroup is guaranteed to share threadgroup memory, all threads in a threadgroup will execute on the same GPU core (with the caveat that the driver might decide to move the threadgroup to a different core for various reasons). As to the difference in terminology, logically there is none. Nvidia marketing focuses on "CUDA cores" (a single lane of an ALU), Apple marketing focuses on "cores". There are obviously differences in hardware architecture. From what I understand, Apple's cores are the minimal hardware units that can be individually replicated. The closest Nvidia equivalent is probably the GPC. Functionally, however, an Apple core is not unlike Nvidia's SM. Most of these differences are likely due to the different approach to rendering.
Feb ’23
Reply to Help finding full Metal2 API operations list...
I don't think the available documentation is this precise. You can try using clang builtins like __builtin_addc() etc. and see if they work, but frankly, I would be very surprised if Apple GPUs supported these operations. However, Apple Silicon does support 64-bit integers, so maybe you can utilise them somehow? Still, double precision at only a 4x performance penalty might be too ambitious a goal...
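To illustrate what I mean by leaning on 64-bit integers, a minimal MSL sketch (my own helper names, nothing from Apple's docs):

```
#include <metal_stdlib>
using namespace metal;

// Add two 64-bit limbs and report the carry, the building block for
// wider (128-bit and up) integer arithmetic without __builtin_addc().
inline ulong add_with_carry(ulong a, ulong b, thread ulong& carry_out)
{
    ulong sum = a + b;               // wraps around on overflow
    carry_out = (sum < a) ? 1 : 0;   // overflow happened iff the sum got smaller
    return sum;
}

// Example: 128-bit addition built from two 64-bit limbs per operand.
inline void add128(ulong a_lo, ulong a_hi, ulong b_lo, ulong b_hi,
                   thread ulong& r_lo, thread ulong& r_hi)
{
    ulong carry;
    r_lo = add_with_carry(a_lo, b_lo, carry);
    r_hi = a_hi + b_hi + carry;      // ignoring overflow out of the top limb
}
```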
Nov ’22
Reply to Why aren't the Metal Feature Set Tables up to date?
@philipturner I think your frustration is justified. Apple's developer relations are unfortunately a mess. And this is not a criticism of the fine people who do all the hard work in the background and occasionally help us on these forums, but of the obvious lack of structure and ownership in these matters. The lack of updates to the Metal Feature Set tables is just one symptom of a wider systemic problem. For example, the Metal Shading Language specification is very difficult to use as a reference due to subpar formatting and a lack of hyperlinks. The API documentation is also lacklustre, incomplete and difficult to navigate. Forum communication is almost non-existent. It would be great if Apple considered creating a role dedicated to improving these aspects, because it seems like this is something nobody really feels responsible for.
Oct ’22