Clarification of Tier 2 Argument Buffer limits

Apple's documentation says the following about the Tier 2 argument buffer hardware capability:

The maximum per-app resources available at any given time are: 500,000 buffers or textures

What exactly does this mean? Does the number refer to the maximum count of attachment points (i.e. unique indices) across all bound argument buffers, the maximum count of resources actually bound across those argument buffers (e.g. when using dynamic indexing and binding resources sparsely), or the number of resource objects that the application can create and manage at a given time?

Prompted by some discussions in the community, I decided to run some tests and was surprised to discover that I could bind many millions of buffer attachments to a single argument buffer in a Metal shader on my M1 Max laptop, far in excess of the quoted 500,000 limit. Is that just undefined behaviour that one should not rely on, or does "500,000" refer to something other than the number of attachment points?

I hope someone from the Apple GPU team can clarify this. If this is not the correct venue for this question, please tell me where I can send my inquiry.

Accepted Reply

This limit refers to the number of separate buffers a shader or kernel can access in a single draw or dispatch. 500,000 is basically code for "more than you will ever need".

I expect that if your kernel/shader accesses more than 500,000 separate allocations in a single draw or dispatch, you will likely hit some pretty significant performance limitations. Having millions of buffers in a single argument buffer shouldn't have any real performance effect; actually accessing that many buffers would, though.

Replies

Thanks for the clarification! The question arose because some members of the community claim that Metal cannot effectively emulate the Vulkan or DX12 binding model, since DX12 guarantees that you can have at least a million attachment points. This has apparently prompted a popular programming technique in the DX12 world where one simply allocates a million attachment points for textures and then uses dynamic indexing to pick the relevant ones. There were concerns that you can't do this with Metal (in the context of frameworks such as MoltenVK), but it seems that you indeed can.
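
For anyone who wants to try this themselves, here is a minimal sketch of the idea, not the exact test from my original post. It assumes Metal 3 (macOS 13 / iOS 16 or later), where a buffer's `gpuAddress` can be written directly into an argument buffer. The names `Slot`, `pick`, `slotCount`, `tableByteLength` and `runSketch` are made up for illustration. The table has a million attachment points, only one of them is populated, and the kernel selects it with a runtime index, so a single allocation is accessed in the dispatch.

```swift
import Metal

// Hypothetical slot count; the thread only speaks of "a million attachment points".
let slotCount = 1_000_000

// MSL source compiled at runtime. Each Slot is one attachment point:
// a struct holding a single device pointer, i.e. an 8-byte GPU address.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

struct Slot {
    device float *data;
};

kernel void pick(device const Slot *table  [[buffer(0)]],
                 constant uint     &index  [[buffer(1)]],
                 device float      *result [[buffer(2)]])
{
    // Dynamic indexing: only the selected slot is dereferenced.
    result[0] = table[index].data[0];
}
"""

// Size in bytes of a table with `slots` 64-bit GPU addresses.
func tableByteLength(slots: Int) -> Int {
    return slots * MemoryLayout<UInt64>.stride
}

@available(macOS 13.0, iOS 16.0, *)
func runSketch(device: MTLDevice) throws -> Float? {
    let library = try device.makeLibrary(source: shaderSource, options: nil)
    guard let function = library.makeFunction(name: "pick") else { return nil }
    let pipeline = try device.makeComputePipelineState(function: function)

    let floatSize = MemoryLayout<Float>.stride
    guard let queue = device.makeCommandQueue(),
          let table = device.makeBuffer(length: tableByteLength(slots: slotCount),
                                        options: .storageModeShared),
          let payload = device.makeBuffer(length: floatSize, options: .storageModeShared),
          let result = device.makeBuffer(length: floatSize, options: .storageModeShared),
          let commands = queue.makeCommandBuffer(),
          let encoder = commands.makeComputeCommandEncoder()
    else { return nil }

    // Sparse binding: populate only the last slot; all other slots stay zero.
    payload.contents().storeBytes(of: Float(42), as: Float.self)
    var index = UInt32(slotCount - 1)
    table.contents().storeBytes(of: payload.gpuAddress,
                                toByteOffset: Int(index) * MemoryLayout<UInt64>.stride,
                                as: UInt64.self)

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(table, offset: 0, index: 0)
    encoder.setBytes(&index, length: MemoryLayout<UInt32>.stride, index: 1)
    encoder.setBuffer(result, offset: 0, index: 2)
    // Indirectly referenced resources must be made resident explicitly.
    encoder.useResource(payload, usage: .read)
    encoder.dispatchThreadgroups(MTLSize(width: 1, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: 1, height: 1, depth: 1))
    encoder.endEncoding()
    commands.commit()
    commands.waitUntilCompleted()

    // Value the kernel read through the selected slot.
    return result.contents().load(as: Float.self)
}

if #available(macOS 13.0, iOS 16.0, *),
   let device = MTLCreateSystemDefaultDevice(),
   device.argumentBuffersSupport == .tier2 {
    if let value = try? runSketch(device: device) {
        print("value read through slot \(slotCount - 1): \(value)")
    }
}
```

The table itself is just 8 bytes per slot, which fits with the accepted reply: the number of slots in the argument buffer is not what costs you, the number of separate allocations a dispatch actually touches (and that you have to make resident with `useResource` or a heap) is.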