Post

Replies

Boosts

Views

Activity

Comment on How to max computing throughput?
"Each execution unit (core, whatever) ..." While you wrote this 5 years ago, these are certainly not the same today, as a core contains many execution units. I'm confused whether it's the core or the execution unit that's analogous to the streaming multiprocessor. Are threadgroups executed at the scope of a core or an execution unit? Above you say at the scope of a core, but I'm doubtful.... any comments on this distinction?
Feb ’23
Comment on how to read metal feature set table?
That table shows the GPUs for Metal 3, but is there some similar listing somewhere for the common families and Mac2? I'm interested to understand overlap. For example, indirect compute command buffers exist for Common2, Common3, and Metal 3, but not Mac2 so I'm wondering how it's disjoint (and whether or not to design around indirect command buffers as a result).
Dec ’22
Comment on Apple on Metal and Moltenvk
While portability for WebGPU is compelling, it lacks high performance, highly portable features such as Metal SIMD group operations (analogous to Vulkan subgroup operations) present on most GPUs today. Building a safe and portable abstract layer like WebGPU comes with such limitations.
Nov ’22