While WebGPU's portability is compelling, it lacks high-performance features that are themselves widely available, such as Metal SIMD-group operations (analogous to Vulkan subgroup operations), which are present on most GPUs today. Building a safe and portable abstraction layer like WebGPU comes with such limitations.
Thank you. With xcrun metal -help I actually got the help page for clang, but trying just man metal gave me a page specific to metal.
Oh I see, the Feedback Assistant case number is FB11870726
That table shows the GPUs for Metal 3, but is there some similar listing somewhere for the common families and Mac2? I'm interested to understand overlap. For example, indirect compute command buffers exist for Common2, Common3, and Metal 3, but not Mac2 so I'm wondering how it's disjoint (and whether or not to design around indirect command buffers as a result).
Does this mean 'Metal3' can indeed be relied upon (with that correction)? For example, could I target Metal3 features rather than Apple7, Apple8, and Mac2 features? And which GPUs are in the latter but not the former? There must be some, as some Metal3 features are not available for Mac2.
I mean which GPUs are in Metal3 but not in Apple7, Apple8, or Mac2?
I mean which GPUs belong to Metal3 that don't belong to Apple7, Apple8, or Mac2?
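For what it's worth, the family overlap on any particular machine can be checked at runtime instead of being inferred from the tables. This is only a minimal sketch: MTLCreateSystemDefaultDevice, MTLDevice.supportsFamily(_:) and the MTLGPUFamily cases are Metal API, while unsupportedFamilies is a hypothetical helper name of my own. It also assumes an SDK and deployment target recent enough to expose .apple8 and .metal3 (I believe Xcode 14, macOS 13 / iOS 16).

```swift
import Metal

// Hypothetical helper (not part of Metal): returns the entries of `required`
// that the `supports` predicate rejects.
func unsupportedFamilies(
    in required: [MTLGPUFamily],
    supports: (MTLGPUFamily) -> Bool
) -> [MTLGPUFamily] {
    required.filter { !supports($0) }
}

// The families discussed above, paired with display names.
// Assumption: the SDK in use declares .apple8 and .metal3.
let families: [(name: String, family: MTLGPUFamily)] = [
    ("Common2", .common2),
    ("Common3", .common3),
    ("Mac2", .mac2),
    ("Apple7", .apple7),
    ("Apple8", .apple8),
    ("Metal3", .metal3),
]

if let device = MTLCreateSystemDefaultDevice() {
    // Print one line per family so the overlap on this GPU is visible.
    for entry in families {
        print("\(device.name) \(entry.name): \(device.supportsFamily(entry.family))")
    }
    // Decide between a Metal3 code path and a fallback.
    let missing = unsupportedFamilies(in: [.metal3], supports: { device.supportsFamily($0) })
    print(missing.isEmpty ? "Metal3 features can be used" : "Fall back to a non-Metal3 path")
} else {
    print("No Metal device available")
}
```

Running this on a few machines would at least show empirically which GPUs report Metal3 without reporting Apple7, Apple8, or Mac2. It does not replace an official listing.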
"Each execution unit (core, whatever) ..."
You wrote this 5 years ago, and the two are certainly not the same thing today, since a core contains many execution units.
I'm confused about whether it's the core or the execution unit that's analogous to the streaming multiprocessor.
Are threadgroups executed at the scope of a core or of an execution unit?
Above you say at the scope of a core, but I'm doubtful. Any comments on this distinction?