I have been working with Metal for a little while now and I have encountered the threadgroup address space. After reading a little about it in Apple’s MSL reference, I am aware of how threadgroups are formed and how they can be split into SIMD groups; however, I have not yet seen threadgroup memory in action. Can someone give me some examples of when/how threadgroup memory is used?
Specifically, how is the [[threadgroup(n)]] attribute used in both kernel and fragment shaders? References to WWDC videos, articles, and/or other resources would be appreciated.
Specifically, how is the [[threadgroup(n)]] attribute used in both kernel and fragment shaders? References to WWDC videos, articles, and/or other resources would be appreciated.
The purpose of threadgroup or tile memory is that it is much faster to access than system memory, since it's directly on the GPU hardware as opposed to needing to send loads and stores out over the memory bus. You can store intermediate results there to avoid expensive ram traffic. Hope that helps.