What is the purpose of threadgroup memory in Metal?

I have been working with Metal for a little while now and I have encountered the threadgroup address space. After reading a little about it in Apple’s MSL reference, I am aware of how threadgroups are formed and how they can be split into SIMD groups; however, I have not yet seen threadgroup memory in action. Can someone give me some examples of when/how threadgroup memory is used?

Specifically, how is the [[threadgroup(n)]] attribute used in both kernel and fragment shaders? References to WWDC videos, articles, and/or other resources would be appreciated.
Answered by darknoon in 642604022
The purpose of threadgroup or tile memory is that it is much faster to access than system memory, since it's directly on the GPU hardware as opposed to needing to send loads and stores out over the memory bus. You can store intermediate results there to avoid expensive ram traffic. Hope that helps.
Accepted Answer
The purpose of threadgroup or tile memory is that it is much faster to access than system memory, since it's directly on the GPU hardware as opposed to needing to send loads and stores out over the memory bus. You can store intermediate results there to avoid expensive ram traffic. Hope that helps.
This WWDC talk, https://developer.apple.com/videos/play/tech-talks/10858, near the 17 min mark discusses parallel reductions on the GPU, and the code sample for this at the 21 min mark shows an example usage of threadgroup memory. Of course this is just one of many ways threadgroup memory could be used, but I thought you might appreciate a concrete example.
What is the purpose of threadgroup memory in Metal?
 
 
Q