This Apple Tech Talk, https://developer.apple.com/videos/play/tech-talks/10858, discusses parallel reductions on the GPU near the 17-minute mark, and the code sample at the 21-minute mark shows an example use of threadgroup memory. This is just one of many ways threadgroup memory can be used, but I thought you might appreciate a concrete example.
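
In case it helps to see the general shape of the pattern without watching the video, here is a rough CPU-side simulation in plain C++ of a per-threadgroup tree reduction. To be clear, this is not the code from the talk and it is not Metal: the function name `simulateThreadgroupReduce`, the `scratch` array, and the power-of-two threadgroup size are all my own assumptions for illustration. The `scratch` vector plays the role that a `threadgroup` array would play in a kernel, and the comments mark where a `threadgroup_barrier` would be needed on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side simulation of a per-threadgroup tree reduction (sum).
// All names here are hypothetical; this is a sketch, not the talk's sample code.
std::vector<float> simulateThreadgroupReduce(const std::vector<float>& input,
                                             std::size_t threadsPerThreadgroup) {
    // Assumption: the halving loop below needs a power-of-two threadgroup size.
    assert(threadsPerThreadgroup > 0 &&
           (threadsPerThreadgroup & (threadsPerThreadgroup - 1)) == 0);

    const std::size_t groupCount =
        (input.size() + threadsPerThreadgroup - 1) / threadsPerThreadgroup;
    std::vector<float> partialSums(groupCount, 0.0f);

    for (std::size_t group = 0; group < groupCount; ++group) {
        // Stands in for a threadgroup-memory array shared by one threadgroup.
        std::vector<float> scratch(threadsPerThreadgroup, 0.0f);

        // Phase 1: each "thread" loads one element (0 if past the end of input).
        for (std::size_t lid = 0; lid < threadsPerThreadgroup; ++lid) {
            const std::size_t gid = group * threadsPerThreadgroup + lid;
            scratch[lid] = gid < input.size() ? input[gid] : 0.0f;
        }
        // On the GPU: threadgroup_barrier(mem_flags::mem_threadgroup) here,
        // so every load is visible before any thread starts adding.

        // Phase 2: halve the number of active "threads" on every step.
        for (std::size_t stride = threadsPerThreadgroup / 2; stride > 0; stride /= 2) {
            for (std::size_t lid = 0; lid < stride; ++lid) {
                scratch[lid] += scratch[lid + stride];
            }
            // On the GPU: another threadgroup barrier after each step.
        }

        // Phase 3: "thread 0" writes this threadgroup's partial sum.
        partialSums[group] = scratch[0];
    }
    return partialSums;
}
```

Calling this with the values 1 through 10 and a threadgroup size of 4 gives one partial sum per group of four elements. On a real GPU those partial sums would still need to be combined, for example with a second pass or on the CPU.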