Hi,
In Metal compute shaders, is there any way to share a resource among GPU threads? For example, each thread might want to append to the same linked list. I think (although I have never tried it) that some mutex-like protection could be implemented with atomics and a polled wait, but that wouldn't ensure memory/cache coherency. A thread group barrier inside the mutex would not work either.
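To make the mutex idea concrete, here is a minimal CPU-side C++ sketch of what I have in mind. It is not Metal code: `SharedList`, `append`, and `append_from_threads` are hypothetical names, each `std::thread` stands in for one GPU thread, and new nodes are linked in at the head of the list. On the CPU, the acquire/release ordering on the flag is what makes the list writes visible to the next lock holder. Whether a Metal compute shader can get an equivalent guarantee is exactly the part I am unsure about.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Hypothetical shared structure: a list head guarded by an atomic flag.
struct SharedList {
    std::atomic<bool> locked{false};
    Node* head = nullptr;
};

// Mutex-like protection with an atomic and a polled wait (spin lock).
void append(SharedList& list, Node* node) {
    assert(node != nullptr);
    // Spin until this thread is the one that flips the flag from false to true.
    while (list.locked.exchange(true, std::memory_order_acquire)) {
    }
    // Critical section: link the node in at the head of the list.
    node->next = list.head;
    list.head = node;
    // Release the lock; this also publishes the writes above on the CPU.
    list.locked.store(false, std::memory_order_release);
}

// Every thread appends `per_thread` nodes; returns the final list length.
std::size_t append_from_threads(std::size_t thread_count, std::size_t per_thread) {
    SharedList list;
    std::vector<Node> pool(thread_count * per_thread);  // preallocated nodes
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&list, &pool, t, per_thread] {
            for (std::size_t i = 0; i < per_thread; ++i) {
                Node* node = &pool[t * per_thread + i];
                node->value = static_cast<int>(t);
                append(list, node);
            }
        });
    }
    for (auto& worker : workers) worker.join();

    // Walk the list to count how many nodes were actually linked.
    std::size_t count = 0;
    for (Node* n = list.head; n != nullptr; n = n->next) ++count;
    return count;
}
```

Calling `append_from_threads(4, 1000)` should give a list of 4000 nodes on the CPU. The open question is whether the same pattern is safe and coherent across GPU threads.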
My only idea for handling this kind of shared structure is that each thread reads and writes its own dedicated structure; then, after a barrier, one thread from the group merges the structures together. This could require a lot of extra memory and an additional merge step, since the per-thread-group structures also have to be merged together.
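As a rough illustration of this fallback, here is a CPU-side C++ sketch (again not Metal code). The function name `collect_then_merge` and the bucket layout are my own assumptions. Each `std::thread` plays the role of one GPU thread, and joining the threads plays the role of the thread group barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread fills its own private bucket; one thread merges after the "barrier".
std::vector<int> collect_then_merge(std::size_t thread_count, std::size_t items_per_thread) {
    assert(thread_count > 0);

    // Phase 1: one dedicated bucket per thread, so no locking is needed.
    std::vector<std::vector<int>> buckets(thread_count);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&buckets, t, items_per_thread] {
            for (std::size_t i = 0; i < items_per_thread; ++i) {
                buckets[t].push_back(static_cast<int>(t * items_per_thread + i));
            }
        });
    }

    // Stand-in for the barrier: wait until every thread has finished writing.
    for (auto& worker : workers) worker.join();

    // Phase 2: a single thread merges all per-thread buckets into one result.
    std::vector<int> merged;
    merged.reserve(thread_count * items_per_thread);
    for (const auto& bucket : buckets) {
        merged.insert(merged.end(), bucket.begin(), bucket.end());
    }
    return merged;
}
```

The extra memory cost is visible here: every thread needs its own bucket, and the merge is a second pass over all of them. With several thread groups, the same merge would have to be repeated one level up.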