Sharing a resource among GPU threads

Hi,


In Metal compute shaders, is there any way to share a resource among GPU threads? I mean, for example, when each thread would like to append to the same linked list. I think (although I have never tried it) it is possible to implement some mutex-like protection with atomics and a polled wait, but that wouldn't ensure memory/cache coherency. A threadgroup barrier inside a mutex would also not work.


My only idea for handling such a shared structure is that each thread reads/writes its own dedicated structure, then, after a barrier, one thread from the group merges the structures together. That would involve a lot of extra memory and an additional merge pass (the per-threadgroup structures also have to be merged together).
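
Something like this is what I have in mind (just a rough sketch; the threadgroup size, per-thread capacity, buffer layout, and the merge step are placeholders):

```metal
#include <metal_stdlib>
using namespace metal;

#define THREADS_PER_GROUP 64  // assumed threadgroup size for this sketch
#define ITEMS_PER_THREAD  4   // assumed per-thread capacity

// Each thread fills its own slots in threadgroup memory, then one thread per
// threadgroup merges the slots into a shared device buffer.
kernel void perThreadThenMerge(device float       *output      [[buffer(0)]],
                               device atomic_uint *outputCount [[buffer(1)]],
                               uint tid   [[thread_index_in_threadgroup]],
                               uint gsize [[threads_per_threadgroup]])
{
    // Per-thread scratch slots; no sharing, so no atomics needed while writing.
    threadgroup float scratch[THREADS_PER_GROUP * ITEMS_PER_THREAD];
    threadgroup uint  counts[THREADS_PER_GROUP];

    uint produced = 0;
    for (uint i = 0; i < ITEMS_PER_THREAD; ++i) {
        scratch[tid * ITEMS_PER_THREAD + i] = float(tid * 10 + i); // placeholder work
        ++produced;
    }
    counts[tid] = produced;

    // Wait until every thread in the group has finished writing its slots.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // One thread per group appends the group's results to the device buffer;
    // the atomic counter keeps different threadgroups from colliding.
    if (tid == 0) {
        for (uint t = 0; t < gsize; ++t) {
            for (uint i = 0; i < counts[t]; ++i) {
                uint dst = atomic_fetch_add_explicit(outputCount, 1u,
                                                     memory_order_relaxed);
                output[dst] = scratch[t * ITEMS_PER_THREAD + i];
            }
        }
    }
}
```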

Replies

When writing parallel code you often have to think about problems in a different way. There is a course on Udacity, Intro to Parallel Programming, that explains the basic ideas of how to solve such problems on the GPU. They use CUDA, so the terminology is a little different, but the approaches are the same.

These are hardware vector threads, where each thread does "exactly the same thing" at "exactly the same time". Don't confuse these with CPU threads; they are a totally different animal.


It is also important to understand that the hardware warp size is not the same as the workgroup size. On the device, calculations inside a warp happen simultaneously, but a large workgroup will be broken up into warp-sized blocks (and each warp block is executed in series).


This is why you need to insert a barrier after initializing threadgroup (local) memory, so that all warp blocks are allowed to finish before the rest of the kernel executes.
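
For example, here is a minimal reduction-style sketch where that barrier matters (the buffer names, the 256-slot limit, and the power-of-two threadgroup size are assumptions for illustration):

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical example: initialize threadgroup (local) memory, then barrier
// before any thread reads values written by other threads.
// Assumes a power-of-two threadgroup size of at most 256.
kernel void sumPerGroup(device const float *input   [[buffer(0)]],
                        device float       *partial [[buffer(1)]],
                        uint gid    [[thread_position_in_grid]],
                        uint tid    [[thread_index_in_threadgroup]],
                        uint tgid   [[threadgroup_position_in_grid]],
                        uint tgsize [[threads_per_threadgroup]])
{
    threadgroup float shared[256];

    shared[tid] = input[gid];   // each thread initializes its own slot

    // Without this barrier, warps scheduled later might not have written
    // their slots yet when the reduction below reads them.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Simple tree reduction over the threadgroup.
    for (uint stride = tgsize / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            shared[tid] += shared[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    if (tid == 0) {
        partial[tgid] = shared[0];   // one partial sum per threadgroup
    }
}
```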


So parallel algorithms are a bit different.


Atomic operations are used for buffer accesses where multiple hardware threads might read from or write to the same memory location.
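
For the original question, the usual replacement for a shared linked list is a flat output buffer plus an atomic counter, roughly like this (the predicate, buffer names, and capacity check are made up for illustration):

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical append pattern: a flat device buffer plus an atomic counter
// stands in for the shared linked list.
kernel void appendMatches(device const float *input    [[buffer(0)]],
                          device float       *outItems [[buffer(1)]],
                          device atomic_uint *outCount [[buffer(2)]],
                          constant uint      &capacity [[buffer(3)]],
                          uint gid [[thread_position_in_grid]])
{
    float value = input[gid];
    if (value > 0.5f) {   // placeholder predicate
        // Atomically reserve one slot; every thread gets a unique index.
        uint slot = atomic_fetch_add_explicit(outCount, 1u, memory_order_relaxed);
        if (slot < capacity) {
            outItems[slot] = value;
        }
    }
}
```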


You can also split your "merge" operation into a separate kernel (perhaps with a different workgroup size), or even hand the buffers back to the CPU if the merge is not a parallel operation.
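
As a rough sketch of such a second pass, here is a hypothetical merge kernel that would be dispatched as a single threadgroup to fold per-threadgroup partial results into one value (the buffer names and size limits are assumptions, following the earlier reduction sketch):

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical second-pass "merge" kernel: run with one threadgroup after the
// first pass. Assumes a power-of-two threadgroup size of at most 256.
kernel void mergePartials(device const float *partial      [[buffer(0)]],
                          device float       *result       [[buffer(1)]],
                          constant uint      &partialCount [[buffer(2)]],
                          uint tid    [[thread_index_in_threadgroup]],
                          uint tgsize [[threads_per_threadgroup]])
{
    threadgroup float shared[256];

    // Each thread strides over the partial results and accumulates its share.
    float sum = 0.0f;
    for (uint i = tid; i < partialCount; i += tgsize) {
        sum += partial[i];
    }
    shared[tid] = sum;
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction within the single threadgroup.
    for (uint stride = tgsize / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            shared[tid] += shared[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    if (tid == 0) {
        result[0] = shared[0];
    }
}
```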