mem_none vs mem_threadgroup

The documentation in the Metal Shading Language spec is as follows:


  • mem_none
    In this case, no memory fence is applied, and threadgroup_barrier acts only as an execution barrier.
  • mem_threadgroup
    Ensure correct ordering of memory operations to threadgroup memory for threads in a threadgroup.


Does this mean that whenever we use threadgroup memory, we need to use mem_threadgroup for our barriers? If so, under what circumstances does mem_none suffice?


I've seen code where threadgroup memory is loaded but mem_none is used (is this code incorrect?), and yet another example where mem_threadgroup is used.

Accepted Reply

The memflags set in the barrier tell the compiler which caches need to be flushed so that all threads see the same thing when your code executes the barrier. If you use mem_none, no caches are flushed, and it's undefined whether values written by one thread to any type of memory will be seen by any other thread. If you set mem_threadgroup, you can be assured that any values written to threadgroup memory (and only threadgroup memory) before the barrier can be seen by the other threads in the threadgroup after the barrier.


So to answer your question: if your kernel doesn't depend on values written by another thread into threadgroup memory, you can use mem_none. But if you're using threadgroup memory in the first place, it's likely (but not a given) that you're using it to communicate between threads, so you'll probably want to set mem_threadgroup.
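
To make the distinction concrete, here is a minimal sketch of the common case where threads do read each other's threadgroup writes, so mem_threadgroup is the flag to use. The kernel name reduce_sum, the buffer indices, and the fixed threadgroup size of 256 are all assumptions made for illustration; none of them come from the question or from the spec.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: each threadgroup sums 256 input floats with a tree
// reduction. Assumes a dispatch of exactly 256 threads per threadgroup and
// an input buffer that covers the whole grid.
kernel void reduce_sum(device const float *input  [[buffer(0)]],
                       device float       *output [[buffer(1)]],
                       uint gid  [[thread_position_in_grid]],
                       uint lid  [[thread_position_in_threadgroup]],
                       uint tgid [[threadgroup_position_in_grid]])
{
    threadgroup float scratch[256];

    // Each thread writes one slot of threadgroup memory.
    scratch[lid] = input[gid];

    // Other threads are about to read these slots, so the writes must be
    // ordered before the reads: mem_threadgroup, not mem_none.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    for (uint stride = 128; stride > 0; stride >>= 1) {
        if (lid < stride) {
            // Reads a slot that a different thread wrote in the last step.
            scratch[lid] += scratch[lid + stride];
        }
        // Every thread reaches this barrier; it orders this step's writes
        // before the next step's reads.
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    // Thread 0 holds the total for this threadgroup.
    if (lid == 0) {
        output[tgid] = scratch[0];
    }
}
```

Replacing either barrier with mem_flags::mem_none would keep the threads in step but, per the explanation above, would leave it undefined whether scratch[lid + stride] holds the value the other thread wrote.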

Replies

I saw the same question on Stack Overflow. Was it asked by you?

And did you get a real answer to this question?
