1 Reply
      Latest reply on Dec 8, 2019 8:36 AM by hvp
      hvp Level 1 Level 1 (0 points)



        my assumption was that it should be possible to read / write to an array without the need of atomics by creating threadgroups of the size (threadExecutionWidth, 1, 1) and using the thread_index_in_threadgroup as index of the array.


        A simple kernel to demonstrate what I mean:

        kernel void test(volatile device uint *counter [[buffer(0)]],
                         ushort tid [[thread_index_in_threadgroup]]) {
            counter[tid] += 1;


        The test shows that my assumption is wrong (for grids with a width of threadExecutionWidth, the value stays at 1 for all elements up to a grid-height of 100 and only gets bigger when the height gets way higher).


        I hoped to avoid atomic operations / synchronisations by using this pattern.


        So, I have two questions:

        1. Is a thread_index_in_threadgroup unique across all active threadgroups at a time? (Otherwise, the pattern obviously fails because of data-races)

        2. Is there a cheap way to force the counter-array be synchronised across the threads? Seeing a value of 1 doesn't make any sense and must mean that the threads get a cached value.