Data race-free memory-access-pattern in compute-kernels?

Question

hvp OP

Created Dec ’19

Replies 1

Boosts 0

Participants 1

Hi,

my assumption was that it should be possible to read / write to an array without the need of atomics by creating threadgroups of the size (threadExecutionWidth, 1, 1) and using the thread_index_in_threadgroup as index of the array.

A simple kernel to demonstrate what I mean:

kernel void test(volatile device uint *counter [[buffer(0)]],
                 ushort tid [[thread_index_in_threadgroup]]) {
    counter[tid] += 1;
}

The test shows that my assumption is wrong (for grids with a width of threadExecutionWidth, the value stays at 1 for all elements up to a grid-height of 100 and only gets bigger when the height gets way higher).

I hoped to avoid atomic operations / synchronisations by using this pattern.

So, I have two questions:

1. Is a thread_index_in_threadgroup unique across all active threadgroups at a time? (Otherwise, the pattern obviously fails because of data-races)

2. Is there a cheap way to force the counter-array be synchronised across the threads? Seeing a value of 1 doesn't make any sense and must mean that the threads get a cached value.

Thanks,

Hendrik

Boost

Answer 1

hvp OP

Dec ’19

I think I found my mental misconception of the threads: A thread won't work on a single function until the function is finished, but may interrupt the current execution of the function and move to a different function (or another instance of the same function) - basically like a normal thread on the CPU.

0