hvp’s Profile | Apple Developer Forums

Data-race-free memory access pattern in compute kernels?

Hi,

My assumption was that it should be possible to read from and write to an array without atomics by creating threadgroups of size (threadExecutionWidth, 1, 1) and using thread_index_in_threadgroup as the array index. A simple kernel to demonstrate what I mean:

```metal
kernel void test(volatile device uint *counter [[buffer(0)]],
                 ushort tid [[thread_index_in_threadgroup]])
{
    counter[tid] += 1;
}
```

The test shows that my assumption is wrong: for grids with a width of threadExecutionWidth, the value stays at 1 for all elements up to a grid height of 100, and only grows once the height gets much larger.

I hoped to avoid atomic operations and synchronisation by using this pattern. So I have two questions:

1. Is thread_index_in_threadgroup unique across all threadgroups that are active at the same time? (If not, the pattern obviously fails because of data races.)
2. Is there a cheap way to force the counter array to be synchronised across the threads? Seeing a value of 1 doesn't make any sense and must mean that the threads are reading a cached value.

Thanks,
Hendrik

Dec ’19

hvp