I am seeking clarification regarding the new device-coherent memory (buffers and textures) in Metal 3.2. Do I understand the documentation correctly that this feature allows threads from different threadgroups to update data in device memory cooperatively? The documentation mentions, "[results of operations] are visible to other threads across thread groups if you synchronize them properly." How does one do proper synchronization? From what I understand, Metal has no device-scoped barriers.
In Metal 3.2, we’ve introduced atomic_thread_fence with a thread_scope (§6.15.2 and §6.15.3 of the MSL Specification). You can use a thread_scope_device fence along with device atomics to synchronize coherent(device) (§4.8) memory operations across thread groups.
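
To make the fence-plus-atomic pattern concrete, here is a minimal sketch of a release/acquire handoff. It is written in standard host-side C++ rather than MSL so that it can be compiled and run anywhere; it is an analogy, not Metal code. The names payload, ready, producer, consumer, and run_handoff are hypothetical. In a Metal 3.2 kernel, the two threads would presumably be threads in different threadgroups, payload would live in coherent(device) memory, ready would be a device atomic, and each fence would be an atomic_thread_fence with thread_scope_device; check §6.15.2 and §6.15.3 of the MSL Specification for the exact arguments.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names. Plain C++ threads stand in for two GPU threadgroups.
int payload = 0;            // ordinary non-atomic data (analogue of coherent(device) memory)
std::atomic<int> ready{0};  // flag (analogue of a device atomic)

void producer() {
    payload = 42;                                         // plain write to shared data
    std::atomic_thread_fence(std::memory_order_release);  // keep the write ordered before the flag store
    ready.store(1, std::memory_order_relaxed);            // publish the flag
}

int consumer() {
    // Wait until the flag becomes visible.
    while (ready.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // keep the data read ordered after the flag load
    return payload;                                       // safe to read the published data
}

// Runs one producer and one consumer and returns the value the consumer observed.
int run_handoff() {
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

The design point is that the atomic flag alone carries no ordering here (both accesses are relaxed); the pair of fences is what makes the plain write to payload visible to the thread that observes the flag.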