Post

Replies

Boosts

Views

Activity

metal encoder dispatch based on value of an atomic counter
I'm trying to find a way to reduce synchronization time between two compute shader calls, where one dispatch depends on an atomic counter from the other. Example: If I have two metal kernels, select and execute, select is looking through the numbers-buffer and stores the index of all numbers < 10 in a new buffer selectedNumberIndices by using an atomic counter. execute is then run counter number of times do do something with those selected indices. kernel void select (device atomic_uint &counter, device uint *numbers, device uint *selectedNumberIndices, uint id [[thread_position_in_grid]]) { if(numbers[id] < 10) { uint idx = atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed); selectedNumbers[idx] = id; } } kernel void execute (device uint *selectedNumberIndices, uint id [[thread_position_in_grid]]) { // do something #counter number of times } currently I can do this by using .waitUntilCompleted() between the dispatches to ensure I get accurate results, something like: // select buffer = queue.makeCommandBuffer()! encoder = buffer.makeComputeCommandEncoder()! encoder.setComputePipelineState(selectState) encoder.setBuffer(counterBuffer, offset: 0, index: 0) encoder.setBuffer(numbersBuffer, offset: 0, index: 1) encoder.setBuffer(selectedNumberIndicesBuffer, index: 2) encoder.dispatchThreads(.init(width: Int(numbersCount), height: 1, depth: 1), threadsPerThreadgroup: .init(width: selectState.threadExecutionWidth, height: 1, depth: 1)) encoder.endEncoding() buffer.commit() // wait buffer.waitUntilCompleted() // execute buffer = queue.makeCommandBuffer()! encoder = buffer.makeComputeCommandEncoder()! encoder.setComputePipelineState(executeState) encoder.setBuffer(selectedNumberIndicesBuffer, index: 0) var counterValue: uint = 0 // extract the value of the atomic counter counterBuffer.contents().copyBytes(to: &counterValue, count: MemoryLayout<UInt32>.stride) encoder.dispatchThreads(.init(width: Int(counterValue), height: 1, depth: 1), threadsPerThreadgroup: .init(width: executeState.threadExecutionWidth, height: 1, depth: 1)) encoder.endEncoding() buffer.commit() My question is if there is any way I can have this same functionality without the costly buffer.waitUntilCompleted() call? Or am I going about this in completely the wrong way, or missing something else?
2
0
667
Aug ’23