I have an image processing pipeline in which the GPU processes a texture and writes its result into a shared buffer (i.e. `storageMode = .shared`), which the CPU then uses for its own computation. After the CPU does its work, it writes its result at a different offset in the same shared `MTLBuffer` object. The buffer is arranged as follows:
```
uint | uint | .... | uint | float
offsets (contiguous):
0 | ...
```
The floating-point slot is written by the CPU and later read by the GPU in subsequent compute passes.
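For reference, here is a minimal plain-Swift sketch of that layout and of a CPU write into the float slot. Several parts are assumptions and do not come from the original code. The uint count of 8, the `BufferALayout` type and the free function `unsafelyWrite(_:into:offset:)` are all hypothetical; the function stands in for the question's own `unsafelyWrite(_:offset:)` extension. Heap memory from `UnsafeMutableRawPointer.allocate` replaces `MTLBuffer.contents()` so the sketch runs without a GPU. It also assumes Metal's `uint` and `float` are both 32-bit (`UInt32` and `Float` in Swift), which places the float slot at `uintCount * 4` bytes.

```swift
/// Hypothetical description of buffer A: `uintCount` 32-bit uints followed by one 32-bit float.
struct BufferALayout {
    let uintCount: Int

    /// Byte offset of the trailing float slot (plays the role of `targetBufferOffset`).
    var floatOffset: Int { uintCount * MemoryLayout<UInt32>.stride }

    /// Total byte length of the buffer.
    var length: Int { floatOffset + MemoryLayout<Float>.stride }
}

/// Hypothetical stand-in for the question's `unsafelyWrite(_:offset:)`:
/// stores a trivial value's bytes into raw memory at a byte offset.
func unsafelyWrite<T>(_ value: T, into contents: UnsafeMutableRawPointer, offset: Int) {
    contents.storeBytes(of: value, toByteOffset: offset, as: T.self)
}

let layout = BufferALayout(uintCount: 8)

// Stand-in for `bufferA.contents()`; with a real .shared MTLBuffer the CPU
// would write through the pointer returned by contents() instead.
let contents = UnsafeMutableRawPointer.allocate(byteCount: layout.length,
                                                alignment: MemoryLayout<UInt32>.alignment)

// The uint slots (in the real pipeline these come from kernel A).
for i in 0..<layout.uintCount {
    unsafelyWrite(UInt32(i), into: contents, offset: i * MemoryLayout<UInt32>.stride)
}

// The CPU step: write the float slot at the end of the buffer.
unsafelyWrite(Float(0.5), into: contents, offset: layout.floatOffset)

// Read everything back before releasing the memory.
let storedUInts = (0..<layout.uintCount).map {
    contents.load(fromByteOffset: $0 * MemoryLayout<UInt32>.stride, as: UInt32.self)
}
let storedFloat = contents.load(fromByteOffset: layout.floatOffset, as: Float.self)
contents.deallocate()

print(layout.floatOffset, storedFloat)  // 32 0.5
```

Because every slot is 4 bytes, the float offset is automatically aligned for `Float`. A layout mixing differently sized types would need explicit padding.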
I haven't been able to explain, or find documentation on, the following strange behavior. The compute pipeline using the buffer above (call it buffer A) is as follows (force unwraps are kept for brevity):
```swift
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!
let sharedEvent = device.makeSharedEvent()!
let sharedEventQueue = DispatchQueue(label: "my-queue")
let sharedEventListener = MTLSharedEventListener(dispatchQueue: sharedEventQueue)

// Compute pipeline
// 1. Kernel A processes the texture and writes its results (the uint slots) into bufferA.
kernelA.encode(commandBuffer: commandBuffer, sourceTexture: sourceTexture, destinationBuffer: bufferA)

// 2. The GPU pauses here while the CPU writes the float slot of bufferA.
commandBuffer.encodeCPUExecution(for: sharedEvent, listener: sharedEventListener) { [self] in
    var value = Float(0.0)
    bufferA.unsafelyWrite(&value, offset: Self.targetBufferOffset)
}

// 3. Kernel B reads the float slot of bufferA.
kernelB.setTargetBuffer(bufferA, offset: Self.targetBufferOffset)
kernelB.encode(commandBuffer: commandBuffer, sourceTexture: sourceTexture, destinationTexture: destinationTexture)

commandBuffer.commit()
```
Note that `commandBuffer.encodeCPUExecution` is simply a convenience function around the shared event object (`encodeSignalEvent` and `encodeWaitForEvent`) that signals and waits on `event.signaledValue + 1` and `event.signaledValue + 2`, respectively.
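A helper matching that description might look roughly like the sketch below. This is a hypothetical reconstruction, not the question's actual implementation. It assumes three things: the CPU closure is run from `MTLSharedEvent.notify(_:atValue:block:)` on the listener's dispatch queue, the closure hands control back to the GPU by setting `signaledValue` to the second value, and the final parameter is an unlabeled trailing closure.

```swift
import Foundation
import Metal

extension MTLCommandBuffer {
    /// Hypothetical reconstruction of `encodeCPUExecution(for:listener:_:)`.
    /// The GPU signals `base + 1`, the CPU runs `work`, and the CPU then sets
    /// `base + 2`, which the GPU waits on before running later commands.
    func encodeCPUExecution(for event: any MTLSharedEvent,
                            listener: MTLSharedEventListener,
                            _ work: @escaping () -> Void) {
        // Read once at encode time; nothing in this command buffer has run yet.
        let base = event.signaledValue
        let reachedValue = base + 1
        let resumeValue = base + 2

        // GPU side: signal once the work encoded before this point has finished.
        encodeSignalEvent(event, value: reachedValue)

        // CPU side: run `work` on the listener's queue, then release the GPU.
        event.notify(listener, atValue: reachedValue) { signaledEvent, _ in
            work()
            signaledEvent.signaledValue = resumeValue
        }

        // GPU side: hold the remaining commands until the CPU sets resumeValue.
        encodeWaitForEvent(event, value: resumeValue)
    }
}
```

Because `signaledValue` is read at encode time, calling this helper twice before either command buffer has executed would reuse the same pair of values. A production helper would probably need to track the next value itself.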
In the example above, kernelB does not see the writes made during the CPU execution. It can, however, see the values that kernelA wrote into the buffer.
The strange part: if the CPU writes to that same location in the buffer before the GPU schedules this work (e.g. during encoding, or at any other point before the command buffer runs, rather than in the middle of GPU execution), kernelB does see the CPU's write.
This behavior is odd and suggests to me that something undefined is going on. If the buffer were `.managed`, I could understand it, since changes on each side must be made explicit. With a `.shared` buffer, however, it seems quite unexpected, especially since the CPU can read the values written by the preceding kernel (viz. kernelA).
What explains this strange behavior with Metal?
Note: This behavior occurs on both an M1 Mac running Mac Catalyst and an iPad Pro (5th generation) running iOS 15.3.