MTLSharedEvent scheduled block called before command buffer scheduling and not in-flight

I am using an MTLSharedEvent to occasionally relay new information from the CPU to the GPU. The CPU writes into an MTLBuffer with storage mode .storageModeManaged from within a block registered on the shared event (using the notify(_:atValue:block:) method of MTLSharedEvent, with an MTLSharedEventListener configured to be notified on a background dispatch queue). The process looks something like this:

Code Block swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let synchronizationQueue = DispatchQueue(label: "com.myproject.synchronization")
let sharedEvent = device.makeSharedEvent()!
let sharedEventListener = MTLSharedEventListener(dispatchQueue: synchronizationQueue)
// Updated only occasionally on the CPU (on user interaction). Mostly written to
// on the GPU
let managedBuffer = device.makeBuffer(length: 10, options: .storageModeManaged)!
var doExtraWork = true

func computeSomething(commandBuffer: MTLCommandBuffer) {
    // Do work on the GPU every frame
    // After writing to the buffer on the GPU, synchronize the buffer (required)
    let blitToSynchronize = commandBuffer.makeBlitCommandEncoder()!
    blitToSynchronize.synchronize(resource: managedBuffer)
    blitToSynchronize.endEncoding()
    // Occasionally, add extra information on the GPU
    if doExtraWork {
        // Register a block to write into the buffer
        sharedEvent.notify(sharedEventListener, atValue: 1) { event, value in
            // Safely write into the buffer. Make sure we call `didModifyRange(_:)` after
            // Update the counter
            event.signaledValue = 2
        }
        commandBuffer.encodeSignalEvent(sharedEvent, value: 1)
        commandBuffer.encodeWaitForEvent(sharedEvent, value: 2)
    }
    // Commit the work
    commandBuffer.commit()
}


The expected behavior is as follows:
  1. The GPU does some work with the managed buffer

  2. Occasionally, the buffer needs to be updated with new information from the CPU. In this frame, we register a block of work to be executed. We do so in a dedicated block because we cannot guarantee that, by the time execution on the main thread reaches this point, the GPU is not simultaneously reading from or writing to the managed buffer. Hence, it is unsafe to simply write to the buffer at this point; we must first make sure the GPU is not doing anything with this data

  3. When the GPU schedules this command buffer to be executed, commands executed before the encodeSignalEvent(_:value:) call are executed and then execution on the GPU stops until the block increments the signaledValue property of the event passed into the block

  4. When execution reaches the block, we can safely write into the managed buffer because we know the CPU has exclusive access to the resource. Once we've done so, we resume execution of the GPU

The issue is that Metal does not seem to call the block while the GPU is executing the command, but rather *before* the command buffer is even scheduled. Worse, the system seems to "work" with the initial command buffer (the very first command buffer, before any others are scheduled).

I first noticed this issue when I looked at a GPU frame capture after my scene vanished following a CPU update, which is where I saw that the GPU had NaNs all over the place. I then ran into this strange situation when I purposely waited on the background dispatch queue with a sleep(_:) call. As expected, my shared resource semaphore (not shown, signaled in a completion block of the command buffer and waited on in the main thread) reached a value of -1 after committing three command buffers to the command queue (three being the number of recycled shared MTLBuffers holding scene uniform data etc.). This suggests that the first command buffer has not finished executing by the time the CPU is more than three frames ahead, which is consistent with the sleep(_:) behavior. Again, what isn't consistent is the ordering: Metal seems to call the block before even scheduling the buffer. Further, in subsequent frames, Metal doesn't seem to care that the sharedEventListener block is taking so long: it schedules the command buffer for execution even while the block is still running, and the block finishes dozens of frames later.
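The semaphore behavior described above can be reproduced without Metal. The following is a minimal sketch, assuming a semaphore created with one slot per recycled uniform buffer; the names `maxFramesInFlight`, `frameSemaphore`, and `fourthWait` are hypothetical, and the final loop only stands in for the command buffer completion blocks that would normally signal the semaphore.

```swift
import Dispatch

// Hypothetical stand-in for the frame semaphore: one slot per recycled uniform buffer.
let maxFramesInFlight = 3
let frameSemaphore = DispatchSemaphore(value: maxFramesInFlight)

// The CPU claims one slot before encoding each of the first three frames.
for _ in 0..<maxFramesInFlight {
    frameSemaphore.wait()
}

// No command buffer has completed yet, so nothing has signaled the semaphore.
// A fourth frame cannot claim a slot. A timeout is used so this sketch does not hang.
let fourthWait = frameSemaphore.wait(timeout: .now() + .milliseconds(50))
print(fourthWait == .timedOut ? "fourth frame stalled" : "fourth frame proceeded")

// Stand-in for the three completion blocks firing once the GPU finishes its work.
for _ in 0..<maxFramesInFlight {
    frameSemaphore.signal()
}
```

In the real render loop the main thread would block indefinitely on the fourth wait until the first command buffer completes, which matches a first command buffer that is still held up by the sleeping listener block.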

This behavior is completely inconsistent with what I expect. What is going on here?

P.S.
There is probably a better way to periodically update a managed buffer whose contents are mostly modified on the GPU, but I have not yet found one. Any advice on this subject is appreciated as well. Of course, a triple buffer system *could* work, but it would waste a lot of memory as the managed buffer is quite large (whereas the shared buffers managed by the semaphore are quite small).
Answered by Graphics and Games Engineer in 661593022
Event values must be increasing. This is why it will only work the first time if this is executed in a loop.

Try this instead

At setup / initialization / before your loop starts...
Code Block
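// Event values are UInt64
var i: UInt64 = 1
var j: UInt64 = 2


In your loop...
Code Block
if doExtraWork {
    // Register a block to write into the buffer.
    // The capture list copies the current value of j, because the variable
    // is advanced below before the block runs
    sharedEvent.notify(sharedEventListener, atValue: i) { [j] event, value in
        // Safely write into the buffer. Make sure we call `didModifyRange(_:)` after
        // Update the counter
        event.signaledValue = j
    }
    commandBuffer.encodeSignalEvent(sharedEvent, value: i)
    commandBuffer.encodeWaitForEvent(sharedEvent, value: j)
    // Move on to the next unused pair of values
    i += 2
    j = i + 1
}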

I see. For some reason I had incorrectly assumed that the value would somehow be reset after the command buffer finished executing (so that it only had to be monotonically increasing within the "scope" of a single command buffer's encoding). This works as expected.
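Since the event value is never reset between command buffers, one way to keep the bookkeeping in one place is a small helper that hands out a fresh pair of values each time. This is only a sketch: the type `EventValuePairs` and its `next()` method are hypothetical names, not part of Metal, and it assumes the shared event starts at a signaled value of 0 and that nothing else signals the same event.

```swift
/// Hypothetical helper: hands out strictly increasing (signal, resume) value
/// pairs for a single shared event, so no value is ever reused.
struct EventValuePairs {
    // Last value handed out. Assumes the event starts with a signaled value of 0.
    private var last: UInt64 = 0

    /// Returns the next two unused values: the first is signaled by the GPU,
    /// the second is set by the CPU block to let the GPU resume.
    mutating func next() -> (signal: UInt64, resume: UInt64) {
        let signal = last + 1
        let resume = last + 2
        last = resume
        return (signal, resume)
    }
}

var pairs = EventValuePairs()
let first = pairs.next()   // values for the first update
let second = pairs.next()  // values for the second update
print(first, second)
```

Each pair would be requested once per update, with `signal` passed to notify(_:atValue:block:) and encodeSignalEvent(_:value:), and `resume` assigned to signaledValue inside the block and passed to encodeWaitForEvent(_:value:). Because the pair is returned as constants, the block captures fixed values rather than a variable that changes later.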