nextDrawable stalls commit of command buffer

To minimize uptime, I am currently using two command buffers. One holds the offscreen commands and is committed first. The second then waits 20-30ms on nextDrawable under heavy GPU use, does a little work, and calls presentDrawable (which is drawable.present in addScheduledHandler).

This whole setup seems less than ideal. On Android, Vulkan stalls very little on vkAcquireNextImageKHR, and mostly on vkQueuePresentKHR, but that is after the command buffer is ended and submitted. Doing that present call on a thread is often suggested, but the command buffer is already complete and submitted to the GPU before that call.

Metal stalls the commit of the command buffer because of this fundamental architectural limitation. I would prefer to have a single command buffer here. The nextDrawable result, especially with framebufferOnly set, is really just a reference to a drawable, and shouldn't lead to such a long stall. This also makes using double buffering nearly impossible.

The supplied best practices example isn't performant. It incurs all the stalls by using a single command buffer. https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/Drawables.html

Here's a more realistic example of what happens using a single command buffer.

beginCommandBuffer

beginEncoding
98% of commands to offscreen
endEncoding

<- this is where I currently end/commit the 98% command buffer to start driver processing

20ms+ stall on nextDrawable under heavy gpu load

beginEncoding
2% of commands to drawable (say a blit from offscreen to drawable)
endEncoding

1ms+ stall on presentDrawable (some stall then drawable.present added to addScheduledHandler)

endCommandBuffer

[cb commit] <- this is where commands are sent to queue and the driver
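
For comparison, here is a minimal Swift sketch of the two-command-buffer workaround described above: the offscreen work is committed before nextDrawable is called, so the driver can start on it while the CPU waits for a drawable. The function name `renderFrame`, the `encodeOffscreen` closure, and the `offscreen` texture are hypothetical names for illustration. The blit assumes the layer has framebufferOnly set to false and that both textures share size and pixel format; with framebufferOnly set to true, a small render pass would have to replace the blit.

```swift
import Metal
import QuartzCore

// Hypothetical helper illustrating the split. `offscreen` is the texture
// that the bulk of the frame (the "98%") renders into.
func renderFrame(queue: MTLCommandQueue,
                 layer: CAMetalLayer,
                 offscreen: MTLTexture,
                 encodeOffscreen: (MTLCommandBuffer) -> Void) {
    // 1. Encode and commit the offscreen work first, so driver processing
    //    can begin before any wait on the drawable pool.
    guard let offscreenCB = queue.makeCommandBuffer() else { return }
    encodeOffscreen(offscreenCB)
    offscreenCB.commit()

    // 2. This is the call that can block for a long time under heavy GPU load.
    guard let drawable = layer.nextDrawable(),
          let presentCB = queue.makeCommandBuffer(),
          let blit = presentCB.makeBlitCommandEncoder() else { return }

    // 3. The small "2%" part: copy the offscreen result into the drawable.
    //    Assumption: layer.framebufferOnly == false, matching size and format.
    blit.copy(from: offscreen, to: drawable.texture)
    blit.endEncoding()

    // 4. Schedule presentation and commit the second command buffer.
    presentCB.present(drawable)
    presentCB.commit()
}
```

Both command buffers come from the same queue, so the second one executes after the first and sees the finished offscreen texture.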

Ideally the nextDrawable and presentDrawable calls should be off on their own thread on a little core, so the main thread on a big core isn't stalled out.

The case we have is 90 alpha-blended quads on iOS that cost 11ms of GPU time on top of the rest of rendering. This delays drawables being returned to the pool, which stalls nextDrawable and, with a single command buffer, stalls processing of the next frame and getting to the next CPU update.

There is also still no test for isDrawableAvailable in the pool.

Hi Alecazam,

nextDrawable synchronizes to the display. When it blocks, it means that you’ve buffered enough work and submitting more would just allow the CPU to get much further ahead of the GPU and Display.

Is there a particular problem like stuttering that you're seeing? If so, maybe you can get a Metal System Trace with Instruments and attach it to a Feedback Assistant ticket so we can take a look.

This leads to 5-8ms of CPU driver processing that overlaps with the 10ms to 26ms nextDrawable wait. I can't post a picture from Metal System Trace here, but it seems that one should be able to completely commit one CB before getting stalled by the API. Having to use two just to work around the nextDrawable stall isn't great, but it is my workaround for now. No stall is seen, since it's all using triple buffering. If I switched to double buffering, it would become unusable.

Very little of the render command buffer submission depends on the drawable. The framebuffer cb reads from the results of the offscreen in that cb just to display to the UIView. So the nextDrawable stall basically prevents that work from being submitted in the single cb case.

Also, isn't CADisplayLink supposed to synchronize the frames? And we have a semaphore that counts down from the drawable count before we request nextDrawable, which also seems redundant. The problem is that the program thread wants to get onto the next frame, but can't due to the requirement to call nextDrawable. Even Apple's docs state this should be called as late in the frame as possible, but when that call takes 24ms, it feels like it's doing the job of the display link.
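
The semaphore mentioned above could look roughly like the following Swift sketch. `FrameGate` and its method names are hypothetical; it only counts frames the app itself has in flight and does not query the drawable pool, so it is not a substitute for an isDrawableAvailable test.

```swift
import Dispatch

// Hypothetical wrapper around a counting semaphore that limits how many
// frames the CPU may have in flight (e.g. 3 for triple buffering).
final class FrameGate {
    private let semaphore: DispatchSemaphore

    init(maxFramesInFlight: Int) {
        semaphore = DispatchSemaphore(value: maxFramesInFlight)
    }

    // Blocks until a frame slot is free.
    func beginFrame() {
        semaphore.wait()
    }

    // Non-blocking variant: returns false instead of waiting.
    func tryBeginFrame() -> Bool {
        return semaphore.wait(timeout: .now()) == .success
    }

    // Releases a slot; typically called from the command buffer's
    // addCompletedHandler once the GPU has finished the frame.
    func endFrame() {
        semaphore.signal()
    }
}
```

A frame would call `beginFrame()` before requesting nextDrawable and `endFrame()` from the completion handler of the last command buffer of that frame. Every successful begin needs a matching `endFrame()` before the gate is released.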

We've discussed this a bunch. CADisplayLink doesn't always sync to the display as you'd expect.

Calling nextDrawable can prevent stuttering in many cases but will increase latency.  So for your case you probably want to do an early nextDrawable and forgo the advice of many of our docs.

That sounds like even more of a stall if one is trying to use double buffering. All Apple examples fall back to triple buffering to buy more time, so I'd encourage fixing this API.

Vulkan doesn't stall in vkAcquireNextImageKHR for long at all, less than 3ms. Metal can take 30-40ms when the GPU is saturated. In Vulkan, that allows a single command buffer to hold both the backbuffer acquire and the present to the backbuffer.
