Command Buffer 'Traffic Jam' (thread blocked waiting for next drawable)

Hi, I run a fairly simple Metal layer in my application that is synced with UIKit. My Metal layer sits beneath some UIKit components which the user can interact with and which must be synchronised with the Metal layer beneath.


The Metal draw calls are minimal. There are essentially three passes: a compute pass that does some processing on camera input, an off-screen render pass that draws a simple object hierarchy to some screen-sized textures, and a final render pass that blends the textures from the prior pass together.


I'm careful not to call currentDrawable until the final pass is ready to commence, and I make use of triple-buffering. As I'm synchronising with UIKit, I have set `.presentsWithTransaction` to `true` on my `MTKView` as recommended, and I call `.waitUntilScheduled()` in the draw call of my MTKView to ensure everything is synchronised, like so:


commandBuffer.commit()
commandBuffer.waitUntilScheduled()
view.currentDrawable?.present()


This generally works fine.


However, I've noticed that if I hit 100% CPU usage for some reason, things can get backed up. The command encoding is delayed, which has the knock-on effect of drawables not being available, producing a whole string of grey 'thread blocked waiting for next drawable' warnings for 30 or more frames.


What's the best way to nip this kind of backing up in the bud? It would probably be best just to skip encoding a frame or two at the point this happens, rather than the current situation, where the main thread is blocked each frame for an indeterminate amount of time until the system regains stability.


Any tips on how to achieve this? Or, ideally, a better alternative?


Thanks!

Replies

I should add, the application doesn't seem to be skipping too many frames. It's just the string of grey 'thread blocked waiting for next drawable' warnings in Instruments that makes me uneasy that I'm doing something wrong. Perhaps, when using `.presentsWithTransaction`, the 'thread blocked' notifications in Instruments can simply be ignored, but I'd like to have that confirmed to put my mind at ease.

When you hit 100% CPU utilization, where is most of that time being spent? You want to hold on to drawables as briefly as possible. Delaying the call to currentDrawable until after the previous passes are encoded is one way of reducing the time you hold on to a drawable. However, you may also need to move more CPU (i.e. non-encoding) work to before the call to currentDrawable.


(Also, I'm assuming you're not calling currentRenderPassDescriptor in your app, because that implicitly calls currentDrawable, which would mean you'd really need to delay the call to currentRenderPassDescriptor as well.)
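
For illustration, here is a minimal sketch of that ordering in an `MTKViewDelegate`. The `Renderer` class and its three helper methods (`updateSceneState`, `encodeOffscreenPasses`, `encodeFinalBlend`) are hypothetical placeholders, not code from this thread; only the commit/wait/present sequence at the end comes from the original post.

```swift
import MetalKit

// Hypothetical renderer: the class name and helper methods are illustrative only.
final class Renderer: NSObject, MTKViewDelegate {
    private let commandQueue: MTLCommandQueue

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        commandQueue = queue
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        // 1. CPU-only work that does not need the drawable.
        updateSceneState()

        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }

        // 2. Compute and off-screen passes that target the app's own textures.
        encodeOffscreenPasses(into: commandBuffer)

        // 3. Only now touch the drawable. currentRenderPassDescriptor
        //    implicitly calls currentDrawable, so it is requested last.
        guard let descriptor = view.currentRenderPassDescriptor,
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else {
            // No drawable this frame: submit the off-screen work and bail out.
            commandBuffer.commit()
            return
        }
        encodeFinalBlend(with: encoder)
        encoder.endEncoding()

        // Same sequence as in the original post (presentsWithTransaction == true).
        commandBuffer.commit()
        commandBuffer.waitUntilScheduled()
        view.currentDrawable?.present()
    }

    // Placeholders for the app's real work.
    private func updateSceneState() {}
    private func encodeOffscreenPasses(into commandBuffer: MTLCommandBuffer) {}
    private func encodeFinalBlend(with encoder: MTLRenderCommandEncoder) {}
}
```

The point of the layout is simply that everything above step 3 runs without a drawable checked out, so any CPU spike there does not extend the time a drawable is held.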

Hi Dan, I am using `currentRenderPassDescriptor`, but it's on the final pass, for which CPU work is negligible. The CPU very rarely hits 100%, but when it does it's for some (non-Metal) drawing of a label or something similar which I've yet to move off the main thread. It doesn't appear to be sat on the wait in the draw call for too long, as far as I can tell.
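
One way to check how long the draw call really waits would be to time the drawable acquisition directly. A minimal sketch, assuming a hypothetical `timed` helper (it is not part of Metal or MetalKit):

```swift
import Foundation

/// Hypothetical helper: runs `work` and returns its result together with
/// the elapsed wall-clock time in milliseconds.
func timed<T>(_ work: () -> T) -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}
```

In `draw(in:)` this could wrap the first access that touches the drawable, for example `let (descriptor, ms) = timed { view.currentRenderPassDescriptor }`, logging whenever `ms` is more than a few milliseconds.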


Something else that seems to trigger it is application switching, in 50% or more of cases. Again, the Metal instrument is telling me no frames have been skipped, but something is certainly affected, as I've noticed any drag operations are suddenly janky. It's as if, even though the frame rate is upheld, the touch events are getting backed up.


Any idea? Or is it example project time?

Hi, I've managed to create a simple test project that isolates this issue. I'm fairly confident I'm not holding onto the drawable for too long as I'm now drawing a single quad from a single `.cpuCacheModeWriteCombined` vertex buffer, in a single render pass.


As far as I can tell, what triggers the issue is when the CPU has some work to do on the main thread (simulated with a `usleep(8000)` in the test project) and then has a spike of work to complete elsewhere (simulated with a `usleep` on button press in the test project). This is enough to get the 'thread blocked waiting for next drawable' warnings in Instruments.


This seems to create a knock-on effect where, because the current frame was delayed waiting for a drawable, the next frame is delayed too. It requires another CPU spike to roll over into becoming synchronised again.


Even though this doesn't necessarily cause a drop in frames, it does seem to cause a big wait on the main thread which then has, for example, the effect of disturbing the cadence of touch events – hence the appearance of dropped frames when scrolling/dragging.


Is this expected behaviour?


I'm wondering what the best way forward is here. I'm guessing that a solution would be to somehow calculate when a frame should simply be dropped, so that currentDrawable isn't called at all, allowing the drawables to be replenished and service to resume normally. Or is there a better way?
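
A minimal sketch of the frame-dropping idea, assuming a hypothetical `FrameGate` type (not a Metal API) that caps the number of frames in flight and never blocks. Whether this actually cures the starvation seen with `.presentsWithTransaction` is not established in this thread.

```swift
import Foundation

/// Hypothetical helper: tracks how many frames are in flight and tells the
/// caller to skip a frame instead of blocking when the limit is reached.
final class FrameGate {
    private let lock = NSLock()
    private let maxFramesInFlight: Int
    private var framesInFlight = 0

    init(maxFramesInFlight: Int) {
        self.maxFramesInFlight = maxFramesInFlight
    }

    /// Returns true if a slot was claimed, false if this frame should be
    /// skipped. Never waits.
    func tryBeginFrame() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard framesInFlight < maxFramesInFlight else { return false }
        framesInFlight += 1
        return true
    }

    /// Releases a slot; intended to be called once the GPU has finished a frame.
    func endFrame() {
        lock.lock()
        defer { lock.unlock() }
        framesInFlight = max(0, framesInFlight - 1)
    }
}
```

In `draw(in:)`, the idea would be to return early when `tryBeginFrame()` is false, before touching `currentDrawable` or `currentRenderPassDescriptor`, and to call `endFrame()` from the command buffer's `addCompletedHandler`. The right limit (for instance, one fewer than the number of drawables) is an assumption that would need testing.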

I've submitted this as a bug alongside an example project. It may well be expected behaviour, but then it may make sense for the documentation for `.presentsWithTransaction` to be updated, as I imagine whoever enables it will almost certainly encounter the same issue at some point.


Would be great to hear any potential workarounds.


rdar://43568815

Title:

Recommended `.presentsWithTransaction` render method on `MTKView` starves render loop of drawables


Summary:

I run a fairly simple Metal layer in my application that is synced with UIKit. My Metal layer sits beneath some UIKit components which the user can interact with and which must be synchronised with the Metal layer beneath.


As recommended in the documentation for this use case, I make use of `.presentsWithTransaction` on MTKView and "commit the command buffer and call its waitUntilScheduled() method to synchronously wait until the drawable is ready, then call the drawable’s present() method directly."


This works well until a light load on the CPU is combined with an intermittent CPU spike on the main thread. When this happens, the draw loop can become starved of drawables. This causes a knock-on effect, which means the drawable for the next frame is delayed as well. In Instruments, this results in a series of 'thread blocked waiting for next drawable' warnings.


As the main thread is now sitting blocked in the draw call for the majority of the run loop, touch events become delayed, resulting in a distinctive stuttering pattern for a pan gesture – even though the app is still theoretically running at a full 60fps.


A subsequent CPU spike can knock things back into sequence and the app will begin performing as normal.

Hi Dan, any thoughts on this? Are you able to confirm the issue?