How to schedule CAMetalLayer rendering for lowest CPU to Display latency?

Hi,

I'm aiming to render frames as close as possible to the presentation time - it's for a smartphone-based VR headset (Google Cardboard style) where ideally there is a "late warp" just before presenting a new frame that applies both lens distortion and also orientation correction to reduce the error in the predicted head pose by leveraging the very latest motion sensor data. So leaving it as late as possible gives better pose predictions.

This late warp is a pretty simple pass - just a textured mesh, so it's typically <2ms of GPU time. Thanks to the Developer Labs it's been suggested I could use a compute shader for the warp so it can share GPU resources with any ongoing rendering work too (as Metal doesn't have a public per-queue priority to allow for pre-emption of other rendering work, which is the way this is generally handled on Android).

What I'm trying to figure out now is how best to schedule the rendering. With CAMetalLayer maximumDrawableCount set to 2, you're pretty much guaranteed that the frame will be displayed on the next vsync if rendering completes quickly enough. However, sometimes the system seems to hold onto the drawables a bit longer than expected, which blocks nextDrawable.

With maximumDrawableCount of 3, it seems easy enough to maintain 60 FPS, but looking in Instruments the CPU to display latency varies - there are times where it's around 50 ms (i.e. there are already 2 frames in the queue to be presented first and nextDrawable blocks), some periods where it's 30 ms (generally 1 other frame queued) and sometimes where it drops down to the ideal 16 ms or less.
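As a rough sanity check on those numbers: assuming a fixed 60 Hz refresh and that each frame already queued ahead costs one full refresh interval, the observed latencies line up with the number of queued frames. The function name and the one-interval-per-queued-frame model are illustrative assumptions here, not documented CAMetalLayer behaviour.

```swift
// Rough model (an assumption, not documented behaviour): each frame already
// queued ahead of ours delays its presentation by one full refresh interval.
func estimatedLatencyMs(queuedAhead: Int, refreshHz: Double = 60.0) -> Double {
    let frameMs = 1000.0 / refreshHz
    return Double(queuedAhead + 1) * frameMs
}

// Print the estimate for 0, 1 and 2 frames queued ahead.
for queued in 0...2 {
    print(queued, estimatedLatencyMs(queuedAhead: queued))
}
```

With 0, 1 and 2 frames queued ahead this gives roughly 17, 33 and 50 ms, which is close to what Instruments shows.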

Is there any way to call present that will just drop any other queued frames in the layer? I've tried presentDrawable:atTime: with a time of 0 and presentDrawable:afterMinimumDuration: with a duration of 0, but to no avail.

It seems like with CAMetalLayer I'll just have to addPresentedHandler blocks to keep track of how many are queued in the display so I can ensure the queue is generally empty before presenting the next frame.
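A minimal sketch of that bookkeeping, kept free of Metal so it can be run anywhere. PresentQueueTracker and its methods are hypothetical names; the only real API it is meant to hook into is the drawable's addPresentedHandler, shown in the comments at the end.

```swift
import Foundation

// Hypothetical helper: counts drawables that have been handed to present()
// but have not yet reported back through a presented handler.
final class PresentQueueTracker {
    private let lock = NSLock()
    private var pending = 0
    private let maxQueuedAhead: Int

    // maxQueuedAhead = how many earlier frames may still be waiting for display.
    init(maxQueuedAhead: Int = 0) {
        self.maxQueuedAhead = maxQueuedAhead
    }

    // Call before acquiring a drawable. Returns false if too many frames are
    // still queued, in which case the caller should skip or defer this frame.
    func tryBeginPresent() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard pending <= maxQueuedAhead else { return false }
        pending += 1
        return true
    }

    // Call from the presented handler (it may run on another thread).
    func didPresent() {
        lock.lock()
        defer { lock.unlock() }
        pending = max(0, pending - 1)
    }

    var pendingCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return pending
    }
}

let tracker = PresentQueueTracker()
print(tracker.tryBeginPresent()) // first frame: queue is empty
print(tracker.tryBeginPresent()) // second frame: previous one not shown yet
tracker.didPresent()             // presented handler fired
print(tracker.tryBeginPresent()) // queue is empty again

// Intended wiring on a device (not executed here):
//   guard tracker.tryBeginPresent() else { return }
//   drawable.addPresentedHandler { _ in tracker.didPresent() }
//   commandBuffer.present(drawable)
```

The lock is there because the presented handler is not guaranteed to be called on the thread that submits frames. Whether an empty queue is enough to guarantee display on the next vsync is exactly the open question here, so this only removes one source of extra latency.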

A related question is the deadline for completing the rendering. The CAMetalLayer is in the compositing fast path, but it seems rendering still needs to complete (i.e. all the GPU workload finished) around 5 ms before the next vsync for it to be displayed on that vsync. I suspect there's a deadline for the frame just in case it needs to be composited, but any hints / ideas for handling that would be appreciated. It seems to be slightly device-specific; somewhat unexpectedly, the iPod touch 7 latches frames that finish much closer to the vsync time than the iPhone 12 Pro.
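To make the timing budget concrete, here is a small sketch of working backwards from the next vsync to the latest moment the warp can be submitted. The 5 ms latch margin and 2 ms warp cost are just the figures measured above, the 1 ms safety pad and the function name are made up for illustration, and the vsync time is assumed to come from something like CADisplayLink's targetTimestamp.

```swift
// All times are in seconds, on the same clock as the display link timestamps.
// latchMargin is the empirically observed cutoff before vsync (about 5 ms in
// the measurements above); it is an assumption that needs tuning per device.
func latestWarpStart(nextVsync: Double,
                     latchMargin: Double = 0.005,
                     warpGPUTime: Double = 0.002,
                     safety: Double = 0.001) -> Double {
    // GPU work must be finished by (nextVsync - latchMargin), so the warp has
    // to start at least warpGPUTime (plus a safety pad) before that point.
    return nextVsync - latchMargin - warpGPUTime - safety
}

let exampleVsync = 1.0 // stand-in for a display link target timestamp
let start = latestWarpStart(nextVsync: exampleVsync)
print(start)
```

With those defaults the warp has to start about 8 ms before the vsync, so the pose prediction horizon is roughly half a frame at 60 Hz. A smaller latch margin on a given device (as seen on the iPod touch 7) shortens it directly.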

I've also just come across AVSampleBufferDisplayLayer that I'm taking a look at now. It seems to offer better control of the queue, and still enables the compositing fast path, but I can't actually see any feedback like addPresentedHandler to be able to judge what the deadline is to have a frame shown in the next vsync.

I'm still trying to figure out the best route here.

I should say the standard presentDrawable approach is usually described as "present as soon as possible" which also sounds like what I want, but in reality it seems to mean "present as soon as possible after all the other frames in the queue have been presented".

From my investigations so far it seems likely that CAMetalLayer has some logic to handle pacing, but I haven't seen that described anywhere either in docs or WWDC talks and I'm struggling to figure out the logic it's using.

For example, if you look at https://developer.apple.com/videos/play/wwdc2019/606/ at 6:30 - the focus is on the command encoder for a future frame blocking on nextDrawable for a full frame, and how offscreen draws could be dispatched ahead.

But for me there's an unanswered frame pacing question here too - the orange surface stays on the display for 2 frame periods, even though the following frame (shown in green in the Instruments trace) is fully complete well in advance of the swap interval where we'd expect it to display. It's as if some component (likely CAMetalLayer) has decided that a future frame has missed some submission deadline and so responds by delaying the presentation of the next one in the queue, even though it's ready to go.

I think with CAMetalLayer I might just end up triggering rendering of the following frame from the presentedHandler callback of the previous one rather than using CADisplayLink / MTKView at all. That way I can hopefully keep maximumDrawableCount at 3 so nextDrawable should always be non-blocking, and guarantee presenting on the following vsync, but I don't want to be fighting against internal opaque CAMetalLayer logic that decides I'm not submitting frames fast enough to keep a full drawable queue.

I'd love to understand more about all this - any references greatly appreciated!

I'd like to know where the Metal sample app is that they mention in the WWDC 21 presentation. But check that talk out, since it covers a lot of the simple things needed to get all this working.
Also, trying to search for "ProMotion" and getting tons of hits for "promotion" stinks. Can we just call this VRR like everyone else instead? Also, CVDisplayLink has been forced to 60Hz when running under Rosetta 2 since Monterey due to an Apple bug.

https://developer.apple.com/videos/play/wwdc2021/10147/
