Metal: high base CPU load in render function on iOS

When using Metal on iOS devices, I see a fairly high base CPU load in three of Metal's API calls. Even with no-op (entirely empty) shader functions and no buffers or textures allocated or assigned at all, the Metal API calls cause a constant CPU load of 20% (main thread) on an iPad Pro 4.

Is this normal?

I only see this issue on real iOS devices. If I run the same Metal code on macOS (or in the iOS Simulator on macOS), there is no measurable CPU load at all.

The heaviest Metal API calls are, in this order:
  1. renderCommandEncoderWithDescriptor: (40%)

  2. [commandBuffer commit]; (22%)

  3. [self nextDrawable]; (17%)

The render function of my very simple test code looks like this:

- (void)renderFrame:(CADisplayLink*)dlink {
    // Limits the number of frames in flight; semaphore previously initialized with 3.
    dispatch_semaphore_wait(semaphoreRenderFrame, DISPATCH_TIME_FOREVER);
    @autoreleasepool {
        id<MTLCommandBuffer> commandBuffer = commandQueue.commandBuffer; // 8%
        id<CAMetalDrawable> drawable = [self nextDrawable]; // 17%
        if (!drawable) {
            // Give the slot back; otherwise every skipped frame leaks one semaphore count.
            dispatch_semaphore_signal(semaphoreRenderFrame);
            return;
        }
        renderPass.colorAttachments[0].texture = drawable.texture;
        id<MTLRenderCommandEncoder> commander =
            [commandBuffer renderCommandEncoderWithDescriptor:renderPass]; // 40%
        [commander setRenderPipelineState:renderPipeline];
        [commander drawPrimitives:MTLPrimitiveTypeTriangle
                      vertexStart:0 vertexCount:nVertices instanceCount:1];
        [commander endEncoding];
        [commandBuffer presentDrawable:drawable];
        __block dispatch_semaphore_t semaphore = semaphoreRenderFrame;
        // Release the in-flight slot once the GPU has finished this command buffer.
        [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
            dispatch_semaphore_signal(semaphore);
        }];
        [commandBuffer commit]; // 22%
    }
}

So there is no data transferred between CPU and GPU, no work on the shaders. Simply nothing.

Configuration:
  • Frame capturing disabled.

  • Metal API validation disabled.

  • Metal fast math enabled.

  • Compiled in release mode.

  • All sanitizers disabled.

  • The widget class derives directly from CAMetalLayer, so this is not using MTKView.

  • iPad Pro 4 (iOS 14.2).

  • Xcode 12.2.

Any ideas appreciated!
I filed a bug report (FB8919375).

If anybody could confirm or deny this behaviour on (real) iOS devices, very much appreciated.
It doesn't look like you're doing much in this example, so it makes sense that these three calls come out on top. If they were still this high while you were making lots of draw calls or changing render state (such as setting buffers or textures), that would be strange.
  • Creating a render command encoder causes the driver to perform some resource tracking and set up some state, which is relatively CPU intensive. This is particularly the case with Apple GPUs, since there is more state to track for the Tile Based Deferred Rendering architecture (which likely accounts for the difference you're seeing w.r.t. macOS, unless you are comparing against a Mac with Apple Silicon).

  • Command buffer commits also perform lots of resource tracking and need the driver to make a kernel trap (which wouldn't show up as CPU utilization, but would show up as time spent).

  • Next drawable needs to wait for Core Animation to have a drawable available before it returns (which wouldn't show up as CPU utilization, but would show up as time spent).
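
The difference between CPU utilization and time spent, mentioned in the last two points, can be checked by sampling two clocks around a call. The following is only a minimal sketch in plain C (which also compiles as Objective-C). The names call_cost, measure_call and blocking_stand_in are hypothetical helpers invented for this illustration, and the 50 ms sleep merely stands in for a blocking call such as nextDrawable; it is not a Metal call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read the given clock and return its value in seconds. */
static double read_clock_seconds(clockid_t clock_id) {
    struct timespec ts;
    int rc = clock_gettime(clock_id, &ts);
    assert(rc == 0);
    (void)rc; /* silence the unused warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical result type: elapsed time vs. CPU time burned by this thread. */
typedef struct {
    double wall_seconds; /* time spent, including time blocked or waiting */
    double cpu_seconds;  /* CPU time consumed by the calling thread only */
} call_cost;

/* Run fn once and report both its elapsed time and its thread CPU time. */
static call_cost measure_call(void (*fn)(void)) {
    double wall_start = read_clock_seconds(CLOCK_MONOTONIC);
    double cpu_start = read_clock_seconds(CLOCK_THREAD_CPUTIME_ID);
    fn();
    call_cost cost;
    cost.wall_seconds = read_clock_seconds(CLOCK_MONOTONIC) - wall_start;
    cost.cpu_seconds = read_clock_seconds(CLOCK_THREAD_CPUTIME_ID) - cpu_start;
    return cost;
}

/* Assumption: a 50 ms sleep as a stand-in for a call that blocks on another
   component (for example, waiting for a drawable). */
static void blocking_stand_in(void) {
    struct timespec pause = {0, 50 * 1000 * 1000};
    nanosleep(&pause, NULL);
}
```

Calling measure_call(blocking_stand_in) should report a wall_seconds value of roughly 0.05 or more while cpu_seconds stays close to zero. A call that blocks would show this pattern, whereas a call that does real work on the CPU would show the two values close together.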

Thanks for clarifying that this indeed affects all devices with Apple SoCs (both iOS and Mac). And yes, the comparison was against an Intel Mac, where I don't encounter any measurable CPU load at all with Metal.

My primary concern is that this currently unavoidable base CPU load accumulates with every Metal widget. I just tried it, and it apparently does: the increase in CPU load is almost linear (e.g. with 8 noop Metal widgets, CPU load increases by a factor of ~7.5). Sometimes I even see all CPU cores fully saturated for several minutes for some reason. I have not had a chance to try an Apple M1 Mac yet, but looking at its specs I would assume a similar result there.

So if I understand it correctly, the current design of Metal + Apple GPU is geared toward single Metal widget applications, like games, that commonly have only one full screen Metal widget. In standard applications, though, it is more common to use primarily stock widgets (e.g. from UIKit or Cocoa on iOS/Mac) and to add multiple custom GPU accelerated widgets (Metal, OpenGL/CL) where necessary.

Am I correct that the situation is the same with OpenGL/CL on Apple SoCs? Or is this something specific to Metal only?

What I'm also wondering is why this issue does not affect stock widgets at all. I mean UIKit widgets are also running on top of Metal, aren't they?