DrawIndexedIndirectCount functionality under Metal

Hello everybody,

I have a situation here.

I cannot implement vkCmdDrawIndexedIndirectCount functionality using argument buffers (in practice, they are useless and buggy). I have tried to reach developer support about these issues, but nobody has answered.

Does anybody have an idea how to execute multiple indirect draw calls based on a GPU-generated count?

Moreover, I cannot use indirect command buffers (ICBs) for that either: "Fragment shader cannot be used with indirect command buffers".
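For reference, the behavior being asked for can be stated precisely. Per the Vulkan specification, vkCmdDrawIndexedIndirectCount executes min(count, maxDrawCount) draws, and draw i reads its arguments at offset + i * stride. The following is a minimal C sketch that emulates that selection on the CPU. It assumes the 20-byte command layout shared by VkDrawIndexedIndirectCommand and Metal's MTLDrawIndexedPrimitivesIndirectArguments. `select_indirect_draws` is a hypothetical helper, not a Metal or Vulkan API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order and size (20 bytes) as VkDrawIndexedIndirectCommand
   and MTLDrawIndexedPrimitivesIndirectArguments. */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedCmd;

/* Emulates the draw selection of vkCmdDrawIndexedIndirectCount:
   the draw count is min(countValue, maxDrawCount), and draw i reads its
   arguments at offset + i * stride. Copies the selected commands into
   out (which must hold maxDrawCount entries) and returns their number. */
static uint32_t select_indirect_draws(const uint8_t *argBuf, size_t offset,
                                      uint32_t countValue, uint32_t maxDrawCount,
                                      size_t stride, DrawIndexedCmd *out) {
    assert(stride >= sizeof(DrawIndexedCmd));
    uint32_t n = countValue < maxDrawCount ? countValue : maxDrawCount;
    for (uint32_t i = 0; i < n; ++i) {
        /* memcpy avoids unaligned reads for arbitrary offsets */
        memcpy(&out[i], argBuf + offset + (size_t)i * stride,
               sizeof(DrawIndexedCmd));
    }
    return n;
}
```

The GPU-generated count only ever shrinks the work. maxDrawCount is the CPU-side upper bound, and a Metal emulation has to respect that same bound.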

Current issues with indirect command buffers:
  1. Intel UHD Graphics 630 does not render all elements from the buffer.

  2. eGPU RX Vega 56 hangs the whole system for 5-6 seconds when command generation is performed in a vertex shader.

  3. "Compiler encountered an internal error" on Intel Iris Plus Graphics.

  4. Apple M1 renders a magenta screen when command generation is performed in a compute shader.

  5. Apple M1 renders a magenta screen when command generation is performed in a vertex shader; it renders correctly only about 20% of the time.

Thank you!

Hello,

The iPhone 11 Pro Max (A13) (iOS 13.5.1 and 14.2) reports that argument buffer Tier 2 is supported.

ICB generation in a compute shader works, but about 5% of objects are only partially rendered (or rendered with corruption).

ICB generation in a vertex shader produces a black screen with this console error:
"Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (IOAF code 4)"

The iPhone XR (A12, which is newer than the iPhone X) (iOS 13.5.1 and 14.2) reports that Tier 2 is not supported.

The iPad Pro (12.9-inch, 4th generation with LiDAR, A12Z) (iOS 13.5.1 and 14.2) reports that argument buffer Tier 2 is not supported (same as the DTK).

What am I doing wrong, guys? Ignoring the Tier 2 check produces a random magenta pattern over the screen. Does that mean that none of the currently available iPad Pro models are compatible with ICBs? If so, it is technically impossible to implement vkCmdDrawIndexedIndirectCount() functionality on them.

Thank you!

Hello,

I have compared the ICB performance of serial drawIndexedPrimitives commands against the indirect drawPrimitives method.
The test scene is 16K DIPs (draw calls), each a two-triangle quad. The static ICB is created on the CPU.

Vega 56:
Combined geometry (single DIP): 200M tri/sec
Serial drawPrimitivesIndirect: 12M tri/sec
Single executeCommandsInBuffer: 7M tri/sec
CPU and GPU ICBs work without any issues. The GPU ICB is 4-5 times faster than the CPU ICB. The funny thing is that AMD GPUs have a native multiDrawIndirectCount command, which works much faster...

Apple M1 (MacBook Air):
Combined geometry (single DIP): 50M tri/sec
Serial drawPrimitivesIndirect: 8M tri/sec
Single executeCommandsInBuffer: hangs after 1 second of execution, with random magenta noise. The debug runtime reports nothing.

Apple A12 (iPhone XR):
Combined geometry (single DIP): 27M tri/sec
Serial drawPrimitivesIndirect: 13M tri/sec
Single executeCommandsInBuffer: hangs after 1 second of execution (with the CPU ICB).
Copying from the CPU ICB to a private ICB crashes the app.

Intel Iris Plus (MacBook Air 2020):
Combined geometry (single DIP): 4.3M tri/sec
Serial drawPrimitivesIndirect: 1.46M tri/sec
Single executeCommandsInBuffer: draws nothing. The debug runtime crashes with a message that the ICB is empty, and executeCommandsInBuffer reports that the source CPU ICB is not an ICB.
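To put these throughput figures in perspective, they can be converted into draw calls per second and into the time needed to submit the whole test scene once. The following is a small C sketch. It assumes that "16K DIPs" means 16,384 draws of two triangles each. The helper names are illustrative:

```c
#include <assert.h>

/* Assumption: "16K DIPs" = 16,384 draws, each a two-triangle quad. */
enum { kDraws = 16384, kTrisPerDraw = 2 };

/* Draw calls per second implied by a measured triangle throughput. */
static double draws_per_second(double trisPerSecond) {
    assert(trisPerSecond > 0.0);
    return trisPerSecond / kTrisPerDraw;
}

/* Milliseconds needed to submit the whole test scene once. */
static double scene_time_ms(double trisPerSecond) {
    assert(trisPerSecond > 0.0);
    return (double)kDraws * kTrisPerDraw / trisPerSecond * 1000.0;
}
```

Under these assumptions, serial drawPrimitivesIndirect on Vega 56 (12M tri/sec) corresponds to 6M draws per second, or about 2.7 ms per scene. The combined geometry takes about 0.16 ms. Iris Plus at 1.46M tri/sec needs about 22 ms.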

Thank you!

The IOAF error you're seeing and the hangs you're experiencing could be a driver bug, but they're also typical symptoms of accessing memory out of bounds in a shader or kernel. The fact that it works on an AMD GPU could just mean that AMD happens to handle that particular out-of-bounds condition in a favorable manner.

Have you tried running your app with Xcode shader validation? (Edit the scheme, select the Diagnostics tab, and check Shader Validation.) This performs bounds checking and also checks for many kinds of undefined behavior.

Hello,

Could you please advise me how to run an existing .ipa file with Xcode shader validation/debugging?
We don't use Xcode project files. We have a couple of bash scripts and Makefiles that do the whole job well and fast on all platforms. On macOS it's possible to set the METAL_DEVICE_WRAPPER_TYPE=1 variable to enable the Metal debug layer, but unfortunately, we cannot do the same on iOS.

A way to run an already installed app on the device from Xcode would be awesome.

I can provide reproduction samples if you need them.

Thank you!

To get shader validation via an env var, set METAL_DEVICE_WRAPPER_TYPE=4. If you can rebuild the source, you can use setenv to set it before you create the Metal device. (I'm still trying to find out whether there is a better way to do this, and where the output goes when you don't use Xcode.)

Just curious: has shader validation on M1 or another Mac shown you anything?

Thank you for the new device wrapper type value. I will retest everything. A validation message on M1 says that ICBs are not yet supported :)

-[MTLGPUDebugDevice newIndirectCommandBufferWithDescriptor:maxCommandCount:options:]:1035: failed assertion `Indirect Command Buffers are not currently supported with Shader Validation'

I will check it on other devices a bit later.

I checked with some engineers on the Metal frameworks team. ICB support in shader validation is limited on Big Sur and not yet available on iOS. This support has not been fully validated, so it must be explicitly enabled on Big Sur by setting another env var:

MTL_SHADER_VALIDATION_GPUOPT_ENABLE_INDIRECT_COMMAND_BUFFERS=1
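As suggested in this thread, both variables can be set from code, provided this happens before the Metal device is created (for example, before MTLCreateSystemDefaultDevice()). The following is a minimal C sketch using POSIX setenv(). The helper names are hypothetical. Per the Metal team's reply, the ICB opt-in only applies on Big Sur:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sets an environment variable and confirms it reads back; 0 on success. */
static int set_and_check(const char *name, const char *value) {
    assert(name != NULL && value != NULL);
    if (setenv(name, value, 1) != 0) return -1;
    const char *got = getenv(name);
    return (got != NULL && strcmp(got, value) == 0) ? 0 : -1;
}

/* Call before creating the Metal device. Wrapper type 4 selects shader
   validation; the ICB opt-in is only meaningful on Big Sur. */
static int enable_metal_shader_validation(int includeICBs) {
    if (set_and_check("METAL_DEVICE_WRAPPER_TYPE", "4") != 0) return -1;
    if (includeICBs &&
        set_and_check("MTL_SHADER_VALIDATION_GPUOPT_ENABLE_INDIRECT_COMMAND_BUFFERS", "1") != 0)
        return -1;
    return 0;
}
```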

The driver team will look at the feedback requests you submitted and will hopefully have a further explanation for the iOS failures.

Thanks for the new variable. There are no errors from the debug/GPU validation layers during execution, except that nothing renders when the ICB is generated on the GPU. I will wait for the answers.

PS: The iOS debug layers work great with setenv(). Thank you for that!

Is there any update on this?
Thank you

Hi,

A12 devices cannot draw more than 512 drawIndirect commands (a CPU unroll of multiDrawIndirectCount).
Beyond that, the rendered objects start flickering. A13 and M1 devices work fine even with 50K draw calls.
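A CPU unroll of this kind has two halves. On the GPU side, every one of maxDrawCount command slots is written, and slots at or beyond the generated count are padded with no-op draws (zero indices, zero instances). On the CPU side, exactly maxDrawCount indirect draws are encoded at fixed offsets, one indirectBufferOffset per drawIndexedPrimitives call. The following is a minimal C emulation. It assumes tightly packed 20-byte commands, and the helper names are hypothetical. This is the kind of unroll that reportedly flickers beyond 512 draws on A12:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedCmd;

/* GPU side (emulated): after generation, turn every slot at or beyond the
   generated count into a no-op draw (zero indices, zero instances). */
static void pad_unused_draws(DrawIndexedCmd *cmds, uint32_t count,
                             uint32_t maxDrawCount) {
    assert(cmds != NULL || maxDrawCount == 0);
    uint32_t first = count < maxDrawCount ? count : maxDrawCount;
    for (uint32_t i = first; i < maxDrawCount; ++i) {
        cmds[i].indexCount = 0;
        cmds[i].instanceCount = 0;
    }
}

/* CPU side: the indirect buffer offset of each of the maxDrawCount draws
   that are encoded unconditionally, one per indirect draw call. */
static uint32_t unroll_offsets(size_t baseOffset, uint32_t maxDrawCount,
                               size_t *out) {
    for (uint32_t i = 0; i < maxDrawCount; ++i)
        out[i] = baseOffset + (size_t)i * sizeof(DrawIndexedCmd);
    return maxDrawCount;
}
```

The cost of this design is that the CPU always pays for maxDrawCount encoded draws, whatever the GPU-generated count turns out to be.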

Thank you

Apple keeps shipping products like the new Apple TV with A12 chips, so if that's in your market, you'll need a fallback. They are Tier 1 devices that cannot index into, or hold pointers to, argument buffers, which are a key part of GPU-driven pipelines and ray tracing. Fortunately, M1 (on macOS) and A14 are replacing these old chips.
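Because Tier 1 devices are still shipping, the renderer has to pick a path at startup. The following is a minimal C sketch of that decision. The enums are hypothetical stand-ins for the value of the Metal device's argumentBuffersSupport property (MTLArgumentBuffersTier1 / MTLArgumentBuffersTier2), and the policy itself is an assumption. The reports in this thread show that Tier 2 alone does not guarantee a working ICB path, so a per-GPU override may still be needed:

```c
#include <assert.h>

/* Hypothetical stand-ins for the device's argument buffer tier. */
typedef enum { ArgBufTier1, ArgBufTier2 } ArgBufTier;

typedef enum { PathGpuIcb, PathCpuUnroll } DrawPath;

/* Assumed policy: only Tier 2 devices get the GPU-driven ICB path;
   Tier 1 devices (e.g. A12) fall back to the CPU unroll. */
static DrawPath choose_draw_path(ArgBufTier tier) {
    assert(tier == ArgBufTier1 || tier == ArgBufTier2);
    return tier == ArgBufTier2 ? PathGpuIcb : PathCpuUnroll;
}
```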
