DrawIndexedIndirectCount functionality under Metal

Hello everybody,

I have a situation here.

I cannot implement vkCmdDrawIndexedIndirectCount functionality using argument buffers (in practice, they are useless and buggy). I tried to reach developer support about these issues, but nobody is answering.

So maybe somebody has an idea of how to execute multiple indirect draw calls based on a GPU-generated count?

Moreover, it is impossible to use indirect command buffers for that: "Fragment shader cannot be used with indirect command buffers".

Current issues with indirect command buffers:
  1. Intel UHD Graphics 630 is not rendering all elements from the buffer.

  2. eGPU RX Vega 56 hangs the whole system for 5-6 seconds when command generation is performed by the vertex shader.

  3. "Compiler encountered an internal error" on Intel Iris Plus Graphics.

  4. Apple M1 renders a magenta screen when the generation is performed in a compute shader.

  5. Apple M1 renders a magenta screen, with only about a 20% chance of rendering correctly, when the generation is performed in a vertex shader.

Thank you!
Link a minimal project that demonstrates this - then I can review it and confirm or deny whether it is your error, or Apple's.
Reproduction applications are available here:

https://www.icloud.com/iclouddrive/0fIpVg83LFG-OACxsMtjVtZHw#apple1

The problem with partial rendering on Intel UHD 630 is fixed under Big Sur.
More information is inside the Readme.txt file.

Thank you!

Please send an actual xcode project for analysis.

What you uploaded has telling signs that some of this is Apple's fault, and some of it, like #3 should be an expected error.

But if you want someone outside or inside Apple to try to help, you'll need to spoon feed everyone a normal Xcode project. On a normal day, they are inadequate to address basic problems and do not test thoroughly, but they are even less inclined to help if you send it in this format.



We are not using Xcode for development. Moreover, it will take time to isolate the source code for issue replication. Usually, such binaries are more than enough for driver developers, because they can track all API calls internally and they have much better tools for that. An Xcode project will not help if nobody cares about software quality on the Apple side. I have been trying to find answers to these questions for the last several months. This forum is the last hope :) And it's not possible to talk about missing functionality in Metal because nobody is listening.
No, what you provided is not enough.

It is not merely tracking api calls. You should provide a project that compiles to allow them to use the full diagnostic tools that are unavailable with just the binary.

If you don't provide them this, then the other party has to write it themselves.

Often, this causes several tangential problems during triage, which delays identifying and resolving the true problems (as opposed to the misconstrued notions of what the problems are thought to be).

These things occur unnecessarily, and you can do something about that today.

You can go to File -> New Project in Xcode, and make a minimal project that replicates what you are seeing in your main project.

I am available to provide a second look on your work today to confirm without doubt the issues, but if you neglect to make the sample project and provide this, it will sit on the shelf further.

After you have sent this to me, and I have confirmed it in its entirety, we can both submit crystal clear reports, to make the complaint more effective.

(Also, in case it isn't obvious, the projects you submit should be in Objective-C, not Swift or C++, and they genuinely should be the minimum that depicts the bug without dependencies.)
Can you create a request with Feedback Assistant and post the FB number you get here? We can have someone look at fixing this and hopefully provide a workaround in the interim.
The Feedback Assistant numbers are:
  • FB8254449

  • FB8638856

Thank you!
I'm looking at the feedback reports.

It sounds like in FB8254449 you're requesting vkCmdDrawIndexedIndirectCount functionality in Metal, but Metal already has this here:

-[MTLRenderCommandEncoder drawIndexedPrimitives:indexType:indexBuffer:indexBufferOffset:indirectBuffer:indirectBufferOffset:].


With FB8638856, where you're linking to your project that's not rendering on an Intel UHD Graphics 630, it looks like someone on the Metal team tried to reproduce it, but could not. I don't know what version of the OS he tried though, so I'm following up with him. What version of macOS did you try this on? I'm wondering if this is a bug that has been fixed in a later OS build.


Hello,

Regarding FB8254449: yes, there is a function to draw indirect primitives, but that command executes only a single draw command. Vulkan and other APIs (OpenGL, D3D12, D3D11 (via extensions)) provide more advanced functions that draw multiple commands with a CPU-generated or GPU-generated count.

The command below renders multiple indirect draw commands, and the number of draw commands is read from a GPU buffer. Unfortunately, there is no such functionality in the Metal API:

void vkCmdDrawIndexedIndirectCount(
    VkCommandBuffer commandBuffer,
    VkBuffer buffer,
    VkDeviceSize offset,
    VkBuffer countBuffer,
    VkDeviceSize countBufferOffset,
    uint32_t maxDrawCount,
    uint32_t stride);
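As a rough illustration of what this call does, here is a minimal CPU-side sketch of its semantics as I understand them: the draw count is read from the count buffer, clamped to maxDrawCount, and one indexed draw is issued per command record, stepping by stride bytes. The struct repeats the field layout of VkDrawIndexedIndirectCommand so the sketch builds without the Vulkan headers. The names draw_indexed_indirect_count, DrawFn, and count_indices are hypothetical helpers for this sketch and not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Vulkan's VkDrawIndexedIndirectCommand, redeclared
   locally so the sketch compiles without the Vulkan headers. */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Hypothetical callback standing in for one indexed draw on the GPU. */
typedef void (*DrawFn)(const DrawIndexedIndirectCommand *cmd, void *user);

/* CPU model of the count-driven multi-draw (an assumption-based sketch).
   Returns the number of draws actually issued. */
static uint32_t draw_indexed_indirect_count(
    const uint8_t *buffer, size_t offset,
    const uint8_t *countBuffer, size_t countBufferOffset,
    uint32_t maxDrawCount, uint32_t stride,
    DrawFn draw, void *user)
{
    /* Records must not overlap. */
    assert(stride >= sizeof(DrawIndexedIndirectCommand));

    /* The count lives in a buffer, so the CPU never needs to know it. */
    uint32_t count;
    memcpy(&count, countBuffer + countBufferOffset, sizeof count);
    if (count > maxDrawCount)
        count = maxDrawCount;

    for (uint32_t i = 0; i < count; ++i) {
        DrawIndexedIndirectCommand cmd;
        memcpy(&cmd, buffer + offset + (size_t)i * stride, sizeof cmd);
        draw(&cmd, user);
    }
    return count;
}

/* Example callback: accumulates the total number of indices drawn. */
static void count_indices(const DrawIndexedIndirectCommand *cmd, void *user)
{
    *(uint64_t *)user += (uint64_t)cmd->indexCount * cmd->instanceCount;
}
```

The point of the model is that both the command records and the count come from buffers, so a GPU pass can produce them and the CPU only supplies an upper bound.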

There is another request inside FB8254449: to add an indirect buffer offset that is read from a GPU buffer:

void vkCmdDrawIndexedIndirectCountOffset(
    VkCommandBuffer commandBuffer,
    VkBuffer buffer,
    VkDeviceSize offset,
    VkBuffer offsetBuffer,
    VkDeviceSize offsetBufferOffset,
    VkBuffer countBuffer,
    VkDeviceSize countBufferOffset,
    uint32_t maxDrawCount,
    uint32_t stride);

FB8638856: Everything is fine with UHD 630 under Big Sur. The remaining problems are a 20-second start time with the eGPU and the inability to use textures with an indirect command buffer (except on M1).

Thank you!
FB8254449: Metal's solution for multi-draw commands is for a kernel to create an Indirect Command Buffer with multiple draw commands. This is essentially what the driver does for you anyways for multi-draw commands in other APIs. It sounds like you've used ICBs. Why does this not work for you?

FB8638856: Okay so item 1 is no longer a problem. But each of the other 4 issues still occur? (FYI, usually better to create separate feedback requests for separate issues. The guy trying to repro it, probably just tried the first one).
FB8254449:

Yes, that is what I tried to achieve with ICB. But the following issues make it impossible at the moment:
  1. Rendering pipeline for ICB cannot use textures, except M1 GPU.

  2. 20 seconds start time with AMD eGPU with whole system freeze during this time.

  3. Big chance to have magenta screen instead of normal rendering on M1 while using ICB.

  4. Argument buffer tier 2 is not available on iPhone/iPad/DTK.

The ICB and argument buffer specifications are very flexible, which makes it impossible to implement them on all HW.
So maybe a single-function solution, with an internal driver implementation for each HW, would be more flexible as a result?

FB8638856:

Reproduction applications for 2 and 3 are available in the single archive with all descriptions.
https://www.icloud.com/iclouddrive/0fIpVg83LFG-OACxsMtjVtZHw#apple1/
Both of them are related to ICB creation/execution.

Thank you!
Regarding FB8254449:

A render pipeline using ICBs can definitely use textures. The texture references just need to be in an argument buffer set on the ICB render command.

iPhone 11 and 12 support tier 2 argument buffers. iPhone 10 and 10S can only access 96 textures and 96 buffers for an executeIndirect command on the CPU encoder, but, unlike earlier devices, you can write to the argument buffers in a shader or kernel. In other words, iPhone 10 and 10S support all the tier 2 features, but cannot access thousands of buffers and textures per executeIndirect command as iPhone 11 and iPhone 12 devices can. iPads of a similar generation have the same features and limitations. Although the DTK may not support tier 2, the M1 in retail products does.

Regarding FB8638856:

The magenta screen issue was not mentioned in the feedback report. Does this happen on any particular device? I can add a note, but I think it would be clearer to the Metal team if you created a separate report with Feedback Assistant and post the number here.
But what if somebody doesn't need thousands of textures and buffers? We need 12 textures and 4 buffers for the whole scene rendering. Accessing textures through an argument buffer is an additional indirection during shader execution.

What we need to execute is just a simple loop:
for(pipeline in pipelines) {
    bind pipeline
    bind 12 textures
    bind 4 buffers
    drawIndexedIndirectCount(indirect buffer, count buffer)
}

So my idea with ICB was to implement code like this:
for(pipeline in pipelines) {

    bind ICB generation rendering pipeline with rasterizer discard
    bind indirect buffer
    bind ICB
    drawPointsIndirect(count buffer)

    bind pipeline
    bind 12 textures
    bind 4 buffers
    executeIndirect(ICB)
}

But it looks like I also have to patch the pipeline shaders for ICB.

I will submit the magenta screen issue on M1 and the 20-second startup time with the eGPU as separate FBs.

Thank you!
The separate error for the magenta screen is FB8928674

The report for 20 seconds startup time with eGPU is FB8928678

Thank you!
Thanks for the Feedback requests. I've assigned them to the teams that can help here.

As far as indirections go, argument buffers are designed to minimize this compared to other APIs. The object metadata itself is stored in the buffer rather than having a table with the data from which you need to index. (You can, of course, create your own table with another argument buffer so long as your indexing code properly takes into account the object size).