Reply to vkCmdDrawIndexedIndirectCount functionality under Metal
Everything is described in the original post. The main problem is performance: even a loop of indirect draws is faster than an indirect command buffer: https://www.icloud.com/iclouddrive/0ICuhBkHgGuLjCxaJwRyHoLmw#execute_commands_in_buffer https://www.icloud.com/iclouddrive/0hDo_q0oXs4uzC25yZdKmL83A#multiple_draw_indirect I filed a Feedback Assistant report more than half a year ago and got no answer, so I wrote here. Thank you!
May ’21
Reply to DrawIndexedIndirectCount functionality under Metal
Thank you for the new value for the device wrapper type. I will retest everything. A validation message on M1 says that ICBs are not supported yet :)
  [MTLGPUDebugDevice newIndirectCommandBufferWithDescriptor:maxCommandCount:options:]:1035: failed assertion `Indirect Command Buffers are not currently supported with Shader Validation'
I will check it on other devices a bit later.
Dec ’20
Reply to DrawIndexedIndirectCount functionality under Metal
Hello, Could you please advise me how to run an existing .ipa file with Xcode shader validation/debugging? We are not using Xcode project files. We have a couple of bash scripts and Makefiles that do the whole job well and fast for all platforms. On macOS it is possible to set the METAL_DEVICE_WRAPPER_TYPE=1 variable to run the Metal debug layer, but unfortunately we cannot do the same on iOS. An Xcode feature to run an already installed app on the device would be awesome. I can provide reproduction samples if you need them. Thank you!
Dec ’20
Reply to DrawIndexedIndirectCount functionality under Metal
Hello, I compared the performance of an ICB filled with serial drawIndexedPrimitives commands against the indirect drawPrimitives method. The test scene is 16K DIPs (draw calls), each drawing a quad of 2 triangles. The static ICB is created on the CPU.
Vega 56:
  Combined geometry (single DIP): 200M tri/sec
  Serial drawPrimitivesIndirect: 12M tri/sec
  Single executeCommandsInBuffer: 7M tri/sec
  CPU and GPU ICBs work without any issues. The GPU ICB is 4-5 times faster than the CPU ICB. The funny thing is that the AMD GPU has a native multiDrawIndirectCount command, which works much faster...
Apple M1 (MacBook Air):
  Combined geometry (single DIP): 50M tri/sec
  Serial drawPrimitivesIndirect: 8M tri/sec
  Single executeCommandsInBuffer: hangs after 1 second of execution, showing random magenta noise. The debug runtime reports nothing.
Apple A12 (iPhone XR):
  Combined geometry (single DIP): 27M tri/sec
  Serial drawPrimitivesIndirect: 13M tri/sec
  Single executeCommandsInBuffer: hangs after 1 second of execution (with a CPU ICB). Copying from the CPU ICB to a Private ICB crashes the app.
Intel Iris Plus (MacBook Air 2020):
  Combined geometry (single DIP): 4.3M tri/sec
  Serial drawPrimitivesIndirect: 1.46M tri/sec
  Single executeCommandsInBuffer: draws nothing; the debug runtime crashes with a message that the ICB is empty. executeCommandsInBuffer reports that the source CPU ICB is not an ICB.
Thank you!
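To make the numbers above easier to compare, here is a small sketch that converts a triangle rate into a draw-call rate and into a slowdown factor relative to the combined-geometry case. It relies only on the figures quoted above and on the stated 2 triangles per draw; the function names are hypothetical helpers, not part of any API.

```cpp
#include <cassert>

// Each draw call in the test scene renders one quad, i.e. 2 triangles.
const double kTrianglesPerDraw = 2.0;

// Draw calls per second implied by a measured triangle rate.
inline double drawsPerSecond(double trianglesPerSecond) {
    assert(trianglesPerSecond >= 0.0);
    return trianglesPerSecond / kTrianglesPerDraw;
}

// How many times slower a method is than the combined (single DIP) case.
inline double slowdown(double combinedTriPerSec, double methodTriPerSec) {
    assert(methodTriPerSec > 0.0);
    return combinedTriPerSec / methodTriPerSec;
}
```

For the Vega 56 figures, drawsPerSecond(12e6) gives 6 million draws per second for the serial indirect loop, and slowdown(200e6, 7e6) shows the ICB path running roughly 29 times slower than the combined geometry.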
Dec ’20
Reply to DrawIndexedIndirectCount functionality under Metal
Hello,
The iPhone 11 Pro Max (A13) (13.5.1 and 14.2) reports that Tier 2 is supported. ICB generation in a compute shader works, but ~5% of objects are rendered partially (or with corruption). ICB generation in a vertex shader produces a black screen with the console error: "Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (IOAF code 4)"
The iPhone XR (A12, which is newer than the iPhone X) (13.5.1 and 14.2) reports that Tier 2 is not supported.
The iPad Pro (12.9-inch) (4th generation with LiDAR, A12) (13.5.1 and 14.2) reports that argument buffer Tier 2 is not supported (same as the DTK).
What am I doing wrong, guys? Ignoring the Tier 2 test produces a random magenta pattern over the screen. Does that mean that all currently available iPad Pro models are incompatible with ICB? If so, it is technically impossible to implement vkCmdDrawIndexedIndirectCount() functionality there. Thank you!
Dec ’20
Reply to DrawIndexedIndirectCount functionality under Metal
But what if somebody doesn't need thousands of textures and buffers? We need 12 textures and 4 buffers to render the whole scene. Accessing textures through an argument buffer is an additional indirection during shader execution. What we need to execute is just a simple loop:
  for(pipeline in pipelines) {
    bind pipeline
    bind 12 textures
    bind 4 buffers
    drawIndexedIndirectCount(indirect buffer, count buffer)
  }
So my idea with ICB was to implement code like this:
  for(pipeline in pipelines) {
    bind ICB generation rendering pipeline with rasterizer discard
    bind indirect buffer
    bind ICB
    drawPointsIndirect(count buffer)
    bind pipeline
    bind 12 textures
    bind 4 buffers
    executeIndirect(ICB)
  }
But it looks like I additionally have to patch the pipeline shaders for ICB. I will submit the magenta screen issue on M1 and the 20-second startup time with an eGPU as separate FBs. Thank you!
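For reference, the behavior expected from drawIndexedIndirectCount in the first loop can be sketched on the CPU as follows. This is only an illustration of the Vulkan-style semantics (the number of draws is the smaller of the value in the count buffer and maxDrawCount), not a Metal implementation. RecordingEncoder and the free function are hypothetical stand-ins; the command struct uses the Vulkan field names, and Metal's MTLDrawIndexedPrimitivesIndirectArguments carries the same five 32-bit values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Five 32-bit fields, named as in VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical stand-in for a render encoder: it only records issued draws.
struct RecordingEncoder {
    std::vector<DrawIndexedIndirectCommand> issued;
    void drawIndexed(const DrawIndexedIndirectCommand& cmd) { issued.push_back(cmd); }
};

// CPU reference of the count-driven multi-draw: reads the draw count from
// countBuffer, clamps it to maxDrawCount, then issues one indexed draw per
// command stored in indirectBuffer at the given byte offset and stride.
inline void drawIndexedIndirectCount(RecordingEncoder& encoder,
                                     const uint8_t* indirectBuffer,
                                     size_t offset,
                                     const uint32_t* countBuffer,
                                     uint32_t maxDrawCount,
                                     size_t stride) {
    assert(stride >= sizeof(DrawIndexedIndirectCommand));
    const uint32_t drawCount = std::min(*countBuffer, maxDrawCount);
    for (uint32_t i = 0; i < drawCount; ++i) {
        DrawIndexedIndirectCommand cmd;
        std::memcpy(&cmd, indirectBuffer + offset + static_cast<size_t>(i) * stride,
                    sizeof(cmd));
        encoder.drawIndexed(cmd);
    }
}
```

On the GPU the count is not known to the CPU at encoding time, which is why a plain CPU loop of indirect draws has to encode maxDrawCount draws and rely on zeroed commands for the unused slots.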
Dec ’20
Reply to DrawIndexedIndirectCount functionality under Metal
FB8254449: Yes, that is what I tried to achieve with ICB. But the following issues make it impossible at the moment:
1. A rendering pipeline for ICB cannot use textures, except on the M1 GPU.
2. A 20-second startup time with an AMD eGPU, with the whole system frozen during this time.
3. A big chance of getting a magenta screen instead of normal rendering on M1 while using ICB.
4. Argument buffer Tier 2 is not available on iPhone/iPad/DTK.
The ICB and argument buffer specifications are very flexible, which makes it impossible to implement them on all HW. So maybe a single-function solution, with an internal driver implementation for each kind of HW, would turn out to be more flexible?
FB8638856: Reproduction applications for 2 and 3 are available in a single archive with all descriptions. https://www.icloud.com/iclouddrive/0fIpVg83LFG-OACxsMtjVtZHw#apple1/ Both of them are related to ICB creation/execution. Thank you!
Dec ’20