Post marked as solved
1 Reply
602 Views
Question 1: Maximum Length

In the documentation for Indirect Command Buffers, it says: "The maximum length of the range is 16384 commands." However, the Apple-provided sample for encoding Indirect Command Buffers on the GPU creates and dispatches an ICB with 65536 potential draw calls, and this seems to run without validation errors on iOS and macOS. Is the documentation wrong? I haven't hit a limit executing upwards of 1 million commands.

Question 2: Optimization

In both the Metal GPU Frame Capture and the Metal System Trace, the optimizeIndirectCommandBuffer blit encoder on macOS seems to take zero time regardless of ICB size. Adding to the suspicion that it isn't actually optimizing anything, GPU time grows linearly with ICB length, and this shows up as vertex cost (although I presume it's really the command processor chewing through a large ICB).

For example, running the IndirectCommandBuffersWithGPUEncoding sample on an M1 Ultra, it takes about 1.4 ms to process the ICB in the vertex stage. If we multiply AAPLNumObjects by 16, this cost balloons to ~21.5 ms, close to 16x (and the frame rate subsequently falls below 60 fps), despite dispatching the same number of draws.

Is the optimize blit encoder doing anything? Is there a solution to this performance issue? It's a huge bottleneck to using "sparse" ICBs, and it seems like exactly what the optimize step is designed to solve.
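To make both questions concrete, here is a minimal Swift sketch of the two calls involved. It assumes the documented 16384-command limit does apply to each execute range and therefore splits execution into chunks. The helper `chunkedRanges`, the constant `documentedMaxRangeLength`, and the two `encode...` wrappers are hypothetical names for illustration and are not part of Metal or of the Apple sample. Whether chunking is actually required is exactly what Question 1 asks.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// The limit quoted from the documentation in Question 1.
let documentedMaxRangeLength = 16384

// Hypothetical helper (not a Metal API): splits 0..<commandCount into
// consecutive ranges that are each at most maxLength commands long.
func chunkedRanges(commandCount: Int,
                   maxLength: Int = documentedMaxRangeLength) -> [Range<Int>] {
    precondition(maxLength > 0, "maxLength must be positive")
    guard commandCount > 0 else { return [] }
    return stride(from: 0, to: commandCount, by: maxLength).map { start in
        start..<min(start + maxLength, commandCount)
    }
}

#if canImport(Metal)
// The optimize step discussed in Question 2: a blit pass over the whole ICB.
// Assumption: the ICB has already been filled (on the CPU or by a GPU kernel).
func encodeOptimize(_ icb: MTLIndirectCommandBuffer,
                    commandCount: Int,
                    into commandBuffer: MTLCommandBuffer) {
    guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.optimizeIndirectCommandBuffer(icb, range: 0..<commandCount)
    blit.endEncoding()
}

// Executes the ICB in order, one chunk at a time, so that no single range
// exceeds the documented maximum length.
func encodeExecute(_ icb: MTLIndirectCommandBuffer,
                   commandCount: Int,
                   with encoder: MTLRenderCommandEncoder) {
    for range in chunkedRanges(commandCount: commandCount) {
        encoder.executeCommandsInBuffer(icb, range: range)
    }
}
#endif

// The sample's 65536 potential draws would become 4 ranges of 16384 commands.
print(chunkedRanges(commandCount: 65536).count)
```

In the sample, a single execute call covers all 65536 commands and still passes validation, which is why the documented limit looks questionable. Chunking as above would be the conservative option if the limit turns out to be real, but it does nothing for the linear cost described in Question 2, since every command slot is still visited.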