I have on the order of 50k small meshes (~64 vertices each), all with different connectivity, and some subset of them changes each frame (generated by a compute kernel). Can I render those in a performant way with Metal?
I'm assuming 50k separate draw calls would be too slow. I have a few ideas:
- encode those draw calls on the GPU
- or lay out the meshes linearly in blocks, with some maximum size, and use a single draw call, but wasting vertex shader threads on the blocks that aren't full
- or use another kernel to combine the little meshes into a big mesh
thanks!
Good question!
The answer depends on a few factors. Perhaps the most important question is: do the meshes share pipeline state objects (PSOs) and bindings? Since your third idea is to combine all the little meshes into one big mesh, I'd assume that all the input meshes share the same PSO. If that is the case, the best solution is likely the indirect drawing API. As you mentioned, you can have a compute kernel build a unified index buffer. The kernel should also write the draw arguments, most importantly the number of indices it produced, into a second buffer laid out as MTLDrawIndexedPrimitivesIndirectArguments (indexCount, instanceCount, indexStart, baseVertex, baseInstance). That buffer can then be used with the Metal API below:
- (void)drawIndexedPrimitives:(MTLPrimitiveType)primitiveType
                    indexType:(MTLIndexType)indexType
                  indexBuffer:(id<MTLBuffer>)indexBuffer                // <-- index buffer built by the kernel
            indexBufferOffset:(NSUInteger)indexBufferOffset             // <-- byte offset into the index buffer built by the kernel
               indirectBuffer:(id<MTLBuffer>)indirectBuffer             // <-- buffer holding the MTLDrawIndexedPrimitivesIndirectArguments written by the kernel (including the index count)
         indirectBufferOffset:(NSUInteger)indirectBufferOffset;         // <-- byte offset of those arguments in the indirect buffer
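To make the data flow concrete, here is a minimal CPU-side reference sketch in C++ of what such a kernel could produce. It is not Metal code and not necessarily how you would structure the kernel: `SmallMesh`, `DrawIndexedIndirectArgs`, and `buildUnifiedIndexBuffer` are hypothetical names, and the sketch assumes all meshes keep their vertices in one shared vertex buffer. Only the field order of the arguments struct is taken from Metal's `MTLDrawIndexedPrimitivesIndirectArguments`. On the GPU, each thread or threadgroup would typically reserve its output range with an atomic add or a prefix sum instead of appending sequentially.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as Metal's MTLDrawIndexedPrimitivesIndirectArguments.
struct DrawIndexedIndirectArgs {
    uint32_t indexCount;     // number of valid indices produced
    uint32_t instanceCount;
    uint32_t indexStart;
    int32_t  baseVertex;
    uint32_t baseInstance;
};
static_assert(sizeof(DrawIndexedIndirectArgs) == 20, "five tightly packed 32-bit fields");

// Hypothetical description of one small mesh (an assumption for this sketch).
struct SmallMesh {
    uint32_t firstVertex;            // start of this mesh in the shared vertex buffer
    std::vector<uint32_t> indices;   // mesh-local indices (0-based within the mesh)
    bool visible;                    // whether the mesh should be drawn this frame
};

// Concatenates the indices of all visible meshes into one unified index buffer,
// rebasing each index onto the shared vertex buffer, and returns the arguments
// that a single indirect indexed draw would consume.
inline DrawIndexedIndirectArgs buildUnifiedIndexBuffer(
        const std::vector<SmallMesh>& meshes,
        std::vector<uint32_t>& unifiedIndices) {
    unifiedIndices.clear();
    for (const SmallMesh& mesh : meshes) {
        if (!mesh.visible) continue;  // skipped meshes cost no indices at all
        for (uint32_t localIndex : mesh.indices) {
            unifiedIndices.push_back(mesh.firstVertex + localIndex);
        }
    }
    assert(unifiedIndices.size() <= UINT32_MAX);
    return DrawIndexedIndirectArgs{
        static_cast<uint32_t>(unifiedIndices.size()),  // indexCount
        1,                                             // instanceCount
        0,                                             // indexStart
        0,                                             // baseVertex
        0                                              // baseInstance
    };
}
```

With this layout, the index buffer only needs to be allocated once at its worst-case size, and the index count written each frame decides how much of it the draw actually reads.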
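If more than one PSO is needed, I'd recommend using the same approach, creating one unified index buffer per PSO. There are also other indirect draw methods, such as indirect command buffers (MTLIndirectCommandBuffer), which let a compute kernel encode the draw calls themselves, if they suit your use case better.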