MTLIndirectCommandBuffer GPU command encoding, instanced rendering

I've implemented GPU command encoding as described in the second part of the Modern Rendering with Metal WWDC19 talk.

The implementation applies frustum culling to each individual mesh instance and encodes a draw_indexed_primitives command for every instance that passes, in the same way as outlined in the talk. Each command has an instance count of 1.

My previous CPU command encoding implementation would group the visible mesh instances by mesh and pipeline state (after frustum culling) and encode one multi-instance draw call per group. With GPU command encoding, where the instances are processed in parallel, I don't see a way to group them like this.
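
A minimal sketch of the CPU-side grouping described above, written in plain C++ for illustration. The names (`MeshInstance`, `DrawBatch`, `buildBatches`) and the integer ids are hypothetical, and no Metal calls are shown; it only demonstrates collecting visible instances per (pipeline state, mesh) pair so that each group can become one instanced draw.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-instance record; the field names are illustrative only.
struct MeshInstance {
    uint32_t meshId;      // which mesh (vertex/index buffers) to draw
    uint32_t pipelineId;  // which render pipeline state to bind
    uint32_t instanceId;  // index into a per-instance data buffer
    bool     visible;     // result of the frustum test
};

// One multi-instance draw: instance count == instanceIds.size().
struct DrawBatch {
    uint32_t meshId;
    uint32_t pipelineId;
    std::vector<uint32_t> instanceIds;
};

// Groups the visible instances by (pipeline, mesh).
std::vector<DrawBatch> buildBatches(const std::vector<MeshInstance>& instances) {
    // Ordered map, so batches come out sorted by pipeline first, then mesh.
    std::map<std::pair<uint32_t, uint32_t>, std::vector<uint32_t>> groups;
    for (const MeshInstance& inst : instances) {
        if (!inst.visible) continue;  // culled instances are skipped
        groups[{inst.pipelineId, inst.meshId}].push_back(inst.instanceId);
    }

    std::vector<DrawBatch> batches;
    batches.reserve(groups.size());
    for (auto& entry : groups) {
        batches.push_back(DrawBatch{entry.first.second,   // meshId
                                    entry.first.first,    // pipelineId
                                    std::move(entry.second)});
    }
    return batches;
}
```

Each resulting `DrawBatch` would map to a single indexed draw whose instance count is the size of `instanceIds`. The grouping relies on a sequential pass over all instances, which is the step that has no obvious equivalent when every instance is handled independently on the GPU.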

Is there a significant performance cost to issuing many single-instance draw calls for the same mesh, as opposed to using instanced rendering?

This might very well be something that's not worth worrying about, but it would be good to have some input on this.
