Hello!
I'm working on an open-source game project that runs on all three major graphics APIs, including Metal.
Mesh shaders are a key component of my GPU-driven workflow. The implementation uses GL_EXT_mesh_shader
on PC, and for Metal I'm cross-compiling with my fork of SPIRV-Cross.
Unfortunately, the Metal version appears to be super slow, showing a 2x regression (Apple M1) compared to the draw-call-based version. This is quite surprising, since on an RTX 3070 the numbers are the opposite (1.5x speedup).
Note that the mesh shader does culling and IBO compression, unlike the draw-call-based version.
Shader source (GLSL):
https://github.com/Try/OpenGothic/blob/master/shader/materials/main.mesh
Cross-compiled MSL:
https://shader-playground.timjones.io/0f60082c67e30fbb8ad9015b48405628
My question is: what are the possible causes of this performance regression?
Are there any general performance recommendations? Is there a rule similar to prefersLocalInvocationVertexOutput from Vulkan?
How expensive is threadgroup memory?
Note1: The cross-compiling flow on my side is not perfect - it emits all varyings as shared-memory (threadgroup) arrays. This is hard to work around.
Note2: No Task (aka Object) shader is used - that one has bad performance on PC (NVIDIA).
Note3: C++ / macOS 13.0.1 (22A400) / Apple M1
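To give a rough sense of the threadgroup memory that Note1 implies, here is a back-of-the-envelope sketch in C++. All numbers in it (64 vertices per meshlet, 4 varyings, one float4 per varying) are hypothetical and not taken from the actual shader, and stagedVaryingBytes is only an illustrative helper, not part of any API.

```cpp
#include <cstddef>

// Hypothetical meshlet layout, for illustration only (not the real shader's values).
constexpr std::size_t kMaxVertices     = 64; // assumed vertices per meshlet
constexpr std::size_t kVaryingCount    = 4;  // assumed number of varyings
constexpr std::size_t kBytesPerVarying = 16; // assumed one float4 per varying

// Bytes of threadgroup memory needed when every varying of every vertex
// is staged in a threadgroup array before being written to the mesh output.
constexpr std::size_t stagedVaryingBytes(std::size_t vertices,
                                         std::size_t varyings,
                                         std::size_t bytesPerVarying) {
  return vertices * varyings * bytesPerVarying;
}

// With the assumed layout above this comes to 4 KiB per threadgroup.
static_assert(stagedVaryingBytes(kMaxVertices, kVaryingCount, kBytesPerVarying) == 4096,
              "64 vertices * 4 varyings * 16 bytes");
```

If the real footprint is in this range, it may limit how many threadgroups can be resident at once, which is part of why I'm asking about the cost of threadgroup memory.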
Thanks in advance!