I was familiarising myself with Metal mesh shaders and ran into some issues. First, a trivial application that uses mesh shaders to generate simple rectangular geometry hangs the GPU when dispatching 2D grids of mesh shader threadgroups. The odd part is that the hang is sensitive to the grid shape, e.g.:
// these work!
meshGridProperties.set_threadgroups_per_grid(uint3(512, 1, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(16, 8, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(32, 5, 1));
// these (and anything "bigger") hang!
meshGridProperties.set_threadgroups_per_grid(uint3(16, 9, 1));
meshGridProperties.set_threadgroups_per_grid(uint3(32, 6, 1));
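To make the "sensitive to the grid shape" point concrete, here is a small host-side sketch that tabulates the total threadgroup count for each shape listed above. It is plain Swift arithmetic and does not touch Metal. `GridShape` is a hypothetical helper type introduced only for this illustration, and the works/hangs flags are copied from the lists above.

```swift
// Hypothetical helper type for this sketch; not part of the Metal API.
struct GridShape {
    let x: Int
    let y: Int
    let z: Int
    // Total number of mesh threadgroups the shape would dispatch.
    var total: Int { x * y * z }
}

// Shapes and outcomes exactly as reported in the post.
let reported: [(shape: GridShape, hangs: Bool)] = [
    (GridShape(x: 512, y: 1, z: 1), false),
    (GridShape(x: 16, y: 8, z: 1), false),
    (GridShape(x: 32, y: 5, z: 1), false),
    (GridShape(x: 16, y: 9, z: 1), true),
    (GridShape(x: 32, y: 6, z: 1), true),
]

for entry in reported {
    let s = entry.shape
    let outcome = entry.hangs ? "hangs" : "works"
    print("\(s.x) x \(s.y) x \(s.z): \(s.total) threadgroups, \(outcome)")
}

// Compare the biggest working total against the smallest hanging total.
let largestWorking = reported.filter { !$0.hangs }.map { $0.shape.total }.max() ?? 0
let smallestHanging = reported.filter { $0.hangs }.map { $0.shape.total }.min() ?? 0
print("largest working total: \(largestWorking), smallest hanging total: \(smallestHanging)")
```

The totals are 512, 128, 160, 144 and 192. Since 512 x 1 and 32 x 5 (160) work while 16 x 9 (144) hangs, the total threadgroup count alone does not seem to separate the working shapes from the hanging ones.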
The sample shader code is attached. The invocation is trivial enough:
re.drawMeshThreadgroups(
    MTLSizeMake(1, 1, 1),
    threadsPerObjectThreadgroup: MTLSizeMake(1, 1, 1),
    threadsPerMeshThreadgroup: MTLSizeMake(1, 1, 1)
)
For Apple engineers: a bug has been submitted as FB10367407.
Mesh shader code:
I also have a more complex application where mesh shaders are used to generate sphere geometry: each mesh shader threadgroup generates a single slice of the sphere. Here the problem is similar: once there are more than X slices to render, some of the dispatched mesh threadgroups don't seem to do anything (see screenshot below). The funny thing is that the geometry is in fact being produced, since it occasionally flickers in and out of existence, and if I manually block some threadgroups from running (e.g. by adding something like if (threadgroup_index > 90) return; to the mesh shader), the "hidden" geometry renders correctly! It almost looks like different mesh shader threadgroups reuse the same memory allocation for storing the output mesh data, so that the output of some threadgroups is overwritten. I have not submitted this as a bug, since the code is more complex and messy, but I can do so if someone from the Apple team wants to have a look.