Try9998’s Profile | Apple Developer Forums

Bad mesh shader performance

Hello!

I'm working on an open-source game project that runs on all three major graphics APIs, including Metal. Mesh shaders are a key component of my GPU-driven workflow. The implementation uses GL_EXT_mesh_shader on PC; for Metal I cross-compile with my fork of spirv-cross.

Unfortunately, the Metal version appears to be very slow, showing a 2x regression (Apple M1) compared to the draw-call-based version. This is quite surprising: on an RTX 3070 the numbers are the opposite (1.5x speedup). Note that the mesh shader does culling and IBO compression, unlike the draw-call-based version.

Shader source (GLSL): https://github.com/Try/OpenGothic/blob/master/shader/materials/main.mesh
Cross-compiled MSL: https://shader-playground.timjones.io/0f60082c67e30fbb8ad9015b48405628

My questions:
What are the possible causes of the performance regression?
Are there any general performance recommendations?
Is there any rule similar to prefersLocalInvocationVertexOutput from Vulkan?
How expensive is threadgroup memory?

Note 1: The cross-compiling flow on my side is not perfect: it emits all varyings as shared-memory arrays. This is hard to work around.
Note 2: There is no Task (aka Object) shader; that one has bad performance on PC (NVIDIA).
Note 3: C++ / macOS 13.0.1 (22A400) / Apple M1

Thanks in advance!
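Since Note 1 says every varying ends up in a shared-memory array, it may help to estimate how much threadgroup memory that costs per meshlet. The following is a minimal C++ sketch, not taken from the project: `MeshletLayout`, `varyingFootprint` and `fitsIn` are hypothetical names, and the vertex and varying counts are illustrative assumptions only. The real limit should be queried from the device (MTLDevice exposes maxThreadgroupMemoryLength) rather than hard-coded.

```cpp
#include <cstddef>

// Hypothetical description of one meshlet's per-vertex outputs.
// None of these numbers come from the project; they are inputs to the estimate.
struct MeshletLayout {
  std::size_t maxVertices;        // vertices emitted per meshlet
  std::size_t varyingsPerVertex;  // number of varyings written per vertex
  std::size_t bytesPerVarying;    // e.g. 16 for a float4
};

// Bytes of threadgroup memory needed if every varying is staged
// in a shared-memory array, one slot per vertex.
constexpr std::size_t varyingFootprint(const MeshletLayout& l) {
  return l.maxVertices * l.varyingsPerVertex * l.bytesPerVarying;
}

// True when the staged varyings fit under the given limit (in bytes).
// The limit is a parameter on purpose: query it from the device.
constexpr bool fitsIn(const MeshletLayout& l, std::size_t limitBytes) {
  return varyingFootprint(l) <= limitBytes;
}
```

For example, a meshlet of 64 vertices with 6 float4 varyings would stage 64 × 6 × 16 = 6144 bytes. Whether a footprint of that size actually explains the slowdown is an open question; the sketch only makes the size visible.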

Graphics & Games General Metal

4

0

2.1k

Dec ’22

Mesh-shader culling is broken?

Follow-up to https://developer.apple.com/forums/thread/722047

After experimenting a bit more with mesh shaders on M1, I've come to the theory (which I can't really prove, as there is no profiler for them) that culling is broken in Metal 3.

In my case, culling is fairly simple: the first 16 invocations probe the HiZ pyramid and vote.
a) If all vote non-visible, the shader sets the primitive count to zero and exits.
b) If visible, each thread processes one vertex (the usual geometry processing) and writes a valid meshlet.

Yet if the HiZ test is ignored and the mesh is processed anyway, performance is nearly the same.

I also noted that culling in the mesh shader was never mentioned in any official materials (as opposed to the object shader). Here I'm reading between the lines a bit: maybe the driver assumes only object-shader-based culling, and a mesh threadgroup always allocates resources for the worst possible case?

My questions at this point:
What is the cost of an empty meshlet?
Is there any upfront cost of launching a mesh threadgrid, as there is with iOS compute shaders?
Are there any issues with large (1024+) workgroup sizes?

Thanks in advance!
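The two culling paths described above can be modelled on the CPU as a small reference sketch. This is not the shader itself: `MeshletOutput`, `anyVisible` and `processMeshlet` are hypothetical names, and the HiZ lookup is reduced to a precomputed array of votes. Only the count of 16 voting invocations comes from the post.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// From the post: the first 16 invocations test the HiZ pyramid and vote.
constexpr std::size_t kVoters = 16;

// What the meshlet reports back (hypothetical stand-in for the
// vertex/primitive counts a mesh shader would set).
struct MeshletOutput {
  std::uint32_t vertexCount;
  std::uint32_t primitiveCount;
};

// votes[i] is true when invocation i found its HiZ sample visible.
inline bool anyVisible(const std::array<bool, kVoters>& votes) {
  for (bool v : votes) {
    if (v) return true;
  }
  return false;
}

inline MeshletOutput processMeshlet(const std::array<bool, kVoters>& votes,
                                    std::uint32_t vertexCount,
                                    std::uint32_t primitiveCount) {
  if (!anyVisible(votes)) {
    return {0, 0};  // path (a): everyone voted non-visible, emit an empty meshlet
  }
  return {vertexCount, primitiveCount};  // path (b): regular geometry processing
}
```

In this model path (a) does no per-vertex work at all, which is why similar timings with and without the HiZ test are unexpected.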

Graphics & Games General Metal wwdc2022-10162

0

805

Feb ’23

Raytracing bugs on M1

Hi,

I'm working on integrating ray queries into my game engine. Vulkan/DX12 work fine on PC, but Metal (on Mac) doesn't.

Here is a screenshot of how the rendering looks: [screenshot]
And a similar spot in the Xcode debugger shows: [screenshot]

The ship, cannons and items are there, so the TLAS looks as it should. Note: in the game screenshot above there are no shadows, but that is not always the case: [screenshot]
Here only some parts of the object cast a shadow.

Fragment shader: https://shader-playground.timjones.io/44de178b7b8a715ea235c7f12cd0aabc

    // relevant part
    bool isShadow(...) {
        ...
        uint flags = 4u;
        flags |= 128u;
        rayQuery.reset(ray(rayOrigin, rayDirection, tMin, rayDistance), topLevelAS, spvMakeIntersectionParams(flags));
        for (;;) // spirv-cross not pretty here :(
        {
            bool _116 = rayQuery.next();
            if (_116) { continue; } else { break; }
        }
        uint _120 = uint(rayQuery.get_committed_intersection_type());
        if (_120 == 0u) { return false; }
        return true;
    }
    ----
    intersection_params spvMakeIntersectionParams(uint flags) {
        // hacked this part while debugging: set up for the simplest possible any-hit
        intersection_params ip;
        ip.force_opacity(forced_opacity::opaque);
        ip.accept_any_intersection(true);
        return ip;
    }

After verifying the TLAS and the ray-query loop, I can conclude that this is most likely a driver bug, or the generated shader code is wrong (but it looks correct to me!).

PS: One more small thing about Metal RT. The Metal documentation for MTL::AccelerationStructureTriangleGeometryDescriptor::setIndexBufferOffset says: "Specify an offset that is a multiple of the index data type size and a multiple of the platform’s buffer offset alignment." The buffer offset alignment (32 bytes in the worst case) is very hard to satisfy for multi-material meshes. No other API requires this, and there is no good workaround for it.
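To illustrate why the index-buffer offset rule is awkward for multi-material meshes, here is a small C++ sketch of the two constraints quoted above. `isValidIndexOffset` and `alignUp` are hypothetical helper names, not Metal API; the 32-byte alignment is the worst case mentioned in the post and is passed in as a parameter rather than assumed.

```cpp
#include <cstdint>

// True when `offset` (in bytes) satisfies both constraints from the quoted
// documentation: a multiple of the index size and of the buffer alignment.
constexpr bool isValidIndexOffset(std::uint64_t offset,
                                  std::uint64_t indexSize,
                                  std::uint64_t bufferAlignment) {
  return offset % indexSize == 0 && offset % bufferAlignment == 0;
}

// Rounds `offset` up to the next multiple of `alignment` (must be non-zero).
constexpr std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment) {
  return (offset + alignment - 1) / alignment * alignment;
}
```

With 16-bit indices, a second sub-mesh starting at index 30 sits at byte offset 60, which fails the check against a 32-byte alignment; `alignUp(60, 32)` gives 64. Padding every sub-mesh range like this is only one conceivable approach and requires re-laying out the shared index buffer, which is the kind of workaround the post calls impractical.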

Graphics & Games General Metal

6

0

1k

Jun ’23

iOS: shader compiler crash

Hello!

I ran into what seems to be a compiler issue. The shader source given to Metal is: https://shader-playground.timjones.io/1bcf3ffbb313878ccd594ddbb27b746e

This shader is generated by spirv-cross from GLSL source, so for readability here is the original source: https://github.com/Try/OpenGothic/blob/master/shader/hiz/hiz_mip.comp (the shader variant uses an SSBO counter, not an atomic image)

Here is the relevant part of the application log:

    2024-04-21 16:27:13.621218+0200 Gothic2Notr[23992:2003969] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
    2024-04-21 16:27:13.656559+0200 Gothic2Notr[23992:2003969] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
    2024-04-21 16:27:13.701323+0200 Gothic2Notr[23992:2003969] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
    2024-04-21 16:27:13.701477+0200 Gothic2Notr[23992:2003969] MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 3 try
    2024-04-21 16:27:13.701817+0200 Gothic2Notr[23992:2003969] Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED

iOS version: 15.8.2
MTL::CompileOptions::languageVersion: 2.4 (also tested other versions, same result)

Offending part of the shader:

    void store(int mip, ivec2 uv, float z) {
        // NOTE: replacing this function with a NOP avoids the crash
        // NOTE2: this switch-case is a crude emulation of a bindless storage image
        switch(mip) {
            case 1: imageStore(mip1, uv, vec4(z)); break;
            case 2: imageStore(mip2, uv, vec4(z)); break;
            case 3: imageStore(mip3, uv, vec4(z)); break;
            case 4: imageStore(mip4, uv, vec4(z)); break;
            case 5: imageStore(mip5, uv, vec4(z)); break;
            case 6: imageStore(mip6, uv, vec4(z)); break;
            case 7: imageStore(mip7, uv, vec4(z)); break;
            case 8: imageStore(mip8, uv, vec4(z)); break;
        }
    }

Some extra info:
The shader is a simplified single-pass mipmap generator.
The same shader is known to work on a Mac M1 laptop without any issues.

Please have a look; I'm looking forward to a driver fix. Thanks!

Graphics & Games General Metal

1

0

608

Apr ’24

Try9998

Post

Replies

Boosts

Views

Activity