roserg’s Profile | Apple Developer Forums

Reply to [Metal] What's wrong with simdgroup_load or simdgroup_store?

Hi PYNing, you need to understand how SIMD-group instructions work. Unless the API specifies otherwise, a SIMD-group instruction must be executed by all threads in the SIMD group, and all of them must execute the same instruction. mmmetal already pointed out that in your case only one thread is active and executes the code. For correct behaviour, all threads in the SIMD group should execute these instructions. On Apple GPUs the SIMD group size is fixed at 32, so on M1 all 32 threads must take the same code path to produce a correct result. Be careful on the CPU side as well: you should probably dispatch threads in multiples of the SIMD group size.
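To illustrate the point about uniform execution, here is a minimal sketch of a kernel in which every thread of the SIMD group reaches the same simdgroup_load, simdgroup_multiply and simdgroup_store calls. The kernel name `matmul8x8` and the buffer indices are hypothetical, and the sketch assumes the kernel is dispatched as a single threadgroup of 32 threads (one full SIMD group) over tightly packed, row-major 8x8 float matrices.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: C = A * B for one 8x8 float tile.
// Assumption: dispatched with exactly one threadgroup of 32 threads,
// i.e. one complete SIMD group on Apple GPUs.
kernel void matmul8x8(device float *a [[buffer(0)]],
                      device float *b [[buffer(1)]],
                      device float *c [[buffer(2)]])
{
    simdgroup_float8x8 matA;
    simdgroup_float8x8 matB;
    simdgroup_float8x8 matC;

    // No per-thread guard such as `if (tid == 0)` around these calls:
    // all 32 threads execute every SIMD-group instruction together.
    simdgroup_load(matA, a);              // default row stride of 8 elements
    simdgroup_load(matB, b);
    simdgroup_multiply(matC, matA, matB); // matC = matA * matB
    simdgroup_store(matC, c);
}
```

If the calls were wrapped in a branch that only one thread takes, the remaining threads would not take part in the load and store, which is the situation described above.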

Graphics & Games General

Apr ’22

Reply to Any Demo for Metal Struct simdgroup_float8x8 and simdgroup_half8x8 ?

Hi, have you tried MPSMatrixMultiplication? It should use these features when possible, and it supports fp16/fp32 precision.

Graphics & Games Metal

Jan ’22
