Reply to [Metal] What's wrong with simdgroup_load or simdgroup_store?
Hi PYNing, You should understand how SIMD instructions work. Unless the API says otherwise, a SIMD-group instruction must be executed by all threads in the SIMD group, and all of them must execute the same instruction. mmmetal already clarified that in your case only one thread is active and executes the code. For correct behaviour, all threads in the SIMD group must execute these instructions. On Apple GPUs the SIMD-group size is fixed at 32, so on M1 all 32 threads must take the same code path to produce a correct result. Be careful on the CPU side as well: you should probably dispatch threads in multiples of the SIMD-group size.
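To make the CPU-side advice concrete, here is a minimal sketch in plain C++ of rounding a thread count up to a whole number of SIMD groups before dispatch. The names `kSimdWidth`, `roundUpToSimdWidth` and `simdGroupsFor` are hypothetical helpers, not part of Metal, and the width of 32 is simply the value quoted above; in a real host program you would likely read `threadExecutionWidth` from the compute pipeline state rather than hard-code it.

```cpp
#include <cstddef>

// Assumption: SIMD-group width of 32, as stated in the reply for Apple GPUs.
// Hypothetical constant; a real host program would query the pipeline state.
constexpr std::size_t kSimdWidth = 32;

// Round a requested thread count up to the next multiple of the SIMD width,
// so that no SIMD group is launched only partially populated.
constexpr std::size_t roundUpToSimdWidth(std::size_t threads,
                                         std::size_t width = kSimdWidth) {
    return ((threads + width - 1) / width) * width;
}

// Number of whole SIMD groups needed to cover the requested thread count.
constexpr std::size_t simdGroupsFor(std::size_t threads,
                                    std::size_t width = kSimdWidth) {
    return roundUpToSimdWidth(threads, width) / width;
}
```

For example, `roundUpToSimdWidth(1)` gives 32 threads, i.e. one full SIMD group instead of a single active thread. The kernel still has to ensure that every thread in the group reaches the `simdgroup_load` / `simdgroup_store` calls, so any bounds check should not cause some threads in the group to skip them.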
Apr ’22