I'm trying to optimize code for the M1 GPU family (Apple 7 variants where F32 compute power equals F16 compute power). xcrun metal-opt mentions a compile option called -fma-shff-hoist-depth-agx2=<uint>. There's something called fma.shff. I'd like to know what the "SHFF" acronym stands for. Shuffle? Shift? Perhaps it's related to SIMD-group reductions.