I Instrument's CPU Profiling tool I've noticed that a significant portion (22.5%) of the CPU-side overhead related to MPS matrix multiplication (GEMM) is in a call to getenv(). Please see attached screenshot.
It seems unnecessary to perform this same check over and over, as whatever hack that needs this should be able to perform the getenv() only once and cache the result for future use.