Ensuring peak M1 GPU performance for short-running kernels

There is an ongoing discussion about the validity of GPU compute performance estimates offered by popular benchmarking tools such as Geekbench 5. It has been observed that Apple GPUs have a relatively slow frequency ramp-up and do not reach their peak performance if the submitted kernels have a runtime under a few seconds. I understand that these GPUs are designed for throughput rather than latency, but sometimes one does work with “small” work packages (such as processing a single image). Is there an official way to tell the system that it should use peak performance for such work? E.g. some sort of hint along the lines of “I will now submit some GPU work and I want you to power up all the relevant subsystems” instead of relying on the OS to lazily adjust the performance profile?

Replies

Hi jcookie,

There is currently no way to hint the GPU to ramp up before executing work. Last year, however, we introduced the GPU Performance State inducer in our GPU tooling. The WWDC session "Discover Metal debugging, profiling, and asset creation tools" explains how to use these tools.

  • Thank you! Are there plans to add such an API? In my testing, the GPU remains in the low-power state even if the delay between submitting short-running kernels is as short as 1 ms. So one can be submitting GPU work hundreds of times per second without ever reaching maximal performance. This makes sense for most applications, but it can be a problem for software that cannot submit GPU work continuously yet still needs each piece of work done as quickly as possible.
