No Speedup with CoreML SDPA

I am testing the new scaled dot product attention (SDPA) CoreML op on macOS 15 beta 1. Based on the session video, I expected a speedup when running on the GPU, but I see roughly the same performance as the same model on macOS 14.

I ran tests with two models:

  • one that simply repeats y = sdpa(y, k, v) 50 times
  • gpt2 124M converted from nanoGPT (the only change is not returning loss from the forward method)
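
For reference, the computation in the first test model can be sketched in plain Python. This is only an illustration of what repeating y = sdpa(y, k, v) means, assuming the standard unmasked definition softmax(q k^T / sqrt(d)) v; it is not the converted model, and the names `sdpa` and `repeated_sdpa` are hypothetical.

```python
import math


def sdpa(q, k, v):
    """Unmasked scaled dot product attention on 2-D lists of floats."""
    d = len(q[0])  # feature dimension used for the 1/sqrt(d) scaling
    out = []
    for q_row in q:
        # Scaled similarity of this query row against every key row.
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d)
            for k_row in k
        ]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


def repeated_sdpa(y, k, v, repeats=50):
    """Feed the attention output back in as the next query, `repeats` times."""
    for _ in range(repeats):
        y = sdpa(y, k, v)
    return y
```

Because each output row is a weighted average of the rows of v, the output keeps the shape of y as long as v has the same feature width, which is what lets the op be chained 50 times.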

I converted both models using coremltools 8.0b1, once with a minimum deployment target of macOS 14 and once with macOS 15. In Xcode, I can see that the new op is used for the macOS 15 target. On macOS 15, the models built for each deployment target take the same time to run, and that time matches what I measured on macOS 14.
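
A comparison like this depends on how the latency is measured. Below is a minimal sketch of a timing harness, not the one used for the numbers above: the `benchmark` helper and its `warmup` and `iters` parameters are hypothetical, and it assumes the first few calls should be discarded so that one-time setup costs do not skew the result.

```python
import statistics
import time


def benchmark(fn, warmup=5, iters=50):
    """Call fn() repeatedly and report wall-clock latency in milliseconds."""
    # Warm-up calls are not timed, so first-run costs are excluded.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

With coremltools, `fn` could be something like `lambda: mlmodel.predict(inputs)`, where `mlmodel` and `inputs` are placeholders for a loaded model and its input dictionary. Reporting the median for each deployment target on the same machine makes it easier to tell a real difference from run-to-run noise.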

Should I be seeing performance improvements?
