When I run the performance test on a Core ML model, it shows predictions are 834% faster on the Neural Engine than on the GPU.
It also shows that 100% of the model can run on the Neural Engine:
GPU only:
But when I set the compute units to all:
let config = MLModelConfiguration()
config.computeUnits = .all
and profile, the Neural Engine isn't used at all. Well, other than loading the model, which takes 25 seconds when the Neural Engine is allowed versus less than a second when it isn't:
The difference in speed is the difference between the app being too slow to release at all and quite reasonable performance. I have a lot of work invested in this, so I am really hoping I can get it to run on the Neural Engine.
Why isn't it actually running on the Neural Engine when the performance report shows it is supported and the compute units are set to allow it?
I figured it out: apparently models with flexible (range-based) input shapes do not run on the ANE.
I really wish this were documented; the docs just state to use enumerated shapes for best performance.
But in this case, using flexible shapes is nearly 10 times slower and I don't understand why they are supported at all with that kind of penalty.
Avoiding flexible shapes from the start would have saved me a lot of trouble, since I now need to refactor inference in shipped products. There is a good chance this is why one of the products I spent six months of my life developing has largely been a flop. Very frustrating.
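For anyone hitting the same wall: the shape constraint is baked in at conversion time in coremltools, not in the app. A minimal sketch of switching from a flexible range to enumerated shapes, assuming a traced PyTorch model with a single input named "input" (the model, input name, and shapes here are hypothetical, not from my actual project):

```python
# Sketch: convert with enumerated shapes instead of a flexible range,
# so Core ML can schedule the model on the Neural Engine.
# Assumes coremltools is installed; model/input names are hypothetical.

def convert_with_enumerated_shapes(traced_model):
    import coremltools as ct  # imported lazily so this sketch loads without it

    # A flexible range like this is what reportedly keeps the model off the ANE:
    #   shape = ct.Shape(shape=(1, 3, ct.RangeDim(256, 1024), ct.RangeDim(256, 1024)))
    # Instead, enumerate the exact shapes the app actually uses:
    shapes = ct.EnumeratedShapes(
        shapes=[
            (1, 3, 256, 256),
            (1, 3, 512, 512),
            (1, 3, 1024, 1024),
        ],
        default=(1, 3, 512, 512),
    )

    return ct.convert(
        traced_model,
        inputs=[ct.TensorType(name="input", shape=shapes)],
        compute_units=ct.ComputeUnit.ALL,
    )
```

The catch, and the reason I have to refactor: at prediction time the input must match one of the enumerated shapes exactly, so inference code that previously passed arbitrary sizes now has to resize or pad to the nearest enumerated shape.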