CoreML not using Neural Engine even though it should

When I run the performance test on a CoreML model, it shows that predictions are 834% faster on the Neural Engine than on the GPU.

It also shows that 100% of the model can run on the Neural Engine:

GPU only:

But when I set the compute units to all:

let config = MLModelConfiguration()
config.computeUnits = .all

and profile, it shows that the Neural Engine isn't used at all, other than for loading the model, which takes 25 seconds when the Neural Engine is allowed versus less than a second when it is not:

The difference in speed is the difference between the app being too slow to even release versus quite reasonable performance. I have a lot of work invested in this, so I am really hoping that I can get it to run on the Neural Engine.

Why isn't it actually running on the Neural Engine when it shows that it is supported and I have the compute unit set to run on the Neural Engine?

Answered by 3DTOPO in 753348022

I figured it out; apparently flexible shapes do not run on the ANE.

I really wish this was documented; the docs just state to use enumerated shapes for best performance.

But in this case, using flexible shapes is nearly 10 times slower and I don't understand why they are supported at all with that kind of penalty.

Not having flexible shapes at all would have saved me a lot of trouble, since I now need to refactor inference in shipped products. There is a good chance this is why one of the products I spent six months of my life developing has largely been a flop. Very frustrating.

It seems the ANE does work with flexible shapes when they are specified as EnumeratedShapes: https://apple.github.io/coremltools/docs-guides/source/faqs.html#neural-engine-with-flexible-input-shapes

When converting a fixed-shape model that already runs on the Neural Engine (NE) to use flexible inputs, you should specify a flexible input shape with a set of predetermined shapes using EnumeratedShapes. The converted model will run on the NE, unless the conversion introduces dynamic layers not supported on the NE, such as converting a static reshape to a fully dynamic reshape.

With EnumeratedShapes the model can be optimized for the finite set of input shapes on the device during compilation. You can provide up to 128 different shapes. If you need more flexibility for inputs, consider setting the range for each dimension.
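For anyone landing here later, this is a rough sketch of what a conversion with enumerated shapes might look like in coremltools, assuming a traced PyTorch model that takes a single square image input. The helper names (`square_image_shapes`, `convert_with_enumerated_shapes`), the input name `"image"`, and the example sizes are made up for illustration and are not from the thread. Whether the result actually runs on the Neural Engine still depends on the model's layers, as the quoted text says.

```python
MAX_ENUMERATED_SHAPES = 128  # upper limit mentioned in the quoted coremltools text


def square_image_shapes(sides, channels=3, batch=1):
    """Build one (batch, channels, side, side) shape per side length.

    Hypothetical helper: it only prepares the finite list of shapes
    that will be handed to ct.EnumeratedShapes.
    """
    shapes = [(batch, channels, side, side) for side in sides]
    if len(shapes) > MAX_ENUMERATED_SHAPES:
        raise ValueError(
            f"{len(shapes)} shapes requested, at most {MAX_ENUMERATED_SHAPES} allowed"
        )
    return shapes


def convert_with_enumerated_shapes(traced_model, shapes):
    """Convert a traced PyTorch model using a fixed set of input shapes.

    Assumes the model has exactly one input; "image" is an assumed name.
    """
    # Imported here so the shape helper above can be used without coremltools.
    import coremltools as ct

    # The first shape in the list is used as the default shape.
    input_shape = ct.EnumeratedShapes(shapes=shapes, default=shapes[0])
    return ct.convert(
        traced_model,
        inputs=[ct.TensorType(name="image", shape=input_shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,
    )


if __name__ == "__main__":
    # Example sizes only; pick the sizes your app really feeds the model.
    print(square_image_shapes([256, 512, 1024]))
```

The idea is to replace an open-ended size range with the handful of sizes the app really uses, so the model can be optimized for each of them when it is compiled on the device.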

How did you run the performance test? Is it built into CoreML?
