When I run a W8A8 (8-bit weights, 8-bit activations) quantized model with Core ML on an iPhone 15 (iOS 18 beta 8), inference runs on the CPU instead of the Apple Neural Engine (ANE), so it is slower than expected. The model comes from the coremltools documentation, which indicates that on iOS 17 quantized models run on the ANE and achieve faster inference. How can I get the quantized model to run on the ANE and reach the expected inference speed?
To reproduce this issue, download the Weight & Activation quantized model from the following coremltools documentation page: https://apple.github.io/coremltools/docs-guides/source/opt-quantization-perf.html
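
For anyone reproducing this, below is a minimal sketch of how the model could be loaded with the Neural Engine explicitly requested. It is an illustration, not the exact code from my project: the function name `loadModelPreferringANE` and the `compiledModelURL` parameter are hypothetical, and it assumes the downloaded model has already been compiled to a `.mlmodelc` bundle. Setting `computeUnits` only restricts the devices Core ML may choose from; it does not guarantee that every layer is actually dispatched to the ANE.

```swift
import CoreML
import Foundation

/// Loads a compiled Core ML model while limiting execution to the CPU and the Neural Engine.
/// `compiledModelURL` is a hypothetical path to a compiled `.mlmodelc` bundle.
func loadModelPreferringANE(at compiledModelURL: URL) throws -> MLModel {
    let configuration = MLModelConfiguration()
    // Exclude the GPU so Core ML can only pick between the CPU and the ANE (iOS 16+).
    configuration.computeUnits = .cpuAndNeuralEngine
    return try MLModel(contentsOf: compiledModelURL, configuration: configuration)
}
```

Comparing latency between `.cpuAndNeuralEngine` and `.cpuOnly` with the same input is one way to check whether the ANE is being used at all; if the two timings match, the model is probably falling back to the CPU.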