Will Tensorflow-converted models use the A11/A12 Neural Engine?

Hello,


I'm new to the forums and iOS dev, but have an ML & software engineering background. I'm pretty intrigued by the promise of the GPUs and Neural Engine in Apple's mobile architectures; they introduce huge potential, and I want to explore it. From reading the developer docs and watching a couple of WWDC videos, it seems you can indeed convert models from Tensorflow to CoreML, but it's not clear to me whether they will use the "neural engine" (whatever that actually is) once converted. Using Metal Performance Shaders seems to be a way to (quasi-?)guarantee execution on the GPU, but it's not clear whether that's the same as the Neural Engine (it seems not). I'm assuming that models built with CreateML will be smart enough to use the Neural Engine, but I couldn't find that stated anywhere. And that's not to mention the other enhancements like quantization and half-precision weights.


So the question is, to access the potential of the hardware, will I have to reimplement my model using the CreateML toolkit? How sophisticated is the Tensorflow converter when it comes to maximising the potential of the hardware?


I'm also curious about how to determine whether the code is executing on the hardware or not, but I suppose that's a different topic.


Edit to add - I've dug around a bit more and thanks to a couple of the Metal Computation videos from 2017 & 2018 WWDCs I now see it's possible to build our own CNNs & RNNs through the Metal Performance Shaders framework, and they will be executed on the GPU. However it's not yet clear to me how this relates to CoreML, or the Neural Engine, if at all.


Many thanks.

-Thom

Replies

Metal Performance Shaders are GPU kernels. They are not related to the Neural Engine at all.


Core ML will use MPS to run models on the GPU, or at least the parts of the model whose layers are supported in MPS. If you add a custom layer to your model and it only has a CPU implementation, Core ML may still run the rest of the model on the GPU.


I say "may" because it's impossible to know what Core ML will actually do, as it depends on the model, the hardware, etc.


On devices with an A12 and a Neural Engine, it has been my experience that Core ML will automatically use the Neural Engine. It doesn't matter how the Core ML model was created (I converted mine from Keras); it only matters that the model's operations are supported by the Neural Engine.


If you add a custom layer to the model, Core ML cannot use the Neural Engine and will drop back to MPS. So it appears that the *entire* model must be able to run on the Neural Engine, and if not, you're out of luck.


It's hard to say if _all_ Core ML models (with no custom layers) can run on the Neural Engine. I expect so, but I haven't tried all possible layer types (nor have I tried decision trees etc, only neural nets). There is no documentation for this.


However, try it out. If you make a prediction with Core ML and the debug navigator in Xcode shows a thread whose name starts with H11ANE, then Core ML is using the Neural Engine. You can also put a symbolic breakpoint on -[_ANEModel program]; if that gets triggered, Core ML is using the Neural Engine to run the model.
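In case it helps, here is roughly what that symbolic breakpoint looks like from the lldb console. Note that `_ANEModel` is a private symbol, so this name is an observation rather than a documented API, and it may change between OS releases:

```
(lldb) breakpoint set --name "-[_ANEModel program]"
```

Then run a prediction; if the breakpoint is hit, Core ML has loaded the model through the Neural Engine path.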

Thanks for the info, that definitely helps, especially the tip on how to determine whether it's actually running on the Neural Engine.


If I understand correctly - as long as I'm using an `.mlmodel` file (whether from CreateML or some conversion process), and the model uses layers/operations which are "supported" by the NE and/or the GPU, CoreML should be taking care of executing the operations on either the Neural Engine or GPU as appropriate, with a preference for the Neural Engine. Is that about right?
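That matches my understanding too. One related knob worth knowing about: since Core ML 2 (iOS 12), `MLModelConfiguration` has a `computeUnits` property that lets you constrain which hardware Core ML may use, which is handy for comparison runs. A minimal sketch, assuming a hypothetical compiled model named "MyModel.mlmodelc" in the app bundle:

```swift
import CoreML

// Sketch only: "MyModel" is a hypothetical name; substitute the compiled
// .mlmodelc for your own .mlmodel.
let config = MLModelConfiguration()
// .all (the default) lets Core ML choose: Neural Engine where supported,
// otherwise GPU via MPS, otherwise CPU.
config.computeUnits = .all
// For timing comparisons you can pin execution away from the Neural Engine:
// config.computeUnits = .cpuAndGPU   // or .cpuOnly

if let url = Bundle.main.url(forResource: "MyModel", withExtension: "mlmodelc"),
   let model = try? MLModel(contentsOf: url, configuration: config) {
    // Feed inputs via an MLFeatureProvider and model.prediction(from:)
    print(model.modelDescription)
}
```

Comparing inference times with `.all` versus `.cpuAndGPU` is another rough way to infer whether the Neural Engine is actually being used.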


Is there a list of supported operations for the GPU and/or Neural Engine?


I'm still a little confused about the relationship between the MPS framework and the CoreML framework. For example, if you implement a network manually using the MPS primitives and the neural network graph API like in this example, does that count as a CoreML model or not? But I might start a different thread on that; I suspect it will clear up after actually playing with it for a day or two.


The next step is definitely to try it out. Thanks again.

Core ML is built on top of different technologies that let it use different types of hardware. These technologies are MPS for the GPU, Accelerate and BNNS for the CPU, and a private framework for the Neural Engine.


If you use MPS yourself, you're not using Core ML.