A coremltools GitHub issue was closed in the past saying that this forum is the appropriate place to discuss how to get Core ML models to run on the Neural Engine: https://github.com/apple/coremltools/issues/337.
I have a TensorFlow model where the vast majority of layers can run on the GPU or the Neural Engine. Conceptually, I don't see why all of it can't use the Neural Engine. I see that a couple of layers associated with the GRU, such as get_shape, cannot run on the Neural Engine (even though all of the shapes are known). coremltools just spits out the converted model, so I don't have much insight into why layers with dynamic dimensions are used instead of static ones.
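In case it helps, here is a rough sketch of one way I can think of to dump the op types the converter actually emitted. It assumes the coreml_model produced by the ct.convert call shown further down, and the protobuf field names are my reading of the MIL spec, not a supported API:

```python
from collections import Counter

# Sketch: walk the converted ML Program and count op types, to see whether
# shape-related ops (e.g. "shape") really appear even though the input
# dimensions are known. `coreml_model` is the MLModel returned by ct.convert.
def op_types(block):
    for op in block.operations:
        yield op.type
        for inner in op.blocks:  # recurse into loop/branch bodies
            yield from op_types(inner)

spec = coreml_model.get_spec()
main_fn = spec.mlProgram.functions["main"]
main_block = main_fn.block_specializations[main_fn.opset]
print(Counter(op_types(main_block)))
```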
Is there any way to have some of the model run inference on the GPU/NE, or to have coremltools guarantee that a generated model runs on the NE?
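For the second half of that question, my understanding is that compute_units only restricts which devices Core ML may use rather than guaranteeing NE placement, but restricting it at least lets me test whether the whole model can run without the GPU. A minimal sketch (the .mlpackage path is a placeholder, and CPU_AND_NE needs coremltools 6+ on macOS 13 / iOS 16):

```python
import coremltools as ct

# Reload the converted/saved model with the GPU excluded, to test whether the
# whole graph can run on CPU + Neural Engine only.
# "Probability.mlpackage" is a placeholder path for my converted model.
ne_model = ct.models.MLModel(
    "Probability.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # coremltools >= 6, macOS 13 / iOS 16
)
# ne_model.predict({...})  # run the same inputs as before and compare timings
```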
I converted the model with
coreml_model = ct.convert(probability_model, convert_to='mlprogram', compute_precision=ct.precision.FLOAT16, compute_units=ct.ComputeUnit.ALL)
where ct is coremltools and probability_model is a TensorFlow 2 Keras model that has a GRU in it.
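For completeness, here is a self-contained version of that conversion with a small stand-in Keras GRU model (the layer sizes, sequence length, and feature count are placeholders, not my real network):

```python
import tensorflow as tf
import coremltools as ct

# Small stand-in for probability_model: a Keras network with a GRU and a
# softmax head. Every size below is a placeholder, not my real architecture.
probability_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 40), batch_size=1),  # (timesteps, features)
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(10, activation='softmax'),
])

coreml_model = ct.convert(
    probability_model,
    convert_to='mlprogram',
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL,
    # Possibly relevant: pinning a fully static input shape here, e.g.
    # inputs=[ct.TensorType(shape=(1, 128, 40))], in case that removes the
    # dynamic-shape ops; I'd be curious whether that is expected to help.
)
```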
Similar models that I tried without the GRU run 20-30x faster on the NE.
Here is an example performance report screenshot:
One thing I notice that doesn't seem to match my expectations with coremltools is that the storage and compute types differ. I don't know why, because I exported from coremltools with float16 compute precision.
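One guess I have is that the model inputs/outputs default to float32 even when compute_precision is FLOAT16, which might explain part of the mismatch. If that's right, something like this should request float16 end to end (it needs coremltools 6+ and an iOS 16 / macOS 13 deployment target; the input shape is a placeholder):

```python
import numpy as np
import coremltools as ct

# Request float16 for the model I/O as well as for the weights/compute
# precision. The input shape is a placeholder for my real model.
coreml_model = ct.convert(
    probability_model,
    convert_to='mlprogram',
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL,
    inputs=[ct.TensorType(shape=(1, 128, 40), dtype=np.float16)],
    outputs=[ct.TensorType(dtype=np.float16)],
    minimum_deployment_target=ct.target.iOS16,  # fp16 I/O needs iOS 16 / macOS 13
)
```

Is specifying the I/O dtypes like this the right way to address that, or does the storage/compute difference in the report come from something else?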