CoreML inference on iOS HW uses only CPU on CoreMLTools-imported PyTorch model

I have exported a PyTorch model into a CoreML mlpackage file and imported the model file into my iOS project. The model is a music source separation model - it runs prediction on audio-spectrogram blocks and returns separated audio source spectrograms.

The model produces correct results compared to the desktop GPU + Python pipeline, but inference on an iPhone 15 Pro Max is really, really slow. Using the Xcode model Performance tool I can see that inference isn't automatically distributed between compute units - all of it runs on the CPU. The Performance tool annotations hint that all ops should be supported by both the GPU and the Neural Engine.

One thing to note: when initializing the model with the MLModelConfiguration computeUnits option .cpuAndGPU or .cpuAndNeuralEngine, there is an error in the Xcode console:

```
Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at /private/var/containers/Bundle/Application/2E3C4AFF-1FA4-4C95-AAE4-ECEBC0FB0BF9/mymss.app/mymss.mlmodelc/model.mil:2453:12
 @ CreateBnnsGraphProgramFromMIL
```
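
For reference, the compute-unit preference can also be exercised from Python when loading the .mlpackage (running predictions this way requires macOS rather than my Linux export box; the file name below is just a placeholder). A minimal sketch:

```python
import coremltools as ct

# Load the exported package with an explicit compute-unit preference,
# mirroring MLModelConfiguration.computeUnits on the iOS side.
model = ct.models.MLModel(
    "mymss.mlpackage",                        # path is an assumption
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # or CPU_AND_GPU / ALL
)
```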

Before going back to hammering on the model in Python, are there any tips/strategies I could try in the CoreMLTools export phase or when configuring the model for prediction on iOS?

My export toolchain is currently Linux with CoreMLTools v8.1, and the export target is iOS16.
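
For context, my export call looks roughly like the following (the stand-in network, input name, and spectrogram block shape are placeholders here, not my real values):

```python
import coremltools as ct
import torch
import torch.nn as nn

# Stand-in for the real source-separation network, just to keep the
# sketch self-contained; shape is (batch, channels, freq, time).
net = nn.Conv2d(2, 2, kernel_size=3, padding=1).eval()
example = torch.rand(1, 2, 512, 256)

traced = torch.jit.trace(net, example)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="spectrogram", shape=example.shape)],
    minimum_deployment_target=ct.target.iOS16,  # matches the export target above
)
mlmodel.save("mymss.mlpackage")
```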

Answered by AriJR in 826814022

Accepted Answer

This particular issue was caused by the upsampling path of the model having torch.nn.ConvTranspose2d operations with stride (16,1) and kernel (16,1) - for some reason the ANE compiler did not like them. Resolved by changing the single 16x operation to a 4x+4x pair instead.
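
Roughly, the change looks like this (channel counts and the intermediate width are illustrative, and the split layers carry their own weights, so this is a retrain/refactor rather than a drop-in numerical equivalent):

```python
import torch
import torch.nn as nn

# Before: single 16x upsampling along the frequency axis - the layer
# the ANE compiler choked on.
up_16x = nn.ConvTranspose2d(64, 32, kernel_size=(16, 1), stride=(16, 1))

# After: two stacked 4x steps, 4 * 4 = 16 overall. The intermediate
# width of 48 is an assumption.
up_4x_4x = nn.Sequential(
    nn.ConvTranspose2d(64, 48, kernel_size=(4, 1), stride=(4, 1)),
    nn.ConvTranspose2d(48, 32, kernel_size=(4, 1), stride=(4, 1)),
)

# Both produce the same 16x output height.
x = torch.rand(1, 64, 32, 100)
assert up_16x(x).shape == up_4x_4x(x).shape == (1, 32, 512, 100)
```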

Furthermore, the ANE is 16-bit only. CoreMLTools may compile your 32-bit model without issues, but when the runtime model is loaded on iOS/macOS it will not be assigned to the ANE.
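
If you target iOS16+ as an ML program, FLOAT16 should already be the default compute precision in CoreMLTools, but it can be pinned explicitly so nothing in the graph stays FP32 - a sketch, reusing the traced module from the question's export snippet:

```python
import coremltools as ct

mlmodel = ct.convert(
    traced,  # the torch.jit.trace output from the export sketch above
    inputs=[ct.TensorType(name="spectrogram", shape=(1, 2, 512, 256))],
    minimum_deployment_target=ct.target.iOS16,
    compute_precision=ct.precision.FLOAT16,  # force FP16 so the ANE can take the graph
)
```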

One more thing. The CoreML-compiled model's downsampling torch.nn.Conv2d with stride (16,1) and kernel (16,1) has similar issues with ANE processing. Upon iOS runtime initialization the model compilation does not print any console warnings or errors - everything just hangs without any diagnostics at all.

Changing the 16x downsampling to 4x+4x solves that hang issue as well.
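
The Conv2d side is the mirror image of the upsampling fix (again, channel widths here are illustrative):

```python
import torch.nn as nn

# Before: 16x frequency downsampling in one step - compiles without
# warnings but hangs at runtime when the ANE tries to take it.
down_16x = nn.Conv2d(32, 64, kernel_size=(16, 1), stride=(16, 1))

# After: two 4x steps; the intermediate width of 48 is an assumption.
down_4x_4x = nn.Sequential(
    nn.Conv2d(32, 48, kernel_size=(4, 1), stride=(4, 1)),
    nn.Conv2d(48, 64, kernel_size=(4, 1), stride=(4, 1)),
)
```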
