We use several Core ML models in our Swift application. The memory footprint of these models ranges from 15 kB to 3.5 MB according to the Xcode Core ML utility tool. We observe a huge difference in loading time depending on the compute units selected to run the model. Here is a small code sample used to load the model:
let configuration = MLModelConfiguration()
// Here I use the .all compute units mode:
configuration.computeUnits = .all
let myModel = try! myCoremlModel(configuration: configuration).model
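A load time like the ones reported below can be captured with a simple wall-clock helper around the load call. This is a minimal sketch using plain Foundation (nothing Core ML-specific); `measureMs` is a hypothetical helper name, and the sleep stands in for the actual model-load closure:

```swift
import Foundation

/// Measure the wall-clock time of a closure, in milliseconds.
/// (Wrap the model-load call shown above in the closure to profile it.)
func measureMs(_ block: () -> Void) -> Double {
    let start = Date()
    block()
    return Date().timeIntervalSince(start) * 1000
}

// Example: time an arbitrary piece of work.
let elapsed = measureMs {
    Thread.sleep(forTimeInterval: 0.05)  // stand-in for the model load
}
print(String(format: "load took %.1f ms", elapsed))
```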
Here are the profiling results of this code sample for different model sizes, depending on the targeted compute units:
Model-3.5-MB:
- computeUnits is .cpuAndGPU: 188 ms ⇒ 18 MB/s
- computeUnits is .all or .cpuAndNeuralEngine on iOS16: 4000 ms ⇒ 875 kB/s
Model-2.6-MB:
- computeUnits is .cpuAndGPU: 144 ms ⇒ 18 MB/s
- computeUnits is .all or .cpuAndNeuralEngine on iOS16: 1300 ms ⇒ 2 MB/s
Model-15-kB:
- computeUnits is .cpuAndGPU: 18 ms ⇒ 833 kB/s
- computeUnits is .all or .cpuAndNeuralEngine on iOS16: 700 ms ⇒ 22 kB/s
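The throughput figures above are simply the model size divided by the load time; a quick sanity check of that arithmetic (`throughputMBps` is a hypothetical helper, not part of any API):

```swift
import Foundation

/// Throughput in MB/s, given a model size in kB and a load time in ms.
func throughputMBps(sizeKB: Double, loadMs: Double) -> Double {
    (sizeKB / 1000.0) / (loadMs / 1000.0)
}

print(throughputMBps(sizeKB: 3500, loadMs: 188))   // ≈ 18.6 MB/s (.cpuAndGPU)
print(throughputMBps(sizeKB: 3500, loadMs: 4000))  // ≈ 0.875 MB/s (.all)
```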
What explains the difference in loading time depending on the computeUnits mode? Is there a way to reduce the loading time of the models when using the .all or .cpuAndNeuralEngine computeUnits mode?