Recently, deep learning models have been getting larger, and loading them has become a bottleneck. I download a Core ML model in the .mlpackage format from the internet, use compileModelAtURL to compile the .mlpackage into an .mlmodelc, then call modelWithContentsOfURL to load the .mlmodelc into a model handle. Generating the handle with modelWithContentsOfURL is generally very slow. I noticed from WWDC 2023 that it is possible to cache the compiled results (see https://developer.apple.com/videos/play/wwdc2023/10049/?time=677, which states: "This compilation includes further optimizations for the specific compute device and outputs an artifact that the compute device can run. Once complete, Core ML caches these artifacts to be used for subsequent model loads."). However, I couldn't find anything in the documentation about how to use this cache.
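For reference, here is a minimal Swift sketch of the compile-and-load flow described above (the package path is hypothetical, and `MLModel.compileModel(at:)` / `MLModel(contentsOf:configuration:)` are the Swift counterparts of the Objective-C APIs named in the question):

```swift
import CoreML

// Hypothetical path to the downloaded .mlpackage.
let packageURL = URL(fileURLWithPath: "/path/to/Model.mlpackage")

// Compile the .mlpackage into an .mlmodelc.
// The compiled model is written to a temporary directory.
let compiledURL = try MLModel.compileModel(at: packageURL)

// Load the compiled model into a handle -- this is the slow step.
let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: compiledURL, configuration: config)
```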
How can I speed up modelWithContentsOfURL?
Hello @coreml_student, once you download, compile, and load a remote model, Core ML caches the model's specialized assets on disk for you; no further action is needed on your implementation's side.
There are a few circumstances that can cause this cache to be purged, such as low storage space, system updates, and modifications to the model itself, since the cache is tied to the model's path and configuration.
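Because the cache is keyed on the model's path, it helps to keep the compiled .mlmodelc at a stable location across launches rather than loading it from the temporary directory that `MLModel.compileModel(at:)` writes to. A sketch of that (the file name and `packageURL` are assumptions for illustration):

```swift
import CoreML
import Foundation

// Hypothetical path to the downloaded .mlpackage.
let packageURL = URL(fileURLWithPath: "/path/to/Model.mlpackage")

// Pick a permanent home for the compiled model so its path stays
// stable across launches, which lets Core ML reuse its cached
// specialized assets on subsequent loads.
let fileManager = FileManager.default
let appSupport = try fileManager.url(for: .applicationSupportDirectory,
                                     in: .userDomainMask,
                                     appropriateFor: nil,
                                     create: true)
let permanentURL = appSupport.appendingPathComponent("Model.mlmodelc")

// Compile once, then move the result out of the temporary directory.
let compiledURL = try MLModel.compileModel(at: packageURL)
if fileManager.fileExists(atPath: permanentURL.path) {
    try fileManager.removeItem(at: permanentURL)
}
try fileManager.moveItem(at: compiledURL, to: permanentURL)

// Later launches load from the same stable path.
let model = try MLModel(contentsOf: permanentURL)
```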
To check whether your implementation is using a cached model, profile your app with the Core ML instrument in the Instruments app and look for the "cached" label on the load event.