The CoreML runtime is inconsistent.

// Time 1000 consecutive predictions; pixelBuffer, retBuffer, and error are declared elsewhere.
for (int i = 0; i < 1000; i++) {
    double st_tmp = CFAbsoluteTimeGetCurrent();
    retBuffer = [self.enhancer enhance:pixelBuffer error:&error];
    double et_tmp = CFAbsoluteTimeGetCurrent();
    NSLog(@"[enhance once] %f ms", (et_tmp - st_tmp) * 1000);
}

When I run a Core ML model using the above code, the per-call runtime gradually decreases over the first dozen or so iterations before settling.

Output:

[enhance once] 14.965057 ms
[enhance once] 12.727022 ms
[enhance once] 12.818098 ms
[enhance once] 11.829972 ms
[enhance once] 11.461020 ms
[enhance once] 10.949016 ms
[enhance once] 10.712981 ms
[enhance once] 10.367990 ms
[enhance once] 10.077000 ms
[enhance once] 9.699941 ms
[enhance once] 9.370089 ms
[enhance once] 8.634090 ms
[enhance once] 7.659078 ms
[enhance once] 7.061005 ms
[enhance once] 6.729007 ms
[enhance once] 6.603003 ms
[enhance once] 6.427050 ms
[enhance once] 6.376028 ms
[enhance once] 6.509066 ms
[enhance once] 6.452084 ms
[enhance once] 6.549001 ms
[enhance once] 6.616950 ms
[enhance once] 6.471038 ms
[enhance once] 6.462932 ms
[enhance once] 6.443977 ms
[enhance once] 6.683946 ms
[enhance once] 6.538987 ms
[enhance once] 6.628990 ms
...

Most deep learning inference frameworks have a warm-up phase, but typically only the first inference is slower. Why does Core ML's runtime keep decreasing over many calls at the beginning? Is there a way to confine the warm-up cost to the first inference, so that the rest stay consistent?
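One common workaround (a sketch, not something from the original post) is to pay the warm-up cost up front: run a handful of throwaway predictions right after loading the model, before any latency-sensitive use. The `enhance:error:` API matches the snippet above; `makeBlankPixelBuffer` is a hypothetical helper that creates a CVPixelBuffer of the model's input size.

// Hedged sketch: absorb the warm-up ramp with throwaway predictions at load time.
// makeBlankPixelBuffer is a hypothetical helper; the iteration count (20) roughly
// matches the length of the ramp visible in the log above.
- (void)warmUpEnhancer {
    CVPixelBufferRef dummy = [self makeBlankPixelBuffer];
    NSError *error = nil;
    for (int i = 0; i < 20; i++) {
        (void)[self.enhancer enhance:dummy error:&error];
    }
    CVPixelBufferRelease(dummy);
}

Calling this once after model load should move the slow iterations out of the measured path, though whether the steady state is reached depends on what is actually causing the ramp.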

I call the Core ML model from the -(void)display_pixels:(IJKOverlay *)overlay method.

Core ML prediction latency should be reasonably stable after the first prediction, so these numbers look odd to me.

Do you observe the same thing if you measure only the -[MLModel predictionFromFeatures:error:] call? I suspect there are other activities happening outside of Core ML.
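For example (a sketch; `self.model` being an MLModel and `inputFeatures` being an id<MLFeatureProvider> are assumptions, since the original code goes through a wrapper), timing the raw Core ML call in isolation would look like:

// Hedged sketch: time only the bare MLModel prediction, excluding any
// pixel-buffer conversion or other pre/post-processing in the wrapper.
NSError *error = nil;
double start = CFAbsoluteTimeGetCurrent();
id<MLFeatureProvider> output = [self.model predictionFromFeatures:inputFeatures
                                                            error:&error];
double end = CFAbsoluteTimeGetCurrent();
NSLog(@"[raw prediction] %f ms", (end - start) * 1000);

If the raw call is stable while the wrapper's numbers ramp down, the variability is coming from outside Core ML.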

The Core ML template in Instruments (included with Xcode) would give us more insight: https://developer.apple.com/videos/play/wwdc2022/10027/
