I'm trying to run a temporal neural net (essentially an LSTM with a convolutional featurizer) on iOS. It was originally trained in PyTorch and then converted to Core ML via ONNX. It needs to run sequentially on video frames (i.e. it cannot be parallelized across frames).
In my Xcode unit tests, I always get the same run time (~0.06s, or ~17 FPS, on an iPhone 11). When I actually run the app, however, I only achieve 17 FPS some of the time on the same iPhone 11; at other times, the run time rises to about 0.1s (~10 FPS).
Firstly, I'm a bit surprised the model runs so slowly (17 FPS) even in the best case, as I've tried many large off-the-shelf CNNs that run at well over 30 FPS on an iPhone 11. What's more concerning is that the run-time performance is inconsistent and seems to depend on which OTHER apps I have backgrounded. If I force quit all other open apps, I reliably get the unit-test performance of ~17 FPS every single time I run the app!
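To make the latency and FPS figures above easier to compare, here is a minimal Python sketch of the conversion (FPS is just the reciprocal of per-frame latency) and of how a log of per-frame timings could be summarized. This is not my actual benchmarking code: the function names `fps` and `summarize` and the sample values are made up for illustration.

```python
from statistics import mean


def fps(latency_s: float) -> float:
    """Frames per second for a given per-frame latency in seconds."""
    return 1.0 / latency_s


def summarize(latencies_s):
    """Return (mean FPS, worst-case FPS) for a list of per-frame latencies."""
    return fps(mean(latencies_s)), fps(max(latencies_s))


# Hypothetical per-frame timings in seconds, NOT real measurements.
sample = [0.06, 0.06, 0.10, 0.06]
mean_fps, worst_fps = summarize(sample)
print(f"mean {mean_fps:.1f} FPS, worst {worst_fps:.1f} FPS")
```

Reporting the worst case alongside the mean is useful here because the problem is the occasional slow run rather than the typical one.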
My only guess is that my model is not running on the Apple Neural Engine and is instead running on the CPU... otherwise, why would run-time performance depend on which other apps I have backgrounded?
In any case, any help or suggestions on my architecture would be greatly appreciated! I'm attaching a link to my 16-bit quantized model here.