Tensorflow-Metal TFLite Inference orders of magnitude slower than regular Tensorflow

Hardware: 16" 2023 MBP M3 Pro OS: 14.4.1 Memory: 36 GB

- Python version: 3.8.16
- TF-Metal version: tensorflow-metal 1.0.1 (installed via pip)
- TF version: 2.13.0

Tensorflow-Metal starts out pretty slow, at approximately 10 s/iteration, and over the course of 36 iterations progressively slows down to over 120 s/iteration. The info log prints that TFLite is using the XNNPACK delegate. I can't share the TFLite model, but it is relatively shallow, small, and simple.
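For reference, the timing was measured with a minimal loop along these lines (the model path and float32 input are placeholders, since the real model can't be shared):

```python
import time

import numpy as np
import tensorflow as tf

# "model.tflite" and the float32 dummy input are placeholders for the real model.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
dummy_input = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)

for i in range(36):
    start = time.perf_counter()
    interpreter.set_tensor(input_details[0]["index"], dummy_input)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_details[0]["index"])
    print(f"iteration {i}: {time.perf_counter() - start:.2f} s")
```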

Uninstalled TF-Metal and installed regular tensorflow. Inference speed picks right up and is rock solid at 0.78 s/iteration. What is going on?

**TLDR, TFLite inference speed:**

- TF-Metal: 120 s/iteration
- TF: 0.78 s/iteration

It sounds like the issue is related to what is posted on this page: https://developer.apple.com/metal/tensorflow-plugin/

> CPU performance is faster than GPU on your network. Find out if your workload is sufficient to take advantage of the GPU. On small networks running with small batch sizes, the CPU may perform faster overall due to the overhead related to dispatching computations to the GPU. This will get amortized when the batch or model sizes grow, since the GPU can then take better advantage of the parallelism in performing the computations.
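If that's what's happening here, one way to test it without uninstalling tensorflow-metal is to hide the GPU from TensorFlow so everything runs on the CPU; a minimal sketch:

```python
import tensorflow as tf

# Hide the Metal GPU so all ops fall back to the CPU. This has to run
# before any TensorFlow ops are created, or it raises a RuntimeError.
tf.config.set_visible_devices([], "GPU")

print(tf.config.get_visible_devices())  # should now list only CPU devices
```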

I was also experiencing memory leaks with tensorflow-metal when doing large hyperparameter-tuning runs. I originally thought it must have been related to my model, but since switching to CPU I haven't experienced the same issues.
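For what it's worth, if anyone else hits that memory growth during tuning runs, clearing Keras's global state between trials is worth trying before ditching the plugin entirely. A sketch, where `build_model` and the `units` values are hypothetical stand-ins for your real search space:

```python
import gc

import tensorflow as tf

def build_model(units):
    # Hypothetical stand-in for the real hyperparameter search space.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

for units in (32, 64, 128):
    # Drop graph state left over from the previous trial so memory
    # doesn't keep accumulating across the tuning run.
    tf.keras.backend.clear_session()
    gc.collect()
    model = build_model(units)
    model.compile(optimizer="adam", loss="mse")
    # ... model.fit(...) / model.evaluate(...) on your data here ...
```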
