TensorFlow model predictions are incorrect on M1 GPU

I have a TensorFlow 2.x object detection model (SSD ResNet50 v1) that was trained on an Ubuntu 20.04 box with a GPU.

The model's predictions are as expected on Linux CPU & GPU, Windows 10 CPU & GPU, the Intel MacBook Air CPU, and the M1 MacBook Air CPU.

However, when I install the tensorflow-metal plugin on the M1, I can see the GPU is being used but the predictions are garbage.

I followed these install instructions:

https://developer.apple.com/metal/tensorflow-plugin/

Which gives me:

  • tensorflow-macos 2.6.0
  • tensorflow-metal 0.2.0

and

  • Python 3.9.5
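For reference, a quick way to confirm that the plugin actually registered the GPU device (a minimal sketch; versions as above):

    # Sanity check: list the devices TensorFlow can see after installing
    # tensorflow-macos + tensorflow-metal.
    import tensorflow as tf

    print("TensorFlow version:", tf.__version__)
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))
    # A working Metal install reports one PhysicalDevice of type 'GPU'.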

Anyone have insight as to what may be the problem? The M1 Air is running the public release of Monterey.

UPDATE: It may be something specific to the SSD Resnet50 v1 architecture. I have several other models built with the same pipeline and data which do seem to be working.

Hi AdkPete, I have the same problem here. I compared my results with the same model on Windows 10, an Intel MacBook, and Linux CPU. The predictions are very bad once I installed the tensorflow-metal plugin. For now I've created another environment without the plugin and train the model on the CPU only. Do you have any idea how we can deal with this? Many thanks.

Mona190: I don't have a fix, and the other models that I thought were working are not actually working either. I looked at the outputs from the models, and the scores are numbers like 90000 when they should be between 0 and 1. A prediction can also produce wacky, non-existent class values. Something is very wrong with the plugin.
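For anyone who wants to check the same thing on their own model, here is a minimal sketch for inspecting the raw outputs of an exported Object Detection API SavedModel; the export path and the 640x640 dummy input are placeholders, so adjust them for your own export:

    # Load an exported TF Object Detection API SavedModel and check that
    # detection_scores sit in [0, 1] and detection_classes are valid IDs.
    import numpy as np
    import tensorflow as tf

    detect_fn = tf.saved_model.load("exported_model/saved_model")  # placeholder path
    image = tf.convert_to_tensor(np.zeros((1, 640, 640, 3), dtype=np.uint8))

    outputs = detect_fn(image)
    scores = outputs["detection_scores"].numpy()
    classes = outputs["detection_classes"].numpy()

    print("score range:", scores.min(), scores.max())    # expected within [0, 1]
    print("class range:", classes.min(), classes.max())  # expected within the label map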

I am experiencing the same issue with a custom dataset. I am using the faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu model from the TensorFlow model zoo.

Trained under Ubuntu 20.04 using CPU, and the eval results come out okay.

When I trained on the same data using Metal on macOS 12.0.1, the prediction results of the training evaluation are garbage, and they don't align with the class names either. TensorBoard shows the bounding boxes without class labels, and most bounding boxes are scattered in the bottom-left corner.

I also re-ran just the TensorFlow eval using Metal on the checkpoints from the Ubuntu CPU training (which gave good results on Ubuntu), and the output was also garbage. I used the TensorFlow model_main_tf2.py script to train and evaluate in all experiments.

So I can also reproduce your issue @AdkPete. Something isn't right here.

Versions for me are:

  • tensorflow-macos 2.6.0
  • tensorflow-metal 0.2.0
  • tf-models-official 2.6.0
  • tensorflow-deps 2.6.0
  • Python 3.8.12
  • macOS 12.0.1
  • 16" M1 Max MacBook Pro

A follow-up: I uninstalled the Metal package and re-ran the training and evaluation using the M1 CPUs instead. Like @AdkPete, I was able to get sensible output. I am seeing reports from other users, such as @radagast, who are experiencing similarly incorrect output from Metal and who were able to provide a reproducible recipe (a Jupyter notebook) for the Apple engineers to work with. The engineers haven't acknowledged that there is a problem yet, though.
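For anyone who would rather not uninstall the plugin, a possible alternative (a sketch, not something I've verified on every setup) is to hide the GPU from TensorFlow at the top of the script, so the run stays on the M1 CPU while tensorflow-metal remains installed:

    # Hide the Metal GPU so everything runs on the CPU. This must execute
    # before any TensorFlow op initializes the runtime.
    import tensorflow as tf

    tf.config.set_visible_devices([], "GPU")
    print(tf.config.get_visible_devices())  # should list only CPU devices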


I do not experience this issue with simple models (single input/output, ~100k params, some LSTM and dense layers), but I am seeing it with a larger and more complex model (~3M params, multi-input/output, LSTM + self-attention + CNN layers). For the larger model, training hangs at a very high loss, pre-trained models that I know perform well evaluate at extremely high loss, and model.predict() on reasonable data returns almost random vectors. I've run the exact same code with the same data on a Vertex AI VM with a Tesla GPU, and there I see the results I would expect: decreasing loss, decent evaluation loss for pre-trained models, and reasonable inference outputs.
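One way to quantify the divergence (a rough sketch; a toy LSTM model stands in for the real one, purely for illustration) is to run identical weights and an identical batch on the CPU and on the GPU and compare the outputs directly:

    # Compare predictions for the same weights and input on CPU vs GPU.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(20, 8)),
        tf.keras.layers.Dense(10),
    ])
    x = np.random.rand(4, 20, 8).astype("float32")

    with tf.device("/CPU:0"):
        cpu_out = model(x).numpy()
    with tf.device("/GPU:0"):
        gpu_out = model(x).numpy()

    # Small floating-point noise is normal; large differences point at the GPU path.
    print("max abs difference:", np.max(np.abs(cpu_out - gpu_out)))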

I've been experiencing this issue with the official faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu TensorFlow model since I started experimenting with tensorflow-metal 0.2.0. I tested 0.4.0 last week with the same training data and the problem persists. Predictions from models trained on the M1 CPU and on other hardware come out fine. With tensorflow-metal, the training results visualised through TensorBoard are nonsense: objects are located in a handful of fixed positions within the images, and prediction scores are substantially greater than 100%.

@damoclark I also tried tensorflow-metal 0.4.0 last week and can confirm the problem still persists. If you load a model and run a prediction on a single image, the predictions are correct; it is the subsequent predictions that produce nonsense results.
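That observation suggests a simple check: a small sketch (the SavedModel path and dummy input are placeholders) that feeds the identical image several times and verifies whether the detections stay stable across passes:

    # Run the same image repeatedly; on a healthy backend every pass should
    # return identical detection scores.
    import numpy as np
    import tensorflow as tf

    detect_fn = tf.saved_model.load("exported_model/saved_model")  # placeholder path
    image = tf.convert_to_tensor(np.zeros((1, 640, 640, 3), dtype=np.uint8))

    first = detect_fn(image)["detection_scores"].numpy()
    for i in range(5):
        scores = detect_fn(image)["detection_scores"].numpy()
        print(f"pass {i}: identical to first pass -> {np.array_equal(scores, first)}")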

We are developing a simple GAN, and when training it, the convergence behaviour of the discriminator is different on the GPU than when using only the CPU or running in Colab. We have read a lot, but this is the only post that seems to describe similar behaviour. Unfortunately, the problem persists after updating to version 0.4.

Hardware/software: MacBook Pro (model MacBookPro18,2), Apple M1 Max, 10 cores (8 performance and 2 efficiency), 64 GB memory, firmware 7459.101.3, macOS Monterey 12.3.1, Python 3.8.

The most relevant libraries from !pip freeze (others omitted):

  • keras==2.8.0
  • Keras-Preprocessing==1.1.2
  • tensorboard==2.8.0
  • tensorboard-data-server==0.6.1
  • tensorboard-plugin-wit==1.8.1
  • tensorflow-datasets==4.5.2
  • tensorflow-docs @ git+https://github.com/tensorflow/docs@7d5ea2e986a4eae7573be3face00b3cccd4b8b8b
  • tensorflow-macos==2.8.0
  • tensorflow-metadata==1.7.0
  • tensorflow-metal==0.4.0

The code to reproduce does not fit in this message, so I've shared a Google Colab notebook at https://colab.research.google.com/drive/1oDS8EV0eP6kToUYJuxHf5WCZlRL0Ypgn?usp=sharing. You can easily see that the loss goes to 0 after 1 or 2 epochs when the GPU is enabled, but if the GPU is disabled everything is OK.
