Hello, everyone,
I have been testing tensorflow-metal on my 2020 MacBook Pro (M1) running macOS 12.0.1 by running inference with a pre-trained model on a known dataset.
To my surprise, TensorFlow produces different (wrong) results when the inference runs on the Metal pluggable device (GPU) than when it runs on the CPU.
I might very well be doing something wrong, but my test program is fairly simple:
#!/usr/bin/env python3
import pathlib
import numpy as np
import tensorflow as tf
from tensorflow import keras
def main(model_path, dataset_path):
    # Print some system info
    print('Tensorflow configuration:')
    print(f'\tVersion: {tf.__version__}')
    print('\tDevices usable by Tensorflow:')
    for device in tf.config.get_visible_devices():
        print(f'\t\t{device}')

    # Load the model & the input data
    model = keras.models.load_model(model_path)
    matrix_data = np.genfromtxt(dataset_path)
    matrix_data = matrix_data.reshape([1, matrix_data.shape[0], matrix_data.shape[1]])

    # Perform inference on the CPU
    with tf.device('/CPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on CPU')
    print(f'\tPrediction: {prediction[0, 0]}')

    # Perform inference on the GPU
    with tf.device('/GPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on GPU')
    print(f'\tPrediction: {prediction[0, 0]}')

if __name__ == "__main__":
    main('model/model.h5', 'dataset/01.csv')
The CPU path produces a result of 4.890502452850342, which is consistent with the results I see on Ubuntu Linux for both CPU and GPU (CUDA) inference. The GPU code path yields a prediction of 3.1839447021484375, which is way off.
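In case it helps with diagnosing this, one way to localize the divergence could be to compare per-layer activations between the CPU and GPU runs. This is only a rough sketch (it assumes a Keras functional model in which every layer has a single output tensor; model and matrix_data are the ones from the script above):

import numpy as np
import tensorflow as tf
from tensorflow import keras

def per_layer_outputs(model, data, device):
    # Build a probe model that exposes every intermediate layer output
    probe = keras.Model(inputs=model.inputs,
                        outputs=[layer.output for layer in model.layers])
    with tf.device(device):
        return probe.predict(data)

cpu_outs = per_layer_outputs(model, matrix_data, '/CPU:0')
gpu_outs = per_layer_outputs(model, matrix_data, '/GPU:0')

# Report the maximum absolute difference per layer; the first layer with a
# large error should point at the op that misbehaves on the Metal backend
for layer, cpu_o, gpu_o in zip(model.layers, cpu_outs, gpu_outs):
    max_err = np.max(np.abs(cpu_o - gpu_o))
    print(f'{layer.name:30s} max abs diff: {max_err:.6g}')

My expectation would be that a discrepancy this large shows up abruptly at a specific layer rather than as gradually accumulated float32 rounding error.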
I have set up a GitLab repo with all the resources required to reproduce the problem here.
This is quite concerning to me, since such a big difference in results is something I was not expecting and, if confirmed, it makes me distrust the results produced by the Metal backend.
Am I doing something wrong? Is there any place where I can report this as a bug?