Hello, everyone,
I have been testing tensorflow-metal on my 2020 MacBook Pro (M1) running macOS 12.0.1 by running inference with a pre-trained model on a known dataset.
To my surprise, TensorFlow produces different (wrong) results when running the inference on the Metal pluggable device GPU than when running it on the CPU.
I might very well be doing something wrong, but my test program is fairly simple:
#!/usr/bin/env python3
import pathlib
import numpy as np
import tensorflow as tf
from tensorflow import keras


def main(model_path, dataset_path):
    # Print some system info
    print('Tensorflow configuration:')
    print(f'\tVersion: {tf.__version__}')
    print('\tDevices usable by Tensorflow:')
    for device in tf.config.get_visible_devices():
        print(f'\t\t{device}')

    # Load the model & the input data
    model = keras.models.load_model(model_path)
    matrix_data = np.genfromtxt(dataset_path)
    matrix_data = matrix_data.reshape([1, matrix_data.shape[0], matrix_data.shape[1]])

    # Perform inference on CPU
    with tf.device('/CPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on CPU')
    print(f'\tPrediction: {prediction[0, 0]}')

    # Perform inference on GPU
    with tf.device('/GPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on GPU')
    print(f'\tPrediction: {prediction[0, 0]}')


if __name__ == "__main__":
    main('model/model.h5', 'dataset/01.csv')
The CPU path produces a result of 4.890502452850342, which is consistent with the results I'm seeing on Ubuntu Linux using both CPU- and GPU (CUDA)-based inference. The GPU code path produces a prediction of 3.1839447021484375, which is way off.
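To quantify the mismatch beyond that single element, I have been comparing every output tensor the model returns on both devices. This is only a quick sketch; it assumes the same model and matrix_data variables as in the script above:

# Rough sanity check (sketch only): compare every output tensor returned by
# model.predict on CPU vs GPU; assumes `model` and `matrix_data` are loaded
# exactly as in the script above.
import numpy as np
import tensorflow as tf

with tf.device('/CPU:0'):
    cpu_outputs = model.predict(matrix_data)
with tf.device('/GPU:0'):
    gpu_outputs = model.predict(matrix_data)

# model.predict returns a list of arrays when the model has multiple outputs
for idx, (c, g) in enumerate(zip(cpu_outputs, gpu_outputs)):
    max_abs_diff = np.max(np.abs(np.asarray(c) - np.asarray(g)))
    print(f'output {idx}: max abs CPU/GPU difference = {max_abs_diff}')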
I have set up a GitLab repo with all the resources required to replicate the problem here.
This is quite concerning to me: the large difference in results is something I was not expecting and, if confirmed, means I cannot trust the results produced by the Metal backend.
Am I doing something wrong? Is there any place where I can report this as a bug?
Hi @josebagar,
Thanks for reporting this issue. I was able to reproduce it locally and have traced it to the Conv1D layer when running on the GPU with certain combinations of input parameters. We will update here once we have a solution for the problem.
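In the meantime, one way to check whether a given Conv1D configuration is affected is to run a single layer with identical weights on both devices and compare the outputs. This is an illustrative standalone snippet with arbitrary shapes and parameters, not the original model:

# Standalone sketch (illustrative parameters only): run one Conv1D layer
# on CPU and GPU with the same weights and compare the results.
import numpy as np
import tensorflow as tf
from tensorflow import keras

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128, 4)).astype(np.float32)  # (batch, steps, channels)

layer = keras.layers.Conv1D(filters=8, kernel_size=5, padding='same')
layer.build(x.shape)  # build once so both devices share the same weights

with tf.device('/CPU:0'):
    y_cpu = layer(x).numpy()
with tf.device('/GPU:0'):
    y_gpu = layer(x).numpy()

print('max abs CPU/GPU difference:', np.max(np.abs(y_cpu - y_gpu)))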