Hello, everyone,
I have been testing tensorflow-metal on my 2020 MacBook Pro (M1) running macOS 12.0.1 by running inference with a pre-trained model on a known dataset.
To my surprise, TensorFlow produces different (wrong) results when the inference runs on the Metal pluggable device (GPU) than when it runs on the CPU.
I might very well be doing something wrong, but my test program is fairly simple:
#!/usr/bin/env python3
import pathlib
import numpy as np
import tensorflow as tf
from tensorflow import keras
def main(model_path, dataset_path):
    # Print some system info
    print('Tensorflow configuration:')
    print(f'\tVersion: {tf.__version__}')
    print('\tDevices usable by Tensorflow:')
    for device in tf.config.get_visible_devices():
        print(f'\t\t{device}')

    # Load the model & the input data
    model = keras.models.load_model(model_path)
    matrix_data = np.genfromtxt(dataset_path)
    matrix_data = matrix_data.reshape([1, matrix_data.shape[0], matrix_data.shape[1]])

    # Perform inference on the CPU
    with tf.device('/CPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on CPU')
    print(f'\tPrediction: {prediction[0, 0]}')

    # Perform inference on the GPU
    with tf.device('/GPU:0'):
        prediction = model.predict(matrix_data)[1]
    print('Model Evaluation on GPU')
    print(f'\tPrediction: {prediction[0, 0]}')

if __name__ == "__main__":
    main('model/model.h5', 'dataset/01.csv')
The CPU path produces a result of 4.890502452850342, which is consistent with the results I see on Ubuntu Linux for both CPU and GPU (CUDA) inference. The GPU code path yields a prediction of 3.1839447021484375, which is way off.
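In case it helps with diagnosing this, one way to localize the divergence could be to compare per-layer activations between the CPU and GPU runs. This is only a rough sketch (it assumes a Keras functional model in which every layer has a single output tensor; model and matrix_data are the ones from the script above):

import numpy as np
import tensorflow as tf
from tensorflow import keras

def per_layer_outputs(model, data, device):
    # Build a probe model that exposes every intermediate layer output
    probe = keras.Model(inputs=model.inputs,
                        outputs=[layer.output for layer in model.layers])
    with tf.device(device):
        return probe.predict(data)

cpu_outs = per_layer_outputs(model, matrix_data, '/CPU:0')
gpu_outs = per_layer_outputs(model, matrix_data, '/GPU:0')

# Report the maximum absolute difference per layer; the first layer with a
# large error should point at the op that misbehaves on the Metal backend
for layer, cpu_o, gpu_o in zip(model.layers, cpu_outs, gpu_outs):
    max_err = np.max(np.abs(cpu_o - gpu_o))
    print(f'{layer.name:30s} max abs diff: {max_err:.6g}')

My expectation would be that a discrepancy this large shows up abruptly at a specific layer rather than as gradually accumulated float32 rounding error.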
I have set up a GitLab repo with all the resources required to reproduce the problem here.
This is quite concerning to me, since such a big difference in results is something I was not expecting and, if confirmed, it makes me distrust the results produced by the Metal backend.
Am I doing something wrong? Is there any place where I can report this as a bug?