Inconsistent results when performing inference in CPU vs Metal

Hello, everyone,

I have been testing tensorflow-metal on my 2020 MacBook Pro (M1) running macOS 12.0.1 by running inference with a pre-trained model on a known dataset.

To my surprise, TensorFlow produces different (wrong) results when running inference on the GPU through the Metal pluggable device than when running it on the CPU.

I might very well be doing something wrong, but my test program is fairly simple:

#!/usr/bin/env python3

import pathlib
import numpy as np
import tensorflow as tf
from tensorflow import keras


def main(model_path, dataset_path):
    # Print some system info
    print('Tensorflow configuration:')
    print(f'\tVersion: {tf.__version__}')
    print('\tDevices usable by Tensorflow:')
    for device in tf.config.get_visible_devices():
        print(f'\t\t{device}')

    # Load the model & the input data
    model = keras.models.load_model(model_path)
    matrix_data = np.genfromtxt(dataset_path)
    matrix_data = matrix_data.reshape([1, matrix_data.shape[0], matrix_data.shape[1]])

    # Perform inference on the CPU
    with tf.device('/CPU:0'):
        prediction = model.predict(matrix_data)[1]
        print('Model Evaluation on CPU')
        print(f'\tPrediction: {prediction[0, 0]}')

    # Perform inference on the GPU
    with tf.device('/GPU:0'):
        prediction = model.predict(matrix_data)[1]
        print('Model Evaluation on GPU')
        print(f'\tPrediction: {prediction[0, 0]}')


if __name__ == "__main__":
    main('model/model.h5', 'dataset/01.csv')

The CPU path produces a prediction of 4.890502452850342, which is consistent with the results I see on Ubuntu Linux for both CPU and GPU (CUDA) based inference. The GPU code path on the Mac yields 3.1839447021484375, which is way off.
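
A small helper along these lines can quantify the mismatch across all model outputs instead of a single value (just a sketch; it assumes model.predict returns a list of output arrays, which is what the [1] indexing in the script above suggests):

import numpy as np
import tensorflow as tf


def compare_devices(model, data, rtol=1e-4, atol=1e-5):
    # Run the same prediction on the CPU and the GPU and report the largest
    # deviation per output, to see whether the drift is isolated or systematic.
    with tf.device('/CPU:0'):
        cpu_outputs = model.predict(data)
    with tf.device('/GPU:0'):
        gpu_outputs = model.predict(data)
    for i, (cpu_out, gpu_out) in enumerate(zip(cpu_outputs, gpu_outputs)):
        max_diff = np.max(np.abs(cpu_out - gpu_out))
        close = np.allclose(cpu_out, gpu_out, rtol=rtol, atol=atol)
        print(f'output {i}: max abs diff = {max_diff}, allclose = {close}')

In my case the difference on the second output is on the order of 1.7, far beyond anything attributable to float32 rounding.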

I have set up a GitLab repo with all the resources required to replicate the problem here.

This is quite concerning, since such a large difference in results is something I was not expecting and, if confirmed, means I cannot trust the results produced by the Metal backend.

Am I doing something wrong? Is there any place where I can report this as a bug?

Answered by Frameworks Engineer in 697255022

Hi @josebagar,

Thanks for reporting this issue. I am able to reproduce it locally and have traced it to the Conv1D layer when running on the GPU with certain combinations of the input parameters. We will update here once we have a solution for the problem.

I tested your code with the latest versions:

  • tensorflow-macos==2.7.0
  • tensorflow-metal==0.3.0

The bug is still there:

CPU Prediction: 4.890502452850342

GPU Prediction: 3.1839447021484375

Indeed, very concerning!

Hopefully someone from Apple sees this!!
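
Following the engineer's pointer to Conv1D, a minimal check that isolates a single Conv1D layer and compares devices looks roughly like this (the layer parameters below are my own guess, not the confirmed failing combination):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Build one Conv1D layer up front so the same weights are reused on both devices.
x = np.random.default_rng(0).standard_normal((1, 128, 8)).astype(np.float32)
conv = keras.layers.Conv1D(filters=16, kernel_size=5, padding='same')
conv.build(x.shape)

with tf.device('/CPU:0'):
    y_cpu = conv(x).numpy()
with tf.device('/GPU:0'):
    y_gpu = conv(x).numpy()

print('max abs difference:', np.max(np.abs(y_cpu - y_gpu)))

Until a fix ships, calling tf.config.set_visible_devices([], 'GPU') at the start of the program hides the Metal GPU entirely and keeps inference on the (correct) CPU path.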

PS: other reports:
