Installing Tensorflow-Metal on Mac M1 results in reduced model accuracy

Hello, everyone!

I recently purchased a MacBook Air with the M1 chip and used it for neural network training. While testing the VGG neural network on the CIFAR-10 dataset, I found the training speed to be too slow. Following a recommendation, I installed TensorFlow-Metal for hardware acceleration. Prior to installing TensorFlow-Metal, each epoch took approximately 12 minutes, and after 5 epochs, the model's accuracy reached 0.72.

However, after installing TensorFlow-Metal and running the training again, the runtime was significantly reduced. Unfortunately, after 5 epochs the accuracy stayed between 0.1 and 0.2, which is almost equivalent to random guessing. I am puzzled as to why this is happening. Are there any specific considerations to keep in mind after installing TensorFlow-Metal? What changes, if any, are required in the code compared to running without it?
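
For reference, the numbers above can be compared against chance level. CIFAR-10 has 10 balanced classes, so uniform guessing gives an accuracy of 1/10 = 0.1, and a model that always outputs the uniform distribution has a cross-entropy loss of ln(10), about 2.303. The sketch below computes both; `near_chance` is a hypothetical helper (not part of TensorFlow) that checks a Keras `history.history` dict, and its 0.05 margin is an arbitrary assumption.

```python
import math


def chance_level(num_classes):
    """Accuracy and cross-entropy loss of a model that predicts uniformly."""
    return 1.0 / num_classes, math.log(num_classes)


def near_chance(history, num_classes, acc_key="sparse_categorical_accuracy", margin=0.05):
    """Hypothetical helper: True if the last recorded accuracy is within
    `margin` of chance level. `history` is a dict like Keras' history.history."""
    chance_acc, _ = chance_level(num_classes)
    return abs(history[acc_key][-1] - chance_acc) <= margin


acc, loss = chance_level(10)
print(f"chance accuracy = {acc:.3f}, uniform-prediction loss = {loss:.3f}")
```

A training loss that hovers around 2.30 on CIFAR-10 is therefore a sign that the network is producing (nearly) uniform predictions rather than learning slowly.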

Hi @Peanu11,

Can you please provide us with sample code so we can try to reproduce this on our end? Also, which environment are you using (macOS and Python versions)? Do you use a conda virtual environment?
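
The environment details requested above can be collected with a small standard-library script like the sketch below. The distribution names it looks up (`tensorflow`, `tensorflow-macos`, `tensorflow-metal`) are assumptions about how TensorFlow was installed; `CONDA_DEFAULT_ENV` is only set inside an activated conda environment, and the `venv` flag only detects venv/virtualenv environments.

```python
import os
import platform
import sys
from importlib import metadata

# Distribution names are assumptions: Apple-silicon installs have used either
# "tensorflow" or "tensorflow-macos", plus the "tensorflow-metal" plugin.
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal")


def environment_report(packages=PACKAGES):
    """Collect the environment details asked for when reporting this issue."""
    report = {
        "macos": platform.mac_ver()[0] or "not macOS",
        "machine": platform.machine(),  # typically "arm64" for a native Apple-silicon Python
        "python": platform.python_version(),
        "venv": sys.prefix != sys.base_prefix,  # True inside a venv/virtualenv
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # set by an activated conda env
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once with and once without the plugin installed makes it easy to attach matching environment details to each result.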

I had the same issue reported on a project of mine: https://github.com/msiemens/HypheNN-de/issues/6. I was able to reproduce this with macOS 13.5 on a Mac Studio M1 Max, Python 3.11.4 and Tensorflow 2.13.0. I installed the tensorflow-metal plugin in a virtual env using pip (following https://developer.apple.com/metal/tensorflow-plugin/). In my case, the model barely reaches 94 % accuracy using tensorflow-metal, compared to >99 % without it (72 % vs. 98 % in validation).

+1, same problem with tensorflow-metal. Using the basic autoencoder code from tensorflow.org, the training and output are not what the tutorial shows. I describe the issue, code and results on Stack Overflow.

import tensorflow as tf
import os
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPool2D, Dropout, Flatten, Dense
from tensorflow.keras import Model

np.set_printoptions(threshold=np.inf)

cifar10 = tf.keras.datasets.cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


class VGG16(Model):
    def __init__(self):
        super(VGG16, self).__init__()
        self.c1 = Conv2D(filters=64, kernel_size=(3, 3), padding='same')  # convolution layer 1
        self.b1 = BatchNormalization()  # BN layer 1
        self.a1 = Activation('relu')  # activation layer 1
        self.c2 = Conv2D(filters=64, kernel_size=(3, 3), padding='same')
        self.b2 = BatchNormalization()  # BN layer 2
        self.a2 = Activation('relu')  # activation layer 2
        self.p1 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d1 = Dropout(0.2)  # dropout layer

        self.c3 = Conv2D(filters=128, kernel_size=(3, 3), padding='same')
        self.b3 = BatchNormalization()  # BN layer 3
        self.a3 = Activation('relu')  # activation layer 3
        self.c4 = Conv2D(filters=128, kernel_size=(3, 3), padding='same')
        self.b4 = BatchNormalization()  # BN layer 4
        self.a4 = Activation('relu')  # activation layer 4
        self.p2 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d2 = Dropout(0.2)  # dropout layer

        self.c5 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b5 = BatchNormalization()  # BN layer 5
        self.a5 = Activation('relu')  # activation layer 5
        self.c6 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b6 = BatchNormalization()  # BN layer 6
        self.a6 = Activation('relu')  # activation layer 6
        self.c7 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b7 = BatchNormalization()
        self.a7 = Activation('relu')
        self.p3 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d3 = Dropout(0.2)

        self.c8 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b8 = BatchNormalization()  # BN layer 8
        self.a8 = Activation('relu')  # activation layer 8
        self.c9 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b9 = BatchNormalization()  # BN layer 9
        self.a9 = Activation('relu')  # activation layer 9
        self.c10 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b10 = BatchNormalization()
        self.a10 = Activation('relu')
        self.p4 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d4 = Dropout(0.2)

        self.c11 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b11 = BatchNormalization()  # BN layer 11
        self.a11 = Activation('relu')  # activation layer 11
        self.c12 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b12 = BatchNormalization()  # BN layer 12
        self.a12 = Activation('relu')  # activation layer 12
        self.c13 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b13 = BatchNormalization()
        self.a13 = Activation('relu')
        self.p5 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d5 = Dropout(0.2)

        self.flatten = Flatten()
        self.f1 = Dense(512, activation='relu')
        self.d6 = Dropout(0.2)
        self.f2 = Dense(512, activation='relu')
        self.d7 = Dropout(0.2)
        self.f3 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.b1(x)
        x = self.a1(x)
        x = self.c2(x)
        x = self.b2(x)
        x = self.a2(x)
        x = self.p1(x)
        x = self.d1(x)

        x = self.c3(x)
        x = self.b3(x)
        x = self.a3(x)
        x = self.c4(x)
        x = self.b4(x)
        x = self.a4(x)
        x = self.p2(x)
        x = self.d2(x)

        x = self.c5(x)
        x = self.b5(x)
        x = self.a5(x)
        x = self.c6(x)
        x = self.b6(x)
        x = self.a6(x)
        x = self.c7(x)
        x = self.b7(x)
        x = self.a7(x)
        x = self.p3(x)
        x = self.d3(x)

        x = self.c8(x)
        x = self.b8(x)
        x = self.a8(x)
        x = self.c9(x)
        x = self.b9(x)
        x = self.a9(x)
        x = self.c10(x)
        x = self.b10(x)
        x = self.a10(x)
        x = self.p4(x)
        x = self.d4(x)

        x = self.c11(x)
        x = self.b11(x)
        x = self.a11(x)
        x = self.c12(x)
        x = self.b12(x)
        x = self.a12(x)
        x = self.c13(x)
        x = self.b13(x)
        x = self.a13(x)
        x = self.p5(x)
        x = self.d5(x)

        x = self.flatten(x)
        x = self.f1(x)
        x = self.d6(x)
        x = self.f2(x)
        x = self.d7(x)
        y = self.f3(x)
        return y


model = VGG16()

model.compile(optimizer='adam',
              # f3 already applies softmax, so the model outputs probabilities, not logits.
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/VGG16.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True)

history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])
model.summary()

# print(model.trainable_variables)
# Dump every trainable variable (name, shape, values) to a text file.
with open('./weights.txt', 'w') as file:
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

###############################################    show   ###############################################

# Plot the accuracy and loss curves for the training and validation sets
acc = history.history['sparse_categorical_accuracy']
val_acc = history.history['val_sparse_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
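
Separately from the Metal question, pairing a final `Dense(10, activation='softmax')` layer with `SparseCategoricalCrossentropy(from_logits=True)` applies softmax twice, which is worth ruling out before comparing CPU and GPU runs. The NumPy sketch below (no TensorFlow needed; `softmax` and `sparse_ce` are illustrative helpers, not Keras functions) shows the effect on a single, perfectly confident and correct prediction.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sparse_ce(outputs, labels, from_logits):
    """Per-sample sparse categorical cross-entropy.

    With from_logits=True the outputs are passed through softmax first,
    which is what the flag means in Keras.
    """
    probs = softmax(outputs) if from_logits else outputs
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-7, 1.0))


# Output of a final Dense(10, activation='softmax') that is perfectly
# confident and correct for class 3.
probs = np.zeros((1, 10))
probs[0, 3] = 1.0
labels = np.array([3])

print("from_logits=False:", sparse_ce(probs, labels, from_logits=False)[0])
print("from_logits=True: ", sparse_ce(probs, labels, from_logits=True)[0])
```

With `from_logits=False` the loss for this prediction is 0. With `from_logits=True` the probabilities are squashed by a second softmax, so the loss cannot drop below ln(e + 9) - 1, about 1.46, even for a perfect prediction.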

I'm experiencing the same thing with tensorflow-metal. Training is significantly faster with the plugin than without it; however, the loss steadily increases after a few epochs. In addition, my console is spammed with the following message:

metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.

I've done quite a bit of searching while trying to take advantage of the M1 Pro with TensorFlow, and it seems to be nothing but problems.
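
The plugin note says the GPU kernel produces a different random series than the CPU kernel for the same seed. A different series is not a problem by itself as long as it has the right distribution. The sketch below illustrates this by analogy with two NumPy bit generators (PCG64 and Philox standing in for two implementations; this is not how TensorFlow's kernels work internally). `looks_standard_normal` is a hypothetical helper and its 0.02 tolerance is an arbitrary assumption.

```python
import numpy as np


def looks_standard_normal(samples, tol=0.02):
    """Rough check that samples have mean ~0 and standard deviation ~1."""
    samples = np.asarray(samples, dtype=np.float64)
    return abs(samples.mean()) < tol and abs(samples.std() - 1.0) < tol


seed = 1972
# Same seed, two different generators: different series, same distribution.
pcg = np.random.Generator(np.random.PCG64(seed)).standard_normal(200_000)
philox = np.random.Generator(np.random.Philox(seed)).standard_normal(200_000)

print("same series:", np.array_equal(pcg, philox))
print("PCG64 looks N(0,1):", looks_standard_normal(pcg))
print("Philox looks N(0,1):", looks_standard_normal(philox))
```

To apply the same check to the plugin, one could pass the helper the result of `tf.random.normal([200000]).numpy()` computed under `tf.device("/GPU:0")`; a failing check would point at a real problem, while a merely different series would not.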

Hi everyone, I am new to ML but encountered the same issue: uninstalling the Metal plugin gives better training results. My notebook, based on TensorFlow's rock, paper, scissors example, is at https://github.com/etiquet/pierre-feuille-ciseau if you want to reproduce the issue and share results. To be precise, the results shown in the notebook were obtained WITHOUT the GPU. With the GPU, the dense layers do not activate and accuracy is much lower. Best regards, Eric

On my iMac 5K Retina (2017) with a Radeon Pro 580, I was able to reduce this to a simpler test case (a network with one hidden layer on MNIST).

Moreover, I noticed that the same seeded random values differ between GPU and CPU: on the GPU they print with more decimal digits.

So with this code

import tensorflow as tf   # TensorFlow registers PluggableDevices here.

with tf.device("/GPU:0"):
    tf.random.set_seed(1972)
    agpu = tf.random.normal(shape=[5], dtype=tf.float32)
    print("GPU - Random",agpu)

with tf.device("/CPU:0"):
    tf.random.set_seed(1972)
    acpu = tf.random.normal(shape=[5], dtype=tf.float32)
    print("CPU - Random",acpu)

print("equal ", tf.equal(agpu, acpu))

I get

GPU - Random tf.Tensor([-0.88528407  0.33968228 -2.0363083   1.1200726  -1.0055897 ], shape=(5,), dtype=float32)
CPU - Random tf.Tensor([-0.8852841  0.3396823 -2.036308   1.1200724 -1.00559  ], shape=(5,), dtype=float32)
equal  tf.Tensor([False False False False False], shape=(5,), dtype=bool)

If I do the same in Google Colab, I get

GPU - Random tf.Tensor([-0.8852841  0.3396823 -2.036308   1.1200724 -1.00559  ], shape=(5,), dtype=float32)
CPU - Random tf.Tensor([-0.8852841  0.3396823 -2.036308   1.1200724 -1.00559  ], shape=(5,), dtype=float32)
equal  tf.Tensor([ True  True  True  True  True], shape=(5,), dtype=bool)
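
`tf.equal` tests exact equality, so it reports `False` even when two float32 values differ only in their last bit. The NumPy sketch below takes the GPU and CPU values printed above (assuming the printed decimals round-trip to the original float32 values) and measures how far apart they are in units in the last place (ULP). `ulp_distance` is a hypothetical helper, and the `rtol=1e-6` tolerance is an assumption chosen for float32.

```python
import numpy as np

# Values printed above: tensorflow-metal GPU vs. CPU, same seed.
gpu = np.array([-0.88528407, 0.33968228, -2.0363083, 1.1200726, -1.0055897], dtype=np.float32)
cpu = np.array([-0.8852841, 0.3396823, -2.036308, 1.1200724, -1.00559], dtype=np.float32)


def ulp_distance(a, b):
    """Number of representable float32 steps between a and b (elementwise)."""
    def ordered(x):
        i = np.asarray(x, dtype=np.float32).view(np.int32).astype(np.int64)
        # Map sign-magnitude bit patterns onto a monotonic integer line.
        return np.where(i < 0, -(2**31) - i, i)
    return np.abs(ordered(a) - ordered(b))


print("exactly equal:", np.array_equal(gpu, cpu))
print("ULP distance:", ulp_distance(gpu, cpu))
print("allclose (rtol=1e-6):", np.allclose(gpu, cpu, rtol=1e-6, atol=0.0))
```

The values differ by only a few ULP, so a tolerance-based comparison such as `np.allclose` is a fairer test of "same result" than `tf.equal`; the Colab run shows that exact equality is achievable there, which is why the mismatch stands out on the Metal device.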