I recently got an M1 Mac mini and I was doing some testing in TensorFlow. I installed tensorflow-macos to do CPU testing and tensorflow-metal to do GPU testing.
I followed the procedure here: https://developer.apple.com/metal/tensorflow-plugin/ to install tensorflow-metal. I did not see any warnings or errors during the installation process.
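A quick sanity check that the Metal plugin registered the GPU device (a minimal sketch, using only the standard TensorFlow device-listing API) looks like this:
import tensorflow as tf

print(tf.__version__)
# With tensorflow-metal installed, the Apple GPU should appear here
# as a PhysicalDevice with device_type='GPU'.
print(tf.config.list_physical_devices('GPU'))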
I was pleased to see that TensorFlow CPU testing on the M1 went smoothly.
I then tested the M1 integrated GPU to check that everything was working correctly by running this sample TensorFlow code:
import tensorflow as tf
from tensorflow import keras

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

y_train = y_train[:1000]
y_test = y_test[:1000]
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train[:1000].reshape(-1, 28*28)
x_test = x_test[:1000].reshape(-1, 28*28)

def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10)
    ])
    model.compile(optimizer='adam',
                  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[tf.metrics.SparseCategoricalAccuracy()])
    return model

# Create a basic model instance
model = create_model()

# Display the model's architecture
model.summary()

predictions = model(x_train[:1]).numpy()

model.fit(x_train, y_train, epochs=10)

loss, acc = model.evaluate(x_test, y_test, verbose=2)
print("Accuracy: {:5.2}%".format(100*acc))
When I tried to run this, the program stopped entirely with some strange errors that I could trace back to the model.fit line. I have attached an error log file containing the exact terminal output.
Switching the optimizer in the model.compile line from adam to sgd allows the program to complete, but accuracy and loss are stuck at 1000.00 (the terminal output from that run follows the snippet below).
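To be explicit, the only change was the optimizer argument in model.compile; everything else is identical to the script above:
model.compile(optimizer='sgd',  # only change: 'adam' -> 'sgd'
              loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.metrics.SparseCategoricalAccuracy()])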
2021-07-02 10:27:26.486274: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 2/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 3/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 4/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 5/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 6/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 7/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 8/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 9/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Epoch 10/10
32/32 [==============================] - 0s 5ms/step - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
2021-07-02 10:27:28.218923: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
32/32 - 0s - loss: 1000.0000 - sparse_categorical_accuracy: 1000.0000
Accuracy: 1e+05%
During my testing, I also tried using a pre-trained network to measure inference performance only. Specifically, the inference benchmark I set up ran 10,000 upscaled CIFAR-10 images through a pre-trained ResNet50 network. This worked correctly on both the M1 CPU and GPU, but the GPU was about twice as slow as the CPU running the exact same code. I was surprised by this result since I expected the GPU to outperform the CPU. I also found, from sudo powermetrics logs, that GPU power consumption was almost twice that of the CPU: around 10 W on the GPU versus around 5 W on the CPU. These tests were performed separately.
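The benchmark was set up roughly along these lines (a simplified sketch rather than my exact script; the ImageNet-weighted keras.applications.ResNet50, the 224x224 resize, and the batch size are assumptions standing in for my actual preprocessing details):
import tensorflow as tf

# 10,000 CIFAR-10 test images, upscaled and run through pre-trained ResNet50.
(_, _), (x_test, _) = tf.keras.datasets.cifar10.load_data()

def upscale(image):
    # Resize 32x32 CIFAR-10 images to ResNet50's expected 224x224 input.
    image = tf.image.resize(tf.cast(image, tf.float32), (224, 224))
    return tf.keras.applications.resnet50.preprocess_input(image)

dataset = (tf.data.Dataset.from_tensor_slices(x_test)
           .map(upscale, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.applications.ResNet50(weights='imagenet')
predictions = model.predict(dataset)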
It appears that my issue is specific to training on the GPU; however, I'm wondering whether there is a larger issue here that results in poor optimization on the inference side as well.
What I have tried to fix these issues:
- Full restart
- Verified on a friend's M1 Mac mini to check whether the issue is specific to my device. They had exactly the same problem as me on the training test.
Both tests (training and inference) are completely repeatable and behave exactly the same way every time I run them.