Performance issue on MacBook Pro M1

System information

  • Script can be found below
  • MacBook Pro M1 (macOS Big Sur 11.5.1)
  • TensorFlow installed from: source
  • TensorFlow version: 2.5 with Metal support
  • Python version: 3.9
  • GPU model and memory: MacBook Pro M1 and 16 GB

Steps for installing TensorFlow with Metal support: https://developer.apple.com/metal/tensorflow-plugin/

I am trying to train a model on a MacBook Pro M1, but the performance is so bad that training doesn't work properly. It takes a ridiculously long time just for a single epoch.

Code to reproduce this behavior:

import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Embedding, Dense, LSTM
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Model configuration
additional_metrics = ['accuracy']
batch_size = 128
embedding_output_dims = 15
loss_function = BinaryCrossentropy()
max_sequence_length = 300
num_distinct_words = 5000
number_of_epochs = 5
optimizer = Adam()
validation_split = 0.20
verbosity_mode = 1

# Disable eager execution
tf.compat.v1.disable_eager_execution()

# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_distinct_words)
print(x_train.shape)
print(x_test.shape)

# Pad all sequences
padded_inputs = pad_sequences(x_train, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>
padded_inputs_test = pad_sequences(x_test, maxlen=max_sequence_length, value=0.0)  # 0.0 because it corresponds with <PAD>

# Define the Keras model
model = Sequential()
model.add(Embedding(num_distinct_words, embedding_output_dims, input_length=max_sequence_length))
model.add(LSTM(10))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer=optimizer, loss=loss_function, metrics=additional_metrics)

# Give a summary
model.summary()

# Train the model
history = model.fit(padded_inputs, y_train, batch_size=batch_size, epochs=number_of_epochs, verbose=verbosity_mode, validation_split=validation_split)

# Test the model after training
test_results = model.evaluate(padded_inputs_test, y_test, verbose=False)
print(f'Test results - Loss: {test_results[0]} - Accuracy: {100*test_results[1]}%')
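
As a sanity check (not part of the original report), a minimal sketch like the following can confirm whether the Metal plugin has actually registered the M1 GPU before running the script above:

import tensorflow as tf

# With the Metal plugin installed, the device list should contain a 'GPU' entry
# in addition to the CPU; if it doesn't, training silently falls back to CPU only.
print("TensorFlow version:", tf.__version__)
print("Physical devices:", tf.config.list_physical_devices())

# Optionally log which device each op is placed on (verbose, but useful once).
tf.debugging.set_log_device_placement(True)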

I have noticed this same problem specifically with LSTM layers.

Also, this issue has been reported in Keras, and they can't debug it there.

Keras issue https://github.com/keras-team/keras/issues/15003

I tried for a few hours; due to the slow training I only ran 1 epoch. This is the log:

2021-07-26 23:09:28.130352: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.185390: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.217406: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-26 23:09:28.229984: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Epoch 1/1
20000/20000 [==============================] - loss: 0.5489 - accuracy: 0.6923 --- 6894.8485770225524902 seconds ---

Just one epoch takes around 2 hours, which is a nightmare.

It is not fair to archive the TensorFlow repo before fixing the issues in the code.

Hi @OriAlpha, we recommend users upgrade to macOS 12.0 for the best support and performance of the Metal plugin. I tried the attached script with macOS 12.0 on an M1 machine and tensorflow-metal==0.1.2 (I recommend updating to the latest Metal plugin version), and I got the following performance. Please let us know if that helps.

2021-08-24 23:20:50.927094: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

157/157 [==============================] - 46s 271ms/step - loss: 0.6877 - accuracy: 0.5416 - val_loss: 0.6579 - val_accuracy: 0.6034

Epoch 2/5

157/157 [==============================] - 38s 243ms/step - loss: 0.5634 - accuracy: 0.7459 - val_loss: 0.4508 - val_accuracy: 0.8192

Epoch 3/5

157/157 [==============================] - 38s 244ms/step - loss: 0.4140 - accuracy: 0.8303 - val_loss: 0.3805 - val_accuracy: 0.8410

Epoch 4/5

157/157 [==============================] - 38s 245ms/step - loss: 0.3474 - accuracy: 0.8609 - val_loss: 0.4135 - val_accuracy: 0.8380

Epoch 5/5

157/157 [==============================] - 39s 251ms/step - loss: 0.3075 - accuracy: 0.8814 - val_loss: 0.3535 - val_accuracy: 0.8554


I saw the same issue: over 7000 seconds per epoch and a lot of warning messages. Then I tried with tf.device("/gpu:0"), and each epoch takes about 38 seconds. However, when I tried with tf.device("/cpu:0"), each epoch takes only about 7 seconds. So GPU performance is still awful.
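
For anyone who wants to reproduce that comparison, here is a minimal sketch (not from the original comments, and assuming the model, padded_inputs, y_train and the other variables from the script above) of pinning the fit() call to a specific device; swap "/cpu:0" for "/gpu:0" to compare:

# Pin training to one device; replace "/cpu:0" with "/gpu:0" to time the GPU path.
with tf.device("/cpu:0"):
    history = model.fit(padded_inputs, y_train,
                        batch_size=batch_size,
                        epochs=number_of_epochs,
                        verbose=verbosity_mode,
                        validation_split=validation_split)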

I have not yet found a neural net architecture where the M1 GPU is faster than the CPU. For matrix multiplication, the GPU can be 9x faster, but this does not carry over to network training.
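
A rough way to check that raw matmul gap yourself (a sketch, not from the original comment; the matrix size and repeat count are arbitrary, and it should be run in a fresh session with eager execution enabled) is to time the same tf.matmul on each device:

import time
import tensorflow as tf

def time_matmul(device, n=4096, repeats=10):
    # Multiply two random n x n matrices `repeats` times on the given device.
    with tf.device(device):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        start = time.time()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()  # force the work to finish before stopping the clock
        return time.time() - start

print("CPU seconds:", time_matmul("/cpu:0"))
print("GPU seconds:", time_matmul("/gpu:0"))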

Based on other threads and on the comment above by an Apple engineer, it looks like the Apple team doesn't even realize how bad their TensorFlow speed is.

  • MacBook Air M1 (macOS 12 beta)
  • TensorFlow version: 2.5 with Metal support
  • Python version: 3.8
  • GPU model and memory: MacBook Air M1 and 16 GB

I have the exact same problem!! I started noticing really long training times for a simple BLSTM and decided to test the above code. I'm also using a MacBook Air M1 (macOS 12 beta), TensorFlow 2.5 with Metal support, Python 3.9, and 16 GB of memory. This completely undermines my work! Apple should do something!

Yep, for me both CPU and GPU performance are not good at all. A relatively simple CNN took about 7 minutes to train on a free Google Colab instance (with a K80), while the same model took about 30 minutes on the GPU and 42 minutes on the CPU with TF 2.6 on my Mac mini M1 (16 GB).

I have seen multiple posts from people experiencing the same issue, and the suggested solution always seems to be either upgrading to 12.0 or using the CPU (for smaller batch sizes), neither of which seems to fix the issue in most cases.

I would really expect Apple to come up with some solution to this. It has been a year since the M1 was released, and I am paying for third-party notebooks when I would expect such a supposedly ML-optimized machine (according to the marketing) to run TF at least at a similar pace to a free Colab notebook.

Hello, today in 2022 I am still getting the same issue.

It seems the problem has never been solved... I will be starting a class on TensorFlow soon, and getting something this slow is just awful.

I have no choice but to use Google Colab.

Any new update as of 22/12/2022?

Same issue in 2023

Apple doesn't seem to care about providing support for the M-series laptops. If your tasks are GPU and ML workloads, use NVIDIA GPUs; those are the best and work out of the box.

I am having the same issue -- 4/14/2023 -- not to mention that I still get the warning telling me to use from keras.optimizers import Adam as AdamLegacy to make my binary classifier work. Is there any update I should be aware of?
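
If that warning is the one about the new Keras optimizer running slowly on M1/M2 Macs, a minimal sketch of the commonly suggested workaround (assuming a TF version that still ships tf.keras.optimizers.legacy; the tiny model here is only illustrative) looks like this:

import tensorflow as tf

# Hypothetical one-layer binary classifier, just to show where the optimizer plugs in.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(10,)),
])

# The legacy Adam implementation avoids the "runs slowly on M1/M2" warning
# on TF versions that still provide it.
model.compile(optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])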

Also, I don't see a distribution for tensorflow-metal==0.12.0 (the latest version I can find is 0.8.0). Where can I get it?

Same issue on TensorFlow with the newest system environment, macOS 14.0 Beta (23A5286i). Please help us, dear Apple!

October 2023 and the issue is still there. After my upgrade to macOS Sonoma I can't get tensorflow-metal to behave well with a batch size of 128; I used to run at 64 just fine (it was speedy), and now with larger batches I do see some (not great) performance improvements, but the model overfits at those batch sizes. I have read the blogs for all sorts of suggestions, including reverting to an older version of TF for Mac (which I don't want to do). One suggestion I saw in some postings is to disable the GPU altogether; has anyone had any success with that?
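
For anyone who wants to try the disable-the-GPU suggestion, a minimal sketch (not from the original post) using TensorFlow's device-visibility API would be:

import tensorflow as tf

# Hide all GPUs from TensorFlow before any ops are created, so everything
# (including model.fit) falls back to the CPU.
tf.config.set_visible_devices([], 'GPU')

print("Visible devices:", tf.config.get_visible_devices())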

Hey team, any update on this? Still having the issue with the following environment: absl-py==1.3.0 aio-pika==8.2.3 aiofiles==22.1.0 aiogram==2.23.1 aiohttp==3.8.3 aiormq==6.4.0 aiosignal==1.3.1 APScheduler==3.9.1.post1 astunparse==1.6.3 async-timeout==4.0.2 attrs==22.1.0 Babel==2.9.1 bert-serving-client==1.10.0 bidict==0.22.1 boto3==1.26.136 botocore==1.29.136 CacheControl==0.12.11 cachetools==5.2.1 certifi==2023.7.22 cffi==1.15.1 charset-normalizer==2.1.1 click==8.1.3 cloudpickle==2.2.0 colorclass==2.2.2 coloredlogs==15.0.1 colorhash==1.2.1 confluent-kafka==1.9.2 cryptography==41.0.7 cycler==0.11.0 dask==2022.10.2 dnspython==2.3.0 docopt==0.6.2 fbmessenger==6.0.0 fire==0.5.0 flatbuffers fonttools==4.38.0 frozenlist==1.3.3 fsspec==2022.11.0 future==0.18.3 gast==0.2.1 google-auth==2.16.0 google-auth-oauthlib==0.4.1 google-pasta==0.2.0 greenlet==3.0.3 grpcio==1.51.1 h5py==3.10.0 httptools==0.5.0 humanfriendly==10.0 idna==3.4 jmespath==1.0.1 joblib==1.2.0 jsonpickle==2.2.0 jsonschema==4.16.0 keras Keras-Preprocessing==1.1.2 kiwisolver==1.4.4 libclang==15.0.6.1 locket==1.0.0 magic-filter==1.0.9 Markdown==3.4.1 MarkupSafe==2.1.2 matplotlib==3.5.3 mattermostwrapper==2.2 msgpack==1.0.4 multidict==5.2.0 networkx==2.6.3 numpy==1.23.5 oauthlib==3.2.2 opt-einsum==3.3.0 packaging pamqp==3.2.0 partd==1.3.0 Pillow==9.4.0 pip==22.3.1 pluggy==1.0.0 prompt-toolkit==3.0.28 protobuf psycopg2-binary==2.9.5 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pydot==1.4.2 PyJWT==2.6.0 pykwalify==1.8.0 pymongo==4.0.1 pyparsing==3.0.9 pyrsistent==0.19.3 python-crfsuite==0.9.8 python-dateutil==2.8.2 python-engineio==4.3.4 python-socketio==5.7.2 pytz==2022.7.1 pytz-deprecation-shim==0.1.0.post0 PyYAML==6.0.1 pyzmq==25.0.0 questionary==1.10.0 randomname==0.1.5 rasa rasa-sdk redis==4.5.3 regex==2022.10.31 requests==2.28.2 requests-oauthlib==1.3.1 requests-toolbelt==0.10.1 rocketchat-API==1.28.1 rsa==4.9 ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.7 s3transfer==0.6.0 sanic==21.12.2 Sanic-Cors==2.0.1 sanic-jwt==1.8.0 sanic-routing==0.7.2 scikit-learn==1.1.3 scipy==1.12 sentry-sdk==1.11.1 setuptools==65.6.3 six sklearn-crfsuite==0.3.6 slack-sdk==3.19.5 SQLAlchemy==1.4.46 tabulate==0.9.0 tarsafe==0.0.3 tensorboard==2.9 tensorboard-data-server tensorboard-plugin-wit==1.8.1 tensorflow-macos==2.9 tensorflow-metal==0.5.0 tensorflow-addons==0.18.0 tensorflow-estimator==2.9 tensorflow-hub==0.13.0 tensorflow-io-gcs-filesystem==0.36.0 tensorflow-text termcolor==2.2.0 terminaltables==3.1.10 threadpoolctl==3.1.0 toolz==0.12.0 tqdm==4.64.1 twilio==7.14.2 typeguard==2.13.3 typing_extensions==4.4.0 typing-utils==0.1.0 tzdata==2022.7 tzlocal==4.2 ujson==5.7.0 urllib3==1.26.14 uvloop==0.17.0 wcwidth==0.2.6 webexteamssdk==1.6.1 websockets==10.4 Werkzeug==2.2.2 wheel==0.38.1 wrapt==1.14.1 yarl==1.8.2
