Training LSTM: 100x Slower on M1 GPU vs. CPU

Summary: Training an LSTM on M1 GPU vs CPU shows an astounding 168x slower training per epoch. This is based on a relatively simple example chosen for reproducibility:

https://www.machinecurve.com/index.php/2021/01/07/build-an-lstm-model-with-tensorflow-and-keras/#full-model-code

Steps to Reproduce:

  1. git clone https://github.com/radagast-the-brown/tf2-keras-lstm-sample.git

  2. cd tf2-keras-lstm-sample

  3. python lstm.py

  4. Results:

M1 CPU Compute time: 7s per epoch Loss: 0.34 - Accuracy: 86%

M1 GPU (tensorflow-metal) Compute time: > 2h per epoch Didn't allow to finish.

System Details: Model: Macbook Pro (16-inch, 2021) Chip: Apple M1 Max Memory: 64GB OS: MacOS 12.0.1 Key Libraries: tensforflow-metal (0.2), tensorflow-macos (2.6.0), python (3.9.7)

Math Correction: 7s per epoch vs. 2hrs per epoch (i.e. 60min/hr * 60s/min * 2hrs = 7200 seconds), which means that M1 GPU is on the order of 1000x slower than M1 CPU in this case (i.e. 7200s / 7s ~ 1,029)

remove line 22 .

tf.compat.v1.disable_eager_execution()

and miracle happens

Training LSTM: 100x Slower on M1 GPU vs. CPU
 
 
Q