radagast’s Profile | Apple Developer Forums

Training LSTM: Low Accuracy on M1 GPU vs CPU

Summary: I have noticed low test accuracy during and after training TensorFlow LSTM models on M1-based Macs with tensorflow-metal (GPU). While I have observed this with more than one model, I chose a standard GitHub repo to reproduce the problem: a community and educational example based on the Sequence Models course of the Deep Learning Specialization (Andrew Ng).

Steps to Reproduce:
    git clone https://github.com/omerbsezer/LSTM_RNN_Tutorials_with_Demo.git
    cd LSTM_RNN_Tutorials_with_Demo/SentimentAnalysisProject
    python main.py

Results:
Test accuracy (CPU only, without tensorflow-metal): ~83%
Test accuracy (GPU using tensorflow-metal): ~37%
A similar pattern can be observed in the per-epoch accuracy and loss.

System Details:
Model: MacBook Pro (16-inch, 2021)
Chip: Apple M1 Max
Memory: 64GB
OS: macOS 12.0.1
Key Libraries: tensorflow-metal (0.2), tensorflow-macos (2.6.0), python (3.9.7)
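One way to switch between the two configurations without uninstalling tensorflow-metal is to hide the GPU from TensorFlow at the top of the script. The sketch below is not taken from the linked repo: `use_cpu_only` is a hypothetical helper, and it assumes that `tf.config.set_visible_devices` is called before TensorFlow initializes its devices (that is, before building any model or running any op).

```python
def use_cpu_only():
    """Hide every GPU from TensorFlow so later ops run on the CPU.

    Call this before building a model or running any op; TensorFlow raises
    a RuntimeError if its runtime was already initialized.
    """
    import tensorflow as tf  # imported lazily so the helper is cheap to define

    # An empty device list for the "GPU" type makes all GPUs invisible.
    tf.config.set_visible_devices([], "GPU")
    # Return what is still visible so the caller can log the configuration.
    return [device.name for device in tf.config.get_visible_devices()]
```

Calling `use_cpu_only()` as the first statement of `main.py` would give the CPU-only run; leaving it out would give the tensorflow-metal run, with everything else unchanged.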

Machine Learning & AI General tensorflow-metal

2

1

1.1k

Nov ’21

Training LSTM: 100x Slower on M1 GPU vs. CPU

Summary: Training an LSTM on M1 GPU vs. CPU shows an astounding 168x slower training per epoch. This is based on a relatively simple example chosen for reproducibility: https://www.machinecurve.com/index.php/2021/01/07/build-an-lstm-model-with-tensorflow-and-keras/#full-model-code

Steps to Reproduce:
    git clone https://github.com/radagast-the-brown/tf2-keras-lstm-sample.git
    cd tf2-keras-lstm-sample
    python lstm.py

Results:
M1 CPU
  Compute time: 7s per epoch
  Loss: 0.34 - Accuracy: 86%
M1 GPU (tensorflow-metal)
  Compute time: > 2h per epoch
  I did not let the run finish.

System Details:
Model: MacBook Pro (16-inch, 2021)
Chip: Apple M1 Max
Memory: 64GB
OS: macOS 12.0.1
Key Libraries: tensorflow-metal (0.2), tensorflow-macos (2.6.0), python (3.9.7)
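As a quick check on the numbers above, the slowdown factor follows directly from the two per-epoch times. The sketch below uses only the figures reported in this post (7 s on CPU, more than 2 h on GPU); `slowdown_factor` is a hypothetical helper name. Because the GPU epoch was never completed, the result is a lower bound.

```python
def slowdown_factor(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times slower the candidate is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return candidate_seconds / baseline_seconds


cpu_epoch_seconds = 7            # reported M1 CPU time per epoch
gpu_epoch_seconds = 2 * 60 * 60  # reported lower bound for M1 GPU: 2 hours

lower_bound = slowdown_factor(cpu_epoch_seconds, gpu_epoch_seconds)
print(f"GPU is at least {lower_bound:.0f}x slower per epoch")
```

With these inputs the bound is roughly 1029x, which is higher than both the 100x in the title and the 168x in the summary; the post does not say which measurement those figures come from.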

Machine Learning & AI General tensorflow-metal

2

3

2.1k

Nov ’21

Importance of Accuracy and Performance Benchmarking

Considering the beta/in-development status of tensorflow-metal, its current issues (see "Tensorflow-metal selected issues" below, which this author noticed in LSTM models alone), and the fact that it is a closed-source library, it is critical to have open and detailed benchmarks of the quality, accuracy, and performance of tensorflow-metal. If M1-type processors are to be trusted and heavily used by data scientists and ML engineers, we need a commitment to excellence.

Can Apple create a Python pip-based package that can be used to run such benchmarks between tensorflow-metal releases?

Some useful benchmarking resources:
https://github.com/tensorflow/benchmarks
https://github.com/tensorflow/models/tree/master/official
https://github.com/cgnorthcutt/benchmarking-keras-pytorch
https://github.com/tlkh/tf-metal-experiments

Some of these approaches, notably the TensorFlow official models, may require further work to make tensorflow-addons and tensorflow-text available as binary M1 ARM64 packages. The latter is especially hard to compile on M1 Macs, per existing GitHub issues.

Tensorflow-metal selected issues:
https://developer.apple.com/forums/thread/695150
https://developer.apple.com/forums/thread/695216
https://developer.apple.com/forums/thread/695134
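To make the request concrete, here is a minimal sketch of the kind of check such a benchmark package could run: compare the metrics of a reference run (for example CPU only) with a run on a given tensorflow-metal release, and flag any metric that drifts beyond a tolerance. Everything here is hypothetical (`compare_runs`, the metric name, the 0.02 tolerance); it does not describe an existing Apple or TensorFlow tool. The sample values are the approximate accuracies from the LSTM sentiment example (~83% on CPU, ~37% on GPU).

```python
def compare_runs(reference: dict, candidate: dict, tolerance: float = 0.02) -> dict:
    """Return metrics whose absolute difference exceeds the tolerance.

    Both arguments map metric names to floats. Metrics missing from either
    run are ignored in this sketch.
    """
    drifted = {}
    for name in sorted(reference.keys() & candidate.keys()):
        delta = candidate[name] - reference[name]
        if abs(delta) > tolerance:
            drifted[name] = delta
    return drifted


cpu_run = {"test_accuracy": 0.83}  # approximate value reported for CPU only
gpu_run = {"test_accuracy": 0.37}  # approximate value reported for tensorflow-metal

for metric, delta in compare_runs(cpu_run, gpu_run).items():
    print(f"{metric}: drift of {delta:+.2f} versus the reference run")
```

A real package would also need fixed seeds, fixed datasets, and timing measurements; the point of the sketch is only that the pass/fail rule can be stated in a few lines and rerun for every release.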

Machine Learning & AI General tensorflow-metal

1

0

654

Nov ’21

Inference: Evaluating Trained/Saved LSTM Model from another system gives wrong results on M1 GPU

Summary: Deep learning on large datasets often requires training on hundreds of thousands to millions of records. This is often best done on cloud hardware (e.g. AWS SageMaker or AWS EC2 instances with high-end CPUs and often state-of-the-art GPU cards such as the Tesla V100/A100), given the much faster time to results. One well-known workflow is to save your TensorFlow model (e.g. in Keras h5 format or as a TensorFlow SavedModel) and then evaluate it (i.e. run inference) on the test data set or a new data set on normal hardware.

Given just such a Keras h5 model, I tried to evaluate locally on an M1-based Mac both GRU and LSTM models that were pre-trained in AWS SageMaker. While the GRU-based models evaluated with only minor variation between M1 CPU and M1 GPU, and matched results checked on other systems, there were radical accuracy errors when evaluating the LSTM-based models on the M1 GPU specifically. This gives me low confidence in using tensorflow-metal with LSTM models for inference.

Steps to Reproduce:
1. Clone the repo:
    git clone https://github.com/radagast-the-brown/tf2-inference-m1-issue.git
2. Ensure you have Jupyter Lab or Notebook installed, along with all dependencies listed in the top section of "results-summary.ipynb", i.e.
    pip install jupyterlab notebook sklearn keras numpy pandas matplotlib scikitplot
3. Start Jupyter in Terminal:
    jupyter notebook
4. Browse to "results-summary.ipynb", open it, and execute it.

Note: you can install/uninstall tensorflow-metal locally to toggle between M1 GPU and M1 CPU only. You'll need to restart the notebook kernel between runs. Look at "results-summary-m1-cpu.pdf" and "results-summary-m1-gpu.pdf" for details.

System Details:
Model: MacBook Pro (16-inch, 2021)
Chip: Apple M1 Max
Memory: 64GB
OS: macOS 12.0.1
Key Libraries: tensorflow-metal (0.2), tensorflow-macos (2.6.0), python (3.9.7)
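As an alternative to installing and uninstalling tensorflow-metal, the same saved model can in principle be evaluated on both devices in one session by wrapping loading and prediction in `tf.device`. This is a sketch under assumptions, not code from the linked repo: `predict_on` and `agreement_rate` are hypothetical helpers, the model path and inputs are whatever the notebook uses, and `tf.device` is a placement request that TensorFlow may not honor for every op.

```python
def predict_on(device: str, model_path: str, inputs):
    """Load a saved Keras model and run predict() under a device scope.

    `device` is a TensorFlow device string such as "/CPU:0" or "/GPU:0".
    """
    import tensorflow as tf  # imported lazily; requires TensorFlow installed

    with tf.device(device):
        # Loading inside the scope asks for the weights on the same device.
        model = tf.keras.models.load_model(model_path)
        return model.predict(inputs)


def agreement_rate(labels_a, labels_b) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if len(labels_a) == 0:
        raise ValueError("label sequences must not be empty")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)
```

For a classifier, the class labels derived from `predict_on("/CPU:0", ...)` and `predict_on("/GPU:0", ...)` can be passed to `agreement_rate`; a value well below 1.0 would point at a device-specific discrepancy rather than at the trained model itself.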

Machine Learning & AI General tensorflow-metal

0

627

Nov ’21

Clarification: Multi-GPU Support

The official Apple documentation (https://developer.apple.com/metal/tensorflow-plugin/) for tensorflow-metal (0.2) lists "Multi-GPU support" as not currently supported. What does this mean? I can imagine it means one of the following:

Option 1: Your M1 Mac has many GPU cores (e.g. 32 cores in the case of the M1 Max), but only 1 core is currently usable by tensorflow-metal.

Option 2: All GPU cores are used, but they show up as a single core. Consequently, you can't use and don't need Distributed Training (https://www.tensorflow.org/guide/distributed_training), one strategy for using multiple GPUs in TensorFlow (https://www.tensorflow.org/guide/gpu).

Can you clarify? Thanks.
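One way to see how TensorFlow itself reports the hardware is to list the physical devices it has registered. The sketch below uses `tf.config.list_physical_devices`; `describe_devices` and `count_gpus` are hypothetical helpers. It shows how many GPU devices TensorFlow exposes, but it does not by itself reveal how many GPU cores the Metal backend uses.

```python
def describe_devices():
    """Return (device_type, name) pairs for every device TensorFlow sees."""
    import tensorflow as tf  # imported lazily; requires TensorFlow installed

    return [(d.device_type, d.name) for d in tf.config.list_physical_devices()]


def count_gpus(devices) -> int:
    """Count the entries reported with device type "GPU"."""
    return sum(1 for device_type, _ in devices if device_type == "GPU")
```

If `count_gpus(describe_devices())` returned 1 on an M1 Max, that would be consistent with Option 2, though only Apple can confirm how work is scheduled across the cores.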

Machine Learning & AI General tensorflow-metal

0

509

Nov ’21

radagast

Post Replies Boosts Views Activity