Inference: Evaluating Trained/Sav… | Apple Developer Forums

Inference: Evaluating Trained/Saved LSTM Model from another system gives wrong results on M1 GPU

Summary: Deep learning on large datasets often means training on hundreds of thousands to millions of records. This is often best done on cloud hardware (e.g. AWS SageMaker or AWS EC2 instances with high-end CPUs and state-of-the-art GPU cards such as the Tesla V100 or A100), which gives much faster time to results. A common workflow is to save the trained TensorFlow model (e.g. in Keras h5 format or as a TensorFlow SavedModel) and then evaluate it (i.e. run inference) on the test data set, or on a new data set, using ordinary hardware.

Given just such a Keras h5 model, I tried to evaluate, locally on an M1-based Mac, both GRU and LSTM models that were pre-trained in AWS SageMaker. The GRU-based models evaluated with only minor variation between M1 CPU, M1 GPU, and the other systems I checked, but there were radical accuracy errors when evaluating the LSTM-based models on the M1 GPU specifically. This gives me low confidence in using tensorflow-metal with LSTM models for inference.
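To put a number on "minor variation" versus "radical accuracy errors", one option is to save the predictions from a CPU run and a GPU run and compare them directly. The following is only a minimal sketch, assuming a binary classifier whose outputs are probabilities in [0, 1]; the names `accuracy` and `compare_runs` and the 0.5 threshold are hypothetical and are not taken from the notebook in the repo.

```python
from typing import Dict, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of thresholded predictions that match the 0/1 labels."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    return hits / len(labels)


def compare_runs(cpu_probs: Sequence[float], gpu_probs: Sequence[float],
                 labels: Sequence[int], threshold: float = 0.5) -> Dict[str, float]:
    """Summarize how far two prediction sets for the same inputs diverge."""
    if len(cpu_probs) != len(gpu_probs):
        raise ValueError("both runs must cover the same samples")
    pairs = list(zip(cpu_probs, gpu_probs))
    return {
        # Largest per-sample gap between the two runs.
        "max_abs_diff": max(abs(c - g) for c, g in pairs),
        # Number of samples whose predicted class differs between runs.
        "flipped": sum((c >= threshold) != (g >= threshold) for c, g in pairs),
        "cpu_accuracy": accuracy(cpu_probs, labels, threshold),
        "gpu_accuracy": accuracy(gpu_probs, labels, threshold),
    }
```

A small `max_abs_diff` with zero flipped predictions would be consistent with ordinary floating-point differences between devices, while many flipped predictions would point to something more serious than rounding.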

Steps to Reproduce:

git clone https://github.com/radagast-the-brown/tf2-inference-m1-issue.git
Ensure you have JupyterLab or Jupyter Notebook installed, along with all dependencies listed in the top section of "results-summary.ipynb", e.g.

pip install jupyterlab notebook scikit-learn keras numpy pandas matplotlib scikit-plot

Start jupyter in Terminal

jupyter notebook

Browse to "results-summary.ipynb", open it, and execute it. Note that you can install or uninstall tensorflow-metal locally to toggle between M1 GPU and M1 CPU-only execution. You'll need to restart the notebook kernel between runs.
Look at "results-summary-m1-cpu.pdf" and "results-summary-m1-gpu.pdf" for details.
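Since the CPU/GPU toggle in the steps above depends on whether tensorflow-metal is installed, it may help to record the environment at the top of each run. This is a minimal sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `describe_environment` are hypothetical, and the "expected device" line is my assumption about how the plugin behaves, not something the script verifies.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(
    packages: Sequence[str] = ("tensorflow-macos", "tensorflow-metal"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    env = describe_environment()
    for name, version in env.items():
        print(f"{name}: {version or 'not installed'}")
    # Assumption: without tensorflow-metal, TensorFlow falls back to CPU only.
    device = "GPU (Metal)" if env["tensorflow-metal"] else "CPU only"
    print("expected device:", device)
```

Running this in the same kernel as the notebook, after each install or uninstall and kernel restart, makes it easier to tell which PDF a given set of results corresponds to.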

System Details:

Model: MacBook Pro (16-inch, 2021)
Chip: Apple M1 Max
Memory: 64GB
OS: macOS 12.0.1
Key Libraries: tensorflow-metal (0.2), tensorflow-macos (2.6.0), python (3.9.7)