Importance of Accuracy and Performance Benchmarking

Given the beta/in-development status of tensorflow-metal, its current issues (see "Tensorflow-metal selected issues" below, which this author has observed in LSTM models alone), and the fact that it is a closed-source library, it is critical to have open and detailed benchmarks of the quality, accuracy, and performance of tensorflow-metal.

If the M1-type processors are to be trusted and heavily utilized by data scientists and ML engineers, we need a commitment to excellence.

Can Apple create a pip-installable Python package that can be used to run such benchmarks across tensorflow-metal releases?
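To make the request concrete, here is a minimal sketch (not an official Apple package, purely illustrative) of the kind of check such a pip-installable suite could run against each tensorflow-metal release: build a tiny LSTM model, run identical weights and inputs on the CPU and the Metal GPU, then compare both the numerical outputs and the wall-clock times. The model shape, tolerance, and device strings are assumptions chosen for illustration.

```python
# Illustrative sketch only; not an official Apple or TensorFlow benchmark suite.
# Assumes tensorflow-metal is installed and the Metal GPU is visible as /GPU:0.
import time

import numpy as np
import tensorflow as tf


def build_lstm_model():
    # A tiny LSTM model, since LSTM layers are where this author saw discrepancies.
    return tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(20, 8)),
        tf.keras.layers.Dense(1),
    ])


def predict_on(device, weights, x):
    # Rebuild the model and load identical weights on the requested device,
    # then return predictions and wall-clock inference time.
    with tf.device(device):
        model = build_lstm_model()
        model.set_weights(weights)
        start = time.perf_counter()
        y = model.predict(x, verbose=0)
        return y, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((256, 20, 8)).astype("float32")

    weights = build_lstm_model().get_weights()  # one set of weights shared by both runs

    y_cpu, t_cpu = predict_on("/CPU:0", weights, x)
    y_gpu, t_gpu = predict_on("/GPU:0", weights, x)

    max_diff = float(np.max(np.abs(y_cpu - y_gpu)))
    print(f"CPU: {t_cpu:.3f}s  GPU: {t_gpu:.3f}s  max |CPU - GPU| difference: {max_diff:.6f}")

    # The 1e-4 tolerance is an assumed placeholder; a real suite would set
    # per-operation tolerances and track them across tensorflow-metal releases.
    assert max_diff < 1e-4, "CPU and Metal GPU predictions diverge beyond tolerance"
```

A check along these lines would also catch the class of prediction discrepancy described under "Tensorflow-metal selected issues" below.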

Some useful benchmarking approaches

Some of these approaches, notably the TensorFlow official models, may require further work to make tensorflow-addons and tensorflow-text available as binary M1 ARM64 packages. The latter is especially hard to compile on M1 Macs, per existing GitHub issues.

Tensorflow-metal selected issues

Hi @radagast. There are other reports of wrong results from Metal. "TensorFlow model predictions are incorrect on M1 GPU" has confirmations from three people (including myself) that it is problematic. Thank you for providing the Apple engineers with a reproducible Jupyter notebook. Given the significance of such a flaw in the Metal library, I hope the engineers will be able to respond soon. The reports I cited above are from three weeks ago.
