I think you will find that the alpha3 vs. tensorflow-metal difference comes down to CPU versus GPU. For small models the CPU is far faster, and the tensorflow-macos alpha3 seemed to use the CPU for these. If you run the same model with the latest tensorflow-macos, it is still faster without the GPU. However, once the models become large (both image size and batch size matter here), the GPU can become much faster. The new M1 Max chips do really well on anything above small images and tiny batch sizes.
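
If you want to check this on your own machine, something like the following rough sketch might help. It times one training epoch of a tiny made-up CNN on random data, once on the CPU and once on the GPU (if TensorFlow reports one), for a small and a larger image/batch configuration. The model, the sizes, and the helper names (`time_call`, `train_once`, `compare_devices`) are my own assumptions for illustration, not anything from the original benchmark, and the timings include model build and warm-up, so treat the numbers as indicative only.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def train_once(tf, device, image_size, batch_size):
    """Train a tiny, hypothetical CNN for one epoch on random data on `device`."""
    with tf.device(device):
        n = batch_size * 4  # four batches per epoch, an arbitrary choice
        x = tf.random.normal((n, image_size, image_size, 3))
        y = tf.random.uniform((n,), maxval=10, dtype=tf.int32)
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(16, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)


def compare_devices(configs=((32, 8), (128, 32))):
    """Time each (image_size, batch_size) config on CPU and, if present, GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        return {}  # TensorFlow not installed, nothing to compare
    devices = ["/CPU:0"]
    if tf.config.list_physical_devices("GPU"):
        devices.append("/GPU:0")
    results = {}
    for image_size, batch_size in configs:
        for device in devices:
            results[(device, image_size, batch_size)] = time_call(
                lambda: train_once(tf, device, image_size, batch_size),
                repeats=2,
            )
    return results


if __name__ == "__main__":
    for (device, image_size, batch_size), seconds in compare_devices().items():
        print(f"{device} image={image_size} batch={batch_size}: {seconds:.3f}s")
```

If the pattern above holds on your hardware, I would expect the CPU to win on the small configuration and the GPU to pull ahead as you raise the image size and batch size, but where the crossover sits will depend on the model and the chip.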