I'm experiencing the same thing with tensorflow-metal. Training a model is significantly faster with the plugin than without it; however, the loss steadily increases after a few epochs. In addition, my console is spammed with the following message:
metal_plugin/src/kernels/stateless_random_op.cc:282] Note the GPU implementation does not produce the same series as CPU implementation.
I've done quite a bit of searching for ways to take advantage of the M1 Pro with TensorFlow, and it seems to be nothing but problems.