I ran into the same issue. With tensorflow-metal 0.5.1, training would stop at a random epoch with no error or warning.
The only fix that worked for me was to rebuild my environment from scratch following Apple's instructions, using this version of Miniforge3-MacOSX-arm64.sh, and this time installing tensorflow-metal 0.4.0 instead of 0.5.1.
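
If you want to confirm which plugin version actually ended up in the rebuilt environment, something like the sketch below should work. It only uses the standard library's `importlib.metadata`. The helper names `installed_version` and `matches_pin` are my own, not part of TensorFlow or Apple's tooling, and I'm assuming the plugin is registered under the distribution name `tensorflow-metal`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, expected):
    """Return True only if `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


if __name__ == "__main__":
    # Assumption: the plugin's distribution name is "tensorflow-metal".
    found = installed_version("tensorflow-metal")
    if found is None:
        print("tensorflow-metal is not installed in this environment")
    elif matches_pin("tensorflow-metal", "0.4.0"):
        print("tensorflow-metal is pinned to 0.4.0")
    else:
        print(f"tensorflow-metal {found} is installed, not 0.4.0")
```

Run it with the Python from the new conda environment, so it doesn't report on a different interpreter's packages.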