I see exactly this behaviour. With smaller, less complex models everything is fine, but with a larger, more complex architecture the model does not train and model.predict() returns random noise. I have tried the same model and the same data on an NVIDIA GPU and it trains fine. Even a model that I have already trained on an NVIDIA GPU, where it performs well, returns noise when run under tensorflow-metal. Switching to the CPU on my MacBook is very slow, but the results are consistent with the NVIDIA GPU (i.e. not just noise) and the model trains as expected.
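The quickest way I found to compare the two paths is to hide the GPU from TensorFlow entirely. A minimal sketch, assuming a hypothetical saved model path and an input batch `x` (neither is from my actual setup):

```python
import tensorflow as tf

# Hide the Metal GPU *before* loading/building the model so all ops run on the CPU.
tf.config.set_visible_devices([], "GPU")

model = tf.keras.models.load_model("my_model.keras")  # hypothetical path
preds_cpu = model.predict(x)  # slow, but consistent with the NVIDIA results

# Alternatively, pin a single forward pass to the CPU without hiding the GPU:
# with tf.device("/CPU:0"):
#     preds_cpu = model(x, training=False).numpy()
```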
I think this is very clearly a low-level bug in tensorflow-metal triggered by a slightly non-vanilla network operation. My guess would be custom layers: I'm using a custom time-series attention layer in the model that fails to train on tensorflow-metal.
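For context, the layer in question is along these lines (a simplified additive-attention sketch over the time axis, not my exact code):

```python
import tensorflow as tf

class TemporalAttention(tf.keras.layers.Layer):
    """Additive attention over the time axis of a (batch, time, features) tensor."""

    def build(self, input_shape):
        d = int(input_shape[-1])
        # Learned projection and scoring vector for the attention scores.
        self.w = self.add_weight(name="w", shape=(d, d), initializer="glorot_uniform")
        self.v = self.add_weight(name="v", shape=(d, 1), initializer="glorot_uniform")
        super().build(input_shape)

    def call(self, inputs):
        # Unnormalized scores per time step: (batch, time, 1)
        scores = tf.matmul(tf.tanh(tf.matmul(inputs, self.w)), self.v)
        # Normalize over the time axis, then take the weighted sum of time steps.
        weights = tf.nn.softmax(scores, axis=1)
        return tf.reduce_sum(weights * inputs, axis=1)  # (batch, features)
```

There is nothing exotic here, just matmul, tanh, softmax, and a reduce, which is what makes the divergence under tensorflow-metal so surprising.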