Thanks for looking into this. Please note that I am not using the latest tensorflow-macos and tensorflow-metal. I used versions that others reported as working.
I had never tried TF on a Mac until now, so I have no prior version to compare against and can't tell whether this is a regression.
This behaviour suggests a workaround: ensure every batch has the same max_len and padding. That may preclude certain memory- and performance-saving techniques, such as padding each batch only to its own longest sequence. The issue does not reproduce on Google Colab (CUDA with a T4).
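
For reference, here is a minimal sketch of the workaround I have in mind, in plain Python so it does not depend on any particular TF version. The helper name `pad_to_fixed_length` and the values `FIXED_MAX_LEN` and `PAD_ID` are hypothetical, chosen only for illustration; they are not from my actual training code.

```python
# Hypothetical illustration values; pick ones that match your tokenizer and data.
FIXED_MAX_LEN = 8
PAD_ID = 0


def pad_to_fixed_length(batch, max_len=FIXED_MAX_LEN, pad_id=PAD_ID):
    """Pad (or truncate) every sequence to max_len so all batches share one shape."""
    padded = []
    for seq in batch:
        seq = list(seq)[:max_len]  # truncate sequences longer than max_len
        padded.append(seq + [pad_id] * (max_len - len(seq)))
    return padded


# Two batches whose longest sequences differ (3 tokens vs 5 tokens).
batch_a = pad_to_fixed_length([[5, 6, 7], [8, 9]])
batch_b = pad_to_fixed_length([[1], [2, 3, 4, 5, 6]])

# Both batches now have rows of length FIXED_MAX_LEN, so the model
# would see the same input shape on every step.
print(len(batch_a[0]), len(batch_b[0]))
```

The cost is that short batches carry padding they would not otherwise need, which is the memory and performance trade-off mentioned above.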