(Accidentally wrote an answer when I meant to write a comment and now I can't delete this — terrible forum UX)
I do not see this issue with simple models (single input/output, ~100k parameters, a few LSTM and dense layers), but I do see it with a larger, more complex model (~3M parameters, multiple inputs/outputs, LSTM + self-attention + CNN layers). With the larger model, training stalls at a very high loss. Pre-trained models that I know perform well also evaluate at an extremely high loss, and model.predict() on reasonable data returns almost random vectors. I've run the exact same code with the same data on a Vertex AI VM with a Tesla GPU, and there I get the results I would expect: decreasing loss, decent evaluation loss for pre-trained models, and reasonable inference outputs.
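One way to put a number on the mismatch between the two machines is to run model.predict() on the same fixed input with the same weights on each, save the outputs, and compare them element by element. The snippet below is only a rough sketch of that comparison: the helper names (max_abs_diff, outputs_match) and the tolerance of 1e-3 are my own assumptions, and it expects the predictions to have been converted to nested lists first (for example with .tolist() on each output array).

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists of floats.

    Hypothetical helper: `a` and `b` are predictions from the two machines,
    converted to plain nested lists. For a multi-output model, pass the list
    of per-output lists; the recursion handles the extra nesting level.
    """
    if isinstance(a, (list, tuple)):
        if not isinstance(b, (list, tuple)) or len(a) != len(b):
            raise ValueError("outputs have different shapes")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    # NaN never compares equal to anything, so report it as an infinite gap.
    if math.isnan(a) or math.isnan(b):
        return math.inf
    return abs(a - b)


def outputs_match(a, b, tol=1e-3):
    """True if every element differs by at most `tol` (tolerance is an assumed value)."""
    return max_abs_diff(a, b) <= tol
```

Small floating-point differences between different hardware are normal, so a tiny gap is not a concern. A gap on the order of the outputs themselves, or any NaN, would match the "almost random vectors" symptom described above.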