TensorFlow Model Stops Training When Process Reaches 47 GB (e.g. ru_maxrss)

Question

dbl001 OP

Created Jul ’21

Details here:

https://stackoverflow.com/questions/68551935/why-does-my-tensorflow-model-stop-training

I can run the Variational Deep Semantic Hashing model in an Anaconda virtual environment with Python 3.8.5 and TensorFlow 2.5 on CPUs.

If I run the same code in the tensorflow-metal virtual environment with Python 3.8.2, using my AMD Radeon Pro 5700 XT GPU, the process stops training at epoch 5, batch 5200.
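For anyone trying to reproduce this, here is a minimal sketch of how the peak memory figure in the title (`ru_maxrss`) could be read from inside the training process. It uses only Python's standard `resource` module. The helper names `peak_rss_bytes`, `peak_rss_gb`, and `log_peak_memory` are hypothetical and are not part of the original code.

```python
import resource
import sys


def peak_rss_bytes():
    """Return this process's peak resident set size in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports ru_maxrss in bytes; Linux reports it in kilobytes.
    return peak if sys.platform == "darwin" else peak * 1024


def peak_rss_gb():
    """Return the peak resident set size in gibibytes."""
    return peak_rss_bytes() / 1024 ** 3


def log_peak_memory(epoch):
    """Hypothetical helper: print the peak memory seen up to this epoch."""
    print(f"epoch {epoch}: peak RSS {peak_rss_gb():.2f} GB")


# Example call; in a real run this would be invoked once per epoch.
log_peak_memory(0)
```

One way to use this, assuming the model is trained with Keras `fit`, would be to call `log_peak_memory` at the end of each epoch (for example through a `LambdaCallback` with `on_epoch_end`) and check whether the value climbs steadily toward the point where training stops.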

Answer 1

dbl001 OP

Aug ’21

Fixed with tensorflow-metal version 0.1.2.
