Hi, I'm running scaaml. It starts out running fine, but after several iterations it slows right down:
76: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-04 06:25:08.268023: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2048/2048 [==============================] - 512s 250ms/step - loss: 1.8051 - acc: 0.3809 - val_loss: 1.9365 - val_acc: 0.3350
Epoch 19/30
536/2048 [======>.......................] - ETA: 44:10:15 - loss: 1.7715 - acc: 0.3911
Previous epochs were processed in a reasonable amount of time:
173: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-04 06:16:20.906834: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 46). These functions will not be directly callable after loading.
2048/2048 [==============================] - 538s 263ms/step - loss: 1.8303 - acc: 0.3744 - val_loss: 1.8793 - val_acc: 0.3452
Epoch 18/30
2048/2048 [==============================] - ETA: 0s - loss: 1.8051 - acc: 0.3809
2022-07-04 06:25:08.264476: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-04 06:25:08.268023: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2048/2048 [==============================] - 512s 250ms/step - loss: 1.8051 - acc: 0.3809 - val_loss: 1.
I'm running the code elsewhere and it runs just fine.
I could run other GPU tasks while this was happening, and they picked up the GPU with no problem. It's as if, after running for an extended period of time, the resources/application stalled but kept running very slowly.
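To pin down exactly when the slowdown starts, per-step wall time could be logged rather than relying on the progress bar's ETA. Below is a minimal sketch in plain Python; it is not part of scaaml or TensorFlow. `StepTimer`, `window` and `factor` are hypothetical names chosen for illustration, and the timings in the usage lines are made up. In a Keras run, `begin()` and `end()` would presumably be called from a custom callback's `on_train_batch_begin` and `on_train_batch_end` hooks.

```python
import statistics
import time


class StepTimer:
    """Hypothetical helper: records seconds per training step and flags stalls."""

    def __init__(self, window=50, factor=5.0):
        self.window = window    # number of recent steps used as the baseline
        self.factor = factor    # a step this many times slower than baseline is flagged
        self.durations = []     # seconds taken by each completed step
        self.slow_steps = []    # (step_index, seconds, baseline_seconds)
        self._start = None

    def begin(self):
        # Call at the start of a training step.
        self._start = time.perf_counter()

    def end(self):
        # Call at the end of a training step.
        self.record(time.perf_counter() - self._start)

    def record(self, seconds):
        # Compare against the median of recent steps, which is robust to outliers.
        recent = self.durations[-self.window:]
        if len(recent) >= 5:
            baseline = statistics.median(recent)
            if seconds > self.factor * baseline:
                self.slow_steps.append((len(self.durations), seconds, baseline))
        self.durations.append(seconds)


# Illustrative usage with made-up timings: 20 normal steps, then one stalled step.
timer = StepTimer()
for _ in range(20):
    timer.record(0.25)
timer.record(100.0)
print(timer.slow_steps)
```

Logging the flagged step index alongside a timestamp would show whether the stall lines up with a particular epoch boundary, a checkpoint save, or simply elapsed time.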