Reply to scaaml - very slow processing after some time - tensorflow metal
Hi Team,

Any updates? Because I have to progress my research, I'm now using alternative platforms (i.e. I've bought a new laptop with a CUDA / NVIDIA GPU). However, it would be good to solve this issue of not being able to use a technically higher-specification machine without having to boot it into Ubuntu on dual-boot (which leads to its own problems). I should be able to run TensorFlow optimized for GPUs on my Mac under its native OS, and I'm sure many other ML/AI people want to as well. Are we going to see any progress on this issue?

Kind Regards, Alze.
Jul ’22
Reply to scaaml - very slow processing after some time - tensorflow metal
Hi,

tensorflow-macos 2.9.2
tensorflow-metal 0.5.0
macOS Monterey 12.4 (patched and up to date)
Machine: iMac Retina 5K, 27-inch, 2020, 3.8 GHz 8-Core Intel Core i7, 128 GB 2667 MHz DDR4, Graphics AMD Radeon Pro 5500 XT 8 GB

Command to run (as per documentation):

    python3 train.py -c config/stm32f415_tinyaes.json

When running on the GPU, the slowdown occurs at exactly the same epoch (19) every time. As a test I disabled the GPU in a duplicate script; whilst taking considerably longer, it passed epoch 19. As the GPU-enabled log below shows, at epoch 19 the ETA has gone up to 122:06:17.

Command to run (for CPU only, slight modification to script included):

    python3 train_cpu.py -c config/stm32f415_tinyaes.json

Script modification to disable the GPU (I have left in the first line and last line of the original script so the placement can be identified; otherwise it is identical):

    from scaaml.utils import tf_cap_memory
    try:
        # Disable all GPUS
        tf.config.set_visible_devices([], 'GPU')
        visible_devices = tf.config.get_visible_devices()
        for device in visible_devices:
            assert device.device_type != 'GPU'
    except:
        # Invalid device or cannot modify virtual devices once initialized.
        pass
    def train_model(config):

CPU ONLY

    2048/2048 [==============================] - 5014s 2s/step - loss: 1.3966 - acc: 0.4811 - val_loss: 1.5574 - val_acc: 0.4297
    Epoch 25/30
    1502/2048 [=====================>........] - ETA: 22:02 - loss: 1.3701 - acc: 0.4919

GPU ENABLED

    2022-07-05 14:43:20.822168: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 46). These functions will not be directly callable after loading.
    2048/2048 [==============================] - 516s 252ms/step - loss: 1.9292 - acc: 0.3521 - val_loss: 1.9108 - val_acc: 0.3503
    Epoch 18/30
    2048/2048 [==============================] - ETA: 0s - loss: 1.8986 - acc: 0.3598
    2022-07-05 14:52:39.447402: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    2022-07-05 14:52:39.450685: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
    2048/2048 [==============================] - 546s 267ms/step - loss: 1.8986 - acc: 0.3598 - val_loss: 2.0514 - val_acc: 0.3303
    Epoch 19/30
    741/2048 [=========>....................] - ETA: 122:06:17 - loss: 1.8543 - acc: 0.3750
    /Users/alan/.pyenv/versions/3.9.5/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker

I have run the code on an external Linux-based system with GPUs and it runs without problems.

This is blocking my research project (MSc). Whilst I can still use CPU mode, the idea is to compare and baseline against various platforms and functionalities (whilst also using my own traces), so it is relevant to be able to use all the features available on the host system (the GPU in this case).

Hope this helps and you can offer a solution.

Regards, alz0r
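To put a number on the slowdown, the ETA figures in the logs above can be converted into an implied time per step. This is only a rough sketch: it assumes the progress-bar ETA is a simple linear extrapolation over the remaining steps, and the helper names `eta_to_seconds` and `implied_seconds_per_step` are made up for illustration, not part of TensorFlow or scaaml.

```python
def eta_to_seconds(eta: str) -> int:
    """Convert a progress-bar ETA such as '122:06:17' or '22:02' to seconds."""
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def implied_seconds_per_step(eta: str, done: int, total: int) -> float:
    """Seconds per remaining step, assuming the ETA is a linear extrapolation."""
    return eta_to_seconds(eta) / (total - done)


# Values copied from the logs in this post.
gpu_epoch18_per_step = 0.267  # "267ms/step" reported for GPU epoch 18
gpu_epoch19_per_step = implied_seconds_per_step("122:06:17", done=741, total=2048)
cpu_epoch25_per_step = implied_seconds_per_step("22:02", done=1502, total=2048)

slowdown = gpu_epoch19_per_step / gpu_epoch18_per_step

print(f"GPU epoch 18: {gpu_epoch18_per_step:.3f} s/step")
print(f"GPU epoch 19: {gpu_epoch19_per_step:.1f} s/step (implied by ETA)")
print(f"CPU epoch 25: {cpu_epoch25_per_step:.2f} s/step (implied by ETA)")
print(f"GPU epoch 19 vs epoch 18: about {slowdown:.0f}x slower")
```

Under that assumption, the GPU run at epoch 19 is implied to take several minutes per step, which is slower than the CPU-only run by roughly two orders of magnitude and slower than the preceding GPU epoch by roughly three.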
Jul ’22