JulienVincenot’s Profile | Apple Developer Forums

Reply to Is it possible to use HuggingFace via TF-macOS and TF-Metal?

Hello, is there any news on this front? I'm a total newbie with TensorFlow, so I have no real sense of what is going on, but I consistently get this error: "Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support." It shows up both with this test (the first reply here) and with "TensorFlow 2 quickstart for beginners".

Strangely, the training does seem to run: simple tests go through epochs fairly quickly (I think), and my AMD GPU usage sits around 30-50%.

My specs: Intel MacBook Pro running macOS Monterey, AMD Radeon Pro 5500M 8 GB, Python 3.8.10.

Here's an example of the simple test's output:

(tensorflow-metal-test) jv@192 tensorflow-metal-test % python /Users/jv/tensorflow-exp/test.py
2021-11-22 23:50:48.066315: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5500M
systemMemory: 32.00 GB
maxCacheSize: 3.99 GB
2021-11-22 23:50:48.067311: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-11-22 23:50:48.067826: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-11-22 23:50:48.505048: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-11-22 23:50:48.505092: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-11-22 23:50:48.712043: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:48.734335: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:48.827487: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:48.858801: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:49.081885: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:49.113821: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:49.169179: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:49.208235: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-11-22 23:50:49.243817: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
2021-11-22 23:50:49.282608: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Epoch 1/12
2021-11-22 23:50:49.309804: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1564 - accuracy: 0.9539
/Users/julienvincenot/tensorflow-metal-test/lib/python3.8/site-packages/keras/engine/training.py:2470: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
2021-11-22 23:51:01.268461: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
469/469 [==============================] - 14s 21ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1564 - accuracy: 0.9539 - val_loss: 0.0707 - val_accuracy: 0.9782
Epoch 2/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0453 - accuracy: 0.9857 - val_loss: 0.0487 - val_accuracy: 0.9848
Epoch 3/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0284 - accuracy: 0.9912 - val_loss: 0.0378 - val_accuracy: 0.9878
Epoch 4/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0191 - accuracy: 0.9939 - val_loss: 0.0346 - val_accuracy: 0.9886
Epoch 5/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0135 - accuracy: 0.9958 - val_loss: 0.0400 - val_accuracy: 0.9892
Epoch 6/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0099 - accuracy: 0.9968 - val_loss: 0.0332 - val_accuracy: 0.9902
Epoch 7/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9978 - val_loss: 0.0376 - val_accuracy: 0.9894
Epoch 8/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0389 - val_accuracy: 0.9889
Epoch 9/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0059 - accuracy: 0.9980 - val_loss: 0.0448 - val_accuracy: 0.9887
Epoch 10/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9985 - val_loss: 0.0434 - val_accuracy: 0.9902
Epoch 11/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9984 - val_loss: 0.0486 - val_accuracy: 0.9873
Epoch 12/12
469/469 [==============================] - 12s 19ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9984 - val_loss: 0.0383 - val_accuracy: 0.9896
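For reference, the letter right after each timestamp in these TensorFlow lines is the log severity (I = INFO, W = WARNING, E = ERROR, F = FATAL), so the NUMA line above is tagged as informational. Here is a minimal sketch that classifies such lines; the regex and the `classify` helper are my own assumptions based on the lines in this output, not an official TensorFlow parser:

```python
import re

# Severity letters used in TensorFlow's C++ log prefix.
SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# Assumed line shape, taken from the output above:
# "2021-11-22 23:50:48.067311: I path/to/file.cc:305] message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<sev>[IWEF]) (?P<src>\S+:\d+)\] (?P<msg>.*)$"
)


def classify(line: str):
    """Return (severity, source, message) for a TF log line, or None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return SEVERITY[m.group("sev")], m.group("src"), m.group("msg")


numa_line = (
    "2021-11-22 23:50:48.067311: I tensorflow/core/common_runtime/"
    "pluggable_device/pluggable_device_factory.cc:305] Could not identify "
    "NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not "
    "have been built with NUMA support."
)
print(classify(numa_line))
```

Lines that aren't in this format (for example "Metal device set to: ...") simply return `None`.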
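To double-check that TensorFlow actually sees the Radeon rather than running only on the CPU, a small script can list the GPUs it detects. This is a minimal sketch, assuming TensorFlow 2.x with the tensorflow-metal plugin installed; `describe_gpus` is a hypothetical helper name. Setting `TF_CPP_MIN_LOG_LEVEL` to "1" is optional: it has to happen before TensorFlow is imported and only hides INFO-level lines such as the NUMA one.

```python
import os

# Optional: "1" filters INFO-level C++ log lines ("0" shows everything).
# Must be set before TensorFlow is imported; setdefault keeps any existing value.
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "1")


def describe_gpus():
    """Return names of the GPUs TensorFlow can see (empty if none, or if TF isn't installed)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = describe_gpus()
    print("GPUs visible to TensorFlow:", gpus or "none")
```

If the Metal plugin is loaded, the list should contain a GPU entry, which would match the "Created TensorFlow device (.../device:GPU:0 ...)" lines in the log.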

Machine Learning & AI General

Nov ’21

JulienVincenot
