Reply to tensorflow-macos slow (Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.)
The script is on my GitHub: my script code. Follow the steps below to reproduce [1] with the GPU (slow) and [2] with the CPU (fast).

How to reproduce:

[1]: SLOW (with tensorflow-metal installed) == GPU

    conda create --name lab_slow python=3.9
    conda activate lab_slow
    conda install -c apple tensorflow-deps
    python -m pip install tensorflow-macos
    python -m pip install tensorflow-metal
    python -m pip install gym
    python -m pip install icecream

With [1] I can see the GPU working hard, but the run is very slow and some informational messages are shown:

    (lab_slow) fernando@minidefernando restml-muzero % python mymuzero.py
    Init Plugin
    Init Graph Optimizer
    Init Kernel
    Metal device set to: Apple M1
    2021-07-23 17:10:41.158886: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2021-07-23 17:10:41.159108: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    2021-07-23 17:10:41.301535: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    2021-07-23 17:10:41.303570: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    2021-07-23 17:10:41.304081: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    ic| elapsed: -8.216638209
    ic| elapsed: -11.702662209000001
    ic| elapsed: -7.210992999999998
    ic| elapsed: -7.76780325
    ic| elapsed: -6.552902541999998
    ic| elapsed: -7.051604083000001
    ic| elapsed: -7.802484458000002

Each iteration takes ~7 seconds using the GPU.
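A side note on the timing output: the elapsed values in the log are negative, which would be consistent with the start time being subtracted from in the wrong order (start minus end). The script is not reproduced in this post, so that is only a guess. A minimal sketch of a per-iteration timer that reports non-negative seconds could look like the following; `time_iteration` and `dummy_step` are hypothetical names used for illustration, not taken from mymuzero.py.

```python
import time


def time_iteration(step, *args, **kwargs):
    """Run one iteration of `step` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = step(*args, **kwargs)
    # End minus start, so the elapsed time is never negative.
    elapsed = time.perf_counter() - start
    return result, elapsed


def dummy_step(n):
    # Hypothetical stand-in for one training iteration.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for _ in range(3):
        _, elapsed = time_iteration(dummy_step, 100_000)
        print(f"elapsed: {elapsed:.6f} s")
```

`time.perf_counter()` is a monotonic, high-resolution clock, so it is a reasonable choice for measuring short intervals like these.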
[2]: FAST (without tensorflow-metal installed) == CPU

    conda create --name lab_fast python=3.9
    conda activate lab_fast
    conda install -c apple tensorflow-deps
    python -m pip install tensorflow-macos
    python -m pip install gym
    python -m pip install icecream

With [2] the CPU is used and the speed is good. I got this:

    (lab_fast) fernando@minidefernando restml-muzero % python mymuzero.py
    2021-07-23 17:12:54.201964: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    2021-07-23 17:12:54.204263: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    ic| elapsed: -0.6991699170000001
    ic| elapsed: -1.001101625
    ic| elapsed: -0.745720833
    ic| elapsed: -0.5131082500000002
    ic| elapsed: -0.680853667
    ic| elapsed: -0.5841365830000003
    ic| elapsed: -0.6107562499999997
    ic| elapsed: -0.5290834160000006
    ic| elapsed: -1.3289379590000001
    ic| elapsed: -0.508716166000001
    ic| elapsed: -1.1658726250000004
    ic| elapsed: -0.5299397080000006
    ic| elapsed: -0.5714273750000007
    ic| elapsed: -0.6107442499999998

Each iteration takes ~0.5 seconds using 1 CPU.
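To compare the two runs without eyeballing the logs, the `ic| elapsed:` lines can be summarized with a few lines of standard-library Python. This is only a sketch: the regular expression and the helper name `iteration_times` are assumptions made for illustration, and the sign is dropped because only the magnitude of each measurement matters here. The values are the ones quoted in this post.

```python
import re
import statistics

# Matches the icecream output lines quoted in the post, e.g. "ic| elapsed: -7.76780325".
ELAPSED_RE = re.compile(r"ic\| elapsed: (-?\d+(?:\.\d+)?)")


def iteration_times(log_text):
    """Return the per-iteration times (in seconds, sign dropped) found in a log."""
    return [abs(float(value)) for value in ELAPSED_RE.findall(log_text)]


gpu_log = """
ic| elapsed: -8.216638209
ic| elapsed: -11.702662209000001
ic| elapsed: -7.210992999999998
ic| elapsed: -7.76780325
ic| elapsed: -6.552902541999998
ic| elapsed: -7.051604083000001
ic| elapsed: -7.802484458000002
"""

cpu_log = """
ic| elapsed: -0.6991699170000001
ic| elapsed: -1.001101625
ic| elapsed: -0.745720833
ic| elapsed: -0.5131082500000002
ic| elapsed: -0.680853667
ic| elapsed: -0.5841365830000003
ic| elapsed: -0.6107562499999997
ic| elapsed: -0.5290834160000006
ic| elapsed: -1.3289379590000001
ic| elapsed: -0.508716166000001
ic| elapsed: -1.1658726250000004
ic| elapsed: -0.5299397080000006
ic| elapsed: -0.5714273750000007
ic| elapsed: -0.6107442499999998
"""

gpu_times = iteration_times(gpu_log)
cpu_times = iteration_times(cpu_log)

gpu_median = statistics.median(gpu_times)
cpu_median = statistics.median(cpu_times)

print(f"GPU median: {gpu_median:.3f} s over {len(gpu_times)} iterations")
print(f"CPU median: {cpu_median:.3f} s over {len(cpu_times)} iterations")
print(f"GPU/CPU slowdown: {gpu_median / cpu_median:.1f}x")
```

The median is used rather than the mean so that a single slow iteration (such as the 11.7 s one in the GPU run) does not skew the comparison.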
Jul ’21