Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

Device: MacBook Pro 16" M1 Max, 64 GB, running macOS 12.0.1.

I tried setting up GPU Accelerated TensorFlow on my Mac using the following steps:

  1. Setup: Xcode CLI tools / Homebrew / Miniforge
  2. Conda Env: Python 3.9.5
  3. conda install -c apple tensorflow-deps
  4. python -m pip install tensorflow-macos
  5. python -m pip install tensorflow-metal
  6. brew install libjpeg
  7. conda install -y matplotlib jupyterlab
  8. In JupyterLab, I try to execute this code:
from tensorflow.keras import layers
from tensorflow.keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

The code executes, but I get this warning, indicating that no GPU acceleration can be used, as it defaults to a 0 MB GPU:

Metal device set to: Apple M1 Max
2021-10-27 08:23:32.872480: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-27 08:23:32.872707: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

Does anyone have any idea how to fix this? I came across a bunch of posts around here related to the same issue, but none with a solid fix. I created a new question because I found the other questions less descriptive of the issue, and I wanted to depict it comprehensively. Any fix would be of much help.
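For what it's worth, a quick way to check whether TensorFlow actually registers the Metal GPU, regardless of what the log line says (a minimal sketch, assuming tensorflow-macos and tensorflow-metal are installed):

```python
import tensorflow as tf

# On a working tensorflow-metal install this list should contain one
# PhysicalDevice entry for the Metal GPU; an empty list means
# TensorFlow cannot see the GPU at all.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
```

If the list is non-empty, the NUMA message is informational only and the GPU is still being used.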

I have the same issue with M1; my neural-network code was working on my boyfriend's MacBook Pro with M1 Max (though no faster than a Huawei MateBook). I am so angry; my project development has stopped.

Having the same issue here, on a MacBook Pro M1 16 GB with Monterey.

It was working for me in a Miniforge3 env. After moving to another environment (Anaconda3) to do some PyTorch work and then back to my arm64 environment, it showed the same message and didn't work. I followed the steps below and it worked again.

  1. I uninstalled TensorFlow from all environments.
  2. Uninstalled tensorflow-metal and tensorflow-deps.
  3. Restarted the device.
  4. Reinstalled the packages.

Thanks.

Changing tensorflow-metal from v0.4.0 to v0.1.1 worked for me:

pip uninstall tensorflow-metal
pip install tensorflow-metal==0.1.1

I tried pip install tensorflow-metal==0.1.1, but it caused a different problem... :(

Init Plugin

2022-05-15 14:00:41.859860: F tensorflow/c/c_api_experimental.cc:739] Non-OK-status: tensorflow::RegisterPluggableDevicePlugin(lib_handle->lib_handle) status: FAILED_PRECONDITION: 'host_callback' field in SP_StreamExecutor must be set.

zsh: abort      python

Try adding the following at the top of your script, before importing TensorFlow:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
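For context, TF_CPP_MIN_LOG_LEVEL only controls TensorFlow's C++ logging verbosity; it hides the NUMA message but does not change how the GPU is used. A minimal stdlib sketch of the setting:

```python
import os

# TF_CPP_MIN_LOG_LEVEL values (as a string):
#   '0' - show all messages (default)
#   '1' - hide INFO
#   '2' - hide INFO and WARNING (silences the NUMA message)
#   '3' - hide INFO, WARNING, and ERROR
# It must be set before `import tensorflow` to take effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
print(os.environ['TF_CPP_MIN_LOG_LEVEL'])  # 2
```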

Same error here; when I do learning-rate scheduling, JupyterLab dies.

Changing the version of tensorflow-metal didn't help; it made my Jupyter kernel restart. This is why I use PyTorch. It works.

I'm having the same issue, and sometimes my training just suddenly stops at a random epoch. So now, in order to work, I have to wrap my code in with tf.device('cpu:0'): to force CPU use instead of the GPU. The worst part is that Apple doesn't seem to care about fixing this.
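For anyone wanting to try the same workaround, here is a minimal sketch of pinning ops to the CPU with tf.device (assumes tensorflow-macos is installed; the tensor values are just an illustration):

```python
import tensorflow as tf

# Everything created inside this context is placed on the CPU,
# bypassing the Metal GPU entirely.
with tf.device('cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)  # a @ a = [[7, 10], [15, 22]]

print(b.numpy())
```

The same context manager can wrap model construction and model.fit to keep a whole training run on the CPU.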

Apple Silicon is UMA (Unified Memory Architecture). This annoying message is just telling you that you are running on Apple Silicon, so ignore it.

Why can't this be marked as solved so the answer can appear at the top? The M1 is a UMA device, not NUMA. A central point of this SoC is its unified memory. This message just says that TensorFlow recognizes that it's not a non-uniform memory access (NUMA) system, and it then chooses the appropriate UMA code paths. How does this forum help if answers are buried in noise? Stack Overflow does a better job.

For M1 Ultra (128 GB RAM, 20-core CPU, 64-core GPU) on macOS 12.5, I am getting the following message:

Metal device set to: Apple M1 Ultra systemMemory: 128.00 GB maxCacheSize: 48.00 GB

2022-07-22 16:44:43.488061: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

2022-07-22 16:44:43.488273: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

My question is: why is this error coming at all? Why NUMA? Moreover, the GPU has 0 MB memory? How is this possible?

Python: 3.9.13, tensorflow-macos: 2.9.2, tensorflow-metal: 0.5.0

Please help. Thanks, Bapi

I intended to speed up the training process. Now what is this (got during training with workers=8, use_multiprocessing=True)? Strange! I never got it with my MBP-13 (2017, i5 core, 16 GB RAM) with the same code.

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
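For what it's worth, spawn errors like this on macOS are often related to Python's spawn start method, which re-imports the main module in every worker process. A minimal stdlib sketch of the usual guard (square here is just a placeholder worker function, not from the original code):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # On macOS, multiprocessing defaults to 'spawn': child processes
    # re-import this module, so anything that starts workers must live
    # behind the __main__ guard, or spawning can fail at import time.
    mp.set_start_method('spawn', force=True)
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

With use_multiprocessing=True, Keras relies on the same machinery, so the script entry point needs the same guard.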

I bought a new MacBook Pro M1 and I was hoping I could use it for machine learning on the GPU (with Metal). But it doesn't work on the GPU at all! It gets stuck on model.fit with the famous error message "NUMA node of platform GPU ID 0". And if I switch to the CPU, it does actually work, but the point of (deep) machine learning is to utilize the GPU to make training faster, right?! Apple, please help us, fix this issue, take it seriously.

Was having the same issue; the script would crash after the message with a bus error. I'm using a Mac Pro (Late 2013) with an AMD FirePro D500 3 GB.

What fixed it for me was changing the version in my tensorflow-metal virtual env: pip install -Iv tensorflow-metal==0.6.0

Then inside my script I set the following:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

After this I ran the script and it worked; in Activity Monitor I could see it using the GPU.
