
Reply to Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
On an M1 Ultra (128 GB RAM, 20-core CPU, 64-core GPU) running macOS 12.5, I am getting the following message:

Metal device set to: Apple M1 Ultra
systemMemory: 128.00 GB
maxCacheSize: 48.00 GB
2022-07-22 16:44:43.488061: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-22 16:44:43.488273: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

My questions: Why is this message printed at all? Why NUMA? Moreover, the GPU is reported with 0 MB memory — how is that possible?

Python: 3.9.13
tensorflow-macos: 2.9.2
tensorflow-metal: 0.5.0

Please help. Thanks, Bapi
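For what it's worth, here is a minimal sanity check (standard tf.config calls only, nothing Metal-specific) that can confirm the device is actually registered and usable despite the NUMA line and the "0 MB memory" report:

import tensorflow as tf

# List the devices the runtime registered; the Metal plugin should appear as a GPU.
print(tf.config.list_physical_devices('GPU'))

# Run a trivial op pinned to the GPU to verify it really executes there.
with tf.device('/GPU:0'):
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)
print(y.device)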
Jul ’22
Reply to Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
I intended to speed up the training process. Now what is this (got during training with workers=8, use_multiprocessing=True)? Strange — I never got it with my MBP-13 (2017, i5, 16 GB RAM) running the same code.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
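For context, the obvious fallback is to keep the generator but drop the Python-level worker pool. A self-contained sketch of that call pattern (DummySequence and the tiny Dense model are placeholders, not my actual DataGenerator or GNN):

import numpy as np
import tensorflow as tf

class DummySequence(tf.keras.utils.Sequence):
    # Stand-in for the DataGenerator used in this thread.
    def __len__(self):
        return 4
    def __getitem__(self, idx):
        x = np.random.rand(8, 8).astype("float32")
        y = np.random.rand(8, 1).astype("float32")
        return x, y

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")

model.fit(
    DummySequence(),
    epochs=3,
    workers=1,                  # single loader, no pool
    use_multiprocessing=False,  # avoids the spawn/SemLock pickling path entirely
)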
Aug ’22
Reply to Huge memory leakage issue with tf.keras.models.predict()
Hi, please see the output of memory_profiler below (example: the first call to predict() starts around 2.3 GB; by the nth call it has reached 30.6 GB). As mentioned in my previous comment, it was climbing to ~80 GB and still counting. Sorry, I could not share the code, but it is quite straightforward to recreate. Any help would be appreciated. Thanks!

###############################################
# First instance of leakage (the predict() call is at profiler line 41):

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    29   2337.5 MiB   2337.5 MiB           1   @profile
    30                                         def predict_hpwl(graph, graph_label, model):
    31   2337.5 MiB      0.0 MiB           1       lindex = range(len([graph_label]))
    32   2337.6 MiB      0.0 MiB           2       gdata = DataGenerator("Prediction",graphs=[graph],
    33   2337.5 MiB      0.0 MiB           1                                    labels=[graph_label],
    34   2337.5 MiB      0.0 MiB           1                                    indices=lindex,
    35   2337.5 MiB      0.0 MiB           1                                    shuffle=True,
    36   2337.5 MiB      0.0 MiB           1                                    cache_size=10,
    37   2337.5 MiB      0.0 MiB           1                                    debug=False,
    38   2337.5 MiB      0.0 MiB           1                                    isSparse=True)
    39
    40                                             ## Test the GNN
    41   2487.5 MiB    149.9 MiB           2       hpwl = GNN.predict(gdata,
    42   2337.6 MiB      0.0 MiB           1               max_queue_size=10,
    43   2337.6 MiB      0.0 MiB           1               workers=8,
    44   2337.6 MiB      0.0 MiB           1               use_multiprocessing=True
    45                                                     )
    46
    47
    48   2486.5 MiB     -1.0 MiB           1       keras.backend.clear_session()
    49
    50
    51   2486.5 MiB      0.0 MiB           1       return hpwl

###############################################
# nth instance of leakage (the predict() call is at profiler line 41):

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    29  30661.9 MiB  30661.9 MiB           1   @profile
    30                                         def predict_hpwl(graph, graph_label, model):
    31  30661.9 MiB      0.0 MiB           1       lindex = range(len([graph_label]))
    32  30661.9 MiB      0.0 MiB           2       gdata = DataGenerator("Prediction",graphs=[graph],
    33  30661.9 MiB      0.0 MiB           1                                    labels=[graph_label],
    34  30661.9 MiB      0.0 MiB           1                                    indices=lindex,
    35  30661.9 MiB      0.0 MiB           1                                    shuffle=True,
    36  30661.9 MiB      0.0 MiB           1                                    cache_size=10,
    37  30661.9 MiB      0.0 MiB           1                                    debug=False,
    38  30661.9 MiB      0.0 MiB           1                                    isSparse=True)
    39
    40                                             ## Test the GNN
    41  30720.0 MiB     58.1 MiB           2       hpwl = GNN.predict(gdata,
    42  30661.9 MiB      0.0 MiB           1               max_queue_size=10,
    43  30661.9 MiB      0.0 MiB           1               workers=8,
    44  30661.9 MiB      0.0 MiB           1               use_multiprocessing=True
    45                                                     )
    46
    47
    48  30720.0 MiB     -0.0 MiB           1       keras.backend.clear_session()
    49
    50
    51  30720.0 MiB      0.0 MiB           1       return hpwl
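Side note on how it is called: predict_hpwl() is invoked once per graph in a long loop, so every iteration goes through model.predict(). A pattern often suggested for repeated single-sample inference is to call the model directly instead of predict(); a self-contained sketch of the two call styles (dummy model and random input, not my real GNN or data):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
x = np.random.rand(1, 8).astype("float32")

y_predict = model.predict(x)           # what the function above does once per graph
y_direct  = model(x, training=False)   # direct call; no per-call predict machinery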
Aug ’22
Reply to Huge memory leakage issue with tf.keras.models.predict()
Thanks for the response. I will await your next fixes/updates. Just to share: the results above are based on TF-MACOS 2.8.0 and TF-METAL 0.4.0 with Python 3.8.13 in my current env. My base env (TF-MACOS 2.9.2 and TF-METAL 0.5.0 with Python 3.9.13) exhibits the same behaviour, but I hit some other issues there as well (beyond the scope of this thread), which is why I am using the env above. Finally, I would like to ask: why does the latest TF-MACOS carry version number 2.9.2, while TensorFlow.org shows the latest TF version as 2.9.1 (ref: https://www.tensorflow.org/api_docs/python/tf)? Thanks, Bapi
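For completeness, this is how the versions active in a given env can be checked (standard importlib.metadata and tf attributes, nothing specific to this setup):

import tensorflow as tf
from importlib.metadata import version

print("tf.__version__   :", tf.__version__)              # version reported by the runtime
print("tensorflow-macos :", version("tensorflow-macos")) # installed wheel versions
print("tensorflow-metal :", version("tensorflow-metal"))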
Aug ’22
Reply to Huge memory leakage issue with tf.keras.models.predict()
A quick update: when I run CPU-only, as in

with tf.device('/CPU'): predict_hpwl()

the memory leak is insignificant (1-10 MB at most initially, then <=1 MB). Please see a single memory_profiler output instance for predict_hpwl() below:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    31   2448.5 MiB   2448.5 MiB           1   @profile
    32                                         def predict_hpwl(graph, graph_label, model):
    33   2448.5 MiB      0.0 MiB           1       lindex = range(len([graph_label]))
    34   2448.5 MiB      0.0 MiB           2       gdata = DataGenerator("Prediction",graphs=[graph],
    35   2448.5 MiB      0.0 MiB           1                                    labels=[graph_label],
    36   2448.5 MiB      0.0 MiB           1                                    indices=lindex,
    37   2448.5 MiB      0.0 MiB           1                                    shuffle=True,
    38   2448.5 MiB      0.0 MiB           1                                    cache_size=10,
    39   2448.5 MiB      0.0 MiB           1                                    debug=False,
    40   2448.5 MiB      0.0 MiB           1                                    isSparse=True)
    41
    42                                             ## Test the GNN
    43   2449.4 MiB      0.9 MiB           2       hpwl = model.predict(gdata,
    44   2448.5 MiB      0.0 MiB           1               max_queue_size=10,
    45   2448.5 MiB      0.0 MiB           1               workers=8,
    46   2448.5 MiB      0.0 MiB           1               use_multiprocessing=True
    47                                                     )
    48
    49   2449.3 MiB     -0.0 MiB           1       tf.keras.backend.clear_session()
    50
    51   2449.3 MiB      0.0 MiB           1       return hpwl

When I run GPU-only, as in

with tf.device('/GPU'): predict_hpwl()

I see a similar (large) memory leak as reported earlier. Apparently, the GPU path is causing the memory leak! Hope this helps you in providing the fix.

Note: my env is still Python 3.8.13 with tensorflow-macos==2.8.0 and tensorflow-metal==0.4.0.

Thanks, Bapi
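To be explicit about the device pinning I mean above, here is a self-contained sketch with a dummy computation in place of my real pipeline (plain tf.device scoping):

import tensorflow as tf

# CPU-pinned run: everything inside the scope is placed on the CPU.
with tf.device('/CPU:0'):
    a = tf.random.normal((512, 512))
    cpu_result = tf.matmul(a, a)

# GPU-pinned run: the same computation placed on the Metal GPU device.
with tf.device('/GPU:0'):
    b = tf.random.normal((512, 512))
    gpu_result = tf.matmul(b, b)

print(cpu_result.device, gpu_result.device)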
Aug ’22
Reply to Huge memory leakage issue with tf.keras.models.predict()
Hi, thanks for your update. I don't see any improvement with tensorflow-metal==0.5.1 (tried with both tensorflow-macos==2.9.2 under Python 3.9.13 and tensorflow-macos==2.8.0 under Python 3.8.13). In fact, I see quite similar memory_profiler output, as below:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    31   3534.6 MiB   3534.6 MiB           1   @profile
    32                                         def predict_hpwl(graph, graph_label, model):
    33   3534.6 MiB      0.0 MiB           1       lindex = range(len([graph_label]))
    34   3534.6 MiB      0.0 MiB           2       gdata = DataGenerator("Prediction",graphs=[graph],
    35   3534.6 MiB      0.0 MiB           1                                    labels=[graph_label],
    36   3534.6 MiB      0.0 MiB           1                                    indices=lindex,
    37   3534.6 MiB      0.0 MiB           1                                    shuffle=True,
    38   3534.6 MiB      0.0 MiB           1                                    cache_size=10,
    39   3534.6 MiB      0.0 MiB           1                                    debug=False,
    40   3534.6 MiB      0.0 MiB           1                                    isSparse=True)
    41
    42                                             ## Test the GNN
    43   3594.9 MiB     60.3 MiB           2       hpwl = model.predict(gdata,
    44   3534.6 MiB      0.0 MiB           1               max_queue_size=10,
    45   3534.6 MiB      0.0 MiB           1               workers=8,
    46   3534.6 MiB      0.0 MiB           1               use_multiprocessing=True
    47                                                     )
    48
    49   3594.9 MiB     -0.0 MiB           1       tf.keras.backend.clear_session()
    50
    51   3594.9 MiB      0.0 MiB           1       return hpwl
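In case it helps reproduce this on your side: the numbers above come from memory_profiler. A self-contained sketch of that setup (dummy model and a made-up file name, not my real code):

# profile_predict.py  (hypothetical file name)
from memory_profiler import profile
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])

@profile  # prints a per-line memory table like the ones pasted above
def predict_once():
    x = np.random.rand(32, 8).astype("float32")
    return model.predict(x)

if __name__ == "__main__":
    for _ in range(5):   # repeated calls make any growth visible
        predict_once()

# run with: python profile_predict.py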
Aug ’22
Reply to GPU training deadlock with tensorflow-metal 0.5
Hi, I did not see any improvement with TF-MACOS==2.9.2 and TF-METAL==0.5.1 under Python 3.9.13; please see my latest (relevant) response in the thread https://developer.apple.com/forums/thread/711753. This is why I am sticking to my old setup of TF-MACOS==2.8.0 and TF-METAL==0.4.0 with Python 3.8.13, and to the CPU-only option, which leaks comparatively little memory. I still want to wait until all the epochs (merely 3) finish. Thanks, Bapi
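An alternative way to force CPU-only for the whole process (instead of wrapping individual calls in tf.device, as I showed in the other thread) would be to hide the GPU from the runtime before any ops run; a minimal sketch using the standard tf.config API:

import tensorflow as tf

# Hide all GPUs so every op falls back to the CPU for this process.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should now list only the CPU device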
Aug ’22
Reply to GPU training deadlock with tensorflow-metal 0.5
Hi, I wish to share a strange thing I noticed, apart from this issue (and the memory-leak issue on GPU): **there is a huge gap between two consecutive epochs. An epoch typically takes around 4-5 minutes, but the inter-epoch gap spans around 6-7 minutes. This is possibly because process scheduling on the M1 Ultra is under-optimised.** Hope this pointer helps with your fix and yields a better resolution in subsequent TF-METAL releases. Thanks, Bapi
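To put numbers on that gap, one option is a small timing callback. Sketch below: EpochGapTimer is a name I made up, and it assumes a model.fit-style loop; for a custom loop the same timing can simply wrap the epoch loop directly.

import time
import tensorflow as tf

class EpochGapTimer(tf.keras.callbacks.Callback):
    """Logs time spent inside each epoch and in the gap between epochs."""
    def __init__(self):
        super().__init__()
        self.last_epoch_end = None

    def on_epoch_begin(self, epoch, logs=None):
        if self.last_epoch_end is not None:
            print(f"gap before epoch {epoch}: {time.time() - self.last_epoch_end:.1f}s")
        self.epoch_start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch} duration: {time.time() - self.epoch_start:.1f}s")
        self.last_epoch_end = time.time()

# usage: model.fit(data, epochs=3, callbacks=[EpochGapTimer()])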
Aug ’22
Reply to GPU training deadlock with tensorflow-metal 0.5
I am really disappointed with my Mac Studio (M1 Ultra, 64-core GPU, 128 GB RAM). Now I am wondering why I spent so much money on this ****** machine! Now I am getting a multiprocessing error and the training has stopped:

2022-08-26 07:49:49.615373: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-26 07:49:49.615621: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Process Keras_worker_SpawnPoolWorker-92576:
Traceback (most recent call last):
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/pool.py", line 109, in worker
    initializer(*initargs)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/site-packages/keras/utils/data_utils.py", line 812, in init_pool_generator
    id_queue.put(worker_proc.ident, block=True, timeout=0.1)
  File "/Users/bapikar/miniforge3/envs/tf28_python38/lib/python3.8/multiprocessing/queues.py", line 84, in put
    raise Full
queue.Full
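It may be unrelated to the queue.Full itself, but since these Keras workers are spawned processes on macOS, the one thing I can verify locally is that the driver script guards its entry point. A minimal sketch of the layout I mean (file and function names are placeholders):

# train_main.py -- hypothetical driver layout
import multiprocessing as mp

def main():
    # build the model and data generators, then start training here
    ...

if __name__ == "__main__":
    # macOS uses the 'spawn' start method; keeping all work behind this guard
    # prevents module-level code from re-executing inside every worker.
    mp.set_start_method("spawn", force=True)
    main()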
Aug ’22
Reply to GPU training deadlock with tensorflow-metal 0.5
Hi, thanks for sharing the info. However, my issue is a little different (please see the memory-leak thread at https://developer.apple.com/forums/thread/711753). My training stops, apparently due to the memory leak, and one potential reason (my guess) is a CPU/GPU scheduling issue when memory usage gets very high (say ~125 GB out of the 128 GB RAM in my system, with no swap being used for whatever reason) on my M1 Ultra machine with 64-core GPU (Mac Studio). And FYI, my training setup uses a custom loop:

with tf.GradientTape() as tape:
    .......

And I do not use a Jupyter notebook. Everything runs from the command line, and my code (structured over multiple files) is written in a text editor such as GVIM. --Bapi
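For reference, the general shape of that custom training step (a generic sketch with a dummy model and loss, not my actual GNN code):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function  # compile the step into a graph so it is not retraced on every call
def train_step(x, y):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss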
Sep ’22