Yeah. It worked by changing the setup.py file. Thanks a lot again.
Edit: some respite with Python 3.8.13, tensorflow-macos 2.8.0, tensorflow-metal 0.4.0.
Getting the exact same messages with Python 3.8.13, tensorflow-macos 2.8.0, and tensorflow-metal 0.4.0. Moreover, there is a tangible time gap between the epochs.
Looks like there is some scheduling issue! Mine stopped somewhere in the middle of epoch two, and I did not use a very large dataset. Does anyone know how to upload screenshots here?
I think the problem might be linked with the memory leakage issue (https://developer.apple.com/forums/thread/711753). Btw, when is tensorflow-metal==0.5.1 coming? Thanks!
TF-METAL==0.4.0 so far serves the purpose for me, along with TF-MACOS==2.8.0 and Python 3.8.13. But I am desperately looking to jump to TF-METAL==0.5.x or higher with TF-MACOS==2.9.x on Python 3.9.x, for faster performance from the GPU we paid for. Otherwise, my 2017 MBP-13 (Intel i5, 16GB RAM) does a decent job for smaller datasets.
I wish I could share a screenshot to showcase my observation in a better way!
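In case it helps anyone reproduce the combination that works for me, here is a minimal sanity check of the pinned versions above (just a sketch; the expected values are the ones from my setup):

```python
# Sanity check of the pinned combination from this thread
# (Python 3.8.13, tensorflow-macos 2.8.0, tensorflow-metal 0.4.0).
import sys
import tensorflow as tf

print(sys.version)                             # expect 3.8.13
print(tf.__version__)                          # expect 2.8.0
print(tf.config.list_physical_devices('GPU'))  # the Metal GPU should be listed
```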
Any update on this?
Any update/comment on this?
For the memory leakage issue, please search for the thread titled "Huge memory leakage issue with tf.keras.models.predict()".
I agree with you. Otherwise, what is the point of having such an "extraordinary GPU" that can beat an RTX 3090?
I have been stuck for the last few weeks due to the memory leakage issue (related to the GPU), and the GPUs are dead slow. Not only that, when the memory leakage reaches ~125GB out of 128GB on my Mac Studio, the training simply stops!!! I am utterly frustrated and disgusted!!! I should have gone with an Intel machine with a decent GPU instead of paying a hefty price for this "hyped GPU" and TF-METAL. :-(
My 64c GPU is sitting IDLE too! What METAL is it? :-(
Yes, you are right. I have just tested, and TF 2.10 / METAL 0.6 shows the same LEAKY behaviour with the GPU.
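A quick way to double-check that the leak is on the GPU path (just a diagnostic, not a fix) is to hide the Metal device and rerun the same script on CPU, watching whether memory stays flat:

```python
import tensorflow as tf

# Must run before any tensors/ops are created, otherwise the device
# configuration is already frozen for this process.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should show CPU only

# ... run the same training/predict code here and compare memory growth.
```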
When I started this thread almost 3 months ago, I thought they would address the issue (which seemed apparent from the enthusiastic comments of their dev-engineer). Now it looks like either they do not have the engineering resources to address it, or they quickly realised that managing TENSORFLOW is not their CUP of TEA (getting to the level of Google's TF engineers is a mammoth task). Grossly disappointed at spending ~$8K on an M1-Ultra machine (probably hype does not work all the time) as TF hardware.
Update from me!
I am fed up with TF-MACOS/METAL and have migrated to PyTorch 1.13 (I also tried the 1.14 dev version) in a Python 3.9/3.10 env. At least I can see my training running with MUCH, MUCH less memory usage while using the GPU (60-75% utilisation depending on the data) on my M1 ULTRA machine with the 64c GPU. I will try Python 3.11 soon (PyTorch does not support it yet) and update you all.
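For anyone making the same switch, here is a minimal sketch of putting a model on the Apple GPU through PyTorch's "mps" backend (the tiny Linear model is only a placeholder):

```python
import torch

# PyTorch 1.12+ exposes the Apple GPU as the "mps" device.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(8, 1).to(device)
x = torch.randn(32, 8, device=device)
y = model(x)
print(y.device)  # mps on Apple Silicon, cpu otherwise
```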
Thanks,
Bapi