Yeah. It worked by changing the setup.py file. Thanks a lot again.
Edit: some respite with Python 3.8.13, tensorflow-macos 2.8.0, tensorflow-metal 0.4.0.
Getting the exact same messages with Python 3.8.13, tensorflow-macos 2.8.0, and tensorflow-metal 0.4.0. Moreover, there is a tangible time gap between the epochs.
Looks like there is some scheduling issue! Mine stopped somewhere in the middle of epoch two, and I did not use a very large dataset. Does anyone know how to upload screenshots here?
I think the problem might be linked with the memory leakage issue (https://developer.apple.com/forums/thread/711753). Btw, when is tensorflow-metal==0.5.1 coming? Thanks!
TF-METAL==0.4.0 so far serves the purpose for me, along with TF-MACOS==2.8.0 and Python 3.8.13. But I am desperately looking to jump to TF-METAL==0.5.x or higher with TF-MACOS==2.9.x on Python 3.9.x, for faster performance from the GPU we paid for. Otherwise, my 2017 MBP-13 (Intel i5, 16GB RAM) does a decent job for smaller datasets.
I wish I could share a screenshot to showcase my observation in a better way!
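In case it helps anyone reproduce the combination that works for me, here is a minimal sanity check of the pinned versions above (just a sketch; the expected values are the ones from my setup):

```python
# Sanity check of the pinned combination from this thread
# (Python 3.8.13, tensorflow-macos 2.8.0, tensorflow-metal 0.4.0).
import sys
import tensorflow as tf

print(sys.version)                             # expect 3.8.13
print(tf.__version__)                          # expect 2.8.0
print(tf.config.list_physical_devices('GPU'))  # the Metal GPU should be listed
```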
Any update on this?
Any update/comment on this?
For the memory leakage issue, please search for the thread titled "Huge memory leakage issue with tf.keras.models.predict()".
I agree with you. Otherwise, what is the point of having such an "extraordinary GPU" that can beat an RTX 3090?
I have been stuck for the last few weeks due to the memory leakage issue (related to the GPU), and the GPUs are dead slow. Not only that, when the memory leakage reaches ~125GB out of 128GB on my Mac Studio, the training simply stops!!! I am utterly frustrated and disgusted!!! I should have gone with an Intel machine with a decent GPU instead of paying a hefty price for this "hyped GPU" and TF-METAL. :-(
My 64c GPU is sitting IDLE too! What METAL is it? :-(
Yes, you are right. I have just tested, and TF 2.10 / METAL 0.6 shows the same LEAKY behaviour with the GPU.
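A quick way to double-check that the leak is on the GPU path (just a diagnostic, not a fix) is to hide the Metal device and rerun the same script on CPU, watching whether memory stays flat:

```python
import tensorflow as tf

# Must run before any tensors/ops are created, otherwise the device
# configuration is already frozen for this process.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should show CPU only

# ... run the same training/predict code here and compare memory growth.
```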
When I started this thread almost 3 months ago, I thought they would address the issue (which seemed apparent from the enthusiastic comments of their dev-engineer). Now it looks like either they do not have the engineering resources to address it, or they quickly realised that managing TENSORFLOW is not their CUP of TEA (getting to the level of Google's TF engineers is a mammoth task). Grossly disappointed at spending ~$8K on an M1-Ultra machine (probably hype does not work all the time) as TF hardware.
Update from me!
I am fed up with TF-MACOS/METAL and have migrated to PyTorch 1.13 (I also tried the 1.14 dev version) in a Python 3.9/3.10 env. At least I can see my training running with MUCH, MUCH less memory usage while using the GPU (60-75% utilisation depending on the data) on my M1 ULTRA machine with the 64c GPU. I will try Python 3.11 soon (PyTorch does not support it yet) and update you all.
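For anyone making the same switch, here is a minimal sketch of putting a model on the Apple GPU through PyTorch's "mps" backend (the tiny Linear model is only a placeholder):

```python
import torch

# PyTorch 1.12+ exposes the Apple GPU as the "mps" device.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(8, 1).to(device)
x = torch.randn(32, 8, device=device)
y = model(x)
print(y.device)  # mps on Apple Silicon, cpu otherwise
```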
Thanks,
Bapi