Hi, could you be more specific about the device you are using? Is it a laptop (MBP/MBA) or a Mac mini? Detailed specs would also help, if possible.
Thanks
Hi @johnny_A,
Which Python version did you use? I want to give it a try (after months of lost hope with TF).
Thanks
Update from me!
I am fed up with TF-MACOS/METAL and have migrated to PyTorch 1.13 (I also tried the 1.14 dev version) in Python 3.9/3.10 environments. At least I can see my training running with MUCH, MUCH less memory usage while using the GPU (60-75% usage depending on the data) on my M1 Ultra machine with the 64-core GPU. I will soon try Python 3.11 (PyTorch does not support it yet) and update you all.
Thanks,
Bapi
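For anyone who wants to try the same move, here is a minimal sketch of how the Apple GPU can be selected in PyTorch. It assumes PyTorch 1.12 or later, where `torch.backends.mps.is_available()` exists; `pick_device` is just a helper name made up for this post, and it falls back to the CPU when PyTorch or the MPS backend is missing.

```python
def pick_device() -> str:
    """Return "mps" when PyTorch can use the Apple GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch >= 1.12 is installed
    except ImportError:
        return "cpu"
    # Older builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"training on: {device}")
```

The returned string can then be passed to `model.to(device)` and `tensor.to(device)` in the usual way.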
When I started this thread almost 3 months ago, I thought they would address the issue (that seemed apparent from the dev engineer's enthusiastic comments). Now it looks like either they do not have the engineering resources to address the issue, or they quickly realised that managing TENSORFLOW is not their CUP of TEA (getting to the level of Google's TF engineers is a mammoth task). I am grossly disappointed at having spent ~$8K on an M1 Ultra machine as TF hardware (hype probably does not work all the time).
Yes, you are right. I have just tested, and TF 2.10/METAL 0.6 shows the same LEAKY behaviour with the "GPU".
My 64-core GPU is sitting IDLE too! What kind of METAL is this? :-(
I agree with you. Otherwise, what is the point of having such an "extraordinary GPU" that can beat the RTX 3090?
I have been stuck for the last few weeks due to the memory leakage issue (related to the GPU), and the GPU is dead slow. Not only that, when the memory leakage reaches ~125GB out of 128GB on my Mac Studio, the training simply stops!!! I am utterly frustrated and disgusted!!! I should have gone with an Intel machine with a decent GPU instead of paying a hefty price for this "hyped GPU" and TF-METAL. :-(
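For anyone who wants to confirm the leak on their own machine, here is a rough sketch of how the process memory can be logged between repeated calls. It uses only the standard library; `track_memory` and `dummy_step` are made-up names, and `dummy_step` is a placeholder you would swap for your own `model.predict(...)` call. Note that `ru_maxrss` is the peak memory so far, so it can only grow or stay flat.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports ru_maxrss in bytes, Linux reports it in kilobytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_memory(step, n_calls=5):
    """Call step() n_calls times and record peak memory after each call."""
    readings = []
    for i in range(n_calls):
        step()
        readings.append(peak_rss_mb())
        print(f"call {i + 1}: peak RSS {readings[-1]:.1f} MB")
    return readings


def dummy_step():
    # Placeholder workload; replace with e.g. model.predict(batch).
    sum(range(10_000))


readings = track_memory(dummy_step)
```

If the readings keep climbing by a similar amount on every call with the same input, that points to a leak rather than a one-time allocation.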
For the memory leakage issue, please search for
"Huge memory leakage issue with tf.keras.models.predict()"
Any update/comment on this?
Any update on this?
I wish I could share a screenshot to showcase my observation in a better way!
TF-METAL==0.4.0 has served the purpose for me so far, along with TF-MACOS==2.8.0 and python==3.8.13. But I am desperately looking to jump to TF-METAL==0.5.x or higher with TF-MACOS==2.9.x on Python 3.9.x for faster performance with the GPU (which we paid for). Otherwise, my 2017 MBP 13 (Intel i5, 16GB RAM) does a decent job for smaller datasets.
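Since the behaviour seems to depend on the exact version combination, here is a small sketch for printing what is actually installed before posting a report. It relies only on the standard library (`importlib.metadata`, Python 3.8+); `installed_versions` is a made-up helper name, and packages that are not installed are reported as `None`.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["tensorflow-macos", "tensorflow-metal"])
print("python:", platform.python_version())
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```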
I think the problem might be linked to the memory leakage issue (https://developer.apple.com/forums/thread/711753). Btw, when is tensorflow-metal==0.5.1 coming? Thanks!
Looks like there is some scheduling issue! Mine stopped somewhere in the middle of epoch two, and I did not use a very large dataset. Does anyone know how to upload screenshots here?
I am getting the exact same messages with Python 3.8.13, tensorflow-macos 2.8.0, and tensorflow-metal 0.4.0. Moreover, there is a "tangible time gap between the epochs".