Seems like a nasty bug to me. Or a huge step backwards in user experience. Apple please fix this!
This issue was apparently raised in the beta version, but I don't run beta software, and still ran into this in the released version of Xcode in the course of a normal OS upgrade (13.5 → 13.5.2?)
Post
Replies
Boosts
Views
Activity
Just FYI, 3 months ago I saw a similar leakage-until-mem-full problem on GPU. I found this forum and a now 4 month old report of what seemed an identical problem by user @wangcheng — The new tensorflow-macos and tensorflow-metal incapacitate training. I've been able to limp along by switching to CPU-only prediction since then.
I assume that TensorFlow on Metal on Apple Silicon is not a huge priority for Apple. Still, clearly some skilled engineering effort went into the current version. Which led some of us to buy M1 machines hoping to run our TensorFlow code on it. I write just in hopes this bug can be addressed “soonish.”
Rereading @wangcheng’s report from two months ago, it says “If I train the model on CPU, it can finish training, and surprisingly near 20% faster each epoch.” I tried this in my code. I saw what I assume is more typical, on GPU my epochs were taking about 10 minutes (before it quit with the IOGPUResource), whereas on CPU it seems to be taking about 45 minutes per epoch.
I had come to his forum to report my problem, but reading recent posts, this one seems nearly identical.
A few days ago I followed the instructions at https://developer.apple.com/metal/tensorflow-plugin/ and was able to run the simple (MNIST) proof-of-life suggested here https://caffeinedev.medium.com/how-to-install-tensorflow-on-m1-mac-8e9b91d93706
Then I tried to run my own Jupyter notebook I was using on Colab. It ran for 16 minutes then hung, the GPU usage dropped to zero, nothing further happened. This sounds very close to what @wangcheng reported last month.
I am definitely not an expert, but: I saw this yesterday. It seemed to be encouraging me to retry, so I did. Same result. I tried several more times with increasing retry delays (an hour later, two hours later...). This morning I tried again, just for the heck of it, and it worked fine.
I suggest simply retrying the conda install -c apple tensorflow-deps
This is so bizarre. The “traditional” bug reporting paths (e.g. https://bugreport.apple.com ) tell you to use the iOS Feedback Assistant app and why it is so much more convenient and preferred (“Automatic on-device diagnostics! Remote filing! More detailed forms! More feedback statuses!” Oh boy!)
But that app is not installed on my iPhone, and when I tried to install it from App Store, it is not even listed.
Make up your mind Apple, do you want us to use the Feedback Assistant app, or is this just your idea of a little joke?