I have similar problem with my 14'' Macbook Pro. My office VPN server setup doesn't have IPSec so I have the following in my /etc/ppp/options
plugin L2TP.ppp
l2tpnoipsec
I can connect to the server, and ping internal hosts, but ssh or curl would timeout. The problem still exists on 12.1.
I have filed a bug but no feedback yet (FB9810608).
I have found a workaround for me, according to https://discussionschinese.apple.com/thread/253302896?answerId=256624128322#256624128322
Turn off your WiFi on you MacBook, connect to your iPhone over physical cable and turn on personal hotspot on your iPhone.
Post
Replies
Boosts
Views
Activity
I ran my notebook again, observed that the training progress paused once the count reached 200000.
So maybe the tensorflow-metal is indeed leaking kernel resources which prevents training to continue at some point.
No, that script was for reproducing the groups arg issue only.
Here is the script to reproduce this problem. You need to obtain the data from kaggle first. https://www.kaggle.com/competitions/tabular-playground-series-may-2022/data
On my macbook pro, after running the script for around 10 mins, I started to see the leaking IOGPUResource messages from Console.app.
Hardware info: 14' M1 Pro, 32G, 10 core CPU, 16 core GPU
Software info: macOS 12.4, python 3.9.7, tensorflow-macos 2.9.2, tensorflow-metal 0.5.0
If I train the model on CPU, it can finish training, and surprisingly near 20% faster each epoch.
python script
Hi apple engineer, I think I've found out the issue:tensorflow-metal is leaking GPU resource.
Run that script for 20s and I can see a bunch of IOReturn IOGPUDevice::new_resource(IOGPUNewResourceArgs *, struct IOGPUNewResourceReturnData *, IOByteCount, uint32_t *): PID 3963 likely leaking IOGPUResource (count=101000) messages in Console.app.
See also https://developer.apple.com/forums/thread/706920
Run your code again, and open Console.app, start streaming messages with Errors and Faults filters.
If you see a lot of
IOReturn IOGPUDevice::new_resource(IOGPUNewResourceArgs *, struct IOGPUNewResourceReturnData *, IOByteCount, uint32_t *): PID ???? likely leaking IOGPUResource (count=????)
after some time, we are experiencing the same issue.
In my case, when the count above reaches 200000, the training will stop making progress.
See https://developer.apple.com/forums/thread/706920
Maybe your macOS version is not compatible? You need at least macOS 12.0+. https://developer.apple.com/metal/tensorflow-plugin/
I'm using 12.4 and no issue importing tensorflow.