M1 GPU is extremely slow, how can I enable CPU to train my NNs?

Hi everyone,

I found that the performance of the GPU is not as good as I expected (as slow as a turtle), so I want to switch from GPU to CPU, but the mlcompute module cannot be found, which is so weird.

The same code takes 156 s per epoch on Colab versus 40 minutes per epoch on my computer (JupyterLab).

I only used a small dataset (a few thousand data points), and each epoch only has 20 batches.

I am so disappointed, and it seems like the "powerful" GPU is a joke.

I am using macOS 12.0.1, and the version of tensorflow-macos is 2.6.0.

Can anyone tell me why this happens?

It seems like a small batch size reduces GPU performance (https://developer.apple.com/forums/thread/685623), so I increased the batch size from 256 to 1024, which reduced the running time from 40 minutes to 10 minutes per epoch. However, one epoch still only takes around 2 minutes on the CPU.

I am so confused now. It seems like I would need to increase the batch size from 1024 to 1024 * 5 just to get the running time down to 2 minutes per epoch...
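
For reference, this is roughly where the batch size gets set (a minimal sketch; model, x_train, and y_train are placeholders, not my actual code):

# Larger batches give the Metal GPU more work per step, which is what
# reduced the epoch time above. model/x_train/y_train are placeholders.
model.fit(x_train, y_train, batch_size=1024, epochs=10)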

Update: I found that the M1 chip is extremely slow on LSTMs compared with CNNs.

Update: I ran exactly the same LSTM code on a MacBook Pro (M1 Pro) and a MacBook Pro (2017). It turns out the M1 Pro takes 6 hours for one epoch, while the 2017 model only needs 158 s.
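
In case anyone wants to reproduce the comparison, here is a minimal self-contained sketch that times one LSTM epoch on the CPU and on the GPU (the layer sizes, sequence length, and random data are made up, not my actual model):

import time
import tensorflow as tf

def time_one_epoch(device):
    # Toy benchmark: one LSTM layer trained on random data on the given device.
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(100, 32)),
            tf.keras.layers.LSTM(128),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        x = tf.random.normal((2048, 100, 32))
        y = tf.random.normal((2048, 1))
        start = time.time()
        model.fit(x, y, batch_size=256, epochs=1, verbose=0)
        return time.time() - start

print("CPU:", time_one_epoch("/cpu:0"))
print("GPU:", time_one_epoch("/gpu:0"))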

I used pip uninstall tensorflow-metal and I got CPU acceleration again!

An alternative to uninstalling tensorflow-metal is to disable GPU usage. This is a copy-paste from my other post...

To disable the GPU completely on the M1 use tf.config.experimental.set_visible_devices([], 'GPU'). To disable the GPU for certain operations, use:

with tf.device('/cpu:0'):
    # tf calls here
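
And a minimal sketch of the process-wide option, assuming it runs before any ops are created:

import tensorflow as tf

# Hide all GPUs from TensorFlow so every op falls back to the CPU.
# Call this before creating any tensors, layers, or models.
tf.config.experimental.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())  # should now list only CPU devices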

What is the point of having a "GPU"? My Mac Studio M1 Ultra (20-core CPU, 64-core GPU) is dead slow while training, slower than even my MBP13 (2017) for the same code and the same data points!!! What is going on?

Using the CPU isn't a solution. It is just a workaround.

LSTM takes 3 hours per epoch on the GPU and 3 minutes on the CPU.

I am so frustrated.

I've got an M2 Max here (2023). I tried to run inference (one sample at a time, no batching) using the Hugging Face "distilbert-base-cased" model (after fine-tuning on my dataset). It runs at 10 it/s in the beginning, but after a few minutes GPU utilization drops to less than 1%, and now it takes >1 s per iteration! That's a huge disappointment. I don't know what I have done wrong. I tried turning on an external fan, thinking it might be thermal throttling, but utilization did not go back up.

How can I debug this?
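
Two things I would try first (just a sketch, not a definitive fix): ask TensorFlow to log which device each op lands on, and time every single-sample call to see exactly when the slowdown starts. predict_one and samples below are hypothetical placeholders for your fine-tuned model call and your data:

import time
import tensorflow as tf

# Print the device (CPU vs GPU) each op is placed on.
tf.debugging.set_log_device_placement(True)

def timed_loop(predict_one, samples):
    # predict_one and samples are placeholders for your own model call and data.
    for i, s in enumerate(samples):
        start = time.perf_counter()
        predict_one(s)
        print(f"iteration {i}: {time.perf_counter() - start:.3f} s")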

I don't think this is the right question - the integrated GPU will be useless for ML work as it's not optimized for it. We need to use the Apple Neural Engine - it's 16 cores optimized for ML tasks.
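
If you want to try the Neural Engine, one route (for inference only, not training) is converting the trained model to Core ML. A minimal sketch, assuming coremltools is installed and keras_model stands in for your trained model:

import coremltools as ct

# Convert the trained model to a Core ML program and let the runtime
# schedule it across CPU, GPU, and the Neural Engine.
mlmodel = ct.convert(keras_model, convert_to="mlprogram",
                     compute_units=ct.ComputeUnit.ALL)
mlmodel.save("model.mlpackage")  # placeholder file name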
