Hi, I have started experimenting with using my MBP with M1 Pro (10CPU cores / 16 GPU cores) for Tensorflow.
Two things were odd/noteworthy:
I've compared training models in a tensorflow environment with tensorflow-metal, running the code with either
with tf.device('gpu:0'): or
with tf.device('cpu:0'):
as well as in an environment without the tensorflow-metal plugin. Specifying the device as CPU in tf-metal almost always leads to much longer training times than specifying the GPU, but also than running in the standard (non-metal) environment. Also, the GPU was drawing quite a lot of power despite TF being told to use the CPU. Is this intended or expected behaviour? If so, it would be preferable to use the non-metal environment whenever the GPU isn't beneficial.
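For reference, a minimal, self-contained sketch of the kind of comparison I mean (a toy dense model and random data, not my actual code; tf.config.set_soft_device_placement(True) lets the GPU pin fall back to CPU on machines without a GPU):

```python
import time
import tensorflow as tf

# Allow the gpu:0 pin to fall back to CPU if no GPU device exists.
tf.config.set_soft_device_placement(True)

def time_training(device, batch_size=32, steps=20):
    """Time one epoch of a small dense model with ops pinned to `device`."""
    tf.keras.backend.clear_session()
    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        # Random stand-in data, sized so one epoch is `steps` batches.
        x = tf.random.normal((batch_size * steps, 784))
        y = tf.random.uniform((batch_size * steps,), maxval=10, dtype=tf.int32)
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        return time.perf_counter() - start

for dev in ("cpu:0", "gpu:0"):
    print(dev, time_training(dev))
```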
Secondly, at small batch sizes, the GPU power shown in the system stats increases with the batch size, as expected. However, when changing the batch size from 9 to 10 (there appears to be a hard step specifically at this number), GPU power drops by about half and training time doubles. Increasing the batch size beyond 10 again leads to a gradual increase in GPU power; on my model, the GPU power seen at batch size 9 is reached again only at about batch size 50. This makes GPU acceleration rather useless for batch sizes from 10 to about 50. I've noticed this behaviour on several models, which makes me wonder whether it is a general tf-metal behaviour. As a result, I've only been able to benefit from GPU acceleration at a batch size of 9 and above 100. Once again, is this intended or to be expected?
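The step is easy to reproduce with a throughput sweep like the following (again a hedged sketch with a toy model and random data, not my real workload; the batch sizes in the loop are just the ones that bracket the 9-to-10 jump I'm describing):

```python
import time
import tensorflow as tf

# Fall back to CPU gracefully on machines without a GPU device.
tf.config.set_soft_device_placement(True)

def samples_per_second(batch_size, steps=30):
    """Measure training throughput (samples/s) at a given batch size."""
    tf.keras.backend.clear_session()
    with tf.device("gpu:0"):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
        x = tf.random.normal((batch_size * steps, 784))
        y = tf.random.uniform((batch_size * steps,), maxval=10, dtype=tf.int32)
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)  # warm-up
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        return (batch_size * steps) / (time.perf_counter() - start)

for bs in (8, 9, 10, 11, 16, 32, 64, 128):
    print(f"batch_size={bs}: {samples_per_second(bs):.0f} samples/s")
```

On my machine, the samples/s figure drops sharply between batch size 9 and 10 while GPU power in the system stats halves.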