It looks like the Python version matters.
My environment: MacBook Pro 14-inch, 2021, M1 Pro, 16 GB
Using this code example, I created two virtual environments:
Python 3.8.19
Python 3.11.9
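In case anyone wants to reproduce this: the two environments may not differ in Python version alone, since pip can resolve different TensorFlow/Keras releases per interpreter (the progress bars in the results have different formats, which hints at that). Here is a minimal sketch for recording what each environment actually has installed. The package names checked (`tensorflow`, `tensorflow-macos`, `tensorflow-metal`, `keras`) are my assumption about what is relevant; adjust them to your setup.

```python
import platform
from importlib import metadata

# Packages assumed to be relevant for this comparison (not confirmed by the post).
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal", "keras")


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment():
    """Collect the interpreter, architecture, and package versions in one dict."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
    }
    for name in PACKAGES:
        info[name] = package_version(name)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it once inside each virtual environment and compare the output side by side.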
Results
Python 3.8 (CPU)
Epoch 1/5
782/782 [==============================] - 403s 513ms/step - loss: 4.8157 - accuracy: 0.0648
Python 3.8 (GPU)
Epoch 1/5
2024-07-22 21:35:48.809586: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
782/782 [==============================] - 64s 77ms/step - loss: 4.9219 - accuracy: 0.0574
Python 3.11 (CPU)
Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 435s 544ms/step - accuracy: 0.0480 - loss: 5.0793
Python 3.11 (GPU)
Epoch 1/5
2024-07-22 21:48:42.497240: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
782/782 ━━━━━━━━━━━━━━━━━━━━ 412s 472ms/step - accuracy: 0.0487 - loss: 5.1804
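I did not include results for the Python versions between 3.8 and 3.11, but they behave the same as 3.11: slow. On Python 3.8 the GPU epoch took 64s versus 403s on CPU, while on Python 3.11 it took 412s versus 435s. It looks like tensorflow-metal makes good use of the Apple Silicon GPU only on Python 3.8 🤷‍♂️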
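To put a number on the difference, here is a small sketch that computes the GPU speedup from the first-epoch times in the logs above. The figures are copied from those logs; the dictionary layout and function name are just for illustration.

```python
# First-epoch wall-clock times in seconds, taken from the logs in this post.
EPOCH_SECONDS = {
    "3.8": {"cpu": 403, "gpu": 64},
    "3.11": {"cpu": 435, "gpu": 412},
}


def gpu_speedup(times):
    """Return how many times faster the GPU epoch was than the CPU epoch."""
    return times["cpu"] / times["gpu"]


if __name__ == "__main__":
    for version, times in EPOCH_SECONDS.items():
        print(f"Python {version}: GPU is {gpu_speedup(times):.2f}x faster than CPU")
    # Python 3.8: GPU is 6.30x faster than CPU
    # Python 3.11: GPU is 1.06x faster than CPU
```

So the GPU gives roughly a 6x speedup on 3.8 and almost none on 3.11, at least for the first epoch of this one example.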