Update: I was running these tests on macOS 11.0. After updating to 11.4, the GPU tests started working, and they ran much faster than on the CPU (for my application). I'm not sure which specific update fixed the issue; maybe someone can let me know. Also, the tensorflow-metal page lists macOS 12.0 as the required OS, which is currently only available as a beta. I'm not sure why that is, given that it works for me on 11.4.
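
For anyone who wants to check the same thing on their own machine, here is a rough sketch that prints the macOS version and the GPUs TensorFlow can see. The helper names (`parse_version`, `meets_minimum`, `visible_gpus`) are made up for this example, and the 11.4 threshold is only the version that happened to work for me, not an official requirement.

```python
import platform


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '11.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two dotted version strings component by component."""
    return parse_version(version) >= parse_version(minimum)


def visible_gpus():
    """Return the GPUs TensorFlow reports, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    # platform.mac_ver() returns an empty version string on non-macOS systems.
    mac_version = platform.mac_ver()[0]
    print("macOS version:", mac_version or "not macOS")
    if mac_version:
        # 11.4 is the version that worked in my case (an assumption, not a documented minimum).
        print("At least 11.4:", meets_minimum(mac_version, "11.4"))
    print("GPUs visible to TensorFlow:", visible_gpus())
```

If the GPU list comes back empty on a machine where tensorflow-metal is installed, the OS version is one thing worth checking.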