I'd like to control whether network training runs on the CPU or the GPU when using tensorflow-metal.
How can I do this?
Thanks!
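One approach that may work, assuming tensorflow-metal exposes the Apple GPU as an ordinary TensorFlow `GPU` device: either hide the GPU with `tf.config.set_visible_devices` or pin the work with `tf.device`. The helper names `pick_device` and `train_on`, and the tiny model and random data, are hypothetical and only there for illustration. This is a sketch, not a confirmed recipe for every tensorflow-metal version.

```python
def pick_device(use_gpu: bool) -> str:
    """Return the TensorFlow device string for the requested target (hypothetical helper)."""
    return "/GPU:0" if use_gpu else "/CPU:0"


def train_on(use_gpu: bool):
    """Train a toy model on the chosen device (hypothetical helper)."""
    # Imported lazily so the helper above can be used without TensorFlow loaded.
    import tensorflow as tf

    if not use_gpu:
        # Hide all GPUs from TensorFlow. This must run before TensorFlow has
        # initialized its devices, otherwise it raises a RuntimeError.
        tf.config.set_visible_devices([], "GPU")

    # Pin model creation and training to the selected device.
    with tf.device(pick_device(use_gpu)):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(4,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="sgd", loss="mse")
        # Random placeholder data, only to make the sketch runnable.
        x = tf.random.normal((32, 4))
        y = tf.random.normal((32, 1))
        model.fit(x, y, epochs=1, verbose=0)
    return model


if __name__ == "__main__":
    train_on(use_gpu=False)  # force CPU training
```

Calling `tf.config.list_physical_devices("GPU")` beforehand shows whether a GPU device is visible at all, which may help confirm that the switch took effect.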
When I train a model (private, for work) using Apple's TensorFlow build, I get an error like this:
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG13XFamilyCommandBuffer: 0x355c49fc0>
    label = <none>
    device = <AGXG13XDevice: 0x10d981400>
        name = Apple M1 Pro
    commandQueue = <AGXG13XFamilyCommandQueue: 0x11dedb600>
        label = <none>
        device = <AGXG13XDevice: 0x10d981400>
            name = Apple M1 Pro
    retainedReferences = 1
When I run the same script on a server with a GeForce GPU, it works fine.
The error already occurs during the first epoch. Memory also appears to leak: usage starts at 3 GB and reaches 20 GB within that epoch.
Does anyone know how to deal with this problem? Thank you!