dmitry-kabanov’s Profile | Apple Developer Forums

Apple TensorFlow Internal Error (0000000e:Internal Error)

When I train a model (private, for work) using Apple TensorFlow, I get an error like this:

    The Metal Performance Shaders operations encoded on it may not have completed.
    Error:
    (null)
    Internal Error (0000000e:Internal Error)
    <AGXG13XFamilyCommandBuffer: 0x355c49fc0>
        label = <none>
        device = <AGXG13XDevice: 0x10d981400>
            name = Apple M1 Pro
        commandQueue = <AGXG13XFamilyCommandQueue: 0x11dedb600>
            label = <none>
            device = <AGXG13XDevice: 0x10d981400>
                name = Apple M1 Pro
        retainedReferences = 1

When I run the same script on a server with a GeForce GPU, it works fine.

The error already occurs during the first epoch. Memory also appears to leak: usage starts at 3 GB and reaches 20 GB within that epoch.

Does anyone know how to deal with this problem? Thank you!

Machine Learning & AI General tensorflow-metal

1.1k

Sep ’22

How to choose whether I use M1 CPU or GPU in tensorflow-metal?

I'd like to control whether network training runs on the CPU or the GPU when using tensorflow-metal. How can I do this? Thanks!
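For reference, one common way to do this in TensorFlow generally is the `tf.device` context manager. The sketch below is not taken from this thread and is not specific to tensorflow-metal. It assumes that, with the plugin installed, the M1 GPU is exposed under the usual `/GPU:0` name. The helper names `device_string` and `run_on` are hypothetical and exist only for illustration.

```python
def device_string(use_gpu: bool) -> str:
    """Return the TensorFlow device name for the chosen processor.

    Assumption: the Metal GPU is registered as "/GPU:0".
    """
    return "/GPU:0" if use_gpu else "/CPU:0"


def run_on(use_gpu: bool):
    """Run a small matrix product on the selected device (hypothetical helper)."""
    # Imported here so that device_string() can be used without TensorFlow.
    import tensorflow as tf

    # tf.device requests that ops created inside the block run on that device.
    with tf.device(device_string(use_gpu)):
        a = tf.random.normal((256, 256))
        b = tf.random.normal((256, 256))
        return tf.matmul(a, b)


if __name__ == "__main__":
    result = run_on(use_gpu=False)
    # Prints the device on which the result tensor was placed.
    print(result.device)
```

Model building and `model.fit(...)` could be wrapped in the same `with tf.device(...)` block. Another option is to hide the GPU for the whole process by calling `tf.config.set_visible_devices([], "GPU")` at the start of the script, before any tensors or models are created.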

Machine Learning & AI General tensorflow-metal

2.2k

Jun ’22
