While experimenting with the TensorFlow text_classification tutorial (https://www.tensorflow.org/tutorials/keras/text_classification), I consistently get the following error after increasing the batch size to 512:
Epoch 2/10
5/40 [==>...........................] - ETA: 5s - loss: 0.6887 - binary_accuracy: 0.7086
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Internal Error (0000000e:Internal Error)
<AGXG13XFamilyCommandBuffer: 0x2e1897c10>
label = <none>
device = <AGXG13XDevice: 0x119460c00>
name = Apple M1 Max
commandQueue = <AGXG13XFamilyCommandQueue: 0x11946e400>
label = <none>
device = <AGXG13XDevice: 0x119460c00>
name = Apple M1 Max
retainedReferences = 1
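For reference, the 40 steps per epoch shown in the log are what I would expect for a batch size of 512, so the larger batch size does seem to be in effect when the crash happens. This is a minimal sketch of that arithmetic, assuming the tutorial's default split (25,000 training reviews with 20% held out for validation). The function name is only illustrative and is not part of TensorFlow.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


# Assumption: the tutorial's default split, 25,000 reviews with 20% held out.
num_train = 25_000 - 25_000 // 5  # 20,000 training examples

print(steps_per_epoch(num_train, 32))   # tutorial default batch size -> 625
print(steps_per_epoch(num_train, 512))  # batch size of the failing run -> 40
```

With the default batch size of 32 the same data gives 625 steps per epoch, so the "5/40" progress line matches the modified run rather than the unmodified tutorial.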
I get the same error with other experiments that run fine on other GPUs/systems.
How should this error be interpreted? Are there any workarounds?
Setup:
TensorFlow 2.6.0 (installed as described here)
Apple M1 Max, 64 GB
macOS Monterey 12.0.1