Hi, I am getting this error with the test script from the tensorflow-metal plugin page. I have a Power Mac M3 running macOS 14.4 (the latest at this time). Unfortunately, I created another thread: https://developer.apple.com/forums/thread/748413. Should I close that one?
TensorFlow Metal was working GREAT on my Power Mac M3 until Tuesday. Then my code started freezing. I ran the test script from https://developer.apple.com/metal/tensorflow-plugin/ and it now crashes. It used to work fine, but all of a sudden it does not. The results are shown below.
Were there ever any answers to the previous posts? Could this be a hardware problem?
The test script is just this:
import tensorflow as tf
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
The errors I get are like the following:
Epoch 1/5
1/782 [..............................] - ETA: 51:53 - loss: 6.0044 - accuracy: 0.0312
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x1172515e0>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
retainedReferences = 1
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x117257b40>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
retainedReferences = 1
I have fixed this with two changes:
Python 3.8 rather than 3.9 (specifically 3.8.18, which is the latest at this time)
pandas 1.5.3 rather than 2.x
As a result of this I'm on the following tensorflow package versions:
tensorboard==2.13.0
tensorboard-data-server==0.7.2
tensorflow==2.13.0
tensorflow-datasets==4.9.2
tensorflow-estimator==2.13.0
tensorflow-macos==2.13.0
tensorflow-metadata==1.14.0
tensorflow-metal==1.0.1
With these, everything works. I still have no idea why Python 3.9 stopped working after working fine for months, but I wasn't particularly attached to it.
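For anyone who wants to check whether their environment matches the versions listed above, here is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The `PINNED` dictionary and the helper names `installed_version` and `find_mismatches` are my own illustration and not part of TensorFlow or the plugin. It covers only a few of the packages from the list, plus the pandas version mentioned above.

```python
import sys
from importlib import metadata

# Versions reported as working in this reply (subset of the list above).
PINNED = {
    "tensorflow": "2.13.0",
    "tensorflow-macos": "2.13.0",
    "tensorflow-metal": "1.0.1",
    "pandas": "1.5.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs.

    `lookup` is a hypothetical hook so the check can be exercised without
    the packages actually being installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If nothing is printed after the Python version line, the installed packages match the pinned versions. This only compares version strings. It does not tell you why a given combination fails on the GPU.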