I'm having the same issue. Has anybody found a solution in the past year? I'm really "sad" (otherwise this post will get "reviewed") that a $3k+ PC is not able to do such basic things that NVIDIA and others handle without any sort of issue, and that the only fix is to avoid using the GPU with an old tensorflow-metal, which was the whole point of buying the Pro instead of the Air!
Hi there,
I have a much simpler snippet that causes the same error, without external datasets:
import tensorflow as tf
import tensorflow.keras as K
import numpy as np

num_words = 10000
(X_train, y_train), (X_test, y_test) = K.datasets.imdb.load_data(num_words=num_words)

# Split the test set into validation and test halves
X_valid, X_test = X_test[:12500], X_test[12500:]
y_valid, y_test = y_test[:12500], y_test[12500:]

# Pad/truncate all reviews to a fixed length
maxlen = 500
X_train_trim = K.preprocessing.sequence.pad_sequences(X_train, maxlen=maxlen)
X_test_trim = K.preprocessing.sequence.pad_sequences(X_test, maxlen=maxlen)
X_valid_trim = K.preprocessing.sequence.pad_sequences(X_valid, maxlen=maxlen)

model_K = K.models.Sequential([
    K.layers.Embedding(input_dim=num_words, output_dim=10),
    K.layers.SimpleRNN(32),
    K.layers.Dense(1, "sigmoid"),
])
model_K.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# SimpleRNN crashes on the M1 GPU, so force the CPU here
with tf.device("/device:CPU:0"):
    history_K = model_K.fit(X_train_trim, y_train, epochs=10, batch_size=128,
                            validation_data=(X_valid_trim, y_valid))
In addition to this, SimpleRNN does not work on the M1 GPU whatsoever (hence the tf.device), as reported here: https://github.com/tensorflow/tensorflow/issues/56082 (LSTM, on the other hand, works fine).
I initially thought this might be due to graph creation, since a simple re-implementation of SimpleRNN has the same issue; however, that hypothesis doesn't really hold, otherwise LSTM would be affected too.
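For reference, this is the recurrence SimpleRNN computes, sketched in plain NumPy (the helper name and shapes are my own, not from the snippet above), in case anyone wants to reproduce the re-implementation experiment:

```python
import numpy as np

def simple_rnn_forward(x, W, U, b):
    """Final hidden state of a simple RNN over a sequence.

    x: (timesteps, input_dim) input sequence
    W: (input_dim, units) input kernel
    U: (units, units) recurrent kernel
    b: (units,) bias
    Matches SimpleRNN(return_sequences=False) with tanh activation.
    """
    h = np.zeros(U.shape[0])
    for x_t in x:
        # h_t = tanh(x_t . W + h_{t-1} . U + b)
        h = np.tanh(x_t @ W + h @ U + b)
    return h
```

Wrapping this same recurrence in a custom Keras layer is what triggered the identical GPU error for me.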
Sure, let me know if you need more info about this
Hi there. The problem is very sporadic: it happens during the training of a heavy TF model and is not really deterministic. I can provide a link to a ZIP file with a Jupyter notebook and the dataset.
Alternatively, since the images come from the facades dataset, I can just share the code. The dataset is downloadable from https://www.kaggle.com/datasets/balraj98/facades-dataset, and you need to place it in the directory of the notebook, so something like this:
...
├── notebook.ipynb
└── dataset
├── trainA
├── trainB
├── testA
└── testB
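In case it helps, here is a quick sanity check for that layout (a hypothetical helper, not part of the notebook):

```python
from pathlib import Path

def check_dataset_layout(root="dataset",
                         expected=("trainA", "trainB", "testA", "testB")):
    """Return the list of expected subfolders missing under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list means the dataset is placed correctly next to the notebook.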
the whole code can be downloaded from here: https://drive.google.com/file/d/1Clqf1uSzMIntA551dp8B1Z-hZFPAa8VL/view?usp=sharing
It requires only basic packages, and the likelihood of seeing that error message is directly proportional to the batch size (so I suspect it has something to do with memory).
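Since the failure rate scales with batch size, one rough way to probe the memory hypothesis is to walk down batch sizes until a run completes. A sketch (hypothetical helper; train_once stands in for a short model.fit run, and this assumes the Metal error surfaces as a Python exception):

```python
def find_stable_batch_size(train_once, batch_sizes=(256, 128, 64, 32, 16)):
    """Try each batch size, largest first, and return the first one that
    trains without raising, or None if all of them fail.

    train_once(batch_size) is any callable wrapping a short training run.
    """
    for bs in batch_sizes:
        try:
            train_once(bs)
            return bs
        except Exception:
            continue  # this size failed; try the next smaller one
    return None
```

If a clear threshold emerges (e.g. 128 always fails, 64 never does), that would support the memory theory.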
My PC is a 2021 16" MacBook Pro, M1 Max with 26-core GPU, 32GB RAM and 2TB SSD, running macOS 12.4 (21F79).