Reply to GPU training deadlock with tensorflow-metal 0.5
I'm having the same issue. Has anybody found a solution in the past year? I'm really "sad" (otherwise the post will be "reviewed") that a 3k+ machine is not able to do such basic things that NVIDIA and others handle without any sort of issue, and that the only fix is to avoid using the GPU with an old tf-metal, which was the whole point of buying the Pro instead of the Air!
Jan ’23
Reply to Tensorflow metal: The Metal Performance Shaders operations encoded on it may not have completed.
Hi there, I have a much simpler snippet that causes the same error, without external datasets:

    import tensorflow as tf
    import tensorflow.keras as K
    import numpy as np

    num_words = 10000
    (X_train, y_train), (X_test, y_test) = K.datasets.imdb.load_data(num_words=num_words)

    # Split the original test set in half: validation and test
    (X_valid, X_test) = X_test[:12500], X_test[12500:]
    (y_valid, y_test) = y_test[:12500], y_test[12500:]

    # Pad/truncate every review to the same length
    maxlen = 500
    X_train_trim = K.preprocessing.sequence.pad_sequences(X_train, maxlen=maxlen)
    X_test_trim = K.preprocessing.sequence.pad_sequences(X_test, maxlen=maxlen)
    X_valid_trim = K.preprocessing.sequence.pad_sequences(X_valid, maxlen=maxlen)

    model_K = K.models.Sequential([
        K.layers.Embedding(input_dim=num_words, output_dim=10),
        K.layers.SimpleRNN(32),
        K.layers.Dense(1, activation="sigmoid")
    ])
    model_K.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])

    # Training is pinned to the CPU because SimpleRNN fails on the M1 GPU
    with tf.device("/device:CPU:0"):
        history_K = model_K.fit(X_train_trim, y_train, epochs=10, batch_size=128,
                                validation_data=(X_valid_trim, y_valid))

In addition to this, SimpleRNN does not work on the M1 GPU whatsoever (hence the tf.device), as reported here: https://github.com/tensorflow/tensorflow/issues/56082 (LSTM, on the other hand, works fine).

I thought this might be due to the graph creation, since a simple reimplementation of SimpleRNN has the same issue. However, that explanation does not really hold, because otherwise LSTM would have the same issue.
Aug ’22
Reply to Tensorflow metal: The Metal Performance Shaders operations encoded on it may not have completed.
Hi there,

The problem is very sporadic. It happens during the training of a heavy TF model and is not really deterministic. I can provide a link to a ZIP file with the Jupyter notebook and the dataset. Alternatively, since the images come from the facades dataset, I can share just the code: the dataset can be downloaded from https://www.kaggle.com/datasets/balraj98/facades-dataset, and you need to place it in the notebook's directory, so something like this:

    ...
    ├── notebook.ipynb
    └── dataset
        ├── trainA
        ├── trainB
        ├── testA
        └── testB

The whole code can be downloaded from here: https://drive.google.com/file/d/1Clqf1uSzMIntA551dp8B1Z-hZFPAa8VL/view?usp=sharing

It requires only basic packages, and the likelihood of seeing that error message is directly proportional to the batch size (so I suspect it has something to do with memory).

My machine is a 2021 16" MacBook Pro, M1 Max, 26 core GPU, 32 GB RAM, 2 TB SSD, running macOS 12.4 (21F79).
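In case it helps anyone reproducing this, the directory layout above can be checked before opening the notebook. This is only a minimal sketch: the four folder names come from the layout described in the post, while the function name `missing_dataset_dirs` and the default folder name argument are made up for illustration and are not part of the shared notebook.

```python
from pathlib import Path

# Subdirectories named in the post's layout; everything else is illustrative.
EXPECTED_SUBDIRS = ("trainA", "trainB", "testA", "testB")


def missing_dataset_dirs(notebook_dir, dataset_name="dataset"):
    """Return the expected dataset subdirectories that are absent.

    notebook_dir is the folder containing notebook.ipynb; dataset_name is
    the folder next to it that should hold the facades images.
    """
    root = Path(notebook_dir) / dataset_name
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical usage: run from the folder that contains notebook.ipynb.
    missing = missing_dataset_dirs(".")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty list means all four folders are in place; it says nothing about whether the images inside them are complete.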
Aug ’22