Reply to GPU training deadlock with tensorflow-metal 0.5
Regarding the deadlock, it seems I found a way around it accidentally. You have to include a line that explicitly says you want to use the GPU, especially if, like me, you work cell by cell. Example below:

with tf.device('/gpu:0'):
    <write your model here>

Then you do other things in your notebook, like batching and such. Then you train your model:

with tf.device('/gpu:0'):
    hist_1 = model_1.fit(...)

Somehow, this stopped my deadlock. In addition (and I don't know if it is related, but just in case), I stopped using Safari for my Jupyter notebook and switched to Chrome instead (not for this reason, but mainly because Safari kept reloading my "heavy" notebook). Hope this helps, cheers
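The pattern described above can be sketched as a self-contained script. Only the tf.device('/gpu:0') scoping comes from the reply; the toy model, layer sizes, and random data below are hypothetical, purely to make the example runnable (TensorFlow's soft device placement will fall back to CPU if no GPU is visible):

```python
import numpy as np
import tensorflow as tf

# Cell 1: build and compile the model inside an explicit GPU device scope
with tf.device('/gpu:0'):
    model_1 = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model_1.compile(optimizer='adam', loss='mse')

# Cell 2: prepare data, batching, etc. (random placeholder data here)
x = np.random.rand(32, 4).astype('float32')
y = np.random.rand(32, 1).astype('float32')

# Cell 3: train inside the same explicit GPU device scope
with tf.device('/gpu:0'):
    hist_1 = model_1.fit(x, y, epochs=1, batch_size=8, verbose=0)
```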
Sep ’22
Reply to GPU training deadlock with tensorflow-metal 0.5
Hello, I don't know if it is for the same reason, but I tried to fine-tune a BERT model and at some point I also hit a deadlock after some time (I need to kill the kernel and start over). The deadlock happens depending on the quantity of data used for fine-tuning. In the case below, training stops in the middle of the 3rd epoch.

My machine: macOS 12.5, MacBook Pro, Apple M1 Max

I use:
python                    3.10.5
tensorflow-macos          2.9.2
tensorflow-metal          0.5.0
tokenizers                0.12.1.dev0
transformers              4.22.0.dev0

Data: https://www.kaggle.com/datasets/kazanova/sentiment140
Quantity of tweets used: 11200

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)
model.fit(
    tf_train_dataset,
    validation_data=tf_validation_dataset,
    epochs=4,
)
Aug ’22