Regarding the deadlock, it seems I accidentally found a way around it. You have to include a line that explicitly says you want to use the GPU, especially if, like me, you run things cell by cell. Example below:
with tf.device('/gpu:0'):
    # <write your model here>
Then, in other cells, you do the rest of your notebook work (batching and so on). Then you train your model:
with tf.device('/gpu:0'):
    hist_1 = model_1.fit(...)
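For reference, here is a self-contained sketch of the same workaround as two notebook cells. The model, dummy data, and hyperparameters below are only placeholders to make it runnable; the tf.device('/gpu:0') wrapping around model creation and training is the actual point:

import numpy as np
import tensorflow as tf

# Cell 1: define and compile the model inside an explicit GPU device scope
with tf.device('/gpu:0'):
    model_1 = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dense(2),
    ])
    model_1.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
    )

# Other cells: prepare your data and batches (dummy data here)
x_train = np.random.rand(256, 20).astype('float32')
y_train = np.random.randint(0, 2, size=(256,))

# Cell 2: train inside the same explicit GPU device scope
with tf.device('/gpu:0'):
    hist_1 = model_1.fit(x_train, y_train, epochs=2, batch_size=32)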
Somehow, this stopped my deadlock. In addition (and I don't know if it is related, but just in case), I stopped using Safari for my Jupyter notebook and switched to Chrome instead (not for this reason, but mainly because Safari kept reloading my "heavy" notebook...).
Hope this helps.
Cheers
Hello, I don't know if it's for the same reason, but I tried to fine-tune a BERT model and at some point I also hit a deadlock after some time (I need to kill the kernel and start over). Whether the deadlock happens depends on the quantity of data used for fine-tuning. In the case below, the training stops in the middle of the 3rd epoch.
my machine:
MacOS 12.5
MacBook Pro, Apple M1 Max
I use:
python 3.10.5
tensorflow-macos 2.9.2
tensorflow-metal 0.5.0
tokenizers 0.12.1.dev0
transformers 4.22.0.dev0
Data: https://www.kaggle.com/datasets/kazanova/sentiment140
Quantity of tweets used: 11200
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                             num_labels=2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

model.fit(tf_train_dataset,
          validation_data=tf_validation_dataset,
          epochs=4,
          )
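(Not shown in the post: how tf_train_dataset and tf_validation_dataset are built. One way to construct them from the Sentiment140 CSV looks roughly like the sketch below; the file name, label mapping, split, and batch size are assumptions, not the exact preprocessing used above.)

import pandas as pd
import tensorflow as tf

# Sentiment140 ships as a headerless CSV; columns are target, ids, date, flag, user, text
cols = ["target", "ids", "date", "flag", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=cols)

# Keep a subset of tweets and map labels 0/4 -> 0/1
df = df.sample(11200, random_state=42)
df["label"] = (df["target"] == 4).astype(int)

# Tokenize with the tokenizer created above
enc = tokenizer(df["text"].tolist(), truncation=True, padding=True,
                max_length=128, return_tensors="np")

# 90/10 train/validation split, batched for model.fit
split = int(0.9 * len(df))

def make_dataset(sl):
    features = {k: v[sl] for k, v in enc.items()}
    labels = df["label"].values[sl]
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(16)

tf_train_dataset = make_dataset(slice(None, split))
tf_validation_dataset = make_dataset(slice(split, None))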