Hello, I don't know if it's for the same reason, but I tried to fine-tune a BERT model and at some point I also hit a deadlock after some time (I need to kill the kernel and start over). Whether the deadlock happens depends on the quantity of data I use for fine-tuning. In the case below, training stops in the middle of the 3rd epoch.
My machine:
macOS 12.5
MacBook Pro, Apple M1 Max
I use:
python 3.10.5
tensorflow-macos 2.9.2
tensorflow-metal 0.5.0
tokenizers 0.12.1.dev0
transformers 4.22.0.dev0
Data: https://www.kaggle.com/datasets/kazanova/sentiment140
Quantity of tweets used: 11200
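(For context, a minimal sketch of how the tweets can be loaded and subsampled; the file name, encoding, and column names are assumed from the Kaggle dataset description, and the 0/4 to 0/1 label mapping is just an example choice:)

import pandas as pd

# Assumed layout of the sentiment140 CSV: no header, latin-1 encoding,
# columns target (0 = negative, 4 = positive), ids, date, flag, user, text.
cols = ["target", "ids", "date", "flag", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 encoding="latin-1", names=cols)

# Subsample 11200 tweets and map the labels to 0/1 for num_labels=2.
df = df.sample(n=11200, random_state=42)
df["label"] = (df["target"] == 4).astype(int)
texts = df["text"].tolist()
labels = df["label"].tolist()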
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Load the pretrained tokenizer and the BERT classification head (2 labels).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                             num_labels=2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)

model.fit(tf_train_dataset,
          validation_data=tf_validation_dataset,
          epochs=4,
)
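(Minimal sketch of how tf_train_dataset and tf_validation_dataset can be built from the tokenized tweets; the batch size and the 90/10 split are just example values, not taken from my actual run:)

import tensorflow as tf

# Tokenize the raw tweets (texts/labels come from the loading step above).
encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
features = dict(encodings)

# Example 90/10 train/validation split.
split = int(0.9 * len(labels))
train_features = {k: v[:split] for k, v in features.items()}
val_features = {k: v[split:] for k, v in features.items()}

tf_train_dataset = (
    tf.data.Dataset.from_tensor_slices((train_features, labels[:split]))
    .shuffle(split)
    .batch(16)
)
tf_validation_dataset = (
    tf.data.Dataset.from_tensor_slices((val_features, labels[split:]))
    .batch(16)
)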
Regarding the deadlock, it seems I accidentally found a way around it. You have to include a line that explicitly says you want to use the GPU, especially if, like me, you work cell by cell. Example below:
with tf.device('/gpu:0'):
    <write your model here>
Then you do other things in your notebook, like batching and such, and then you train your model (full sketch below):
with tf.device('/gpu:0'):
    hist_1 = model_1.fit(...)
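(Putting it together with the model from my first post, the whole pattern looks roughly like this; model_1 is just the same kind of BERT classifier as above, this is a sketch rather than my exact notebook:)

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# Build and compile the model explicitly on the GPU.
with tf.device('/gpu:0'):
    model_1 = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    model_1.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=tf.metrics.SparseCategoricalAccuracy(),
    )

# ...other cells: tokenization, batching, etc....

# Train, again pinned to the GPU.
with tf.device('/gpu:0'):
    hist_1 = model_1.fit(tf_train_dataset,
                         validation_data=tf_validation_dataset,
                         epochs=4)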
Somehow, this stopped my deadlock. In addition (and I don't know if it's related, but just in case), I stopped using Safari for my Jupyter notebook and switched to Chrome instead (not for this reason, but mainly because Safari kept reloading my "heavy" notebook...).
Hope this helps.
Cheers