LSTM recurrent_dropout causes Jupyter restart (keras, tensorflow)

Question

Created Jan ’22

This code causes a kernel restart. It runs fine, however, if I remove the recurrent_dropout parameter from the LSTM layer or set it to zero.

from tensorflow import keras
from tensorflow.keras import layers

# sequence_length, raw_data, train_dataset and val_dataset are defined
# earlier in the book's example.
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.LSTM(32, recurrent_dropout=0.25)(inputs)  # the crash disappears without recurrent_dropout
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

callbacks = [
    keras.callbacks.ModelCheckpoint("jena_lstm_dropout.keras",
                                    save_best_only=True)
]
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(train_dataset,
                    epochs=50,
                    validation_data=val_dataset,
                    callbacks=callbacks)

Code is straight from the book Deep Learning with Python and works in Google Colab.

I'm using macOS 12.1, tensorflow-macos, the tensorflow-metal plugin, and JupyterLab 3.2.8.

Jupyter server logs show:

2022-01-28 12:08:38.703448: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
[I 2022-01-28 12:08:41.871 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2022-01-28 12:08:41.872 ServerApp] kernel 7e7dc757-87dc-426a-8769-2e152e81d7b4 restarted
[W 2022-01-28 12:08:41.872 ServerApp] kernel 7e7dc757-87dc-426a-8769-2e152e81d7b4 restarted
[W 2022-01-28 12:08:41.873 ServerApp] kernel 7e7dc757-87dc-426a-8769-2e152e81d7b4 restarted


Answer 1

brettcoryell OP

Jan ’22

(Edit: Apparently code blocks aren't supported in comments. Sorry.) Moving on to the next section of the book, I hit the same issue with GRU layers: setting recurrent_dropout forces a kernel restart.

The warning given is: WARNING:tensorflow:Layer gru will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.

Clearly, the fallback is not graceful.

inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.GRU(32, recurrent_dropout=0.5, return_sequences=True)(inputs)
x = layers.GRU(32, recurrent_dropout=0.5)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
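For readers wondering which "criteria" the warning refers to: the Keras documentation lists conditions a recurrent layer must meet to use the fused cuDNN kernel, and a zero recurrent_dropout is one of them. The sketch below only encodes a subset of that list as I understand it. The helper name cudnn_blockers is hypothetical and is not part of Keras or TensorFlow.

```python
def cudnn_blockers(activation="tanh", recurrent_activation="sigmoid",
                   recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Return the (partial) list of documented cuDNN requirements that these
    layer settings violate. Hypothetical helper, for illustration only."""
    blockers = []
    if activation != "tanh":
        blockers.append("activation must be 'tanh'")
    if recurrent_activation != "sigmoid":
        blockers.append("recurrent_activation must be 'sigmoid'")
    if recurrent_dropout != 0:
        blockers.append("recurrent_dropout must be 0")
    if unroll:
        blockers.append("unroll must be False")
    if not use_bias:
        blockers.append("use_bias must be True")
    return blockers


# Settings of the GRU layers above: only recurrent_dropout differs from the defaults.
print(cudnn_blockers(recurrent_dropout=0.5))
```

An empty list means none of the modelled conditions is violated. With recurrent_dropout=0.5 the layer takes the generic fallback path, which is the path that fails on this machine.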


Answer 2

Frameworks Engineer OP

Apple

Jan ’22

Hi @brettcoryell

Thanks for reporting this issue and for the script to reproduce it! It does sound like something goes wrong when the fallback op is called. I tried reproducing this locally, and when I run the script outside JupyterLab it works both with and without recurrent_dropout. That suggests the root cause would have to be debugged on the JupyterLab side.

In the meantime, so that you can continue with the tasks in the book, I'd recommend one of two workarounds. Either place the code inside a with tf.device('/device:CPU:0'): block, which runs it on the CPU and should circumvent the problem if the issue is with the GPU fallback, or run the code directly, without the JupyterLab interface, until the issue is addressed.
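A minimal sketch of the CPU workaround, assuming TensorFlow 2.x with its bundled Keras. The values of sequence_length and num_features and the random training data are placeholders standing in for the book's dataset; only the with tf.device('/device:CPU:0'): wrapper is the point.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder dimensions (assumptions), standing in for the book's
# sequence_length and raw_data.shape[-1].
sequence_length = 120
num_features = 14

# Build and train inside the device scope so the ops are placed on the CPU
# instead of the GPU fallback kernel.
with tf.device("/device:CPU:0"):
    inputs = keras.Input(shape=(sequence_length, num_features))
    x = layers.LSTM(32, recurrent_dropout=0.25)(inputs)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])

    # Random data, used only to show that a training step completes.
    features = np.random.rand(8, sequence_length, num_features).astype("float32")
    targets = np.random.rand(8, 1).astype("float32")
    history = model.fit(features, targets, epochs=1, batch_size=4, verbose=0)

print(history.history["loss"])
```

Another option that may be worth trying is to hide the GPU for the whole process with tf.config.set_visible_devices([], "GPU"), called before any model is built. Either way, training runs on the CPU and will be slower.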


Answer 3

brettcoryell OP

Jan ’22

Hi, and thanks for the response.

I can confirm that wrapping the code in with tf.device('/device:CPU:0'): does allow it to run without fault, though obviously with a performance penalty.

However, I did run the code (without the tf.device statement) from the command line and got the fatal error shown below.

To test further, I also set up an account on Paperspace. The code runs correctly (with recurrent_dropout and without tf.device) inside their Jupyter notebooks (which they call Gradient notebooks).

Summarizing:

The code works in Google Colab and Paperspace in their Jupyter-based notebooks.
It crashes on my M1 Mac both in Jupyter (IPython) and at the command line (plain Python 3.9.5).

Fatal error at the command line:

WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Epoch 1/50
2022-01-31 21:29:47.106964: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2022-01-31 21:29:47.365931: F tensorflow/core/framework/tensor.cc:681] Check failed: IsAligned() ptr = 0x17adba1f0
zsh: abort   python test.py
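The failed check reports the pointer value, so its alignment can be worked out by hand. The sketch below does that arithmetic for a few candidate boundaries. Which boundary this TensorFlow build actually enforces is an assumption here, since the log does not state it.

```python
# Address copied from the "Check failed: IsAligned()" log line.
ptr = int("0x17adba1f0", 16)

# Candidate alignments in bytes. Which one the build requires is an assumption.
remainders = {alignment: ptr % alignment for alignment in (16, 32, 64)}

for alignment, remainder in remainders.items():
    status = "aligned" if remainder == 0 else f"off by {remainder} bytes"
    print(f"{alignment}-byte boundary: {status}")
```

The address sits on a 16-byte boundary but not on a 32- or 64-byte one. That would be consistent with the fallback path handing TensorFlow a buffer aligned less strictly than the check expects, though the log alone does not confirm this.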
