Same for me. I used the code below with the following library versions:
tensorflow-macos 2.14.0, tensorflow-metal 1.1.0, Python 3.10.12
import tensorflow as tf
import tensorflow_datasets as tfds

raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name="imdb_reviews",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True
)

tf.random.set_seed(42)
train_set = raw_train_set.shuffle(5000, seed=42).batch(32).prefetch(1)
valid_set = raw_valid_set.batch(32).prefetch(1)
test_set = raw_test_set.batch(32).prefetch(1)

vocab_size = 1000
text_vec_layer = tf.keras.layers.TextVectorization(max_tokens=vocab_size)
text_vec_layer.adapt(train_set.map(lambda reviews, labels: reviews))

embed_size = 128
tf.random.set_seed(42)
model = tf.keras.Sequential([
    text_vec_layer,
    tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True),
    tf.keras.layers.GRU(128),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="nadam",
              metrics=["accuracy"])
history = model.fit(train_set, validation_data=valid_set, epochs=3)
--
Mac Mini M1 (Sonoma 14):
The weirdest part is that it is not only slow, it does not converge at all: val_accuracy after the last epoch is still ~0.49...
2023-10-06 12:01:37.596357: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1
2023-10-06 12:01:37.596384: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-10-06 12:01:37.596389: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-10-06 12:01:37.596423: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-10-06 12:01:37.596440: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2023-10-06 12:01:37.930853: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
Epoch 1/3
704/704 [==============================] - 434s 601ms/step - loss: 0.6935 - accuracy: 0.4989 - val_loss: 0.6931 - val_accuracy: 0.5020
Epoch 2/3
704/704 [==============================] - 290s 411ms/step - loss: 0.6933 - accuracy: 0.5048 - val_loss: 0.6945 - val_accuracy: 0.4988
Epoch 3/3
704/704 [==============================] - 276s 392ms/step - loss: 0.6916 - accuracy: 0.5021 - val_loss: 0.6955 - val_accuracy: 0.4988
I also tried running the script with GPU usage disabled on the Mac (tf.config.set_visible_devices([], 'GPU')). At least it converges then:
Epoch 1/3
704/704 [==============================] - 345s 485ms/step - loss: 0.5163 - accuracy: 0.7340 - val_loss: 0.4181 - val_accuracy: 0.8180
Epoch 2/3
704/704 [==============================] - 339s 482ms/step - loss: 0.3322 - accuracy: 0.8604 - val_loss: 0.3782 - val_accuracy: 0.8384
Epoch 3/3
704/704 [==============================] - 337s 478ms/step - loss: 0.2840 - accuracy: 0.8839 - val_loss: 0.3229 - val_accuracy: 0.8576
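For anyone who wants to reproduce the CPU-only run, here is a minimal sketch of the workaround I used. The only requirement is that the call happens before any tensors or models are created, otherwise TensorFlow may have already claimed the Metal device:

```python
import tensorflow as tf

# Hide all GPUs (including the Metal PluggableDevice) from TensorFlow,
# forcing every op onto the CPU. Must run before building any model.
tf.config.set_visible_devices([], "GPU")

# Sanity check: no GPUs should be visible anymore.
print(tf.config.get_visible_devices("GPU"))  # -> []
```

The rest of the training script is unchanged; only device placement differs.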
My old notebook with an Nvidia GTX 960M mobile GPU (Windows 11 + WSL2):
2023-10-06 12:15:25.031824: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8902
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /home/mzperx/miniconda3/envs/tf/lib/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
2023-10-06 12:15:26.012204: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fda8c02a3e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-06 12:15:26.012311: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 960M, Compute Capability 5.0
2023-10-06 12:15:26.180842: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2023-10-06 12:15:27.076801: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Epoch 1/3
704/704 [==============================] - 143s 176ms/step - loss: 0.4835 - accuracy: 0.7684 - val_loss: 0.4299 - val_accuracy: 0.8260
Epoch 2/3
704/704 [==============================] - 60s 85ms/step - loss: 0.3379 - accuracy: 0.8570 - val_loss: 0.3256 - val_accuracy: 0.8600
Epoch 3/3
704/704 [==============================] - 57s 81ms/step - loss: 0.2904 - accuracy: 0.8813 - val_loss: 0.3132 - val_accuracy: 0.8640
Google Colab with a T4:
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
Epoch 1/3
704/704 [==============================] - 74s 89ms/step - loss: 0.4796 - accuracy: 0.7576 - val_loss: 0.4048 - val_accuracy: 0.8304
Epoch 2/3
704/704 [==============================] - 28s 40ms/step - loss: 0.3402 - accuracy: 0.8589 - val_loss: 0.3149 - val_accuracy: 0.8676
Epoch 3/3
704/704 [==============================] - 27s 38ms/step - loss: 0.2899 - accuracy: 0.8824 - val_loss: 0.3065 - val_accuracy: 0.8684