Reply to TensorFlow is slow after upgrading to Sonoma
Same for me. I used the code below with the following library versions: tensorflow-macos 2.14.0, tensorflow-metal 1.1.0, Python 3.10.12.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name="imdb_reviews",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True
)
tf.random.set_seed(42)
train_set = raw_train_set.shuffle(5000, seed=42).batch(32).prefetch(1)
valid_set = raw_valid_set.batch(32).prefetch(1)
test_set = raw_test_set.batch(32).prefetch(1)

vocab_size = 1000
text_vec_layer = tf.keras.layers.TextVectorization(max_tokens=vocab_size)
text_vec_layer.adapt(train_set.map(lambda reviews, labels: reviews))

embed_size = 128
tf.random.set_seed(42)
model = tf.keras.Sequential([
    text_vec_layer,
    tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True),
    tf.keras.layers.GRU(128),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="nadam", metrics=["accuracy"])
history = model.fit(train_set, validation_data=valid_set, epochs=3)
```

Mac Mini M1, Sonoma 14: the weirdest thing is that it is not only slow, it does not converge at all. val_accuracy after the last epoch is still ~0.49:

```
2023-10-06 12:01:37.596357: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1
2023-10-06 12:01:37.596384: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2023-10-06 12:01:37.596389: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2023-10-06 12:01:37.596423: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-10-06 12:01:37.596440: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2023-10-06 12:01:37.930853: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.
Epoch 1/3
704/704 [==============================] - 434s 601ms/step - loss: 0.6935 - accuracy: 0.4989 - val_loss: 0.6931 - val_accuracy: 0.5020
Epoch 2/3
704/704 [==============================] - 290s 411ms/step - loss: 0.6933 - accuracy: 0.5048 - val_loss: 0.6945 - val_accuracy: 0.4988
Epoch 3/3
704/704 [==============================] - 276s 392ms/step - loss: 0.6916 - accuracy: 0.5021 - val_loss: 0.6955 - val_accuracy: 0.4988
```

I tried running the script with GPU usage disabled on the Mac via tf.config.set_visible_devices([], 'GPU') (see the snippet at the end of this post). At least it converges:

```
Epoch 1/3
704/704 [==============================] - 345s 485ms/step - loss: 0.5163 - accuracy: 0.7340 - val_loss: 0.4181 - val_accuracy: 0.8180
Epoch 2/3
704/704 [==============================] - 339s 482ms/step - loss: 0.3322 - accuracy: 0.8604 - val_loss: 0.3782 - val_accuracy: 0.8384
Epoch 3/3
704/704 [==============================] - 337s 478ms/step - loss: 0.2840 - accuracy: 0.8839 - val_loss: 0.3229 - val_accuracy: 0.8576
```

My old notebook with an Nvidia GTX 960M mobile GPU (Windows 11 + WSL2):

```
2023-10-06 12:15:25.031824: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8902
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /home/mzperx/miniconda3/envs/tf/lib/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
2023-10-06 12:15:26.012204: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fda8c02a3e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-06 12:15:26.012311: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 960M, Compute Capability 5.0
2023-10-06 12:15:26.180842: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2023-10-06 12:15:27.076801: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Epoch 1/3
704/704 [==============================] - 143s 176ms/step - loss: 0.4835 - accuracy: 0.7684 - val_loss: 0.4299 - val_accuracy: 0.8260
Epoch 2/3
704/704 [==============================] - 60s 85ms/step - loss: 0.3379 - accuracy: 0.8570 - val_loss: 0.3256 - val_accuracy: 0.8600
Epoch 3/3
704/704 [==============================] - 57s 81ms/step - loss: 0.2904 - accuracy: 0.8813 - val_loss: 0.3132 - val_accuracy: 0.8640
```

Google Colab with a T4:

```
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
Epoch 1/3
704/704 [==============================] - 74s 89ms/step - loss: 0.4796 - accuracy: 0.7576 - val_loss: 0.4048 - val_accuracy: 0.8304
Epoch 2/3
704/704 [==============================] - 28s 40ms/step - loss: 0.3402 - accuracy: 0.8589 - val_loss: 0.3149 - val_accuracy: 0.8676
Epoch 3/3
704/704 [==============================] - 27s 38ms/step - loss: 0.2899 - accuracy: 0.8824 - val_loss: 0.3065 - val_accuracy: 0.8684
```
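For reference, here is roughly how I forced the CPU-only run; a minimal sketch, assuming the call is made right after importing TensorFlow and before any datasets, layers, or models are created (once the runtime has initialized its devices, changing visibility raises a RuntimeError):

```python
import tensorflow as tf

# Hide all GPUs (the Metal PluggableDevice on Apple silicon) so every op
# is placed on the CPU. This must run before any tensors or ops are
# created, otherwise TensorFlow raises a RuntimeError because the
# devices have already been initialized.
tf.config.set_visible_devices([], "GPU")

print(tf.config.get_visible_devices())  # should now list only CPU devices

# ... then build the dataset and model exactly as in the script above ...
```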
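And a quick sanity check I ran to confirm which versions were actually loaded (the comments show what I had installed; they are my environment, not requirements):

```python
import sys
import tensorflow as tf

print(sys.version)                              # 3.10.12 in my environment
print(tf.__version__)                           # 2.14.0 (tensorflow-macos)
print(tf.config.list_physical_devices("GPU"))   # the Metal device should show up here
```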
Oct ’23