Help! Cannot train any NNs with tensorflow on M1 Chip

Here is the example code:

import sys
import time
import tensorflow as tf
import tensorflow.keras
import pandas as pd
import sklearn as sk
try:
  import tensorflow_datasets as tfds
except:
  !pip install -q tensorflow_datasets
  import tensorflow_datasets as tfds
import tensorflow.compat.v2 as tf

tf.enable_v2_behavior()

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

# from tensorflow.python.compiler.mlcompute import mlcompute
# mlcompute.set_mlc_device(device_name='gpu')
# (mlcompute cannot be found)
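# Note: as far as I can tell, the mlcompute API above was removed from
# tensorflow-macos 2.5+; with the tensorflow-metal plugin the GPU is
# supposed to be picked up automatically, so these lines stay commented out.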

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

batch_size = 128

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)


ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)


model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#   tf.keras.layers.Dropout(0.25),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
#   tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

start_time = time.time()
model.fit(
    ds_train,
    epochs=10,
    validation_data=ds_test,
)
print("--- %s minutes with GPU ---" % ((time.time() - start_time)/60))

Here are the outputs in both cases:

When I comment out

from tensorflow.python.framework.ops import disable_eager_execution 
disable_eager_execution()

the kernel dies after printing:

Metal device set to: Apple M1 Pro

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

Epoch 1/10
2021-10-31 19:10:20.277599: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-31 19:10:20.278606: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-31 19:10:20.279185: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

When I leave in

from tensorflow.python.framework.ops import disable_eager_execution 
disable_eager_execution()

the run fails with RuntimeError: Caught an unknown exception! (the full output and traceback are at the bottom of this post).

I installed tensorflow-macos and tensorflow-metal by following https://developer.apple.com/metal/tensorflow-plugin/
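For reference, these are roughly the install steps from that page that I followed (assuming a conda/Miniforge arm64 environment; exact versions may differ):

conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal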

I have been struggling to use TensorFlow on the new MBP for many days and it has been a total nightmare. I have run into tons of issues, searched for solutions, and tried tons of suggested fixes, and none of them actually let me train a NN.

Honestly, I am out of patience right now. I bought the MBP because I assumed it would facilitate my work, and it turns out I can't even use TensorFlow! So ridiculous!

  • Tensorflow: 2.6.0
  • Keras Version: 2.6.0
  • Python 3.8.12
  • macOS: 12.0.1
  • Anaconda: 2.0.3
  • tensorflow-macos: 2.6.0
  • tensorflow-metal: 0.2.0
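For reference, a minimal check of whether the Metal device is visible at all (independent of the Keras training loop) would look something like the sketch below; I assume list_physical_devices should report a GPU entry when the Metal plugin is registered:

import tensorflow as tf

print(tf.__version__)                          # 2.6.0 here
print(tf.config.list_physical_devices('GPU'))  # should list the Metal GPU if the plugin is loaded

# a tiny op placed explicitly on the GPU
with tf.device('/GPU:0'):
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(a, b)))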

Full output when the following code is uncommented (i.e. with eager execution disabled):

from tensorflow.python.framework.ops import disable_eager_execution 
disable_eager_execution()
2021-10-31 18:54:21.304180: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-31 18:54:21.305317: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-31 18:54:21.306071: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-31 18:54:21.392880: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-31 18:54:21.392913: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-31 18:54:21.486630: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.499261: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Metal device set to: Apple M1 Pro

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2021-10-31 18:54:21.560669: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.577903: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.671397: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.690163: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.725452: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.745059: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.762898: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-31 18:54:21.782170: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.796932: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-31 18:54:21.839 python[42307:1235107] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x6000036816c0
Train on 469 steps, validate on 79 steps
Epoch 1/10
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/var/folders/mq/pnk708vd76xfjjjmy_4yn2lw0000gn/T/ipykernel_42307/2486947832.py in <module>
     58 
     59 start_time = time.time()
---> 60 model.fit(
     61     ds_train,
     62     epochs=10,

/opt/anaconda3/lib/python3.8/site-packages/keras/engine/training_v1.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    775 
    776     func = self._select_training_loop(x)
--> 777     return func.fit(
    778         self,
    779         x=x,

/opt/anaconda3/lib/python3.8/site-packages/keras/engine/training_arrays_v1.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
    638       val_x, val_y, val_sample_weights = None, None, None
    639 
--> 640     return fit_loop(
    641         model,
    642         inputs=x,

/opt/anaconda3/lib/python3.8/site-packages/keras/engine/training_arrays_v1.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    284           else:
    285             actual_inputs = ins()
--> 286           batch_outs = f(actual_inputs)
    287         except tf.errors.OutOfRangeError:
    288           if is_dataset:

/opt/anaconda3/lib/python3.8/site-packages/keras/backend.py in __call__(self, inputs)
   4027         feed_symbols != self._feed_symbols or self.fetches != self._fetches or
   4028         session != self._session):
-> 4029       self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
   4030 
   4031     fetched = self._callable_fn(*array_vals,

/opt/anaconda3/lib/python3.8/site-packages/keras/backend.py in _make_callable(self, feed_arrays, feed_symbols, symbol_vals, session)
   3963       callable_opts.run_options.CopyFrom(self.run_options)
   3964     # Create callable.
-> 3965     callable_fn = session._make_callable_from_options(callable_opts)
   3966     # Cache parameters corresponding to the generated callable, so that
   3967     # we can detect future mismatches and refresh the callable.

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/client/session.py in _make_callable_from_options(self, callable_options)
   1509     """
   1510     self._extend_graph()
-> 1511     return BaseSession._Callable(self, callable_options)
   1512 
   1513 

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/client/session.py in __init__(self, session, callable_options)
   1467           compat.as_bytes(callable_options.SerializeToString()))
   1468       try:
-> 1469         self._handle = tf_session.TF_SessionMakeCallable(
   1470             session._session, options_ptr)
   1471       finally:

RuntimeError: Caught an unknown exception!

The exception occurs in TensorFlow's data module, specifically because of an exception in its Datasets handling. As a workaround, you can pass the arrays directly when training, i.e. model.fit(x_train, y_train, ...), instead of the tf.data.Dataset objects. This issue has persisted for a long time and still exists in TensorFlow v2.9.2.
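For anyone who wants to try that workaround, here is a minimal sketch (an assumption-based example only: it swaps the tfds pipeline for the plain NumPy arrays from tf.keras.datasets.mnist and reuses the same model as in the question):

import tensorflow as tf

# Load MNIST as plain NumPy arrays instead of a tf.data.Dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype('float32') / 255.0   # shape (60000, 28, 28, 1)
x_test = x_test[..., None].astype('float32') / 255.0

# Same architecture as in the question
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

# Feed the arrays directly to fit() instead of the Dataset objects
model.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          validation_data=(x_test, y_test))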
