tensorflow-metal failure on M1 Ultra

I followed the instructions and example here (a rough sketch of the script I ran follows the traceback below). Abbreviated error on execution:


2022-12-26 18:43:15.162697: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Epoch 1/5

/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/backend.py:5585: UserWarning: "`sparse_categorical_crossentropy` received `from_logits=True`, but the `output` argument was produced by a Softmax activation and thus does not represent logits. Was this intended?

  output, from_logits = _get_logits(

2022-12-26 18:43:21.329698: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.

2022-12-26 18:43:26.638422: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x131ea6e20

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

  File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler

    raise e.with_traceback(filtered_tb) from None

  File "/Users/<username>/venv-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute

    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:



Detected at node 'StatefulPartitionedCall_212' defined at (most recent call last):

    File "<stdin>", line 1, in <module>

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler

      return fn(*args, **kwargs)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit

      tmp_logs = self.train_function(iterator)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function

      return step_function(self, iterator)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function

      outputs = model.distribute_strategy.run(run_step, args=(data,))

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step

      outputs = model.train_step(data)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step

      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize

      self.apply_gradients(grads_and_vars)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients

      return super().apply_gradients(grads_and_vars, name=name)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients

      iteration = self._internal_apply_gradients(grads_and_vars)

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients

      return tf.__internal__.distribute.interim.maybe_merge_call(

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn

      distribution.extended.update(

    File "/Users/<username>/venv-metal/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var

      return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_212'

could not find registered platform with id: 0x131ea6e20

	 [[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_23355]
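
For reference, the script was essentially the getting-started example from the tensorflow-metal instructions. This is a reconstruction from memory, so the exact hyperparameters may differ from what I actually ran, but it is enough to reproduce the failure:

import tensorflow as tf

# CIFAR-100 + ResNet50, as in the getting-started example.
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()

model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)

# from_logits=True against ResNet50's softmax head is what triggers the
# UserWarning shown above; it is unrelated to the actual failure.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# The NotFoundError is raised on the first training step of epoch 1.
model.fit(x_train, y_train, epochs=5, batch_size=64)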

Note that I've used tensorflow-metal regularly (and successfully) in the past, since it was first a GitHub repo. This is very much a new issue.

Update: I tried this with Python 3.10 and got the same error, just with a slightly different platform id:

2022-12-26 20:34:43.821008: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x16b325db0

As far as I can tell, M1 Ultra isn't supported now (it was before).

Update 2: I tried skipping the virtual environment entirely, in case I was confusing something. I had to upgrade numpy to get tensorflow to load, but I still hit the same basic error.

2022-12-26 20:52:30.752129: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1337af7b0

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

  File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler

    raise e.with_traceback(filtered_tb) from None

  File "/Users/<username>/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute

    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:



Detected at node 'StatefulPartitionedCall_212' defined at (most recent call last):

    File "<stdin>", line 1, in <module>

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler

      return fn(*args, **kwargs)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit

      tmp_logs = self.train_function(iterator)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function

      return step_function(self, iterator)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function

      outputs = model.distribute_strategy.run(run_step, args=(data,))

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step

      outputs = model.train_step(data)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step

      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize

      self.apply_gradients(grads_and_vars)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients

      return super().apply_gradients(grads_and_vars, name=name)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients

      iteration = self._internal_apply_gradients(grads_and_vars)

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients

      return tf.__internal__.distribute.interim.maybe_merge_call(

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn

      distribution.extended.update(

    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var

      return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_212'

could not find registered platform with id: 0x1337af7b0

	 [[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_23355]

The issue is discussed here. The problem is that Apple shipped code that is apparently not supported on (m)any of its own devices. It can be worked around by using the legacy optimizers (see the sketch after the install commands below), but that forgoes the attempted improvements. Until this is fixed (the bad versions are tensorflow-macos 2.11 and tensorflow-metal 0.7), anyone hitting similar issues is best off installing known-working versions:

python -m pip install tensorflow-macos==2.9 
python -m pip install tensorflow-metal==0.5.0
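
If you need to stay on tensorflow-macos 2.11 / tensorflow-metal 0.7, the workaround mentioned above is to pass one of the legacy optimizers explicitly rather than relying on the new default. A minimal sketch (the model here is only a placeholder; the relevant line is the optimizer):

import tensorflow as tf

# Placeholder model purely for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# In 2.11, plain strings like "adam" and tf.keras.optimizers.Adam resolve to the
# new optimizer implementation, whose XLA-compiled update step (_update_step_xla
# in the traceback above) is what fails on the Metal plugin. The legacy class
# keeps the old, working code path.
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["accuracy"],
)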

Thanks.
