TensorFlow on M1 MacBook Pro, error when model fit executes

It doesn't matter whether I install miniforge or mamba, directly or through brew: when I try to fit the sample model from https://developer.apple.com/metal/tensorflow-plugin/, or even a simple sequential model, I always get this error.

Is there any workaround for this? I'd appreciate any help, thanks!

2022-12-10 11:18:19.941623: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-10 11:18:20.427283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-10 11:18:21.222950: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.223003: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.363366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.364757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388739: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90
2022-12-10 11:18:21.388757: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x28edf1f90

NotFoundError                             Traceback (most recent call last)
Cell In[25], line 2
      1 model = create_model()
----> 2 history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # tf.debugging.disable_traceback_filtering()
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
  app.launch_new_instance()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/traitlets/config/application.py", line 992, in launch_instance
  app.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 711, in start
  self.io_loop.start()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
  self.asyncio_loop.run_forever()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
  self._run_once()
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
  handle._run()
...

File "/var/folders/f9/bp40pn0d401d974fy48dxm8h0000gn/T/ipykernel_63636/3393788193.py", line 2, in <module>
  history = model.fit(Xf_train, yf_train, epochs=3, batch_size=64);
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
  return fn(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
  tmp_logs = self.train_function(iterator)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
  return step_function(self, iterator)
......

File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
  outputs = model.train_step(data)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
  self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
  self.apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
  return super().apply_gradients(grads_and_vars, name=name)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
  iteration = self._internal_apply_gradients(grads_and_vars)
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
  return tf.__internal__.distribute.interim.maybe_merge_call(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
  distribution.extended.update(
File "/opt/homebrew/Caskroom/miniforge/base/envs/tf/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
  return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4' could not find registered platform with id: 0x28edf1f90 [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1241]

Answered by dweilert in 739446022

I dropped back to the following versions: tensorflow-macos==2.9 and tensorflow-metal==0.5.0. I was using tensorflow-macos==2.11 and tensorflow-metal==0.7.0 and just couldn't get things to work. After dropping back I was able to use the GPU and all my validations worked. I'll check back later to see if a more current version will work.

I don't think I understand the error message any better, but here is what I did to get TensorFlow working on my MacBook Pro (M1 Pro); hopefully it helps. First I removed every previous Anaconda installation; see the tutorial here (https://docs.anaconda.com/anaconda/install/uninstall/). Then I followed every single step in the link you attached (https://developer.apple.com/metal/tensorflow-plugin/). Make sure you use miniconda instead of Anaconda; whether you use the bash or the graphical installer should not matter. I tried Anaconda and it didn't work well because of package conflicts. I was able to run the sample model in Step 4 without a problem.

I have the same error. I used miniconda and still get it. It occurs in model.fit: everything before that looks normal, but when I run model.fit I get many warnings and then error messages.

NOT_FOUND: could not find registered platform with id: 0x282f9b6f0
2022-12-10 23:22:22.325619: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x282f9b6f0


NotFoundError                             Traceback (most recent call last)
Cell In[27], line 13
     11 loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
     12 model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
---> 13 model.fit(x_train, y_train, epochs=5, batch_size=64)

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # tf.debugging.disable_traceback_filtering()
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/opt/miniconda3/envs/tensorflow/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Node: 'StatefulPartitionedCall_212' could not find registered platform with id: 0x282f9b6f0 [[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_25966]

I was facing exactly the same problem. When I installed TensorFlow and tested it by running the "MNIST database" code example, everything was fine except the model.fit call. I hope someone can help solve the problem.

Yes, I'm facing the same problem. I guess you are using tensorflow-macos 2.11.0 and tensorflow-metal 0.7.0.

From my understanding, the problem is the 'conda install -c apple tensorflow-deps' step in the instructions at https://developer.apple.com/metal/tensorflow-plugin/. That step still installs tensorflow-deps 2.9.0 (https://anaconda.org/apple/tensorflow-deps/files). I had trouble even running tensorflow-macos 2.10.0 on my Mac and had to downgrade to tensorflow-macos 2.9 (to match tensorflow-deps 2.9.0). Also, according to the conda link, there is no tensorflow-deps 2.11.0 from Apple yet. Hopefully this issue is fixed soon.
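To see which wheel versions are actually installed in the active environment, a small check along these lines may help. It is only a sketch: the helper names are made up for illustration, the "known good" pair is simply the 2.9 / 0.5 combination reported in this thread, and tensorflow-deps is left out because it is a conda package that pip metadata may not expose.

```python
from __future__ import annotations

from importlib import metadata

# Pair reported as working in this thread (an assumption for any other setup).
KNOWN_GOOD = {"tensorflow-macos": (2, 9), "tensorflow-metal": (0, 5)}


def installed_versions(packages) -> dict[str, str | None]:
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def major_minor(version: str) -> tuple[int, int]:
    """Reduce a version string such as '2.9.0' to (2, 9)."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def matches_known_good(versions) -> bool:
    """True only if every package in KNOWN_GOOD is installed at that major.minor."""
    for name, expected in KNOWN_GOOD.items():
        version = versions.get(name)
        if version is None or major_minor(version) != expected:
            return False
    return True


if __name__ == "__main__":
    versions = installed_versions(KNOWN_GOOD)
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    print("matches the 2.9 / 0.5 pair:", matches_known_good(versions))
```

Run it inside the conda environment that shows the error, so the reported versions are the ones the notebook kernel actually uses.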

Same error!!

M1 MAX

macOS Ventura 13.1
tensorflow-metal 0.7.0
tensorflow-macos 2.11.0

Metal device set to: Apple M1 Max
systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

2022-12-11 18:48:12.915462: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.

2022-12-11 18:48:12.915489: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

2022-12-11 18:48:13.971037: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Epoch 1/5

/Users/macstudio/miniconda/envs/tf/lib/python3.10/site-packages/keras/backend.py:5585: UserWarning: "sparse_categorical_crossentropy received from_logits=True, but the output argument was produced by a Softmax activation and thus does not represent logits. Was this intended?

  output, from_logits = _get_logits(

2022-12-11 18:48:19.160047: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.

2022-12-11 18:48:20.283908: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x12c5c26d0

2022-12-11 18:48:20.283938: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x12c5c26d0

Same question here. Is it because of the system version? I'm on macOS 13.0.1 Ventura.

Hello all, I was also facing the same problem, so I installed the recommended versions listed at https://developer.apple.com/metal/tensorflow-plugin/. For tensorflow-macos, it is currently 2.9. For tensorflow-metal, it is currently 0.5. With that, I was able to use my GPU. I hope this helps.

You need to use the tensorflow-metal version 0.5.0. See the version table on https://developer.apple.com/metal/tensorflow-plugin/.

Install the proper version with:

python -m pip install tensorflow-metal==0.5.0

In my case I had to specify both versions:

python -m pip install tensorflow-macos==2.9
python -m pip install tensorflow-metal==0.5.0

Hi @ppobar

I assume you are seeing this on the latest wheels, with tensorflow-macos==2.11 and tensorflow-metal==0.7.0? In that case this most probably has to do with recent changes on the TensorFlow side for version 2.11, where a new optimizer API has been implemented and a default JIT compilation flag is set (https://blog.tensorflow.org/2022/11/whats-new-in-tensorflow-211.html). This forces the optimizer op to take an XLA path that the pluggable architecture has not implemented yet, causing the inelegant crash because it cannot fall back to supported operations. Currently the workaround is to use the older optimizer API that was used up to TF 2.10, by importing it from the .legacy folder of optimizers. More concretely, using the Adam optimizer as an example, one should change

from tensorflow.keras.optimizers import Adam

to

from tensorflow.keras.optimizers.legacy import Adam

This should restore previous behavior while the XLA path support is being worked on. Let me know if this solves the issue for you! And if not, could you provide details on which OS version, tf-macos and tf-metal versions you are seeing this and a script I can use to reproduce the issue?
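As a rough illustration of that workaround, the sketch below picks the legacy Adam only when the installed TensorFlow uses the new optimizer API. The helper names uses_new_optimizer_api and make_adam are hypothetical, and treating every release from 2.11 on as affected is an assumption based on this thread; newer tensorflow-metal or TensorFlow releases may behave differently.

```python
from __future__ import annotations


def uses_new_optimizer_api(tf_version: str) -> bool:
    """True for TF 2.11 and later, where the new optimizer API became the default."""
    major, minor, *_ = tf_version.split(".")
    return (int(major), int(minor)) >= (2, 11)


def make_adam(learning_rate: float = 1e-3):
    """Return the legacy Adam on affected versions, else the default Adam."""
    # Imported lazily so the version helper above can be used without TensorFlow.
    import tensorflow as tf

    # The legacy namespace may be missing on older releases, hence getattr.
    legacy = getattr(tf.keras.optimizers, "legacy", None)
    if uses_new_optimizer_api(tf.__version__) and legacy is not None:
        return legacy.Adam(learning_rate=learning_rate)
    return tf.keras.optimizers.Adam(learning_rate=learning_rate)
```

It would be used by passing the optimizer object instead of the string "adam", for example model.compile(optimizer=make_adam(), loss=loss_fn, metrics=["accuracy"]).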

@Frameworks Engineer I can confirm that switching to from tensorflow.keras.optimizers.legacy import Adam fixes the XLA problem, and TF 2.11 works fine. So no need to downgrade the tensorflow version. Thank you!

Thank you so much, everyone.

I had the latest wheels (tensorflow-macos==2.11 and tensorflow-metal==0.7.0), so I tried this: from tensorflow.keras.optimizers.legacy import Adam, as @Frameworks Engineer suggested. Even though it was helpful, some model.fit calls still failed.

So, in the end, I went back to tensorflow-macos==2.9 and tensorflow-metal==0.5.0, as many of you suggested, and now everything is working fine.

I just followed the suggestion provided above (downgrade to tensorflow-macos==2.9 and tensorflow-metal==0.5.0) and it works!! Thank you all!

Reinstall miniforge3 with the Python 3.9 version. Commands:
1. conda create --prefix ./env python=3.8, then conda activate ./env
2. conda install -c apple tensorflow-deps
3. python -m pip install tensorflow-macos==2.9
4. python -m pip install tensorflow-metal==0.5.0
5. Run the sample script available at https://developer.apple.com/metal/tensorflow-plugin/

This worked for me. Check the versions carefully.
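For step 5, a much smaller stand-in for Apple's sample script can confirm that model.fit runs at all. This is only a sketch: it is not Apple's script, the function names are hypothetical, and the data is random noise, so the loss value itself means nothing beyond "training completed".

```python
from __future__ import annotations

import importlib.util


def tensorflow_available() -> bool:
    """Report whether TensorFlow can be imported in this environment."""
    return importlib.util.find_spec("tensorflow") is not None


def run_smoke_test(epochs: int = 1) -> float:
    """Fit a tiny dense model on random data and return the final loss."""
    import numpy as np
    import tensorflow as tf

    # Random inputs and binary labels; only used to exercise model.fit.
    rng = np.random.default_rng(0)
    x = rng.random((256, 8), dtype=np.float32)
    y = rng.integers(0, 2, size=(256,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(2),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    history = model.fit(x, y, epochs=epochs, batch_size=64, verbose=0)
    return float(history.history["loss"][-1])


if __name__ == "__main__":
    if tensorflow_available():
        print("final loss:", run_smoke_test())
    else:
        print("TensorFlow is not installed in this environment")
```

If this raises the NotFoundError from this thread, the installed version pair is probably still the problematic one.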
