Cannot Use tf.zeros_like with tensorflow-metal (Monterey)
Hi, I can reliably reproduce the following after running pip install tensorflow-metal. Note that I did not cull anything, including some device registration messages that only appear the first time you use TensorFlow. Hopefully they are not too distracting; I thought they would provide helpful context about my environment in case something is fishy.

    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
    [Clang 12.0.1 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> tf.config.list_physical_devices('GPU')
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    >>> tf.zeros_like([1])
    Metal device set to: Apple M1
    systemMemory: 8.00 GB
    maxCacheSize: 2.67 GB
    2022-06-05 18:54:29.515755: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2022-06-05 18:54:29.516007: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/opt/homebrew/Caskroom/miniforge/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7164, in raise_from_not_ok_status
        raise core._status_to_exception(e) from None  # pylint: disable=protected-access
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple Default OpKernel registrations match NodeDef '{{node ZerosLike}}': 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' and 'op: "ZerosLike" device_type: "DEFAULT" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "y"' [Op:ZerosLike]

After uninstalling tensorflow-metal (pip uninstall tensorflow-metal), the same commands produce:

    Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
    [Clang 12.0.1 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> tf.config.list_physical_devices('GPU')
    []
    >>> tf.zeros_like([1])
    <tf.Tensor: shape=(1,), dtype=int32, numpy=array([0], dtype=int32)>

It looks like a simple double registration issue: the error reports two identical DEFAULT-device kernels for ZerosLike with DT_INT32. I've only just found out about the 'PluggableDevice' API, so I don't know whether it has recommendations for resolving multiple registrations. If I had to guess, it is highly unexpected for a pluggable device extension to contain default device op registrations, but without being able to see the code I cannot guess further about what might be wrong.
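In case it helps anyone compare environments, here is a minimal sketch of a script that reports which of the relevant packages are installed and at what version. It uses only the standard library (importlib.metadata and platform), so it runs even when importing TensorFlow itself fails. The helper names installed_version and environment_report are my own, and the default package list is an assumption: tensorflow-macos is not mentioned above and is included only because it may be how TensorFlow was installed on Apple silicon.

```python
from importlib import metadata
import platform


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(
    packages=("tensorflow", "tensorflow-macos", "tensorflow-metal"),
):
    """Collect interpreter, platform, and package versions for a bug report.

    The default package names are assumptions about a typical Apple silicon
    setup; pass your own tuple if your install differs.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        # None means the distribution is not installed in this environment.
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```

Running it once with tensorflow-metal installed and once after pip uninstall tensorflow-metal should show the plugin's entry switching between a version and "not installed", which makes it easier to confirm that the plugin is the only difference between the two sessions.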
Jun ’22