Reply to The new tensorflow-macos and tensorflow-metal incapacitate training
OK, it takes a bit of work to get the older and more stable versions up and running.

First and foremost, you'll need Homebrew. Then you'll need the version of Python supported by the release you are targeting. The table for matching up archival versions of tensorflow-macos and tensorflow-metal is near the bottom of this page. You can then use brew to install the legacy Python:

brew install python@3.9

Then use that to create a virtual environment. The code for my install follows, though double-check the location of your Homebrew:

/opt/homebrew/opt/python@3.9/bin/python3.9 -m venv ~/tensorflow
source ~/tensorflow/bin/activate

With the virtual environment created, you then need the URLs for the old pip installs. Apple prohibits linking to external URLs on this forum, but you can look up tensorflow-macos and tensorflow-metal at pypi dot org and find their release history in the left-hand column. Then right-click/command-click the release to copy its URL; pip install <url> is an acceptable way to install packages. Take careful note of the cp38 or cp39 in the filename: this tells you whether you need Python 3.8 or 3.9 for a particular release.

With that, you just need to install using the URLs. In my example, I want tensorflow-macos 2.8 and tensorflow-metal 0.4.0, which did not have the deadlock issue (at least not that I recall; I will add another comment with a stable configuration if I need to find one):

pip install https://files.pythonhosted.org/packages/4d/74/47440202d9a26c442b19fb8a15ec36d443f25e5ef9cf7bfdeee444981513/tensorflow_macos-2.8.0-cp39-cp39-macosx_11_0_arm64.whl
pip install https://files.pythonhosted.org/packages/d5/37/c48486778e4756b564ef844b145b16f3e0627a53b23500870d260c3a49f3/tensorflow_metal-0.4.0-cp39-cp39-macosx_11_0_arm64.whl

With that, I am off to the races. I am using tensorflow-macos to build a chatbot AI. The older configuration of tensorflow-macos and tensorflow-metal has the same training time on my setup: about an hour per epoch. That is not bad at all for a model with 82 million parameters and a dataset of hundreds of thousands of scientific papers (this is with an M1 Ultra and batch sizes of 64). TensorFlow on Mac is very powerful, but unfortunately you can't rely on the latest releases or the provided installation instructions to get anything functional.
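To make the point about the Python tag in the filename concrete, here is a minimal sketch that splits a wheel filename into its parts and checks the tag against the running interpreter. The helper names (parse_wheel_filename, matches_running_python) are made up for illustration, and the check only handles CPython tags such as cp39.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # A wheel filename has 5 dash-separated parts, or 6 with an optional build tag.
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(filename):
    """Return True if the wheel's Python tag names the running CPython version."""
    tags = parse_wheel_filename(filename)["python_tag"].split(".")
    current = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return current in tags


wheel = "tensorflow_macos-2.8.0-cp39-cp39-macosx_11_0_arm64.whl"
print(parse_wheel_filename(wheel)["python_tag"])
print(matches_running_python(wheel))
```

Running this inside the activated virtual environment should print True on the second line only when the environment's Python matches the wheel, which is a quick sanity check before calling pip install on the URL.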
Jan ’23
Reply to The new tensorflow-macos and tensorflow-metal incapacitate training
As far as I can tell, @tux_o_matic is correct about the only workable solution. It is still a problem on M2 (with 16GB of unified RAM), and I am still stuck on tensorflow-macos 2.9 and tensorflow-metal 0.5.0, since newer versions are broken.

If I recall correctly, tensorflow-metal 0.4.0 didn't stop randomly during training (i.e. the deadlock that @Namalek mentioned). Does anyone know how to get that version? pip can only find 0.5.0 at the earliest, and that has the stalling bug. I am mystified by how this keeps getting updated with broken fixes; even the simple tutorial models don't work.

Unfortunately I'm on the other side of the issue from @wbattel4607. I bought a Mac Studio with the M1 Ultra only to discover that Apple had effectively nerfed TensorFlow by releasing broken updates and removing the tensorflow-macos < 2.9.0 and tensorflow-metal == 0.4.0 configurations that could actually train models.
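When comparing configurations like the ones above, it helps to report exactly what is installed in the active environment. A minimal sketch using only the standard library follows; the function names are hypothetical, and the default package list is simply the set of packages named in this thread plus numpy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("tensorflow-macos", "tensorflow-metal", "numpy")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


print(report())
```

Pasting the printed dictionary alongside the Python version makes it easier for others to tell whether they are hitting the same combination.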
Jan ’23
Reply to symbol not found in tensorflow-metal version 0.7.0
That fixed the import tensorflow issue. I installed on a brand new Mac installation with a new user account, so perhaps include that in the setup instructions?

The next error comes when trying to train a model:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users//<user>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/var/folders/v0/w_k546h500q00yr1lhwd78640000gn/T/__autograph_generated_filenv9ppeuc.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/Users//<user>/miniconda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1557, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/optimization_tf.py", line 246, in apply_gradients
    return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
TypeError: in user code:

    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function  *
        return step_function(self, iterator)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step  **
        outputs = model.train_step(data)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1557, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
        self.apply_gradients(grads_and_vars)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/optimization_tf.py", line 246, in apply_gradients
        return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
        return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
        self._apply_weight_decay(trainable_variables)
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
        tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
        distribution.extended.update(
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1149, in weight_decay_fn  **
        if self._use_weight_decay(variable):
    File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 587, in _use_weight_decay
        for exclude_id in exclude_from_weight_decay:

    TypeError: 'NoneType' object is not iterable

I still have to revert to earlier versions to continue my work. But they have a problem too: tensorflow-metal 0.5.0 with tensorflow-macos 2.9 runs, but it stops randomly during training. I'm not sure there is any stable configuration.
Jan ’23
Reply to symbol not found in tensorflow-metal version 0.7.0
I used the versions you requested (tensorflow_macos==2.11.0 with 0.7.0). Here is the error I get:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 42, in <module>
    from tensorflow.python import data
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/__init__.py", line 21, in <module>
    from tensorflow.python.data import experimental
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/__init__.py", line 96, in <module>
    from tensorflow.python.data.experimental import service
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/service/__init__.py", line 419, in <module>
    from tensorflow.python.data.experimental.ops.data_service_ops import distribute
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 22, in <module>
    from tensorflow.python.data.experimental.ops import compression_ops
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/compression_ops.py", line 16, in <module>
    from tensorflow.python.data.util import structure
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py", line 22, in <module>
    from tensorflow.python.data.util import nest
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/util/nest.py", line 34, in <module>
    from tensorflow.python.framework import sparse_tensor as _sparse_tensor
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/sparse_tensor.py", line 24, in <module>
    from tensorflow.python.framework import constant_op
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 25, in <module>
    from tensorflow.python.eager import execute
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 21, in <module>
    from tensorflow.python.framework import dtypes
  File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py", line 34, in <module>
    _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
TypeError: Unable to convert function return value to a Python type! The signature was () -> handle

This release is clearly unusable.
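The first lines of that output compare two NumPy C API version numbers, and the installed one (0xf) is lower than the one the extension was built against (0x10). That suggests the NumPy in the environment is older than the build expects, which would be consistent with another post in this thread that mentions having to upgrade numpy before tensorflow would load. A minimal sketch for reading those numbers out of the message follows; the function names are hypothetical and the interpretation is an assumption, not an official diagnosis.

```python
import re

# Matches the message format quoted in the post above.
_API_MISMATCH = re.compile(
    r"module compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def parse_numpy_api_mismatch(message):
    """Return (compiled_against, installed) API versions as ints, or None."""
    match = _API_MISMATCH.search(message)
    if match is None:
        return None
    compiled_against, installed = (int(value, 16) for value in match.groups())
    return compiled_against, installed


def installed_numpy_looks_too_old(message):
    """True when the installed NumPy API version is below the compiled-against one."""
    parsed = parse_numpy_api_mismatch(message)
    return parsed is not None and parsed[1] < parsed[0]


line = ("RuntimeError: module compiled against API version 0x10 "
        "but this version of numpy is 0xf")
print(parse_numpy_api_mismatch(line))
print(installed_numpy_looks_too_old(line))
```

If the second value is the smaller one, trying python -m pip install --upgrade numpy inside the same environment is a reasonable first step before concluding that the release itself is broken.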
Jan ’23
Reply to tensorflow-metal failure on M1 Ultra
The issue is discussed here. The problem is that Apple implemented code that is apparently not supported on (m)any of its devices. This can be worked around by using the legacy optimizers, but that forgoes the attempted improvements. Until this is fixed (the bad versions are tensorflow-macos 2.11 and tensorflow-metal 0.7), it is best for anyone experiencing similar issues to install a functional version:

python -m pip install tensorflow-macos==2.9
python -m pip install tensorflow-metal==0.5.0
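For anyone who wants to stay on the newer release and try the legacy-optimizer route instead, here is a minimal sketch. It assumes the optimizers namespace exposes a legacy submodule (as tf.keras.optimizers does in recent TensorFlow releases) and falls back to the regular class otherwise. pick_adam is a hypothetical helper name, not part of any library.

```python
def pick_adam(optimizers):
    """Return the legacy Adam class if the namespace has one, else the regular Adam.

    `optimizers` is expected to be a namespace such as tf.keras.optimizers.
    """
    legacy = getattr(optimizers, "legacy", None)
    if legacy is not None and hasattr(legacy, "Adam"):
        return legacy.Adam
    return optimizers.Adam


# Usage (requires TensorFlow to be installed, so it is left as a comment here):
#   import tensorflow as tf
#   optimizer = pick_adam(tf.keras.optimizers)(learning_rate=1e-3)
#   model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```

Taking the namespace as an argument keeps the same training script usable on both the older pinned versions and the newer ones, at the cost of one extra line where the optimizer is created.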
Dec ’22
Reply to tensorflow-metal failure on M1 Ultra
Update 2: Tried skipping the creation of a virtual env just in case I was confusing something. Had to upgrade numpy to load tensorflow. But still the same basic error.

2022-12-26 20:52:30.752129: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1337af7b0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/<username>/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_212' defined at (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_212'
could not find registered platform with id: 0x1337af7b0
	 [[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_23355]
Dec ’22
Reply to tensorflow-metal failure on M1 Ultra
Update: Tried this with Python 3.10 and got the same error, with only the platform id changed:

2022-12-26 20:34:43.821008: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x16b325db0

As far as I can tell, the M1 Ultra isn't supported now (it was before).
Dec ’22