Can confirm that python 3.9 + tensorflow-macos 2.8 + tensorflow-metal 0.4.0 is the combination you want to avoid the deadlock/freezing issue. Model successfully ran overnight.
OK, it takes a bit of work to get the older, more stable versions up and running. First and foremost, you'll need Homebrew. Then you'll need the version of Python supported by the targeted release. The table for matching up archival versions of tensorflow-macos and tensorflow-metal is near the bottom of this page.
You can then use brew to install the legacy Python:
brew install python@3.9
And then use that to create a virtual environment. Code follows for my install, though double check the location of your homebrew.
/opt/homebrew/opt/python@3.9/bin/python3.9 -m venv ~/tensorflow
source ~/tensorflow/bin/activate
With the virtual environment created, you then need to get the URLs for the old pip installs. Apple prohibits the linking of external URLs on this forum, but you can look up tensorflow-macos and tensorflow-metal at pypi dot org and find their release history in the left-side column. Then right click/command click the release to copy its link. pip install <url> is an acceptable way to install packages.
Take careful note of the cp38 or cp39 in the filename - this tells you whether you need Python 3.8 or 3.9 for a particular release.
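If you want to check a wheel before installing it, here is a minimal sketch that pulls the Python tag out of a wheel filename or URL and compares it to the running interpreter. The function names are my own, and it assumes the standard wheel naming layout (name-version[-build]-python-abi-platform.whl), so treat it as an illustration rather than a full parser.

```python
import sys


def wheel_python_tag(wheel_name: str) -> str:
    """Return the Python tag (e.g. 'cp39') from a wheel filename or URL."""
    filename = wheel_name.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform,
    # so the Python tag is always third from the end.
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3]


def matches_interpreter(wheel_name: str, version_info=sys.version_info) -> bool:
    """True if the wheel's CPython tag matches the given interpreter version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return wheel_python_tag(wheel_name) == expected


if __name__ == "__main__":
    name = "tensorflow_macos-2.8.0-cp39-cp39-macosx_11_0_arm64.whl"
    print(wheel_python_tag(name), matches_interpreter(name))
```

Run it with the python inside your virtual environment; if matches_interpreter returns False, the venv was created with the wrong Python for that release.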
With that, you just need to install using the urls. So in my example, I want to use tensorflow-macos 2.8 and tensorflow-metal 0.4.0, which did not have the deadlock issue (at least not that I recall, will add another comment with a stable configuration if I need to find it).
pip install https://files.pythonhosted.org/packages/4d/74/47440202d9a26c442b19fb8a15ec36d443f25e5ef9cf7bfdeee444981513/tensorflow_macos-2.8.0-cp39-cp39-macosx_11_0_arm64.whl
pip install https://files.pythonhosted.org/packages/d5/37/c48486778e4756b564ef844b145b16f3e0627a53b23500870d260c3a49f3/tensorflow_metal-0.4.0-cp39-cp39-macosx_11_0_arm64.whl
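To confirm that the pinned versions actually landed in the virtual environment, something like the following sketch can help. It only uses the standard library (importlib.metadata, available from Python 3.8); the helper names are hypothetical and it does not import tensorflow itself.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("tensorflow-macos", "tensorflow-metal")):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

For the setup described above you would hope to see 2.8.0 for tensorflow-macos and 0.4.0 for tensorflow-metal.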
With that, I am off to the races. I am using tensorflow-macos to build a chatbot AI. The older configuration of tensorflow-macos and tensorflow-metal has the same training time on my setup - about an hour per epoch. That is not bad at all for a model with 82 million parameters and a dataset of hundreds of thousands of scientific papers (this is with an M1 Ultra and batch sizes of 64). Tensorflow on Mac is very powerful, but unfortunately you can't rely on the latest releases or the provided installation instructions to get anything functional.
As far as I can tell, @tux_o_matic is correct about the only workable solution.
Still a problem on M2 (with 16GB of unified RAM). And still stuck on tensorflow-macos 2.9 and tensorflow-metal 0.5.0 since newer versions are broken.
If I recall correctly, tensorflow-metal 0.4.0 didn't stop randomly during training (e.g. the deadlock that @Namalek mentioned) - does anyone know how to get that version? pip can only find 0.5.0 at the earliest, and that has the stalling bug. I am mystified by how this keeps getting updated with broken fixes - even the simple tutorial models don't work.
Unfortunately I'm on the other side of the issue as @wbattel4607. I bought a Mac Studio with the M1Ultra only to discover that Apple had effectively nerfed tensorflow by creating broken updates and removing the tensorflow-macos < 2.9.0 and tensorflow-metal = 0.4.0 configurations that could actually train models.
That fixed the import tensorflow issue. I installed on a brand new Mac installation with a new user account - so perhaps include that in the setup instructions?
Next error comes when trying to train a model:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users//<user>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/var/folders/v0/w_k546h500q00yr1lhwd78640000gn/T/__autograph_generated_filenv9ppeuc.py", line 15, in tf__train_function
retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
File "/Users//<user>/miniconda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1557, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/optimization_tf.py", line 246, in apply_gradients
return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
TypeError: in user code:
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function *
return step_function(self, iterator)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step **
outputs = model.train_step(data)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1557, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/transformers/optimization_tf.py", line 246, in apply_gradients
return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
self._apply_weight_decay(trainable_variables)
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
distribution.extended.update(
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1149, in weight_decay_fn **
if self._use_weight_decay(variable):
File "/Users/<user>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 587, in _use_weight_decay
for exclude_id in exclude_from_weight_decay:
TypeError: 'NoneType' object is not iterable
I still have to revert to earlier versions to continue my work. But those have a problem too (tensorflow-metal 0.5.0 with tensorflow-macos 2.9 stops randomly during training). Not sure if there is any stable configuration.
I used the versions you requested (tensorflow-macos==2.11.0 with tensorflow-metal 0.7.0), here is the error I get:
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 42, in <module>
from tensorflow.python import data
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/__init__.py", line 21, in <module>
from tensorflow.python.data import experimental
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/__init__.py", line 96, in <module>
from tensorflow.python.data.experimental import service
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/service/__init__.py", line 419, in <module>
from tensorflow.python.data.experimental.ops.data_service_ops import distribute
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 22, in <module>
from tensorflow.python.data.experimental.ops import compression_ops
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/experimental/ops/compression_ops.py", line 16, in <module>
from tensorflow.python.data.util import structure
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py", line 22, in <module>
from tensorflow.python.data.util import nest
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/data/util/nest.py", line 34, in <module>
from tensorflow.python.framework import sparse_tensor as _sparse_tensor
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/sparse_tensor.py", line 24, in <module>
from tensorflow.python.framework import constant_op
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 25, in <module>
from tensorflow.python.eager import execute
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 21, in <module>
from tensorflow.python.framework import dtypes
File "/Users/<user>/miniconda/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py", line 34, in <module>
_np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
TypeError: Unable to convert function return value to a Python type! The signature was
() -> handle
This release is clearly unusable.
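For what it's worth, the first two lines of that output say the compiled module expects a newer numpy C API than the numpy that is installed. A small sketch (my own helper, only parsing the message text quoted above) makes the comparison explicit:

```python
import re

# Pattern for the message format quoted in the error output above.
_PATTERN = re.compile(
    r"API version (0x[0-9a-fA-F]+) but this version of numpy is (0x[0-9a-fA-F]+)"
)


def parse_api_mismatch(message: str):
    """Return (expected_api, installed_api) as ints, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


message = (
    "RuntimeError: module compiled against API version 0x10 "
    "but this version of numpy is 0xf"
)
expected_api, installed_api = parse_api_mismatch(message)
# The installed numpy is older than what the module was built against.
numpy_too_old = installed_api < expected_api
print(expected_api, installed_api, numpy_too_old)
```

When the installed value is the lower one, as here, the usual remedy is to upgrade numpy in the same environment. That gets past the import, but not necessarily past the training errors.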
Issue is discussed here. The problem is that Apple implemented code that is apparently not supported on (m)any of its devices. This can be fixed by using legacy optimizers, but that forgoes the attempted improvements. Until this is fixed (the bad versions are tensorflow-macos 2.11 and tensorflow-metal 0.7), it is best for anyone experiencing similar issues to install a functional version:
python -m pip install tensorflow-macos==2.9
python -m pip install tensorflow-metal==0.5.0
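If you would rather stay on the newer release and try the legacy optimizers mentioned above, here is a rough sketch. It assumes the tf.keras.optimizers.legacy namespace that ships with recent TensorFlow releases such as 2.11; pick_optimizer_class is a hypothetical helper of mine, and it falls back to the regular class on versions that have no legacy namespace. I have not verified that this avoids every failure shown in this thread.

```python
def pick_optimizer_class(optimizers, name="Adam"):
    """Prefer the legacy optimizer class when the namespace provides one.

    `optimizers` is expected to be something like tf.keras.optimizers.
    On releases without a `legacy` attribute, the regular class is returned.
    """
    namespace = getattr(optimizers, "legacy", optimizers)
    return getattr(namespace, name)


# Usage (requires tensorflow, so it is left commented out here):
# import tensorflow as tf
# optimizer = pick_optimizer_class(tf.keras.optimizers)(learning_rate=1e-3)
# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```

The point of going through a helper is that the same training script then runs unchanged on the older pinned versions and on the newer ones.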
Update 2: Tried skipping the creation of a virtual env just in case I was confusing something. Had to upgrade numpy to load tensorflow. But still the same basic error.
2022-12-26 20:52:30.752129: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1337af7b0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/<username>/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:
Detected at node 'StatefulPartitionedCall_212' defined at (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/Users/<username>/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_212'
could not find registered platform with id: 0x1337af7b0
[[{{node StatefulPartitionedCall_212}}]] [Op:__inference_train_function_23355]
Update: Tried this with Python 3.10 and got the same error with a slightly different id:
2022-12-26 20:34:43.821008: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x16b325db0
As far as I can tell, M1 Ultra isn't supported now (it was before).
Unfortunately, this didn't work - I was able to text from my Mac for about 5 minutes. First message received from the other person, and the message field has disappeared again.
u/mikeportanova is right - you need to share something with the contact outside of messages, but with the selected conversation in messages having a type field. I have no idea why this works, but it does.