I initially raised this issue in the tensorflow forum, and they directed me back here since this is a tf-macos-specific problem [see https://github.com/tensorflow/tensorflow/issues/60673].
When calling Model.compile() with the AdamW optimizer, a warning is emitted saying that v2.11+ optimizers have a known slowdown on M1/M2 devices, so the backend attempts to fall back to a legacy version. However, no legacy version of the AdamW optimizer exists. In the previous tf-macos version 2.12, this led to an error during Model.compile() [see issue https://github.com//issues/60652 and https://developer.apple.com/forums/thread/729732]. In the current nightly, that error is no longer thrown; however, after calling model.compile(), the attribute model.optimizer is set to the string 'adamw' instead of an optimizer object.
Later, when we call model.fit(), this leads to an AttributeError, because model.optimizer.minimize() does not exist when model.optimizer is a string.
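To make the failure mode concrete, a much smaller check than the full script below already shows it: after compile(), model.optimizer should be an optimizer instance rather than a string. This is only a sketch assuming the same tf-macos nightly environment as the reproduction further down.

import tensorflow as tf

# Minimal sketch: compile a tiny model with AdamW and inspect the optimizer attribute
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
model.compile(
    loss      = "mse",
    optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-2),
)
# Expected: a tf.keras optimizer instance; observed here: <class 'str'> (the string 'adamw')
print(type(model.optimizer))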
Expected behaviour: the model compiles correctly with either a v2.11+ AdamW optimizer that does not suffer the slowdown, or a legacy-compatible implementation of AdamW, and then trains correctly with a valid AdamW optimizer when model.fit() is called.
Note: the warning message suggests using an optimizer located at tf.keras.optimizers.legacy.AdamW, but no such optimizer exists.
It would be nice to be able either to use the modern optimizers or to have a legacy-compatible version of AdamW, since weight decay is an important tool in modern ML research and currently cannot be used on Mac.
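For reference, the missing legacy symbol is easy to confirm from Python, and the only stopgap I can see on the legacy path is plain L2 regularization with legacy Adam, which is not equivalent to AdamW's decoupled weight decay. The snippet below is just a sketch of that workaround, not a fix.

import tensorflow as tf

# The optimizer suggested by the warning is absent
print(hasattr(tf.keras.optimizers.legacy, "AdamW"))   # prints False: there is no legacy AdamW

# Stopgap sketch only: legacy Adam plus an L2 kernel penalty added to the loss.
# NOTE: an L2 penalty is NOT the same thing as AdamW's decoupled weight decay.
layer = tf.keras.layers.Dense(
    10,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.L2(1e-2),
)
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3)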
Standalone code to reproduce the issue
##===========##
## Imports ##
##===========##
import sys
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import AdamW
##===================##
## Report versions ##
##===================##
#
# Expected outputs:
# Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:01:19) [Clang 14.0.6 ]
# TF version is: 2.14.0-dev20230523
# Numpy version is: 1.23.2
#
print(f"Python version is: {sys.version}")
print(f"TF version is: {tf.__version__}")
print(f"Numpy version is: {np.__version__}")
##==============================##
## Create a very simple model ##
##==============================##
#
# Expected outputs:
# Model: "model_1"
# _________________________________________________________________
# Layer (type)                Output Shape              Param #
# =================================================================
# Layer_in (InputLayer)       [(None, 2)]               0
#
# Layer_hidden (Dense)        (None, 10)                30
#
# Layer_out (Dense)           (None, 2)                 22
#
# =================================================================
# Total params: 52 (208.00 Byte)
# Trainable params: 52 (208.00 Byte)
# Non-trainable params: 0 (0.00 Byte)
# _________________________________________________________________
#
x_in = Input(2 , dtype=tf.float32, name="Layer_in" )
x = x_in
x = Dense(10, dtype=tf.float32, name="Layer_hidden", activation="relu" )(x)
x = Dense(2 , dtype=tf.float32, name="Layer_out" , activation="linear")(x)
model = Model(x_in, x)
model.summary()
##===================================================##
## Compile model with MSE loss and AdamW optimizer ##
##===================================================##
#
# Expected outputs:
# WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`.
# WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`.
#
model.compile(
    loss      = "mse",
    optimizer = AdamW(learning_rate=1e-3, weight_decay=1e-2)
)
##===========================##
## Generate some fake data ##
##===========================##
#
# Expected outputs:
# X shape is (100, 2), Y shape is (100, 2)
#
dataset_size = 100
X = np.random.normal(size=(dataset_size, 2))
X = tf.constant(X, dtype=tf.float32)
Y = np.random.normal(size=(dataset_size, 2))
Y = tf.constant(Y, dtype=tf.float32)
print(f"X shape is {X.shape}, Y shape is {Y.shape}")
##===================================##
## Fit model to data for one epoch ##
##===================================##
#
# Expected outputs:
# ---------------------------------------------------------------------------
# AttributeError Traceback (most recent call last)
# Cell In[9], line 51
# 1 ##===================================##
# 2 ## Fit model to data for one epoch ##
# 3 ##===================================##
# (...)
# 48 # • mask=None
# 49 #
# ---> 51 model.fit(X, Y, epochs=1)
# File ~/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
# 67 filtered_tb = _process_traceback_frames(e.__traceback__)
# 68 # To get the full stack trace, call:
# 69 # `tf.debugging.disable_traceback_filtering()`
# ---> 70 raise e.with_traceback(filtered_tb) from None
# 71 finally:
# 72 del filtered_tb
# File /var/folders/6_/gprzxt797d5098h8dtk22nch0000gn/T/__autograph_generated_filezzqv9k36.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
# 13 try:
# 14 do_return = True
# ---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
# 16 except:
# 17 do_return = False
# AttributeError: in user code:
# File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1338, in train_function *
# return step_function(self, iterator)
# File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1322, in step_function **
# outputs = model.distribute_strategy.run(run_step, args=(data,))
# File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1303, in run_step **
# outputs = model.train_step(data)
# File "/Users/Ste/miniforge3/envs/tf_macos_nightly_230523/lib/python3.10/site-packages/keras/src/engine/training.py", line 1084, in train_step
# self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
# AttributeError: 'str' object has no attribute 'minimize'
model.fit(X, Y, epochs=1)