TF-Metal Custom Loss Functions do not work

Hi,

I am getting the following error in TensorFlow on an M1 Max when I use a custom loss function (one that I define myself):


2022-02-14 21:23:44.437000: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-02-14 21:23:44.437119: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
Process Process-82:
Traceback (most recent call last):
  File "/Users/sebtac/miniforge3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/sebtac/miniforge3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/sebtac/Documents/executor_metal.py", line 892, in executor
    history=model.fit(train_data,
  File "/Users/sebtac/miniforge3/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/sebtac/miniforge3/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 512 [Op:Squeeze]


Custom function:

def my_rmse(y_true, y_pred):
    error = y_true-y_pred
    sqr_error = K.square(error)
    mean_sqr_error = K.mean(sqr_error)
    sqrt_mean_sqr_error = K.sqrt(mean_sqr_error)
    return sqrt_mean_sqr_error

model.compile(optimizer=optimizer,loss=my_rmse,run_eagerly=True)
#model.compile(optimizer=optimizer,loss="mae",run_eagerly=True)

Additional details:

  • The same does not happen when I use built-in loss functions.
  • 512 is the batch size, and batching works fine without a custom loss function.
  • It works well when I set the batch size to 1.
  • It works well on non-M1 Macs.
  • I run the model from within a multiprocessing process.

Additional Details.

I narrowed the conditions for the issue down to the use of weights when creating the tf.data.Dataset:

train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train, weights))

Once the weights are removed (train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))), or alternatively once we use a non-custom loss function, training proceeds.

Does the dev team have any insights into this?

Further Detail:

The issue is only present when running on the GPU (M1 Max in my case). Once I block the GPU (with tf.config.set_visible_devices([], 'GPU')), everything works as expected!

This is definitely an implementation issue. Could Apple comment on it, please?

Hi @sebtac

The error itself seems to make sense: the squeeze op will only remove dimensions of size 1. This also explains why it works when the batch size is set to 1, since it is trying to squeeze out the batch dimension. But the observation that it works on the CPU and not on the GPU makes me think there might be some difference in layout between the two, which we will investigate. Do you have a test script we can use to reproduce this issue so we can verify this more quickly? Additionally, can you confirm which versions of macOS, tensorflow-macos and tensorflow-metal you are using?
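To illustrate the shape rule mentioned in this reply, here is a minimal sketch. It uses NumPy's squeeze as a stand-in, on the assumption that it applies the same size-1 rule that the reply and the error message describe for the TensorFlow op; the array names are made up for the example.

```python
import numpy as np

# NumPy stand-in for the shape rule; the TensorFlow op is assumed to apply
# the same rule when a specific axis is named.
weights_batch_1 = np.ones((1,))      # what a batch of size 1 would look like
weights_batch_512 = np.ones((512,))  # what a batch of size 512 would look like

# An axis of length 1 can be squeezed away, leaving a 0-dimensional array.
squeezed = np.squeeze(weights_batch_1, axis=0)
print(squeezed.shape)

# An axis of length 512 cannot be squeezed, so NumPy raises ValueError.
try:
    np.squeeze(weights_batch_512, axis=0)
    squeeze_failed = False
except ValueError as exc:
    squeeze_failed = True
    print("squeeze rejected:", exc)
```

This matches the observation that the failure disappears with a batch size of 1: only then does the axis being squeezed have length 1.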

Thanks,

macOS 12.1, TF 2.7, tensorflow-metal 0.3

I will provide the example code on Saturday.

The whole point is that the squeeze does not seem to be performed when:

  • using built-in loss functions, or
  • not running on the M1 GPU, or
  • weights are not used in the tf.data.Dataset definition.

Maybe broadcasting is broken in such a scenario, or the weights are applied at the wrong moment.

Also, it might be that the custom loss function definition does not assume the existence of the weights while the built-in one does. But if so, then why:

  • does it work with the same custom functions on the CPU, on non-M1 Macs, and on Windows?
  • does the same error happen when I define the loss function as a class inheriting from Loss?

I also added this to an already reported case where TF does not train on the M1 GPU but does on its CPU with no changes to the code. Maybe those are related.
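For reference, a NumPy-only sketch of the output shapes involved. It repeats the arithmetic of my_rmse and contrasts it with a mean taken over the last axis only, which is assumed here to be how the built-in mean squared error reduces before any weights are applied. The array sizes are hypothetical, and whether this shape difference is related to the error is not established in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=(512, 2, 2))  # hypothetical 3-D targets, batch of 512
y_pred = rng.normal(size=(512, 2, 2))  # hypothetical predictions, same shape

# Same arithmetic as my_rmse: the mean runs over every element, so the
# whole batch collapses to a single number with an empty shape.
rmse_scalar = np.sqrt(np.mean(np.square(y_true - y_pred)))
print(np.shape(rmse_scalar))

# A mean over the last axis only keeps the batch axis in the result.
per_position = np.mean(np.square(y_true - y_pred), axis=-1)
print(per_position.shape)
```

The first result has no batch axis left for per-sample weights to line up with, while the second still has one value per sample and position.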

I pinpointed the issue further. The core of the issue is the dimensionality of the weight vector provided to the dataset. In my non-M1 implementations it was (None,) and it worked well. On M1, I need to change it to (None, 1). That said:

  • it is only required when we work with 3-dimensional data (possibly just the output, i.e. the input can be of any dimension, but I did not test that; possibly the dimensionality must be increased further as the dimensionality of the data increases, also not tested)

  • it is only required when we use either a custom loss function or wrap the built-in one in a custom loss wrapper (using a class or a def has the same effect)

  • the odd behavior is that my initial explorations, as well as my research code, work well on the M1 CPU without any modification. The code below fails under the above conditions on both the M1 CPU and GPU. I have not investigated this further.
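A minimal sketch of the weight reshaping described above, using NumPy only. The sample count and variable names are illustrative assumptions, not values from the failing project.

```python
import numpy as np

n_samples = 392  # hypothetical sample count, for illustration only

# One weight per sample with shape (N,), as used on the non-M1 setups.
weights_1d = np.ones(n_samples)

# The same weights with an explicit trailing axis, shape (N, 1), which is
# the form the M1 runs described above needed.
weights_2d = np.expand_dims(weights_1d, axis=1)

print(weights_1d.shape)
print(weights_2d.shape)

# Slicing out a batch keeps the trailing axis, so a batch of weights has
# shape (batch, 1) rather than (batch,).
batch_weights = weights_2d[:128]
print(batch_weights.shape)
```

Either array can be passed as the third element of the tuple given to tf.data.Dataset.from_tensor_slices, as in the script further down.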

I also worked with TF 2.8 and experienced the same behavior.

Thanks for looking into this. The expected solution is either to align the behavior across environments, or to investigate the required structure of the weight vector further and update the documentation.

Here is the syntax:


TEST CONDITIONS:

breaking condition: 1,1,3,1,1 and 1,1,3,1,2

dataset_weight = 1  # 0 No, 1 Yes
dw_type = 1  # 1 unidimensional, 2 dimensional
data_shape = 3  # 2 two dimensional # 3 dimensional
gpu = 1  # 0 No, 1 Yes
loss = 1  # 0 No, 1 Yes, 2 pseudo custom loss

import numpy as np
import pandas as pd
import sys

"""
if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    del tf
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu == 1:
    pass
else:
    tf.config.set_visible_devices([], 'GPU')

#print("GPUs:", tf.config.list_physical_devices('GPU'))
print("GPUs:", tf.config.list_logical_devices('GPU'))
#print("CPUs:", tf.config.list_physical_devices('CPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))
"""

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

batch = 128

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']

dataset = pd.read_csv(url, names=column_names, na_values='?', comment='\t', sep=' ', skipinitialspace=True).dropna()

if data_shape == 2:
    x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1,2)
    y_train = np.array(dataset[['MPG','Displacement']]).reshape(-1,2)
else:
    x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1,2,2)
    y_train = np.array(dataset[['MPG','Displacement']]).reshape(-1,2,2)

if dw_type == 2:
    weight = np.expand_dims(np.ones(x_train.shape[0]), axis = 1)
else:
    weight = np.ones(x_train.shape[0])

#print(dataset)
print(x_train.shape)
print(y_train.shape)
print(weight.shape)

if dataset_weight == 0:
    train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train)).cache().shuffle(x_train.shape[0]).batch(batch).repeat().prefetch(tf.data.experimental.AUTOTUNE)
else:
    train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train, weight)).cache().shuffle(x_train.shape[0]).batch(batch).repeat().prefetch(tf.data.experimental.AUTOTUNE)

model = Sequential([ Dense(64, activation='relu'), Dense(32, activation='relu'), Dense(2)])

loss_tf = tf.keras.losses.MeanSquaredError()

def custom_loss(y_true, y_pred):
    error = y_true-y_pred
    sqr_error = K.square(error)
    mean_sqr_error = K.mean(sqr_error)
    sqrt_mean_sqr_error = K.sqrt(mean_sqr_error)
    return sqrt_mean_sqr_error

def pseudo_custom_loss(y_true, y_pred):
    return loss_tf(y_true, y_pred)

if loss == 0:
    model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
elif loss == 1:
    model.compile(optimizer='adam', loss=custom_loss, run_eagerly=True)
else:
    model.compile(optimizer='adam', loss=pseudo_custom_loss, run_eagerly=True)

model.fit(train_data, epochs=2, steps_per_epoch = 3)

print(model.summary())
