We have run into an issue where a more complex model fails to converge on the M1 Max GPU, while it converges on the CPU and on non-M1 machines.
Performance is the same on CPU and GPU for models with a single RNN, but once we use two stacked RNNs, the GPU fails to converge.
That said, the example below is based on nonsensical data for the model architecture used, but it exhibits the same behavior we observe in our production models (which, for obvious reasons, we cannot share here). Specifically:
- The loss goes down to the e-06 range in all conditions except the double RNN on GPU; during training it often reaches the e-07 level.
- For the double RNN on GPU, the loss does not go that low, sometimes stopping at the e-05 level.
- On our production data, the double RNN on GPU yields a loss of 1.0 that essentially stays flat from the first epoch, whereas the other conditions often reach the 0.2 level with a clear learning curve.
- In the production model, increasing the number of LSTM cells made the divergence more visible (with this synthetic data it does not happen).
- The more complex the model is (after the RNN layers), the more visible the issue.
Suspected issues:
- Different precision used in CPU and GPU training. We had to scale the data values down considerably to make the effect visible (with raw data, all approaches produce comparable results); see the first sketch after this list.
- The vanishing-gradient problem is somehow more pronounced on the GPU, as suggested by performance getting worse as model complexity increases; see the gradient-norm probe after this list.
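A minimal way to probe the precision hypothesis (our own diagnostic, not part of the original repro; sizes mirror the sample below, and it must be run with the GPU visible) is to push an identical double-LSTM forward pass through both devices and compare the outputs:

import numpy as np
import tensorflow as tf

# Tiny float32 inputs, scaled down like the training data in the sample below.
x = np.random.RandomState(0).randn(8, 2, 2).astype('float32') * 0.001

def run_stack(device):
    with tf.device(device):
        tf.random.set_seed(0)  # reset the seed so both devices get identical weights
        l1 = tf.keras.layers.LSTM(64, return_sequences=True)
        l2 = tf.keras.layers.LSTM(64)
        return l2(l1(tf.constant(x))).numpy()

out_cpu = run_stack('/CPU:0')
out_gpu = run_stack('/GPU:0')
print("max abs CPU-GPU difference:", np.abs(out_cpu - out_gpu).max())

A large gap at these small input magnitudes would point at a numerical-precision difference rather than a training-dynamics one.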
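For the gradient hypothesis, one could print per-layer gradient norms once `model` is built (see the sample further down); this is again a sketch of ours, reusing `x_train`, `y_train`, and `batch` from that sample:

loss_fn = tf.keras.losses.MeanSquaredError()
xb = tf.constant(x_train[:batch], dtype=tf.float32)
yb = tf.constant(y_train[:batch], dtype=tf.float32)
with tf.GradientTape() as tape:
    loss = loss_fn(yb, model(xb, training=True))
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(f"{var.name:50s} grad norm: {tf.norm(g).numpy():.3e}")

If the norms for the first LSTM collapse toward zero only in the double-RNN-on-GPU condition, that would support this hypothesis.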
Please let me know if you need any further details.
Software stack: macOS 12.1, TensorFlow 2.7, tensorflow-metal 0.3; also tested with TensorFlow 2.8.
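(We assume the usual Apple-silicon setup here, i.e. roughly pip install tensorflow-macos==2.7.0 tensorflow-metal==0.3.0 in a native arm64 Python environment; adjust package versions to match the stack above.)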
Sample Syntax:
TEST CONDITIONS:

# conditions with issue: gpu = 1, model_size = 2
gpu = 1         # 0 CPU, 1 GPU
model_size = 2  # 1 single RNN, 2 double RNN

# PARAMETERS
LSTM_Cells = 64
epochs = 300
batch = 128
import sys

import numpy as np
import pandas as pd
from sklearn import preprocessing

# Reload TensorFlow if a previous run already imported it, so that the
# device visibility set below can still take effect.
if 'tensorflow' in sys.modules:
    print("tensorflow uploaded")
    del sys.modules["tensorflow"]
    #del tf
    import tensorflow as tf
else:
    print("tensorflow not uploaded")
    import tensorflow as tf

if gpu != 1:
    # Hide the GPU so that training runs on the CPU only.
    tf.config.set_visible_devices([], 'GPU')

#print("GPUs:", tf.config.list_physical_devices('GPU'))
print("GPUs:", tf.config.list_logical_devices('GPU'))
#print("CPUs:", tf.config.list_physical_devices('CPU'))
print("CPUs:", tf.config.list_logical_devices('CPU'))

from tensorflow.keras import Sequential  # unused in this sample
from tensorflow.keras.layers import Dense  # unused in this sample
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Displacement', 'Horsepower', 'Weight']

dataset = pd.read_csv(url, names=column_names, na_values='?',
                      comment='\t', sep=' ', skipinitialspace=True).dropna()

scaler = preprocessing.StandardScaler().fit(dataset)
X_scaled = scaler.transform(dataset)
X_scaled = X_scaled * 0.001  # shrink the values further; this is what makes the effect visible
# Large values
#x_train = np.array(dataset[['Horsepower', 'Weight']]).reshape(-1, 2, 2)
#y_train = np.array(dataset[['MPG', 'Displacement']]).reshape(-1, 2, 2)

# Small values
x_train = np.array(X_scaled[:, 2:]).reshape(-1, 2, 2)
y_train = np.array(X_scaled[:, :2]).reshape(-1, 2, 2)

#print(dataset)
print(x_train.shape)
print(y_train.shape)
#print(weight.shape)  # commented out: `weight` is never defined in this sample
# Note: the [:, :, :8] slice is a no-op here (the last axis has size 2).
train_data = (tf.data.Dataset.from_tensor_slices((x_train[:, :, :8], y_train))
              .cache()
              .shuffle(x_train.shape[0])
              .batch(batch)
              .repeat()
              .prefetch(tf.data.experimental.AUTOTUNE))
if model_size == 2:
    # MINIMAL NOT WORKING: two stacked LSTMs
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    encoder_l2 = tf.keras.layers.LSTM(LSTM_Cells, return_state=True)
    encoder_l2_outputs = encoder_l2(encoder_l1_outputs[0])
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l2_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)  # not used below
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)  # not used below
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(4)(flat)  # 4 units so the [2, 2] reshape works (was 22, which errors)
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
else:
    # WORKING: single LSTM
    encoder_inputs = tf.keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
    encoder_l1 = tf.keras.layers.LSTM(LSTM_Cells, return_sequences=True, return_state=True)
    encoder_l1_outputs = encoder_l1(encoder_inputs)
    dense_1 = tf.keras.layers.Dense(128, activation='relu')(encoder_l1_outputs[0])
    dense_2 = tf.keras.layers.Dense(64, activation='relu')(dense_1)
    dense_3 = tf.keras.layers.Dense(32, activation='relu')(dense_2)  # not used below
    dense_4 = tf.keras.layers.Dense(16, activation='relu')(dense_3)  # not used below
    flat = tf.keras.layers.Flatten()(dense_2)
    dense_5 = tf.keras.layers.Dense(4)(flat)  # 4 units so the [2, 2] reshape works (was 22, which errors)
    reshape_output = tf.keras.layers.Reshape([2, 2])(dense_5)
    model = tf.keras.models.Model(encoder_inputs, reshape_output)
model.summary()

loss_tf = tf.keras.losses.MeanSquaredError()
model.compile(optimizer='adam', loss=loss_tf, run_eagerly=True)
model.fit(train_data, epochs=epochs, steps_per_epoch=3)
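To compare the four conditions side by side, the final loss can be logged after training (a small addition of ours):

final_loss = model.evaluate(train_data, steps=3, verbose=0)
print(f"gpu={gpu} model_size={model_size} final MSE: {final_loss:.2e}")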