Reply to Memory Leak Using TensorFlow-Metal
Just to reinforce: training currently starts at about 7 GB of memory usage, and by the end of training memory usage exceeds 100 GB (and training gets slower and slower because of swap). On Google Colab (NVIDIA GPU), by contrast, the same training ran fine without this excessive memory usage.
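For reference, this is roughly how the growth between epochs can be observed (a minimal sketch for illustration, not part of my training code; it assumes psutil is installed):

import os
import psutil
import tensorflow as tf

class MemoryLogger(tf.keras.callbacks.Callback):
    """Print the resident memory of the Python process after every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
        print(f"epoch {epoch}: {rss_gb:.1f} GB resident")

# Usage: model.fit(..., callbacks=[MemoryLogger()])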
Aug ’23
Reply to Memory Leak Using TensorFlow-Metal
Hey, thank you for your detailed response. I've prepared two standalone scripts that try to reproduce the issue using randomly generated data. In both scripts I generate synthetic data of varying sizes, to mirror the dynamic input size scenario in my actual project. This synthetic data is then passed through a Keras model that includes a tf.keras.layers.Resizing layer. The first script uses tf.data.Dataset to feed the model, while the second one uses a generator function to yield the data in batches.

Interestingly, the memory issue seems to occur only in the script that uses tf.data.Dataset (memory keeps filling up) and does not seem to occur with the generator (~1.5 GB of memory). However, in my actual code, where I use the generator approach, I do observe the memory issue (usage grows past what fits in RAM). Furthermore, the issue is absent when using the CPU or an NVIDIA GPU (via Google Colab): both stay under 1.5 GB of memory. You can find the two scripts below.

Script using tf.data.Dataset:

import numpy as np
import tensorflow as tf

# # Use CPU to test memory
# tf.config.set_visible_devices([], 'GPU')

def generate_data(num_samples, max_size):
    """Generate synthetic data of varying sizes"""
    data = []
    labels = []
    for _ in range(num_samples):
        size = np.random.randint(1, max_size + 1)
        data.append(np.ones((size, size)) * 255)  # Example of image
        labels.append(np.random.randint(0, 2))    # binary classification for simplicity
    return data, labels

class DynamicResizeModel(tf.keras.Model):
    """A model that includes a resizing layer"""
    def __init__(self, target_size):
        super().__init__()
        self.target_size = target_size
        self.expand_dims = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, -1))
        self.resize = tf.keras.layers.Resizing(*target_size)
        self.flatten = tf.keras.layers.Flatten()
        self.dense = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = self.expand_dims(inputs)
        x = self.resize(x)
        x = self.flatten(x)
        return self.dense(x)

# Generate training data
train_data, train_labels = generate_data(100, 1024)  # you can adjust these parameters as needed

# Convert the variable-sized data to ragged tensors
train_data = tf.ragged.constant(train_data)
train_labels = tf.constant(train_labels)

# Prepare a dataset
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(8)  # adjust batch size as needed

# Create and train the model
model = DynamicResizeModel(target_size=(128, 32))  # resize all inputs to 128x32
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=1000)

Script using a generator:

import numpy as np
import tensorflow as tf

# # Use CPU to test memory
# tf.config.set_visible_devices([], 'GPU')

def generate_data(num_samples, max_size):
    """Generate synthetic data of varying sizes"""
    data = []
    labels = []
    for _ in range(num_samples):
        size = np.random.randint(1, max_size + 1)
        data.append(np.ones((size, size)) * 255)  # Example of image
        labels.append(np.random.randint(0, 2))    # binary classification for simplicity
    return data, labels

def data_generator(data, labels, batch_size):
    """Create a generator that returns batches of data"""
    num_samples = len(data)
    indices = np.arange(num_samples)
    while True:
        for i in range(0, num_samples, batch_size):
            batch_indices = indices[i:i + batch_size]
            batch_data = tf.ragged.constant([data[idx] for idx in batch_indices], dtype=tf.float32)
            batch_labels = np.array([labels[idx] for idx in batch_indices], dtype=np.float32)
            yield batch_data, batch_labels
        np.random.shuffle(indices)

class DynamicResizeModel(tf.keras.Model):
    """A model that includes a resizing layer"""
    def __init__(self, target_size):
        super().__init__()
        self.target_size = target_size
        self.expand_dims = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, -1))
        self.resize = tf.keras.layers.Resizing(*target_size)
        self.flatten = tf.keras.layers.Flatten()
        self.dense = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = self.expand_dims(inputs)
        x = self.resize(x)
        x = self.flatten(x)
        return self.dense(x)

# Generate training data
num_samples = 100   # Total number of samples in the dataset
max_size = 1024     # Maximum size of each matrix
train_data, train_labels = generate_data(num_samples, max_size)

# Create and train the model
model = DynamicResizeModel(target_size=(1024, 128))  # resize all inputs to 1024x128
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Set the parameters
batch_size = 8  # Number of samples per batch

# Create a generator
train_generator = data_generator(train_data, train_labels, batch_size)

# Use fit to train the model
model.fit(train_generator, steps_per_epoch=num_samples // batch_size, epochs=1000)

Given these findings, I would like to understand whether this behavior is expected with the tensorflow-metal plugin or whether it is indeed an anomaly. If it is the former, could you provide guidance on optimizing my code to prevent the memory issue while using tensorflow-metal? Looking forward to your insights.
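As an additional data point, below is a sketch of a control experiment I can also run (not one of the scripts above): every sample is zero-padded to a fixed 1024x1024 size before batching, so all shapes are constant. If memory still grows without bound with constant shapes, the dynamic-input-size explanation would be ruled out. This reuses generate_data and DynamicResizeModel from the first script; the pad_to_fixed helper is just for this test.

import numpy as np
import tensorflow as tf

def pad_to_fixed(sample, max_size):
    """Zero-pad a (h, w) array to (max_size, max_size)."""
    h, w = sample.shape
    out = np.zeros((max_size, max_size), dtype=np.float32)
    out[:h, :w] = sample
    return out

# Reuses generate_data and DynamicResizeModel from the first script above.
train_data, train_labels = generate_data(100, 1024)
fixed_data = np.stack([pad_to_fixed(x, 1024) for x in train_data])
fixed_labels = np.array(train_labels, dtype=np.float32)

# Fixed-shape dataset: no ragged tensors, no per-sample shape variation
train_dataset = (tf.data.Dataset
                 .from_tensor_slices((fixed_data, fixed_labels))
                 .shuffle(buffer_size=1024)
                 .batch(8))

model = DynamicResizeModel(target_size=(128, 32))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=1000)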
Jul ’23