System Information
MacOS version: 13.4
TensorFlow (macos) version: tf-nightly==2.14.0.dev20230616
TensorFlow-Metal Plugin Version: 1.0.1
Problem Description
I'm trying to compute the CTC loss with TensorFlow's tf.nn.ctc_loss on an M1 Mac, but it fails with an error saying that no OpKernel is registered to support the CTCLossV2 op. The same code works fine on the CPU, and tf.keras.backend.ctc_batch_cost also works. The error stack trace is as follows:
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node CTCLossV2 defined at (most recent call last):
<stack traces unavailable>
No OpKernel was registered to support Op 'CTCLossV2' used by {{node CTCLossV2}} with these attrs: [ctc_merge_repeated=true, preprocess_collapse_repeated=false, ignore_longer_outputs_than_inputs=false]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[CTCLossV2]]
[[ctc_loss_func/PartitionedCall]] [Op:__inference_train_function_13095]
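One possible explanation, not confirmed here: when labels are passed to tf.nn.ctc_loss as a SparseTensor and a GPU is visible, TensorFlow can select a cuDNN-backed implementation built on the CTCLossV2 op, and the log above shows no registered kernels for that op. If that is the cause, two workarounds may help. The sketch below assumes TensorFlow 2.x and uses hypothetical toy shapes and hypothetical helper names (ctc_loss_dense_labels, ctc_loss_sparse_on_cpu). It either passes dense, zero-padded labels together with label_length, or keeps sparse labels but pins the loss computation to the CPU with tf.device.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy shapes: 2 sequences, 10 frames, 5 classes, blank at index 0.
BATCH, FRAMES, NUM_CLASSES = 2, 10, 5
BLANK = 0

# Unnormalized scores, batch-major: [batch, frames, classes].
logits = tf.random.stateless_normal([BATCH, FRAMES, NUM_CLASSES], seed=[1, 2])
# Dense labels, zero-padded; only the first label_length entries are used.
labels = tf.constant([[1, 2, 3], [4, 1, 0]], dtype=tf.int32)
label_length = tf.constant([3, 2], dtype=tf.int32)
logit_length = tf.fill([BATCH], FRAMES)  # every sequence uses all frames


def ctc_loss_dense_labels(labels, logits, label_length, logit_length):
    """Workaround 1 (assumed): dense labels plus label_length."""
    return tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=BLANK,
    )


def ctc_loss_sparse_on_cpu(labels, logits, logit_length):
    """Workaround 2 (assumed): sparse labels, loss pinned to the CPU."""
    # from_dense drops the 0 entries, i.e. the padding (0 is never a real label here).
    sparse_labels = tf.sparse.from_dense(labels)
    with tf.device("/CPU:0"):
        return tf.nn.ctc_loss(
            labels=sparse_labels,
            logits=logits,
            label_length=None,  # ignored for sparse labels
            logit_length=logit_length,
            logits_time_major=False,
            blank_index=BLANK,
        )


dense_loss = ctc_loss_dense_labels(labels, logits, label_length, logit_length)
cpu_loss = ctc_loss_sparse_on_cpu(labels, logits, logit_length)
print("dense labels:", dense_loss.numpy())
print("sparse on CPU:", cpu_loss.numpy())
```

Both helpers compute the per-sequence negative log-likelihood for the same labels, so their outputs should agree to within floating-point tolerance. In a real training loop, the same tf.device("/CPU:0") scope could wrap the body of the custom loss function, although that has not been verified on the Metal plugin.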
Hi,
I've found a memory leak when using the tensorflow-metal plugin to run a deep learning model on a Mac with an M1 chip. Here are the details of my system:
System Information
MacOS version: 13.4
TensorFlow (macos) version: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616
TensorFlow-Metal Plugin Version: 0.8, 1.0.0, 1.0.1
Model Details
I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamically sized input, and the images are resized inside the model by a Resizing layer. The data is fed to model.fit() through a data generator class.
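Since the project code is private, here is only a hypothetical minimal reproduction of the setup described above: a model with a dynamically sized input, a Resizing layer inside the model, and a tf.keras.utils.Sequence generator passed to model.fit(). The layer stack, the 32x128 resize target, the class count, the class name RandomSizeBatches, and the random data are all assumptions chosen to keep the sketch small; the real architecture and sizes (e.g. 1024x128) would be substituted in.

```python
import numpy as np
import tensorflow as tf

TARGET_H, TARGET_W = 32, 128  # hypothetical resize target
NUM_CLASSES = 10              # hypothetical number of classes


def build_model():
    # Dynamic input: height and width are unknown until run time.
    inputs = tf.keras.Input(shape=(None, None, 1))
    # Every image is resized to a fixed size inside the model.
    x = tf.keras.layers.Resizing(TARGET_H, TARGET_W)(inputs)
    x = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES)(x)
    return tf.keras.Model(inputs, outputs)


class RandomSizeBatches(tf.keras.utils.Sequence):
    """Hypothetical generator: each batch has its own random image size."""

    def __init__(self, num_batches=4, batch_size=2, seed=0):
        super().__init__()
        self.num_batches = num_batches
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return self.num_batches

    def __getitem__(self, idx):
        h = int(self.rng.integers(16, 64))
        w = int(self.rng.integers(64, 256))
        x = self.rng.random((self.batch_size, h, w, 1), dtype=np.float32)
        y = self.rng.integers(0, NUM_CLASSES, size=(self.batch_size,))
        return x, y


model = build_model()
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
history = model.fit(RandomSizeBatches(), epochs=2, verbose=0)

# Inference also accepts an arbitrary input size thanks to the Resizing layer.
preds = model(np.zeros((1, 40, 200, 1), dtype=np.float32))
```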
Problem Description
When I train this model on the GPU of an M1 Mac, memory usage increases continuously throughout training. The growth is more pronounced with larger input images; with smaller or average-sized images (1024x128) it is slower but still continuous, and after several epochs it adds up to a noticeable leak.
On the other hand, when I train on the CPU instead (tf.config.set_visible_devices([], 'GPU')), the leak disappears and memory usage stays normal. I've also tested the model with different image sizes and various layer configurations, and in every case the leak appears only when training on the GPU.
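To put numbers on the growth when comparing GPU and CPU runs, a per-epoch memory log can help. This is a minimal sketch, assuming a Unix-like system where Python's standard resource module is available. A hypothetical Keras callback (PeakRSSLogger) records the process's peak resident set size after each epoch, and a hypothetical FORCE_CPU flag applies the same tf.config.set_visible_devices([], 'GPU') switch mentioned above. Peak RSS never decreases, so a value that keeps rising epoch after epoch points to growth. Whether Metal allocations on Apple silicon are fully counted in RSS is an assumption worth cross-checking against Activity Monitor.

```python
import resource
import sys

import numpy as np
import tensorflow as tf

# Hypothetical flag: True reproduces the CPU-only run. It must take effect
# before TensorFlow initializes the GPU, or set_visible_devices raises RuntimeError.
FORCE_CPU = False
if FORCE_CPU:
    tf.config.set_visible_devices([], "GPU")


class PeakRSSLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback: records the process's peak RSS after each epoch."""

    def __init__(self):
        super().__init__()
        self.peak_mib = []

    def on_epoch_end(self, epoch, logs=None):
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
        scale = 1 if sys.platform == "darwin" else 1024
        mib = peak * scale / (1024 * 1024)
        self.peak_mib.append(mib)
        print(f"epoch {epoch + 1}: peak RSS {mib:.1f} MiB")


# Tiny stand-in model and data, only to show where the callback plugs in.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

logger = PeakRSSLogger()
model.fit(x, y, epochs=3, verbose=0, callbacks=[logger])
```

In the real project, the callback would be appended to the callbacks list passed to model.fit(), with one run per device setting, so the two curves can be compared side by side.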
I hope this information helps identify and resolve the issue. If you need any further details, please let me know. The project code is private, but I can provide pseudocode if necessary.