I measured a significant performance difference when running the keras-io example text_extraction_with_bert.ipynb on Google Colab versus on my tensorflow_metal GPU (AMD Radeon Pro 5700 XT).
Google Colab Pro with a TPU finished 3 epochs in 11 minutes, while tensorflow_metal ran for many hours on a single epoch.
So, I tried to profile the model in both environments. I was able to profile text_extraction_with_bert.ipynb on Google Colab Pro, but not on tensorflow_metal.
My Mac has 128 GB of RAM. The OOM exception happened when the Python 3.8 process reached ~85 GB.
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[256,384,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[node model/tf_bert_model/bert/encoder/layer_._6/intermediate/Gelu/add (defined at Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/transformers/models/bert/modeling_tf_bert.py:354) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[model/tf_bert_model/bert/encoder/layer_._7/attention/output/dense/Tensordot/Prod/_632]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: OOM when allocating tensor with shape[256,384,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator Simple allocator
[[node model/tf_bert_model/bert/encoder/layer_._6/intermediate/Gelu/add (defined at Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/transformers/models/bert/modeling_tf_bert.py:354) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_24748]
Function call stack:
train_function -> train_function
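For scale, here is a rough back-of-the-envelope check of how large the single tensor named in the OOM message is. This is only a sketch: tensor_bytes is a helper name I made up, and I'm assuming "type float" means 4-byte float32.

```python
from math import prod

def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (assumes float32 = 4 bytes per element)."""
    return prod(shape) * bytes_per_element

# Shape copied from the OOM message above.
oom_shape = (256, 384, 3072)
size = tensor_bytes(oom_shape)
print(f"{size:,} bytes = {size / 2**30:.3f} GiB")
```

So that one activation is a little over 1 GiB. If the leading 256 is the batch size (my assumption, not something the log states), the size of this tensor scales linearly with it, so a smaller batch would shrink it proportionally.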
Here's the model: