Comment on GPU utilization decays from 50% to 10% in non-batch inference for huggingface distilbert-base-cased
Thanks for looking into this. Please note that I am not using the latest tensorflow-macos and tensorflow-metal; I used versions that others reported as working. I had never tried TF on a Mac until now, so I have no prior version to compare against and can't tell whether this is a regression. The behaviour suggests a workaround: ensure every batch has the same max_len and padding, although that may preclude certain memory- and performance-saving techniques. The issue does not reproduce on Google Colab (CUDA with a T4).
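For illustration, a minimal sketch of that fixed-length workaround in plain Python. The helper name `pad_to_fixed_length`, the length of 128, and the pad id of 0 are assumptions for the example, not values taken from my actual setup; the idea is only that every input reaches the model with an identical shape.

```python
# Hypothetical values: pick whatever fixed length and pad id suit the model.
MAX_LEN = 128
PAD_ID = 0


def pad_to_fixed_length(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or right-pad one sequence to exactly max_len tokens.

    Returns (input_ids, attention_mask) as plain lists, where the mask
    is 1 for real tokens and 0 for padding.
    """
    ids = list(token_ids)[:max_len]      # truncate if too long
    n_pad = max_len - len(ids)           # how much padding is needed
    attention_mask = [1] * len(ids) + [0] * n_pad
    input_ids = ids + [pad_id] * n_pad
    return input_ids, attention_mask


# Every call yields the same length, regardless of the input length.
short_ids, short_mask = pad_to_fixed_length([101, 7592, 102], max_len=8)
long_ids, long_mask = pad_to_fixed_length(range(200), max_len=8)
```

With a Hugging Face tokenizer, the same effect should be achievable by passing `padding="max_length"`, `truncation=True`, and a fixed `max_length` when encoding. The cost is the one noted above: short inputs carry padding they would not need with per-batch padding.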
Apr ’23