I'm on tensorflow-macos 2.9 and tensorflow-metal 0.5, M2 Max 96GB.
I ran into this issue using the Hugging Face DistilBERT model to train on my own dataset. My batch size is just 128 (less than the 512 you reported, but the impact probably depends on the model).
I suspect this may be a memory issue (or memory mismanagement/misalignment caused by framework bugs). I will try reducing the batch size and see if that improves things.
But even so, this would be quite a disappointment, since I got 96GB specifically so I could push the batch size up in my local environment.
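For reference, this is roughly what I plan to try, just passing a smaller batch to model.fit (a rough sketch; train_encodings and train_labels are placeholders for my own tokenized data and labels):
import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased')
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# train_encodings / train_labels stand in for my own tokenized dataset
train_ds = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels))
model.fit(train_ds.batch(32), epochs=3)  # try 32 instead of 128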
I found out this has something to do with the variation in input token length from one inference to the next. The GPU doesn't seem to like receiving lengths that vary greatly; maybe this causes some sort of fragmentation in GPU memory? Here's code that only extracts the IMDB sentences that have >= 512 tokens, and with those it is able to sustain GPU utilization at ~30 it/s.
import numpy as np
from tqdm import tqdm
from transformers import AutoTokenizer, TFDistilBertForSequenceClassification
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased')

imdb = load_dataset('imdb')

# keep only reviews that hit the 512-token cap, so every input has the same length
print('starting collecting sentences with tokens >= 512')
sentences = [sentence for sentence in imdb['train']['text']
             if tokenizer(sentence, truncation=True, return_tensors='tf')['input_ids'].shape[-1] >= 512]
print('finished collecting sentences with tokens >= 512')

for k, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)
    if k % 100 == 0:
        print(f"len(input_ids): {inputs['input_ids'].shape[-1]}")
Output:
7it [00:00, 31.12it/s]
len(input_ids): 512
107it [00:03, 32.38it/s]
len(input_ids): 512
...
...
3804it [02:00, 31.85it/s]
len(input_ids): 512
3904it [02:03, 32.50it/s]
len(input_ids): 512
3946it [02:04, 31.70it/s]
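If variable lengths really are the trigger, the other thing I want to try is padding every input to a fixed 512 tokens so the GPU always sees the same shape (untested sketch, reusing the tokenizer, model, and sentences from the code above):
for sentence in tqdm(sentences):
    # padding='max_length' keeps the input shape constant at (1, 512)
    inputs = tokenizer(sentence, truncation=True, padding='max_length',
                       max_length=512, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)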
Got an M2 Max here (2023). I tried to run inference (one sample at a time, no batching) using the Hugging Face "distilbert-base-cased" model after fine-tuning it on my dataset. It runs at ~10 it/s in the beginning, but after a few minutes GPU utilization drops to less than 1%, and it now takes more than 1 s per iteration. That's a huge disappointment, and I don't know what I have done wrong. I tried turning on an external fan, thinking it might be thermal throttling, but utilization never goes back up.
How can I debug this?
Were you able to follow up with the "Frameworks Engineer" and get this issue resolved?
As of March 2023, I was advised to use tensorflow-macos 2.9.0 and tensorflow-metal 0.5.
This seems to have no such error:
import tensorflow as tf
from tensorflow.keras.layers import RandomFlip, RandomRotation

data_augmentation = tf.keras.Sequential([
    RandomFlip("horizontal"),
    RandomRotation(0.1),
])
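For anyone who wants to reproduce, here's a quick way to exercise those layers on the GPU (the 128x224x224x3 batch shape is just an arbitrary example):
# random images, just to confirm the augmentation layers run without the slowdown
images = tf.random.uniform((128, 224, 224, 3))
augmented = data_augmentation(images, training=True)
print(augmented.shape)  # (128, 224, 224, 3)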