Reply to Error: command buffer exited with error status.
I'm on tensorflow-macos 2.9 and tensorflow-metal 0.5, with an M2 Max (96 GB). I ran into this issue using the Hugging Face distilled BERT model to train on my dataset. My batch size is only 128 (less than the 512 you reported, though the impact depends on the model). I suspect this may be a memory issue (or memory mismanagement/misalignment due to framework bugs). I will try reducing the batch size and see if that helps. Even so, this would be quite a disappointment, since I got 96 GB precisely so I could push the batch size up in my local environment.
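One way to test the memory hypothesis without hand-tuning is to halve the batch size on each failure until training succeeds. This is only a sketch of that backoff strategy; train_with_backoff and fake_train are hypothetical helpers, not part of TensorFlow or Transformers.

def train_with_backoff(train_fn, batch_size=128, min_batch=8):
    # train_fn is a hypothetical callable that runs one training epoch
    # and raises RuntimeError on an out-of-memory / command-buffer failure.
    while batch_size >= min_batch:
        try:
            train_fn(batch_size)
            return batch_size  # first batch size that trained cleanly
        except RuntimeError:
            batch_size //= 2  # back off and retry
    raise RuntimeError("training failed even at the minimum batch size")

# toy stand-in: pretend anything above 32 hits the Metal command-buffer error
def fake_train(bs):
    if bs > 32:
        raise RuntimeError("command buffer exited with error status")

print(train_with_backoff(fake_train))  # 32

Once the largest stable batch size is known, it at least tells you whether the failure scales with memory pressure or happens regardless of batch size.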
Apr ’23
Reply to GPU utilization decays from 50% to 10% in non-batch inference for huggingface distilbert-base-cased
I found out this has something to do with the variation in input token length from one inference to the next. The model doesn't seem to like receiving lengths that vary greatly; maybe this causes some sort of fragmentation in GPU memory? Here's code that only extracts IMDB sentences with >= 512 tokens, and it is able to sustain GPU utilization at ~30 it/s:

import numpy as np
from tqdm import tqdm
from transformers import AutoTokenizer, TFDistilBertForSequenceClassification
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-cased")

imdb = load_dataset("imdb")

print("starting collecting sentences with tokens >= 512")
sentences = [
    sentence
    for sentence in imdb["train"]["text"]
    if tokenizer(sentence, truncation=True, return_tensors="tf")["input_ids"].shape[-1] >= 512
]
print("finished collecting sentences with tokens >= 512")

for k, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, return_tensors="tf")
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)
    if k % 100 == 0:
        print(f"len(input_ids): {inputs['input_ids'].shape[-1]}")

Output:

7it [00:00, 31.12it/s]
len(input_ids): 512
107it [00:03, 32.38it/s]
len(input_ids): 512
...
3804it [02:00, 31.85it/s]
len(input_ids): 512
3904it [02:03, 32.50it/s]
len(input_ids): 512
3946it [02:04, 31.70it/s]
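If varying lengths really are the trigger, one possible workaround (my assumption, not a confirmed fix) is to pad every input to the same fixed length so the GPU sees a constant tensor shape on every call; with a Hugging Face tokenizer that is padding="max_length" together with max_length=512 and truncation=True. A minimal pure-Python sketch of the padding idea, with pad_to_fixed being a hypothetical helper:

def pad_to_fixed(input_ids, max_length=512, pad_id=0):
    # Truncate to max_length, then right-pad with pad_id so that every
    # sequence comes out exactly max_length tokens long.
    ids = list(input_ids)[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

print(len(pad_to_fixed([101, 2023, 102])))      # 512
print(len(pad_to_fixed(list(range(600)))))      # 512

With every input shaped (1, 512), the per-call work is constant, which should make it easy to confirm or rule out the length-variation hypothesis.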
Apr ’23
Reply to M1 GPU is extremely slow, how can I enable CPU to train my NNs?
Got an M2 Max here (2023). I tried to run inference (one sample at a time, no batching) using the Hugging Face "distilbert-base-cased" model (after fine-tuning on my dataset). It runs at 10 it/s in the beginning, but after a few minutes GPU utilization drops below 1%, and now it takes >1 s per iteration! That's a huge disappointment, and I don't know what I have done wrong. I tried turning on an external fan, thinking it might be thermal throttling, but utilization never went back up. How can I debug this?
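A first debugging step is to measure exactly when throughput drops instead of eyeballing the progress bar; you can then compare a GPU run against a CPU run (pinning ops with tf.device('/CPU:0')) to see whether the decay is GPU-specific. A minimal framework-agnostic sketch; ThroughputMonitor is a hypothetical helper of my own, not a TensorFlow API:

import time

class ThroughputMonitor:
    """Track iterations/sec over a sliding window to pinpoint when
    throughput (and hence GPU utilization) starts to decay."""

    def __init__(self, window=100):
        self.window = window
        self.times = []

    def tick(self):
        # call once per inference step
        self.times.append(time.perf_counter())
        if len(self.times) > self.window:
            self.times.pop(0)

    def rate(self):
        # iterations per second over the current window
        if len(self.times) < 2:
            return float("inf")
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else float("inf")

Calling monitor.tick() inside the inference loop and logging monitor.rate() every N steps turns "it got slow after a few minutes" into a concrete step number, which is much easier to correlate with memory growth or thermals.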
Apr ’23