
GPU utilization decays from 50% to 10% in non-batch inference for huggingface distilbert-base-cased
Environment:

- MacBook Pro M2 Max, 96 GB
- macOS 13.3
- tensorflow-macos 2.9.0
- tensorflow-metal 0.5.0

Here's the reproducible test case:

```python
import numpy as np
from tqdm import tqdm
from transformers import AutoTokenizer, TFDistilBertForSequenceClassification
from datasets import load_dataset

imdb = load_dataset('imdb')
sentences = imdb['train']['text'][:500]

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased')

# Non-batched inference: one sentence per forward pass.
for i, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)
    if i % 100 == 0:
        print(f"len(input_ids): {inputs['input_ids'].shape[-1]}")
```

While this ran, I monitored GPU utilization: it slowly decayed from 50% to 10%, and inference is excruciatingly slow towards the end. The tqdm throughput confirms this (10.87 it/s at the start, down to ~1 it/s):

```
Metal device set to: Apple M2 Max

systemMemory: 96.00 GB
maxCacheSize: 36.00 GB

3it [00:00, 10.87it/s]
len(input_ids): 391
101it [00:13, 6.38it/s]
len(input_ids): 215
201it [00:34, 4.78it/s]
len(input_ids): 237
301it [00:55, 4.26it/s]
len(input_ids): 256
401it [01:54, 1.12it/s]
len(input_ids): 55
500it [03:40, 2.27it/s]
```

I have found no evidence yet that this is a heat-throttling issue, because after the huge drop in GPU utilization, other processes (using only about 2% of the GPU) overtake this one.

What's going on here? Are there any profiling tips that could help me investigate?

I am aware I can "fix" this by doing batch inference. But seeing this GPU utilization decay is unsettling, since the same thing could potentially happen during a training session (which runs far longer).
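For reference, this is roughly what the batched workaround looks like (a sketch of my own; the batch size of 32 and the `padding='longest'` strategy are arbitrary choices, not something I've tuned):

```python
# Batched variant of the same loop -- a sketch, not a tuned implementation.
# BATCH_SIZE is an arbitrary choice; padding='longest' pads each batch to its
# longest sequence so every example in the batch shares one tensor shape.
BATCH_SIZE = 32

for start in tqdm(range(0, len(sentences), BATCH_SIZE)):
    batch = sentences[start:start + BATCH_SIZE]
    inputs = tokenizer(batch, truncation=True, padding='longest', return_tensors='tf')
    logits = model(inputs).logits
    preds = np.argmax(logits.numpy(), axis=1)
```

This keeps the GPU busy with larger kernels, but it sidesteps rather than explains the utilization decay in the single-sentence loop.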