MacBook Pro M2 Max, 96 GB
macOS 13.3
tensorflow-macos 2.9.0
tensorflow-metal 0.5.0
Here's a reproducible test case:
from tqdm import tqdm
import numpy as np
from transformers import AutoTokenizer, TFDistilBertForSequenceClassification
from datasets import load_dataset

imdb = load_dataset('imdb')
sentences = imdb['train']['text'][:500]

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased')

for i, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)
    if i % 100 == 0:
        print(f"len(input_ids): {inputs['input_ids'].shape[-1]}")
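One hypothesis I want to test is that the varying per-sample shapes (every review tokenizes to a different length) force repeated retracing/kernel compilation on the Metal backend. Below is a minimal sketch of a fixed-shape variant; padding='max_length' and max_length=512 (DistilBERT's positional limit) are my assumptions, and I haven't confirmed this cures the decay:

# Fixed-shape variant: pad every sample to the same length so the model
# always sees an identical input shape (hypothesis, not a confirmed fix).
for i, sentence in tqdm(enumerate(sentences)):
    inputs = tokenizer(sentence, truncation=True, padding='max_length',
                       max_length=512, return_tensors='tf')
    output = model(inputs).logits
    pred = np.argmax(output.numpy(), axis=1)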
I monitored GPU utilization and watched it slowly decay from 50% to 10%; it is excruciatingly slow toward the end. The console output confirms it (note that throughput keeps dropping even as the printed input lengths shrink):
Metal device set to: Apple M2 Max
systemMemory: 96.00 GB
maxCacheSize: 36.00 GB
3it [00:00, 10.87it/s]
len(input_ids): 391
101it [00:13, 6.38it/s]
len(input_ids): 215
201it [00:34, 4.78it/s]
len(input_ids): 237
301it [00:55, 4.26it/s]
len(input_ids): 256
401it [01:54, 1.12it/s]
len(input_ids): 55
500it [03:40, 2.27it/s]
I have found no evidence yet that this is thermal throttling: after the huge drop in GPU utilization, other processes (using only ~2% GPU) start to overtake this one.
What's going on? Are there any profiling tips that could help me investigate? I am aware I can "fix" this by doing batched inference (see the sketches below), but seeing GPU utilization decay like this is unsettling, since it could just as well happen during a training session (which runs far longer).
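In case it helps, here is how I would capture a trace with the TensorFlow profiler, reusing the tokenizer, model, and sentences from the repro above (the 'logdir' path and the 50-sample slice are arbitrary choices of mine):

import tensorflow as tf

tf.profiler.experimental.start('logdir')  # write a trace to ./logdir
for sentence in sentences[:50]:           # a short slice is enough for a trace
    inputs = tokenizer(sentence, truncation=True, return_tensors='tf')
    _ = model(inputs).logits
tf.profiler.experimental.stop()
# Inspect with: tensorboard --logdir logdir  (Profile tab)

And this is the batched-inference workaround I mentioned, again just a sketch (batch_size=32 and padding='longest' are arbitrary):

batch_size = 32
for start in range(0, len(sentences), batch_size):
    batch = sentences[start:start + batch_size]
    inputs = tokenizer(batch, truncation=True, padding='longest',
                       return_tensors='tf')
    preds = np.argmax(model(inputs).logits.numpy(), axis=1)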