Stack Overflow, asked by Lei Hao on December 30, 2021
I am curious about the memory usage of transformers.BertModel. I would like to use the pretrained model to encode text and save the output embedding of the [CLS] token. No training, only inference.
My inputs to Bert are 511 tokens long. With a batch size of 16, my code runs out of memory on a GPU with 32GB of memory. My question is how to estimate the memory usage of Bert.
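A back-of-envelope sketch (my assumptions: fp32 tensors, BERT-base dimensions of 12 layers, 12 heads, hidden size 768, FFN size 3072, and that under torch.no_grad() only the weights plus a couple of layers' worth of live activations are resident at once) suggests the numbers should be small:

def bert_base_inference_mb(batch_size, seq_len, hidden=768, heads=12,
                           ffn=3072, bytes_per_el=4):
    """Crude fp32 estimate of BERT-base inference memory, in MB."""
    weights = 110e6 * bytes_per_el                      # ~110M parameters
    # Dominant per-layer activations:
    attn_scores = batch_size * heads * seq_len ** 2 * bytes_per_el
    hidden_states = batch_size * seq_len * hidden * bytes_per_el
    ffn_act = batch_size * seq_len * ffn * bytes_per_el
    # Under no_grad, activation buffers can be reused across layers, so
    # take the peak as weights plus ~2 layers' worth of live activations.
    peak = weights + 2 * (attn_scores + hidden_states + ffn_act)
    return peak / 2**20

print(bert_base_inference_mb(16, 511))  # roughly 1000 MB

Even this rough estimate is far below 32GB, which makes the out-of-memory error all the more puzzling.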
Strangely, another job with batch size 32 finished successfully with the same setup. My code is listed below.
# Create dataloader
import torch
from torch.utils.data import ConcatDataset, DataLoader, RandomSampler
from transformers import BertModel

bs = 16
train_comb = ConcatDataset([train_data, valid_data])
train_dl = DataLoader(train_comb, sampler=RandomSampler(train_comb),  # sample over the full concatenated dataset
                      batch_size=bs)

model = BertModel.from_pretrained('/my_dir/bert_base_uncased/',
                                  output_attentions=False,
                                  output_hidden_states=False)
model.cuda()

out_list = []
model.eval()
with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d]  # d = [input_ids, attention_mask, token_type_ids, labels]
        inputs, labels = d[:3], d[3]  # input_ids has shape 16 x 511
        output = model(*inputs)[0][:, 0, :]  # [CLS] embedding for each example
        out_list.append(output)
outputs = torch.cat(out_list)
Later I changed the for loop to the following:
with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d[:3]]  # don't care about the labels
        out_list.append(model(*d)[0][:, 0, :])  # remove the intermediary variables
        del d
To summarize, my questions are: (1) how can I estimate Bert's memory usage during inference, and (2) why does the batch-size-16 job run out of memory while the batch-size-32 job succeeds?
After some searching, it turns out the error was caused by appending the outputs to a list while they were still on the GPU: every appended tensor keeps its GPU memory alive, so usage grows with each batch. With the following code, the error is gone.
with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d[:3]]
        out_list.append(model(*d)[0][:, 0, :].cpu())  # move each batch's output to the CPU
        del d
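The per-batch [CLS] vectors now accumulate in host memory, so the final concatenation (as in the original snippet) no longer allocates anything on the GPU:

outputs = torch.cat(out_list)  # shape (num_examples, 768), on the CPU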
Without .cpu(), the memory keeps increasing:
Tensor size: torch.Size([4, 511]), Memory allocated: 418.7685546875MB
Tensor size: torch.Size([4, 768]), Memory allocated: 424.7568359375MB
Tensor size: torch.Size([4, 511]), Memory allocated: 424.7568359375MB
Tensor size: torch.Size([4, 768]), Memory allocated: 430.7451171875MB
Tensor size: torch.Size([4, 511]), Memory allocated: 430.7451171875MB
Tensor size: torch.Size([4, 768]), Memory allocated: 436.7333984375MB
With .cpu(), the memory doesn't change:
Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 511]), Memory allocated: 420.21875MB
Tensor size: torch.Size([128, 768]), Memory allocated: 420.21875MB
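The numbers above can be logged with a loop along these lines (a minimal sketch, not the exact code; torch.cuda.memory_allocated() returns the bytes currently occupied by tensors on the GPU):

import torch

with torch.no_grad():
    for d in train_dl:
        d = [i.cuda() for i in d[:3]]
        out = model(*d)[0][:, 0, :].cpu()  # drop .cpu() to watch the memory grow
        out_list.append(out)
        for t in (d[0], out):  # input_ids: [bs, 511]; [CLS] output: [bs, 768]
            print(f"Tensor size: {t.size()}, "
                  f"Memory allocated: {torch.cuda.memory_allocated() / 2**20}MB")
        del d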
Answered by Lei Hao on December 30, 2021