Data Science Asked on March 10, 2021
These two pictures are from two similar experiments using the same code.
I am fine-tuning a pretrained BERT model on a binary text classification task. The dataset is balanced (50% positive, 50% negative), so the classifier shouldn't predict everything as one class on the validation set, as the pictures show.
I used the AdamW optimizer with a decreased learning rate.
I applied gradient clipping.
When I decrease the learning rate further, from 5e-5 to 3e-5 or 2e-5, it works fine.
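For reference, a minimal sketch of the training step described above (AdamW at the lowered learning rate plus gradient clipping). A tiny linear layer stands in for the pretrained BERT classifier, and the batch and clipping threshold are hypothetical values, not taken from the original code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the BERT model with a binary classification head
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 8)            # dummy batch of features
y = torch.tensor([0, 1, 0, 1])   # balanced binary labels, as in the dataset

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Gradient clipping, as in the original setup (max_norm is an assumed value)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

With a balanced 50/50 dataset, a loss stuck near ln(2) ≈ 0.693 while validation accuracy sits at 50% is the usual symptom of the collapse described above.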
What might be the problem here?