
Why does the training loss increase and the model predict everything as '1' or '0'?

Asked on Data Science, March 10, 2021

[Two images: training/validation curves from the two experiments]

Those two pictures are from two similar experiments using the same code.

I am fine-tuning a pretrained BERT model on a binary text classification task. The dataset is balanced (50% positive vs. 50% negative), so the classifier should not be predicting a single class for the whole validation set, yet that is what the pictures show.
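For context, the setup looks roughly like the sketch below. This is a hypothetical reconstruction using the Hugging Face transformers library, not my exact code, and "bert-base-uncased" is just an example checkpoint name:

```python
# Hypothetical sketch of the setup, assuming the Hugging Face transformers library
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # two-class classification head on top of the pretrained encoder
)
```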

I used the AdamW optimizer with a decaying learning rate.
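Something along these lines (a sketch only; I am assuming a linear warmup/decay schedule here, and `train_dataloader` / `num_epochs` are placeholder names for my actual data loader and epoch count):

```python
# Sketch: AdamW with a linearly decaying learning rate (schedule assumed, not exact)
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
num_training_steps = len(train_dataloader) * num_epochs  # placeholder names
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
```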

I applied gradient clipping.
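Roughly like this inside each training step (max_norm=1.0 is the common default, not necessarily the exact value I used):

```python
# Sketch of one training step with gradient clipping before the optimizer update
loss = model(**batch).loss  # `batch` assumed to hold input_ids, attention_mask, labels
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```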

When I decrease the learning rate from 5e-5 to 3e-5 or 2e-5, it works fine.

What might be the problem here?
