How is Large BERT less accurate than basic BERT?

Data Science Asked by mitra mirshafiee on June 1, 2021

I’m using BERT for text classification in this NLP competition.

When I use basic BERT with 12 layers, 3 epochs, and a batch size of 32, I get a training accuracy of about 0.84 and a val_accuracy of about 0.82. But when I use the larger BERT with 24 layers and the same number of epochs and the same batch size, I get a training accuracy of about 0.6 and a val_accuracy of about 0.55.

Why is this happening? Isn’t the large BERT supposed to recognize patterns better because it has more layers and parameters? Or would it perform better in the long run with more epochs?

(Both BERT models are uncased, English, and pretrained, and I’m using trainable = True in the bertlayer model.)
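To put the "more layers and parameters" point in numbers, here is a rough sketch that estimates the parameter count of both encoders. The hidden sizes (768 and 1024), the 30522-token vocabulary, the 512 positions, and the 4x feed-forward width are assumptions taken from the published uncased English BERT configurations, not from my training code, and count_bert_params is just a helper name made up for this illustration.

```python
def count_bert_params(num_layers, hidden, vocab_size=30522,
                      max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT encoder (no task head).

    Default sizes are assumed from the published uncased English configs.
    """
    ffn = 4 * hidden            # feed-forward inner width
    layer_norm = 2 * hidden     # scale + bias of one LayerNorm

    # Token, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> ffn -> hidden) with biases, then LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Dense pooler applied to the [CLS] token.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * (attention + feed_forward) + pooler


base = count_bert_params(num_layers=12, hidden=768)
large = count_bert_params(num_layers=24, hidden=1024)
print(f"BERT base:  {base:,} parameters")
print(f"BERT large: {large:,} parameters")
print(f"ratio: {large / base:.2f}x")
```

With trainable = True, all of these weights are updated during fine-tuning, so the large model has roughly three times as many parameters to adjust in the same 3 epochs.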

Tags: bert, deep learning, machine learning, neural network, nlp
