Data Science Asked by mitra mirshafiee on June 1, 2021
I’m using BERT for text classification in this NLP competition.
When I use BERT-base (12 layers) with 3 epochs and a batch size of 32, I get a training accuracy of about 0.84 and a validation accuracy of about 0.82. But when I use BERT-large (24 layers) with the same number of epochs and the same batch size, I get a training accuracy of about 0.6 and a validation accuracy of about 0.55.
Why is this happening? Isn't BERT-large, with more layers and parameters, supposed to recognize patterns better? Or can it perhaps only do better in the long run, with more epochs?
(Both models are uncased, English, and pretrained, and I set trainable=True on the BERT layer of the model.)
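For context on the capacity gap between the two models, here is a back-of-the-envelope parameter count. It assumes the standard published BERT hyperparameters (WordPiece vocabulary of 30522, 512 positions, 2 token types, FFN size of 4x the hidden size); the function name is just for illustration:

```python
# Rough parameter count for a BERT encoder, assuming the standard
# hyperparameters (vocab 30522, max positions 512, 2 token types,
# feed-forward size = 4 * hidden size).
def count_bert_params(num_layers: int, hidden: int) -> int:
    vocab, max_pos, type_vocab = 30522, 512, 2
    ffn = 4 * hidden

    # Embeddings: word + position + token-type tables, plus one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden

    # One encoder layer: Q/K/V/output projections (weights + biases),
    # two feed-forward matrices, and two LayerNorms.
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * 2 * hidden
    per_layer = attention + feed_forward + layer_norms

    # Pooler: one dense layer over the [CLS] token.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * per_layer + pooler

print(count_bert_params(num_layers=12, hidden=768))   # BERT-base:  109482240 (~110M)
print(count_bert_params(num_layers=24, hidden=1024))  # BERT-large: 335141888 (~335M)
```

So BERT-large has roughly 3x the parameters of BERT-base, which is why I expected it to fit the data at least as well rather than worse.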