Data Science: Asked by saran on April 30, 2021
I am trying to fine-tune a BERT-base model for binary text classification using multiple features: 3 text features and 4 categorical features. The text features are more than 500 tokens long, and the four categorical features take binary values (0/1). For each categorical feature I created a corresponding derived text feature. For example, for 'Impact_Code' with values 0/1, I derived a new feature 'Impact_Code_Derived' with the value 'Impact Code Not Available' for 0 and 'Impact Code Available' for 1. I derived new features like this for all 4 categorical features, and I use the text features as they are.
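The derivation step looks roughly like this (a simplified sketch; the DataFrame contents and the loop over columns are illustrative, not my exact code):

```python
import pandas as pd

# Illustrative data; my real dataset has 3 text columns plus these codes.
df = pd.DataFrame({"Impact_Code": [0, 1, 1, 0]})

cat_cols = ["Impact_Code"]  # extended with the other three binary columns
for col in cat_cols:
    label = col.replace("_", " ")  # 'Impact_Code' -> 'Impact Code'
    # Map the binary code to a short text phrase BERT can tokenize.
    df[f"{col}_Derived"] = df[col].map(
        {0: f"{label} Not Available", 1: f"{label} Available"}
    )

print(df)
```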
While fine-tuning, I obtain BERT embeddings for all 8 features, i.e., I take the last hidden state of BERT for each feature, which has size (Batch_Size x 512 x 768). I then average-pool each token's 768 hidden features down to a single value, so the size becomes (Batch_Size x 512). Next I concatenate the average-pooled outputs of all 8 features (Batch_Size x 512 x 8) and pass the result to a fully connected layer with tanh activation, whose output size is 1024. This 1024-dimensional vector is then passed to a second fully connected layer that produces the 2 outputs.
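Here is a simplified PyTorch sketch of that architecture (not my exact code; it assumes a single shared BERT encoder for all 8 features and flattens the concatenation to (Batch_Size x 4096) before the first FC layer):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiFeatureBertClassifier(nn.Module):
    """Sketch of the pipeline described above; class and argument
    names are placeholders."""

    def __init__(self, num_features=8, seq_len=512):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.fc1 = nn.Linear(seq_len * num_features, 1024)
        self.fc2 = nn.Linear(1024, 2)

    def forward(self, input_ids_list, attention_mask_list):
        pooled = []
        for ids, mask in zip(input_ids_list, attention_mask_list):
            # last_hidden_state: (batch, 512, 768)
            hidden = self.bert(
                input_ids=ids, attention_mask=mask
            ).last_hidden_state
            # Average over the 768 hidden dims -> one value per token:
            # (batch, 512)
            pooled.append(hidden.mean(dim=-1))
        # Concatenate the 8 per-feature vectors -> (batch, 512 * 8)
        x = torch.cat(pooled, dim=1)
        x = torch.tanh(self.fc1(x))
        return self.fc2(x)  # 2 logits for binary classification
```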
Since I am only seeing accuracy around 60%, I am not sure whether this approach is correct. Please share your suggestions; I searched online but could not find clear answers to these doubts.