
Meaning of fine-tuning in an NLP task

Data Science: Asked by sovon on June 28, 2021

There are two types of transfer learning models. One is feature extraction, where the weights of the pre-trained model are kept frozen while training on the actual task; the other is fine-tuning, where the weights of the pre-trained model are allowed to change.
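
A minimal sketch of the two options, assuming PyTorch and the Hugging Face transformers library with the bert-base-uncased checkpoint (tooling chosen only for illustration): in feature extraction the pre-trained encoder is frozen and only the new task head is trained, while in fine-tuning every weight stays trainable.

```python
# Sketch: feature extraction vs. fine-tuning with a pre-trained encoder.
# Assumes Hugging Face `transformers` and the `bert-base-uncased` checkpoint;
# the classification head added on top is the task-specific part.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

FEATURE_EXTRACTION = True  # flip to False for full fine-tuning

if FEATURE_EXTRACTION:
    # Feature extraction: freeze the pre-trained encoder, train only the head.
    for param in model.bert.parameters():
        param.requires_grad = False
# In full fine-tuning, every parameter (encoder + head) receives gradients.

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")
```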

According to that categorization, static word vectors like word2vec fall under feature extraction, where each vector encodes the meaning of a word.

The meaning of a word changes with context. For example, “Bank of the river” vs. “Bank as a financial institution”. Word2vec vectors do not differentiate between these meanings.
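
To make “static” concrete, here is a small sketch, assuming the gensim library and a toy two-sentence corpus (both assumptions for illustration): the trained model stores exactly one vector per word, so “bank” gets the same vector no matter which sentence it came from.

```python
# Sketch: a static word2vec embedding trained with gensim on a toy corpus.
# Each word gets exactly one vector; "bank" mixes both senses and context is lost.
from gensim.models import Word2Vec

sentences = [
    ["bank", "of", "the", "river"],
    ["bank", "as", "a", "financial", "institution"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

vec = model.wv["bank"]              # the one and only vector for "bank"
print(vec.shape)                    # (50,)
print(len(model.wv.key_to_index))   # 8 distinct words -> 8 vectors, one per word
```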

Current models like BERT take the context into account. BERT is a language representation model; that means it internally represents each word with a contextual word vector.
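
A contextual model behaves differently from word2vec: the same surface word gets a different vector in each sentence. A sketch, assuming the transformers library and the bert-base-uncased checkpoint (illustrative choices), comparing the two “bank” vectors:

```python
# Sketch: contextual vectors for "bank" from a pre-trained BERT encoder.
# Assumes Hugging Face `transformers` and the `bert-base-uncased` checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence: str) -> torch.Tensor:
    """Return the last-hidden-state vector of the token 'bank' in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    position = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank")
    )
    return hidden[position]

v_river = bank_vector("bank of the river")
v_money = bank_vector("bank as a financial institution")

# The two "bank" vectors are related but not identical.
sim = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {sim.item():.3f}")
```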

By default, BERT is a fine-tuning model. This is where my understanding of fine-tuning starts to fall apart.
Let’s say we create some task-specific layer on top of the BERT model. Now, if we fine-tune, then by definition the weights of the lower layers (the language representation layers) will change at least a bit. That means the vector of a word will also change (if we compare it before and after fine-tuning), and therefore the meaning of the word shifts slightly because of the new task.
If my interpretation is correct, I cannot quite comprehend this phenomenon: for example, the word vectors learned for a sentiment analysis task are different from the word vectors (of the same word) learned for a question answering task. Can anybody help me?
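
For concreteness, one way to look at this is to compare the vector of the same word under publicly released checkpoints fine-tuned for different tasks. A rough sketch, assuming the DistilBERT checkpoints named below exist on the Hugging Face Hub and share a tokenizer (assumptions of this example, used as lighter stand-ins for BERT):

```python
# Sketch: the same word's contextual vector drifts after task-specific fine-tuning.
# The checkpoint names are assumptions of this example: a base DistilBERT, a
# sentiment (SST-2) fine-tuned one and a question-answering (SQuAD) one.
import torch
from transformers import AutoTokenizer, AutoModel

CHECKPOINTS = {
    "base": "distilbert-base-uncased",
    "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
    "qa": "distilbert-base-uncased-distilled-squad",
}

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS["base"])
inputs = tokenizer("the movie was surprisingly good", return_tensors="pt")

def token_states(name: str) -> torch.Tensor:
    """Return the last hidden states of the sentence under checkpoint `name`."""
    model = AutoModel.from_pretrained(name)  # loads the encoder, drops any task head
    model.eval()
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0]

base, sent, qa = (token_states(CHECKPOINTS[k]) for k in ("base", "sentiment", "qa"))

# Compare the vector of the word "good" across the three checkpoints.
pos = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("good"))
cos = torch.nn.functional.cosine_similarity
print("base vs sentiment:", cos(base[pos], sent[pos], dim=0).item())
print("base vs qa:       ", cos(base[pos], qa[pos], dim=0).item())
print("sentiment vs qa:  ", cos(sent[pos], qa[pos], dim=0).item())
```

A similarity below 1.0 between checkpoints is exactly the effect described above: fine-tuning for different tasks pulls the representation of the same word in different directions.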

Please correct me if anything above is wrong. Thanks

2 Answers

In my understanding, when you fine-tune for any task you use additional data (not used during pre-training), and those examples change the weights in the lower layers so that your model is better prepared for the context in which you will use it. A good example is a Twitter sentiment classifier.
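
A minimal sketch of such a fine-tuning step, assuming the transformers library, the bert-base-uncased checkpoint, and two made-up tweet-like examples standing in for a real labelled Twitter dataset:

```python
# Sketch: one fine-tuning step of a sentiment classifier on toy "tweets".
# Assumes Hugging Face `transformers`; the texts and labels are made up and
# stand in for a real labelled Twitter sentiment dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["loving this new phone!", "worst customer service ever"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # forward pass through encoder + head
outputs.loss.backward()                  # gradients also reach the lower layers
optimizer.step()
optimizer.zero_grad()
```

Because the optimizer is built over all of model.parameters(), the gradients update the encoder's lower layers as well as the new classification head, which is why the word representations shift toward the new domain.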

Answered by Inês Soveral on June 28, 2021

You are right when you say:

the word vectors learned for a sentiment analysis task are different from the word vectors (of the same word) learned for a question answering task

Each task has a specific domain, and words have different representations in different domains.

Talking about "Apple" in a cookbook is different from talking about "Apple" in a company review.

The pre-training of BERT is based on books and Wikipedia, but some fine-tuning tasks are not based on such sources, so words may carry a different meaning there. This is why BERT is fine-tuned: to "update" the meaning of words based on your specific domain.

Answered by Astariul on June 28, 2021
