
Prediction using words which were not in training in a CNN with pre-trained word embeddings

Data Science Asked on April 1, 2021

In sentence classification using pre-trained embeddings (FastText) in a CNN, how does the CNN predict the category of a sentence when its words were not in the training set?

I think the trained model contains weights, and these weights are not updated in the prediction stage, are they? Then, what happens when the words in the sentence (for which the CNN will predict a category) were not seen during training? I think they do not have a word vector; only the words that appeared in training do.

One Answer

If you keep the FastText embeddings unchanged and do not fine-tune them during training, it does not really matter that the words were not in the training set, as long as they are covered by the FastText embeddings. After all, this is the biggest advantage of using pre-trained word embeddings.

The important property of the embeddings is that similar words get similar embeddings. The CNN might not have seen the exact same embedding, but similar words probably were in the training data.
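For illustration, here is a minimal sketch of this setup, assuming PyTorch; pretrained_matrix is a hypothetical (vocab_size x embedding_dim) array filled with FastText vectors for the vocabulary, and the kernel sizes and filter count are arbitrary example values:

    # Minimal sketch, assuming PyTorch. freeze=True keeps the FastText vectors
    # unchanged while the rest of the CNN trains.
    import torch
    import torch.nn as nn

    class TextCNN(nn.Module):
        def __init__(self, pretrained_matrix, num_classes,
                     kernel_sizes=(3, 4, 5), num_filters=100):
            super().__init__()
            emb = torch.tensor(pretrained_matrix, dtype=torch.float)
            # Embedding layer initialized from pre-trained vectors and frozen
            self.embedding = nn.Embedding.from_pretrained(emb, freeze=True)
            self.convs = nn.ModuleList(
                [nn.Conv1d(emb.size(1), num_filters, k) for k in kernel_sizes]
            )
            self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

        def forward(self, token_ids):              # (batch, seq_len)
            x = self.embedding(token_ids)          # (batch, seq_len, dim)
            x = x.transpose(1, 2)                  # Conv1d expects (batch, dim, seq_len)
            pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))

Because freeze=True keeps the embedding weights out of the optimizer, a word that never occurred in the training sentences still gets its FastText vector at prediction time, and the convolutions only ever operate on that shared embedding space.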

Words that are not covered by the pre-trained embeddings get a common representation for an unknown (out-of-vocabulary, OOV) word. These are usually proper names. It is usually good to make sure the CNN learns to deal with them already at training time, for example by randomly replacing some infrequent words with random strings (see the sketch below). Otherwise, if the unknown-token embedding (which is typically dissimilar to all other embeddings) appears at inference time but was never seen during training, it can lead to unexpected behavior.
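A minimal sketch of that trick follows; the word2id mapping, the unk_id index, the set of infrequent words, and the 10% replacement rate are all illustrative assumptions rather than fixed choices:

    # Encode tokens to indices; at training time, occasionally map infrequent
    # words to UNK so the CNN also sees the unknown-token embedding it will
    # encounter at inference time.
    import random

    def encode(tokens, word2id, unk_id, infrequent, train=True, replace_prob=0.1):
        ids = []
        for tok in tokens:
            idx = word2id.get(tok, unk_id)   # true OOV words always fall back to UNK
            if train and tok in infrequent and random.random() < replace_prob:
                idx = unk_id                 # randomly treat a rare word as unknown
            ids.append(idx)
        return ids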

Answered by Jindřich on April 1, 2021
