Data Science Asked on June 7, 2021
If I want to construct a word embedding by predicting a target word given context words, is it better to remove stop words or keep them?
the quick brown fox jumped over the lazy dog
or
quick brown fox jumped lazy dog
As a human, I feel like keeping the stop words makes the sentence easier to understand, even though they are superfluous.
So what about for a Neural Network?
In general, stop words can be omitted, since they carry little useful information about the content of your sentence or document.
The intuition is that stop words are the most common words in a language and occur in virtually every document, regardless of its topic. Because their frequency is roughly the same everywhere, they provide no signal that hints at what a particular document is about.
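For illustration, here is a minimal sketch of stop-word filtering in Python. It assumes NLTK's English stop-word list; any fixed word list would work the same way.

```python
# Minimal stop-word filtering sketch, using NLTK's English
# stop-word list (assumption: NLTK is installed).
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

STOP_WORDS = set(stopwords.words("english"))

def remove_stopwords(sentence: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

print(remove_stopwords("the quick brown fox jumped over the lazy dog"))
# -> ['quick', 'brown', 'fox', 'jumped', 'lazy', 'dog']
```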
Correct answer by Tinu on June 7, 2021
It's not mandatory: removing stop words sometimes helps and sometimes doesn't, so you should try both.
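One concrete way to try both is to train the same model on each version of the corpus and compare the results downstream. Here is a minimal sketch using gensim's Word2Vec in CBOW mode (sg=0); the toy corpus and hyperparameters are purely illustrative.

```python
# Train a CBOW Word2Vec model (gensim 4.x) on the corpus with and
# without stop words; corpus and hyperparameters are toy values.
from gensim.models import Word2Vec

with_stops = [["the", "quick", "brown", "fox", "jumped",
               "over", "the", "lazy", "dog"]]
without_stops = [["quick", "brown", "fox", "jumped", "lazy", "dog"]]

for name, corpus in [("with stop words", with_stops),
                     ("without stop words", without_stops)]:
    model = Word2Vec(corpus, vector_size=50, window=2,
                     min_count=1, sg=0)  # sg=0 selects CBOW
    print(name, model.wv.most_similar("fox", topn=2))
```

In practice you would evaluate both sets of embeddings on your actual downstream task rather than eyeballing nearest neighbors on a toy corpus.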
A case for keeping stop words: they provide context about the user's intent. That is why a contextual model like BERT keeps all stop words, preserving context-bearing tokens such as the negation words (not, nor, never), which standard stop-word lists include.
According to this paper:
Surprisingly, the stopwords received as much attention as non-stop words, but removing them has no effect in MRR performances.
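To see why the negation words matter, here is a toy sketch: NLTK's English stop-word list includes "not", so naive filtering maps two sentences with opposite meanings onto the same tokens.

```python
# Negation words sit on standard stop-word lists, so removing them
# can erase the difference between opposite statements.
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

STOP_WORDS = set(stopwords.words("english"))

for text in ["the movie was good", "the movie was not good"]:
    filtered = [w for w in text.split() if w not in STOP_WORDS]
    print(text, "->", filtered)
# the movie was good -> ['movie', 'good']
# the movie was not good -> ['movie', 'good']
```

Both sentences reduce to the same token list, which is exactly the kind of information a contextual model like BERT relies on stop words to preserve.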
Answered by Soroush Faridan on June 7, 2021