Data Science Asked by Sabbiu Shah on May 6, 2021
Who teaches English?
After tokenizing and stemming, it gives me:
Who, teach, English
In my list of words, I have the word:
teacher
Lemmatizing or stemming teacher gives teacher, while lemmatizing or stemming teaches gives teach.
Even calculating edit_distance does not solve this, since the edit_distance between teach and teacher comes out to be 2.
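To make the edit-distance figure concrete, here is a minimal plain-Python sketch of Levenshtein distance. The function name levenshtein is made up for this illustration and it is not the nltk implementation; it only assumes the usual definition in which each insertion, deletion, or substitution costs 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous is the row of distances for the prefix of a handled so far,
    # measured against every prefix of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


for pair in [("teach", "teacher"), ("instructs", "instructor")]:
    print(pair, levenshtein(*pair))
```

Both pairs are two edits apart, which is no closer than many unrelated word pairs of similar length, so a fixed distance threshold is a weak signal here.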
Now, what can I do to have teacher and teach treated as similar?
Similarly, there may be other cases with an extra 's' at the end. Is there a stemmer that solves this problem, or any other solution?
Another similar example is instructor and instructs.
Use an aggressive stemmer. The Lancaster Stemmer is one of the most aggressive and popular stemmers around.
Here is the Python code:
from nltk.stem.lancaster import LancasterStemmer
lancaster_stemmer = LancasterStemmer()
assert 'teach' == lancaster_stemmer.stem('teacher') == lancaster_stemmer.stem('teaches')
Correct answer by Brian Spiering on May 6, 2021
Check out fastText. It works similarly to word2vec in that it creates word embeddings; however, it also analyzes character n-grams, so words that share substrings (such as teach and teacher) tend to end up with similar vectors. That captures the surface-form similarity you're thinking about.
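As a rough illustration of why character n-grams help, the sketch below splits words into n-grams with boundary markers and measures how many they share. This is not fastText itself and produces no embeddings: char_ngrams and ngram_overlap are hypothetical helper names, the 3-to-6 n-gram range and the < and > markers are assumed to mirror fastText's usual setup, and Jaccard overlap is only a stand-in for vector similarity.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> set[str]:
    """Return the set of character n-grams of word, with < and >
    added as word-boundary markers (an assumption modelled on fastText)."""
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    }


def ngram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two words' n-gram sets (0.0 to 1.0).
    A crude proxy for subword similarity, not an embedding distance."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:  # both words too short to yield any n-gram
        return 0.0
    return len(grams_a & grams_b) / len(union)


for other in ["teach", "teaches", "english"]:
    print("teacher", other, round(ngram_overlap("teacher", other), 3))
```

Words built on the same stem share many n-grams, while teacher and english share none; trained fastText vectors use the same subword units but learn the similarity from a corpus.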
Answered by j.a.gartner on May 6, 2021