TransWikia.com

What can be done so that 'teacher' and 'teaches' are treated as similar?

Data Science Asked by Sabbiu Shah on May 6, 2021

Who teaches English?

After tokenizing and stemming, this gives me:

Who, teach, English

My word list contains the word

teacher

Lemmatizing and stemming teacher gives teacher, while lemmatizing and stemming teaches gives teach.

Even calculating edit_distance will not solve this, since the edit_distance between teacher and teach comes out to be 2.
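To make the edit-distance point concrete, here is a minimal pure-Python Levenshtein sketch. It stands in for whichever edit_distance function was actually used (an assumption; the original call is not shown). It shows that an unrelated pair such as teach and reach scores lower than teacher and teach, so a distance threshold alone cannot separate related words from unrelated ones.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("teacher", "teach"))  # 2
print(edit_distance("teach", "reach"))    # 1
```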

What can I do so that teacher and teach are treated as similar?
There may be other cases that differ only by an extra 's' at the end. Is there a stemmer that handles this, or some other solution?

Another similar example: instructor and instructs.

2 Answers

Use an aggressive stemmer. The Lancaster Stemmer is one of the most aggressive and popular stemmers around.

Here is the Python code:

from nltk.stem.lancaster import LancasterStemmer

lancaster_stemmer = LancasterStemmer()
# Both words reduce to the same stem, so they can be matched on it.
assert 'teach' == lancaster_stemmer.stem('teacher') == lancaster_stemmer.stem('teaches')
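One way to apply this to the word-list lookup from the question is to index the list by stem and look up each query token by its stem. This is only a sketch: the helper name match_tokens, the whitespace tokenization, and the punctuation stripping are assumptions made for illustration, not part of NLTK or of the question's pipeline.

```python
from collections import defaultdict

from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()

# Word list taken from the examples in the question.
word_list = ["teacher", "instructor"]

# Map each Lancaster stem to the word-list entries that share it.
stem_index = defaultdict(list)
for word in word_list:
    stem_index[stemmer.stem(word)].append(word)


def match_tokens(sentence: str) -> dict:
    """Return {token: word-list entries} for tokens whose stem is indexed.

    Hypothetical helper: tokenization is a plain whitespace split with
    surrounding punctuation stripped, which is cruder than a real tokenizer.
    """
    matches = {}
    for raw in sentence.split():
        token = raw.strip("?.,!;:").lower()
        hits = stem_index.get(stemmer.stem(token), [])
        if hits:
            matches[token] = list(hits)
    return matches


print(match_tokens("Who teaches English?"))
```

Because an aggressive stemmer can also merge unrelated words, it may be worth spot-checking the stems it produces for your own word list before relying on them.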

Correct answer by Brian Spiering on May 6, 2021

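Check out fastText. fastText works similarly to word2vec in that it creates word embeddings; however, it also represents each word by its character n-grams, so words that share most of their characters (such as teacher and teaches) end up with similar vectors. That gives you the kind of surface-level similarity you are describing.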
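The character n-gram idea can be illustrated without training a model. The sketch below is not fastText itself: it only extracts n-grams in the same style (the word wrapped in < and > boundary markers, lengths 3 to 6) and compares two words by the Jaccard overlap of their n-gram sets. The function names are made up for this example, and real fastText learns a vector per n-gram rather than counting overlap.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> set:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    }


def ngram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two words' character n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:  # both words too short to produce any n-gram
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Words sharing a long prefix overlap heavily; unrelated words do not.
print(ngram_similarity("teacher", "teaches"))
print(ngram_similarity("teacher", "english"))
```

With a trained fastText model, the analogous comparison would be the cosine similarity between the two word vectors.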

Answered by j.a.gartner on May 6, 2021
