What can be done so that 'teacher' and 'teaches' are treated as similar?

Question

Who teaches English?

After tokenizing and stemming, it gives me:

Who, teach, English

In my list of words, I have the word

teacher

Lemmatizing or stemming 'teacher' gives 'teacher', while lemmatizing or stemming 'teaches' gives 'teach'.

Even calculating edit_distance will not solve this, since the edit_distance between 'teach' and 'teacher' comes out to be 2.
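
To make the edit distance figure concrete, here is a minimal sketch of the standard Levenshtein recurrence in plain Python. The function name `edit_distance` is just an illustrative helper, not the NLTK function of the same name. It also shows why a fixed threshold is a weak fix: an unrelated word such as 'beach' is closer to 'teach' than 'teacher' is.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("teach", "teacher"))  # 2
print(edit_distance("teach", "beach"))    # 1
```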

Now, what do I do to have 'teacher' and 'teach' treated as similar?
Similarly, there may be other cases with an extra 's' at the end. Is there some stemmer that solves this problem? Is there any solution?

Another similar example: 'instructor' and 'instructs'.

Brian Spiering · Accepted Answer

Use an aggressive stemmer. The Lancaster Stemmer is one of the most aggressive and popular stemmers around.
Here is the Python code:
from nltk.stem.lancaster import LancasterStemmer

lancaster_stemmer = LancasterStemmer()
# Both words reduce to the same stem, so they can be compared by stem.
assert 'teach' == lancaster_stemmer.stem('teacher') == lancaster_stemmer.stem('teaches')

j.a.gartner · Answer

Check out fastText. It works similarly to word2vec in that it creates word embeddings, but it also analyzes character n-grams, so words that share substrings (such as 'teacher' and 'teaches') tend to get similar vectors, which captures the kind of surface similarity you're thinking about.
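
As a rough illustration of the character n-gram idea only (this is not fastText itself, which learns a vector for each n-gram), the sketch below compares words by the overlap of their character n-grams. The helper names `char_ngrams` and `ngram_similarity` are hypothetical, and the 3 to 5 n-gram range is an arbitrary choice for the example.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> set:
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"  # mark word start and end, as subword models commonly do
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def ngram_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Words sharing a long prefix overlap heavily; unrelated words do not.
print(ngram_similarity("teacher", "teaches"))
print(ngram_similarity("teacher", "english"))
```

With real fastText embeddings, the comparison would be a cosine similarity between learned vectors rather than a set overlap, but the intuition is the same: shared substrings pull the two words together.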

Answered by j.a.gartner on May 6, 2021
