TransWikia.com

What machine learning algorithms to use for unsupervised POS tagging?

Data Science Asked by Tido on June 30, 2021

I am interested in an unsupervised approach to training a POS-tagger.

Labeling is very difficult, and I would like to test a tagger for my specific domain (chats), where users typically write in lowercase, etc. If it matters, the data is mostly in German.

I read about older techniques like HMMs, but maybe there are newer and better ways?

3 Answers

There are no unsupervised methods for training a POS tagger that perform comparably to human annotation or to supervised methods.

The current state-of-the-art supervised methods for training POS taggers are long short-term memory (LSTM) neural networks.

Correct answer by Brian Spiering on June 30, 2021

I'm very interested to hear what you need a tagger for in the context of chatbots.

Maybe you just need a stemmer, to produce the 'base form' of an inflected word?

In that case, you can check this.
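To make the idea concrete, here is a deliberately tiny suffix-stripping sketch for lowercase German chat text. The suffix list and the `toy_stem` helper are made up for illustration; a real stemmer (for example the Snowball German stemmer shipped with NLTK) uses far more careful rules, so treat this only as a picture of what "reducing inflected words to a base form" means.

```python
# Hypothetical toy stemmer: strips one common German inflection suffix.
# The suffix list is an illustrative assumption, not a linguistic resource.
SUFFIXES = ("ern", "em", "en", "er", "es", "e", "n", "s")  # longer suffixes first


def toy_stem(word, min_stem=3):
    """Lowercase the word and strip the first matching suffix,
    as long as at least `min_stem` characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["Kindern", "kinder", "kind", "gehen", "gehe"]:
        print(w, "->", toy_stem(w))
```

Different inflected forms of the same word collapse to one string, which is often enough for matching user input in a chatbot, but it tells you nothing about the part of speech.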

Answered by MkL on June 30, 2021

Fortunately, you don't need unsupervised methods for POS tagging for most languages, and especially not for German. There are semi-supervised or "weakly" supervised methods, such as the older HMM/EM approaches mentioned in the question; however, there is also a newer solution based on Error-Correcting Output Code classification: "Weakly supervised POS tagging without disambiguation".

Of course, the accuracy of fully supervised methods like LSTMs is far better than that of semi-supervised ones, but because of the known drawbacks of fully supervised methods (e.g. a lot of manual annotation work), people still look for cheaper approaches. Excellent accuracy always comes at a higher cost.
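As a rough illustration of the HMM/EM idea, the sketch below trains an HMM tagger with Baum-Welch (EM) on untagged, lowercase sentences. A small tag dictionary restricts which tags some words may take; that is the "weak" supervision. The corpus, the tag set, the dictionary, and all function names are invented for this example, and the code is a minimal pure-Python sketch, not a tuned implementation and not the ECOC method from the paper.

```python
import random


def normalize(values):
    """Scale non-negative numbers to sum to 1 (all-zero input is returned as is)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


def forward_backward(sent, pi, A, B, K):
    """Scaled forward-backward pass; alpha[t][k] * beta[t][k] is the tag posterior."""
    alpha, scale = [], []
    for t, w in enumerate(sent):
        if t == 0:
            row = [pi[k] * B[k][w] for k in range(K)]
        else:
            row = [B[j][w] * sum(alpha[t - 1][i] * A[i][j] for i in range(K))
                   for j in range(K)]
        c = sum(row)
        alpha.append([x / c for x in row])
        scale.append(c)
    beta = [[1.0] * K for _ in sent]
    for t in range(len(sent) - 2, -1, -1):
        w = sent[t + 1]
        beta[t] = [sum(A[i][j] * B[j][w] * beta[t + 1][j] for j in range(K)) / scale[t + 1]
                   for i in range(K)]
    return alpha, beta, scale


def train_hmm_em(sentences, K, allowed=None, iters=30, seed=0, eps=1e-6):
    """Baum-Welch. `allowed` maps a word to the set of tag ids it may take;
    words missing from `allowed` may take any tag."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})

    def ok(w, k):
        return allowed is None or w not in allowed or k in allowed[w]

    def noisy():  # near-uniform start with a little noise to break symmetry
        return 1.0 + 0.1 * rng.random()

    pi = normalize([noisy() for _ in range(K)])
    A = [normalize([noisy() for _ in range(K)]) for _ in range(K)]
    B = [dict(zip(vocab, normalize([noisy() if ok(w, k) else 0.0 for w in vocab])))
         for k in range(K)]
    for _ in range(iters):
        # E-step: accumulate expected counts (eps smoothing avoids zero rows)
        pi_c = [eps] * K
        A_c = [[eps] * K for _ in range(K)]
        B_c = [{w: (eps if ok(w, k) else 0.0) for w in vocab} for k in range(K)]
        for sent in sentences:
            alpha, beta, scale = forward_backward(sent, pi, A, B, K)
            for t, w in enumerate(sent):
                for k in range(K):
                    gamma = alpha[t][k] * beta[t][k]
                    B_c[k][w] += gamma
                    if t == 0:
                        pi_c[k] += gamma
                    else:
                        for i in range(K):
                            A_c[i][k] += (alpha[t - 1][i] * A[i][k] * B[k][w]
                                          * beta[t][k] / scale[t])
        # M-step: re-estimate parameters from the expected counts
        pi = normalize(pi_c)
        A = [normalize(row) for row in A_c]
        B = [dict(zip(vocab, normalize([B_c[k][w] for w in vocab]))) for k in range(K)]
    return pi, A, B


def tag(sent, pi, A, B, K, names):
    """Posterior decoding: most probable tag per token (training-vocabulary words only)."""
    alpha, beta, _ = forward_backward(sent, pi, A, B, K)
    return [names[max(range(K), key=lambda k: alpha[t][k] * beta[t][k])]
            for t in range(len(sent))]


# Invented toy data: lowercase German, with "maus" absent from the tag dictionary.
TAGS = ["DET", "NOUN", "VERB"]
TAG_DICT = {"der": {0}, "die": {0}, "hund": {1}, "katze": {1}, "bellt": {2}, "rennt": {2}}
CORPUS = [["der", "hund", "bellt"], ["die", "katze", "rennt"], ["der", "hund", "rennt"],
          ["die", "maus", "rennt"], ["die", "maus", "bellt"]]

pi, A, B = train_hmm_em(CORPUS, len(TAGS), allowed=TAG_DICT)
print(tag(["die", "maus", "rennt"], pi, A, B, len(TAGS), TAGS))
```

With `allowed=None` the same code runs fully unsupervised, but the learned states are then arbitrary clusters that still have to be mapped to real tags by hand, which is one reason such approaches lag behind supervised taggers.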

Answered by Edward Weinert on June 30, 2021
