Data Science Asked by Tido on June 30, 2021
I am interested in an unsupervised approach to training a POS-tagger.
Labeling is very difficult, and I would like to test a tagger for my specific domain (chats), where users typically write in lowercase, etc. If it matters, the data is mostly in German.
I have read about older techniques such as HMMs, but maybe there are newer and better ways?
No unsupervised method for training a POS tagger performs comparably to human annotation or to supervised methods.
Correct answer by Brian Spiering on June 30, 2021
I would be very interested to hear what you need a tagger for in the context of chatbots.
Maybe all you need is a stemmer, to produce a "base form" for an inflected word?
In that case, you can check this.
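To make the stemming idea concrete, here is a minimal sketch of suffix stripping in plain Python. The suffix list and the `toy_stem` helper are assumptions made up for illustration, not a real German stemming algorithm; in practice an established stemmer (for example the Snowball German stemmer) would be the more sensible choice.

```python
# Toy suffix-stripping stemmer (illustrative only).
# The suffix list below is a hypothetical, heavily simplified assumption.
SUFFIXES = ("ern", "en", "er", "em", "es", "e", "n", "s")  # longer suffixes first


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Lowercase the word and strip the first matching suffix,
    as long as at least `min_stem_len` characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem_len:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for token in ["gehen", "kindern", "Schnelles"]:
        print(token, "->", toy_stem(token))
```

Lowercasing first fits the chat setting from the question, where capitalization is unreliable anyway. Note that a stem produced this way is not necessarily a dictionary word.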
Answered by MkL on June 30, 2021
Fortunately, you don't need unsupervised methods for PoS tagging in most languages, and especially not for German. There are semi-supervised or "weakly" supervised methods, such as the older HMM/EM approaches mentioned in the question. There is also a newer solution based on error-correcting output code classification: Weakly supervised POS tagging without disambiguation.
Of course, the accuracy of fully supervised methods such as LSTMs is far better than that of semi-supervised ones, but because of the known drawbacks of fully supervised methods (e.g. a lot of manual work), people still look for lazier approaches. Higher accuracy always comes at a higher cost.
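As a rough illustration of the older HMM/EM idea (not the error-correcting output code method), the sketch below trains an HMM with Baum-Welch on unlabeled sentences, using a small tag dictionary as the only supervision. The tag set, dictionary, sentences, and all function names are hypothetical toy assumptions; a real setup would need far more data and handling for out-of-vocabulary words.

```python
# Dictionary-constrained HMM trained with EM (Baum-Welch), pure Python.
# All data below is a hypothetical toy example.
TAGS = ["DET", "NOUN", "VERB"]
# Weak supervision: tags each known word may take; unknown words may take any tag.
TAG_DICT = {"der": {"DET"}, "die": {"DET"}, "hund": {"NOUN"},
            "katze": {"NOUN"}, "sieht": {"VERB"}}
SENTS = [s.split() for s in ["der hund sieht die katze", "die katze sieht die maus",
                             "der hund schläft", "die maus schläft"]]


def allowed(word):
    return TAG_DICT.get(word, set(TAGS))


def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}


def forward_backward(sent, init, trans, emit):
    """Scaled forward-backward; alpha[i][t] * beta[i][t] is the posterior of tag t at i."""
    alpha, scale = [], []
    for i, w in enumerate(sent):
        row = {}
        for t in TAGS:
            prior = init[t] if i == 0 else sum(alpha[i - 1][s] * trans[s][t] for s in TAGS)
            row[t] = prior * emit[t].get(w, 0.0)
        z = sum(row.values())
        alpha.append({t: row[t] / z for t in TAGS})
        scale.append(z)
    beta = [None] * len(sent)
    beta[-1] = {t: 1.0 for t in TAGS}
    for i in range(len(sent) - 2, -1, -1):
        beta[i] = {s: sum(trans[s][t] * emit[t].get(sent[i + 1], 0.0) * beta[i + 1][t]
                          for t in TAGS) / scale[i + 1] for s in TAGS}
    return alpha, beta, scale


def train(sents, iterations=20, smooth=1e-3):
    vocab = {w for s in sents for w in s}
    init = {t: 1.0 / len(TAGS) for t in TAGS}
    trans = {s: {t: 1.0 / len(TAGS) for t in TAGS} for s in TAGS}
    # Emissions start uniform over the (word, tag) pairs the dictionary allows.
    emit = {t: normalize({w: 1.0 for w in vocab if t in allowed(w)}) for t in TAGS}
    for _ in range(iterations):
        # Expected counts, with a small smoothing constant to avoid zeros.
        c_init = {t: smooth for t in TAGS}
        c_trans = {s: {t: smooth for t in TAGS} for s in TAGS}
        c_emit = {t: {w: smooth for w in emit[t]} for t in TAGS}
        for sent in sents:
            alpha, beta, scale = forward_backward(sent, init, trans, emit)
            for i, w in enumerate(sent):
                for t in TAGS:
                    gamma = alpha[i][t] * beta[i][t]
                    if w in c_emit[t]:
                        c_emit[t][w] += gamma
                    if i == 0:
                        c_init[t] += gamma
                    if i + 1 < len(sent):
                        for u in TAGS:
                            c_trans[t][u] += (alpha[i][t] * trans[t][u]
                                              * emit[u].get(sent[i + 1], 0.0)
                                              * beta[i + 1][u] / scale[i + 1])
        init = normalize(c_init)
        trans = {s: normalize(c_trans[s]) for s in TAGS}
        emit = {t: normalize(c_emit[t]) for t in TAGS}
    return init, trans, emit


def viterbi(sent, init, trans, emit):
    """Most likely tag sequence; assumes every word was seen during training."""
    best = [{t: (init[t] * emit[t].get(sent[0], 0.0), None) for t in TAGS}]
    for w in sent[1:]:
        row = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda s: best[-1][s][0] * trans[s][t])
            row[t] = (best[-1][prev][0] * trans[prev][t] * emit[t].get(w, 0.0), prev)
        best.append(row)
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


init, trans, emit = train(SENTS)
for sent in SENTS:
    print(list(zip(sent, viterbi(sent, init, trans, emit))))
```

Words listed in the dictionary can only receive their permitted tags, because every other emission probability stays at zero. The tags of unlisted words such as "maus" and "schläft" are whatever EM infers from context, which is where the accuracy gap to supervised taggers comes from.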
Answered by Edward Weinert on June 30, 2021