TransWikia.com

Which is the best algorithm for entity extraction from unstructured documents?

Data Science Asked by Rajesh das on October 20, 2020

I have unstructured documents from which I need to extract information such as buyer name, seller name, expiry date, buying date, etc. I had planned to use spaCy with custom entity recognition (I followed this blog post: https://medium.com/@manivannan_data/how-to-train-ner-with-custom-training-data-using-spacy-188e0e508c6). However, the buyer name is sometimes predicted as the seller name and vice versa, and when I pass the whole document content, a single entity sometimes wrongly contains several predicted values. FYI, these documents are approximately 2-20 pages long, so they contain a lot of text.

Can someone suggest other packages that might give higher accuracy? If not, how should I train the model to improve its accuracy? Thanks in advance.

One Answer

Try cleaning your documents and using the flair library. It is a user-friendly library from Zalando Research that lets you do all sorts of NLP tasks very quickly, especially NER.
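
As a rough sketch of what "clean, then tag with flair" could look like: the helpers below (`clean_text`, `split_paragraphs`, `tag_entities`) are hypothetical names made up for illustration, and the cleaning rules are only assumptions about what text extracted from multi-page documents tends to need. The stock `"ner"` model in flair predicts generic labels such as PER, ORG and LOC, so it will not tell buyer from seller on its own. For that you would still need to train a tagger on your own labelled data and pass its path as `model_name`. Tagging paragraph by paragraph, rather than the whole document at once, is also just a suggestion here.

```python
import re
from typing import List, Tuple


def clean_text(raw: str) -> str:
    """Normalize one chunk of extracted text before tagging (assumed rules)."""
    text = raw.replace("\u00a0", " ")                    # non-breaking spaces -> plain spaces
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)   # rejoin words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)                     # collapse whitespace and newlines
    return text.strip()


def split_paragraphs(raw: str) -> List[str]:
    """Split a long document on blank lines and clean each paragraph."""
    parts = re.split(r"\n\s*\n", raw)
    cleaned = [clean_text(part) for part in parts]
    return [part for part in cleaned if part]


def tag_entities(paragraphs: List[str], model_name: str = "ner") -> List[Tuple[str, str, float]]:
    """Run a flair sequence tagger over each paragraph.

    Requires `pip install flair`; the model is downloaded on first use.
    Returns (entity text, label, confidence) tuples.
    """
    # Imported here so the cleaning helpers work without flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)
    results = []
    for paragraph in paragraphs:
        sentence = Sentence(paragraph)
        tagger.predict(sentence)
        for span in sentence.get_spans("ner"):
            label = span.get_label("ner")
            results.append((span.text, label.value, label.score))
    return results
```

You would then call something like `tag_entities(split_paragraphs(document_text))` and inspect the low-confidence predictions first.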

Answered by user87451 on October 20, 2020
