TransWikia.com

Address parsing using spaCy

Data Science Asked on June 17, 2021

I am trying to parse addresses from various documents using spaCy's NER, but the results are not very accurate.

I know this is a somewhat generic question, but it would be a great help if someone could point me to past work, good articles, or techniques that apply to this.

One Answer

Please look at my comment to add more information to your post. Based on the information you provided, here are my remarks:

  • spaCy is trained to find locations, not addresses per se

If you use a "common" language, spaCy's pretrained model is trained on WikiNER data, in which locations are not addresses but geographical places such as city names, country names, etc. So it is quite normal that it cannot detect full addresses.

You likely need to train your own entity recognizer. The spaCy documentation explains how to do this, including code samples: https://spacy.io/usage/training#ner
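As a rough illustration of what preparing training data for a custom entity recognizer could look like, here is a minimal sketch assuming spaCy v3. The `ADDRESS` label, the example sentences, and the `RAW_EXAMPLES` name are made up for illustration; they are not from the question or from spaCy itself.

```python
import spacy
from spacy.tokens import DocBin

# A blank English pipeline (tokenizer only); no pretrained model is needed here.
nlp = spacy.blank("en")

# Hypothetical hand-labelled examples: (full text, address substring).
RAW_EXAMPLES = [
    ("Send it to 221 Baker Street, London.", "221 Baker Street"),
    ("Our office is at 10 Downing Street.", "10 Downing Street"),
]

docs = []
for text, address in RAW_EXAMPLES:
    doc = nlp.make_doc(text)
    start = text.index(address)
    # char_span returns None if the offsets do not line up with token boundaries.
    span = doc.char_span(start, start + len(address), label="ADDRESS")
    if span is None:
        continue
    doc.ents = [span]
    docs.append(doc)

# DocBin is the binary container spaCy v3 uses for training corpora.
db = DocBin(docs=docs)
# db.to_disk("train.spacy")  # then train with: python -m spacy train config.cfg ...
```

A real training set would need far more examples, covering the address formats that actually occur in your documents, plus a separate development set for evaluation.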

  • Don't underestimate spaCy's rule-based matching

Is it a fancy neural network? No. Does it matter? Also no. spaCy lets you write rules to find entities, and this works well in cases like addresses, which generally follow the same pattern from one entity to the next.
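To give an idea of what such a rule could look like, here is a minimal sketch using spaCy v3's `EntityRuler`. The `ADDRESS` label, the `STREET_TYPES` list, and the pattern itself are illustrative assumptions; they only cover simple "number + name + street type" addresses and would need to be adapted to your data.

```python
import spacy

# A blank English pipeline (tokenizer only); no pretrained model is needed here.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical list of street-type words; extend it for your own documents.
STREET_TYPES = ["street", "avenue", "road", "boulevard", "lane", "drive"]

# Pattern: a house number, one or more capitalized words, then a street type.
patterns = [
    {
        "label": "ADDRESS",
        "pattern": [
            {"IS_DIGIT": True},
            {"IS_TITLE": True, "OP": "+"},
            {"LOWER": {"IN": STREET_TYPES}},
        ],
    }
]
ruler.add_patterns(patterns)

doc = nlp("Please ship the parcel to 221 Baker Street before Friday.")
addresses = [ent.text for ent in doc.ents if ent.label_ == "ADDRESS"]
print(addresses)
```

The same ruler can also be added to a pipeline that already has a statistical `ner` component, so that rules and the trained model complement each other.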

Correct answer by Valentin Calomme on June 17, 2021
