How can I create a model that extracts only specific data from receipts or invoices?

Data Science Asked on February 25, 2021

I’m trying to build a model that is capable of identifying only some of the information on receipts and invoices.

All the documents are images, and each one has a different structure.

I have used PyPDF2 and pytesseract to extract text from the receipts, but that just returns all the text on a receipt. I tried regular expressions, but because the document layouts vary so much, they do not work reliably in this case.
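For illustration, the OCR-plus-regex approach described above might look roughly like the sketch below. The function names, the patterns, and the sample receipt text are all hypothetical and chosen only to show the idea; `pytesseract.image_to_string` is the one real library call, and it is imported lazily so the regex part runs without Tesseract installed.

```python
import re

# Hypothetical patterns for illustration only; real receipts vary far more.
FIELD_PATTERNS = {
    # A line that starts with "Total" (so "Subtotal" is not matched).
    "total": re.compile(r"^\s*(?:grand\s+)?total\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
                        re.IGNORECASE | re.MULTILINE),
    # A line that starts with "Tax", "VAT", or "GST".
    "tax": re.compile(r"^\s*(?:sales\s+)?(?:tax|vat|gst)\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
                      re.IGNORECASE | re.MULTILINE),
    # Dates such as 02/25/2021, 25-02-21, or 2021-02-25.
    "date": re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b"),
}


def ocr_image(path):
    """Return all text on a receipt image (requires pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))


def extract_fields(text):
    """Return the first match for each field, or None when a pattern misses."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up OCR output for one receipt layout.
SAMPLE_TEXT = """ACME STORE
Date: 02/25/2021
Coffee      3.50
Sandwich    6.50
Subtotal   10.00
Tax         0.80
Total      10.80
"""

# A second made-up layout that uses different wording for the same fields.
OTHER_LAYOUT = """ACME STORE
25 Feb 2021
AMOUNT DUE 10.80
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
    print(extract_fields(OTHER_LAYOUT))
```

The patterns find all three fields in the first layout but none in the second, which labels the total "AMOUNT DUE" and writes the date as "25 Feb 2021". That is the brittleness described above: every new layout needs new hand-written rules.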

I am looking to build a model that returns only certain fields from a receipt, such as the total price, date, and tax.

I could extract the values by hard-coding parsing rules, but I don't think that is optimal. Is there a way to build a model for this use case that identifies the required fields and captures their values? I am looking for pointers to get started on this project.

Tags: computer-vision, image-recognition, named-entity-recognition, ocr, python
