Data Science Asked on February 25, 2021
I’m trying to build a model that is capable of identifying only some of the information on receipts and invoices.
The documents are all images, and each one has a different layout.
Sample Data :
Click here
I have used PyPDF2 and pytesseract to extract text from the receipts, but that just returns all the text on a receipt. I tried working with regular expressions, but the layout differs from document to document, so they do not work reliably in this case.
I am looking to build a model that returns only certain fields from a receipt, such as the total price, date, and tax.
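To make the target output concrete, here is a minimal sketch of a keyword-anchored baseline over the OCR text. It is not a trained model, only a reference point that a learned approach could be compared against. The function names `ocr_image` and `extract_fields`, the keyword lists, and the assumption that amounts are printed with two decimals are all illustrative choices, not something taken from the actual receipts.

```python
import re
from typing import Dict, Optional

# Assumed amount format: digits with optional thousands separators and two decimals.
AMOUNT = r"(\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2})"

# Hypothetical keyword patterns; real receipts will need many more variants.
FIELD_PATTERNS = {
    # Anchored at the start of a line so that "Subtotal" is not mistaken for "Total".
    "total": re.compile(r"^\s*(?:grand\s+)?total\b.*?" + AMOUNT, re.I | re.M),
    "tax": re.compile(r"^\s*(?:sales\s+)?(?:tax|vat|gst)\b.*?" + AMOUNT, re.I | re.M),
    # Numeric dates only, e.g. 02/25/2021 or 2021-02-25; returned as raw strings.
    "date": re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b"),
}


def ocr_image(path: str) -> str:
    """Run Tesseract on one image file and return the raw text."""
    # Imported lazily so the parsing code below can run without Tesseract installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matches."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    sample = "ACME STORE\nDate: 02/25/2021\nSubtotal 11.43\nTax 0.91\nTOTAL $12.34\n"
    print(extract_fields(sample))
```

A baseline like this breaks as soon as a receipt uses a keyword or number format that is not listed, which is exactly the limitation that motivates looking for a model instead.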
I could extract these fields by hard-coding parsing rules, but I don’t think that is optimal. Is there a way to build a model for this use case that can identify the required fields and capture their values? I am looking for something to get started with on this project.