Asked by vikasreddy on March 11, 2021
I have images of identity cards (taken manually, so not all the same size) and I need to extract the text from them.
I used Tesseract to predict bounding boxes for each letter and was successful to some extent, but some letters are not bounded.
In total, I have around 5,000 bounding boxes across all the images.
I want to train a model to predict bounding boxes for the remaining letters.
After predicting the bounding boxes, I will try to classify each boxed image into characters.
This differs from a conventional machine learning problem in that I do not have separate training and testing data.
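For reference, a minimal sketch of how per-letter boxes can be pulled out of Tesseract; this assumes the pytesseract wrapper and a hypothetical input file name, neither of which the question specifies:

```python
# Minimal sketch: per-character boxes via the pytesseract wrapper
# (the question only says "tesseract"; pytesseract is an assumption).
import cv2
import pytesseract

img = cv2.imread("id_card.png")  # hypothetical input path
h = img.shape[0]

# image_to_boxes returns one line per character:
# "<char> <left> <bottom> <right> <top> <page>", origin at bottom-left.
for line in pytesseract.image_to_boxes(img).splitlines():
    parts = line.split(" ")
    if len(parts) != 6:
        continue  # skip malformed lines (e.g. space characters)
    ch, x1, y1, x2, y2, _page = parts
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
    # Convert to a top-left origin for OpenCV drawing.
    cv2.rectangle(img, (x1, h - y2), (x2, h - y1), (0, 255, 0), 1)

cv2.imwrite("boxes.png", img)
```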
As you seem to plan to build your own character classifier, I assume you are happy to do some programming but want to avoid heavy graphics work. A practical approach is to use Tesseract's bounding boxes just as a guide and look to the left and right of them for more characters. Knowing where the rows are, and what height and width the characters have, is a good basis: extending a row gives you the upper and lower boundaries of the new candidate boxes.

To find the left and right boundaries, project the pixels onto the baseline of the row and look for positions with a low number of black pixels (see the sketch below). A second factor for the boundary decision can be the expected width, which should lie somewhere between the width of "I" and the width of "W". A third factor can be the letter or number that your classifier recognises and how well it fits the context (e.g. using a character language model).
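A rough sketch of the projection idea, assuming OpenCV and NumPy and a grayscale crop of a single text row; the `max_ink` threshold is illustrative, not part of the answer:

```python
import cv2
import numpy as np

def candidate_boundaries(row_img, max_ink=1):
    """Find candidate left/right character boundaries in a text row
    by projecting ink onto the baseline.

    row_img: grayscale crop of one text row (dark text on light background).
    max_ink: columns with at most this many ink pixels count as gaps.
    """
    # Binarise with Otsu so ink pixels become 1 and background 0.
    _, binary = cv2.threshold(row_img, 0, 1,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Vertical projection: number of ink pixels in each column.
    profile = binary.sum(axis=0)
    # Columns with little ink are potential boundaries between characters.
    gaps = np.flatnonzero(profile <= max_ink)
    return gaps, profile

# Usage: slice the row out of the page using Tesseract's row coordinates
# (the variable names here are placeholders):
# row = gray_image[row_top:row_bottom, :]
# gaps, profile = candidate_boundaries(row)
```

Runs of consecutive gap columns can then be merged, and splits that would produce boxes narrower than "I" or wider than "W" discarded, following the width criterion above.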
Answered by Joachim Wagner on March 11, 2021
A simple solution might be to reduce the image resolution to hide small details, such as the deformation of the "O" in "model". Half the resolution should still be enough.
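For example, with OpenCV (one possible way to halve the resolution; `INTER_AREA` is a reasonable interpolation choice for shrinking, and the file name is a placeholder):

```python
import cv2

img = cv2.imread("id_card.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
# Halve both dimensions; INTER_AREA averages neighbouring pixels, which
# smooths out small deformations like the damaged "O" mentioned above.
small = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
cv2.imwrite("id_card_half.png", small)
```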
Answered by Joachim Wagner on March 11, 2021