Handwriting recognition with Mathematica

Question

I am trying to teach Mathematica to read my handwriting. Since I don't want to ruin my reputation by showing my own handwriting, I am going to use a font called blackjack.

Let's say this is a paragraph I have written.

para = StringTake[ExampleData[{"Text", "OriginOfSpecies"}, "FormattedText"], 401]

INTRODUCTION.
  
  When on board H.M.S. 'Beagle,' as naturalist, I was much struck with 
  certain facts in the distribution of the inhabitants of South 
  America, and in the geological relations of the present to the past 
  inhabitants of that continent. These facts seemed to me to throw some 
  light on the origin of species--that mystery of mysteries, as it has 
  been called by one of our greatest philosophers.

First I am going to recognise a single letter, say $h$. (At some point in the distant future I am thinking about keeping a sample file of actual handwritten letters.)

font = "blackjack";
text = Binarize@Rasterize@Style[para, Bold, 30, FontFamily -> font];
w = Binarize@Rasterize@Style["h", Bold, 30, FontFamily -> font];

x = ImageCorrelate[text, w, NormalizedSquaredEuclideanDistance];
w1 = ColorNegate[Binarize[x, 0.12]];
loc = ComponentMeasurements[w1, {"Centroid", "EquivalentDiskRadius"}];
pos = loc[[All, 2, 1]];
Length[pos]
Show[text, Graphics[{Opacity[0.5], Red, Disk[#, 10] & /@ pos}], ImageSize -> 500]

Then I iterate over all letters, punctuation marks and digits.

alph = Join[ToUpperCase[#] & /@ Alphabet[], Alphabet[],
       {".", ",", ";", ":", "-", "?"}, {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}]

wlist = {};
Do[
  w = Binarize@Rasterize@Style[abc, Bold, 30, FontFamily -> font];
  x = ImageCorrelate[text, w, NormalizedSquaredEuclideanDistance];
  w1 = ColorNegate[Binarize[x, 0.11]];
  loc = ComponentMeasurements[w1, {"Centroid", "EquivalentDiskRadius"}];
  pos = loc[[All, 2, 1]];
  If[Length[pos] > 0, AppendTo[wlist, {abc, pos}]],
{abc, alph}]

Then I re-render the detected characters in a machine-readable font and use TextRecognize:

newtext = Graphics[Block[{w = #[[1]], pos = #[[2]]}, 
 Text[Style[w, 18], #] & /@ pos] & /@ wlist, ImageSize -> 700]

TextRecognize[newtext]

INTRQDUCTIQN
  
  When on board H M S 'B glef 5 naiurallsl, I W5 mu h stru k
  wlth ceftaln fa ts ln the d|str|but|0n of the lnhabltants of South
  Amenca, and |n the geologncat r Iahons of the prsem lo the
  pal lnhdiltants of thal tnnllnem ThSe fa ts seemed to me
  to throw some ||ght on the orlgln of pec|s-- that myslety of
  mysletls, as || ha been called by ore ofour grealsl ph||os phefs

Now the question: how can I improve this?

The major challenge is to identify all the letters. Some letters are missing. In some cases $c$ looks like $e$, etc. I was thinking about using different font families and creating a list with Classify for better comparison, but I am still not sure how good that would be.

The last part, concerning TextRecognize, can probably be improved by rearranging the positions of the individual letters to avoid any overlap.
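One possible way to sidestep the overlap entirely is to skip the re-rendering step and read the detected characters directly from their positions. The following is only a minimal sketch, not part of the original attempt: `detections` is hypothetical toy data in the same shape as `wlist`, and `lineHeight` is an assumed line pitch that would have to be estimated for real text. It also assumes that the y coordinate grows upwards, as it does for the centroids returned by ComponentMeasurements.

```mathematica
(* Hypothetical detections in the same shape as wlist: {character, {positions}} *)
detections = {{"h", {{30., 52.}}}, {"i", {{45., 50.}}},
              {"o", {{32., 21.}}}, {"k", {{50., 19.}}}};

(* Assumed line pitch in pixels; in practice estimate it from the character height *)
lineHeight = 30;

(* Flatten to a list of {character, {x, y}} pairs *)
flat = Flatten[Table[{d[[1]], p}, {d, detections}, {p, d[[2]]}], 1];

(* Bucket by y rounded to the line pitch, then order rows top to bottom *)
rows = GatherBy[flat, Round[#[[2, 2]], lineHeight] &];
rows = SortBy[rows, -Mean[#[[All, 2, 2]]] &];

(* Within each row, read the characters left to right *)
StringRiffle[StringJoin[SortBy[#, #[[2, 1]] &][[All, 1]]] & /@ rows, "\n"]
```

This sketch does not restore the spaces between words; large gaps in the x coordinates within a row could be used for that. Rounding to a fixed pitch can also split a line whose centroids straddle a bucket boundary, so a proper clustering of the y values may be more robust.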

acacia · Answer

I think writing Mathematica code to read a font that has only one image per character is a very different task from teaching Mathematica to recognize your handwriting. You already have the training data in the script font you used, so you can achieve 100% accuracy without any machine learning. Pick one raster size for your TTF font and stick to it; otherwise there are multiple raster sizes and therefore multiple possible matches for each test case. You can also solve the text alignment problem by carefully selecting the origin of each character in the data set, or by using perfectly aligned text to train a machine learning model.

The idea is that you create text in a normal font with perfect alignment to teach the computer how to generate perfectly aligned text. You might not even need that if you find a simpler way to snap the detected object coordinates in code. This requires weights and lines that reward properly aligned text. You need to define the vertical spacing between lines of output text, and some ratio of character height to line spacing. You should record the character height and use it as an input to this alignment correction function. You also have multiple characters being superimposed, and there is no code to detect and suppress those errors. You need to work out how to always emit exactly one output character per detected character.

Eventually you get to your own handwriting, and you realize that you could be using TensorFlow instead. You need more layers and better data sets, and more of the errors need to be worked out. That means carefully designing the data sets around what you think the computer will need to do the job. It might not be words or sentences; it would probably be individual characters with a plain line leading in and out. Then you need to tag all that data and put boxes around it. You also need to score the data manually if you use words and sentences to test the model. Eventually you realize you need a generator, because you will never have enough raw tagged data. You need machine learning to recognize statistical patterns over infinite, unknown data sets.

ImageCorrelate is a blunt tool that is not tailored to the trivial case you posted here. You would be better off using a one-channel XOR on binary data followed by a summation, as a basic one-layer pseudo machine learning discriminator. This simple solution only works in a controlled test with a perfect dictionary: one raster per search term and one search term per raster. That is 2*26 rasters if you want upper and lower case, and about 70 characters if you really cover everything.

You should be able to run some quick tests for the worst case, where a training letter contains, as a subset, pixels that exactly match some other character. If t contains l, then every t matches both t and l. Since l does not contain t, you must hard-code l as "l and not t". This might not be a problem if everything is part of a controlled experiment with exactly matching master rasters, so the subset problem may not even exist. It depends on how the font was constructed and on the quality of the dictionary.
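The XOR-and-sum idea and the subset check can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the answer: the 5x3 matrices are made-up toy rasters (1 = ink), and `xorMismatch`, `bestMatch` and `inkSubsetQ` are hypothetical helper names.

```mathematica
(* Toy 5x3 binary rasters, 1 = ink; hypothetical stand-ins for real glyph rasters *)
l = {{0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}};
t = {{0, 1, 0}, {1, 1, 1}, {0, 1, 0}, {0, 1, 0}, {0, 1, 0}};
templates = <|"l" -> l, "t" -> t|>;

(* Number of pixels that differ; only defined for rasters of equal size *)
xorMismatch[a_, b_] /; Dimensions[a] === Dimensions[b] := Total[BitXor[a, b], 2]

(* Dictionary entry with the fewest differing pixels *)
bestMatch[patch_] :=
  First[MinimalBy[Keys[templates], xorMismatch[patch, templates[#]] &]]

(* True if every ink pixel of a is also an ink pixel of b *)
inkSubsetQ[a_, b_] := BitAnd[a, b] === a

xorMismatch[t, l]   (* 2 *)
bestMatch[t]        (* "t" *)
inkSubsetQ[l, t]    (* True: l is contained in t *)
inkSubsetQ[t, l]    (* False *)
```

With an exact XOR count the containment of l in t is harmless, since a t patch differs from the l raster by two pixels; the subset problem only bites if matching is done by ink containment. For real glyphs, `ImageData[Binarize[img]]` should give 0/1 matrices, but note that rasterized black text has ink as 0 unless the image is passed through ColorNegate first.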

Vitaliy Kaurov · Answer

This topic has had some updates since 2016 :-) I will give a short review of resources.

Top recent: Wolfram Technology Conference (WTC) 2020
At the Wolfram Technology Conference (WTC) 2020 (currently in progress), Mikayel Egibyan from the Wolfram image processing team just gave a talk on exactly this topic, with approaches based on modern machine learning and neural networks. The video will be available later on the WTC site, but here is the talk presentation on Wolfram Community:
Handwritten Recognition and Analysis
https://community.wolfram.com/groups/-/m/t/2091085
The talk covered the following:

History of OCR and Analysis
Text Recognition Techniques
Existing Architectures for Handwritten Text Recognition
Techniques to Analyze Handwritten Text
The Importance of the Loss Function
Building and Training a Toy Network
Applications: Handwritten Recognition
Applications: Handwritten Verification
Applications: Handwritten Identification

William J Turkel FREE Book: Digital Research Methods with Mathematica
In his free book "Digital Research Methods with Mathematica" (notebooks and screencasts), William J Turkel very nicely discusses some starter topics for handwriting recognition in Lesson 21, Section 3:

https://williamjturkel.net/digital-research-methods-with-mathematica
https://youtu.be/4peeyWlMDdc

(BTW, William also gave a cool talk at WTC 2020: "Text and Image Mining for Historical Research".)

Other relevant resources and experiments:

Classifying Japanese characters from the Edo period by Marco Thiel

Character Analysis by Daniel Shin

Handwriting Recognition Using Neural Networks by Luis Fernando Cantu Diaz de Leon

Wolfram Neural Net Repository
I recommend checking the Wolfram Neural Net Repository from time to time:
https://resources.wolframcloud.com/NeuralNetRepository
First of all, a net of direct relevance may appear there, as new nets are added constantly. It is also a good source of net architectures you can explore and modify. For instance, both basic nets for handwritten digits, LeNet and CapsNet, are there, along with many others:

https://resources.wolframcloud.com/NeuralNetRepository/resources/CapsNet-Trained-on-MNIST-Data

https://resources.wolframcloud.com/NeuralNetRepository/resources/LeNet-Trained-on-MNIST-Data
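
As a quick way to try one of these nets, the sketch below loads LeNet with NetModel and applies it to a rasterized digit. It is only a rough illustration: it assumes the "blackjack" font from the question is installed (any other font would do), it needs an internet connection the first time the net is downloaded, and a net trained on MNIST will not necessarily classify a script-font digit correctly.

```mathematica
(* Fetch the pretrained net from the Wolfram Neural Net Repository *)
net = NetModel["LeNet Trained on MNIST Data"];

(* Assumption: the "blackjack" font is installed on this machine *)
digit = Rasterize[Style["7", Bold, 40, FontFamily -> "blackjack"]];

(* MNIST digits are white on black, so invert the rasterized image first *)
input = ColorNegate[digit];

(* Predicted class, then the most likely classes with their probabilities *)
net[input]
net[input, "TopProbabilities"]
```

The net's built-in image encoder takes care of resizing and grayscale conversion, so the rasterized image can be passed in directly.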
