TransWikia.com

Classify and Predict results that I don't understand

Mathematica Asked on July 5, 2021

I have a list of rules that has as the left hand side the font size and a letter and for the right hand side the space in pixels between this letter and the following letter. The data is from a postscript file. Two examples.

{8.5, "S"} -> 5.5165
{8.5, "251"} -> 2.363

The second example uses a character code to represent the actual character.

Here is the complete list of rules. Apologize for the length but I don’t know what parts are and are not important.

letterRules={{8.5, "S"} -> 5.5165, {8.5, "P"} -> 5.6695, {8.5, "E"} -> 
  5.508, {8.5, "C"} -> 6.2985, {8.5, "I"} -> 2.5075, {8.5, "A"} -> 
  5.8225, {8.5, "D"} -> 6.2988, {8.5, "O"} -> 6.613, {8.5, "U"} -> 
  6.298, {8.5, "B"} -> 5.984, {8.5, "L"} -> 5.041, {8.5, "E"} -> 
  5.508, {8.5, "S"} -> 5.5165, {8.5, "I"} -> 2.5075, {8.5, "M"} -> 
  7.7095, {8.5, "P"} -> 5.6695, {8.5, "L"} -> 5.0405, {8.5, "O"} -> 
  6.6131, {8.5, "V"} -> 5.355, {8.5, "E"} -> 5.508, {8.5, "R"} -> 
  6.137, {8.5, "C"} -> 6.298, {8.5, "A"} -> 5.823, {8.5, "L"} -> 
  5.04, {8.5, "J"} -> 4.726, {8.5, "U"} -> 6.2985, {8.5, "M"} -> 
  7.7095, {8.5, "O"} -> 6.613, {8.5, "V"} -> 5.3546, {8.5, "E"} -> 
  5.508, {8.5, "R"} -> 6.137, {8.5, "C"} -> 6.299, {8.5, "A"} -> 
  5.822, {8.5, "L"} -> 5.041, {8.5, "O"} -> 6.613, {8.5, "P"} -> 
  5.6695, {8.5, "E"} -> 5.508, {8.5, "N"} -> 6.2985, {8.5, "I"} -> 
  2.5075, {8.5, "N"} -> 6.2985, {8.5, "P"} -> 5.6697, {8.5, "R"} -> 
  6.137, {8.5, "E"} -> 5.508, {8.5, "E"} -> 5.508, {8.5, "M"} -> 
  7.709, {8.5, "P"} -> 5.67, {8.5, "T"} -> 5.193, {8.5, "N"} -> 
  6.299, {8.5, "O"} -> 6.613, {8.5, "T"} -> 5.193, {8.5, "R"} -> 
  6.137, {8.5, "U"} -> 6.299, {8.5, "M"} -> 7.709, {8.5, "O"} -> 
  6.613, {8.5, "V"} -> 5.355, {8.5, "E"} -> 5.508, {8.5, "R"} -> 
  6.137, {8.5, "C"} -> 6.298, {8.5, "A"} -> 5.823, {8.5, "L"} -> 
  5.04, {8.5, "L"} -> 5.041, {8.5, "D"} -> 6.299, {8.5, "E"} -> 
  5.508, {8.5, "F"} -> 5.04, {8.5, "E"} -> 5.508, {8.5, "N"} -> 
  6.299, {8.5, "S"} -> 5.516, {8.5, "V"} -> 5.355, {8.5, "N"} -> 
  6.298, {8.5, "O"} -> 6.613, {8.5, "T"} -> 5.194, {8.5, "R"} -> 
  6.137, {8.5, "U"} -> 6.298, {8.5, "M"} -> 7.71, {8.5, "O"} -> 
  6.613, {8.5, "V"} -> 5.355, {8.5, "E"} -> 5.508, {8.5, "O"} -> 
  6.613, {8.5, "P"} -> 5.669, {8.5, "P"} -> 5.67, {8.5, "251"} -> 
  2.363, {8.5, "T"} -> 5.194, {8.5, "/"} -> 3.153, {8.5, "D"} -> 
  6.299, {8.5, "O"} -> 6.613, {8.5, "U"} -> 6.298, {8.5, "B"} -> 
  5.984, {8.5, "L"} -> 5.041, {8.5, "D"} -> 6.2985, {8.5, "I"} -> 
  2.5075, {8.5, "R"} -> 6.137, {8.5, "E"} -> 5.508, {8.5, "C"} -> 
  6.2985, {8.5, "C"} -> 6.2988, {8.5, "U"} -> 6.298, {8.5, "E"} -> 
  5.508, {8.5, "B"} -> 5.984, {8.5, "I"} -> 2.508, {8.5, "S"} -> 
  5.4315, {8.5, "L"} -> 4.9555, {8.5, "A"} -> 5.7375, {8.5, "C"} -> 
  6.2135, {8.5, "O"} -> 6.528, {8.5, "N"} -> 6.2135, {8.5, "V"} -> 
  5.27, {8.5, "E"} -> 5.423, {8.5, "N"} -> 6.2135, {8.5, "T"} -> 
  5.1085, {8.5, "I"} -> 2.4224, {8.5, "O"} -> 6.528, {8.5, "N"} -> 
  6.214, {8.5, "L"} -> 5.542, {8.5, "E"} -> 6.0605, {8.5, "A"} -> 
  6.409, {8.5, "D"} -> 6.9275, {9, "P"} -> 6.003, {9, "r"} -> 
  3.501, {9, "i"} -> 2.322, {9, "m"} -> 8.154, {9, "a"} -> 
  5.166, {9, "r"} -> 3.501, {9, "s"} -> 4.833, {9, "i"} -> 
  2.322, {9, "g"} -> 5.499, {9, "n"} -> 5.337, {9, "a"} -> 
  5.166, {9, "t"} -> 3.168, {9, "p"} -> 5.499, {9, "a"} -> 
  5.166, {9, "r"} -> 3.501, {9, "t"} -> 3.168, {9, "n"} -> 
  5.337, {9, "e"} -> 5.166, {9, "r"} -> 3.501, {9, "251"} -> 
  2.502, {9, "l"} -> 2.322, {9, "e"} -> 5.166, {9, "a"} -> 
  5.166, {9, "d"} -> 5.499, {8.5, "D"} -> 6.801, {8.5, "E"} -> 
  5.949, {8.5, "F"} -> 5.441, {8.5, "E"} -> 5.949, {8.5, "N"} -> 
  6.801, {8.5, "S"} -> 5.958, {8.5, "I"} -> 2.711, {8.5, "V"} -> 
  5.78, {8.5, "C"} -> 6.8, {8.5, "A"} -> 6.291, {8.5, "R"} -> 
  6.63, {8.5, "D"} -> 6.8, {8.5, "I"} -> 2.711, {8.5, "N"} -> 6.8}

I use both Classify and Predict with the training set.

letterClassify = Classify[letterRules, Method -> "NaiveBayes"];
letterPredict = Predict[letterRules, Method -> "NeuralNetwork"];

In general I am happy with the results. I created tables to show the results for a rule, for example rule #139 has a 9 pt font size, the letter d and a spacing of 5.499 pixels

enter image description here

The letter N with a font size of 8.5 occurs in a number of rules from 38 up to 153, various spacings with the majority around 6.3 pixels.

enter image description here

The columns are the rule number, the rule, the Classify result and the Predict result. Both Classify and Predict give reasonable results for the rule with one occurrence, d, as well as the rule with multiple occurrences, N.

However Predict gives an awful result for this single occurrence rule with a slash, /.

enter image description here

and in the following example there are two single occurrences (because the font size differs) but Classify fails to treat them separately whereas Predict honors the font size.

enter image description here

Finally in this example there are many samples close to 5.8 but Classify picks the largest one which has only one occurrence.

enter image description here

I hope someone can help me to understand why I get mostly reasonable results but apparent failure in the above four cases.

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP