English Language & Usage Asked by user4574 on January 15, 2021
Has anyone done work to construct letter frequency charts based on the assumed age of the reader/speaker, and also spoken word vs written text?
One would expect that letter usage would be different in books targeted at (for example) a 3rd grader vs a college student. Also, people often write differently than they speak.
My first-grade daughter loves the show "Pokemon". In that show, the Pokemon characters only speak sounds made from pieces of their own names. For example, a Pikachu only speaks words made from combinations of the sounds "Pi", "Ka", and "Chu". She thought it would be cool to make a real Pikachu language, and I think it's a good opportunity to teach her about encoding schemes.
The obvious choice is to encode the letters of the alphabet using these three sounds. Ideally, the encoded words should be as short as possible. Since we have three sounds, a ternary Huffman code built from an English letter frequency chart would give an optimal prefix code, that is, one that minimizes the average number of sounds per letter for that chart.
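To make the idea concrete, here is a minimal Python sketch of a ternary Huffman construction. It is only an illustration under my own assumptions: the function name `ternary_huffman` is made up, and the weights in the usage example are toy placeholders, not values from any real frequency chart. One detail worth noting is that a ternary Huffman tree needs the number of leaves minus one to be even, so with 26 letters one zero-weight dummy leaf is added.

```python
import heapq
from itertools import count

SOUNDS = ("Pi", "Ka", "Chu")  # the three code symbols


def ternary_huffman(freqs, sounds=SOUNDS):
    """Return {symbol: tuple of sounds} for a k-ary Huffman code (k = len(sounds)).

    `freqs` maps each symbol (a string) to a non-negative weight.
    """
    k = len(sounds)
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(weight, next(tiebreak), sym) for sym, weight in freqs.items()]

    # A full k-ary tree needs (leaves - 1) divisible by (k - 1);
    # pad with zero-weight dummy leaves (marked None) until that holds.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0, next(tiebreak), None))
    heapq.heapify(heap)

    # Repeatedly merge the k lightest nodes into one internal node.
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(k)]
        total = sum(child[0] for child in children)
        heapq.heappush(heap, (total, next(tiebreak), [c[2] for c in children]))

    codes = {}

    def walk(node, prefix):
        if node is None:            # dummy leaf, gets no code
            return
        if isinstance(node, list):  # internal node: one branch per sound
            for sound, child in zip(sounds, node):
                walk(child, prefix + (sound,))
        else:                       # real leaf
            codes[node] = prefix

    walk(heap[0][2], ())
    return codes


if __name__ == "__main__":
    # Toy weights for illustration only, NOT real letter frequencies.
    toy = {"E": 12, "T": 9, "A": 8, "O": 7, "N": 6}
    for letter, code in sorted(ternary_huffman(toy).items()):
        print(letter, "".join(code))
```

Swapping the toy dictionary for a full A-Z table (from whichever chart turns out to be appropriate) should give a complete Pikachu alphabet, with the most frequent letters getting the shortest words.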
I have seen many charts (like this one) …
http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
Based on that chart, here is the code I came up with so far.
… but this chart is based on generic data and therefore wouldn’t be optimal in terms of the words a first grader would choose to speak.
This would be a spoken language only, so there is no need to encode special symbols or different letter cases. I am only interested in the frequency of English letters A-Z.
The ideal table would come from research based on recordings of spoken interactions of elementary school children in the United States (preferably first grade). But if that’s not available then tables based on books targeted at those age ranges would be the next best choice.
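If no published table exists, one fallback would be to build the table myself from whatever age-appropriate text or transcripts are available. A rough sketch of that counting step, assuming the corpus is already available as a plain string (the function name `letter_frequencies` and the sample sentence are just placeholders of mine):

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the relative frequency of each letter A-Z in `text`.

    Case is ignored, and anything that is not one of the 26 English
    letters (digits, punctuation, spaces) is skipped.
    """
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in string.ascii_uppercase}
    return {letter: counts[letter] / total for letter in string.ascii_uppercase}


if __name__ == "__main__":
    # Placeholder sentence; a real table would need a much larger corpus.
    sample = "The cat sat on the mat."
    freqs = letter_frequencies(sample)
    for letter, share in sorted(freqs.items(), key=lambda kv: -kv[1])[:5]:
        print(letter, round(share, 3))
```

The resulting dictionary has exactly the shape a Huffman construction needs as input, so the only real question is which corpus best represents what a first grader says.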