English Language & Usage Asked by Wasabi Thumbs on January 8, 2021
I struggled to find this online, so I have turned to the English Stack Exchange…
Is there a list of the relative frequencies of words of a given length? For example, how much more common is a word with 5 characters than a word with 4 characters?
If you could give any sort of help, that would be amazing.
That data sounds a bit specialised, so I'm not too surprised you can't find it. The answer would depend heavily on the type of text you take as input. There is plenty of data on the relative frequency of particular words, or on how many times each word appears in a text.
This is the nearest I could find:
                 blogs  news  twitter
avg.word.length    4.0   4.3      3.7
Analysis of text data and Natural Language Processing: Coursera’s Data Science Capstone Project
Their description of how that and other data were generated might give you a start on producing the statistics yourself, if you have any programming skills; otherwise you could persuade someone else to help you!
If you can't access a suitable corpus of text to use, Project Gutenberg might be a good source.
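As a rough starting point for generating the statistics yourself, here is a minimal Python sketch that counts how often each word length occurs in a piece of text. The function name `length_frequencies`, the rule used to split words, and the sample sentence are all assumptions made for illustration; a real analysis would read a full corpus (for example a plain-text file from Project Gutenberg) and might treat hyphens, apostrophes and numbers differently.

```python
import re
from collections import Counter


def length_frequencies(text, unique=False):
    """Return {word length: share of words with that length}.

    unique=False counts every occurrence (running text);
    unique=True counts each distinct word once (dictionary-style).
    """
    # Assumption: a "word" is an unbroken run of the letters a-z,
    # so "don't" is split into "don" and "t".
    words = re.findall(r"[a-z]+", text.lower())
    if unique:
        words = set(words)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


# Hypothetical sample text; replace with the contents of a real corpus.
sample = "The cat sat on the mat and the dog sat by the door"
for length, share in length_frequencies(sample).items():
    print(f"{length:2d} letters: {share:.1%}")
```

Passing `unique=True` counts each distinct word once, which is closer to a dictionary list than to running text, so the two settings can give quite different distributions for the same input.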
Answered by user323578 on January 8, 2021
Do you mean words that are frequently used, or words in a dictionary? I assessed the word lengths from two dictionary lists I have, with these results:
Length   Words (list 1)   Words (list 2)
1                     2                2
2                    51              128
3                   580             1182
4                  2655             4736
5                  5098             9739
6                  7959            17879
7                 10222            25282
8                 10778            31855
9                 10049            32020
10                 8475            30590
11                 6230            25859
12                 4354            20391
13                 2849            14877
14                 1702             9744
15                  670             5919
16                  470             3373
17                  260             1811
18                  119              838
19                   50              429
20                   14              332
21                    1                0
22                    1                0
23                    1                0
Total             72590           236986
Average             8.6              9.6
But this does not represent common usage: a dictionary list counts each word once, however often it is used. Words also tend to be shortened through frequent use, for example boatswain to bosun.
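To put a number on the original 5-versus-4 question for these two dictionary lists, the counts in the table above can be turned into shares and ratios. The short Python sketch below hard-codes the table's figures; the names `list_1`, `list_2`, `share` and `mean_length` are illustrative labels for the two count columns, and the results describe these word lists only, not running text.

```python
# Word counts by length, copied from the table above.
# Index 0 corresponds to length 1, index 22 to length 23.
list_1 = [2, 51, 580, 2655, 5098, 7959, 10222, 10778, 10049, 8475, 6230,
          4354, 2849, 1702, 670, 470, 260, 119, 50, 14, 1, 1, 1]
list_2 = [2, 128, 1182, 4736, 9739, 17879, 25282, 31855, 32020, 30590, 25859,
          20391, 14877, 9744, 5919, 3373, 1811, 838, 429, 332, 0, 0, 0]


def share(counts, length):
    """Fraction of words in the list that have the given length."""
    return counts[length - 1] / sum(counts)


def mean_length(counts):
    """Average word length, weighting each length by its word count."""
    weighted = sum(length * n for length, n in enumerate(counts, start=1))
    return weighted / sum(counts)


for name, counts in (("list_1", list_1), ("list_2", list_2)):
    ratio = counts[4] / counts[3]  # 5-letter words per 4-letter word
    print(f"{name}: 5-letter share {share(counts, 5):.1%}, "
          f"5:4 ratio {ratio:.2f}, mean length {mean_length(counts):.1f}")
```

In both lists, 5-letter words are roughly twice as numerous as 4-letter words (ratios of about 1.92 and 2.06), and the computed means agree with the averages of 8.6 and 9.6 given in the table.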
Answered by Weather Vane on January 8, 2021