First, let's look at the ordering of the most common letters. You know how nowadays on Wheel of Fortune, on the final puzzle, you get certain letters for free: r s t l n e? Well, you used to get none for free, and everyone would always select those. Why? Because those are the five most common consonants and the most common vowel in the Oxford dictionary. Here's the list in order from most common to least (from http://letterfrequency.org/):
e a r i o t n s l c u d p m h g b f y w k v x z j q
Not surprisingly, when Wheel of Fortune started giving out those first six letters for free, and letting the contestants pick three more consonants and another vowel, people started choosing c d m a. That is close to what the list suggests: c and d are the next two consonants and a is the next vowel, though going strictly by the list above, p edges out m. Makes sense, right? Because, after all, the game is about figuring out hidden words and phrases from the fewest letter choices, and those letters maximize your chances. Interestingly, if you look at actual written English, this particular order varies depending on what specifically you are looking at: scientific work, fiction, advertisement, etc. For example, in general fiction the order becomes:
e t a o h n i s r d l u w m c g f y p v k b j x z q
Beyond just ordering the letters, we can look at some hard numbers regarding their relative frequencies. The numbers I'm going to show here are based on an analysis of about 9,500 literary works from Project Gutenberg (https://www.gutenberg.org/ as far as I can tell) and reported here. I have plotted the results in the chart below as the red bars under the label "Avg English," and for kicks I also plotted the statistics from my novel-in-progress, A Year Owed (blue). So, for example, the letter e is used about 13% of the time, or in other words, there is about a 13% chance that a given letter in any word in English is an e. For whatever reason, in A Year Owed I seem to favor the letters d and h more than would be expected based on the "Avg English" data. Perhaps this is because I enjoy using dilapidated and the name Hugh beyond what is traditionally acceptable.
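If you want to run the same kind of count on your own writing, here is a minimal Python sketch. To be clear, this is not how the Gutenberg numbers were produced (I don't know their method); the function name letter_frequencies and the sample sentence are just made up for illustration, and it only counts the plain letters a through z.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: percent of all letters}, most common first.

    Assumption for this sketch: only the 26 plain letters a-z count;
    digits, punctuation, spaces, and accented characters are ignored.
    """
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() sorts by count, highest first
    return {letter: 100 * n / total for letter, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up sample text; swap in the contents of your own manuscript.
    sample = "Hugh stood in the dilapidated hallway and hummed."
    for letter, pct in letter_frequencies(sample).items():
        print(f"{letter}: {pct:.1f}%")
```

Feed it a whole book instead of one sentence and the percentages should start to settle toward something like the orderings above, though the exact numbers will depend on what you wrote.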
Word length frequency is another statistic that will change depending on your source material. If you took just the dictionary, for example, you would tend to see longer words showing up more often than if you took, say, a children's book, for obvious reasons. I found some word length statistics here which are based on the books scanned and digitized by Google (don't ask me the specifics 'cuz I don't know). I have plotted those numbers below along with the word length frequency from my book. It looks from this like I tend to use 3 to 7 letter words in A Year Owed more than the average. The reason word length frequency may at first seem skewed toward relatively short words is the prevalence of common words like the, and, but, is, of, a, was, to, in, I, he, she, it, etc. in almost every sentence we use. Relatively rarely do we use a word like reconceptualization, for example. Anyway, the upshot here is if you really like words that are 3 to 7 letters long, I've got a great book I can recommend to satisfy your strange fetish, though it's not quite out yet.
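The word length count can be sketched the same way. Again, this is only my guess at a reasonable approach, not the method behind the Google numbers. The function name word_length_frequencies is made up, and I'm assuming a "word" is a run of letters, with an apostrophe allowed in the middle so that don't counts as one four-letter word.

```python
import re
from collections import Counter


def word_length_frequencies(text):
    """Return {word length: percent of all words}, shortest length first.

    Assumptions for this sketch: a word is a run of a-z letters,
    optionally joined by apostrophes (don't, I've); the apostrophes
    themselves are not counted toward the length.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    lengths = [len(word.replace("'", "")) for word in words]
    total = len(lengths)
    if total == 0:
        return {}
    counts = Counter(lengths)
    return {n: 100 * counts[n] / total for n in sorted(counts)}


if __name__ == "__main__":
    # Made-up sample text; swap in the contents of your own manuscript.
    sample = "It was the name Hugh, and I don't use reconceptualization."
    for length, pct in word_length_frequencies(sample).items():
        print(f"{length} letters: {pct:.1f}%")
```

Hyphenated words get split into their pieces here, which is a simplification; a different rule would shift the numbers a bit.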
There's a lot more that could be said about this and why certain letters and word lengths appear more commonly in one form of writing or another, but that would get a little long-winded, and I'm not going to pretend I actually know. Just remember r s t l n e and c d m a if you're ever on TV.

