In the spirit of finding a magical solution, one could ask whether whole-word shape, word length, and the first and last letters provide sufficient information for reading words. Only 9% of the thousand most frequent words in American English are uniquely defined by their first and last letters. Adding the number of letters in a word increases this percentage to 40%. It should be noted, however, that word length is not an easy property to pick up, because letters differ in width under proportional spacing. Compare the words ill and wow, or lilly and women. Note that the five-letter word lilly is not longer than the three-letter word wow. When exterior letters, interior word shape, and word length are all considered as features, 75% of the thousand most frequent words are uniquely described. At first glance, you might believe that three out of four times is not bad. Succeeding at this strategy, however, requires the reader to accurately recognize all three variables. That is a nontrivial amount of processing just to bypass the simpler strategy of recognizing the word's letters. We have pursued a misguided endeavor and must conclude that readers recognize words based on the letters that make them up. In fact, there is evidence that the probability of identifying a word is a function of the probability of identifying each of its individual letters.

Although there are no readily apparent shortcuts to reading words, we have not yet accounted for the ease of word recognition. What is it about words that makes them so easy for the expert reader to recognize? The primary reason is that, in pattern-recognition domains such as reading, we use multiple sources of information. Many sources of non-visual information supplement the information from the letters. In our infamous paragraph, syntactic and semantic constraints facilitated its reading.
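The uniqueness percentages above come from asking, for each word, whether any other word shares the same combination of features. A minimal sketch of that count, assuming a plain Python list of words: each word is mapped to a feature key, and a word counts as uniquely described only if no other word has the same key. The helper names are hypothetical, the ten-word list is a toy example rather than the thousand-word list the percentages refer to, and interior word shape is left out because it depends on the font.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def fraction_unique(words: Iterable[str], key: Callable[[str], Hashable]) -> float:
    """Fraction of distinct words whose feature key no other word in the list shares."""
    distinct = list(dict.fromkeys(w.lower() for w in words))  # de-duplicate, keep order
    if not distinct:
        return 0.0
    counts = Counter(key(w) for w in distinct)
    return sum(1 for w in distinct if counts[key(w)] == 1) / len(distinct)


def first_last(w: str) -> tuple:
    # Exterior letters only.
    return (w[0], w[-1])


def first_last_length(w: str) -> tuple:
    # Exterior letters plus the number of letters (not the visual width).
    return (w[0], w[-1], len(w))


# Toy list for illustration only; NOT the thousand-word list behind the percentages in the text.
words = ["cat", "cut", "coat", "bed", "bread", "bird", "dog", "dig", "frog", "sun"]

if __name__ == "__main__":
    print(fraction_unique(words, first_last))         # 0.2
    print(fraction_unique(words, first_last_length))  # 0.6
```

On this toy list, exterior letters alone single out 2 of 10 words, and adding length raises that to 6 of 10. Only the direction of the change mirrors the lecture; the numbers do not. Note also that `len(w)` counts letters, which, as the ill/wow comparison shows, is not the same as the visual length a reader actually sees.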
We would expect less skilled readers to have much more trouble with the paragraph, presumably because they have less of this top-down knowledge. Another important source of information in word recognition is knowledge about the spelling, or orthography, of the language. We call this information orthographic structure. Claude E. Shannon gave a formal description of orthographic structure. He was an American mathematician, electrical engineer, and cryptographer who founded information theory. It should be noted that orthographic structure refers to the constraints within the written language, independent of how well the orthography predicts pronunciation. That latter property is usually called spelling-to-sound correspondence, and languages differ in the degree to which their orthography predicts pronunciation. As is well known, English is notorious for its unpredictability, and spelling reform has been advocated more or less continuously, without success, since written English began. The controversy and research on this topic are a story for another time.

We saw earlier that neighboring letters can interfere with each other because of crowding, which arises from fundamental visual processes. However, neighboring letters can also inform one another in the hands, or should I say eyes, of a knowledgeable reader. Consider the puzzle of guessing a word. This demonstration shows that readers are capable of recognizing words from partial information about the letters that make them up. Where do readers get this information? They have read many, many words and have learned about their statistical occurrences. It is possible to analyze large databases of text to determine these statistical regularities and then ask to what extent readers use them. Peter Norvig analyzed all of the words occurring in all of the books scanned by Google.
Given this huge database, we now have a good measure of how letters occur in written English. One can feel relieved that only slightly fewer than 100,000 distinct words were found among more than 743 billion word occurrences. The graph shows the frequency of occurrence of single letters. As expert Scrabble players can attest, there is an instructive disparity in how often letters occur, with the letter E occurring about 139 times more often than the letter Z. In reading, however, this base-rate frequency may not be important. More likely, a good statistical measure of the orthography will include both letter patterns and their positions in words. In addition, we can expect letter patterns to be more statistically predictable at the beginning and end of words than in the middle. The graphic shows frequent letter patterns in the initial and final positions of words. Many of these patterns are frequent in initial position but not in final position, and vice versa.
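The kind of counting described above can be sketched in a few lines. This is a minimal illustration, assuming a table that maps each distinct word to its number of occurrences. The `toy` counts are invented for the example and are not Norvig's Google Books data, and the function names `letter_counts` and `edge_bigrams` are hypothetical. Single letters are weighted by word frequency, and the first two and last two letters of each word are tallied separately, so that positional patterns become visible.

```python
from collections import Counter


def letter_counts(word_counts: dict) -> Counter:
    """Single-letter counts, weighting each word by how often it occurs."""
    counts = Counter()
    for word, n in word_counts.items():
        for ch in word:
            counts[ch] += n
    return counts


def edge_bigrams(word_counts: dict) -> tuple:
    """Frequency-weighted counts of word-initial and word-final two-letter patterns."""
    initial, final = Counter(), Counter()
    for word, n in word_counts.items():
        if len(word) >= 2:
            initial[word[:2]] += n
            final[word[-2:]] += n
    return initial, final


# Invented counts for illustration only; NOT Norvig's Google Books data.
toy = {"the": 50, "then": 5, "that": 20, "sing": 4, "thing": 6, "string": 2, "ring": 3}

if __name__ == "__main__":
    letters = letter_counts(toy)
    initial, final = edge_bigrams(toy)
    print(letters.most_common(3))      # [('t', 103), ('h', 81), ('e', 55)]
    print(initial["th"], final["th"])  # 81 0
    print(initial["ng"], final["ng"])  # 0 15
```

Even on these made-up counts, "th" is common at the start of words and never appears at the end, while "ng" shows the reverse. This is the kind of position-dependent regularity that a count over single letters alone would miss.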