0:00

In previous lessons we've seen how frequency analysis can be used to break a monoalphabetic cipher by looking for patterns of non-randomness in the ciphertext that reflect underlying non-randomness in the plaintext. Not surprisingly, cryptographers are thus motivated to eliminate, or at least dilute, these patterns in the ciphertext. Several techniques were developed for monoalphabetic ciphers to attempt this, such as using ciphertext letter pairs for each character, with multiple possible pairs chosen randomly for the most common plaintext letters and fewer possibilities for the rarer letters. While this makes cryptanalysis more difficult, the usage patterns are still present and can still be teased out given a sufficient amount of ciphertext material to work with.

Another approach is to use multiple monoalphabetic ciphers, switching from one to another as the plaintext is enciphered. This is known as a polyalphabetic cipher. In this lesson, we'll look briefly at the history of these ciphers and learn just what they are. We'll also develop a powerful tool for defeating them. In the next lesson we'll walk through an example of applying this tool.

1:07

Early attempts at crafting polyalphabetic ciphers date to the 9th-century Arab cryptographer al-Kindi and, much later, to Leon Battista Alberti in the mid-15th century. While both had correctly surmised that a polyalphabetic cipher should be much harder to break, their implementations were so weak that breaking them was quite easy, and as a result little widespread interest was generated. The problem that plagued these early attempts was how to implement a practical polyalphabetic cipher algorithm in a simple manner that didn't require the distribution or memorization of significant amounts of key material, namely several monoalphabetic ciphers and the order in which to use them. Al-Kindi and Alberti both had no key; the order in which the different alphabets were used was part of the algorithm. In other words, security through obscurity.

In the early 16th century, Johannes Trithemius published the Tabula Recta, which we'll see in a moment, and which is nothing more than a tabulation of all the possible Caesar shift ciphers one after another, in his case using the 24 characters of the Latin alphabet. But we'll work with the 26 characters of the English alphabet.

2:17

Since the structure is so simple, there's no need to memorize anything; it can be created at need by anyone. While Trithemius' work does not seem to imply mixing different ciphers within a given ciphertext, a few decades later Giovan Battista Bellaso devised an elegant implementation of a polyalphabetic cipher based on the Tabula Recta that today is known as the Vigenère cipher. That's right, Blaise de Vigenère did not invent the cipher that bears his name. This is due to a misattribution in the early 19th century that stuck. However, at about the same time as Bellaso, Vigenère developed a similar but stronger cipher known as the autokey cipher. Both defied successful cryptanalysis for the better part of the next three centuries, and both fell to the same basic cryptanalytic techniques developed independently by Charles Babbage and Friedrich Kasiski.

For our purposes, we'll stick to the classic Vigenère cipher, which works as follows. A previously agreed upon key phrase, such as SECRET, is written out above the plaintext message, which we'll write in five-letter groups; here the message is RETREAT TO RIVER. Since the key phrase is typically short compared to the message, it is simply repeated as many times as needed to match the length of the plaintext. We then go through the message one character at a time and use the character in the key to choose one of the Caesar shift ciphers by simply selecting the row that starts with that character. Next we select the column that is headed by the plaintext character. The ciphertext character is the character at the intersection of this row and column.

3:51

The result is the ciphertext shown here. When received, this process is simply reversed: the plaintext character is the character at the top of the column containing the ciphertext character, in the row selected by the key. Notice that the plaintext contains four Rs, but two of them are enciphered as I, one as J, and the other as V. Similarly, there are two Vs in the ciphertext, but one enciphers the letter T while the other enciphers R. The cipher also masks double letter occurrences; note that TT enciphers as LX.

It turns out that there is a very simple mathematical relationship between the corresponding characters of the plaintext, the key, and the ciphertext. If each of the N characters in the alphabet is assigned a number starting with zero, then the letters are numbered from zero through N - 1. For encryption, the ciphertext character is the sum of the plaintext and key characters, reduced modulo N. For decryption, the plaintext character is the difference between the ciphertext character and the key character, again reduced modulo N. In both cases, the modulo operation takes care of any necessary wraparound.
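As a quick sketch (not the lesson's own code), the modular relationship just described can be implemented directly. The function name and structure here are my own illustration, using the lesson's example key SECRET and message RETREAT TO RIVER.

```python
# A minimal sketch of the classic Vigenère cipher using the relationship
# above: c = (p + k) mod 26 to encrypt, p = (c - k) mod 26 to decrypt.

A = ord('A')

def vigenere(text, key, decrypt=False):
    """Apply a repeating-key Vigenère shift to an uppercase A-Z string."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - A        # key letter as 0..25
        out.append(chr((ord(ch) - A + sign * k) % 26 + A))
    return ''.join(out)

ciphertext = vigenere('RETREATTORIVER', 'SECRET')
print(ciphertext)                                     # JIVIITLXQIMOWV
print(vigenere(ciphertext, 'SECRET', decrypt=True))   # RETREATTORIVER
```

Note that the four Rs in the plaintext encipher to J, I, I, and V, matching the observation above.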

5:01

The practical result of a polyalphabetic cipher is that the ciphertext quickly dilutes the letter frequencies. In fact, if the key were to contain all 26 characters exactly once, the ciphertext letters would appear, on average, with uniform frequency. Taking this even further, if a key is chosen that is as long as the message, if the letters in the key are chosen randomly, and if the key is never reused, then the result is a one-time pad. In fact, this is how most pen-and-paper one-time pad ciphers were actually implemented. Most of the time, however, the key was a word or short phrase that was considerably shorter than the message. But this was sufficient to significantly soften the ciphertext letter frequencies and render simple frequency analysis ineffective. So how might we crack this cipher?

5:51

The answer was arrived at independently, and by slightly different methods, by both Charles Babbage and Friedrich Kasiski in the mid-19th century. Both men observed that if the length of the key could be determined, then the ciphertext could be broken up into sets of ciphertext characters that had been encrypted with the same key character. While decrypting any one set would obviously not produce meaningful text, the fact that all the plaintext characters in a set had been encrypted using the same key character meant they were encrypted with the same monoalphabetic cipher. And in the case of the Vigenère cipher, this was a simple Caesar shift cipher. This meant that they would exhibit the same frequency distribution as the underlying plaintext. This insight by itself would have been sufficient to afford a considerable break into the cipher, but both men went a major step further and found a way to determine the length of the key in a very efficient way. Both methods look at coincidences, which are letters or sequences of letters that appear at regular intervals in the ciphertext. We'll look at one of these methods, Babbage's Index of Coincidence, and leave the other, the Kasiski Examination, for your own exploration.

7:05

The Index of Coincidence is a measure of how frequently you expect to find coincidental matches between two unrelated strings of text written in the same language, using the same alphabet, laid one on top of the other. Here's an example with the beginning of two unrelated stories. We can see that in these 24-character strings there are three coincidences. This is a rate of 12.5%, which we'll see is actually about twice what we would expect. But remember that statistical measures tend to vary greatly from the average in small samples.

7:38

To see how this can help us, let's see what we would expect the rate to be. For illustration purposes, we'll start with a simple language that has just two characters, A and B, in which A has a frequency of 75% and B, 25%. Here are two 32-character plaintext messages written in this language, with the coincidences indicated by an overbar. We see that we have 16 coincidences with A and three with B, giving a total coincidence rate of 59%. But is this close to what we expect? Let's look at what the math tells us.

8:13

Let's say that we have two 2-character alphabets, f and g, that have the same frequency distribution but possibly different character mappings. Perhaps one is the plaintext alphabet and the other is one of the possible monoalphabetic cipher alphabets. How often would we expect to see a coincidence involving the letter A?

8:33

In the first string, there's an f sub A percent chance of seeing an A, where f sub A is the frequency of A in alphabet f. In the second string, there's a g sub A chance of seeing an A. The probability of both happening at the same time is simply the product of these two frequencies. The same is true for the letter B. We then sum up these probabilities for each character in the alphabet and get the overall expected coincidence rate. For an alphabet with n characters, this can be written succinctly with this equation.
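The equation itself appears on a slide rather than in the transcript; reconstructed from the description above, the expected coincidence rate for an n-character alphabet is

```latex
\mathrm{CR} = \sum_{i=1}^{n} f_i \, g_i
```

where f_i and g_i are the frequencies of the i-th character in the two alphabets. When the two alphabets are the same, this reduces to the sum of the squared single-letter frequencies.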

9:07

Going back to our toy language, if both alphabets are the same, we get an expected coincidence rate of 62.5%. If the alphabets are different (and with just two characters we have only one possible alternative), we get an expected rate of 37.5%.

9:24

Recall that we are interested in detecting when our two sets of characters are drawn from the same alphabet, that is, when f and g are the same. In this case, our equation reduces to simply the sum of the squares of the single-letter frequencies. For English, this rate is generally accepted as being 6.65%. For the frequencies we previously established from that huge set of works, we got 6.85%, which is in pretty good agreement.

9:50

If our two sets of letters were drawn from different alphabets, the coincidence rate would be lower. It would be lowest if the two alphabets used letters randomly, which would also be the case for a language that has a uniform letter frequency distribution. This serves as a good reference because it gives an idea of how non-uniform our frequency distribution is. For the 26-character English alphabet, the random coincidence rate is 3.85%. Coincidence rates are very useful, and in practice this is what we generally use. But in order to compare rates for different languages, with alphabets of different sizes, it is useful to normalize the expected coincidence rate by dividing it by the coincidence rate for a uniformly used alphabet of the same size. This is what the index of coincidence actually refers to. For English, this works out to about 1.73.
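These two reference figures can be checked with a line or two of arithmetic, using the 6.65% English rate quoted above:

```python
# Checking the figures quoted above: the uniform (random) coincidence rate
# for a 26-letter alphabet, and the normalized index of coincidence for
# English using the commonly quoted 6.65% rate.
uniform_rate = 1 / 26
english_rate = 0.0665

print(round(uniform_rate * 100, 2))            # 3.85 (percent)
print(round(english_rate / uniform_rate, 2))   # 1.73
```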

Although not readily apparent, it is not too difficult to reason out that the expected coincidence rate will be greatest when the letters in each pair are drawn from the same alphabet. This is the critical observation that lets us find the key length. If we count the coincidences between the ciphertext and a copy of the ciphertext that has been shifted by k places, then when k is a multiple of the key length, each pair of letters is drawn from the same alphabet, and we expect to see a spike in the coincidence rate. Because of statistical variation, it will not be too surprising to see spikes at other shift amounts, but we would not expect to see such spikes at every multiple of that shift.
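The shift-and-count procedure just described can be sketched in a few lines. This is an illustrative helper of my own, not code from the lesson: it counts matches between the ciphertext and itself shifted by k places, which you would tabulate over a range of k and scan for spikes.

```python
def coincidences(ciphertext, k):
    """Count positions where the ciphertext matches itself shifted by k."""
    return sum(a == b for a, b in zip(ciphertext, ciphertext[k:]))

# In practice you would compute this for k = 1, 2, 3, ... and look for
# shifts whose counts spike; the key length divides those shifts.
# A tiny constructed example: a text with period 3 matches itself
# perfectly at shift 3 and not at all elsewhere.
text = 'ABCABCABCABC'
for k in range(1, 5):
    print(k, coincidences(text, k))   # spike at k = 3
```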

In the next lesson, we'll look at an example of applying coincidence analysis to a Vigenère-enciphered message, and get a feel for how it works and how the amount of ciphertext material available affects its effectiveness.