0:00

Before we start with the technical material, I want to tell you a little bit

Â about the history of cryptography. There's a beautiful book on this topic by David

Â Kahn called the codebreakers. It covers the history of cryptography all the way

Â from the Babylonian era, to the present. Here, I'm just going to give you

Â a few examples of historical ciphers, all of which are badly broken. So to talk

Â about ciphers the first thing I'm going to do is introduce our friends Alice and Bob,

Â who are gonna be with us for the rest of the quarter. So Alice and Bob are trying

Â to communicate securely and there is an attacker who's trying to eavesdrop on

Â their conversation. So to communicate securely, they're going to share a secret

Â key, which I'll denote by K. They both know the secret key. But the attacker does not

Â know anything about this key K. So now they're gonna use a cipher, which

Â is a pair of algorithms, an encryption algorithm denoted by E, and a

Â decryption algorithm, D. These algorithms work as follows: the encryption algorithm

Â E takes the message M as inputs. And it takes as inputs, the key K. I'm gonna

Â put a wedge above the key input just to denote the fact that this input is

Â really the key input. And then it outputs a ciphertext. Which is the encryption of

Â the message M using the key K. I'm always gonna write the key first. Now, and when I

Â write: = what I mean is that the expression defines what C what the

Â variable C stands for. Now the ciphertext is transmitted over the internet to Bob,

Â somehow. Actually it could be transmitted over the internet. Could be transmitted

Â using an encrypted file system, it doesn't really matter, but when the ciphertext

Â reaches Bob, he can plug it into the decryption algorithm and give the

Â decryption algorithm the same key K. Again, I'm gonna put a wedge here as well. To

Â denote the key inputs and the decryption algorithm outputs the original plaintext

Â message. Now the reason we say that these are symmetric ciphers is that both the

Â encrypter and decrypter actually use the same key K. As we'll see later

Â in the course, there are ciphers where the encrypter uses one key and the decrypter

Â uses a different key. But here we're just going to focus on symmetric cipher where

Â both sides use the same key. Okay, so let me give you a few historic examples of

Â ciphers. The first example, the simplest one is called the substitution cipher. I

Â am sure all of you played the substitution ciphers when you were in kindergarten.

Â Basically a key for a substitution cipher is a substitution table that basically

Â says how to map our letters. So here for example the letter A would be mapped to C,

Â the letter B would be mapped to W the letter C would be mapped to N so on and so

Â forth and then the letter Z would be mapped, say, to A. So this is one example

Â of a key for a substitution cipher. So just to practice the notation we introduced

Â before, the encryption of a certain message using this key, let's say the

Â message is BCZA, the encryption of this message using this key here would be, is

Â done by substituting one letter at a time. So B becomes W, C becomes N, Z becomes A,

Â and A becomes C. So the encryption of BCZA is WNAC, and this defines the ciphertext.

Â Similarly we can decrypt the ciphertext using the same key and of course

Â we'll get back the original message. Okay. So, just for historical

Â reasons, there is one example of something that's related to the substitution cipher

Â called the Caesar cipher. The Caesar cipher, actually, is not really a cipher

Â at all. And the reason is that it doesn't have a key. What a Caesar cipher is, is

Â basically a substitution cipher where the substitution is fixed. Namely, it's a

Â shift by three. So A becomes D, B becomes E, C becomes F and so on and so forth.

Â What is it, Y becomes B and Z becomes C. It's a fixed substitution that's applied

Â to all plaintext messages. So, again, this is not a cipher, because there is no

Â key, the key is fixed. So if an attacker knows how our encryption scheme works, he

Â can easily decrypt the message. The key is not random, and therefore, decryption is

Â very easy once you understand how the scheme actually works. Okay, so now, let's

Â go back to the substitution cipher, where the keys really are chosen at random, the

Â substitution tables are chosen at random. And let's see how we break this

Â substitution cipher. Turns out to be very easy to break. The first question is, how

Â big is the key space? How many different keys are there, assuming we have 26

Â letters? So, I hope all of you said that, the number of keys is 26 factorial,

Â because, a key, a substitution key, is simply a table, a permutation of all 26

Â letters. The number of permutations of 26 letters, is 26 factorial. If you calculate

Â this out, twenty-sixth factorial is about two to the 88th, which means that

Â describing a key in a substitution cipher takes about 88 bits. So, each key is

Â represented by about 88 bits. Now, this is a perfectly fine size for a keyspace. In

Â fact, we're gonna be seeing ciphers that are perfectly secure, or, you know, that

Â are adequately secure, with key spaces that are roughly of this size. However,

Â even though the substitution cipher has a large key space of size 2^88, it's

Â still terribly insecure. So let's see how to break it. And to break it, we're going

Â to be using letter frequencies. So the first question is, what is the most

Â frequent letter in English text? So I imagine all of you know that, in fact, E

Â is the most common letter. And that is gonna, if we make it quantitative, that's

Â gonna help us break a substitution cipher. So just given the ciphertext, we can

Â already recover the plaintext. So the way we do that is, first of all, using

Â frequencies of English letters. So here's how this works. So you give me an

Â encrypted message using the substitution cipher. All I know is that the plaintext

Â is in English and I know that the letter E is the most frequent letter in English.

Â And in fact, it appears 12.7 percent of the time in standard English texts. So

Â what I'll do is I'll look at the ciphertext you gave me and I'm going to count

Â how many times every letter appears. Now the most common letter in the ciphertext

Â is going to be the encryption of the letter E with very high probability. So

Â now I'm able to recover one entry in the key table. Mainly the letter, mainly now I

Â know what the letter E maps to. The next, most common letter in English is the

Â letter T, that appears about 9.1 percent of the time. So now again, I count how

Â many times each letter appears in the ciphertext. And the second most frequent

Â letter is very likely to be the encryption of the letter T. So now I've recovered a

Â second entry in the key table. And I can continue this way. In fact, the letter A

Â is the next most common letter; it appears 8.1 percent of the time. So now I can

Â guess that the third most common letter in the ciphertext is the encryption of the

Â letter A. And now I've recovered three entries in the key table. Well, so now

Â what do I do? The remaining letters in English appear roughly same amount of

Â time, other than some rare letters like Q and X. But we're kinda stuck at this

Â point. We figured out three entries in the key table but what do we do next? So,

Â the next idea is to use frequencies of pairs of letters. Sometimes these are

Â called digrams. So, what I'll do is, I'll count how many times each pair of letters

Â appears in the cipher text, and, I know that in English, the most common pairs of

Â letters are things like, HE, AN. IN, I guess TH is another common pair of

Â letters. And so I know that the most common pair of letters in the ciphertext

Â is likely to be the encryption of one of these four pairs. And so by trial and

Â error I can sort of figure out more entry ... more elements in the key table and again

Â by more trial and error this way by looking at trigrams. I can actually figure

Â out the entire key table. So the bottom line here is that in fact the substitution

Â cipher is vulnerable to the worst possible type of attack namely a ciphertext only

Â attack. Just given the ciphertext the attack that can recover the decryption key

Â and therefore recover the original plaintext. So there's really no point in

Â encrypting anything using the substitution cipher, because the attacker can easily

Â decrypt it all; you might as well send your plaintext completely in the clear.

Â So, now we're gonna fast-forward to the Renaissance, and, I guess we're moving

Â from the Roman Era to the Renaissance, and look at a cipher designed by a fellow

Â named Vigener, who lived in the sixteenth century. He designed a couple

Â of ciphers. Here I'm gonna show you a variant of one of his ciphers, this is

Â called a Vigener cipher. So, in a Vigener cipher, the key is a

Â word. In this case the word, is CRYPTO, it's got six letters in it. And then to

Â encrypt a message, what you do is, you write the message under the key. So in

Â this case the message is "WHAT A NICE DAY TODAY" and then you replicate the key as

Â many times as needed to cover the message. And then the way you encrypt is basically,

Â you add the key letters to the message letters, modulo 26. So just to give

Â you an example here, for example, if you add Y and A, you get Z. If you add T and

Â A, you get U. And you do this for all the letters. And remember, whenever you add,

Â you add modulo 26. So if you go past Z, you go back to A. So, that's the

Â Vigener cipher. And in fact, decryption is just as easy as encryption.

Â Basically, the way you would decrypt is, again, you would write the ciphertext

Â under the key. You would replicate the key and then you would subtract the key from

Â the ciphertext to get the original plain text message. So, breaking the Vigener

Â cipher is actually quite easy. Let me show you how you do it. The first thing we

Â need to do is we need to assume that we know the length of the key. So let's just

Â assume we know that. In this case, the length of the key is six. And then what we

Â do is we break the cipher text into groups of six letters each, okay? So we're gonna

Â get a bunch, a bunch of groups like this. Each one, contains six letters. And then

Â we're gonna look at, the first letter in each group. Okay? So, in this case, yes,

Â we're looking at the first letter, every six characters. Now, what do we know about

Â these six letters? We know that, in fact, they're all encrypted using the same

Â letter in the ciphertext. All of these are encrypted using the letter c. In other

Â words. Z L W is a shift by three of the plaintext letters. So if we collect all

Â these letters then the most common letter among the set is likely to be the

Â encryption of E, right? E is the most common letter in English, therefore, if I

Â look at every sixth letter, the most common letter in that set is likely to be

Â the encryption of the letter E. Ahah! So let's just suppose that in fact the most

Â common letter every sixth letter happens to be the letter H. Then we know that E

Â plus the first letter of the key is equal to H. That says that the first letter of

Â the key is equal to H minus E. And in fact that is the letter C. So now we've

Â recovered the first letter of the key. Now we can continue doing this with the second

Â letter. So we look at the second letter in every group of six characters and again,

Â we repeat the same exercise. We find the most common letter among the sets and

Â we know that the most, this most common letter is likely the encryption of E and

Â therefore whatever this letter, whatever this most common letter is if we

Â subtract 'E' from it we're going to get the second letter of the key. And so on and so

Â forth. With, the third letter every six characters. And this way we recover, the

Â entire key. And that allows us to decrypt, the message. Now, the only caveat

Â is that I had to assume ahead of time that I know the length of the key, which in

Â this case is six. But if I don't know the length of the key ahead of time, that's

Â not a problem either. What I would do is I would run this decryption procedure,

Â assuming the key length is one. Then I'd run it assuming the key length is two.

Â Then I would run it assuming the key lengths is three. And so on, and so on,

Â and so on, until finally I get a message. I get a decryption that makes sense,

Â that's sensical. And once I do that I know that I've kind of recovered the right

Â length of the key and I know that I've also recovered the right key and

Â therefore the right message. Okay? So very, very quickly, you can recover, you

Â can decrypt Vigener cyphers. Again, this is a ciphertext only attack. The

Â interesting thing is, Vigener had a good idea here. This addition mod

Â 26 is actually a good idea, and we'll see that later, except it's executed very

Â poorly here. And so we'll correct that, a little bit later. Okay, we're gonna

Â fast-forward now from the Renaissance to, to the nineteenth century where

Â everything became electric. And so people wanted to design ciphers that use electric

Â motors. In particular, these ciphers are called rotor machines because they use

Â rotors. So an early example is called the Hibber machine which uses a single motor.

Â Here you have a picture of this machine. The, the motor, the, I guess the rotor is

Â over here. And the secret key is captured by this disc here, it's embedded inside of

Â this disc, which rotates by one notch every time you press a key on the

Â typewriter, okay? So every time you, that you hit a key, the disc rotates by one

Â notch. Now what does this key do? Well, the key actually encodes a substitution

Â table. So ... and therefore, the disc actually is the secret key. And as I said, this

Â disc encodes a substitution table. In this case, if you happen to press C as the

Â first letter, output would be the letter T. And then the disc would rotate by one

Â notch. After rotating, rotating by one notch, the new substitution table becomes

Â the one shown here. You see that E, basically, moves up, and then the

Â remaining letters move down. So imagine this is basically a two dimensional

Â rendering of the disc rotating by one notch. Then you press the next letter. And

Â the disc rotates again. You notice again N moved up and the remaining letters moved

Â down. So in particular, if we hit the letter C three times, the first time we

Â would output, the output would be T, the second time the output would be S, and the

Â third time the output wold be K. So this is how the single rotor machine works and

Â as it turned out very quickly after it was advertised it was again broken basically

Â using letter frequency, digram frequencies and trigram frequencies. It's

Â not that hard given enough ciphertext to directly recover the secret key and then

Â the message. Again, a ciphertext only attack. So to kind of work against these

Â frequency attacks, these statistical attacks, these rotor machines became more

Â and more complicated over time. Until finally, I'm sure you've all heard of the

Â Enigma. The Enigma is a kind of complicated rotor machine. It uses

Â three, four, or five rotors. There are different versions of the Enigma

Â machine. Here you see an example of the Enigma machine with three rotors. The

Â secret key in the Enigma machine is the initial setting of the rotors. Okay. So in

Â the case of three rotors there would be 26 cubed possible different keys. When you

Â type on the typewriter basically these rotors here rotate at different rates. Oh,

Â forgot to say this is a diagram of an Enigma machine using four rotors. As you type on

Â the typewriter the rotors rotate and output the appropriate, letters of, the

Â ciphertext. So in this case the number of keys is 26 to the fourth, which is two

Â to the eighteen, which is relatively a small key space. Today you can kind of,

Â brute-force a search using a computer through two to the eighteen different

Â keys, very, very quickly. You know, my wristwatch can do it in just a few

Â seconds, I guess. And so, these, this Enigma machine was, already was using

Â relatively small key spaces. But I'm sure you've all heard that the British

Â cryptographers at Bletchley Park were able to mount ciphertext only attacks on

Â the Enigma machine. They were able to decrypt German ciphers back in World, in

Â World War Two. And that played an important role in many different battles

Â during the war. After the war, I guess that was the end kind of the mechanical

Â age and started the digital age where folks were using computers. And as the

Â world kind of migrated to using computers, the government realized that it's buying a

Â lot of digital equipment from industry. And so they wanted industry to use a good

Â cipher so that when it buys equipment from the, from industry, it would be getting

Â equipment with, with a decent cipher. And so the government put out this request for

Â proposal for a data encryption standard, a Federal data encryption standard. And

Â we're gonna talk about this effort, in more detail later on in the course, but in

Â 1974 a group at IBM put together a cipher that became known as DES, data encryption

Â standard, which became a Federal standard for encrypting data. The key space for DES

Â is two to the 56, which is relatively small these days, but was large enough

Â back in 1974. And another interesting thing about DES is, rather than, unlike

Â rotor machines which encrypt one character at a time the data encryption standard

Â encrypts 64 bits at a time, namely eight characters at a time. And we'll see the

Â significance of this later on in the course. Because DES uses such

Â a small key space, these days it can be broken by a brute-force search and so

Â these days DES is considered insecure and should not be used in

Â projects. Unfortunately, it is used in some legacy systems, but it definitely is

Â on its way out and definitely should not be used in new projects. Today there are

Â new ciphers, things like the advanced encryption standard which uses 128 bit

Â keys. Again, we'll talk about the advanced encryption standards in much more detail

Â later on in the course. There are many, many other types of ciphers. I mentioned

Â Salsa20 here. We'll see why in just a minute. But this is all for the quick

Â historical survey and now we can get into the more technical material.

Â