0:00

Over the years it became clear that DES and Triple DES are simply not designed for modern hardware and are too slow. As a result, NIST started a new process to standardize a new block cipher called the Advanced Encryption Standard, or AES for short. NIST started its effort in 1997, when it requested proposals for a new block cipher. It received fifteen submissions a year later, and finally, in the year 2000, it adopted a cipher called Rijndael as the Advanced Encryption Standard. This was a cipher designed in Belgium. We already said that its block size is 128 bits and that it has three possible key sizes: 128, 192, and 256 bits. Now, the assumption is that the larger the key size, the more secure the block cipher is as a pseudorandom permutation. But because a larger key also means more rounds in the cipher's operation, the cipher becomes slower. So the larger the key, supposedly, the more secure the cipher, but also the slower it becomes. For example, AES-128 is the fastest of these ciphers and AES-256 is the slowest. Now, AES is built as what's called a substitution-permutation network. It's not a Feistel network.

Remember that in a Feistel network, half the bits were unchanged from round to round. In a substitution-permutation network, all the bits are changed in every round. The network works as follows. Here we have the first round of the substitution-permutation network, where the first thing we do is XOR the current state with the round key, in this case the first round key. Then we go through a substitution layer, where blocks of data are replaced with other blocks based on what the substitution table says. And then we go through a permutation layer, where bits are permuted and shuffled around. And then we do this again: we XOR with the next round key, we go through a substitution phase, and we permute the bits around.

And so on, and so forth, until we reach the final round, where we XOR with the very last round key, and then out comes the output. Now, an important point about this design is that, because of how it's built, every step in this network needs to be reversible, so that the whole thing is reversible. The way we decrypt, essentially, is we take the output and simply apply each step of the network in reverse order. So we start with the permutation step, and we have to make sure that step is reversible. Then we look at the substitution layer, and we have to make sure this step is reversible. And this is very different from DES. In DES, if you remember, the substitution tables were not reversible at all; in fact, they map six bits to four bits. Whereas here, everything has to be reversible, otherwise it would be impossible to decrypt. And of course the XOR with the round key is reversible as well. So inversion of a substitution-permutation network is simply done by applying all of the steps in reverse order. Now that we understand the generic construction, let's look at the specifics of AES. AES operates on a 128-bit block.
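The generic construction just described can be sketched as a toy substitution-permutation network. This is an illustration, not AES: the 8-bit block, the S-box, the bit permutation, and the round keys below are all made up. Decryption applies the inverse of each step in reverse order:

```python
SBOX = [(7 * x + 3) % 16 for x in range(16)]   # made-up invertible 4-bit S-box
INV_SBOX = [SBOX.index(y) for y in range(16)]
PERM = [1, 5, 2, 0, 3, 7, 4, 6]                # made-up bit permutation
INV_PERM = [PERM.index(i) for i in range(8)]
ROUND_KEYS = [0x3A, 0xC5, 0x19, 0x77]          # made-up round keys

def permute(b, p):                             # move bit i to position p[i]
    return sum(((b >> i) & 1) << p[i] for i in range(8))

def substitute(b, s):                          # S-box each 4-bit half of the byte
    return s[b & 0x0F] | (s[b >> 4] << 4)

def encrypt(b, round_keys):
    for k in round_keys[:-1]:                  # key XOR, substitution, permutation
        b = permute(substitute(b ^ k, SBOX), PERM)
    return b ^ round_keys[-1]                  # final round: just the key XOR

def decrypt(b, round_keys):                    # same steps, inverted, in reverse
    b ^= round_keys[-1]
    for k in reversed(round_keys[:-1]):
        b = substitute(permute(b, INV_PERM), INV_SBOX) ^ k
    return b
```

Decryption works only because each step is invertible: the toy S-box is a bijection on 4-bit values, the bit permutation just moves bits around, and the key XOR undoes itself.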

That is sixteen bytes, and what we do with AES is write those sixteen bytes as a four-by-four matrix, where each cell of the matrix contains one byte. Then we start with the first round: we XOR with the first round key and then apply a round function that includes substitutions, permutations, and other operations on the state. Again, the three functions that are applied here have to be invertible so that the cipher can be decrypted. Then we XOR with the next round key and do it again: apply the round function and XOR the round key. And we do that again and again; we do it ten times. Interestingly, in the last round, the MixColumn step is actually missing. And then finally, we XOR with the last round key, and out comes the output.
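The round structure just described can be sketched as follows. This shows only the skeleton: the state is a flat list of sixteen bytes, and the three round functions are stubbed out as identities purely to show the ten-round shape and the eleven round keys; real AES defines ByteSub, ShiftRow, and MixColumn as described below.

```python
def add_round_key(state, rk):              # XOR a 16-byte round key into the state
    return [s ^ k for s, k in zip(state, rk)]

# Identity stand-ins so the skeleton runs; real AES defines these three functions.
byte_sub = shift_row = mix_column = lambda s: list(s)

def aes128_encrypt_block(state, round_keys):
    assert len(round_keys) == 11           # a 16-byte key expands to 11 round keys
    state = add_round_key(state, round_keys[0])
    for i in range(1, 10):                 # nine rounds with all three functions
        state = add_round_key(mix_column(shift_row(byte_sub(state))), round_keys[i])
    state = shift_row(byte_sub(state))     # round ten: MixColumn is missing
    return add_round_key(state, round_keys[10])
```

With the identity stand-ins, the output is just the plaintext XORed with all eleven round keys, which makes the key-mixing structure easy to see.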

Again, in every phase here, we always keep this four-by-four array, and so the output is also four by four, which is sixteen bytes, which is 128 bits. Now, the round keys themselves come from a sixteen-byte AES key using key expansion. The key expansion maps a sixteen-byte AES key into eleven round keys, each one being sixteen bytes. So these round keys are themselves four-by-four arrays that get XORed into the current state. Okay, so that's the schematic of how AES works, and the only thing left to do is specify the three functions: ByteSub, ShiftRow, and MixColumn. Those are fairly easy to explain, so I'm just going to give you a high-level description of what they are.

Those interested in the details can look them up online. The way byte substitution works is that, literally, there is one S-box containing 256 bytes, and essentially what ByteSub does is apply that S-box to every byte in the current state. Let me explain what I mean by that. The current state is this four-by-four table; call it the A table. To each element of this table we apply the S-box: for each of the sixteen entries, the next A(i,j) becomes the lookup table evaluated at the current A(i,j). In other words, we use the current cell as an index into the lookup table, and the value in the lookup table is what's output. That's the first step. The next step is the ShiftRow step, which is basically just a permutation: essentially, we do a cyclic shift on each one of the rows. You can see the second row is cyclically shifted by one position, the third row is cyclically shifted by two positions, and the fourth row is cyclically shifted by three positions. The last thing we do is MixColumn, where we literally apply a linear transformation to each one of the columns. So there's a certain matrix that multiplies each one of the columns, and the result becomes the next column. This linear transformation is applied independently to each one of the columns.
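Under the standard AES definitions, the three steps can be sketched in a few lines. Here the S-box is computed as the multiplicative inverse in GF(2^8) followed by the standard affine map, and MixColumn uses the standard Rijndael matrix; this is a readability sketch, not an optimized implementation.

```python
def xtime(b):                               # multiply by x (i.e. by 2) in GF(2^8)
    b <<= 1
    if b & 0x100:
        b ^= 0x11B                          # reduce mod x^8 + x^4 + x^3 + x + 1
    return b

def gf_mul(a, b):                           # multiplication in GF(2^8)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r

# ByteSub's S-box: the GF(2^8) inverse followed by a fixed affine map.
def _affine(q):
    rot = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    return q ^ rot(q, 1) ^ rot(q, 2) ^ rot(q, 3) ^ rot(q, 4) ^ 0x63

_inv = [0] * 256                            # zero maps to zero by convention
for a in range(1, 256):
    _inv[a] = next(b for b in range(1, 256) if gf_mul(a, b) == 1)
SBOX = [_affine(_inv[b]) for b in range(256)]

def byte_sub(state):                        # S-box every byte of the 4x4 state
    return [[SBOX[b] for b in row] for row in state]

def shift_row(state):                       # cyclically shift row r left by r
    return [state[r][r:] + state[r][:r] for r in range(4)]

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_column(state):                      # fixed matrix times each column
    return [[gf_mul(M[r][0], state[0][c]) ^ gf_mul(M[r][1], state[1][c]) ^
             gf_mul(M[r][2], state[2][c]) ^ gf_mul(M[r][3], state[3][c])
             for c in range(4)] for r in range(4)]
```

Note that the S-box here really is produced by code rather than by a hardwired 256-byte table, which is exactly the code-size-versus-speed tradeoff discussed next.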

Now, I should point out that ShiftRow and MixColumn are very easy to implement in code, and the S-box substitution itself is also easily computable, so you can actually write code that computes it in less than 256 bytes. You can shrink the description of AES by storing code that computes the table rather than hardwiring the table into your implementation. In fact, this is a generic fact about AES: if you allow no precomputation at all, including computing the S-box on the fly, then you get a fairly small implementation of AES, so it can fit in very constrained environments where there isn't enough room to hold complicated code. But of course, this will be the slowest implementation, because everything is now computed on the fly, and as a result the implementation is obviously going to be slower than one that uses precomputed tables.

And then there is this tradeoff. For example, if you have a lot of space and can support large code, you can precompute quite a bit of the three steps I just mentioned. In fact, there are multiple options for precomputation: you can build tables that are only four kilobytes big, or you can build tables that are even larger, maybe 24 kilobytes. So you would have these big tables in your implementation, but then your actual performance is going to be really good, because all you're doing is table lookups and XORs; you're not doing any other complicated arithmetic. As a result, if you can do a lot of precomputation, the three steps here, ByteSub, ShiftRow, and MixColumn, can be converted into just a small number of table lookups and some XORs. A middle option is to precompute just the S-box, so your implementation would have only 256 bytes hard-coded; the rest would be code that actually computes the three functions. The performance would be slower than in the previous option, but the code footprint would also be smaller. So overall, there's this nice tradeoff between code size and performance. On high-end machines, on high-end servers, where you can afford to have a lot of code, you can precompute and store these big tables and get the best performance.
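The table-lookup trick can be sketched roughly as follows. This is a simplified illustration, not the exact table layout production libraries use; to keep it short, the S-box here is an identity stand-in, but the table algebra is the same for the real Rijndael S-box.

```python
def xtime(b):                               # multiply by 2 in GF(2^8)
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b

def gf_mul(a, b):                           # multiplication in GF(2^8)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r

SBOX = list(range(256))    # identity stand-in; real AES uses the Rijndael S-box

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

# Four 256-entry tables, one kilobyte each: T[i][b] is column i of the
# MixColumn matrix scaled by S[b], so one round's worth of ByteSub +
# ShiftRow + MixColumn collapses into 16 lookups plus XORs per block.
T = [[tuple(gf_mul(M[r][i], SBOX[b]) for r in range(4)) for b in range(256)]
     for i in range(4)]

def round_with_tables(state):
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        for i in range(4):
            t = T[i][state[i][(c + i) % 4]]   # ShiftRow folded into the index
            for r in range(4):
                out[r][c] ^= t[r]
    return out

def round_direct(state):                      # the three steps, one by one
    s = [[SBOX[b] for b in row] for row in state]
    s = [s[r][r:] + s[r][:r] for r in range(4)]
    return [[gf_mul(M[r][0], s[0][c]) ^ gf_mul(M[r][1], s[1][c]) ^
             gf_mul(M[r][2], s[2][c]) ^ gf_mul(M[r][3], s[3][c])
             for c in range(4)] for r in range(4)]
```

Both paths compute the same round; the table path just trades four kilobytes of memory for the field arithmetic.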

Whereas on low-end machines, like 8-bit smart cards, or think of an 8-bit processor in a wristwatch, you would have a relatively small implementation of AES, but as a result, of course, it won't be as fast. So here's an example that's a little unusual. Suppose you wanted to implement AES in JavaScript, so you can send an AES library to the browser and have the browser actually do AES by itself. In this case, what you'd like to do is shrink the code size, so that there's minimal traffic on the network when sending the library to the browser, but at the same time have the browser's performance be as fast as possible. This is something we did a while ago. Essentially, the idea is that the code that actually gets sent to the browser doesn't have any precomputed tables, and as a result it's fairly small. But the minute it lands on the browser, the browser precomputes all the tables. So in some sense the code goes from being small and compact to being bloated with all these precomputed tables, but those are stored on the laptop, which presumably has a lot of memory. Once you have the precomputed tables, you encrypt using them, and that's how you get the best performance. So if you have to send an implementation of AES over the network, you can kind of get the best of both worlds: the code sent over the network is small, but when it reaches the target client, it can inflate itself and then get the best performance as it's doing encryption on the client. Now, AES is such a popular block cipher (essentially, when you build crypto into products, you're supposed to be using AES) that Intel actually put AES support into the processor itself.

Since Westmere, there are special instructions in Intel processors to help accelerate AES, and I've listed these instructions here: aesenc and aesenclast, and then there's aeskeygenassist. Let me explain what they do. aesenc essentially implements one round of AES, namely applying the three functions and the XOR with the round key. And aesenclast implements the last round of AES; remember, the last round doesn't have the MixColumn phase, it only has ByteSub and ShiftRow, and that's what aesenclast does. The way you call these instructions is using 128-bit registers, which correspond to the state of AES. So you would have one register containing the state and one register containing the current round key, and when you call aesenc on these two registers, it runs one round of AES and places the result in the xmm register holding the state. As a result, if you wanted to implement all of AES, all you would do is call aesenc nine times and then call aesenclast one time. These ten instructions are basically the entire implementation of AES. That's it; it's that easy to implement AES on this hardware. And Intel claims that, because these operations are now done inside the processor, in hardware rather than in software, you can get a fourteen-x speedup over an implementation that's running on the same hardware but without these special instructions. So this is quite a significant speedup, and in fact there are now lots of products that make use of these special instructions. And let me just say that this is not specific to Intel: AMD also implemented similar instructions in their Bulldozer architecture and future architectures. Okay, so let's talk about the security of AES. I want to

mention just two attacks here. Obviously, AES has been studied quite a bit, but the only two attacks on the full AES are the following. First of all, if you wanted to do key recovery, the best attack is basically only four times faster than exhaustive search, which means that instead of a 128-bit key, you should really think of AES as having a 126-bit key, because the attack is four times faster than exhaustive search would be. Of course, two to the 126 is still more time than we could ever compute, so this really does not hurt the security of AES. The more significant attack is actually on AES-256. It turns out there's a weakness in the key expansion design of AES which allows for what's called a related-key attack. So what's a related-key attack? Essentially, if you give me about two to the 100 input-output pairs for AES, but from four related keys, then there is a two-to-the-100 attack. These are keys that are very closely related: key number one is just one bit flip away from key number two, which is just one bit flip away from key number three, which is just one bit flip away from key number four. In other words, their Hamming distance is very short. Now, you could say, well, two to the 100 is still impractical; this is still more time than we can actually run today. But nevertheless, the fact that it's so much better than an exhaustive search attack, so much better than two to the 256, is a limitation of the cipher. Generally, though, it's not a significant limitation, because it requires related keys, and in practice, of course, you're supposed to choose your keys at random, so you have no related keys in your system, and as a result this attack wouldn't apply. But if you do have related keys, then there's a problem. So this is the end of the segment, and in the next segment we're going to talk about more provably secure constructions for block ciphers.
