[SOUND] In the last few lectures, we've been talking about security against chosen ciphertext attacks. And in our definition of security against chosen ciphertext attacks, we allow the attacker to obtain the encryption of any cypher text of its choice. Of course, besides the challenge cypher text. And while this makes sense from a definitional point of view because we want to capture any possible sort of chosen cyphertext attack, it's still an unrealistic sort of a model, because in practice, the attacker will not be given the full decryption of arbitrary ciphertext. So what I want to show here is a real-world scenario where only one bit about decrypted ciphertext is leaked to the attacker, but nevertheless, this information can be exploited by an attacker to learn the entire plaintext. And the point of this is to demonstrate the importance of protecting yourself against chosen ciphertext attacks. Because a scheme that was secure against chosen ciphertext attacks would in particular be secure against the attack we're going to show here. Now to describe the attack, I just need to remind you about CBC mode encryption. So remember that in CBC mode encryption, we have a message consisting of say t blocks of plain text. And what we do to encrypt that, is we choose a random initialization vector, IV. Output the IV as the first block of the ciphertext, and then chain preceding blocks of cyphertext XORing them with the next block of the plain text. And then feeding that into our block cipher to produce the next block of ciphertext output. Now in our description of CBC mode encryption, we've been assuming until now that the plain text is an integral multiple of the block length of the block cipher. But in practice, that may not be the case and what we'd like to do is allow for more flexibility and allow the sender to encrypt messages whose length is not an integral multiple of the block length of the underlying cipher. So in general, what we can do to achieve this, is to take the message that the sender wants to send. Encode it in some way, some non-cryptographic way, that will allow efficient reconstruction of the original message, and then encrypt the encoded data using CBC mode encryption to yield the ciphertext. Now in the description I'm going to present here, we're going to assume for simplicity that the message is an integral number of bytes. But it will not necessarily be an integral multiple of the block length of the cipher. And one particular encoding scheme we're going to look at is called PKCS #5 encoding. And what you do in PKCS #5 encoding is the following. So let's let L denote the block length in bytes of our block cipher. So, if we look at the message we've been given, we can determine from the length of the message, how many bytes of padding we need to append to the message in order to get something which is an integral multiple of the block length. And let's let B denote the number of bytes that we need to append to the message in order to get something, whose length is a multiple of L, the block link. I'll note here that B, the number of bytes of padding we need, is going to be strictly between 1 and L, and note that we cannot have B equal to 0. That is we cannot have a case where we don't pad anything. And that might seem a little bit strange, because what if our message actually is an integral multiple of the block length? Well the point is that we need the receiver to be able to unambiguously recover the original message. And if, and if there are cases in which we don't add any padding at all, then that will be impossible. So if it turns out that our message is indeed an integral multiple of the block length, what we end up doing in that case is actually appending an additional L bytes of padding to the end of that message before encrypting it. So what we do then is we take b, the number of bytes that we need to append, and we append that value and code it in one byte, a total of b times. So we're appending indeed b bytes of encoding, which consist of the value b repeated b times. So for example, if we need to append three bytes of padding to our message, then what we do is we append the three bytes, 030303, here written in hexadecimal. To decrypt, the receiver will use CBC-mode decryption to obtain the encoded data. And then they need to recover from the encoded data the original message. And they do this in the following way. So say the final byte of the encoded data has value b. Well, if b is equal to 0, or if b is greater than L, then there's an error. No properly encoded message can have a final byte. Which is either 0 or greater than L. So in that case, the receiver will return an error. Otherwise, what the receiver does is to look at the final b bytes of the encoded data. And these should all be equal to b. Again, any properly encoded message will be of that form. So if the final b bytes of encoded data are not all equal to b, then the receiver again returns an error. And otherwise, what it does is simply strip off the final b bytes of the encoded data and output the initial portion of the message whatever's left as the message itself. So here's just an example. Here we have L equals 8, and we have a 6 byte message that the sender wants to encrypt. So what they'll do is they'll encode their message by appending 2 bytes. And those will of course both have value 2. It will then encrypt that and send it to the receiver. And the receiver will decrypt to obtain the same encoded message. The receiver looks at the final byte, which has value 2. And then strips off the final 2 bytes of padding, which were both indeed equal to 2, to recover the original 6 byte message. Now, the attacker can potentially exploit the behavior of the receiver to learn information about a ciphertext. And this will be done in a sort of a chosen ciphertext attack, but of a limited form. So here we assume that the attacker has intercepted some cyphertext c, which was generated by some honest sender in the appropriate way, i.e., by padding it appropriately and then encrypting it. And what the attacker can then do, is to construct modified cyphertext, say c prime, send that to the receiver as if it were coming from the honest sender. The receiver will decrypt that to obtain some encoded data, and then try to recover a message from that encoded data. I would point the receiver will either return an error or not, i.e., either the encoded data was properly formatted, and then there's no error. Or, as on the previous slide, there will be some error in the processing and the receiver will return an error message. We'll call this a padding oracle. So that is we have imagine that the attacker will interact with this receiver i.e this oracle, that takes cyphertext as input, decrypts them and then checks the encoding and outputs a 0 or 1, depending on whether or not the encoded data is correctly formatted. Now this may seem like something that would never happen in real life. However, padding oracles are frequently present in web applications. And if you think about it, this makes sense. Because when the receiver gets a ciphertext, and decrypts it and gets some encoded data which is not properly formatted, in general, it really does need to inform Defender that something went wrong. Maybe the message, or maybe the ciphertext was garbled in transit. Maybe there was a network level error. But either way, there has to be some way for the receiver to recover from that event. And even if the receiver does not explicitly return an error message in the case of improper encoding. An attacker is still might be able to detect whether or not an error occurred based on differences in the timing of what the receiver does or in the behavior of the receiver overall, i.e if the receiver terminates the connection, then that's indicative of an error. And even any kind of change like that can be enough to effectively give the attacker access to a padding oracle. Now let me go through the main idea of the attack, again, assuming that the attacker has access to this padding oracle. For simplicity, let's assume we have a two-block ciphertext throughout the node by IV, c. So IV is the initial block of ciphertext, and c is the next block. So this means we have a one block encoded data. And therefore a, a, a plain text, which is strictly smaller than one block. So, the encoded data is then equal to Fk inverse of c, XOR with IV. This is simply the, the decryption operation when running CBC mode. This is what the receiver will do to obtain the encoded data. Now the attacker doesn't know k, and the attacker of course doesn't know Fk inverse of c. The point is that it does know that this is exactly how the receiver will be computing the encoded data. And the main observation here, is that if the attacker modifies the ith byte of the IV, this will cause a predictable change only to the ith byte of the encoded data. Right, so if c remains unchanged and the attacker modifies the ith byte of the IV, then Fk inverse remains unchanged. And then when you're, when that gets XORed with IV, the result will be effected only in the ith byte of the result. So here I've tried to draw the picture from the point of view of the attacker. We have Fk inverse of c, which consists of a bunch of bytes which are unknown to the attacker. And the attacker knows that those are going to be XORed with the IV, which is known to the attacker, to give some encoded data, which at this point in time is completely unknown to the attacker. The attacker does know, however, that the encoded data that results from this is properly formatted. And that's because the sender properly formatted the message before encrypting it. So the attacker knows that if it forwards IV,c to the receiver, the receiver will decrypt, the padding will be fine, and there will be no error. What the attacker can now do, is try modifying the first byte of the IV. This will result in a modification only to the first byte of the encoded data, as we said before. The attacker can forward the resulting ciphertext with this modified first byte of IV. And check whether or not this yields an error or not. And let's assume that in this case there's no error, so it does get successfully decrypted. Or the attacker can then modify the second byte of the IV and check whether it's successfully decrypted. And then the third byte of the IV and so on. Let's assume that after modifying the third byte of the IV, the receiver returns an error. So this means that at this point in time, when a third bit of the IV has been modified, suddenly the encoded data is no longer properly formatted. What does that tell the attacker? Well, what that tells the attacker is that the receiver is checking the final 6 bytes of the encoded data to check whether the encoding was done properly, right? And that's because when the attacker had only modified the second byte of the IV, decryption succeeded, but when it modified the 3 byte, decryption suddenly failed. That tells the attacker that exactly the final 6 bytes of encoded data are being checked. And that means in turn, that the final 6 bytes of the original encoded data were all equal to 6. So if we go back to the original situation with the original IV, the attacker has now learned that the final 6 bytes of encoded data were all equal to 6. But the remaining 2 bytes of encoded data, which are the original message, are still unknown. But let's see what the attacker can do next. What they can do is they can modify, say the right-most byte of the IV, in a particular way to cause a predictable change in the right-most byte of the encoded data. So the right most byte of the IV started out being equal to 9E. If the attacker modified that to a value which is equal to 9E XORed with 6, XORed with 7, then what it knows is that that will result in a final byte of encoded data being equal to 7. And it can do something similar with the second to right most byte of the IV in order to cause the second to right most byte of the encoded data to come out equal to 7. And so on for the final 6 bytes of the IV and the 6 bytes of the encoded data. Now if the attacker sends this to the padding oracle, most likely decryption will fail because it's, it's unlikely in general that the second byte of the encoded data will happen to be equal to 7. But what the attacker can do now is simply try all possibilities for the second byte of the IV. So it can start out by setting the second byte of the IV to 0, and check to see whether or not the resulting cyphertext decrypts correctly or not. It can then try this with the byte having value 1, 2, et cetera, until eventually it reaches a value for the second byte, at which decryption succeeds. And when that happens, the attacker then knows that the second byte of the encoded data, at that point, must be equal to the value 7. Because that's the only way decryption will succeed. And now we're done. The attacker has learned the second byte of encoded data. Why is that? Because the attacker now knows that xx, that is the value of the second byte of Fk inverse of c, XORed with 41 in this case, is equal to the value 7. So now we can determine what xx XORed with 1 is equal to. So remember 1 was the actual value in the cyphertext in the second byte of the cyphertext that the attacker had intercepted. And so we can figure out just through some basic math that the second byte of the original encoded data transmitted by the sender was equal to 47. That just comes out from doing the math. And it can repeat the process now with the first byte of the IV, in order to learn the first byte of the encoded data. What's the complexity of this attack? Well, it takes at most L tries for the attacker to learn the number of padding bytes. This is because it's just going through the bytes of the IV one by one. For at most L bytes until decryption fails, at which point the attacker learns exactly how much padding there is and what those padding bytes are. And then for each byte of the original plaintext message, it takes at most 256 tries to cycle through all possibilities. All possible bytes of that position of the IV. And at that point, the attacker was guaranteed to learn the value of the original plaintext at that byte. So it takes 256 tries per byte, so 256 times the number of bytes in the original message, and the attacker learns the entire original message. Now that might sound like a lot, but in fact if the receiver were not doing any sort of throttling to prevent decrypting cyphertext, and, and, and continuing to get errors like that, then it wouldn't actually take very long in terms of time for the attacker to run 256 tries with the receiver. This could be done on the order of seconds. So what I've tried to show in this lecture, is an example where chosen-cyphertext attacks are a significant real-world threat on a deployed use of cryptography. And I'll highlight again that these kind of attacks have been found to exist in the real world. And have, in fact, been shown to be exploitable in the real world. So this is not just a theoretical sort of a threat. But I think it also demonstrates what kind of things can be done with chosen ciphertext attacks, and shows more generally why such attacks are problematic, and why you would want to defend against them. And for this reason, modern encryption schemes are generally designed to be CCA-secure in order to rule out any such attacks, in particular the padding oracle type attacks like we've shown here. Now, we're not going to see an example of a CCA-secure scheme now. However, we will see an example at the end of next week. I'll see you next time.