[MUSIC] In this lecture we'll introduce the concept of message integrity along with a cryptographic primitive called message authentication codes. Until now in this course, we've been concerned with ensuring secrecy of communication between a sender and a receiver. But what about the goal of integrity? Roughly speaking, this means that we might want to ensure that a message received by the receiver. Originated from the intended sender, and was not modified in transit. Note that here we're explicitly concerned with an active attacker who can modify things sent between the sender and receiver. As well as inject his own traffic, to be received by the receiver looking as if it came from the other party. And we're concerned here in ensuring that any messages. Obtained by the receiver came from the intended sender with whom it shares a key. Note here that standard error correction techniques are not enough. Because we're not concerned here with random errors in what's sent from the sender to the receiver. We're concerned here with an attacker who can decide exactly where and how to modify what's being sent between the parties. And the right cryptographic tool for ensuring integrity in the symmetric key setting. Is called a message authentication code. At a high level the way a message authentication code or a MAC works is as follows. So as usual in the private key setting we have a sender and a receiver who have shared a key k in advance. When the sender wants to transmit a message m to the receiver, what it will do is it will computer a tag t. By applying some MAC algorithm to the message m using the shared key k. The sender will then transmit m and t across the channel. And what we're trying to achieve is the following. If we have an attacker, who can modify what's sent. And replace it with some arbitrary message n prime and tag t prime of the attacker's choice. Then the receiver, when trying to verify correctness of the tag t prime, on the message m prime, using the shared key k, should find that the tag is not correct. Verification should in that case return zero indicating an error. In contrast if m and t as sent by the sender were received unaltered by the receiver. Then verification should return one, indicating that that message did indeed originate from the sender. Note here that there's no concern of secrecy at all. In fact, as depicted here, the message m is sent in the clear. As usual, in the private key setting, there are two canonical sort of applications where message authentication codes might be used. The first is when we have two parties sharing a key who are separated in space. And here I've shown as an example a user interacting with their bank. So here the user and the bank share a key k advance. The user is transmitting some message m to the bank and wants to make sure that the bank receives that message unmodified. As an example, if the user is sending a message to transfer $100 from its own account to someone else's account. Then the bank should receive exactly that message. And if an attacker changes that message to reflect a transfer of $1,000. The bank should detect that something went wrong, and that that message did not originate from the intended sender. The other canonical application of message authentication codes are when we have the same party authenticating some communication, separated from itself over time. And here we can imagine a web server holding a key K and it generating a cookie. Which will be stored on the client side. The server wants to make sure that the cookie it receives back at a later point in time from that user has not been modified. So what the web server can do is to generate a key, k that it will store. It can authenticate the cookie using a message authentication code and send the cookie along with the tag to the user. At a later point in time the user will send that cookie along with the tag back to the web server. And at that point in time the web server can verify the message authentication code, to check whether or not the cookie has been altered. Why might this be important? Well imagine for example that the cookie encodes some information about the price of some item. And if the price of that item is ten, say. Then it would obviously be very bad if a malicious client could modify that and change the price embedded in the cookie to, say, $1. Let me stress here that secrecy and integrity are orthogonal concerns. As depicted in the pictures in these cases, we're only concerned with integrity and we're not concerned with secrecy at all. Again, the message is being sent in the clear. Now of course, that doesn't mean that you can't have both secrecy and integrity. But for our purposes right now, we're going to treat them independently. Because in general, it might be desirable to have one without the other, right? You might want secrecy. And not care so much about integrity. You might want integrity but not require secrecy for what you're communicating. And so, we'd like to site them independently of eachother. One thing I'd also like to point out, is that just using encryption. And using that, as it were, as a sort of a tag, does not in general, provide any integrity whatsoever. This is related to an issue we've seen before, the issue of malleability. It's not precisely the same thing. But the fact that we have encryption schemes that are malleable, indicates in particular those schemes don't provide any integrity. And we can see an example of this, very easily. So, imagine if, as it were we were trying to use encryption as a form of message authentication. And sending a cypher text alone, and assuming that if the cypher text decrypts properly. Than it must have originated from the sender and if not, then it's an error. Well imagine very simply that we instantiate the encryption here by the one time pad. So that means the shared key k is an inbit string. The message here let's assume is also an n-bit string. And the ciphertext is just the excor of the key in the message. Well if an attacker intercepts that and just flips the final bit of the ciphertext. Then at the receiving end the receiver will just decrypt that to the same original n minus one bits that the sender sent. With the last bit being flipped. If you work out the math, this is exactly what you get. And from the receiver's point of view, there's no way to distinguish whether the sender indeed intended to send this message. m1 through mn, minus 1, and then to mn prime. Or whether that was caused due to some interference from an adversary. In particular, the one-time pad, in the one-time pad, there's no way to get any error upon decryption. Any end bit string is a valid ciphertext. And so it's sort of, should be clear, that the one-time pad is not providing any sort of integrity protection at all. What this means is that even a strong security notion, like perfect secrecy, does not imply any integrity whatsoever. Formally a message authentication code is defined by three probabilistic polynomial time algorithms, Gen, Mac, and Vrfy. The key generation algorithm Gen takes as input the security parameter, and outputs a key k. And as usual we'll assume that the length of the key is at least n bits. The MAC algorithm takes of input a key k and a message m and outputs a tag t. And as we saw in a previous slide, we'll denote that by subscripting the key and writing t equals Mac sub k of m. In general it's possible to have probabilistic Mac algorithms. But we're going to assume for simplicity and for convenience that the MAC algorithm is deterministic. And as we'll see, most commonly used MACs are in fact deterministic. The verification algorithm takes as input a key k, a message m, and a purported tag t as imput. And outputs either one or zero. With a one indicating acceptance, and a zero indicating rejection. And of course the natural sort of correctness condition. Is that for all messages and all keys k output by the key generation algorithm. If we verify a ma a tag generated by Mac using that key. Along with the message M, then we should get one. This just means that if the honest parties are communicating without any adversarial interference at all. Then whatever the receiver obtained from the sender will verify correctly. Now, as far as defining security, it's sort of lucky that in contrast to the case of private key encryption. There's really only one standard of security for message authentication codes. The threat model here is what's known as an adaptive chosen-message attack. That is, we assume that the attacker is able to induce the sender to authenticate arbitrary messages of the attacker's choice. And the security goal is existential unforgeability. That is, the attacker should be unable to forge a valid tag on any message that was not explicitly authenticated by the sender. Might be easier to understand this in terms of a picture, so here we imagine that an attacker can unteract with the sender. Influence the sender, or induce the sender in some other way to generate tags on messages, m1 through mi of the attacker's choice. So we imagine that the attacker can specify a message m one and then receive back the tag t one computed on that message. And so on for a total of i times, thereby attaining the tags on i different messages of the attacker's choice. After doing so, it should still be infeasible for the attacker to come up with any message m prime not equal to one of the messages in m1 through mi. Along with a valid tag, T prime, that is a tag for which the verification algorithm run by the receiver would output one. You can ask whether this definition is too strong. When would it be possible, for example, for an attacker to arbitrarily choose what sort of messages the sender authenticates, and. Similarly it might be the case that if the attacker can output a valid tag on some arbitrary methods it doesn't really buy anything. It doesn't really get anything for the attacker. However, the important point is that we don't want to make assumptions, about what the sender might authenticate. And we also don't want to make any assumptions about what forgeries are meaningful and which ones aren't meaningful. In general those kind of determinations are application dependent. And what we'd like to do is just as in the case of our definitions of private key encryption, come up with a strong enough definition. Such that a scheme satisfying that definition can be used anywhere. And it's true that a MAC satisfying our definition of security. Can be used anywhere integrity is needed without having to think about the semantics of the underlying application. Formally, we define security for message authentication code as follows. We imagine fixing some message authentication code pi. And if we also fix an attacker A, then we can define, based on those, a randomized experiment that I'll call Forge. So Forge depends on A and pi, as well as a security parameter, n. And the experiment works as follows. First we generate a key k, and then we allow the attacker to interact with a Mac oracle. This oracle just allows the attacker to specify messages. And get back the corresponding tags computed on those messages using the key k generated in the first step. We can then let capital M denote the set of messages that the attacker has submitted to this oracle. Finally, a outputs a pair m comma t. And we'll say say that A succeeds and the experiment evaluates to one, if and only if the following two conditions hold. So, first of all it should be that case that the tag t output by the attacker is a valid tag on the message m. That is, then verify sub k, apply it to m comma t should output one. Also it should be the case that the message m on which the attacker output, it's tagged t, should not be equal to one of the messages. For which the attacker already requested a tag from its Mac oracle. Of course if we drop that condition then it would be trivial for the attacker to always succeed. By simply requesting a single tag on a single message, and then outputting that message along with the tag that it received. So to prevent those kinds of trivial attacks,. We have to enforce that the message m output by the attacker is not in the set m of things whose tags it's already obtained from the oracle. And then in the usual way we can say that a MAC pi is secure. If for all probabilistic polynomial time attackers A there's some negligible, negligible function epsilon. First let the probability that Forge experiment outputs 1 is at most epsilon of n. So note here that in contrast to the case of encryption. Where we require the success probability of the attacker to be at most one half plus epsilon. Here we don't want the attacker to succeed any of the time. And so we bound the probability of success by epsilon. That means it should be infeasible for the attacker to ever output. A valid, eventually every output, a valid tag on a new message. One thing I want to highlight about the definition, is that replay attacks are not prevented. A replay attack occurs whenever the attacker simply replays a message transmitted by the honest receiver. Because of the way we have defined things, and in particular because we have defined max. To be stateless, there's actually no way we can hope to prevent them. Right, the verified algorism, can not distinguish whether its been given a message m and a tag t, that it's seen before. Because we don't allow the verify algorithm to remain state. Now this does not mean that replay attacks are not problematic, and in fact replay attacks. Are often a significant real world concern. If we go back to our example of a user interacting with a bank. If a user requests for a 100 dollars to be transferred to the attacker's account. Well if it uses a MAC, then we know the attacker can't change the amount from $100 to $1,000. However, if we don't take any precaution to defend against replay attacks. Then what an attacker can do is simply replay their request to transfer $100 ten times, and thereby achieve the same effect. So there is very often a need to protect against replay attacks. However, any such protection must occur at a higher level of the protocol. It can't be done by the MAC itself. It has to be done by something which is maintaining state, and is aware of the semantics of the messages being authenticated. In the next lecture we will begin our study of message authentication codes, by constructing a very simple secure MAC for short messages.