[SOUND] In the previous lecture we talked about defining what we mean by secure encryption. And we reached an informal description of the security we were after. Recall that what we said is that we should consider an encryption scheme secure if regardless of any prior information the attacker has about the plaintext, observing a ciphertext should reveal no additional information to the attacker about the plaintext. Remember also that we say we were going to be considering the threat model of a ciphertext only attack, meaning that the attacker only observes ciphertext, passively, but doesn't actively interfere with either the sender or receiver. And that we would be considering the simplest threat model which is that the attacker observes only a single ciphertext. We can adapt our definition to more complicated cases, but this is the one we're going to be considering for now. Now what we want to do in this lecture, is to define this informal definition of security that we have, formally, rigorously and mathematically. To do that we're going to need to start talking about probability and so I just want to go over some very basic concepts in probability that I assume you have encountered before in one form or another before. Again we are not going to need a very formal view of probability. That formal view is, there it's present. We can define everything completely rigorously if we wanted to but I just want to assume here that you seen these concepts before, that you are familiar with them, whether or not you have seen them in their complete rigor. So we're going to use the, terminology of a random variable. Now again, this has a formal definition in probability theory. For our purposes, it's enough to just think of a random variable as a variable that takes on certain values with certain specified probabilities. For our purposes we're only going to be concerned with discrete random variables. That means that there's some finite set of values that the variable can possibly take. And so we don't have to worry actually about taking integrals or about considering continuous parameters or continuous distributions on those random variables. So again, a random variable is just going to be a variable that takes on certain values with some probabilities that we're going to specify. And in particular, a probability distribution for a random variable is going to be a specification of the probabilities with which that variable takes on each possible value. We know that these probabilities must be in the range between 0 and 1 inclusive. And, furthermore, the sum of the probabilities with which the variable takes on all these different values should equal 1, right? The various probabilities should all sum to 1. again, we'll also use the terminology of an event, which is something that has a formal definition within probability theory, but for our purposes, we can just think of, as a particular occurrence, in some randomized experiment. And we'll write probability bracket E, to denote the probability of some particular event E, the probability that some occurrence actually happens. So a typical example of an event that we'll be concerned with is the probability that some random variable takes on some value. We then have the notion of conditional probability which is the probability that one event occurs, assuming that some other event has occurred. And we'll write this as the probability of A Bar B. That is the probability of event A, conditioned on the occurrence of event B. And by definition the probability of A conditioned on B is equal to the probability that both events A and B occur, divided by the probability that event B occurs. We're assuming here that event B occurs with non zero probability so we don't have any division by zero, okay? So this is both an informal definition of what we mean by conditional probability as well as a formal definition of what the notation probability of A conditioned on B actually means. With this definition of conditional probability we can now define formally what it means for two random variables X and Y to be independent. And actually I'll point out here some notation that I'll generally be consistent with which is to use capital letters for random variables and lower case letters for values that those random variables can take. Okay so here we have random variables capital X and capital Y, and we'll say that they're independent, if for all possible values X and Y, the probability that X, takes on the value little x, conditioned on the event, that random variable Y, takes on value lowercase y. Is exactly equal to the probability that random variable X takes on the value X in the first place, i.e. The fact that the random variable Y takes on any particular value is irrelevant as far as determining the probability with which X takes on some particular value. Finally we're going to use in this lecture and some of the examples the law of total probability which is just a very convenient way to break up and therefore calculate the value of some probability. So let's let E1 through EN, be events, that partition the space of all possibilities. This means, that the Ei, are all, pairwise impossible. That means we can't have it be the case that both Ei, and Ej occur for any i and j. So that is if E1 occurs, then it's impossible for E2 to occur also, and if En minus 1 occurs, then it can't be the case that E3 occurs. Okay, so these events are, are pairwise impossible and moreover they partition the space of all possibilities in the sense that at least exactly one of E1 through En have to occur, so some event Ei has to occur. Then for any other event A we can express the probability of event A as the sum over all i of the probabilities that A occurs and also Ei occurs. Right, this completely partitions the possibilities for when A occurs because some Ei has to occur. Exactly one of the Ei's has to occur. And so by summing over the probabilities of A and each of those events Ei, we recover the total probability that event A occurred. And then we can break this up further using our definition of conditional property to be equal to the sum over all i of the probability of A conditioned on event Ei times the probability of event Ei. Think, now again, I do assume that you've seen or encountered all of these concepts before. This was just meant to be a brief review. If you haven't, then I would recommend you look at some introductory probability text. The first chapter or the first chapter and a half perhaps. For more, thorough coverage of these different term, terms and definitions. So let's get back now to cryptography. So remember the formal definition of a private key encryption scheme, okay? A private key encryption scheme is defined by a message base, bold M, all right, that's the set of all possible messages that are allowed to be or that can be encrypted by the scheme, as well as three algorithms, Gen, Enc and Dec. Gen was the key generation algorithm. This is a probabilistic algorithm that generates a key, that I’ll denote by lower case k. The encryption algorithm takes as input a key k, and a message M, in the message base and outputs some ciphertext E. And just remember that we're allowing the encryption algorithm to be randomized meaning that if you run it several times, even using the same inputs, k and m, you can potentially get different ciphertext out. 'Kay, we haven't seen an example of such a scheme yet, but we're, we're allowing that in our definition. Finally, the decryption algorithm takes as input a key k and a ciphertext c and outputs a message m. And we have the correctness condition that decryption of a ciphertext encrypted, that the encryption is the message, where you encrypt and decrypt using the same key, should recover the original message. Again, I haven't put that on this slide. Now by way of some more notation, I want to define in parallel to the notion of using bold face m for the message space. I want to define the notation of a bold face k to denote the key space. That’s just the set of all possible keys, that is the set of all possible things that can be output by the key generation algorithm. We use also a bold face C, for the ciphertext space, that is the set of all possible ciphertexts that can be output by the encryption algorithm, taken over all possible keys that can be output by the key generation algorithm and all possible messages that are in the message space. So now what we want to do is look at probability distributions over the messages that can be sent. So we'll let capital M without the bold face, okay, capital M be a random variable denoting the value of the message that the sender wants to communicate to the receiver. So this is a random variable that can potentially take on values in the entire message space. Right, so a priori, the set of possible values for what the message can be are exactly those values that are in the message space. We're not allowed to encrypt anything outside the message space but we are potentially allowed to encrypt anything we like in the mess, message space. Now, the random variable M is meant to reflect the likelihood of different messages being sent by the parties, and this takes into account anything that the attacker might know about what those messages might be. So this distribution, this random variable, M, over the messages that can be sent is meant to reflect, not only the different probabilities with which the sender might want to transmit a certain message. But also something about, perhaps, the attacker's external knowledge that may have some influence or some bearing over what the sender is sending. So just as a, as a, as one particular example, we might have the following distribution over random variable M, that the probability that the message is equal to attack today Is 0.7 and the probability that the message is don't attack is 0.3. And everything else in the message space has a probability of zero. So the only possibilities are either attack today or don't attack and those are going to occur with respective probability 70% and 30%. Okay, now in other situations of course, it might be different. Maybe again, those are the only two possible messages, but maybe the attacker has no idea whether the sender wants to attack or not. And therefore the probabilities would each be 0.5 or maybe there are some other messages that are possible that can occur with different probability what have you. You're right, you have different possibilities of what those can be, but here we're just fixing some particular distribution for this random variable M as specified, okay? Now we're going to let capital K be a random variable denoting the key. The values that this random variable can take are exactly those in the key space, right? Only those values in the key space are possibilities for the key. And actually any prob, any value in that space is a possible value for the key and so is a possible value that this random variable K can take. Now one important thing is that if we fix the encryption scheme, or fix some encryption scheme, Gen, Enc and Dec, then the key generation algorithm itself defines for you the probability distribution for K, right. The probability that the random variable K, takes on some particular value, lowercase k or the probability that the key is equal to some particular value K is, by definition, equal to the probability that the key generation algorithm outputs that key. The probability that the key generation algorithm outputs the particular key, given by this value lowercase k. So the encryption scheme defines your distribution over the random variable k. Now we're going to make the very crucial and important assumption that the random variables M and K are independent. Right, this means that the probabilities with which the message takes on some value are independent of the probabilities that the key takes on some particular value. Now this is actually a very reasonable assumption, because the messages that a party wants to send to another party should not depend on the key that's used to encrypt that message, right? We imagine that we externally to the encryption scheme, have some message that they want to communicate. And then we're sampling a key and using some encryption scheme to encrypt that message. But the details of the encryption scheme and the particular key we ended up choosing shouldn't influence the message that we're sending. This is a crucial assumption, and one that we're going to make throughout the class. It is possible to concoct scenarios where this assumption doesn't hold. But in that case, the definitions we're giving do not apply, and much more complex definitions need to be considered. But traditionally we, the assumption is made that M and K are independent, and this is the assumption we are going to make here. Moreover, this is the reasonable assumption that should hold except in some very unusual circumstances. So, let's fix some encryption scheme, given by Gen, Enc and Dec, and fix some distribution for the message. Some distribution on capital M on the message that the sender is going to send to the receiver. So we have now fixed by the encryption scheme the distribution over capital K, and we've fixed by assumption a distribution over the message space given by capital M. Consider now the following randomized experiment. What we'll do, is first, chose a message little m, choose a particular message m, according to the given distribution, right? So, going to back to the example before, that means we might pick the message attack now with probability 0.7 and the message don't attack, with probability 0.3. So we sample it according to those probabilities and now we've chosen our message and let's call that little m. We then generate a key, K, using the key generation algorithm, i.e we sample the key k according to probability distribution on the random variable capital K. And then what we're going to do is kind of couple those together by encrypting the message we've chosen using the key that we've chosen, and the encryption algorithm that we're given having fixed the encryption scheme. This gives us a ciphertext that we'll denote by little c. The key point here is that this defines a distribution on the ciphertext, right? This process of sampling a random message, sampling a key and then encrypting the message using that key defines some distribution on the ciphertext, and we can denote the distribution on the random variable that denotes the ciphertext by the variable capital C. 'Kay, so we'll let capital C be the random variable denoting the value that the ciphertext takes on after this experiment. I think it's helpful here to walk through an example of how this works out and do an explicit calculation. So let's consider the shift cipher. We said that the the fixing an encryption scheme defines the distribution over capital K. And indeed in the shift cipher, the key is chosen uniformly from the set of all characters, or from the set of all numbers between 0 and 25, i.e., the distribution on capital K is the one where the probability that the key takes on, any particular value between 0 and 25, is exactly 1 over 26. Right, every key in the range of 0 to, 0 to 25 is chosen with equal probability. Now let's assume we'll take a simple example of a distribution over the message space. Assume that the probability that the message is the single character a is 0.7, and the probability that the message is the single character z is 0.3. Well, let's calculate now the probability that the ciphertext is the single character b. Well, there are only two possibilities. Either the message is equal to a and the key is equal to 1. Or, in that case we shift our message by one position and indeed get the ciphertext b, or the message is equal to z and the key is equal to 2. Those are the only two ways that we can end up with a ciphertext being equal to the character b. And now we just have to compute these probabilities. So the probability that the ciphertext is equal to the character b is equal to the probability that the message is equal to a and the key is equal to 1, plus the probability that the message is equal to z and the key is equal to 2. Because the message and the key are independent, those become the, just the products of the two terms. So we have the product sorry the, we have the probability that M equals a times the probability of K equalling 1, plus the probability that M is equal to z, times the probability that K is equal to 2. And now we just plug in the known values for those different probabilities. Right, the probability that M, is equal to a, we said was 0.7. The probability that k is equal to 1 is 1 over 26 and the probability that m is equal to z is 0.3. The probability that the key is equal to 2 is 1 over 26. Adding that all together, we see that the total probability with which the ciphertext is equal to the character b is exactly 1 over 26. 'Kay, nothing magical, but just going through the steps, I think clarifies a little what we mean, what we meant on the previous slide. Taking another example, a slightly more complicated one, we'll again consider the shift cipher, but now let's look at a different distribution over the message space. So now we'll assume that we have two possible messages that can be sent, like before, but they're now longer than one character each. So we have that with probability one half, the message is equal to the string O-N-E. And with probability one half, the message is equal to the string T-E-N. Right? One or ten. What's the probability that the ciphertext happens to turn out to be equal to the string rqh? Well, if we break it up as before, what we have is that this is equal to, using the law of total probability, the probability that C is equal to rqh conditioned on M being equal to one, times the probability that M is equal to one, plus the probability that C is equal to rqh conditioned on M being equal to ten, times the probability that M is equal to ten. And we can compute that as follows, so conditioned on the message, being equal to one, the cipher text turns out to be equal to rqh if and only if we have a shift of three i.e.,. the key was equal to three. Right in that case we have o going to r, we have n going to q and we have e mapping to h. So now the probability that the ciphertext is equal to rqh conditioned on the fact that the message is equal to one is exactly the probability that the key is equal to three, which is 1 over 26. We multiply that now by the a priori probability with which the message was equal to one, which we're given as one half, and now we come to a more interesting calculation, right? What's the probability that the ciphertext is rqh conditioned on the message being equal to ten? Well, if you think about it a little bit, or if you try to exhaust over all possible keys, you can see actually that there's no possible way the ciphertext can be equal to rqh, given that the message is equal to ten. And the reason for that is simply that there is no shift that will simultaneously shift t to r, e to q, and n to h. There's just no way for that to happen. So the probability that the ciphertext is equal to rqh conditioned on M being equal to ten is 0. We can write one half for the probability that M equals ten. It's not going to matter in the end anyway because it's going to go to zero, but there you have it. And so by summing these probabilities we get that the probability that C, the ciphertext is equal to the string rqh, is exactly 1 over 52. Now, recall the definition of perfect secrecy, from the earlier in the lecture. All right, we said that regardless of any prior information the attacker has about the plaintext, the ciphertext that the attacker observes should leak no additional information about the plaintext. What we're going to do now is equate the attacker's information about the plaintext with the attacker known distribution over the plaintext. So, what this means is that perfect secrecy will require that observing the ciphertext should not change the attacker's knowledge about the distribution of the plaintext at all. Right, so the attacker has some knowledge about the distribution of the plaintext before observing the ciphertext. But that's just given by the distribution capital M, the distribution over the possible messages that might be sent. After observing the ciphertext the attacker can update its knowledge about the distribution, and what we require is that a scheme is, is defined as, to be perfectly secret, if, and only if those two distributions, right, the a priori distribution before observing a ciphertext, and the a posteriori distribution, after observing a ciphertext, should be exactly equal. So more formally, what this means, is that an encryption scheme, defined by the algorithms Gen, Enc, and Dec, and the message space bold face M is perfectly secret, if for every possible distribution over the message, over the message space, right, any possible distribution over the possible messages that might be sent. Every message m and every ciphertext c that can occur with non zero probability, it holds that the probability that the message was equal to m conditioned on observing a ciphertext equal to lowercase c is exactly equal to the a priori probability that the message was equal to m in the first place. So we have here an expression, on the right hand side we have the a priori probability with which the message takes on some value lowercase m. Right, this is the attacker's knowledge about the distribution about the plaintext before observing any ciphertext. And we're requiring that to be equal to the left hand side, which is the probability that the message was equal to some value M, conditioned on observing that the ciphertext took on some particular value lowercase c. Okay, so again this is our definition of perfect secrecy. This is just a formalization, a mathematical description of our informal, intuitive notion of security that we had on the previous slide. Now I know the definition is a little bit difficult to get used to, and so what I wanted to do was walk through just two examples of calculations that pertain to that definition. Let's go back to an example we had before, where we had the shift cipher. And again, the distribution where the message is equal to one with probability one half. And the message is equal to ten with the probability one half as well. And let's take as one particular message the string ten and as one particular ciphertext the string rqh. Well, we can then ask, what's the probability that the message was equal to ten conditioned on the fact that we observed a ciphertext equal to rqh? You might already be able to answer this question just based on intuition, but I want to go through the explicit calculation. Well, the reason, we, we can actually say quickly that this is exactly equal to 0. Sorry, I didn't go through the calculation. I'll do that in the next example. Why is that? Okay, well remember what we said before is that the condition on the message being equal to ten. It's simply not possible for the ciphertext to be equal to rqh. There's no key that will map the message M onto the cipher text rqh. What that means is that if we've observed the ciphertext rqh, then it cannot possibly the case, be the case that the message was ten, right? Because again, there's no possible way for the message to be equal to ten and the ciphertext to be equal to rqh. So therefore, the probability that the message is equal to ten, conditioned on the ciphertext being equal to rqh is precisely 0. It's simply not possible that the message was ten given that we've observed the ciphertext rqh going across the channel. The thing to note here is that this is not equal to the a priori probability that the message is equal to ten right? The a priori probability that the message was equal to ten is one half which is not equal to 0. And this shows actually that the shift cipher does not meet the definition of perfect secrecy. I'm going to turn next to a more complicated example, and in analyzing that we're going to use Bayes's theorem. This is something else that I'm assuming that you've seen before but I just wanted to review it very quickly. So Bayes' theorem is just a way of switching the order of two events in a conditional probability statement. And it simply says that the probability of A conditioned on B is equal to the probability of B conditioned on A times the probability of A divided by the probability of B. It's quite easy to derive this on your own from what we had on the previous slides, but in any case, this is the statement that we're going to be using. So, for the next example, we'll make it a little more complicated yet, we'll take again the shift cipher as our example, but now we'll consider a distribution over three possible messages. Right, so there are three different messages that can potentially be sent. We'll have that the message is equal to hi with probability 0.3, the message is equal to no with probability 0.2, and the message is equal to in with probability 0.5. I just made these numbers up. What's now the probability that the message is equal to hi, conditioned on observing the ciphertext being equal to xy? Well, we could rewrite this using Bayes' theorem in the following way. It's just a probability that the ciphertext is equal to xy, conditioned on the message being equal to hi, times the probability that the message is equal to hi, divided by the probability that the ciphertext is equal to xy. And now we just need to calculate each of those probabilities individually. Well, the probability that the ciphertext is equal to xy conditioned on the message being equal to hi is easy to calculate. The prod, get, conditioned on the message being equal to hi, there's exactly a single key that will result in a ciphertext xy. I forget exactly what the shift is but you can see just by exhausting over all 26 possibilities that there's exactly one value of the key that will map the message M onto the ciphertext. Sorry, that will map the message hi onto the ciphertext xy. So therefore the probability that that key is the one that's chosen by the key generation algorithm is exactly 1 over 26. Because again the shift cipher chooses keys uniformly from the space of 26 possible keys. More interesting or more complicated, is the computation of the probability that the ciphertext is equal to XY. Here, we'll just go back to the law of total probability that we used before. So the probability that the ciphertext is equal to xy, we can break that up as the sum of three different terms. The first term is the probability that C equals xy conditioned on the message being equal to hi times the probability of hi, which is 0.3, plus the probability that c equals xy conditioned on M equals no, times the probability that M equals no, which was 0.2. Plus the probability that C equals xy conditioned on the event that the message was equal to in times the probability of in which was 0.5. Now the probability that the cipher text is equal to xy, conditioned on the message being equal to hi is 1 over 26 as we said earlier. The probability that the ciphertext is xy conditioned on the message being equal to no is also 1 over 26. There's again exactly one key that will map the message M onto the ciphertext xy and the probability that that key is chosen is exactly 1 over 26. On the other hand the probability that the ciphertext is equal to xy conditioned on the message being equal to in is 0. You can again check that there's no possible key that will map the message in onto the ciphertext xy. And so when we add these up we get a total probability of 1 over 52. Finally, the probability that the message is equal to hi conditioned on C equals xy. We can go back to our previous expression in terms of Bayes' Law, plug in the values that we computed, and compute the value of 0.6. So the probability that the message was equal to hi, conditioned on the ciphertext being equal to xy, is exactly 0.6. And the thing to note again, is that this is not equal to the a priori probability, with which the message was equal to hi, which was 0.3. So this demonstrates again that the shift cipher is not perfectly secret because we found a particular distribution over the message space for which the a priori probability that the message took on some particular value given that the observed ciphertext took on some other value, was not equal to the a priori probability with which the message took on that value. So just to summarize, we see here two examples demonstrating that the shift cipher is not perfectly secret, and the question that remains is how can we construct a perfectly secret scheme, right? We have this nice definition of what perfect secrecy means, this nice definition that encapsulates our intuitive understanding or our intuitive desire for what we want to achieve by an encryption scheme. How can we build an encryption scheme and prove that it satisfies that definition? And this is the problem we'll turn to in the next lecture.