[NOISE] In this lecture, we'll talk about identification schemes and then see their application to the construction of digital signatures. Now identification schemes are quite interesting in their own right, both for what they can achieve as well as in their analysis. And I hope you'll agree with me about that after you watch today's lecture. However, identification schemes have only limited efficability on their own. We're covering them here because they are extremely important as a building block for digital signature schemes. That is they provide a very important frame work for the construction of digital signatures. Now because we're only interested in them as a building block and we're ultimately interested in the signature schemes themselves, we're going to be somewhat informal in our treatment of identification schemes. I hope that's okay. An identification scheme is a protocol that's run in the public key setting, just like public key encryption and digital signatures. Here, we're going to have two entities, one of which we'll call a prover, and the prover is going to be the one who locally generates a pair of public and private keys. And then as usual, publicizes their public key and makes it widely available. A second entity, whom we'll call a verifier, is assumed to be able to obtain an authentic copy of the prover's public key. So now we have a situation where the verifier has a copy of the prover's public key, and the prover has a copy of the corresponding private key. Now the goal of an identification scheme is to allow the prover to convince the verifier that the prover is who he claims he is. That is the prover wants to convince the verifier that he indeed is the one who generated. The public key that the verifier holds, along with of course the corresponding private key. And so the prover needs to be able to interact with the verifier, and through this interaction convince the verifier of his own identity. And we're going to be interested only in identification schemes having a very particular form, in particular they are going to work in three rounds of interaction. Where in the first round the prover sends a message we'll denote by A. Then in the second round the verifier chooses its message, by choosing a challenge that we'll denote by c, uniformly from some space omega. After receiving that challenge, the prover, based on the original message A, its own private key, as well as, of course, the challenge, computes a response that we'll denote by s. And the verifier will then be able to run some local verification procedure involving the public key, along with the challenge, and the response s, and will then check whether or not the computation run on those three elements yields the first message, A. So this verification procedure, just think about it now abstractly, as some mechanism that processes the public key, the challenge and the response. And the verifier will accept, that is be convinced it's talking to the prover, if and only if the verification procedure when running those three items yields the first message A. We'll see an example later on and this will make more sense hopefully. Now the security that we're going to be interested in is a relatively weak notion of security, that is, security against passive eavesdropping attacks. And the model here is that you can imagine an attacker who eavesdrops on multiple honest executions between the prover and a verifier, and of course, also has access to the public key itself. And the idea of, about this security notion, is that even after eavesdropping on these multiple honest executions, the attacker should not then be able to turn around, and convince a verifier falsely that the attacker is the prover. That is the attacker should not be able to succeed in carrying out an execution of the identification protocol with a verifier, when the prover, when the real prover, is not around. So this just means that the identification protocol is achieving the notion that it's claiming to, that is, that it's actually identifying this prover, who holds the corresponding private key. And that an imposter, i.e this attacker, cannot fool the verifier into believing that the verifier is interacting with the real prover, even after the, even after the attacker has eavesdropped on multiple honest executions of the protocol. Now for technical reasons we're going to also make the assumption that the identification protocol we're talking about is non-degenerate. And this just means that the first message A has a negligible probably of repeating. And this will be satisfied by almost any natural protocol fitting into this three-round paradigm. Now, a prototypical application of identification schemes is in-person identification. So imagine here something like using a swipe card to gain access to some room. And there what you might have is you might have on your smart card, you might have a private key stored and when you swipe your card you might imagine that you're able to execute a protocol with the access card reader and your card. So that is your card which holds your private key will be acting as a prover, and the card reader which is connected to the door will be acting as the verifier. And then you could imagine, actually, that these, that the card and the reader are actually executing this three-round protocol. And if and only if the card is able to convince the reader that it is who it claims it is, well, the card reader unlocks the door and allows access. Now, why am I mentioning in-person identification. Because in fact, identification protocols as described, are not suitable for remote authentication, i.e authentication over the internet. And the reason for that is a little bit subtle. So in fact, identification protocols are fine for proving to another party that you are indeed who you claim you are. The problem is that if you then communicate after that, for example, if I run an identification protocol with a verifier, and then start sending commands or requests or other messages, there's nothing that binds my subsequent communication with the verifier with the original execution of the protocol. And I, for example, if you imagine that you would try to set up authentication to your bank, using a, an identification protocol as, as just described, and that is, the customer would run the identification protocol with the bank, following which, the customer can then request, for $100.00 to be transferred. That's going to be completely insecure. And the reason is, is that an attacker can simply wait for the customer in the bank to finish executing the protocol, and then immediately afterward after a successful execution of the protocol, the attacker can jump in and send any message it wants to the prover. So it turns out that identification protocol's are not super useful for things over the internet. They can be used for in-person authentication and identification, but it's, that's a relatively limited domain. Now for our purposes again, we're most interested in the application of identification schemes, to constructing digital signatures. And the basic idea by which identification schemes can be used to construct digital signatures, is that we're going to have the prover find a message, by running the identification scheme by itself, as it were. Generating the challenge, that is the second message of the protocol, using a hash function, and I'll show how this works in a moment. This transformation from identification schemes to signatures, is known as the Fiat Shamir transform, after the two authors who first suggested and analyzed it. So let's see how that might work in a little bit more detail. So we have this entity who, before we called a prover, but now we'll call a signer, and they have a private key, along with some message, M, that they wish to sign. And the idea as I said is that this entity will begin running the identification protocol by itself. So that means that what the signer will do is begin by generating an initial message A, but now there's no other entity with which this party's interacting. So as indicated what this party will do, is generate the second message of the protocol, the challenge, c, using a hash function. And in this case, by applying the hash function to the initial message A, along with the message m, that it wishes to sign, and it will interpret the result as a challenge c, and then the signer will compute the last message of the identification protocol that we've been calling s. The signature on the message m, will then consist of c and s. How does somebody verify the signature? What they're going to do, they're going to, as it were rerun the execution of the identification protocol and check that the it results in a accepting execution. How that going to work is verifier will take the message m along with signature c and s. Along with the public key that it must also know, because remember we're still in the public key setting. This is a digital signature scheme, we assume the verifier holds a copy of the signer's public key. The verifier will then run the verification procedure, using the public key and the signature, c and s, and that will result in an output value A. And the idea then is that the verifier wants to check whether this A is the value the signer on the left-hand side here actually used. And it can check that by recomputing the hash over A and m, and checking whether the result is the value c contained in the signature. And we can prove that this signature scheme is secure if the original identification scheme is. More formally, we have the following theorem, if the identification scheme is secure against passive attacks, and if we model the hash function H as a random function. Then the signature scheme we derive by applying the [INAUDIBLE] transform, as in the previous slide, is indeed secure, it gives us a secure signature scheme. The proof of this theorem is a little bit complex and we're not going to go into it here. Now, I want to give an example, a concrete example of an identification scheme. And then the signature scheme that can be derived from that scheme. So let me just recall briefly, the setting for cryptographic schemes based on the discreet logarithm problem. We have some group generation algorithm that outputs a description of a cyclic group G, having prime order q, along with a generator g of that group. And the five of the order of the group, the, the bit length of Q is going to be n bits, where n is the security parameter, as usual. The discrete logarithm assumption says that, given a uniform value h in the group, it's hard to compute the discrete logarithm of h with respect to g. We can build from this assumption an identification scheme called the Shnorr identification scheme, and this is named after the researcher who first developed the scheme in the late 1980's. The basic idea here is that the prover will generate his public and private key, by running the group generation algorithm to obtain G, q, and g. And then choosing a random exponent x and computing h equal to g to the x. The public key of the prover will then consist of the parameters G, q, and g, along with the group element h. And the private key of the prover will be exponent x. So remember that h is equal to g to the x. An execution of the protocol works in the following way. The prover begins by choosing a random exponent, r, and setting the first message A equal to g to the r. The verifier then chooses a random challenge in Zq. The order of the group here is q so the challenge being chosen here from the same space as the possible exponents here for elements in the group. The prover then responds with the response s computed as cx plus r module q. And finally, the verifier computes g to the s times h to the minus c, and checks whether that value is equal to A. And if so, it accepts, and if not, it rejects. Now just a little bit of algebra will convince you of that in an honest execution the verifier will always accept. G to the s times h to the minus c is equal to g to the cx plur r times h to the minus c. By regrouping, we can write that as g to the x raise to the c, times g to the r times h to the minus c. And then we simply use the fact that by definition, h is equal to g to the x. Things nicely cancel, we're left with g to the r, which is of course, exactly equal to the initial message sent by the prover. Now what can we say about security of this identification scheme? Well the first thing I want to convince you is that an attacker who only sees the public key. So here we're not yet looking at an attacker who's also eavesdropping, but an attacker who only sees the, the public key of the prover. I claim that, that attacker cannot impersonate the prover without solving the discrete logarithm problem. Why is that the case? So imaging we have an attacker who observes the public parameters G, q, and g. Along with h. And then tries to impersonate the prover through some honest verifier. Well, the attacker's going to begin by sending some initial message A. A priority, we have no idea how that message was computed, it doesn't really matter. It just sends the message A, and then the idea is that the attacker, if it's going to be successful with high probability, should be able to respond correctly to at least two different challenges from the verifier. If the attacker can only respond to a single challenge from the verifier, he will only potentially impersonate the prover with probability of 1 over q, which is negligible. So roughly speaking then, that means that there's at least one challenge phi for which the attacker can respond correctly with our response s for which the verification condition holds. And there must also be at least one other challenge, c prime, distinct from c for which the attacker can also respond correctly. Now from those two responses, we can in fact, derive that they must be equal. That sorry, the verification conditions must be equal. That is g to the s times h to the minus c, must be equal to g to the s prime times h to the minus c prime. And that's simply because they're both equal to A. And then just a little bit of arithmetic allows you to compute from that, the discreet logarithm of h with respect to g. So in summary, what we've shown here is that if there is an attacker who, just based on on observing the public key, can successfully respond to at least two different challenges, then that attacker must implicitly be able to solve a discreet logarithmic problem. If the discrete logarithm problem is hard, then such an attacker cannot exist. We haven't yet talked about what happens if the attacker can also eavesdrop, and we might think that, in fact, eavesdropping on honest executions of the protocol might potentially give an attacker some additional advantage. What I want to show here is that that's not the case. So let's look first at an execution of the protocol between the prover and the verifier. So I've just written out the steps here, we have the first, first message A equals g to the r. We have the response c, which was a random challenge chosen from Zq and then we have the response s equals cx plus r. I've omitted the mod q for simplicity. And the condition that holds on the values in those transcripts, right, A, C, and S, is that g to the s h to the minus c, is equal to A. Now, what I claim is that it's possible to simulate a valid-looking execution of the protocol, based only on the public key and without knowledge of, of x. Okay, on the left-hand side, we had an interaction of the protocol with the prover sending A, the verifier sending c, and then the prover using the private value x to compute an, an appropriate response. But on the right-hand side, I'm going to show how we can compute an identically, an identically distributed transcript, without knowing x at all. And the way that works, is that we'll take we will have access to the public key of course, just like the attacker does, and what we'll do is we'll begin by doing things out of order. What we'll do first is we'll choose the challenge c uniformly for Zq and choose s as well uniformly from Zq and then compute A, the initial message of the protocol, as g to the s, h to the minus c. And what I claim is that this transcript of three values. A, c s on the right, is distributed identically to the three values on the left. It might take a little bit of playing with this to see that, but let me give you my understanding of how I interpret that. Well, the value c is uniformly distributed in Zq on the left and the right and, furthermore, the value f is also uniformly distributed on the left and the right. Right? And that's simply because, on the right, we chose it uniformly and, on the left, it's equal to c times x plus r, where r is a random, a uniform value. Now the initial message A is computed differently in both cases. But in both cases, A is the unique value for which g to the s which is equal to g to the s, h to the minus c. So in fact, both transcripts, on the left and the right, have the identical distribution. Sorry. And then what this means is that observing or passively eavesdropping honest executions of the protocol is of no use to the attacker, because the attacker could generate simulated transcripts with the identical distribution. And that means that we may as well assume the attacker doesn't eavesdrop at all. And so we're back then in the situation of the previous slide, and we showed there that given only knowledge of the public key, an attacker could not in fact successfully impersonate a prover. I should mention that this idea of being able to simulate an honest execution of a protocol is a very powerful one, and you may have heard of the notion of zero-knowledge proof. And this same core idea is used there as well. But that's a bit beyond what we can hope to cover in our lectures. What can we claim, then, about the security? Well, if the discrete logarithm assumption holds, then in fact, the Schnorr identification scheme is secure against passive attacks, that's what we've proved on the past two slides. And that means as a corollary if the discrete logarithm assumption holds and we model H as a random function. Then the signature scheme that we derive from Schnorr identification protocol by applying the Fiat-Shamir transform, is a secure signature scheme. Now, I did want to mention one other signature scheme which can be viewed as being derived from an identification scheme, even though it's not typically presented that way. And the reason I want to mention this is because it's an example of a very important scheme that's widely used in practice. Unfortunately for reasons I can't fully explain, the Schnorr signature scheme is not very widely used in practice. It has very nice theoretical properties, we can analyze it, we can say a lot about it, but it isn't used very often in practice. In contrast, the DSS signature scheme is a standardized scheme. What one that was standardized by NIST. And is used quite widely. DSS stands for the digital signature standard. The standard incorporates two different schemes, DSA, the digital signature algorithm, which is based on the discrete-logarithm problem in subgroups of Zp star, and the ECDSA, the elliptic-curve digital signature algorithm, which has a very similar signature scheme, but based on a different class of groups, based on elliptic-curve groups. And as I mentioned, these can be viewed as being derived from identification scheme, although using an unproven variant of the Fiat-Shamir transform. Not exactly the Fiat-Shamir transform as we presented it. So I'm not going to go into detail about that, but I did want to mention the scheme because it is so widely used that you really should know about it. Now, in the next two lectures, we're going to look back at what we can use digital signatures for. We're going to talk a little bit about public key infrastructures, and how signatures can be used for securing public key infrastructures. And then we'll kind of try to put everything together that we've learned in this class, and look at the SSL/TLS protocol, and and how that's used to secure communication on the web nowadays.