Hello and welcome to the sixth lecture. By now you've seen a lot of the basics of Bitcoin: how the system works, how mining works, and how to use Bitcoin as a currency. Now let's get to what has been one of the most controversial aspects of Bitcoin, which is the anonymity properties of Bitcoin. And in fact there's a lot about Bitcoin and anonymity that you'll hear different opinions on. Is Bitcoin anonymous, first of all? Is an anonymous cryptocurrency even a good thing? Is it good for people who have a stake in Bitcoin? Is it good for society? And what are the various proposals that have been made to improve Bitcoin's anonymity? How well do these work, which ones should we adopt, and so on. So, in this lecture, what we're gonna do is help cut through all of that confusion. And we're gonna discuss where things are, what the options are, and where things seem to be going. So let's start like this. Let's start with a basic understanding of what we even mean when we say anonymity and Bitcoin, and some of the overall concepts like, how does anonymity tie into privacy? Is that a good thing or a bad thing? Can we have only the good aspects of anonymity without the bad? A variety of questions like that. And then we will see a variety of proposals, some that already exist and some that may be implemented someday, for improving Bitcoin's anonymity or creating a different anonymous cryptocurrency altogether. And what's interesting about them is that they offer increasing levels of cryptographic sophistication as we go down this list. And we'll learn to see what the tradeoffs are and analyze the anonymity properties, how deployable these are, and so on. All right, let's get started. If you look online you'll see there are a number of people and groups saying that Bitcoin is anonymous. There's no shortage of opinions on this. Let me just pull out one quote in particular. This is the WikiLeaks donations page.
It says in plain and simple terms, Bitcoin is a secure and anonymous currency. Is that actually true? Well, you'll also find a variety of opinions to the contrary. Again I'm just pulling out one example. This is Wired UK saying Bitcoin won't hide you from the NSA's prying eyes. So how can we resolve this confusion? Let's look at what the word anonymous means. At quite a literal level, anonymous means without a name. And so what does that mean exactly? Well, there are two ways to interpret it. We know that in Bitcoin, addresses are public keys, or rather public key hashes, instead of real identities. You don't need to put in your real name in order to interact with the system. But we can interpret this property of being without a name in two different ways. We could interpret it as interacting without your real name. Or we can interpret it as interacting without any name at all. Now if you interpret it as interacting without your real name, then certainly Bitcoin is anonymous in that sense. But we do have these public key hashes that act as some sort of pseudo-identities. And so when computer scientists look at this situation, they don't use the term anonymous to describe this. They call this pseudonymity. And there's a very clear difference between the two. And it's an important one. And we'll see why in a second. You might wonder: even though you're using a pseudonym, which is your public key hash, you can create any number of them. You can have as many pseudonyms as you want. Does that make it anonymous? Well, the answer is not quite, and we'll get into that as well. Okay. So if computer scientists call this pseudonymity, what is anonymity then? Is there a clear definition of what it would take for something to be called anonymous? At a conceptual level the answer is very simple. Anonymity in computer science is just pseudonymity together with unlinkability. So what is this property called unlinkability?
We'll get into better definitions in a little bit, but at an intuitive level what unlinkability means is that as a user interacts with the system repeatedly, these different interactions should not be able to be tied to each other from the point of view of some adversary. So you have to be talking about a specific adversary for this to even make sense. Now this distinction here between full anonymity and mere pseudonymity is something that you might be familiar with from a variety of other contexts. And one good way that I like to explain this is to look at online forums. And again here, the distinction between a mere pseudonymous interaction and an anonymous interaction comes up in different forums. Reddit is a good example of a forum where you pick a long-term pseudonym and interact over a period of time with that pseudonym. You could create different pseudonyms, but it's going to be practically infeasible to create a new pseudonym every single time you want to post a comment, and it's not even very meaningful. So Reddit offers pseudonymous interaction. The opposite of that, fully anonymous interaction, where you can make posts with no attribution at all, is the model that you typically have in 4chan. And there is a similar difference in Bitcoin as well, and Bitcoin is in the pseudonymous model more than the anonymous model. Okay. But let's talk about why this difference is important in Bitcoin. Why is mere pseudonymity not sufficient if you want privacy? After all, if you have pseudonymity it seems like even if somebody can create a pseudonymous profile of all of your interactions on the system, they can't tie it back to your real identity. Well, here is the answer to that. It turns out that if you have this pseudonymous profile, it's pretty fragile. It's very easy for it to get linked back to your real identity at some point.
And if that happens at any point, then of course all of your transactions, past, present and future, have been linked to your identity. So here are a couple of different ways in which that can happen. One is that a variety of Bitcoin businesses online, wallet services, exchanges, and others, even vendors in a lot of cases, are going to want your real-life identity in order to let you transact with them. Consider this analogy: you go to a coffee shop, you pay for your coffee with Bitcoins, and of course if you're there in the store, then the person who's giving you your coffee sort of knows who you are, even if they don't actually ask for your real name. And so your physical identity does get tied to one of your Bitcoin transactions. And if that Bitcoin transaction then gets tied to all of your Bitcoin transactions, then that is a complete violation of anonymity. So this notion of a pseudonymous profile is very fragile. It could easily get compromised in a variety of ways, and also, even if such a direct linkage doesn't happen, these linked profiles can be deanonymized due to side channels. What do I mean by side channels? Well, here's something that I find intriguing, that might seem like a tall claim, but in fact such things have been known to happen. Maybe somebody looks at a profile of your pseudonymous Bitcoin transactions and finds that you interact at certain times of day. And they're able to correlate the times of day when you're active on Bitcoin with the times of day when your Twitter account is posting tweets. And so they're able to find a connection between your Twitter identity and your transactions on Bitcoin. Similar attacks have been known to happen, so this is why this notion of a pseudonymous profile is considered quite fragile. And for real anonymity we want the stronger notion of unlinkability. So let's try to define, in a little bit more concrete sense, what unlinkability means in the context of Bitcoin.
And we can do that in a variety of different ways. One is that it should be hard to link together different addresses of the same user. Another is that it should be hard to link together different transactions made by the same user. Both of these seem intuitive. Look at this one though: it should be hard to link the sender of a payment to its recipient. This one might sound a little confusing at first, because if you interpret a payment as a Bitcoin transaction, then of course that transaction has inputs and outputs. And these inputs and outputs are inevitably going to be in the block chain publicly and linked together. And so you might think that this is impossible to achieve. But if we interpret this notion of a payment in a different way, not as a single direct Bitcoin transaction, but perhaps as an indirect sort of payment that goes through a circuitous route of transactions, then one might imagine that the ultimate sender and the ultimate recipient of that payment might not immediately be linkable by looking at the Bitcoin block chain. So these are all somewhat more concrete, but still at an intuitive level, varieties of unlinkability that one might want to shoot for. But if you look at this last definition, it might still be not entirely convincing. Let's say that you pay for a particular product and it costs a certain amount of Bitcoin, and then maybe you send that payment through a circuitous route of transactions. But still, you might think somebody looking at the block chain must be able to infer something. Specifically, that a certain number of Bitcoins left some address, and a certain number of Bitcoins showed up at some other address. And these two amounts might be slightly different because of transaction fees and so on, but roughly equal. Also roughly in the same time period, because there can't be too much of a lag between the sending and the receiving of the payment.
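To make that inference concrete, here is a minimal sketch in Python of what such an observer might do. Everything in it is an assumption made for illustration: the `Transfer` record, the `candidate_receipts` function, and the fee and delay thresholds are hypothetical, and a real observer would be working from actual block chain data rather than a hand-written list. The idea is simply to keep every receipt whose amount is slightly below the amount sent and which arrives shortly afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A simplified, hypothetical view of coins leaving or arriving at an address."""
    address: str
    amount_sat: int   # amount in satoshis (integers avoid float rounding)
    timestamp: int    # seconds since some reference point


def candidate_receipts(sent, receipts, max_fee_sat=20_000, max_delay=3600):
    """Return the receipts that could plausibly be the other end of `sent`.

    A receipt is a candidate if it is at most `max_fee_sat` smaller than the
    amount sent (fees may have been deducted along the way) and arrives no
    more than `max_delay` seconds after the coins left.
    """
    return [
        r for r in receipts
        if 0 <= sent.amount_sat - r.amount_sat <= max_fee_sat
        and 0 <= r.timestamp - sent.timestamp <= max_delay
    ]


# Illustrative data only: one outgoing payment and four observed receipts.
sent = Transfer("A", 50_000_000, 1_000)
receipts = [
    Transfer("X", 49_990_000, 1_600),   # close in amount and time
    Transfer("Y", 49_995_000, 3_000),   # close in amount and time
    Transfer("Z", 20_000_000, 1_200),   # amount far too small
    Transfer("W", 49_000_000, 1_300),   # gap larger than any plausible fee
]

matches = candidate_receipts(sent, receipts)
print([r.address for r in matches])
```

In this toy data only two of the four receipts survive the filter, so however circuitous the route was, the sender is hidden among two candidates rather than among everything on the block chain.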
And so clearly, even if we try to achieve this kind of unlinkability, it can't be unlinkability among all possible transactions, but only among some smaller subset of transactions that look like each other. So let's make this a little bit more concrete now. And this is how we quantify anonymity. We usually don't try to achieve complete unlinkability, which is unlinkability among all possible transactions or addresses in the system. But instead, we go for something more measured. We try to maximize the size of our anonymity set. The anonymity set is the crowd of other addresses or transactions that we're trying to hide in. So if I can be reasonably sure that, with respect to some adversary, there are these thousand other transactions that look just like mine, and the adversary can't tell which one was mine, then we might consider that to be a pretty good level of anonymity. And to calculate this anonymity set is not trivial at all. It takes a few steps. You have to first define concretely what your adversary model is. And you have to reason carefully about what that adversary knows, what they don't know, and what they cannot know. And there's no general formula for doing this. It requires carefully analyzing each protocol and system, and doing it on a case-by-case basis. I want to point out that in the Bitcoin community, often people carry out intuitive analyses of anonymity services, for example, mixing services that we're gonna see later in this lecture. And often they come up with measures like taint analysis. This is an intuitive measure that tracks the flow between a particular sending address and a particular receiving address. And intuitively it might make a lot of sense, but if we consider it from the point of view of how we actually should calculate anonymity, taint analysis is not a very good measure of how much anonymity you get from a system.
And the reason for that is that it assumes a particular type of attack the adversary might carry out, a rather naive attack looking directly for quantities of flow between a sending and a receiving address. And if your adversary were a little bit cleverer than that, then you might carry out a taint analysis and think that you have a lot of anonymity in a certain situation, but in fact you might not. So, the bottom line from this slide is that quantifying anonymity must be done in terms of the anonymity set, and in some cases probability distributions on top of that anonymity set. And it requires a careful analysis of the protocol and the system. You can't apply a simple formula. Okay. Let's switch gears a little bit and talk about the ethics of anonymity. Why do people want anonymity? We've already seen a little bit of the connection between anonymity and privacy, but let's make that very concrete. Now, in block chain based currencies, because all transactions are recorded on the ledger, they're totally and publicly and permanently traceable. And so if your identity ever gets linked to these transactions, you are in a situation where your privacy level is much worse than you get with traditional banking. Why? Because anybody might be able to carry out this type of deanonymization attack. Not specifically a company or a government that you might be worried about, but any member of the public. And since your transactions are permanent, a loss of anonymity years down the line could affect all your transactions today, and vice versa. So we really want anonymity to even get the privacy level of cryptocurrencies to the level that we enjoy with the traditional system. But also, people hope that it can give us a new level of privacy. Of course, we have to acknowledge the concerns as well, and one of the major concerns is money laundering and all of the bad things that that can enable. So let's talk about that. This is definitely a legitimate worry.
I wouldn't be in favor of studying anonymity and cryptocurrencies and ignoring the ethical aspects and saying, oh, that's not something I'm gonna worry about, I'm only interested in the technology. I think it's important to consider the ethical aspects. There's one item of comfort that I will offer though. If you look at how things stand currently in Bitcoin, the difficulty of things like money laundering is not necessarily because the block chain is not so anonymous, and so it's easy to trace flows. Instead, the difficulty stems much more from the fact that moving large flows into and out of the currency, rather than within Bitcoin, is what is really hard. In other words, cashing out is hard. And so anti-money laundering efforts have great promise if they're focused on this part of the system. And the good news is that all of these attempts to improve anonymity in Bitcoin don't affect this part of the equation in any way. And so I would recommend that Bitcoin researchers and developers coordinate efforts with anti-money laundering efforts by law enforcement and others, so that the technical aspect of Bitcoin anonymity can be relatively separate from law enforcement and legal aspects and so on. Nevertheless, one could try to ask, can't we design the technology in such a way that only the good uses of Bitcoin anonymity are allowed, and the bad uses are somehow prohibited? Well, this turns out to be quite a common conundrum in computer security and privacy. In a lot of scenarios, we want something like this, but it never turns out to be possible. Why? Because these different uses that we're talking about, that we perceive as being very different morally, are going to be almost identical technologically. And if we want to encode some sort of moral rules into the technical rules of the system, that are going to be automatically enforced by miners, it's not even clear how to do that.
And so hence my recommendation of separating out the technical anonymity properties of the system from the legal principles that we put on top of it in terms of how people use that currency. It's not a completely satisfactory solution, but it's perhaps the best way we have of trading off the good with the bad. I do want to point out that this is far from the first time that we're considering this dilemma. It's come up in the context of Tor, an anonymous communication network, and anonymous communication enables bad actions at least as much as anonymous moving of funds does. And so Tor has really had to grapple with this problem. In a single, very simple picture, Tor is a communication network that routes messages between a sender and a receiver through a network of nodes. But further, through some clever encryption, it ensures that as long as at least some of the nodes in that network are honest, the adversary is not going to be able to link the sender to the receiver. So, that's what Tor does. You can see how it can enable a lot of bad activities. Let's look at some activities, good and bad, that do happen on the Tor network. It's used, first of all, by normal people who want to protect themselves from being tracked online by marketers, or who want various other privacy protections when they're browsing websites. It's used by journalists and activists and dissidents, and so on. And so that's clearly an important use case. It's also used by law enforcement. Because if they want to do an electronic sting operation, then they want to be able to visit websites without revealing that their IP address is coming from a law enforcement block. And so clearly there are a lot of activities that we might approve of. But it's also used by botnets, for example, for spreading malware between nodes in the network. And, unfortunately, there is also child pornography in the network. So distinguishing between these uses at a technical level is essentially impossible.
And so Tor has grappled with this issue, and as a society we have grappled with it. And by and large we've concluded that it's better for the world that the technology exists than that it doesn't, and in fact one of the main funders of Tor is the US State Department. They're interested in it because Tor helps dissidents in other countries who might be fighting oppressive governments and so on. And in fact recently there was a news story about the FBI having a successful string of sting operations against people using Tor for child pornography. And so of course we have to remember there is a level above the technology that law enforcement can exploit, a variety of ways to get to people who are using these systems for bad purposes, and so it preserves a sense of balance. So let's switch gears a little bit once more. Let's look at the history of anonymous e-cash. Even though with Bitcoin these questions are quite controversial, and there are debates about how anonymous exactly Bitcoin is and what the options are, and so on, this is not the first time that we have thought about anonymous cryptocurrencies at a technical level. These efforts have quite a long history. In fact, all the way back in 1982, more than three decades ago, cryptographer David Chaum proposed something called blind signatures that helped him develop anonymous electronic cash. So what are blind signatures? Blind signatures are a two-party protocol. Two parties communicate with each other, and at the end of that, one party has produced a digital signature of some input without knowing what that input is. I know it sounds a little bit like magic, but I encourage you to look it up. It's not that sophisticated at a technical level. It's quite simple to understand if you work through the details, but since I'm not going to go through the details now, let's for the moment assume that this works by magic. So, assuming that we have blind signatures, how can that help us achieve an electronic cash protocol?
That's what David Chaum did, and as we go through this protocol, try to see if you can spot any other flaws with it, other than the anonymity properties or lack thereof. It's quite a simple protocol. I'm gonna show it to you in one slide. This is a protocol for anonymous e-cash through blind signatures. Imagine that there is a bank, and the bank stores various things in its database. In particular, it stores these two tables. The first table has a mapping of users to the balance that they have in their bank account. These balances don't refer to any sort of cryptographic currency. It's just a plain old number sitting in a database, just like your actual bank account or PayPal or something like that. In addition, it has another table called spent coins, and you'll see in a moment what this means. Let's say that a user now wants to withdraw an anonymous coin from the system. And now this is where the crypto magic is going to come in. So the user wishes to withdraw an anonymous coin of a standard denomination. Let's say a $1 denomination, and all of these values refer to dollars. So the first thing that the bank is going to do on receiving this request is deduct from this user's balance. It's gone down from 10 to 9 in this example. The next thing the user and the bank are going to do together is execute a two-party protocol, a blind signature protocol. Before that, the user picks a random serial number for the coin. That's what's being depicted here. This is a serial number for an anonymous coin, and the user was completely at liberty to pick that number. She did, and then they executed a protocol, at the end of which the user has received a signature on the serial number, but in such a way that the bank did not in fact learn the serial number. The bank had no idea what number it was signing. It just knew that it was some number that it signed. And now this signed number represents an anonymous token.
This is a token that the user can pass around to another user. So let's say that she wants to make a payment to another user. What she'll do is send to that user not only the signature, but also the plain text value of the token, that is, the serial number. And what the receiving user will do immediately is the following: she will immediately contact the bank and try to deposit this anonymous coin. Because without actually trying to deposit it, this red user here cannot be sure that this blue user is not trying to double spend. The blue user could be sending that same anonymous coin to 100 different users. How can they know that they are not being tricked into accepting a double-spent coin? The way they're sure is that when the red user receives the coin, they have to immediately contact the bank to verify if it's valid or not. And only if the coin turns out to be valid will the red user proceed to complete the rest of whatever transaction she was having with the blue user. So the bank now receives the message to deposit the coin, and note that it now finally gets the plain text serial number, as well as its own signature. The bank looks at the signature and verifies that it's a valid signature. And here's the key thing: it also verifies that the serial number that it received is not on the list of spent coins. That's how it knows that that's not a double-spend attempt. This is the legitimate first spend of a coin that the bank signed before. So it's a legitimate anonymous token. And since the bank didn't see the serial number the first time around, the bank does not know which user initially withdrew this anonymous coin. And that's the key anonymity property. In the period of time between the blue user withdrawing this coin and then, perhaps much later, sending it to the red user who immediately deposits the coin, many other pairs of users might have deposited and withdrawn coins. And the bank has no way to tell them apart.
Coming back to this part of the protocol, the bank verifies that this is a new serial number that it's seeing for the first time. It puts that serial number into its list of spent coins so that it cannot be spent anymore. It adds one dollar, or whatever the denomination is, to Red's account. And then it sends back a message saying this is okay. And now the red user has verified that they received a legitimate anonymous coin from the blue user and can proceed to complete the transaction. Right, so this is the entirety of a very simple anonymous electronic cash scheme, and the key property here is that the bank cannot link the two users. So I asked you to think about whether this has any drawbacks other than anonymity. And of course, the glaring thing that you probably noticed is that all of this depends upon trusting this bank. Look at this part of the system: this is simply the bank keeping numbers in its database of who owns how much money. So, this seems to be a trust model that's very, very different from the model that Bitcoin operates under. So, a lot of the traditional cryptography research on anonymous e-cash was in this model, where you were willing to trust a bank for many things, including keeping your money, but you were not willing to trust a bank with anonymity. You wanted to be sure the bank didn't know who was interacting with whom. Okay, it's an interesting model, it's a valid model. Many such schemes were developed under this model. In retrospect, it seems that the decentralization problem was a much more important one to solve than the anonymity problem in order for electronic cash to become successful. People were willing to accept a decentralized e-cash system with only sort of pseudonymity properties and not real anonymity, and then get to work on improving the anonymity, instead of starting from a fully provably anonymous electronic cash system that relied on a single central authority.
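For anyone who wants to see the withdraw and deposit steps without the magic, here is a minimal sketch of the scheme in Python. It is an illustration under loud assumptions, not Chaum's actual construction as deployed: it uses textbook RSA blinding on the raw serial number with a toy key (61 times 53) that is far too small and has no hashing or padding, and the names `Bank`, `withdraw_coin`, and `random_unit` are made up for this example. The user names "blue" and "red" follow the slide.

```python
import math
import secrets

# Toy RSA key for the bank. Assumption for illustration only: a real key
# would be thousands of bits, and the serial would be hashed and padded.
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the bank


class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)  # table 1: user -> balance in dollars
        self.spent = set()              # table 2: serial numbers already deposited

    def withdraw(self, user, blinded):
        """Deduct $1 and sign a blinded value without learning the serial number."""
        if self.balances[user] < 1:
            raise ValueError("insufficient balance")
        self.balances[user] -= 1
        return pow(blinded, D, N)

    def deposit(self, user, serial, signature):
        """Credit $1 if the signature verifies and the serial is not yet spent."""
        if pow(signature, E, N) != serial % N:
            return False                # not a coin this bank signed
        if serial in self.spent:
            return False                # double-spend attempt
        self.spent.add(serial)
        self.balances[user] += 1
        return True


def random_unit():
    """Pick a random value in [2, N) that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def withdraw_coin(bank, user):
    """User side of the withdrawal: blind the serial, get it signed, unblind."""
    serial = random_unit()                      # chosen freely by the user
    r = random_unit()                           # blinding factor, kept secret
    blinded = (serial * pow(r, E, N)) % N       # all the bank gets to see
    blind_sig = bank.withdraw(user, blinded)    # equals serial^D * r mod N
    signature = (blind_sig * pow(r, -1, N)) % N # unblind: serial^D mod N
    return serial, signature


bank = Bank({"blue": 10, "red": 0})
serial, sig = withdraw_coin(bank, "blue")       # blue's balance goes from 10 to 9
print(bank.deposit("red", serial, sig))         # first deposit is accepted
print(bank.deposit("red", serial, sig))         # same coin again is rejected
print(bank.balances)
```

Notice that during the withdrawal the bank only ever sees `blinded`, which the random factor `r` makes unrelated to `serial` from the bank's point of view, and during the deposit it sees `serial` but has nothing to match it against. With a modulus this small, two users could easily pick the same serial number by accident, which is one more reason this is only a sketch.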
But more generally, anonymization and decentralization, as we'll see repeatedly in this lecture, are in conflict with each other. There are at least a couple of reasons for this. One is that, as we saw in the last slide, often for anonymity you might want to rely on certain interactive protocols with the bank in order to do some blinding, which we saw in blind signatures. That's where you get anonymity from. But how are you gonna do that without a central bank to carry out that protocol with? It's not clear. But even if you got rid of this blinding and were willing to accept just pseudonymity instead of true anonymity, you still have the problem that in order to decentralize and still get security properties like resistance to double spending, often the way to go is to record and trace everything in a public ledger, as Bitcoin does. And so you might even further compromise your anonymity and privacy properties. So these are two big challenges to overcome. And as we'll see much later in this lecture, Zerocoin and Zerocash are cryptographic, anonymous, decentralized electronic cash schemes that have some similarities to the blind signature based protocol that I showed you earlier. But some of the giant challenges that they have to tackle involve these two limitations.