Let's talk a little bit about the history of privacy. The first known legal articulation of privacy came in 1604, when Edward Coke, then Attorney General of England, stating the English common law, famously said, "the house of every one is to him as his castle and fortress". This principle later found expression in things like the Fourth Amendment in the US, which protects against unreasonable search and seizure. In terms of privacy itself, in 1888, Thomas Cooley, a Professor right here at the University of Michigan and a constitutional scholar, wrote a treatise on the law of torts, and in it explicitly delineated a privacy right: the right to be let alone. In 1890, Samuel Warren and Louis Brandeis, two great jurists, wrote an article in the Harvard Law Review in which they argued for the importance of privacy. They observed that, just as people aren't required to say things they don't want to, if people don't want to give up certain things that are private to them, they should be able to retain the power to fix the limits of their publicity. Here we begin to see the modern definition of privacy take shape. In 1928, there was a famous case at the US Supreme Court, Olmstead versus the United States, about the legitimacy of wiretaps. The Court held that a wiretap was not an unreasonable search and therefore required no permission, unlike wiretaps today. Louis Brandeis, who was by then a Justice of the Supreme Court, famously wrote in his dissent that ways may someday be developed by which the government, without removing papers from secret drawers, can reproduce them in court, and that this kind of access to private material, similar to what one would get by searching a person's home, should only be possible with a warrant. The law allowed warrantless wiretaps at the time, and Brandeis was in the minority when he expressed this opinion. 
In 1960, legal scholar William Prosser wrote an article in the California Law Review defining four types of privacy torts, and this article has been much quoted as the basis for defining what privacy means in legal terms. In 1967, almost 40 years after the Olmstead versus United States decision, John Marshall Harlan, writing an opinion concurring with the majority in Katz versus the United States, defined the test for privacy: if an individual has an expectation of privacy, and society is prepared to recognize this expectation as reasonable, then there is a right to privacy in that circumstance. If either element is absent, if the individual doesn't expect privacy, or society isn't prepared to recognize such an expectation as reasonable, then there isn't a right to privacy. The specific situation in this case involved somebody making a call from a telephone booth while somebody else eavesdropped with a listening device just outside. The question was whether a person who closes the phone booth's door can expect privacy, and can expect that nobody outside, and no microphone outside, will try to listen to the phone call without a warrant. This view of privacy, in the spirit of Louis Brandeis' dissent, was later applied as controlling law by the Court in Smith versus Maryland in 1979, and today we know that to establish a wiretap, you need to get a warrant. In fact, in 1974, the US Congress passed the Privacy Act, which embodies the Fair Information Practice Principles. These principles require that you limit collection of data, identify the purpose of collection, limit use to the specified purpose, be open about what is collected, give people notice about what is being collected, be accountable, keep the data secure, and so on. This framework is still the basis on which data collection is governed in the United States. 
In 2006, Daniel Solove, a Professor at George Washington University, wrote an article in the University of Pennsylvania Law Review defining a taxonomy of privacy and the various ways in which privacy can be eroded. In particular, he defined four groups of harmful activities. The first three, information collection, information processing, and information dissemination, are of greatest concern to us in the context of the data science paradigm we are discussing in this course. The fourth, invasion, covers things like somebody observing you where you don't expect to be observed, say by bugging your apartment, or laws intruding on private actions; these are, I think, less relevant here. One legal question that one needs to think about a little is that if you voluntarily disclose something to others, it receives much less protection than something you keep completely to yourself. For instance, if you dial a phone number, the phone company has to know what number you dialed, so you are deemed to have voluntarily disclosed to the phone company what number you're dialing. Having disclosed this information, this metadata, you get much less protection for it than for the content of your conversation, which you did not disclose to the phone company and did not intend to disclose. This sort of metadata collection quickly becomes problematic, and there's a real question of what is voluntary and what is not. These are some of the issues we're going to grapple with further as we go through this course. To take a break from the legal arguments, let's consider, culturally, what privacy feels like. In small towns, there was little privacy: everybody in town was always poking their nose into everybody else's business, and, famously, you knew what everybody was up to. Big cities provide anonymity. 
Nobody cares, nobody knows; you can do what you want and not have a hundred people ask you about it. And so there are some who believe that information technology might bring us back to the halcyon days of the small town, where people feel cared for, and less privacy is part of that bargain. Indeed, there are changing attitudes adapting to a new privacy realm. My kids freely post facts about their activities online that I would never dream of posting. To me, the movie I saw last night is something private that I talk about with my friends but wouldn't post online. To my kids, that's a perfectly reasonable thing to do, so obviously fine that it doesn't merit a second thought. In other words, their privacy boundaries are different from mine, but different boundaries don't mean no boundaries. Even in the Internet generation, even people who grew up on the web limit what they want to share, what they're willing to share, what they're comfortable sharing. And I think this is with good reason. In the past, if you were in a small town and you screwed up, you could get a fresh start by moving to a new place. There was a significant cost to moving: you wouldn't have friends, you'd have to start over, you wouldn't have a community; but at least there was that option. There was also the option of staying on and waiting until the past faded; over time, you could rebuild your reputation. The problem with big data is that it's universal, and it never forgets anything. You can't move out of town to the next town, and you can't wait it out. The other big difference is that in a small town, information is mostly symmetric: I know as much about you as you know about me. The fact that we know things about each other keeps us fair and honest, and makes it easier to practice the adage, do unto others as you would have them do unto you. Data science can result in major asymmetries. 
And when we have these asymmetries, you're going to have more difficulty empathizing with players whose situations are very different from yours. Speaking of never forgetting anything, there is actually something called the Wayback Machine, which archives pages on the web. This archive includes almost everything accessible on the web, everything that isn't password protected. The intention is to retain it forever, because these public web pages are, for the most part, expressions of our society and our culture; they are part of our heritage, things that historians and sociologists, among others, will care deeply about in the future. A side effect, though, is that if an unflattering page is written about you, it will survive forever in an archive, even if the page itself has since been taken down. And by the way, you really shouldn't unfairly blame the Wayback Machine; it is just one organized archive, with a significant cultural purpose. The real problem is that once a page is up, many copies get made, even if the page is later taken down. You have no idea how many copies have been made or where they are stored, so you really have no way of deleting something once it's been published on the web. It's like a secret that has begun to spread: you can't take it back. In an attempt to deal with this, some countries now recognize a right to be forgotten. The idea reflects societal expectations of redemption: laws are often written so that a person's record is cleared after some years. For example, in most US states, an accident on your driving record gets expunged after three years, so things you did wrong many years ago don't continue to haunt you forever. The same idea applies to more significant harms. 
So if you had a conviction for certain kinds of offenses, even if you served time in prison, in many cases there are rules about how long that stays on your record and after how much time it gets expunged from your official record. But while this is all fine with respect to the official record, if this information happens to be on the web, how is it ever to be removed? Given what we just saw on the previous slide, you really can't remove it from the web; it's there, and it's there forever. So the way the right to be forgotten is actually implemented is not by making things completely forgotten, but largely forgotten, by making them hard to find. One works with intermediaries, like the search engines most people use to find things, to make sure that the results these engines produce don't include things that have been expunged, things that are meant to be forgotten. That way, even if somebody happened to have made a copy of a web page that said something unflattering about you, if it is no longer something that should be shown, it's something I won't find when I Google you.