Having understood what privacy is, let's talk a little bit about how data science today impacts privacy. There are three main drivers of privacy violation. There's surveillance. And surveillance could be government agencies doing this for national security, or private enterprises, say with security cameras and logs of various sorts. There are also other private actors, like private investigators and so on. There is advertising: if a company can show a focused, personalized, relevant ad, everyone wins. But if the company shows you ads that annoy you, then it's a loss. And then there's the third category, which I'm going to call introduction. Really, this is finding out something about a person, whether this person be a prospective employee, a prospective borrower, or a prospective date. You don't know somebody. You want to figure out what kind of person they are. You want to find out what electronic trail they have. And again, this might be done by companies. It's also done by individuals. People Google their blind dates.
So, what are the main sources of information that we deal with? There's data collected by merchants and service providers. This activity tracking goes on both on and off the web. There are sensors all around us, in personal devices and in the infrastructure, and there might even be other people's devices, like a friend's camera or something like this. Now these sources of information can collect vast amounts of data, and these collected data are the things that get used to do the privacy harms, or potential harms, that we were talking about on the previous slide.
To understand some of the tradeoffs that we need to discuss, let's talk about open government. So there are many benefits to the government being open. And indeed, many government entities publish detailed records. This is good. This helps citizens keep track of what the government is doing, it prevents abuse by government officers, and it makes sure that they're doing the right job and serving us.
And, in the US, there is something called the Freedom of Information Act, or FOIA, which provides citizens access to many unpublished records. There are a few exceptions that FOIA allows, one of which is where there would be unreasonable harm to an individual's privacy. But these exceptions are fairly narrow and limited. Think about the fact that most government records concern citizens. One would expect that most government records, when released, impact the privacy of some citizens. And so you really don't want to have a very broad definition of what privacy means, and what harm to a citizen's expectation of privacy means, if you're actually going to get the benefits of FOIA. And so you have to balance privacy concerns against the desirability of openness. And I think that this is the sort of tradeoff that we will have to make repeatedly as we deal with the issues of data in today's world.
To think about one particular type of government data: driver's license and voter registration databases are maintained by government agencies, and they are often semi-public. These documents may have associated with them important facts, things like age, sex, and address for the individuals involved. There may be additional identifying information. And if these databases are available for the public to view, then these databases can be used, for example, by commercial interests. And indeed they often are. The value of data commercially is very high. And there's actually a business for companies that are called data brokers. Data brokers aggregate and link information from multiple sources to create more complete information products. So they may pick up little pieces of information about you, piece them all together, and now they have a valuable profile of you that they can sell to people who wish to, for instance, sell to you, or who may consider hiring you or giving you a loan or whatever.
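To make the linking step concrete, here is a minimal sketch in Python of how two separate sources might be joined on a few shared fields. Everything in it is an assumption for illustration: the records, the field names, the choice of zip code, birth date, and sex as the matching key, and the `link_records` helper are all hypothetical, and this is not a description of how any actual data broker works.

```python
# Hypothetical data: every name, field, and value below is invented for illustration.
# A semi-public list that carries names alongside basic demographic facts.
voter_roll = [
    {"name": "A. Example", "zip": "48104", "birth_date": "1980-03-14", "sex": "F"},
    {"name": "B. Sample", "zip": "48105", "birth_date": "1975-11-02", "sex": "M"},
]

# A second source with no names, but with the same demographic facts.
purchases = [
    {"zip": "48104", "birth_date": "1980-03-14", "sex": "F", "item": "allergy medication"},
    {"zip": "48109", "birth_date": "1990-07-30", "sex": "M", "item": "running shoes"},
]

# Assumed matching key: fields that appear in both sources.
KEY_FIELDS = ("zip", "birth_date", "sex")


def link_records(identified, anonymous, key_fields=KEY_FIELDS):
    """Attach a name to each anonymous record whose key fields match
    exactly one record in the identified source."""
    # Index the identified records by their key fields.
    index = {}
    for rec in identified:
        key = tuple(rec[f] for f in key_fields)
        index.setdefault(key, []).append(rec)

    linked = []
    for rec in anonymous:
        key = tuple(rec[f] for f in key_fields)
        matches = index.get(key, [])
        # Only link when the match is unambiguous.
        if len(matches) == 1:
            linked.append({**rec, "name": matches[0]["name"]})
    return linked


profiles = link_records(voter_roll, purchases)
for profile in profiles:
    print(profile["name"], "->", profile["item"])
```

Neither source on its own ties a name to a purchase. The combination does, which is the point of the sketch: each release looks harmless in isolation.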
And a problem that we have here is that many people don't realize what can be learned about them by linking multiple sources. We tend to focus, as individuals, on the information release that we have with respect to one single individual party as part of a single transaction, and not think of that in context with what we have also released at the same time to other parties with regard to other aspects of our lives.
Another issue with regard to privacy is what I call waste data collection. So for example, you go to a bar, and they want to see proof of age. You show them your driver's license, and this is all standard. Now let's say that this bar is going to scan the driver's license, not have a bouncer squint at it. They scan the driver's license as just a part of their confirming proof of age. This is not something I would object to, and this is not something that there would be any discussion between you and the bar about. Now if they scan it, they've got a computer on which they've got this information and, hey, they choose to record it. And so now they have a record of your name, address, and date of birth, information that's on your driver's license, that they can use, for instance, for marketing purposes the next day.
Now, a concept that we often come across is metadata. Metadata is data about the data. And metadata often has lower privacy protection than data and is often distinguished from the data content. So for example, in a phone call, metadata can include information about the caller or the callee, the time and date of the call, the duration of the call, etc. But it would exclude the actual content of the call itself, the conversation that took place. But metadata may carry much information. For a cell phone call, location information might be metadata. Knowing the location doesn't reveal the content of the call, and the user has to reveal their location to the cell phone company to get service. However, location tracking can reveal a great deal of information about the person.
If the cell phone company knows where you are every Sunday morning, they know that you're in a particular house of worship. They know which religious denomination you belong to, and they also know that you're a religious person. And that may become the basis of some discrimination down the road.
The other thing that people often do is to underestimate the power of analysis. For example, a smart water meter at your house is recording water usage on a continuous basis. It can recognize the signatures of the water use. So it knows, or can easily determine, every time you flush the toilet or take a shower or wash clothes. And if this smart water meter is communicating with your utility company, your utility company knows every time you actually flush the toilet in your house. That is something you might consider an invasion of privacy, but it's all a part of water conservation and having a better, smarter water usage system in the community. Taking this to a further extreme, there have actually been attacks on encrypted data based on observing the power consumed by chips performing encryption. The idea here is that you're encrypting data into zeroes and ones, and the amount of power that flows through is just a little bit different, and knowing what those momentary differences are can greatly reduce the number of combinations of keys that one has to try when one is making an attack on the encrypted data.
So where does all this leave us? If you look at the traditional social norms, they dealt with privacy by trust. You tell me private things because you trust me. And what does it mean when you say you trust me? You trust me not to use what you told me in ways that you wouldn't approve. And this is somewhat nebulous, and this is why sometimes we have misunderstandings in terms of our personal dealings. For the most part, we know who we trust, and we know we trust them with good reason, and so we know that they're going to do the right things with the things that we share with them.
In the modern world, we are sharing data with systems that we really don't have a basis to trust. And certainly doing this in terms of legal agreements is what it boils down to when you don't have trust. But this now means that you have to have privacy by design. We have too many players, the players don't trust one another, and yet you must have data sharing. And so this data sharing is now contractual, and we have to make sure that these contracts are developed in ways that appropriately respect and manage privacy.
So to conclude, privacy is a basic human need, even for people who have nothing to hide, and it is easily eroded by thoughtless actions. Of course, it is also eroded by intention in some cases. We've now begun to have conversations around privacy. And we have many stakeholders with different interests. But we don't yet have consensus societally on where the lines should be drawn. Hopefully, we'll get there soon.