Welcome back. We're now embarking on the second module of this course, and the first one that really considers technical topics. The topic of this module is the Internet, particularly the privacy and security issues that exist in conjunction with it. Why is this the first topic in a course on Ethical Issues in Data Science? Well, it's clear that the Internet is the foundation of so much of what we do these days in society: to communicate, to conduct commerce, to read, to learn, to share our lives with other people, to entertain ourselves, and so much more. What may be a little less clear, but I think we all understand, is that a lot of that involves large amounts of data, and in many cases large numbers of people as well, and involves techniques from data science. Conversely, most if not all of the issues in privacy and security in the area of data science are rooted in the Internet. Clearly, privacy and security are at their heart ethical issues, so it makes a lot of sense that this is the first topic that we'll be talking about.
This module will have three parts, three different lessons. The first one, this one, will give some brief background on the Internet and the main capabilities that have been built on top of it. A lot of this is historical; the history is still fairly short, and it is pretty relevant to understanding what we're dealing with today. The second lesson will be on privacy. This is the longest of the three lessons. We'll go through a couple of important examples of ways that privacy can be compromised on the Internet, in ways that can be pretty subtle to a lot of the people who use it. We'll then conclude that lesson with a legal side, the concept of the right to be forgotten, which is an example of where legal issues are looked at quite differently in different parts of the world. The third lesson will be a shorter one on security. It's really hard to have many weeks go by without hearing about some issue in Internet security.
To do that justice, one has to do a whole course, and there is a whole course on security in this Data Science program. But here, we will concentrate on giving some sense of, particularly, the ethical issues that you as data scientists need to face in terms of Internet security.
First, as we do in every lesson, back to our virtual backgrounds. As I record this, I don't have any way of knowing how many of you were able to identify the virtual background in the last lesson. It's a place called Kanyakumari, which is at the very southern tip of India. Kanyakumari is considered to be the place where three major bodies of water come together: the Bay of Bengal to the east, the Arabian Sea to the west, and the Indian Ocean to the south. I hear that ocean scientists say that technically they don't exactly meet at that place, but close enough. It's also a pilgrimage site. In fact, I could have shared some other photos where you can see lots of people on the beach and lots of brightly colored saris; people come to it at sunrise as part of that pilgrimage. Interestingly, and as a final comment, that statue is not ancient. That statue was put there on the first day of the new millennium, the first day of the year 2000, but the character is ancient. It's a statue of the ancient Tamil philosopher Thiruvalluvar (Tamil Nadu is the state in southern India), who wrote a very important work on responsibilities and morality, the Thirukkural. Although I picked that picture basically just because it is an interesting background, it actually does have a tie-in to the topic of this course, morality and ethics, as well.
I won't ask you to guess where this next virtual background comes from. I think you could assume correctly that it comes from China, but it might be pretty hard to know exactly where.
This is a photo from a trip that I made in the year 2016 as ACM CEO to what's called the China National Computing Conference, which that year was being held in the city of Taiyuan (not Taiwan, but T-A-I-Y-U-A-N in Roman letters), which is a city of about 4 million people that's a few hundred miles southwest of Beijing. I believe this is an old palace courtyard which is now a tourist attraction. While I chose this photo basically for its appearance, Taiyuan will actually come up very briefly during the middle of our discussion on privacy in the next lesson.
Now we'll start with our discussion of the Internet. As I said, some knowledge of the history of the Internet is important to our discussions of Internet privacy and security. I may be the only person either talking into or listening to this recording who remembers life way before the Internet, but we have to remember that it really only started impacting our lives quite late in the 20th century. Let's start with a little fun question. Why was the Internet invented? I'm putting three possible answers up on your screen right now. Number 1, as a way of selling books. Number 2, as a way of communicating both within the military and between scientists. Number 3, so that people could access and post news. Please click on the pause-and-reflect response that has come up, write down what you think the answer is, and then come back to this lesson.
The answer, as I suspect many of you knew, is that the Internet was created initially to enable military and scientific communication. Those other two answers actually play important roles in the subsequent history of the Internet, so that's why I put them there, and we'll get to that a little bit later in this lesson. But the development of the Internet started in the late 1960s, quite a bit before people actually started using it in a common way. It started with something called the ARPANET of the US Department of Defense. ARPA stood for Advanced Research Projects Agency.
That agency still exists; it's now called DARPA, with the letter D, standing for Defense, at the start. The Internet, as I said, was started basically for military communications. The thing that really got it moving was the development of the basic communication protocol called TCP/IP. I think many of you, especially computer scientists, have heard those initials many times. We rarely hear what they actually stand for: Transmission Control Protocol and Internet Protocol. TCP/IP specifies the basic ways in which information is grouped into packets and then routed around the Internet in a way that can be quite dynamic and decentralized. This was what really enabled the Internet to get going once TCP/IP was adopted by the ARPANET in 1983. By the way, for some personal background about the development of the Internet, TCP/IP was developed by two people named Bob Kahn and Vint Cerf. They tend to be credited as the creators of the Internet, and I've had the pleasure of interacting quite a bit with each of them over the years. They are very talented people, both born roughly 80 years before I'm recording this in the year 2020, and both still very active in technical matters.
What did that first Internet allow people to do? It allowed two parties, both of whom knew that the other one was part of the ARPANET, to communicate, either to share data files or to share messages. A crucial thing you have to realize is that in that framework, there was absolutely no consideration of privacy and security, because this was pretty much a closed, honorable club. Everybody knew who else was going to be on it, and nobody was going to be on it for any malicious reasons. Again, there was no consideration of privacy or security built into Internet 1.0. Well, if you know how software products go, you'd probably say that's fine, we'll build it in when we get to Internet 2.0. So what's the problem? There has never been an Internet 2.0.
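To make the idea of packets a bit more concrete, here is a minimal sketch in Python. It is a toy illustration only, not real TCP/IP: real packets carry headers with addresses, and TCP adds acknowledgements and retransmission. The function names `to_packets` and `reassemble`, the 8-byte packet size, and the sample message are all assumptions made up for this example. It shows a message being grouped into numbered packets, arriving out of order, and being put back together, and it also shows that nothing in this basic scheme hides the contents from a machine along the way.

```python
import random


def to_packets(message: bytes, size: int = 8):
    """Split a message into numbered packets of at most `size` bytes (hypothetical helper)."""
    return [
        {"seq": i, "payload": message[start:start + size]}
        for i, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Rebuild the message by ordering packets on their sequence number (hypothetical helper)."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)


message = b"Meet at the lab at noon"
packets = to_packets(message)

# Packets can take different routes, so they may arrive in any order.
random.seed(0)
in_transit = packets[:]
random.shuffle(in_transit)

# Nothing here is encrypted: any machine that handles a packet
# can read its payload as plain bytes.
overheard = [p["payload"] for p in in_transit]

# The receiver uses the sequence numbers to restore the original message.
received = reassemble(in_transit)
```

The point of the sketch is the last two steps: reassembly works regardless of arrival order, yet every payload is readable in transit, which is why privacy and security had to be added later on top of this foundation.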
Things moved so quickly that the initial design, which was built with no thought of privacy and security, is still the foundation of what we're doing, and everything that we have to do about privacy and security is in some sense getting retrofitted on top of that. At one point, a decade or two ago, the National Science Foundation started a major project that in some sense was an effort to design Internet 2.0, but practically that just wasn't going to happen. There's no way that you can undo everything and start with a new foundation.
What's happened since then? Since that rather focused beginning, the Internet has become something that's used throughout the world in at least three fundamental ways: to find information, to conduct commerce, and to provide information. Let's look at a history of the major capabilities of the Internet that have led us to that much broader use.
The first, and I'd say most fundamental, part that we'll talk about has been the development of web browsers, and with them the whole concept of the World Wide Web. That's really what opened up what we think of as the Internet today. On the screen now you'll see the progression of when they were developed. The first one, called Mosaic, which was really an academic product, then transformed into Netscape in 1994. Then came Internet Explorer, which arrived already in 1995, Mozilla in 1999, Safari in 2003, Firefox in 2004, and Chrome, which is still pretty recent; it debuted in 2008. What have web browsers allowed? Over time, they've allowed people to have access to huge amounts of information at their fingertips, information that formerly either was not available at all or was available only through indirect and very time-consuming means. This includes news, weather, maps, and the information we used to get in encyclopedias. Initially, this was all print information, but as bandwidth increased, it now commonly includes audio and video information as well.
The second aspect, built on the capability of web browsers, is the ability to conduct commerce online. That started with Amazon, and Amazon, as you may or may not know, started as an online bookseller. That's why I had that as one of the answers to that little quiz, although Amazon quickly branched out to become a much bigger seller. Amazon was joined very quickly, one year later in 1995, by eBay, and then pretty soon lots of retailers decided to establish e-commerce sites. At the beginning that was considered pretty avant-garde; of course, not that many years later it became a necessity for any retailer.
Then, perhaps the biggest change of all, and I'll say why I think this is the biggest change of all in a moment, is the advent of social networks. These started with LinkedIn in 2002. Again, these are appearing on your screen now: Facebook in 2004, Twitter in 2006, Instagram in 2010, Pinterest also in 2010, Snapchat in 2011. One of the most recent is TikTok in 2017, and of course there are plenty more besides that. I'm not sure what category you'd put Wikipedia in, but it's also a crowd-sourced repository, and Wikipedia started a little earlier than the social networks, barely earlier, in 2001. What's crucial about these? It's a very simple fact: anyone can be an author, and anyone can easily be an author. Now, web browsers already contributed that capability, but at the outset, at least, you had to be pretty technical to be an author, and you might have had to be connected to people who knew how not only to create webpages but to post those webpages, whereas in social networks you can very directly be an author. When you think about it, that's a profound change, not just technical but societal. Ever since the advent of written language, producing information was something that was very privileged and very tightly controlled.
Initially by the clergy, and then by people who published books and magazines and newspapers, and you had to submit something to get it published if you wanted to do that. Now, anybody can publish, and that is sometimes considered one of the real revolutions in human history.
Getting back to what we are going to be discussing for the remainder of this module, let me just reiterate the fact that none of this was built on a platform that was designed with privacy and security in mind. So there have always been significant liabilities of the Internet in terms of privacy and security, and that's what we'll be talking about for the remainder of this module.
Let me finally say that basically all of the lessons for the remainder of this course will be built upon a number of short readings, generally from mass media such as magazines and newspapers, and sometimes videos, things like TED Talks. So you can choose one of two things: you can either read those in advance, since they're listed, or, because they're brief enough, I'll also give you the opportunity to just pause and read or view them during the lesson, if that's the way you prefer to do this.