Hi, and welcome to Ethical Issues in Data Science. I'm Bobby Schnabel and I'll be the instructor for this course. It's great to have you embarking on it with us. Let me get a couple of things out of the way first. The main one is, this is not Hollywood or Bollywood or any other wood. I'm recording this on a webcam. I'm not a professional at recording things and there are definitely going to be imperfections. I'll be looking down, like I just did there, from time to time to look at my notes, and I'm sure I'll stumble from time to time, so hopefully it'll be good. But if I say that at the very beginning, maybe I won't have to feel so bad every time I stumble a little bit. The second thing, more fun, is that as you see, there'll be a virtual background, a different one for every one of the lessons in this course. They are all photos that I've taken. Sometimes I'll tell you what they are, as I will for this one in a moment; sometimes I won't, and we can have some class discussion where you guess where in the world that picture was taken. Now, they'll be taken from lots of places all over the world. This one is close to home; it's taken from the CASE building on the CU Boulder campus, looking at the Flatirons, which are the distinguishing feature of Boulder, and right in back of me, if you look that way, is a statue of Ralphie, the buffalo mascot at CU Boulder. Now let's go on with this class. There are not going to be a lot of slides in this course. It's not really one that's conducive to needing a lot of that, but one slide that there will always be is one like you see right now that says what will be covered in that lesson. In this particular lesson, we'll be covering six things. First, I'll briefly introduce myself. Next, I'll motivate why it's important to consider ethical issues in data science. I think everybody by now agrees that it's a very important part of this field. Third, I'll describe the course goals. Fourth, I'll discuss the topics that we'll cover in this course.
The fifth one is to say what's expected of you in the course and how the course will work overall. Then finally, I'll talk about the work that I'd like you to do before the next class. The first of those is to give you a little bit of background on myself. I'm not going to make this too long. You can find out plenty about me online; I'm not a secret. The very short summary is that I've had, by now, a too-lengthy career in computer science, most of it but not all of it at the University of Colorado, and a lot of it in various leadership roles. I've been a department chair, I've been a dean, and I was the campus's first Chief Information Officer for nine years. I should have said that the dean part was actually not at Colorado; it was at Indiana University. The rest were at Colorado. Those are all experiences that I'll draw upon at various points during this course. My involvement in ethics in computing is more recent. To be honest, most everybody's involvement in ethics in computing is more recent, and we'll be talking about why that is just a little bit further into this lesson. But I've done a number of things. I'm one of the co-founders of the ACM/AAAI Conference on AI, Ethics, and Society. I chair an ACM task force on ethics in computing, and the most fun part is that I created and teach the undergraduate ethics in computing course here at CU Boulder. A little bit removed from that, I'm also the founder of an institute at CU called ATLAS, the Alliance for Technology, Learning, and Society, which is a little bit broader than ethics, but it's also about how computing interfaces with a lot of societal issues, so that's been something that has interested me for a while. The second-to-final thing to say about my background is that diversity is going to come up in this course in a number of ways. There are a lot of issues about gender and race that are a very important part of how ethics is part of data science, and that's been a big part of my career for a long time.
I think the biggest thing to mention is that I'm one of the co-founders of something called the National Center for Women and Information Technology. I've also done a lot of work with historically Black universities, so I have a long interest in diversity issues in computing. A final thing about my own background: we're going to be talking about codes of professional ethics in this course. The main one that we'll look at is the one from ACM, the main computing professional society, which formally means the Association for Computing Machinery. So for truth in advertising, I should let you know that I was the CEO of ACM for a couple of years not that long ago, basically 2016 and 2017. Why are ethical issues an important part of studying data science? Or to say it another way, why do we have a course on ethical issues in data science? Also, why are we only recently really hearing a lot about ethical issues in data science and, for that matter, in computing? I'm going to address that with a little bit of historical context about computer science. I said computer science, not data science. I think that's an easier way to bring a broad historical context to this conversation, and then I'll bring that back to data science, which overlaps very strongly with what I'll say about computer science. Having just said that I'm not going to use slides very much in this course, I'm going to rely on slides for this little description. Actually, just basically one slide that I'll build part by part as I discuss things. If we go back to the origins of computer science, roughly the 1950s and 1960s, at that point computers were these big monolithic pieces of hardware that sat in remote places and only got touched by a few people. The things that made them work on the software side were really just the programming languages and the operating systems, and at that same point in history, people started developing a theory of computer science that is foundational to the field.
There really wasn't a lot of ethics at that point. Computers were used basically for two things, for large scientific computations and for financial computations such as keeping your bank records, and there was no personal aspect to it. This is far before personal computers, far before the Internet. The next stage in the evolution of computing, as embodied in this second ring, was the start of a lot of the fields or sub-fields of computer science that we study and hear about to this day. Things like databases, graphics, software engineering, security. These are all actually areas that these days have a lot to do with ethics in computing and data science. But in the era when these were being started, maybe the 1970s, there still weren't computers in the hands of people. There wasn't an Internet, and so we really still weren't thinking about ethical issues that much. That all changed with the advent of personal computing, which came first, really starting in the '70s, but probably took another decade to have a big impact. Then came the Internet, which started a decade later in the '80s and, again, took a while until it caught on. What were the major changes that personal computing and the Internet made? There were two profound ones, at a very high level. First, it changed who the direct consumers of computing were. It went from big institutions, with people benefiting only indirectly, to people directly seeing the outputs of computing. The second, and probably more important still, is who the producers of computing content were. As we know now with social media, every single one of us is or can be a producer of computing content. This has expanded the potential for ethical issues enormously. Along with that, as shown in this next ring of the diagram, the spectrum of areas that computing could impact, and did impact, increased enormously. As I said, in the early days you were really talking about scientific calculations and back office business and financial calculations.
We've progressed into a world where computing became an integral part of our media. In fact, by now it's the most important part of how we get information, of our entertainment, and of how we get information about things like health care, some of the most important and fundamental things that people worry about. These are the places where, all of a sudden, the potential for ethical issues and the need for us all to understand these ethical issues became far more important. These are some of the things that we will talk about in this course. In particular, we will talk about media issues and some issues related to health care. With the expansion of the applications of computing that I just mentioned also came an expansion of the areas of social science and humanities with which computing intersects, and that's shown by the final ring of this diagram. To some extent, computing had long intersected a number of these areas; for instance, sociology and psychology had always intersected discussions of AI. But as we got into more and more applications of computing touching more and more parts of our lives, the importance of ethics, which is at the top of that last ring, the importance of legal issues, and the other areas that are shown there became far greater, and computing started becoming this field that is totally connected to most of these other areas of science and social science. That's been a discussion about how computing evolved from something that was in a back room and didn't have much to do with ethics to something that touches every one of us, that every one of us can give input to, and that has all sorts of potential for ethical issues. I haven't directly used the words data science, perhaps, so as this final overlay shows, data science is a very similar picture.
Of course, data science is a broader field. It involves statistics and applied mathematics, it involves a myriad of applications which are already in this diagram, but it involves most, not quite all, but most of these areas of computing as well. If you think about the issues of ethics in data science, they're very similar to what we talked about. There's a little bit of an emphasis on the data itself; I probably should have said a lot of emphasis on the data itself, which means that you're going to think about both the ethics and the quality of what goes into the calculation, the data, and then the interpretation and the value of what comes out of the calculations. All of this will become much more real when we talk about case studies, which we will do throughout this course, and I'm going to just talk in very broad terms about one right now so that I don't leave this quite so abstract. I asked myself, what is the area where I read the most in the media about ethical issues that are related to data science? I think the answer is pretty clear that it's facial recognition these days. I suspect there's not a week that goes by that I don't see some new article about ethical issues related to facial recognition, and sometimes it's more frequent than that. What makes it more interesting and more broad is that, like most of the things that we'll be talking about in this course, it's not one-sided. There's good, there's bad, and there's we're not sure. To simply illustrate that, one of the uses that I think most of us would see as more on the good side is the use of facial recognition to let us into our own computing devices or, as time goes on, maybe for door entry for security. These would be enhanced security protocols, but like everything else, it's not one-sided; it may be more good than bad. One that society is starting to think of as more on the bad side is what we're seeing now of the use of facial recognition in criminal justice.
Now, there are a lot of good things that have come from that, but there are a lot of problems, and the problems have largely been that facial recognition is far less accurate for non-White groups in the United States than for Caucasians. That's something that we're going to talk about in this course, and we'll really get an understanding of why that is and why it's a data science issue. One that I think is probably in the not sure category would be the use of facial recognition in surveillance. We've all seen cases where that really has helped our security, but there are huge privacy concerns, and many of the laws about facial recognition are predicated on people's concerns about that privacy. I know from when I've talked to classes before that there are very split opinions among people about whether they think this is good or bad. That is just one example, and one that we're going to come back to later in this course in a whole class about a real example of ethical issues in data science. Let me finally point out that, like most of them, it is about issues at both ends. It's about issues of the data itself, what data is fed into the systems that are used to make conclusions, which very often are artificial intelligence or machine learning systems, though not always, and then it's also about how the output of those systems is interpreted. That is exactly the type of thing that we'll be talking about throughout this course.