Hi, and welcome to the fourth module of this course on algorithmic bias. If you mentioned the title of this course, Ethical Issues in Data Science, to many people, whether they're people who know about data science or don't know so much about it, the topic that they'd be most likely to associate with those words is algorithmic bias. As you know, algorithmic bias is a term that refers to the sorts of biases that can result from any algorithmic decision-making, ranging from the simplest to the most complex. These days, it most often refers to algorithms that are generated via a form of machine learning, but it doesn't have to. This is a topic that is heavily covered in the popular media, in fact, so much so that I could easily have based an entire course on media articles about algorithmic bias. Indeed, one of the more challenging parts of developing this module was deciding which articles to leave out and which few to include. A general comment about algorithmic bias is that most of it, including most of what you see in the media, is about algorithms that either actually or potentially treat different groups of people unfairly based on one or more of many characteristics: gender, ethnicity, race, socioeconomic status, age, skin color, the accent of their speech, or many other things. We're going to see instances of a number of those as we go through this module. Like many of the topics that we talk about in this course, in addition to the technical issues there are legal and policy issues. Again, as we've seen already, the way those are dealt with can differ between different regions of the world and between different nations, and in this topic we'll see that even individual cities are getting into creating their own laws and policies about algorithmic bias.
As a final introductory comment, as with other issues, and maybe even more so for this topic, you'll see that this is one where it is not only important for you as data scientists to understand what you need to do technically, but also, as we talked about in the last unit on professional ethics, to help educate the public to understand this issue. This module is divided into three lessons. The first is a general overview of algorithmic bias. It includes some background explaining the concept and how it arises, some initial exposure to examples of algorithmic bias, and some initial discussion of the debates about whether algorithms are any worse at this than humans, or perhaps better at this than humans. It also includes a first exposure to government policy discussions related to algorithmic bias. The second lesson will focus on what I already said is the main example of algorithmic bias, and that is bias related to gender and race. It will focus on two particular areas where there's been a lot of discussion and publicity about gender and race issues connected to algorithmic bias: hiring and criminal justice. We'll also continue to see those debates that I just alluded to about whether algorithms are better or worse than humans in this regard, and also talk about policy issues. In the third lesson, we'll discuss what is probably the single most prominent example of algorithmic bias that gets discussed in the media, and that is facial recognition. There are lots of issues here, again ranging from the technical issues to the policy issues, and in this case also to the overarching issue of how facial recognition is leading to increased surveillance capabilities in our society and whether that is a plus, a minus, or both. The work that you'll have in conjunction with this module, beyond a reasonable amount of reading and listening, will include a second fairly brief case study: a report on a media article that you select related to algorithmic bias.
At the end of the third lesson there will be what will hopefully be a fun group discussion where you debate the pros and cons of issues related to facial recognition. Before we proceed, as always, a word about the virtual background. The last one was from western Colorado, and I already told you where it was from, so nothing more to say about that. The one today is clearly not Colorado; in fact, I think you could make a pretty good list of reasons that make it quite distinct from Colorado. I think it may be easy to guess the nation it is from, and probably entirely impossible to guess the exact location of the place it's from. I'll talk about that in the next lesson, and if anybody's able to guess it legitimately, I'll find a way to give you a prize. There is a little bit of a connection between at least the general location of this image and the content of this lesson. Now let's begin with a little background on algorithmic bias and how it arises. I'm assuming that many of you are at least somewhat familiar with this topic already, so I'm going to keep these remarks relatively brief and include additional reading for people who would like that. There are going to be two parts to this background discussion. In the first, I'll briefly discuss the article from Vox Recode that I've put in the readings, which gives background about algorithmic bias. Then I'm going to ask you, if you haven't already, to take the time to listen to Joy Buolamwini's TED Talk. It's about eight minutes long. That talk gives some very nice general background about algorithmic bias and also gives you a little bit of an initial exposure to facial recognition, and then I'll also discuss that talk briefly. If you feel that you need even more background, I've put an additional background reading as an option in the reading list, and there's plenty that you can find online. In fact, I looked at the Wikipedia article about algorithmic bias and found that it actually has some pretty good information.
What is algorithmic bias? Most generally, it's bias that occurs or potentially occurs in any form of algorithmic decision-making, whether it's based on a computer or not. To give a simple, whimsical example, if I wrote a program that said that one plus one is equal to two from Monday through Saturday and is equal to three on Sunday, I guess you could call that algorithmic bias, but I think you should just call it wrong. More seriously and substantively, let's say that you wrote a program to help you screen job applicants, and it had various information about those applicants including their age, and you wrote in the program that anybody over the age of 55 should be excluded. That would be a form of algorithmic bias, and unless there were a clear legal reason that you could do that, it would likely also be illegal. In data science, we're mainly concerned with forms of algorithmic bias that arise in instances involving the processing of large amounts of data. Most often, as you already know, these are instances where the algorithms are trained on large amounts of data and then are used to make decisions about cases that were not part of what the algorithms were trained on. We're going to see lots of instances of that. Let me at this point mention some of them, some that we'll see and some that we won't, just to point out the diversity of instances. One would be speech recognition, where the algorithm is meant to understand the words that you are saying, even if you are not one of the people who was part of the training set it was based upon. A second is facial recognition, which we're going to discuss in some detail, where the algorithm, seeing an image of you, is going to try to match you to a database of faces that it has, again even if images of you were not included in its training data.
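To make that job-screening example concrete, here is a minimal sketch in Python. The applicant records and fields are invented purely for illustration; the point is that a hard-coded age cutoff bakes bias directly into the decision procedure, no machine learning required:

```python
# Hypothetical applicant records; the names, fields, and values are
# invented purely to illustrate the age-cutoff example from the lecture.
applicants = [
    {"name": "A", "age": 34, "years_experience": 8},
    {"name": "B", "age": 58, "years_experience": 30},
    {"name": "C", "age": 61, "years_experience": 25},
]

def screen(applicant):
    # The biased rule: anyone over 55 is excluded outright,
    # regardless of qualifications. This is algorithmic bias
    # written directly into the decision logic.
    return applicant["age"] <= 55

accepted = [a["name"] for a in applicants if screen(a)]
print(accepted)  # only "A" passes, despite B's and C's experience
```

Note that the rule never even looks at experience; the bias is explicit and easy to spot here, which is exactly what makes learned models, where no such rule is written down anywhere, harder to audit.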
A third is in making financial decisions, such as whether to give you a home loan based on financial information about you. A fourth, very serious one is in criminal sentencing, where a judge will use an algorithm to help determine whether you should get a prison sentence and how long it should be, once it is determined that you're guilty of a certain crime. That one is pretty life-affecting. Somewhat more benign: anytime you put a query into a search engine, there's information about you, and the algorithm uses that information to decide how to prioritize the links that it comes back with, and we could go on and on about that. I just wanted to give you some sense of it. You'll also notice that in some of these instances, like the loan decision, we're talking about binary decisions: either you get a home loan or you don't. In some of them, like the search engine, we're talking about a much more complex decision, where it decides which of a huge number of links it should show you, in priority order. As I mentioned, and as you know, most of these algorithms were initially trained on some set of instances. For instance, the home loan algorithm may be trained on a database recording, from actual in-person interactions, the decisions that the loan officers made. Then the algorithm tries to emulate those officers and come up with those same decisions. Now, if you're immediately saying there is an issue with that, that's the whole point: if those loan officers had biases in their human decisions, those biases are likely to be replicated in the algorithms. That's one part of what algorithmic bias is about. Now, what about speech recognition? In some sense that would seem like a less bias-prone exercise. If I say the word bicycle, then there's a unique answer to what I said. But one of the issues in speech recognition is accent.
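The loan example can be sketched in a few lines of Python. This is a toy illustration with invented data, not a real lending model: the "training" step just tallies each neighborhood's historical approval rate, and the "model" emulates past decisions by thresholding that rate, which is enough to show how bias in the historical decisions carries straight into the algorithm:

```python
from collections import defaultdict

# Invented historical decisions: (neighborhood, approved?). Suppose
# loan officers approved far fewer loans in neighborhood "B" even for
# financially identical applicants.
history = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]

# "Training": learn each neighborhood's historical approval rate.
totals = defaultdict(int)
approvals = defaultdict(int)
for neighborhood, approved in history:
    totals[neighborhood] += 1
    approvals[neighborhood] += approved

rate = {n: approvals[n] / totals[n] for n in totals}

def predict(neighborhood):
    # The "model" emulates the past: approve when the historical
    # approval rate for the group is at least 50%. The officers'
    # bias is now automated.
    return rate[neighborhood] >= 0.5

print(predict("A"), predict("B"))  # True False
```

A real learned model would use many features and a far more sophisticated fit, but the mechanism is the same: the model is rewarded for reproducing the historical decisions, biases included.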
If you train the speech recognition algorithm on people all of whom come from the Midwestern part of the United States, it is almost certain to have more difficulty with people who speak the English language but come from England, or come from India, or come from a different region of the United States. By the way, we tend to think these days about algorithmic bias being mainly an issue of algorithms that come from deep learning, but it's not restricted to that. The algorithms could be generated from other forms of machine learning, whether other forms of supervised learning or other approaches entirely, or in other ways altogether. I'm not going to go into the mechanics of machine learning in this section, or in this course actually; that's not the point of this one. But if you study machine learning, you should definitely have this issue in the back of your mind and think about how a particular form of machine learning may relate to algorithmic bias. If you'd like more background on algorithmic bias, I'd suggest that you read or reread the Vox Recode article that was in the reading list, or the other one that is mentioned there. I'm not going to review that article in detail here; a number of the points that it made have already come up in the discussion that we just had. But let me mention just a few things from the article that I found particularly interesting. It mentions that there are other ways that algorithmic bias can arise besides deficiencies in the training set. It can arise from the historical nature of that training set. Let's say, for instance, that I wanted to predict who is likely to become a successful author, and I created a training set based on any major Western library, the Library of Congress in the United States or somewhere else. Well, the conclusion I would probably come to is that the likeliest authors are going to be white men from European countries and upper-class families.
That would actually be a true prediction if you're trying to predict the past. But it would be a fairly meaningless prediction if you're trying to predict the present or the future, or a different part of the world than Europe and the other Western parts of the world. This also raises the issue of transparency: revealing and understanding how algorithmic decisions are made. I think all of us would assert that having that transparency is an ethical thing to do. But there can be issues with that, and let me mention two rather different sorts of issues. One is that algorithms can be proprietary. We're going to see instances of that when we talk about the criminal sentencing algorithms. You can argue whether they should be or not, and that gets you into both ethical and legal issues. The other, which is actually much more profound and technical, is that in this era of machine learning, and particularly deep learning algorithms, we may not be able to really explain how the decisions have been made, how the connection from the input to the output comes about. That is the extremely important topic of explainable AI, which is not a topic that we're getting into here but is one that you should definitely be aware of. Finally, there's the question of whether important life choices should be made by a computer algorithm at all, in part or in whole. Of course, that is one of the biggest ethical questions that we'll be talking about, and that's one of the reasons that we look at articles that raise the issue of whether algorithms or humans do a better job of decision-making. As a hint of what we're going to find out: there are opinions on both sides. As a final part of providing general background about algorithmic bias, I suggest that you take time to view the roughly eight-and-a-half-minute TED Talk by Joy Buolamwini, if you haven't already.
The talk is a very nice introduction to the overall topic of algorithmic bias, as well as a nice teaser for the issues in facial recognition that we'll be seeing in lesson three. Just click on the discussion prompt for the link and, when you're done, come back; or if you viewed it already, you can just come straight back. Following your viewing of the talk by Joy Buolamwini, let me make a few comments on things that she brought up. First, the talk motivates some additional areas where algorithmic bias can occur. A couple that she mentioned are differential pricing, something that we may not even know goes on, but the ability of algorithms to actually offer different prices to different people based on what they perceive about those people, and also how algorithms are used in college admissions. She then mentions a really important issue that already came up in our discussion of professional ethics, which is the importance of diverse teams and the likelihood that a more diverse team is going to avoid some of what she refers to as the blind spots in creating algorithms. She gives you a first look at facial recognition and points out an incredible blind spot: some very commonly used software was clearly not trained on dark faces and absolutely failed to be able to recognize dark faces. As a final little comment, or aside, about the TED Talk, you might remember that in the last module we encountered a Google AI researcher, Dr. Timnit Gebru, who was involved in a controversial departure from Google based upon her work. Well, Gebru has done some very widely recognized work on facial recognition in partnership with Buolamwini, and we're going to see that in lesson three of this module.