Welcome to Module Three of Ethical Issues in Data Science. The emphasis of this module is a little bit different from the one that we just did and the next two. Those are all about ethical issues that arise on the technical side of data science: how data is collected and how it's used, what products are produced and how those are used, as well as the privacy and security issues that we saw in the last module. This module is partly about technical issues as well, but it's at least as much about the ethics of being a data science professional overall, including the ethics that take place in the workplace. That's an important thing to consider as one prepares for any professional career, both to think about what one will do and to learn from the experiences of others. This module has three parts, three lessons that are really quite distinct from each other. In the first one, we're going to go over two leading codes of professional ethics. One is from the American Statistical Association. The second is from ACM, the Association for Computing Machinery, which is the main computer science professional association. We've already referred to the ACM code of professional ethics a little bit in the first module of the course, when we were talking about ethical foundations. In this lesson we'll also introduce the project that is the main work in this module of the course. In the second lesson, we'll talk about ethics in the professional workplace. That will be based on reading a good number of articles about experiences that people have had. Largely these come from large Silicon Valley tech companies, because there tends to be a lot written about companies like that. Hopefully, they're indicative of the data science world more broadly than just those companies. And then, in the third lesson of this module, we will concentrate on the project, so you'll be spending time working on that project. I'll introduce it, by the way, in this lesson in just a few minutes.
And also in that lesson, while I don't have the ability in real time to talk about your projects, I will summarize some interesting insights from projects in previous classes, so that you get to benefit from what other people have learned as they've done the project as well. First, as always, I need to discuss the virtual backgrounds. I don't know how many of you identified the last one, the castle that we purposely chose for the security module. It's the castle in Heidelberg, Germany, called Schloss in German. Heidelberg is a town that many of you may have heard of. It's in the southwest part of Germany, about an hour south of Frankfurt. I could digress on this castle for a long time, and I'll try to keep it short. It's quite a historic structure. Construction on it began over 800 years ago, and it's considered one of the most famous Renaissance structures in northern Europe. It was destroyed in parts by both wars and fires on a number of occasions. As you can see, it looks like it's in ruins, although parts of it have been rebuilt, and so there are actually usable parts of it as well. One of the reasons that I'm quite familiar with the structure and with Heidelberg is that's where my maternal grandmother comes from. And her husband, my maternal grandfather, comes from Mannheim, which is a bigger city just a few miles away. My mother grew up mainly in Mannheim and partly in Heidelberg. Heidelberg is a beautiful town. The castle sits overlooking the Neckar River, which then flows into the Rhine River at Mannheim. So, so much for today's European history lesson. Today's background, as you can see, is in a quite different part of the world. I doubt there's a way to pick out exactly where it is, but you can at least guess in a discussion group what country you think it's in. And I'll come back to that at the start of the next lesson.
I do have to confess about these virtual backgrounds that, although I'm trying to move around between different parts of the world, they will only cover three continents: Asia, Europe, and North America. Because although I've traveled a huge amount in my life, I've only traveled on those three continents. I'll try to rectify that in the coming years, but that's what you'll get to see for now. I'll start the substance of this lesson by introducing the interview project that I referred to a few moments ago. The students taking this course for credit will want to start on this right away to stay on schedule, because this project requires finding a person to interview and then a time that works for you to talk to that person. I've included this project previously in a related residential course that I've taught. My experience was that for many students, this was the most valuable experience of the course, so I hope you all find it valuable as well. The assignment is described in the course materials, and I think the description is pretty straightforward. What I'm going to do is have the key parts of it go up on the screen now and summarize them. So what I'm asking you to do is interview a person who's had at least three years' experience as some sort of a computing professional. It wouldn't necessarily have to be data science, but something that's relevant to data science. You can do this whatever way works: in person, by phone, by video conference, whatever is good for you. What that interview is supposed to cover is the following, and this is what's up on the screen: to discuss that person's experience with ethics issues in their professional career, both regarding technical issues and professional or workplace issues. These could be issues that they personally experienced. That's certainly where you should start. And if they want, they could be issues that they didn't personally experience but heard about in their workplaces.
What I'd like you to do in the interview in particular is have the person pick two or three issues that they think are most memorable and discuss two things for each one. First, do they think the issue was handled well or not? And secondly, were there issues or situations that made it difficult to take what they felt would have been the more ethical path? Those are the main parts of the interview, but if there are other things that come up, you're certainly welcome to bring them in as well. When you're finished with the interview, what I'm asking you to do is write a three to five page report. The outline of that report is detailed in the assignment. Then post that report on the discussion board and, in turn, evaluate at least three other reports from your classmates. In fact, the value in this project is probably going to be at least as much in reading about the experiences that other people report as in doing your own interview and writing up your own. And so I hope you find both aspects of that very interesting. I'm then going to, in the third lesson of this module, highlight some interesting things that have come out of these reports in previous versions of this course. The final thing I should emphasize is that in all cases, when I highlight things, I will keep both the names of the people who provided them and the names of the companies, or any names that are associated with this, anonymous. You don't need to include names of the person you interview or their companies in the report. You can if you think that's better; that's totally up to you. The final thing I should mention in conjunction with this assignment is that for some of you, finding a person to interview may be a challenge. And to be honest, that's part of the point of the assignment. One thing that you will definitely experience in your professional career, if you haven't already, is the importance of networking.
And so activities like this that require you to do a little good networking are actually quite useful. There are no constraints at all on who you can interview. Well, there's one: you can't interview yourself. But other than that, you can interview a friend, a friend of a friend, a family member, a person recommended by a family member. For that matter, if you know a classmate here, or if a classmate volunteers on a discussion board to say, hey, I would be a good person to interview, that's fine as well. If you do have further questions about the assignment, you're always welcome to reach out to the course facilitator. Now we're going to turn to discussing codes of professional ethics. Virtually all professions have codes like this, whether it's medicine, law, engineering, journalism, you name it. They can go by slightly different names. They're sometimes called codes of ethics and sometimes called codes of conduct, but they all have the same sorts of elements to them. Many of these, most of them I suspect, are established by professional societies. For instance, the ones I just mentioned, in the United States at least, would be established by the American Medical Association, the American Bar Association, the Society of Professional Journalists, and so on. The concept of these, however, goes back far beyond when we had professional societies like this. In fact, perhaps the most famous professional code is the one in medicine that's called the Hippocratic Oath, which goes back to ancient Greek times, around 300 to 500 BC. The name Hippocratic Oath is attached to the Greek physician Hippocrates, who lived at about that time, although from the limited reading I did, it's not believed that he was actually the person who established that code. Unfortunately, while the ancient Greeks gave us ethical frameworks and the Hippocratic Oath, they did not give us ethical frameworks for data science. So we're going to have to use ones from more modern professional societies.
As I said, we're going to look at codes from the main statistics society and the main computing society, the American Statistical Association and ACM. These are both well-established codes, and they really apply very well to data science, at least if you take the union of them. There are starting to be some data science associations, and I looked online, and they actually do have in some cases their own codes of ethics. I think it's premature to know if any of those are going to have the same impact, and if so, which ones we're going to look at the most. So for now I've decided to just stick with the ones from ASA and ACM, which are so well regarded. These are all fairly closely related, and so I think you would get the same sense from studying other ones as well. If you look at these codes, they tend to have two different flavors within them. They have comments on ethical issues that are quite technical. And then they have comments on ethical issues that are professional, such as working with clients and colleagues. We're going to be talking about both. I don't think you'll see anything in them that you find totally startling. They're rather common sense, but they're definitely important and overlap with what we're studying. They are also quite extensive; at least the two that we're looking at are quite extensive. So I hope you have at least skimmed them, and I'm now going to point out highlights of each of the ones from the ASA and from the ACM. So I'll start with the code of ethics of the ASA. As you've seen if you've looked at it, it has eight categories. I won't list them here. It's interesting that each of the categories actually tends to have a mix of technical and broader professional topics within it. I'm going to summarize first some of the key technical things that I found as I read through it, and then some of the key professional ones. A little bit of an apology: in talking about the ASA code, for one of the few times in this course,
I'm actually going to read some stuff verbatim, because I think that's the best way to convey it, although I know you can all read it yourselves as well. So on the technical side, the way these are phrased is each one starts out with "the ethical statistician" and then it says what they do. So four that really jumped out at me were: first, identifies and mitigates any preferences on the part of the investigators or data providers that might predetermine or influence the analysis results. Second, employs selection or sampling methods and analytic approaches appropriate and valid for the specific question to be addressed, so that results extend beyond the sample to a population relevant to the objectives with minimal error under reasonable assumptions. That was pretty long. Third, acknowledges statistical and substantive assumptions made in the execution and interpretation of any analysis. When reporting on the validity of data used, acknowledges data editing procedures, including any imputation and missing data mechanisms. And fourth, reports the limitations of statistical inference and possible sources of error. So, a few comments on the things that I just read. First of all, they're pretty technical, more technical than some codes get. If I were to really simplify these things, I would say that they say: make sure that your methods don't bias the outcomes, that your methods are sufficiently broad for the outcomes that they're said to achieve, and that you don't overstate the results. These points are going to be closely related to what we do in the next module, where we talk about algorithmic bias. So we can come back to some of these principles when we talk about that. Now let me do the same thing and extract just a few of the professional points within the ASA code of ethics that I found noteworthy, again admitting this is a totally subjective judgment on my part.
So the code says that the ethical statistician, first, accepts full responsibility for his/her professional performance. Provides only expert testimony, written work, and oral presentations that he/she would be willing to have peer reviewed. Second, in publications and reports, conveys the findings in ways that are both honest and meaningful to the user/reader. This includes tables, models, and graphics. Third, to aid peer review and replication, shares the data used in the analyses whenever possible/allowable, and exercises due caution to protect proprietary and confidential data, including all data that might inappropriately reveal respondent identities. Fourth, strives to make new statistical knowledge widely available to provide benefits to society at large and beyond his/her own scope of applications. And the last one I'll read: exhibits respect for others and thus neither engages in nor condones discrimination based on personal characteristics, bullying, unwelcome physical (including sexual) contact, or other forms of harassment or intimidation, and takes appropriate action when aware of such unethical practices by others. So to make just a few simple summary comments about that, I think there are a few themes that come out in these professional guidelines that we're also going to see in the ACM code. To say them very simply, and they'll come up on the screen as well: one, convey results honestly; two, share data when that doesn't violate confidentiality; three, help to educate the public; and four, treat other people decently. These may all seem like obvious things to do, but they're very important things to do and to state. I want to comment on two of them in particular. The first is educating the public. That's not a part of every profession. But as you see from this course, it's got to be a part of being a data science professional, because our methods and our results impact people so much, and in ways that they don't necessarily understand.
So for instance, does the average person understand what YouTube is doing to them with recommender systems? Or has the average person ever heard of a tracking pixel? So, to the extent that we have the opportunity to convey that to the general public, that's an important part of our roles. And the second one is the whole topic of treating people well. I'll simply say that in the next lesson we'll get evidence, if we didn't already realize it, that that doesn't happen universally, and it's an awfully important part of what makes us successful and good professionals. Finally, let's turn to the ACM code of professional ethics. This code was revised in 2018, the first time it had been revised in about 25 years, so it's actually fairly up to date, both in what it talks about technically and professionally. The code has fewer categories and subcategories than the ASA code, but it has longer amounts of text in each, so it's relatively long. But looking just at the headers and subheaders gives you a very good feel for the code. Again, what I'm going to do is subjectively pull out what I think are some highlights of the code. This code is not as specifically technical as the ASA code, not even close. And so I'm going to group these a little bit differently. When I looked at the things that I would put in the category of either technical, or perhaps semi-technical, or in some cases philosophical about technical, there were four that really jumped out at me. First, avoid harm. Second, respect privacy. Third, design and implement systems that are robustly and usably secure. And fourth, contribute to society and to human well-being, acknowledging that all people are stakeholders in computing. There's that theme again, that we have a field that is impacting everyone. And then on the professional side, there were also a handful that really jumped out at me, so I'll read those. First, be fair and take action not to discriminate.
Second, perform work only in areas of competence. Third, foster public awareness and understanding of computing, related technologies, and their consequences. Again, that's something we saw in the ASA code as well. Fourth, ensure that the public good is the central concern during all professional computing work. In a moment I'm going to comment on that one a little bit. And fifth, manage personnel and resources to enhance the quality of working life. So these are really quite general, and some of them, to use an American phrase, are sort of like motherhood and apple pie: they're just what a good person ought to do. But they're still very useful things to have in a code. Let me just comment on a few of them. As I said, once again we see a statement about the desire, and perhaps the duty, for a data science, computing, or statistics professional to help educate the public. This is an interesting thing to think about as we go into our careers. The second one, to reiterate the one that I said I was going to comment on: ensure that the public good is the central concern during all professional computing work. Is that really something you could do if you're working for a company that makes gambling software? Are you making the public good the paramount thing in your work? Maybe not. If you're working for a defense contractor, I think that one gets quite a bit more subjective. Some people would say, absolutely, keeping people safe is an absolute part of the public good. And I've heard some people have other opinions on that. If you're working for a health care provider, hopefully it's closer to the public good. This is something we're going to come back to when we talk about things that people have heard from talking to other computing professionals. And it may be less something that you can do in a particular job than something that may influence you when you choose from a variety of potential jobs.
But it's a very interesting thing to have in a code and to think about. The other one I'd like to comment on is the principle that said that one should make the quality of working life a priority. Personally, I'm delighted to see that in the code, but I think it should also be admitted that that's a value judgment. It's a societal value. It may not be one that's universally shared in all societies, but personally, as I said, I hope it's one that we hold to. So as a final set of comments in this lesson about codes of professional ethics, it's interesting to ask how, if at all, these relate to the ethical frameworks, the ethical foundations, that we studied at the beginning of this course. I think in many ways they really do. It's pretty clear that they come from a position that's consistent with virtue ethics, because they really are framed in terms of what a virtuous data science, statistics, or computing professional would do. And they certainly have Kantian values in them as well, values such as honesty and plenty of others. It's interesting that if you look at the ACM code of professional ethics, as you know, it also has case studies, and it has analyses of those case studies. And those analyses don't explicitly point to any of the ethical frameworks that we studied. They kind of use more of the common sense ideas of ethics. I actually know the people who are responsible for this, and so I asked them about that. Their response was that they personally are actually quite familiar with philosophy and ethics, and that they use it when they teach these subjects themselves. But they just chose to use a more general, common sense sort of ethics within the examples in the ACM code of ethics. So that's an interesting place to wrap up. And in the next lesson, we're going to turn to lessons from the professional workplace.