Next we'll turn to two articles that discuss projects to create massive databases of health information, but in two very different contexts: a national project in Israel, and a corporate project in the United States by Google.

The first article, from the Wall Street Journal, is entitled "Israel Prepares to Unleash AI on Health Care." A little context here: as the article mentions, Israel is in what's probably an unusually good situation, in that it already has 20 years of digital medical records for the vast majority of its population, and those records are in similar formats. So it's feasible to try to unify them into one database, and that's what it wants to do at this point. The goal of the project is to provide a richer data set for medical AI. It should be noted that even though the data already exists, patients are still required to provide consent; it's not mandatory that they take part. It should also be noted that those who do consent will also be asked to supply genetic information.

There's a quote at the beginning of the article that gives the premise for the project, and I think it's worth reading and then analyzing: "AI-assisted machines and programs, with their ability to learn and draw conclusions from the millions of detailed histories, medical tests, and diagnostic results stored in these records, have the potential to help doctors give more preventive care by spotting diseases in earlier stages, tailor medicine to the needs of specific patients, read diagnostic imagery and lab tests faster and more accurately, and make health care less expensive."

That sounds pretty good, and I think the questions for all of us include, first, whether we really believe all those statements, and second, even if we do, whether there are ethical considerations that may make this somewhat less of a panacea.

Let's look first at the claims about what this database and AI will enable. Some of the statements are clearly already true: drawing conclusions from huge databases, and helping doctors spot diseases earlier, are things we're seeing already. Some are more speculative, such as the one about tailoring medicine to individuals; if that means personalized medicine, it's a really difficult thing to achieve in a cost-effective way. And the statement at the end, that costs will be lower, remains to be seen as well.

But there clearly are advantages, and in fact the article gives examples of some that have already been achieved. It mentions that constantly running algorithms on patient data leads to earlier detection of kidney disease and colon cancer; that's a really major achievement from this data.
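Just to make the idea of constantly running screening algorithms concrete for you as data scientists, here's a minimal sketch in Python. It is purely illustrative, not the algorithm the article describes; the eGFR lab measure, the threshold of 60, and the 90-day window are my own choices for the example, loosely based on a common clinical rule of thumb for chronic kidney disease.

```python
from datetime import date, timedelta

# Toy illustration only -- NOT the algorithm the article describes.
# Chronic kidney disease is commonly suspected when eGFR (a standard
# kidney-function lab value) stays below 60 for three months or more.
CKD_THRESHOLD = 60.0
MIN_DURATION = timedelta(days=90)

def flag_possible_ckd(egfr_history):
    """egfr_history: list of (date, eGFR) pairs, oldest first.
    Returns True if readings stay below threshold for >= 90 days."""
    low_since = None
    for day, egfr in egfr_history:
        if egfr < CKD_THRESHOLD:
            low_since = low_since or day        # start or continue a low streak
            if day - low_since >= MIN_DURATION:
                return True
        else:
            low_since = None                    # a normal reading resets the streak
    return False

# A patient whose kidney function has been depressed for four months:
history = [(date(2023, 1, 5), 72), (date(2023, 3, 1), 55),
           (date(2023, 5, 2), 52), (date(2023, 7, 3), 49)]
print(flag_possible_ckd(history))  # True -> flag for physician review
```

The point is just the shape of the computation: even a simple rule, run continuously over longitudinal records, can surface patients for a physician's review earlier than an annual visit would.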
The article also mentions an interesting collaboration between the Mayo Clinic and an Israeli company that is looking at the correlation of voice characteristics with heart disease, so a totally non-invasive diagnosis of heart disease. Unfortunately, the article doesn't say anything about the reliability of that diagnosis or which aspects of heart disease it covers, but it's fascinating. The article also mentions that this national database is enabling the start of many startup companies that are working with the data, and it gives the example of a company called K Health, which is doing something fascinating that has implications for a variety of things we'll talk about in this module.

K Health is developing an app that mimics conversations with doctors: you talk to the app, the app accesses millions of medical records to compare your symptoms to others', and it suggests lab tests that you can then order without ever seeing a doctor.
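To give you a feel for how an app like that might work under the hood, here's a deliberately simplified sketch. To be clear, this is my own toy construction, not K Health's actual method: it represents each historical case as a set of symptoms, ranks past cases by similarity to yours, and suggests the lab tests most often ordered in the closest matches. The case data is made up.

```python
from collections import Counter

# Toy construction, not K Health's actual method. Each historical case:
# (set of reported symptoms, list of lab tests that were ordered).
CASES = [
    ({"fatigue", "thirst", "frequent urination"}, ["HbA1c", "fasting glucose"]),
    ({"fatigue", "pallor", "shortness of breath"}, ["CBC", "ferritin"]),
    ({"thirst", "blurred vision", "fatigue"}, ["fasting glucose", "HbA1c"]),
    ({"cough", "fever", "shortness of breath"}, ["chest X-ray", "CBC"]),
]

def jaccard(a, b):
    """Similarity between two symptom sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def suggest_tests(symptoms, k=2, n_tests=2):
    """Rank past cases by similarity to the user's symptoms, then return
    the lab tests ordered most often among the top-k matches."""
    ranked = sorted(CASES, key=lambda case: jaccard(symptoms, case[0]),
                    reverse=True)
    votes = Counter(test for _, tests in ranked[:k] for test in tests)
    return [test for test, _ in votes.most_common(n_tests)]

print(suggest_tests({"fatigue", "thirst"}))  # e.g. ['HbA1c', 'fasting glucose']
```

A real system would work with millions of records, far richer features, and learned models rather than a hand-rolled similarity score, but the basic pattern of matching your data against many prior cases is the same.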
So the app is taking the human totally out of the loop, a topic that comes up repeatedly in this module, and one whose ethical considerations we need to talk about. You can certainly believe that there are other good things that could come out of having all of this data, but there are clearly ethical issues as well. I think the two major categories are ones we've already named several times. One is the potential privacy issues: not only the security of the data, but who gets to see the data, or the conclusions that arise from it. The second is how far you're taking the human out of the loop. And by the way, the article itself has a couple of conversations at the end that speak to two different sides of this.

I'll also say that so far we haven't really talked about ethical frameworks in this module, though we're going to on a number of occasions. The article gets into that a bit, because it states toward the end that the benefits of this will probably outweigh the pitfalls. Without saying so explicitly, that is a utilitarian analysis. In general, we're going to see that in healthcare the natural way to look at things is often the utilitarian way, but it's not the only one.

As a final comment on this article: let's say this is a good idea, and it probably is in many ways. Can it be done everywhere? The answer is no, or at least not nearly as readily, for two reasons. One is that this availability of pervasive health data in similar formats is not shared everywhere in the world. The second is that different parts of the world have different privacy laws, which may make this harder in some places.

Next we'll turn to the Wall Street Journal article "Google's 'Project Nightingale' Gathers Personal Health Data on Millions of Americans," which is relevant to the ability to do the sort of thing we just discussed in the United States, but with a very different, corporate approach. The article mentions that this is not only a pilot project but was initially a secret one. The project entails aggregating comprehensive health data for individuals from 21 of the 50 United States, and initially neither the patients nor their healthcare providers, their doctors, were aware that their records were involved. It turns out that this is legal under US health privacy regulation (HIPAA, as it's called) as long as the company doing the aggregation is doing it to help the healthcare provider carry out its functions. So the healthcare provider is subject to legal restrictions that another company helping it is not subject to. While this is not a legal course, you can see that there are plenty of issues here that are not only ethical issues but, in the end, policy and legal issues. And there are clearly big privacy issues, because, as the article mentions, a large number of Google employees have access to this data, so there's plenty that could happen.

I'm going to make a bit of an aside here. Articles like this sometimes refer to anonymized data, which is an important concept: taking personally identifiable information out of data. The comment I'll make is that the literature has shown this to have its limits; one can anonymize data and still run into problems, and that can be the case when various databases are combined. In fact, some of this work has been done in the context of genetic data. Let me give you a very simple, artificial example, because I think this is important for you as data scientists to understand. Say we want to make sure that nobody can put people's names and social security numbers together. We might have one database that has names and phone numbers but not social security numbers, and a second database that has social security numbers and phone numbers but not names. That's fine, until somebody is able to put those two databases together; then, through the shared phone numbers, names can immediately be linked to social security numbers. Now, the people who actually de-identify data are much more clever than that, but this gives you a sense of the pitfalls one can run into even when trying to anonymize data.
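Here is that toy example in a few lines of Python, so you can see how mechanical the re-identification step is; the data is obviously made up.

```python
# Made-up data illustrating the linkage pitfall described above.
# Database A was "anonymized" by dropping SSNs; database B by dropping names.
db_a = [  # (name, phone)
    ("Alice Cohen", "555-0101"),
    ("Boris Levy",  "555-0102"),
]
db_b = [  # (ssn, phone)
    ("123-45-6789", "555-0101"),
    ("987-65-4321", "555-0102"),
]

# The quasi-identifier both tables share -- the phone number -- makes
# the join trivial, re-linking each name to its SSN.
phone_to_ssn = {phone: ssn for ssn, phone in db_b}
relinked = {name: phone_to_ssn[phone] for name, phone in db_a}
print(relinked)  # {'Alice Cohen': '123-45-6789', 'Boris Levy': '987-65-4321'}
```

Real linkage attacks tend to use fuzzier quasi-identifiers, such as birth date, ZIP code, and gender, but the join is the same idea.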
So, as a summary comment about this particular article: whether it's the protection of data or related matters, we're in a realm where there are not only ethical issues but policy and legal issues, and where we're really just getting started.

Finally for this lesson, after two articles that were more on the data-collection side, we'll come back to one that's more on the side of analyzing the data using AI. Of course, those two things are intimately linked: we collect the data so that we can have it for this type of analysis, as the Israeli article mentioned, and we need lots of data to do useful analysis. So we'll look at the Time article "AI Can Now Make Medical Predictions from Raw Data, but Can Deep Learning Be Trusted?"

The article starts with the fact that at a leading United States hospital, every screening mammogram (that is, screening for breast cancer) is analyzed using deep learning, and it gets immediately to a pro and a con. The pro is that this seems to be working well: the model appears to make better predictions of the five-year prognosis than physicians can themselves. The con is that we can't really say how the deep learning model reaches its conclusions, which, as you know, is a statement that can be made about deep learning generally; this is the issue called explainable AI, although that name is not used in the article itself.

To me, the remaining parts of the article are largely a commentary on human issues, and important ones. One is that the doctors sound suspicious of something new that they don't understand well. I think this is an ethical issue, and we have to take it seriously: doctors deal in life-and-death situations with patients, and as data scientists and technologists we have to do our best to help them understand the technology they're going to be using. The article also quotes some computer scientists, who take various views on the topic. One computer scientist, who has been helping to develop the analysis discussed in the article, is quoted as saying that the explainable AI issue is, quote, "a myth." I'll just say that I wouldn't have taken that approach. Explainability is not only a very important issue for our field technically, one that is getting a lot of attention, but helping other people understand what we're doing is also an important ethical obligation.

There are three things the article stresses that I think are useful in summarizing it. The first is particular to this mammography application: as we test these systems, we're going to need to understand not only how accurate they are, but in what respects they're accurate. In medical diagnosis there is a big difference between false negatives and false positives, and one is much more concerned about false negatives, that is, failing to diagnose a condition when it exists; a missed cancer is far more costly than a false positive, which typically just leads to extra follow-up testing. So we're going to have to look carefully at the characteristics of these diagnoses in that respect. The second is one we've already said: we're going to have to pay attention to the human side. Both doctors and patients will need to be comfortable with these AI diagnoses if we're going to use them. And the third is, in some sense, the flip side of that: time may take care of it. If over time it becomes clear that these systems are doing a good job, and a better job than a medical provider could do alone, then both providers and the general population will start to trust them more.

That last comment is probably an interesting one to end this lesson on, because it's implicitly an ethical-framework comment. When we talk about health care in general, our goal is a pretty simple one: we want to keep people healthy, and keep them healthy for a longer time. That is an inherently utilitarian way of looking at things. So while it doesn't get discussed in these articles, in many cases they're taking a utilitarian look. There are cases where other ethical frameworks come in, particularly when we talk about privacy, and if those two things conflict, you get into even deeper conversations, and the decisions may be less clear.

So that wraps up this lesson. In the next one, we'll talk about two more futuristic healthcare topics, genetic editing and neurological interventions, which have even more profound and deep ethical considerations attached to them.