So, this brings us to the dangers of AI. I'll leave it to the other courses to tell you how wonderful AI is and what it can do. In a course on informatics where we make a big deal about above-the-line issues and unintended consequences, it's important to dwell on the above-the-line concerns about AI. So, I have a couple of slides about that.

One concern that is receiving increasing attention is simply bias, and there we're talking about racial bias and other socioeconomic biases. If your training set has a lot of poor people and not a lot of rich people, you may conclude when you're done that poverty alone is a risk factor for something bad. Now, it very often is a risk factor for something bad, but you want to know that by having done the analysis the right way, rather than as an epiphenomenon of how your dataset is put together.

So, you may just have the wrong input, even after taking into account the noisy data, missing data, and so on that I talked about earlier. Then there's the issue of wrongly understood input. You may think that you have weight, but what you really have is something else. You may think you have established diagnoses, but what you really have are presumptive diagnoses. I'll show you an example of that in a couple of minutes.

A big issue is that the temptation for a data scientist is to start off with a dataset of obvious cases and obvious controls. If I wanted the machine to distinguish temporal lobe epilepsy from normal, I would pick cases that are clearly normal and clearly epilepsy and train it to distinguish them. The problem, of course, is that in practice you won't see such clear-cut input.

Again, there's inexplicability, the lack of interpretability; that comes up a lot, and the JASON group, in 2018, pointed out that if you're going to use AI, you're going to have to validate it much more rigorously than you're used to doing with other methods. Steve Dance and I published a paper in 2018 that lists ten things to be worried about when it comes to using somebody else's models in your environment, and there we were just talking about statistical models. It gets much worse when you're talking about AI, which is uninterpretable.

They gave an example of patients with asthma who later had pneumonia. It looked like the asthma was protective against mortality, which sounds nutty, right? If you have asthma, your lungs are not in great shape; if you now get pneumonia on top of that, you probably should be in worse shape than if you had started out with normal lungs and then gotten pneumonia, an infection of the lungs. So why was this? Because the analysts did not think about the fact that patients with asthma would be more likely to get treatment, or likely to get different treatment, than the patients who came in with pneumonia alone.

Graphically, the concern is pneumonia leading to death: they were looking at patients who had asthma and then got pneumonia, and at death as a function of both pneumonia and asthma. They concluded that if you had pneumonia and asthma, you did better than if you had pneumonia alone. What was missing was the fact that the patients with asthma were being treated, and the treatment was what was actually protecting them, not the fact that they had asthma. This mediating variable, which is part of the causal modeling approach to statistical analysis, is missing from a lot of AI and machine learning, and that is a problem.
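To make that pitfall concrete, here is a minimal synthetic sketch in Python of the asthma-and-pneumonia story. Every number and variable name here is invented for illustration, and this is not the actual analysis from the study the lecture refers to; it just shows how omitting the treatment variable flips the apparent direction of the effect.

```python
# Hypothetical simulation: everyone has pneumonia; asthma truly raises
# mortality, treatment truly lowers it, but asthma patients are treated
# far more often. All coefficients below are made-up illustrative values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

asthma = rng.binomial(1, 0.5, n)

# Asthma patients are much more likely to receive aggressive treatment.
p_treat = np.where(asthma == 1, 0.9, 0.3)
treated = rng.binomial(1, p_treat)

# Ground truth: asthma is harmful (+1.0), treatment is protective (-2.5).
logit = -2.0 + 1.0 * asthma - 2.5 * treated
died = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Naive model: the mediating treatment variable is omitted.
naive = LogisticRegression().fit(asthma.reshape(-1, 1), died)
print(f"asthma coefficient, treatment omitted:  {naive.coef_[0][0]:+.2f}")

# Adjusted model: including treatment recovers the true direction.
adjusted = LogisticRegression().fit(np.column_stack([asthma, treated]), died)
print(f"asthma coefficient, treatment included: {adjusted.coef_[0][0]:+.2f}")
```

Run as written, the naive model gives asthma a negative coefficient, so asthma looks "protective," while the adjusted model recovers the positive, harmful effect, which is exactly the trap the lecture describes.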
So, this group goes on to talk about, as I mentioned, the need for rigorous approval procedures, for novel monitoring, and for sharing data so people can test their systems against each other or build off a common dataset. They also talk about using data that goes beyond the clinical, beyond the EHR, and, God forbid, looking at what's going on in other countries. If you come from another country, you don't need to be reminded of that.

So, in terms of going from data to knowledge, the accomplishments are that the machine can discern features and probabilities; you can go beyond the simple probability of Y given Xs to broader domains. Image analysis is different from just predicting whether you have pneumonia or not, and these methods have come into mainstream adoption.

The pros are that these approaches can handle both supervised and unsupervised contexts, and they can tell you how important the features are, provided the features are important at all. The cons are the black box, the uninterpretability. Another is that they're sensitive to their inputs, and it's not clear how much work I have to do to take your AI model and make it work at home. It's great that there's mainstream adoption, but it also means the train has left the station before we've worked out a lot of these issues. So, we're building the bicycle as we ride it. It's a very exciting time, but it's also important to keep these caveats in mind as you go from data to knowledge.
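As a sketch of what the "make it work at home" question involves at a minimum, here is one basic local check: comparing an external model's predicted risks against locally observed outcomes. The function and its inputs are hypothetical stand-ins, assuming a scikit-learn-style predict_proba interface and NumPy arrays; real external validation, as the paper mentioned above lays out, covers far more than this.

```python
# Hypothetical local validation sketch: `external_model`, `X_local`, and
# `y_local` are stand-in names, not from any real system or paper.
import numpy as np
from sklearn.metrics import roc_auc_score

def local_validation(external_model, X_local, y_local, n_bins=10):
    """Report discrimination (AUC) and a simple decile calibration table."""
    pred = external_model.predict_proba(X_local)[:, 1]
    print(f"AUC on local data: {roc_auc_score(y_local, pred):.3f}")

    # Sort patients by predicted risk and compare mean predicted vs.
    # observed outcome rate in each decile; large gaps flag miscalibration.
    order = np.argsort(pred)
    for decile in np.array_split(order, n_bins):
        print(f"predicted {pred[decile].mean():.3f}  "
              f"observed {y_local[decile].mean():.3f}")
```

Even a model with good discrimination at its home institution can be badly miscalibrated on local data, which is one concrete way the "sensitive to inputs" concern shows up in practice.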