Machine learning and AI are all the rage. I'm going to say just a little bit about them. I believe strongly that before you go to machine learning and to AI, you've got to understand statistics, and you've got to understand what a probability is. Because what I care about, the thing that these models have to deliver above the line, is: "What is the probability that this patient has pneumonia, given their fever and cough? What's the probability that this patient is going to come back, be readmitted to the hospital within 30 days, given features of their course in the hospital or features of their illness?" Right? Because I need to compare that probability against some notion of a threshold; I need to decide whether it's too much or too little. That's what I need you, the data scientist, to tell me: to deliver me this probability. So, it's good to start off with probability and statistics before you go on to the fancy-schmancy stuff. In general, as I'm suggesting here, what we care about is the Y. There's a Y that I care about: pneumonia, readmission, whatever. Then there are features, the X's, that I can observe, and I want to get a sense of the Y given the X's. So, there are a number of statistical methods; I'm not going to go through all of them in detail. Logistic regression is great if your Y is binary: what's the probability of death, given aspects of the patient? What happens is I give the machine a training set, and I get parameters, weights, back from the machine. I can validate it on a second data set, and if I'm happy with it, I take the formula that this logistic regression represents, and that's what I put in the knowledge base. Then, in use, the knowledge base says, "Okay, let me see. Give me the patient's data, I'll multiply by the parameter values that I got from learning, and I'll spit out an answer." So, the learning phase is what goes on over here; what gets deployed is the formula that results from learning.
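The deployed piece really is just a formula. Here is a minimal sketch of what "use" looks like, assuming a training phase has already produced weights; the feature names and weight values below are made up for illustration, not from any real model.

```python
import math

# Hypothetical weights a training phase might have produced
# (feature names and values are invented for illustration).
WEIGHTS = {"intercept": -4.0, "fever": 1.8, "cough": 1.1, "age_over_65": 0.9}

def probability_of_pneumonia(patient):
    """Apply the learned logistic-regression formula to one patient's features."""
    z = WEIGHTS["intercept"]
    for feature, weight in WEIGHTS.items():
        if feature != "intercept":
            z += weight * patient.get(feature, 0)
    # The logistic (sigmoid) function turns the weighted sum into a probability.
    return 1.0 / (1.0 + math.exp(-z))

patient = {"fever": 1, "cough": 1, "age_over_65": 0}
p = probability_of_pneumonia(patient)
print(round(p, 3))
```

The knowledge base only needs the weights; multiplying and summing at the bedside is cheap. All the hard work happened back in the learning phase.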
So, same thing with multiple linear regression. It's the same sort of thing, except rather than predicting an event, I'm predicting a level. So, what's your blood sugar going to be if you treat it this way, or if you have these features? Neural networks, we'll say a little bit more about in a minute. But I can have multiple features and multiple outcomes. Again, when I'm done, I really have a bunch of weights that are used with the features observed in the record. I have the machine use those weights to calculate what I care about, and then I go ahead and use the answers clinically. A different animal is a classification tree, and, just to be confusing, these are often called decision trees, which are not the same as decision analysis trees. So for instance, I live in a complex with a bunch of trees. Since I grew up in Manhattan, I'm very excited to see trees, and I have no idea what they are. So, I actually bought a book that helps me classify trees. I go outside and it says, "Okay, look at the leaf." And to be honest, I thought I should look at the tree. But I was surprised to see that the most important feature is the leaf. And then it says, "Okay, is it a leaf, or is it a cone? Then if it is a leaf, do the veins look tree-like, or are the veins in parallel?" So on and so forth. So at the end, I get the Y, which is that it's an oak tree, which I now can recognize by the leaves by going down this classification. So, the learning is what the classification rules should be. What gets delivered is the tree, and then in real time the machine takes the features from a patient, goes down the tree for that patient, and spits out, "I think it's a this, I think it's a that." A Bayesian belief network, or a Bayesian network, is a bit more complicated. It can represent causes that give you diseases, diseases that cause syndromes, and syndromes that give you manifestations, which can be symptoms, and signs, and lab values.
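The tree-book story can be sketched as code: a learned classification tree is just a nested sequence of questions. The questions and categories below are illustrative, not botanically authoritative.

```python
# A tiny hand-written version of the tree-identification book's logic.
# The questions and answers here are made up for illustration.
def classify_tree(has_leaves, parallel_veins=None):
    if not has_leaves:                   # cones instead of leaves
        return "conifer"
    if parallel_veins:                   # parallel veins
        return "parallel-veined"
    return "broadleaf (e.g., oak)"       # branching, tree-like veins

print(classify_tree(has_leaves=True, parallel_veins=False))
```

Learning decides which questions to ask and in what order; deployment is just walking a patient (or a leaf) down the branches.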
It can be complicated to calculate the conditional probabilities inside this thing, which are basically weights. What's interesting with a Bayes network is that even though it's built from causes to manifestations, you can give it the manifestations and it goes backwards and tells you what the likely disease is, or what the likely causes are. But again, once you've learned it, you deliver it to the knowledge base, and it uses the data from your patient to spit out an answer. So, what are machine learning and deep learning? I find it helpful to look at this picture of children. On the left-hand side, the mother is showing the child the shapes. Here is a cube, here is a sphere, here is a cube, here is a ball. Do you see the difference? All right. In the second case, that baby is unsupervised and is making their way through the world, probably figuring out eventually that there are cubes and spheres all by themselves. What features do they use to figure these things out? Hopefully, they'll learn that sharp edges mean cube, and roundedness means sphere. So, roundedness is a feature, an edge is a feature. So, in supervised learning they're told what the outcome is; they've got to figure out for themselves which features are important. In unsupervised learning, it's learning both the Y's and the X's. Much more challenging. So, here's a little graphic from computational biology to show a little bit of the difference between statistics and machine learning. Basically, the idea is that in statistics you know what the features are. If the rule is the squiggle between the red and the blue dots in that fourth graph, then you can evaluate it and see how well it performs. In machine learning, there's a lot more to figure out: first, what the features are, and then, given those features, how they perform. Then you could end up with a complicated network to do the dirty work. So, a neural network is what you see on the left-hand side, which I already described.
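The "backwards" reasoning of a Bayes network comes down to Bayes' rule. Here is a one-node sketch with made-up numbers, assuming hypothetical priors and conditional probabilities; a real network chains many of these together.

```python
# Made-up numbers: prior probability of each condition, and the probability
# of observing the manifestation "cough" given each condition.
prior = {"pneumonia": 0.02, "cold": 0.20, "healthy": 0.78}
p_cough_given = {"pneumonia": 0.90, "cold": 0.60, "healthy": 0.05}

# Bayes' rule: P(condition | cough) is proportional to
# P(cough | condition) * P(condition).
joint = {c: p_cough_given[c] * prior[c] for c in prior}
total = sum(joint.values())
posterior = {c: joint[c] / total for c in joint}

for condition, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{condition}: {p:.2f}")
```

The network was specified cause-to-manifestation, yet given the manifestation (cough), the arithmetic runs backwards and ranks the likely causes.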
Deep learning simply has a lot more of those middle blue circles. You'd like it to be that you understand what each layer means. But these networks, unfortunately, are very opaque, both in terms of what the layers mean and in terms of what the weights inside the network mean. At the same time, they perform extremely well and are very exciting. A third type of construct is called a tensor. I bring this up now because Google calls their machine learning environment TensorFlow. It's a linear algebra way of thinking about all this, which is not deep learning. I'm not going to go into it, but very quickly: you can imagine that if, let's say, symptoms are one axis and time is another, I have a matrix. It turns out that matrices can be factored into simpler things. But of course, I don't have just the symptoms; I have labs, I have medications, I have conditions. So now I have multiple dimensions. A tensor is, in effect, a matrix with multiple dimensions. Well, it turns out that I can break that tensor down into component matrices, and I can reason about the matrices with well-described mathematics. So, there's promise that tensor thinking may help us in going from features to outcomes, and help us get from data to knowledge. So, what are the data issues? Well, I already mentioned the whole list before in terms of data quality issues. A subtle one is that even with all the big data, some may not be enough. For instance, in order to get a stable answer, how many cases should I have? A simple rule of thumb is that if I have P parameters in my model, then I need at least 10 times P cases to fit my model. Now, take logistic regression or linear regression; let's take logistic regression. If I have M binary features, I could have two to the M possible patterns of those features. So, you could tell me that now I need 10 times two to the M cases in order to learn the model that's appropriate for this problem.
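The "matrices can be factored into simpler things" idea can be shown in a few lines with singular value decomposition. The symptoms-by-time matrix below is invented for illustration; one row is deliberately a multiple of another, so the factorization reveals fewer independent patterns than rows.

```python
import numpy as np

# A made-up symptoms-by-time matrix: rows are symptoms, columns are visits.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # exactly 2x the first row: a shared pattern
    [0.0, 1.0, 1.0],
])

# Factor it: X = U @ diag(s) @ Vt. The singular values s show how many
# independent patterns the data really contains.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.round(s, 2))   # the last singular value is (near) zero: rank 2

# Multiplying the factors back together recovers the original matrix.
X_hat = U @ np.diag(s) @ Vt
print(bool(np.allclose(X, X_hat)))
```

A tensor decomposition generalizes this to the labs, medications, and conditions axes at once, but the payoff is the same: a big array of data reduced to a few interpretable component matrices.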
Even with seven and a half billion people in the world, I could not get a data set big enough once M grows much past 20 or so; by around M equals 30, ten times two to the M already exceeds the world's population. So that's pretty amazing. When we talk about deep learning, we get at this in multiple ways that I'm not going to go into, but it's important for you to realize that a big data set does not always mean enough data. We already went through data quality issues before, like I mentioned. A couple of other issues are temporal. Too many people doing data analysis have too simple an idea of time. Sometimes they use too broad a window of time; sometimes they're unclear about the window of time. Certainly, disease develops over time, and therefore your data-to-knowledge approach has to be cognizant of time. "Tincture of time" is the notion that sometimes we use time itself as a treatment: let's just see what happens. As Yogi Berra said, sometimes you can observe a lot just by watching. Clearly, the domain is complicated, and the big problem is that, certainly with deep learning, you end up with a black box that's not interpretable.
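The population claim is a quick back-of-the-envelope calculation under the 10-cases-per-pattern rule of thumb; the exact crossover depends on how you count, but by this reckoning it lands around M = 30.

```python
WORLD_POPULATION = 7.5e9

# Rule of thumb from the talk: with M binary features there are 2**M
# possible patterns, and you'd want roughly 10 cases per pattern.
def cases_needed(m):
    return 10 * 2 ** m

for m in (10, 20, 30):
    need = cases_needed(m)
    print(f"M={m}: need ~{need:,} cases; feasible: {need < WORLD_POPULATION}")
```

Exponential growth in required cases is the point: adding ten features multiplies the needed data by about a thousand, so "big data" can still be far too small.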