So now we're going to look at classification methods based on probabilities. This is another very well-known class of classification and prediction methods called Bayesian networks, also called belief networks or graphical models. They derive a predictive model from data based on Bayes' theorem, named after Thomas Bayes, the statistician who formulated this rule, so this approach is also based on the concept of conditional probability. What the rule says is that P(M | D), the probability of M given D, or after seeing D, equals P(D | M) multiplied by P(M), divided by P(D).

A little bit of vocabulary here. M is a model, or a hypothesis or set of hypotheses; it plays the same role as the models we talked about before, the classification model we are trying to build, and here it is going to be a Bayesian network. D are the data. P(M | D) is called the posterior: it is our updated belief that M is correct after seeing the data D. P(M) is the prior: our estimate that M is correct before seeing any data, the probability of the model being correct without any other knowledge. And P(D | M) is the likelihood: the probability of observing these data if the model M is correct. So to infer a model, we need to evaluate the prior probability of the model, that is, how plausible this model is to begin with, and the likelihood P(D | M), that is, given this model, how likely are the data we actually have. Once you have this information, you can build a model.

Bayesian networks let us model a situation as a probabilistic network, but also reason from that network, which is also called performing inference. What is interesting about this kind of model is that it deals with causality: causal relationships are made explicit and can be used to propagate new facts or beliefs through the network. We have not really discussed that in the previous models we saw; there it looked like all the features were at the same level. Here we have this notion of causality, and that is why it is such an interesting model.

As you can see here, this is a graphical model for a lung cancer situation, used to perform what is called differential diagnosis. Clinicians like the concept of differential diagnosis: generally they do not want to know only whether a patient is likely to have one particular diagnosis, but how the different possible diagnoses compare, here lung cancer and emphysema, for example. They want to know the likelihood of each, so they can make their own judgment. In this model, which would generally be built with a clinician and represents some of the clinicians' knowledge about the difference between lung cancer and emphysema, lung cancer is influenced by a person's family history and their smoker status. Being a smoker can cause emphysema as well. Both emphysema and lung cancer cause dyspnea, which is shortness of breath. And a positive X-ray points toward lung cancer: in this model it is caused only by lung cancer, not by emphysema. So we represent graphically the causal relationships that we know about, and then there are also probabilities associated with each node: each node has a conditional probability table.
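Just to make that structure concrete, here is a minimal sketch of how the causal graph described above could be written down in Python. The node names are simply taken from the figure; this is not any particular library or the lecturer's own code.

```python
# Structure of the example network: each node is listed with its direct
# causes (parents), as described for the lung cancer / emphysema figure.
parents = {
    "FamilyHistory": [],
    "Smoker": [],
    "LungCancer": ["FamilyHistory", "Smoker"],
    "Emphysema": ["Smoker"],
    "PositiveXRay": ["LungCancer"],
    "Dyspnea": ["LungCancer", "Emphysema"],
}

# In a full Bayesian network, each node also carries a conditional probability
# table giving its probability for every combination of its parents' values.
```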
For example, for lung cancer we have eight figures in the table: there are two parent conditions, family history and smoker status, each of which can be positive or negative, so four combinations, and for each combination the table gives the probability of lung cancer and of no lung cancer. On the lung cancer row, the probability of lung cancer given a positive family history and a positive smoker status is 0.8, which is pretty high. The probability of lung cancer when there is a family history of lung cancer but the person does not smoke is 0.5, still quite high. The probability of lung cancer when there is no family history but the person is a smoker is 0.7, almost as high as when there is a family history. And finally, the probability of lung cancer when there is no family history and the person does not smoke is only 0.1, which is pretty low.

Again, these numbers can be taken from the literature, or of course learned from data; there are many ways to build these tables. Once they are built, for any particular case you plug in, you can say what the probability is that this person has lung cancer, for example (see the small sketch below). This is very important in biomedicine, because these models can calculate the probability associated with a diagnosis or classification, for example in differential diagnosis. Graphical representations are also highly valued for understandability, just as for decision trees.

There is also a simplified version, illustrated in this figure, called Naive Bayes, and Naive Bayes is also a very standard way of doing classification. These methods are based on the hypothesis of independence between features, which may require feature selection before the prediction task: in particular, when you have a lot of features, there are very often relationships between them, they are somewhat interdependent, and feature selection may help so that these models work better. So graphical models are very popular right now, particularly in biomedicine. Thank you.
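Returning to the conditional probability table for lung cancer, here is a minimal sketch in plain Python of how such a table can be used. The four conditional probabilities are the ones read off the table above; the prior probabilities assumed for family history and smoking are purely illustrative and are not from the lecture.

```python
# Conditional probability table P(LungCancer = yes | FamilyHistory, Smoker),
# using the four values discussed above.
p_lc_given = {
    (True, True): 0.8,    # family history, smoker
    (True, False): 0.5,   # family history, non-smoker
    (False, True): 0.7,   # no family history, smoker
    (False, False): 0.1,  # no family history, non-smoker
}

# Illustrative priors for the parent nodes (assumed, NOT from the lecture).
# FamilyHistory and Smoker have no parents in the network, so they are
# treated as independent root nodes here.
p_family_history = 0.15
p_smoker = 0.30

def p_lung_cancer(family_history: bool, smoker: bool) -> float:
    """Look up P(LungCancer = yes) for one particular case."""
    return p_lc_given[(family_history, smoker)]

def marginal_p_lung_cancer() -> float:
    """Marginal P(LungCancer = yes), summing over all parent combinations."""
    total = 0.0
    for fh in (True, False):
        for sm in (True, False):
            p_parents = ((p_family_history if fh else 1 - p_family_history)
                         * (p_smoker if sm else 1 - p_smoker))
            total += p_lc_given[(fh, sm)] * p_parents
    return total

print(p_lung_cancer(family_history=False, smoker=True))   # 0.7
print(round(marginal_p_lung_cancer(), 3))
```

A full implementation would attach a table like this to every node (emphysema, dyspnea, the X-ray) and use a proper inference algorithm to propagate evidence through the network; dedicated libraries exist for that, but this hand-rolled version is enough to show how one table is read and combined with priors.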
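And for the Naive Bayes variant mentioned at the end, here is a minimal sketch using scikit-learn's BernoulliNB. The lecture does not name a library or a data set, so the library choice, the toy data, and the feature encoding below are illustrative assumptions only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy data set, purely illustrative: each row is a patient described by two
# binary features [family_history, smoker]; the label is 1 for lung cancer.
X = np.array([
    [1, 1], [1, 0], [0, 1], [0, 0],
    [1, 1], [0, 1], [0, 0], [0, 0],
])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])

# Naive Bayes treats the features as conditionally independent given the
# class, which is why correlated features may call for feature selection.
model = BernoulliNB()
model.fit(X, y)

# Predicted class probabilities for a smoker with no family history.
print(model.predict_proba(np.array([[0, 1]])))
```

The "naive" part is exactly the independence hypothesis discussed above: the classifier assumes family history and smoking status are independent of each other once the class is known.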