So before we get into the details of probabilistic graphical models, we need to talk a little bit about what a probability distribution is, just so we have a shared vocabulary. Let's start with a very simple example of a joint distribution, one that is going to be extended in examples later on in other parts of the course, and one that involves just three random variables. This is what I call the student example: you have a student who can be described, in this case, by a variable representing his intelligence, and that could be high or low. The student is taking a class, and the class might be difficult or not, so that is the random variable D. So the random variable I has two values, the difficulty variable D also has two values, and then there is the grade that the student gets in the course, which has three values. In this case, we're going to assume A, B, and C. Now here's an example of a joint distribution over this set of three random variables. This is an example of P(I, D, G). It's a joint distribution. Let's think about how many entries are in such a joint distribution. Well, since we have three variables and we need to represent the probability of every combination of values for each of these three variables, we have 2 * 2 * 3 possible combinations, for a total of twelve possible values that we need to assign a probability to. So there are twelve total parameters in this probability distribution. And I'm going to introduce the notion of independent parameters, which we're going to talk about later as well. Independent parameters are parameters whose value is not completely determined by the values of other parameters. In this case, because this thing is a probability distribution, we know that all of the numbers here on the right have to sum to one. And therefore, if you tell me eleven out of the twelve, I know what the twelfth is, and so the number of independent parameters is eleven.
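The parameter counting above can be sketched in a few lines of Python. The transcript's actual probability table isn't reproduced here, so the values below are made-up illustrative weights (normalized to sum to one); only the variable cardinalities (2, 2, 3) come from the example.

```python
from itertools import product

# Value sets for the three student-example variables.
I_vals = ["i0", "i1"]          # intelligence: low, high
D_vals = ["d0", "d1"]          # difficulty: easy, hard
G_vals = ["g1", "g2", "g3"]    # grade: A, B, C

# Assign each of the 2 * 2 * 3 = 12 joint assignments a made-up positive
# weight, then normalize so the whole table sums to one.
weights = {assign: w + 1 for w, assign in enumerate(product(I_vals, D_vals, G_vals))}
Z = sum(weights.values())
joint = {assign: w / Z for assign, w in weights.items()}

total_params = len(joint)                 # 12 entries in the table
independent_params = total_params - 1     # 11: sum-to-one fixes the last entry
print(total_params, independent_params)
```

The `- 1` is exactly the argument in the text: knowing eleven entries of a normalized table determines the twelfth.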
And we'll see that that is a useful notion later on, when we start evaluating the relative expressive power of different probability distributions. What are things that we can do with probability distributions? Well, one important thing that we can do is condition the probability distribution on a particular observation. So, for example, assume that we observe that the student got an A. We now have an assignment to the variable G, which is G1. And that immediately eliminates all possible assignments that are not consistent with my observation, so everything but the G1 entries, okay? And so that gives me a reduced probability distribution. This is an operation that's called reduction: I've taken the probability distribution and reduced away the stuff that is not consistent with what I've observed. Now, that by itself doesn't give me a probability distribution, because notice that these numbers no longer sum to one; they summed to one before I threw out a bunch of the entries. What I get from this operation is called an unnormalized measure. The word "measure" indicates that it's a form of distribution, but the fact that it's unnormalized means that it doesn't sum to one. So if we want to turn this unnormalized measure into a probability distribution, the obvious thing to do is to normalize it. And so what we're going to do is take all of these entries and sum them up, and that's going to give us a number, which in this case is 0.447. We can now divide each of these entries by 0.447, and that's going to give us a normalized distribution, which in this case corresponds to the probability of I, D given G1. So that's a way of taking an unnormalized measure and turning it into a normalized probability distribution. We'll see that this operation is one of the more important ones that we'll be using throughout the course.
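Reduction followed by renormalization can be sketched the same way. Again, the joint table below uses made-up weights, so the normalizing constant here is not the lecture's 0.447 (that number comes from its specific table); the structure of the two steps is what the sketch shows.

```python
from itertools import product

# Made-up joint P(I, D, G) over the student example's variables.
I_vals, D_vals, G_vals = ["i0", "i1"], ["d0", "d1"], ["g1", "g2", "g3"]
weights = {a: w + 1 for w, a in enumerate(product(I_vals, D_vals, G_vals))}
Z = sum(weights.values())
joint = {a: w / Z for a, w in weights.items()}

# Reduction: keep only the entries consistent with the observation G = g1.
measure = {(i, d): p for (i, d, g), p in joint.items() if g == "g1"}

# The reduced table is an unnormalized measure: it no longer sums to one.
norm = sum(measure.values())

# Normalize: divide every entry by the total to get P(I, D | g1).
cond = {a: p / norm for a, p in measure.items()}
```

After the division, `cond` sums to one again, which is what makes it a genuine conditional distribution rather than a measure.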
Okay, the final operation I'm going to talk about regarding probability distributions is the operation of marginalization, and that is an operation that takes a probability distribution over a larger set of variables and produces a probability distribution over a subset of those variables. So in this case, we have a probability distribution over I and D, and say that we want to marginalize out I, which means we're going to sum over the values of I and restrict attention to D. So what does that do? For example, if I want to compute the probability of d0, I'm going to add up both of the entries that have d0 associated with them: the one corresponding to i0 and the one corresponding to i1. And that's the marginalization of this probability distribution.
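The summing-out step can be sketched directly. The four probabilities below are invented for illustration (the transcript's table isn't shown); the loop implements P(d) = Σ_i P(i, d), exactly the add-up-matching-entries operation described.

```python
# Made-up distribution P(I, D) over the student example's two binary variables.
p_id = {("i0", "d0"): 0.28, ("i0", "d1"): 0.12,
        ("i1", "d0"): 0.42, ("i1", "d1"): 0.18}

# Marginalize out I: for each value of D, add up every entry that has
# that value of D, whatever I is. E.g. P(d0) = P(i0, d0) + P(i1, d0).
p_d = {}
for (i, d), p in p_id.items():
    p_d[d] = p_d.get(d, 0.0) + p
```

With these numbers, P(d0) = 0.28 + 0.42 and P(d1) = 0.12 + 0.18, and the marginal still sums to one, since we've only regrouped the original entries.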