So far, we've defined graphical models primarily as a data structure for encoding a probability distribution. We talked about how, using a set of parameters that are tied to the graph structure, one can represent a probability distribution over a high-dimensional space in a factored form. It turns out that one can view the graph structure in a graphical model from a completely complementary viewpoint, namely as a representation of the set of independencies that the probability distribution must satisfy. That theme turns out to be really enlightening and thought provoking, so let's talk about that. We're going to begin by defining the notion of independence that we'll use in subsequent presentations. Let's start with the very basic notion of independence within a probability distribution, and initially we're just going to talk about the independence of events alpha and beta within a probability distribution. Let me introduce the notation: this symbol after the P is the logical symbol for "satisfies", and the perpendicular symbol is the standard notation for independence. So this statement reads: "P satisfies, alpha is independent of beta." There are actually three entirely equivalent definitions of the concept of independence. The first says that the probability of the conjunction of the two events (there are several ways to denote conjunction: some people use the intersection symbol, we typically use a comma), that is, the probability of alpha and beta both holding, is simply the probability of alpha times the probability of beta. The second definition is about flow of influence, and it says: if you tell me beta, it doesn't affect my probability in alpha.
So the probability of alpha given the information about beta is the same as the probability of alpha without that information. And because probabilistic influence is symmetrical, we also have the exact converse: the probability of beta given alpha is the same as the probability of beta. So that's independence of events, and you can take the exact same definition and generalize it to independence of random variables. We read this in the exact same way: P satisfies, X is independent of Y, for two random variables X and Y. Once again we have the same set of definitions. The first says that P(X, Y) is equal to P(X) times P(Y). The second says that P(X | Y) is equal to P(X), and P(Y | X) is equal to P(Y). You can view these statements in two different but equivalent forms. The first is as a universal statement: for every assignment of values little x and little y to the variables X and Y, we have that the probability of the event x, y is equal to P(x) times P(y). So you can think of it as a conjunction of lots and lots of independence statements of the form over here. That's the first interpretation. The second interpretation is as an expression over factors: this tells me that the factor over here, which is the joint distribution over X, Y, is actually a product of two lower-dimensional factors, one whose scope is X and one whose scope is Y. These are all equivalent definitions, but each of them has a slightly different intuition, so it's useful to recognize all of them.
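Both readings of the definition, the universal statement over assignments and the factorization of the joint, are easy to check numerically on a small table. Here is a minimal sketch with made-up numbers, where the joint is constructed as an outer product so that independence holds by design:

```python
import numpy as np

# Hypothetical marginals over binary X (rows) and binary Y (columns).
p_x = np.array([0.3, 0.7])
p_y = np.array([0.6, 0.4])

# Build a joint that is independent by construction: P(x, y) = P(x) * P(y).
joint = np.outer(p_x, p_y)

# Definition 1 (factor view): the joint equals the product of its marginals.
marg_x = joint.sum(axis=1)          # P(x), marginalize Y out
marg_y = joint.sum(axis=0)          # P(y), marginalize X out
assert np.allclose(joint, np.outer(marg_x, marg_y))

# Definition 2 (influence view): conditioning on Y leaves P(X) unchanged,
# checked for every assignment of Y (the "universal statement" reading).
for y in range(2):
    cond_x = joint[:, y] / joint[:, y].sum()   # P(x | Y = y)
    assert np.allclose(cond_x, marg_x)
```

The loop over `y` is exactly the conjunction of per-assignment statements mentioned above, while the single `np.outer` check is the factor-product view.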
So let's think of examples of independence. Here is a fragment of our student network: it has three random variables, intelligence, difficulty, and course grade, and this is a probability distribution whose scope is those three variables. But we can go ahead and marginalize it to get a probability distribution that is a factor over the scope I, D. As it happens, this is the marginal distribution, which you can confirm for yourselves by just adding up the appropriate entries; just as a reminder, to get the entry for i0, d0 we're going to add up this one, this one, and that one, and that's going to give us this factor. And it's not difficult to check that, if we then go ahead and marginalize P(I, D) to get P(I) and P(D), then P(I, D) is the product of these two factors. So here is a good example of a distribution that satisfies an independence property. And here is the graphical model, and when you look at it you can see that there are no direct connections between I and D; we'll talk later about how that tells us that this independence holds in the distribution. Now, independence by itself is not a particularly powerful notion, because it happens only very rarely. That is, only in very few cases are you going to have random variables that are truly independent of each other, at least few interesting cases; you can always construct examples. So now we're going to define a much broader notion of much greater usefulness, which is the notion of conditional independence. Conditional independence, which applies equally well to random variables or to sets of random variables, is written like this: here we have once again the "P satisfies", here we have again the independence sign, but here we have a conditioning sign. And this is read as "P satisfies, X is independent of Y given Z", okay? And once again, we have three (not identical, sorry, equivalent) definitions of this property.
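We can reproduce the marginalization check from the student-network fragment in code. The CPD numbers below are placeholders, not the values on the slide; what matters is the structure, with I and D as independent priors and G depending on both:

```python
import numpy as np

# Hypothetical CPDs for the I, D, G fragment (placeholder numbers).
p_i = np.array([0.7, 0.3])                     # P(I)
p_d = np.array([0.6, 0.4])                     # P(D)
p_g_given_id = np.array([                      # P(G | I, D), shape (I, D, G)
    [[0.3, 0.4, 0.3], [0.05, 0.25, 0.7]],
    [[0.9, 0.08, 0.02], [0.5, 0.3, 0.2]],
])

# Joint P(I, D, G) via the chain rule for the Bayesian network.
joint = p_i[:, None, None] * p_d[None, :, None] * p_g_given_id

# Marginalize out G ("adding up the appropriate entries"), then verify
# that P(I, D) is the product of its two marginals P(I) and P(D).
p_id = joint.sum(axis=2)
assert np.allclose(p_id, np.outer(p_i, p_d))
```

The assertion succeeds for any CPD P(G | I, D), because summing G out always leaves the product of the two independent priors.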
The first says that P(X, Y | Z) is equal to the product of P(X | Z) times P(Y | Z). Once again, you can view this as a universally quantified statement over all possible values of X, Y, and Z, or as a product of factors. Definition number two is a definition of information flow: given Z, Y gives me no additional information that changes my probability in X; or, given Z, X gives me no additional information that changes my probability in Y. Once again, you can view this as an expression involving factors. Notice that this is very analogous to the definitions that we had for plain old independence: Z effectively never moves, it always sits there on the right-hand side of the conditioning bar. And so if you find yourself having a hard time remembering conditional independence, just remember that the thing you're conditioning on sits there on the right-hand side of the conditioning bar all the time. Let me introduce one final definition, which is going to serve us quite well in an interesting derivation. It's very similar to the first definition, but it says that the joint distribution P(X, Y, Z) is proportional to (so we're going to forget the potential need to add a normalizing constant) a product of two factors: one factor over X and Z, and one factor over Y and Z, okay? This turns out to be yet another equivalent definition of conditional independence. Let's look at an example of conditional independence. Imagine that I give you two coins, and I'm telling you that one of those coins is fair and the other one is biased: it's going to come up heads 90% of the time. But they look the same. So now you have a process by which you first pick a coin out of my hand, and then you toss it twice. So this is which coin you picked, and these are the two tosses. Now, let's think about dependence and independence in this example.
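That last definition, proportionality to a product of factors phi1(X, Z) and phi2(Y, Z), can be verified numerically: build a joint from two arbitrary factors, normalize once, and check that the first definition holds. The factor values here are arbitrary random numbers, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary nonnegative factors phi1(X, Z) and phi2(Y, Z);
# X and Y are binary, Z takes three values.
phi1 = rng.random((2, 3))
phi2 = rng.random((2, 3))

# P(x, y, z) proportional to phi1(x, z) * phi2(y, z); normalize at the end.
joint = phi1[:, None, :] * phi2[None, :, :]
joint /= joint.sum()

# Definition 1 follows: for every z, P(X, Y | Z=z) = P(X | Z=z) P(Y | Z=z).
for z in range(3):
    slice_z = joint[:, :, z] / joint[:, :, z].sum()   # P(X, Y | Z = z)
    px = slice_z.sum(axis=1)                          # P(X | Z = z)
    py = slice_z.sum(axis=0)                          # P(Y | Z = z)
    assert np.allclose(slice_z, np.outer(px, py))
```

Fixing Z = z makes the joint factor into a term that depends only on x and a term that depends only on y, which is exactly why the assertion holds for any choice of phi1 and phi2.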
If you don't know which coin you picked, and you toss the coin and it comes out heads, what happens to the probability of heads in the second toss? It'll be higher. Right? Because if it came up heads the first time, that is more likely to have happened with the biased coin: heads happens 50-50 with a fair coin, but with greater probability with the biased coin, and so the probability of heads in the second toss is going to be higher now. On the other hand, if I now tell you, no, you picked the fair coin, then you don't really care what the outcome of the first toss is; it doesn't tell you anything about the probability of the second toss. Similarly, if I tell you that it's the biased coin, the first toss also doesn't tell you anything at that point. The first toss and the second toss are no longer correlated. And so what we have is that X1 and X2 are not independent: P does not satisfy "X1 is independent of X2". But we do have that P satisfies "X1 is independent of X2 given C". So here's a very simple and intuitive example of conditional independence. Let's go on to another example of conditional independence, one in a distribution that we've seen before. This is actually a very analogous model, because it also has one common cause, which in this case is the student's intelligence; this is the student example that we've seen before. There are two things that emanate from it: the student's grade in the course and their SAT score. And once again, you can generate the joint distribution over I, S, and G, which is this, and now you can look at the probability of S and G given, for example, i0, and ask yourselves how that decomposes: is it the product of the probability of S given i0 and the probability of G given i0? And you can convince yourselves that in this case conditional independence applies.
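The two-coin argument can be made completely concrete. Using the numbers from the example (a fair coin and a 90%-heads coin, picked with equal probability), this sketch builds the joint P(C, X1, X2) and checks both claims:

```python
import numpy as np

# C = coin choice (0 fair, 1 biased), X1 and X2 = the two tosses.
p_c = np.array([0.5, 0.5])        # pick each coin with equal probability
p_h = np.array([0.5, 0.9])        # P(heads | C)

# P(X | C) with index 0 = heads, index 1 = tails, shape (C, X).
p_toss = np.stack([p_h, 1 - p_h], axis=1)

# Joint P(C, X1, X2): the tosses are independent given the coin.
joint = p_c[:, None, None] * p_toss[:, :, None] * p_toss[:, None, :]

# Marginally, X1 and X2 are NOT independent.
p_x1x2 = joint.sum(axis=0)                         # P(X1, X2)
p_x1 = p_x1x2.sum(axis=1)
p_x2 = p_x1x2.sum(axis=0)
assert not np.allclose(p_x1x2, np.outer(p_x1, p_x2))

# Seeing heads raises the probability of heads on the next toss:
# P(X2=h | X1=h) = 0.53 / 0.7 > 0.7 = P(X2=h).
assert p_x1x2[0, 0] / p_x1[0] > p_x2[0]

# But given C, they ARE independent: each slice factors into its marginals.
for c in range(2):
    cond = joint[c] / joint[c].sum()               # P(X1, X2 | C = c)
    assert np.allclose(cond, np.outer(cond.sum(axis=1), cond.sum(axis=0)))
```

Working the numbers out by hand: P(X1 = heads) = 0.5 · 0.5 + 0.5 · 0.9 = 0.7, while P(both heads) = 0.5 · 0.25 + 0.5 · 0.81 = 0.53, so P(X2 = heads | X1 = heads) ≈ 0.757, higher than 0.7, exactly as the intuition says.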
Now, one somewhat counterintuitive property, one you typically don't think about when you hear about conditional independencies the first time, is that conditioning on something doesn't just gain you independencies, as it did in the case of the coin or in the case of the intelligence. Rather, conditioning can also lose you independencies. So this is the other fragment of our student network, where we had the intelligence and the difficulty both influencing the grade, and we have already seen that although I and D are independent in the original distribution, they are not independent when we condition on the grade. So this is a case, and you can convince yourselves of this by examining the distribution over here, where I and D are not independent in this conditional distribution, even though they were in the marginal distribution.
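This loss of independence can be checked on the same v-structure I → G ← D. As before, the CPD numbers are placeholders rather than the slide's values; the point is that summing G out preserves the independence of I and D, while conditioning on a particular grade destroys it:

```python
import numpy as np

# V-structure I -> G <- D with hypothetical CPDs (placeholder numbers).
p_i = np.array([0.7, 0.3])                     # P(I)
p_d = np.array([0.6, 0.4])                     # P(D)
p_g_given_id = np.array([                      # P(G | I, D), shape (I, D, G)
    [[0.3, 0.4, 0.3], [0.05, 0.25, 0.7]],
    [[0.9, 0.08, 0.02], [0.5, 0.3, 0.2]],
])

joint = p_i[:, None, None] * p_d[None, :, None] * p_g_given_id

# Marginally (G summed out), I and D ARE independent.
p_id = joint.sum(axis=2)
assert np.allclose(p_id, np.outer(p_i, p_d))

# Conditioned on a particular grade, they are NOT:
# P(I, D | G = g) no longer factors into its marginals.
g = 0
cond = joint[:, :, g] / joint[:, :, g].sum()   # P(I, D | G = g)
c_i = cond.sum(axis=1)                         # P(I | G = g)
c_d = cond.sum(axis=0)                         # P(D | G = g)
assert not np.allclose(cond, np.outer(c_i, c_d))
```

The contrast with the earlier marginalization check is the whole point: the same joint satisfies the marginal independence and violates the conditional one.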