0:00

So far, we've defined graphical models primarily as a data structure for encoding probability distributions. We talked about how you can take a probability distribution and, using a set of parameters that are somehow tied to the graph structure, represent a distribution over a high-dimensional space in a factored form.

It turns out that one can view the graph structure in a graphical model from a completely complementary viewpoint: as a representation of a set of independencies that the probability distribution must satisfy.

That theme turns out to be really enlightening and thought-provoking.

And so let's talk about that. And we are going to begin by just

defining the notion of independencies that we're going to utilize in subsequent

presentations. So let's start by just defining the very

basic notion of independence within a probability distribution.

And initially, we're just going to talk about the independence of events alpha and beta within a probability distribution. Let me just go ahead and introduce this notation: this symbol is the logical symbol for "satisfies", and this perpendicular symbol is a standard notation for independence. Okay? So this says: P satisfies "alpha is independent of beta". That's how one should read that statement. And there are actually three entirely equivalent definitions of the concept of independence.

The first one says that the probability of the conjunction of the two events (there are several different ways to denote conjunction; some people denote it as intersection, but we typically denote it using a comma), so here the probability of alpha and beta both holding, is simply the probability of alpha times the probability of beta.

That's the first definition. The second definition is about the flow of influence. It says: if you tell me beta, it doesn't affect my probability of alpha. So the probability of alpha given the information about beta is the same as the probability of alpha if you don't give me that information. And because probabilistic influence is symmetrical, we also have the exact converse of that.

That is, the probability of beta given alpha is the same as the probability of

beta. So this is independence of events, and

you can take that exact same definition and generalize it to the independence of

random variables. So here we're going to read this in the

exact same way. This says: P satisfies "X is independent of Y", for two random variables X and Y.

And once again we have the exact same set of definitions, so the first one says

that p of x comma y is equal to p of x times p of y.

The second says that p of x given y is equal to p of x, and the third that p of y given x is equal to p of y. You can read each of these statements in two different but equivalent forms; the first is as a universal statement.

3:10

So for example, you could read the first statement as saying: for every assignment little x and little y to the variables X and Y, we have that p of the event x comma y is equal to p of x times p of y. So you can think of it as a conjunction of lots and lots of independence statements of the form over here.

That's the first interpretation. The second interpretation is as an

expression over factors, that is, this one tells me that the factor over here

which is the joint distribution over X, Y, is actually a product of two lower-dimensional factors: one a factor whose scope is X, and one a factor whose scope is Y. These are all equivalent definitions, but

each of them has a slightly different intuition so it's useful to recognize all

of them. So let's think of examples of independence. Here is a fragment of our student network; it has three random variables: intelligence, difficulty, and course grade. And this is a probability distribution whose scope is those three variables, but we can go ahead and marginalize it to get a probability distribution, which is a factor over the scope I, D. As it happens, this is the marginal distribution, which you can confirm for yourselves by just adding up the appropriate entries. So just as a reminder, to get i0, d0 we're going to add up this one, this one, and that one. And that's going to give us this factor. And it's not difficult to test that, if we then go ahead and marginalize p of I, D to get p of I and p of D, then p of I, D is the product of these two factors.

Here is a good example of a distribution that satisfies an independence property. And here is the graphical model; when you look at it, you can see that there are no direct connections between I and D, and we'll talk later about how that tells us that this independence holds in the distribution.

Now, independence by itself is not a particularly powerful notion, because it happens only very rarely. That is, only in very few cases are you going to have random variables that are truly independent of each other, at least in interesting cases; you can always construct examples.

So now we're going to define a much broader notion of much greater usefulness

which is the notion of conditional independence.

Conditional independence, which applies equally well to random variables or to sets of random variables, is written like this. So here we have, once again, the "P satisfies"; here we have, again, the independence sign; but here we have a conditioning sign. And this is read as: P satisfies "X is independent of Y given Z", okay? And once again, we have three equivalent definitions of this property. The first says that the probability of X, Y

given Z is equal to the product of P of X given Z times the probability of Y given

Z. Once again, you can view this as a

universally quantified statement over all possible values of X, Y and Z or as a

product of factors. Definition number two is a definition of information flow: given Z, Y gives me no additional information that changes my probability of X; or, given Z, X gives me no additional information that changes my probability of Y. Once again, you can

view this as an expression involving factors.

Notice that this is very analogous to the definitions that we had for just plain old independence; Z effectively never moves, it always sits there on the right-hand side of the conditioning bar. And so if you find yourself having a hard time remembering conditional independence, just remember that the thing you're conditioning on sits there on the right-hand side of the conditioning bar, all the time.

8:28

Let's look at an example of conditional independence.

Imagine that I give you two coins.

And I'm telling you that one of those coins is fair, and the other one is

biased. And it's going to come up heads 90% of

the time. But they look the same.

So now you have a process by which you first pick a coin out of my hand.

And then you toss it twice. So this is which coin you pick.

This is the two tosses. Now, let's think about dependence and

independence in this example. If you don't know which coin you picked, and you toss the coin and it comes out heads, what happens to the probability of heads in the second toss? It'll be higher.

Right? Because if it came up heads the first time, it's more likely that you picked the biased coin. Heads happens 50-50 with the fair coin, but with greater probability with the biased coin, and so the probability of getting heads in the second toss is going to be higher now. On the other hand, if I now tell you, no,

no, you've picked the fair coin, then you don't really care what the outcome of the first toss is. It doesn't tell you anything about the

probability of the second toss. Similarly, if I tell you that it's the

biased coin. It also doesn't tell you anything at that

point. The first toss and the second toss are no

longer correlated. And so what we have is that X1 and X2 are not independent; P does not satisfy "X1 is independent of X2". But we do have that P satisfies,

10:13

X1 is independent of X2 given C. So here's a very simple and intuitive example of conditional independence. Let's go to another example of conditional independence, one in a distribution that we've seen before.
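
Before moving on, the two-coin example can be worked out exactly in a few lines. The 50-50 coin choice is an assumption (the lecture doesn't specify how the coin is picked); the 90% heads probability for the biased coin is from the example.

```python
import itertools

# Two-coin example: C is the coin (fair or biased), X1 and X2 the tosses.
p_C = {"fair": 0.5, "biased": 0.5}       # assumed: each coin picked equally often
p_heads = {"fair": 0.5, "biased": 0.9}   # biased coin lands heads 90% of the time

# Joint P(C, X1, X2): given the coin, the two tosses are independent.
joint = {}
for c in p_C:
    for x1, x2 in itertools.product(["H", "T"], repeat=2):
        p1 = p_heads[c] if x1 == "H" else 1 - p_heads[c]
        p2 = p_heads[c] if x2 == "H" else 1 - p_heads[c]
        joint[(c, x1, x2)] = p_C[c] * p1 * p2

# Marginally, X1 and X2 are NOT independent:
p_h1 = sum(p for (c, x1, x2), p in joint.items() if x1 == "H")    # P(X1=H)
p_h1h2 = joint[("fair", "H", "H")] + joint[("biased", "H", "H")]  # P(X1=H, X2=H)
p_h2_given_h1 = p_h1h2 / p_h1
print(round(p_h1, 3), round(p_h2_given_h1, 3))  # seeing heads raises P(heads next)

# But given the coin, they ARE independent: P(X2=H | X1=H, C=c) = P(X2=H | C=c).
for c in p_C:
    p_c_h1 = joint[(c, "H", "H")] + joint[(c, "H", "T")]
    assert abs(joint[(c, "H", "H")] / p_c_h1 - p_heads[c]) < 1e-12
```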

This is actually a very analogous model, because it also has one common cause, which in this case is the student's intelligence. This is the student example that we've seen before. There are two things that emanate from that: the student's grade in the course and their SAT score. And once again, you can generate the joint distribution over I, S, G, which is this, and now you can look at the probability of S and G given, for example, i0, and ask yourselves how that decomposes: whether it is the product of the probability of S given i0 and the probability of G given i0.
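
That decomposition question can be tested numerically. The CPDs here are invented placeholders (the slide's actual numbers aren't reproduced in this transcript); by construction S and G depend only on I, so the check passes for i0.

```python
import itertools

# Invented CPDs for the common-cause fragment S <- I -> G (not the slide's numbers).
p_I = {0: 0.7, 1: 0.3}
p_S1 = {0: 0.05, 1: 0.8}   # P(S=1 | I)
p_G1 = {0: 0.2, 1: 0.74}   # P(G=1 | I), with difficulty marginalized out

joint = {(i, s, g): p_I[i]
         * (p_S1[i] if s else 1 - p_S1[i])
         * (p_G1[i] if g else 1 - p_G1[i])
         for i, s, g in itertools.product([0, 1], repeat=3)}

# Condition on I = i0 = 0 and check P(S, G | i0) = P(S | i0) * P(G | i0).
p_i0 = sum(joint[(0, s, g)] for s in [0, 1] for g in [0, 1])
for s, g in itertools.product([0, 1], repeat=2):
    p_sg = joint[(0, s, g)] / p_i0
    p_s = sum(joint[(0, s, gg)] for gg in [0, 1]) / p_i0
    p_g = sum(joint[(0, ss, g)] for ss in [0, 1]) / p_i0
    assert abs(p_sg - p_s * p_g) < 1e-12
print("S and G are independent given I")
```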

11:35

Now, one somewhat counterintuitive property of independence, one you kind of don't think about when you hear about conditional independencies the first time, is that conditioning on some things doesn't just gain you independencies, as it did in the case of the coin or in the case of the intelligence. Rather, conditioning can also lose you independencies. So this is the other fragment of our

student network, where we had the intelligence and the difficulty both influencing the grade. And we have already seen that although I and D are independent in the original distribution, they are not independent when we condition on the grade. So this is a case where, and you can just

convince yourselves of this by examining this distribution over here, where I and D are not independent in this conditional distribution even though they were in the marginal distribution.
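
To see this loss of independence concretely, here's a sketch with invented CPD numbers (again, not the slide's): I and D are independent in the joint, but conditioning on a grade value couples them.

```python
import itertools

# Invented numbers for the v-structure I -> G <- D.
p_I = {0: 0.7, 1: 0.3}
p_D = {0: 0.6, 1: 0.4}
p_G1 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.9, (1, 1): 0.5}  # P(G=1 | I, D)

joint = {}
for i, d, g in itertools.product([0, 1], repeat=3):
    p_g = p_G1[(i, d)] if g == 1 else 1 - p_G1[(i, d)]
    joint[(i, d, g)] = p_I[i] * p_D[d] * p_g

# Marginally, I and D are independent (by construction of the joint).
p_id = {(i, d): sum(joint[(i, d, g)] for g in [0, 1]) for i in [0, 1] for d in [0, 1]}
assert all(abs(p_id[(i, d)] - p_I[i] * p_D[d]) < 1e-12 for i in [0, 1] for d in [0, 1])

# But conditioned on G = 1, they become dependent:
p_g1 = sum(joint[(i, d, 1)] for i in [0, 1] for d in [0, 1])
p_i1_g1 = sum(joint[(1, d, 1)] for d in [0, 1]) / p_g1   # P(I=1 | G=1)
p_d1_g1 = sum(joint[(i, 1, 1)] for i in [0, 1]) / p_g1   # P(D=1 | G=1)
p_i1d1_g1 = joint[(1, 1, 1)] / p_g1                      # P(I=1, D=1 | G=1)
print(abs(p_i1d1_g1 - p_i1_g1 * p_d1_g1) > 1e-6)  # True: independence is lost
```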