One of the most elegant properties of probabilistic graphical models is the intricate connection between the factorization of the distribution as a product of factors and the independence properties that the distribution needs to satisfy. Now we're going to talk about how that connection manifests in the context of directed graphical models, or Bayesian networks. So let's first remind ourselves why independence and factorization are related to each other.

So, for example, the statement that P(X, Y) is the product of two factors, P(X) and P(Y), is the definition of independence, and at the same time it's a factorization of the joint distribution as a product of two factors. Similarly, one of the definitions that we gave for conditional independence says that the joint distribution over X, Y, and Z is a factor over X and Z times a factor over Y and Z. So, once again, independence corresponds to factorization.
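In symbols, those two definitions read:

$$ X \perp Y \iff P(X, Y) = P(X)\,P(Y), $$

$$ (X \perp Y \mid Z) \iff P(X, Y, Z) = \phi_1(X, Z)\,\phi_2(Y, Z). $$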

So we see that factorization of the distribution corresponds to independencies that hold in that distribution. And the question is: if we know that a distribution P factorizes over G, can we say something about the independencies that P must satisfy, just by looking at the structure of the graph G? So what independencies might hold in a probabilistic graphical model?

We talked about the notion of flow of influence in a probabilistic graphical model, where we have, for example, an active trail that goes from S up through I, down to G, and up to D, if, for example, G is observed, that is, G is in Z. And that gave us an intuition about when probabilistic influence might flow. We can now turn this notion on its head and ask: what happens when we tell you that there are no active trails in the graph, that is, that influence can't flow?

So we're going to make that notion formal using the notion of d-separation. And we're going to say that X and Y are d-separated in a graph G given a set of observations Z if there is no active trail between them given Z.
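To make the definition concrete, here is a minimal sketch of a d-separation checker. This is not code from the course; it uses the classical moralized-ancestral-graph criterion, which is equivalent to checking for active trails: X and Y are d-separated given Z exactly when they are disconnected once we restrict the graph to the ancestors of {X, Y} ∪ Z, moralize it, and delete Z. The function name, the dictionary encoding, and the example network are illustrative choices of mine.

```python
from collections import deque

def d_separated(parents, x, y, z):
    """Return True iff x and y are d-separated given the set z.

    parents maps each node to the list of its parents in the DAG.
    """
    # 1. Ancestral subgraph: x, y, z, and all of their ancestors.
    relevant, frontier = set(), deque([x, y, *z])
    while frontier:
        node = frontier.popleft()
        if node not in relevant:
            relevant.add(node)
            frontier.extend(parents.get(node, []))

    # 2. Moralize: link each node to its parents, and marry co-parents.
    nbrs = {node: set() for node in relevant}
    for node in relevant:
        ps = parents.get(node, [])
        for p in ps:
            nbrs[node].add(p)
            nbrs[p].add(node)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                nbrs[ps[i]].add(ps[j])
                nbrs[ps[j]].add(ps[i])

    # 3. Delete z, then check whether y is reachable from x.
    seen, frontier = {x}, deque([x])
    while frontier:
        for nbr in nbrs[frontier.popleft()]:
            if nbr == y:
                return False               # connected => not d-separated
            if nbr not in seen and nbr not in z:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# The student network: D -> G <- I, I -> S, G -> L.
student = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
print(d_separated(student, "D", "S", set()))   # True: the one trail is blocked
print(d_separated(student, "D", "S", {"G"}))   # False: observing G activates it
```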

And the intuition that we'd like to demonstrate in this context is that this notion of "influence can't flow" corresponds, much more formally, to the rigorous notion of conditional independence in the distribution.

So let's actually prove that that's in fact the case. The theorem that we'd like to state is the following: suppose P factorizes over G, and a d-separation property holds in the graph, that is, X and Y are d-separated in G given Z, so there is no active trail between them given Z. Then P satisfies the conditional independence statement "X is independent of Y given Z." So d-separation implies independence, if the probability distribution factorizes over G.
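Stated compactly:

$$ \text{if } P \text{ factorizes over } G \text{ and } \operatorname{d\text{-}sep}_G(X; Y \mid Z), \text{ then } P \models (X \perp Y \mid Z). $$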

We're now going to prove the theorem, not in its full glory, but by example, because that example really does illustrate the main points of the derivation. So the assumption is that here is our graph G, and here is the factorization of the distribution according to the chain rule for Bayesian networks. This is a factorization that we've seen before.
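Concretely, assuming the graph here is the familiar student network from earlier lectures, with Difficulty (D), Intelligence (I), Grade (G), SAT (S), and Letter (L), the chain rule gives:

$$ P(D, I, G, S, L) = P(D)\,P(I)\,P(G \mid D, I)\,P(S \mid I)\,P(L \mid G). $$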

And so now we'd like to prove that a d-separation statement follows from this assumption. The d-separation statement whose corresponding independence we'd like to prove is the one that says that D is independent of S.

First, let's convince ourselves that D and S are in fact d-separated in this graph. We see that there is only one possible trail between D and S in this graph: it goes from D through G and I to S. Given that G is not observed in this case, and neither is L, G's descendant, the v-structure at G is not activated. The trail is not active, and so the two nodes are d-separated, and so we'd like to prove that this independence holds.

So what is the joint probability distribution of D and S? Well, it's the sum over this joint distribution over here, marginalizing out the three variables that are not D and S, namely G, L, and I. So we have the sum over G, L, and I of this big product that's defined by the chain rule.

And as we've previously seen, when we have a summation like that, one of the things we can do is push summations into the product, so long as we don't push them past terms that involve the summation variable. So here, for example, we have a summation over L, and we can push that summation in because the only term that involves L is the probability of L given G. If we push in the summations in this way, we end up with the following expression: we have the summation over L here, then a summation over G, and finally the summation over I at the very beginning.
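Written out, the marginalization with the summations pushed in is:

$$ P(D, S) = \sum_{G, L, I} P(D)\,P(I)\,P(G \mid D, I)\,P(S \mid I)\,P(L \mid G) = P(D) \sum_I P(I)\,P(S \mid I) \sum_G P(G \mid D, I) \sum_L P(L \mid G). $$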

And what do we know about the summation over L of P(L | G)? We know that this is a probability distribution over L, and therefore it necessarily sums to one, because we're summing over all of the possible values of L. Once this term is one, it cancels out, and we can look at the next term: what is the sum over G of the probability of G given D and I? That too is one, for the same reason.

And by the time we're done, we end up with the following expression. The important thing to notice about this expression is that this part is phi one of D and this part is phi two of S. And therefore, because we have factorized this joint distribution as a product of two factors, we end up with something that corresponds directly to the definition of marginal independence, therefore proving that P satisfies this independence statement.
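Spelling out those last steps:

$$ \sum_L P(L \mid G) = 1, \qquad \sum_G P(G \mid D, I) = 1, $$

$$ P(D, S) = \underbrace{P(D)}_{\phi_1(D)} \; \underbrace{\sum_I P(I)\,P(S \mid I)}_{\phi_2(S)}. $$

Since the joint over D and S is a product of a factor over D alone and a factor over S alone, this is exactly the definition of marginal independence; in fact $\sum_I P(I)\,P(S \mid I)$ is just $P(S)$.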

Having convinced ourselves that d-separation is an interesting notion, let's look at some other d-separation statements that might hold in a graph. One general statement that we can show is that any node in the graph is d-separated from its non-descendants given its parents. So let's look at the variable Letter.

And let's see what non-descendants it has that are not its parents. Letter has only two descendants; let's mark those in blue: Job and Happy. So those are the descendants. What about the non-descendants? Well, here's one non-descendant, here's another, another, and another. So those are the four non-descendants in the graph that are not parents.

So now let's convince ourselves that Letter is d-separated from all of its non-descendants given its parents. Let's take an arbitrary one of those, say SAT, and see if we can find a trail from SAT to Letter that is active given the parents, which in this case is just the variable Grade. Okay.

So what's one potential trail? We can go this way, up, and then back down into Letter. But is that trail active? No, it's not, because Grade is observed, and so Grade blocks the trail. In fact, it will block any trail that comes in from the top: any trail into Letter that comes in through its parent Grade is going to be blocked by the fact that Grade is observed. So that means we have to come in through the bottom.

So let's look at that: let's look at the trail from SAT, through Job, and up again into Letter. Well, is that trail active? No. The reason it's not active is that only Grade is observed, and Job, since it's a descendant of Letter, can't be a parent of Letter, and so is necessarily not observed. So neither Job nor Happy is observed, and so this v-structure over here cannot be activated, and trails that come in from the bottom don't work either. So again, this is a proof by demonstration, but it shows the general property in a very simple way.
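The general rule at work here, from the discussion of active trails, is that a v-structure on a trail is active only if its middle node or one of that node's descendants is observed:

$$ X \rightarrow W \leftarrow Y \text{ is active given } Z \iff \big(\{W\} \cup \mathrm{Descendants}(W)\big) \cap Z \neq \emptyset. $$

Since neither Job nor Happy is in Z = {Grade}, any v-structure on a bottom trail stays blocked.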

So that tells us, by the theorem that we proved on the previous slide, that if P factorizes over G, then any variable is independent of its non-descendants given its parents, which is a kind of nice intuition when you think about it. Because when we motivated the structure of Bayesian networks, we basically said that what makes us pick the parents for a node is that they are the only variables that this node depends on. So the parents of a node are the variables on which it depends directly, and this gives a formal semantics to that intuition: each variable depends directly only on its parents.
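In symbols, these are the local independencies of the network:

$$ (X \perp \mathrm{NonDescendants}_G(X) \mid \mathrm{Pa}_G(X)) \quad \text{for every variable } X. $$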

So, now that we've defined a set of independencies that hold for any distribution that factorizes over a graph, we're going to define a general notion, which is just a term for this. We defined d-separation in a graph G, and we showed that when you have the d-separation property, P satisfies the corresponding independence statement. And so now we can look at the set of independencies that are implied by the graph G.

So I(G), the set of independencies that are implied by the graph G, consists of all the independence statements "X is independent of Y given Z" that correspond to d-separation statements in the graph. This is the set of all independencies that are derived from d-separation statements, and those are the ones that the graph implies must hold for a distribution P that factorizes over it, as we just showed in the previous demonstration.
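In symbols:

$$ I(G) = \{ (X \perp Y \mid Z) : \operatorname{d\text{-}sep}_G(X; Y \mid Z) \}. $$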

So now we can give this a name. We're going to say that if P satisfies all the independencies of G, then G is an I-map of P, where I-map stands for "independency map." And the reason it's an independency map is that, by looking at G, we get a map of independencies that hold in P; that is, it maps independencies that we know to be true. So let's look at some examples of I-maps. Let's look at two distributions; here is one distribution, P1.