
We described the belief propagation algorithm as passing messages over a cluster graph, but we left unspecified how that cluster graph would be constructed and what properties it needs to satisfy in order to support reasonable message passing.

So let's remind ourselves what cluster graphs are. A cluster graph is an undirected graph whose nodes are clusters involving subsets of variables, and whose edges are each annotated with a sepset S_ij, which is a subset of the two clusters at the edge's endpoints.

What are some properties that the cluster graph has to satisfy? The first property is the one called family preservation. Remember that, given a set of factors phi, we need to be able to assign each factor phi_k to some cluster C_alpha(k) such that the cluster can accommodate the scope of phi_k. In order for this to be possible, we need to have an appropriate cluster in the cluster graph. So this imposes a constraint on the cluster graph: for every factor phi_k in my set of factors phi, there exists some cluster C_i such that C_i accommodates phi_k, which means that you can put phi_k inside this cluster. That's the family preservation property.
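As a concrete sketch (not from the lecture), assuming each factor's scope and each cluster are represented as Python frozensets of variable names, the family-preservation assignment phi_k -> C_alpha(k) might look like this:

```python
# Sketch: assign each factor to some cluster that can accommodate it
# (family preservation). The representation is an assumption: a factor's
# scope and each cluster are frozensets of variable names.

def assign_factors(factor_scopes, clusters):
    """Map each factor index k to a cluster index i with
    scope(phi_k) a subset of C_i; raise if no cluster accommodates it."""
    alpha = {}
    for k, scope in enumerate(factor_scopes):
        for i, cluster in enumerate(clusters):
            if scope <= cluster:          # C_i accommodates phi_k
                alpha[k] = i
                break
        else:
            raise ValueError(f"no cluster accommodates factor {k}: {scope}")
    return alpha

# Example: phi_0(A,B) and phi_1(B,C) over clusters {A,B,C}, {B,C,D}
scopes = [frozenset("AB"), frozenset("BC")]
clusters = [frozenset("ABC"), frozenset("BCD")]
print(assign_factors(scopes, clusters))  # -> {0: 0, 1: 0}
```

If some factor's scope fits in no cluster, the graph cannot encode that factor set at all, which is exactly what the constraint rules out.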

The second property is a little bit trickier to understand. It's called the running intersection property. Let's first read the definition and then understand what it says. Assume that we have a pair of clusters C_i and C_j and a variable X that belongs to both of them. So, for example, we might have the variable X sitting here in C7, and we might have the variable X sitting here in C5. The property says that there exists a unique path between C_i and C_j for which all clusters and sepsets along the path contain X.

What does that mean? It means, for example, that if I choose this to be my unique path, X needs to sit here, here, and here. So there is a connecting path that involves X along the entire route, and that path is unique. Let's try and understand the intuition behind both sides of this definition: the existence and the uniqueness.

Let's do the existence first. Imagine that, for whatever reason, I decide that there is not going to be a path over here, so there is no way for C7 to communicate with C3 about the variable X. Well, that means we now have two separate, isolated communities, each of which has some information about X, and they can never talk to each other about X. So they're never going to come to agree about X; there's never going to be any transfer between these two pieces of information. That's not very good, which is why we need that path to exist. Okay, so that's the exists part. What about the unique part?

The unique part is a little bit trickier to understand, but let's try and provide some intuition for it. Imagine that I have two paths that involve X, and now let's think about the message passing algorithm. C3 can send a message to C5 with information about X: for example, "I think X takes the value 1; I have a factor that suggests that X takes the value 1." Well, C5 integrates that with its own information and sends it on to C2, which sends it back to C3. And now C3 hears from C2 that, ooh, X needs to take the value 1. Huh? That reinforces its belief that X takes the value 1, and so the probability goes up. It now goes back and sends that information on to C5, which sends it on to C2, which sends it back to C3, and once again the probability of X taking the value 1 goes up. So we have this self-reinforcing loop that's going to give rise to very extreme and very skewed probabilities in many examples. A way of avoiding, or at least reducing, that risk is to prevent these kinds of feedback loops.

Now, it's important to realize that preventing these loops only reduces the problem, as opposed to eliminating it. Specifically (this is a bit of a digression, but it's important to know), imagine, for example, that we have X and Y that are very strongly correlated. So here we have X and Y, and here we have a path that involves Y. Now what's going to happen is that C3 sends information to C5 about X. C5 translates that into information about Y: Y should take the value 1. That information about Y goes back to C3 and increases the probability of X taking the value 1. This is a forward-looking hint about some of the issues that we'll see with belief propagation later on, which is that belief propagation does very poorly when we have strong correlations.

Â 6:33

And that's because of these feedback loops. The more skewed the probabilities in your graphical model, the harder time belief propagation has in terms of the results that it gets.

So, with that digression aside, let's go back to the running intersection property, and let's provide an alternative definition, just to give ourselves some additional intuition. The running intersection property is equivalent to saying that, for any variable X, the set of clusters and sepsets that contain X forms a tree. So, for example, if we have X here, here, here, here, here, and here, we can see that the set of clusters and sepsets that contain X forms a tree. It has to be connected, because of the existence of the path, and it can't be a non-tree, because that would give us two different paths. So that's a different view of this property.
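The tree-per-variable view suggests a direct way to check the running intersection property: for each variable, collect the clusters and sepsets that contain it and verify they form a tree (connected, with exactly n - 1 edges over n nodes). Here is a sketch; the representation (clusters as frozensets, edges as (i, j, sepset) triples) is my assumption, not from the lecture:

```python
# Sketch: check the running intersection property via the
# "tree per variable" view.

def satisfies_rip(clusters, edges):
    """clusters: list of frozensets; edges: list of (i, j, sepset)."""
    variables = set().union(*clusters)
    for x in variables:
        nodes = {i for i, c in enumerate(clusters) if x in c}
        sub = [(i, j) for i, j, s in edges if x in s]
        # A tree on n nodes has exactly n - 1 edges...
        if len(sub) != len(nodes) - 1:
            return False
        # ...and is connected: flood fill from an arbitrary node.
        seen, frontier = set(), [next(iter(nodes))]
        while frontier:
            u = frontier.pop()
            if u in seen:
                continue
            seen.add(u)
            frontier += [j if i == u else i for i, j in sub if u in (i, j)]
        if seen != nodes:
            return False
    return True
```

On a simple chain {A,B} - [B] - {B,C} this returns True; adding a second B-path between the same clusters would break the edge count and return False.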

And so you can think of each variable as inducing its own little tree, across which information about that variable flows in the graph. Now, let's go back with that definition and consider some cluster graphs that we might adopt for this example that we've seen before.

So here we have our five clusters; let's check whether this satisfies the running intersection property. (We've already done family preservation for a particular set of factors.) Let's consider, for example, the variable B. We can see that we have B here, here, here, here, here, here, and here, and we can see that that is a tree. Here's my tree. Note, very carefully, that B isn't here. If B were here, that wouldn't satisfy the running intersection property; that would be an illegal cluster graph, which is actually on a subsequent slide.

Here is an illegal cluster graph. Why does it violate the running intersection property? Because here is B, and here is a bunch of other B's, and there's no way to connect cluster two to any of the others. So this one violates the existence.

Â 9:08

This one, which we just talked about, violates the uniqueness, because here we have all of the clusters and sepsets that involve B, and there's this little loop over here that has two paths between, say, cluster one and cluster four.

Suppose we wanted to take the cluster graph and still allow communication about B and C: for example, we want cluster one and cluster two to be able to transmit to each other information about how B and C are correlated with each other, which they can't do in this graph over here. One way to do that is what we have here: we have eliminated, in this case, this edge that involves B, and now once again B induces a tree in the graph. We focused on B here, but it's not difficult to convince yourself that the running intersection property is also satisfied with respect to the other variables. Just as an example, here is the set of clusters and sepsets involving D, and that too is a tree; and here are the ones involving E, and that too is a tree; and so on. The running intersection property needs to hold for every one of the variables.

How do we construct a cluster graph that has the desired properties? One construction that is very simple, and in some ways degenerate, but because of its simplicity very often used, is called the Bethe cluster graph. "Bethe" is a term from statistical physics, where people use this kind of approximation to energy functions in some calculations. In the Bethe cluster graph, we have two types of clusters: big clusters and little clusters. The big clusters correspond to factors in phi.

Â 12:09

So if we consider the set of factors that we had before, we can now produce a different cluster graph than the one that we had previously. This is a Bethe cluster graph. Notice that we have these big clusters, these five, and we have these little clusters corresponding to A, B, C, D, E, and F. And we can see that we have an edge, for example, between the ABC cluster and the A cluster, the B cluster, and the C cluster. That's how messages are passed. You can see how this graph is degenerate in the sense that it only allows information about singletons to be passed: it loses, in every message passing step, any information about the correlation between variables.
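The construction just described can be sketched in code: one big cluster per factor scope, one singleton cluster per variable, and an edge from each big cluster to every singleton it contains, with the singleton itself as the sepset. The representation choices here are assumptions for illustration, matching the sketches above:

```python
# Sketch of the Bethe cluster graph construction. Big clusters are the
# factor scopes; little clusters are singletons, one per variable.

def bethe_cluster_graph(factor_scopes):
    """Return (clusters, edges): big clusters first, then singletons;
    edges are (big_index, singleton_index, sepset) triples."""
    variables = sorted(set().union(*factor_scopes))
    big = [frozenset(s) for s in factor_scopes]
    small = [frozenset({v}) for v in variables]
    clusters = big + small
    edges = [(i, len(big) + j, frozenset({v}))
             for i, s in enumerate(big)
             for j, v in enumerate(variables) if v in s]
    return clusters, edges

# Example: factors over {A,B,C} and {B,C} give 2 big + 3 little clusters
# and edges ABC-A, ABC-B, ABC-C, BC-B, BC-C.
clusters, edges = bethe_cluster_graph([frozenset("ABC"), frozenset("BC")])
```

Every variable's clusters then form a star around its singleton, which is the reason this construction cannot violate the running intersection property.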

But, nevertheless, it's simple to construct, and it's guaranteed to satisfy the running intersection property. Why is that? Let's consider, for example, all of the clusters that involve the variable D. Well, there's this one, and then there's this, that, and that (those are the sepsets, which I didn't mark), and then these ones. Notice that it's a tree by definition, because D doesn't appear in any other sepset except these ones, and so the tree is a star: the variable's own cluster plus the big clusters that contain the variable. So D, and, in this case, the three clusters that contain it.

So, to summarize.

We looked at the properties of the cluster graph, and we saw that there are two it needs to satisfy. The first is family preservation, which allows the set of factors to be encoded. The second is the running intersection property, which serves two purposes: it connects all of the information about any single variable, so that it can all be transmitted through the graph, but without creating tight feedback loops that would give rise to self-reinforcement and highly inaccurate answers.

And the Bethe cluster graph is often the first default that people use, because it's easy to define and it's guaranteed to be correct. But richer cluster graph structures of the type that we talked about can offer very different and sometimes significantly better trade-offs. On the one hand there is the computational cost, which grows as you increase the sizes of the messages that are passed. On the other hand, richer structures allow us to improve the preservation of dependencies as messages are passed in the graph, so that more information is actually kept and not lost in the message passing process.

Â