0:00

An alternative class of algorithms to variable elimination is the class of message passing algorithms. As we'll see, this class is in some ways closely related to variable elimination. It also offers us additional flexibility in how we order the summation and factor-product steps, so as to potentially achieve a lower complexity than would be required by even the minimal elimination ordering.

So let's consider a simple Markov network of the following type, and let's assume that I don't want to pay the cost of eliminating variables, although in this network it's not going to be too expensive. Instead, what we're going to do is construct what we call a cluster graph. A cluster graph is a data structure in which we're going to take little bits of knowledge from this graphical model and place them in things called clusters. So here, we have four clusters in this example. Cluster one is a cluster whose jurisdiction of influence is the pair of variables A, B. Cluster two has jurisdiction over B and C. Cluster three is over C and D, and cluster four is over A and D.
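To make the structure concrete, here is a minimal sketch of this cluster graph as plain Python data. The cluster scopes and sepsets are read off the example, with cluster four taken to cover A and D so that the loop closes (an assumption; the later discussion has cluster four not caring about B):

```python
# Cluster graph for the four-variable loop A-B-C-D.
# Each cluster "has jurisdiction" over a small set of variables;
# each edge is labeled with the sepset the two clusters talk about.
clusters = {
    1: {"A", "B"},
    2: {"B", "C"},
    3: {"C", "D"},
    4: {"A", "D"},  # assumption: cluster four closes the loop with A and D
}

edges = {
    (1, 2): {"B"},  # clusters 1 and 2 talk about B
    (2, 3): {"C"},
    (3, 4): {"D"},
    (4, 1): {"A"},
}

# Sanity check: a sepset must be understood by both of its endpoints.
for (i, j), sepset in edges.items():
    assert sepset <= clusters[i] and sepset <= clusters[j]
```

The assertion at the end encodes the requirement, discussed below, that clusters can only talk about variables they both understand.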

And these clusters are going to sit there and talk to each other, and they're going to try to convince each other that what they think about a variable they both consider to be under their jurisdiction is correct. So, for example, cluster one is going to talk to cluster two about the variable B, and it's going to tell cluster two what it thinks about B, so that cluster two might become more informed about the distribution over B. And so, initially, each cluster is going to have its own little piece of information: phi one is going to go here, phi two is going to go there, phi three goes there, and phi four goes there. And now the clusters are going to communicate via these things called messages. So we're going to slightly rename things: we're going to call the initial set of beliefs, if you will, or evidence, that a cluster has over the variables in its jurisdiction the psis. In the example we just had, the psis were just the phis in the original model, but as we'll see, sometimes it becomes a little bit more complicated than that.

So now cluster one has the factor psi one, cluster two has psi two, and so on. Now let's imagine that cluster two wants to send a message to cluster one. So it has to figure out what it believes, and a priori we're going to start out by initializing with a totally uninformative message, because initially the clusters haven't even started talking to each other. So all messages are initialized to be one.

But now cluster two can come back and say: okay, I'm going to pass along my information, uninformative as it was. Notice that I call this message delta two one, delta two being the "from" and one being the "to". Having received delta two one, cluster one is going to say: well, I'm going to take that, multiply it with my current thoughts about the variables A, B, and that's going to give me a more informed factor.

And now I'm going to communicate that information to cluster four. But cluster four doesn't care about B, and so what I'm going to communicate to cluster four, which is this message delta one four, from one to four, is the sum over B, the variable that four doesn't want to hear about, of the incoming message times my initial beliefs, my initial factors. Now, this general process is what keeps the message-passing algorithm going.

Each cluster is going to send messages to its adjacent clusters that reflect this exact same process. So, to take a different example, here is delta three four, the message that goes from three to four. That message takes into consideration the evidence that two sent to three, which is this guy over here, times whatever three thought about C, D to begin with. Notice, and this is important, that the message that three sends to four doesn't take into consideration the information that it got from four. So there isn't, in this expression over here, the contribution of delta four three.

Because you want to avoid the case of: I repeat back to you a rumor that you just told me, and we all become more convinced about the truth of this rumor, because, oh, I thought about this first, but now you're reinforcing me by telling it to me again, and the beliefs just go up and up and up. And so what happens here is that we deliberately restrict attention to evidence that comes in from other sources. So three only uses evidence from two when reporting to four, and conversely, it only uses evidence from four when reporting to two. And so this defines a communication protocol by which one cluster in the graph can communicate information to its neighbors.
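As a concrete sketch of one such message, here is delta one-to-four from the example computed with NumPy: psi one over (A, B) is multiplied by the incoming message delta two-to-one over B, and then B is summed out, while the return message delta four-to-one is deliberately left out. The factor values are made up for illustration:

```python
import numpy as np

# psi1: cluster 1's initial beliefs over (A, B); axis 0 = A, axis 1 = B.
# The numbers are illustrative, not from the lecture.
psi1 = np.array([[0.3, 0.7],
                 [0.9, 0.1]])

# Incoming message from cluster 2 about B; before any communication,
# all messages are initialized to 1.
delta_2_1 = np.ones(2)

# delta_{1->4}(A) = sum_B delta_{2->1}(B) * psi1(A, B).
# Note that delta_{4->1} is NOT multiplied in: cluster 1 must not
# echo cluster 4's own information back to it.
delta_1_4 = (psi1 * delta_2_1[np.newaxis, :]).sum(axis=1)
print(delta_1_4)  # with the all-ones message, this is just sum_B psi1
```

With the uninformative initial message, the result is just the B-marginal of psi one; once delta two-to-one becomes informative, the same line of code weights psi one before summing.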

So how do we generalize this message-passing process? Let's construct a more general version of it. This uses a data structure called a cluster graph. A cluster graph is an undirected graph whose nodes are not variables anymore; this is not a graphical model of the type that we've seen. The nodes in this undirected graph are clusters that correspond to subsets of variables, just like we had before.

Â 6:52

And we're going to connect two adjacent clusters, C i and C j, with an edge annotated by this thing called the sepset, which is the set of variables that they choose to talk about. Clearly, each one can only talk about variables that it knows about, which is why the sepset S i,j has to be a subset of both C i and C j. So once again, C i is the jurisdiction of cluster i; these are the variables that it understands. And S i,j is the communication channel between two adjacent clusters in the cluster graph. So now, given a set of factors Phi, we're going to initialize the model by giving each cluster in the graph a certain amount of information. Each of my initial factors phi k is going to be assigned to one and only one cluster. And this is important: there needs to be at least one, so that the information is taken into account somewhere, and it shouldn't be more than one, because if you give a factor to more than one cluster, you're going to double-count the evidence; each cluster will treat it as an independent piece of evidence, and it will be counted twice. Now, where do we put the information corresponding to factor k? We can only put it in a cluster that understands every single variable in that factor. So if we have a factor with a certain scope, that scope had better be a subset of the variables that the cluster understands, because otherwise the cluster can't even talk about those variables. So once we've done that, if I can erase this, okay, we can now define the initial beliefs of a particular cluster as the product of all of the factors that are assigned to it.

Â 9:21

Now, some clusters might be assigned one factor, in which case psi is just equal to that phi; that was the case in the example we just saw. Some clusters might have several factors assigned to them, in which case we need to multiply them to create a single factor that is the total informed beliefs of the cluster. And some clusters might have no factors assigned to them, in which case this is a null product, which is equal to one. So now let's look at an example of a somewhat more interesting cluster graph than the trivial one that we showed earlier. Here we have a set of factors: phi one over A, B, C; phi two over B, C; phi three; phi four; phi five; phi six; and so on.

So initially we have to figure out, for each of those factors, a cluster in which to put it. For phi one over A, B, C there's really only one choice, because there's only one cluster in this entire graph that understands all of A, B, and C, and that's this cluster over here. Phi two, however, has two choices: it can go into cluster one or into cluster two, because both cluster one and cluster two understand both of its variables. We're going to put it in cluster two; we could have put it in either one, and both are fine. Phi three again has two choices: it can go into cluster two or into cluster three. We're going to go ahead and put it in cluster two. Phi four has only one choice, because only one cluster has both D and E in its scope, so it goes here. Phi five, similarly, goes only over here. The factor over B, D again has more than one choice: we can put it in cluster two or in cluster three. Let's, for simplicity, put it in cluster three. And the factor over B, D, F has only one choice. This is one possible way of assigning the factors to clusters; there are other alternatives, as I said, that would work. If we do this, we end up, for example, with psi two being the product of phi two times phi three.

Â 12:14

Okay. That was one cluster graph for this set of factors. Here's another cluster graph, for the exact same set of factors, so let's compare them. And notice that the clusters haven't changed; what's changed is the edges and the sepsets between them. It's a different cluster graph. So now let's think about message passing in the context of this more richly structured cluster graph, to see what it looks like here.

So here, for example, if we're interested in passing a message from cluster one to cluster four, we're going to take psi one, which is the product of the factors that were initially assigned to cluster one. We're going to take the message that this cluster got from its other neighbor, two. We're going to multiply them, and then we're going to sum out all of the variables that one understands but four doesn't. So here, for example, one understands A and C and four doesn't, so we have to sum out over A and C, and that gives the message delta one four. What about delta four one? Delta four one is the message that goes in the other direction.

Â 15:06

In general, the message that cluster i sends to cluster j takes i's initial beliefs, multiplies them with the messages coming in from all of i's other neighbors, and sums out the variables that cluster j doesn't know about; that is, everything that's in the scope of C i but not in the scope of C j. And that gives us a factor over the sepset, which is produced as the message. Putting that together gives us an algorithm which is generally called belief propagation, for the reason that clusters are propagating, if you will, informed beliefs to each other.
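In symbols, writing psi i for cluster i's initial beliefs, N i for its neighbors, and S i,j for the sepset, the message just described is:

```latex
\delta_{i \to j} \;=\; \sum_{\mathbf{C}_i \setminus \mathbf{S}_{i,j}}
  \psi_i \times \prod_{k \in \mathcal{N}_i \setminus \{j\}} \delta_{k \to i}
```

Note the product deliberately excludes delta j-to-i, which is what prevents the "repeated rumor" effect.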

And here is the summary of the algorithm. Each factor phi is first assigned to a cluster; that is used to construct our initial potentials, these psis that we talked about. We initialize all of the messages, before anybody starts communicating, to be one, and then we repeat the following process: we select some edge in the graph between adjacent clusters, and we pass the message between cluster i and cluster j over that edge. The expression for that message is the one that we saw in the previous slide, and that process is repeated again and again.
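Putting that summary into code, here is a minimal sketch of the whole loop, under my own simplifying assumptions: binary variables, factors as (scope, table) pairs, a fixed round-robin edge schedule, and messages normalized for numerical stability. Names like `loopy_bp` are illustrative, not from the lecture:

```python
from itertools import product

def f_product(f, g):
    """Multiply two factors, aligning on shared variables."""
    (sf, tf), (sg, tg) = f, g
    scope = tuple(dict.fromkeys(sf + sg))
    out = {}
    for a in product((0, 1), repeat=len(scope)):
        v = dict(zip(scope, a))
        out[a] = tf[tuple(v[x] for x in sf)] * tg[tuple(v[x] for x in sg)]
    return scope, out

def f_marginalize(f, keep):
    """Sum out every variable not in `keep`."""
    scope, table = f
    kept = tuple(x for x in scope if x in keep)
    out = {a: 0.0 for a in product((0, 1), repeat=len(kept))}
    for a, val in table.items():
        v = dict(zip(scope, a))
        out[tuple(v[x] for x in kept)] += val
    return kept, out

def f_normalize(f):
    scope, table = f
    z = sum(table.values())
    return scope, {a: val / z for a, val in table.items()}

def loopy_bp(psi, edges, sepsets, n_rounds=50):
    """edges is a list of directed pairs (i, j); sepsets maps each
    directed edge to the variables that i and j talk about."""
    # Initialize all messages to 1 (an empty-scope constant factor).
    deltas = {e: ((), {(): 1.0}) for e in edges}
    for _ in range(n_rounds):
        for (i, j) in edges:                 # round-robin edge schedule
            msg = psi[i]
            for (k, m) in edges:
                if m == i and k != j:        # exclude j's own message
                    msg = f_product(msg, deltas[(k, i)])
            deltas[(i, j)] = f_normalize(f_marginalize(msg, sepsets[(i, j)]))
    # Final beliefs: psi_i times ALL incoming messages.
    beliefs = {}
    for i in psi:
        b = psi[i]
        for (k, m) in edges:
            if m == i:
                b = f_product(b, deltas[(k, i)])
        beliefs[i] = f_normalize(b)
    return beliefs
```

On a tree-structured cluster graph this converges to the exact marginals in a few rounds; on a loopy graph it yields the pseudo-marginals discussed later in the lecture.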

Â 16:58

Now, there are several important aspects of this algorithm that are left undefined, and that we're going to need to talk about later on. The first of these is: repeat until when? When do we decide that we're done and that we can stop? We'll talk about that in the context of different variants of this algorithm later on. The second thing that's left undefined is how I select the edges: which edge do I select to pass messages over? Here there is the obvious simple thing, which is just to go in some pre-specified order; that's one option, round-robin passing. But it turns out that there are better strategies than that, and we'll talk about that too. So, is this algorithm any good?

Well, it turns out the answer is yes and no. First of all, if you pass messages over a graph like the one that I showed before, like the little A, B, C, D loop, you see that we eventually achieve convergence. So here we can see that we do eventually converge to some probability, but that probability is a little bit off.

Â 18:29

That is, the answer is not exact. So in general, with exceptions that we will talk about, this is an approximate algorithm, which shows that there's no free lunch: if the problem was NP-hard, it's not like we have an easy way of solving it. So given all these problems with the belief propagation algorithm, what makes us think that it's an effective algorithm to use? When the algorithm was rediscovered in the 1990s, Murphy, Weiss, and Jordan, as well as others, looked at its performance in the context of real networks, and specifically networks that have a lot of loops, which is what causes the belief propagation algorithm to misbehave.

And so here is an example network. It's called the pyramid network; it's a network analogous to one that arises in image analysis. What they showed is that when you plot, on the one hand, the exact marginals on the x axis, for different marginals in the network, and on the y axis the marginals computed by loopy belief propagation, the marginals by and large sit almost exactly on a straight line, with few exceptions, showing that loopy belief propagation is very close to accurate on this network. Here's another network; this is a simplified version of a medical diagnosis network. And once again, when we compare the correct marginals on the x axis to the belief propagation marginals on the y axis, the marginals sit almost exactly on the straight line, which shows that, again, they're very close. So belief propagation is very close to accurate.

So, to summarize: the belief propagation algorithm passes messages over a graph of clusters that are connected to each other via sepsets. Adjacent clusters pass information to each other in these messages, transmitting information only about the variables in the sepset, which are the ones that they have in common. The message that cluster i sends to cluster j summarizes everything that i knows about the variables in the sepset, except for information that it obtained from j, so that one avoids direct double-counting of the same piece of evidence, where j gets its own information back again via i. We've seen that the algorithm may not converge, and that it can exhibit oscillatory behavior; and that the resulting beliefs are pseudo-marginals, in that they are not necessarily the exact marginals from a theoretical perspective. But nevertheless, as we've seen, the algorithm actually performs quite well in a range of practical applications, which is why it's quite commonly used.
