0:00

The belief propagation algorithm, when run over a general cluster graph, is an iterative algorithm in which messages are defined in terms of previous messages. And we said before that, when run over a general cluster graph, the algorithm might not converge, and that even if it converges, it might not get to the right answers. What we're going to talk about now is how bad these problems are, and what are some of the tricks that we use in practice to improve the behavior of this algorithm in these two regards.

So let's get some intuition by looking back at our misconception example. This is not actually the misconception example; it's one where we modified some of the potentials, specifically the potential φ1(A,B), to make the behavior even more extreme, by making these potentials larger so that there is a stronger push to agreement between A and B. So A and B really, really prefer to agree. And what happens in this example? We can see this beautiful oscillatory behavior of the belief propagation algorithm: the x-axis is iterations of BP, and the y-axis is some notion of distance.

Â 1:20

And why do you have such strong oscillations? Let's look at what these potentials are doing. This potential over here pushes A and B to agreement. And so, as this message is passed, A and B want to agree, A and D also want to agree, and B and C want to agree, but C and D really want to disagree. So, as messages are passed, you're getting conflicting messages from both sides. D, for example, on the C side: C keeps pulling at it to disagree with it. On the other hand, when you go all the way around the loop, C is actually urging D to agree with it. So from the one side you're getting a pull for C and D to be the same, and from the other side you're getting a pull for them to be different. And this conflict over the loop causes the oscillatory behavior, as messages are passed from one side or the other. This type of configuration, specifically tight loops,

Â 2:50

strong potentials, and conflicting directions, is probably the single worst scenario for belief propagation. This is the case where it's going to do the worst, in terms of both convergence and the accuracy of the results that are obtained.
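To make the "conflicting directions around a loop" point concrete, here is a small sketch. The numbers are made up for illustration, not the actual potentials from the example; it just checks whether a loop of agree/disagree preferences is frustrated, i.e. whether no single assignment can satisfy every edge's preference at once:

```python
import numpy as np

# Hypothetical pairwise potentials for the A-B-C-D loop: three edges strongly
# prefer agreement, one (C-D) strongly prefers disagreement. The numbers are
# illustrative, not the ones from the original example.
agree    = np.array([[100.0, 1.0], [1.0, 100.0]])
disagree = np.array([[1.0, 100.0], [100.0, 1.0]])
loop = {("A", "B"): agree, ("B", "C"): agree,
        ("C", "D"): disagree, ("D", "A"): agree}

def prefers_agreement(phi):
    # an edge "pulls toward agreement" when the diagonal outweighs the rest
    return np.trace(phi) > phi.sum() - np.trace(phi)

# a loop is frustrated when an odd number of edges prefer disagreement;
# no assignment can then satisfy every edge's preference at once
signs = [1 if prefers_agreement(p) else -1 for p in loop.values()]
frustrated = int(np.prod(signs)) == -1
print(frustrated)
```

When the loop is frustrated and the potentials are strong, the direct pull and the pull around the rest of the loop disagree, which is exactly the setting described above.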

Now in this example, ultimately you actually do get convergence. You can see it takes about 300 iterations, but ultimately you get convergence, and 300 is a lot for a graph this size. Here is one where, even after 500 iterations, you didn't get convergence at all; maybe if you ran it for 10,000 it might converge, but it's hard to say. And so, how do we improve the convergence, as well as the accuracy, of this algorithm? First, let's look at what not to do.

Here is the most important thing not to do, which is a variant of belief propagation called synchronous belief propagation. Synchronous belief propagation was actually one of the most commonly used variants at the very beginning of the use of the BP algorithm. In it, all messages are updated in parallel: all of the clusters wake up, they look at all of their incoming messages, and they compute all of their outgoing messages, all at once. Now, this is a great algorithm from the perspective of simplicity of implementation, and also from the ability to parallelize, because you can assign a processor to each of the clusters, and they all work in parallel with no dependencies between them. Unfortunately, synchronous belief propagation is actually not a very good algorithm.

What you see here is the number of messages that have converged as a function of the amount of time spent. This is an Ising grid with certain parameters; the details don't matter. And you can see that there is a certain improvement over time, and then it kind of asymptotes at a certain number of messages that have converged. By contrast, let's compare that behavior to asynchronous belief propagation, where messages are updated one at a time.

So, this one and then that one. Now, notice that this algorithm is not fully specified, because I didn't tell you what order to do the updates in; we'll come back to that in a minute. But even by the simple virtue of passing messages in an asynchronous way, the behavior improves, both in terms of how quickly messages converge and in terms of the number of messages that converge. Now, this is not a particularly good message passing order. Here's a much better one. It takes a little bit longer to get certain messages to converge, because it's trying to make sure that everything is converging, but notice that eventually, and in not that long a time, it actually reaches a 100% convergence rate,

okay? So, here are some important observations regarding this. First, convergence is a local property in belief propagation: some messages converge quite quickly, while others might never converge. And when some messages still haven't converged after you've run the algorithm for a certain amount of time, one often simply stops the algorithm and says, this is what we have and we're done. Second, synchronous belief propagation has considerably worse convergence properties than asynchronous belief propagation, which is why, at this point, very few people ever use synchronous belief propagation. And third, if we're using asynchronous belief propagation, the order in which messages are passed makes a difference, both to the rate of convergence and, perhaps even more surprisingly, to the extent to which messages ever converge at all.
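To make the difference between the two schedules concrete, here is a minimal sketch of both on a tiny three-variable chain. The model, potentials, and names are invented for illustration; a chain is a tree, so BP is exact there and both schedules can be checked against brute-force enumeration (on loopy graphs, only the convergence behavior differs, as discussed above):

```python
import itertools
import numpy as np

# Hypothetical 3-variable chain A - B - C (binary variables) with one unary
# potential on A; on a tree, BP is exact, so we can check both schedules.
phi_ab = np.array([[4.0, 1.0], [1.0, 4.0]])   # A-B prefers agreement
phi_bc = np.array([[1.0, 3.0], [3.0, 1.0]])   # B-C prefers disagreement
unary = {0: np.array([2.0, 1.0]), 1: np.ones(2), 2: np.ones(2)}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
pots = {(0, 1): phi_ab, (1, 0): phi_ab.T, (1, 2): phi_bc, (2, 1): phi_bc.T}

def new_msg(msgs, i, j):
    # m_{i->j}(x_j) = sum_{x_i} phi(x_i,x_j) * psi_i(x_i) * prod_{k != j} m_{k->i}(x_i)
    prod = unary[i].copy()
    for k in nbrs[i]:
        if k != j:
            prod *= msgs[(k, i)]
    m = pots[(i, j)].T @ prod
    return m / m.sum()

def run(schedule, iters=10):
    msgs = {(i, j): np.ones(2) / 2 for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        if schedule == "synchronous":
            # all messages recomputed at once from the previous iteration
            msgs = {e: new_msg(msgs, *e) for e in msgs}
        else:
            # asynchronous: messages updated one at a time, in place
            for e in msgs:
                msgs[e] = new_msg(msgs, *e)
    b = unary[1].copy()                       # belief over B
    for k in nbrs[1]:
        b *= msgs[(k, 1)]
    return b / b.sum()

# exact marginal of B by brute-force enumeration, for comparison
exact = np.zeros(2)
for a, b, c in itertools.product([0, 1], repeat=3):
    exact[b] += unary[0][a] * phi_ab[a, b] * phi_bc[b, c]
exact /= exact.sum()

print(run("synchronous"), run("asynchronous"), exact)
```

The only difference between the two branches is whether a newly computed message is used immediately by later updates in the same sweep; that is exactly what separates the two schedules on loopy graphs as well.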

And so, how do we pick the message passing order? There are several different scheduling algorithms; I'm going to present two of the more popular ones. The first is called tree reparameterization, or TRP for short. What it does is pick a tree, and then pass messages in that tree in the same way that you would in a clique tree algorithm, so as to calibrate that tree while keeping all other messages fixed. So, for example, we might start by calibrating the red tree, which means I pass messages this way, and then back in the other direction.
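The sweep just described can be sketched as follows (the cluster numbering and choice of tree are illustrative): calibrating a chosen spanning tree means one upward pass from the leaves to a root, then one downward pass back, exactly as in a clique tree:

```python
# A sketch of TRP-style scheduling: calibrate one spanning tree at a time,
# passing messages leaf-to-root and then root-to-leaf, keeping all other
# messages fixed. Tree and root choices here are illustrative.
def tree_sweep_order(tree_edges, root):
    """Return the directed message order that calibrates a tree."""
    nbrs = {}
    for i, j in tree_edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    # upward pass: post-order traversal, children send to parents
    upward, visited = [], {root}
    def collect(node):
        for child in nbrs[node]:
            if child not in visited:
                visited.add(child)
                collect(child)
                upward.append((child, node))
    collect(root)
    # downward pass: reverse each upward message
    downward = [(j, i) for (i, j) in reversed(upward)]
    return upward + downward

# e.g. a path of clusters 1-2-3-4 (like the "red tree"), rooted at 1
order = tree_sweep_order([(1, 2), (2, 3), (3, 4)], root=1)
print(order)
```

TRP then repeats this with different trees that together cover all the edges of the cluster graph.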

Â 9:27

That's one message scheduling algorithm. The second is something called residual belief propagation, and there are actually several variants of it now. What it does is look for good messages, messages that are high-value. Specifically, it looks for two clusters whose beliefs disagree a lot, because if they disagree, then passing a message between them is potentially going to have a large impact on the receiving cluster. And so I'm going to keep a priority queue of edges, and I'm going to pick messages from the top of the queue, based on how much of an effect I think they're going to have, which is some kind of heuristic notion, typically based on the disagreement between the two adjacent clusters.

Â 10:35

The other big, important trick that people use to improve the convergence of the belief propagation algorithm is what's called smoothing, or sometimes damping, of messages. This is a general trick that's often used to reduce oscillations in dynamical systems that are based on these kinds of fixed-point equations, where the left-hand side is defined in terms of the right-hand side. So here we define delta-ij, in the original belief propagation algorithm, as a function of other deltas, and we've seen that this can give rise to oscillatory behavior. This is just the standard belief propagation message.

What we're going to do now is, instead of that, a kind of smoothed version: I'm not going to let the message change too drastically. Instead, I'm going to take a weighted average, with weight lambda on the new message, which is this thing, and weight one minus lambda on the old message. That turns out to damp oscillations in the system and increase the chances of it converging.
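The damped update can be sketched in a couple of lines (lambda = 0.5 here is just an illustrative choice, and the messages are placeholder numbers):

```python
import numpy as np

# A sketch of message damping: instead of jumping to the newly computed
# message, take a weighted average of it with the old one. lam is the
# weight on the new message.
def damped_update(old_msg, new_msg, lam=0.5):
    m = lam * new_msg + (1 - lam) * old_msg
    return m / m.sum()          # renormalise the blended message

old = np.array([0.8, 0.2])      # previous delta_{i->j}
new = np.array([0.1, 0.9])      # freshly computed BP message
print(damped_update(old, new))  # moves only part of the way toward new
```

With lam = 1 this reduces to the standard update; smaller lam means smaller steps, trading convergence speed for stability.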

Â 11:51

So let's look at some example behaviors. Here we see synchronous belief propagation in red, over here. And this is the percentage, or fraction, of messages converged, so one would be perfect convergence. We can see that synchronous plateaus at about 20 to 25% of messages converged, which is low enough to be pretty much useless in practice: if only 20% of your messages converge, then the remaining messages don't really mean very much. Asynchronous without smoothing is the green line, this one. We can see that, even without smoothing, the asynchronous algorithm achieves considerably better performance than the synchronous algorithm. (Synchronous without smoothing isn't even shown, because it would be even worse.) And then finally, the blue line is asynchronous with smoothing.

Â 13:03

And you can look at the behaviors of individual messages, or rather individual marginals, individual beliefs. So here, this is still an Ising grid, by the way, which is sort of the standard setup for trying out the convergence of various approximate inference algorithms. And what you see here, the black line, is the true marginal. And you can see that synchronous belief propagation basically flip-flops around it indefinitely, whereas asynchronous belief propagation converges pretty quickly, and very close to the right answer.

But not always. This is a different variable in the same model; the black line, again, is the true marginal. And here we can see that, sure enough, asynchronous does improve convergence, but convergence is still to an answer that's not quite right. And you see both behaviors in real networks. In practice, all of the factors that I mentioned at the very beginning, how tight the loops are, and how strong or peaked the potentials are, are going to determine how many of the factors have this oscillatory behavior, and how wrong the answers are.

Â 14:37

So, just summarizing: there are two main tricks for achieving BP convergence, damping (or smoothing) and intelligent message scheduling. Convergence is often easier to obtain using these tricks than correctness is, and convergence, unfortunately, doesn't guarantee correctness. The bad cases, the ones that negatively impact both, are strong potentials that pull you in different directions, and tight loops. We're not going to have time to go into all of these topics, but there are some newer algorithms that have considerably better behavior, both in terms of convergence and in terms of the accuracy of the results. These are based on a view of inference as optimizing some distance to the true distribution. There's some really cool ideas, as well as math, in this, but much of it is outside the scope of this class.
