0:00

As we said, there are many different kinds of queries that one can use a graphical model to answer. Conditional probability queries are very commonly used, but another commonly used type of inference is called MAP inference. So, what is MAP inference?

MAP stands for Maximum a Posteriori, and it's defined as follows. Here, we have a set of observations, little e, over some variables E, and we have a query Y. And it turns out that, for computational reasons that we're not going to discuss for the moment, it's important that the set Y is everything, all of the variables other than those in E. So now, our task is to compute what's called the MAP assignment, which is the assignment, little y, to the variables Y that maximizes the conditional probability of the variables Y given the evidence. Now, for those of you who haven't seen the notation argmax: the argmax is the y that provides the maximal value to this expression over here.

Now, note that in some cases, this maximizing value might not be unique. That is, there might be several different assignments, say y1 and y2, that give the exact same probability. And so, the MAP assignment is not necessarily unique.

This has many applications, some of which we've discussed before. So, in the context of message decoding, for example, where we have a set of noisy bits that are passed over a noisy communication channel, what we often want to get is the most likely message that was transmitted. That is, an assignment to the transmitted bits that is the most likely given our evidence.

In the context of image segmentation, we would like to take the pixels and figure out the most likely assignment of pixels to semantic categories. So both of these can be viewed as MAP assignment problems. And an important thing to understand about MAP is that it really is a different problem than conditional probability queries.

To understand that, let's look at the following very simple example over just two random variables. Here, we have a Bayesian network over the variables A and B. And if you multiply the two CPDs, you get the joint distribution shown over here.
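As a concrete sketch of this two-variable example in Python, here is the joint computed by multiplying the two CPDs. The CPD values below are invented for illustration, not the numbers on the slide:

```python
import itertools

# Illustrative CPDs for a two-variable network A -> B (made-up values).
p_a = {"a0": 0.4, "a1": 0.6}                 # P(A)
p_b_given_a = {                              # P(B | A)
    "a0": {"b0": 0.1, "b1": 0.9},
    "a1": {"b0": 0.5, "b1": 0.5},
}

# Joint distribution: the product of the two CPDs.
joint = {
    (a, b): p_a[a] * p_b_given_a[a][b]
    for a, b in itertools.product(p_a, ["b0", "b1"])
}

# MAP assignment: the single (a, b) pair with highest joint probability.
map_assignment = max(joint, key=joint.get)

# For comparison: the most likely value of each variable on its own.
marg_a = max(p_a, key=p_a.get)
marg_b = max(["b0", "b1"], key=lambda b: sum(joint[(a, b)] for a in p_a))
```

With these numbers the MAP assignment is (a0, b1), even though the most likely value of A on its own is a1, which is exactly the mismatch discussed next.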

And it's fairly immediate to see that the MAP assignment is this one, because it has the highest probability. Can we get at the MAP assignment by looking separately at the variable A and at the variable B? If we look at that, we see that, considered separately, the most likely assignment to the variable A is a1. As opposed to the MAP assignment, where the variable A took the value a0. And so, one can't look separately at the marginal over A and over B and use that to infer the MAP assignment. And the reason is that we're looking for a single assignment over all of the variables that, together, has the highest probability.

Unfortunately, just like conditional probability inference, this problem too is NP-hard.

So again, let's formalize what exact problem is NP-hard in this context. Here is one example of an NP-hard problem in the context of MAP, which is simply to find the joint assignment with the highest probability. It's not the only NP-hard problem; here is another one: figuring out, for a given probabilistic graphical model and some threshold little p, whether there exists an assignment, little x, whose probability is greater than p. That problem, too, is NP-hard.
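A brute-force sketch of this decision problem, on an invented three-variable factor model, makes explicit why the naive approach is hopeless in general, it enumerates every assignment:

```python
import itertools

# Toy factorized model over three binary variables; values are illustrative.
phi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}  # phi1(X1, X2)
phi2 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}  # phi2(X2, X3)

def score(x1, x2, x3):
    """Unnormalized probability: the product of the factors."""
    return phi1[(x1, x2)] * phi2[(x2, x3)]

# Partition function, so we can talk about actual probabilities.
Z = sum(score(*x) for x in itertools.product([0, 1], repeat=3))

def exists_assignment_above(p):
    """Decision problem: is there an x with P(x) > p?  Brute force is
    exponential in the number of variables -- fine here, hopeless at scale."""
    return any(score(*x) / Z > p for x in itertools.product([0, 1], repeat=3))
```

Here the highest-probability assignment has probability 8/27, so the answer depends on where the threshold p sits relative to that value.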

So should we give up? Well, just like in the context of conditional probability queries, the answer is no. There are algorithms that can solve this problem very efficiently, in fact, for an even broader set of problems than for conditional probability queries.

So, let's again look deeper into this problem and understand the foundations of what might make it tractable. Going back to our example of a Bayesian network, once again we're going to view CPDs as factors. So here, P of C again translates into a factor over C, just like before. Whereas in the case of conditional probability queries we wanted to sum out some of the variables, to marginalize them, now we're going to want to find the argmax, which is the assignment to these variables that maximizes the product. And so, here we have a max of a product, which is why this is often called a max-product problem.

Let's break down the max-product problem.

So, imagine that, in the more general case, we have the probability of Y given E equals little e, and let's remind ourselves that Y is the set of all variables other than the ones in E. By the definition of conditional probability, we have this ratio, where we're trying to find the y that maximizes this ratio. And notice that the denominator is constant with respect to Y. Which means that, for the purpose of finding the maximizing y, we don't really care about the denominator, only the numerator, which is the probability of Y, E = e, the unnormalized measure. This numerator, in the general case, is a product of factors normalized by the partition function, where the factors here are the reduced factors, reduced relative to the evidence.
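The argument above can be sketched in a few lines of Python: reduce the factors to the evidence, then take the argmax of their product, ignoring the constant denominator and the partition function. The factors and evidence below are invented for illustration:

```python
import itertools

# Factors of a small model over A, B, C (illustrative values).
# phi_ab(A, B) and phi_bc(B, C); the evidence is C = 1.
phi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
phi_bc = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 1.0}

# Reduce phi_bc to the evidence C = 1: keep only rows consistent with it.
evidence_c = 1
phi_bc_reduced = {b: phi_bc[(b, evidence_c)] for b in (0, 1)}

# The denominator P(C = 1) and the partition function are constants with
# respect to (A, B), so the MAP assignment is simply the argmax of the
# product of the reduced factors.
def unnormalized(a, b):
    return phi_ab[(a, b)] * phi_bc_reduced[b]

map_ab = max(itertools.product((0, 1), repeat=2),
             key=lambda ab: unnormalized(*ab))
```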

Â 7:44

We also have message-passing algorithms, which again are a direct analog to algorithms for sum-product. And they give rise to a class of algorithms called max-product belief propagation.
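As a rough sketch of the max-product analog of message passing, here is a forward pass with backpointers on a toy three-variable chain (factor values invented); it is the sum-product recursion with the sums replaced by maxes:

```python
# A three-variable chain X1 - X2 - X3 with pairwise factors (toy values).
psi = [
    {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},  # psi(X1, X2)
    {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0},  # psi(X2, X3)
]

# Forward pass: each message maximizes over the previous variable, i.e.
# the sum-product recursion with sum replaced by max.
msg = {0: 1.0, 1: 1.0}          # message into X1 (trivially uniform)
back = []                       # backpointers for decoding the argmax
for factor in psi:
    new_msg, pointers = {}, {}
    for x_next in (0, 1):
        best_prev = max((0, 1), key=lambda x: msg[x] * factor[(x, x_next)])
        new_msg[x_next] = msg[best_prev] * factor[(best_prev, x_next)]
        pointers[x_next] = best_prev
    msg, back = new_msg, back + [pointers]

# Backward pass: pick the maximizing value of the last variable, then
# follow the backpointers to recover the full MAP assignment.
x_last = max(msg, key=msg.get)
assignment = [x_last]
for pointers in reversed(back):
    assignment.append(pointers[assignment[-1]])
assignment.reverse()
```

The final message value is the maximal product over all assignments, and the backpointers recover the single coherent assignment that achieves it.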

However, because the MAP problem is intrinsically an optimization problem, one can also call in techniques that are specific to optimization. The class of such methods includes methods that are based on integer programming, which is a general class of optimization over discrete spaces. It turns out that this class of methods, which builds on integer programming techniques, has been one of the most popular in the last few years, and has given rise to a whole new range of MAP algorithms that are considerably better than many of the previous algorithms developed up to that point, especially for the approximate case.

It also turns out that for certain types of networks, some of which we'll discuss, there are specific algorithms that are very efficient for that particular class of graphical model. One of those, perhaps the most commonly used, is a class of methods based on algorithms called graph cuts. And finally, because it's an optimization problem, one can also use standard search techniques over combinatorial search spaces, and there are problems for which this is also a very useful and successful solution.

So, to summarize: the MAP problem aims to find a single coherent assignment of highest probability. That means it is not the same as maximizing individual marginal probabilities, as we saw in the example. One can reformulate this problem as one of finding the max over a factor product. And this is a combinatorial optimization problem, which admits a whole range of different solutions, of which some are exact and others approximate.