0:00

One of the reasons for the increasing popularity of MAP inference is that there is a large subclass of MAP problems that actually allow tractable inference. Now, we've already seen one class of MAP problems that allows tractable inference, which is the graphical models that have low tree width, for which you can apply variable elimination or clique trees. But it turns out that, besides that, there's actually a rich set of other subclasses that can be solved using efficient algorithms, often based on ideas from computer science theory. So let's look at some of these examples and see how we might go about solving them.

One such important class of problems is that of correspondence, or data association, problems.

Here we have two classes of objects, in this case the red objects on the one side and the blue objects on the other, and we'd like to find a correspondence, or matching, between them. The fact that it's a matching means that the assignment has to be one to one: you can't have a single red dot match more than one blue dot, or vice versa. So here, for example, is one such matching. We may additionally require that all the red dots be matched, or all the blue dots be matched, which can't be done in this case.

So what is the quality of a matching, given the set of possible matchings; how do we select between them? We have in the model a notion of the quality of the match between i and j, which we denote in this case as the parameter theta_ij. And so our goal in this problem is to find the highest-scoring matching, where the score is defined as the sum of the match indicators multiplied by the quality of the match. We constrain the binary variables X_ij, which indicate whether i is mapped to j, via the mutual exclusion constraint that we discussed earlier.

Now, this type of problem is easily solved using bipartite matching algorithms developed in computer science theory.
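To make the objective concrete, here is a minimal brute-force sketch (my own illustration, not code from the lecture): `theta[i][j]` plays the role of the parameter theta_ij, and the mutual exclusion constraint is enforced by searching only over permutations. In practice one would use a polynomial-time algorithm such as the Hungarian method (e.g. `scipy.optimize.linear_sum_assignment`) rather than enumerating all n! matchings.

```python
from itertools import permutations

def best_matching(theta):
    """Brute-force sketch of the matching objective: theta[i][j] is the
    quality of matching red object i to blue object j (square case, so
    every object on both sides must be matched).  Searching only over
    permutations enforces the one-to-one (mutual exclusion) constraint."""
    n = len(theta)
    score = lambda perm: sum(theta[i][perm[i]] for i in range(n))
    best = max(permutations(range(n)), key=score)
    return list(enumerate(best)), score(best)
```

For instance, with `theta = [[3, 1], [1, 2]]` the identity matching scores 3 + 2 = 5 and beats the crossed matching, which scores 1 + 1 = 2.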

And it turns out that, in the context of graphical models, or probabilistic inference, this type of problem has numerous applications. So, for example, we might have, on the one hand, say the red side, a set of sensor readings from a number of, say, airplanes that we're trying to keep track of, and on the blue side a set of airplanes, or objects, that we've previously detected; and we'd like to figure out which sensor reading corresponds to which object. Or we might have features in two related images, and we'd like to figure out which feature in image one matches which feature in image two. Or, if we're trying to do some kind of text analysis, we might have a set of entities, say people or organizations, that we're trying to reason about, or learn something about, and we're trying to match mentions in the text to those entities, subject to the constraint that each mention corresponds to exactly one entity.

So let's look at one example application of this, which corresponds to one of the examples that we mentioned on the previous slide: that of 3D cell reconstruction. In this case, we have over here a cell whose 3D structure we're trying to assess, and we're looking at various slices through that cell, taken from different angles via a microscope. These are various two-dimensional projections of the 3D object, and we're trying to figure out from them the three-dimensional structure.

Now, within the cell, we might have these little tiny gold beads that were put into the cell so that we can somehow register the different images of the cell to each other. So here are examples of two such real images, and these are the little gold beads; you can see them in the little circles. What we would like to do is decide that this bead over here is the same as that bead over there. And once we have that correspondence, we can then compute the 3D reconstruction using some volumetric reconstruction algorithms.

And in this case, the matching weights, those theta_ij's that we saw on the previous slide, represent the similarity between this dot and that dot, in terms of the location in the image and in terms of the appearance of the neighborhood in the vicinity of that point.

A very different application of what is effectively the same kind of idea arises if we have a 3D mesh scan of an object, such as a human being, and we'd like to figure out how points on this mesh match up to points on a somewhat related but different mesh, whether it's a different pose or a different shape.

So here we would like to realize that this point over here maps to that point over there, or that this point on this mesh maps to this point on that mesh. And once again, these matching weights, the theta_ij's, represent exactly the similarity of both the location and, again, the local neighborhood appearance.

Another very commonly used class of tractable MAP subproblems are those that involve what are called associative potentials. Associative potentials, also called regular in some cases, or attractive, are cases where the variables that are connected to each other like to take the same values. So let's consider this example in the context of binary-valued random variables, where we have a variable X_i and a variable X_j that are connected to each other. We have the on-diagonal potentials, the (0,0) and the (1,1) entries, which are the cases where X_i and X_j take the same value, and we have the off-diagonal potentials, (0,1) and (1,0), where the variables take on different values. What we would like in an associative model is for the (0,0) and (1,1) cases to be preferred over the off-diagonal cases.

And that can be formulated using the following constraint, in which we have that a + d, the sum of the on-diagonal entries, is greater than or equal to b + c, so that in aggregate we prefer the on-diagonal to the off-diagonal entries.

It turns out that if you have an arbitrary network, even with fully dense connectivity, over binary variables, that uses only singleton potentials and pairwise potentials that are associative (they're also called supermodular, which is a term that comes from mathematics), that is, if the pairwise potentials satisfy the constraint that we saw over here, then you can find an exact solution regardless of the network connectivity. And that can be done using algorithms from combinatorial optimization, from computer science theory, for finding minimum cuts in graphs. For non-binary-valued random variables, exact solutions are no longer possible, but it turns out that very simple heuristics can get very, very high-quality answers very efficiently.
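As a concrete sketch (my own illustration, not code from the lecture), here is a minimal exact MAP solver for a pairwise binary associative model via an s-t minimum cut. It assumes potentials are given as energies to minimize (negated scores), so the lecture's condition a + d >= b + c on scores becomes B + C >= A + D on energies; the function names are hypothetical, and the graph construction is the standard reparameterization from combinatorial optimization, with a plain Edmonds-Karp max-flow so the sketch is self-contained.

```python
from collections import defaultdict, deque

def _min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow on the capacity dict `cap` (mutated into the
    residual graph); returns the nodes on the source side of a min s-t cut."""
    while True:
        parent, queue = {s: None}, deque([s])
        while queue:                     # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-9 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:              # no augmenting path left: done
            return set(parent)
        path, v = [], t                  # recover the s -> t path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:                # push flow, update residuals
            cap[u][v] -= push
            cap[v][u] += push

def map_binary_associative(unary, pairwise):
    """unary[i] = (E_i(0), E_i(1)); pairwise[(i, j)] = (A, B, C, D) with
    A = E(0,0), B = E(0,1), C = E(1,0), D = E(1,1), as energies to
    MINIMIZE, satisfying the associative condition B + C >= A + D."""
    cap = defaultdict(lambda: defaultdict(float))
    def add(u, v, c):                    # only positive capacities enter;
        if c > 0:                        # a negative c is equivalent to a
            cap[u][v] += c               # positive capacity on the other
            cap[v][u] += 0.0             # terminal (constant shift)
    for i, (e0, e1) in unary.items():
        add('s', i, e1)                  # cutting s->i means x_i = 1
        add(i, 't', e0)                  # cutting i->t means x_i = 0
    for (i, j), (A, B, C, D) in pairwise.items():
        assert B + C >= A + D, "potential is not associative"
        # Reparameterization: E = A + (C-A) x_i + (D-C) x_j
        #                       + (B+C-A-D) [x_i = 0, x_j = 1]
        add('s', i, C - A); add(i, 't', A - C)
        add('s', j, D - C); add(j, 't', C - D)
        add(i, j, B + C - A - D)
    source = _min_cut_source_side(cap, 's', 't')
    return {i: 0 if i in source else 1 for i in unary}
```

For example, two variables joined by a disagreement penalty follow whichever unary term is stronger: biasing X_1 toward 0 pulls X_2 to 0 with it, and biasing X_1 toward 1 pulls X_2 to 1.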

And so many related variants that are not necessarily binary admit either exact or approximate solutions. Specifically, we talked before about the class of metric MRFs, which occur a lot in computer vision as well as in other applications, and it turns out that for metric MRFs, even over non-binary variables, you can use these very efficient algorithms.

So, one such example is in the context of depth reconstruction, where we have two views of the same object taken with a slight disparity, say using a stereo camera. So here are view one and view two of the same scene; you can see that they're slightly different from each other. From these, we're trying to infer the following image, which tells us, for each pixel in the image, the actual depth, that is, the Z dimension.

And one of the constraints that we'd like to enforce is that, because of smoothness in the image, the Z value for a pixel is similar to the Z value of an adjacent pixel. That allows us to clean up a lot of the noise that might arise if we just tried to infer the depth of individual pixels by themselves. And there are many other such examples in computer vision that are used quite commonly and that use this kind of associative potentials. Examples include denoising of images, infilling of missing regions, and a whole variety of other tasks, including foreground-background segmentation, and so on.

Â . Another kind of factor that admits

Â efficient solutions is a cardinality factor.

Â And this is a factor that in general can involve arbitrarily many binary variables

Â X1 up to XK. But we're going to require the, that it's

Â sparsely representable in the sense that the score for an assignment that X1 up to

Â XK is a function of how many XI's are turned on.

So, to take an example, here is such a factor over A, B, C, and D, and you can see that we have only five different values for the entries in this factor. We have this pink entry over here, when the sum of the Xi's is zero, and this blue entry over here, which occurs when the sum of the entries is one. For the remaining entries, the purple one occurs when we have two Xi's on, the green one occurs when we have three, and the dark blue one occurs when we have four. So there are only five different possible values here, even though there are potentially two to the four different combinations.

And it turns out that this actually occurs in quite a number of different applications. When we talked about parity constraints, for example, in message decoding: a parity constraint is something that only looks at the number of Xi's that are on. Or if you're doing image segmentation and you want to have some kind of prior on the number of pixels in a given category, you might want to say that, in this image, we expect the number of pixels labeled sky to be between, you know, 30 and 70 percent. This is potentially a factor over all the pixels in the image, but because it only counts the total number of pixels, it again turns out to be efficient.
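One way to see why this is efficient (my own sketch, not from the lecture): if the model consists of per-variable score gains plus a single cardinality factor, then for each possible count k the best assignment simply turns on the k variables with the largest gains, so sorting solves the MAP problem. The function name and interface below are illustrative.

```python
def map_with_cardinality(gains, card_score):
    """MAP for scores of the form  sum_i gains[i] * x_i + card_score[k],
    where k is the number of x_i that are on.  For each k the optimum
    keeps the k largest gains, so one sort plus a prefix sum suffices."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    prefix = [0.0]                       # prefix[k] = best on-score with k ones
    for i in order:
        prefix.append(prefix[-1] + gains[i])
    best_k = max(range(len(gains) + 1),
                 key=lambda k: prefix[k] + card_score[k])
    assignment = [0] * len(gains)
    for i in order[:best_k]:
        assignment[i] = 1
    return assignment, prefix[best_k] + card_score[best_k]
```

For instance, with gains [2, -1, 3] and no cardinality preference, the MAP turns on only the two positive-gain variables; adding a strong bonus for "all on" flips the negative-gain variable on as well.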

And there are many other examples as well. A different factor that also turns out to be very useful in practice is the notion of sparse patterns. Here, again, you could have an arbitrary number of variables X1 up to Xk, but you only specify a score for some small number of assignments. So here, for example, is a factor, again over A, B, C, and D, where only three entries in the factor get some value. They don't have to get the same value, but only those three get some value, and all of the ones that are white have exactly the same score, generally zero.

This is actually surprisingly useful in practice, because it's very useful to give a higher score to combinations that occur in real data. So, for example, if we were doing a spelling problem, we would want to give a higher probability to whatever letter combinations occur in a dictionary. That's a finite number, bounded by the size of the dictionary; and for all the combinatorially many letter combinations that don't occur in the dictionary, we don't have to give each one a separate weight. So, again, it turns out to be a sparse factor.
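A sparse pattern factor can be represented as a small lookup table with a shared default, so storage and evaluation cost scale with the number of listed patterns rather than with the exponential number of assignments. The sketch below is illustrative (the trigram scores are invented, not from any real dictionary):

```python
def pattern_factor(patterns, default=0.0):
    """Factor over tuples of variable values: the (small) set of patterns
    seen in real data gets explicit scores, and every other assignment
    shares one default score.  Storage is O(#patterns), not
    O(alphabet ** length)."""
    def score(assignment):
        return patterns.get(tuple(assignment), default)
    return score

# Hypothetical example: letter trigrams found in a dictionary get a bonus,
# while all unseen trigrams share the default score of zero.
score = pattern_factor({('t', 'h', 'e'): 2.0, ('i', 'n', 'g'): 1.5})
```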

The same thing actually occurs in image analysis problems, where, for example, you can look at all the five-by-five image patches that actually occur in natural images; that's actually a surprisingly small set of patches, relative to all possible 25-pixel combinations. And so, once again, this is a very commonly used factor that arises in practice.

A different factor that has tractable properties is a convexity factor. Here we have variables X1 up to Xk, denoted by these blue dots over here.

But now they're ordered: so this is X1, X2, up to Xk. And the convexity constraint says you can assign these variables the values one or zero, but you have to assign them in such a way that only a convex set is assigned the value one. So, for example, you can have this set be one and everything else be zero, or you can have this set be one and everything else be zero; but you can't have a scattered set of ones interspersed with the zeros.
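In one dimension, "convex" just means the ones form a single contiguous run, so checking whether an assignment satisfies the factor is a one-liner. A minimal sketch (function name is my own):

```python
def is_convex(assignment):
    """A 0/1 assignment to ordered variables satisfies the convexity
    factor iff its ones form one contiguous run (possibly empty)."""
    ones = [i for i, v in enumerate(assignment) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)
```

So [0, 1, 1, 1, 0] is allowed, while a scattered assignment like [1, 0, 1] is not.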

And again, this occurs in a surprising number of applications. You can look at parts in an image segmentation problem and say that the parts should be convex. You can think about word labeling in text and say that the words labeled with a certain part of speech, or a certain segment that corresponds to a part in the parse tree, for example, need to be contiguous in the sequence. Or, if you're trying to label a video with a bunch of subactivities, you can say that each subactivity needs to have a coherent span, with a beginning and an end, for example.

So, to summarize: there are many specialized models that admit a MAP solution using tractable algorithms, and a surprising number of those do not actually have tractable algorithms for computing marginals. These specialized models are useful in many cases on their own, like the matching problem that I showed, but also, as we'll see in a minute, as a component in a larger model with other types of factors. And the fact that such a sub-model has a tractable solution will turn out to be very useful as well.
