Now, one important benefit of probabilistic graphical models, and of their modularity, is that they allow us to mix and match different ideas within the modeling framework. The reason that's important is that it gives us a much greater richness of options than treating each individual topic that we learned as a solution in and of itself. We can often combine those ideas in interesting ways.

So to see one example of that, let's go back to the image segmentation problem that we just talked about, and think about models that actually mix directed and undirected edges. So here, for example, if we're trying to learn image segmentation from unsupervised data, we might have undirected edges over the labels S, as an MRF, just as we see in this diagram, which is the natural directionality, or rather the lack of directionality, that we discussed before. But we have directed edges from the S's to the features X_i. So this is a model that is half undirected: it's undirected at the level of the S's, and directed going from the S's to the X's.
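To make the structure concrete, here is a minimal sketch of the directed part of such a model: each feature X_i depends only on its own label S_i through a CPD P(X_i | S_i). The labels, features, and the Gaussian form of the CPD below are illustrative assumptions, not from the lecture.

```python
import numpy as np

# Made-up data for illustration: a label s_i per superpixel and an
# observed scalar feature x_i. In the hybrid model, the S's are tied
# together by an undirected MRF, while each X_i hangs off its own S_i
# through a directed edge.
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
feats = np.array([0.1, 0.3, 2.2, 1.9, 2.4, 0.2, 2.0, 0.0])

def fit_gaussian_cpds(labels, feats):
    """Fit a Gaussian CPD P(X_i | S_i = s) for each label value s.

    Because the S -> X edges are directed, each CPD is just the
    mean/variance of the features whose label equals s, with no need
    to touch the undirected potentials over the labels.
    """
    cpds = {}
    for s in np.unique(labels):
        x = feats[labels == s]
        cpds[int(s)] = (x.mean(), x.var())
    return cpds

cpds = fit_gaussian_cpds(labels, feats)
print(cpds)
```

This is only the local, per-class piece; in a full model these CPDs would sit alongside the MRF potentials over the S's.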

Now, why is that good? It's good because, first, it gives us a meaningful optimization objective: we're trying to optimize, using our model parameters and our model structure, the ability to explain the images that we've seen using the segment characteristics, or the class-label characteristics. In that respect it's a generative model, which, by forcing us in some sense to generate the observed data, can allow us to learn something about the statistics or the patterns in the distribution. But by making this model directed rather than undirected (we could instead have defined an MRF over the entire set of variables, which would also have been a generative model), we greatly facilitate the learning: each of these factors P(X_i | S_i) can be learned separately, without optimizing over the model as a whole, and so learning is considerably more efficient. So this is a case where mix and match is a better strategy than a unified model that is entirely directed or entirely undirected.

Another type of mix and match can occur

on the inference side. We learned about a variety of different inference algorithms, and we might be tempted to just take a model and throw it into the black box of one inference algorithm or another. But it turns out to be much more beneficial, in certain cases, to consider applying different inference algorithms to different pieces of the model. So, for example, we might use something like belief propagation or Markov chain Monte Carlo on some subset of the variables, while on certain other sets of variables we do exact inference, which gives much more accurate results over those subsets, and perhaps makes the convergence properties of our algorithms considerably better, as well as the accuracy that we get.

So, let's take an example. This is a bipartite MRF that we have seen before, where we have a set of variables A on the one side and B on the other, and let's assume that there's fairly dense connectivity between the variables, maybe even full connectivity between the A's and the B's. Now, let's think about which algorithm we might want to use for this kind of model. If there is a very small set of A's or a very small set of B's, then we can use exact inference, but that's not usually the case, and so we might have to resort to approximate inference.

One possibility is to apply belief propagation, but if you look at this graph, you can see that it has a very large number of fairly tight loops. And we know that tight loops are a problem for belief propagation, so we might be worried about the quality of the convergence and the quality of the answers that we get. Another approach is to use something like Markov chain Monte Carlo, such as Gibbs sampling. But in this case, if we have a lot of A's and a lot of B's, then we're sampling over a very high-dimensional space. And we know that sampling methods in high-dimensional spaces have, again, limitations in terms of the rate of convergence and the quality of the answers.

So one possibility that we might consider here is a variant of Markov chain Monte Carlo where we sample only over the A's, say if there are fewer A's than B's. So we sample over the A's, and then, for each assignment over the A's, so for each little-a assignment a1.
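A minimal sketch of this sample-over-the-A's idea, under illustrative assumptions (binary variables, small made-up sizes, random pairwise potentials; all names below are hypothetical): in a bipartite MRF, given a full assignment to the A's, the B's become conditionally independent, so each B_j can be summed out, or queried exactly, on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
nA, nB = 3, 4  # small, illustrative sizes
# phi[i, j, a, b]: pairwise potential between binary A_i and B_j
phi = rng.uniform(0.5, 2.0, size=(nA, nB, 2, 2))

def collapsed_score(a):
    """Unnormalized P(a) with the B's summed out analytically.

    Given a full assignment a to the A's, the B's are conditionally
    independent, so each B_j is marginalized on its own:
    sum_b prod_i phi[i, j, a_i, b].
    """
    score = 1.0
    for j in range(nB):
        score *= sum(np.prod([phi[i, j, a[i], b] for i in range(nA)])
                     for b in (0, 1))
    return score

def gibbs_over_A(n_iters=200):
    """Gibbs sampling over the A's only, with the B's collapsed out."""
    a = rng.integers(0, 2, size=nA)
    samples = []
    for _ in range(n_iters):
        for i in range(nA):
            scores = []
            for v in (0, 1):
                a[i] = v
                scores.append(collapsed_score(a))
            p1 = scores[1] / (scores[0] + scores[1])
            a[i] = int(rng.random() < p1)
        samples.append(a.copy())
    return samples

def exact_B_posteriors(a):
    """For one sampled assignment a, compute exact P(B_j = 1 | a)."""
    probs = np.empty(nB)
    for j in range(nB):
        w = [np.prod([phi[i, j, a[i], b] for i in range(nA)])
             for b in (0, 1)]
        probs[j] = w[1] / (w[0] + w[1])
    return probs

samples = gibbs_over_A()
post_B = exact_B_posteriors(samples[-1])
```

So the approximate sampler only has to explore the (smaller) space of the A's, while the B side is handled by exact inference conditioned on each sample.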