Shared structures and shared parameters come up in directed models. But they come up just as much, and perhaps even more, in undirected models.

That is because, for reasons that we've already discussed and will continue to discuss, eliciting parameters in an undirected model is much more difficult: the parameters don't represent conditional probabilities, or probabilities at all. And so it's a lot easier to represent them as templates that are composed of smaller building blocks.

So, let's talk about log-linear models and why and how we might share parameters in them. It turns out to be quite easy once you think about life in terms of log-linear models. Let's go back to one of the simple examples that we've discussed: the Ising model.

The Ising model, as you recall, was first invented in the context of statistical physics. We have a bunch of binary random variables, each of which, in the original application, represents the spin of an atom in a grid: here is a spin going one way, and the pink one is a spin going in the opposite direction. And we have correlations between the spins of the electrons in adjacent atoms.

The energy function in an Ising model is a sum over all of the edges, which in this example are organized in a grid. Each term is a weight w_ij, representing the connection between the spins of the two adjacent atoms, times the feature f(x_i, x_j), which is the product x_i * x_j. Just as a reminder, each x_i is either -1 or +1, so the product is also either -1 or +1: -1 if the two have different spins, and +1 if they have the same spin.
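As a minimal sketch (with illustrative names, not from the lecture), this fully parameterized energy function, with a separate weight per edge, can be written like this:

```python
# A sketch of the Ising energy just described (illustrative names).
# Spins take values in {-1, +1}; each edge (i, j) carries its own
# weight w_ij, and the feature is f(x_i, x_j) = x_i * x_j.

def ising_energy(x, edge_weights):
    """x: dict node -> spin in {-1, +1}; edge_weights: dict (i, j) -> w_ij."""
    return sum(w * x[i] * x[j] for (i, j), w in edge_weights.items())

# Aligned spins contribute +w_ij, opposing spins contribute -w_ij:
x = {0: +1, 1: +1, 2: -1}
weights = {(0, 1): 0.5, (1, 2): 0.5}
print(ising_energy(x, weights))  # 0.5 * (+1) + 0.5 * (-1) = 0.0
```

Tying the parameters, that is, using one shared weight w for every edge, is then just the special case where all the w_ij are equal.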

Now, this doesn't have any shared parameters, because we have a separate parameter w_ij for every pair of atoms i and j. But clearly, we're not going to have a separate model for every number of atoms in the material that we're trying to model. So usually, what's going to happen is that we're going to have just a single parameter w, which represents the extent and type of the influence of two adjacent atoms on each other. And so this is now a model where the same feature and the same weight are used across different pairs of random variables in the model,

okay? In the natural language processing examples that we used previously, we again have parameter sharing. Just as a reminder, these are the actual words, and these are the labels for the words; this is the named entity recognition problem. But the same thing happens in the other examples.

So here, we talked previously about the kinds of features that relate, for example, the label of a word and the word itself: if the word is capitalized, it's more likely to be a person; if the previous word is "Mrs.", it's more likely to be a person, too.

And so here again, we have features like the ones we just talked about: for example, is Y_i a person while X_i is capitalized? And we would have a term that uses this feature repeated at every position in the sequence.

That is, the same feature, which talks about the connection between the label of a word and the word itself, recurs at every position in the sequence, because you wouldn't want a different parameterization for different positions in the sequence.
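To make the repetition concrete, here is a small sketch (hypothetical names, assuming a capitalization feature like the one just described) of one shared feature-weight pair applied at every position of the sequence:

```python
# Sketch of a shared node feature in sequence labelling (hypothetical
# names): one weight w for the feature "the word is capitalized and its
# label is PERSON", with the same term repeated at every position i.

def f_cap_person(word, label):
    """Indicator feature: 1 if the word starts uppercase and is labelled PERSON."""
    return 1.0 if word[0].isupper() and label == "PERSON" else 0.0

def sequence_energy(words, labels, w):
    """Sum the single shared term w * f(X_i, Y_i) over every position i."""
    return sum(w * f_cap_person(x, y) for x, y in zip(words, labels))

words = ["Mrs.", "Green", "spoke", "today"]
labels = ["PERSON", "PERSON", "OTHER", "OTHER"]
print(sequence_energy(words, labels, 0.8))  # feature fires at positions 0 and 1
```

The point is that `w` appears only once, no matter how long the sequence is: every position reuses the same weight rather than getting its own parameter.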

And similarly, we have the same energy terms, for example the ones that relate adjacent words, also being repeated across different pairs of adjacent words.

In the image segmentation example, once again we have node potentials that represent the connection between the image features and the label, and once again we're going to use the exact same potentials for all pixels. And we're going to have a different set of potentials, in this case pairwise ones, that are reused for all adjacent pairs of pixels or superpixels. So, how do we do this?

It turns out to be really simple. What we basically need to specify, for a given feature f_k that we want to reuse, is a set of scopes: the set of scopes to which we would like the feature to be applied. And if D_k is one of the scopes to which we would like to apply the feature f_k, then our energy function is going to include a term w_k f_k(D_k) for that scope.

So, for example, if we want f_k to be applied to pairs of adjacent superpixels in the image, the scopes of the feature f_k will be the pairs of labels Y_i, Y_j such that i and j are adjacent.
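Putting this together, here is a sketch (with made-up names) of the templated energy function, where each feature f_k is applied once, with its shared weight w_k, for every scope in its scope set:

```python
# Sketch of the template formulation (illustrative names): each feature
# f_k comes with a weight w_k and a set of scopes; the energy includes
# one term w_k * f_k(D) for every scope D in that set.

def template_energy(assignment, templates):
    """templates: list of (w_k, f_k, scopes); each scope is a tuple of variables."""
    total = 0.0
    for w_k, f_k, scopes in templates:
        for scope in scopes:
            total += w_k * f_k(*(assignment[v] for v in scope))
    return total

# Example: one agreement feature shared across the adjacent label pairs
# (Y0, Y1) and (Y1, Y2), reusing a single weight for both scopes.
agree = lambda yi, yj: 1.0 if yi == yj else 0.0
templates = [(2.0, agree, [("Y0", "Y1"), ("Y1", "Y2")])]
assignment = {"Y0": "fg", "Y1": "fg", "Y2": "bg"}
print(template_energy(assignment, templates))  # only (Y0, Y1) agree -> 2.0
```

Sharing falls out for free here: adding a variable to the model just means adding another scope to an existing feature, not adding a new parameter.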