Shared structure and shared parameters come up in directed models, but they come up just as much, or perhaps even more, in undirected models. That is because, for reasons that we've already discussed and will continue to discuss, eliciting parameters in an undirected model is much more difficult: the parameters don't represent conditional probabilities, or probabilities at all. So it's a lot easier to represent such models as templates comprised of smaller building blocks. Let's talk about log-linear models, and why and how we might share parameters in them; it turns out to be quite easy once you think about life in terms of log-linear models. Let's go back to one of the simple examples that we've discussed, the Ising model. The Ising model, as you recall, was first invented in the context of statistical physics. We have a bunch of binary random variables, each of which, in the original application, represents the spin of an atom in a grid: here is a spin going one way, and the pink one is a spin going in the opposite direction, and we have correlations between the spins of adjacent atoms. The energy function in an Ising model is a sum over all of the edges, which in this example are organized in a grid, of a weight wi,j that represents the connection between the spins of the two adjacent atoms, times the feature f(xi, xj), which is the product of xi and xj. Just as a reminder, the xi are all either -1 or +1, so the product is also either -1 or +1: -1 if the two have different spins and +1 if they have the same spin. Written this way, the model doesn't have any shared parameters, because we have a separate parameter wi,j for every pair of adjacent atoms i and j.
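As a small sketch of this unshared parameterization (function and variable names are my own, not from the lecture):

```python
def ising_energy(spins, edge_weights):
    # Energy term from the lecture: sum over edges (i, j) of
    # w_ij * f(x_i, x_j), where the feature is f(x_i, x_j) = x_i * x_j.
    # spins: dict {node: +1 or -1}; edge_weights: dict {(i, j): w_ij},
    # with a *separate* weight for every adjacent pair.
    return sum(w * spins[i] * spins[j] for (i, j), w in edge_weights.items())

# Two adjacent atoms with opposite spins and their own edge weight:
print(ising_energy({0: +1, 1: -1}, {(0, 1): 0.5}))  # -0.5
```

Because the spins disagree, the feature evaluates to -1 and the term contributes -0.5 to the energy.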
But clearly, we're not going to have, in general, a separate model for every number of atoms in the material we're trying to model. So usually, we're going to have just a single parameter w that represents the extent and type of the influence of two adjacent atoms on each other. This is now a model in which the same feature and the same weight are used across different pairs of random variables, okay? The same is true in the natural language processing examples that we have used previously. Just as a reminder, these are the actual words, and these are the labels for the words; this is the named entity recognition problem. But the same thing happens in the other examples. We talked previously about features that relate, for example, the label of a word and the word itself: if the word is capitalized, it's more likely to be a person; if the previous word is "Mrs.", it's more likely to be a person, too. So here again, we have features, such as: is Yi a person and is Xi a [UNKNOWN]? And we would have a term that uses this feature repeated for every position in the sequence. That same feature, which talks about the connection between the label of a word and the word itself, recurs at every position in the sequence, because we wouldn't want a different parameterization for different positions in the sequence. Similarly, the energy terms that relate adjacent words are also repeated across different pairs of adjacent words.
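The replication in the sequence-labeling case can be sketched as follows; the particular features and weights here are illustrative assumptions of mine, not the lecture's:

```python
def sequence_energy(words, labels, w_node, w_edge, f_node, f_edge):
    # One shared node weight/feature (label vs. word), replicated at every
    # position t, plus one shared edge weight/feature (adjacent labels),
    # replicated over every adjacent pair of positions.
    e = sum(w_node * f_node(labels[t], words[t]) for t in range(len(words)))
    e += sum(w_edge * f_edge(labels[t], labels[t + 1])
             for t in range(len(words) - 1))
    return e

# Hypothetical features: the node feature fires when a capitalized word is
# tagged PERSON; the edge feature fires when adjacent labels agree.
f_node = lambda y, x: 1.0 if y == "PERSON" and x[0].isupper() else 0.0
f_edge = lambda y1, y2: 1.0 if y1 == y2 else 0.0

words = ["Mrs", "Green", "left"]
labels = ["PERSON", "PERSON", "O"]
print(sequence_energy(words, labels, 2.0, 1.0, f_node, f_edge))  # 5.0
```

Note that only two weights appear no matter how long the sequence is, which is exactly the parameter sharing being described.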
In image segmentation, once again, we have node potentials that represent, in this case, the connection between the image features and the label, and we use the exact same potentials for all pixels. We also have a different set of potentials that are reused for all adjacent pairs of pixels or superpixels. So, how do we do this? It turns out to be really simple. For a given feature fk that we want to reuse, we specify a set of scopes, that is, the set of subsets of variables to which we would like it to be applied. If Dk is one of the scopes to which we would like to apply the feature fk, then our energy function includes a term wkfk(Dk). So, for example, if we want fk to be applied to the labels of two adjacent superpixels in the image, the scopes of the feature fk will be the pairs of labels Yi, Yj such that i, j are adjacent, and that's going to be the set of scopes to which we apply that feature. Every time that feature is applied, we get a term that multiplies the value of the feature by the weight, which is shared: wk depends on the feature k, but not on the scope. This is just a way of replicating the feature, with its weight, across different subsets of random variables in the model. Said otherwise, we have a term in the energy function which is wk times the sum of the feature over all the scopes to which it applies, which is just a way of aggregating like terms that all have wk in them.
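This scope-based replication can be written generically; a minimal sketch, with names of my own choosing, in which each feature carries its shared weight and its set of scopes:

```python
def log_linear_energy(assignment, features):
    # features: list of (w_k, f_k, scopes_k). Each scope D in scopes_k is a
    # tuple of variable names; the shared weight w_k multiplies the sum of
    # f_k over all of its scopes, i.e. the term w_k * sum over D of f_k(D).
    total = 0.0
    for w_k, f_k, scopes_k in features:
        total += w_k * sum(f_k(*(assignment[v] for v in D)) for D in scopes_k)
    return total

# The shared-weight Ising chain 0 - 1 - 2, expressed through scopes:
features = [(0.5, lambda a, b: a * b, [(0, 1), (1, 2)])]
print(log_linear_energy({0: +1, 1: -1, 2: +1}, features))  # -1.0
```

Both edges disagree, so each contributes -1 to the feature sum, and the single shared weight 0.5 multiplies the aggregate.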
To summarize: in log-linear models, it's very common to use the exact same feature and weight for multiple subsets of variables, and we've given multiple examples of that. Just as in the directed case, this allows us to provide a single template for multiple Markov networks, whether it's the Markov network for different images or for different sentences. The parameters and structure are then reused both within and across Markov networks. And the only thing we need to do in order to apply this is to specify the set of scopes to which a particular feature is going to be applied.