We've already said that there are many different algorithms for inference in graphical models, but the simplest and most fundamental is an algorithm typically known as variable elimination. Let's consider the variable elimination algorithm in the context of a simple example: a graphical model structured as a chain over the variables A, B, C, D, and E. Say we're interested in computing the distribution over the variable E, that is, P(E). As we've already seen, that probability is proportional to the unnormalized measure P-tilde over A, B, C, D, and E, summed over all of the variables except E. So now let's see how we can do this more efficiently than simply constructing the joint distribution and then summing things out.

The first thing we do is write this unnormalized measure as a product of its constituent factors. For the moment, we're going to assume that we only have pairwise factors, one for each edge in this graph. So we have a factor for A,B; a factor for B,C; one for C,D; and one for D,E. Those are the factors phi_1 up to phi_4.

Now, what is the first observation we can make when we see the summation over A, B, C, and D of a product of factors? We've already done this exercise previously, when we were doing some proofs related to graphical models: if a factor doesn't mention a particular variable in its scope, we can move it out of the summation over that variable. Specifically, phi_2(B,C) can be moved out of the summation over A, as can phi_3(C,D) and phi_4(D,E), which leaves us with only the summation over A of phi_1(A,B). That gives us the expression over here. This is a summation over a pairwise factor, and its result is a factor over the single variable B, which we're going to call tau_1(B). So we end up with an expression that no longer involves A, only the variables B, C, D, and E; effectively, we have eliminated A from our graphical model.

Let's continue developing this expression. We now have a product of four factors, and once again we can look at which factors involve the variable B and which ones don't. The ones that don't can be moved out of the summation, just as before, giving us a product of two factors summed over B. That summation gives us a factor whose scope is C, so we now have an expression that does not involve the variable B: we've eliminated B from this graphical model as well. We can similarly continue to eliminate C and D, so that, ultimately, we end up with an expression that involves only the variable E, and that expression is proportional to the marginal probability of E.
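To make this concrete, here is a minimal NumPy sketch of the chain computation, assuming all five variables are binary and using made-up factor values; the names phi1 through phi4 and tau1 through tau4 mirror the notation above.

```python
import numpy as np

# Made-up pairwise factors for the chain A - B - C - D - E.
rng = np.random.default_rng(0)
phi1 = rng.random((2, 2))  # phi1(A, B), axes ordered (A, B)
phi2 = rng.random((2, 2))  # phi2(B, C)
phi3 = rng.random((2, 2))  # phi3(C, D)
phi4 = rng.random((2, 2))  # phi4(D, E)

# Eliminate one variable at a time: multiply in the factors that
# mention it, then sum it out.
tau1 = phi1.sum(axis=0)               # sum_A phi1(A,B)          -> tau1(B)
tau2 = (tau1[:, None] * phi2).sum(0)  # sum_B tau1(B) phi2(B,C)  -> tau2(C)
tau3 = (tau2[:, None] * phi3).sum(0)  # sum_C tau2(C) phi3(C,D)  -> tau3(D)
tau4 = (tau3[:, None] * phi4).sum(0)  # sum_D tau3(D) phi4(D,E)  -> tau4(E)

print(tau4 / tau4.sum())  # renormalize the final factor to get P(E)
```

Note the cost: each step touches only a small pairwise table, rather than the full joint distribution over all five variables.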
Now let's run this algorithm in the context of a somewhat more complicated example, which is our enhanced student network that we've played around with before. Let's imagine that our goal is to compute the probability of the variable J, and in order to do that we're going to have to sum out of the joint distribution all of the other variables except J. So this is our expression, a product of factors. Note that I've already taken the CPDs in this expression and turned them into factors, so that we have a consistent notation.

Now we need to eliminate every one of the variables except J, and we're going to start by eliminating the variable C. Once again, we take the summation over C and push it in, leaving inside the summation only the factors that involve C; those are phi_D and phi_C. Multiplying them together and summing out C gives us a factor which we're going to call tau_1, whose scope is D. Putting tau_1 into this expression, and removing the factors we've just multiplied together, we end up with this expression over here.

Having eliminated C, we now go ahead and eliminate the variable D. Which factors involve D? Well, tau_1(D) and phi_G(G,I,D). Everything else is taken out of the summation, and we just have this expression over here. Summing out D from this product, whose scope is G, I, and D, gives us tau_2, whose scope is G and I. Tau_2 gets put back into the set of factors together with everything else.

Moving forward, we're now interested in eliminating I. The factors that involve I are tau_2(G,I), phi_S(S,I), and phi_I(I), so we go ahead and multiply them together, giving us a factor whose scope is G, I, and S; summing out I gives us a factor tau_3 whose scope is S and G.

And so the process continues; let's just finish it all the way to the end. Our next goal is to eliminate H. H is a little bit of an interesting case: the only factor that mentions H is the factor phi_H, and if we think about what phi_H is, it happens to be P(H | G, J). So if we're summing that over H, we're actually summing up what is, in fact, a conditional distribution, and when you sum a conditional distribution over the values of the variable on its left-hand side, the summation is necessarily equal to one. So in principle we could have taken this entire expression, erased it, and written one instead; that would have given us a factor that doesn't depend on anything, and it would simply have disappeared. But for purposes of demonstration we're not actually going to do that, because in fact not every algorithm is clever enough to notice these kinds of coincidences; it depends on whether it was designed to look for them. So we're going to proceed in the same sort of naive way as before, which is just to turn this into a factor tau_4 whose scope is G and J. So now we have that factor, which really is identically one, but we're not going to pay attention to that particular fact, for the purposes of demonstrating how the algorithm would work.

Next is eliminating G, and we have the expression we inherited from the previous line. G is one of the big ones, because it appears in phi_L, tau_3, and tau_4. When we look at the variables in the scope of this product, we see that it actually has a scope of L, G, S, and J, which is the largest factor, the one with the largest scope, that we've encountered so far. Summing out G, we end up with a factor whose scope is L, S, and J. Putting that back into the expression, out comes a product of two factors, and really, at this point, we may just as well multiply them, sum out the remaining variables L and S, and end up with a factor over J alone. So that gives us variable elimination in its naive form.
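To make these steps concrete, here is a NumPy sketch of the first three eliminations from this run, with entirely made-up CPD values (all variables binary except G, which we give three values); np.einsum performs the factor product and the summation in a single step. The remaining eliminations of H, G, L, and S follow the same pattern.

```python
import numpy as np

# Made-up factors for the start of the student-network run.
rng = np.random.default_rng(0)
phi_C = rng.random(2)          # phi_C(C)
phi_D = rng.random((2, 2))     # phi_D(D, C)
phi_G = rng.random((3, 2, 2))  # phi_G(G, I, D)
phi_I = rng.random(2)          # phi_I(I)
phi_S = rng.random((2, 2))     # phi_S(S, I)

# Eliminate C: tau_1(D) = sum_C phi_D(D,C) phi_C(C)
tau_1 = np.einsum('dc,c->d', phi_D, phi_C)
# Eliminate D: tau_2(G,I) = sum_D phi_G(G,I,D) tau_1(D)
tau_2 = np.einsum('gid,d->gi', phi_G, tau_1)
# Eliminate I: tau_3(G,S) = sum_I tau_2(G,I) phi_S(S,I) phi_I(I)
tau_3 = np.einsum('gi,si,i->gs', tau_2, phi_S, phi_I)

print(tau_3.shape)  # (3, 2): a factor with scope {G, S}, as in the lecture
```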
What about variable elimination with evidence? Well, we've already basically established how to deal with evidence. If we're interested, for example, in solving the query P(J | I = i, H = h), the way we do that is by computing the probability of the joint event P(J, I = i, H = h), and the way we do that is by reducing the factors to correspond to this evidence. So we take each of the factors that involves I and instantiate it with the particular value little i, and similarly for h. We see here, for example, that whereas phi_I initially depended on I, now it doesn't depend on anything, because it is simply the value phi_I(i), which is a constant. And whereas phi_G depended on D and I, as we can see in the original diagram, the reduced factor doesn't depend on I, and is really the probability of G given little i and D. The same reduction occurs for H = h, and so we end up with the following set of reduced factors.

Once we have that set of reduced factors, we proceed exactly as before; no changes whatsoever to the algorithm. The only aspect that's a little bit different is that we don't eliminate H or I, because there's no point eliminating a variable that has a single value; there's no need to sum over it. So this gives us a unified framework for dealing with queries, whether they involve evidence or not. And then how do we get the probability of J given the evidence? Well, this is straightforward: we simply renormalize, because we can take this result and divide by its sum, which turns out to be the normalizing constant, the probability of the evidence.

Okay, so let's see whether the same idea applies to Markov networks. Here is our simple Markov network with four variables. Let's imagine that our goal is to compute P(D), and in order to do that we need to eliminate A, B, and C from the unnormalized measure. So we have this as the unnormalized measure, and we're summing over A, B, and C. The process works in exactly the same way. If we want to sum out A first, the factors that involve A are phi_1(A,B), this one, and phi_4(A,D). We multiply them together to get a factor whose scope is A, B, and D, and then we sum out A to get a factor whose scope is B and D. That gives us a new set of factors in which A has effectively been eliminated from the graphical model. And at the end of the process, we get a factor over the single remaining variable D, tau_3(D), and that factor is not the probability of D; it's proportional to the probability of D. It's actually equal to P-tilde(D), the unnormalized measure, and so in order to get P(D) we renormalize.
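And here is the same Markov-network run as a NumPy sketch, again with made-up pairwise factor values; at each step we multiply together exactly the factors that mention the variable being eliminated, then sum that variable out.

```python
import numpy as np

# Made-up pairwise factors for the loop A - B - C - D - A; goal: P(D).
rng = np.random.default_rng(0)
phi1 = rng.random((2, 2))  # phi1(A, B)
phi2 = rng.random((2, 2))  # phi2(B, C)
phi3 = rng.random((2, 2))  # phi3(C, D)
phi4 = rng.random((2, 2))  # phi4(A, D)

# Eliminate A: multiply phi1(A,B) and phi4(A,D) into psi(A,B,D), sum out A.
tau1 = (phi1[:, :, None] * phi4[:, None, :]).sum(axis=0)  # tau1(B, D)
# Eliminate B: multiply tau1(B,D) and phi2(B,C), sum out B.
tau2 = (tau1[:, :, None] * phi2[:, None, :]).sum(axis=0)  # tau2(D, C)
# Eliminate C: multiply tau2(D,C) and phi3(C,D), sum out C.
tau3 = (tau2 * phi3.T).sum(axis=1)                        # tau3(D)

print(tau3 / tau3.sum())  # renormalize the unnormalized measure to get P(D)
```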
So, to summarize, the main routine in this algorithm is a routine we'll call Eliminate-Var, which eliminates a variable Z from a set of factors Phi, and what it does is the following. We first look within Phi and define a set of factors Phi-prime, which are all the factors that involve Z; that's what this mathematical expression says: the factors phi_i such that Z is in their scope. We take all those factors and multiply them, and then we sum out the variable Z, which is the one we want to eliminate. Now, and here is the important point: we've already used up these factors; these ones over here have now been used, and we don't want to reuse them. So we take them out of the set of factors, and instead we introduce the one we just created by multiplying those factors together and summing out Z.

This basic operation is what we use in the context of the algorithm as a whole. We begin by reducing all factors by the evidence, which is just eliminating the rows that are not consistent with the observations; that is what gets us our set of factors Phi. Now, for each non-query variable, we run, one at a time, the step that eliminates that variable Z from the set of factors Phi. Each such step changes our set of factors: it removes the factors Phi-prime from the previous line and adds a new factor tau. And then finally, at the very end, when all the non-query variables have been eliminated, there may be one or more factors remaining, at which point we multiply all the remaining factors and then normalize to get a distribution.

So, to summarize, this is a very simple algorithm. It works equally well for Bayes nets and Markov nets, and it uses only factor product and factor summation steps. It does that by ensuring that when you perform the summation step over a variable Z, all factors involving Z have first been multiplied in, which is the critical piece of the correctness of this algorithm.
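Pulling all of this together, here is a self-contained Python sketch of the whole procedure, under the assumption that each factor is stored as a NumPy array with one axis per named discrete variable; the names Factor, eliminate_var, and variable_elimination are mine, not from the lecture, and the factor values in the usage example at the bottom are made up.

```python
import numpy as np

class Factor:
    """A factor over named discrete variables; `table` has one axis
    per variable in `vars`, in order."""
    def __init__(self, vars, table):
        self.vars = list(vars)
        self.table = np.asarray(table, dtype=float)

    def __mul__(self, other):
        """Factor product: broadcast both tables over the union of scopes."""
        joint = self.vars + [v for v in other.vars if v not in self.vars]
        return Factor(joint, self._expand(joint) * other._expand(joint))

    def _expand(self, joint):
        # Reorder this factor's axes to follow `joint`, inserting size-1
        # axes for variables outside its scope so broadcasting works.
        present = [v for v in joint if v in self.vars]
        table = self.table.transpose([self.vars.index(v) for v in present])
        shape = [table.shape[present.index(v)] if v in present else 1
                 for v in joint]
        return table.reshape(shape)

    def marginalize(self, var):
        """Sum out `var`, producing a factor over the remaining scope."""
        rest = [v for v in self.vars if v != var]
        return Factor(rest, self.table.sum(axis=self.vars.index(var)))

    def reduce(self, var, value):
        """Reduce by evidence var = value: keep only the matching rows."""
        if var not in self.vars:
            return self
        rest = [v for v in self.vars if v != var]
        return Factor(rest,
                      np.take(self.table, value, axis=self.vars.index(var)))

def eliminate_var(factors, z):
    """One Eliminate-Var step: multiply all factors mentioning z, sum z
    out, and replace those factors with the resulting tau."""
    involved = [f for f in factors if z in f.vars]
    rest = [f for f in factors if z not in f.vars]
    tau = involved[0]
    for f in involved[1:]:
        tau = tau * f
    return rest + [tau.marginalize(z)]

def variable_elimination(factors, evidence, order):
    """Reduce by the evidence, eliminate each non-query variable in
    `order`, multiply what remains, and renormalize."""
    for var, val in evidence.items():
        factors = [f.reduce(var, val) for f in factors]
    for z in order:
        factors = eliminate_var(factors, z)
    result = factors[0]
    for f in factors[1:]:
        result = result * f
    return Factor(result.vars, result.table / result.table.sum())

# Usage on the four-variable Markov network above (made-up values).
rng = np.random.default_rng(0)
phis = [Factor(['A', 'B'], rng.random((2, 2))),
        Factor(['B', 'C'], rng.random((2, 2))),
        Factor(['C', 'D'], rng.random((2, 2))),
        Factor(['A', 'D'], rng.random((2, 2)))]
print(variable_elimination(phis, evidence={}, order=['A', 'B', 'C']).table)
```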