We've shown that the clique tree algorithm has some pretty remarkable properties: it guarantees that we achieve the correct marginal at every single clique, and therefore that these marginals necessarily agree with each other. We've also seen that these marginals can be computed using a single upward and downward pass over the clique tree, so it's a fairly efficient computation. But we also know that the problem of inference in probabilistic graphical models is an NP-hard problem, so somewhere over there, there must be a catch; that is, there must be a computational cost that occurs, at least in certain cases, when we run the clique tree algorithm. What we're going to do now is try to understand where that hidden cost might lie. In order to analyze the computational complexity of a clique tree, we're going to define a little bit of notation. Consider an edge (i, j) in the clique tree T. Here is a clique tree: C_i and C_j are adjacent cliques, connected to each other via the sepset S_{i,j}, and then there is the rest of the clique tree, a whole bunch of cliques on the left side and a whole bunch of cliques on the right side. We're going to divide the variables into three groups. W_{<(i,j)}, the variables on the C_i side of the (i, j) edge, are the variables in this group here, not counting the sepset variables, which would appear in both sets. W_{<(j,i)}, the variables on the C_j side, is over there. And S_{i,j} is the stuff in the middle. These are three mutually exclusive and exhaustive groups that partition all the variables in the tree. And now, here is an interesting and quite enlightening theorem. It says that the clique tree T satisfies the running intersection property
if and only if, for every sepset S_{i,j}, the variables on the left side of the edge are independent of the variables on the right side of the edge given the variables in the sepset. That is, the sepset separates the left side from the right side. Now, remember that the running intersection property was the critical property that we used to prove the correctness of the clique tree algorithm. So coming up with a condition that tells us exactly when running intersection holds is important, because it is the defining property behind all of the nice behaviors of the clique tree algorithm that we talked about earlier. Let's try to convince ourselves of this by looking at a concrete example first. Consider the clique tree we have over here, and let's focus on the sepset containing the variables G and S. The variables on the left side of that sepset, excluding G and S, are I, D, and C. On the other hand, the variables on the right side of the sepset, again excluding G and S, are H, J, and L. Now let's look at where these variables place themselves in the graph structure over here, which is the induced graph corresponding to the Markov network that we derive from the factors in this Bayesian network: the Bayesian network's CPDs produce factors, and the factors give us this induced Markov network. Where are the variables G and S? Those are over here. Where are the variables on the left-hand side? Those are the blue variables, C, I, and D, and they're sitting over there. And the green variables, H, J, and L, are sitting over here. Simple inspection shows that there are no paths, or trails, between the blue variables and the green variables that do not go through the red variables. So we conclude from that that the variables C, I, and D are separated from H, J, and L given G and S.
And from the connection between graph structure and independence in Markov networks, we can conclude that this independence property holds; that is, C, I, and D are independent of H, J, and L given G and S. Which is exactly what we wanted to show: the sepset separates the variables on the left, the blue variables, from the variables on the right, the green variables. Let's try to give a slightly more general argument for this, one that isn't just a demonstration in the context of a particular example. So let's ignore for a moment the concrete letters inside this example and just think about why this is going to be true. Imagine that this is now a generic sepset, and we'd like to prove that all the variables on the green side are independent of all the variables on the blue side. Let's assume otherwise; this is going to be a proof by contradiction. If this were not the case, then there would need to be a path in the induced Markov network between the blue side and the green side, that is, between W_{<(i,j)} and W_{<(j,i)}. And notice, this path must not go through the sepset, because otherwise the sepset would separate the two sides; so it doesn't go through S_{i,j}. Well, if there exists a path that goes from one side of this graph to the other, it means there eventually has to be an edge where one node is on one side and the other node is on the other side. So there needs to be some edge that goes from the blue side to the green side, for example from here to there; it doesn't matter which pair of nodes I pick. There needs to be some node on the green side and some node on the blue side.
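The graph-separation check in the example above can be sketched in code. This is a minimal illustration, assuming the standard extended student example: the cliques {C,D}, {G,I,D}, {G,S,I}, {G,J,S,L}, {H,G,J} of the clique tree (the figures themselves aren't reproduced here), with the induced Markov network obtained by fully connecting each clique. A simple BFS that avoids the sepset then confirms the separation:

```python
from collections import deque

# Assumed clique structure of the student example's clique tree; the
# induced Markov network has a complete set of edges inside each clique.
cliques = [{"C", "D"}, {"G", "I", "D"}, {"G", "S", "I"},
           {"G", "J", "S", "L"}, {"H", "G", "J"}]

edges = set()
for clique in cliques:
    for u in clique:
        for v in clique:
            if u < v:
                edges.add((u, v))

def separates(sepset, left, right, edges):
    """True if every trail from `left` to `right` passes through `sepset`."""
    # Build adjacency over the nodes NOT in the sepset.
    adj = {}
    for u, v in edges:
        if u not in sepset and v not in sepset:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # BFS from the left set; separation fails if we can reach the right set.
    seen, frontier = set(left), deque(left)
    while frontier:
        node = frontier.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return seen.isdisjoint(right)

print(separates({"G", "S"}, {"C", "I", "D"}, {"H", "J", "L"}, edges))  # True
print(separates({"G"}, {"C", "I", "D"}, {"H", "J", "L"}, edges))       # False
```

The second call shows that G alone is not enough: dropping S from the sepset leaves the trail I–S–J open between the two sides.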
This is the green side and this is the blue side, and there is an edge between them that doesn't go through the sepset. But that implies that there needs to be some factor that involves those two variables, in this case E and H. And because of family preservation, that factor must sit in one of the cliques in this clique tree; and that clique is either on the green side or on the blue side, it has to be somewhere. So let's assume, without loss of generality, that we put that clique on the green side. Now we have an H sitting here, and remember H is a blue variable, so H is also sitting over here, because it is on the blue side. And now we have an H in one clique and an H in the other, and the running intersection property tells us that H needs to be everywhere in between; specifically, it needs to be in the sepset. That is a violation either of the running intersection property or of the assumption that the variable H is not in the sepset. So that's a somewhat formal proof outline that can, with a little bit of extra effort and notation, be made into a rigorous proof of why the running intersection property implies this independence assumption. I didn't prove the other direction, because this is actually the direction that we care about more. Now, we started this whole discussion by saying that these properties have computational implications. But where are they? What can we conclude from the fact that the sepset needs to separate the graph into conditionally independent pieces? Well, it turns out that in many graphs this implies a certain minimal complexity, which can sometimes be quite large. Let's look at this in the context of two quite simple but very commonly used examples. Here is an example of what's called a complete bipartite graph, where we have two sets of variables
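The running intersection property that drives this proof is itself easy to check mechanically: for every variable X, the cliques containing X must form a connected subtree of the clique tree. A small sketch, again assuming the student example's clique chain (the clique sets are taken from the example; the second tree below is a deliberately broken variant of my own, for contrast):

```python
def satisfies_rip(cliques, tree_edges):
    """Running intersection property: for every variable X, the cliques
    containing X induce a connected subgraph of the clique tree.
    `cliques` is a list of sets; `tree_edges` are pairs of clique indices."""
    variables = set().union(*cliques)
    for x in variables:
        holders = {i for i, c in enumerate(cliques) if x in c}
        # Walk the tree restricted to `holders`; RIP holds iff connected.
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for a, b in tree_edges:
                for j, k in ((a, b), (b, a)):
                    if j == i and k in holders and k not in seen:
                        seen.add(k)
                        stack.append(k)
        if seen != holders:
            return False
    return True

# Assumed student clique tree from the example: a chain of five cliques.
cliques = [{"C", "D"}, {"G", "I", "D"}, {"G", "S", "I"},
           {"G", "J", "S", "L"}, {"H", "G", "J"}]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(satisfies_rip(cliques, chain))  # True

# Rewiring clique 0 to attach at clique 2 strands the two D-cliques apart.
print(satisfies_rip(cliques, [(0, 2), (1, 2), (2, 3), (3, 4)]))  # False
```

In the broken variant, D appears in cliques 0 and 1, but the path between them runs through clique 2, which doesn't contain D, which is exactly the kind of violation the proof exploits.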
with no edges within either set separately, but where all of the cross edges are present. This is a structure that we've actually seen before. We saw it in the context of the model for course difficulty and student intelligence, where we had a bunch of difficulty variables and a bunch of intelligence variables. There were no edges among the difficulties, or among the intelligences, but for every difficulty-intelligence pair there was an edge, induced by the v-structure corresponding to an observed student grade, between the course difficulty and the corresponding student intelligence. Now, what is the smallest sepset that we could possibly construct for this graph? Can we, for example, just take, say, these two A's and separate the graph into two conditionally independent pieces? Well, no, not really, because if we now look at these two B's, we see that we can connect them via any one of the other A's that I didn't include in the sepset; so this doesn't decouple the graph at all. With a little bit of extra thought, it's not difficult to convince oneself that the smallest set we could construct that actually breaks up the graph into meaningful pieces is either all the variables on the one side or all the variables on the other. Which means the smallest sepset in any kind of meaningful clique tree has size greater than or equal to min(k, m), where k is the number of variables on one side and m the number on the other. A slightly less obvious example, but one that also imposes some very significant lower bounds, is a grid, such as we encountered in the context of the Ising model, or when doing image analysis, where the variables correspond to pixels. Now let's think about how we might break up this graph into conditionally independent pieces.
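For a small instance this lower bound can be verified exhaustively. A sketch for K_{3,3} (my own illustration, with hypothetical node names A1..A3, B1..B3): enumerate vertex subsets in increasing size and find the smallest one whose removal disconnects the graph; for a complete bipartite graph this should be min(k, m) = 3.

```python
from itertools import combinations

def components(nodes, edges):
    """Connected components of the graph restricted to `nodes`."""
    nodes, comps = set(nodes), []
    while nodes:
        comp, stack = set(), [nodes.pop()]
        while stack:
            u = stack.pop()
            comp.add(u)
            for a, b in edges:
                for x, y in ((a, b), (b, a)):
                    if x == u and y in nodes:
                        nodes.discard(y)
                        stack.append(y)
        comps.append(comp)
    return comps

# Complete bipartite graph K_{3,3}: no edges within a side, all cross edges.
a_side = ["A1", "A2", "A3"]
b_side = ["B1", "B2", "B3"]
nodes = a_side + b_side
edges = [(a, b) for a in a_side for b in b_side]

def min_separator_size(nodes, edges):
    """Smallest vertex set whose removal disconnects the graph."""
    for size in range(len(nodes)):
        for sep in combinations(nodes, size):
            rest = [v for v in nodes if v not in sep]
            if len(components(rest, edges)) > 1:
                return size
    return len(nodes)

print(min_separator_size(nodes, edges))  # 3 == min(k, m)
```

Removing any two vertices, on either side or one from each, leaves the remaining cross edges connecting everything; only removing an entire side breaks the graph apart.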
Now, we can construct clique trees that have small sepsets. For example, here's a sepset that breaks A11 away from everything else. But notice that this still leaves a very large "everything else". If we try, for example, to break up the graph so that A11 appears on the one side and A44 appears on the other, then any clique tree you give me that achieves this separation, with A11 on the one side and A44 on the other, has to have a sepset of size greater than or equal to n, where this is an n-by-n grid. Which means that if you try to break up the n-by-n grid in a way that puts one corner on one side and the other corner on the other, you're going to have a sepset that is at least a dimension of the grid. And breaking it up in other ways doesn't make it any better. So here are two cases where we can place a lower bound on the size of the sepset that is required for running clique tree inference, and that is where we pay the exponential blowup that is, in some sense, required by the fact that the problem is intrinsically NP-hard. To summarize: we've shown previously that the correctness of the clique tree inference algorithm relies on having the running intersection property, and we've now shown in turn that the running intersection property implies certain separation properties on the original distribution. These separation properties can in turn be used to analyze the complexity of inference in different graphs, and provide certain minimal bounds on the complexity that would have to be incurred by the best possible clique tree on those graphs. We've already seen a notion of minimal complexity before, namely the minimal induced width of the graph. This notion is a little bit different, because it talks about sepsets,
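A related, easily checkable version of the grid claim can be brute-forced on a small instance. This is my own simplified criterion, not exactly the lecture's corner-to-corner statement: on a 3-by-3 grid, find the smallest vertex set whose removal splits the remaining variables into two groups of at least two variables each (i.e. a split that doesn't just shave off a single corner). The answer should be 3 = n, matching the "at least a dimension of the grid" intuition.

```python
from itertools import combinations

n = 3  # small n-by-n grid so exhaustive search stays cheap
nodes = [(r, c) for r in range(n) for c in range(n)]
edges = [((r, c), (r + dr, c + dc))
         for r in range(n) for c in range(n)
         for dr, dc in ((0, 1), (1, 0))
         if r + dr < n and c + dc < n]

def component_sizes(remaining, edges):
    """Sizes of the connected components of the graph on `remaining`."""
    remaining, sizes = set(remaining), []
    while remaining:
        count, stack = 0, [remaining.pop()]
        while stack:
            u = stack.pop()
            count += 1
            for a, b in edges:
                for x, y in ((a, b), (b, a)):
                    if x == u and y in remaining:
                        remaining.discard(y)
                        stack.append(y)
        sizes.append(count)
    return sizes

def min_balanced_separator(nodes, edges):
    """Smallest vertex set whose removal leaves two groups of >= 2 nodes."""
    for size in range(len(nodes)):
        for sep in combinations(nodes, size):
            rest = [v for v in nodes if v not in sep]
            sizes = sorted(component_sizes(rest, edges))
            # need at least two vertices outside the largest component
            if len(sizes) > 1 and sum(sizes[:-1]) >= 2:
                return size
    return len(nodes)

print(min_balanced_separator(nodes, edges))  # 3 == n
```

Any two removed vertices can at best isolate a single corner; cutting the grid into two non-trivial pieces requires a full row, column, or equivalent set of n vertices.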
whereas the induced width really talks more about cliques. But these are both different ways of analyzing a certain minimal complexity that has to be incurred by exact inference, which again is related to the NP-hardness of the problem.