We've been talking about exponential random graph models and some of the issues in estimating them. What I want to do now is talk about another class of models called Statistical Exponential Random Graph Models, or SERGMs for short. Here's what we've been through so far: we had sets of models that are link-based, and other models, still link-by-link based, which can begin to capture things like clustering; we brought in attributes with stochastic block models; and then we said, okay, now there's a class of models, Exponential Random Graph Models, which allow us to capture richer things where there are dependencies, and to try to estimate those things statistically. But as we saw, there are difficulties in estimating those. So what I want to talk about now is a new class, Statistical Exponential Random Graph Models, that will allow us to keep track of these local features and dependencies and actually do accurate, fast, and easy estimation. Okay. So, Exponential Random Graph Models are not accurately estimable in many cases: there are basically just too many alternative networks to consider. So what's the idea here? What's the way out? The way out is going to be that many networks lead to the same statistics. For instance, in the simulations we did in the last video, we had a network with 45 links, 10 triangles, and 20 isolated nodes. Under an Exponential Random Graph formulation, any network that had exactly those characteristics would be equally likely: they'd all lead to the same value of the function, and so to the same probability. So what we can do is say, okay, even though there are many different networks to search over, what we really need to keep track of is just the number of different types of statistics, because all the networks that have the same statistics have the same probability.
So we can treat those as equivalent, and collapse all those equivalent networks. That makes the summation, the space we have to search over, much smaller than it was before. Okay. And part of this, so why am I going to talk about this? This is coming out of a paper with Arun Chandrasekhar. The way I got interested in this was in projects where I had to use Exponential Random Graph Models, or something that really accounted for dependencies, and I found that the software just didn't work; I couldn't find any models that were actually working. So the idea was, okay, we need some models, so let's go develop them. So here I'm going to tell you about a class of models which I've been working with, which I think solve a lot of the problems we have in the literature, by collapsing these equivalent networks. Let's just go through the idea. We start with our exponential random graph model: we've got some vector of statistics, a vector of coefficients on that, and the summation that we have to do. So let's let N(s') be the number of networks that would have statistics equal to s'. How many different networks are there that have 20 isolates, say 10 triangles, and 45 links? That's a set of statistics; that's our S(g). How many different networks would have exactly this? Let N(s) be that number. Now, that's not necessarily an easy number to calculate, and in some cases it can actually be complicated even to estimate, but we'll talk about that in a moment. So we take the original Exponential Random Graph Model, and instead of summing over all g', we note that any g' that have the same statistics are going to give the same exponential term here, right?
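To make the collapsing idea concrete, here is a small sketch (my own illustration, not from the lecture or the paper): on 4 nodes there are 2^6 = 64 possible networks, but if the statistics we care about are the number of links and the number of triangles, those 64 networks collapse into only a handful of distinct statistics vectors, and N(s) counts how many networks share each one.

```python
from itertools import combinations, product
from collections import Counter

n = 4
pairs = list(combinations(range(n), 2))  # the 6 possible links

def statistics(edge_set):
    """Return s(g) = (number of links, number of triangles)."""
    links = len(edge_set)
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return links, triangles

# N(s): how many of the 64 networks share each statistics vector s
counts = Counter()
for present in product([0, 1], repeat=len(pairs)):
    edges = {p for p, bit in zip(pairs, present) if bit}
    counts[statistics(edges)] += 1

print(sum(counts.values()))  # 64 networks in total...
print(len(counts))           # ...but only 9 distinct statistics vectors
print(counts[(3, 1)])        # 4 networks have 3 links forming 1 triangle
```

So instead of summing over 64 networks, a summation over 9 statistics vectors (weighted by these counts) carries exactly the same information; at realistic network sizes the gap between the two spaces is astronomically larger.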
So if g' and g'' have the same statistics, they give us the same number down here. So instead, let's just sum across the s', and weight each by how many different networks would have generated that same s'. Okay? We've changed this into a summation over statistics instead of over the raw networks. Now, instead of worrying about how many different networks there are in the world, we just need to think about the range of the statistics, say the number of triangles you could possibly have together with the number of links you could possibly have, which is going to be much smaller. The second thing we're going to do is, instead of thinking about the probability of a network, think about the probability that we see a certain set of statistics. Ultimately, it's not so much that we're interested in the particular network we saw, as that it had 20 isolates, 10 triangles, and 45 links. The really important aspect was that it had certain attributes. Those are the kinds of things we're going to want to test as scientists: does the network have certain characteristics? Not that one very particular network was realized as opposed to some other one. Whatever characteristics we're really interested in, we can put into those statistics, and this is what we're ultimately interested in: what's the probability that we see a network that has certain characteristics? Okay? So we're going to call this the statistical form, because we've gotten rid of the networks and we're just talking about the properties of the network. All the estimation is now statistics-based, which can be a much smaller space to work in than the original space. That's the idea behind Statistical Exponential Random Graph Models.
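Written out, the statistical form says the probability of seeing statistics s is N(s)·exp(β·s) divided by the sum of the same weight over all possible statistics vectors s'. A minimal sketch of that computation, using the 4-node counts restricted to networks with at most 3 links and an illustrative β of my own choosing (both are assumptions for the example, not values from the lecture):

```python
import math

# N(s) for s = (links, triangles), hypothetical small statistics space;
# in practice these counts come from enumeration or approximation.
N = {(0, 0): 1, (1, 0): 6, (2, 0): 15, (3, 0): 16, (3, 1): 4}

beta = (0.5, 1.0)  # illustrative coefficients on (links, triangles)

def weight(s):
    """Unnormalized SERGM weight of statistics s: N(s) * exp(beta . s)."""
    return N[s] * math.exp(sum(b * x for b, x in zip(beta, s)))

# The normalizer is a sum over statistics vectors, not over networks
Z = sum(weight(s) for s in N)
P = {s: weight(s) / Z for s in N}   # probability of observing statistics s

print(P[(3, 1)])  # probability of seeing 3 links and 1 triangle
```

Note the sum defining Z has 5 terms here instead of 64, which is exactly the gain from collapsing equivalent networks.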
And of course, any one of these representations of an Exponential Random Graph Model then has a corresponding representation in this format. If we can estimate the parameters here, that tells us what the parameters would be in the original model. So we can work in this space, do the estimation, and track back to get the Exponential Random Graph parameters. The basic idea, just to emphasize: instead of asking what the probability of a specific network is, we ask what's the probability of seeing a certain density of links, clustering, average path length, whatever statistics you're interested in. What's the probability of seeing something that has those characteristics? Okay. Now, one other thing: once we start representing things in this statistical form, we can begin to ask why we have particular reference distributions, as they're known. Here the weight is the number of graphs that have a certain set of statistics, but it's not obvious that we might not want something else. In particular, you can put in whatever K function you like: instead of the number of graphs that have certain statistics, put in some other number. You could weight these differently, and that would still be a valid model of how statistics are generated. In a few minutes we're going to see a model which will give rise to Ks that look very different from the ones that come out of the usual Exponential Random Graph family. So there's a natural way of thinking about other kinds of weights than the ones that are there. Okay. I've just been emphasizing the idea: now we've got things in a statistical space that's going to be easier to work with than working directly in the graph space. One thing is that you can encode whatever you want in terms of statistics. You can also begin to put in preference-based models.
And we'll see some of that in the fourth week of the course. So we'll come back, and you can begin to do preference-based models and test for whether people have certain kinds of preferences, and biases in preferences, in these models as well. Okay. So what's the real challenge still in estimating these, just to re-emphasize? What we're usually faced with is one network, right? We don't see many different networks; usually we just see one realization. And now we've got these interdependencies. So we've got problems in applying large-numbers arguments, because we've basically only got one observation. But we do have many observations of which links are present, which triangles are present. Even though these aren't quite independent, we can still ask: are they enough? Is there enough information in one of these networks to estimate a model? Okay? It's not completely obvious, when you estimate these kinds of models with all these interdependencies, that whatever estimates of the parameters you get are going to be accurate. So we can go ahead and do, say, maximum likelihood on this kind of model, an Exponential Random Graph Model now in statistical form, and estimate these things. We saw certain statistics come out; we can ask, what are the betas which maximize this function, which make it most likely that we saw these statistics rather than some other ones? We can just go ahead and do maximum likelihood on this function. And we can ask: when will these betas be accurate estimates of the true parameters? How large does our data set have to be before we're pretty sure that we're getting an accurate look at what nature was doing? Okay. So there's a set of theorems that we have in a paper.
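As a sketch of what maximum likelihood on the statistical form looks like, here is a deliberately simple one-parameter case of my own construction (an assumption for illustration, not the lecture's model): the only statistic is the link count s among m possible links, so N(s) is the binomial coefficient C(m, s), and we search for the β that makes the observed count most likely.

```python
import math

m = 6       # possible links among 4 nodes
s_obs = 2   # observed number of links

def log_likelihood(beta):
    """Log SERGM probability of observing s_obs links when the only
    statistic is the link count, so N(s) = C(m, s)."""
    log_Z = math.log(sum(math.comb(m, s) * math.exp(beta * s)
                         for s in range(m + 1)))
    return math.log(math.comb(m, s_obs)) + beta * s_obs - log_Z

# Simple grid search for the beta that maximizes the likelihood
grid = [i / 1000 for i in range(-5000, 5001)]
beta_hat = max(grid, key=log_likelihood)

# In this special case the MLE has a closed form: the logit of s_obs / m,
# so the grid search should land very close to log((s_obs/m)/(1 - s_obs/m))
p_hat = s_obs / m
print(beta_hat, math.log(p_hat / (1 - p_hat)))
```

With richer statistics like triangle counts there is no closed form, but the same recipe applies: write the log-likelihood over the statistics space and maximize over β.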
In a 2012 paper, we show classes of Statistical Exponential Random Graph Models for which maximum likelihood converges to the true parameters as the network grows, and we provide some easy ways of estimating those. Instead of going into that detail here, you're free to look at the paper. What I'm going to do is look at a very simple related set of models, a particular case, which will give us some idea of how these things work and allow really simple and fast estimation. That will be our next topic.