0:00

So we're talking about exponential random graph models

and some of the issues in estimating those.

And so now what I want to do is talk

about another class of models

called Statistical Exponential Random Graph Models.

So SERGMs for short.

And so what we've been through is: we had

sets of models that are link-based, and other models

which are still link-by-link based but which can

begin to capture different things like clustering and so forth.

We brought in attributes

with stochastic block models.

And then we said okay, now there's a class of models which allows us

to capture richer things, where there are dependencies,

and to try to estimate those things statistically:

Exponential Random Graph Models. But as we

saw, there are difficulties in estimating those.

So what I want to talk about now is a

class of new ones called Statistical Exponential Random Graph Models.

That'll allow us to keep track of these local features

and dependencies and actually do some accurate and fast and easy estimation.

Okay.

So, the Exponential Random Graph Models are

not accurately estimable in many cases.

There's basically just too many alternative networks to consider.

So what's the idea here?

What's the way out?

The way out is going to be that many networks lead to the same statistics.

So for instance, in the simulations we did in

the last video, what we had was, you know,

45 links, 10 triangles, and 20 isolated nodes.

And in fact, under an Exponential Random Graph

formulation, any network that had exactly

those characteristics would be equally likely.

They'd all lead to the same value of the function, and so to the same probability.

So what we can do is just say,

okay look, even though there are many different networks to search over,

what we really need to keep track of is just the different possible

values of the statistics, because all the networks that

have the same statistics have the same probability.

And so we can look at those as equivalent.

Collapse all those equivalent networks.

And that makes the summation, and the space that we have

to search over, much smaller than it was before.

Okay.
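To see how much the collapse buys you, here is a minimal sketch (my own illustration, not from the lecture): enumerate every graph on 4 nodes, and group them by the statistic vector (number of links, number of triangles). The counts N[s] are exactly the number of networks sharing the statistics s.

```python
from itertools import combinations
from collections import Counter

def graph_stats(edges, n):
    """Return (link count, triangle count) for a graph on n nodes."""
    edge_set = set(edges)
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return len(edges), triangles

n = 4
possible_edges = list(combinations(range(n), 2))  # 6 possible links

# Enumerate all 2^6 = 64 graphs, grouping them by their statistic vector.
# N[s] counts how many networks share the statistics s.
N = Counter()
for mask in range(2 ** len(possible_edges)):
    edges = [e for i, e in enumerate(possible_edges) if mask >> i & 1]
    N[graph_stats(edges, n)] += 1

print(len(N))        # 9 distinct statistic vectors, versus 64 graphs
print(N[(3, 1)])     # 4 three-link networks form exactly one triangle
```

Even on 4 nodes the space shrinks from 64 graphs to 9 statistic vectors, and the gap grows enormously with the number of nodes.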

And part of this came out, you know, so why am I

going to talk about this? This is coming out of a paper with Arun Chandrasekhar.

And effectively, the way I got interested in this

was in projects where I had to be using Exponential

Random Graph Models, or something that really accounted

for dependencies, and finding that the software just didn't work.

And I couldn't find any models that were actually working.

And so, the idea was, okay, we need some models, so let's go develop them.

So here I am going to tell you about a class of

models which I have been working with, which I think solve a lot

of the problems that we have in the literature.

So we collapse these equivalent networks. Let's

just go through the idea here.

So we start with our Exponential Random Graph Model, right?

So we have got some vector of statistics.

3:28

So how many different networks would have exactly

these statistics? We'll let N of s be that number.

Now that's not necessarily an easy number to calculate.

And in some cases it can actually be a complicated number

to even estimate, but we'll talk about that in a moment.

So what we do is take this original Exponential

Random Graph Model, and instead of summing

over all g prime, we say that any g primes that have the same

statistics are going to lead to the same exponential here, right?

So if g prime and g double prime have

the same statistics, they're going to give us the same number down here.

So instead, let's just sum across the s primes, and

then weight that by how many different

networks would have generated that same s prime.

Okay?

So we've changed this into a summation

over statistics, instead of over the raw networks.
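The formulas being narrated here are on the slide rather than in the transcript; as a reconstruction, with s(g) the vector of statistics of network g and N(s) the number of networks with statistics s, the step is:

```latex
% ERGM: probability of a specific network g
P(g) \;=\; \frac{\exp\big(\beta \cdot s(g)\big)}{\sum_{g'} \exp\big(\beta \cdot s(g')\big)}
% Networks with the same statistics contribute identical terms, so
\sum_{g'} \exp\big(\beta \cdot s(g')\big) \;=\; \sum_{s'} N(s')\,\exp\big(\beta \cdot s'\big)
% SERGM: probability of observing the statistic vector s itself
P(s) \;=\; \frac{N(s)\,\exp\big(\beta \cdot s\big)}{\sum_{s'} N(s')\,\exp\big(\beta \cdot s'\big)}
```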

So now, instead of worrying about how many different networks

there are in the world, we just need to think about what's the range of statistics here.

Well, it's the number of triangles you could possibly have together with the number

of links you could possibly have, which is going to be much smaller.

And the second thing we're going to do is

then, instead of thinking about the probability of a

network, we can actually think of what's the

probability that we see a certain set of statistics.

So ultimately, it's not so much that we're interested in the particular network

we saw, as much as the fact that it had 20 isolates, 10 triangles, and 45 links.

So that was the really important aspect, was that it had certain attributes.

Those are the kinds

of things we're going to want to test as scientists.

You know, does the network have certain characteristics?

Rather than: was this very particular network realized, as opposed to some other one?

Whatever characteristics we're really interested

in, we can put into those statistics.

And so this is what we're ultimately interested in.

What's the probability that we see a network that has certain characteristics?

Okay?

So now we're going to call this the statistical form,

because we've gotten rid of the networks and we're just talking about the

properties of the network.

And all the estimation is now statistics-based, which

can be a much smaller space to work in

than the original space.

That's the idea behind the Statistical Exponential Random Graph Models.

5:48

And of course, any one of these representations of

an Exponential Random Graph Model then has a corresponding

representation in this format.

And if we can estimate the parameters here, then that

tells us what the parameters would be in the original model.

So we can work in this space, if we can do the

estimation, and track back to get the Exponential Random Graph Model's parameters.

So the basic idea here, just to emphasize.

Instead of asking what the probability of a specific network is, we ask

what's the probability of seeing a certain

density of links, clustering, or average path length.

Whatever the statistics are you're interested

in, what is the probability of seeing something that has those characteristics?

Okay.

Now, one other thing is, once we start representing

things in this form, so we've got this statistical form,

then we can begin to ask, you know, why do

we have a particular one of what are known as reference distributions.

So here this is the number of graphs that have a certain set

of statistics, but it's not obvious that we might not want something else.

And in particular,

you can put in whatever K function you like: instead

of the number of graphs that have certain statistics,

put in some other number.

You could weight these differently.

And that would still be a valid model of how statistics are generated.
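To make the role of the weights concrete, here is a small sketch (my own illustration; `stat_prob` and the toy table are hypothetical). Setting the weight K(s) equal to N(s), the number of networks with statistics s, recovers the ERGM-equivalent SERGM, but any positive weights define a valid distribution over statistics.

```python
import math

def stat_prob(s_obs, beta, K):
    """P(s) = K(s) exp(beta . s) / sum over s' of K(s') exp(beta . s').

    K maps each statistic vector to a positive weight; K(s) = N(s)
    gives the ERGM-equivalent model, but other weights are also valid.
    """
    def term(s):
        return K[s] * math.exp(sum(b * x for b, x in zip(beta, s)))
    z = sum(term(s) for s in K)
    return term(s_obs) / z

# Toy support: (links, triangles) vectors with hand-computed N(s)
# counts for 4-node graphs (a partial table, for illustration only).
K = {(0, 0): 1, (3, 0): 16, (3, 1): 4, (6, 4): 1}
beta = (0.1, 0.5)

total = sum(stat_prob(s, beta, K) for s in K)
print(round(total, 6))   # probabilities over the support sum to 1
```

Swapping in a different K, say one that down-weights dense graphs, changes the reference distribution without changing this machinery at all.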

And in particular, in a few minutes we're going to

see a model which will give rise to

Ks that actually look very different from the ones

that come out of the usual Exponential Random Graph family.

So there's a

natural way of thinking of other kinds of weights than the ones that are there.

Okay.

So I've just been through emphasizing the idea.

So now we've got things in a statistical space that's going to

be easier to work with than working directly in the graph space.

Okay.

Now, one thing is you can encode whatever you want in terms of statistics.

You can also begin to put in preference-based models.

And we'll see some of that

I guess in the fourth week of the course.

So we'll come back to that, and you can begin to do preference-based models.

And test for whether people have certain kinds of

preferences and biases in these models as well.

Okay.

So what's the real challenge now still in estimating these, just to reemphasize?

7:48

What we're usually faced with is one network, right?

We don't see many different networks.

Usually, we just see one realization. And now we've

got these inter-dependencies.

So we've got problems in working with large

numbers because we basically only have one observation.

But we do have many observations of how

many links are present, which triangles are present.

Even though these aren't quite independent, we

can still ask the question, are they enough?

Is there enough information in one of these things to estimate a model?

Okay?

And so it's not completely obvious that,

as you estimate these kinds of

models where you've got all these inter-dependencies,

whatever estimates of the parameters

you get are going to be accurate.

So we can, you know, go ahead and do, say, maximum likelihood

on this kind of model, an Exponential

Random Graph Model, now in its statistical form.

And we can go through and estimate these things.

So, you know, we saw certain statistics come out.

We can say okay, what are the betas which actually maximize this function

that make it most likely that we saw these statistics rather than some other ones?

Now we can just go ahead and do maximum likelihood on this function.

And we can ask, when is it that these

betas will be accurate estimates of the true parameters?

How large does our data set have to be before we're

pretty sure that we're getting an accurate look at what nature was doing?
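As a simple illustration of this maximum likelihood step (my own sketch, not the paper's estimator): in a links-only SERGM where the statistic is the number of links s out of m possible pairs, N(s) is the binomial coefficient C(m, s), and the model reduces to independent links, so the MLE should match the log-odds of the observed link density. That gives us a sanity check on a grid-search estimate.

```python
import math

def log_likelihood(beta, s_obs, m):
    """Log-likelihood of seeing s_obs links out of m possible pairs
    in a links-only SERGM with weights N(s) = C(m, s)."""
    log_z = math.log(sum(math.comb(m, s) * math.exp(beta * s)
                         for s in range(m + 1)))
    return math.log(math.comb(m, s_obs)) + beta * s_obs - log_z

# Observe 20 links out of 45 possible pairs.
m, s_obs = 45, 20

# Grid-search the maximizing beta; in this special case the MLE
# should match the log-odds of the observed density, log(20/25).
grid = [i / 1000 for i in range(-2000, 2001)]
beta_hat = max(grid, key=lambda b: log_likelihood(b, s_obs, m))
print(round(beta_hat, 3))   # close to log(0.8), about -0.223
```

With richer statistics there is no closed form, but the same idea applies: maximize the probability of the observed statistic vector over the betas.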

Okay.

So there's a set of theorems that we have in

a 2012 paper, where we actually show classes

of Statistical Exponential Random Graph Models for which

maximum likelihood is going to converge to the true parameters

as the network grows, and which provide some easy ways of estimating those.

And instead of going into that detail, you

are free to look at the paper.

What I'm going to do is look at a very

simple related set of models, which will give us a particular

case, give us some idea of how these things

work, and let us do really simple and fast estimation.

So that will be our next topic.