Hi folks. So now we're going to talk about another

property which is important in capturing networks, and in particular one which

looks at a local property of the networks.

So, in particular, what's going on when we zoom in on given nodes and begin to

understand the relationship between different ties in the network? This is

known as clustering. And in particular, when we begin to think

about asking how dense is a network at a local level, we could ask a question of

what fraction of the people I'm friends with are friends with each

other? And so clustering asks: if we have a

given node i, and we look at two of i's friends j and k, what's the chance that

those two are related to each other? So what's the frequency of links among

the friends of i? So if we want to look at a given node i,

and ask what the clustering is for that node i, in a given network, then we can

say okay, let's look at i's neighborhood and look at all the pairs of friends that

i has, that is, two different nodes j and k in that

neighborhood, and keep track of, for those possible

pairs, how many of them are actually connected to each other, compared to the

overall number of them. And so that gives just a fraction:

how many pairs of your friends are friends with each other.

And then for average clustering, we can just take that number and average it across

all the different nodes in the network. Okay?
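
To make the node-by-node calculation concrete, here is a minimal Python sketch. The dictionary-of-sets representation, the function names, and the four-node toy network are all assumptions made for illustration; so is the convention of assigning a clustering of 0 to a node with fewer than two friends, which the discussion above does not settle.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of i's friends that are friends with each other."""
    pairs = list(combinations(sorted(adj[i]), 2))
    if not pairs:
        # Assumption: a node with fewer than two friends gets clustering 0.
        return 0.0
    linked = sum(1 for j, k in pairs if k in adj[j])
    return linked / len(pairs)

def average_clustering(adj):
    """Compute clustering node by node, then average across all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical toy network: a, b, c form a triangle, and d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

for node in sorted(adj):
    print(node, local_clustering(adj, node))
print("average clustering:", average_clustering(adj))
```

Here node a has three pairs of friends and only one of those pairs is linked, so its clustering is 1/3, while b and c each have a single pair of friends that is linked.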

So that's a particular measure of clustering.

And there are different ways to measure clustering.

What we did there was just take the average:

first calculate it for a given node i, and then average across all the different

nodes. What that does is weight

clustering node by node. Another way to do this would be

instead to look at overall clustering. So look at all possible nodes and pairs

of friends that they have, and ask, overall in the whole network, every time

we've got a particular situation which looks like this, where a node is linked to two others, what's the chance that

those two others are connected to each other?

And so instead of first doing this node by node and then averaging, this is

done overall: we're looking at all the possible triples in the network

where we see them connected in a situation like this, and asking

what's the frequency with which the remaining link is present?

So this is overall clustering. And these numbers can be different.
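
As a rough sketch of the overall measure, the Python below pools every (node, pair of friends) configuration in the network and asks what fraction of those pairs are linked. The adjacency representation, the function name, and the toy network are assumptions for illustration only.

```python
from itertools import combinations

def overall_clustering(adj):
    """Pool all (node, pair of friends) configurations across the network
    and return the fraction in which the two friends are linked."""
    closed = 0
    total = 0
    for i in adj:
        for j, k in combinations(sorted(adj[i]), 2):
            total += 1
            if k in adj[j]:
                closed += 1
    return closed / total if total else 0.0

# Hypothetical toy network: a, b, c form a triangle, and d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print("overall clustering:", overall_clustering(adj))
```

In this toy network there are five such configurations in total (three centered on a, one each on b and c), and three of them are closed. Averaging node by node on the same network instead gives (1/3 + 1 + 1 + 0) / 4, if a node with a single friend is counted as 0, so the two measures already differ here.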

So, which way you measure it, whether you're weighting it by node, or doing it

over all possible triangles in the network, can possibly give

you different answers. So just as an example, let's suppose we

had a situation which looked like this, where we have in particular

a given node here at the center, node 1. This node has

groups of friends in threes that are all friends with each other, but aren't

friends across these different groups of three.

So we keep looking at these different groups of three, and what do we find?

In terms of average clustering, this is going to tend to one.

So, for instance, among node nine's friends, every pair of friends that nine

has know each other. And that's true for ten as well, and

eight. So as we look at most of these nodes,

they're actually clustered at 100%. All of their pairs of friends are friends

with each other. But when we look at node 1, very few of

node 1's friends are going to actually be friends with each other.

And interestingly enough, if you keep adding more and more groups like

this, then of the possible triangles in the network, a lot of them

are actually going to be ones which go through node 1, and most of those are not completed, and so the overall

clustering can be much much smaller than the average clustering in a network like

this. And so, depending on what you're measuring,

whether you're doing it node by node or whether you're doing it overall by

looking at possible triangles and then asking whether they are completed, you can

get different answers. They measure different things,

and it's important to keep that straight.
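
Here is a small Python sketch of that example. The figure itself is not reproduced in the transcript, so the construction is an assumption: node 1 sits at the center, and each group is three nodes who are all friends with each other and with node 1, with no links across groups. The function names are also just illustrative.

```python
from itertools import combinations

def build_hub_network(num_groups):
    """Assumed construction: node 1 is the center; each group is three
    mutual friends who are all linked to node 1, with no links across groups."""
    adj = {1: set()}
    next_id = 2
    for _ in range(num_groups):
        group = [next_id, next_id + 1, next_id + 2]
        next_id += 3
        for v in group:
            adj[v] = {1}
            adj[1].add(v)
        for u, v in combinations(group, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def pair_counts(adj, i):
    """Return (linked pairs of i's friends, all pairs of i's friends)."""
    pairs = list(combinations(sorted(adj[i]), 2))
    closed = sum(1 for j, k in pairs if k in adj[j])
    return closed, len(pairs)

def average_clustering(adj):
    total = 0.0
    for i in adj:
        closed, pairs = pair_counts(adj, i)
        total += closed / pairs if pairs else 0.0
    return total / len(adj)

def overall_clustering(adj):
    counts = [pair_counts(adj, i) for i in adj]
    closed = sum(c for c, _ in counts)
    pairs = sum(p for _, p in counts)
    return closed / pairs if pairs else 0.0

for m in (1, 3, 10, 100):
    adj = build_hub_network(m)
    print(m, "groups: average =", round(average_clustering(adj), 4),
          "overall =", round(overall_clustering(adj), 4))
```

Under this assumed construction with m groups, every non-center node has clustering 1, the center has clustering 2/(3m - 1), and the overall clustering works out to 8/(3m + 5). So as groups are added, the average tends to one while the overall number tends to zero.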

Now one thing that's going to be important in this setting is to

compare this to what happens in a network formed uniformly at random.

If we ask what the clustering is in a uniformly at random network, well, this

is simply going to be p. Any time we look at a

configuration like this and we ask what's the probability of this link being

present, the probability of this link being

present ignores all the rest of the information; it was just formed with some

probability p. So the clustering is going to be p,

regardless of whether we look at average or overall; we're always going to get an

answer of p for what that number is. And so if we're looking at very, very

large networks, and people have a relatively small number of friends

compared to the overall network, then p is going to 0, and so

clustering in a Poisson random network, or an Erdős–Rényi random network, this

G(n, p) kind of network, is going to go to 0 as n grows, if p is actually getting

small, which will often be the case in a lot of

the settings we're going to be interested in.

So what that tells us is that random networks are going to tend to have very

low clustering if links are formed uniformly at random.
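
A quick simulation can illustrate this. The sketch below draws G(n, p) networks with Python's standard `random` module, holding the expected degree fixed so that p shrinks as n grows, and compares overall clustering to p. The expected degree of 10, the network sizes, the seeds, and the function names are all hypothetical choices for illustration.

```python
import random
from itertools import combinations

def gnp(n, p, seed=0):
    """Draw a G(n, p) network: each possible link is present independently
    with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def overall_clustering(adj):
    """Fraction of (node, pair of friends) configurations that are closed."""
    closed = 0
    total = 0
    for i in adj:
        for j, k in combinations(sorted(adj[i]), 2):
            total += 1
            if k in adj[j]:
                closed += 1
    return closed / total if total else 0.0

expected_degree = 10  # hypothetical: held fixed while n grows
results = {}
for n in (100, 400):
    p = expected_degree / (n - 1)
    c = overall_clustering(gnp(n, p, seed=n))
    results[n] = (p, c)
    print("n =", n, "p =", round(p, 4), "overall clustering =", round(c, 4))
```

In any single draw the measured clustering will only be close to p, not exactly equal to it, but it should shrink along with p as n grows.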

And then we can look at what we actually see in data.

And when we look across a variety of different kinds of data sets, we

tend to see numbers which are much higher than would have occurred at

random. So in a study of prison relationships by

MacRae in 1960, clustering is about 0.31; it's about 0.01 if you do the following

calculation: keep the same expected degree, but

instead use the G(n, p) model. Then basically about 1.3% of the links

are present, and so your clustering should be about 1.3% if it was uniformly random,

and yet it's 31% in the data. So that tells us that the network looks

dramatically different than what would have happened if you'd put these links

down uniformly at random. Co-authorships: 15% in math

co-authorships. Here you see that the p is extremely

tiny. These are large graphs, with a lot

of mathematicians never having collaborated together.

It's 0.09 in biology. So again, here you see much higher numbers than you would have

seen at random. For the World Wide Web, if you look at it without

paying attention to direction, you're going to get about 11%, again with a much

smaller number at random. If you look back to our data from the

Florentine marriages, and in this case here I've included the business dealings

as well, so this is Padgett and Ansell's data from

the 1430s, here you get a clustering of about 0.46;

at random it would be about 0.29. So that's another situation where we've

got substantially higher clustering than at random.
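
The random benchmark in these comparisons comes from matching the expected degree: take the fraction of possible links that are actually present in the data and use that as p. A minimal Python sketch of that calculation follows. The six-node network is a made-up toy, not any of the data sets mentioned here, and the function names are assumptions.

```python
from itertools import combinations

def benchmark_p(adj):
    """Link probability of a G(n, p) network with the same expected degree:
    links present divided by links possible (average degree / (n - 1))."""
    n = len(adj)
    num_links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return num_links / (n * (n - 1) / 2)

def overall_clustering(adj):
    """Fraction of (node, pair of friends) configurations that are closed."""
    closed = 0
    total = 0
    for i in adj:
        for j, k in combinations(sorted(adj[i]), 2):
            total += 1
            if k in adj[j]:
                closed += 1
    return closed / total if total else 0.0

# Made-up toy network: two triangles (0,1,2) and (3,4,5) joined by link 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

print("observed overall clustering:", overall_clustering(adj))
print("random benchmark p:", benchmark_p(adj))
```

In this toy case 7 of the 15 possible links are present, so the benchmark is 7/15, while 6 of the 10 friend-pair configurations are closed. The gap is far smaller than in the real data sets, where networks are large and p is tiny.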

So this is another property of networks. This has been a more local property of

networks, looking at how the links relate to each other, not just how

they're distributed over the network, and so forth.

So we've taken a look at a variety of different measures. We're

going to now begin to look at putting nodes in context, and other kinds of

things: additional definitions that will help

us go forward in managing to keep track of networks, and talk about their

properties, and talk about their characteristics in a meaningful way.