Hi folks. So now we're going to talk about another property which is important in describing networks, and in particular one which is a local property of the network. What's going on when we zoom in on given nodes and begin to understand the relationship between different ties in the network? This is known as clustering. When we ask how dense a network is at a local level, we could ask a question like: what fraction of the people who I'm friends with are friends with each other? So clustering looks at a given node i and two of i's friends, j and k, and asks what's the chance that those two are linked to each other. What's the frequency of links among the friends of i?
So if we want to look at a given node i and ask what the clustering is for that node in a given network, we can say: okay, let's look at i's neighborhood and at all the pairs of friends that i has, two different nodes j and k in that neighborhood. For those possible pairs, keep track of how many are actually connected to each other, compared to the overall number of pairs. That gives just the fraction of pairs of your friends who are friends with each other. And then for average clustering, we take that number and average it across all the different nodes in the network. Okay?
So that's one particular measure of clustering, and there are different ways to measure it. What we did was the average: first calculate clustering for a given node i, and then average across all the nodes. What that does is weight clustering node by node. Another way to do this would be instead to look at overall clustering.
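Before turning to the overall measure, the per-node and average definitions just described can be written compactly. The notation here is an assumption for illustration and not taken from the lecture: N_i(g) is the set of i's neighbors in network g, d_i is the number of those neighbors, and n is the number of nodes. A node with fewer than two neighbors has no pairs of friends, so some convention is needed for it; setting its clustering to 0 is one common choice.

```latex
% Assumed notation: N_i(g) = neighbors of i in g, d_i = |N_i(g)|, n = number of nodes.
% The numerator counts links jk among i's neighbors; the denominator counts all pairs of neighbors.
Cl_i(g) = \frac{\#\{\, jk \in g : j, k \in N_i(g),\ j \neq k \,\}}{d_i (d_i - 1)/2},
\qquad
Cl^{\mathrm{avg}}(g) = \frac{1}{n} \sum_{i=1}^{n} Cl_i(g).
```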
So look at all possible nodes and the pairs of friends that they have, and ask, overall in the whole network, every time we've got a situation where one node is linked to two others, what's the chance that those two others are connected? Instead of first doing this node by node and then averaging, this is done overall: out of all the triples in the network where one node is connected to the other two, what's the frequency with which those other two are connected to each other? So this is overall clustering.
And these numbers can be different. Whether you're weighting by node or counting over all possible triangles in the network can give you different answers. Just as an example, suppose we had a situation which looked like this: we have a given node, node 1, at the center, and this node has groups of friends in threes that are all friends with each other, but aren't friends across these different groups of three. So we keep adding these groups of three, and what do we find? In terms of average clustering, this is going to tend to one. For instance, every pair of friends that node 9 has know each other, and that's true for 10 as well, and 8. So as we look at most of these nodes, they're actually clustered at 100%: all of their pairs of friends are friends with each other. But when we look at node 1, very few of 1's friends are actually friends with each other. And interestingly enough, if you keep adding more and more groups like this, a lot of the possible triangles in the network are ones which go through 1, and most of those are not completed, so the overall clustering can be much, much smaller than the average clustering in a network like this.
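The gap between the two measures in this example can be checked with a minimal sketch. It assumes the network is a hub (node 1) linked to every member of each group of three, with each group fully linked internally and no links across groups. The function names are made up for illustration, and giving a node with fewer than two neighbors a clustering of 0 is an assumed convention.

```python
from itertools import combinations


def build_hub_network(num_groups):
    """Hub node 1 plus `num_groups` triads; every triad member is linked
    to the hub and to the other two members of its own triad."""
    adj = {1: set()}
    next_id = 2
    for _ in range(num_groups):
        group = [next_id, next_id + 1, next_id + 2]
        next_id += 3
        for v in group:
            adj.setdefault(v, set())
            adj[v].add(1)
            adj[1].add(v)
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def pair_counts(adj, i):
    """Return (linked pairs, all pairs) among the neighbors of node i."""
    pairs = list(combinations(sorted(adj[i]), 2))
    linked = sum(1 for j, k in pairs if k in adj[j])
    return linked, len(pairs)


def average_clustering(adj):
    """Compute clustering node by node, then average over nodes."""
    values = []
    for i in adj:
        linked, total = pair_counts(adj, i)
        values.append(linked / total if total else 0.0)  # assumed convention
    return sum(values) / len(values)


def overall_clustering(adj):
    """Pool all pairs of neighbors over all nodes, then take one ratio."""
    linked_sum = total_sum = 0
    for i in adj:
        linked, total = pair_counts(adj, i)
        linked_sum += linked
        total_sum += total
    return linked_sum / total_sum if total_sum else 0.0


for groups in (3, 100):
    g = build_hub_network(groups)
    print(groups, f"{average_clustering(g):.3f}", f"{overall_clustering(g):.3f}")
```

With three groups the nodes are numbered 1 through 10, so nodes 8, 9 and 10 form one group as in the example, and the average is 0.925 against an overall value of about 0.571. With 100 groups the average is about 0.997 while the overall value falls to about 0.026.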
And so, depending on what you're measuring, whether you're doing it node by node or overall by looking at possible triangles and asking whether they're completed, you can get different answers. They measure different things, and it's important to keep that straight.
Now, one thing that's going to be important in this setting is comparing this to what happens in a network formed uniformly at random. If we ask what the clustering is in a uniformly-at-random network, well, this is simply going to be p. Any time we look at a configuration where i is linked to j and k and ask what's the probability of the link between j and k being present, that probability ignores all the rest of the information: the link was just formed with some probability p. So the clustering is going to be p; regardless of whether we look at average or overall, we're going to get an answer of about p.
And if we're looking at very large networks where people have a relatively small number of friends compared to the size of the network, then p is going to 0, and so clustering in a Poisson random network, or an Erdős–Rényi random network, this G(n, p) kind of network, is going to go to 0 as n grows if p is getting small, which will often be the case in a lot of the settings we're interested in. What that tells us is that random networks tend to have very low clustering if links are formed uniformly at random.
And then we can look at what we actually see in data. When we look across a variety of different kinds of data sets, we tend to see numbers which are much higher than would have occurred at random. In a study of prison relationships by MacRae in 1960, clustering is about 0.31, whereas it's about 0.01 if you do the following calculation.
Keep the same expected degree, but instead look at a G(n, p) model. Then about 1.3% of the links are present, and so your clustering should be about 1.3% if the network were uniformly random, and yet it's 31% in the data. So that tells us that the network looks dramatically different than what would have happened if you'd put these links down uniformly at random. Co-authorships: 15% in math co-authorships. Here the p is extremely tiny; these are large graphs, with a lot of mathematicians never having collaborated together. It's 0.09 in biology, so again you see much higher numbers than you would have seen at random. For the World Wide Web, if you look at it without paying attention to direction, you get about 11%, again compared with a much smaller number at random.
If you look back to our data on the Florentine marriages, and in this case I've included the business dealings as well, so this is Padgett and Ansell's data from the 1430s, you get a clustering of about 0.46; at random it would be about 0.29. So that's another situation where we've got substantially higher clustering than at random.
So this is another property of networks, a more local property, looking at how the links relate to each other, not just how they're distributed over the network. We've now taken a look at a variety of different measures, and next we're going to begin looking at putting nodes in context and other kinds of things: additional definitions that will help us going forward to keep track of networks and to talk about their properties and characteristics in a meaningful way.
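The random benchmark used in these comparisons can be sketched as follows. To match an observed expected degree d in a network of n nodes, a G(n, p) model needs p = d / (n - 1), and both clustering measures of the simulated network should then come out close to p. The function names, the seed, and the values n = 300 and d = 14.95 are made up for illustration; they are not the figures from any of the studies mentioned.

```python
import random
from itertools import combinations


def matching_p(n, expected_degree):
    """Link probability that gives a G(n, p) network the stated expected degree."""
    return expected_degree / (n - 1)


def gnp_graph(n, p, seed=0):
    """Form each of the n(n-1)/2 possible links independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_measures(adj):
    """Return (average clustering, overall clustering) of an undirected network."""
    per_node = []
    linked_sum = total_sum = 0
    for i, neighbors in adj.items():
        pairs = list(combinations(neighbors, 2))
        linked = sum(1 for j, k in pairs if k in adj[j])
        # Assumed convention: a node with fewer than two neighbors gets 0.
        per_node.append(linked / len(pairs) if pairs else 0.0)
        linked_sum += linked
        total_sum += len(pairs)
    average = sum(per_node) / len(per_node)
    overall = linked_sum / total_sum if total_sum else 0.0
    return average, overall


# Hypothetical illustration values, not data from the lecture.
n, d = 300, 14.95
p = matching_p(n, d)
average, overall = clustering_measures(gnp_graph(n, p, seed=1))
print(f"p = {p:.3f}, average = {average:.3f}, overall = {overall:.3f}")
```

Comparing an observed clustering number against `matching_p` for the same n and d is the kind of check behind the pairs of numbers quoted above, such as 0.31 against roughly 0.013 for the prison data.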