Hi, it's Matt again. And now, we're going to be talking a

little bit about the diameter of a network.

So again, we're still in our basic definitions, trying to understand how we

represent networks, things we can talk about in terms of their characteristics

and diameter is a very important one. Diameter and average path length refer to

how close nodes are to each other. So, how long will it take to get from one

node to another in some sense? How fast will information spread?

How, how quickly can we get from one part of the graph to another?

These are going to be questions that are going to be answered by things like

diameter and average path length. So, when we look at the diameter, the

definition here is the largest geodesic in the network.

So, the largest shortest path. so if, if we have a network which is not

connected, often when people talk about the diameter, they'll talk about the

diameter of the largest component. It get's a little iffy spent a lot of

time's, you're going to be stuck with a few isolated nodes in a network.

And so diameter often will then just to re, refer to the component largest

component. one thing about diameter is it can be

prone to outliers. It could be that there's, that happen to

be one pair of nodes which are very far from each other, but a lot of the other

ones are, are relatively well connected to each other.

So, average path length is going to be a different measure than the diameter

inside of the maximum geodesic. It's going to be the average geodesic in

the network. And if we look at different network

structures, diameter begins to tell us things about it.

So, for instance, if we look here at the network on the left.

We've got a ring or a circle of different nodes connected to each other.

And in that situation, when we look at these nodes, we end up seeing that they

are some of them can be quite far from each other.

And the diameter in this case is on the order of half the number of nodes.

So, either n over 2 or n minus 1 over 2, depending on whether we've got even or

odd number of nodes. Whereas, if we look at a tree, the

situation on the right here in this picture, what we have is a much more

efficient way of getting from one node to another.

And so, if it, if we think about the number of levels that this tree has.

So, in this case it has three different sets of, of links.

So, one, two, three depth. so in this case, we're going to end up

or, or we're, we're going to end up covering a number of nodes where n is

equal 2 to the K plus 1 minus 1. So, you can just go through and, and keep

track of that yourself. when you look at the relationship between

how many levels you have and how many nodes you have, you end up, in this case,

having K is equal to log base 2 of n plus 1 minus 1.

So, a very simple formula. And with that in, in this case the

diameter is 6. It's twice what this K was.

So you, getting from node 25 here to node 30.

it takes a path link of 6 to go up 3, and then down 3.

So, it's twice times K is going to be the, the longest shortest path in this

network. And what this tells us is that the order

of, of the diameter is going to be roughly 2 times this log of n plus 1.

So, that's giving us sort of how quickly we reach out.

And the idea here is a tree is a much more efficient way of connecting nodes in

term of short paths than we saw in terms of this circle.

Because now we, we end up connecting nodes to each other in a manner which is,

is quite efficient in terms of the branching process.

And so, even with a small number of, of links, we're able to more efficiently

connect things. And so, instead of scaling linearly with

the number of nodes, this is going to scale with the log of the number of

nodes, which is going to tend to be a much smaller number.

Now, in, in terms of small average path lengths and diameters, one thing that's

been found quite extensively in terms of looking at social networks in the real

world is that they tend to have short average path lengths and short diameters.

And what do we mean by short? It means intuitively that we connect a

large number of nodes with a, a fairly small number of connections.

The famous most famous example of this was Stanley Milgram's experiment where he

had people sending letters from one part of the US to another.

I believe it was part of the Midwest, say, Kansas and Nebraska, and you were

trying to get a letter to somebody on the East Coast, say, in Massachusetts.

he would give them say, a person's name and the rough location of where this

person was and maybe some basic background information about the person.

So, you know, I'm supposed to get something to John Smith who lives in, in

Massachusetts and who is a lawyer in Amherst.

And I live in Lincoln, Nebraska. And the, the idea was that I could,

should send the letter to somebody I know.

And then, the instructions in the letters tell them to send it to somebody they

know and so forth, with the intent of eventually getting to, the letter to the

person that was named. And the amazing thing that Milgram found

was that out of the letters that were originally mailed from the initial points

the median number of hops that it took to get to the end point was 5.

And in fact, 25% of the letters made it. So that, that's first of all, fairly high

in terms of response rate for this kind of participation in an experiment.

And secondly, the median of five seem to be quite small given that you're starting

with somebody in, in one part of the country who had no knowledge of the other

individual, in another part of the country and it took them only five steps

to get to that individual. this kind of experiment has been

replicated in, in more, more advanced settings, we're working with e-mail and

other kinds of things. one thing that's sort of interesting

about this experiment is that when, you know, if we think about the median of 5,

we have to be a little bit careful about how we interpret that result.

Why? Well out of the 75% that didn't make it,

the longer a path is, that, that would actually take to, to go from one person

to another, if we think that there's some probability that people won't send it on,

the longer the path, the more likely it, it becomes that it won't appear as having

made it in, in the experiment. So the, the letter will never reach the

final point where Stanley Milgram could then collect it and get all the names.

So, the ones that actually reach are going to tend to be shorter than the

others. So in, in experiments that have followed

up on this, you can begin to try and estimate, you know, how many didn't make

it. And, and those are going to be biased

upwards. But it's still sort of an important

number because people are getting it in only five steps and they didn't get to

see the network. It's not as if they saw the network of

all the connections in the US and we're able to figure out what the most

efficient path was. They were just trying to send it to

somebody they thought might know somebody.

You know, if I, If this person's a lawyer then, if I know a lawyer and I know a

lawyer in Massachusetts, maybe I send it to them or if I know a lawyer in Chicago

who might have a friend in Massachusetts. So, I'm, I'm trying to navigate the

network without actually knowing the explicit structure of the network.