0:00

Hi, in this set of lectures we are talking about networks. In particular, we are

Â focusing on three things, we're talking about the logic of networks, which is how

Â they form, going to talk about the structure of networks, which is how

Â connected are nodes from one another, how far is it from one another to another node

Â and finally we're going to talk about the functions of network. What functionality

Â does a network have? Maybe one built-in, like sort of emerged from the structure.

Â Now in this lecture, we're gonna talk about the structure of the network. So

Â we're gonna define a bunch of terms, find where the network is in terms of nodes and

Â edges and we'll just define and describe a bunch of measures in network that help us.

Â For a better understanding of how networks differ from one another. So when you think

Â of a network, you can think of a bunch of things that we're gonna call nodes, which

Â are points, and then lines connecting those, and those are gonna be edges. So a

Â network is really just a set of nodes and a set of edges. Now those edges can be

Â either directed, and that means there's like a little arrow, or they can be

Â undirected where it's just a straight line. So an undirected node. The network

Â would be something like. [sound]. So an undirected network would be something like

Â the 50 United States. So each node would be a state, and then an edge would just be

Â whether those two states share a boundary with a common border. So there's no

Â direction there. If Ohio's connected to Michigan, then Michigan is connected to

Â Ohio. In a directed network, then that little arrow that you see, that means

Â that. One node points to another node, but it might not necessarily go in the other

Â direction. So if you take a college campus, and you look, you ask whether one

Â student looks to another student for fashion advice. It may very well be that

Â person A looks to person B, but person B doesn't look to A. So in a directed

Â network, you could have an arrow going in one direction, or you could have arrows

Â going in both directions. So a lot of social networks are directed. But many

Â networks, the networks were gonna focus on are gonna be undirected networks, just

Â because they're easier to work with. So then you've got structure we're gonna

Â focus on four measures: degree, which is how many edges each node has on edges,

Â path length, which is how far it is from each node to another node. Connectedness,

Â which is whether the entire graph is connected to itself. And then, finally,

Â clustering coefficient, which will give us some understanding of, like, how tightly

Â clustered are the, are the edges? So if A is connected to B, and B is connected to

Â C, is A connected to C? Okay, so let's get started. Degree of the easiest one. If you

Â take a node, the degree is just the number of edges connected to that node. So if a

Â node looks like this, its degree is one. If a node looks like this, it's degree is

Â four. So it's just how many, if you're a social network, it's how many friends do

Â you have? If you're a business, and you're looking at, and each edge represents

Â working [inaudible]. Another firm and it's a case sort of how many other firms do you

Â work with, so degree is just really a sense of how connected you are to other

Â parts of the node. The degree of a network is the average degree across all nodes. So

Â if I take the entire network and I add, count up the degree of every single node

Â and than average by the number of nodes, that's what we'll think of as the degree

Â of the network. So a high degree network will be one where lots of people are

Â connected to lots of other people. A low degree network will be one where there's

Â very few connections between people. So here's a network, right here. And if I

Â look at this green node, I can ask what is it's degree. And you see that it's

Â connected to one, two, three people. So the degree of the green node. Is equal to

Â three. Now, if I wanna figure out the degree of the entire graph you can

Â [inaudible] I'm gonna go through and count up each person. So, this person has a

Â degree of one. This person has a degree of one. This person has a degree of three.

Â And I can go through and count all the way up. Now, that seems like that takes a lot

Â of time. There should be a better way to. Figure out the average degree for a

Â network, and in fact there is. Cause all I have to do is count the nodes. So, in this

Â case, there's one, two, three, four, five, six, seven, eight, nine, ten nodes, and

Â then I can just count the number of edges. There's one, two, three, four, five, six,

Â seven, eight, nine edges. So, ten nodes, nine edges. Now each edge connects two

Â nodes. So, what I've gotta do is just multiply those edges by two. So if I have

Â two, two, two times the number of edges divided by the number of nodes, that'll

Â give me my average degree. So, in this case two times nine is eighteen, divided

Â by ten gives me an average degree of. 1.8, now when you think of moving along those

Â edges, you can call the people you're connected to, your neighbors. So if this

Â is a node here, and he's connected to these three people, then these three

Â people here are the neighbors of the node. And what we can think about then is.

Â People who are connected have more neighbors, and later on, we're gonna

Â extend this concept to neighbors from sort of your immediate neighbors, to neighbors

Â who are further away on the graph. Well, here's an interesting result. So we only

Â introduced one measure, which is degree. And now we're gonna get a really, sort of

Â fascinating result. And that is, the average degree of neighbors of nodes will

Â be at least as large as the average degree of the network. So, typically, it'll be

Â larger, but it has to be at least as large. What does this mean in less

Â technical terms? It mean, most people's friends are more popular than they are. So

Â literally, it means that. But if you look at your friends, they, on average, have

Â more friends than you have. But what does C have [inaudible]. So let's look at this

Â network. A. Has a degree of two. B has a degree of three. C has a degree of two.

Â And D has a degree of one. So if we add all that up, we get eight, divided by four

Â gives us two. Now we also could have done this by just counting up the edges which

Â are four, multiplying that by two, and dividing by the number of nodes, and that

Â would also give us two. So what we get is we get on average, people have two

Â friends. Well now we can ask what about friends of friends? Well, let's start with

Â D. D has one friend. And that's B, and B has three friends. Well let's, let's walk

Â through the whole thing and see how this works. Let's start, again we start with D,

Â they have one friend, which is B, who has three friends. C has two friends, A and B.

Â A has two friends, B has three friends, that's an average of 2.5. If we look at B,

Â B has three friends. A, which has two, C, which has two, D has one, that's 1.66. And

Â if we look at A, A has two friends, B and C and they also have an average of 2.5.

Â So, if I add all this up, what I end up getting is that this is five, eight, 9.66.

Â Divided by four, which is, you know, approximately 2.4. So, people's friends of

Â friends have 2.4 friends, whereas people on them, by, on themselves only have two

Â friends. What you see is your friends have more friends than you do. That's just a

Â fact about networks. Next measurement we want to get to. Path length. The path

Â lengths from A to B is just the minimal number of edges you have to traverse to

Â get from A, person A to person B, from node A to node B. So, I've gotta node here

Â and a node here, but there's no direct path. Get to those two, you have to take

Â one, two edges, and their path length is two. So, in this particular graph, here's

Â this green node. To get to this green node, I've gotta take one, two paths, two,

Â two edges. So the path length is two. We can also then compute the average path

Â length for an entire graph. So now you take all pairs of nodes, and figure out

Â the path length. So let's take this graph. Let's think of, there's, what are there?

Â There's A and B, there's A to C. A to D, B to C. B to D, and C to D. Those are the

Â different we could get. So, A and B have a distance of one, A and C have a distance

Â of one, A and D have a distance of one too. B and C have a distance of one. B and

Â D have a distance of one. And C and D have a distance of two. So when I add all this

Â up, I get one, two, four, five, six, eight, and there's six total. So that

Â means the average path length is one and one-third. That's a very short average

Â path length. In that particular graph, it was connected, which is our next measure.

Â A connected is just a graph where you can get from any node to any other by walking

Â along edges. So this would be, again, a picture of a connected graph, 'cause you

Â can get from any node to any other node just by walking across the edges. There's

Â also disconnected graphs. So here's a, a graph of, where there's six nodes and you

Â can't get from this set of nodes to this set of nodes. So it's disconnected. So in,

Â in a graph like this, it's interesting because if it's not connected, any

Â information that's over here with A, D, B, and E, might not get over to C and F. So

Â disconnected nodes, information's not gonna flow between them, which could be a

Â bad thing. Now if you think about disease, this could be a good thing because if

Â there's a new disease in this population, it's not gonna cross this barrier and get

Â over to this population. So depending on what you're concerned with, connectedness

Â could be good or bad. Here's an interesting assignment. We've talked about

Â mark-off processes a couple times. We can think of a mark-off process like a

Â network. In fact our pictures sort of look like networks. So you can think of there

Â being here's four states of a mark-off process and then I can just draw little

Â arrows showing the probability of movement from one state to another. Now this is a

Â directed graph, and you could have, you could be that it's close as you can get

Â from B to A with property 0.3 and we can just throw that in. So, one way to

Â represent a mark-off process is with a network because you can just write down

Â the states and then draw little directed arrows showing the probability from moving

Â from one. One state to the next. So again, you start seeing all of our models get

Â related to one another in interesting ways. Last measure, clustering

Â coefficient. This is gonna be the percentage of triples of nodes that have

Â edges between all three nodes. So if I think of three nodes A, B, and C, it could

Â be that they're connected like this, and then we don't have the full triple. Or it

Â could be, that A is connected to B. B's connected to C and C's connected to A. You

Â get the full triple so we can fill in the triangle. You can ask, what percentage in

Â the possible triangles can you fill in? Let's go back and think of a simple graph.

Â So here's a graph, and you can ask, as I start drawing lines, what happens to the

Â clustering coefficient? So if I draw these three lines, I can ask, how many triangles

Â could there possibly be? And there could be four. There could be this triangle one,

Â this triangle two, this triangle three, and this triangle four. But what we see

Â when we look at this graph is that none of those triangles are filled in. So this is

Â gonna have a clustering coefficient of zero. What about this graph? Same number

Â of edges but now we've got one triangle to fill in so this is going to have a

Â clustering coefficient of one over four. What about this graph? I've got one

Â triangle here, and one triangle here, so this is going to have a clustering

Â coefficient of two over four. And then finally, if I have a completely connected

Â graph, I've got, this triangle's filled in, this triangle's filled in, this

Â triangle's filled in, this triangle's filled in, so I have four over four, which

Â is a clustering coefficient of one. So, if you. Think about that, alright? You can

Â ask all sorts of really interesting questions. So here's a simple quiz

Â question you might ask. Draw two Connected graphs each that has six nodes and six

Â edges, but have this different cluster coefficients. Now the reason we ask an

Â exercise like this is to get across the idea that, number of edges sort of tell

Â you how connected the graph is but it doesn't tell you how clustered it is. So

Â it's possible to have, two different graphs with the same numbers of nodes same

Â number of edges but different clustering coefficients. So here is graph one, and if

Â we look at it there's only one triangle filled in. So this clustering coefficient

Â is gonna be one. Over the number of triangles. Well how many triangles are

Â that we get. One, two, three, four, six nodes and we ask a triangle consists of

Â three. That means there's six nodes that we can pick first, five we can pick

Â second, four we can pick third, but the order doesn't matter. Three times two

Â times one doesn't matter. So that means we get twenty triangles. So this [inaudible]

Â one over twenty. Well here's another graph with six nodes and six edges but none of

Â the triangles [inaudible]. So this is going to have a coefficient of zero over

Â twenty. So two graphs. Same number of nodes, same number of edges, different

Â clustering coefficient. So what have we learned? We've learned that there's

Â structure to graphs. We could measure degree which is on average how many nodes

Â is another node connected to. We could talk about path length, which is how far

Â is it from one node to another node. We could talk about connectedness, is the

Â whole graph connected. And we can talk about clustering coefficient, which is how

Â many triangles, of the possible triangles, how many of those are filled in. Now these

Â are statistical measures. So let's talk about, why would we care about these? Why

Â might these be important? Well, think about degree, what is it telling you? It's

Â telling you the density of connections, in a way. It's telling you just how connected

Â is this graph? So you can think of this as some sort of proxy, possibly, for social

Â capital. So if you looked at a community, and there were very few connections

Â between people, then not many people know other people. So remember we talked about

Â Bob Solo saying social capital can't be just this abstract thing. We need some

Â measures. One measure could be, how connected are people to one another?

Â What's the average degree of the social network? You can also think about speed of

Â diffusion. If there's a piece of information up there, how fast is it going

Â to spread in this network? Well, one thing that'll determine that is how connected

Â people are. How many nodes people are connected to. So, somebody's connected to

Â 100 people and you tell them something and they tell all 100, it's gonna spread

Â really fast. If I'm only connected to two people and you tell me something, then

Â that information is gonna spread fairly slowly. What about path length? Why do we

Â care about path length graph? Well think of an airline graph. It's gonna tell you

Â the number of flights your gonna need on average. So if you wanted to fly from

Â Miami to L.A. And the average path length was 1.1 for the airline network graph,

Â you'd know that you?re probably gonna get a direct flight. If the average path

Â length was seven then you'd know oh my goodness this is gonna take me a lot of

Â flights to get from point a to point b. It also tells you social distance. So suppose

Â you run an organization, and you've got a network. And you know that the average

Â path link between employees is six. That means randomly picking two employees, it's

Â gonna take six other employees, or five other employees to get from employee A to

Â employee B. That means that information knowledge, ideas, are not gonna spread

Â very fast within your organization. And you may wanna try and figure out ways to

Â reduce that path link. And last, the likelihood of information spreading. If

Â the path length is really long, it's just not that likely that information is gonna

Â make it nine steps, ten steps, eleven steps, if people are that disconnected.

Â What about connectedness, our next thing? Well, connectedness for Markov processes,

Â it matters. [inaudible] Markov process, we had to be able to get from any state to

Â any other state. If the nodes are the states, and it's not connected, then we

Â can't apply the Markov convergence theorem, because that was one of our

Â assumptions. You can get from any state to any other. If you think about, the

Â abilities of a terrorist group. You might still think all these people are really

Â upset, with, the world in some way. And, and that sort of a terrorist organization

Â is people who wanna do, curiosum tasks that [inaudible] we don't want them to

Â carry out. If their network is not connected. Then it's going to be difficult

Â for them to carry out these activities. If it is connected, then they can pass

Â information and can possibly carry things out. So things like path-link degree in

Â connectedness are going to determine the functionalities of organizations that do

Â good as well as organizations that do bad. Now, also connecting [inaudible] measures

Â if you think about things like the internet or power failures, the electric

Â grid. If the electric [inaudible] becomes disconnected, you disconnect from the grid

Â and you don't have any power. And finally, living in terms of information, if people

Â are disconnected from other people there gonna be isolated in terms of information.

Â So they may not learn things, they may not figure things out. And then finally the

Â clustering coefficient. Class stream coefficient. Can be used to figure out how

Â redundant or how robust a network is. So if you pulled one person out. So think

Â about it, if you've got a little triangle here between A, B, and C. If A gets wiped

Â out for some reason, B and C are still connected. So, it gives you a sense of how

Â robust or how redundant the network is. This is also sometimes used as a measure

Â of social capital, cause it means there's these, these tight links between people.

Â And one last really interesting thing about The custom co-efficient can be used

Â to capture how likely an innovation is to be adopted, in particular possibly a new

Â word. So if I've got three people that are all connected, A, B and C. And A uses some

Â new word. And B hears it and then tells it to C. And then C uses the same new word,

Â and A hears it. You get this sort of autocatalytic set where the information

Â get passed through the loop, and it now seems part of the new normal. A says,

Â well, I said this, now I heard C say it, so this must be something that people say.

Â Whereas, if A wasn't connected to C, when A said something new, it could just sort

Â of fan out, and it may never come back to A. And so A may be less likely to maintain

Â the innovation. So these clusters [inaudible] creating feedbacks can allow

Â things to be maintained within a network. One last thing. [laugh] And I've used all

Â these measures there's still a sense that networks that a picture is worth a

Â thousand words. And one reason I think for the rise of networks so much is because of

Â the fact that we now can graph these networks in really interesting ways. So

Â let me show you the power of pictures. Here are three graphs. The first one is

Â the 50 US states, and there's an edge, those are the nodes. And there's an edge

Â if they're adjacent. The second one is a group of 48 universities. And then it

Â shows, there's a, an edge between them if they played football against each other,

Â [inaudible] American universities. And then the last one is an airline, so each

Â node is an airport. And [inaudible], a network edge is just gonna be whether the

Â airline flies from that one network to another one. Here are the statistics on

Â those three graphs. So graph one has a degree of 10.8, a path length of four.

Â Graph two has a degree of nine, a path length of six. Graph three has a degree of

Â ten and a pass length of four. So they're all about the same. So look at these

Â statistics. There's not a huge difference between the three networks. Let me show

Â you the pictures. Here's graph one. What do you think this is? Well, this is

Â clearly an airline network. There's a couple hubs here, and there's all these

Â cities that the airline flies into. Here's graph two, what is this? Well, this is the

Â 50 United States. Up here are Alaska and Hawaii, who are disconnected. And it

Â looks, in fact, a lot like the United States. And then here is graph three.

Â These are the football conferences, one, two, and three. And what you can see is

Â the teams all playing each other within conference, with an occasional game

Â outside the conference. So what we see is we see very different structures between

Â the airline network, the states. And the University's playing football, even though

Â the statistics, the first two statistics, degree and path link were about the same.

Â Now, where we would see a difference statistically is in clustering

Â coefficient. So, if we look at this graph, the clustering coefficient is going to be

Â very low, in fact it's going to be zero almost. There's very few triangles in this

Â graph. In graph two, there's a little bit of a clustering coefficient, but in graph

Â three the clustering coefficient is really high. So the sense in which, maybe instead

Â of saying a picture's worth a thousand words, we might want to say a picture's

Â worth at least three statistics, or maybe four statistics. Cuz you need quite a few

Â statistics. Statistics before you can really start distinguishing between

Â graphs. So degree won't be enough, path link won't be enough, but if you throw in

Â clustering coefficient, then maybe you start making more sense of the graph.

Â Okay, so that's structure. We now understand, we can think of networks as

Â having nodes, edges, degrees, path length, and connected in a [inaudible] cluster

Â coefficient. So this is a language for describing different network structures.

Â What we're gonna do next, is start with the logic which these networks form.

Â Alright? Thanks.

Â