0:00

So how do we quantify the importance of nodes?

One obvious choice is to use degree. For example, for this node with two edges coming in and three edges going out, we say the total degree is five: the in-degree is two, the out-degree is three. Dunbar's number is used by sociologists as an estimate of the largest number of friends an average person can keep in a social network, usually quoted at around 150.
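As a quick sketch of that bookkeeping, degrees can be tallied from a directed edge list. The edge list below is hypothetical, chosen to match the two-in, three-out node from the example:

```python
# Tally in-, out-, and total degree from a directed edge list.
# Hypothetical edges: node "a" has two edges coming in and three
# going out, so its total degree is five.
edges = [("b", "a"), ("c", "a"),              # two edges into "a"
         ("a", "d"), ("a", "e"), ("a", "f")]  # three edges out of "a"

in_deg, out_deg = {}, {}
for src, dst in edges:
    out_deg[src] = out_deg.get(src, 0) + 1
    in_deg[dst] = in_deg.get(dst, 0) + 1

total_deg = in_deg.get("a", 0) + out_deg.get("a", 0)
print(in_deg.get("a", 0), out_deg.get("a", 0), total_deg)  # 2 3 5
```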

But we also know that degree can be a misleading number for quantifying node importance. For example, we may want to say that a more important node pointing to you makes you more important. And that is exactly what we did in Lecture 3 with Google's ranking of webpages. There we constructed the matrix G, which is the connectivity matrix H plus rank-one matrices, viewed that matrix as a random walk on a graph, and then looked at the dominant eigenvector: the eigenvector associated with the largest eigenvalue, which in this case is one, of this matrix G. We then ranked webpages based on the entries of this vector. That's what we did with PageRank back then.

In general, this is the idea of eigenvector centrality. Centrality here is just a fancier name for the importance of nodes. The eigenvector idea says: take the dominant eigenvector (the eigenvector says a lot, not just the eigenvalue) of some meaningful matrix, such as the adjacency matrix A that we just defined. So, why would you want to do that?

One possible motivation comes from the following linear dynamical system interpretation. As in the homework problem, we can often model many useful quantities in the dynamics of functionality on a graph by looking at a vector x, with one entry per node, evolving over discrete time indexed by t: multiply an initialization vector x(0) on the left by the adjacency matrix A, applied t times, so that x(t) = A^t x(0). Now how do we understand this expression?

We know that we can decompose, that is represent, any given vector as a weighted sum of the eigenvectors of this matrix A. Call those eigenvectors v_i and the corresponding eigenvalues, scalars, lambda_i. We can represent any initialization vector as a linear combination, with weights c_i, of these eigenvectors: x(0) = sum_i c_i v_i. Substituting this expression back into the evolution, we see the following expression: A^t multiplying, on the left, the summation of the c_i v_i vectors. Again, the c_i are the weights that provide the linear combination of the v_i to give you this vector. Because this is a linear operation, we can move the matrix multiplication inside the summation.

Now, if you multiply an eigenvector of this matrix on the left by A, you just get lambda_i v_i, and you have done that t times. Therefore each term is just c_i times lambda_i raised to the power t, times the vector v_i. In other words, we have expressed the time-t vector x(t) as a linear combination of the eigenvectors of this matrix A, where the weights are c_i times lambda_i to the power t. So as time goes on...

Â 3:56

What dominates this summation is the term associated with lambda_1. So this becomes basically c_1 times lambda_1, the largest eigenvalue, raised to the power t, times v_1. In other words, x(t) for large t is approximated (up to scaling) by this v_1: the dominant eigenvector, the one associated with the largest eigenvalue of matrix A.
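A minimal sketch of this limit, on a small made-up undirected graph (a triangle with one pendant node): repeatedly applying A to an arbitrary x(0) and renormalizing drives x(t) toward the dominant eigenvector v_1.

```python
# Power iteration: x(t) = A^t x(0), rescaled each step, converges to the
# dominant eigenvector v_1 of A. The 4-node graph (a triangle 0-1-2 plus
# a pendant node 3 attached to node 2) is a made-up example.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

x = [1.0, 0.0, 0.0, 0.0]       # arbitrary initialization x(0)
for _ in range(50):
    x = matvec(A, x)
    m = max(x)
    x = [c / m for c in x]     # rescale so the largest entry is 1

# The entries of x are the eigenvector centralities: node 2, which sits
# in the triangle and also holds the pendant node, scores highest.
print([round(c, 3) for c in x])
```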

So we can take the dominant eigenvector of matrix A as the second notion of centrality.

The third notion of centrality is what's called closeness, and this takes a distance point of view. Suppose we've got two nodes, i and j, and there might be many paths connecting i and j. Let's say the distance between node i and node j is the length of the shortest one, denoted d_ij. Later, in Lectures 9 and 13, for social and technological networks respectively, we will look at how exactly we can compute these d_ij's, hopefully distributedly; searching for the shortest path is not an easy job. But for today, let's assume that somebody has already done that, so we can tabulate these d_ij's for all node pairs (i, j). Then, once we've got this list of all the d_ij's, we can do a few things. One is to pick the largest one, and we call that the diameter of this graph: it is the maximum d_ij across all node pairs.
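A sketch of that tabulation, assuming unweighted links, on a made-up 5-node path graph: a breadth-first search from each node gives every d_ij, and the diameter is their maximum.

```python
from collections import deque

# Tabulate all shortest-path distances d_ij by running a breadth-first
# search from every node, then take the maximum as the diameter.
# The 5-node path graph 1-2-3-4-5 is a made-up example.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def bfs_dist(src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

d = {i: bfs_dist(i) for i in adj}                  # d[i][j] is d_ij
diameter = max(d[i][j] for i in adj for j in adj)
print(diameter)  # 4: the distance between the two endpoints
```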

Â 6:25

That says the following. If I pick a node i here, fix i, and then look at all the other nodes j down there (and there are N minus one of them), I look at the average distance from me, node i, to all the others. If this number is small, then I am more important, because I have a very short distance to all the other nodes. If this number is big, I am less important. So, just to make the direction work out, let's take the reciprocal of this: (N - 1) over the summation of d_ij over all j, for a given fixed node i. This is what people call closeness centrality, denoted by C_i for this given node i. If you have a high C_i, it means that, on average, you have a very short shortest-path distance to all the other nodes. Now, we didn't have to use the average, the simple mean; we could use other kinds of means, for example the harmonic mean, and so on.
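The closeness formula can be sketched as follows, again on a made-up 5-node path graph; as expected, the middle node comes out more central than an endpoint.

```python
from collections import deque

# Closeness centrality C_i = (N - 1) / (sum over j of d_ij), with d_ij
# computed by breadth-first search. The 5-node path graph is made up.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def bfs_dist(src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

n = len(adj)
closeness = {i: (n - 1) / sum(bfs_dist(i).values()) for i in adj}
print(round(closeness[3], 3), round(closeness[1], 3))  # 0.667 0.4
```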

There is yet another centrality measure that's often used. In fact, there are quite a few others as well, including another one in the homework problem. But today we'll stop with this fourth notion of centrality, quantified by the notion of betweenness. What does that mean? This goes back to Granovetter's 1973 experiment, which says that if a node sits along the shortest paths of many other node pairs, then it has some strategic importance in the mix of the action. Okay? Any other node pairs, if they want to reach each other along an efficient path, have to pass through you.

Â 8:34

Okay. You are in between many node pairs; you are on many shortest paths. And this is the idea of betweenness. How do we quantify this? Let's first denote by g_st the total number of shortest paths between two nodes s and t. Then let's denote by n_st(i) the number of those shortest paths that node i sits on, for the pair (s, t). So clearly n_st(i) is less than or equal to g_st. Now this requires you to count, for each pair (s, t), the number of shortest paths. There might be only one, and node i might not be on it, in which case you have one for g_st and zero for n_st(i). So now we want to say: for each node pair (s, t), there may be many shortest paths, so you have to normalize by g_st. How many of them are you, node i, sitting on?

Â 9:51

Now, this expression would then have to be summed across s and t. If you sum across both the s and t indices, then we're actually double-counting all the node pairs, which is fine if you want to stick to that convention. But most books stick to the following convention: you just order all the nodes, and then you sum across all s and all t less than s in the ordering of the nodes. This will avoid double-counting node pairs; it's just a factor-of-two difference in the end. Now we call this summation:

Â 10:34

Okay. The sum of the fractions of the shortest paths that node i sits on, across all the other node pairs, is the betweenness of node i: B_i, for each given node i.
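A sketch of B_i on a made-up 5-node path graph. It uses a standard identity rather than enumerating paths: node i lies on a shortest s-t path exactly when d(s,i) + d(i,t) = d(s,t), and in that case n_st(i) is the number of shortest s-i paths times the number of shortest i-t paths.

```python
from collections import deque

# Betweenness B_i = sum over pairs s < t of n_st(i) / g_st, where g_st
# counts shortest s-t paths and n_st(i) counts those through node i.
# The 5-node path graph 1-2-3-4-5 is a made-up example.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def bfs(src):
    """Shortest distances and shortest-path counts from src."""
    dist, cnt = {src: 0}, {src: 1}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:                # first time v is reached
                dist[v] = dist[u] + 1
                cnt[v] = cnt[u]
                q.append(v)
            elif dist[v] == dist[u] + 1:     # another shortest route to v
                cnt[v] += cnt[u]
    return dist, cnt

info = {s: bfs(s) for s in adj}

def betweenness(i):
    b = 0.0
    for s in adj:
        for t in adj:
            if s < t and i != s and i != t:
                ds, cs = info[s]
                dt, ct = info[t]
                if ds[i] + dt[i] == ds[t]:   # i is on a shortest s-t path
                    b += cs[i] * ct[i] / cs[t]
    return b

print(betweenness(3), betweenness(1))  # 4.0 0.0: the middle node is pivotal
```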

So now we've got the four notions: degree, which we know is hardly a useful quantity in many instances; the eigenvector centrality, the dominant eigenvector of the adjacency matrix; the closeness centrality C_i for each node i; and the betweenness centrality B_i for each node i. So we've got four different metrics. Which one to use? Again, it depends on what you want to do. But first, let's illustrate this with a simple example. This is a network with just eight nodes, okay? We could compute the four different centrality measures for each of the eight nodes, but that would take too long.

Let's focus on just two nodes, nodes 1 and 2 here. If you just look at the graph, intuitively you say: well, node 1 has a degree of three, and node 2 has a degree of three, but they shouldn't be equally important. In particular, this is like a [inaudible] topology: the graph has a left side and a right side. Visually, I can see that there is an important link between the two, and node 1 is part of that link, whereas node 2 is not. So node 1 should be more important than node 2. So how do we quantify that notion? Clearly not by degree. Okay, well, let's then look at eigenvector centrality. That means we have to compute

Â 12:22

Matrix A here. What is matrix A? This is a graph that doesn't have directional links, so it will be a symmetric matrix. I'll just write down the first two rows here. Clearly, the first row says node 1 connects to nodes 2, 3, and 4, and that's it. So it's 0, 1, 1, 1, and then 0, 0, 0, 0. The second row says node 2 connects to nodes 1, 4, and 5: so it's 1, 0, 0, 1, 1, and then 0, 0, 0. You can fill out the rest, or you can take a look at the textbook. And now all we need to do is to compute.
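As a sketch of that bookkeeping, here is how the symmetric matrix is filled in from an edge list. Only the edges read out above are included, so rows 3 through 8 of the textbook's 8-node graph are left incomplete here.

```python
# Build a symmetric adjacency matrix A from an undirected edge list.
# Only the edges stated in the lecture are included (node 1 to 2, 3, 4;
# node 2 to 4, 5); the rest of the textbook's 8-node graph is not
# reproduced, so rows 3-8 below are incomplete.
n = 8
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (2, 5)]

A = [[0] * n for _ in range(n)]
for u, v in edges:           # undirected link: set both symmetric entries
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

print(A[0])  # row 1: [0, 1, 1, 1, 0, 0, 0, 0]
print(A[1])  # row 2: [1, 0, 0, 1, 1, 0, 0, 0]
```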
