0:07

Hello. As we have seen before,

Â the degree of a node in an undirected graph is the number of neighbors that it has,

Â but sometimes we're not interested in that particular degree of a specific node.

Â We're interested in seeing how the degrees of

Â all the nodes are distributed across the whole network.

Â For example, here in this network,

Â we see that many of the nodes have degree two,

Â some have degree three,

Â and only one of them has a degree four,

Â and one has degree one.

Â We like to see how these are distributed over the network, and when we do,

Â we look at the network's degree distribution which is

Â the probability distribution of the degrees over the entire network.

Â If we let P(k),

Â where k is degree,

Â be the degree distribution of this network,

Â we'll find that P(k) has the following values: P(1) would be 1/9 because only one node,

Â node E, has degree one out of all nine nodes;

Â P(2) will be 4/9 because there are four nodes that have degree two over the nine nodes;

Â and then P(3) is 3/9 or one-third because three nodes have degree three.

Â We can plot the degree distribution of a network

Â using Network X in the following way: first,

Â we use the function degrees which returns a dictionary where

Â the keys are nodes and the values are the degree of the nodes;

Â and then we construct a list of sorted degrees (so this would

Â just have the degrees of all the nodes in the network in a sorted list);

Â then we construct a histogram that tells us

Â how many nodes of a particular degree we have;

Â and then we can just plot a bar plot of these histogram.

Â If we do that for this network,

Â this is what that histogram looks like.

Â We see that as we had seen before,

Â most of the nodes have degree two,

Â then some of them have a degree three,

Â and very few of them have degrees one and four.

Â This allows us to see how these degrees are distributed over the network.

Â Now, sometimes we're working with directed graphs instead of undirected graphs,

Â and then in that case,

Â we would be interested in the in-degree or the out-degree of the nodes.

Â Usually, we look at the in-degree, and in this case,

Â if we look at the in-degrees,

Â this is how they're distributed.

Â If we look at the in-degree distribution, say P_in(k),

Â that degree distribution would have these values: we have

Â P_in(0) is 2/9 because two nodes have degree zero,

Â P_in(1) is 4/9 because four of the nodes have in-degree one and so on.

Â We can also visualize the in-degree distribution in

Â the same way that we did for the degree distribution in the undirected graph,

Â but now we use the in-degree function rather than the degree function,

Â and everything else is the same,

Â and we can visualize the in-degree distribution of this network.

Â So one interesting question to ask is,

Â what does the degree distribution look like for real networks?

Â Let me show you three examples.

Â Here are the degree distributions of three different networks,

Â and let me tell you what the networks are: Network A is a network of

Â 225,000 actors connected when they appear in a movie together,

Â B is a network of the World Wide Web so it

Â has about 325,000 documents that are connected by URLs,

Â and then C is a network of

Â the US Power Grid so it's a network of

Â about 5,000 generators connected by transmission lines.

Â There are two things to notice about these degree distributions.

Â The first thing is that they're all in log-log scale,

Â meaning that the x-axis and the y-axis are both on log scale rather than linear scale.

Â The second thing to notice is that,

Â for at least part of these distributions,

Â they tend to look like straight lines for all three cases.

Â When you put those two things together when you have

Â a degree distribution on a log-log scale and it looks kind of like a straight line,

Â then we say that this degree distribution looks kind of like a power law.

Â A power law degree distribution would have the form

Â P(k) equals C times k to the negative alpha,

Â where C and alpha are constant,

Â and the alpha values for these three distributions that we have here are 2.3 for A,

Â 2.1 for B, and four for C. Okay,

Â these details are not so important,

Â at least for this course,

Â but the thing to know about power law degree distributions is

Â that they tend to have most of the nodes with very,

Â very small degree, but you have a few nodes that accumulate a very, very large degree.

Â For example, if we look at the degree distribution for Network A,

Â we have that most actors have a very small degree

Â so they have appeared in movies with a very small number of other actors,

Â but few actors actually appear in a movie with many, many actors.

Â Some appear in at least one movie with almost 1,000 actors.

Â You can notice that when you look at this degree distribution,

Â these actors have accumulated a degree that's very large, almost 1,000.

Â Whereas, most actors actually have a very small degree.

Â And that's typical of power law degree distributions.

Â One of the things we try to ask when we see something like this is,

Â what could explain this property that we observe happening in many different networks?

Â The way we try to answer this question is by coming up with models that

Â generate networks that make a few assumptions about how these networks get formed,

Â and then they give rise to whatever properties we observe.

Â So in this case, the question would be,

Â can we come up with a model that generates

Â a network that has a power law-like degree distribution?

Â One of the models that achieves this property is called a Preferential Attachment Model.

Â Let me tell you how the preferential attachment model works.

Â First, we start with just two nodes that are connected by an edge,

Â and then at each time step,

Â we're going to add a new node,

Â and the new node is going to connect to a single existing node.

Â The sort of special sauce in

Â the model comes when you decide which node to pick out of the existing nodes.

Â The way that these new node is going to pick an existing node to attach

Â to is going to be that it's going to choose it at random,

Â but it's going to choose it with probability proportional to the node's current degree.

Â So the probability of connecting to a node u that has a degree, k_u,

Â is going to be k_u divided by the sum of all the degrees of all the other nodes.

Â Let's run an example of this model.

Â Again, we start with these two nodes connected by an edge,

Â and then at each time step we're going to add a new node.

Â Node three is going to come in and is going to attach to a single node,

Â either node one and node two,

Â but it's going to do so with probability proportional to their degree.

Â But right now, node one and two both have degree of one,

Â so the probability of choosing one and two is 50/50 for node three.

Â Let's say, it chooses node two.

Â Now, node four comes in, but now,

Â node two has degree of two,

Â and nodes one and three have degree one,

Â and so the probability of attaching to node two is 0.5,

Â and the probability of attaching to nodes one or three is 0.25.

Â Node four attaches to node three,

Â which had a lower probability.

Â That could happen. So it attaches to node three.

Â Now, node five comes in,

Â and we have to recompute all the probabilities.

Â Well, now, nodes two and three both have degree two,

Â and nodes one and four have degree one.

Â So nodes two and three are going to have a higher probability of attaching,

Â and so that's 0.33 for nodes two and three,

Â and 0.17 for nodes one and four.

Â Let's say, five attaches to node two,

Â and we continue this node six,

Â again, we recompute the probabilities.

Â Now, node two has the highest degree,

Â so we will have the highest probability of getting that new attachment.

Â So six, let's say, attaches to two.

Â Now, seven comes in,

Â we recompute the probabilities.

Â Again, now, node two has an even higher degree so it has

Â an ever higher probability of getting that new node.

Â Second is node three that has a degree or two

Â with probability of getting that new node edge at 0.2,

Â and seven attaches to two.

Â Node eight comes in, we recompute the probabilities.

Â Node two probability becomes even larger,

Â but this node eight attaches to node three, and so on.

Â You can see how this continues.

Â The thing to notice here is that as node two started to get larger and larger degree,

Â its probability of getting a new edge became larger and larger as well.

Â There is this sort of rich get richer phenomenon,

Â where as the nodes get larger and larger degree,

Â they also start to become more and more likely to increase their degree.

Â What we can prove about this particular mechanism is that it gives rise to a power law.

Â It approaches a power law as the number of nodes gets larger, and larger, and larger.

Â So it matches the kind of degree distribution that we see in these real networks.

Â This type of modeling technique allows us to explain or at least have

Â some hypothesis for what kind of mechanism could give and

Â rise to this shape of the degree distribution that we observe.

Â For example, if we believe that

Â a very popular actor that has

Â appeared with many other actors in movies has a higher likelihood

Â of getting an additional actor to co-appear in

Â a movie than a maybe less popular actor

Â that has not appeared with many other actors in movies,

Â then this is the kind of mechanism that could be explaining

Â the sort of very skewed power law distribution that we observed.

Â In Network X, you can use the function,

Â barabasi_albert_graph, which is named after the researchers that came up with this model,

Â with input powers n and m,

Â where n is the number of nodes and m is

Â the number of new nodes that an arriving node would attach to.

Â In our example, the way we define it,

Â this m parameter would be one because we said that

Â every new node would attach to only a single existing node,

Â but you can generalize this and have it so that every node attaches to m existing nodes,

Â and that m will not change the fact that you still get a power law degree distribution.

Â You can use this function in Network X to create

Â networks that follow these Preferential Attachment Model.

Â Let's create one here with a million nodes and an m(1),

Â so every node attaches to a single existing node.

Â Now, let's plot the degree distribution in the same way that we

Â did before except that now I'm not using a bar plot,

Â I'm using a scatter plot,

Â and now I'm setting the scales,

Â the y-scale and x-scales to be logged so we can see

Â that straight line that gets formed on the log-log scale for the degree distribution.

Â Here it is. We see that it forms

Â the straight line that characterizes the power law degree distribution.

Â So in summary, we have the degree distribution of a graph is

Â the probability distribution of the degrees over the entire network,

Â and in many real networks that we see,

Â this degree distribution tends to look sort of like a power law

Â because it looks sort of like a straight line on a log-log scale.

Â We also discussed that models of networks generation allows to identify

Â mechanisms that give rise to patterns that we observe in real networks.

Â In this case, we observed that

Â many real networks tend to have these power law degree distribution,

Â and then we covered a model that gives

Â rise to this particular property which allows us to have

Â some insight about the kinds of things that could be

Â given rise to these properties in real networks.

Â In this case, this was the Preferential Attachment Model that produces

Â these networks with power law degree distribution.

Â In Network X, you can use this function,

Â barabasi_albert_graph to construct networks that have n nodes and where

Â every single node attaches to m

Â existing nodes following the Preferential Attachment Model.

Â That's all for today. I hope to see you next time.

Â