Hello. Today we're going to talk about the small world phenomenon, which suggests that the world is small in the sense that we're all connected by very short paths between each other. But how can we measure these paths and how short are they really? In the 1960s, a researcher named Stanley Milgram was wondering about these very question, and he set out to create an experiment that would allow him to measure how long these paths are and where they're not, they really exist. The setup of the experiment was the following. He chose 296 people at random as starters all over the US and ask them to forward a letter to a target person. These target person was a broker in Boston, and the instructions for this starters was that they were to send the letter to the target if they knew him on a first-name basis. Of course, since these were randomly selected people, most of them didn't know this person on a first-name basis, so if they didn't know him, their job was to send a letter to someone else, along with instructions who they knew on a first name basis, was more likely to know the target on a first-name basis. The instructions contains some information about the target, like the City where he lived and he's occupation. Then what they did is, they check how many letters actually arrived at the destination and how many hops they took to arrive there. These were the results. Out of all 296 starters, 64 of them reached the target. The median chain length, the number of hops that it took for these letters to get there was six, which was consistent with this famous phrase of six degrees of separation. Here in this plot, we can see a histogram of the chain length for the letters that actually arrived, and we can see that they took anywhere from 1-10 hops to get there, but the median was six. The key points here are the following. First of all, a relatively large percentage of these letters actually arrived. If you think about the fact that these were randomly selected and that people could drop out of this and say, I'm not even going to try to forward this letter 21, and so despite all of these things, more than 20 percent of the letters actually reached the target. The second is that for the ones that reached, these paths were relatively short. Median of six, single digit in a network of millions and millions of people, that seems pretty small. The other thing that's interesting, although we were not going to focus too much on this part of it is that people are actually able to find these short paths. Not only do these paths exist, but somehow people are able to find them. Which is also interesting. Now more recently, people have tried to answer this question, but without actually running an experiment, instead looking at actual data and measuring how long these paths are. Of course in the 1960s we didn't have access to very large data sets of networks, but nowadays we do. One example of this is looking at an instant message communication among people. Researchers took a network of 240 million active users on Microsoft Instant Messenger, and then they connected them. If two users were engaged in a two-way communication over a period of a month. These defined a very large network. The estimated path length in this network, if you take any two people at random and check what the distance between them is in the network, the length of the shortest path, the median is estimated to be seven. Which is very close to what Milgram had found in the 1960s, which was six, and is also very small. Here is the histogram of the distances of this particular network. We again see that these tend to be small. People have also tried to do this using Facebook data. What they did is they looked at how this distance has changed over time and how they vary if you try to measure them on the full Facebook network versus just a subset of it in a particular region like in the United States. What they found is that if you've taken a global network, the average path length in 2008 was about 5.28. Three years later in 2011 it was 4.74. It seems like these path lengths are as short and they seem to be getting smaller over time. As you may expect, if you take a smaller region like just the United States, then these tend to be even smaller. What this tells us is that if we look at this real networks that have millions and millions of people, the average shortest path between pairs of nodes on average or median tends to be very small. That's the first property that I wanted to discuss today. The other property that I wanted to talk about is this clustering coefficient property, which we've talked about before. Remember, the local clustering coefficient of a node is the fraction of pairs of the node's friends who are friends with each other. This roughly talks about, are there lots of triangles or not, and when we take the average local clustering coefficient of all the nodes. When we look at real networks, for example, Facebook in 2011, we find that the average clustering coefficient tends to be pretty high. Here is a plot that has on the x-axis the degree of a node, and on the y-axis, we have the average clustering coefficient. We find that it decreases as degree increases. For people that have lots and lots of friends, their clustering coefficient tends to be smaller, but on average, it's still pretty large. If you look at nodes that have, say, 22-50 friends their clustering coefficient is somewhere in the 30s or so. In the Microsoft Instant Message network, the average clustering coefficient was 0.13, and in the actor network that we talked about before, the average clustering coefficient is even higher, at 0.78. The thing to note here, is that I say that these clustering coefficients are high because if you imagine that these graphs were completely random, meaning you take all of these nodes and whatever number of edges it has and you just assign them at random, then you would find that the average clustering coefficient would be much smaller because there's nothing that's making these triangles up here. So these clustering coefficients tend to be high. We think that there is something happening in the formation of the network that is favoring triangle formation. We've noticed two things: one, that social networks tend to have a high clustering coefficient, and two, that they tend to have small average path length. The next question is, can we think of a model that generates networks that have these two properties? Just like we did for the power-law degree distribution, we observe that the networks tended to have these power-law degree distributions, and then we covered a model that gives rise to these types of properties, and that allowed us to think about, well, maybe this model explains how these properties come about in the real data. Can we do the same thing here? Well, the first natural thing to do would be to check, well, we already have a model of network formation, the preferential attachment model, can we check if this particular model has networks with high clustering coefficient and small average path length? Let's create one of 1000 nodes in parameter m of four, that means that each new node is going to attach to four existing nodes, and then see what its average clustering is. Well, in this case is 0.02, which is pretty small compared to the networks that we had seen before. Now, for the average shortest path length, it is 4.16, which is pretty small. It's pretty distant. It seems that it gets one of the properties but not the other. Let's see what happens if we vary the number of nodes and the number of edges per new node, perimeter m, and let's see what happens through the path length and the clustering. On the x-axis, I have the number of nodes, and on the y axis I have the average clustering coefficient, and we have different curves that represent different values of the parameter m. Then I have the same thing for the average shortest path. What we observe is that these networks have a small average shortest path, and the reason why is that if you remember, these power-law degree distributions have the property that some nodes have a very high degree. These high degree nodes act as hubs that connect many pairs of nodes and make bridges between them. This is why we see that these average shortest paths tend to be small. However, for the clustering side, we see that as the number of nodes increases, the average clustering coefficient also decreases and it becomes very small. Even at 2,000 nodes, we see that the average clustering coefficient is very small, at 0.05 or so. We have seen networks of millions and millions of nodes that had a much larger clustering coefficient. It seems like the preferential attachment model, while it has the small average shortest path property, it fails to have this high clustering coefficient property. The reason is that there is no mechanism in the preferential attachment model that would favor triangle formation. So it's natural that it just doesn't have that property. Now, let's talk about a new model called the Small World Model that actually gives networks that have these two properties. First, let's discuss how the model actually works. You started with a ring of n nodes, so basically, you put n nodes in a circle, and then you have each node connected to its k nearest neighbors. So there's some parameter k to fix. You will have fix another parameter p between zero and one. What you're going to do is we're going to take each one of these edges that we created in the first step, and with probability p, we're going to rewire it. Let's say we're taking the edge UV. With probability p, we're going to find another node w completely at random, and we're going to rewire it, so that the edge UV becomes UW. You do this for each one of the edges and you're done. Let's walk through an example here. We have twelve nodes, and in the example k will be 2, each node is connected to its two nearest neighbors, and p will be 0.4. I will say that these parameters, k equals 2, p equals 0.4 are not the typical parameters you would use for this particular model. Typically you have a k that's much larger, and a p this much smaller. But just for the purposes of illustration I'm showing you here, I'm going to choose these parameters. What we have to do is we're going to have to go through every edge and decide whether we're going to rewire it or not. We're going to rewire it with probability 0.4. Let's start with the edge LA, and we have to decide whether to rewire not. Let's say we use some type of random number generator, and we decide that we don't rewire it. Then go to the next one, and this one we won't rewire either. We go to the next one and this one we will rewire. Now we have to pick another node at random. In this case we pick G, and so these edge instead of connecting K and J, is going to connect K and G. We go to the next edge. J I, and this one, we're not going to rewire. The next edge, we will rewire. We pick a node at random and rewire it. Go to the next edge. We don't rewire it. Go to the next edge. We don't rewire it. Go to the next one, and we rewire it. Becomes FJ. Go to the next one, no rewire. Go to the next one, rewire. It becomes DH. Go to the next one, we rewire it, and go to the next one, we don't rewire it, and we're done. This is the network that these model produces, or at least one of the possible instances of network that it can produce with these parameters. Now let's think about what happens to this network as we change this parameter p, which ends up being most interesting parameter of the network. Well, here's an illustration from the original paper that came up with this model. Let's think about, first of all, what happens in the extreme cases. When p is 0, so we're looking at this type of network. What we have is a completely regular lattice, so there is no rewiring, and because every node is connected to k of its neighbors, then there are lots of triangles that get formed locally. It depends on the value of k, but if k is large enough, then you start to form many triangles. This network will have pretty high clustering coefficient, because it purposely forms triangles in the beginning, and then nothing gets rewired, nothing gets changed, so it has pretty high clustering coefficient. However, if you imagine this being a very large network, you can imagine that the distances between nodes can get very large. If you're in one side of the ring, to get to the other side of the ring, you can hop a little bit, but there's no long bridge that can take you there. We expect that the distances would be long. Now, let's think of the other extreme where we're going to rewire every single one of the edges. That would be this network. What's happening here is that we're going to rewire every single edge. This network is now completely random. We've created a bunch of long bridges, and presumably, distances are pretty small between different nodes. But now we kind of destroyed the local structure that we had before, and now we probably don't have many triangles there. While the distances are small, now the clustering also got very small. If you're in between, if p is between 0 and 1, then what you have is that some edges get rewired, you create some long bridges, and the average distance gets reduced. But the local structure, depending on p, can be maintained. You can maintain that high clustering as well. There's a sweet spot for the parameter p where you can maintain both the short distances as well as the high clustering. Let's see that next. What does the average clustering coefficient and shortest path of a small wire network. Well, it depends on the parameters k and p. Here, I'm showing two plots where on the x-axis we have the value of p, and on the y-axis we have the average shortest path and the average clustering. Then we have curves for networks that have different parameters k. What we see is that as p increases from 0 to 0.1, and notice here that we don't get anywhere close to p equals 1. So this is stained with very small values of p. What we see happening is that the average shortest path decreases rapidly right after, so our p is away from zero, it just drops down. Whereas the average clustering coefficient, while it also decreases as p increases, it decreases much slower. For example, an instance of a network with 1,000 nodes and k equals six and p equals 0.04 has the following values, it has a value of 8.99 average shortest path and 0.53 average clustering coefficient. For these types of values of p, we achieve both of the properties that we wanted. The average shortest path being small, single digit, and the average clustering coefficient being pretty large. In NetworkX, you can use the function watts_strogatz_graph, again, named after the researchers who came up with this model and with parameters n, k, and p. It returns a small world network with n nodes that starts with a ring lattice where each node is connected to its k-nearest neighbors, and it has rewiring probability p. Wondering about the degree distribution of the small world network, what does it look like? Well, let's use the same code that we used before to visualize the degree distributions of network. This time, using a small world network with 1,000 nodes, k equals 6, and p equals 0.04. All we see is that it looks like this. Most nodes have degree 6, few of them have five and seven, and I think maybe one or very small number of them as degree 4 and 8. That's about it. This makes sense because, well, the rewiring probability is very small, so most of the edges aren't going to be rewired. Most of the nodes are going to stay with their degree of six that they had in the beginning when the ring was first formed. Because there is no mechanism that makes some nodes accumulate a very large degree, then none of them do. While these model gets the two properties that we discussed, the clustering coefficient and the shortest path, it doesn't get the power law degree distribution that we also observed in real networks, and the preferential attachment model gets correctly. There are certain variants of this small world network in NetworkX that you can use. The first one addresses an issue that small world networks in general can become disconnected. There is nothing that makes them so that they necessarily are in a single component. If you rewire too many of the edges, you could end up with more than one connected component. This is sometimes something we don't want. Sometimes you want to create a network that is connected where every node can reach every other node. If that's what we need, then we can use the function connected watts_strogatz_graph, which has the same parameters except that it has an additional parameter t. What it does is it runs the standard watts_strogatz_graph model up to t times until it returns a connected small world network. So it keeps trying different iterations of the model until it gets one that's connected. Unless it runs out of tries, which, if he tries more than t times, and it returns some type of error. There's also the newman_watts_strogatz_graph, which is very similar to what small world model. But instead of rewiring the edges when it's time to rewire an edge, it actually adds a new edge and leaves the other one still in the network, and still does this with probability p. Instead of rewiring, it adds additional edges. In summary, today, we looked at real networks and we found that they appear to have a small shortest path and that they tend to have high clustering coefficient. We found that there's preferential attachment model that we talked about last time produces networks that have small shortest paths, but they also have a very small clustering coefficient. Then when we talked about a new model, the small world model, that starts with a ring lattice with nodes connected to the k nearest neighbors, so it starts out with very high clustering, but then it rewires some of the edges with probability p. We find that for some values of p, namely for very small values of p, the small world networks have a small average shortest path because new long edges are added to the network, and those create bridges for nodes to reach each other, and they still maintain these high clustering coefficient, which matches what we observed in the real networks that we looked at. However, the degree distribution of a small world network is not a power law as we had also seen in real data. In our NetworkX, you can use the function watts_strogatz_graph or some of the other variants that we covered to produce small world networks. That's all for today, I hope to see you next time.