Hi. In this lecture, I'm going to talk to you about small-world and scale-free networks. But before we start, I would like to mention a few things about modeling. As I said in the first lecture, a model is a theory that is consistent with the data. It's a coarse-grained abstraction of a system. It needs to be cognitively and mathematically tractable, so you can trace the variables in the model and understand its behavior. It needs to represent the structure and/or the dynamics of the real system. It needs to be predictive. And there is a famous saying by George Box: all models are wrong, but some are useful. The application of network modeling to complex systems started in 1736 with Euler, who used the newfound mathematical field of graph theory to solve the Seven Bridges of Königsberg problem. A lot of advancement in graph theory, and in the application of network analysis to complex systems, was made by two Hungarian mathematicians, Paul Erdős and Alfréd Rényi, in the late 1950s. What they did was study the properties of random networks. You can picture their famous random network model as placing some buttons on a table and then connecting those buttons with random links; they then tried to understand the properties of those graphs. One of those properties is that nodes in such graphs have an average connectivity: if you plot the number of connections per node against the frequency of nodes having that number of connections, the result follows a Poisson distribution, which is similar to the normal distribution, a bell-shaped curve. Erdős and Rényi were also famous for working on graph coloring problems, such as how many colors you would need to distinctly color the countries on a map. Then, in the late 1990s, two seminal publications really made a huge impact on this idea of applying network analysis to real-world complex systems.
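To make the button-and-link picture concrete, here is a minimal sketch of an Erdős–Rényi-style G(n, p) random graph in plain Python. This is my own toy illustration, not their original formulation, and the sizes and probability are just illustrative: every possible link is included independently with probability p.

```python
import random
from collections import Counter

def erdos_renyi(n, p, seed=0):
    """G(n, p) random graph: each of the n*(n-1)/2 possible links
    is present independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

# For large n with fixed mean degree n*p, the degree histogram
# approximates a Poisson distribution (a bell-shaped curve).
g = erdos_renyi(1000, 0.006)                 # mean degree around 6
degrees = [len(nbrs) for nbrs in g.values()]
histogram = Counter(degrees)                 # degree -> number of nodes
mean_degree = sum(degrees) / len(degrees)
```

Plotting `histogram` would show the bell-shaped, Poisson-like connectivity distribution described above.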
The first paper was published in Nature by Duncan Watts and Steven Strogatz. What they showed is that real-world systems follow a small-world topology. What does that mean? It actually has a very specific mathematical definition. Real-world networks are not random; they are not like the Erdős–Rényi model networks. They actually have a high clustering coefficient combined with a relatively short characteristic path length. Those are two network properties that measure the level of clustering and the average distance between pairs of nodes. Watts and Strogatz analyzed three complex systems abstracted as networks. One is the movie actor database, IMDB. They also analyzed the power grid system, which links transformers and generators, and the neuronal connectivity map of the worm C. elegans. This worm is the only organism whose entire neuronal connectivity has been mapped: it has 302 neurons, and we know exactly how they are connected. As you see, they compared the actual L, the characteristic path length, to the L expected for a random network of the same size, as well as the actual C, the clustering coefficient, to the C expected in a random network of the same size. So how do you compute the clustering coefficient and the characteristic path length? The clustering coefficient of a node is a measure of how many of its direct neighbors are connected to one another. In this particular example, the red node has four neighbors, and in the first case C = 1: all of the neighbors are connected. There are six possible connections among those four nodes, and in the example on the left, all six are present.
In the middle example, C = 1/2: only three out of the six possible connections are present. And in the last example the clustering coefficient is zero for that particular node: there are no connections between its neighbors. You can compute the clustering coefficient of every node in the network, and the average is the clustering coefficient for the entire network. The characteristic path length is the average of the shortest-path distances between all pairs of nodes. One of the things Watts and Strogatz did in the paper was create a model that can generate a network with the same small-world topology that real-world networks follow. In this model they started with a regular lattice and then gradually randomized the connections: taking a link and rewiring it somewhere else. If you carry out that process, starting with the regular lattice and going all the way to a random network, somewhere in the middle you get a small-world network. As they go through this process of randomizing links, from the regular lattice to the completely random network, they measure the clustering coefficient and the characteristic path length every few steps. What they found is that somewhere in the middle you still retain the high clustering while the path length drops very rapidly, and this explains, or fits, the topology of real-world networks. Just think of it as shortcuts in real-world networks that connect modules that are internally highly clustered. Then, in a paper published in Science in 1999, Barabási and Albert noticed that the connectivity distribution of real-world networks is not like the Poisson distribution reported by Erdős and Rényi in 1959. It actually follows a power law: the connectivity distribution has a power-law density histogram, and this is what is shown here.
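The two measures and the rewiring procedure can be sketched in plain Python. This is my own minimal implementation, not Watts and Strogatz's code; the network size and the rewiring probability are illustrative.

```python
import random
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of the possible links among v's neighbors that exist."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / (k * (k - 1) / 2)

def average_clustering(adj):
    """Network clustering coefficient C: average over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def characteristic_path_length(adj):
    """L: average shortest-path distance over all reachable node pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring_lattice(n, k):
    """Regular ring lattice: each node linked to its k nearest neighbors."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, p, seed=0):
    """Watts-Strogatz-style step: with probability p, move one end of
    each link to a randomly chosen node (no self-links, no duplicates)."""
    rng = random.Random(seed)
    for u, v in [(u, v) for u in adj for v in adj[u] if u < v]:
        if rng.random() < p:
            choices = [w for w in adj if w != u and w not in adj[u]]
            if choices:
                w = rng.choice(choices)
                adj[u].discard(v); adj[v].discard(u)
                adj[u].add(w); adj[w].add(u)
    return adj

regular = ring_lattice(200, 8)
small_world = rewire(ring_lattice(200, 8), 0.1)
# clustering stays high while the characteristic path length drops sharply
```

Comparing `average_clustering` and `characteristic_path_length` on `regular` versus `small_world` reproduces the effect described above: a few rewired shortcuts collapse the path length while leaving most of the local clustering intact.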
Then, like Watts and Strogatz, they also had a model that can generate those power-law-distributed networks, which they called scale-free networks. So let's look at how the model works. Instead of starting with all the nodes and all the links and gradually shuffling the links, the rich-get-richer network growth model, which can generate scale-free networks (networks with a power-law degree distribution), starts with very few nodes that are randomly connected. Here we have an example of five nodes with some random links connecting them. Then you start adding nodes and links to the system: you introduce nodes, growing the network in size until it reaches your desired size. The growth model has some basic rules for how each new node introduced to the network is connected to the other nodes. If we were only interested in generating a randomly connected network, we would just connect each new node at random and grow the network that way. With the rich-get-richer model, however, we connect each incoming node by assigning every existing node a probability of receiving the new link based on that node's existing connectivity degree. One of the things Barabási showed is the importance of hub nodes. Most of the nodes in the network have very few connections, either one or two. And then there are those highly connected nodes: although there are not many of them compared to the lowly connected nodes, there are far more of these hubs than you would find in a random network. Those hubs are important components of the network: if you destroy the hubs, the network falls apart faster. This was demonstrated in two papers published in Nature.
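The rich-get-richer growth described above can be sketched as follows. This is my own simplified implementation in the spirit of the Barabási–Albert model, not their code; the seed size and parameters are illustrative.

```python
import random
from collections import Counter

def rich_get_richer(n, m, seed=0):
    """Grow a network to n nodes; each new node attaches to m existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    core = m + 1
    adj = {v: set() for v in range(core)}
    for u in range(core):                 # small fully connected seed
        for v in range(u + 1, core):
            adj[u].add(v)
            adj[v].add(u)
    # each node appears in `stubs` once per link it has, so a uniform
    # draw from `stubs` is a degree-proportional (preferential) draw
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(core, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
        stubs.extend(targets)
        stubs.extend([new] * m)
    return adj

g = rich_get_richer(500, 2)
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
# most nodes keep the minimum degree m, while a few early hubs grow large
histogram = Counter(degrees)
```

Plotting `histogram` on log-log axes would show the heavy, power-law-like tail: many low-degree nodes and a small number of large hubs.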
What they showed is that when you knock down genes that are highly connected, the viability of the organism is much lower than if you had knocked down genes with few connections. In a review article that Barabási wrote many years ago, he compared the view of a random network to that of a scale-free network. One of the criticisms of the scale-free model is that when you look at many complex systems, you see those power law distributions a lot simply because complex systems are made of diverse components, diverse agents. Different types of agents have different properties, and when you compare a lot of things that are different, diverse, heterogeneous, you will get those power law distributions. So, the opponents of the scale-free model suggest, this is not necessarily a feature of the structure of the system by design, but just a consequence of looking at a very diverse system. If you compare people's heights, you will get a bell-shaped curve distribution. However, if you compare the heights and weights of all animals, you will probably get a power law, because you are comparing so many diverse things. So now I'm going to turn my camera around and show you what I'm looking at. This is the northern part of Manhattan in New York City, and you can see that there are many buildings of many different heights. In this particular case, if you plotted the distribution of the heights of those buildings, you would probably also obtain a power law distribution, because those buildings are very different. But if you picked only a specific type of building, say only the five-floor buildings, you would obtain a normal, bell-shaped curve distribution, and not necessarily a power law distribution.
One thing to keep in mind about the scale-free distribution is that there is sometimes a drop-off among the hubs: you cannot find infinitely large hubs. So although the power law curve would suggest that there are very large hubs, sometimes we see a drop in the size of the hubs, and this can be explained by two mechanisms. An interesting paper that came after the publications of Barabási and Albert suggested an explanation for that drop in the distribution of hubs in the tail of the scale-free distribution. Like the buildings we just looked at, this suggests that there could be some kind of limit on how tall a building can be. They modeled two types of mechanisms that can explain this drop of the tail in the hubs. One suggests that hub nodes age, and as they age they lose the capacity to accept new links in the rich-get-richer model. The other is that there is some kind of physical limit, a maximum physical capacity, for a hub to grow, and once it has reached that maximum physical capacity, the hub stops growing. We will talk more later about other mechanisms that can generate scale-free networks. [MUSIC]
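As a toy sketch of the second mechanism, the capacity limit, here is rich-get-richer growth where a saturated node stops accepting links. This is my own illustration, not the paper's actual model; the `cap` value and other parameters are invented for the example.

```python
import random

def capped_growth(n, m, cap, seed=0):
    """Rich-get-richer growth with saturation: a node that already has
    `cap` links no longer accepts new ones, truncating the hub tail."""
    rng = random.Random(seed)
    core = m + 1
    adj = {v: set() for v in range(core)}
    for u in range(core):                 # small fully connected seed
        for v in range(u + 1, core):
            adj[u].add(v)
            adj[v].add(u)
    for new in range(core, n):
        # only unsaturated nodes may receive the new links
        eligible = [v for v in adj if len(adj[v]) < cap]
        # degree-proportional weights (at least 1 to avoid zero weights)
        weights = [max(len(adj[v]), 1) for v in eligible]
        wanted = min(m, len(eligible))
        targets = set()
        while len(targets) < wanted:
            targets.add(rng.choices(eligible, weights=weights)[0])
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj

g = capped_growth(400, 2, cap=10)
max_degree = max(len(nbrs) for nbrs in g.values())   # never exceeds cap
```

Compared with the unconstrained model, the degree histogram here is cut off at `cap`: the power-law tail ends abruptly instead of extending to ever-larger hubs, mirroring the drop described above.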