Hello and welcome to the fifth lecture in this series of video lectures created by the Nanyang Technological University in Singapore for a MOOC on complexity science. I'm Steve Lansing, director of the Complexity Institute. In this lecture, we'll learn about complex networks, and how they can be understood as coarse grained models of complex systems. We'll also learn the basic terminologies of complex networks, what can be measured on them, and why they're important. Finally, we'll learn the most important models for generating complex networks. To get an idea of how these networks could have emerged in the real world. First, let's consider the famous Lorenz Equation from the study of nonlinear dynamics and chaos. This equation is famous, because if we numerically simulate the equation long enough, the trajectory x, y, z will settled into a strange attractor. That's neither a fixed point, nor limit cycle. The strange attractor looks like the wings of a butterfly. And as such, it's known as the Lorenz butterfly. Although there's no rigorous mathematical proof, complex systems frequently have many competing attractor sets that appear is stable states. And for which attractions between them are known as regime shifts. Therefore, the Lorenz equation can be thought of as a very simple complex system. Well the key mathematical feature of the Lorenz equation, is its non linearity. That is, the rates of change of y and z, depend on the product x, z and x y. Another important feature that leads to complex systems like behavior, is the fact that the rates of change depends selectively on some variables, but not all variables. For instance, the rate of change of x increases as y increases, but decreases as x increases. Therefore, if we draw three circles and label them as x y and z, we would draw a black arrow from y to x. Indicating that dx over dt increases as y increases. And a red arrow, there we go from x to itself, indicating that dx over dt decreases as x increases. We can do something similar for the other two equations. For example, drawing a black arrow from x to y To indicate that dy over dt increases as x increases. When we're done, we end up with a figure with three labeled circles, two black arrows, and four red arrows. In some sense, this represents the logic of the Lorenz Equation. In general, we can draw the underlying logic of a complex system in the form of a complex network, like the one shown here. The value of such a representation is that we're able to understand a lot about the behavior of the complex system, even if the equations are not known. And we only understand the relationships between the variables. Now let's talk about some definitions. A figure like the one shown here, consisting of labeled circles and lines or arrows connecting the circles is called a network or a graph. The labeled circles representing variables in the complex system are called nodes or vertices. Arrows joining one variable to another are called links or edges. Links and edges can be directed like that, with arrows indicating which nodes they start from, and which nodes they end at. Or they can be undirected, where there's no need to distinguish between which nodes they start from, or end at. In general, a network or a graph should only have one kind of link or edge. Otherwise we would call the resulting mathematical structure, a multi graph. If an edge goes from a node x, back to node x, in the link represents some sort of self interaction of x with itself. A network that has only undirected links is called an undirected network. Whereas a network that has only directed links is called a directed network. Sometimes we also encounter situations where the nodes come in two types, A and B. Nodes have type A are only connected to nodes of type B and vice versa. For example, in a heterosexual dating network, boys only date girls, and girls only date boys. Such networks are called bipartite networks. There are also other ways to classify networks. For example, consider the undirected network on the left. This network is said to be non-planar, because it cannot be drawn on a piece of paper without any of its links not crossing another. So here you have links crossing each other, requires another dimension. In the middle, we have a sub graph of the undirected network. That is, we draw some links, but not others. To draw a network that has branches, but no loops, and network like this is called a tree. Tree graphs can occur naturally, or they can be sub graphs reduced from a more complex network. Frequently, when we extract a tree subgraph from a complex network, we do so by choosing the most important links. Finally on the right, we have a planar graph, a network created by dropping only one link from the network on the left, this link. The resulting network has the property that it can be drawn on a single sheet of paper without any crossing links. And that's the reason why it's called a planar graph. It fits on a graph. Finally, we can introduce weights to the links of a network like this. On the left, we see an unweighted network where the links are equally important. On the right, we see a weighted network. The thicker the link, the more important it is. And there are many ways to weigh a link. If a link represents an interaction like friendship, the weight can be the number of times the two nodes interact with each other. These interactions can include face to face meetings, or emails or text messages to each other. In the case of messages, we can distinguish between the weights for different directions. For example, A sends 200 text messages to B, then B sends back only 10 messages. If this difference is important, we can draw a directed link from A to B, with a weight equals 200, and a different directed link from B to A, where the weight equals 10. Okay, now that we've familiarized ourselves with our network terminologies, let's talk about what we can measure on complex networks. There are two main classes of quantities that we can measure. The first is associated with the nodes, and these include the degree, clustering, and assortativity. The second is associated with paths between links and these include path length, diameter, and betweenness. Let's go through them one by one. To begin, let's consider the network shown on the left side of the slide. For each of the nodes on this network, let's count the number of neighbors that it has. This is called the degree of the node. If node j is linked to note i, we say that no j is a neighbor of node i. As we can see, we find two nodes with degree 5. These two nodes have five neighbors each. We also find three nodes with degree 4, each of which has four neighbors. Moving on, we find one node with a single degree and so forth. Then we can condense all this information into a histogram called the Degree Distribution. To characterize the histogram, we can report such statistics as the minimum degree, which is 1, the maximum degree which is 5, and the average degree which is the sum of the degrees divided by the number of nodes. For this network, the average degree is 3.1. For directed networks we can further distinguish between in degree and out degree of the nodes. The in degree of a node is the number of directed links coming into the node. Whereas the out degree of a node is the number of directed links leaving the node. For example, for the same network that we've been using, but converted into a directed graph, we see that on this node at the top, there's one incoming link and three outgoing links. Therefore, it's in degree is 1, and it's out degree is 3. On the other hand, for this note on the bottom left of the network, we find one incoming link and one outgoing link. Therefore it's in degree is 1 and so, is it out degree. Finally, for the node on the top right, We find three incoming links and two outgoing links. Therefore the in degree is 3, and the out degree is 2. The next nodal property that we'll consider is the Clustering Coefficient. By definition, the clustering coefficient of node i is the number of neighbors of i that are also neighbors of each other, divided by the maximum number of mutual neighbors that i can have. To better understand what it means to have a high clustering coefficient, or low one, let's consider this network where the central node i has 8 neighbors. If these eight neighbors are also neighbors of each other, then by themselves they should form the network shown on the right, where the number of links between them would be 8(8-1) divided by 2 which is 28. The maximum number of mutual neighbors that note i can have is thus 28. But in reality, none of the neighbors of i are neighbors of each other. Therefore the clustering coefficient of node i is 0 divided by 28, which is 0. This tells us that the neighborhood of node i is completely fragmented. There are no connections. On the other hand, a clustering coefficient of 1 means that the neighborhood of node i is fully connected, and thus note i is nested in a tight knit community. Now let's compute the clustering coefficients, for nodes on the network example that we've been using all along. First, let's consider the node that's circled in red at the top of the network. This node has four neighbors circled in green, and there they are. The neighbors of the red node. For the four nodes, the maximum number of mutual neighbors is 4 times 4 minus 1 divided by 2, which is 6. Within the network, we find on the two links within the four node network. Therefore the clustering coefficient of the node circled in red, is c equals 2 divided by 6, which equals 1/3. Now let's circle the node in red towards the bottom in the network. This node has 5 neighbors circled in green. For 5 nodes, the maximum number of mutual neighbors is 5 times 5 minus 1 divided by 2, which is 10. Within the network, we can find only one link between the nodes, that one. Therefore, the clustering coefficient of the nodes circled in red is c = 1/10 equals 0.1. If we go through all the nodes in this example network, we'll find the clustering coefficients inside each node. Clearly, there's a distribution of values for the clustering coefficients. For large networks, clearly we cannot report every single clustering coefficient. Therefore in practice, we report the average clustering coefficient, which for this network is 0.188. The last nodal property that we'd like to introduce is the Assortativity. Unlike the clustering coefficient, this is defined globally over the network as a degree degree correlation. To see how this works again, we return to our example network. For this example network, the range of degree values is from 1 to 5. There's one node with degree 1, and it's linked to a node with degree 5. There are four nodes with degree 2. One of these is linked to a node with degree 3, 4 to nodes with a degree 4, and 3 to nodes with degree 5. So that's the distribution of connections from those nodes at degree 2. Similarly, there's only one node with degree 3. The link to the node with degree 2, has already been accounted for. So we need only take note of the links to nodes with degrees 4 and 5. There are three nodes with degree 4. Four of these links to nodes the degree 2, have been accounted for. One of these links to a node with degree 3 has also been accounted for. We therefore note two links to nodes of degree 4, and 3 links to nodes of degree 5. Finally, there is one additional link, Between 2 degree 5 nodes that we need to account for. To compute r, the degree degree correlation, we need to subtract the average degree 3.1 from all the degree values. And so the contribution of the single occurrence of degree 1 linked to degree 5 becomes 1 times 2.1 times 1.9. Similarly, the contribution of the single occurrence of degree 2, linked to degree 3 becomes 1 times minus 1.1 times minus 0.1. While the threefold occurrence of degree 2, linked to degree 4 becomes 3 times minus 1.1 times 0.9. Finally, if we work out all the contributions and average them, we end up with r equals minus 0.095. And that work is said to be assortative, if r is positive, meaning that large degree nodes, tend to be linked to large degree nodes and small degree nodes, tend to be linked to small degree nodes. It's disassortative if r is negative, which means that large degree nodes tend to be linked to small degree nodes, and vice versa. For our example, r is slightly negative, meaning that it's only slightly just assortative. In the next section, we'll explore path based measures of a complex network. First, let's understand the concept of a path. Suppose we've identified one of the nodes as node i, and another as no j. We then asked how can we go from node i to node j by hopping along the links in the network. So here's one way to do it. Which takes us 4 hops. Here's another way to get from node i, to node j, which takes only 3 hops. And here's our very long winded way to get there. Which takes us 6 hops. Because there may be more than one path going from node i to node j, we defined the path length or god zikr from node i to node j to be the shortest path Between i and j. When we switch to a different pair of nodes, the path link can be different. To characterize the network as a whole, we calculate the average path link. This is the average number of hops to get from one node to another node on the network. And it's an indication of the connectedness of the network. If the network is well connected, the average path length is small. On the other hand, if the network is poorly connected, the average path length would be large. Instead of using the average path link, we can also use a more direct measure of the size of the network. This quantity is called the diameter of the network, defined as the maximum path link over the network. For a complex network, there are other interesting path based quantities that we can measure. For example, if we compile the shortest paths between all pairs of nodes i n j in the network, we may find that a large number of paths pass through a certain node k. We may then say that node k lies between many pairs of nodes, and thus it has a high betweenness. Going back to our favorite example network, let's see how the betweenness of a node can be computed. First given 11 nodes, there are 11 times 11 minus 1 divided by 2, which equals 55 shortest paths. For the nodes circled in red, we can see that there are 4 shortest paths that pass through it. Therefore the betweenness of this node is 4 divided by 55. In the following video, you'll hear Michael Lease explained the terminologies associated with networks, and also how to compute the various node based or path based quantities.