Hi everyone. Today we'll be looking at loading graphs using NetworkX. Graphs can come in a variety of different formats, and it can be a little tricky to know just how to read them into a NetworkX graph. First, let's take a look at this graph I created, G1, which we'll be loading in from various formats. NetworkX has some built-in functions for plotting graphs that we can use to visualize them if they aren't too large. Let's use one of them, draw_networkx, to quickly visualize our new graph. Now let's take a look at what this graph looks like in a few different file formats and how to read each of these.
The first format we're going to look at is called the adjacency list. The adjacency list format is useful for graphs without node or edge attributes. Let's take a look at what the file G_adjlist.txt looks like. This format consists of lines with node labels. The first label in each line is the source node, while subsequent labels correspond to connected target nodes. For example, looking at the first line of G_adjlist.txt, we can see Nodes 1, 2, 3, and 5 are all connected to Node 0. Note that each line only lists target nodes that haven't already appeared as source nodes in previous lines. For instance, in the second line of G_adjlist.txt, Node 0 is not included because the connection between Nodes 0 and 1 has already been accounted for. We can read in a graph in this format using NetworkX's read_adjlist function. We can also pass int to nodetype to make sure that nodes are read in as integers, instead of this function's default of strings. Looking at the edges, we can see that these match up with our initialization of G1 above.
The next format is called an adjacency matrix. Each element in the matrix indicates whether a pair of vertices is adjacent. For example, looking at the NumPy array G_mat, Node 0, corresponding to the first row of the array, is adjacent to Nodes 1, 2, 3, and 5.
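As a quick aside, the adjacency list step described above can be sketched roughly as follows. The file contents here are sample data chosen to be consistent with the description (Node 0 connected to Nodes 1, 2, 3, and 5, and Node 0 left out of the second line); the real G_adjlist.txt may differ, and the temporary file path is only for illustration.

```python
import os
import tempfile

import networkx as nx

# Sample adjacency list (an assumption, not the tutorial's actual file).
# Each line: source node first, then the target nodes it connects to.
adjlist_text = "0 1 2 3 5\n1 3 6\n2\n3 4\n4 5 7\n5 8\n6\n7\n8 9\n9\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "G_adjlist.txt")
    with open(path, "w") as f:
        f.write(adjlist_text)
    # nodetype=int converts node labels from the default strings to integers.
    G2 = nx.read_adjlist(path, nodetype=int)

print(sorted(G2.neighbors(0)))
```

Leaving out nodetype=int would give string labels such as '0' and '1', which is easy to miss when comparing against a graph built with integer nodes.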
If this were a multigraph, we would see numbers larger than one in this matrix, indicating the number of edges between a pair of nodes. To convert an adjacency matrix into a NetworkX graph, just pass it into nx.Graph. Looking at the edges, we can see these also match up with our previous graphs.
Next, let's take a look at the edgelist format. The edgelist format is useful for graphs with simple edge attributes and without node attributes. This format is not good for networks with isolated nodes. Looking at the G_edgelist file, the first two columns might look familiar. Those are the node pairs corresponding to edges of the graph from above. But notice in this format, we can have additional columns for edge attributes. For example, we can see the edge from Node 0 to Node 1 has a weight of four. To load this graph in, we can use the read_edgelist function. The data parameter expects a list of tuples with names and types for edge data. Looking at this graph's edges, we can see that we've successfully loaded this graph, including the weights. Note the type of the nodes in this example: they are strings instead of integers. Be careful, as this might be different than what you are expecting, since each of these functions may have different defaults.
NetworkX also lets you create graphs from Pandas DataFrames. This can be very useful if you need to perform any analysis or data munging on your network data. Our DataFrame will need to have a similar structure to the edgelist format. Then we can use the from_pandas_dataframe function, passing in the DataFrame, the source nodes, target nodes, and edge attributes. Taking a look at the edges, it looks like we've successfully loaded up that graph from the DataFrame.
In addition to these formats, NetworkX also has built-in functions to easily read in more structured formats, such as GML or JSON. Now let's look at a step-by-step example of reading in a more complex network and analyzing it.
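Before the longer example, here is a minimal sketch of the three loading steps just covered: adjacency matrix, edgelist, and DataFrame. The four-node data, the column names n1, n2, and weight, and the attribute name Weight are all made up for illustration and are not the tutorial's G1. The sketch also assumes a recent NetworkX release, where the DataFrame loader is called from_pandas_edgelist rather than from_pandas_dataframe.

```python
import io

import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical 4-node data for illustration; not the tutorial's G1.

# 1) Adjacency matrix: entry [i, j] is 1 when nodes i and j are adjacent.
G_mat = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])
G3 = nx.Graph(G_mat)

# 2) Edgelist: two node columns plus one extra column for an edge attribute.
#    An in-memory bytes buffer stands in for a file such as G_edgelist.txt.
edgelist_file = io.BytesIO(b"0 1 4\n0 2 3\n1 3 2\n")
G4 = nx.read_edgelist(edgelist_file, data=[("Weight", int)])

# 3) DataFrame with the same structure as the edgelist.
#    Assumes a recent NetworkX, where the function is from_pandas_edgelist.
G_df = pd.DataFrame({"n1": [0, 0, 1], "n2": [1, 2, 3], "weight": [4, 3, 2]})
G5 = nx.from_pandas_edgelist(G_df, "n1", "n2", edge_attr="weight")

print(list(G3.edges()))
print(list(G4.edges(data=True)))  # node labels here are strings
print(list(G5.edges(data=True)))
```

Note that G4 ends up with string node labels because read_edgelist was not given a nodetype, while G3 and G5 keep integer labels.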
We'll be working with a network of chess games in edgelist format. Let's look at the first five lines of chess_graph.txt. Each node corresponds to a chess player, and a directed edge represents a game, with the white player in the first column having an outgoing edge and the black player in the second column having an incoming edge. The weight of the edge in the third column represents the outcome: 1 for a white win, 0 for a draw, and -1 for a black win. The data file also includes a column of approximate timestamps in the fourth column for when the games were played.
To read in this network, we use the function read_edgelist, passing in the name of the file and the list of tuples with the edge attributes. We're also going to tell the function that we want to create the network using a MultiDiGraph. This is important because otherwise the function will use the default undirected graph, and direction and additional edges will be lost. Let's check to make sure our new graph is a directed multigraph. Good. Now let's look at the edges with attributes. Looks like we've successfully imported our chess network.
Now let's find out which player played the most games in our network by looking at the degree, or number of edges, of each node. The degree function returns a dictionary of the number of edges connected to each node. Note that this is a directed network, and the degree function returns the sum of both in-degree, the number of incoming edges, and out-degree, the number of outgoing edges. We can then find the maximum value in this dictionary and use a list comprehension to find the key that corresponds to the player with that max value. You can see that the player with the most games is Player 461, who has played 280 games.
Let's do one more example using Pandas to find the players that won the most games. First, we create a DataFrame using our graph's edge data.
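As an aside, the loading and degree steps described above can be sketched like this. The five sample rows are invented to match the column layout described (white player, black player, outcome, timestamp) and are not taken from chess_graph.txt. The attribute names outcome and timestamp are also assumptions. In recent NetworkX releases degree returns a view rather than a plain dictionary, so the sketch wraps it in dict().

```python
import io

import networkx as nx

# Invented sample rows: white player, black player, outcome, timestamp.
# These stand in for chess_graph.txt and are not the real data.
sample = io.BytesIO(
    b"1 2 1 100.0\n"
    b"1 3 0 101.0\n"
    b"2 1 -1 102.0\n"
    b"3 1 1 103.0\n"
    b"1 2 1 104.0\n"
)

# MultiDiGraph keeps edge direction and repeated games between the same pair.
chess = nx.read_edgelist(
    sample,
    data=[("outcome", int), ("timestamp", float)],
    create_using=nx.MultiDiGraph(),
)

# Total degree = in-degree + out-degree = games played by each player.
games_played = dict(chess.degree())
max_value = max(games_played.values())
top_players = [(p, n) for p, n in games_played.items() if n == max_value]
print(top_players)
```

With the default undirected Graph, the two games from player 1 to player 2 would collapse into a single edge, which is why create_using matters here.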
Looking at the DataFrame created, the outcome column has dictionaries with the edge attributes. Let's use map and a lambda function to pull out the outcome value from those dictionaries. Now let's find the number of times a player won as white by finding where outcome is 1, grouping by white, and summing all of those values. Similarly, to find the number of times a player won as black, we look at where outcome is -1, group by black, sum the values, and multiply by -1. To find the win count, we add up the number of times a player won as white and as black. We can use the add function and pass in a fill value of zero, which will set missing values to zero and let us find the number of games won for players that happen to only play as black or as white. Using nlargest on our win count DataFrame, we find the players that won the most games, with Player 330 having won the most games at 109.
That wraps it up for this tutorial. Hopefully this has given you some insight into the different ways a graph can be represented, as well as how to load different formats into NetworkX. Thanks for watching.
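For reference, the win-count steps described above can be sketched as follows. The small graph is made up for illustration and is not the real chess network. One adaptation of my own: the sketch selects the outcome column before summing, so the results are Series rather than DataFrames, which avoids summing the player-name columns in recent pandas versions.

```python
import networkx as nx
import pandas as pd

# Made-up games (white, black, outcome); not the real chess data.
chess = nx.MultiDiGraph()
chess.add_edge("1", "2", outcome=1)
chess.add_edge("1", "3", outcome=0)
chess.add_edge("2", "1", outcome=-1)
chess.add_edge("3", "1", outcome=1)
chess.add_edge("1", "2", outcome=1)
chess.add_edge("3", "2", outcome=-1)

# One row per game; the third column initially holds the attribute dict.
df = pd.DataFrame(list(chess.edges(data=True)),
                  columns=["white", "black", "outcome"])
df["outcome"] = df["outcome"].map(lambda attrs: attrs["outcome"])

# Wins as white: outcome is 1, grouped by the white player.
won_as_white = df[df["outcome"] == 1].groupby("white")["outcome"].sum()
# Wins as black: outcome is -1, grouped by the black player, sign flipped.
won_as_black = -df[df["outcome"] == -1].groupby("black")["outcome"].sum()

# fill_value=0 keeps players who only won as one color.
win_count = won_as_white.add(won_as_black, fill_value=0)
print(win_count.nlargest(2))
```

Without fill_value=0, a player who won only as white or only as black would end up with a missing value instead of a win count.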