Welcome back to the second module of the course. In this module, we will cover a number of basic principles and techniques of graph analytics. As we mentioned in the last module, the goal of graph analytics is to utilize the mathematical properties of data and provide efficient algorithmic solutions for large and complex graph-structured problems. After this module, you'll be able to identify the right class of techniques to apply to a graph analytics problem. To be more specific, in this module we will consider the mathematical and algorithmic aspects, and not so much the computing frameworks used to implement these methods. In Modules 3 and 4, we will look at two different kinds of computing platforms that are used for implementing the techniques discussed in this module.

Here is the lesson plan for the module. First, we'll discuss a few basic terms and their definitions, which we'll use for the rest of the course. Of course, they are not the only terms and concepts we will learn. As we go through each technique, we will add more terms and definitions to our vocabulary. Following these definitions, we will consider four categories of graph analytic procedures. The first, called Path Analytics, is centered around techniques whose primary objective involves traversing the nodes and edges of the graph. The second category inquires into and explores the connectivity pattern of graphs, where the term connectivity pattern refers to the structure and organization of the edges of the graph. The third category involves the discovery and behavior of communities, which are closely interacting entities in a network. The fourth category, termed Centrality Analytics, detects and characterizes significant nodes of a network with respect to a specific analysis problem.
Of course, there are many more types of graph analytic techniques than we'll cover in the course. We'll provide some additional reading material for those who are interested.

But we start by recapitulating our definition of a graph as a collection of vertices and edges, where the edges represent ordered pairs of nodes. While this mathematical definition is indeed correct, in practice it needs to be extended to include other information elements. Let us consider a single Tweet. As we have mentioned previously, a Tweet is a complex information object because it is a graph in itself, with several nodes and edges. But over and above the structure, a Tweet actually contains much more information inside the nodes and edges.

First, there are several kinds of nodes in a Tweet. For example, it has a Tweet node, a User node, a Media node, a URL node, a Hashtag node and so forth. This assignment of kinds or labels to nodes is often called node typing. Every graph application will have its own set of types, and it can assign one or more types to a node, but it is not mandatory for an application to use node types. Mathematically, we can extend our original definition with two more elements: the set of node types, and the mapping function that assigns types to nodes. Not all nodes need to have a type, but in many applications they do.

In addition to types, a node also has attributes and values. In our Tweet example, text is the name of an attribute that refers to the textual body of the Tweet, whose value is a character string written by the author of the Tweet. For a specific kind of data like a Tweet, one has a fixed set of attributes, as decided by Twitter. This collection of attributes is called a node schema. For a general graph, a node schema may have an arbitrary number of attribute-value pairs. We will revisit this in Module 3 when we discuss graph data models.
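To make node typing and node schemas a little more concrete, here is a minimal sketch in Python. The names used (Node, NODE_TYPES, TWEET_SCHEMA, conforms_to_schema) and the sample values are illustrative assumptions, not Twitter's actual data format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set

# Hypothetical set of node types, using the kinds named in the Tweet example.
NODE_TYPES: Set[str] = {"Tweet", "User", "Media", "URL", "Hashtag"}

# Hypothetical node schema: the attributes a Tweet node is expected to carry.
TWEET_SCHEMA: Set[str] = {"text"}


@dataclass
class Node:
    node_id: str
    node_type: Optional[str] = None  # typing is optional, so None is allowed
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject types that are not in the application's set of node types.
        if self.node_type is not None and self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.node_type}")


def conforms_to_schema(node: Node, schema: Set[str]) -> bool:
    """Return True if the node carries every attribute named in the schema."""
    return schema.issubset(node.attributes.keys())


tweet = Node("n1", "Tweet", {"text": "Graphs are everywhere"})
untyped = Node("n2")

# The mapping function from nodes to types, written as a plain dictionary.
type_of = {n.node_id: n.node_type for n in (tweet, untyped)}
```

In this sketch a node holds at most one type; an application that assigns several types to a node could store a set of types instead.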
Similarly, an edge of a graph has an edge type, also called an edge label. And just like a node, an edge may have an edge schema consisting of attribute-value pairs. Here, Interaction Type is an attribute in our biological network that describes the modality of interaction between a pair of genes. For the specific edge we have highlighted, the genes interact through biochemical activity because they are party to some biochemical process. Clearly, there are different kinds of interaction between these genes or proteins. That means an attribute called interaction type can have a set of possible values like physical, genetic and so on. This set of possible values is called the domain of the attribute.

Putting these elements back into our mathematical model, we get a more concrete specification of what a real-life graph would contain. We have already discussed edge types as well as node and edge properties. Take a minute to look through this again. Whenever you consider an application that needs graph analytics, the first task should be to determine the information model of the graph your application needs. It's always a good exercise to document the information model in terms of the elements described on the slide.

Let's see a little more on the topic of edge properties. Many applications encode different kinds of numeric knowledge into the edges of a graph in the form of edge weights. If we do not put weights in an adjacency matrix, an edge is just represented by placing a one in the appropriate cell. However, if we do use a weight, the weight value can be placed in the adjacency matrix to facilitate downstream computation, as we'll show in the next lesson. What do the weights mean? That depends on the application. Let's see some examples. The most obvious example is a road map, where the nodes are road intersections and the edges represent stretches of the street or highway between these intersections.
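The difference between an unweighted and a weighted adjacency matrix can be sketched in a few lines of Python. The tiny road map below, with three intersections and made-up segment lengths, is a hypothetical example, and treating every road as two-way is an assumption of the sketch.

```python
# Hypothetical road map: three intersections and a numeric weight for the
# road segment between each connected pair (values are made up).
nodes = ["A", "B", "C"]
index = {name: i for i, name in enumerate(nodes)}
segments = [("A", "B", 2.5), ("B", "C", 4.0)]

n = len(nodes)
unweighted = [[0] * n for _ in range(n)]    # 1 marks the presence of an edge
weighted = [[0.0] * n for _ in range(n)]    # the cell holds the edge weight

for src, dst, weight in segments:
    i, j = index[src], index[dst]
    # Roads are assumed to be two-way here, so both cells are filled.
    unweighted[i][j] = unweighted[j][i] = 1
    weighted[i][j] = weighted[j][i] = weight
```

Using 0 to mean "no edge" in the weighted matrix is a simplification; an application in which zero is a legitimate weight would need a different marker for a missing edge.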
The edge weight can represent the distance of a particular segment of the road. In a personal communication network, for example an email network, we can count the average number of emails per week sent from John to Jill and use it as a proxy for the strength of their connection. So more emails means a stronger connection. In a biological network, one often has to assess whether an interaction that can occur is actually likely to occur, given the concentration of the reactants, the chemical environment at the site of the reaction and so forth. This is represented as a weight that designates the likelihood of interaction. Finally, consider a knowledge network where nodes represent entities like people, places and events, and edges represent relationships like a person visited a place, or movie actor Tom is dating movie actress Kim. Now this kind of information may be important for some news media. However, if the information does not come from an authentic source, it is more prudent to put a certainty value on it. This certainty value may be treated as a weight on the edge.

Moving on from the information model of the graph, the structure of the graph itself often contains valuable insights. Many of the graph analytic techniques we will discuss in this section will consider these structural properties of graphs. One such structure is a loop, which is an edge from a node to itself. In the example here, you can see that a protein can interact with another protein of the same kind. Many other examples abound. People send emails to themselves. A road segment circles back to the same intersection. A website has a link to its own URL. The existence of loops, and the nodes that have such loops, can be very informative in some applications and can be problematic for other applications. Another structural property of a graph is the occurrence of multiple edges between the same node pair. Graphs with this feature are called multigraphs.
In this example, the two MAP kinase genes have five edges between them. Why multiple edges? It's because each edge has a different information content. In this case, these two genes can have five different types of interactions between them, where each interaction has a different value for the attribute interaction type. We see this all the time in human networks too. A person can be my spouse, a co-performer in music and my financial adviser, all at the same time. Many analytics algorithms are not natively designed for multigraphs, and often need a little customization to handle them. We will mention some of these customizations as we go forward and walk through the different kinds of analytics applications performed on graphs.
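As a rough illustration of loops and multiple edges, the following Python sketch stores a small multigraph as a list of edges and then finds its loops and its parallel edges. The gene names, the interaction values and the choice to treat edges as undirected are all assumptions made for the sketch. The last step shows one possible customization, namely collapsing parallel edges into a single edge that keeps the set of interaction types; it is not necessarily the one a given algorithm would need.

```python
from collections import Counter, defaultdict

# Each edge is (source, target, interaction_type). Gene names and
# interaction values are hypothetical placeholders.
edges = [
    ("geneA", "geneB", "physical"),
    ("geneA", "geneB", "genetic"),
    ("geneA", "geneB", "biochemical"),
    ("geneB", "geneC", "physical"),
    ("geneC", "geneC", "physical"),  # a loop: an edge from a node to itself
]

# Loops: edges whose two endpoints are the same node.
loops = [e for e in edges if e[0] == e[1]]

# Parallel edges: count edges per unordered node pair, ignoring loops.
pair_counts = Counter(frozenset((u, v)) for u, v, _ in edges if u != v)
multi_pairs = {pair: c for pair, c in pair_counts.items() if c > 1}

# One possible customization: collapse parallel edges into a single edge
# that keeps the set of interaction types as an attribute.
collapsed = defaultdict(set)
for u, v, kind in edges:
    collapsed[frozenset((u, v))].add(kind)
```

Here multi_pairs reports that the pair geneA and geneB is joined by three edges, and collapsed holds one entry per connected node pair (plus one for the loop), which is the form a simple-graph algorithm would expect.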