0:08

So for handling non-coordinate data, non-numerical data,

Â we're often more interested in the relationship between data

Â items than in the data items themselves.

Â And so we'll use a graph or

Â a network to represent the relationships between data items.

Â And so we'll talk here about different ways of specifying a graph, and

Â different attributes of graphs, and how graphs are represented computationally.

Â 0:36

So a graph just consists of nodes and edges.

Â In this case, we have four nodes and we have five edges connecting these nodes.

Â And so the nodes will represent a data item and

Â the edge will represent a relationship between two data items.

Â If we add this particular edge,

Â then every data item is going to be connected to every other data item.

Â We would call that a complete graph, or a clique, of these four nodes.

Â 1:05

Here on the left, I've got I've got a graph of four nodes connected by

Â five edges, and on the right I've got four nodes connected by five edges.

Â In fact, these two graphs are isomorphic.

Â They're the same graph, they're just laid out differently.

Â And on the right, I've got a planar graph, which means to nodes and

Â edges are laid out so that none of the edges cross each other.

Â And then I can speak of a face as being the region bounded

Â by a cycle of edges starting and ending at the same node.

Â And also, we would say that these are two different embeddings

Â of the same graph or at least these graphs are isomorphic.

Â If we think of this whole arrangement of eight nodes and

Â ten edges as a single graph,

Â then that would be one graph consisting of two connected components, but

Â the entire graph containing both of these sub graphs would be a disconnected graph.

Â 2:08

We can also have directed graphs.

Â So in an ordinary graph, or an undirected graph,

Â just has edges, but does not really suggest the direction of these edges.

Â We can have a directed graph by using arrows to indicate a direction, so

Â that we would have a connection from this node to this node, but

Â not necessarily a connection from this node back to this node.

Â And you can think of a graph as having a cycle, a directed graph having a cycle, so

Â this undirected graph, this ordinary graph has a cycle because these three nodes.

Â I can start at this node and I can follow the cycle to get back.

Â You can have a cyclic directed graph, because I can follow a cycle here.

Â This graph is acyclic, because there is no cycle.

Â Once I get to this node I can go to this node.

Â Once I get to this node, I can't go any place further.

Â So there's no way to get back to any of these nodes

Â by following these edges in the directions indicated.

Â You can also have trees.

Â 3:08

Any graph that's connected

Â that has one less edge than the number of nodes forms a tree.

Â It's minimally connected.

Â And you can think of having a parent node, but for an undirected graph,

Â any of these nodes could be the parent and you can think of multiple siblings.

Â Here's some graphs with many more edges and

Â you can see kind of the way they're laid out would imply a parent.

Â But I could lay these out in isomorphic methods,

Â equivalent methods, and another potent node might appear to be the parent.

Â 3:44

When you have directed graphs, the tree forms a hierarchy and

Â then you can have a parent.

Â So in this case, the child nodes, these child nodes point to their parents.

Â And so I've plotted their parents higher than the child nodes.

Â And then there is a clear root node.

Â This one parent that's the parent of all of these nodes.

Â You can also have a hierarchy that's not a tree.

Â In this case we have a parent relationship,

Â but this child node has two parents.

Â And so you've got a definite hierarchy here, but

Â it's not a tree because a tree would have each node would have a single parent node.

Â 4:25

There's also a relationship between

Â the number of edges each node has and the kind of graph you're looking at.

Â We talk about the degree of the node as being the number of edges

Â extending from a node.

Â Directed graphs would have a different in degree than an out degree.

Â The in degree would be the number of edges going into the node and

Â the out degree would be the number of edges leaving the node.

Â 4:49

And then these, the number of nodes you have of each degree,

Â if it follows this kind of fall off, it's called a social network.

Â A social network, you can think of this as a friend's network or

Â many natural

Â data relationships follow this power law, this social network power law.

Â In this case, this is the number of interactions of yeast proteins.

Â Each one of these nodes is a yeast protein, and

Â the edges represent interaction between these proteins and, in this case,

Â you have many nodes that have a few interactions that are connected to,

Â you know, one or two, you have one or two edges.

Â And then you have many fewer nodes that have high degree, that have a lot of edges

Â that are connected to a lot of other nodes by a single edge.

Â And so in this case, the number of nodes that have a certain degree

Â follows a power-law.

Â If the y is the number of nodes with that degree,

Â is equal to the degree to some power with some constant multiplied to it.

Â And you see this fall off happen quite often.

Â These graphs tend not to be plainer, they tend to be difficult to embed.

Â They also tend to be the most popular graphs that we encounter in real life.

Â 6:14

Finally, we're going to use an adjacency matrix to represent graphs.

Â And so in this case we have a graph that has four nodes, four data items.

Â One, two, three and four,

Â and then we're going to represent this graph using this matrix.

Â In this case, the Adjacency Matrix will have a one in row one,

Â column two, if there's an edge connecting node one to node two.

Â Likewise, it'll have a one in row two, column one,

Â because it connects node two to node one and so it will be symmetric.

Â Because this is a non-directed graph.

Â If it was a directed graph, then you would connect, if it was directed from

Â node one to node two by an edge leaving node one going to node two.

Â Then you would have a one in row one column two but

Â you would have a zero in row two column one.

Â So a directed graph would have asymmetric or

Â could have an asymmetric adjacency matrix but an ordinary graph,

Â a non-directed graph, would just have a symmetric adjacency matrix.

Â And these edges could also have weights,

Â in which case you might not just have zero or one, you might have a value here

Â to indicate how strong the connection is between nodes.

Â And the diagonal is usually zero, unless there's some relationship between a node

Â and itself, and you can represent that along the diagonal.

Â 7:46

So it might be good to review graphs and the different kinds of graphs,

Â the different representations of graphs, and

Â the different attributes of graphs to familiarize yourself with them,

Â because we'll be using those for the rest of this module.

Â [MUSIC]

Â