0:00

. As we'll see, in little bit later in the

Â lecture the way that we're going to set up our graph processing algorithms, is to

Â develop an API to cover our representation of the graph and provide simple set of

Â methods for clients to call to process the graph.

Â So, let's take a look at that in detail. So, the idea is we have to represent, a,

Â graph within the computer. One of the first things, to remember is

Â that, you can, draw a graph, that maybe, that provides some kind of intuition,

Â about the structure, an, but you could have two drawings that represent,

Â represent the same graph that look pretty different.

Â And so one of the things to remember in any graph representation is, it can give

Â you some intuition. But that intuition may be misleading.

Â And we'll just remember that as we look at different representations.

Â So first thing is how to represent the vertices.

Â So what we're going to do in this lecture is just use integers between zero and V

Â minus one. The reason we do that is it allows us, to

Â use vertex indexed arrays, within the graph representation.

Â And we understand, from earlier algorithms, lectures, that we can use a

Â symbol table to convert names to integers, with a symbol table.

Â And so, we'll, leave that part as a symbol table application.

Â And, just work with graphs with vertex names between zero and V minus one where v

Â is the number of vertices. Now we have to remember when we're doing

Â representation we can have various anomalies in the graph.

Â So we can we draw edges, but actually, in real data we might have multiple edges or

Â we might have a self loop. And we'll take that into account when we

Â look at the representation, graph representation.

Â So here is the API that we're going to use.

Â So again our graph processing algorithms are going to be clients of this API.

Â The idea is that will use this to build graphs.

Â And then our processing programs will be client at this program,

Â In this, of this API. And the idea is most of them have a very

Â simple operations that they need to do, and those are the one's that we put in the

Â API. So we have two constructors.

Â One that creates an empty graph would be vertices.

Â Another one that creates a graph from an input stream.

Â Then the basic operation for building a graph is just add edge, so that adds an

Â edge connecting two vertices, V and W. The basic operations for processing our

Â graph, well there's V and E that gives a number of edges and vertices.

Â But then there's an iterator that takes the vertex's argument,

Â Then iterates through the vertices adjacent to that vertex.

Â All of our graph processing can be cast in terms of this iterator.

Â So down here is an example of a client program that prints out every edge in the

Â graph. So we created an input stream, maybe with

Â a given final name, create a graph from that input stream,

Â And we'll look at the code for that. And then the client program for every

Â vertex, So remember the vertices are numbered

Â between zero and G dot the number of vertices minus one.

Â So for every vertex, we iterate through all the vertices adjacent to that vertex

Â and print out an edge, if, if there's an edge connecting v and w we print out v and

Â then a little dash to indicate an edge and then w.

Â This actually prints out every edge twice and in an undirected graph because if the

Â v and w are connected by an edge then w appears in v's adjacency list and v

Â appears in w's adjacency list. So heres an example of a running that

Â client, if, if we have a file tinyG.txt, our standard is we have a number of

Â vertices as an integer in the first is the first integer in the file.

Â Number of edges is the second integer in the file and then pairs of vertex names.

Â 4:55

And so. The constructor will read those two things

Â and then call that edge for all of these pairs of things and that enables this

Â client, to, if we run it for that graph, to print out all the edges.

Â So everybody adjacent to zero, 61 and five, everybody adjacent to one is just

Â zero, So you notice the edge 0-1 appears twice in the list.

Â So that's the sample client of our basic graph in API.

Â And so here's some typical and simpicle, simple graph processing code that uses the

Â API. So you can write a static method that

Â takes a graph and a vertex as argument. And returns the degree, the number of

Â edges that are connect, number of vertices that are connected by an edge to V.

Â So all it does is set a local variable degree to zero, And then iterate through

Â all the vertices adjacent to V and increment that and return it.

Â Similarly, you can compute the maximum degree of a vertex in a graph.

Â And that's for every vertex, compute the degree and find the biggest one or the

Â average degree. Well, the average degree, if you think

Â about it, it's just twice the number of edges divided by the vertex or you could

Â go through all the way through every vertex and edge and compute the total and

Â divide, but this is probably a much more efficient way to do it or say number of

Â self loops. and so that involves going through the whole graph of every vertex

Â for every edge adjacent to that vertex you check whether it's v and we've got a self

Â loop. And if it does then your return the number

Â of self loops that divided by two because every edge is counted twice.

Â So those are examples of static methods that a client might use.

Â In just example of the use of the ATI. So now how we going to implement that?

Â That's our usual standard of lets look at some clients and now let's talk about a

Â representation that we can use to implement the graph API.

Â So one possible representation is, set of edges representation, where, for every

Â edge, we just maintain a list, maybe an array of edges or a linked list, of edges.

Â So for every edge in the graph, there's, a representation of it.

Â And that one is a possible representation, but it leads to, inefficient,

Â 7:49

implementation, much less efficient, that would make it unusable for, the huge

Â graphs that we see in practice. Another one is called the adjacency matrix

Â representation, Here we maintain a 2-dimensional v x v

Â array, It's a boolean array, 0-1 or true or

Â false. And for every edge v-w in the graph you

Â put true for row v in column w and for row w in column v.

Â So it's actually two representations of each edge in an adjacency matrix graph

Â representation. S, you can immediately, give a v and w

Â test whether there's an edge connecting v and w.

Â But that's one of the few operations that's efficient with this representation

Â and it's not very widely used, because for a huge graph, say with billions of, of,

Â vertices, You would have to have billion squared

Â number of entries in this array, which is going to be too big for your computer,

Â most likely. So this one actually isn't that widely

Â used. A, The one that is most widely used in

Â practice, and the one that we'll stick with, is called the adjacency list

Â representation. And that's where we keep a vertex index

Â array where, for every vertex, we maintain a list of the vertices that are adjacent

Â to that. So, for example, vertex four, Has, five,

Â is connected to five, six, and three, so its list has five, six, and three on it.

Â Now, on lower level representations we've talk about using linked width or array for

Â these, But actually in modern lingo with the

Â background that we'd built with algorisms what we're going to use is an abstract

Â data type. Our bag representation for this, which is

Â implemented with a linked list, but we don't have to think about it when we're

Â talking about graphs. we'd keep the vertices,

Â Names of, numbers of the vertices that are adjacent to each given vertex in a bag.

Â And we know that we can implement it, such that we can iterate through and time

Â proportional to the number of entries and the space taken is also proportional to

Â the number of entries,, And that's going to enable us to process

Â huge graphs. So here's the full implementation of the

Â Jason C, List graph representation.

Â So, the private instance variables that we're gonna use area the number of

Â vertices in the graph and then a array of nags of integers. so data the types and

Â set of values, set operations on those values, so those are the sets of values

Â for a graph. So here's the constructor of an empty

Â graph with V vertices. We keep the value v in the instance

Â variable as usual. Then we create.

Â An array of size V. And, of bags of integers.

Â And, store that array in, [inaudible] as the adjacency array of the graph.

Â Adjacency lists array. And then, as usual.

Â When we create, a, a, an array of objects. We go through.

Â And for every entry in the array, we initialize with, a, empty object.

Â So, after this code, we have the empty bags.

Â And so that's the constructor and then the other main engine and building graphs is

Â at edge and so, to add an edge between V and W, we add W to V's bag, and we add V

Â to W's bag. That's it.

Â And to iterate through all the vertices adjacent to a given vertex we simply

Â return the bag which is iterable. This is a nice example, illustrating the

Â power of abstraction. Because we did the low level processing,

Â for, that, that's involved, with our bag implementation in one of the early,

Â lectures. And now we, we get to use that to give a

Â very compact implementation, and efficient implementation, of the, graph data

Â structure. So it's really important to understand

Â this code. And you should make sure, that you study

Â it. So, as I mentioned in practice.

Â We're gonna be using this adjacency list representation.

Â Because all the algorithms are based on iterating over the vertices adjacent to V.

Â And this gets that done in time proportional to the number of such

Â vertices. So that's number one and number two, in

Â the real world, the graphs have lots of num, lots of vertices,

Â But a pretty small vertex degre. so number one, we can afford to represent, represent

Â the graph when we use the adjacency list representation because basically our space

Â is proportional to the number of edges. And number two, we can afford to process

Â it because our time taken is proportional to the number of edges that, that we

Â examined, with the ray of edges representation in the adjacency matrix

Â representation it gets very slow for some very simple task,

Â But, mostly, it's very slow for iterating over the vertices given to, adjacent to a

Â given vertex. Which is the key operation.

Â A list of edges, you have to go through all the edges to find the ones adjacent to

Â a given vertex. Adjacency matrix, you have to go through,

Â all possible, vertices adjacent and that's just going to be much too slow in

Â practice, Because adjacency list gets it done, in

Â time proportional degree of v, which is much smaller, for the huge graph that we

Â see in the real world. So that's our basic API.

Â Next, we'll look at some algorithms that are clients of this API.

Â