The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

485 ratings

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 1

Two motivating applications; selected review; introduction to greedy algorithms; a scheduling application; Prim's MST algorithm.

- Tim RoughgardenProfessor

Computer Science

So in this sequence of videos, we're going to apply the greedy algorithm

design paradigm to a fundamental graph problem, the problem of computing minimum

spanning trees. The MST problem is a really fun

playground for greedy algorithm design, because it's the singular problem in

which pretty much any greedy algorithm you come up with seems to work.

So we'll talk about a couple of the famous ones, show why they're correct,

and show how they can be implemented using suitable data structures to be

blazingly fast. So, I'll give you the formal problem

definition on the next slide but first let me just say informally what it is

we're trying to accomplish. Essentially, what we want do is connect a

bunch of points together as cheaply as possible.

And, as usual with an abstract problem the objects can mean something very

literal. So maybe the points we're trying to

connect are servers in some computer network, or it could represent something

more abstract. Like maybe we have a model of documents

like Web Pages where we represent them as points in space.

And we want to somehow connect those together.

Now the main reason I'm going to spend time on the minimum expenditure problem

is pedagogical. It's just a great problem for sharpening

your skills with greedy algoritum design and proof of correctness.

It'll also give us another opportunity to see the beautiful interplay between data

structures and fast limitation of graph algorithms.

That said that minimum expenditure problem does have applications.

One very cool one is in clustering, and that I'll talk about in detail in a later

video, it also comes up in networking. So if you do a web search on spanning

tree protocol you'll also find some information about that.

So as I said at the beginning the minimum spanning tree problem is remarkable in

that it doesn't just admit one greedy algorithm that's correct, but in fact it

admits multiple greedy algorithms that are correct.

we're going to talk about two of them, the two most well known ones.

But there are even some others believe it or not.

So the first one we're going to discuss beginning in the next video is Prim's MST

algorithm. This dates back over 50 years to 1957.

in fact as you'll see Prim's algorithm shows a remarkable number of similarities

with Dijkstra's shortest path algorithm. So you might not be surprised to know

that Dijkstra also independently had discovered this algorithm a couple of

years later. But in fact it was only noticed much

later that this exact same algorithm had been first discovered over 25 years

earlier by a mathematician named Jarnick. For that reason you'll sometimes hear

this called Jarnick's algorithm or the Prim-Jarnick algorithm.

for gravity and to be consistent with some of the main text books in the area

I'm just going to call this Prim's algorithm throughout the lectures.

The other algorithm we're going to cover which is also rightfully famous is

Kruskal's MST algorithm. As far as I know this was indeed first

discovered by Kruskal roughly the same time as Prim was doing his algorithm in

the mid 50s. And in what sense do I say these

algorithms are blazingly fast? Well, they run in almost linear time,

linear in the number of edges of the graph.

Specifically we'll see how using appropriate data structures will get each

of them to run in time big O of M log N, where M is the number of edges in the

graph, and N is the number of vertices in the graph.

We'll employ data structures to speed up Prim's algorithm in exactly the same way

we did for Dijkstra's algorithm, that is we'll be using the heap data structure,

One thing that's cool about Crystal's algorithm is it'll give us an opportunity

to study a new data structure, mainly the union fine data structure and that's a

lot of fun to think about, in its own right, as you'll see.

So to put this amazing running time on perspective I want to emphasize that only

is it awesome in the sense it's you know, barely, it's almost linear.

It takes almost barely more time to compute the spanning tree than it does to

read the input graph. Reading the input graph alone, remember

would take linear time. O of M time.

But more over, graphs can have an enormous number of spanning trees.

An exponential number. So some of these algorithms are honing in

really quickly on a needle in a haystack. There's no way they have time to look at

all these spanning tees, and yet they find the one which is the best which is

optimal amongst all of them. How do these seemingly magical algorithms

do it? Well, to discuss the details let's start

by formalizing the Minimum Spanning Tree, or MST problem on the next slot.

So in the MSD problem this is a graph problem so the main part of the input is

a graph comprising verticies and edges. I do want to emphasize for the MST

problem we are be considering only undirected graphs.

This is different notice, than when we discussed shortest-path problems in Part

one of the course. There we worked with directed graphs.

There is an analogous problem to the [INAUDIBLE] signature problem for

directed graphs. It's often called the optimal branching

problem. And there are fast algorithms for it, but

those algorithms are just slightly beyond the scope of this course.

So we're not going to cover it. We're going to discuss only undirected

graphs, and then minimum spanning trees for them.

Now, whenever you talk about graph problems, you need to talk about, how is

the graph actually represented. So that's something we discussed at

length in part one. If you don't remember, I suggest going

back and reviewing the video on graph representations.

For the MST problem, we're going to assume that the graph is given as an

adjacency list. That means, we're given an array of

vertices, an array of edges. And we have pointers, wiring vertices to

their incident edges and wiring edges back to their two endpoints.

In addition to the graph of self the input includes a cost, for each of the

edges, we're going to use the notation C sebies of note the cost of a edge, E.

And in another contrast, to are discussion of shortest path problems,

we're actually not going to care if the edge cost are positive or negative, they

can be any number whatsoever. So no prizes for guessing what the

outputs supposed to be, it's right there in the problem definition, the output is

supposed to be a minimum cost spanning tree of the graph, but let's drill down

and explain exactly what we mean by that. So first of all what do we mean by the

cost of a tree or generally the cost of a sub graph, as a subset of the edges.

Well we're just going to be looking at summing up the edges in the tree that we

output. Now the other question is what do I mean

by a tree that spans all vertices? So let me tell you exactly what this

means, the sub graph T should have two properties, first of all there can not be

any cycles, there can not be any loops in this tree.

And by spanning all vertices, what I mean is that this sub graph is what's called

connected. That is, there's a path, using the edges

and t, from any vertex of the graph to any other vertex.

That's what it means to span all of the vertices.

So for example, consider the following graph with four vertices and five edges.

I've labeled each of the five edges with a cost, which in this case, is just an

integer between one and five. So, let's look at some example subgraphs,

let's start with the three edges, A, B, B, D and CD.

This sub-graph satisfies properties one and two.

That is, it has no cycles, there's no loops and it spans all of the vertices.

If you start at any one of these four vertices, you can get to any of the other

four vertices by using only red edges. So in that sense, this red sub-graph is a

spanning tree. However, it is not the minimum cost

spanning tree. There is another spanning tree which is

even cheaper, has a smaller sum of edge costs, namely the edges AC, AB, and BD.

This also has no cycles and it's also connected but the sum of the edge cost is

only seven, smaller than the eight of the previous spanning tree.

In fact, this pixograph is the unique minimum spanning tree of this graph.

There is a sub graph that has three edges which has an even smaller sum, of edge

costs, namely the triangle AB, BD and AD. But this light blue sub graph, this

triangle, is not a spanning tree. In fact, it fails on both counts.

It does obviously have a cycle. It has a loop.

That's, what it is by definition. It's also not connected, so there's no

way to get from C, the vertex, to any of the other three vertices by following

only light blue edges. It's disconnected, and so it fails

property one as well. So the MST problem in general is you're

given it under a graph, like, for example, this four note, five edge graph,

or presumably. something much larger and an interesting

problem and your suppose to quickly identify the minimum spanding tree like

in this example the pink subgraph. So what I want to do next is something

you're probably quite accustomed to me doing by this point, is I want to make a

couple of mild simplifying assumptions just among friends.

So these assumptions are not important in the sense that all of the conclusions of

these lectures will remain true, will remain valid even if these assumptions

are violated but it'll make the lectures a little bit easier.

It'll allow us to focus on the main points and not get distracted by less

relevant details so here are the two assumptions that we're going to make

throughout all of the lectures on minimum spanning trees.

The first assumption we're going to make is that the input graph G is itself

connected. That is G contains a path from any vertex

to any other vertex. So why am I making this assumption?

Well if this assumptions violated then the problem isn't even well defined.

If the graph isn't connected then certainly none of it's subgraphs are

connected so it has no spanning trees and it's not clear what we're trying to do.

So, those of you who still remember the stuff we covered in part one in

particular, graph search. Should recognize that this condition's

easy to check in a pre-processing step. Just run something like breadth first

search or depth first search. Remember, we know how to implement those

in linear time. And those will, in particular, tell you

whether or not the input graph is connected.

Now, another thing you might be wondering is, suppose it was disconnected.

Then what? Should be really just sort of throw up

our hands and give up? You can define a version of the minimum

spanning tree problem. A more general one called minimum

spanning forest. Where, basically you want the minimum

cost sub graph that spans as much stuff as possible.

Essentially, it's responsible for computing a spanning tree within each of

the connected components of the original graph.

And using the algorithms I'll show you here, Prim's algorithm, Kruskal's

algorithm, they're easily modified to solve the more general problem with

disconnected input graphs as well. But again, for simplicity among friends,

let's just focus on the connected graph case that contains all of the main ideas.

Our second standing assumption throughout all of the minimum of spanning tree

lectures will be that in the input graph the edge costs are distinct.

So you're already use to this sort of no ties kind of assumption from our foray

into scheduling algorithms, and we're going to do something similar here.

Now again this assumption is not important in the sense that the

algorithms that we cover prims algorithm crustgrals algorithm.

They remain correct even if the input has equal cost edges, irrespective of how

ties are broken. So the algorithms are correct as widely

as you would want. That's it.

I'm not going to actually prove for you that they are correct with ties.

Remember we had our scheduling, application it was a little bit easier to

get a proof of correctness without ties, I gave you that, and then optionally

there was a slightly more complicated argument that handled ties.

You can do the same thing here, but I'm just not going to give it to you.

I'll leave that for the keen viewer to work out for themselves.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.