0:06
Now we will look at Kruskal's algorithm for computing the MST. This is an old algorithm
that dates back over 50 years, but it is still an effective way to solve the problem.
The idea is very simple. We take the edges and sort them by weight,
and we consider them in ascending order of weight. The algorithm is simply:
add the next edge to the tree, unless that edge would create a cycle.
If the edge creates a cycle, ignore it and go on to the next one.
Let's look at a demo of Kruskal's algorithm.
Okay. So we have the edges sorted by weight and
we're going to take them in ascending order of weight.
The smallest edge is in the MST, so we add it to the tree; it doesn't
create a cycle. The next smallest edge is 2-3; that doesn't create a cycle either.
The next one is 1-7; that's fine. The next one is 0-2, then 5-7.
Now when we get to 1-3, that one creates a cycle, and the algorithm says:
if it creates a cycle, just ignore it. The next one is 1-5; that one also
creates a cycle (with 1-7 and 5-7), so again, just ignore it.
2-7 creates a cycle, so it's not in the MST. 4-5, the next smallest edge,
does not create a cycle, so we add it to the MST.
1-2 creates a cycle. 4-7 creates a cycle. 0-4 creates a cycle.
And finally we get to 6-2, which does not create a cycle, so we add that to the tree.
At this point we have an MST: we could check that we have V - 1 edges and stop,
or we can just keep going and see that each of the remaining edges in the graph creates a cycle.
And when we're done considering all the edges, we have computed the minimum
spanning tree. That's Kruskal's algorithm.
To summarize: consider the edges in ascending order of weight, and add the next edge to the tree T
unless doing so would create a cycle. Now let's look at what Kruskal's algorithm
does on a very large graph, to get a good feeling for what it's doing.
It takes the small edges first, and they coalesce into little clusters;
eventually the edges get longer and longer, and those connect the clusters together.
Initially the graph is broken up into many small pieces joined by very short edges,
and after a while, if you look at the larger clusters, eventually an edge
comes along that connects them together. It's not so easy to see how
the algorithm ends, though there's some long edge out there,
the longest edge in the MST, that finally connects the graph.
3:08
This simulation really gives a good feeling for Kruskal's algorithm.
In fact, when we first had personal computers, I was fortunate to be at Xerox
PARC, and I remember very well, in the late 70s,
this was one of the first algorithms that I wanted to see visualized on a personal
computer. I wrote a program at that time that
produced pretty much this image; it's a very interesting way to look at MST
algorithms. Alright, so Kruskal's algorithm is easy to implement, but we have
to prove that it computes the MST, of course, and we've done a lot of work
setting that up.
3:54
We do so just by proving that it's a special case of the greedy MST algorithm.
So what does that mean? Well, suppose that Kruskal's algorithm colors a given edge black,
say it's v-w. We'll define a cut: one side is the set of vertices
that are connected to v by black edges.
It might be just v alone, but if there are any black edges connecting v to other vertices,
we put all of those vertices on that side of the cut. For that cut there's no black crossing
edge, because that side is a connected component of the black edges.
And the other thing is that there's no crossing edge with lower weight than v-w.
Why is that? Well, we haven't examined any of those
edges yet, and they all have weight at least that of v-w, because we are considering the edges
in increasing order of weight. So this new edge connecting v to somewhere
else is a crossing edge of minimal weight. Therefore the algorithm always
finds a cut and colors a crossing edge of minimal weight for that cut black,
and so it's an instance of the greedy algorithm.
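Here is a compact restatement of that argument (the notation is mine, just summarizing the reasoning above, not taken from the slides):

    % Why Kruskal's algorithm is a special case of the greedy MST algorithm (sketch).
    % Suppose Kruskal's algorithm colors edge e = v-w black, and let
    %   C = { x in V : x is connected to v by black edges }.
    \begin{align*}
      &\text{no black edge crosses the cut } (C,\ V \setminus C)
        && \text{($C$ is a connected component of the black edges)} \\
      &\mathrm{weight}(f) \ge \mathrm{weight}(e) \text{ for every other crossing edge } f
        && \text{(edges are considered in ascending order of weight)} \\
      &\Longrightarrow\ e \text{ is a minimum-weight crossing edge for this cut,}
        && \text{so the greedy MST algorithm also colors } e \text{ black.}
    \end{align*}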
Now we're still a bit away from having an implementation of Kruskal's algorithm.
The question is: how do we know whether adding a new edge to the tree will create a
cycle? You might think about how difficult that is.
I've got a tree, represented as a set of edges or however,
and I have a new edge, and I want to know whether it creates a cycle or not.
How difficult is that going to be?
Well, it turns out that way back in the first lecture of Algorithms, Part I, we had a
way. One thing we could do is just run DFS
to check whether we can get to w from v. That would take time proportional to V.
But the union-find data structure that we developed in that first lecture absolutely does
the job. It's just testing whether this edge
connects anybody in the equivalence class corresponding to v with anybody in
the equivalence class corresponding to w; if it does, the edge creates a
cycle. So union-find is exactly what we need.
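As a tiny sketch of that test, using a few of the edges from the demo above, and assuming the UF union-find type from the course's algs4 library with its union() and connected() operations:

    import edu.princeton.cs.algs4.UF;

    public class CycleCheckDemo {
        public static void main(String[] args) {
            UF uf = new UF(8);                 // one singleton set per vertex, 0..7

            // suppose these edges are already in the tree (as in the demo)
            uf.union(1, 7);
            uf.union(5, 7);

            // candidate edge 1-5: 1 and 5 are already in the same component
            // (via 1-7 and 5-7), so adding 1-5 would create a cycle -> skip it
            System.out.println(uf.connected(1, 5));   // true

            // candidate edge 4-5: different components, safe to add
            System.out.println(uf.connected(4, 5));   // false
            uf.union(4, 5);                           // merge the two components
        }
    }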
6:36
So what we're going to do is maintain a set for each of the
connected components in the spanning tree that we've built up so far. If the new
edge connects vertices that are in the same set, then it would create
a cycle. Otherwise, we add the edge to the tree,
and then we just do the union operation to merge the set containing v
with the set containing w.
Those are the two things that we need to do, and those are exactly the two things that union-find
does for us. So this leads to a very compact
implementation of Kruskal's algorithm. Now to consider the edges in order, we'll
use a priority queue, a modern data structure.
So let's take a look at every line of this implementation of Kruskal's
algorithm. We're going to have a queue of edges;
that's how we're going to represent the MST.
It could be a stack or a bag, but we'll use a queue.
Then the KruskalMST constructor takes a graph,
and it's going to compute the MST into that mst queue.
To do that, it's going to use a minimum-oriented priority queue of edges,
so we build that min priority queue.
Now, edges are comparable, we have a compareTo() method, so our
8:16
generic priority queue code is going to work fine.
And so what we'll do is put all the edges in the graph into the priority
queue; that builds a priority queue containing all the edges in the graph.
Alternatively, we could put the edges in an array and sort the array,
but priority queues are a more elegant way to express this,
and a way to look at more efficient algorithms as well.
Okay, so that's the priority queue, the first data structure from Part I
that we're going to make use of.
The second one is the union-find data structure.
We're going to build a union-find data structure for the vertices, because the
spanning forest divides the vertices into equivalence classes.
Then we go into the main loop, and the main loop stops under two conditions.
One is that we run out of edges, in which case we have a minimum spanning forest.
The other is that we get to V - 1 edges in the MST.
As long as neither one of those is true, we've got an edge in the
priority queue, and we take the smallest one.
We get its vertices v and w with either() and other(),
and then we check, using the union-find data structure, whether they're connected.
If they're connected we don't do anything; if they're not connected, we
merge them and put that edge onto the MST.
That's it; that's the full implementation, except that we also have
the edges() method for the client to return the MST.
It's quite a compact implementation of a classic algorithm.
Using the basic data structures that we built up in Algorithms, Part I,
the priority queue and union-find,
gives a really fine implementation of this MST algorithm.
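Here is a sketch of the implementation just described, assuming the Edge, EdgeWeightedGraph, MinPQ, Queue, and UF types from the course's algs4 library (details may differ slightly from the code on the slide):

    import edu.princeton.cs.algs4.Edge;
    import edu.princeton.cs.algs4.EdgeWeightedGraph;
    import edu.princeton.cs.algs4.MinPQ;
    import edu.princeton.cs.algs4.Queue;
    import edu.princeton.cs.algs4.UF;

    public class KruskalMST {
        private Queue<Edge> mst = new Queue<Edge>();   // edges in the MST

        public KruskalMST(EdgeWeightedGraph G) {
            // build a min-oriented priority queue containing all the edges
            MinPQ<Edge> pq = new MinPQ<Edge>();
            for (Edge e : G.edges())
                pq.insert(e);

            // one set per vertex; the spanning forest partitions the vertices
            UF uf = new UF(G.V());

            // greedily take edges in ascending order of weight
            while (!pq.isEmpty() && mst.size() < G.V() - 1) {
                Edge e = pq.delMin();                  // smallest remaining edge
                int v = e.either(), w = e.other(v);
                if (!uf.connected(v, w)) {             // would adding v-w create a cycle?
                    uf.union(v, w);                    // no: merge the two components
                    mst.enqueue(e);                    // and add the edge to the MST
                }
            }
        }

        public Iterable<Edge> edges() {
            return mst;
        }
    }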
So what about the running time of this algorithm?
Well, it's not hard to see that it's going to compute the MST in time proportional to
E log E. It's linearithmic in the number of edges,
which means that we can use it for huge graphs.
This table just summarizes the costs.
We first build the priority queue,
and we can do that in linear time using the bottom-up construction method.
Then every edge comes off the priority queue,
so there are E delete-min operations, and each takes log E.
And then there are union operations, at most one per vertex, and connected operations,
one per edge. So the total time is dominated by the E
log E for the priority queue operations.
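Roughly tallying those costs (the log* bounds are amortized, assuming the weighted quick-union with path compression implementation from Part I):

    \[
      \underbrace{E}_{\text{build PQ}}
      \;+\; \underbrace{E \log E}_{\text{delete-min}}
      \;+\; \underbrace{V \log^{*} V}_{\text{union}}
      \;+\; \underbrace{E \log^{*} V}_{\text{connected}}
      \;=\; O(E \log E).
    \]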
One thing to note is that if the edges happen to come in sorted order, for some reason,
the algorithm is almost linear: the order of growth is E log* E.
And actually, we don't always have to sort all the edges.
In practical situations, we can stop as soon as we have V - 1 edges in the MST,
and usually that's way before we've seen all
the edges in the graph. So that's Kruskal's algorithm for
computing the MST.