We've arrived at another one of computer science's greatest hits, namely Dijkstra's shortest path algorithm. So let me tell you about the problem; it's a problem called single-source shortest paths. Basically what we want to do is compute something like driving directions. We're given as input a graph; in this lecture I'm going to work with directed graphs, although the same algorithm works for undirected graphs with cosmetic changes. As usual, we'll use m to denote the number of edges and n to denote the number of vertices. The input also includes two extra ingredients. First of all, for each edge e we're given as input a non-negative length, which I'll denote as l sub e. In the context of a driving directions application, l sub e could denote the mileage, how long this particular road is, or it could denote the expected travel time along the edge. The second ingredient is a vertex from which we are looking for paths. This is exactly the same as we had in breadth-first search: there we had an originating vertex, which we'll call here the source. Our responsibility, given this input, is to compute for every other vertex v in this network the length of the shortest path from the source vertex s to that destination vertex v.

And just to be clear, what is the length of a path that has, say, three edges in it? Well, it's just the sum of the length of the first edge in the path plus the length of the second edge in the path plus the length of the third edge in the path. So if you had a path with three edges of lengths one, two, and three, then the length of the path would just be six. We then define the shortest s-v path in the natural way: amongst all of the directed paths from s to v, each one has its own respective path length, and the minimum over all s-v paths is the shortest path distance in the graph G.

So I'm going to make two assumptions for these lectures. One is really just for convenience; the other is really important, and without it Dijkstra's algorithm is not correct, as we'll see. For convenience, we'll assume that there is a directed path from s to every other vertex v in the graph; otherwise the shortest path distance is something we define to be plus infinity. The reason this is not a big assumption is that, if you think about it, you could detect which vertices are not reachable from s in a preprocessing step using, say, breadth-first or depth-first search, then delete the irrelevant part of the graph and run Dijkstra's algorithm as we'll describe it on what remains. Alternatively, Dijkstra's algorithm will quite naturally figure out which vertices there are paths to from s and which ones there are not, so this won't really come up. So to keep it simple, just imagine we have an input graph where you can get from s to v for every destination vertex v, and the challenge then is, amongst all the ways to get from s to v, what is the shortest way to do it?

The second assumption already appears in the problem statement, but I want to reiterate it just so it's really clear. When we analyze Dijkstra's algorithm, we always focus on graphs where every edge length is non-negative; no negative edge lengths are allowed. We'll see why a little bit later in the video.
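To make the setup concrete, here is a minimal sketch in Python of one way you might represent such an input and compute a path's length. The adjacency-list dictionary, the vertex names, and the helper function are illustrative choices of mine, not anything fixed by the problem statement.

```python
# One possible representation (illustrative): for each vertex v, graph[v] is a
# list of (w, length) pairs, one per directed edge (v, w), with length >= 0.
graph = {
    's': [('a', 1)],
    'a': [('b', 2)],
    'b': [('c', 3)],
    'c': [],
}

def path_length(graph, path):
    """Length of a path (given as a list of vertices) = sum of its edge lengths."""
    total = 0
    for v, w in zip(path, path[1:]):
        total += dict(graph[v])[w]   # look up the length l_e of the edge (v, w)
    return total

print(path_length(graph, ['s', 'a', 'b', 'c']))  # 1 + 2 + 3 = 6
```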
Now, in the context of a driving directions application, it's natural to ask why you would ever care about negative edge lengths. Until we invent a time machine, it doesn't seem like negative edge lengths are going to be relevant when you're computing literal paths through literal networks. But remember that paths can be thought of more abstractly as just a sequence of decisions, and some of the most powerful applications of shortest paths involve computing optimal such sequences. For example, maybe you're engaging in financial transactions and you have the option of both buying and selling assets at different times. If you sell, you get some kind of profit, and that would correspond to a negative edge length. So there are quite interesting applications in which negative edge lengths are relevant. If you are dealing with such an application, Dijkstra's algorithm is not the algorithm to use. There are other shortest-path algorithms for that case; the most well-known one is the Bellman-Ford algorithm, which is based on dynamic programming and which we may well cover in a sequel course. Okay, so for Dijkstra's algorithm, we always focus on graphs that have only non-negative edge lengths.

So, with the next quiz, I just want to make sure that you understand the single-source shortest path problem. Let me draw for you a simple four-node network and ask you: what are the four shortest path distances, from the source vertex s to each of the four vertices in the network?

All right, so the answer to this quiz is the final option, 0, 1, 3, 6. To see why that's true: well, all of the options had 0 as the shortest-path distance from s to itself, and that just seems kind of obvious; the empty path will get you from s to itself and has length 0. Now suppose you wanted to get from s to v. There's actually only one way to do that, the one-hop path, and that path has length 1, so the shortest path distance from s to v is 1. Now w is more interesting. There's a direct one-hop path from s to w that has length 4, but that is not the shortest path from s to w. In fact, the two-hop path that goes through v as an intermediary has total path length 3, which is less than the length of the direct arc from s to w. So the shortest path distance from s to w is 3. And finally, for the vertex t there are three different paths going from s to t. There's the two-hop path that goes through v and the two-hop path that goes through w; both of those have path length 7. And then there's the three-hop path that goes through both v and w, and that has path length one plus two plus three, which equals six. So despite having the largest number of edges, the zigzag path is in fact the shortest path from s to t, and it has length 6.

All right, so before I tell you how Dijkstra's algorithm works, I feel like I should justify the existence of this video a little bit, because this is not the first time we've seen shortest paths. You might be thinking, rightfully so, that we already know how to compute shortest paths; that was one of the applications of breadth-first search. The answer to this is both yes and no. Breadth-first search does indeed compute shortest paths, and we had an entire video about that, but it works only in the special case where the length of every edge of the graph is one. At the moment we're trying to solve a more general problem: we're trying to compute shortest paths when edges can have arbitrary non-negative lengths.
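Before going further, if you want to sanity-check the quiz answer from a moment ago, here is a small brute-force sketch in Python. The edge lengths are reconstructed from the discussion above, and the function name is just illustrative; this is exhaustive enumeration over a tiny graph, not Dijkstra's algorithm.

```python
# The four-node quiz graph, with edge lengths as reconstructed from the
# discussion above.
graph = {
    's': [('v', 1), ('w', 4)],
    'v': [('w', 2), ('t', 6)],
    'w': [('t', 3)],
    't': [],
}

def brute_force_distances(graph, source):
    """Enumerate every simple path out of source and record, for each vertex,
    the minimum total length of any path reaching it. Fine for tiny graphs."""
    best = {source: 0}
    def explore(v, dist, visited):
        for w, length in graph[v]:
            if w in visited:
                continue
            best[w] = min(best.get(w, float('inf')), dist + length)
            explore(w, dist + length, visited | {w})
    explore(source, 0, {source})
    return best

print(brute_force_distances(graph, 's'))  # {'s': 0, 'v': 1, 'w': 3, 't': 6}
```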
So for example, in the graph that we explored in the previous quiz, if we ran breadth-first search starting from the vertex s, it would say that the shortest path distance from s to t is 2, and that's because there's a path with two hops going from s to t. Put differently, t is in the second layer emanating from s. But as we saw in the quiz, the shortest path from s to t is not in fact a two-hop path once you care about the edge lengths. Rather, the minimum-length path, the shortest path with respect to the edge lengths, is the three-hop path, which has total length 6. So breadth-first search is not going to give us what we want when the edge lengths are not all the same. And if you think about an application like driving directions, then needless to say, it's not the case that every edge in the network is the same. Some roads are much longer than others, some roads will have much larger travel times than others, so we really do need to solve this more general shortest path problem. Similarly, if you're thinking more abstractly about a sequence of decisions like financial transactions, in general different transactions will have different values. So you really want to solve general shortest paths; you're not in the special case that breadth-first search handles.

Now, if you're feeling particularly sharp today, you might have the following objection to what I've just said. You might say, eh, big deal. General edge lengths, unit edge lengths, it's basically the same. Say you have an edge that has length three. How is that fundamentally different from having a path with three edges, each of which has length one? So why not just replace each edge with a path of unit-length edges of the appropriate length? Now we have a network in which every edge has unit length, and we can just run breadth-first search. Put succinctly, isn't it the case that computing shortest paths with general edge lengths reduces to computing shortest paths with unit edge lengths?

Well, the first comment I want to make is that I think this would be an excellent objection to raise. And indeed, as programmers, as computer scientists, this is the way you should be thinking: if you see a problem that seems superficially harder than another one, you always want to ask whether, with a clever trick, you can reduce it to a problem you already know how to solve. That's a great attitude for problem solving in general. And indeed, if all of the edge lengths were just small numbers, like 1, 2, 3, and so on, this trick would work fine. The issue is when you have a network where different edges can have very different lengths, and that's certainly the case in many applications. Road networks would definitely be one, where you have both long highways and neighborhood streets, and potentially in networks based on financial transactions you would also have a wide variance between the values of different transactions. The problem then is that some of these edge lengths might be really big. They might be 100, they might be 1,000; it's very hard to put good bounds up front on how large they could be. So if you start wantonly replacing single edges with really long paths of, say, 1,000 unit-length edges, you've blown up the size of your graph way too much. You do have a faithful representation of your old network, but it's too wasteful. So even though breadth-first search runs in linear time, it's now linear in the size of this much larger graph.
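Here is a rough sketch of that naive reduction, assuming strictly positive integer edge lengths and the same illustrative adjacency-list representation as before; the dummy-vertex naming is made up purely for illustration. It makes the blow-up easy to see.

```python
# Naive reduction sketched above: replace each edge of integer length k with a
# chain of k unit-length edges, so that breadth-first search could then be used.
# Assumes every vertex appears as a key in graph and all lengths are >= 1.
def subdivide_to_unit_lengths(graph):
    unit_graph = {v: [] for v in graph}
    for v in graph:
        for w, length in graph[v]:
            prev = v
            for i in range(1, length):            # insert length - 1 dummy vertices
                dummy = ('dummy', v, w, i)
                unit_graph[dummy] = []
                unit_graph[prev].append((dummy, 1))
                prev = dummy
            unit_graph[prev].append((w, 1))
    return unit_graph

# The catch: a single edge of length 1,000 turns into 999 extra vertices and
# 1,000 unit edges, so the transformed graph can be vastly bigger than the original.
```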
And we'd much prefer something that runs in linear time, or almost linear time, directly on the original graph. That is exactly what Dijkstra's shortest-path algorithm is going to accomplish.

Let's now move on to the pseudocode for Dijkstra's shortest path algorithm. This is another one of those algorithms where, no matter how many times I explain it, it's always just super fun to teach, and the main reason is that it exposes the beauty that pops up in good algorithm design. The pseudocode, as you'll see in a second, is itself very elegant. We're just going to have one loop, and in each iteration of the loop we will compute the shortest path distance to one additional vertex; by the end of the loop we'll have computed shortest path distances to everybody. The proof of correctness, which we'll do in the next video, is a little bit subtle, but also quite natural, quite pretty. And then finally, Dijkstra's algorithm will give us our first opportunity to see the interplay between good algorithm design and good data structure design. With a suitable application of the heap data structure, we'll be able to implement Dijkstra's algorithm so that it runs blazingly fast, in almost linear time, namely O(m log n). But I'm getting a little ahead of myself; let me actually show you this pseudocode.

At a high level, you really should think of Dijkstra's algorithm as being a close cousin of breadth-first search. Indeed, if all of the edge lengths are equal to one, Dijkstra's algorithm becomes breadth-first search, so this is a slick generalization of breadth-first search to the case where edges can have different lengths. Like our generic graph search procedures, we're going to start at the source vertex s, and in each iteration we're going to conquer one new vertex. We conquer exactly one new vertex per iteration, so after n minus 1 iterations we'll be done, and in each iteration we'll correctly compute the shortest path distance to one new possible destination vertex v.

So let me just start by setting up some notation. Capital X is going to denote the vertices that we've dealt with so far, and by dealt with, I mean we've correctly computed the shortest path distance from the source vertex to every vertex in X. We're going to augment X by one new vertex in each iteration of the main loop. Remember that we're responsible for outputting n numbers, one for each vertex; we're not just computing one thing, we're computing the shortest path distance from the source vertex s to every other vertex. So I'm going to frame the output in terms of an array capital A. For each vertex, we're going to have an entry in the array A, and the goal is that at the end of the algorithm, A will be populated with the correct shortest path distances.

Now, to help you understand Dijkstra's algorithm, I'm going to do some additional bookkeeping which you would not do in a real implementation. Specifically, in addition to this array capital A, in which we compute shortest path distances from the source vertex to every other destination, there's going to be an array capital B in which we'll keep track of the actual shortest path itself from the source vertex s to each destination v. The arrays A and B will be indexed in the same way; there'll be one entry for each possible destination vertex v. Capital A will store just a number for each destination, the shortest path distance. The array B in each position will store an actual path, the shortest path from s to v.
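To make the semantics of these two arrays concrete, here is what they should contain once the algorithm finishes on the four-node quiz graph from earlier; storing each path as a list of edges is just an illustrative convention of mine.

```python
# Illustrative only: the desired final contents of A and B for the quiz graph,
# both indexed by destination vertex (paths stored here as lists of edges).
A = {'s': 0, 'v': 1, 'w': 3, 't': 6}
B = {'s': [],                                          # the empty path from s to itself
     'v': [('s', 'v')],
     'w': [('s', 'v'), ('v', 'w')],
     't': [('s', 'v'), ('v', 'w'), ('w', 't')]}        # the zigzag path of length 6
```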
But again, you would not include this in an actual implementation. I just find, in my experience, that it's easier for students to understand this algorithm if we think of the paths being carried along as well. Now that I've told you the semantics of these two arrays, I hope it's no surprise how we initialize them for the source vertex s itself. The shortest path distance from s to itself is 0: the empty path gets you from s to s with length 0, and there are no negative edges by assumption, so there's no way to get from s back to s with negative total length. So 0 is definitely the shortest path distance for s. By the same reasoning, the shortest path from s to s is just the empty path, the path with no edges in it.

So now let's proceed to the main while loop. The plan is that we want to grow this set capital X like a mold until it covers the entire graph. In each iteration, it's going to grow and cover up one new vertex, and that vertex will then be processed. At the time of processing, we're responsible for computing the shortest path distance from s to this vertex and also figuring out what the actual shortest path from s to this vertex is. So in each iteration we need to grow X by one node to ensure that we make progress, and the obvious question is, which node should we pick? Which one do we add to X next?

There are going to be two ideas here. The first one we've already seen in all of our generic graph search procedures, which is that we're going to look at the edges and vertices on the frontier: the vertices that are just one hop away from vertices we've already put into X. That motivates, in a given iteration of the while loop, looking at the stuff we've already processed, that's X, and the stuff we haven't yet processed, that's V minus X. s, of course, starts in X, and we never take anything out of X, so s is still there. In a generic iteration of the while loop, there might be some other vertices in X, and there might be multiple vertices which are not in X. Now, as we've seen in our graph search procedures, there are in general edges crossing this cut: edges which have one endpoint on each side, one endpoint in X and one endpoint outside of X. This is a directed graph, so they can cross in two directions, from left to right or from right to left. You might have some edges internal to X; those are things we don't care about at this point. You might have edges internal to V minus X; we also don't care about those, at least not quite yet. And then you've got edges crossing from X to V minus X, as well as edges crossing in the reverse direction, from V minus X back to X. The ones we're going to be interested in, just like when we did graph search in directed graphs, are the edges crossing from left to right, the edges whose tail is amongst the vertices we've already seen and whose head is some not-yet-explored vertex. So the first idea is that in each iteration of the while loop we scan, or examine, all of the edges with tail in X and head outside of X. One of those is going to lead us to the vertex that we pick next.
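Here is a minimal sketch, under the same illustrative adjacency-list representation as before, of the initialization just described and of this first idea of scanning the crossing edges; the helper names are made up, and the B array is, again, for exposition only.

```python
# Initialization as described above, plus the "first idea": scan the edges
# whose tail has been processed (is in X) and whose head has not.
def dijkstra_init(source):
    X = {source}         # s starts in X, and nothing is ever removed from X
    A = {source: 0}      # the empty path gets you from s to s with length 0
    B = {source: []}     # B[s] is the empty path (a path stored as a list of edges)
    return X, A, B

def crossing_edges(graph, X):
    """All edges (v, w, length) crossing the frontier: tail v in X, head w outside X."""
    return [(v, w, length)
            for v in X
            for (w, length) in graph[v]
            if w not in X]
```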
That's the first idea, but now we need a second idea, because this is again quite underdetermined: there could be multiple vertices which meet this criterion. For example, in the cartoon in the bottom left part of this slide, you'll notice that there's one vertex which is the head of an arc crossing from left to right, and there's yet another vertex down in V minus X which again is the head of an arc crossing from left to right. There are two options for which of those two to suck into our set X, and we might want some guidance about which one to pick next. The key idea in Dijkstra's algorithm is to give each candidate vertex a score corresponding to how close that vertex seems to the source vertex s, and then to pick, among all candidate vertices, the one with the minimum score. Let me be more precise. Among all crossing edges, with tail on the left side and head on the right side, we pick the edge (v, w) that minimizes the following quantity: the shortest path distance that we've previously computed from s to the tail v, plus the length of the edge that connects v to w. This is quite an important expression, so I will call it Dijkstra's greedy criterion. Using this criterion to choose which vertex to add to the set X turns out to be a very good idea, as we'll see. I need to give a name to the edge which minimizes this quantity over all crossing edges; let's call it (v*, w*). For example, in the cartoon in the bottom left, maybe of the two edges crossing from left to right, the top one has the smaller value of Dijkstra's greedy criterion. In that case, its tail would be the vertex v* and the other endpoint of the edge would be the vertex w*.

This edge (v*, w*) is going to do wonders for us. It will guide us to the vertex we should add to X next, namely w*, and it will tell us how we should compute the shortest path distance to w*, as well as what the actual shortest path from s to w* is. Specifically, in this iteration of the while loop, after we've chosen the edge (v*, w*), we add w* to X. Remember, by definition, w* was previously not in capital X, so we're making progress by adding it; that's one more vertex in X. Now, X is supposed to represent all of the nodes we've already processed, and an invariant of this algorithm is that we've computed shortest path distances, as well as the actual shortest paths, for everybody in X. So now that we're putting w* into X, we're responsible for all of this shortest path information. What we're going to do is set our estimate of w*'s shortest path distance from s equal to the value of Dijkstra's greedy criterion for this edge: that is, whatever our previously computed shortest path distance from s to v* was, plus the length of the direct edge from v* to w*.

Now, a key point is to realize that this code makes sense. By which I mean, if you think about the quantity A[v] in the greedy criterion, it has been previously computed, because an invariant of this algorithm is that we've already computed shortest path distances to everything in capital X. And of course, the same thing holds when we need to assign w* its shortest path distance: because v* was a member of capital X, we had already computed its shortest path distance, so we can just look up v*'s entry in the array A. Over in our picture on the left, we would just say: what did we previously compute the shortest path distance to v* to be? Maybe it's something like 17. And then we'd say: what is the length of the direct edge from v* to w*? Maybe that's 6. Then we would just add 17 and 6 and put 23 as our estimate of the shortest path distance from s to w*.
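Putting the two ideas together, here is a sketch of a single iteration of the loop: score every crossing edge by the previously computed distance to its tail plus its own length, take the minimum, and absorb the chosen head into X. The function name is illustrative, and a real implementation would use a heap rather than this linear scan.

```python
# One iteration of the main loop: Dijkstra's greedy criterion over the crossing edges.
def one_iteration(graph, X, A):
    # Score each crossing edge (v, w) by A[v] + l_vw and take the minimum.
    score, v_star, w_star = min((A[v] + length, v, w)
                                for v in X
                                for (w, length) in graph[v]
                                if w not in X)
    X.add(w_star)            # w* joins the set of processed vertices
    A[w_star] = score        # A[w*] = A[v*] + l_{v*w*}, e.g. 17 + 6 = 23 above
    return v_star, w_star
```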
We do something analogous with the shortest path itself, in the array B. Again, since we just added w* to capital X, we're responsible for supplying a path from s to w* in the B array. What we're going to do is simply inherit the previously computed shortest path to v* and tack one extra hop onto the end, namely the direct edge from v* to w*. That gives us a path from s all the way to w*, with v* as an intermediate pit stop.

And that is the entirety of Dijkstra's algorithm; I've explained all of the ingredients of how it works at a conceptual level. The two things left to argue are, first, why is it correct? Why does it actually compute shortest path distances to all of the different vertices? And second, how fast can we implement it? The next two videos are going to answer both of those questions, but before we do that, let's go through an example to get a better feel for how this algorithm actually works. I also want to go through a non-example, so that you can appreciate how the algorithm breaks down when there are negative edges; that will make it clear why we need a proof of correctness, because the algorithm is not correct without some assumption about the edge lengths.
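Before moving on, here is a self-contained, straight-line sketch of the whole algorithm as just described, including the expository B array of actual paths (stored here as lists of edges, so the path from s to s is the empty list). It uses a naive scan over the crossing edges rather than a heap, and the representation and names are illustrative choices, not the lecture's exact pseudocode.

```python
def dijkstra(graph, source):
    """Naive sketch of Dijkstra's algorithm as described above (no heap).
    Returns A (shortest-path distances) and B (one shortest path per vertex,
    stored as a list of edges); B is for exposition only."""
    X = {source}
    A = {source: 0}
    B = {source: []}
    while len(X) < len(graph):
        # Dijkstra's greedy criterion over all edges crossing from X to V - X.
        candidates = [(A[v] + length, v, w)
                      for v in X
                      for (w, length) in graph[v]
                      if w not in X]
        if not candidates:                      # nothing left is reachable from source
            break
        score, v_star, w_star = min(candidates)
        X.add(w_star)
        A[w_star] = score                       # A[w*] = A[v*] + l_{v*w*}
        B[w_star] = B[v_star] + [(v_star, w_star)]   # tack on the edge (v*, w*)
    return A, B

# On the quiz graph from earlier, this reports distances 0, 1, 3, 6.
quiz_graph = {'s': [('v', 1), ('w', 4)], 'v': [('w', 2), ('t', 6)],
              'w': [('t', 3)], 't': []}
A, B = dijkstra(quiz_graph, 's')
print(A)        # {'s': 0, 'v': 1, 'w': 3, 't': 6}
print(B['t'])   # [('s', 'v'), ('v', 'w'), ('w', 't')]
```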