In this video, we'll start developing the Bellman-Ford algorithm as an instantiation of the dynamic programming paradigm. We'll develop it in the usual way, by understanding how optimal solutions must necessarily be composed of optimal solutions to smaller sub-problems. So let me just quickly review the subtleties we learned from the previous video. In particular, let me be precise about what I mean by solving the single-source shortest path problem when the input graph can have negative edge costs. The input, as usual, is a directed graph. Every edge has an edge cost, c sub e, and we're allowing those edge costs to be negative. And we're given a source vertex s. So what do we want to compute? Well, ideally, we'd like to compute the shortest path distance from s to every other destination v, where, as usual, the length of a path is just the sum of its edge costs. Now, in a previous video, we discussed how this goal is problematic for input graphs that have a negative cycle. If you allow paths to contain cycles, then the shortest path distance isn't even well defined; in some sense, it's minus infinity. And if you don't allow cycles, then the problem becomes computationally intractable. So if you fail to compute shortest paths, then at least give us an excuse: output a negative cycle in the input graph as the reason for your failure to compute shortest path distances. So for this video, I'm just going to develop the Bellman-Ford algorithm for input graphs that do not contain negative cycles. And, of course, these are the cases in which we'd better actually output the correct shortest paths. Once we've got the Bellman-Ford algorithm up and running for input graphs that do not have negative cycles, we'll see that it's a fairly easy matter to extend it to the general case. And for input graphs that do have a negative cycle, we'll be able to output such a cycle without affecting the running time of the basic algorithm.
So with an eye toward a dynamic programming algorithm for the shortest path problem, let's start thinking about optimal substructure, the way in which optimal solutions, that is, shortest paths, must necessarily be composed of optimal solutions, that is, shortest paths, to smaller sub-problems. Now, the formal optimal substructure lemma is going to be a little bit cumbersome to state. So let me just spend a slide telling you what the deal is, how to think about it. It's often tricky to figure out how to apply the dynamic programming paradigm to graph problems. One of the reasons is that graphs aren't really naturally sequential objects; they're not ordered in any obvious way. We're just given some unordered set of vertices and some unordered set of edges. Now, one special case, of course, would be path graphs, our very first example of dynamic programming. There, there was an obvious ordering on the vertices of the graph. But unlike, say, sequence alignment, where again there's an obvious ordering from left to right, in a graph problem it's not so clear how to order things. For this particular graph problem, however, it seems like we've got something going for us, namely the sequentiality of our output. Right? Paths are certainly sequential objects. So that gives us hope that we can state and prove an optimal substructure lemma, talking about how optimal solutions, shortest paths, must be built up from, in some sense, smaller shortest paths. Unfortunately, it's still far from obvious how to define smaller and larger sub-problems. For example, you'd love to have some intelligent ordering in which to process the possible destinations v. But it's not at all clear how to do that without knowing the shortest path distances in the first place. So this is a subtle issue that I encourage you to think hard about in the privacy of your own home, so you can better appreciate what's non-trivial about the Bellman-Ford solution.
So the key idea in the Bellman-Ford algorithm, and it's a good one, is to introduce an additional parameter that gives us an unambiguous definition of sub-problem size. What this parameter is going to do, for a given destination v, is control how many edges we allow in the paths from the source s to this destination v that we're looking at. To explain this, let's look at the following green graph with five vertices on the right-hand part of the slide. So in the Bellman-Ford algorithm, we're going to have one sub-problem for each possible destination and each possible restriction on the number of edges in a path. For example, suppose we're looking at the source s and the destination t, and we think about paths that have two edges or fewer. Well, in that case, the shortest path length from s to t subject to that constraint is four. The bottom path, which has three edges, is not an option; we're only permitting two edges or fewer in this sub-problem. Now, if we bump up the edge budget to three, then the corresponding shortest path distance drops from four to three, because all of a sudden we can make use of the three-hop path on the bottom. And again, the point is that this gives us an unambiguous notion of sub-problem size: the more edges you're allowed to use in your paths from the source to a given destination, the bigger the sub-problem. All right, so that discussion was deliberately vague, to give you some context for the following formal statement of the optimal substructure lemma. We're going to state and prove this lemma in full generality, working with arbitrary input graphs: they might have a negative cycle, they might not.
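To make the edge-budget idea concrete, here's a brute-force sketch in Python. The graph below is a hypothetical reconstruction of the slide's five-vertex example (the vertex names a, b, c and the individual edge costs are my assumptions; only the totals, a two-edge s-t path of length 4 and a three-edge s-t path of length 3, come from the transcript):

```python
# Hypothetical reconstruction of the slide's five-vertex graph: the top
# route s -> a -> t has two edges and total length 4; the bottom route
# s -> b -> c -> t has three edges and total length 3.
graph = {
    's': [('a', 2), ('b', 1)],
    'a': [('t', 2)],
    'b': [('c', 1)],
    'c': [('t', 1)],
    't': [],
}

def budgeted_shortest(graph, s, v, i):
    """Length of a shortest s->v path using at most i edges (brute force).
    Cycles are allowed, but the finite budget i bounds the recursion."""
    if s == v:
        return 0                      # the empty path costs nothing
    if i == 0:
        return float('inf')           # budget exhausted, v not reached
    # Try every first hop out of s, spending one unit of edge budget.
    best = float('inf')
    for w, cost in graph[s]:
        best = min(best, cost + budgeted_shortest(graph, w, v, i - 1))
    return best

print(budgeted_shortest(graph, 's', 't', 2))  # 4: only the top path fits
print(budgeted_shortest(graph, 's', 't', 3))  # 3: the bottom path is now allowed
```

Note how the answer for a fixed destination t changes as the budget i grows, exactly the sense in which a larger budget means a bigger sub-problem.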
So like all of these optimal substructure lemmas, the statement is going to have the form that the optimal solution to a sub-problem has to be one of a small number of candidates that are composed in simple ways from optimal solutions to smaller sub-problems. So how do we index a given sub-problem? Well, there's going to be a destination v that we care about. And, as we said on the last slide, there's going to be a budget on how many edges you're allowed to use in a path from s to v. We're going to use i to denote that budget, a positive integer, 1 or bigger. So there's one sub-problem for every possible destination v and every possible edge budget i. Suppose, then, that P is an optimal solution: amongst all paths that start at s, end at v, and contain at most i edges, P has the minimum sum of edge costs, the minimum length. Now, a subtle point: because we're proving this lemma in full generality, that is, also for input graphs that might have negative cycles, we need to allow this path P to contain cycles if it wants, including potentially traversing a negative cycle multiple times. That is allowed. Notice we're not worried about the path using such a cycle an infinite number of times, because we have a finite budget, i, on how many edges it can contain. So with that setup, what are the possible candidates for what capital P could be? We're going to have our usual trivial case 1, which is that if this path P doesn't bother to use up its full edge budget, that is, if it has (i-1) edges or fewer, well then, naturally, P had better be a shortest s-v path with at most (i-1) edges. The non-trivial case is when the shortest s-v path with at most i edges actually uses its full budget, all i of its edges.
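Stated compactly in LaTeX (my own phrasing of the lemma described above):

```latex
\textbf{Lemma (optimal substructure).}
Fix a destination $v$ and an edge budget $i \ge 1$, and let $P$ minimize
$\sum_{e \in P} c_e$ over all $s$-$v$ paths with at most $i$ edges
(cycles, including negative cycles, permitted). Then either:
\begin{enumerate}
  \item $P$ has at most $i-1$ edges, and is a shortest $s$-$v$ path
        among those with at most $i-1$ edges; or
  \item $P$ has exactly $i$ edges, its final hop is some edge $(w,v)$,
        and $P'$, meaning $P$ with that final hop removed, is a shortest
        $s$-$w$ path among those with at most $i-1$ edges.
\end{enumerate}
```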
So now, by analogy with all of our previous dynamic programming algorithms, where we think about plucking off the final part of an optimal solution, here we're going to pluck off the final edge of the path P. What do we get? Well, we get a path with one fewer edge. So that's good; it's going to correspond to some smaller sub-problem, because it has at most i-1 edges. On the other hand, notice that if we take a path from s to v and pluck off the final edge, we get a path from s to somewhere else. We're going to call that somewhere else w. So plucking off the final edge from capital P gives us a path, P prime, that starts at s, ends at w, and has at most i-1 edges. And I hope you can guess that the claim is that it's not just any old path from s to w with at most i-1 edges. It is, in fact, a shortest such path. Now, in this case 2, notice that we, of course, know that P prime has exactly i-1 edges, not merely at most i-1 edges. But it's going to be useful to have the stronger assertion claimed here, that P prime is optimal amongst all paths that have i-1 edges or possibly fewer. All right, so this is one of those lemmas that's actually harder to state than it is to prove. So let's just quickly talk through the proof. It's the same as the previous optimal substructure lemmas that we've seen. Case 1 is totally trivial; it's the obvious contradiction that we've seen in many of our other case 1s. Case 2 is going to be one of our usual cut-and-paste contradictions. So suppose there were some path Q better than P prime; that is, Q starts at s, ends at w, contains (i-1) edges or fewer, and the sum of its edge costs is even smaller than that of P prime. Well then, if we just tack on the final hop of P, that is, the edge (w, v), we get a path that starts at s, goes to v, and has at most i edges. And the total sum of edge costs in this path, Q plus the edge (w, v), is strictly less than that of the original path P.
But that contradicts the purported optimality of P amongst all paths starting at s, ending at v, and having at most i edges. So that's the proof of the optimal substructure lemma. It's simple enough. And as usual, the next step is going to be to compile this lemma into a recurrence, where, informally, what the recurrence is going to do is brute-force search amongst the possible candidates for the optimal solution. So to make sure you understand what just happened, let's move on to a quiz. The question is: for a given destination v of some input graph, how many candidates are there for the optimal solution of a sub-problem that involves v? The correct answer is the second one. The answer depends on which destination v you're talking about, and what governs the number of candidates is the in-degree of this vertex, that is, the number of edges of the input graph that have v as their head. Why is this true? Well, case 1 contributes one possible candidate: it's possible that for a given i and a given v, all you do is inherit the optimal solution for destination v with a budget of one fewer edge. Case 2 may seem like only one other candidate, but actually case 2 comprises a number of them, one for each choice of the final hop (w, v). Specifically, a given choice of w contributes a candidate optimal solution: a shortest path from s to that choice of w using at most i-1 edges, plus the edge (w, v) tacked on. So, in total, there are 1 plus the in-degree of v candidates.
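Compiling the lemma into the recurrence it suggests, here's a Python sketch of the resulting table-filling algorithm (function and variable names are my own; it assumes, as in this video, that the input has no negative cycle, so edge budgets up to n-1 suffice):

```python
def bellman_ford_table(vertices, edges, s):
    """L[i][v] = length of a shortest s->v path using at most i edges.
    The recurrence, read straight off the optimal substructure lemma:
        L[i][v] = min( L[i-1][v],                        # case 1
                       min over edges (w, v) of
                           L[i-1][w] + c(w, v) )         # case 2
    That's 1 + in-degree(v) candidates per sub-problem, as in the quiz.
    Assumes no negative cycle, so budgets up to n-1 edges suffice."""
    n = len(vertices)
    INF = float('inf')
    incoming = {v: [] for v in vertices}     # incoming[v] = [(w, c), ...]
    for (w, v, c) in edges:
        incoming[v].append((w, c))
    # Base case: with a budget of 0 edges, only s itself is reachable.
    L = [{v: (0 if v == s else INF) for v in vertices}]
    for i in range(1, n):
        L.append({v: min([L[i - 1][v]] +                              # case 1
                         [L[i - 1][w] + c for (w, c) in incoming[v]]) # case 2
                  for v in vertices})
    return L[n - 1]

# A tiny example: a 2-edge s->t route of length 4 and a 3-edge route of length 3.
dist = bellman_ford_table(
    ['s', 'a', 'b', 'c', 't'],
    [('s', 'a', 2), ('s', 'b', 1), ('a', 't', 2),
     ('b', 'c', 1), ('c', 't', 1)], 's')
print(dist['t'])  # 3
```

The inner `min` is exactly the brute-force search over the lemma's candidates that the next step of the video will formalize.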