0:23

So what is that? Well, this says, if we were to take this value of v and change it a little bit, how would the value of J change? Well, J is defined as 3 times v, and right now v = 11. So if we were to bump up v by a little bit to 11.001, then J, which is 3v, so currently 33, will get bumped up to 33.003. So here we've increased v by 0.001, and the net result of that is that J goes up 3 times as much. So the derivative of J with respect to v is equal to 3, because the increase in J is 3 times the increase in v.
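As a quick sanity check, this "bump v by 0.001" argument can be carried out in code. This is just an illustrative sketch; the variable names are mine, not from the lecture:

```python
# Finite-difference check of dJ/dv for J = 3v, mirroring the bump
# from v = 11 to v = 11.001 described above.
def J(v):
    return 3 * v

v = 11
eps = 0.001
dJ_dv = (J(v + eps) - J(v)) / eps  # (33.003 - 33) / 0.001, which is about 3
```

Shrinking eps makes the estimate approach the true derivative, 3, ever more closely.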

And in fact, this is very analogous to the example we had in the previous video, where we had f(a) = 3a, and we derived that the derivative, which with slightly sloppy notation you can write as df/da, is equal to 3. So instead, here we have J = 3v, and so dJ/dv = 3, with J playing the role of f and v playing the role of a in that example from an earlier video.

So indeed, in the terminology of backpropagation, what we're seeing is that if you want to compute the derivative of this final output variable, which usually is the variable you care most about, with respect to v, then we've done one step of backpropagation. So we call it one step backwards in this graph.

Now let's look at another example. What is dJ/da? In other words, if we bump up the value of a, how does that affect the value of J?

Â 2:35

Well, let's go through the example, where now a = 5. So let's bump it up to 5.001. The net impact of that is that v, which was a + u, so previously 11, would get increased to 11.001. And we've already seen above that J then gets bumped up to 33.003. So what we're seeing is that if you increase a by 0.001, J increases by 0.003. And by increasing a, I mean you have to take this value of 5 and just plug in a new value. Then the change to a will propagate to the right of the computation graph, so that J ends up being 33.003. And so the increase to J is 3 times the increase to a, which means this derivative is equal to 3.

And one way to break this down is to say that if you change a, then that will change v.

Â 3:57

First, by changing a, you end up increasing v. Well, how much does v increase? It is increased by an amount that's determined by dv/da. And then the change in v will cause the value of J to also increase. So in calculus, this is actually called the chain rule: if a affects v, which affects J, then the amount that J changes when you nudge a is the product of how much v changes when you nudge a, times how much J changes when you nudge v. So in calculus, again, this is called the chain rule.

And what we saw from this calculation is that if you increase a by 0.001, v changes by the same amount. So dv/da = 1. So in fact, if you plug in what we worked out previously, dJ/dv = 3 and dv/da = 1, then the product of these, 3 times 1, actually gives you the correct value that dJ/da = 3.

So this little illustration shows how, by having computed dJ/dv, that is, the derivative with respect to this variable, it can then help you to compute dJ/da. And so that's another step of this backward calculation.
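This two-step chain-rule calculation can also be sketched numerically (illustrative code, not from the lecture):

```python
# Chain rule on the graph v = a + u, J = 3v: dJ/da = dJ/dv * dv/da.
a, u, eps = 5, 6, 0.001

v = a + u                 # 11
J = 3 * v                 # 33

v_bumped = (a + eps) + u  # 11.001: bumping a moves v one-for-one
J_bumped = 3 * v_bumped   # 33.003

dv_da = (v_bumped - v) / eps             # about 1
dJ_dv = (J_bumped - J) / (v_bumped - v)  # about 3
dJ_da = dJ_dv * dv_da                    # about 3, the chain-rule product
```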

Â 5:39

I just want to introduce one more notational convention, which is that when you're writing code to implement backpropagation, there will usually be some final output variable that you really care about, or that you want to optimize. In this case, that final output variable is J; it's really the last node in your computation graph. And so a lot of computations will be trying to compute the derivative of that final output variable: d of this final output variable with respect to some other variable. We'll just call that dvar. So a lot of the computations you have will be to compute the derivative of the final output variable, J in this case, with respect to various intermediate variables, such as a, b, c, u, or v.

And when you implement this in software, what do you call this variable name? One thing you could do in Python is give it a very long variable name like dFinalOutputVar/dvar, but that's a very long variable name. You could call it, I guess, dJdvar. But because you're always taking derivatives with respect to this final output variable J, I'm going to introduce a new notation: in the code you write, we're just going to use the variable name dvar to represent that quantity. So dvar in the code you write will represent the derivative of the final output variable you care about, such as J, or sometimes the loss L, with respect to the various intermediate quantities you're computing in your code.

So for this quantity here, in your code you use dv to denote this value, so dv would be equal to 3. And in your code, you represent this as da, which we also figured out to be equal to 3.

So we've done backpropagation partially through this computation graph. Let's go through the rest of this example on the next slide. So let's go to a cleaned-up copy of the computation graph. And just to recap, what we've done so far is go backward here and figured out that dv = 3. And again, dv is just a variable name; what it represents in the code is really dJ/dv. We've also figured out that da = 3. Again, da is the variable name in your code, and the value it represents is really dJ/da.
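As a minimal sketch of this naming convention (my illustration, not code from the lecture):

```python
# dvar is shorthand for dJ/dvar, the derivative of the final output J
# with respect to the intermediate variable var.
dv = 3  # stands for dJ/dv
da = 3  # stands for dJ/da = dJ/dv * dv/da = 3 * 1
```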

Â 8:32

And we hand-waved how we've gone backwards on these two edges, like so. Now let's keep computing derivatives. Let's look at the value u. So what is dJ/du? Well, through a similar calculation as before, we start off with u = 6. If you bump up u to 6.001, then v, which was previously 11, goes up to 11.001, and so J goes from 33 to 33.003. The increase in J is 3 times the increase in u, so this derivative is equal to 3. And the analysis for u is very similar to the analysis we did for a: this is actually computed as dJ/dv times dv/du, where dJ/dv we had already figured out was 3, and dv/du turns out to be equal to 1. So we've gone one more step of backpropagation, and we end up computing that du is also equal to 3. And du is, of course, just dJ/du.
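The same bump-and-measure check works for u (again, an illustrative sketch):

```python
# Numeric check of dJ/du: bump u from 6 to 6.001 and see how J moves,
# with v = a + u and J = 3v as before.
a, u, eps = 5, 6, 0.001
J = 3 * (a + u)                 # 33
J_bumped = 3 * (a + (u + eps))  # 33.003
du = (J_bumped - J) / eps       # about 3, i.e. dJ/du = 3
```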

Now let's step through one last example in detail. What is dJ/db? So here, imagine that you are allowed to change the value of b, and you want to tweak b a little bit in order to minimize or maximize the value of J. So what is the derivative, or the slope, of this function J when you change the value of b a little bit?

Â 10:11

It turns out that using the chain rule from calculus, this can be written as the product of two things: dJ/du times du/db. And the reasoning is that if you change b a little bit, say from b = 3 to 3.001, the way it will affect J is that it will first affect u. So how much does it affect u? Well, u is defined as b times c. So u will go from 6, when b = 3, to 6.002, because c = 2 in our example. And so this tells us that du/db = 2, because when you bump up b by 0.001, u increases twice as much. So du/db is equal to 2.

And now we know that u has gone up twice as much as b has. Well, what is dJ/du? We've already figured out that this is equal to 3. And so by multiplying these two out, we find that dJ/db = 6.

And again, here's the reasoning for the second part of the argument. We want to know: when u goes up by 0.002, how does that affect J? The fact that dJ/du = 3 tells us that when u goes up by 0.002, J goes up 3 times as much, so J should go up by 0.006. So this comes from the fact that dJ/du = 3. And if you check the math in detail, you will find that if b becomes 3.001, then u becomes 6.002, and v becomes 11.002; that's a + u, so that's 5 + 6.002. And then J, which is equal to 3 times v, ends up being equal to 33.006. And so that's how you get that dJ/db = 6. And to fill that in, if we go backwards, this is db = 6, and db really is the Python code variable name for dJ/db.
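Here is the same chain-rule step for b in code (an illustrative sketch, not from the lecture):

```python
# dJ/db = dJ/du * du/db, where u = b * c so du/db = c = 2.
a, b, c = 5, 3, 2
dJ_du = 3           # computed in the previous backward step
du_db = c           # since u = b * c
db = dJ_du * du_db  # 6

# Direct check: bump b by 0.001 and J = 3 * (a + b*c) rises by about 0.006.
eps = 0.001
J = 3 * (a + b * c)                 # 33
J_bumped = 3 * (a + (b + eps) * c)  # 33.006
```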

And I won't go through the last example in great detail, but it turns out that if you also compute dJ/dc, this turns out to be dJ/du times du/dc, and du/dc is equal to b, which is 3. So this turns out to be 3 times 3, which is 9. So through this last step, it is possible to derive that dc is equal to 9.

Â 13:20

So the key takeaway from this video, from this example, is that when computing all of these derivatives, the most efficient way to do so is through a right-to-left computation, following the direction of the red arrows. In particular, we first compute the derivative with respect to v. Then that becomes useful for computing the derivative with respect to a and the derivative with respect to u. And then those derivatives, for example this term over here and this term over here, in turn become useful for computing the derivative with respect to b and the derivative with respect to c.
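Putting it all together, the whole forward and backward pass can be sketched as follows (illustrative code using the dvar naming convention; not code from the lecture):

```python
# Graph: u = b*c, v = a + u, J = 3*v, with the example's input values.
a, b, c = 5, 3, 2

# Forward (left-to-right) pass: compute the cost J.
u = b * c  # 6
v = a + u  # 11
J = 3 * v  # 33

# Backward (right-to-left) pass: each dvar means dJ/dvar and reuses
# the derivative computed one step to its right.
dv = 3       # J = 3v
da = dv * 1  # v = a + u, so dv/da = 1; da = 3
du = dv * 1  # dv/du = 1; du = 3
db = du * c  # u = b*c, so du/db = c; db = 6
dc = du * b  # du/dc = b; dc = 9
```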

So that was the computation graph: a forward, or left-to-right, calculation to compute the cost function, such as J, that you might want to optimize, and a backward, or right-to-left, calculation to compute the derivatives. If you're not familiar with calculus or the chain rule, I know some of those details may have gone by really quickly. But if you didn't follow all the details, don't worry about it. In the next video, we'll go over this again in the context of logistic regression, and show you exactly what you need to do to implement the computations needed to compute the derivatives of the logistic regression model.