[MUSIC] Many of the functions that we'd most like to differentiate are actually

compositions of two different functions. This happens in the real world, too. I

mean, look, if you change the number of flowers, that's going to affect say how

many rabbits there are around to, you know, eat those flowers. And if you change the number

of rabbits, that'll affect how many wolves that forest can support. There are

some really concrete examples of this. Here's a concrete example. Suppose that f

of x is the number of widgets produced with an investment of x dollars, right?

With more money, maybe, you can build more widgets. Suppose g of n is the

income that you get by selling those n widgets. What you're probably really

interested in is not exactly how many widgets you produce. What you'd like to

know is, for a given investment, how much money are you going to make, right? Well,

that's g of f of x minus your initial investment, right? g of f of x is how much

money comes in when you sell the widgets that you produced with your initial

investment of x dollars, right? This quantity is measuring the profit on an

investment of x dollars in widget production. We need some framework, some

general picture that lets us understand how one thing changing affects something

else and how that thing's changing goes on to affect something else. Specifically, if

I've got some function h which is a composition of two functions, g of f of x

in this case, I'd like to know something about the derivative of h. I want to know

how changing x affects f and then how changing f goes on to affect g. And I'd

like some sort of formula that gives me that answer, right? I'd like to know the

derivative of h in terms of information about how changing x affects f and how

changing the input to g affects g. I want a formula for the derivative of h in terms

of the derivative of f and the derivative of g. This is exactly what the chain rule

does. What the chain rule says is that the derivative of the composition is the

derivative of g evaluated at f of x times the derivative of f evaluated at x.
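
Written out in symbols, the statement above looks like this. The second line uses Leibniz notation, where the names u = f(x) and y = g(u) are introduced here only for illustration and do not appear in the lecture.

```latex
h(x) = g(f(x)) \quad\Longrightarrow\quad h'(x) = g'(f(x)) \cdot f'(x)

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}, \qquad u = f(x), \quad y = g(u)
```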

Sometimes, people have the idea that the chain rule looks somehow wrong, that you'd

really expect the formula to look very different. I mean, sometimes people think

this formula looks a little bit weird, you know? I'm composing functions, but now

it's the derivative of g composed with just the function f. What's going on? Given

the fact that the derivative of a sum is the sum of the

derivatives, you might be tempted to think that the derivative of a composition

should be the composition of derivatives, but that's not the case. But the chain

rule really is capturing what happens when you chain together these changes. So let's

think about this chain rule, the derivative of g of f of x is g prime of f of

x times f prime of x, in terms of chaining together different changes. What I'm trying to

calculate is how changing x changes g of f of x, right? This is the derivative of the

composition. What do I know? Well, I know how changing x will change f of x, right?

This is what the derivative of f is measuring, right? The derivative is the

ratio of output change to input change. Now, in between here, what I have is the

change in f of x will change g of f of x in some way. This ratio of changes is

really the derivative of g at the point f of x. What is the derivative? You plug

in an input to the derivative to ask how wiggling that input would affect the

output, and that's exactly what this ratio is. I'm asking how changing f of x will

affect g of f of x, right? That's the derivative of g at the point that's

wiggling, f of x. Well, if you think about it, now, if I

just multiply these two things together, then I get the change in g of f of x

divided by the change in x. This is the chain rule, right? If I multiply together

g prime of f of x and f prime of x, what I'm left with is exactly what I want, the

derivative of g of f of x. You can see this pictorially as well. So here, I've

drawn three number lines. On the first number line, I've drawn x and I imagine x

is the input to f. And on the second number line, I've drawn f of x and f of x

is now the input to g. And on the last number line, I've drawn g of f of x. The

essential question answered by the derivative is how changing x will affect g

of f of x. But since this is a composition of functions, I'm going to analyze the

effect of changing x on g of f of x in stages, right? I'm first going to see how

changing x affects f of x, and then how changing f of x affects g of f of x. So let's

imagine that I change x by a small quantity. I'm calling that small quantity

h here. This h is not a function, just some small number, the amount by which I'm

wiggling the input. Now, how is the output affected? Well, that's exactly what the

derivative measures. Right? The derivative of f at x tells me how wiggling the input

x would affect the output. So f prime of x, which is the ratio of output change to

input change times an actual input change gives me a first order approximation of

the output change. So I imagine the output is changing by about f prime of x times h.
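
Here is a minimal sketch of that first-order approximation in Python. It assumes f(x) = 1 + x^3, borrowed from the numerical example later in this lecture, and a wiggle of h = 0.0001 at x = 1; the variable names are made up for illustration.

```python
# Assumption: f is borrowed from the numerical example later in the lecture.
def f(x):
    return 1 + x**3


def f_prime(x):
    # Derivative of 1 + x^3 by the power rule.
    return 3 * x**2


x = 1.0
h = 0.0001  # the small amount by which the input is wiggled

# How much the output of f really changes when the input moves from x to x + h.
exact_change = f(x + h) - f(x)

# First-order approximation: ratio of output change to input change, times h.
estimated_change = f_prime(x) * h

print("exact change:    ", exact_change)
print("estimated change:", estimated_change)
```

The two numbers should agree closely but not exactly, since f prime of x times h only captures the change to first order.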

Now, how does that change in value of f of x affect g? Well, I have to figure out how

wiggling the input to g will affect the output of g and that depends on where I'm

calculating the derivative. I need to calculate the derivative of g at the point

f of x, because, f of x is the point that's doing the wiggling. So, it's the

derivative of g at the point f of x that tells me how wiggling the input around f

of x would affect the output of g. So it's that derivative times the amount by which

the input changed, which is this quantity here, f prime of x times h. And when you

look at it this way, you can see that for an input change to x of some small amount

h, the output changes by about g prime of f of x times f prime of x times as much, which is

exactly what the chain rule is telling me should be the case. So this is the

correct rule: the chain rule really is the derivative of the outside at the

inside times the derivative of the inside function. Let's try to see a numerical example of

this thing in action. So as a numerical example let's consider the function g of x

equals x to the 4th power and the function f of x equals 1 plus x to the 3rd power.

And maybe what I want to try to estimate is g of f of 1.0001, and now,

approximately what is that equal to? Well, it's not too hard to calculate g of f of 1,

right? What's f of 1? Well, that's 1 plus 1 cubed, well, that's 2. So what's g of 2?

Well, that's 2 to the 4th, well, that's 16. So I know that g of f of 1.0001 is

going to be close to 16. The question is, how is wiggling the input up to 1.0001

going to affect the output of this composition of functions? Well, I could do

it in stages, right? That's what the chain rule's telling me to do. So I could

calculate first the derivative of f at 1. Right? And the derivative of f is

3x squared, so the derivative of f at 1 is 3. And indeed, if I calculate f of 1.0001,

that's about 2.0003 and a bit more. Now, I want to try to calculate how changing the

input to g will affect the output of g. So I should calculate the derivative of g and

that's 4x cubed by the power rule, but where should I evaluate the derivative of

g? Your first temptation is to calculate the derivative of g at 1, but that is not

a good idea, because you're not wiggling the input 1 to g. What you really

should be calculating is the derivative of g at 2, because it's this 2 that's going

to be wiggling. When you wiggle the input to f, it's the output of f, f of 1, that's

going to be changing, so you should calculate the derivative of g there and

what is that? That's 4 times 2 cubed, that's 4 times 8, that's 32.
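
One way to convince yourself that 2 is the right place to evaluate the derivative of g is to compare both choices against a numerically estimated slope of the composition. The sketch below uses the lecture's f and g; the step size and the central difference quotient are choices made here for illustration, not something from the lecture.

```python
def f(x):
    return 1 + x**3


def g(x):
    return x**4


def f_prime(x):
    return 3 * x**2  # power rule


def g_prime(x):
    return 4 * x**3  # power rule


def composite(x):
    return g(f(x))


x = 1.0
step = 1e-6  # assumed small step for the numerical slope

# Central difference quotient: a numerical estimate of the slope of g(f(x)) at x = 1.
numeric_slope = (composite(x + step) - composite(x - step)) / (2 * step)

# Derivative of g evaluated at f(1) = 2, where the wiggling actually happens.
right = g_prime(f(x)) * f_prime(x)

# Tempting but incorrect: derivative of g evaluated at 1.
wrong = g_prime(x) * f_prime(x)

print("numeric slope:        ", numeric_slope)
print("g'(f(1)) * f'(1) =    ", right)
print("g'(1) * f'(1) (wrong):", wrong)
```

Only the product that evaluates the derivative of g at f of 1 should land near the numerical slope.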

So what we're trying to calculate is g of f of 1.0001 and we know that that's about

g of, well, what's f of 1.0001? It's about 2.0003. So what happens when I wiggle the

input of g from 2 to 2.0003? Well, that should be about the output of g at 2 which

is 16 plus how much I change the input by, times the derivative of g at the point

where the wiggling is happening, which is 2 and that's 32. And what's 16 plus 0.0003

times 32, that's 16.0096. So g of f of 1.0001 is about 16.0096. And you can see

this 96 just from the chain rule, right? The relevant thing to calculate is g prime

of f of 1 times f prime of 1, right? This is going to tell me how wiggling the input

1 affects the output and g prime of f of 1 is 32, f prime of 1 is 3, and 32 times 3

is 96. So, that's the chain rule and it's going to take some time for the chain rule

to really sink in. But the chain rule is super important for two very different

reasons. On the one hand, you've gotta know the chain rule just to be able to compute

derivatives. A lot of the functions that you'll be asked to differentiate are

actually compositions of differentiable functions, so you'll need to use the chain

rule to finish those derivative calculations. But on the other hand,

you've gotta know the chain rule just to understand how chained together changes

work. In the real world, a lot of things change, and those changing things affect

other things, and those changing things then go on to affect yet other things. And

you've gotta understand how those changes get composed together, in order to

really understand how the real world works.
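
As a quick check on the numerical example from this lecture, the sketch below compares the chain rule estimate of g of f of 1.0001 with the value computed directly. The functions are the lecture's; the variable names are made up here.

```python
def f(x):
    return 1 + x**3


def g(x):
    return x**4


x = 1.0001

# Direct computation of the composition at the wiggled input.
exact = g(f(x))

# Chain rule estimate: g(f(1)) plus g'(f(1)) * f'(1) times the input change,
# that is 16 + (32 * 3) * 0.0001.
estimate = 16 + 32 * 3 * 0.0001

print("exact:   ", exact)
print("estimate:", estimate)
print("error:   ", exact - estimate)
```

The estimate is not exact, because the chain rule only describes the change to first order, but the gap should be far smaller than the 0.0096 change it predicts.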