0:00

I often want to differentiate an inverse function. Say I've got a function f. The derivative of f encodes how wiggling the input affects the output. The derivative of the inverse function would encode how changes to the output affect the input.

Here's a theorem that I can use to handle this situation: the inverse function theorem. I'm going to suppose that f is some differentiable function, that f prime, the derivative, is continuous, and that the derivative at some point a is nonzero. In that case, I get the following fantastic conclusion. The inverse function at y is defined for values of y near f of a, so the function f is invertible near a. The inverse function is differentiable for inputs near f of a, and that derivative is continuous for inputs near f of a. And I've even got a formula for the derivative: the derivative of the inverse function at y is 1 over the derivative of the original function, evaluated at the inverse function of y.

How can I justify a result like that? Why should something like that be true? One way to think about this is geometrically.
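Before the picture, it may help to see the theorem stated above written out in symbols. This is only a transcription of the spoken statement; the heading and the phrase "near f(a)" are kept informal, as in the lecture, and are not a precise formulation.

```latex
\textbf{Inverse function theorem (one variable, as stated in the lecture).}
Suppose $f$ is differentiable, $f'$ is continuous, and $f'(a) \neq 0$.
Then $f^{-1}(y)$ is defined for $y$ near $f(a)$, the function $f^{-1}$ is
differentiable near $f(a)$ with continuous derivative, and
\[
  (f^{-1})'(y) = \frac{1}{f'\!\left(f^{-1}(y)\right)} .
\]
```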

Here, I've drawn the graph of just some made-up function, y equals f of x. What does the graph of the inverse function look like? Well, one way to think about this is that the inverse function exchanges the roles of the x- and y-axes, which is the same as just flipping the paper over. What was the y-axis is now the x-axis, and what was the x-axis is now the y-axis. And this graph here is y equals f inverse of x. This is how you graph the inverse function.
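The flipping described above can be tried out on a list of points. This is a minimal sketch, not the function drawn in the lecture: the function f below is a hypothetical stand-in, chosen only because it is increasing and therefore invertible.

```python
# Hypothetical example function (an assumption for illustration);
# any strictly increasing function would work the same way.
def f(x):
    return x**3 + x

# Sample points (x, f(x)) on the graph of y = f(x).
xs = [i / 10 for i in range(-20, 21)]
graph_f = [(x, f(x)) for x in xs]

# Exchanging the roles of the axes swaps the two coordinates of each point.
# The swapped points lie on the graph of y = f_inverse(x).
graph_f_inverse = [(y, x) for (x, y) in graph_f]

# Each swapped point (u, v) satisfies f(v) == u, which is what
# "v is f inverse of u" means.
for u, v in graph_f_inverse[:3]:
    print(u, v, f(v) == u)
```

Plotting graph_f and graph_f_inverse on the same axes would show the two curves as mirror images across the line y = x.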

So, let's go back to the original function. If I put down a tangent line to the curve at some point, let's say that tangent line has slope m. Well, what's the slope of the tangent line to the inverse function? That would be the derivative of the inverse function. If I flip over the graph again to look at the graph of the inverse function, I can put down a tangent line to the inverse function, and that has slope 1 over m. If m was the slope of the tangent line to the original function, 1 over m is the slope of the tangent line to the inverse function. Why 1 over m? Well, that makes sense because I got this graph by exchanging the roles of the x- and y-axes, by flipping the paper over. And that exchanges rise for run, and run for rise. So, the slope becomes the reciprocal of the old slope.

This slope business is reflected in the notation dy dx. So, let's suppose that y is f of x, so x is f inverse of y, supposing that this is an invertible function. If y is f of x, then f prime of x could be written dy dx. And if x is f inverse of y, then the derivative of the inverse function at y is asking how changing y changes x, so I could write that as dx dy. Well, if you really take this notation seriously, what it looks like it's saying is that dx dy, which is the derivative of the inverse function, should be 1 over dy dx, right? The derivative of the inverse function is 1 over the derivative of the original function.

But you have to think about where these derivatives are being computed. So, maybe you believe that dx dy is 1 over dy dx; it makes sense that if you exchange the roles of x and y, that takes the reciprocal of the slope of the line. But where is this wiggling happening? dy dx is measuring how wiggling x affects y. Wiggling around where? Well, let's suppose that I'm wiggling around a. So, I'm really calculating dy dx when x is at a.

3:59

This is the quantity that records how wiggling x near a will affect y. Well then, where's y wiggling? If x is wiggling around a, y is wiggling around f of a. So, the derivative on this side, dx dy, is really being calculated at y equals f of a. And it's really necessary to keep track of where this wiggling is happening in order to get a valid formula. It's actually easier to think about what's going on if we just phrase all of this in terms of the chain rule. So, what do I know about the inverse function? Well, here's f inverse.

f of f inverse of x is just x. What does the inverse function do? Whatever you plug into the inverse function, it outputs whatever you need to plug into f to get out the thing you plugged into the inverse function. So, this is true. Now, if I differentiate both sides, assuming that f and f inverse are differentiable, then by the chain rule, what do I get? Well, the derivative of this composition is the derivative of the outside at the inside, times the derivative of the inside. And that's equal to the derivative of the other side, and the derivative of x is just 1. Now, I'll divide both sides by f prime of f inverse of x, and I get that the derivative of the inverse function at x is 1 over f prime of f inverse of x.

Is that a proof? Absolutely not. The embarrassing truth is that this argument assumes the differentiability of the inverse function. The chain rule requires that the functions be differentiable. If f inverse is differentiable, then the chain rule can be applied to it, and this chain rule calculation tells me that the derivative of the inverse function is this quantity. But that's all predicated on knowing that the inverse function is differentiable. How do we know that? Well, that's actually the content of this theorem. The content of the inverse function theorem is not really the calculation of the derivative of the inverse function. It's really the fact that the inverse function is differentiable at all. That is a huge deal, and it's not something that we can just get from the chain rule. Once we know that the inverse function is differentiable, then the chain rule gives us this calculation. But actually verifying that the inverse function is differentiable is really quite deep. That's why the inverse function theorem is such a big deal. The chain rule requires that the functions I'm applying it to be differentiable. In contrast, the inverse function theorem is asserting the differentiability of the inverse function. It's saying much more than just a computation of the derivative if the derivative exists. It's actually telling me that the derivative exists.

I'm going to have to punt on saying much more about the proof of the inverse function theorem. But nevertheless, we can now apply the inverse function theorem to some concrete examples. For example, think about the function f of x equals x squared.

Well, what's the inverse function to this? Let's suppose the domain is just the nonnegative real numbers. Then the function is invertible on that domain, and we know the name of the inverse: it's the square root of x. What's the derivative of the original function? Well, we know that it's 2x, and the derivative is continuous, and the derivative is not 0 provided that x is positive. This is all the stuff that we need to apply the inverse function theorem. Then we know that the derivative of the inverse function at x is 1 over the original derivative at the inverse of x. Now, the inverse function is the square root of x, so that's 1 over f prime of the square root of x. And what's f prime? f prime is the function that doubles its input. So, that's 1 over 2 times the square root of x. So, the derivative of the inverse function, the derivative of the square root function, is 1 over 2 times the square root of x, provided x is bigger than 0, right?
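Written out, the chain rule computation and its use on the square root look roughly like this. The second line assumes that the inverse function is differentiable, which is exactly what the inverse function theorem supplies; the layout is just one way to organize the steps spoken above.

```latex
\begin{align*}
  f\!\left(f^{-1}(x)\right) &= x \\
  f'\!\left(f^{-1}(x)\right)\,(f^{-1})'(x) &= 1
    && \text{(chain rule, assuming $f^{-1}$ is differentiable)} \\
  (f^{-1})'(x) &= \frac{1}{f'\!\left(f^{-1}(x)\right)}
\end{align*}
With $f(x) = x^2$ on $x \ge 0$, so that $f^{-1}(x) = \sqrt{x}$ and $f'(x) = 2x$:
\[
  \frac{d}{dx}\sqrt{x}
  = \frac{1}{f'\!\left(\sqrt{x}\right)}
  = \frac{1}{2\sqrt{x}},
  \qquad x > 0 .
\]
```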

Just like before, this is a calculation of the derivative of the square root function. We can also see this numerically. So, the square root of 10,000 is 100, and you might ask, what do you have to take the square root of to get about 100.1? Say, as some numeric example. Well, think now about the functions that are involved here. There's the squaring function and the square root function. We saw that the derivative of the square root function is 1 over 2 times the square root of x, and the derivative of x squared, we already know, is 2x. Where are we evaluating these derivatives? Well, I'm evaluating the derivative of the square root function at x equals 10,000. And if I evaluate that at 10,000, that's 1 over 2 times the square root of 10,000, which is 1 over 200. Where am I evaluating the other derivative, that of the x squared function? Well, there I'm really thinking of 100 as the input, so I'll evaluate that derivative at 100, and 2x when x is 100 is 200. And it's not too surprising that 1 over 200 and 200 are reciprocals of each other, because I'm calculating derivatives of a function and its inverse function at the appropriate places.

Now, let's try to answer the original question. I'm trying to figure out what I have to take the square root of to get about 100.1. Well, the ratio here is about 200 between the change in the input and the change in the output. So, if I want the output to change by 0.1, I should try to change the input by about 200 times as much, and 200 times 0.1 is 20. So I should try to change the input by about 20, and sure enough, if you take the square root of 10,020, that's awfully close to 100.1.

I hope that you'll play around with these numbers. All the conceptual stuff that we're doing, these theorems, I'm not telling you these theorems to make numbers boring, right? I'm telling you all these theorems to heighten your appreciation of the numerical examples.
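One way to play around with these numbers is a few lines of Python. This is a small sketch of the estimate above, not anything shown in the lecture; the function and variable names are made up for illustration.

```python
import math

def f(x):
    """The squaring function, restricted to x >= 0."""
    return x * x

def f_prime(x):
    """Derivative of the squaring function."""
    return 2 * x

def f_inverse(y):
    """Inverse of f on the nonnegative reals: the square root."""
    return math.sqrt(y)

def f_inverse_prime(y):
    """Derivative of the inverse via 1 / f'(f_inverse(y)); needs y > 0."""
    return 1 / f_prime(f_inverse(y))

y0 = 10_000                               # where the square root is known
slope_original = f_prime(f_inverse(y0))   # derivative of x^2 at x = 100
slope_inverse = f_inverse_prime(y0)       # derivative of sqrt at 10,000

wanted_output_change = 0.1                # go from 100 to about 100.1
estimated_input_change = wanted_output_change * slope_original
estimated_input = y0 + estimated_input_change

print("slopes:", slope_original, slope_inverse)
print("estimated input:", estimated_input)
print("its square root:", math.sqrt(estimated_input))
```

Changing y0 or wanted_output_change shows how the estimate gets worse as the wanted change gets larger, since the derivative only describes small wiggles.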
