When you implement back propagation you'll find that there's a test called

gradient checking that can really help you make sure

that your implementation of back prop is correct.

Because sometimes you write all these equations and you're just not 100% sure if

you've got all the details right in doing back propagation.

So in order to build up to gradient checking,

let's first talk about how to numerically approximate computations of gradients and

in the next video, we'll talk about how you can implement

gradient checking to make sure the implementation of backprop is correct.

So let's take the function f and replot it here, and remember this is

f of theta equals theta cubed, and let's again start off at some value of theta.

Let's say theta equals 1.

Now instead of just nudging theta to the right to get theta plus epsilon,

we're going to nudge it to the right and

nudge it to the left to get theta minus epsilon, as well as theta plus epsilon.

So this is 1, this is 1.01, this is 0.99 where, again,

epsilon is the same as before, it is 0.01.

It turns out that rather than taking this little triangle and

computing the height over the width, you can get a much better estimate of

the gradient if you take this point, f of theta minus epsilon, and this point, f of theta plus epsilon,

and you instead compute the height over width of this bigger triangle.

So for technical reasons which I won't go into, the height over width of this bigger

green triangle gives you a much better approximation to the derivative at theta.

And you can see it for yourself: rather than taking just this little triangle in the upper right,

it's as if you have two triangles, right?

This one on the upper right and this one on the lower left.

And you're kind of taking both of them into account

by using this bigger green triangle.

So rather than a one-sided difference, you're taking a two-sided difference.
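For readers curious about the technical reasons skipped over above, here is a short sketch using Taylor expansions. It is not part of the lecture; it assumes f is smooth enough near theta (several continuous derivatives), and the notation f', f'', f''' for derivatives is introduced here only for illustration.

```latex
\begin{align*}
f(\theta+\varepsilon) &= f(\theta) + \varepsilon f'(\theta) + \tfrac{\varepsilon^2}{2} f''(\theta) + \tfrac{\varepsilon^3}{6} f'''(\theta) + O(\varepsilon^4) \\
f(\theta-\varepsilon) &= f(\theta) - \varepsilon f'(\theta) + \tfrac{\varepsilon^2}{2} f''(\theta) - \tfrac{\varepsilon^3}{6} f'''(\theta) + O(\varepsilon^4) \\[4pt]
\text{one-sided:}\quad \frac{f(\theta+\varepsilon) - f(\theta)}{\varepsilon} &= f'(\theta) + \tfrac{\varepsilon}{2} f''(\theta) + O(\varepsilon^2) \\
\text{two-sided:}\quad \frac{f(\theta+\varepsilon) - f(\theta-\varepsilon)}{2\varepsilon} &= f'(\theta) + \tfrac{\varepsilon^2}{6} f'''(\theta) + O(\varepsilon^3)
\end{align*}
```

In the two-sided version the f'' terms cancel, so the leading error shrinks like epsilon squared rather than like epsilon, which is one way to see why the bigger triangle tends to give the better estimate when epsilon is small.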

So let's work out the math.

This point here is f of theta plus epsilon.

This point here is f of theta minus epsilon.

So the height of this big green triangle is f of theta plus epsilon

minus f of theta minus epsilon.

And then the width, this is 1 epsilon, this is 2 epsilon.

So the width of this green triangle is 2 epsilon.

So the height over the width is going to be, first, the height, so

that's f of theta plus epsilon minus f of theta minus epsilon, divided by the width.

So that was 2 epsilon, which we write down here.
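The height-over-width calculation just described can be sketched in a few lines of Python. This is only an illustration under the lecture's running example (f of theta equals theta cubed, theta equals 1, epsilon equals 0.01); the function names `one_sided_difference` and `two_sided_difference` are made up here and do not come from the course materials.

```python
def f(theta):
    """The lecture's example function, f(theta) = theta^3."""
    return theta ** 3


def one_sided_difference(func, theta, epsilon):
    # Height over width of the small triangle: nudge theta to the right only.
    return (func(theta + epsilon) - func(theta)) / epsilon


def two_sided_difference(func, theta, epsilon):
    # Height over width of the bigger triangle, whose width is 2 * epsilon.
    return (func(theta + epsilon) - func(theta - epsilon)) / (2 * epsilon)


theta, epsilon = 1.0, 0.01
exact = 3 * theta ** 2  # analytic derivative of theta^3, used only for comparison

one_sided = one_sided_difference(f, theta, epsilon)
two_sided = two_sided_difference(f, theta, epsilon)

print("one-sided estimate:", one_sided, "error:", abs(one_sided - exact))
print("two-sided estimate:", two_sided, "error:", abs(two_sided - exact))
```

With these values the one-sided estimate comes out at roughly 3.0301 and the two-sided estimate at roughly 3.0001, against an exact derivative of 3, up to floating-point rounding.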