Hi there. I'm David Dye and I'll be

taking you through the last few modules of this course.

In this module, we'll start to use the calculus we've done and

put it together with vectors in order to start solving equations.

In this first video, we'll look at a nice simple case where we just need to find the gradient, or the

derivative in order to solve an equation using what's called the Newton-Raphson method.

Now, say we've got that distribution of heights again with a mean,

an average, mu and width sigma,

and we want to fit an equation to that distribution.

So that, after we've fitted it,

we don't have to bother about carrying around all the data points;

we just have a model with two parameters: a mean and a width.

And we can do everything using just the model.

And that would be loads faster and simpler,

and it would let us make predictions and so on.

So, it would be much, much nicer,

but how do we find the right parameters for the model?

How do we find the best mu and sigma we can?

What we're going to do is, we're going to find some expression for

how well the model fits the data,

and then look at how that goodness of fit varies,

as the fitting parameters mu and sigma vary.

So, we're trying to solve an equation where

the fitting parameters are variables in the equation.

But in order to get there,

in the next module actually,

we're first going to need to do a bit more calculus.

So first, let's look at the equation of a curve, say,

here y equals x cubed minus 2x plus two.

If we differentiate this equation,

we get the quadratic 3 x squared minus two,

and that quadratic will have two solutions and therefore two turning points will exist.

One a maximum, one a minimum just as we see here.

Now, say that I don't know what the equation looks like,

I'm blind and I haven't got

enough computer resources to graph out the values at every point.

Or more likely, in reality, say the function exists in

so many dimensions that I can't visualise it at all.

But say I only need to find the solution to the equation where y equals nought.

So, where x cubed minus 2 x plus two is equal to 0.

We can see there's actually only one solution here in this graph,

but there could be more depending on

the exact form of the equation that we're trying to solve.

Now, say that I have an idea that I want to hunt for solutions

somewhere near some initial guess, the red dot here.

For instance, the constant is pretty small and positive, so I might guess

I need a slightly negative value of x; say, minus two may be somewhere near a solution.

Now, if I can evaluate the value of the function at my guess of x equals minus two,

I find the function has a value of minus two.

And if I ask what the gradient is at that value of x equals minus two,

I find that the gradient is positive, and it's 10.

Now, I can extrapolate the gradient down to its intercept with the x axis,

which will be my next guess at the solution of the equation.

So, I solve for where that extrapolated line crosses the x axis.

So, I can use that value of x at the intercept as

a new estimate for what the solution to the equation is.

Effectively, I'm guessing that the function is a straight line,

and then I'm using the gradient to extrapolate to find the solution.

It isn't really a straight line of course,

but the first order approximation would be that it's a straight line,

and we'll use that to update our guess and then go again and evaluate.

So, I can write down an equation for my new guess, x_(i+1),

based on my previous guess, x_i: the new guess is

my previous guess minus the value of the function divided by its gradient,

that is, x_(i+1) = x_i - f(x_i) / f'(x_i).

So, let's see how it plays out.

We can make a table, starting with our initial guess at i equals nought;

then we can evaluate the function and its gradient there, extrapolate to the intercept,

and use that to generate a new guess.

In this case, minus two, minus,

minus two divided by 10 gives us minus two plus 0.2,

which is minus 1.8.

Then we can evaluate

the result for that guess and find that it is just a little less than 0,

minus 0.23, and the gradient is 7.7.

So, we've gone from being out by two on y to being out by just 0.23.

So, in some sense, we've got like 10 times better in our estimate just in our first go.

If we carry on,

then we get the next guess, x2, as minus 1.77,

and that's just 0.005 away from the axis, it's really close.

Then, if we have another go,

after just three iterations,

we get an answer of x equals minus 1.769,

which is just 2.3 times 10 to the minus six from the axis.

So, in just three iterations,

we pretty much solved the problem,

which is pretty cool.
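The whole iteration can be sketched in a few lines of code (a minimal sketch in Python; the names f and df are just illustrative, not part of the course material):

```python
def f(x):
    # The function we want a root of: y = x^3 - 2x + 2
    return x**3 - 2*x + 2

def df(x):
    # Its derivative: dy/dx = 3x^2 - 2
    return 3*x**2 - 2

x = -2.0  # initial guess, i = 0
for i in range(3):
    x = x - f(x) / df(x)  # Newton-Raphson update
    print(f"i = {i + 1}: x = {x:.6f}, f(x) = {f(x):.2e}")
```

Running this reproduces the table above: the estimate goes from minus 2 to minus 1.8 to minus 1.77 to minus 1.769, with f(x) shrinking to around 2.3 times 10 to the minus 6 after just three iterations.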

This method is called the Newton-Raphson method,

and it's really pretty neat.

To solve an equation all we need to be able to do is evaluate it and differentiate it.

We don't need to graph and visualize it everywhere,

calculating it lots and lots of times,

we don't need to be able to solve it algebraically either.

And if we have lots of dimensions to a dataset, say,

and a big multi-variable function we're trying to fit to that data,

it's going to be much too expensive to try and solve it analytically,

or to plot it out for all the possible values of the variables.

This sort of method where we try a solution and evaluate it,

and then generate a new guess,

and then evaluate that, and again,

and again, and again is called iteration.

And it's a very fundamental computational approach.

Now, there are some things that can go wrong sometimes with this method.

So, let's briefly look at those.

Say I started off with a guess of x equals zero,

which evaluates to y equals two.

When I find the gradient for that and extrapolate it,

that takes me away from a solution to the other side of the turning point.

It gives me a new guess that x equals one.

When I evaluate that,

then I get a value for y at x equals one of one.

When I find the gradient and extrapolate back,

then my new estimate lands me back at x equals zero,

which is where I began. So, I have a problem.

I seem to have magically landed in a closed loop where

my estimate just cycles back between x equals nought and x equals 1.

I never get close.

I never even go anywhere near to the solution of x equals minus 1.769.
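This two-cycle is easy to reproduce in code (the same minimal Python sketch as before, with illustrative names f and df):

```python
def f(x):
    return x**3 - 2*x + 2

def df(x):
    return 3*x**2 - 2

x = 0.0  # the unlucky starting guess
for i in range(6):
    x = x - f(x) / df(x)  # Newton-Raphson update
    print(x)  # prints 1.0, 0.0, 1.0, 0.0, ... forever
```

The estimate never escapes the loop between nought and one, so it never approaches the actual root near minus 1.769.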

There's another problem, which is that if I'm close to a turning point,

at the bottom here, a minimum or a maximum,

then because my gradient will be very small,

when I divide by the gradient in the Newton-Raphson equation here,

my next estimate will tend to zap off to some crazy value,

and therefore it won't converge, you see; it will dive off somewhere.
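We can see that failure mode directly too (another illustrative Python sketch; for this function the turning points sit at x equals plus or minus the square root of two thirds, roughly 0.816, where the gradient is nearly zero):

```python
def f(x):
    return x**3 - 2*x + 2

def df(x):
    return 3*x**2 - 2

x = 0.81  # a guess very close to the turning point
print(df(x))  # the gradient here is tiny, about -0.0317

x_new = x - f(x) / df(x)  # Newton-Raphson update
print(x_new)  # the next estimate zaps off to roughly 29.6
```

Dividing by a near-zero gradient throws the next estimate far away from the region we were searching in.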

Those are the problems. So, that's the Newton-Raphson method.

We iterate towards a solution to an equation by,

each time, making a new estimate from the previous one,

using the gradient to extrapolate towards the solution,

then going again, and again, and again.

Most of the time,

this works really well as a means to step towards a solution.

So, what we've done in this video is look at a method for using

just the gradient to step our way towards solving a problem.

This method is called the Newton-Raphson method.

It's a really powerful way to solve an equation just by

evaluating it and its gradient a few times.

It's as if you're standing on a hill in the fog,

and you can know your height,

and you can know locally what's going on around you,

you can know how steep the hill is.

But you can't see the whole landscape around you; you don't know what it looks like.

You don't know how to get down the mountain if you like,

down to a nice safe place that doesn't have cliffs.

So, what you do is,

you guess based on how steep a hill is locally around you which way to go.

You want to go down to sea level so you go down the hill.

Then you take that step blindfolded,

and when you get there, you ask again what height you're at and how steep it is,

and then you keep making more steps down

the hill until either something goes wrong and

you need to go back the other way, or you get home to where you wanted to be.

The point is, you don't need to know what the landscape looks like,

the function, you just need an altimeter,

the value of the function,

and to be able to feel with your toe what the gradient is like locally around you.

What we'll do in the next video is,

look at how to apply this where we've got multiple variables not just x.

And that will involve finding the gradient vector,

which tells us how to go down a hill on a contour plot.