Hi there, I'm David Dye, and I'll be taking you through the last few modules of this course. In this module we'll start to use the calculus we've done and put it together with vectors in order to start solving equations. In this first video we'll look at a nice simple case where we just need to find the gradient, the derivative, in order to solve an equation using what's called the Newton-Raphson method.

Now say we've got that distribution of heights again, with a mean (an average) mu and a width sigma, and we want to fit an equation to that distribution so that, once we've fitted it, we don't have to bother about carrying around all the data points. We'd just have a model with two parameters, a mean and a width, and we could do everything using just the model. That would be loads faster and simpler, and would let us make predictions and so on, so it'd be much, much nicer. But how do we find the right parameters for the model? How do we find the best mu and sigma we can? Well, what we're going to do is find some expression for how well the model fits the data, and then look at how that goodness of fit varies as the fitting parameters mu and sigma vary. So we're trying to solve an equation where the fitting parameters are the variables. But in order to get there in the next module, we're first going to need to do a bit more calculus.

So first let's look at the equation of a line: here, y = x³ - 2x + 2.
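To get a rough feel for this cubic without plotting it, you can tabulate a few values; a quick Python sketch (not from the video, the function name is my own):

```python
def f(x):
    """The cubic from the video: y = x^3 - 2x + 2."""
    return x**3 - 2*x + 2

# Sample a few integer points to see the shape of the curve.
for x in range(-3, 3):
    print(x, f(x))

# f changes sign between x = -2 (f = -2) and x = -1 (f = 3),
# so the single real root lies somewhere in that interval.
```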

If we differentiate this equation we get the quadratic 3x² - 2. That quadratic has two solutions, and therefore two turning points exist, one a maximum and one a minimum, just as we see here.

Now say that I don't know what the equation looks like: I'm blind, and I haven't got enough computing resources to graph out the values at every point. Or, more likely in reality, say the function exists in so many dimensions that I can't visualize it at all. But say I only need to find the solution to the equation where y equals zero, so where x³ - 2x + 2 = 0. We can see that there's actually only one solution here on this graph, but there could be more, depending on the exact form of the equation I was trying to solve.

Now say that I have an idea that I want to hunt for solutions somewhere near some initial guess, at the red dot here, for instance. The constant is pretty small and positive, so my guess is that I need a slightly negative value of x: minus 2 might be somewhere near a solution. If I evaluate the function at my guess of x = -2, I find that it has a value of -2. And if I ask what the gradient is at x = -2, I find that the gradient is positive: it's 10. Now I can extrapolate along the gradient to where that line intersects the x axis, which would be my first guess at the solution of the equation I was trying to solve. So I can use the value of x at that intercept as a new estimate for the solution. Effectively, I'm guessing that the function is a straight line, and then I'm using the gradient to extrapolate to find the solution. It isn't really a straight line, of course, but that first-order approximation is that it's a straight line, and we'll use it to update our guess, then go again and evaluate. So I can write down an equation for my new guess x_{i+1} based on my previous guess x_i: it's my original guess minus the value of the function divided by its gradient,

    x_{i+1} = x_i - f(x_i) / f'(x_i).

So let's see how it plays out.
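The update rule can be sketched in a few lines of Python (the function names here are my own, not from the video):

```python
def f(x):
    """The cubic from the video: y = x^3 - 2x + 2."""
    return x**3 - 2*x + 2

def df(x):
    """Its derivative: 3x^2 - 2."""
    return 3*x**2 - 2

def newton_raphson(x, steps):
    """Repeatedly apply x <- x - f(x)/f'(x), printing each new guess."""
    for i in range(steps):
        x = x - f(x) / df(x)
        print(f"x_{i+1} = {x:.4f}, f = {f(x):.2e}")
    return x

root = newton_raphson(-2.0, 3)  # start from the guess x = -2
```

Starting from x = -2, this prints the same sequence as the table that follows: -1.8, then about -1.77, then about -1.7693, with f shrinking towards zero at each step.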

We can make a table, starting with our initial guess at i = 0. We find the gradient and the intercept, and then use those to generate a new guess: in this case, -2 - (-2)/10 gives us -2 + 0.2, which is -1.8. Then we can evaluate the result for that guess and find that it's just a little less than zero, -0.232, and the gradient is 7.72. So we've gone from being out by 2 on y to being out by just 0.232: in some sense we've got about ten times better just in our first go. If we carry on, the next guess, x₂, is -1.77, and that's just 0.005 away from the axis; it's really close. Then, if we have another go, after just three iterations we get an answer of x = -1.769, which is just 2.3 × 10⁻⁶ from the axis. So in just three iterations we've pretty much solved the problem, which is pretty cool.

This method is called the Newton-Raphson method, and it's really pretty neat: to solve an equation, all we need to be able to do is evaluate it and differentiate it. We don't need to graph and visualize it everywhere, calculating it lots and lots of times. We don't need to be able to solve it algebraically either, which, if we have lots of dimensions to a dataset, say, and a big multi-variable function we're trying to fit to that data, would be much too expensive, whether solving it analytically or plotting it out for all the possible values of the variables. This sort of method, where we try a solution, evaluate it, generate a new guess, and then evaluate that, again and again and again, is called iteration, and it's a very fundamental computational approach.

Now, there are some things that can go wrong sometimes with this method, so let's briefly look at those. Say I started off with a guess of x = 0, which evaluates to y = 2.

When I find the gradient there and extrapolate, it takes me away from the solution, to the other side of the turning point: it gives me a new guess of x = 1. When I evaluate that, I get a value for y at x = 1 of 1, and when I find the gradient and extrapolate back, my new estimate lands me back at x = 0, just where I began. So I have a problem: I seem to have magically landed in a closed loop, where my estimates just cycle between x = 0 and x = 1, and I never get anywhere near the solution at x = -1.769.

There's another problem, which is that if I'm close to a turning point, a minimum or a maximum like the one at the bottom here, then because my gradient will be very small, when I divide by the gradient in the Newton-Raphson equation my next estimate will take me zapping off to some crazy value. The iteration won't converge; it'll just dive off somewhere.
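Both failure modes can be reproduced with the same update rule; a minimal Python sketch (helper names are my own):

```python
def f(x):
    return x**3 - 2*x + 2

def df(x):
    return 3*x**2 - 2

def step(x):
    """One Newton-Raphson update: x - f(x)/f'(x)."""
    return x - f(x) / df(x)

# Closed loop: starting at x = 0 the estimates cycle 0 -> 1 -> 0 forever.
print(step(0.0))   # 1.0  (0 - 2/(-2))
print(step(1.0))   # 0.0  (1 - 1/1)

# Near a turning point the gradient is tiny, so dividing by it flings
# the next estimate far away (the minimum is at x = sqrt(2/3) ≈ 0.816).
print(step(0.82))  # a large negative value, nowhere near the solution
```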

Those are the problems. So that's the Newton-Raphson method: we iterate towards a solution to an equation by, each time, making a new estimate of the solution, using the gradient to extrapolate towards it, then going again and again and again. Most of the time this works really well as a means to step towards the solution.

So what we've done in this video is look at a method for using just the gradient to step our way towards solving a problem. This method is called the Newton-Raphson method, and it's a really powerful way to solve an equation just by evaluating it and its gradient a few times. It's as if you're standing on a hill in the fog: you know your height, and you know locally what's going on around you, how steep the hill is, but you can't see the whole landscape. You don't know what it looks like, and you don't know how to get down the mountain, if you like, down to a nice safe place that doesn't have cliffs. So what you do is you guess, based on how steep the hill is locally around you, which way to go to get down to sea level, and you go down the hill. You take that step blindfolded, and when you get there you ask again what height you're at and how steep it is, and then you keep making more steps down the hill until either something goes wrong and you need to go back the other way, or you get home to where you wanted to be. The point is, you don't need to know what the landscape, the function, looks like; you just need an altimeter, the value of the function, and to be able to feel with your toe what the gradient is like locally around you.

What we'll do in the next video is look at how to apply this where we've got multiple variables, not just x. That will involve finding the gradient vector: how to go down a hill on a contour plot.