Now that we've explored the relationship between functions and their gradient,

we should be ready to lay out the more formal definition of a derivative.

All we're going to do,

is translate the understanding about gradients that we saw in the previous video,

into some mathematical notation that we can write down.

Unfortunately, this is the bit that a lot of people seem to find

uncomfortable, so I will do my best to make it as painless as possible.

We talked in the last video about horizontal lines having a gradient of zero,

whilst upward and downward sloping lines

have positive and negative gradients respectively.

We can write down a definition of this concept by

taking the example of a linear function,

which has the same gradient everywhere.

If we start by picking any two points,

say here and here,

we can then say that the gradient of this line is equal to

the amount that the function increases over this interval,

divided by the length of the interval that we're considering.

This description is often just condensed to the expression, rise over run,

where rise is the increase in the vertical direction,

and run is the distance along the horizontal axis.

So, rise here and the run down here.

If our function was sloping down,

and we pick points at the same horizontal locations,

then our run would be the same,

but our rise would now be negative.

So, our gradient equals

rise divided by run.
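As a minimal sketch of the rise over run idea, the snippet below computes the gradient between two points on a straight line. The two example lines and the helper name `rise_over_run` are hypothetical, chosen only to illustrate that a downward slope gives a negative rise for the same run.

```python
def rise_over_run(f, x1, x2):
    """Gradient of the straight line through (x1, f(x1)) and (x2, f(x2))."""
    rise = f(x2) - f(x1)   # increase in the vertical direction
    run = x2 - x1          # distance along the horizontal axis
    return rise / run


def upward(x):
    # Hypothetical upward-sloping line, used only for illustration.
    return 2 * x + 1


def downward(x):
    # Hypothetical downward-sloping line with the same steepness.
    return -2 * x + 1


# Same horizontal locations for both lines, so the run is identical.
up_gradient = rise_over_run(upward, 1.0, 4.0)
down_gradient = rise_over_run(downward, 1.0, 4.0)
print(up_gradient, down_gradient)
```

Because both lines are sampled at the same two horizontal positions, only the sign of the rise differs between them.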

Fairly straightforward so far;

but how does it relate to the more complicated functions we saw previously,

where the gradient is different at every point?

Now, the rise over run type gradient varies depending on where we choose our points.

Let's pick a single point where we'd like to know the gradient,

which we'll say is at point x on the horizontal axis.

The value of our function at this point is therefore clearly just f of x.

We're going to use the same logic as before.

We now need to pick a second point to draw our rise over run triangle.

We can call the horizontal distance between our two points, delta x,

where as usual, a delta is being used to express a small change in something.

Which means our second point must be at position x plus delta x.

We can also write down the vertical position of our second point as

the function f evaluated at our new location x plus delta x, i.e.

f of x plus delta x.

We can now build an expression for the approximate gradient at our point x,

based on the rise over run gradient between point x,

and any second point remembering that the run will be our distance delta x,

and our rise is just the difference in height of the two points.
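Written out in symbols, the rise over run between the two points gives the approximation below. This is a reconstruction from the narration rather than a copy of the on-screen expression, with the run written as \Delta x and the rise as the difference in the two heights.

```latex
\[
  \text{gradient at } x \;\approx\; \frac{\text{rise}}{\text{run}}
  \;=\; \frac{f(x + \Delta x) - f(x)}{\Delta x}
\]
```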

The last step in this process is simply to notice that

for nice smooth continuous functions like the one we're showing here,

as delta x gets smaller,

the gradient of the line connecting the two points becomes a better and better approximation

of the actual gradient at our point x.
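To see this shrinking-interval behaviour numerically, here is a small sketch. The function x squared, the point x = 1, and the helper name `approx_gradient` are all assumptions made for illustration, and the exact gradient of 2 at that point is simply taken as known for the comparison.

```python
def approx_gradient(f, x, delta_x):
    """Rise over run between the points at x and x + delta_x."""
    return (f(x + delta_x) - f(x)) / delta_x


def f(x):
    # Hypothetical smooth function, chosen only for illustration.
    return x ** 2


x = 1.0
exact = 2.0  # gradient of x squared at x = 1, assumed known for comparison

errors = []
for delta_x in [1.0, 0.1, 0.01, 0.001]:
    estimate = approx_gradient(f, x, delta_x)
    errors.append(abs(estimate - exact))
    print(delta_x, estimate)
```

Each printed estimate should sit closer to 2 than the one before it, which is the behaviour the narration describes.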

We can express this concept formally,

by using limit notation,

which says that as delta x goes to zero,

our expression will give us a function for our gradient at any point we choose,

which we write as f dash of x or df by dx,

depending on which notation scheme you prefer.
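In symbols, the limit statement just described reads as follows. This is the standard way of writing it; the video's on-screen layout is not available here, so the exact arrangement is an assumption.

```latex
\[
  f'(x) \;=\; \frac{\mathrm{d}f}{\mathrm{d}x}
  \;=\; \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
\]
```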

This is a slightly strange concept, as we're not talking about delta x equals zero,

since dividing by zero is not defined,

but instead about when delta x is extremely close to zero.

This process of extreme rise over run,

is what differentiation is.

When you are asked to differentiate a function,

this literally just means,

substitute the function into this expression.

We could fill up several videos with

more robust interpretations of this infinitely small but non-zero delta x;

but for now, just don't worry about it too much.

We know more than enough to continue on our journey.

Let's now put our new derivative expression into practice and see if it works.

First, we should try this out on a linear function,

once again, as we know that the answer is just going to be a constant.

What is the gradient of f of x equals 3x plus 2?

We can immediately sub this straight in, and say okay,

f dash of x,

is going to equal the limit as delta x goes to zero of the following.

We're taking our function and subbing it in: 3 times the bracket x plus

delta x, plus 2, minus the bracket 3x plus 2,

all divided by delta x.

Okay? Now, we just work through the algebra.

So we can say the limit as delta x goes to 0 of,

and so we can expand these brackets out.

So we're going to get 3 x plus 3 delta x,

plus 2 minus 3 x minus 2,

all divided by delta x.

Okay. Now, we can look at this and say some of the terms are going to cancel.

This 3 x here goes with this 3 x here,

and this plus 2 here goes with this minus 2 here.

So we can keep going with this line and say, well, the limit,

it's going to be 3 delta x over delta x,

and then working down here.

We can say, "Okay.

Well, the delta x's themselves are just going to cancel."

You end up with the limit, as delta x goes to zero, of just 3.

Now, this thing doesn't have a delta x in it.

Actually, the answer for this one is just the number three.

Because all of the delta x terms have disappeared in

this case, the limit expression has no effect,

so we can just ignore it.

The gradient of our linear function is just a constant as we expected.

In fact, I'm sure many of you would have known already,

that the gradient of an equation of the form f equals ax plus b, is just a.

It's reassuring to see this result emerging from our expression.
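For reference, the spoken algebra for this linear example can be written out line by line. This is a reconstruction of the steps narrated above, typeset on the assumption that the amsmath package is available.

```latex
% Requires amsmath for the aligned environment.
\[
\begin{aligned}
  f'(x) &= \lim_{\Delta x \to 0}
           \frac{3(x + \Delta x) + 2 - (3x + 2)}{\Delta x} \\
        &= \lim_{\Delta x \to 0}
           \frac{3x + 3\Delta x + 2 - 3x - 2}{\Delta x} \\
        &= \lim_{\Delta x \to 0} \frac{3\Delta x}{\Delta x} \\
        &= \lim_{\Delta x \to 0} 3 \\
        &= 3
\end{aligned}
\]
```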

Something else to take away from the simple example

is that we actually differentiated two things at once.

A 3 x bit, and a plus 2 bit.

We could have differentiated them separately,

and then added them together,

and still got the same result.

This interchangeability of the approach is called the sum rule, and it's pretty handy.
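A quick numerical sketch of this idea, using a small but non-zero delta x in place of the true limit. The helper `approx_gradient`, the step size, and the sample point are assumptions made for illustration; the check is approximate, not a proof.

```python
def approx_gradient(f, x, delta_x=1e-6):
    """Rise over run with a small, non-zero delta x (an approximation)."""
    return (f(x + delta_x) - f(x)) / delta_x


def whole(x):
    return 3 * x + 2      # the full function from the example


def first_part(x):
    return 3 * x          # the "3x bit"


def second_part(x):
    return 2.0            # the "plus 2 bit", a constant


x = 1.5  # arbitrary sample point
together = approx_gradient(whole, x)
separately = approx_gradient(first_part, x) + approx_gradient(second_part, x)
print(together, separately)
```

Both printed values should agree closely with each other and with the gradient of 3 found above.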

Let's now try a slightly more complicated example.

We'll take the function f of x equals 5 x squared.

All we're going to do, is take this thing and put it

into our differentiation expression at the top.

So, f dash of x

equals the limit as delta x goes to 0 of, well,

we've got 5 times the bracket x plus delta x,

all squared, minus 5 x squared,

all divided by delta x.

All we've got to do now is work through the algebra and see what comes

out of the other side, so it equals the limit.

Once again, as delta x goes to 0 of,

and let's expand this thing out.

We're going to think about

x plus delta x, all squared.

We're going to get x squared plus 2 x delta x,

plus delta x squared.

Which means we're going to get 5 x squared plus 10 x delta x,

plus 5 delta x squared, once again minus our 5 x squared here,

and all of that is divided by delta x.

And now what have we got on this top row here?

We've got a pair of 5 x squared.

So we can get rid of those because it's 5 x squared and minus 5 x squared over here.

Okay? And we can also see that we've got

a delta x in both the top terms and also in this bottom term here.

So we can get rid of that with this and this over here.

Okay? Let's write that line again,

so we can now say it's the limit as delta x goes to 0 of,

we've just got 10 x plus 5 delta x.

Okay? We're looking at this thing, so it's the limit as

delta x goes to zero of this expression here.

Now only the second term has got a delta x in it.

What's going to happen

as our delta x goes to zero

is that we can just forget about this term.

It's going to become extremely small.

So we can write that this is equal to just 10 x.

So the derivative of the expression 5x squared is just 10x.
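The same working can be laid out in symbols. This is a reconstruction of the narrated steps, again typeset on the assumption that the amsmath package is available.

```latex
% Requires amsmath for the aligned environment.
\[
\begin{aligned}
  f'(x) &= \lim_{\Delta x \to 0}
           \frac{5(x + \Delta x)^2 - 5x^2}{\Delta x} \\
        &= \lim_{\Delta x \to 0}
           \frac{5x^2 + 10x\,\Delta x + 5\,\Delta x^2 - 5x^2}{\Delta x} \\
        &= \lim_{\Delta x \to 0} \left( 10x + 5\,\Delta x \right) \\
        &= 10x
\end{aligned}
\]
```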

We can generalise the lesson from this example to

a rule for handling functions with powers of x.

For example, if we take the function f of x equals ax to the power of b,

and substitute it into our differentiation expression,

we will find that the derivative is always f dash of

x equals abx to the power of b minus 1.

The original power gets multiplied onto

the coefficient at the front, and then the new power is just one less than it was before.

This is known as the power rule,

and you can put this into your calculus toolbox along with the sum rule that we saw earlier.
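As a rough sanity check on the power rule, the sketch below compares it against rise over run with a small, non-zero delta x. The helper names, the step size, and the sample values of a, b and x are assumptions made for illustration; the agreement is approximate because delta x never actually reaches zero.

```python
def approx_gradient(f, x, delta_x=1e-6):
    """Rise over run with a small, non-zero delta x (an approximation)."""
    return (f(x + delta_x) - f(x)) / delta_x


def power_rule_gradient(a, b, x):
    """Power rule: the derivative of a * x**b is a * b * x**(b - 1)."""
    return a * b * x ** (b - 1)


def make_power_function(a, b):
    # Builds f(x) = a * x**b for the chosen coefficient and power.
    return lambda x: a * x ** b


# The worked example from above: f(x) = 5 x squared, sampled at x = 3.
f = make_power_function(5.0, 2.0)
numeric = approx_gradient(f, 3.0)
exact = power_rule_gradient(5.0, 2.0, 3.0)
print(numeric, exact)
```

The two printed numbers should match to several decimal places, with the power rule value being 10 times x at the sample point.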

You've now seen two examples in which we apply the limit of rise over run method,

to differentiate two simple functions.

As you can probably imagine,

if we wanted to differentiate a long complicated expression,

this process is going to become quite tedious.

What we are going to see in later videos are more rules,

like the sum rule and the power rule,

which will help us speed up the process.

However, before we do that,

we're just going to look at some fairly magical special case functions,

which differentiate in an interesting way.