0:00

[SOUND] So the first thing we need to do in this course on Linear Algebra is to get a handle on vectors, which will turn out to be really useful for us in solving those linear algebra problems, that is, the ones that are linear in their coefficients, such as most fitting problems. We're going to first step back and look in some detail at the sort of things we're trying to do with data, and why those vectors you first learned about in high school are even relevant. This will hopefully make all the work with vectors a lot more intuitive.

1:12

So we could fit it with some curve that has two parameters, mu and sigma. I would use an equation like this. I'd call it f(x), some function of x, where x is the height: f(x) = 1 over sigma root 2 pi, times the exponential of minus (x minus mu) squared divided by 2 sigma squared. So this equation only has two parameters, sigma here and mu, and it looks like this. It has an area here of 1, because it accounts for 100% of the people in the population as a whole. Now, don't worry too much about the form of the equation. This is called the normal or Gaussian distribution. It's got a center of mu and a width of sigma, and it's normalized so that its area is 1.
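As a quick sketch (in Python, with illustrative names), the equation just described can be written down directly:

```python
import math

def gaussian(x, mu, sigma):
    """Normal (Gaussian) density: centre mu, width sigma, total area 1."""
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma ** 2)
    )
```

The peak sits at x = mu, where the value is 1 over sigma root 2 pi, and the curve is symmetric about mu.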

2:27

Imagine that we had guessed that the width was wider than it really is, but keeping the area of 1. So we'd have guessed a fatter, and therefore a bit shorter, distribution, something like that, say. This one has a wider sigma, but it's got the same mu. It'd be too high at the edges here, and too low in the middle. So then we could add up the differences between all of our measurements and all of our estimates. We've got all of these places where we underestimate here, and all of these places where we overestimate here. And we could add up those differences, or, in fact, the squares of them, to get a measure of the goodness or badness of the fit. We'll look in detail at how we do that once we've done all the vectors work and all the calculus work. Then we could plot how that goodness varied as we changed the fitting parameters, sigma and mu, and we'd get a plot like this. So we'd have our best possible value for mu here, and our best possible value for the width, sigma, here.
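That sum-of-squared-differences measure of badness might look like this in Python, as a sketch (the helper names are invented, and the data would be our height measurements):

```python
import math

def gaussian(x, mu, sigma):
    # Normal curve with centre mu and width sigma, area 1.
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma ** 2)
    )

def badness(xs, ys, mu, sigma):
    """Sum of squared differences between the measured curve (xs, ys)
    and the Gaussian with parameters (mu, sigma): 0 is a perfect fit."""
    return sum((y - gaussian(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))
```

The smaller this number, the better the fit; a perfect fit gives nought.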

3:35

We could then plot, for a given value of mu and sigma, what the difference was. So if we were at the right value, we'd get a value of goodness where the sum of the squares of the differences was nought. And if we had mis-estimated mu, so that the distribution was shifted over, the width was right but we had some wrong value of mu there, then we'd get some value of the sum of the squares of the differences, the goodness, that was higher. And it might be the same if we went over to the other side and had some value there. And if we were too wide, we'd get something there, or too thin, something like that, say. So we'd get some other value of goodness. We could imagine plotting out all of the places where we have the same value of goodness or badness for different values of mu and sigma. And we could then do that for some other value of badness, and we might get a contour that looked like this, and another contour that looked like this, and so on and so forth. Now, say we don't want to compute the value of this goodness parameter for every possible mu and sigma. We just want to do it a few times, and then find our way to the best possible fit of all.
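One way to picture those contours is to evaluate the badness over a grid of (mu, sigma) guesses; the level sets of the resulting grid are exactly the contours of the goodness landscape. A sketch, with invented data and grid ranges:

```python
import math

def gaussian(x, mu, sigma):
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma ** 2)
    )

def badness(xs, ys, mu, sigma):
    return sum((y - gaussian(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))

# Invented measurements drawn exactly from a Gaussian with mu=1.7, sigma=0.2.
xs = [1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1]
ys = [gaussian(x, 1.7, 0.2) for x in xs]

# Evaluate the badness on a grid of (mu, sigma) guesses.
mus = [1.5 + 0.05 * i for i in range(9)]      # mu from 1.5 to 1.9
sigmas = [0.1 + 0.05 * j for j in range(7)]   # sigma from 0.1 to 0.4
grid = [[badness(xs, ys, m, s) for s in sigmas] for m in mus]
```

A contour plotting routine could then draw the level sets of `grid`; the bottom of the bowl sits at the true (mu, sigma).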

4:55

Say we started off here with some guess that had too big a mu and too small a width. We thought people were taller than they really are, and that they were more tightly packed in their heights than they really are. But what we could do is say, well, if I make a little move in mu and sigma, does it get better or worse? And if it gets better, well, we'll keep moving in that direction. So we could imagine making a vector of a change in mu and a change in sigma. We could have our original mu and sigma there, and a new value, mu prime, sigma prime, and ask if that gives us a better answer. Or mu prime, sigma prime might take us over here instead, and we could ask whether we were better or worse there, something like that.
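A minimal sketch of that trial-and-keep idea (all helper names, step sizes, and data below are invented for illustration): try a small move in each direction of (mu, sigma) space, and keep whichever move lowers the badness.

```python
import math

def gaussian(x, mu, sigma):
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma ** 2)
    )

def badness(xs, ys, mu, sigma):
    return sum((y - gaussian(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))

def fit_by_trial_moves(xs, ys, mu, sigma, step=0.1, iterations=200):
    """Repeatedly try a small move in each direction of (mu, sigma)
    parameter space, keeping whichever move lowers the badness."""
    for _ in range(iterations):
        best_mu, best_sigma = mu, sigma
        best_bad = badness(xs, ys, mu, sigma)
        for d_mu, d_sigma in [(step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)]:
            trial_mu, trial_sigma = mu + d_mu, sigma + d_sigma
            if trial_sigma <= 0.0:
                continue  # the width must stay positive
            trial_bad = badness(xs, ys, trial_mu, trial_sigma)
            if trial_bad < best_bad:
                best_mu, best_sigma, best_bad = trial_mu, trial_sigma, trial_bad
        mu, sigma = best_mu, best_sigma
    return mu, sigma
```

Each candidate move (d_mu, d_sigma) here is literally a little vector in the parameter space.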

5:44

Now actually, if we could find the steepest way down the hill, then we could go down this set of contours, this sort of landscape here, towards the minimum point, towards the point where we get the best possible fit. And what we're doing here, these are vectors, these are little moves around space. They're not moves around a physical space, they're moves around a parameter space, but it's the same thing. So if we understand vectors, and we understand how to get down hills, that sort of curviness of this value of goodness, which is calculus, then once we've got calculus and vectors, we'll be able to solve this sort of problem.
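A hedged sketch of that steepest-descent idea: estimate the gradient of the badness numerically (the calculus developed later in the course gives it exactly), and repeatedly step along the downhill vector (-dB/dmu, -dB/dsigma). The step size and data here are invented.

```python
import math

def gaussian(x, mu, sigma):
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(
        -((x - mu) ** 2) / (2.0 * sigma ** 2)
    )

def badness(xs, ys, mu, sigma):
    return sum((y - gaussian(x, mu, sigma)) ** 2 for x, y in zip(xs, ys))

def steepest_descent(xs, ys, mu, sigma, rate=0.01, steps=2000, h=1e-6):
    """Numerically estimate the gradient of the badness surface, then
    step along the steepest-descent vector (-dB/dmu, -dB/dsigma)."""
    for _ in range(steps):
        d_mu = (badness(xs, ys, mu + h, sigma)
                - badness(xs, ys, mu - h, sigma)) / (2.0 * h)
        d_sigma = (badness(xs, ys, mu, sigma + h)
                   - badness(xs, ys, mu, sigma - h)) / (2.0 * h)
        mu -= rate * d_mu
        sigma -= rate * d_sigma
    return mu, sigma
```

The update is again a little vector move in parameter space, this time pointed straight down the hill.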

So we can see that vectors don't have to be just geometric objects in physical space. They can describe directions along any sorts of axes. So we can think of vectors as just being lists.

If we thought of the space of all possible cars, for example: so here's a car, there's its back, there's its window, there's the front, something like that. We could write down in a vector all of the things about the car. We could write down its cost in euros, its emissions performance in grams of CO2 per 100 kilometers, its NOx performance, how much it polluted our city with harmful air pollution, its Euro NCAP star rating, how good it was in a crash, and its top speed. And we could write those all down in a list that was a vector. That'd be more of a computer science view of vectors, whereas the spatial view is more familiar from physics.
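As a computer-science-flavoured illustration (all the attribute names and figures below are invented), the list of numbers describing a car is a vector, and the difference between two cars is itself a vector in the same "car space":

```python
# Each car is a vector: [cost (euros), CO2 (g per 100 km), NOx (g/km),
# Euro NCAP stars, top speed (km/h)].  All the numbers are invented.
car_a = [22000.0, 950.0, 0.08, 5.0, 180.0]
car_b = [18500.0, 1100.0, 0.12, 4.0, 165.0]

# Vectors in this space add and subtract component by component,
# just like geometric vectors in physical space.
difference = [a - b for a, b in zip(car_a, car_b)]
```

The `difference` list is the vector that takes us from car B to car A in this five-dimensional space of cars.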

In my field, metallurgy, I could think of any alloy as being described by a vector that lists all of its possible components, the composition of that alloy. Einstein, when he conceived relativity, conceived of time as just being another dimension. So space-time is a four-dimensional space: three dimensions of metres, and one of time in seconds. And he wrote those down as a vector of x, y, z, and time, which he called space-time.

When we put it like that, it's not so crazy to think of the space of all the fitting parameters of a function, and then of vectors as being things that take us around that space. What we're trying to do then is find the location in that space where the badness is minimized, the goodness is maximized, and the function fits the data best. If the badness surface here were like a contour map of a landscape, we'd be trying to find the bottom of the hill, the lowest possible point in the landscape. So to do this well, we'll want to understand how to work with vectors, and then how to do calculus on those vectors, in order to find gradients in these contour maps, and minima, and all those sorts of things. Then we'll be able to go and do optimizations, enabling us to work with data and do machine learning and data science.

8:39

What we've seen is that the function we fit, whatever it is, has some parameters, and we can plot how the quality of the fit, the goodness of the fit, varies as we vary those parameters. Moves around this space of fitting parameters are then just vectors in that space. And therefore, we want to revisit vector maths in order to be able to build on that, and then do calculus, and then do machine learning. [SOUND]
