0:00

Welcome to calculus. I'm Professor Ghrist, and we're about to begin Lecture 14, Bonus Material. Well, let's consider a more involved example, this one motivated by a problem in statistics.

Let's say that you run an experiment and it gives you some data of the following form: y equals m times x. That is, you measure x values and y values, and you know that there's some linear relationship between them, but you don't know the value of m.

Perhaps this is coming from a physical experiment, like trying to measure a spring constant by measuring force and deflection. Whatever the physical motivation, you are given some collection of data points, but these data points are noisy. They don't fit perfectly on the line.

How do you determine the appropriate value of m? Well, you could just draw a line, try to make it fit, and see if it looks right. But wouldn't it be nice to have a more principled approach?

This is what statistics is meant for. So, let's assume that the inputs to your problem are a collection of paired data points: x values, x sub i, and y values, y sub i. Now, in order to find the appropriate value of m, we're going to write this as an optimization problem.

The method of least squares is a wonderful technique for determining the optimal m. Consider the function s, depending on m, that is given by the following. I'm going to look at the vertical distance between the data points and the line of slope m. This vertical distance is given by y sub i minus m times x sub i. What I'm going to want to do is add up all of those distances and then minimize.

Now, there's a bit of a problem in that these distances are signed. They're positive or negative, because I'm really just looking at the change in y values. So let's square that term: we have y sub i minus m x sub i, quantity squared. And now let's sum all of those terms up over i. This is going to give you a deviation of the data from the line of slope m.

This function depends on m. If we chose a value of m like 0, well, that would give a very large value of s. In this case, what we want to do is find the value of m that minimizes this deviation s. So let's proceed.
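As a concrete sketch of this deviation, here it is in code. Python is just an illustration choice, and the data values are fabricated noisy samples near the line y equals 2x:

```python
# A sketch of the deviation s(m) for the model y = m*x.
# These data are fabricated noisy samples near the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def s(m):
    # Sum of squared vertical distances from each data point to the line y = m*x.
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

print(s(0.0))  # a poor slope leaves a large deviation
print(s(2.0))  # a slope near the truth leaves a small one
```

Evaluating s at a few slopes makes the point of the lecture concrete: a bad guess like m equals 0 gives a large deviation, and slopes near the true one give small deviations.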

If we compute the derivative of s with respect to m, what would we get? This looks scary, but it's not so bad. Differentiation is linear, so we can pass the derivative inside the summation sign. Now, using the chain rule, what do we get? Well, we get twice the quantity y sub i minus m times x sub i, times the derivative of that quantity with respect to m. That derivative is negative x sub i.

Now, if we distribute this multiplication and expand out into two sums, we get minus 2 times the sum over i of x sub i times y sub i, plus 2 times m times the sum over i of x sub i squared. We can factor out that 2 and that m because they appear in every summation term.
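That expansion can be sanity-checked numerically: the formula minus 2 times the sum of x sub i y sub i, plus 2m times the sum of x sub i squared, should match a finite-difference estimate of the derivative of s. A sketch with fabricated data:

```python
# Fabricated noisy samples near the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def s(m):
    # Deviation: sum of squared vertical distances to the line y = m*x.
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

def ds_dm(m):
    # The expanded derivative: -2*sum(x_i*y_i) + 2*m*sum(x_i^2).
    return -2 * sum(x * y for x, y in zip(xs, ys)) + 2 * m * sum(x * x for x in xs)

# Central finite-difference estimate of the derivative at m = 1.5.
h = 1e-6
approx = (s(1.5 + h) - s(1.5 - h)) / (2 * h)
print(ds_dm(1.5), approx)  # the two values should agree closely
```

The agreement between the closed formula and the numerical estimate is a quick check that no sign or factor was lost in the expansion.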

Now, our goal is to compute the minimum, so we find the critical point by setting this derivative equal to zero. Moving one sum over to the other side, we see that 2 times the sum over i of x sub i y sub i is equal to 2m times the sum over i of x sub i squared. What is it that we're trying to solve for? We're trying to solve for m, and so, cancelling the 2s and then dividing both sides by the sum of x sub i squared, we get a value of m equal to the sum over i of x sub i times y sub i, divided by the sum over i of x sub i squared.

The question remains: is this critical point a local min or a local max? Well, you might guess that it's a local min, but how would you show it for sure? If we compute the second derivative of s with respect to m, what will we get? It looks complicated, but there's really only one m in that first derivative. And so, treating everything else as a constant, we get that the second derivative is simply 2 times the sum over i of x sub i squared. What do we note about that?
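Before answering that, here is the derivation assembled as a quick numerical sketch (Python, with fabricated data near the line y equals 2x):

```python
# Fabricated noisy samples near the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def s(m):
    # Deviation: sum of squared vertical distances to the line y = m*x.
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

# Critical point from setting ds/dm = 0:  m = sum(x_i*y_i) / sum(x_i^2).
m_best = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Second derivative of s: 2 * sum(x_i^2), a constant independent of m.
second_derivative = 2 * sum(x * x for x in xs)

print(m_best)             # comes out near 2 for this fabricated data
print(second_derivative)  # positive, so the critical point is a minimum
```

For these made-up points the best slope comes out near 2, the value the data were fabricated around.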

Well, we don't care what the x sub i values are: when we square them, we get something non-negative. So as long as at least one of the x sub i is nonzero, we get a positive second derivative, and hence a minimum. This value of m is going to minimize our deviation and give us a best-fit line.

Now, what happens if our experiment is a little bit different, and the line that we're looking for doesn't necessarily pass through the origin? Well, it doesn't seem as though the problem has really changed much at all. We're just again looking for a straight line. But now we have to worry about not only the slope, but also the y-intercept, which we might call b. We're looking for a line of the form y equals m x plus b.

I wonder, could we do the same thing? Well, the vertical distance would involve a b term in this s function. And now, this function would depend not only on m but on b. And this leads us to some very interesting questions, because we do not know how to find a max or min of a function that depends on more than one input.

This is really a problem that you're going to come back to in multivariable calculus. When you have a function with several inputs, how do you do optimization? Well, I've got to tell you, some unusual things can happen in that context. But those unusual situations wind up opening a whole new world of interesting questions and applications. For example, game theory deals with optimization of multivariable functions. Linear programming, machine learning: all of these fascinating subjects are deeply concerned with optimization, with finding maxima, minima, and other types of critical points. There are some wonderful fields out there that will rely on the intuition that we've learned in single-variable calculus.
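To make that two-input setup concrete, here is the deviation function for a line y equals m x plus b, sketched in code with made-up data. We can write down and evaluate s of m and b even though minimizing it must wait for multivariable methods:

```python
# Fabricated noisy samples near the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

def s(m, b):
    # Deviation for the line y = m*x + b: the vertical distance now
    # picks up a b term, so s depends on two inputs.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# We can't yet minimize over both inputs with single-variable calculus,
# but we can still evaluate: a good (m, b) gives a small deviation.
print(s(2.0, 1.0))  # small
print(s(2.0, 0.0))  # larger: ignoring the intercept hurts the fit
```

Finding the pair m, b that minimizes this function is exactly the multivariable optimization problem the lecture defers.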
