So, how does regression work? Let's look at this scatterplot, which shows the age and the percent body fat for a number of people. The scatter appears roughly linear, so this is a case where we might use regression. The idea is that we want to summarize the scatter by a line. The line should look something like this one, and the question is: how do we get it?

Recall that the equation of a line looks like this: an intercept a, plus a slope b times the argument x. So if we plug in the ith x value, xi, the equation of the line gives us a point which we call yi hat. To find the line, we look for the values of a and b that minimize the difference between the point on the line, yi hat, and the observed value, yi. One way to do that is to take the difference between yi and yi hat, square it, and sum over all the observations. Then we find the a and b that minimize that sum. The minimization can be done either with calculus or simply with software. This whole idea is called the method of least squares.

It turns out that the solution involves the summary quantities we looked at before. The slope b equals the correlation coefficient times the ratio of the two standard deviations, that is, the standard deviation of y divided by the standard deviation of x. The intercept a equals the mean of y minus the slope times the mean of x. This line is called the regression line.

Now there's another interpretation of the regression line: it estimates the average value of y when the first coordinate is near x. Remember, the idea in statistics is that an average is often the best predictor. So by averaging the y values whose first coordinate is near x, we refine that idea to incorporate the information x gives us. And the idea is that this is a better predictor of y than simply taking the average of all the y's.
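
To make the least squares recipe concrete, here is a minimal Python sketch. It computes the slope as the correlation coefficient times the ratio of the standard deviations (y over x), and the intercept as the mean of y minus the slope times the mean of x. The function names `least_squares_line` and `sum_squared_errors` are hypothetical helpers, and the ages and body fat percentages are made-up numbers for illustration only, not the data from the scatterplot.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b * x fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Standard deviations. The same divisor (n) is used throughout, so it
    # cancels in the ratio sd_y / sd_x and in the correlation coefficient.
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    # Correlation coefficient r: average product of the deviations,
    # divided by the product of the standard deviations.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)

    b = r * sd_y / sd_x          # slope
    a = mean_y - b * mean_x      # intercept
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Sum over all observations of (y_i - y_hat_i)^2, the quantity being minimized."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (assumption): age in years, percent body fat.
ages = [23, 27, 39, 41, 45, 49, 50, 53, 54, 56]
body_fat = [9.5, 17.8, 31.4, 25.9, 27.4, 25.2, 31.1, 34.7, 42.0, 29.1]

a, b = least_squares_line(ages, body_fat)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"sum of squared errors = {sum_squared_errors(a, b, ages, body_fat):.3f}")

# Predicted body fat for a 40-year-old, using the fitted line.
print(f"prediction at age 40 = {a + b * 40:.2f}")
```

Nudging `a` or `b` away from the fitted values and calling `sum_squared_errors` again should give a larger sum, which is one quick way to check that the line really is the least squares line.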