Before we can do regression, we need to know about the correlation coefficient. Remember that the scatterplot is very useful for visualizing the relationship between two quantitative variables. The first thing we can read off is the direction. For example, the left scatterplot, which shows education and income, slopes upward. That is also true for the scatterplot on the right, which shows the heights of fathers and their sons.

However, these two relationships differ in form. If we look at the relationship between income and education, the scatter follows an upward-sloping curve. On the other hand, if we look at the heights, the scatter roughly follows a line. Finally, the last thing we can get out of a scatterplot is the strength of the relationship. That means: how closely do the points follow the form? In the example of the heights, the scatter is quite wide around the line, whereas in the example of the incomes, it clusters more closely around the curve.

In the case where the scatter clusters around a line, it's very useful to summarize that clustering with the correlation coefficient r. It's not worth memorizing the formula for r, but let's look at what's going on there. You see that we take the standardized values of the x's and the standardized values of the y's, multiply them together pair by pair, and then take the average over all the observations. Now, this product is positive if x is above x-bar and y is above y-bar, and it is also positive if x is below x-bar and y is below y-bar. So the contribution of an observation is positive if x and y deviate from their averages in the same direction, and negative if they deviate in opposite directions. So the idea is that the correlation coefficient should be positive if the scatter slopes upward and negative if it slopes downward.
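
To make the recipe for r concrete, here is a minimal Python sketch of the "standardize, multiply, average" idea. Two things here are assumptions, not taken from the lecture: the function name `correlation` and the small data sets are made up for illustration, and the standardization uses the population standard deviation (dividing by n), which is what makes the plain average of the products come out as r.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Average of the products of standardized x and y values.

    Assumption: standardize with the population SD (divide by n),
    so that a plain average of the products gives r.
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    products = [
        ((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return fmean(products)

# Made-up illustrative data, not from the lecture.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # scatter slopes upward
y_down = [6, 4, 5, 4, 2]  # scatter slopes downward

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
print(r_up > 0, r_down < 0)
```

In the upward-sloping data, most points have x and y on the same side of their averages, so the products are mostly positive and r comes out positive. In the downward-sloping data, the deviations go in opposite directions and r comes out negative.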