[MUSIC] Now I want to introduce multiple regression. Multiple regression is normally the subject of a semester-long course, and you might at some point, if you pursue graduate training, take such a course. But I can try to give you some flavor of what it is right now. Multiple regression involves relating a Y variable, an outcome variable, to multiple X variables rather than just one. In the examples so far, you were always looking at one variable as a function of a single other variable. But imagine a scenario where we're actually interested in controlling for, or accounting for, the influence of multiple variables on some outcome we're interested in. A very good example is income. We've talked a lot about the relationship between income and education, but we know that a lot of other factors influence income as well: age, years of work experience, sex, ethnicity, college major, and all sorts of other things. You can think of all of these as potential X variables, all things that we think probably influence our Y variable, income. Multiple regression is a tool for looking at the influence of all of these variables simultaneously on some outcome variable. So suppose we model income as a function of years of education, age, and years of work experience, and we decide that X1 is years of education. Then the coefficient that we get for X1 is the average change in income associated with a one-unit change in our measure of education, that is, with increasing the total number of years of education by one. Importantly, this assumes that none of the other variables are changing; in this example, years of work experience and age are locked down. So in the multiple regression context, the coefficient that we get for the variable X1 measures an association, an average change in the outcome, among people whose education changes by one year but whose age and years of experience remain unchanged.
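To make the idea concrete, here is a minimal sketch of the income example in Python using ordinary least squares via numpy. All the numbers are made up for illustration; the variable names (education, age, experience) and the "true" dollar effects are assumptions, not figures from the lecture.

```python
import numpy as np

# Simulate made-up data for income as a function of education, age,
# and years of work experience (all values are illustrative only).
rng = np.random.default_rng(0)
n = 500
education = rng.integers(10, 21, size=n).astype(float)   # years of schooling
age = rng.integers(22, 65, size=n).astype(float)
experience = rng.integers(0, 30, size=n).astype(float)

# Assumed "true" model: each extra year of education adds $3,000,
# each year of age $100, each year of experience $800, plus noise.
income = (5_000 + 3_000 * education + 100 * age + 800 * experience
          + rng.normal(0, 2_000, size=n))

# Design matrix with an intercept column; fit by least squares.
X = np.column_stack([np.ones(n), education, age, experience])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)

# beta[1] is the average change in income associated with one more
# year of education, holding age and experience fixed.
print(beta)
```

The fitted `beta[1]` lands near the 3,000 built into the simulation, which is exactly the "holding the other variables constant" interpretation described above.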
So you could think about a comparison between different people who are the same age and have been in the workforce for the same number of years, but who differ in terms of education. Then the coefficient for another variable, say age, which we'll call X2, reflects the effect of a one-year change in age on the outcome variable Y, again holding constant, or holding equal, the values of the other variables. So the coefficient we get for X2 represents the average change in income when we compare people whose ages differ but who have identical education and identical years of work experience. Because of its ability to handle multiple variables at the same time and measure their partial effects on an outcome, multiple regression is a common tool for trying to account for the problems with lurking or omitted variables that we talked about in previous lectures. If we're trying to isolate the effect of education on income, we might try to single out all of the possible things that on the one hand might be associated with income and which might also be associated with education, and then introduce them as additional X variables to control for them, so that the association we observe between education and income is among people with varying levels of education who are identical on the other variables in the model. For decades, this was the mainstay of a lot of quantitative analysis of social data: trying to identify causal associations by controlling for all of the possible confounding, or omitted, variables. Now, of course, for this to be successful you actually need to be able to measure all of those omitted variables, and that's not always possible. We can always speculate about some other variable that might be out there that no survey has yet measured, or which is impossible to measure. So if you go on to advanced studies, you'll learn about more advanced forms of regression that account for unobserved variables.
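The omitted-variable problem can also be sketched in a few lines. In this made-up simulation, "ability" stands in for any lurking variable that is correlated with both education and income; it is an assumption for illustration, not a claim about real data. The short regression that omits it overstates the education effect, while adding it as a control recovers the value built into the simulation.

```python
import numpy as np

# Made-up data: "ability" raises both education and income.
rng = np.random.default_rng(1)
n = 2_000
ability = rng.normal(0, 1, size=n)
education = 12 + 2 * ability + rng.normal(0, 1, size=n)
income = (5_000 + 3_000 * education + 10_000 * ability
          + rng.normal(0, 2_000, size=n))

def ols(columns, y):
    """Least-squares fit with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

short = ols([education], income)           # omits ability
long = ols([education, ability], income)   # controls for ability

# The short regression's education slope absorbs part of ability's
# effect; the long regression's slope is close to the true 3,000.
print(short[1], long[1])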
These are variables that we can't observe at all, but where people in our study might share some characteristics; these are fixed- or random-effects models. You'll also learn about other extensions to regression for particular cases, like the situation where, instead of trying to predict the behavior of an outcome variable that is continuous, like income, we want to predict which category some outcome falls into. So, going beyond the tabulations that we talked about in the previous module, there are in fact regression-based approaches for modeling which category people may end up in, as a function of both categorical and continuous variables on the right-hand side. There's a lot to learn if you want to make use of regression. I hope this has given you at least a little bit of a taste of what a regression coefficient actually means and what a correlation coefficient actually means, enough to help you a bit as you read papers and so forth where people talk about regression coefficients. But you'll have to take additional courses if you really want to learn how to use these techniques properly.
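One common regression-based approach to a categorical outcome is logistic regression. The lecture doesn't name a specific method, so this is just one illustrative possibility: a sketch, on made-up data, where the binary outcome is whether income exceeds some threshold, modeled from education and experience, fit by Newton's method.

```python
import numpy as np

# Made-up data: a binary "high income" outcome driven by education
# and experience through a logistic model (coefficients are assumed).
rng = np.random.default_rng(2)
n = 2_000
education = rng.normal(14, 2, size=n)
experience = rng.normal(10, 5, size=n)
true_logit = -10 + 0.6 * education + 0.1 * experience
high_income = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit logistic regression by Newton's method (a few iterations suffice).
X = np.column_stack([np.ones(n), education, experience])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
    grad = X.T @ (high_income - p)
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hessian, grad)

print(beta)  # slopes should be near the assumed 0.6 and 0.1
```

The coefficients here live on the log-odds scale rather than the dollar scale, which is one reason these extensions get their own treatment in later courses.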