So, in this next set of lectures, we'll first give an overview of multiple regression techniques for estimation, adjustment, and basic prediction, and we'll then focus our efforts on a specific type of multiple regression, multiple linear regression, which will be followed in subsequent lecture sets by multiple logistic regression, et cetera. So, in this set of lectures, we will first develop a framework for multiple linear, logistic, and Cox proportional hazards regression in the first section. Then, in the remaining sections, we'll focus on multiple linear regression, which is a general framework for estimating the mean of a continuous outcome based on multiple predictors. So, we're basically going to extend the idea of simple regression to allow for more than one predictor in a single model. So, let's first give an overview of the class of techniques that fall under the umbrella of multiple regression. What we'll be able to do after looking at this lecture section is identify the group comparisons being made by a multiple regression slope, regardless of the outcome variable type: whether it's continuous and we're doing a linear regression, binary with logistic regression, or time-to-event with a Poisson or Cox proportional hazards regression. We're going to be able to appreciate that multiple regression allows for an outcome to be predicted by taking into account multiple predictors in one method. It also allows for the easy adjustment of a relationship of interest in the presence of potential confounding variables. We'll also realize that the approach to creating confidence intervals for multiple regression intercepts and slopes will hold no surprises; it's more of the same. So, regression provides a general framework for the estimation and testing procedures we've covered in the first term, and we showed that we could represent a lot of the techniques we saw as simple regressions in the first couple of lecture sets.
We could also extend what we've done in the first term via simple regression techniques by allowing for continuous predictors as well. Well, multiple regression continues this extension of the methods from term one, to also allow for multiple predictors of an outcome with a single method, and to allow for the estimation of adjusted associations via comparing simple and multiple regression models. So, as we saw in simple regression already, regression allows for predictors to be binary, categorical, or continuous, and that is also the case for multiple regression. The ability of the model to predict the function of the outcome it estimates can potentially be improved by using more than one predictor at a time. So, the basic structure of these multiple regression models is just an extension of what we set up as the basic structure for simple regression models: some function of an outcome, depending on what our outcome looks like, is equal to an intercept, plus a slope times a predictor x_1, plus a slope times a predictor x_2, et cetera. More mathematically, we can write this generically: some function of an outcome y given the xs, x_1 through x_p, equals an intercept plus a slope times each of the respective xs. So, x_1 through x_p represent the predictors of interest. Just to be clear, these p xs may represent up to p predictors. If each x were its own predictor, then there would be p predictors in the model, but we might have p xs that represent fewer than p predictors if one or some of the predictors are multi-categorical and hence require multiple xs to model them. As with simple regression, the left-hand side, the function of our outcome, depends on what variable type the outcome of interest is. So, for continuous measures, the function we estimate on the left-hand side will be the mean of the outcome y for a given set of xs, and the regression type is linear regression.
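As a brief aside, the generic structure just described can be sketched in code. This isn't part of the lecture's toolkit per se; it's a minimal illustration in Python, on simulated (made-up) data, of a mean modeled as an intercept plus a slope for each of several predictors, fit by ordinary least squares:

```python
import numpy as np

# Minimal sketch of mean(y | x1, x2) = b0 + b1*x1 + b2*x2, fit by
# ordinary least squares on simulated data (true coefficients 1.0, 2.0, -0.5).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)          # a continuous predictor
x2 = rng.integers(0, 2, size=n)  # a binary predictor
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of 1s for the intercept plus one column per x.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (b0, b1, b2), close to (1.0, 2.0, -0.5)
```

The same design-matrix idea carries over to logistic and Cox regression; only the left-hand side and the fitting machinery change.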
For binary outcomes, on the left-hand side, we start with one/zero variables and we turn them into proportions, which we turn into odds (well, the computer does), and then ultimately the function we model as a linear function of our predictors is the log odds of the outcome. The log odds is the log of p over one minus p, where p is the probability that the outcome occurs, that y equals one, and the regression type for this is multiple logistic regression. For time-to-event outcomes, we have two options. We can use Poisson regression when the individual event times and censoring times are not known (and certainly if they are known, we can also do Poisson regression), but when they are known we can also take the individual-level information into account without grouping it, which we'd have to do for Poisson, by using Cox proportional hazards regression. As with everything else we've done thus far in this course, we will only be able to estimate the regression equation from a sample of data, so to indicate that these are estimates, we put hats on the intercept and all the subsequent slopes. That ultimately means that we'll have to deal with the uncertainty in these estimates and do things like put confidence intervals on them and get p-values when interested. So, the right-hand side, the Beta naught hat plus Beta one hat times x_1, plus Beta two hat times x_2, et cetera, includes the predictors of interest, the xs of interest, x_1 through x_p. These can represent binary, categorical, or continuous predictors. Then each slope estimates the difference in the left-hand side, like we saw before, for a one-unit difference in the corresponding x. But now that we have multiple predictors in the model, it's adjusted for the other x variables, or other predictors, in the model. We'll drill down on this in detail with examples for each type of regression. What the intercept is going to estimate is the left-hand side when all of the xs are zero.
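To make the binary-outcome left-hand side concrete, here's a small sketch (illustrative only, not from the lecture) of the transformation from a probability to odds to log odds, and the logistic function that maps back:

```python
import math

# For a binary outcome: turn a probability p into odds p/(1-p), then
# take the log. The inverse (the logistic function) recovers p.
def log_odds(p):
    return math.log(p / (1.0 - p))

def inv_log_odds(lo):
    return 1.0 / (1.0 + math.exp(-lo))

print(log_odds(0.5))      # 0.0: a 50% probability means even odds
print(inv_log_odds(0.0))  # 0.5
```

Unlike a probability, which is stuck between 0 and 1, the log odds can be any number, which is what makes it suitable to model as an intercept plus slopes times the xs.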
So, let's just do a generic example to give a little more illustration of what I just defined generically. Suppose we estimate a multiple regression with three predictors in a study on intravenous drug users in four cities, and we have some outcome, which could be binary, continuous, or time-to-event, that we want to model as a function of our xs. The reason we have five xs with three predictors is because one of the predictors is multi-categorical. So, the first x is just an indicator of the biological sex of the participant: one for female, zero for male. The second predictor is nominal categorical. There are four cities that this study takes place in, and so we need three indicator variables. The four cities are Baltimore, London, Delhi, and Cape Town, so this is truly a global study. We'll make Baltimore the reference and we'll have indicators for the other three cities. Then our predictor x_5 is going to be the age of the participant measured in years. So, how would we generically interpret these slopes? Well, the first x is for sex, a one for female and a zero for male. So, just like we saw in simple linear regression, this slope is going to be the difference in the left-hand side for females compared to males, but in simple regression that's where we'd stop; we wouldn't qualify it any further. But now that we have other predictors in here, including the city the person is from and their age, this is a difference in the left-hand side for females compared to males adjusted for, or taking into account, those other two characteristics. So, we talked about adjusted estimates in the lecture set on confounding, how they were useful and what they meant conceptually, and now we're seeing a way to operationalize this pretty painlessly by using multiple regression techniques.
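Here's a hypothetical sketch of how the five xs in this example could be coded for a single participant. The function name and coding scheme are my own, chosen to match the setup described, with Baltimore as the reference city:

```python
# Coding the three predictors into five xs:
#   x1 = sex (1 female, 0 male), x2-x4 = city indicators with Baltimore
#   as the reference (all zeros), x5 = age in years.
CITIES = ["Baltimore", "London", "Delhi", "Cape Town"]

def encode(sex, city, age):
    # One indicator per non-reference city; Baltimore gets all zeros.
    x_city = [1 if city == c else 0 for c in CITIES[1:]]
    return [1 if sex == "female" else 0] + x_city + [age]

print(encode("female", "Delhi", 34))    # [1, 0, 1, 0, 34]
print(encode("male", "Baltimore", 50))  # [0, 0, 0, 0, 50]
```

Notice the four-level city predictor consumes three xs, which is why three predictors give five xs in total.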
Similarly, Beta five is the difference in the value of the left-hand side for subjects who differ by one year in age, adjusted for sex and the city where the participant is from. If we look at the slopes for the xs that make up the multi-categorical predictor of city, Beta two, Beta three, and Beta four are the respective differences in the left-hand side between London and Baltimore, Delhi and Baltimore, and Cape Town and Baltimore. These are now differences adjusted for sex and age of the participants. So, if there are different sex and age distributions across the different sites, there's the potential for confounding. Now we have adjusted estimates that we can compare to their unadjusted counterparts from a simple regression where city is the only predictor. How do we get confidence intervals and p-values for individual slopes and intercepts? Well, these will be generically computed as we saw before. We'll take our estimated intercept or slope and add or subtract two estimated standard errors, which of course will come from the computer. Hypothesis testing for any individual slope or intercept will be done the same way as always: we'll compute the standardized distance of our estimate from what we'd expect it to be under the null of no association, which will be zero. We convert that distance into standard errors, figure out how far our result is from the expected value of zero in standard errors, and decide whether it's far or not by turning to the p-value and looking at the chances of getting a result as far or farther if our null about that particular quantity is true. So, we're going to introduce a slightly new concept here, just conceptually, and I'll point out how to interpret the resulting p-value. There's a wrinkle when we estimate a multiple regression model that includes a multi-categorical predictor; in this case, we have three xs to represent city.
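The generic confidence-interval and testing recipe just described can be sketched as follows. The slope estimate and standard error plugged in at the end are made-up numbers standing in for software output, and the p-value uses the normal approximation:

```python
import math

# Generic recipe: 95% CI as estimate +/- 2 SE, and a two-sided p-value
# from the standardized distance of the estimate from the null value 0.
def ci_and_p(estimate, se):
    ci = (estimate - 2 * se, estimate + 2 * se)
    z = (estimate - 0.0) / se                # distance from the null, in SEs
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal-based p-value
    return ci, z, p

# Hypothetical slope estimate of 0.8 with estimated standard error 0.3:
ci, z, p = ci_and_p(0.8, 0.3)
print(ci)    # roughly (0.2, 1.4)
print(z, p)  # about 2.67 standard errors from zero; p well below 0.05
```

The same recipe applies to any single intercept or slope in the model, which is why there are "no surprises" here.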
Technically speaking, in order to formally test whether that predictor is associated with the outcome, the left-hand side, or not after adjusting for the other things in the model, we can't just look at the p-values for each of the individual slopes here; we need to test all three slopes at once and test what's called the joint null, that all three together are equal to zero. So, why is that? Well, just think about this in the context of what we've done here. We said that Beta two hat is the difference between London and Baltimore. Beta three hat is the difference in the left-hand side for Delhi compared to the same reference of Baltimore. Beta four hat is Cape Town compared to the same reference of Baltimore. So, if we look at the three p-values for each of these, we're only testing three of six possible differences. The p-value for Beta two hat specifically tests whether the difference in the left-hand side between London and Baltimore is zero. The p-value for Beta three hat tests whether the difference between Delhi and Baltimore is zero, et cetera. But what we don't test by looking at those three individual slopes are the differences between Delhi and London, between Cape Town and London, and between Cape Town and Delhi. We're simply missing those because of how we coded things; if we had coded things differently, we would get those. So, it turns out that in order to answer the question unequivocally of whether, say, city is associated with the outcome, we will want to test all three at once. Now you might say, well John, if the difference between London and Baltimore is significant, we already know that at least there's some difference between the cities. I would agree with that. So, you might say, I could look at the individual p-values and get that for each of those three comparisons, and that's true, but what if, in the way we've coded things, none of these three differences were significant?
Suppose, just hypothetically, that the differences between London, Delhi, and Cape Town, each compared to Baltimore respectively, were none of them significant. Well, it doesn't necessarily mean that the relationship between the left-hand side and city is not significant, because there are three other comparisons we're not seeing directly. The joint test will catch that. So, we can have situations where, because of the way we've coded things, none of the coefficients for the xs for the multi-categorical predictor are statistically significant, but the predictor as a whole is still statistically significant. That's the reason we need to do this more extensive test when we have multi-categorical predictors. We'll talk more about the mechanics of it; needless to say, the computer will handle it, but I'll try and shed some light on what's going on behind the scenes for each of the types of regression we look at specifically. So, in summary, we'll be moving forward and we'll get into some numerical examples to solidify this for linear regression next, and then we'll follow this up with each of the other types of regression we've covered in the course.
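To hint at the behind-the-scenes mechanics for the linear regression case specifically, here's a sketch, on simulated data of my own (not from the lecture), of the joint test as a comparison of nested models: fit the model with and without the three city indicators, and form an F statistic from the change in the residual sum of squares:

```python
import numpy as np

# Joint test of the three city indicators in a linear regression:
# compare the full model (sex, city, age) to the reduced model (sex, age)
# via an F statistic on the change in residual sum of squares (RSS).
rng = np.random.default_rng(1)
n = 400
age = rng.uniform(18, 60, size=n)
sex = rng.integers(0, 2, size=n)
city = rng.integers(0, 4, size=n)  # 0 plays the role of Baltimore (reference)
city_ind = np.column_stack([(city == k).astype(float) for k in (1, 2, 3)])
# Simulated outcome with real (nonzero) city effects plus noise:
y = 5.0 + 0.3 * sex + 0.1 * age + city_ind @ np.array([1.0, -1.0, 0.5])
y = y + rng.normal(size=n)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

X_full = np.column_stack([np.ones(n), sex, city_ind, age])
X_red = np.column_stack([np.ones(n), sex, age])      # city dropped
q, df_full = 3, n - X_full.shape[1]                  # 3 slopes tested jointly
F = ((rss(X_red, y) - rss(X_full, y)) / q) / (rss(X_full, y) / df_full)
print(F > 2.63)  # compared to an F(3, n-6) critical value; clearly significant here
```

The analogous joint tests for logistic and Cox regression use likelihood-based statistics rather than sums of squares, but the idea, testing all of a predictor's slopes at once, is the same.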