Welcome to this first lecture on simple regression. We'll first give an overview of what regression is, and then we'll concentrate on fleshing out the details of a method called simple linear regression. So, in this set of lectures, we will develop a framework for simple linear, logistic, Poisson, and Cox proportional hazards regression. We'll do that in the first section. Then in the remaining sections, we'll focus on simple linear regression, which is a general framework for estimating the mean of a continuous outcome based on a single predictor, where the predictor may be binary, categorical, or even continuous. So, let's just give an overview of regression in general to start. What I'd like you to do in this section is re-familiarize yourselves with the properties of a linear equation, because that will be paramount to everything we do regardless of the type of regression we're looking at. You should also be able to identify the group comparisons being made by a simple regression coefficient, regardless of the outcome variable type for the regression. Some of what we do in these first steps will be alternative ways of doing what we did in the first term, but we'll also show that we can extend those methods to situations where we have, for example, a continuous predictor, and we don't want to group it arbitrarily into two or some discrete number of groups. We can still make group comparisons across the levels of the continuous variable while keeping it continuous. We'll also be able to build on this in ways that we couldn't extend the methods from the first term. But at first, you'll see some connections between some of the methods we did in the first term and the simple regression models that we'll look at.
So, for example, comparing means between two or more groups, where we computed mean differences, put confidence limits on them, and got p-values from things like t-tests or analysis of variance, can be done via a simple linear regression model. Comparing proportions between two or more groups, where we actually computed risk differences, relative risks, and odds ratios, we can get odds ratios and a p-value for the comparisons via a simple logistic regression model. Comparing incidence rates between two or more groups, computing incidence rate ratios, and getting p-values like we did with the log-rank test, can be done via a simple Poisson or Cox proportional hazards regression model. So, the basic structure of these simple models is all the same on the right hand side: it's a linear equation. We're going to relate some function of an outcome to a linear equation which looks like this. It has an intercept, which is a starting value, plus a slope times another variable, which we'll just generically call x1. So, generally speaking, we're taking some function of an outcome y and relating it to another variable x1 through an equation that has an intercept, which we'll call b0 or beta naught, and a slope beta one times that variable x1. Here x1 is what we call the predictor of interest, and we'll see situations where a single predictor may need to be represented with more than a single x. We'll see that in the situation where our predictor is multi-categorical. But the "simple" in simple regression refers to the fact that we only have one predictor, whereas multiple regression, which we'll get to in the latter part of this course, involves more than one predictor. Simple is not pejorative, and does not mean that these methods are easy per se; it just means that they have one predictor.
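The shared right hand side described above can be sketched as a small Python function. The intercept and slope values below are made up purely for illustration; in a real analysis they would be estimated from data.

```python
# A minimal sketch of the right hand side shared by all of these models:
# some function of the outcome equals an intercept plus a slope times x1.
# b0 = 1.5 and b1 = 0.5 are hypothetical values, just for illustration.

def linear_predictor(x1, b0=1.5, b1=0.5):
    """Return b0 + b1 * x1, the right hand side of a simple regression."""
    return b0 + b1 * x1

print(linear_predictor(0))  # at x1 = 0 we get the intercept: 1.5
print(linear_predictor(2))  # 1.5 + 0.5 * 2 = 2.5
```

Whatever the outcome type, only the left hand side (mean, log odds, log rate, or log hazard) changes; this linear piece stays the same.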
The left-hand side, the function of y that we're estimating as a linear function of our x, depends on what variable type the outcome of interest is. So, for example, suppose we want to estimate some function of continuous outcomes, things measured on a continuum, like length of stay in days, or blood pressure in millimeters of mercury. What we'll be doing on the left hand side is relating the mean of our outcome y, the mean y-bar, to x. So, we'll be estimating different values of the mean of y for different values of x, and the regression type is linear regression. For binary outcomes, we'll start with a one-zero measurement on each of our observations. We're ultimately going to summarize that across different groups defined by their x values in terms of the odds, but on the log scale. We'll define this in detail when we take on logistic regression and explain the reasons for the scaling, but what we're going to be doing is relating the log odds of a binary outcome, the log of p over one minus p, to a given value of x. We're going to estimate different values of the log odds for different values of x, and the regression type for this is logistic regression. For time-to-event outcomes where the individual event times and censoring times are not known, y, our outcome, is a yes-no indicator of whether the event occurred in a common follow-up period. For example, counting the number of cases that occurred over a year: each person who succumbed and became a case gets a value of one, and each person who did not gets a value of zero. The left hand side is the log of the incidence rate of this outcome over the common follow-up period, and the regression type would be Poisson regression. For time-to-event outcomes where the individual event and censoring times are known, y is a composite outcome taking into account both the time and whether the event occurred.
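The log odds transformation mentioned for binary outcomes can be computed directly. The proportions below are hypothetical, chosen only to show the transformation; the lecture will return to why the log scale is used.

```python
import math

def log_odds(p):
    """Log odds of a binary outcome: log(p / (1 - p))."""
    return math.log(p / (1 - p))

# Hypothetical proportions, just to illustrate the transformation:
print(log_odds(0.25))  # log(0.25 / 0.75) = log(1/3), approximately -1.0986
print(log_odds(0.5))   # p = 0.5 gives odds of 1, so the log odds are 0
```

Notice that proportions below 0.5 give negative log odds and proportions above 0.5 give positive log odds, which is part of why this scale is convenient for regression.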
The left hand side will be the log of the hazard rate, which is very similar to an incidence rate, pretty much synonymous. The regression type we can use when we have individual event times and censoring times is called Cox proportional hazards regression. But the right hand side will always look like this. There will be a slight difference when we have Cox regression, but the form will be the same: some intercept plus some slope times our predictor x1. The predictor of interest can be continuous, binary, or categorical, in which case it will be represented by more than one x, as we'll see shortly. So, let me tell you how things play out generally when we have a binary predictor. Suppose x1 is a binary predictor, such as sex; we want to relate some outcome to sex, in other words, ultimately compare the outcome between males and females. The resulting regression model will look something like this. We have a left hand side, which will depend on what type of regression we're doing, but it will be equal to some intercept plus some slope times x1, where x1 is going to represent sex. So, the way we can represent a binary predictor as a variable is to code one of the levels as one, and the other level as zero. For sex, we can arbitrarily code females as one and males as zero, or do the opposite. Suppose we code females as one and males as zero. With this coding, when x1 equals one we're looking at the females, and the left-hand side is equal to the intercept plus the slope times an x1 value of one, or beta naught plus beta one. When we're estimating the left hand side for the reference group, the males, that's simply equal to the intercept, because the x1 value for this group is zero. So, when zero is multiplied by the slope, that whole term is zero, and we're left with just the intercept plus zero, or the intercept. In other words, the intercept is the value of the left-hand side when x1 equals zero.
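The binary coding just described can be made concrete with a couple of lines of arithmetic. The intercept and slope values here are made up for illustration, and the female/male coding follows the arbitrary choice above.

```python
# Hypothetical intercept and slope for a binary predictor x1,
# with x1 = 1 for females and x1 = 0 for males (the arbitrary coding above).
b0, b1 = 2.0, 0.75  # made-up numbers, not estimated from any data

lhs_females = b0 + b1 * 1  # intercept plus slope: 2.75
lhs_males = b0 + b1 * 0    # just the intercept: 2.0

# The intercept cancels when we take the difference,
# so the female-male difference is the slope itself.
print(lhs_females - lhs_males)  # 0.75, which is b1
```

This is the key fact carried through the rest of the lecture: with a one/zero predictor, the slope is exactly the group difference in the left hand side.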
Now if you look at this, consider the difference in these estimated values of the left hand side for the two groups: if I took the difference, the intercept cancels and the difference is just the slope. So, beta one, the slope, is the difference in the value of the left hand side when x1 equals one compared to when x1 equals zero. So, for example, the difference in the left hand side for females compared to males. How could we code x when the predictor of interest is nominal categorical? Suppose we were doing a multi-site trial and we wanted to compare outcome differences between the sites, and we had three sites: Hopkins, the University of Maryland, and the University of Michigan, for example. For handling multiple nominal categories, the approach is to designate one of the three groups as the reference category and create binary x's for each of the other groups. The coding or grouping here is arbitrary, but we just need to know which group is coded by what. So, for example, if we make Hopkins our reference group, we will need two additional variables to indicate observations from the University of Maryland and observations from the University of Michigan. We just arbitrarily create a variable called x1, which takes on a value of one if our observation is from the University of Maryland and zero if not, and an x2 which equals one if our observation is from the University of Michigan and zero if not. If we fit the resulting regression, the functional form would look like this. Just to remind you, when we have real data these betas will all be actual numbers; this is just a generic algebraic version, but in most of our examples there will be actual numbers. So, in the resulting regression model, whatever we're estimating with this equation, the left hand side, is equal to some intercept plus a slope beta one times x1, the indicator of Maryland, plus a slope beta two times x2, the indicator of Michigan.
So, there are only three possible estimates of the left hand side, one for each of the three possible combinations of x1 and x2. When x1 equals zero and x2 equals zero, we're dealing with the reference group; that would be Hopkins in our example. The estimated left-hand side, whatever we're estimating as a function of our x's, is simply the intercept for that group. When we're dealing with the University of Maryland, x1 is equal to one and x2 equals zero, and our estimate of the left-hand side is the intercept plus the slope beta one times one, plus the slope beta two times zero, or the intercept beta naught plus beta one. For the third group, Michigan, x1 is equal to zero and x2 is equal to one. The left hand side is that same intercept beta naught, plus beta one times zero, plus beta two times one, or beta naught plus beta two. So, the intercept estimates the left-hand side for the reference group, in this case Hopkins; beta one is the difference in the left hand side between the group with x1 equal to one, the University of Maryland, and that reference group of Hopkins; and beta two is the difference in the left hand side between the group with x2 equal to one, Michigan, and that same reference group of Hopkins. So you may say, "Well, if you think about what we've just laid out, when our predictor is binary we're relating some outcome to two groups and getting a slope that compares the outcome between those two groups; when our predictor is multicategorical, we're relating the outcome to multiple groups and getting slopes that estimate group differences in the outcome." You might be saying this is just a more complicated way of setting up some things we did in term one. As I said before, yes, anything we did in term one can be represented as a regression. So, you might be thinking, "Why bother? Why go to this extra trouble?"
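The three-site coding just worked through can be sketched numerically. The intercept and slopes below are hypothetical, and the reference-group choice (Hopkins) follows the example above.

```python
# Hypothetical intercept and slopes for the three-site example:
# reference group is Hopkins; x1 indicates Maryland, x2 indicates Michigan.
b0, b1, b2 = 10.0, -1.5, 2.0  # made-up numbers for illustration

def lhs(x1, x2):
    """Left hand side estimate: b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

hopkins = lhs(0, 0)   # reference group: just the intercept, 10.0
maryland = lhs(1, 0)  # b0 + b1 = 8.5
michigan = lhs(0, 1)  # b0 + b2 = 12.0

print(maryland - hopkins)  # -1.5, which is beta one: Maryland vs. Hopkins
print(michigan - hopkins)  # 2.0, which is beta two: Michigan vs. Hopkins
```

Each slope is a comparison against the same reference group; a Maryland-versus-Michigan comparison would be the difference of the two slopes, beta one minus beta two.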
But the beauty of regression, or one of the beauties of regression that extends what we could do in the first term, is that it also allows for continuous predictors, unlike the methods we looked at in term one where we had to have discrete groups. So, this can be an efficient approach to handling measurements that are made continuously, like age, height, etc., without having to arbitrarily categorize them into quartiles, or older and younger, etc., so long as our outcome-predictor association is well characterized by a line; we'll get into the details of assessing that in subsequent lecture sections. So, for example, suppose x1 is age in years and the regression equation looks like this. The left-hand side, as a function of age, is some intercept beta naught plus some slope beta one times x1, which is age in years. How would we interpret the slope and the intercept? Hopefully, this is a flashback to algebra from your high school days. The intercept beta naught is the value of the left hand side of the equation when x1 equals zero. Some of you may have seen this represented in a different way; very traditionally in the US, we saw something like y equals b plus mx, where b represents the y-intercept. We're just replacing that b with the new symbol beta naught, and this is the value of the left hand side when x1 equals zero. If we're looking at the line on a graphic, this is the point on the graph where the line crosses the vertical axis: the x-coordinate is equal to zero and the y value is equal to beta naught. As for the slope, again, if you've seen that other formulation y equals b plus mx, with y replaced more generically by our left-hand side, m is what represented the slope; we're going to rename that beta one. If you haven't seen this version before, you've seen some other version, but hopefully you can map what you remember of the slope to what we're calling beta one here.
The slope, or the slant of the line, describes the change in whatever is on the y-axis, which I'm going to call the left-hand side, corresponding to a one unit increase in x1. So in this picture, the slope is positive, and a one unit increase in x1 incurs an increase in the left hand side of beta one. The slope could be negative as well, in which case the line would be downward sloping and, as x1 increases, the left hand side would decrease in value. This slope is the change in the left-hand side corresponding to a one unit increase in x1; in other words, it's the difference in the left hand side for any two values x1 plus one and x1, any two values that differ by one unit in x1. Anywhere across this line, the slope describes the difference in the left hand side values for two groups that differ by one unit in the predictor. All information about the difference in the left-hand side for two different values of x1 is contained in this slope. For example, if we had two values of x1 three units apart, for example, six and three, or 28 and 25, the difference in the left hand side, our vertical axis, will be three times the difference for a one unit difference in x1, or three times the slope. So, the slope is a single number that tells us everything we need to know about differences in the line values across the entire range of the line. We'll get into interpreting these intercepts and slopes in scientific contexts with real data starting in the next section, but regression is a general set of methods for relating a function of an outcome variable to a predictor via a linear equation of the form: left hand side equals some intercept plus some slope times x1. As I noted before, this slope is a critical value, giving us a numerical comparison of our left-hand side values for groups that differ in their values of x1.
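The six-versus-three and 28-versus-25 example above can be checked directly. The intercept and slope are made up for illustration; the point is that the difference in the left hand side depends only on how far apart the two x1 values are.

```python
# Hypothetical intercept and slope for a continuous predictor such as age.
b0, b1 = 5.0, 0.5  # made-up numbers for illustration

def lhs(x1):
    """Left hand side estimate as a linear function of x1."""
    return b0 + b1 * x1

# Two pairs of x1 values, each three units apart: in both cases the
# difference in the left hand side is three times the slope, 1.5,
# no matter where on the line we look.
print(lhs(6) - lhs(3))    # 1.5
print(lhs(28) - lhs(25))  # 1.5
```

This is what it means for the slope to be a single number that carries all the information about differences anywhere along the line.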
When x1 is continuous, this slope covers all possible comparisons: the difference in the left hand side for any two values of x1 is a multiple of the slope, determined by how far apart those x1 values are. So, regardless of whether the predictor x1 is binary, categorical, or continuous, the intercept beta naught is the value of the left hand side when x1, or all x's if the predictor is multicategorical, is or are equal to zero. Then beta one, the slope, is the change in the value of the left-hand side for a one unit difference in x1. Even when x1 is binary that still holds, and we may have more than one slope if we have more than one x, as when we have multicategorical predictors. So, starting in the next section, we'll take this out of the generic and get into the specifics, and start looking at some examples of estimating the mean of an outcome as a linear function of our predictor via simple linear regression.