We now have evidence that major depression is significantly associated with the number of nicotine dependence symptoms endorsed by young adult daily smokers, our sample. Another likely predictor of nicotine dependence symptoms is, of course, the number of cigarettes a person smokes each day.
>> What if number of cigarettes is associated with both our explanatory variable, major depression, and our response variable, nicotine dependence symptoms? What if it is really smoking, rather than major depression, that is associated with the number of nicotine dependence symptoms? To evaluate whether this is true, I add number of cigarettes smoked per day to my model.
>> Before doing this, though, we want to make sure that our categorical explanatory variables have one group that is coded zero and that we center our quantitative variables. Our major depression variable is already coded this way: one equals depression and zero equals no depression. However, our quantitative number of cigarettes smoked variable ranges from 1 to 98. Because zero is not a valid value for this variable, we should center it by subtracting the mean number of cigarettes smoked from the actual value for each observation. Going back to the SAS program for this example, we first need to find the mean of the number of cigarettes variable by using the means procedure. The mean is 13.3642586. We can now add code to center the number of cigarettes smoked variable. Because we are doing some additional data management, we need to create a new temporary data set; in SAS, all data management must be done within a data step. So we create a new data set, new2, from the data set we were working with, which is called new in this example. Then we create a new variable called numberscigsmoked_c, which equals the name of our original number of cigarettes variable minus the mean, followed by a semicolon. Then we type run; to run the code.
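The centering steps just described might look roughly like the following in SAS. This is a sketch rather than the program shown on screen: the data set names new and new2 and the mean 13.3642586 come from the lecture, but numberscigsmoked, the name of the original cigarettes-per-day variable, is an assumption inferred from the name of the centered variable.

```sas
/* Step 1: find the mean of the cigarettes-per-day variable.
   numberscigsmoked is an ASSUMED name for the original variable. */
proc means data=new;
  var numberscigsmoked;
run;

/* Step 2: data management happens inside a DATA step, so create
   new2 from new and subtract the mean reported by PROC MEANS. */
data new2;
  set new;
  numberscigsmoked_c = numberscigsmoked - 13.3642586;
run;
```

Hard-coding the mean mirrors what the lecture describes. If the sample changes, for example after subsetting, the mean would need to be recomputed and the constant updated.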
We can check that the variable is properly centered by calculating the mean of our centered variable using the means procedure. We can see the mean is equal to -0.00000004487, which is effectively zero, so the variable is centered at zero. We can now use the centered variable in our regression analysis. Here is the output. We examine the p-values and parameter estimates for each predictor variable, that is, our explanatory variable, depression, and our potential confounder, number of cigarettes smoked. As you can see, both p-values are less than 0.05. And both of the parameter estimates are positive, indicating that having major depression and smoking more cigarettes are each associated with having a greater number of nicotine dependence symptoms.
>> Thus, we can conclude that both major depression and number of cigarettes smoked are significantly associated with number of nicotine dependence symptoms after partialing out the part of the association that can be accounted for by the other. In other words, depression is positively associated with the number of nicotine dependence symptoms after controlling for number of cigarettes smoked. And number of cigarettes smoked is positively associated with the number of nicotine dependence symptoms after controlling for the presence or absence of depression. Note that if a parameter estimate were negative and its p-value significant, it would mean that there was a negative relationship between that variable and the response variable.
>> Suppose we started with a different explanatory variable. Dysthymia is pervasive, low-level depression that lasts a long time, often a few years. Suppose we wanted to test the linear relationship between dysthymia, a binary categorical explanatory variable, and number of nicotine dependence symptoms, a quantitative response variable. The code would be PROC GLM; MODEL NDSymptoms = DYSLIFE / SOLUTION;.
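The steps in this passage could be written out along these lines. Treat it as a hedged sketch: NDSymptoms and DYSLIFE are the names read out in the lecture, while MAJORDEPLIFE (the 0/1 major depression variable) is an assumed name. The DATA=new2 option is also an addition here, to make explicit which data set each procedure reads.

```sas
/* Check the centering: the mean of the centered variable
   should be essentially zero. */
proc means data=new2;
  var numberscigsmoked_c;
run;

/* Multiple regression: major depression plus the centered potential
   confounder. MAJORDEPLIFE is an ASSUMED variable name. */
proc glm data=new2;
  model NDSymptoms = MAJORDEPLIFE numberscigsmoked_c / solution;
run;
quit;

/* Simple regression with dysthymia as the only explanatory variable. */
proc glm data=new2;
  model NDSymptoms = DYSLIFE / solution;
run;
quit;
```

The SOLUTION option asks PROC GLM to print the parameter estimates, which is where the sign of each coefficient and its p-value are read from.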
You can see from the significant p-value and positive parameter estimate that dysthymia is positively associated with number of nicotine dependence symptoms. That is, the presence of dysthymia is associated with a larger number of nicotine dependence symptoms, and the absence of dysthymia is associated with a smaller number of nicotine dependence symptoms. While dysthymia is long-lasting, low-level depression, major depression is a disorder characterized by a discrete episode of severe depression. So what happens when we control for major depression in this model? As you can see, dysthymia is no longer significantly associated with number of nicotine dependence symptoms after controlling for major depression. Here we have an example of confounding. We would say that major depression confounds the relationship between dysthymia and number of nicotine dependence symptoms, because the p-value for dysthymia is no longer significant when major depression is included in the model. As in the previous example, using multiple regression we can continue to add variables to this model in order to evaluate multiple predictors of our quantitative response variable, number of nicotine dependence symptoms. Here we can see that when evaluating the independent associations between several predictor variables and number of nicotine dependence symptoms, major depression and number of cigarettes smoked are positively and significantly associated with number of nicotine dependence symptoms, while dysthymia, age, and gender are not.
>> Note also that we centered our quantitative age variable by subtracting the mean age from the actual age for each observation, following the same procedure we used to center our number of cigarettes smoked explanatory variable.
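A sketch of the two models discussed here follows, under several assumptions. MAJORDEPLIFE, age, and male (gender coded 1 = male, 0 = female, so that one group is zero) are hypothetical variable names, and new3 is a hypothetical data set name. The lecture centers age by subtracting a mean read from PROC MEANS output. Because that mean is not given, this sketch swaps in PROC SQL, which computes the mean and subtracts it in one step.

```sas
/* Center age without hard-coding its mean. PROC SQL remerges the
   summary statistic mean(age) onto every row (SAS writes a note
   about this to the log). age and age_c are ASSUMED names. */
proc sql;
  create table new3 as
  select *, age - mean(age) as age_c
  from new2;
quit;

/* Dysthymia controlling for major depression (the confounding check).
   MAJORDEPLIFE is an ASSUMED variable name. */
proc glm data=new3;
  model NDSymptoms = DYSLIFE MAJORDEPLIFE / solution;
run;
quit;

/* Full model with all five predictors. male is an ASSUMED 0/1
   coding of gender. */
proc glm data=new3;
  model NDSymptoms = DYSLIFE MAJORDEPLIFE numberscigsmoked_c
                     age_c male / solution;
run;
quit;
```

Comparing the dysthymia estimate and p-value between the single-predictor model and the first model above is what shows the confounding described in the lecture.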