When the explanatory variable has more than two levels, the chi-square statistic and associated p value do not provide insight into why the null hypothesis can be rejected. They do not tell us in what way the rates of nicotine dependence are not equal across the frequency categories. There are of course many ways for the rates to be unequal. Having each of them unequal to every other is just one of them.

>> Maybe there are only two of the population rates that are not equal to one another. To determine which groups are different from the others, we will again need to perform a post hoc test. By conducting post hoc comparisons between pairs of rates in a way that avoids excessive type 1 error, in other words, avoids rejecting the null hypothesis when the null hypothesis is true, we will be much better able to appropriately describe which population rates are different from the others. If we reject the null hypothesis, we need to perform comparisons for each pair of nicotine dependence rates across the six smoking frequency categories. In the case of six groups, we actually need to perform 15 pairwise comparisons. With these red brackets I'm illustrating the 15 paired comparisons that we'll need to conduct. As you can see, there are so many it's actually difficult to illustrate this graphically.

>> If you'll recall, the family-wise error rate for 15 different comparisons is .54. This means that if we do not protect against type 1 error, we will wrongly reject at least one true null hypothesis, and say that there's an association, over half the time. Having about a 50-50 chance of being right would obviously give us no confidence in our decisions.

>> So, to appropriately protect against type 1 error in the context of a chi-square test, we will use the post hoc approach known as the Bonferroni adjustment. The goal of using the Bonferroni adjustment is to control the family-wise error rate, also known as the maximum overall type 1 error rate.
That way, we can evaluate which pairs of nicotine dependence rates are different from one another.

>> Briefly, the process would be to conduct each of the 15 paired comparisons. But rather than evaluating significance at the p = .05 level, we would adjust the p value to make it more difficult to reject the null hypothesis. The adjusted p value, that is, the cutoff we use for significance, is calculated by dividing 0.05 by the number of comparisons that we plan to make. So if we make 3 comparisons, we would only reject the null hypothesis if the p value were .017 or less. For the 15 paired comparisons that we plan to make to better understand the association between smoking frequency and nicotine dependence, our adjusted p value is .003. Adjusting the p value is definitely the easy part.

Now for the more challenging piece. For the actual post hoc testing, we need to run a chi-square test for each of the 15 paired comparisons. To do this, I can add syntax to the program where I create new variables that allow me to choose only two smoking frequency groups at a time. There are multiple ways to do this, but I am going to employ recodes in conjunction with the map function. So here I start by comparing the usual smoking frequency per month group 1 and the usual smoking frequency per month group 2.5. First, in this new recode2 object, I am keeping the 1 and the 2.5 values as is. That is 1:1 and 2.5:2.5, but I'm excluding from the statement all other values in the USFREQMO variable. Next, I name this new variable that I intend to have only two levels. Here I'm calling it COMP1v2, and then setting it equal to USFREQMO and using the map function to give the new COMP1v2 variable the values that are specified in the recode object. Now, I add this new COMP1v2 variable to the crosstabs, colpct, and chi-square syntax. If I save and run this program, I get a new chi-square table that includes only those two frequency groups, 1.0 versus 2.5, by the presence or absence of nicotine dependence.
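The recode-and-map step just described can be sketched roughly as follows. This is a minimal illustration, assuming pandas and SciPy; the toy data are made up and will not reproduce the lecture's percentages, and the response column name `TAB12MDX` and the data frame name `sub` are assumptions, since the transcript names only USFREQMO, recode2, and COMP1v2.

```python
import pandas as pd
import scipy.stats

# Made-up toy data, NOT the lecture's data set.
# TAB12MDX is an assumed name for the nicotine dependence response (1 = yes, 0 = no).
sub = pd.DataFrame({
    "USFREQMO": [1] * 10 + [2.5] * 10 + [6] * 10,
    "TAB12MDX": [0] * 9 + [1] * 1 + [0] * 8 + [1] * 2 + [0] * 7 + [1] * 3,
})

# Keep 1 and 2.5 as is. Values that are not keys of the dictionary
# become missing (NaN) when passed through map.
recode2 = {1: 1, 2.5: 2.5}
sub["COMP1v2"] = sub["USFREQMO"].map(recode2)

# Contingency table of counts: response in rows, the two groups in columns.
# Rows with a missing COMP1v2 value are left out of the table.
ct = pd.crosstab(sub["TAB12MDX"], sub["COMP1v2"])

# Column percentages: the share of each frequency group in each response level.
colpct = ct / ct.sum(axis=0)

# Chi-square test on the two-by-two table.
chi2, p, dof, expected = scipy.stats.chi2_contingency(ct)

print(ct)
print(colpct)
print("chi-square:", chi2, "p value:", p)
```

Because map leaves every unlisted frequency value missing, the crosstab and the chi-square test see only the two groups being compared.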
Again, I want to focus here on the column percentages, 9.86% and 18.46%. Are these two rates significantly different from one another? If I look down at my chi-square value and probability value, a p value of 0.23, I can see that they aren't. So I fail to reject the null hypothesis, since this probability value is not only not less than 0.05, it is definitely not less than my Bonferroni adjusted p value of 0.003. Going back to my graph showing the rates of nicotine dependence for each smoking frequency group, I am going to use letter notation to designate the first two rates with the same letter, that is, capital A, indicating that they do not differ significantly from one another.

But this is just the first step in our post hoc analysis. Now we need to run two-by-two chi-squares for each of the remaining 14 paired comparisons. Here's the syntax requesting a chi-square analysis comparing those smoking one day per month and those smoking approximately 6 days per month. Per our code, we can see that only smoking frequency groups equal to 1 and equal to 6 are included in the chi-square table and analysis. The nicotine dependence rates are 9.86% and 21.59%. The p value associated with the chi-square statistic is 0.07. Even if it were 0.04, where we might initially want to say that this is a significant finding because it is less than .05, remember that the adjusted p value for this comparison is 0.003. So, to be significant, the p value would need to be 0.003 or smaller.

>> Going back to the graph of nicotine dependence rates, we now know that frequency groups equal to 1 and equal to 6 do not have significantly different rates of nicotine dependence. We'll illustrate this by again adding the same letter to the smoking frequency group equal to 6.

>> Here's the code for several more paired comparisons. Basically, we repeat the same lines of syntax for each of the 15 paired comparisons, changing the name of the recode object and its values.
We also change the name of the new variable to reflect the specific comparison, and the names of any new objects that we create for our calculations or for printing our results. When we run this program, the results include the overall chi-square table, that is, the six-level smoking frequency variable by the nicotine dependence response variable, and then the chi-square tables for each of the paired comparisons that we have requested.

>> The goal is to examine the p value for each of the paired comparisons, and to use the Bonferroni adjusted p value of 0.003 to evaluate significance. Here, we've created a table that shows the p values for each of the paired comparisons from the output. Obviously, there are several that are less than 0.05. Here are the p values that are less than 0.003. As we can see, smoking frequency group 30, that is, those who smoke 30 days in a usual month, is significantly different from each of the other smoking frequency levels. In addition, smoking frequency group 1 has significantly different nicotine dependence rates than smoking frequency groups 14 and 22. Using the letter convention, in which nicotine dependence rates with the same letter are not significantly different, these post hoc findings can be pictured like this. Here's another way we could picture the significant differences between rates. As you can see, the more differences there are, the more challenging the visualization can be to create.
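Rather than copying the same lines 15 times, the whole post hoc procedure could also be written as a loop. The sketch below is one possible way to do that, not the approach shown in the lecture: the helper `pairwise_chi2`, the response column name `TAB12MDX`, and the toy counts are all hypothetical, and the data are made up for illustration only.

```python
from itertools import combinations

import pandas as pd
import scipy.stats


def pairwise_chi2(df, group_col, response_col, alpha=0.05):
    """Run a chi-square test for every pair of groups (hypothetical helper).

    Returns a table of results and the Bonferroni adjusted cutoff.
    """
    levels = sorted(df[group_col].dropna().unique())
    pairs = list(combinations(levels, 2))
    adjusted_alpha = alpha / len(pairs)  # Bonferroni: divide by number of comparisons
    rows = []
    for a, b in pairs:
        # Same recode idea as in the lecture: keep two groups, drop the rest.
        comp = df[group_col].map({a: a, b: b})
        ct = pd.crosstab(df[response_col], comp)
        chi2, p, dof, expected = scipy.stats.chi2_contingency(ct)
        rows.append({"group_a": a, "group_b": b, "chi2": chi2, "p": p,
                     "significant": p <= adjusted_alpha})
    return pd.DataFrame(rows), adjusted_alpha


# Made-up toy data: 40 people per smoking frequency group.
groups = [1, 2.5, 6, 14, 22, 30]
dependent = [4, 7, 9, 14, 17, 36]  # invented counts of dependent smokers
toy = pd.concat(
    [pd.DataFrame({"USFREQMO": [g] * 40,
                   "TAB12MDX": [1] * d + [0] * (40 - d)})
     for g, d in zip(groups, dependent)],
    ignore_index=True,
)

results, adjusted_alpha = pairwise_chi2(toy, "USFREQMO", "TAB12MDX")

# Family-wise error rate if all comparisons were tested at 0.05 without adjustment.
fwer = 1 - (1 - 0.05) ** len(results)

print(results)
print("comparisons:", len(results))
print("adjusted cutoff:", round(adjusted_alpha, 4))
print("unadjusted family-wise error rate:", round(fwer, 2))
```

With six groups the loop produces the 15 comparisons discussed above, and the `significant` column applies the 0.05 / 15 cutoff instead of 0.05.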