But what if my categorical explanatory variable has more than two groups? In this example I'm going to examine the association between ethnicity and smoking quantity. The explanatory variable, ethnicity, actually has five levels or groups: 1 is White, 2 is Black, 3 is American Indian or Alaska Native, 4 is Asian, Native Hawaiian or Pacific Islander, and 5 is Hispanic or Latino. So by running an analysis of variance, we're asking whether the number of cigarettes smoked differs across ethnic groups.
>> This time I'm going to create a new data frame before I run this new model. I will call it sub3 and include in it my response variable, NUMCIGNO_EST, and my new five-level explanatory variable measuring ethnicity, ETHRACE2A. I again include the dropna function in the statement so the data frame only includes observations with valid data for both my explanatory and response variables. I will use the same model statement but change the data frame to sub3 and the explanatory variable to ETHRACE2A. Again, it's important that I indicate to Python that this is a categorical variable. Then we'll save and run the program.
Here again, we see our F statistic and an associated p value for our explanatory variable with more than two levels. This time the F statistic is 24.4, and the p value is written by Python in scientific notation. So we know to move the decimal point to the left 19 places, or in other words, to put a decimal point and 18 zeros in front of the digits 118, resulting in an extremely small p value. So this tells me I can safely reject the null hypothesis and say that there is an association between ethnicity and number of cigarettes smoked. Not all the means are equal. I could eyeball each mean by viewing the output from code using the groupby function, and I could make a guess as to which pairs are significantly different from one another.
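The steps just described can be sketched roughly as follows, assuming pandas and the statsmodels formula interface. The data here are synthetic stand-ins with made-up numbers, so the F statistic and p value will not match the 24.4 reported in the walkthrough. Only the names sub3, NUMCIGNO_EST and ETHRACE2A come from the walkthrough; everything else (the `data` frame, the group shifts, the sample size) is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the course data (hypothetical values, illustration only).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "ETHRACE2A": rng.integers(1, 6, size=n),        # ethnicity codes 1..5
    "NUMCIGNO_EST": rng.normal(300, 100, size=n),   # cigarettes smoked
})

# Give each group a different (made-up) mean so there is something to detect.
shift = {1: 60, 2: 0, 3: 20, 4: -10, 5: -40}
data["NUMCIGNO_EST"] = data["NUMCIGNO_EST"] + data["ETHRACE2A"].map(shift)

# Blank out a few responses to mimic missing data.
missing_rows = data.sample(frac=0.05, random_state=0).index
data.loc[missing_rows, "NUMCIGNO_EST"] = np.nan

# New data frame with only the two variables, dropping rows with missing values.
sub3 = data[["NUMCIGNO_EST", "ETHRACE2A"]].dropna()

# C(...) tells the formula interface to treat ETHRACE2A as categorical.
model = smf.ols(formula="NUMCIGNO_EST ~ C(ETHRACE2A)", data=sub3).fit()
print("F statistic:", model.fvalue)
print("p value:", model.f_pvalue)

# Group means, for eyeballing which groups look different.
group_means = sub3.groupby("ETHRACE2A")["NUMCIGNO_EST"].mean()
print(group_means)

# Writing a scientific-notation p value such as 1.18e-19 out in full.
written_out = f"{1.18e-19:.21f}"
print(written_out)
```

Without the `C(...)` wrapper, the numeric codes 1 through 5 would be treated as a single quantitative predictor rather than as five groups, which is why the categorical indication matters.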
Looking at the group means, for example, the ethnic group with the lowest mean number of cigarettes smoked per month among young adult smokers is ethnic group 5, Hispanic or Latino, and the group with the highest mean number of cigarettes smoked per month is ethnic group 1, White.
>> The F test and the p value do not provide insight into why the null hypothesis can be rejected, because there are multiple levels to my categorical explanatory variable. They do not tell us in what way the population means are not statistically equal.
>> Note that there are many ways for population means not to be all equal. Having each of them differ from all the others is just one of them. Another could be that only two of the populations are not equal to one another.
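To make that last point concrete, here is a small simulation, assuming NumPy and SciPy's `f_oneway`. It is only an illustration: the two scenarios (every group mean different, versus a single group standing apart from four equal ones), the `simulate` helper, and all the numbers are hypothetical choices, not taken from the course data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate(group_means, n=200, sd=1.0):
    """Draw one sample of size n per group (hypothetical helper)."""
    return [rng.normal(mu, sd, size=n) for mu in group_means]

# Scenario A: every population mean is different.
all_differ = simulate([0.0, 0.5, 1.0, 1.5, 2.0])
# Scenario B: four populations share a mean, one stands apart.
one_differs = simulate([0.0, 0.0, 0.0, 0.0, 1.0])

# One-way ANOVA on each scenario.
p_all = stats.f_oneway(*all_differ).pvalue
p_one = stats.f_oneway(*one_differs).pvalue

print("all means differ, p value:", p_all)
print("one mean differs, p value:", p_one)
```

Both scenarios lead to rejecting the null hypothesis, yet the pattern of differences behind each is quite different, and the F test on its own does not say which pattern we are looking at.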