0:18

In this video, we set out to answer this follow-up question, and when doing so,

we're going to discuss how to control the Type I error rate that would be

inflated by doing many pairwise tests in the quest for

identifying groups whose means are significantly different from each other.

Remember that to determine whether two means are different from each other,

we use t tests.

And with each test that you do, you incur a probability of committing a Type I error.

The probability of committing a Type I error is the significance level

of the test, which is often set to 5%.

So when you do multiple tests, you're going to be

inflating your Type I error rate, which is an undesirable outcome.

Thankfully, there is a simple solution.

Use a modified significance level that is lower than the original significance

level for these pairwise tests, so that the overall Type I error rate for

the series of tests you have to do can still be held at the original low rate.

Testing many pairs of groups is often called multiple comparisons, and

a common modification that we use when doing

multiple comparisons is the Bonferroni correction,

which uses a more stringent significance level for each of the pairwise tests.

More specifically,

we adjust our alpha by dividing it by the number of comparisons being considered.

For example, if you have four groups in your ANOVA, and it does yield

a significant result, then you need to compare group one to group two,

two to three, three to four, so on and so forth.

But counting these out is somewhat tedious and error prone, so

we usually use a shortcut formula for determining this value and

then use this value to adjust the significance level.

So the Bonferroni correction is a two step process.

First, find the number of comparisons as k times k minus one divided by two,

and then correct your original alpha by this level,

as alpha divided by the number of comparisons.

2:19

Let's practice this.

The social class variable we have been considering has four levels.

That is, k = 4, and if the original alpha was .05, what

is going to be the modified significance level for the multiple comparisons test?

Each group needs to be compared to three other groups,

resulting in a total of 12 comparisons.

But, if you have already compared group one to group two, say middle class to

lower class, you don't need to go back around and compare lower to middle again.

So, the total number of comparisons can be cut in half.

Hence, the formula k times k minus one over two,

which results in six total comparisons, and

we can use this number to correct our significance level down to 0.0083.

This is the significance level we're going to use for the pairwise comparisons,

to see if the two means in each pair are different from each other.
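The two-step arithmetic can be sketched in a few lines of code. This is a Python sketch (the course demos R, but the calculation is plain arithmetic), using the numbers from the example above:

```python
# Bonferroni correction for pairwise comparisons after ANOVA.
# Values match the social class example: 4 groups, original alpha = 0.05.

k = 4          # number of groups
alpha = 0.05   # original significance level

num_comparisons = k * (k - 1) // 2    # step 1: k(k-1)/2 = 6
alpha_star = alpha / num_comparisons  # step 2: 0.05 / 6

print(num_comparisons)         # 6
print(round(alpha_star, 4))    # 0.0083
```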

There are a couple other considerations when doing

these multiple comparisons after ANOVA.

The first is related to the constant variance condition.

Since ANOVA requires this condition to be met, we now need to rethink the standard

error and the degrees of freedom to be used in the multiple comparisons test.

And, of course, now we have a new modified significance level that we're going to

compare to the p-values of these tests to determine significance.

So what are the consistent standard error and

degrees of freedom we need to use?

The formula for the standard error should seem familiar to you,

except that instead of the individual group variances,

we're actually using mean squared error from the ANOVA table.

Remember, the mean squared error is actually the average within group

variance, so

we're still getting at the same thing, the individual group variances, but now,

we have a consistent measure that we can use for all of the tests.

If, indeed, the constant variance condition is satisfied,

this value should be very close to your group variances anyway.
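The shape of that standard error formula can be sketched as follows: it is the familiar two-sample standard error, with the mean squared error from the ANOVA table standing in for both group variances. The group sizes and MSE value below are hypothetical, chosen only to show the shape of the calculation (the video does not report them at this point):

```python
import math

def pairwise_se(mse, n1, n2):
    """Standard error for comparing two group means after ANOVA:
    the usual two-sample SE, but with the ANOVA mean squared error
    standing in for both individual group variances."""
    return math.sqrt(mse / n1 + mse / n2)

# Hypothetical MSE and group sizes, purely illustrative.
print(round(pairwise_se(3.6, 200, 150), 3))
```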

And the consistent degrees of freedom is going to be the DF

error from the ANOVA output, as opposed to the minimum of the sample sizes minus one,

from the two groups that we're comparing.

So let's put all of this information together.

Pick one of the pairs of groups and do the comparison.

Is there a significant difference between the average vocabulary scores

between middle and lower class Americans?

Our hypotheses are: under the null hypothesis there is no difference,

the averages are equal, and under the alternative hypothesis

the averages are different.

We calculate our T-score as the difference between the two group means,

minus zero, our null value, divided by the standard error,

calculated using the mean square error from the ANOVA output.

That is 6.76 for the average middle class score, minus 5.07 for

the lower class score, divided by the standard error based on the mean square error,

which yields a point estimate of 1.69, and

a standard error of 0.315, for an overall T-score of 5.365.

And the degrees of freedom comes from the ANOVA table, 791.
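Plugging in the numbers from the video, the test statistic works out as follows (a quick Python check of the arithmetic):

```python
xbar_mid = 6.76   # average vocabulary score, middle class
xbar_low = 5.07   # average vocabulary score, lower class
se = 0.315        # standard error based on the ANOVA mean squared error
null_value = 0    # null hypothesis: no difference between the means

point_estimate = xbar_mid - xbar_low           # 1.69
t_score = (point_estimate - null_value) / se   # about 5.365

print(round(point_estimate, 2))   # 1.69
print(round(t_score, 3))          # 5.365
```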

Let's clean up our slate here, write down our test statistic and

the degrees of freedom, and

before we can finally get to the p-value, remember to always draw the curve.

So here is our T-distribution with 791 degrees of freedom, which by the way,

remember, is going to be just like the normal distribution at this point,

because of the really high degrees of freedom.

And a T-score that's so high,

that's five standard deviations away from the center,

is going to result in a really tiny tail area, because it's really unusual

to get an observation that's more than five standard deviations from the mean.

This is also clear from the sketch: the tail areas are really skinny

when you get that far from the center.

We can also use R for this.

We use the pt function, and remember to multiply one of your tail areas by

2 to account for the 2-sided alternative hypothesis, and

also remember that the significance level we're going to use for

this test is that modified significance level we calculated earlier.
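In R this would be `2 * pt(5.365, df = 791, lower.tail = FALSE)`. As a stdlib-Python sketch, we can lean on the fact noted above, that at 791 degrees of freedom the t-distribution is essentially normal, and use the normal tail instead:

```python
import math

t_score = 5.365
alpha_star = 0.05 / 6   # modified (Bonferroni) significance level, about 0.0083

# With df = 791 the t-distribution is essentially the standard normal,
# so the two-sided p-value is approximately P(|Z| > t) = erfc(t / sqrt(2)).
p_value = math.erfc(t_score / math.sqrt(2))

print(p_value < alpha_star)   # True: the p-value is far below alpha_star
```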

So even though that was very conservative and stringent, because we

have a pretty tiny p-value, we actually can reject the null hypothesis again.