For this video, we use a simpler example involving the breaking strength of materials from three processes. We'll use ANOVA to test the null hypothesis that all of the means are the same against the alternative that at least two of the means are not equal. ANOVA is based on a test statistic called the F-Ratio. The F-Ratio is a ratio of two variances: the variance between the groups and the random error variance. If the null hypothesis is true, the between-group variance is due entirely to random variation, and the F-Ratio will be close to 1.0. However, if the null is not true, the between-group variance will be large relative to the error variance, and the resulting F-Ratio will be greater than 1.0. We omit many details in this quick introduction, but let's see where these variances come from and how this test works.

To compute the variance between the three groups, you calculate the difference between each group mean and the overall mean, square each of these differences, and then add up all of the squared differences. This produces a quantity called the sum of squares for the model, or SSM. To convert the sum of squares model to a variance, we divide SSM by the degrees of freedom for the model, k-1. This produces the mean square model, or MSM, which is the estimate of the variance between the groups. For this example, the sum of squares model is 768.67 and the degrees of freedom is 3-1, or 2. The resulting mean square model is 384.33.

Why is the degrees of freedom for the mean square model k-1? Remember that degrees of freedom is the number of freely varying observations in a calculation. There are k group means, which you compare to the grand mean, and you "lose" one degree of freedom for estimating the grand mean. See the first lesson in the Exploratory Data Analysis module for a general discussion of degrees of freedom.
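The between-group calculation described above can be sketched in a few lines of Python. The data here are hypothetical (three groups of four observations, not the breaking strength data from the video), chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical data: three groups of four observations each
# (not the breaking strength data from the video).
groups = [
    [8, 10, 12, 10],    # group mean = 10
    [18, 20, 22, 20],   # group mean = 20
    [28, 30, 32, 30],   # group mean = 30
]

k = len(groups)                            # number of groups
n_total = sum(len(g) for g in groups)      # total observations, N
grand_mean = sum(sum(g) for g in groups) / n_total

# Sum of squares for the model (SSM): squared distance of each group
# mean from the grand mean, weighted by the group's size.
ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Mean square model (MSM): SSM divided by its degrees of freedom, k - 1.
msm = ssm / (k - 1)

print(ssm, msm)   # 800.0 and 400.0 for these numbers
```

For the video's data, the same last step is just 768.67 divided by 2, giving 384.33.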
To compute the random error variance, you calculate the difference between each observation and its group mean, square each of these differences, and add up all of the squared differences. This produces a quantity called the sum of squares error, or SSE. To convert the sum of squares error to a variance, we divide SSE by the degrees of freedom for error. This is N, the number of observations, minus k, the number of groups. This produces the mean square error, or MSE. This is the estimate of the within-group, random error variance. For this example, the sum of squares error is 142.25, and the degrees of freedom is 12-3, or 9. The resulting mean square error is 15.806.

Now, to compute the test statistic, the F-Ratio, we simply divide the mean square model by the mean square error. Our F-Ratio is 24.316. This is a ratio of between-sample variation to within-sample variation. You can think of this as a signal-to-noise ratio. The signal is a difference between at least two means, and the noise is sampling variation. The larger the F-Ratio, the stronger the signal. Here's the question: how large must the ratio be to cast doubt on the null hypothesis that the means are equal?

The F-Ratio follows a distribution that describes the ratio of two variances, the F distribution. The shape of the F distribution is based on the degrees of freedom for the mean square model (in the numerator of the F-Ratio) and the mean square error (in the denominator). Our F-Ratio falls way out in the tail of this distribution. Remember the discussion of probability values, or p-values? In ANOVA, the p-value measures how extreme an F-Ratio is relative to the F distribution. This p-value, and all of the computations that go into the F-Ratio, are reported in the Analysis of Variance table.
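The within-group side of the ratio can be sketched the same way, using the same hypothetical three-group data as before (not the video's breaking strength data):

```python
# Hypothetical three-group data, as in the earlier sketch.
groups = [
    [8, 10, 12, 10],
    [18, 20, 22, 20],
    [28, 30, 32, 30],
]

k = len(groups)
n_total = sum(len(g) for g in groups)

# Sum of squares error (SSE): squared distance of each observation
# from its own group mean, summed over all observations.
sse = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    sse += sum((x - group_mean) ** 2 for x in g)

# Mean square error (MSE): SSE divided by its degrees of freedom, N - k.
mse = sse / (n_total - k)

# The F-Ratio divides the between-group mean square by MSE.
# For these data, SSM = 800 (from the earlier sketch), so MSM = 400.
msm = 800.0 / (k - 1)
f_ratio = msm / mse

print(sse, mse, f_ratio)   # SSE = 24.0; F is about 150 for these data

# The same division reproduces the video's numbers:
# 768.67 / 2 = 384.33, 142.25 / 9 = 15.806, and their ratio is about 24.316.
video_f = (768.67 / 2) / (142.25 / 9)
```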
The Analysis of Variance table reports the source of variation (the factor name, random error, and the total), the degrees of freedom for each source, the corresponding sums of squares, the mean squares, and the F-Ratio. In JMP, the p-value is reported as Prob > F. For the breaking strength example, the p-value is very small. This leads us to reject the null hypothesis that all of the means are equal. However, remember that the alternative hypothesis is that at least two of the means are not equal. ANOVA is what statisticians call a "whole model test." If you reject the null, you know that there are significant differences. However, you don't know which means are different from one another. This is the topic of the next video.
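If SciPy is available, the whole computation (the F-Ratio together with its p-value from the F distribution) comes packaged in `scipy.stats.f_oneway`. This is only an illustration of the same whole-model test, not the JMP computation itself, and it again uses the hypothetical three-group data rather than the video's:

```python
from scipy.stats import f_oneway

# Hypothetical three-group data (not the video's breaking strength data).
a = [8, 10, 12, 10]
b = [18, 20, 22, 20]
c = [28, 30, 32, 30]

# f_oneway runs the one-way ANOVA whole-model test: it returns the
# F-Ratio and the p-value, i.e. the area under the F distribution
# beyond that ratio (what JMP labels Prob > F).
f_stat, p_value = f_oneway(a, b, c)

print(f_stat, p_value)
```

A tiny p-value leads us to reject the null hypothesis that all the means are equal, but as noted above, the test alone does not tell us which means differ.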