Last time, we were talking about the two-sample or pooled t-test, and we looked at our Portland cement mortar problem from that perspective. We saw that the computed value of the two-sample t-test statistic was minus 2.20, and that fell into the lower critical region of our t-distribution with 18 degrees of freedom. Now that was a fixed significance level test, because we chose five percent to generate those critical values of 2.101 and minus 2.101. The p-value, on the other hand, is the probability or area in the tails below minus 2.20 and above plus 2.20, because it's a two-sided test. The p-value can be found in most cases by computer. It is the risk of wrongly rejecting the null hypothesis of equal means; in other words, it measures how unusual the event is. The exact p-value in our problem turns out to be 0.042, and I found that from a computer program, but you can approximate the p-value with a t-table. Most t-tables only give probabilities for positive values of t. So just as we had to do with the normal distribution z-statistic, take the absolute value of t0, which is minus 2.20, and turn it into a positive 2.20. Now, with 18 degrees of freedom, go into the t-table and see if you can find an exact value of 2.20. Well, you can't, but you can find values that bracket it: 2.101 is less than 2.20 and 2.552 is greater than 2.20. So you can bracket this value quite nicely. The right-tail probability for the smaller value, 2.101, is 0.025, and for the larger value, 2.552, it is 0.01. Now you have to double those because this is a two-sided test. So the p-value has to lie between 0.02 and 0.05. Those are lower and upper bounds on the p-value, and we know that the actual p-value turns out to be 0.042; we find that from a computer program. Here are some two-sample t-test results from computer software. The upper part of this table is the output from a product called Minitab, which is a very nice, very useful product for analyzing data.
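The arithmetic above can be sketched in a few lines of Python. Since the raw data aren't shown here, the pooled standard deviation of 0.284 is an assumed value, chosen only to be consistent with the reported t0 of minus 2.20 and samples of ten observations each; the t-table entries are the ones quoted in the lecture.

```python
import math

# Summary statistics for the Portland cement mortar experiment.
# The pooled standard deviation sp = 0.284 is assumed, picked to be
# consistent with the reported t0 = -2.20 and n1 = n2 = 10.
n1, n2 = 10, 10
diff = -0.28          # difference in sample means (modified - unmodified)
sp = 0.284            # pooled standard deviation (assumed)

se = sp * math.sqrt(1 / n1 + 1 / n2)    # standard error of the difference
t0 = diff / se                          # two-sample t statistic
df = n1 + n2 - 2                        # 18 degrees of freedom

# Bracket the two-sided p-value from a t-table with 18 df:
# the right-tail areas for the two critical values that bracket |t0|.
table = {2.101: 0.025, 2.552: 0.010}    # t value -> P(T > t)
t_abs = abs(t0)
assert 2.101 < t_abs < 2.552            # confirms the bracketing

p_upper = 2 * table[2.101]              # 0.05 (double the tail: two-sided)
p_lower = 2 * table[2.552]              # 0.02

print(f"t0 = {t0:.2f}, df = {df}")
print(f"{p_lower} < p-value < {p_upper}")   # the exact value is 0.042
```

Doubling the right-tail areas is what turns the one-sided table entries into two-sided bounds, which is exactly the step described above.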
It's a good statistics package, and the output we're seeing there is the two-sample t-test for the Portland cement data. If you look through that output, you will find the estimated difference and the value of the t-statistic, minus 2.19. Now I got minus 2.20; the computer carried a few more decimal places than I did. It has 18 degrees of freedom, and the p-value is 0.042. At the bottom of the table is the output from JMP, and once again, the calculation is very similar. The t-ratio is 2.186. Notice it's positive instead of negative, because the software subtracted the means in a different order than I did. Then it gives you the standard error of the difference, which is the denominator of the t-ratio, 18 degrees of freedom, and the probability of exceeding the absolute value of t, 0.0422; that is the two-sided p-value for this problem. So this is what computer output looks like, and you're going to get some guidance on how to actually use the software to obtain these numbers in another class. Checking assumptions in the t-test: remember, we are assuming that the observations come from a normal distribution, and we have also assumed that the variances of those normal distributions are the same. So we have two normal distributions with equal variances but possibly unequal means. How do we check those assumptions? Well, an easy, convenient way to do that is with normal probability plotting. Here is a normal probability plot of the tension bond strength data from both samples of our Portland cement experiment. The solid dots are the modified mortar, and the little rectangular plotting symbols are the unmodified mortar.
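The sign difference between the two packages is easy to demonstrate. A minimal sketch, again using the assumed pooled standard deviation of 0.284: swapping the order of subtraction flips the sign of the t statistic but not its magnitude, so the two-sided p-value is unchanged.

```python
import math

# Swapping the subtraction order flips the sign of t0 but not |t0|,
# so the two-sided p-value is identical either way.
# sp = 0.284 is an assumed value consistent with the lecture's t0.
n1 = n2 = 10
se = 0.284 * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference

t_one_order = (-0.28) / se    # (modified - unmodified), as in the lecture
t_other_order = (0.28) / se   # the opposite subtraction order

print(t_one_order, t_other_order)   # signs differ, magnitudes match
```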
Now, when you look at this normal probability plot, the first thing I see is that both of these samples tend to lie along straight lines. Remember, in a normal probability plot, if the sample data lie approximately along a straight line, that's reasonable evidence that the sample is drawn from a normal distribution. So normality seems to be a reasonable assumption here. It turns out that on a normal probability plot, the slope of the straight line is proportional to the standard deviation. So if the straight lines have similar or nearly identical slopes, you can feel pretty good about the assumption of constant variance. When I look at these plots, it looks to me like the slopes of these two lines are very, very similar. Now, if you're drawing these plots and interpreting them by hand, I always urge people to concentrate on the central portion of the plot when you visualize the straight line. Don't get too carried away with the tails, because the bulk of the probability is in the center of the plot, and that's what you want to use in deciding where to draw the straight line. How important are these assumptions? Well, the normality assumption is only moderately important. The t-test works pretty well even for moderate departures from normality. As long as the population is reasonably symmetric and reasonably unimodal, you're not going to have any real problems with the t-test; it's pretty robust to the normality assumption. The constant variance assumption is more important. If you inadvertently make a wrong assumption there, it tends to impact the sensitivity of the test: its ability to detect differences is negatively affected. So that's a more important assumption. Why is the t-test an important procedure? Well, I think its biggest value is that for simple comparative experiments, it gives you an objective basis for making decisions. It removes judgment from the decision-making process, and that's really important.
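The claim that the slope of a normal probability plot is proportional to the standard deviation can be checked numerically. This sketch uses simulated data, not the actual mortar measurements: it plots the ordered sample against standard normal quantiles at the usual plotting positions and fits a least-squares line, whose slope comes out close to the sample standard deviation.

```python
import random
import statistics
from statistics import NormalDist

# Simulated normal data (NOT the mortar measurements); the mean and
# SD here are arbitrary illustrative choices.
random.seed(1)
sample = sorted(random.gauss(16.8, 0.3) for _ in range(200))
n = len(sample)

# Plotting positions (i - 0.5)/n mapped through the inverse standard
# normal CDF give the theoretical quantiles for the x-axis.
quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Least-squares slope of the ordered sample versus the normal quantiles.
qbar = sum(quantiles) / n
ybar = sum(sample) / n
slope = (sum((q - qbar) * (y - ybar) for q, y in zip(quantiles, sample))
         / sum((q - qbar) ** 2 for q in quantiles))

print(f"fitted slope = {slope:.3f}, sample SD = {statistics.stdev(sample):.3f}")
```

Two samples with nearly identical slopes on such a plot therefore suggest nearly equal standard deviations, which is the constant-variance check described above.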
That's the value of statistics in experimental work: it lends scientific objectivity to our analysis of the data. The t-test is quite versatile; it can be used for a lot of things. One of the things it could be used for is to test all of the relevant hypotheses in a two-level factorial design, because remember, all of those hypotheses involve comparing the mean response on one side of the cube to the mean response on the other side of the cube; remember that discussion. Well, you could use a t-test to do that, but we don't. We end up using another procedure, because it works better and it's actually simpler. Finally, one of the things we sometimes like to do in addition to a test of hypothesis is to construct a confidence interval on the difference in means. This is because while hypothesis testing gives you an objective statement about whether the means are different or not, it doesn't really tell you how different they are. That's what confidence intervals do. A confidence interval is typically a statement of the form you see here, where Theta is the parameter of interest and L and U are called the lower and upper confidence limits. These limits are chosen in such a way that the probability that Theta lies between L and U is equal to 1 minus Alpha. The 100 times (1 minus Alpha) percent confidence interval on the difference in two means, assuming of course that we have identical variances, is given by the equation at the bottom of the slide. You'll notice that S sub p, the pooled standard deviation, is used, and the t-value is the upper Alpha over 2 percentage point of the t-distribution with n1 plus n2 minus 2 degrees of freedom. We can actually calculate these intervals pretty easily for the Portland cement mortar formulation problem.
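The interval on the slide can be sketched as a pair of small functions. The function names are my own, and the t percentile is passed in from a table, since the Python standard library has no t-distribution:

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation Sp from two sample standard deviations."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

def ci_difference(ybar1, ybar2, sp, n1, n2, t_crit):
    """100(1 - alpha)% CI on mu1 - mu2, assuming equal variances.

    t_crit is the upper alpha/2 percentage point of t with
    n1 + n2 - 2 degrees of freedom, looked up from a table.
    """
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = ybar1 - ybar2
    return diff - half_width, diff + half_width
```

For example, with assumed sample means of 16.76 and 17.04 (consistent with the reported difference of minus 0.28), an assumed pooled SD of 0.284, and t = 2.101 with 18 degrees of freedom, `ci_difference(16.76, 17.04, 0.284, 10, 10, 2.101)` reproduces an interval of roughly minus 0.55 to minus 0.01.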
We know everything we need: the two sample averages, S squared sub p, and the appropriate t-percentile, the upper two-and-a-half percent point of t with 18 degrees of freedom, which is 2.101. Plugging those numbers into our confidence interval equation gives us a 95 percent confidence interval on the difference in means from minus 0.55, that's the lower bound, up to minus 0.01. Another way to say that is that the confidence interval is minus 0.28 plus or minus 0.27; that is, the estimated difference in means is minus 0.28 and the accuracy of this interval is plus or minus 0.27. Notice, looking at this interval, that zero is not in the interval. That's because we rejected the null hypothesis of equal means at the five percent level, and this is a 95 percent confidence interval. If we had been unable to reject that null hypothesis at the five percent level, this interval would have included zero. So looking at a confidence interval is, in a sense, another way to look at a hypothesis test: if the null hypothesis is not rejected, then the confidence interval on the difference will include zero. So this is an alternate way to look at the results of the experiment and get a little bit more information. By the way, if you look at the computer output we had earlier, both of these computer packages report a confidence interval: Minitab reports the confidence interval here, and JMP reports it here. The signs are different on the lower and upper bounds between the two outputs. Why is that? Well, that's because JMP did the calculation by subtracting the means in a different order than Minitab did. So that's the confidence interval, and confidence intervals will appear in this course again from time to time as useful ways to interpret data from designed experiments. So once again, we're done, and thanks for watching. We'll be back soon.
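The plug-in calculation above, and the duality between the interval and the test, can be reproduced directly. The pooled standard deviation of 0.284 is again an assumed value consistent with the reported t0 of minus 2.20; the difference of minus 0.28 and the percentile 2.101 are the lecture's numbers.

```python
import math

# Reproducing the lecture's 95% CI on the difference in means.
# diff and t_crit are from the lecture; sp = 0.284 is assumed.
diff = -0.28
sp = 0.284
n1 = n2 = 10
t_crit = 2.101    # upper 2.5% point of t with 18 degrees of freedom

half = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
lo, hi = diff - half, diff + half
print(f"95% CI: ({lo:.2f}, {hi:.2f})")   # about (-0.55, -0.01)

# Duality with the hypothesis test: zero falling outside the interval
# corresponds to rejecting H0 of equal means at the 5% level.
reject_h0 = not (lo <= 0 <= hi)
print("reject H0 at the 5% level:", reject_h0)
```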