We're going to pick up now with finishing the analysis of our 2^3 factorial design. Recall, this is an experiment on plasma etching. There are three factors: the anode-cathode gap, gas flow, and RF power, and this experiment was replicated twice. In the previous lecture, I showed you a table that summarized the effect estimates and the sums of squares. Well, here are the sums of squares now displayed in a complete analysis of variance table. Since the experiment was replicated twice, there are eight degrees of freedom for pure error, and the F column here shows you the result of taking the mean square for each of the factorial effects and dividing it by the mean square for error. As we guessed from looking at the percent of variability explained in the last lecture, gap is highly significant, power is highly significant, and so is the gap times power interaction. Neither the main effect of gas flow nor any of the other interactions appears to be significant.
Here is the output from JMP that follows that up. At the top of the page is a plot of actual versus predicted, and you can see that we're following that 45-degree line pretty accurately. On the right-hand side is the analysis of variance, and it shows you that the overall model is highly significant. Now, the model there has six degrees of freedom. Main effects and two-factor interactions are included in that model; the three-factor interaction is not included. If you look at the effect summary over on the right, you notice that I showed you the three main effects and the three two-factor interactions. The residual plots look pretty reasonable, and here is the plot of the prediction profiler. You can see that decreasing the gap between the anode and cathode has an effect on etch rate. It tends to increase the etch rate, and increasing the power tends to increase the etch rate as well. I'm sorry, let me say that the way the plot shows it: here we're increasing the gap and that's decreasing the etch rate.
Here we're increasing the power and that's increasing the etch rate. Gas flow, as you can see, has very little effect. Here is the reduced model where we have only gap and power in the model. You can see the gap-power interaction, and you can see that there is again a very strong effect due to both the gap and the power. Decreasing the gap increases the etch rate, increasing the power increases the etch rate, and this is something that you would expect from the overall physics of the problem.
Model summary statistics. The R-square we've talked about before. R-square is just the model sum of squares divided by the total sum of squares. In this problem, for the reduced model, the R-square is about 96 percent. Loosely speaking, what that means is that in this experiment these two variables account for about 96 percent of the total variability in the data. This is great. On the other hand, you can always make R-square bigger by putting additional terms in a model, whether those terms really are useful or not. So many people like to look at something called an adjusted R-square. The way adjusted R-square is computed is to take the ratio of the mean square for error to the total sum of squares divided by the total degrees of freedom, and then subtract that ratio from one. This can be shown to be a measure of the variability explained in a way that is adjusted for the size of the model. Generally, if you add terms to a model that you don't need, the mean square for error will get larger, so the R-square adjusted will get smaller if you put in non-significant terms. So lots of people like to look at R-square adjusted and make sure that it is not wildly different from the ordinary R-square. Well, in our problem, the R-square adjusted turns out to be about 95 percent. It's almost the same as the ordinary R-square. So this would lead us to believe that we have the right model and we don't have any unnecessary terms in it.
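As a rough illustration of the two formulas just described, here is a minimal Python sketch. The sums of squares are made-up numbers, not the values from the etch rate experiment. They are chosen only so that the results land near the 96 percent and 95 percent quoted above, using the degrees of freedom of a 16-run experiment with three model terms. The function names are hypothetical as well.

```python
def r_squared(ss_model, ss_total):
    """Ordinary R-square: model sum of squares over total sum of squares."""
    return ss_model / ss_total


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """One minus (mean square for error) / (total sum of squares / total df)."""
    ms_error = ss_error / df_error
    return 1.0 - ms_error / (ss_total / df_total)


# Hypothetical values (NOT the lecture's data): 16 runs, 3 model terms.
ss_total, df_total = 1000.0, 15
ss_error, df_error = 40.0, 12

r2 = r_squared(ss_total - ss_error, ss_total)
r2_adj = adjusted_r_squared(ss_error, df_error, ss_total, df_total)

# Add one term that barely reduces the error sum of squares (again made up).
ss_error_bigger, df_error_bigger = 39.5, 11
r2_bigger = r_squared(ss_total - ss_error_bigger, ss_total)
r2_adj_bigger = adjusted_r_squared(
    ss_error_bigger, df_error_bigger, ss_total, df_total
)

print(round(r2, 4), round(r2_adj, 4))
print(round(r2_bigger, 4), round(r2_adj_bigger, 4))
```

With these made-up numbers, the extra term nudges the ordinary R-square up while the adjusted R-square goes down, which is the behavior the adjusted statistic is meant to expose.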
The last quantity here is something called R-square for prediction, or some people call this the PRESS R-square. PRESS is an acronym for prediction error sum of squares. The way R-square for prediction is calculated is by using a formula that is quite similar to the R-square adjusted in that it has the SS total down at the bottom, not the mean square but the total sum of squares, and then up in the numerator it has this quantity called PRESS, the prediction error sum of squares, and that ratio is subtracted from one.
How is PRESS computed? Well, theoretically, here's how you do it. You fit a model to your data that excludes one data point, say the first one. Now, use that model to predict the withheld value of y. The error that you get from that is called a prediction error. Now, put that first observation back and withhold the second. Repeat the process. Fit a model to all of the data except the second observation. Use that model to predict the withheld second observation, and the difference between the actual and this predicted value is the second prediction error. So you do this for every single observation, 1, 2, on down to capital N. So you effectively fit N different regression models. You take the prediction errors from each of these analyses and you square them and add them up, and that's the PRESS statistic.
PRESS is thought to be a good measure of how well your model predicts new data, that is, data that was not used to actually build the model itself. So the R-square for prediction is often taken as an indication of prediction capability. How well will this model work in predicting new data? In this case, the R-square for prediction is 93 percent, almost exactly the same as we had for the R-square and the adjusted R-square. This is the happiest of all situations. If the R-square for prediction, or the PRESS R-square, is really low, that's an indication that your model does not have good predictive power.
Other model summary statistics.
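Going back to PRESS for a moment, the leave-one-out procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are simulated (a replicated two-factor design in coded units with made-up coefficients), not the etch rate data, and the function and variable names are hypothetical.

```python
import numpy as np


def press_statistic(X, y):
    """Brute-force PRESS: withhold each point, refit, predict it, sum squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # all rows except row i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        prediction_error = y[i] - X[i] @ beta         # actual minus predicted
        press += prediction_error ** 2
    return press


# Simulated data (an assumption, not the lecture's data): two coded factors
# at -1/+1, four replicates of each of the four combinations.
rng = np.random.default_rng(0)
a = np.tile([-1.0, 1.0, -1.0, 1.0], 4)
b = np.tile([-1.0, -1.0, 1.0, 1.0], 4)
X = np.column_stack([np.ones(16), a, b, a * b])       # intercept, A, B, AB
y = 800.0 - 50.0 * a + 150.0 * b - 75.0 * a * b + rng.normal(0.0, 40.0, 16)

ss_total = np.sum((y - y.mean()) ** 2)
press = press_statistic(X, y)
r2_pred = 1.0 - press / ss_total

# Ordinary R-square from the fit to all of the data, for comparison.
beta_all, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_error = np.sum((y - X @ beta_all) ** 2)
r2_ordinary = 1.0 - ss_error / ss_total

print(r2_ordinary, r2_pred)
```

Refitting N times is the conceptual definition. Regression software typically gets the same number from a single fit, and for a least squares model PRESS is never smaller than the ordinary error sum of squares, so the R-square for prediction cannot exceed the ordinary R-square.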
The standard error of any individual model coefficient is simply the square root of the variance of that coefficient, and that variance is sigma square over capital N. So it's the square root of sigma square over capital N. N, of course, is little n, the number of replicates, times two to the k. Now, we would estimate that by replacing sigma square by the error mean square from the ANOVA. So in our problem, when we perform that substitution, divide by 16 runs, and take the square root, the standard error of any model coefficient is 11.87, and you can find that on the computer output for each coefficient. You could also use the standard error of each coefficient to construct a confidence interval on that coefficient if that's what you wanted. Here's the appropriate confidence interval equation. Again, this would be a 100 times one minus alpha percent confidence interval.
This is what the regression model actually looked like in our problem. I've shown you the final equation in terms of both the coded variables and the natural or actual variables. Here's another useful display. This is a cube plot of the data. What you're looking at here is a plot of the actual totals at each of the eight test combinations in this design. This is sometimes very useful in helping experimenters visualize how the response is going to change as they adopt different conditions in the design space. You can see that the combination of small gap and high power is producing very high etch rates. You can also plot ranges of the observations at each corner of the cube. I think this is very useful because it sometimes can give you an indication of places in the design space where there's more variability. For example, look at the circled corners in this plot. Those are ranges now, that is, the absolute value of the difference between the two replicates at that design point. You notice that the variability there is much higher than it is at any other location in the design.
So the combination of gap at the high level and either flow rate seems to produce a situation where there's much more variability in the observed etch rates than there is at the other test combinations. Here are the contour plot and the response surface plot. Notice that the response surface plot is a twisted plane. That's because there's an interaction between gap and power. The contour plot on the right has curved lines to represent contours of constant etch rate, again because there's an interaction. Now, the way you would use these plots would be to try to determine what combination of gap and power produces the desired etch rate. For example, if you want an etch rate around 750 angstroms per minute, there's a contour line very close to that, and you can operate at any power-gap combination along that contour line to get an etch rate that's approximately 750 angstroms per minute.
Once again, the question of replication: how much replication do we really need? Full factorial model, alpha of 0.05, and suppose we have an effect size of one standard deviation. Actually, this is an effect size of two standard deviations; what I should have said is a regression coefficient of one standard deviation. So here the effect size is two standard deviations. Look at the power. The power is 94 percent. So two replicates is really great for this problem.
So you've seen a 2^2, you've seen a 2^3, and now we're going to talk about the general case of a 2^k: k factors, each at two levels. These can be rather large designs. There are k main effects, there will be k things taken two at a time two-factor interactions, k things taken three at a time three-factor interactions, all the way on down to a single k-factor interaction. Of course, remember that the number of runs goes up rather dramatically as k increases. K of 4 is 16 runs, k of 5 is 32, k of 6 is 64, and k of 10, 10 factors, would be 1,024. That's a lot of runs.
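To make that counting concrete, here is a small Python sketch that tabulates the number of effects of each order and the number of runs in an unreplicated 2^k design. The function names are hypothetical. The arithmetic is just "k things taken j at a time" and two to the k.

```python
from math import comb


def effect_counts(k):
    """Number of effects of each order: main effects, 2-factor, ..., k-factor."""
    return {order: comb(k, order) for order in range(1, k + 1)}


def number_of_runs(k, replicates=1):
    """Runs in a 2^k factorial with the given number of replicates."""
    return replicates * 2 ** k


for k in (4, 5, 6, 10):
    counts = effect_counts(k)
    print(k, number_of_runs(k), counts[1], counts[2], counts[k])
```

The counts across all orders always add up to 2^k minus 1, which is the number of degrees of freedom available for effects in a single replicate.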
We'll talk about some things that we might do to keep the number of runs more manageable as we progress through the course. Thanks for watching. That's the end of this lecture in Module 6.