Welcome back to our experimental design class. Here's the chapter outline. We're going to cover a good bit of the material in this chapter because these are fairly important topics. Basically, nested and split-plot designs are multi-factor experiments, and there are lots of industrial applications of them. But even though they're multi-factor experiments, they do not have the factorial structure. So they are different from factorials, and knowing how to recognize the situations where one of these should be used is really important. These two types of experiments also very frequently involve one or more random factors. So all the things that we talked about in module 13, chapter 13 of the book, variance component estimation, REML, those are all issues that you'll see discussed again here. I should point out that there are many, many variations of these designs, and we're not going to talk about all of them. We're only going to talk about some of the basic situations that seem to come up more often than others. Let's start off by talking about the simplest type of nested design, the two-stage nested design. In a two-stage nested design, we have two factors, and the levels of one factor, B, are similar to but not identical to each other at different levels of the other factor, A. That means the levels of B rather look alike, but they're not really exactly the same. Remember, in a factorial experiment, every level of A is run in combination with every level of B. That's what gives you the factorial structure. In a nested design, that can't happen, because the levels of B are not identical to each other at different levels of A. So here's a typical industrial example. Suppose a company buys its raw material from three different suppliers, and this material comes in batches. A question they would like to address is: is the purity of the material that they are getting uniform?
So they decide to run an experiment in the following way. They select four batches of material at random from each supplier, and then they make three purity determinations on each batch. So they end up with an experimental design that looks like figure 14-1 at the top of this slide. You have your three suppliers, 1, 2 and 3, and then we have four batches selected at random from supplier 1, four from supplier 2 and four from supplier 3. Then here are our observations on purity from each of these batches. Now, if this were a factorial experiment, then batch 1 from supplier 1 would have to be exactly the same as batch 1 from supplier 2, which would have to be exactly the same as batch 1 from supplier 3. But that can't be the case. That isn't at all possible, because these are different batches of material. In fact, we could actually renumber the batches 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, as I have done in figure 14-2. So this arrangement cannot be a factorial; it meets the definition of a nested factor. The levels of batches are similar to each other, they have similar numbers and similar material under each level of supplier, but they're not the same. They can't be the same. So this is a classic example of a two-stage nested design. Here's the statistical model for a two-stage nested design. y_ijk is the kth observation at main level i and nested level j. Tau_i is the effect of the upper or main level; that would be the supplier in our problem. Beta_j within i is the effect of the jth level of the nested factor within level i of the main factor. Notice the j-within-i subscript, with the i in parentheses; that notation is adopted to clearly indicate nesting. Then epsilon_(ij)k is the error term, which is the usual NID(0, sigma squared) error. Notice that we've also indicated some nesting in the error subscript, just to continue to indicate the nested structure of the design.
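Written out in symbols, the two-stage nested model the lecture is describing takes the following form (this is the standard statement of the model, following the notation just introduced):

```latex
y_{ijk} = \mu + \tau_i + \beta_{j(i)} + \varepsilon_{(ij)k}
\qquad
\begin{cases}
i = 1, 2, \ldots, a \\
j = 1, 2, \ldots, b \\
k = 1, 2, \ldots, n
\end{cases}
```

Here \(\tau_i\) is the effect of the ith level of the main factor A (supplier), \(\beta_{j(i)}\) is the effect of the jth level of B (batch) nested within the ith level of A, and \(\varepsilon_{(ij)k}\) is the NID\((0, \sigma^2)\) error.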
So the total sum of squares is going to be partitioned into a component for A, a sum of squares for B within A, and then a sum of squares for error. There are abn total observations, so abn minus 1 total degrees of freedom. a minus 1 of them are associated with the upper-level or primary factor. Then a times (b minus 1) of them are associated with the nested factor. Well, you have b levels of the nested factor within each level of A, so there are b minus 1 degrees of freedom there, and there are a suppliers, or a levels of the upper factor, so there are a times (b minus 1) total degrees of freedom for the nested factor. And ab times (n minus 1) are the error degrees of freedom. Notice that this model has no interaction term. You can't have interaction when you have nested factors: when a factor is nested, it cannot interact with the factor it is nested within. So table 14.1 at the bottom of the slide shows you the expected mean squares for the three cases that appear in practice: both factors fixed; A fixed and B random; and both factors random. The last of these is the case that I seem to run into the most, although I have seen the other two cases as well. There is, of course, another possibility: you could have A random and B fixed. But I don't think I've ever seen that; I don't think that happens very often. The expected mean squares show you how to do the tests in the ANOVA. If both factors are fixed, then both the primary factor and the nested factor are tested against the mean square for error. If one or both of the factors are random, then the primary factor is tested against the nested factor, and the nested factor is tested against the error term. So let's actually take a look at this example: three suppliers, four batches, the batches taken at random from each supplier, and the material sampled at random as well. We're going to take a look at the data in just a moment. It's in table 14.3, and the data is coded.
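The degrees-of-freedom accounting described above can be checked directly for the supplier example (a = 3 suppliers, b = 4 batches per supplier, n = 3 determinations per batch); a small sketch:

```python
a, b, n = 3, 4, 3  # suppliers, batches per supplier, determinations per batch

df_total = a * b * n - 1    # abn - 1
df_A = a - 1                # primary (supplier) factor
df_B_in_A = a * (b - 1)     # batches within suppliers
df_error = a * b * (n - 1)  # error

# the partition must account for every degree of freedom
assert df_A + df_B_in_A + df_error == df_total
print(df_total, df_A, df_B_in_A, df_error)  # 35 2 9 24
```
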
Both JMP and Minitab will analyze nested designs. This is a mixed model in this case, and we'll assume the restricted form. So here's the data. The data has been coded by subtracting 93 from all of the purity measurements, and now we're going to take a look at doing the analysis. How would we go about doing the statistical analysis? Well, the total sum of squares is calculated in the usual way, and so is the sum of squares for the primary factor: simply get the totals of all of the observations at each level of the primary factor, square them, add them up, divide by little b times little n, and subtract the usual correction factor. Now, to compute the sum of squares for B within A, take a look at this equation. What you are really doing in that sum of squares calculation is getting the sum of squares for the main effect of B as if it were not a nested factor, and the sum of squares for the AB interaction as if this were not a nested design, and combining them. Another way to look at it is that you are taking what we might call the cell totals, squaring them and adding them up, dividing by the number of observations in each cell (that's the number of replicates), and then subtracting off this correction term, which is just the same term that you used for computing the sum of squares for A. Notice that this term and this term are basically the same. Then the error sum of squares we get by subtraction, and here is the analysis of variance. When we look at the analysis of variance, we notice that there is a highly significant effect of batches within suppliers. So this is a strong indication that the batches are different; they have different purity. But the p-value for the supplier effect is quite large, so there's very little difference on average between suppliers. Here is the Minitab output. Minitab gives you estimates of the variance components. These estimates are obtained by using the ANOVA method.
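The sum-of-squares formulas just described are easy to sketch in code. Note the array below is filled with illustrative random numbers, not the actual coded purity data of table 14.3; the point is only to show the arithmetic of the partition:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 3, 4, 3
y = rng.normal(size=(a, b, n))  # y[i, j, k]: illustrative data, NOT table 14.3

N = a * b * n
CF = y.sum() ** 2 / N  # the usual correction factor

SS_T = (y ** 2).sum() - CF
# primary factor: supplier totals, squared, summed, divided by bn
SS_A = (y.sum(axis=(1, 2)) ** 2).sum() / (b * n) - CF
# B within A: cell (batch) totals squared and summed, divided by n,
# minus the same term used in computing SS_A
SS_B_in_A = (y.sum(axis=2) ** 2).sum() / n - (SS_A + CF)
# error by subtraction
SS_E = SS_T - SS_A - SS_B_in_A

assert np.isclose(SS_A + SS_B_in_A + SS_E, SS_T)
```
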
Here are the p-values associated with the supplier and the batch-within-supplier effects, and you notice the calculations agree with what we did by hand. Here is the output from JMP, and it uses REML estimates for the variance components. You'll notice that the REML estimates are exactly the same as the ANOVA estimates because this is a balanced design. We do have a confidence interval on the batch-within-supplier variance component. The confidence interval is fairly wide because, again, we don't have very many batches here; the sample size is pretty small. Practical interpretation here: well, there's no difference in average purity among suppliers, but there is significant variation in purity among the batches, essentially from all suppliers. Practical implications of this? Well, if there were big differences between the suppliers, wouldn't your first step, if you wanted to improve the situation, be to go to the suppliers that seemed to be the most problematic and work with them to reduce the variability in their material? Sure, but that isn't going to work here, because all of the suppliers have a lot of variability in their material. So you're going to have to work with all of these suppliers to reduce the variability. We're going to look at the residual plots in just a minute. Plotting the residuals versus supplier is very important. Can you see why? If the residuals don't have about the same variability across suppliers, that means some suppliers have much more variability than others, and that may indicate the ones that you need to start working with right away. We're also going to talk about incorrectly analyzing this experiment as a factorial; we'll take a look at that. Remember, we have used both the ANOVA method and the REML method to estimate the variance components here, and we found a strong batch-within-supplier effect but no supplier effect.
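The ANOVA-method estimates that Minitab reports come from equating the observed mean squares to their expectations. With both factors random (restricted model), E[MS_E] = sigma^2, E[MS_B(A)] = sigma^2 + n*sigma_beta^2, and E[MS_A] = sigma^2 + n*sigma_beta^2 + b*n*sigma_tau^2. A minimal sketch of solving those equations; the mean-square values below are placeholders, not the example's actual ANOVA table:

```python
a, b, n = 3, 4, 3
MS_A, MS_BA, MS_E = 15.06, 19.53, 2.94  # hypothetical mean squares, for illustration only

sigma2_error = MS_E
sigma2_batch = (MS_BA - MS_E) / n             # (MS_B(A) - MS_E) / n
sigma2_supplier_raw = (MS_A - MS_BA) / (b * n)  # (MS_A - MS_B(A)) / bn

# the ANOVA method can produce a negative estimate; the usual
# practice is to truncate it at zero (REML never goes negative)
sigma2_supplier = max(0.0, sigma2_supplier_raw)
```

With these placeholder numbers MS_A < MS_B(A), so the supplier variance component is truncated to zero, which is qualitatively what happens in an experiment like this one where batches vary a lot but suppliers do not differ.
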
So here are the residual plots that I mentioned earlier. Residuals versus predicted seem to show no indication of a problem. But now look at the plots of residuals within supplier. The inherent variability here is about the same. If there were a lot of difference in the widths of these plots, it would indicate that some suppliers were much worse than others in terms of their variability. Finally, here's what would happen if you incorrectly analyzed this as a factorial experiment. Now in this analysis, I have assumed that the supplier and batch effects are random, just like we did in the nested arrangement. But look what happens when you do this: both the batch effect and the batch-by-supplier interaction turn out to be highly significant. That's interesting. We know that can't be true, because this is a nested design. But if you didn't know that and you incorrectly analyzed it this way, you might start thinking that, well, the supplier-by-batch interaction is significant, so maybe there are some differences between suppliers after all. You might think that the significant interaction had somehow masked the supplier effect. So when you analyze this thing incorrectly as a factorial, you could potentially end up with misleading conclusions about the suppliers. Now I want you to notice one other thing. Notice that we have nine degrees of freedom in total for batches and the batch-by-supplier interaction: three of those degrees of freedom are for batches and six of them are for the batch-by-supplier interaction. That's in the incorrect analysis. But if you go back and look at the correct analysis, there are nine degrees of freedom for batches within suppliers. Well, remember I said that the sum of squares for batches within suppliers is made up of the sum of squares for batches plus the sum of squares for the batch-by-supplier interaction in the incorrect analysis. You can see that in this table.
If you add these two quantities together, that will give you the sum of squares for batches within supplier in the correct analysis.
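That decomposition is easy to verify numerically. The sketch below (again with illustrative random data, not the table 14.3 values) computes SS_B and SS_AB as if the design were a crossed factorial and confirms that their sum equals the nested SS_B(A), and likewise that the degrees of freedom add up, (b - 1) + (a - 1)(b - 1) = a(b - 1):

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, n = 3, 4, 3
y = rng.normal(size=(a, b, n))  # illustrative data, not table 14.3

N = a * b * n
CF = y.sum() ** 2 / N

SS_A = (y.sum(axis=(1, 2)) ** 2).sum() / (b * n) - CF
# "main effect" of batches, computed as if batches were crossed with suppliers
SS_B = (y.sum(axis=(0, 2)) ** 2).sum() / (a * n) - CF
SS_cells = (y.sum(axis=2) ** 2).sum() / n - CF
# "interaction", as if this were a factorial
SS_AB = SS_cells - SS_A - SS_B
# correct nested sum of squares
SS_B_in_A = SS_cells - SS_A

# the incorrect factorial pieces add up to the correct nested term
assert np.isclose(SS_B + SS_AB, SS_B_in_A)
assert (b - 1) + (a - 1) * (b - 1) == a * (b - 1)  # 3 + 6 = 9 df
```
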