In today's class, my goal is to show you how we analyze data from factorial experiments when three factors are used. This example is based on one from the textbook by Box, Hunter and Hunter, called Statistics for Experimenters. The experiments described in that example were run to find the combination of settings that would reduce the amount of pollution discharged from a water treatment facility. This is clearly a case where we would like to minimize the amount of pollutant, so minimizing our outcome variable is the objective.
Three factors were considered. The first, factor C, was the chemical compound used. Let's call the two options compound P and compound Q; we don't really know their names. Factor T was the temperature of the treatment: whether we were treating the water at 72 or 100 Fahrenheit. And factor S was the stirring speed, either a slow speed of 200 revolutions per minute or a high speed of 400. Notice that every factor has two levels. Going back to that mathematical idea that two to the power k is the total number of experiments, k is equal to three in this example, so we get a total of eight possible combinations. Here's a short quiz to test that knowledge.
So let's take a look at the results. We will always present our data and analyze it using what we call standard order. Standard order requires that we create a column for each of our factors: C, T and S. Note that I could have used A, B and C for the three factors, but very often we'll switch to letters that actually match our factor names. You don't have to. So back to the standard order table, and the rule is that we vary the first factor, C, the fastest: minus, plus, minus, plus, minus, plus, minus, plus. The second factor, temperature, is varied the next fastest between its low and high levels: two minuses, two pluses, two minuses, two pluses. And then the last factor, S, is varied the slowest: four low levels and then four high levels. Those make up our entire table.
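The standard order rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `standard_order` and the -1/+1 coding are my own choices, not something shown in the lecture.

```python
def standard_order(factors):
    """Return the 2^k runs in standard order as dicts of -1/+1 coded levels.

    The first factor listed alternates fastest; the last one changes slowest.
    """
    k = len(factors)
    runs = []
    for i in range(2 ** k):
        # Bit j of the run index gives the level of factor j (0 -> -1, 1 -> +1).
        run = {name: (1 if (i >> j) & 1 else -1) for j, name in enumerate(factors)}
        runs.append(run)
    return runs


design = standard_order(["C", "T", "S"])

# Print the table with - and + signs, one experiment per row.
print("run  C  T  S")
for number, run in enumerate(design, start=1):
    signs = "  ".join("+" if run[f] > 0 else "-" for f in ("C", "T", "S"))
    print(f"{number:>3}  {signs}")
```

The printed rows show C alternating every run, T every two runs, and S every four runs, which is the pattern described above. The run numbers here are only the row labels of the table, not the order in which the experiments should be carried out.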
Never run the experiments in the order of this table. The order must be randomly selected. So what we will do is add a column to our table to keep track of the order in which we actually run the experiments. We also add a column over here for the outcome variable. In this case, the outcome was the pollutant amount, measured in pounds.
One thing that's so great about the standard order table is that we can get a quick sense of each factor's influence on the outcome variable. Take a look, for example, at how the pollution amount changes when we change the chemical compound, factor C. That factor goes low, high, low, high, low, high, low, high. We see that same pattern in the pollution amounts, at least in the first four experiments. Take a look at the effect of factor S. The first four experiments have a very high level of pollution on average, while the last four experiments have a low level of pollution. That also matches the pattern of factor S. We can already tell just from this table that factor C and factor S are going to be really important to understanding the results.
Let's go back to our cube plot, and this time our cube plot is actually a cube. We can draw it by showing the first factor along the horizontal axis, the next factor on the vertical axis, and the final factor, S, going in and out of the page in this diagonal way. Next, we transcribe the values onto this cube. This is really easy when we follow the standard order sequence. Take a look: 5, 30, 6, 33, then 4, 3, 5, 4. I love this visual representation of the experimental data. It really helps us achieve our objective so quickly. Take a few seconds and answer this question: at what levels should we set our three factors in order to achieve the lowest pollution amount? That's right. It's very clear we need to use chemical Q, operate at low temperature, and use the high stirring speed of 400 revolutions per minute. Later on in the course, we're going to start examining what happens when you move outside this cube.
And I want you to already start to think along those lines. But let's come back to the data we have right here and analyze the main effects and the interactions.
Start with the first factor, C, the choice of either chemical P at the low level or chemical Q at the high level. If we look at the cube, we actually have four estimates of that main effect, one along each of the four horizontal edges. At high temperature and high stirring speed, in other words high T and high S, that effect is equal to 4-5. At high temperature and low speed, it's 33-6. At T minus and high speed, in other words S plus, it is 3-4. And finally, at low temperature and low speed, it's 30-5. So we have four estimates of the effect of the chemical: -1, 27, -1 and 25. The average of these four is equal to 50/4 = 12.5.
Let's pause here. I always tell my students it's no good just calculating numbers. What does this value of 12 and a half really mean in plain language? How would you describe this value to your manager, who doesn't really understand any statistics? What it says is that, on average, we expect an increase of 12.5 pounds in the pollution amount when we go from using chemical P to using chemical Q. And remember, by convention, we report half of that amount. So we report a value of 6.25. One further thing to notice is the discrepancy in that chemical effect between high S and low S. Notice the very large difference there. From the prior class, number 2C, this should alert you that there's an interaction between factor C and factor S.
But before we get to that, let's take a look at temperature. When we examined the table earlier, we didn't really notice anything special about temperature, and we should be able to confirm that numerically. We have four estimates of the temperature effect along the vertical edges: 4-3 here, 33-30 up here, 5-4 back there, and 6-5 here at the front. So on average, we get a value of 1.5 as our difference. Or if we report half of it, that's an effect of 0.75. Lastly, let's take a look at the effect of stirring speed, S.
Along the four diagonal edges, we have 4-33 up here, 3-30 down here, 5-6 here, and 4-5 over there. The average of those differences is -14.5. And if we report half of it, that's -7.25. The -14.5 tells us that we expect, on average, a reduction of 14 and a half pounds of pollution when we go from a low stirring speed to a high stirring speed. So clearly, it's in our favor to use high stirring speeds in order to get that reduced pollution.
You should always step back at this point and make sure these results make sense. Horizontally, we see that going from chemical P to Q increases the pollution amount. That value of 6.25 looks about right. The small value of 0.75 for temperature also looks right, because temperature really has a very small effect. And finally, increasing the stirring speed gives the largest reduction in pollution, a decrease of 7.25 units.
You may have noticed that while I was reviewing these results, I started to build up a numeric representation for you on the screen. We did that in the class where we considered the ginger biscuits, and I've just followed the same idea here. Written out, the model is y = 11.25 + 6.25 x_C + 0.75 x_T - 7.25 x_S. Here y represents the prediction of the pollution. The 11.25 value is the baseline. It is the average of all eight of the outcome values: (5+30+6+33+4+3+5+4)/8. The other three terms are the separate effects of each factor. Those are the main effects.
Let's see how we can use this model to make some predictions. Consider the situation where we're using chemical Q. In other words, x_C is coded as a value of plus one. Let's use low temperature, so x_T is coded as minus one. And let's also use low stirring speed, so x_S is minus one. The predicted value is 11.25+6.25-0.75+7.25. That's a value of 24. That's quite a bit different from the value of 30 pounds which was actually recorded. There's something we haven't accounted for, and that's the interaction between C and S. An interaction is when you have one variable behaving very differently depending on the level of another variable.
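The main-effect arithmetic above can be checked with a short Python sketch. The names `effect` and `predict_main` are my own and not from the lecture. The sketch uses the fact that averaging the four edge differences gives the same number as the average outcome at the high level minus the average outcome at the low level.

```python
# Coded levels in standard order, with the pollution amounts (pounds) from the cube.
C = [-1, +1, -1, +1, -1, +1, -1, +1]
T = [-1, -1, +1, +1, -1, -1, +1, +1]
S = [-1, -1, -1, -1, +1, +1, +1, +1]
y = [5, 30, 6, 33, 4, 3, 5, 4]


def effect(column, response):
    """Average response at the + level minus average response at the - level."""
    high = [r for x, r in zip(column, response) if x > 0]
    low = [r for x, r in zip(column, response) if x < 0]
    return sum(high) / len(high) - sum(low) / len(low)


baseline = sum(y) / len(y)  # average of all eight outcomes
# By the convention used in this course, the model coefficient is half the effect.
b_C, b_T, b_S = (effect(col, y) / 2 for col in (C, T, S))


def predict_main(x_c, x_t, x_s):
    """Prediction from the baseline and the three main effects only."""
    return baseline + b_C * x_c + b_T * x_t + b_S * x_s


print(baseline, b_C, b_T, b_S)   # 11.25 6.25 0.75 -7.25
print(predict_main(+1, -1, -1))  # 24.0, compared with the 30 pounds recorded
```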
We noticed earlier that the chemical effect is a large change, from 5 to 30 and from 6 to 33, over here on the front face at low stirring speed. Yet on the back face of the cube, at high speed, the effect is almost zero: 3-4 and 4-5, very small amounts. It's very clear that the stirring speed modifies the effect of the chemical. There's an interaction between S and C.
How do we quantify this? Well, like we did in class 2C, we have to add a new term to our prediction model, and that term, b_CS, is multiplied by x_C and x_S. But how do we calculate that b_CS value? Let's follow the same idea as we did in class 2C. We have two chances to calculate it: one at high temperature and one at low temperature. We will calculate both and then average the answers. And then, as we've always done, report half the value.
So at high temperature, the difference due to C at high speed is 4-5. The difference due to C at low speed is much greater, 33-6. As you remember, interactions are always reported as half the difference going from high to low. In other words, that's -1-27, which equals -28, and half of that is -14. Let's repeat that at the lower temperature. The difference due to C at high speed is 3-4. The difference due to C at low speed is much greater, 30-5. Taking the difference from high to low gives -1-25, which equals -26, and dividing that by 2 gives -13. So now we have two estimates of the interaction effect. One estimate is -14. The other estimate is -13. The average of those two numbers is minus 13 and a half. And when we report it, let's put in our model a value of half that amount. In other words, b_CS = -6.75.
Now, let me just pause here for a second and emphasize that this is all very tedious if you do it by hand. We're going to show some computerized ways to do this faster in the next few classes. But I always recommend that we start by hand and then see the advantage later on, when we go to computers.
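As a sketch of the by-hand interaction steps above, here is the same calculation in Python. The helper names `c_effect` and `cs_half_difference` are my own and not from the lecture. The last lines add a cross-check that is not part of the lecture: multiplying the C and S sign columns together and averaging the signed outcomes, which for these data gives the same coefficient.

```python
# Outcomes keyed by the coded levels (C, T, S), read off the corners of the cube.
y = {
    (-1, -1, -1): 5, (+1, -1, -1): 30, (-1, +1, -1): 6, (+1, +1, -1): 33,
    (-1, -1, +1): 4, (+1, -1, +1): 3, (-1, +1, +1): 5, (+1, +1, +1): 4,
}


def c_effect(t, s):
    """Change in pollution going from chemical P to chemical Q at fixed T and S."""
    return y[(+1, t, s)] - y[(-1, t, s)]


def cs_half_difference(t):
    """Half of (C effect at high speed minus C effect at low speed) at fixed T."""
    return (c_effect(t, +1) - c_effect(t, -1)) / 2


estimates = [cs_half_difference(+1), cs_half_difference(-1)]  # high T, then low T
cs_interaction = sum(estimates) / len(estimates)
b_CS = cs_interaction / 2  # the coefficient is half the effect, as for main effects

# Cross-check: average of (sign of C) * (sign of S) * outcome over the 8 runs.
b_CS_check = sum(c * s * value for (c, t, s), value in y.items()) / len(y)

print(estimates, cs_interaction, b_CS, b_CS_check)
```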
So let's take our prediction from the previous example again and see if it has improved. The predicted value earlier was 11.25+6.25-0.75+7.25, which is 24. But now, with this interaction term, we have an additional part: -6.75 times +1 times -1. What that means is we actually get an additional amount of 6.75 due to the interaction, giving us a prediction of 30.75, much, much closer to the actual value of 30. Notice here that the interaction actually works against us. That interaction has increased the amount of pollution.
We could also calculate CT interactions and TS interactions. I've only shown you the CS interaction. In fact, there's even a three-factor interaction, the CTS interaction. But all of this gets very tedious and error prone. Coming up in the next module, we can't wait to show you some computerized shortcuts that will take care of all of this work for you.
Now, at the risk of this class going on a little bit too long, I want you to sit back and just think about that interaction for a second. Don't just see it as a number; let's try to interpret what's really going on over here. Why does chemical Q appear to be less effective at low speed, while at high speed it works really well? Maybe chemical Q just takes a little bit longer to dissolve in water than chemical P does. At low stirring speeds, chemical Q is not effective, but at high speeds both chemicals are equally effective.
Now here's where experiments can be really powerful. We saw that the lowest pollution was over here in this corner, when we used chemical Q with high speed and low temperature. But what if the government requirement was that pollution had to be smaller than 10? And imagine also that chemical Q costs you double the amount that chemical P does. You can see where this is going. We can see that any operating point on the cube would be effective, as long as it's not one of the points with low speed and chemical Q.
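Both ideas above, the updated prediction and the screening against a limit of 10, can be sketched as follows. The coefficients are the ones calculated earlier in this class; the names `predict`, `LIMIT` and `feasible` are my own and not from the lecture. The screening uses the measured corner values rather than model predictions.

```python
# Model coefficients found by hand: baseline, three main effects, CS interaction.
baseline, b_C, b_T, b_S, b_CS = 11.25, 6.25, 0.75, -7.25, -6.75


def predict(x_c, x_t, x_s):
    """Prediction with the main effects plus the C-by-S interaction term."""
    return baseline + b_C * x_c + b_T * x_t + b_S * x_s + b_CS * x_c * x_s


print(predict(+1, -1, -1))  # 30.75, against a measured value of 30

# Measured pollution (pounds) at each corner, keyed by coded (C, T, S) levels.
measured = {
    (-1, -1, -1): 5, (+1, -1, -1): 30, (-1, +1, -1): 6, (+1, +1, -1): 33,
    (-1, -1, +1): 4, (+1, -1, +1): 3, (-1, +1, +1): 5, (+1, +1, +1): 4,
}

LIMIT = 10  # the hypothetical government requirement used in this example
feasible = [corner for corner, value in measured.items() if value < LIMIT]

for c, t, s in feasible:
    chemical = "Q" if c > 0 else "P"
    temperature = "high" if t > 0 else "low"
    speed = "high" if s > 0 else "low"
    print(f"chemical {chemical}, {temperature} T, {speed} speed: {measured[(c, t, s)]} lb")
```

Six of the eight corners pass; the two that fail are the ones with chemical Q at low stirring speed.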
In fact, it might be a whole lot more economically profitable to operate at this point over here, producing five pounds of pollution. We still meet the requirements for safe operation because we're below the government level of 10 units. And we use less energy for stirring and the cheaper chemical P. Actually, what we've done here is consider an additional outcome in our mind: profit. Recognize that profits or costs often play a role in any system, so you should always be aware of the economic impact of every corner in your cube.
To end this class and this example, I want you to consider this. Does the fact that temperature has a small effect imply that temperature is meaningless? The answer is no. It is important to recognize that even effects with small numbers have an important interpretation. It means that over the range of temperature selected, in this case between 72 and 100 Fahrenheit, temperature has a small to negligible effect on the pollution amount. Now this is a key insight, because the engineer or operator can take this and select the operating conditions which are the most economically advantageous. Again, this comes down to profits. It is conceivable that when using the lower temperature we will save energy. And because temperature has such a small effect on the system overall, it means that we will not significantly affect the pollution level when we operate at low temperature. That's a great result.
So I want to thank you for staying with me during these examples. I know that they've been longer than normal, but I hope they have been insightful. In the module coming up next, we're going to start looking at how we can do fewer experiments, but still get a good amount of information about our process. Hope to see you over there.