Let's look at a good case study now, involving four factors and two outcome variables. We're stepping up the complexity here a little bit. This is a good question from the textbook by Box, Hunter and Hunter. It's a case study where we are using solar panels with a storage tank. The outcome values were from a computer simulation.
Now just a quick piece of advice when using simulations. Running simulations is often really easy, but there's a temptation to do this inefficiently. I often see people just playing around with the software, trying out different values, until they get an answer they like. You shouldn't treat simulations any differently from real life. Always use a systematic method. In this case we're going to use a set of experiments as our systematic method.
There are two key advantages, though, to using simulations. First, you can run the simulations in parallel at very little cost. Secondly, you don't have to randomize the order of experiments. The reason for that is quite simple: when you repeat the simulation, you get the same answer, so the purpose of randomization, which was to minimize the impact of disturbances, no longer applies. Be careful though: certain computer experiments, when repeated, don't give identical results. So then you should randomize. In fact, I always recommend you randomize. The cost of doing so is minimal, and it guards against all sorts of problems. More on that in the next module though.
Let's go back to the solar panel system. There are four factors. A: the total amount of insolation, or sunlight received; B: the capacity of the storage tank; C: the water flow rate through the absorber; and D: the intermittency of the sunlight. You can read more about these types of systems by following this link. The two outcome variables were "y_1", the collection efficiency, and "y_2", the energy delivery efficiency.
You should be able to quickly tell how many experiments will be done if each factor is operated at the low level and the high level. You should have two to the power of four (2^4), which is 16. So 16 experiments were run, and I've put the results and the R code here on the screen. They're available on the course website. Copy and paste that code and follow along with me for the rest of the video.
So here we define the four factors, A, B, C and D, and I've manually typed in the two outcome variables, "y_1" and "y_2". This is what you would do in practice, but to make things a bit simpler, and to avoid typing errors, you can also use the pid package in R. In a prior video I showed how you can download and install that package to extend R's capability. That package includes the numeric results for this case study, and you can get that dataset by typing the following command: data(solar).
Since we ran 16 experiments, we are able to estimate 16 parameters. There are 4 main effects (one each for A, B, C and D), 6 two-factor interactions, 4 three-factor interactions, and then the single four-factor interaction. That's a total of 15 parameters, and it adds up to 16 if you count the intercept. The software can create all of this for you, very compactly, with the "lm(...)" command, as shown here.
The reason why this A*B*C*D concept works is the principle of model hierarchy. Let's take a simple example: if you wrote just A*B, then R will expand that to include factor A and factor B in the model, as well as their interaction. After all, you can't have the two-factor interaction AB if you don't also have factor A and factor B. Similarly, when R encounters A*B*C, it ensures that the AB interaction is present, as well as factor C. But we've already mentioned that AB brings factors A and B along with it. R will also ensure the BC interaction is present and, by a similar line of thinking, the AC interaction, followed by the three-factor interaction ABC itself.
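If you want to check that counting for yourself, here is a minimal sketch in base R. It doesn't use the case study's results at all: it only builds the 16 combinations in coded units (assuming -1 for the low level and +1 for the high level) and asks R how it expands A*B*C*D. The variable names are my own choices for illustration.

```r
# Build the 2^4 full factorial in coded units (-1 = low, +1 = high).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
n_runs <- nrow(design)

# Ask R how it expands A*B*C*D, without fitting any model.
expanded <- terms(~ A * B * C * D)
term_labels <- attr(expanded, "term.labels")   # "A", "B", ..., "A:B:C:D"
term_order <- attr(expanded, "order")          # 1 = main effect, 2 = two-factor, ...

# Count the terms of each order, then add one for the intercept.
counts <- table(term_order)
n_params <- length(term_labels) + 1

# The model matrix has one column per parameter, intercept included.
n_columns <- ncol(model.matrix(~ A * B * C * D, data = design))

print(term_labels)
print(counts)
```

Note that R prints the interactions with a colon, for example A:B, while the star in A*B is the shorthand for "A, B and A:B together".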
So now you can understand why, when we write A*B*C*D here in the lm(...) command, R will recursively expand this into all the main effects, all the two-factor interactions, all the three-factor interactions, as well as the four-factor interaction. It is as if we had written it all out by hand, as shown here. But obviously that is tedious and error-prone, so let R do the work for you.
Now let's build those two separate linear models: one for the collection efficiency, "y_1", and one for the energy delivery efficiency, "y_2". If you use the summary(...) command, as we've done before, it might be fairly difficult to quickly locate the important factors that influence y_1. Rather, let's use the Pareto plot to show us what the important parameters are. Here it is: the grey bars represent the terms with a negative sign, and the black bars represent the terms with a positive sign. The most important terms are factor B, factor A, the AB interaction, and factor C. The other terms have a diminishing effect on the outcome.
The collection efficiency will decrease when factor B is increased. In other words, as the storage tank capacity is increased, the collection efficiency drops. This is the most influential variable in the system. Next is factor A, the amount of insolation, which has a positive effect on the collection efficiency.
Now try answering this question here on the screen: pause the video, and think about the AB interaction. The correct answer is the one that uses a high level for factor A and a low level for factor B. We can see this in the equation and from the Pareto plot. In this case, setting factor B to its low level, the negative sign, helps boost our objective, but it also makes the two-factor interaction work in our favour. So A, B and the AB interaction are the three most influential terms in the model. But you'll also notice that factor D has little impact on the outcome. That's a useful result, as it indicates we are relatively insensitive to variation in the solar intermittency.
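To see why high A with low B is the right answer, here is a small sketch in R. Please note the numbers are hypothetical: only the signs follow what we just discussed (A positive, B negative, and AB negative, which I've inferred from the interaction working in our favour when A is high and B is low). The magnitudes are made up for illustration and are not the fitted values from the case study.

```r
# Hypothetical coefficients: signs follow the discussion, magnitudes are made up.
b0  <- 0    # intercept (arbitrary; it does not change which corner is best)
bA  <- 2    # positive: more insolation raises collection efficiency
bB  <- -3   # negative: a larger tank lowers collection efficiency
bAB <- -1   # negative: assumed from the discussion of the AB interaction

# Evaluate the simplified model at the four corners of A and B (coded units).
corners <- expand.grid(A = c(-1, 1), B = c(-1, 1))
corners$y1_hat <- with(corners, b0 + bA * A + bB * B + bAB * A * B)

# Pick out the corner with the highest predicted value.
best <- corners[which.max(corners$y1_hat), ]

print(corners)
print(best)
```

With A at +1 and B at -1, the product A*B is -1, so a negative interaction coefficient adds to the prediction rather than subtracting from it.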
If we were to run more experiments in the future, we might leave factor D out of consideration. Similarly, when trying to optimize the process for collection efficiency, y_1, we can be confident that solar intermittency won't play a major role, at least according to this simulation system.
Now let's take a look at our second outcome variable, y_2, the energy delivery efficiency. If you rebuild the model and look at the Pareto plot, we see extremely strong effects from factor A and from the two-factor interaction AB. The other factors, C and D, are small. What you also notice here, and this is a very common result, is that many of the higher-order interactions, such as the three- and four-factor interactions, are small or zero.
I would like to point out an important issue at this moment using this example. Take a look at factor B: it is small, and based on what we've done you might be tempted to conclude that factor B is not important. That's not entirely correct. We cannot exclude factor B from consideration, because the AB interaction is very important. Remember what an interaction was defined as. In this example the AB interaction means that the effect of factor A is dependent on the level of factor B. Alternatively, the effect of factor B is dependent on the level of factor A. So because the AB interaction is strong, we cannot ignore factor B. The level that factor B is set at is also important, and so we cannot remove factor B from the model either.
So let's end off today's class with this question for you to think about. Can you maximize both y_1 and y_2 simultaneously? What would be the best combination of settings of the factors to get that maximum? This is a question that we will discuss in the course forums. Please go ahead and participate in the forums, and discuss that issue.
So that's a wrap. In this module, and in the prior one, you've seen how we can use pen and paper, or computer software, to analyze experiments and make improvements.
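Going back to the point about factor B and the AB interaction, here is a minimal sketch in R of why a small main effect doesn't make B unimportant. Again, the coefficients are hypothetical and chosen only to illustrate the idea: a main effect for B of zero, next to a strong AB interaction. They are not the values from the case study, and the sign I've given to AB is an assumption.

```r
# Hypothetical coefficients, for illustration only (not the case study's values).
bA  <- 5   # strong main effect of A
bB  <- 0   # main effect of B: zero on average
bAB <- 4   # strong AB interaction (sign assumed)

# Change in the prediction when B goes from low (-1) to high (+1),
# holding A fixed at a given level.
effect_of_B <- function(A) {
  y_high <- bA * A + bB * (+1) + bAB * A * (+1)
  y_low  <- bA * A + bB * (-1) + bAB * A * (-1)
  y_high - y_low
}

effect_at_A_low  <- effect_of_B(-1)
effect_at_A_high <- effect_of_B(+1)

# Averaging the two conditional effects recovers the (small) main effect of B.
average_effect <- (effect_at_A_low + effect_at_A_high) / 2

print(c(A_low = effect_at_A_low, A_high = effect_at_A_high, average = average_effect))
```

In this sketch, moving B changes the outcome a lot at either level of A, just in opposite directions, so the two cancel out on average. That cancelling is what a small main effect next to a large interaction is telling you.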
Now in the coming module we start to get a little bit lazy. We want to do fewer experiments, but still extract the most information we can from the system. Well, we are not actually being lazy; we really just want to save money and time, because experiments are costly. So we run as few experiments as possible, but extract the most information we possibly can. I'm looking forward to showing you one way we might do that. See you over there.