Welcome back to our course on experimental design. This is the outline for Chapter 15, and we're going to cover some of the material in this chapter. We're going to talk about Section 15.1 on nonnormal responses, and we're going to talk a bit more about Section 15.3, the analysis of covariance. Those are probably the two most important sections in this chapter. There's also a lot of supplemental material for this chapter which you may find pretty interesting; it gives you more detail on some of the things that we're going to talk about as we roll through the chapter material.

We're going to talk about, first of all, nonnormal responses. Now, we've mentioned earlier in the course that if the response distribution is not normal, it very often leads to issues with inequality of variance, and it can impact the sensitivity of your analysis procedures. Data transformations turn out to be a very effective way to deal with that. There are various ways to select transformations. One of the simplest is just trial and error. But there are analytical methods as well; the Box-Cox method, which is discussed in the book, is a very nice and effective way to choose a transformation.

Now, one problem with data transformations is that the experimenter may not be comfortable working with the response variable in the transformed scale. He or she is interested in the number of defects, not the square root of the number of defects, or the resistivity of the wafer, not the logarithm of resistivity. On the other hand, if a transformation works out to be successful and really improves the understanding of the analysis, experimenters will pretty quickly get used to working in the new scale. There are times, though, when transformations are not a very nice solution: they give you a nonsensical value. For example, suppose you've used the square root transformation, and in the region of interest the predicted square root is very small and in some cases turns out to be negative. Predicting a negative square root is just not a satisfactory answer to the problem, so we need some other way to do this.

The other approach is to use something called a generalized linear model in the analysis. Now, a generalized linear model, or GLM, is closely related to standard linear regression. Equation 15.3 is a standard linear regression model. In this model, of course, the error term is assumed to have a normal distribution with mean zero and constant variance, and the mean of the response variable is just this linear combination that you see here, which we can write as x prime Beta. This quantity x prime Beta is called the linear predictor. It turns out that the normal-theory linear model is a special case of the generalized linear model. In a GLM, the response variable can have any distribution that is a member of the so-called exponential family. Important members of that family include the normal, the Poisson, the binomial, the exponential, and the Gamma distributions. So the exponential family is a very rich family of distributions that we can use in a lot of different experimental situations. Also, in a GLM, the relationship between the mean of the response, the expected value of y, and the linear predictor is determined by something called a link function: g of Mu is equal to x prime Beta.
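Before we pick the GLM thread back up, here's a quick aside on the Box-Cox idea mentioned above. This is a minimal sketch, not something from the textbook example: it applies scipy.stats.boxcox to some simulated positive responses and lets maximum likelihood suggest the power transformation.

```python
# A minimal sketch of Box-Cox transformation selection, assuming only
# that the responses are positive; the data here are simulated, not
# taken from the book.
import numpy as np
from scipy import stats

rng = np.random.default_rng(15)
y = rng.gamma(shape=2.0, scale=3.0, size=50)  # skewed, positive responses

# boxcox returns the transformed data and the maximum-likelihood
# estimate of the power parameter lambda.
y_transformed, lam = stats.boxcox(y)
print(f"Estimated lambda: {lam:.3f}")
# lambda near 0.5 suggests a square root, near 0 suggests a log, and
# lambda near 1 suggests no transformation at all.
```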
So, picking the thread back up: the regression model that relates the mean response to the linear predictor is given by the expected value of y, or Mu, equal to g inverse of x prime Beta. A couple of examples. With the identity link, Mu equals g inverse of x prime Beta, which is just x prime Beta; in fact, ordinary linear regression is a GLM with a normal response distribution and an identity link. Another example is the log link, where the log of Mu is equal to x prime Beta, which produces the model that says Mu, the mean, is equal to e to the x prime Beta. The log link is very often used with a Poisson response, and it's sometimes also used with the exponential or Gamma distribution. Another extremely important link function, which we use with binomial data, is the so-called logit link. Equation 15.9 is the logit link: the log of Mu over one minus Mu is equal to x prime Beta. This choice of link function leads to the model where Mu, the expected value of y, is one over one plus e to the minus x prime Beta.

To use a GLM in practice, you have to specify your response distribution and choose a link function, and then the model fitting, the parameter estimation, is carried out using maximum likelihood. The reason we can use maximum likelihood, of course, is that the form of the response distribution is known. It turns out that for the exponential family, maximum likelihood is implemented using an iterative version of weighted least squares; for ordinary linear regression with a normal response distribution and the identity link, it reduces to standard least squares. We can then approach the analysis of the data and diagnostic checking using techniques very similar to those we use in normal-theory regression. There's a textbook by Myers, myself, Vining, and Robinson that gives you a lot more detail, background, and examples than we're going to be able to cover here. Two software packages that do a very good job of supporting generalized linear models are SAS, which has a procedure called PROC GENMOD that's very good at this, and JMP. Minitab actually does some aspects of GLMs as well, particularly those that employ a binomial response, and that leads to what we sometimes call logistic regression.

So here's an example that illustrates the application of the GLM; it's Example 15.2. It deals with a consumer products company that is studying the factors that impact the likelihood that customers will redeem a coupon for one of its products, and they've conducted an experiment with a 2³ factorial. The design variables are A, the coupon value; B, the length of time for which the coupon is valid; and C, the ease of use. Each of these factors has two levels, so it's an eight-run experiment. A total of 1,000 customers were randomly selected for each of the eight treatment combinations, and they were given the appropriate coupon with those design factors deployed. The response is the number of coupons that are actually redeemed, and that data is shown along with the levels of the factors in Table 15.1. So we can think of the response here as the number of successes out of 1,000 Bernoulli trials for each test combination. A reasonable model, I think, would be to assume that the response has a binomial distribution, so we would use a binomial response distribution and a logit link.
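The book fits this example with Minitab and JMP. As a sketch of the same idea in Python, here's how a binomial GLM with a logit link, main effects plus two-factor interactions, could be set up with statsmodels. The redeemed counts below are placeholders, not the actual data from Table 15.1.

```python
# A sketch of fitting a binomial GLM (logistic regression) by maximum
# likelihood, in the spirit of Example 15.2; the redeemed counts are
# placeholders, not the real Table 15.1 data.
import numpy as np
import statsmodels.api as sm

# Coded factor levels for the 2^3 design, A, B, C in standard order.
A = np.array([-1,  1, -1,  1, -1,  1, -1,  1])
B = np.array([-1, -1,  1,  1, -1, -1,  1,  1])
C = np.array([-1, -1, -1, -1,  1,  1,  1,  1])

# Linear predictor: intercept, main effects, two-factor interactions.
X = sm.add_constant(np.column_stack([A, B, C, A * B, A * C, B * C]))

redeemed = np.array([200, 250, 220, 270, 210, 260, 230, 280])  # placeholder
trials = np.full(8, 1000)

# For a binomial response, statsmodels takes (successes, failures).
endog = np.column_stack([redeemed, trials - redeemed])
model = sm.GLM(endog, X, family=sm.families.Binomial())  # logit is the default link
result = model.fit()  # fit by iteratively reweighted least squares
print(result.summary())
```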
This particular form of a GLM is very often called logistic regression. Both Minitab and JMP will fit a logistic regression model. So we're going to fit the main effects model and all the two-factor interactions in the linear predictor.

Here is the Minitab output for this problem. The output is in the form of a logistic regression table, which looks very much like the displays you get in regression or experimental design outputs. We have all of our factors in the linear predictor, the three main effects and the three two-factor interactions. Then we have estimates of the coefficients, and we have standard errors of the coefficients. The quantity Z is the coefficient estimate divided by the standard error of the coefficient estimate, so it is a t-like statistic; but we can show that under the null hypothesis that Beta is equal to zero, this ratio is asymptotically, that is, for large samples, well approximated by a standard normal distribution.

So let's look at what's significant here. Well, the intercept or constant term is highly significant; the p-value is very small. Minitab says it's zero, but we know it's not really zero, it's just smaller than the smallest number Minitab can print. Here is the z-value for factor A, highly significant; there is the z-value for factor B, highly significant; and the only other thing that appears possibly significant is this BC interaction term that you see here. So what I decided to do was retain A, B, and C in the model so that I could include the BC interaction term and have some hierarchy in the model. Basically, I did that by refitting the model with only the terms A, B, C, and the BC interaction. This is what the model actually looks like, and this equation was generated by fitting that refined or reduced model.

As you look at the computer output, you'll notice there are also some goodness-of-fit statistics down at the bottom of the output; here they are for the reduced model. There are three different types of goodness-of-fit statistics: Pearson, deviance, and Hosmer-Lemeshow. The book gives you more details on these, but what you're really looking for is that the p-values are all pretty large, so there's no indication that this model doesn't fit the data well.

There's also in this table something called an odds ratio; that's the column you see here. What's the odds ratio all about? Well, it follows from the idea of estimating factor effects. In standard two-level designs, we know how to do that: the effect estimate is just the difference between the average response when the factor is at the high level and the average response when the factor is at the low level. In logistic regression, the computed value of the odds ratio is e raised to the estimate of the regression coefficient. So for factor A, that would be e to the 0.168765, which is 1.18. Now, how do you interpret that? That's the odds of redeeming a coupon of high value relative to the odds of redeeming a coupon with x_1 equal to 0. If you want the odds of redeeming a coupon of high value relative to a coupon of low value, that is, x_1 equal to plus 1 versus x_1 equal to minus 1, then you have to double the regression coefficient: e to the 2 times Beta hat, or e to the 2 times 0.168765, which turns out to be about 1.40. That's the odds ratio for factor A.
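Just to check the arithmetic on that odds-ratio calculation for factor A, here is the computation, using the coefficient estimate quoted above:

```python
# Checking the odds-ratio arithmetic for factor A from the lecture.
import math

beta_A = 0.168765  # estimated coefficient for factor A

# Odds at the high level (x1 = +1) relative to x1 = 0:
print(round(math.exp(beta_A), 2))      # 1.18

# Odds at the high level (x1 = +1) relative to the low level (x1 = -1),
# a two-unit change, so the coefficient is doubled:
print(round(math.exp(2 * beta_A), 2))  # 1.40
```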
So, back to the interpretation. In other words, that odds ratio expresses the increase in the odds of redemption for a high-value coupon, and it's about 40 percent: the high-value coupon has odds of being redeemed that are about 40 percent higher than those of a low-value coupon. You can give a similar interpretation to the other quantities. So that is an example of binary logistic regression, a very common application of the generalized linear model. In our next lecture, we're going to look at a different application.