So we're back again talking about 2^k factorial designs. In this class, I want to introduce some of the formal details of a very important concept that we'll see quite often, and that's the notion of design optimality, and to point out that the 2^k design is a really simple example that exhibits many of these optimal characteristics.

Now, I want to start off by pointing out that the model parameter estimates in a 2^k design, and of course the effect estimates, are basically least squares estimates, and I'm going to use the 2^2 design to illustrate this. For the 2^2 design, the underlying model is the one you see at the top of the slide: y = Beta_0 + Beta_1 x_1 + Beta_2 x_2 + Beta_12 x_1 x_2 + Epsilon, where x_1 x_2 is of course the interaction term. Now, there are four response observations for the 2^2, and they are represented by the symbols (1), a, b, and ab. So let's take the model at the top and write it out explicitly for all four of those observations, and that's what you see in the middle part of the display. If you look at, for example, the first row, (1) is equal to Beta_0, plus Beta_1 times x_1 where x_1 is now minus one, plus Beta_2 times x_2 where x_2 is now minus one, plus Beta_12 times x_1 times x_2 where x_1 and x_2 are both minus one, plus Epsilon. That's how I wrote out all four of those observations.

Now, we can write this system of equations in a very compact matrix form, y = X Beta + Epsilon. This is the general linear model form of those equations. y is the four-by-one vector of the observations (1), a, b, and ab, and X is made up of all of the values of the design variables in these equations. Part of the X matrix is the design matrix (those are the four runs in the 2^2), then there is a column that represents the x_1 x_2 interaction, and a column for the intercept. This X matrix is sometimes called the model matrix, and the model matrix is just the design matrix expanded to the form of the model: I started with the design matrix, added the interaction column, and added the intercept column. Beta is the four-by-one vector of unknown regression coefficients, and Epsilon is the four-by-one vector of random errors.

It turns out that we can show that the least squares estimate of Beta is Beta hat = (X'X)^(-1) X'y, where I use the prime symbol to represent the transpose throughout the course. The X'X matrix is diagonal because the design is orthogonal: the main diagonal elements are all equal to 4, which are the sums of squares of the columns of the X matrix, and all of the off-diagonal elements are 0, because the cross products of the columns of X are all 0. On the right-hand side, if you multiply out X'y, you will see that the elements of that vector are just the usual contrasts associated with a 2^2. So Beta hat is (X'X)^(-1) X'y, where (X'X)^(-1) is just an identity matrix with one over four on the main diagonal, and so the regression coefficient estimates you see here are exactly half of the usual factorial effect estimates that you would get from a 2^2. Now, this X'X matrix has some very interesting and very useful properties.
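To make that concrete, here is a small NumPy sketch of the calculation just described for a single-replicate 2^2 design. The four response values are hypothetical, chosen only to show that X'X is 4 times the identity and that the coefficient estimates come out to half of the factorial effect estimates.

```python
import numpy as np

# Model matrix X: intercept, x1, x2, x1*x2 columns for the runs (1), a, b, ab
X = np.array([
    [1, -1, -1,  1],   # (1): x1 = -1, x2 = -1
    [1,  1, -1, -1],   # a  : x1 = +1, x2 = -1
    [1, -1,  1, -1],   # b  : x1 = -1, x2 = +1
    [1,  1,  1,  1],   # ab : x1 = +1, x2 = +1
], dtype=float)

y = np.array([28.0, 36.0, 18.0, 31.0])        # hypothetical observations (1), a, b, ab

XtX = X.T @ X                                  # diagonal, 4*I, because the design is orthogonal
contrasts = X.T @ y                            # the usual 2^2 contrasts
beta_hat = np.linalg.solve(XtX, contrasts)     # (X'X)^{-1} X'y, i.e. contrasts / 4

effects = contrasts[1:] / 2                    # A, B, AB effect estimates (contrast / 2)
print(XtX)                                     # diag(4, 4, 4, 4)
print(beta_hat)                                # coefficients: each is half the corresponding effect
print(effects)
```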
For example, the variance of any regression coefficient is just Sigma squared times the corresponding diagonal element of the inverse of X'X. Well, the inverse of X'X is very simple; each diagonal element is one over four. So Sigma squared over four is the variance of any regression coefficient, and of course the square root of that, Sigma over two, would be the standard error. Well, Sigma squared over four turns out to be the minimum possible value for a four-run design. There is no place that you can put those four runs that gives you a smaller variance of the regression coefficients.

Furthermore, you can show that the determinant of X'X is 256, and that is the maximum possible value you could achieve for the determinant with a four-run design. Now, why am I interested in the determinant? Well, we can show that the volume of the simultaneous confidence region, the joint confidence region if you will, on all the model coefficients is inversely proportional to the square root of the determinant of X'X. That is, as the determinant gets bigger, the volume of that region gets smaller. So these are very important results in terms of the precision with which your model parameters are estimated. Now, notice that these results depend on both the design you've chosen and the model that you're using. But for the model that we've chosen and four runs, this design gives us the best values of these criteria, and any design that maximizes the determinant of X'X is called a D-optimal design, where the D pretty clearly stands for determinant.

So that takes care of optimality in terms of parameter estimation, but what about predicting the response? Well, the variance of the response that you predict at any point in the design space can be shown to be Sigma squared times little x prime times (X'X)^(-1) times little x, where little x is the coordinates of the point where you want to make the prediction, expanded to model form. So little x transpose is what you see right here: x_1 and x_2 are the coordinates of the point that you're interested in, and we expand that to model form by adding the cross product x_1 times x_2 and a one for the intercept. When you substitute that into this variance expression, it turns out to be Sigma squared over four times (1 + x_1 squared + x_2 squared + x_1 squared times x_2 squared), and that quantity is maximized when both x_1 and x_2 are at either plus one or minus one. That means your maximum prediction variance is equal to Sigma squared. Incidentally, there is no other four-run design that makes that maximum prediction variance smaller. So you've minimized the maximum prediction variance by using this 2^2 design. This is referred to as the G-criterion: minimizing the maximum prediction variance with your design produces what's called a G-optimal design.

Now, the prediction variance does change throughout the design space. As you change the coordinates x_1 and x_2, the prediction variance changes. For example, if you set x_1 and x_2 equal to 0, then the prediction variance is Sigma squared over four. So maybe we need another measure of prediction variance over the design space: what about the average prediction variance over the design space? That seems like a reasonable thing to look at. Well, finding that average prediction variance is pretty straightforward.
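Continuing the same hypothetical NumPy sketch, you can check these D and G properties numerically for the 2^2 model matrix; the grid resolution used to scan the design space is an arbitrary choice.

```python
import numpy as np

X = np.array([[1, -1, -1,  1],
              [1,  1, -1, -1],
              [1, -1,  1, -1],
              [1,  1,  1,  1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
print(np.linalg.det(X.T @ X))        # 256, the maximum determinant for a four-run design
print(XtX_inv[1, 1])                 # 0.25, so Var(beta_hat) = sigma^2 / 4

def scaled_pred_var(x1, x2):
    """x'(X'X)^{-1}x, the prediction variance divided by sigma^2."""
    x = np.array([1.0, x1, x2, x1 * x2])
    return x @ XtX_inv @ x

# Scan the design space on a grid; the maximum occurs at the +/-1 corners.
grid = np.linspace(-1, 1, 41)
v = np.array([[scaled_pred_var(a, b) for b in grid] for a in grid])
print(v.max())                        # 1.0, so the maximum prediction variance is sigma^2
print(scaled_pred_var(0.0, 0.0))      # 0.25, i.e. sigma^2 / 4 at the center of the design
```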
The way we do it is to integrate the prediction variance function over the design space and then normalize by dividing by the area, or in general the volume, of the design space, which in this case is 2 times 2, or 4. When we plug the quantities into this double integral expression and evaluate it, the average prediction variance turns out to be four times Sigma squared over nine, and there is no other four-run design for this model over this design space that gives you a smaller average prediction variance.

Here's a graph that is very useful for displaying prediction variance. This is called a fraction of design space plot, and it plots the standard error, that is, the standard deviation of the predicted response, as a scaled standard deviation. The value of Sigma squared is not included; it's essentially the variance of the predicted value divided by Sigma squared, and then we take the square root. So we're plotting how that changes as a function of where you are in the design space. For example, over 25 percent of the design space the scaled standard error of the predicted response is less than or equal to about 0.55 times Sigma, and over 50 percent of the design space it's somewhere around 0.6, or maybe a little bigger than 0.6, times Sigma. So this gives you an idea of how the prediction variance is distributed over the design space. What would an ideal FDS plot look like? The lower it is, the better, and the flatter it is, the better. This prediction variance is pretty flat and relatively low up to about 75 percent of the design space. But once you get beyond about 75 percent of the design space, the prediction variance really takes off; it really starts to grow. This is a very useful graph for comparing the prediction performance of different candidate designs.

So in general, for the 2^k design, and while I've illustrated this for the 2^2, it's true for the 2^k: all 2^k designs produce model regression coefficients that have the smallest possible variances; that's D-optimality. These designs also minimize the maximum prediction variance of the response over the design space; that's G-optimality. They also minimize the average prediction variance over the design space, and that's referred to as an I-optimal design. These results give us some assurance, I think, that these are good designs in some very general ways, and factorial designs typically share a lot of these properties.

There are excellent computer routines for finding optimal designs. We can actually construct optimal designs very easily using the D and the I criteria, and there are situations where finding an optimal design is a very good alternative to trying to use one of the classical designs. Some examples include unusual requirements on sample sizes, unusual blocking requirements, and unusual model forms. Those are places where optimal designs can be very useful, and there are excellent computer routines for doing this in practice. We'll see how that works as we go along. JMP, by the way, is outstanding in this regard. So that gives you an introduction to the idea of optimal designs, and this is an important concept that we'll come back and see in other places in the course.
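Here is one more rough sketch, not from the course materials, of how you might check the four Sigma squared over nine average and trace out an FDS-style summary by sampling the design space uniformly; the sample size and random seed are arbitrary choices, and a real FDS plot would draw the full sorted curve rather than a few quantiles.

```python
import numpy as np

X = np.array([[1, -1, -1,  1],
              [1,  1, -1, -1],
              [1, -1,  1, -1],
              [1,  1,  1,  1]], dtype=float)
XtX_inv = np.linalg.inv(X.T @ X)

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(100_000, 2))              # uniform sample of the square design space
F = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1], pts[:, 0] * pts[:, 1]])
spv = np.einsum('ij,jk,ik->i', F, XtX_inv, F)            # x'(X'X)^{-1}x at each sampled point

print(spv.mean())        # approximately 4/9, so the average prediction variance is 4*sigma^2/9

# FDS-style summary: sort the scaled prediction standard errors; the value at the
# 25% position is the level not exceeded over 25% of the design space, and so on.
fds = np.sort(np.sqrt(spv))
for frac in (0.25, 0.50, 0.75, 1.00):
    print(frac, fds[int(frac * len(fds)) - 1])
```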