Now, let's discuss what I think is one of the most important examples in regression. Imagine that y breaks down into two vectors, a group one and a group two, and my design matrix, which I'm going to call W for reasons that will become clear later, is equal to a matrix called Z and a vector called x, where x is n by 1 and Z is n by 2. Z looks like this: in the first column, a j of n1 (a vector of n1 ones) stacked on top of an n2 vector of 0's, and in the second column, an n1 vector of 0's stacked on top of a j of n2. So the Z matrix looks like the two-group ANOVA matrix from previously, but we've appended an x onto it as well, an x vector. This is the example: if we do least squares with this W, we are interested in fitting models where we have a regression line with a common slope but separate intercepts for the two groups, right? So the coefficient in front of the x is the common slope, and the coefficient in front of each Z column is the intercept for each of the groups.

Okay, so we want to minimize y minus W gamma, quantity squared, and I'm not going to call the coefficient beta, let me call it gamma, where gamma is equal to mu1, the intercept for group one, mu2, the intercept for group two, and beta, the common slope across the two groups. So we can write this as y minus x beta minus Z mu, quantity squared, where mu is the vector of the mu's. Okay, so we can write it out like that, and then let's figure out what this works out to be. Let's use our standard trick where we hold beta fixed and we come up with the estimate for mu as it depends on beta. Well, if beta is held fixed, y minus x beta is just a vector, and this is just the two-group ANOVA problem that we discussed previously. Remember that the solution to the two-group ANOVA problem worked out to be the mean in group one and the mean in group two. So the estimate for mu1, as it depends on beta, has to be the group-one mean of this vector right here: that has to be y1 bar, the group-one mean of the y's, minus x1 bar times beta. And mu2, the intercept for group two as it depends on beta, has to be y2 bar minus x2 bar times beta.

Now, if I plug those back in for mu1 and mu2 and subtract them off from y, what I get is nothing other than the group-centered version of y: I get y1 minus y1 bar times j of n1, y2 minus y2 bar times j of n2, that vector, minus the vector x1 minus x1 bar times j of n1, x2 minus x2 bar times j of n2, times beta. I didn't define x1 and x2, but those are just the group components of x: x1 is the first n1 measurements of x and x2 is the latter n2 measurements of x. Oh, and I shouldn't say equals; it has to be greater than or equal to, because we plugged in the optimal estimates of mu1 and mu2 for a fixed beta. But this is now nothing other than regression through the origin, with the group-centered version of y and the group-centered version of x. So we know that the estimate of beta, the best beta hat that I can get, has to work out to be the regression-through-the-origin estimate from these data. Probably the easiest way to write it out first is the inner product of y tilde and x tilde over the inner product of x tilde with itself, where y tilde is the group-centered version of y and x tilde is the group-centered version of x. In other words, by group-centered I mean each observation has its group mean subtracted off. And you can show, and I have this in the notes, but let's just do it really quickly here: what does this work out to be?
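Just as an aside before we do, here is a compact symbolic sketch of the setup and the profiling step I just described. This is my own shorthand, not necessarily the notation in the notes, with j sub n_k denoting a vector of n_k ones and y tilde, x tilde the group-centered vectors:

```latex
% Sketch of the ANCOVA setup and the profiling argument described above (my notation).
\[
W = \begin{pmatrix} Z & x \end{pmatrix},
\qquad
Z = \begin{pmatrix} j_{n_1} & 0 \\ 0 & j_{n_2} \end{pmatrix},
\qquad
\gamma = (\mu_1, \mu_2, \beta)^\top,
\qquad
\min_{\gamma}\; \| y - Z\mu - x\beta \|^2 .
\]
Holding $\beta$ fixed, the two-group ANOVA step gives
\[
\hat\mu_1(\beta) = \bar y_1 - \bar x_1 \beta,
\qquad
\hat\mu_2(\beta) = \bar y_2 - \bar x_2 \beta,
\]
and substituting these back in leaves regression through the origin on the group-centered data, so
\[
\hat\beta = \frac{\langle \tilde y, \tilde x \rangle}{\langle \tilde x, \tilde x \rangle}.
\]
```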
This works out to be the double sum, over i and j, of y i j minus y bar i, times x i j minus x bar i, with i going from 1 to 2 and j from 1 to n sub i, all over the double sum of x i j minus x bar i, squared. And let me just explain my notation here: y i j is the j-th component of group i. So y 1 1 is the first component of the vector y1, y 1 2 is the second component of the vector y1, y 2 1 is the first component of the vector y2, and so on. So we can write this out, and I think this is probably the nicest way to write it, as p times beta1 hat plus one minus p times beta2 hat, where beta1 hat is the regression estimate for only group 1, the estimate you would get if you only had the centered x1 data and the centered y1 data, and beta2 hat is the regression estimate if you only had the centered y2 data and the centered x2 data. Okay, so it is interesting to note that the slope from ANCOVA works out to be a weighted average of the individual group-specific slopes, where in this case p works out to be the sum over j of x 1 j minus x1 bar, squared (sorry, that should have been squared), over the double sum of x i j minus x bar i, squared. So p works out to be the percentage of the total variation in the x's that comes from group one. If most of the variation in your x's is in group one, then your group-one slope contributes more to the overall slope; if group two is more variable, then group two contributes more; and if they are equally variable, then both of them contribute equally.

Okay, so let's go back: once we have our beta hat, we can figure out what our mu1 hat and our mu2 hat are. Mu1 hat is equal to y1 bar minus x1 bar times beta hat, and mu2 hat is equal to y2 bar minus x2 bar times beta hat. So the difference in the intercepts, mu1 hat minus mu2 hat, works out to be y1 bar minus y2 bar, minus x1 bar minus x2 bar times beta hat. Now, one way to think about this, the most common way to think about ANCOVA, is the instance where you want to compare treatments, treatment one versus treatment two, but you have some confounding factor that you need to adjust for. Say, for example, you're looking at a weight-loss treatment, and your confounding factor is the initial weight of the person. If the initial weight of the people receiving one weight-loss treatment is different from the initial weight of the people receiving the other weight-loss treatment, then you'd be worried about just directly comparing the two means. Well, this shows you what, in addition to the two means, you need to subtract off if you model the data as an ANCOVA model. Most interestingly, if you randomize and your randomization is successful in the sense of balancing this observed covariate, baseline weight, then the group-one average of the x's should be pretty close to the group-two average. So this difference in means should be quite small, so that whether you adjust for baseline weight or omit baseline weight from your model and just do a straight two-group ANOVA, the estimates should be very similar. However, on the other hand, if you did not randomize and there happens to be an imbalance, so that the x average for group one is very different from the x average for group two, then the difference between the unadjusted estimate and the adjusted estimate can be quite large.
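To make this concrete, here is a small numerical check of these two claims. This is just my own numpy illustration, with made-up data and made-up variable names, not something from the notes; it fits the ANCOVA design by least squares and verifies the weighted-average-slope identity and the adjusted difference in intercepts:

```python
# Quick numerical check of the two identities above (my own illustration).
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 30, 50
x1 = rng.normal(70, 5, n1)                  # e.g. baseline weights, group 1
x2 = rng.normal(80, 8, n2)                  # group 2: different mean, more variable
y1 = 5 + 0.4 * x1 + rng.normal(0, 1, n1)
y2 = 2 + 0.4 * x2 + rng.normal(0, 1, n2)

x = np.concatenate([x1, x2])
y = np.concatenate([y1, y2])

# Design matrix W = [Z x]: separate intercepts, common slope
Z = np.zeros((n1 + n2, 2))
Z[:n1, 0] = 1.0
Z[n1:, 1] = 1.0
W = np.column_stack([Z, x])

mu1_hat, mu2_hat, beta_hat = np.linalg.lstsq(W, y, rcond=None)[0]

# Group-specific slopes from the group-centered data (regression through the origin)
beta1_hat = np.sum((x1 - x1.mean()) * (y1 - y1.mean())) / np.sum((x1 - x1.mean()) ** 2)
beta2_hat = np.sum((x2 - x2.mean()) * (y2 - y2.mean())) / np.sum((x2 - x2.mean()) ** 2)

# p = share of the total within-group x variation coming from group 1
p = np.sum((x1 - x1.mean()) ** 2) / (
    np.sum((x1 - x1.mean()) ** 2) + np.sum((x2 - x2.mean()) ** 2)
)

# Common slope is the variance-weighted average of the group-specific slopes
print(np.isclose(beta_hat, p * beta1_hat + (1 - p) * beta2_hat))

# Intercept difference is (ybar1 - ybar2) - (xbar1 - xbar2) * beta_hat
print(np.isclose(mu1_hat - mu2_hat,
                 (y1.mean() - y2.mean()) - (x1.mean() - x2.mean()) * beta_hat))
```

Both checks print True, and if you make the group-two x's even more variable relative to group one, you will see beta hat pulled further toward beta2 hat, exactly as the weighting suggests.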
Okay, so that's an important example. I have some more written about it in the notes, but I think you can actually learn a lot about regression and adjustment just by thinking about this one example.