Now let's consider a setting where we can use some of our results. Imagine we have an outcome vector y that's n by 1, and a matrix x of predictors that's n by p. We're going to assume that the expected value of y is equal to x beta, and that the variance of y is equal to sigma squared times an identity matrix. So the components of y are uncorrelated with a constant variance, and the expected value of y is a linear combination of the columns of x given by beta, which we don't know and would like to estimate. Well, if we use beta hat as our least squares estimator, x transpose x inverse x transpose y, the first thing we can note is that the expected value of beta hat is the expected value of x transpose x inverse x transpose y, which is equal to x transpose x inverse x transpose times the expected value of y, since we're conditioning on x and treating x as non-random. And the expected value of y is just x beta. Well, x transpose x inverse and x transpose x are inverses of each other, so they cancel, and this works out to just be beta. So beta hat is unbiased: its expected value is what we'd like to estimate. We can also calculate the variance of beta hat under these assumptions. The variance of beta hat is equal to the variance of x transpose x inverse x transpose y, which is equal to x transpose x inverse x transpose, times the variance of y, times that same matrix transposed again, so x times x transpose x inverse, since x transpose x inverse is symmetric. And the variance of y is sigma squared times I, which we can just pull out, because the I doesn't change anything and sigma squared is a scalar.
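The unbiasedness claim above is easy to check by simulation. Here is a minimal sketch (not from the lecture; the design, true beta, and sigma are chosen purely for illustration): we generate y with mean x beta and variance sigma squared I many times, compute the least squares estimator each time, and see that its average lands on beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
sigma = 2.0
X = rng.normal(size=(n, p))          # fixed design, treated as non-random
beta = np.array([1.0, -2.0, 0.5])    # "true" coefficients for the simulation

xtx_inv = np.linalg.inv(X.T @ X)
estimates = []
for _ in range(5000):
    # y has mean X beta and uncorrelated errors with constant variance
    y = X @ beta + sigma * rng.normal(size=n)
    estimates.append(xtx_inv @ X.T @ y)    # beta_hat = (X'X)^{-1} X'y

mean_est = np.mean(estimates, axis=0)
print(mean_est)   # close to beta, illustrating E[beta_hat] = beta
```

Averaging over 5,000 simulated data sets, the Monte Carlo mean of beta hat matches the true beta up to simulation noise.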
So then we get x transpose x inverse, times x transpose x, times x transpose x inverse, with the sigma squared out front. The middle terms cancel, so this works out to be x transpose x inverse times sigma squared. So, a lot like our simple linear regression estimate, the variability of the regressors winds up being, in a matrix sense, in the denominator. And this tells us that, just like in linear regression, in order for the variance of our coefficient estimates to be smaller, we want the variability of our x's to be larger. That makes a lot of sense. Think back to linear regression: if you want to estimate a line really well and you collect x's in a tight little ball, you're not going to be able to estimate that line very well. But if you collect x's all along the line, in other words the variance of the x's is very large, then you're going to be able to estimate that line with greater precision. So it's interesting that we don't want variability in our y's, we want sigma squared to be small, but we do want variability in our x's to be large. In fact, in linear regression, the most variability you can get is if you put half of your x observations at the lowest possible value and half at the highest possible value. That gives you the maximum variance for the denominator. Of course, you're banking a lot on the relationship not doing anything funky in between this big gap where you didn't collect any data, but if you're really quite certain about the linearity, then that design would minimize the variance of the estimated coefficient. Okay, the last thing is that we can estimate q transpose beta, where q is a linear contrast, with q transpose beta hat. Its variance works out to be q transpose times the variance of beta hat times q, which is then just q transpose x transpose x inverse q times sigma squared.
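The point about spreading the x's can be made concrete with the variance formula itself. This sketch (an illustration assumed for this note, not taken from the lecture) compares the slope variance from the x transpose x inverse sigma squared formula for two single-predictor designs on [0, 1]: x's bunched in a tight ball in the middle, versus half at each endpoint.

```python
import numpy as np

sigma = 1.0

def slope_variance(x):
    # Design matrix with an intercept column plus the predictor
    X = np.column_stack([np.ones_like(x), x])
    # Var(beta_hat) = (X'X)^{-1} sigma^2; take the slope entry
    return (np.linalg.inv(X.T @ X) * sigma**2)[1, 1]

tight = np.linspace(0.45, 0.55, 20)    # x's in a tight little ball
endpoints = np.repeat([0.0, 1.0], 10)  # half at the lowest, half at the highest value

print(slope_variance(tight))      # large: little variability in x
print(slope_variance(endpoints))  # small: maximum spread in x
```

The endpoint design gives a far smaller slope variance, exactly the "maximum variance in the denominator" point made above, with the caveat that it leaves you blind to any non-linearity in the gap.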
And it's interesting to note that q transpose beta hat is an estimate of q transpose beta, and we can show that this estimator is so-called BLUE. Not blue because it's sad; in fact, blue because it's happy, because it's the best linear unbiased estimator. So first, let's check off these properties. Clearly it's an estimator, and it's unbiased: we know that q transpose beta hat is unbiased because we know that beta hat is unbiased, and q is just a constant multiplier, so we can pull it out of the expectation. Beta hat is linear in y, so q transpose beta hat is linear in y; when we say linear, we mean linear in y. And best, well, what do we mean by best? What we mean is that it has minimum variance, and we can show this very quickly. It has quite a clever proof, and it involves all of the techniques that we have used in these last couple of lectures. So suppose we try to find another estimator that is also linear. Let's say k transpose y is another linear estimator of q transpose beta for some vector k. Because this estimator is linear and unbiased, the expected value of k transpose y is equal to k transpose times the expected value of y, which is k transpose x beta, and for unbiasedness we want that to be equal to q transpose beta. So we know that k transpose x has to equal q transpose, because this statement has to be true for all possible betas. Even though we don't know beta, we need the unbiasedness property to hold across all possible betas, and that means k transpose x has to equal q transpose. That's the first fact that you have to keep in your back pocket. Okay, the second thing: let's look at the covariance of the two estimators.
The covariance of q transpose beta hat and k transpose y is equal to q transpose, which we pull out of the covariance on one side, times the covariance of beta hat and y, times k transpose transpose, which is k, pulled out on the other side. Now, beta hat is x transpose x inverse x transpose y, so we can pull the x transpose x inverse x transpose part out as well. So this is q transpose times x transpose x inverse x transpose, times the covariance of y with itself, times k. The covariance of y with itself is just the variance of y, which we're assuming is I sigma squared, so this becomes q transpose x transpose x inverse x transpose k, times sigma squared. Okay, now from the point above, we can replace x transpose k with q, which gives q transpose x transpose x inverse q times sigma squared. And if you look back, that's exactly equal to the variance of q transpose beta hat. So that's a clever little thing. That's the second fact you need to keep in your back pocket. We're actually done using the first one, that k transpose x has to equal q transpose; the only fact you have to remember now is that the covariance between the two estimators works out to be equal to the variance of q transpose beta hat. Now, let's get to the proof: I'm going to take the variance of q transpose beta hat minus k transpose y. By our rules for variances, that's the variance of q transpose beta hat, plus the variance of k transpose y, minus 2 times the covariance of q transpose beta hat and k transpose y. The two cross terms combine into twice the covariance because, in this case, these quantities are scalars.
q transpose beta hat is a scalar and k transpose y is a scalar, so the covariance of a and b equals the covariance of b and a, which holds when we're dealing with scalars. Now remember, we just proved that the covariance between the two estimators is exactly equal to the variance of q transpose beta hat. So the whole expression works out to be the variance of k transpose y minus the variance of q transpose beta hat. The last point is that this quantity, by virtue of being a variance, has to be greater than or equal to 0. So if we take that statement and argue that it has to be greater than or equal to 0, because it's equal to a variance, then what we get is that the variance of k transpose y has to be greater than or equal to the variance of q transpose beta hat. So there you have it: if you take any other linear combination of the y's that results in an unbiased estimator, its variance has to be greater than or equal to that of the obvious linear combination of beta hat. So beta hat gives the best linear unbiased estimator. One last point: judging best in terms of minimum variance is only really meaningful if you restrict yourself to the class of unbiased estimators. If we didn't have unbiasedness as a restriction, then we could always get minimum variance by estimating things with a constant. If I just estimate everything with the number 5, the number 5 has 0 variance, but it's quite biased, unless you happen to be estimating 5. So biased estimators, particularly constants, can have 0 variance, but they are not good estimators. You can only do this trick, where you compare variances, if you restrict yourself to a meaningful class of estimators in terms of bias.
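The whole BLUE argument can be checked numerically. In this sketch (an assumed illustration, not part of the lecture), we build a competing linear unbiased estimator by perturbing the least squares weights within the null space of x transpose, so that x transpose k still equals q, and then compare the two variance formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
sigma = 1.5
X = rng.normal(size=(n, p))
q = np.array([1.0, 0.0, -1.0])       # an example contrast, chosen for illustration

xtx_inv = np.linalg.inv(X.T @ X)
k0 = X @ xtx_inv @ q                 # least squares weights: k0'y = q'beta_hat
H = X @ xtx_inv @ X.T                # projection onto the column space of X

# Variance of q'beta_hat: q'(X'X)^{-1} q sigma^2
var_blue = sigma**2 * q @ xtx_inv @ q

# Any k = k0 + (I - H) z still satisfies X'k = q, so k'y stays unbiased
z = rng.normal(size=n)
k = k0 + (np.eye(n) - H) @ z
assert np.allclose(X.T @ k, q)       # unbiasedness constraint still holds

# Var(k'y) = k' (sigma^2 I) k = sigma^2 k'k
var_other = sigma**2 * k @ k
print(var_blue <= var_other)         # True: q'beta_hat has the smaller variance
```

Geometrically, the least squares weights k0 lie in the column space of x, the perturbation is orthogonal to it, and the variance of k transpose y splits into the BLUE variance plus a nonnegative extra term, which is precisely the inequality proved above.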
So, in this case we restricted ourselves to unbiased estimators, and then the appropriate linear combination of beta hat, we see, is the best among all linear unbiased estimators. Okay, so that's a nifty little result, and it used all of the tools we built up for expected values, variances, and covariances. In the next lecture, we're going to start working with the multivariate normal distribution, so that we can talk not just about moments, but about the full set of characteristics of the distribution.