Let's consider another way to prove that the least squares regression solution is $\hat\beta = (X^\top X)^{-1} X^\top y$. Recall that $y$ was an $n \times 1$ vector and $X$ was an $n \times p$ full-rank matrix. We want to explain the variation in $y$ with linear combinations of the columns of $X$, and we're going to do that by minimizing the distance between the observed $y$ and the fitted values.

I'll need some notation, so let me define $H_X = X(X^\top X)^{-1}X^\top$. A couple of things to note about $H_X$: one, $H_X$ is symmetric, and two, it's idempotent, meaning $H_X H_X = H_X$. Another interesting fact is that $I - H_X$ is both symmetric and idempotent as well. And finally, notice that $I - H_X$ times any linear combination of the columns of $X$, say $Xa$, is zero. It's easy to see why: in $(I - X(X^\top X)^{-1}X^\top)Xa$, the product $X^\top X$ sits right next to its inverse $(X^\top X)^{-1}$, which is just $I$, and multiplying $X$ by $I$ just gives back $X$, so I get $Xa - Xa$, which is zero.

We'll need those facts. Now I'm going to try to minimize my least squares criterion $\|y - X\beta\|^2$. Let me just add and subtract an $X\hat\beta$:

$$\|y - X\beta\|^2 = \|y - X\hat\beta + X\hat\beta - X\beta\|^2 = \|y - X\hat\beta\|^2 + 2(y - X\hat\beta)^\top(X\hat\beta - X\beta) + \|X\hat\beta - X\beta\|^2.$$

The final quantity is always nonnegative because it's a squared norm, so I can only make the right-hand side smaller by getting rid of it.

Now let me just look at the middle term. Plugging in $\hat\beta$, that's $2(y - X(X^\top X)^{-1}X^\top y)^\top(X\hat\beta - X\beta)$. In the first factor I can pull out a $y$ and get $((I - H_X)y)^\top$, where $I - H_X$ is the quantity from before; but remember, both $I$ and $H_X$ are symmetric, so the transpose is irrelevant and I can write that factor as $2y^\top(I - H_X)$. In the second factor I can pull an $X$ out and get $X(\hat\beta - \beta)$. Then recall that if I take $I - H_X$ and multiply it times $X$, I get zero, so this whole cross term is zero, and I'm left with

$$\|y - X\beta\|^2 \geq \|y - X\hat\beta\|^2.$$

So what we see is that, for any arbitrary $\beta$, the distance between $X\beta$ and $y$ is greater than or equal to the distance we get if we plug in the specific value $\hat\beta$. Therefore, $\hat\beta$ must be the minimizer.
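As a sanity check, here is a minimal NumPy sketch of these facts; the dimensions $n$ and $p$, the seed, and the random design and response are arbitrary illustrative choices, not anything from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed
n, p = 50, 3                     # illustrative dimensions
X = rng.normal(size=(n, p))      # n x p design matrix, full rank with probability 1
y = rng.normal(size=n)           # n x 1 response

# H_X = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

# H_X is symmetric and idempotent; so is I - H_X.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.allclose((I - H) @ (I - H), I - H)

# (I - H_X) annihilates any linear combination Xa of the columns of X.
a = rng.normal(size=p)
assert np.allclose((I - H) @ (X @ a), 0)

# beta_hat = (X^T X)^{-1} X^T y minimizes ||y - X beta||^2.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
loss = lambda b: np.sum((y - X @ b) ** 2)
for _ in range(1000):
    beta = beta_hat + rng.normal(size=p)   # arbitrary competing beta
    assert loss(beta) >= loss(beta_hat)
print("all checks passed")
```

The random-perturbation loop at the end is only a spot check, of course; the algebraic argument above already shows $\|y - X\beta\|^2 \geq \|y - X\hat\beta\|^2$ for every $\beta$.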