0:00

In this lecture we're going to talk about partitioning variability.

So, assume my X matrix includes an intercept.

So it has a column of ones; let's just call it J.

0:17

So, I'm going to define, well, first of all, let me define HJ

as J, (J transpose J) inverse, J transpose, and Hx as X,

(X transpose X) inverse, X transpose.

Now, since my X contains an intercept,

J can be written as a linear combination of the columns of X.

If I just take the vector that grabs the first column of X, so

a vector that's a 1 and then a bunch of zeroes, and multiply X times it,

that will grab the first column, which is the intercept.

So I know from the previous lecture that if I take (I minus H sub x) and

I multiply it by any vector that's a linear combination of the columns of X,

let's say X times gamma.

In this particular case I've set up gamma so that it grabs J.

Then, this has to be equal to 0.

Well, that's going to imply that J minus H of X times J equals 0.

In other words, J is equal to H of X times J.

So squirrel that little pearl of wisdom away, and

we're going to use that later.
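As a quick numerical sanity check, here is a small numpy sketch (the design matrix X here is made up for illustration) verifying that projecting J onto the column space of X gives back J:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
# Hypothetical design matrix whose first column is the intercept (ones).
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
J = np.ones(n)

# Hat matrix for X: Hx = X (X'X)^{-1} X'.
Hx = X @ np.linalg.inv(X.T @ X) @ X.T

# gamma picks off the first column of X, so X @ gamma = J ...
gamma = np.array([1.0, 0.0, 0.0])
print(np.allclose(X @ gamma, J))                      # True
# ... hence (I - Hx) X gamma = 0, i.e. J = Hx J.
print(np.allclose((np.eye(n) - Hx) @ X @ gamma, 0))   # True
print(np.allclose(Hx @ J, J))                         # True
```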

1:34

Now let me define the total variation, the sums of squares

total, as the norm of y minus y-bar times J, squared.

So y minus y-bar times J, okay.

2:23

Then let me write out the residual sums of squares as the norm of the residuals.

The norm of e squared, which is the norm of y minus y-hat squared,

where y-hat is the fitted values from the full model with X in it, okay?

So that we know is y transpose, (I minus H of x), times y.

So this first one, the total sum of squares, is the numerator of the variability

estimate that we would get if we only included an intercept.

And this one is the numerator of the variability estimate that we would get if

we had included an intercept plus all the other regressors.

3:20

So, let me make a third definition, which is the sum of the squares for

regression, which is the distance between the fitted values

if I only include an intercept and the fitted values if I include

the intercept plus all these other regressors, squared,

which I can write as the norm of HJ times y minus Hx times y, squared.

So let me work with this term,

the regression sums of squares, a little bit.

I'm going to write this as y transpose, times (HJ minus H of X),

(HJ minus H of X), times y, and I don't have to worry about the transposes for

those terms, because both HJ and H of X are symmetric.

Okay?

And then let me write this out, now, as y transpose times the following.

HJ is idempotent, so HJ times HJ,

HJ squared, is just going to be HJ.

And then minus H of X times HJ,

that's from that one,

and then minus HJ times H of X.

That's going to be that one.

And then plus H of x squared.

Remember, H of x is idempotent.

So that's just plus H of x, all times y.

5:07

J is equal to H of X times J, but I could similarly multiply

here by (J transpose J) inverse J transpose, and

multiply here by (J transpose J) inverse J transpose, so

I haven't done anything.

And then what that implies

is that H of J is equal to H of x times H of J, and

then by taking the transpose, because they're symmetric,

I also see that H of J is also equal to H(J) H(x).

Okay, so this quantity right here, H of X times HJ, is HJ.

This quantity right here, HJ times H of X, is also HJ.

So we get HJ, minus HJ, minus another HJ, plus H of x, which is H of x minus HJ.

So we get that this is y transpose, times (H of x minus HJ), times y.
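This simplification can be checked numerically. In this sketch (simulated X and y), the cross terms collapse as claimed, and the squared distance between the two fitted vectors equals the quadratic form:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)
J = np.ones((n, 1))

HJ = J @ np.linalg.inv(J.T @ J) @ J.T
Hx = X @ np.linalg.inv(X.T @ X) @ X.T

# The cross terms collapse: Hx HJ = HJ Hx = HJ.
print(np.allclose(Hx @ HJ, HJ))              # True
print(np.allclose(HJ @ Hx, HJ))              # True

# So ||HJ y - Hx y||^2 = y'(Hx - HJ)y.
ss_reg_norm = np.sum((HJ @ y - Hx @ y) ** 2)
ss_reg_quad = y @ (Hx - HJ) @ y
print(np.isclose(ss_reg_norm, ss_reg_quad))  # True
```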

Now, I'll get to my point.

Let me take my sum of the squares for total,

which is the norm of y minus y-bar times J, squared, which

we wrote out before as y transpose, times (I minus HJ), times y.

6:41

So let me subtract and then add H of x, I guess, and then let me organize it this way.

Hopefully, you'll see what I'm doing here.

I get y transpose, times, I minus H of x, plus H of x minus H of J, all times y.

6:57

Okay, so I haven't done anything going from this line to this line other than

adding and subtracting H of x.

Then I get y transpose (I minus H of x)

times y, plus y transpose

(Hx minus HJ) times y.

And this term right here is the sum of the squares for residual.

And this, right here, is the sum of the squares for regression.

Okay, so the one small distinction is that I have Hx minus HJ down here and

HJ minus Hx up there, inside the norm.

But I would like you to prove for

homework that the order of subtraction doesn't matter in this quadratic form,

that the two are equal.
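Here is a numpy sketch (simulated data) of the decomposition, along with the homework claim about the order of subtraction inside the norm:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.normal(size=n)
J = np.ones((n, 1))

HJ = J @ np.linalg.inv(J.T @ J) @ J.T
Hx = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

ss_tot = y @ (I - HJ) @ y
ss_res = y @ (I - Hx) @ y
ss_reg = y @ (Hx - HJ) @ y

# The total variation splits into regression plus residual variation.
print(np.isclose(ss_tot, ss_res + ss_reg))        # True

# Homework claim: the order of subtraction inside the norm doesn't matter.
print(np.isclose(np.sum(((HJ - Hx) @ y) ** 2),
                 np.sum(((Hx - HJ) @ y) ** 2)))   # True
```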

8:06

So my total variability in my response gets decomposed into

the variability explained by my regression model, and

the remaining variability left unexplained by my regression model.

All of these are positive because they're all sums of squares, and so

a very common thing to do is to take SS regression, the amount of variability

explained by my regression model, and divide it by the total variation.

And then, what is that going to give us?

That is going to give us the percentage

of the total variability.

9:34

So R squared is interpreted as the percentage of your total variability

explained by the linear association with your added regressors.
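As a final sketch (again with simulated data, this time with a real linear signal in y), R squared computed as SS regression over SS total agrees with the familiar form one minus SS residual over SS total, and lands between 0 and 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
# Simulate a response with a linear signal plus noise.
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
J = np.ones((n, 1))

HJ = J @ np.linalg.inv(J.T @ J) @ J.T
Hx = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

# R^2 = SSR / SST, equivalently 1 - SSE / SST.
r2 = (y @ (Hx - HJ) @ y) / (y @ (I - HJ) @ y)
print(np.isclose(r2, 1 - (y @ (I - Hx) @ y) / (y @ (I - HJ) @ y)))  # True
print(0 <= r2 <= 1)                                                 # True
```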

And we see that it was a pretty easy proof to get that the total

variability decomposes into the residual variability

and the regression variability, and it all involved

this little trick up here that said that J was equal to H of x times J.

Okay, so in case you were wondering how these things worked out and why everything

added up when you were looking at your regression output, this is why.

Okay, so thank you for listening, and we'll talk a lot more about partitioning

variability and how that relates to things like F tests later on in the course.
