0:45

And so, this is a vector y.

Now let's say we want the expected value of a particular scalar value y naught, that is, of y at the particular value x naught. Well, that expected value is x naught transpose beta. Okay?

And so our estimate of that, clearly, is going to be x naught transpose beta hat.

So we can create a confidence interval for this prediction very easily using the tools that we've developed so far, because this is, again, just a linear contrast of the betas, and we've already covered how to create a confidence interval for that.

So we know the variance of this, which we'll call y hat naught, okay. So the variance of y hat naught is equal to x naught transpose, times the variance of beta hat, times x naught, which is equal to x naught transpose (X transpose X) inverse x naught, times sigma squared.
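As a quick numeric sketch of that variance formula (not from the lecture; the design matrix, sigma squared, and x naught below are all simulated or assumed for illustration):

```python
import numpy as np

# Sketch: Var(y_hat_0) = sigma^2 * x0' (X'X)^{-1} x0 at a new point x0.
# X, sigma2, and x0 are illustrative assumptions, not the lecture's data.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])  # intercept + one predictor
sigma2 = 4.0                                             # assumed true error variance
x0 = np.array([1.0, 2.5])                                # new point, intercept included

XtX_inv = np.linalg.inv(X.T @ X)
var_yhat0 = sigma2 * (x0 @ XtX_inv @ x0)
print(var_yhat0)
```

The quadratic form x0 @ XtX_inv @ x0 is the leverage of the new point, so the variance of the fitted line at x0 is small whenever x0 sits comfortably inside the observed design.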

Â 2:25

Alpha over 2 and n minus p degrees of freedom, times s, our residual standard deviation estimate, which we're going to use for that, times the square root of x naught transpose (X transpose X) inverse x naught, okay? So that is, say, if we have a linear regression,

Â 2:54

What that is, is a confidence interval for the line at a given value of x naught, okay.
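That interval for the line can be sketched directly. Everything below (the data, the true coefficients, x naught) is simulated for illustration, not taken from the lecture:

```python
import numpy as np
from scipy import stats

# Sketch of the confidence interval for the line at x0:
#   x0' beta_hat +/- t_{1-alpha/2, n-p} * s * sqrt( x0' (X'X)^{-1} x0 )
# Simulated data; beta_true and x0 are illustrative assumptions.
rng = np.random.default_rng(1)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
beta_true = np.array([1.0, 3.0])
y = X @ beta_true + rng.normal(0, 2.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - p))           # residual standard deviation estimate
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 2.5])
alpha = 0.05
tq = stats.t.ppf(1 - alpha / 2, df=n - p)
se_mean = s * np.sqrt(x0 @ XtX_inv @ x0)
ci = (x0 @ beta_hat - tq * se_mean, x0 @ beta_hat + tq * se_mean)
print(ci)
```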

But that's not the entire story here about prediction intervals, because this talks about how well we've estimated the line, okay. So if we think about our diamond prices, for example, it talks about how well we've estimated the average cost of a diamond at that particular weight, or that particular mass.

Â 3:29

But if you're selling a diamond, you might be interested in knowing, okay, not if I collected all the diamonds of this particular mass and took the average price they were valued at. Not that, but if I were to sell this particular diamond, what's the range of possible values that would be reasonable as a price for this diamond? And that's a different thing. So there's a difference in this context between a confidence interval for the mean value, in other words the value of the line, or the plane, or whatever, at that particular collection of x values, versus a prediction that incorporates the uncertainty that is included in the y's themselves, okay.

So imagine we want to predict y naught, which is the price of this diamond at this particular mass, where we haven't actually observed the y at this particular value of x naught. Think of it as a new value of y.

Well, think about the quantity y naught minus x naught transpose beta hat, okay? That's the difference between our actual y naught at that particular value of x naught, the new realized value of y, and what we would predict at this value of x naught. Where, again, our beta hat hasn't used this y naught in its calculation, okay?

So now the variance of this is the variance of y naught, plus the variance of, let's say, y hat naught. Okay, and I can move that variance across that sum, again, because this beta hat didn't involve that y naught, this potential new value of y naught, in its calculations, so they're independent.

Well, this variance of y naught is sigma squared, plus the variance of y hat naught, which we just derived a second ago: sigma squared times x naught transpose (X transpose X) inverse x naught.

So this variance is sigma squared times 1 plus x naught transpose (X transpose X) inverse x naught, and to estimate it we plug in s squared for sigma squared.
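A tiny numeric sketch of the comparison between the two variances at the same x naught (again with a simulated design and an assumed sigma squared, purely for illustration):

```python
import numpy as np

# Compare Var(y_hat_0) = sigma^2 * h0 (variance of the line) with
# Var(y_0 - y_hat_0) = sigma^2 * (1 + h0) (variance for a new observation),
# where h0 = x0' (X'X)^{-1} x0. Simulated X; sigma2 and x0 are assumptions.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
x0 = np.array([1.0, 2.5])
sigma2 = 4.0

h0 = x0 @ np.linalg.inv(X.T @ X) @ x0
var_line = sigma2 * h0
var_pred = sigma2 * (1 + h0)
print(var_line, var_pred)
```

The prediction variance exceeds the line's variance by exactly sigma squared, which is the fixed piece the lecture keeps returning to.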

And then what I'm going to ask you to do for homework, because it should be old hat for you now, is to prove to yourself that y naught minus x naught transpose beta hat, over s times the square root of 1 plus x naught transpose (X transpose X) inverse x naught, follows the t distribution with n minus p degrees of freedom.

Â 6:32

And so we can calculate the probability that, say, a t quantile, the alpha over 2 t quantile with n minus p degrees of freedom, is less than or equal to y naught minus x naught transpose beta hat, over s times the square root of 1 plus x naught transpose (X transpose X) inverse x naught, which in turn is less than or equal to the 1 minus alpha over 2 upper t quantile. That should be equal to 1 minus alpha. In other words, we're looking at our t distribution, and we're looking at the probability that,

Â 7:26

if we put alpha over 2 of the mass there and alpha over 2 of the mass there, the probability that our statistic lies in between those two cutoffs should be equal to 1 minus alpha.

And we can rearrange that to make the probability statement: the probability that y naught is in the interval x naught transpose beta hat, plus or minus t 1 minus alpha over 2, times s times the square root of 1 plus x naught transpose (X transpose X) inverse x naught.
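The rearranged statement can be sketched as a prediction interval computation; as before, the data and x naught below are simulated for illustration rather than the lecture's example:

```python
import numpy as np
from scipy import stats

# Sketch of the prediction interval:
#   x0' beta_hat +/- t_{1-alpha/2, n-p} * s * sqrt( 1 + x0' (X'X)^{-1} x0 )
# Simulated data; true coefficients and x0 are illustrative assumptions.
rng = np.random.default_rng(3)
n, p = 80, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
y = X @ np.array([1.0, 3.0]) + rng.normal(0, 2.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / (n - p))
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 2.5])
alpha = 0.05
tq = stats.t.ppf(1 - alpha / 2, df=n - p)
half = tq * s * np.sqrt(1 + x0 @ XtX_inv @ x0)
pi = (x0 @ beta_hat - half, x0 @ beta_hat + half)
print(pi)
```

Note the only change from the confidence interval for the line is the 1 plus inside the square root, which is what keeps this interval from shrinking to a point.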

Â 9:15

Okay? So if we want a confidence interval for the mean of the regression surface: if we collect an infinite amount of data, that confidence interval should get narrower and narrower. It should limit to exactly the mean price. So if we collected all the diamonds in the world of that specific weight, we should have a very good estimate of what the line is like at that particular point.

On the other hand, if we want to know the potential set of prices we could get for this specific new diamond that we're trying to sell, okay, there's some intrinsic variability that, no matter how well we estimate the line, would be surrounding the line. Okay? And that's why this one part doesn't go away; no matter how much data we have, no matter how much better we estimate the line, it is still there. It's sort of an intrinsic variability.

So as we collect an infinite amount of data, this part, s, converges to sigma,

Â 10:13

and this part, x naught transpose (X transpose X) inverse x naught, will converge to 0. And so that one part will just stay there, and represent the natural variability around the line.
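This limiting behavior can be illustrated numerically; the simulation below (assumed sigma and coefficients, simulated designs) shows the confidence half-width collapsing as n grows while the prediction half-width settles near the t quantile times sigma:

```python
import numpy as np
from scipy import stats

# Illustrating the limits the lecture describes: as n grows, the
# confidence-interval half-width at x0 shrinks toward 0, while the
# prediction-interval half-width stays near t * sigma. Simulated data.
rng = np.random.default_rng(4)
sigma = 2.0
x0 = np.array([1.0, 2.5])

def half_widths(n, alpha=0.05):
    """Return (CI half-width, PI half-width) at x0 for a fresh sample of size n."""
    X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
    y = X @ np.array([1.0, 3.0]) + rng.normal(0, sigma, n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / (n - 2))
    h0 = x0 @ np.linalg.inv(X.T @ X) @ x0
    tq = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return tq * s * np.sqrt(h0), tq * s * np.sqrt(1 + h0)

ci_small, pi_small = half_widths(50)
ci_big, pi_big = half_widths(50_000)
print(ci_small, ci_big)  # confidence half-width shrinks dramatically
print(pi_small, pi_big)  # prediction half-width does not go to 0
```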

Okay? So that's the distinction between a prediction interval and a confidence interval. A prediction interval, by the way, is not a confidence interval, because if you look at the actual probability statement we used, the quantity we're saying is in the interval is a random quantity. So it's not a confidence interval, but we derive it in kind of the same way.

And now, at this point in the class, I think you should be able to derive the prediction interval and the confidence interval yourselves; that's why I'm going over this somewhat quickly, glossing over it a little bit. But I just want to make sure that everyone understands the distinction between the two, and why there's this fixed quantity in the prediction interval. Okay, it's because of this natural variability that exists around the regression line, or around the regression surface, that doesn't go away if what you want to estimate is the set of potential likely values for a response at that given value of x, or that given collection of values of x.
