0:03

Okay. Welcome back, troops. We're going to start talking about Student's, or as I like to say, Gosset's t distribution. The reason I call it Gosset's t distribution is that it's usually called Student's t distribution, because Gosset published under the pseudonym Student in 1908. So it was actually Gosset's distribution. He laid the groundwork for the actual distribution, and then I believe that Fisher actually proved some of the finer-scale mathematics. But I wanted to talk for a minute about Gosset, because he's a pretty interesting character in the annals of statistics. Gosset was a researcher, and he worked at the Guinness brewery in Ireland. When he created the t distribution, he was actually working for Guinness. At the time, he had several really brilliant researchers working for him, and they couldn't publish under their real names. That's how Gosset wound up publishing under a pseudonym and making this sort of landmark discovery as a researcher.

It's interesting: the reason he came up with this distribution is that, for him, the central limit theorem simply wasn't rich enough to describe the problems he was looking at. He was working with small batches in the science of brewing, and it wasn't adequate to assume that things were heading to infinity. So he came up with this distribution, and we're all the more fortunate for it. One thing I really like about Gosset is that whenever you read about him, he was apparently a tremendously nice guy, extremely humble, and he made several major discoveries in statistics. He came up with one of the first uses of the Poisson distribution. He also rose up pretty high in the Guinness company; he was the head brewmaster at, I think, its London brewery by the time he retired. So anyway, he's a really interesting character, and if you get a chance, you should read about him. At any rate, he came up with this wonderful distribution called, as far as I would like it to be called, Gosset's t distribution.

The t distribution is used when you have smaller sample sizes. It assumes your data are Gaussian, but it tends to work even if your data are non-Gaussian. The t distribution is indexed by something called degrees of freedom, and it looks like a normal distribution that someone squashed down at its tip, with all the extra mass pushed out into its tails. It looks more and more like a standard normal as the degrees of freedom get larger and larger.

2:40

So, how do you get a t distribution? Say you wanted to simulate it on a computer: you would take a standard normal, say Z here, and divide it by the square root of an independent Chi-squared divided by its degrees of freedom. So where Z and the Chi-squared are independent standard normal and Chi-squared random variables, that's how you wind up with a t distribution.
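As a quick sketch of that construction (my illustration, not from the lecture; the degrees of freedom and number of simulations are arbitrary choices), you could simulate it in R like this:

```r
# Simulate t-distributed draws from the definition: Z / sqrt(chi-squared / df),
# where Z is standard normal and the chi-squared is independent of Z.
set.seed(42)
df <- 5
nsim <- 10000
z <- rnorm(nsim)                 # standard normals
x2 <- rchisq(nsim, df = df)      # independent chi-squared draws
tsim <- z / sqrt(x2 / df)        # t with df degrees of freedom

# The simulated quantiles should be close to the theoretical t quantiles
quantile(tsim, c(0.025, 0.975))
qt(c(0.025, 0.975), df = df)
```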

So how is this useful? On the next slide, we'll look at how we apply this. Let's suppose that X1 to Xn are iid normal mu, sigma squared. Then X bar minus mu divided by sigma over square root n is, of course, standard normal, right? Because linear combinations of normal random variables are themselves normal, X bar is normal. And because the observations are iid, we know exactly what the standard deviation of X bar is, sigma over square root n, and we know that its mean is mu. So when we shift our non-standard normal by mu and divide it by its standard deviation, sigma over square root n, we get a standard normal. Hopefully, this should not be news to you at this point in the class.

We also know, from earlier in today's lecture, that n minus 1 times S squared over sigma squared is Chi-squared with n minus 1 degrees of freedom. So if we take n minus 1 times S squared over sigma squared, divide it by its degrees of freedom, n minus 1, and square root the whole thing, we get S over sigma; we've taken a Chi-squared and divided it by its degrees of freedom. So S over sigma is the square root of a Chi-squared divided by its degrees of freedom. Therefore, if we take X bar minus mu divided by sigma over square root n, and then divide the whole thing by S over sigma, which if we do the arithmetic works out to be X bar minus mu divided by S over square root n, we wind up with a standard normal divided by the square root of a Chi-squared divided by its degrees of freedom.
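Written out, the arithmetic from that paragraph is:

\[
\frac{(\bar X - \mu)\big/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}
= \frac{(\bar X - \mu)\big/(\sigma/\sqrt{n})}{S/\sigma}
= \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}.
\]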

4:32

Now, there's one small thing that we're kind of fudging over: we haven't shown that X bar and S are independent, right? They're computed from the same data, so it doesn't seem obvious that they're independent. They are; it's just not immediately clear, so let's sweep that under the rug and take my word for it for the time being. Since X bar and S are independent, this statistic has exactly Gosset's t distribution with n minus 1 degrees of freedom. And notice what we've basically accomplished. We saw previously, in constructing confidence intervals, that X bar minus mu divided by sigma over square root n is a nice pivotal statistic to work with. It's useful for generating confidence intervals, and we'll see that it's useful for doing hypothesis tests. All we've done is replace sigma by S. It's basically saying that we can take the unknown population standard deviation and replace it with the known sample standard deviation, and we get a statistic whose distribution we know, okay? And by the way, this statistic, X bar minus mu over S over square root n, also limits to a standard normal as n goes to infinity, because Gosset's t distribution converges to a standard normal as the degrees of freedom go to infinity. If you plot it, it looks more and more like a normal distribution as n goes to infinity. So we haven't violated the central limit theorem or anything like that in the process of doing this stuff.

So let's actually use this distribution to create a confidence interval. It's a statistic whose distribution, under the assumption of normality of the underlying data, does not depend on the parameter mu that we're interested in, and therefore we can use it to create a confidence interval for mu. Let t sub df, alpha be the alpha-th quantile of the t distribution with df degrees of freedom. So t sub n minus 1, 1 minus alpha over 2 is the upper quantile from the relevant t distribution, and t sub n minus 1, alpha over 2 is the lower quantile. Then this probability statement here is, of course, true: the probability that this t random variable lies between the alpha over 2 lower quantile and the 1 minus alpha over 2 upper quantile is exactly 1 minus alpha.

Oh, and I should note here, by the way, that because the t distribution is symmetric about zero, the alpha over 2 lower quantile is equal to the negative of the 1 minus alpha over 2 upper quantile. That's why here, instead of writing t sub n minus 1, alpha over 2, I wrote minus t sub n minus 1, 1 minus alpha over 2; you'll see why I do that in a second. So anyway, this probability statement applies, and we can just rearrange terms, keeping track of flipping our inequalities around when we multiply by a negative sign. We get that X bar minus a t quantile times a standard error is less than mu, and X bar plus a t quantile times a standard error is bigger than mu; that random interval contains mu with probability 1 minus alpha. And if you look at the form of this interval written out this way, it happens to be X bar plus or minus the upper quantile from the t distribution times the standard error. That's why I took only the upper quantile: that way, we can write it as plus or minus.
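Spelled out, the rearrangement goes:

\[
1-\alpha
= P\!\left(-t_{n-1,1-\alpha/2} \le \frac{\bar X - \mu}{S/\sqrt{n}} \le t_{n-1,1-\alpha/2}\right)
= P\!\left(\bar X - t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}} \;\le\; \mu \;\le\; \bar X + t_{n-1,1-\alpha/2}\,\frac{S}{\sqrt{n}}\right),
\]

so the interval is \(\bar X \pm t_{n-1,1-\alpha/2}\, S/\sqrt{n}\).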

8:15

Okay, so that's how we wind up with these intervals: estimate plus or minus quantile times standard error. That's where it comes from. This interval assumes that the data are iid normal, though it's very robust to this assumption. Whenever the data are roughly symmetric and mound-shaped, the t confidence interval works amazingly well. And if you have paired observations, people before and after a treatment, for example, you can subtract them and then create a t confidence interval on the differences. So paired observations are often analyzed using this exact confidence interval technique by taking differences, and differences tend to be much more Gaussian-looking; they tend to be nice and symmetric. Then, for large degrees of freedom, the t quantiles become the same as the standard normal quantiles, and so this interval converges to the same interval that you get from the CLT.
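You can see that convergence directly in R (my illustration; the degrees of freedom shown are arbitrary choices):

```r
# t quantiles approach the standard normal quantile as df grows
qt(0.975, df = c(5, 10, 50, 500))
qnorm(0.975)   # about 1.96
```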

Some more notes. For skewed distributions, the spirit of the t interval assumptions is violated. You could probably show that it still works kind of okay, and the reason is that those quantiles, the t sub n minus 1, 1 minus alpha over 2, are so far out there. The t distribution is a very heavy-tailed distribution, which shoves those quantiles way out and makes the interval a lot wider, and so it tends to work conservatively in a broad variety of settings. But for skewed distributions, you're violating the spirit of the t interval, and you're often better off trying things like taking the natural log of your data, if it's positive, to get it to be more Gaussian-looking before you create a t confidence interval. We'll spend an entire lecture on the consequences of logging data, so you can wait for that. But I would just say, for skewed distributions, it kind of violates the intent of the t interval, so maybe consider things like taking logs. I'd also say that for skewed distributions, it maybe doesn't make as much sense to center the interval around the mean, the way we're doing with this t interval.

And then the other thing: for discrete data, like binary data, again, I bet you could do simulation studies and show that the t interval actually works okay. But we have lots of techniques for binary data that make direct use of it, and you're better off using, for example, things based on Chi-squareds or exact binomial intervals and that sort of thing, because you're so far from the spirit and intent of the t interval that it's not worth using there. Regardless, the t interval is an incredibly handy tool, and I'm sure that in some of these cases it probably works fine, but you're so far from the assumptions at that point that you're better off using the other techniques that have been developed for those cases. And that's enough discussion about the t confidence interval; let's go through an example. So maybe take a break, go have a Guinness, and we'll be back in a second.

Okay, so welcome back.

So we're going to talk about Gosset's original data, which were sleep data. Try not to fall asleep while we're talking about them. Gosset's original data appeared in the journal Biometrika, with a k, and Biometrika, interestingly enough, was founded by a person called Francis Galton. Galton was another interesting character; if you really want to read up on an absolutely brilliant, interesting character, read up on Galton. He was Charles Darwin's cousin. He invented the term and the concept of regression, he invented the term and the concept of correlation, and he invented lots of other things, some good, some bad. He was just generally a rather interesting character. So at any rate, Biometrika was founded by Francis Galton, and that is where Gosset's original paper appeared and where the sleep data occurred.

At any rate, the sleep data show the increase in hours slept for ten patients on two sleeping drugs. Now, R treats the data as two groups rather than paired, and I have to admit I haven't taken the time to figure out exactly why there's a discrepancy between Gosset's Biometrika paper, which treats the data as paired, and R, which treats it as two groups. Anyway, I haven't gone through the details, so I'm going to treat it exactly like Gosset's data. So here is what it looks like as Gosset's data: we have patients one, two, up to ten, we have the two drugs, and the difference.
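As a sketch of the manual calculation (my reconstruction, using R's built-in sleep data set and treating it as paired, the way Gosset did):

```r
# Paired differences from R's built-in sleep data: drug 2 minus drug 1,
# matched within each of the ten patients.
data(sleep)
diff <- with(sleep, extra[group == 2] - extra[group == 1])
n <- length(diff)
mn <- mean(diff)
s <- sd(diff)

# Manual t interval: estimate +/- t quantile * standard error
mn + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
```

That comes out to roughly 0.70 to 2.46 extra hours of sleep.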

13:45

And this will give you our confidence interval manually. But if you want to go the easier way, R actually has a function to do t confidence intervals, of course, because it's one of the most popular statistical procedures. So you type t.test, and here, difference is the name of the vector that contains the differences. Then the dollar sign grabs the relevant output; in this case, I want the confidence interval, so it's $conf.int. If you omitted the dollar sign, when you hit Return it would give you lots of information, including the confidence interval.
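A sketch of that call (again my reconstruction; the vector of paired differences is built here from R's sleep data):

```r
# The same interval via R's built-in t.test function
data(sleep)
difference <- with(sleep, extra[group == 2] - extra[group == 1])
t.test(difference)$conf.int
```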

Here it just returns exactly the confidence interval, and you get 0.7 to 2.5, basically. Now, we've talked a lot about likelihoods, so I wanted to talk about how you can use the t distribution to create a likelihood. Remember, we're in this kind of hard setting where the data have two parameters, mu and sigma, so the likelihood is inherently a two-dimensional object. We showed earlier a trick for getting a likelihood for sigma, and here I'm going to say, well, you can do another trick and get a likelihood for a single parameter, where the single parameter is a function of the two parameters. In this case, the single parameter is mu divided by sigma, which is actually quite an important parameter: mu divided by sigma is the mean in standard deviation units. So it's a unit-free quantity, it's often called the effect size, and this is a nifty little trick to create a likelihood for the effect size.

So if X is normal mu, sigma squared, and this Chi-squared random variable has df degrees of freedom, then take X divided by sigma and divide it by the square root of the Chi-squared divided by its degrees of freedom. Notice that we don't subtract off mu in the numerator: X over sigma still has a mean, and in this case its mean is specifically mu over sigma. So we have not taken a standard normal and divided it by the square root of an independent Chi-squared divided by its degrees of freedom; we took a non-standard normal and divided it by the square root of an independent Chi-squared divided by its degrees of freedom. So it can't work out to be a t random variable, because we haven't satisfied the definition of a t random variable. It's what's called a non-central t random variable; in the specific case when mu is zero, we wind up with an ordinary t random variable. This non-central t random variable also has degrees of freedom, but it has a second parameter called the non-centrality parameter, and in this case the non-centrality parameter is mu over sigma.
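In symbols, with X and the Chi-squared independent:

\[
T = \frac{X/\sigma}{\sqrt{\chi^2_{df}/df}}, \qquad X \sim N(\mu, \sigma^2),
\]

has a non-central t distribution with \(df\) degrees of freedom and non-centrality parameter \(\mu/\sigma\); when \(\mu = 0\), this reduces to the ordinary t distribution.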

17:34

The effect size values that we want to plot, let's say, go from 0 to 1, with length 1,000. Our likelihood values are then the t density evaluated at our t statistic; R's dt function, the t density function, has an argument ncp, which stands for non-centrality parameter. So here, we take our t density, plug in our t statistic, set the degrees of freedom to n minus 1, and then loop over all of our non-centrality effect sizes, and that creates a collection of likelihood values. Then we want our likelihood values to peak at one, so instead of figuring out the exact maximum likelihood, let's just divide by the maximum over our grid of 1,000 points. And let's plot our effect size values against our likelihood values, make sure it's a line by setting type = "l", and draw reference lines at 1/8 and 1/16.
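Here is a sketch of those steps (my reconstruction, not the lecture's exact code; the sleep differences are assumed, and note that the observed statistic sqrt(n) times X bar over S has non-centrality parameter sqrt(n) times the effect size):

```r
# Effect-size likelihood via the non-central t density
data(sleep)
diff <- with(sleep, extra[group == 2] - extra[group == 1])
n <- length(diff)
tStat <- sqrt(n) * mean(diff) / sd(diff)   # observed t statistic

esVals <- seq(0, 1, length = 1000)         # candidate effect sizes mu / sigma
likVals <- sapply(esVals,
                  function(es) dt(tStat, df = n - 1, ncp = sqrt(n) * es))
likVals <- likVals / max(likVals)          # scale so the maximum is one

plot(esVals, likVals, type = "l")
abline(h = c(1/8, 1/16))                   # reference lines at 1/8 and 1/16
```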
