0:00

Welcome back to Practical Time Series Analysis.

In these introductory lectures, we're reviewing basic statistics. In this lecture in particular, we'll look at some inferential statistics.

Now, if your statistical background is strong, and if you're very comfortable in the R environment, you can move through these lectures very quickly. They're really meant for people who either haven't done statistics in any meaningful way in quite some time, or who are just new to R.

Our objectives.

We do say this is basic inferential statistics. Our objectives are to review some basics, to learn how to develop graphical intuition in a new data set based upon the commands available in R, and to learn how to perform a simple hypothesis test.

The data set we'll be using is a famous, traditional one: Gosset's data on sleep. He's reporting on results that other researchers had already published, and using them to develop his techniques.

The basic data set looks at two soporific drugs, that is, drugs meant to induce extra sleep in a patient. There are 10 people in play in this data set, there are two drugs, and we're looking at the increase over control for each of these 10 individuals with each drug. The format of the data frame is 20 observations on three variables: 20 observations because these are 10 people, and we're looking at the effect of each of the two drugs.

So we've got extra, group, and ID. Extra is the extra amount of sleep, group tells you which drug is in play, and ID tells you which patient.
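
The lecture's on-screen code isn't reproduced in the transcript, but the data set described here is the `sleep` data frame that ships with base R, so you can inspect it directly:

```r
# The sleep data set is built into R (package "datasets")
str(sleep)       # 20 obs. of 3 variables: extra, group, ID
head(sleep, 3)   # first few rows: extra sleep, drug group, patient ID
```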

Of course, always plot your data. The plot command is a rather powerful one, and it will make decisions, based upon the kind of data you're presenting it, as to what kind of plot it's going to return. We're going to plot the extra sleep on group, so think of that as drug, and main, of course, is going to put a title on the graph: extra sleep in Gosset data by group.

After that we'll do a couple of other things. I want to have the data available to me very easily in a variable, so I'm going to say extra.1 is the extra sleep for those in group one. We're testing group identically equal to one, or group identically equal to two, as we assign our numbers to each of these two vectors.
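
The commands just described can be sketched as follows; the exact code from the slides isn't in the transcript, so this is a reconstruction using the built-in `sleep` data:

```r
# With a factor (group) on the x-axis, plot() chooses
# side-by-side boxplots automatically
plot(extra ~ group, data = sleep,
     main = "Extra Sleep in Gosset Data by Group")

# Pull each group's responses into its own vector,
# testing group == 1 and group == 2
extra.1 <- sleep$extra[sleep$group == 1]
extra.2 <- sleep$extra[sleep$group == 2]
```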

When we look at the graph of extra sleep, it looks like the second group, the second drug, has a pretty clear advantage over the first. I don't see a huge difference in heterogeneity here, there is some, but I think what's most pronounced in this graph is that the median, and this bar in a box plot is of course telling you the median, not the mean, certainly seems to be higher in the second group.

Now that's a visual impression, and what we'll do now is try to follow it up with a standard statistical test.

As we test our hypothesis, we'll use the command t.test, and we'll put in extra.1 and extra.2 as our data. You can do tests, you'll recall from elementary stats, with independent samples, and there are different ways to go within independent samples depending upon your variability. Here instead, we're going to treat these data as paired, because remember there are only 10 people in the study and two different drugs. We're going to do a two-sided test rather than a one-sided test, because coming in I had no theory, no intuition, that one drug would be better than the other.
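
In R, the paired, two-sided test described here looks like this (a sketch; the variable names match the lecture's description):

```r
extra.1 <- sleep$extra[sleep$group == 1]
extra.2 <- sleep$extra[sleep$group == 2]

# Paired, two-sided t-test ("two.sided" is also t.test's default)
res <- t.test(extra.1, extra.2, paired = TRUE, alternative = "two.sided")
res
```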

Our results look like this. We obtain a t value of negative four or so. That's a fairly hefty t value. Now, if that were a z value from a normal distribution, it would be quite large; how large it is with a t distribution depends upon your sample size. Our degrees of freedom in a paired t-test like this with ten individuals, remember, is n minus one, which is nine here, and the p value we obtain is less than the standard 0.05. It's even less than 0.01, and I think many people would say that these data are highly significant. R agrees and is going to go with the alternative hypothesis, that there is a difference between the two drugs.

It's also good to report a confidence interval. Another approach to a test like this would be to calculate a confidence interval and see if it includes zero as a plausible value. It does not: the 95 percent confidence interval here runs from around negative two and a half to negative point seven.

Â 5:26

Now, if it's been a little while since you've done a confidence interval or a hypothesis test, let's go back and remember what this is all about.

In a standard hypothesis test we have a null hypothesis and an alternative hypothesis, traditionally labeled H sub zero and H sub one. The null hypothesis will be no difference, just that the mean response is going to be the same for both drugs. The alternative, since we're doing a two-tailed test, will be that it's not the same.

Alpha is what researchers often set before they conduct the test: the probability of a type one error, that is, the probability that we're going to reject a true null hypothesis. It's fairly standard to set alpha equal to one of the two values 0.05 or 0.01.

For the t value that we calculated here, in the numerator we're going to look at the average of the differences, which is the same, if you follow the language, as the difference of the averages. So we're looking at d bar here. Essentially, we're just taking the average, the mean value, within the first group and subtracting off the average in the second. It's a very intuitive thing to do. We'll compare that to our null hypothesis value of zero.

In the denominator we're going to look at variability, and we're looking at the variability of the averages here, not of individuals. We're going to take s sub d: this is the sample standard deviation of the differences.

Now, be careful: if you take the differences for these 10 individuals between the two drugs, taking each person's response on the first drug and subtracting off the response on the second, do that for all 10, and then take the standard deviation of that, you get the standard deviation of the differences. Generally speaking, that's not the same number as taking the standard deviation of the first data set and subtracting off the standard deviation of the second. We just have to be a little bit careful here.

But the standard test is to look at the standard error in the denominator, the standard deviation divided by the square root of n, to give us a measure of variability. And that's how we calculate, you can follow through the numbers, the t value of negative four.

So as we just said, d bar is the average of the differences, or the difference of the averages, however you like, and s sub d is the standard deviation of the differences from the sample.
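
If you want to follow through the numbers, the hand calculation of the t statistic can be sketched like this:

```r
# Recompute the t statistic by hand: t = d_bar / (s_d / sqrt(n))
extra.1 <- sleep$extra[sleep$group == 1]
extra.2 <- sleep$extra[sleep$group == 2]
d <- extra.1 - extra.2    # differences, person by person

d.bar <- mean(d)          # average of the differences: -1.58
s.d   <- sd(d)            # sample sd of the differences: about 1.23
n     <- length(d)        # 10 people

t.stat <- d.bar / (s.d / sqrt(n))
t.stat                    # about -4.06, matching t.test's output
```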

Now, R also gives a p-value. So what's the p-value? It's the likelihood of seeing data this extreme under the null hypothesis. Since it was a two-tailed test, what we'll do here is look at twice the tail area. The t distribution is the one in play for us, that's the letter t right there, and p is just short for probability. So what we're trying to do is get some tail areas: we'll take twice the tail area, looking down at the left tail. I'll pop my negative four in there with nine degrees of freedom and calculate a p-value.
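
That calculation, twice the lower-tail area of the t distribution at our observed statistic, is one line in R:

```r
# Two-tailed p-value: 2 x the lower-tail area below the observed
# t statistic, with n - 1 = 9 degrees of freedom
t.stat  <- -4.0621
p.value <- 2 * pt(t.stat, df = 9)
p.value   # about 0.0028, below both 0.05 and 0.01
```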

If your p value is small, you'll reject your null hypothesis, and our p value is really quite small, so we reject the null hypothesis.

In general, if you have a hypothesis test, different books have different details here, but they all rhyme; they're all basically saying the same thing. You're going to state clearly what your variables are, so that everybody, including you, knows what you're talking about. State your null and alternative hypotheses, and then decide upon a level of significance. Once you've got that basic framework down, those organizing principles, go ahead and look at your data and compute a test statistic; you'll very often run across z's and t's, chi-squares and F's, which are the big four in an elementary statistics course. You'll find the p value corresponding to your test statistic, and then you'll form a conclusion: you'll reject or not reject, typically.

On confidence intervals: there is a difference between the words confidence and probability. Many people get very sticky on this and say that once an event has occurred, you really can't talk about probability anymore; instead you must talk about confidence. The basic idea is that we're trying to give a good indication of where we believe the actual mean would be, and here it's going to be a mean difference. The common form that you'll see for many confidence intervals is an estimate, that was our d bar here, plus or minus some sort of table value multiplied by an estimated standard error.

It's not hard to demonstrate where this comes from. In our particular case, we'll look at d bar, plus or minus the t value, times our standard error. We already saw that R will print this out for you. If you'd like to follow along and do a hand calculation yourself, we've got the numbers right here; it's just a direct substitution.
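
That direct substitution can be sketched as follows; the table value comes from the t distribution's 0.975 quantile, since a two-sided 95 percent interval leaves 2.5 percent in each tail:

```r
# 95% confidence interval by hand: d_bar +/- t* x (s_d / sqrt(n))
extra.1 <- sleep$extra[sleep$group == 1]
extra.2 <- sleep$extra[sleep$group == 2]
d <- extra.1 - extra.2

d.bar  <- mean(d)
se     <- sd(d) / sqrt(length(d))
t.star <- qt(0.975, df = 9)      # table value, about 2.26

ci <- d.bar + c(-1, 1) * t.star * se
ci   # about (-2.46, -0.70); zero is not a plausible value
```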
