0:00

[MUSIC]

Hi, in this module I want to talk about statistical significance.

Statistical significance is an important concept in statistics. If you've taken any kind of statistics class you've almost certainly encountered it, and it's widely misunderstood.

For the purposes of this module, I'm not going to get into all the formulas; you'll have to take a statistics class to learn how to compute a significance test and so forth. What I want to talk about is the ways in which significance is sometimes misunderstood, and to caution you against misapplication or misinterpretation. This may be useful even if you have had a statistics class, because statistics classes tend to be heavily formal without thinking about the implications for research.

Now, a significance test, or any measure of statistical significance, assumes that we are making a measurement in a sample that is drawn from some larger population. Statistical significance and hypothesis testing have always been embedded in a framework of generalizing to some large population that we cannot measure in its entirety: we make measurements on a sample from that larger population and then generalize from them.

It's quite important, and often forgotten, that most of the mathematics underlying tests of statistical significance, hypothesis testing, and so forth assumes that our sample is actually a random draw from the larger population. In other words, in drawing our sample, we followed the procedures we talked about in previous lectures: randomly sampling, or probability sampling, from some larger population in which we're actually interested.

So consider measurement in a sample. When we go out and, for example, conduct an opinion poll of a few hundred people, we compute the percentage of people that hold some attitude or believe in something. The measurement we make in that sample is what we call an estimate of a population parameter.

Out there in the population in which we are interested, we think of there being numeric values, parameters, that describe characteristics of that population: perhaps the total percentage of people with a particular attitude, or a preference for a particular party.

Then we draw our sample and make a measurement, and we want to use that measurement to say something about the corresponding parameter in the larger population.

Now, this is very important: all of the mathematics involved in the calculation of statistical significance assumes that the measurement is influenced by chance. When we draw a sample at random from some larger population in order to perform some sort of measurement on it, measuring the percentage of people that support a particular party and so forth, the composition of that sample is based on the luck of the draw, random chance.

So whatever the population parameter is, whatever the true percentage of people who support one political party or another, the sample that we draw is like drawing colored balls from an urn, if you think of examples you may remember from probability classes.

Even if an urn holds 500 red balls and 500 blue balls, when we randomly draw 10 or 20 balls we won't always get an exactly even distribution by color, right? By the laws of chance, occasionally you'll get more red balls or more blue balls, a little different from the overall composition of the urn from which we are drawing.
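The urn example can be simulated directly. Here is a minimal sketch in Python; the choice of a 20-ball draw and 10,000 repetitions is mine, purely for illustration:

```python
import random

random.seed(42)  # for reproducibility

# An urn with 500 red and 500 blue balls, as in the lecture's example.
urn = ["red"] * 500 + ["blue"] * 500

# Draw 20 balls at random (without replacement) many times and record
# how many reds we get in each draw.
red_counts = [sum(ball == "red" for ball in random.sample(urn, 20))
              for _ in range(10_000)]

# Even though the urn is split 50-50, individual draws vary by chance:
# only a minority of draws come out exactly 10 red / 10 blue.
exactly_ten = sum(c == 10 for c in red_counts) / len(red_counts)
print(f"Share of draws with exactly 10 reds: {exactly_ten:.2f}")
print(f"Fewest reds seen: {min(red_counts)}, most reds seen: {max(red_counts)}")
```

Running this shows that an exactly even draw happens well under half the time, and occasional draws tilt strongly toward one color, even though the urn itself is perfectly balanced.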

The same reasoning applies to samples drawn from a larger population. If some share of the population is affiliated with a particular political party, then when we draw a sample, yes, on average that sample will resemble the distribution in the larger population, but by the laws of chance there's always a probability that it tilts in one direction or another. People affiliated with a particular party may be overrepresented in the sample even if it's a well-designed sample, again, just by the laws of chance.

So what a significance test really tries to do is assess how likely we would be to see the value we measured in our sample, say, the proportion affiliated with a particular party, if the true population parameter, the true value in the larger population, were some hypothesized value.

Typically, we put this in the form of a null hypothesis: a hypothesis that the parameter in the larger population is some specific value. A significance test then works out the probability that, just by the luck of the draw, we could get a distribution like the one in our sample purely as a result of random chance.

Going back to the voter example, our null hypothesis might be that people are evenly divided between two parties, that the split is 50-50. Then, drawing our sample and measuring it, we find that perhaps 48% of the people favor one party and 52% favor the other. A significance test would be an assessment of how likely we would be to get a split like that, 48-52, if the population parameter were the hypothesized value of 50-50.

As you might imagine, more skewed distributions in the sample get progressively less likely. We often talk about a p-value: the specific estimate of the probability of getting a value like the one we observed if the null hypothesis were true. In our example, it would be the probability of observing a 48-52 split if the true split in the larger population were, as hypothesized, 50-50.
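The 48-52 example can be worked out exactly with the binomial distribution. Here is a minimal sketch; the lecture says "a few hundred people," so the poll size of 1,000 (with 480 favoring one party) is an assumption made here to match the 48-52 split:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p(observed, n, p_null=0.5):
    """Two-sided p-value: the probability of a count at least as far
    from the null expectation as the one observed, if the null
    hypothesis were true."""
    expected = n * p_null
    deviation = abs(observed - expected)
    return sum(binomial_pmf(k, n, p_null)
               for k in range(n + 1)
               if abs(k - expected) >= deviation)

# Hypothetical poll: 1,000 respondents, 480 favoring one party (48-52),
# null hypothesis of an even 50-50 split.
p_value = two_sided_p(480, 1000)
print(f"p-value: {p_value:.3f}")
```

With these assumed numbers the p-value comes out around 0.2: a 48-52 split in a poll of 1,000 is quite compatible with a truly even population, so we would not reject the null hypothesis here.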

When we talk about correlation or regression, the null hypothesis is usually that there is no relationship between the two variables: the correlation coefficient is zero, or the regression coefficient is zero. Changes in one variable are not associated in any systematic fashion with changes in the other variable.

So statistical significance simply means that if the null hypothesis were true, chance variation would be unlikely to yield a sample that produces a value like the one we've actually measured.

For example, suppose we ran a correlation or a regression, got some estimate of a correlation coefficient, carried out a significance test, and claimed that the correlation coefficient was significant at the 0.05 level. All that says is that if there were no relationship between the two variables in the original population from which we drew the sample, there would be only a 0.05 chance that, in a sample of the size we've drawn, we would see a correlation as large as the one we've calculated.

We might think: gee, 0.05, that's quite small, a 1 in 20 chance that our result is the product of random variation in the composition of our sample. And we might decide that this correlation really is not zero, that in the larger population the true correlation must be non-zero. We reject the null hypothesis.

So what influences statistical significance? One factor is the magnitude of the effect measured in the sample. Essentially, the further a measured value is from the hypothesized value, for example, the larger the correlation coefficient, the more likely it is to show up as statistically significant.

Sample size matters as well. If you think about the same measured effect, or the same measured correlation coefficient, across samples of different sizes, it's more likely that in a large sample that correlation coefficient will be reported as statistically significant.

The sample size feeds into the calculation of the test statistics that determine the p-value.
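As a rough sketch of the sample-size effect, the Fisher z-transformation (a standard normal approximation for testing whether a correlation is zero) shows the same measured correlation moving from non-significant to highly significant as the sample grows. The correlation of 0.2 and the sample sizes are assumed numbers for illustration:

```python
from math import atanh, sqrt, erf

def p_value_for_r(r, n):
    """Approximate two-sided p-value for a sample correlation r with
    sample size n, testing the null hypothesis that the true
    correlation is 0, via the Fisher z-transformation."""
    z = atanh(r) * sqrt(n - 3)
    # Two-sided tail probability of a standard normal variate.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# The same measured correlation, r = 0.2, in samples of different sizes:
for n in (30, 300, 3000):
    print(f"n = {n:5d}: p = {p_value_for_r(0.2, n):.4f}")
```

With n = 30 the p-value is well above 0.05, while the identical correlation in a sample of 300 is significant far beyond the 0.01 level.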

And finally, the amount of error. If we're measuring our outcomes, our variables, with a lot of error, that adds noise to the system, and it turns out that this makes it harder to show a statistically significant relationship or effect. Because we're introducing noise into the system, it becomes harder to rule out the possibility that something we've observed is the product of random chance.

So what are the limitations of significance testing? The most important is that significance tests are not proof of a causal relationship; by themselves, they have nothing to do with causality. Proof of cause and effect can only come from an appropriate study design: ideally, perhaps, an actual treatment-and-control design; it might be a natural experiment like we talked about in previous lectures, or a quasi-experiment, or instrumental variables. But one way or the other, statistical significance by itself does not say anything about cause and effect.

Significance tests assume that a sample is drawn at random from a larger population. As soon as you have a sample that doesn't satisfy that assumption, you're violating a fairly important assumption behind the computation of statistical significance, and it becomes a little harder to interpret the results. You have to think more carefully about what you are actually doing when claiming statistical significance.

So we need to exercise caution when interpreting tests of statistical significance from non-random samples.

Again, a test will only show that an observed value would be unlikely if the parameter in the population were as hypothesized; a significance test never proves that the observed value is impossible. So there is always some chance, perhaps very, very small, that whatever we observed in our sample when we measured it is actually the product of random chance, and that the population parameter is in fact different from what we claimed.

We'll talk about some of the problems that this leads to in the next module, on Type I errors.
