One of the things you've almost certainly heard about in statistics is the idea of statistical significance, or p-values.

So this lecture is going to tell you a little bit about

some of the high-level thinking about statistical significance.

So the basic idea here is that we want to know if observed differences in a sample are replicable, or, more generally, what people call "real."

Now, "real" is a little bit of a fuzzy concept. What does that mean, exactly? It's not totally clear.

But what it's supposed to imply is that there's a difference between the two groups, usually in the mean value of the measurements that you're taking.

So here is an example. Here are three genes.

For each gene there are measurements from two groups.

There's the red group and the blue group, and there are three dots from each group.

On the y-axis are the log expression values.

So for each gene you see a plot of the six data points corresponding to that gene.

For Gene 1, there's not much difference in the means, and most of what difference there is comes from one little outlier in the blue group. That suggests that while the means might look different, the variability is also high enough that it's hard to conclude there's any real difference.

For Gene 2, you see what would appear to be a pretty clear difference.

The three red dots and the three blue dots are

tightly clustered and they clearly have different mean levels.

On the other hand, Gene 3 is

another example where it looks like there might be a difference.

So, for example, the red dots seem to be a little bit higher than the blue dots. They are also very tightly clustered.

But they're not very far apart, and the difference in means isn't much bigger than the variability.

So, how do we distinguish these cases?

How do we know when we've observed a difference that

appears to be large enough that we would call it a real difference?

The most common statistic that people use, and the one you've almost certainly heard about, is called the t-statistic.

The t-statistic actually has

a general form that is also widely used in a number of other statistics.

So imagine you have measurements from two different groups, and we've labeled them Y for one group and X for the other.

Then the t-statistic equals the average of

the Y values minus the average of the X values,

divided by a measure of variability.

So here we estimate how variable the Y values are with S squared of Y,

and how variable the X values are with S squared of X.

So these are estimates that we could go into more detail about in a statistics class, but for now you can just think of the denominator as scaling the difference between Y and X by the units of variability.

So if the means are very far apart in variability units, then we might believe the difference is real; if not, then maybe not.
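As a minimal sketch of that idea, here's one way the two-sample t-statistic could look in Python. The Welch-style denominator and the example expression values are assumptions for illustration, not something given in the lecture:

```python
import math

def t_statistic(y, x):
    """Difference in group means, scaled by a measure of variability
    (a Welch-style denominator is assumed here)."""
    ny, nx = len(y), len(x)
    mean_y = sum(y) / ny
    mean_x = sum(x) / nx
    # Sample variances: the "S squared" terms described above.
    s2_y = sum((v - mean_y) ** 2 for v in y) / (ny - 1)
    s2_x = sum((v - mean_x) ** 2 for v in x) / (nx - 1)
    # Scale the mean difference by the units of variability.
    return (mean_y - mean_x) / math.sqrt(s2_y / ny + s2_x / nx)

# Hypothetical log expression values, like Gene 2: tight clusters
# with clearly different means, so the t-statistic comes out large.
red = [5.1, 5.0, 4.9]
blue = [2.0, 2.1, 1.9]
print(t_statistic(red, blue))
```

Tightly clustered, well-separated groups give a big value; overlapping, noisy groups give a value near zero.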

So a big t-statistic means we think it's more likely that there's a difference, and a small t-statistic means it may be less likely that there's a difference.

So, how do we actually quantify what we

mean by how statistically significant a result is?

The most common approach and probably the most

widely used and known statistic ever created is the p-value.

So the idea here is suppose that we've calculated

a t-statistic for comparing the difference between two groups.

Suppose that statistic is equal to two.

Is that a big value or a little value?

Well, one way that we could figure that

out and a way that's commonly used is what's called a permutation test.

Basically, you take the group labels that you're using,

the values that you're calling X and

the values that you're calling Y, and you scramble them up.

So some of the X values get labeled like Ys, and some of the Y values get left in the X group; you do that randomly, and you do it over and over again.

Each time you create a random labeling,

you recalculate the statistic.

What's going on here? We've broken the relationship between the label and the data,

because we've randomly scrambled them so we wouldn't expect there to be any association.

So then what we can do is we can make a histogram, like we have here,

of all the statistics that you get from these random scrambles.

You can see where the original statistic lands in that distribution.

To calculate a P-value,

you can basically count up how many times the scrambled statistics were larger than your observed value. Usually, you do this in absolute value. In other words, you don't care whether the statistic is bigger or smaller than the value you got, only how extreme it is, so you calculate the fraction of times the absolute value of the scrambled statistics is bigger than the absolute value of the observed statistic.

This gives you a P-value.
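The scramble-and-recount procedure above can be sketched in Python like this. This is a minimal sketch: the Welch-style t-statistic, the number of scrambles, and the example data are all illustrative assumptions:

```python
import random
import statistics

def t_statistic(y, x):
    # Difference in means, scaled by variability (Welch-style form assumed).
    se = (statistics.variance(y) / len(y) + statistics.variance(x) / len(x)) ** 0.5
    return (statistics.mean(y) - statistics.mean(x)) / se

def permutation_p_value(y, x, n_scrambles=10000, seed=0):
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = t_statistic(y, x)
    pooled = y + x
    count = 0
    for _ in range(n_scrambles):
        rng.shuffle(pooled)            # scramble the group labels
        scrambled_y = pooled[:len(y)]
        scrambled_x = pooled[len(y):]
        # Count scrambles as extreme or more extreme than what we observed,
        # in absolute value.
        if abs(t_statistic(scrambled_y, scrambled_x)) >= abs(observed):
            count += 1
    return count / n_scrambles

# Hypothetical log expression values for two well-separated groups.
red = [5.1, 5.0, 4.9, 5.2, 4.8]
blue = [2.0, 2.1, 1.9, 2.2, 1.8]
print(permutation_p_value(red, blue))
```

One thing this sketch makes concrete: with only a few points per group, there are only a limited number of distinct scrambles, so there's a floor on how small the permutation p-value can get.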

The P-value is widely used to calculate statistical significance.

It's also widely both interpreted and misinterpreted, and it has some properties that are very useful.

In general, what you've probably heard is that p-values that are low, so closer to zero, are reported as statistically significant. The usual cutoff is 0.05. P-values that are higher than this are often considered not statistically significant.

It's important to know what a p-value is and what a p-value isn't. In fact, the best way to get a statistician's blood pressure up is to misinterpret a p-value.

So the p-value is interpreted as the probability of observing

a statistic as extreme or more extreme than the one

you calculated in the real data if the null hypothesis is true.

It seems like a mouthful because it sort of is; it's a bit of a hard concept to think about. It's basically looking to see how often, in the null data, the data where we scrambled the labels, the statistic is as big as or bigger than the one we actually calculated.

A few things that a p-value is not, and that will almost certainly get you in trouble with statisticians: the p-value is not the probability that the null is true, in other words, the probability that there's no difference between the groups. It's not the probability that the alternative is true, in other words, the probability that there is a difference. And it's not a measure of statistical evidence.

If you're using any of these interpretations you're

potentially walking into a world of hurt.

So you should stick with the standard, though a little bit unwieldy, definition of what a p-value means.

So a common mistake is to misinterpret this p-value.

So here's an example from the New York Times where they're actually

trying to describe what the p-value means and you see a 0.05 there.

0.05 is the common cutoff.

If a p-value is less than 0.05, people often call it significant.

There is absolutely no reason why 0.05 is the cutoff,

other than one time a person asked one of the original developers and users of p-values,

"What would be a good cutoff?"

He said, "I guess 0.05 might be all right." But that has now propagated throughout the entire medical establishment as the defining cutoff.

In general, what happens is that people over-interpret or misinterpret p-values, and that's what gets us into a lot of trouble with statistical significance, and why you've heard claims like "most published medical research is false."