0:22

Before we go too much more into the discussion

of the concept, we're actually going to use

a practice problem to illustrate what we're talking about.

So think about two scenarios.

All else held equal, meaning everything else is the

same about these two scenarios however, in one scenario

we have a sample size of 100 and in the other scenario we have a sample size of 10,000.

The question is: will the p-value be lower if n is equal to 100 or n is equal to 10,000?

You can think about this as two separate hypothesis tests.

We have the same null hypothesis and the same alternative hypothesis.

Our sample means are exactly the same.

Obviously, our null values are the same because they're driven by the hypotheses.

And our standard deviations are exactly the same

as well, but the sample sizes are different.

So which one of these sample sizes is going to yield a lower p-value?

1:15

All right, so if the sample size is high, then what that's going to affect first and foremost is the standard error.

For the sample mean case, for example, we find the standard error as s divided by the square root of n.

So, if you increase your sample size, your standard error is going to shrink.
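As a quick sketch of that relationship (the standard deviation of 10 here is a made-up number, not one from the lecture):

```python
from math import sqrt

s = 10.0  # hypothetical sample standard deviation, for illustration only

# SE = s / sqrt(n): quadrupling n halves the standard error.
for n in (100, 400, 10_000):
    print(n, s / sqrt(n))  # SE comes out as 1.0, then 0.5, then 0.1
```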

The standard error within a hypothesis testing framework

shows up in the denominator of our test statistic.

When we're calculating the test statistic,

for example, say the z statistic, we take our

point estimate, say our sample mean, minus our null value.

That comes from the null hypothesis.

And we divide it by the standard error.

So if I increase n, then we said that the standard error is going to go down.

And if your denominator goes down, your test statistic is going to go up.

The test statistic going up basically means that if you

are thinking about the standard normal curve, your z scores

are going to be closer to the ends of the tails as opposed to closer to the center.

And if your z scores are actually closer to the ends of the tails, then the p-values, which are those tail areas, are going to get smaller and smaller.

Meaning that if you increase n, the standard error decreases and our test statistic increases, which results in our p-value decreasing.

So the answer here

is going to be n is equal to 10,000.

So we can also illustrate this point mathematically.

Let's make up some data.

We're going to say that our x bar is 50, our sample standard deviation is 2.

Our null hypothesis is that mu is equal to 49.5,

and the alternative hypothesis is that mu is greater than 49.5.

The null value that I've chosen here is intentionally very close to the sample mean that we're using, and we're going to talk about that in a moment.

3:01

So, if I want to calculate the z score when the sample size is equal to 100, my z score calculation would go as the sample mean, 50, minus the null value, 49.5, divided by the standard error, which is the standard deviation, 2, divided by the square root of n.

And I can work through the math of this, and the z score turns out to be 2.5.

If on the other hand I'm calculating it for the sample size of

10,000, everything stays the same except for the calculation of the standard error.

And going through the mathematical calculations over

there gives me a z score of 25.
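Both calculations can be verified with a short script. The lecture only gives the z scores; computing the one-sided p-value with the standard-normal survival function (via `math.erfc`) is my addition:

```python
from math import sqrt, erfc

def z_score(xbar, mu0, s, n):
    """Test statistic: (point estimate - null value) / standard error."""
    return (xbar - mu0) / (s / sqrt(n))

def upper_tail_p(z):
    """One-sided p-value P(Z > z) under the standard normal curve."""
    return 0.5 * erfc(z / sqrt(2))

# Lecture's numbers: x-bar = 50, s = 2, H0: mu = 49.5, HA: mu > 49.5
z_small = z_score(50, 49.5, 2, 100)     # 2.5
z_large = z_score(50, 49.5, 2, 10_000)  # 25.0
print(z_small, upper_tail_p(z_small))   # p is about 0.006
print(z_large, upper_tail_p(z_large))   # p is essentially zero
```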

So in the first scenario we have a z score of 2.5.

That's not a small value; it's close to the tails, but not all the way out there.

Versus in the second scenario, with a z score of 25, our p-value is bound to be something very tiny, approximately zero.

We never claimed that the p-value is equal to zero.

But with a z score of 25, we know, using the 68-95-99.7% rule, that almost all of the observations under the normal curve lie within three standard deviations of the mean, so a z score of 25 basically means a p-value of approximately zero.

Or in other words, a highly statistically significant finding.

However, is it practically significant?

When we're thinking about practical significance, we focus on the effect size.

And remember, we define the effect size as the difference between your point estimate and your null value.

In the calculation of the test statistic, that would be the numerator.

In both instances we have the same exact effect size.

And it's a small effect size, to be fair.

But even though we have a small effect size, which may not be practically significant, we are able to find a statistically significant result simply by inflating our sample size.

And remember that the sample size is something

the researcher has control over, because after all,

you get to decide how many observations you want to sample.

Sure, there's going to be a bound based on how many resources you have, but at the end of the day that's the human-controlled part of a study.

So, when you see highly statistically significant results, make sure that you have a critical eye, and make sure that you also inquire whether the effect size is reported and what the sample size is as well.

And not only should you inquire about these, but if you are reporting highly statistically significant results yourself, it's always a good idea to let your readers know your effect size and your sample size, so that the discussion makes clear whether a statistically significant finding is also practically significant or not.

So to summarize: real differences between the point estimate and the null value are easier to detect with large samples.

However, very large samples will result in statistical significance even for tiny differences between the sample mean and the null value (our effect size), even when the difference is not practically significant.

So in order to make sure that your findings don't suffer from this problem of being statistically significant but not practically significant, oftentimes what we do is some a priori analysis, before actually doing the data collection, to figure out, based on characteristics of the variable you are studying, how many observations to collect.

So it is highly recommended that researchers, either by themselves or by consulting with statisticians, figure out how many observations to sample before they actually go and do that.
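As a sketch of what that a priori calculation can look like for this one-sided z test, here is the textbook sample-size formula n = ((z_alpha + z_beta) * s / delta)^2, using the lecture's s = 2 and effect size 0.5; the 5% significance level and 80% power are conventional choices I'm assuming, not values from the lecture:

```python
from math import ceil
from statistics import NormalDist

def sample_size(s, delta, alpha=0.05, power=0.80):
    """Smallest n that detects a true difference of delta with the given
    power, in a one-sided z test at significance level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile for desired power
    return ceil(((z_alpha + z_beta) * s / delta) ** 2)

# With s = 2 and an effect size of 0.5, about a hundred observations
# are enough; collecting 10,000 buys nothing of practical value.
print(sample_size(s=2, delta=0.5))  # 99
```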

Because the last thing you want is to find out, after you have already put in the resources to collect some data, that you either don't have enough or have too many observations.

This brings to mind a quote from a famous statistician, R.A. Fisher.

"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination. He may be able to say what the experiment died of."