0:34

There are different approaches to doing this, and we'll talk about them in this lecture.

What I used to do when I was doing my PhD was either use exactly the same number of participants as a previous study, or rely on a heuristic. In my time, it was quite common to use a sample size of 15 participants for each between-subjects condition. Nowadays, this number has increased a little bit: I see that people very often use about 25 participants for each between-subjects condition. But is this sufficient? And if not, what should you do instead of relying on these heuristics?

1:11

Now first, let's realize what the problem is if you collect small samples. Small samples always have large variation, leading to very inaccurate estimates. But when you perform a hypothesis test, they also lead to high Type 2 error rates. This means that you perform a study and run a statistical test, and even when there is a true effect, the test result is not statistically significant. It's important to prevent both of these problems: Type 2 errors and inaccurate estimates.
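To make this concrete, here is a small simulation (my own sketch, not from the lecture; the true effect of Cohen's d = 0.5 and the heuristic of 15 participants per group are assumptions) that estimates how often a t-test would miss a real effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_per_group = 15   # the old heuristic sample size
true_d = 0.5       # assumed true effect size (a "medium" effect)
n_sims = 5000

significant = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < 0.05:
        significant += 1

power = significant / n_sims
print(f"Power: {power:.2f}, Type 2 error rate: {1 - power:.2f}")
```

Under these assumptions the test misses the real effect in the large majority of studies, which is exactly the high Type 2 error rate described above.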

1:42

This is a nice visualization by Schonbrodt and Perugini, who show how the variation in the data decreases as the sample size becomes larger. From left to right, we see a continuously increasing sample size. We also see a black line. In this case, the black line is the effect size estimate from a study they performed as they collected more and more data. So the effect size was calculated after every participant, and it's plotted here as a black line. You can see that there's a huge drop in the effect size. What they show is that such variation is possible, because small samples have very large variation. I refer to this as sailing the seas of chaos into what Schonbrodt and Perugini named the corridor of stability.

On the left you see huge waves. If you have small samples, your data is riding these huge waves: effects can be very large or very small, and the variation is substantial. As you enter the corridor of stability, as Schonbrodt and Perugini call it, you see that the variation becomes much less severe. There's still some variation around the true effect size, but at this point you're making very accurate estimates.
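This corridor of stability is easy to reproduce in a quick simulation (my own sketch; the true correlation of 0.3 and the sample sizes are arbitrary assumptions): re-estimate the correlation after every additional participant and watch the estimate settle down.

```python
import numpy as np

rng = np.random.default_rng(1)

true_r = 0.3
n_max = 500
cov = [[1.0, true_r], [true_r, 1.0]]
data = rng.multivariate_normal([0.0, 0.0], cov, size=n_max)

# Re-estimate the correlation after every additional participant,
# starting once we have a handful of observations.
estimates = [np.corrcoef(data[:n, 0], data[:n, 1])[0, 1]
             for n in range(5, n_max + 1)]

early = estimates[:50]    # the "seas of chaos"
late = estimates[-50:]    # the "corridor of stability"
print(f"Early estimates range: {max(early) - min(early):.2f}")
print(f"Late estimates range:  {max(late) - min(late):.2f}")
```

The early estimates swing wildly around the true value, while the late estimates barely move, which is the pattern in the Schonbrodt and Perugini figure.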

Now, we see that researchers fail to take the sample size into account when they design new studies. This is nothing new: we've known about it for about 50 years, and only now is it very slowly changing. In certain literatures where data collection is very expensive, such as neuroscience, where we put people in fMRI scanners and collecting data from a large number of people requires a lot of money, we see that there's really a huge power failure. Studies are severely underpowered.

Interestingly, people have been saying this for years and years, and there are even papers arguing that studies of statistical power don't have any effect on the power of studies. This is slightly depressing, because here I'm talking about how you should design well-powered studies, even though past research has shown that this does not have much effect on what researchers actually do. On the other hand, we see that things are now really changing. More and more journals are requiring what's known as a sample size justification: you need to explicitly write down why you selected this specific sample size. So I seriously think that in the future we'll see a big change in this specific area.

We see that studies in psychology often have very low statistical power. Estimates average around 50%, which means that even if you have a true hypothesis, you only have a 50% probability of actually observing a statistically significant finding. I think that if people realized this, they would seriously reconsider doing the study, because a 50% probability of finding an effect when it's there doesn't seem worth all the effort of collecting the data.

4:39

Now, one reason for low-powered studies is that people don't really think about how to design their study, and especially about its sample size; they use heuristics instead. Heuristics are always wrong. You should never rely on a heuristic to plan your sample size, no matter what that heuristic recommends. Whether it's 10, 20, or 50 people, you should always use a better approach and a more informed justification of your sample size. Nowadays, journals ask you to do this: you need to justify the sample size of a study. The justification depends on the goal that you want to achieve, so let's consider different goals.

In this graph, we again see a correlation between two variables, with a rather wide confidence interval. In this case there are only ten observations, so the confidence interval is really wide. If we increase the sample size to 100 participants, we see that the confidence interval becomes much narrower. If your goal is to design a study that will accurately estimate the true effect size, then choosing a sample size that allows you to achieve a certain accuracy is a good goal. So one approach is to plan for accuracy: select a sample size based on the width of the confidence interval. A classic paper on this is Maxwell, Kelley, and Rausch, which I recommend reading if you want to know how to design studies based on accuracy.
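As a sketch of what planning for accuracy can look like (my own illustration; the target correlation of 0.3 is an arbitrary assumption), the width of a confidence interval for a correlation can be approximated with the Fisher z-transformation, and you can pick the sample size that makes it acceptably narrow:

```python
import numpy as np
from scipy import stats

def correlation_ci(r, n, conf=0.95):
    """Approximate CI for a correlation via the Fisher z-transform."""
    z = np.arctanh(r)                 # r -> z space
    se = 1.0 / np.sqrt(n - 3)         # standard error in z space
    crit = stats.norm.ppf(1 - (1 - conf) / 2)
    return np.tanh(z - crit * se), np.tanh(z + crit * se)

for n in (10, 100, 1000):
    lo, hi = correlation_ci(0.3, n)
    print(f"n={n:4d}: 95% CI [{lo:+.2f}, {hi:+.2f}], width {hi - lo:.2f}")
```

With ten observations, the interval spans most of the plausible range and includes zero; at n = 100 it becomes much narrower, matching the graphs described above.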

The second approach is to design a study so that it will have a specific statistical power. We already saw this graph earlier in the course. In this simulation, we have 50% power, which means that 50% of the p-values end up in the leftmost bin. Now, you might think this is too low, and you want to design a study with much higher power, for example, 95% power. So you can design a study where you plan for power: you select the sample size based on the probability of observing a p-value smaller than 0.05. In this case, we would do an a priori power analysis based on an expected effect size.
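An a priori power analysis like this can be done in dedicated tools such as G*Power, or in code; here is a sketch using the `statsmodels` package (the expected effect of d = 0.5 is an assumption for illustration):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for an independent t-test,
# given an expected effect size, alpha, and desired power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.95)
print(f"Needed per group for 95% power: {n_per_group:.0f}")

# Conversely: the power you actually had with the old heuristic of n = 15.
achieved = analysis.power(effect_size=0.5, nobs1=15, alpha=0.05)
print(f"Power with n = 15 per group: {achieved:.2f}")
```

The same `solve_power` call can solve for any one missing quantity (sample size, effect size, alpha, or power) given the other three.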

6:56

If we take a look at the power for an independent t-test with a Cohen's d of 0.5, we can see that the sample size you need to achieve substantial power can become quite large. Very often, when people start to perform power analyses, their first complaint is that the recommended sample size is much larger than they think is necessary. People seem to have an intuitive feel for the sample size they need to achieve 50% power; as we saw, the literature mainly contains studies with about 50% power. So in the beginning, you might be surprised by how many people you need. If you want to have about 95% power, you would need a sample size of about 100 people in each condition, which is quite a lot more than the 15 participants in each between-subjects condition that I used when I was doing my PhD. So be prepared for slightly larger sample sizes.

7:52

Now, when you base your power analysis on an effect size from the published literature, be very careful. We know that publication bias inflates effect size estimates in the literature, so you always want to be a little bit more conservative. A power analysis should be a starting point for determining the sample size of a future study, not the end point. So always be informed by more than the power analysis itself.

8:30

If effect sizes are uncertain, there's not a lot to go on in the published literature, or there's nothing comparable in the published literature, then sequential analyses are one way to let you look at the data as it comes in while controlling your Type 1 error rate. So if there is huge uncertainty, but you still want to achieve a statistically significant result, that's the goal, then instead of doing an a priori power analysis, you can use sequential analysis and collect data until you find a statistically significant effect, or until you no longer want to collect more participants.
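The error control is what makes this legitimate: naively peeking at the data after every batch inflates the Type 1 error rate, while a sequential design lowers the alpha at each look to compensate. A small simulation (my own sketch, using the Pocock boundary of roughly 0.0158 for five looks) shows both:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n_sims, looks, batch = 2000, 5, 10   # test after every 10 per group
alpha_naive = 0.05
alpha_pocock = 0.0158                # Pocock-corrected alpha for 5 looks

hits_naive = hits_pocock = 0
for _ in range(n_sims):
    # No true effect: both groups drawn from the same distribution.
    a = rng.normal(size=looks * batch)
    b = rng.normal(size=looks * batch)
    ps = [stats.ttest_ind(a[: (k + 1) * batch], b[: (k + 1) * batch]).pvalue
          for k in range(looks)]
    hits_naive += min(ps) < alpha_naive    # stop at first "significant" look
    hits_pocock += min(ps) < alpha_pocock

print(f"Type 1 error, naive peeking:   {hits_naive / n_sims:.3f}")
print(f"Type 1 error, Pocock boundary: {hits_pocock / n_sims:.3f}")
```

With uncorrected repeated testing, the false positive rate climbs well above the nominal 5%, while the corrected boundary keeps it near 5%.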

9:05

Now, instead of planning for accuracy or power, it's perfectly fine to plan for feasibility. You don't have infinite resources, an infinite amount of time or money, and sometimes the population you want to collect data from is very, very difficult to reach. In these cases, the feasibility of collecting a large sample is limited by how easy it is to find participants. So it's perfectly fine to select the sample size based on the time, the money, or the participants that you have available. This can also be a way to plan a study: it's just the best that you can do given the resources that you have, and that's perfectly fine. Of course, in these situations the statistical power might be low, but in that case you might just want to estimate the effect size instead of performing a statistical test.

9:59

You can also use Bayesian statistics. In Bayesian statistics, you don't have to specify the sample size before you start; you can collect data and continue data collection for as long as you want. As a quote by Edwards, Lindman, and Savage says, "it is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience." Designing a study based on statistical power is a frequentist approach to determining the sample size. But in Bayesian statistics, you just quantify the evidence that's in the data, and if you're not happy with the evidence you have, you can always collect more data. So Bayesian statistics is much more flexible: you can determine the sample size any way you want. On the other hand, there is no easy way to control Type 1 errors. That's not to say the Type 1 error rate is entirely uncontrolled, but it is much more difficult to quantify directly.
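As an illustration of this flexibility (my own sketch, not the lecture's method), a rough Bayes factor for a two-group comparison can be computed from the BIC approximation described by Wagenmakers (2007) and monitored as the data come in, stopping once the evidence is strong in either direction:

```python
import numpy as np

def bf10_bic(x, y):
    """Rough Bayes factor (alternative vs. null) for a two-group mean
    difference, via the BIC approximation BF01 ~ exp((BIC1 - BIC0) / 2)."""
    n = len(x) + len(y)
    pooled = np.concatenate([x, y])
    sse0 = np.sum((pooled - pooled.mean()) ** 2)                      # one mean
    sse1 = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)  # two means
    bic0 = n * np.log(sse0 / n) + 1 * np.log(n)
    bic1 = n * np.log(sse1 / n) + 2 * np.log(n)
    return np.exp((bic0 - bic1) / 2)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 1000)   # pre-generate; "collect" in batches below
y = rng.normal(0.8, 1.0, 1000)   # assumed true effect of d = 0.8

# Keep collecting until the evidence is strong either way, or until we
# run out of time, money, or patience (here: 200 per group).
n = 10
while n < 200:
    bf = bf10_bic(x[:n], y[:n])
    if bf > 10 or bf < 1 / 10:
        break
    n += 10
print(f"Stopped at n = {n} per group, BF10 = {bf:.1f}")
```

Because the Bayes factor quantifies evidence rather than controlling an error rate, this stopping rule is entirely legitimate from the Bayesian perspective, which is exactly the point of the Edwards, Lindman, and Savage quote.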

Nowadays, it's becoming increasingly important to justify the sample size of a study that you are designing. In the past, we've seen that people designed underpowered and inaccurate studies, and in the future this really needs to change. The sample size is a very important aspect of the design of a new study. Don't ignore it.
