When you design a new study, one of the most important things to keep in mind is the sample size that you want to collect. We see that in many studies sample sizes are too small, so I want to give you some practical advice on how to decide on a sample size for a future study. If you were to design a new study, how would you determine the sample size? Take a moment to think about this. There are different approaches, and we'll talk about them in this lecture.

What I used to do when I was doing my PhD was either use exactly the same number of participants as a previous study had used, or rely on a heuristic. In my time, it was quite common to use a sample size of 15 participants for each between-subjects condition. Nowadays this number has increased a little: I see that people very often use about 25 participants for each between-subjects condition. But is this sufficient? And if not, what should you do instead of relying on these heuristics?

First, let's realize what the problem is when you collect small samples. Small samples always have large variation, leading to very inaccurate estimates. When you perform a hypothesis test, they also lead to high Type 2 error rates, which means that you perform a study, you do a statistical test, and even when there is a true effect, the test result is not statistically significant. It's important to prevent both Type 2 errors and inaccurate estimates.

There is a nice visualization by Schönbrodt and Perugini that shows how the variation in the data decreases as the sample size becomes larger. From left to right, we see a continuously increasing sample size. We also see a black line: in this case, the black line is the effect size estimate from a study they performed, recalculated as they collected more and more data. The effect size was computed after every participant, and it's plotted as this black line. You can see that there is a huge drop in the effect size early on, and what they show is that such variation is entirely possible, because small samples have very large variation. I refer to this as sailing the seas of chaos into what Schönbrodt and Perugini call the corridor of stability. On the left you see huge waves: if you have small samples, your data is riding these huge waves, and effects can come out very large or very small. The variation is substantial. As you enter the corridor of stability, the variation becomes much less severe. There is still some variation around the true effect size, but at this point you are making very accurate estimates.

Now, we see that researchers fail to take the sample size into account when they design new studies. This is nothing new: we've known it for about 50 years, and only now is it very slowly changing. In literatures where data collection is very expensive, so that collecting a large number of participants requires a lot of money, such as neuroscience, where we put people in fMRI scanners, there is a huge power failure: studies are severely underpowered. Interestingly, people have been pointing this out for years and years, and there are even papers concluding that studies of statistical power have had no effect on the power of subsequent studies. This is slightly depressing, because here I am talking about how you should design well-powered studies, even though past research has shown that this kind of advice has had little effect on what researchers actually do.
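To get a feel for these waves, here is a minimal simulation sketch in the spirit of Schönbrodt and Perugini's figure. This is not their actual code, just an illustration assuming only numpy: two groups with a true effect of d = 0.5, with Cohen's d recomputed as the sample grows.

```python
# Sailing into the corridor of stability: draw data from two groups
# with a true effect of d = 0.5 and recompute Cohen's d as n grows.
import numpy as np

rng = np.random.default_rng(42)
true_d = 0.5          # true standardized mean difference
max_n = 250           # observations per group

group_a = rng.normal(loc=true_d, scale=1.0, size=max_n)
group_b = rng.normal(loc=0.0, scale=1.0, size=max_n)

for n in (10, 25, 50, 100, 250):
    a, b = group_a[:n], group_b[:n]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / pooled_sd
    print(f"n = {n:3d} per group: d = {d:+.2f}")
```

If you run this with different seeds, the estimates at n = 10 or n = 25 jump around far more than the estimates at n = 250. That is the corridor of stability in miniature.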
But on the other hand, we see that things are now really changing. More and more journals require what's known as a sample size justification: you need to explicitly write down why you selected this specific sample size. So I seriously think that in the future we'll see a big change in this specific area.

We see that studies in psychology often have very low statistical power; estimates average around 50%. That means that even if your hypothesis is true, you only have a 50% probability of observing a statistically significant finding. I think that if people realized this, they would seriously reconsider doing the study, because a coin-flip chance of finding an effect that is really there hardly seems worth the effort of collecting the data.

One reason for low-powered studies is that people don't really think about how to design their study, and especially the sample size of their study; they use heuristics instead. Heuristics are always wrong. You should never rely on a heuristic to plan your sample size, no matter what it recommends. Whether it says 10, 20, or 50 people, you should always use a better, more informed justification of your sample size. Nowadays, journals ask you to do this: you need to justify the sample size of a study, and the justification depends on the goal that you want to achieve. Let's consider different goals.

In this graph, we again see a correlation between two variables, with a rather wide confidence interval. In this case there are only ten observations, so the confidence interval is really wide. If we increase the sample size to 100 participants, the confidence interval becomes much narrower. If your goal is to design a study that will accurately estimate the true effect size, then choosing a sample size that allows you to achieve a certain accuracy is a good approach. So one approach is to plan for accuracy: select a sample size based on the width of the confidence interval. A classic paper on this is Maxwell, Kelley, and Rausch; I recommend reading it if you want to know how to design studies based on accuracy.

The second approach is to design a study so that it will have a specific statistical power. We already saw this graph earlier in the course: in this simulation we have 50% power, which means that 50% of the p-values end up in the leftmost bin. Now, you might think this is too low and want to design a study with much higher power, for example 95% power. So you can design a study where you plan for power: select the sample size based on the probability of observing a p-value smaller than 0.05. In this case, we perform an a priori power analysis based on an expected effect size. If we look at the power for an independent t-test with a Cohen's d of 0.5, we can see that the sample size you need to reach substantial power can become quite large. Very often, when people start to perform power analyses, their first complaint is that the recommended sample size is much larger than they think is necessary. People seem to have an intuitive feel for the sample size needed to achieve about 50% power, and as we saw, the literature mainly contains studies with about 50% power. So in the beginning, you might be surprised by how many people you need: to have about 95% power in this example, you would need a sample size of about a hundred people in each condition. That is quite a lot more than the 15 participants per between-subjects condition I used when I was doing my PhD, so be prepared for somewhat larger sample sizes.

When you base your power analysis on an effect size from the published literature, be very careful. We know that publication bias inflates effect size estimates in the literature, so you always want to be a little more conservative. A power analysis should be a starting point for deciding on the sample size of a future study, not the end point: always be informed by more than the power analysis itself. And if you perform a power analysis, use unbiased effect size estimates, such as Hedges' g, epsilon squared, or omega squared.

If effect sizes are very uncertain, because there is not a lot to go on in the published literature, or there is nothing comparable in the published literature, then sequential analyses are one way to look at the data as it comes in while controlling your Type 1 error rate. So if there is huge uncertainty, but you still want to achieve a statistically significant result, then instead of doing an a priori power analysis you can use sequential analysis: collect data until you find a statistically significant effect, or until you don't want to collect more participants, testing at corrected significance thresholds along the way.

Now, instead of planning for accuracy or power, it's perfectly fine to plan for feasibility. You don't have infinite resources, unlimited time or money, and sometimes the population you want to collect data from is very, very difficult to reach. In these cases, the feasibility of collecting a large sample is limited by how easy it is to find participants. So it's perfectly fine to select the sample size based on the time, the money, or the participants that you have available. This can also be a way to plan a study: it's simply the best that you can do given the resources that you have, and that's perfectly fine. Of course, in these situations the statistical power might be low, and then you might want to just estimate the effect size instead of performing a statistical test.

You can also use Bayesian statistics. In Bayesian statistics, you don't have to specify the sample size before you start: you can collect data and continue data collection for as long as you want. As a quote by Edwards, Lindman, and Savage puts it, "it is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience." Designing a study based on statistical power is a frequentist approach to determining the sample size, but in Bayesian statistics you just quantify the evidence that is in the data, and if you're not happy with the evidence you have, you can always collect more. So Bayesian statistics is much more flexible: you can determine the sample size any way you want. On the other hand, there is no easy way to control Type 1 errors. That's not to say they are entirely uncontrolled, but it is much more difficult to quantify Type 1 error rates directly.
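Let me make these approaches concrete with a few small sketches. First, planning for accuracy. This is a minimal sketch, assuming numpy and scipy, of how the 95% confidence interval around a correlation (here an arbitrary example value of r = .30) narrows when you go from 10 to 100 observations, using the standard Fisher z-transformation.

```python
# Planning for accuracy: CI width around a correlation at two sample sizes.
import numpy as np
from scipy import stats

def correlation_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = np.arctanh(r)                            # Fisher's z
    se = 1 / np.sqrt(n - 3)                      # standard error of z
    crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    return np.tanh(z - crit * se), np.tanh(z + crit * se)

for n in (10, 100):
    lo, hi = correlation_ci(r=0.30, n=n)
    print(f"n = {n:3d}: 95% CI [{lo:+.2f}, {hi:+.2f}]")
# n =  10: 95% CI [-0.41, +0.78]  -- far too wide to be informative
# n = 100: 95% CI [+0.11, +0.47]
```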
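Second, planning for power. Assuming the statsmodels library is available, the a priori power analysis for the independent t-test example above is essentially a one-liner, and it returns the roughly one hundred participants per group mentioned earlier.

```python
# A priori power analysis for an independent t-test.
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,           # expected Cohen's d
    alpha=0.05,                # Type 1 error rate
    power=0.95,                # desired statistical power
    alternative="two-sided",
)
print(f"required sample size: {math.ceil(n_per_group)} per group")  # 105
```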
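Third, sequential analysis. Here is a minimal simulation sketch, assuming scipy: under a true null hypothesis, we look at the data three times at equally spaced sample sizes, and at each look we test against the Pocock-corrected threshold of p < .0221 instead of p < .05.

```python
# Sequential analysis under H0: three interim looks with a Pocock boundary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
looks = (20, 40, 60)        # sample size per group at each look
pocock_alpha = 0.0221       # Pocock boundary for 3 looks, overall alpha = .05

n_sims = 10_000
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(size=looks[-1])   # both groups from the same
    b = rng.normal(size=looks[-1])   # distribution: H0 is true
    for n in looks:
        if stats.ttest_ind(a[:n], b[:n]).pvalue < pocock_alpha:
            false_positives += 1     # stop at the first significant look
            break

print(f"overall Type 1 error rate: {false_positives / n_sims:.3f}")  # ~ .05
```

Testing every look at an uncorrected .05 would push the overall error rate well above 5%; the corrected boundary is what keeps sequential looks honest.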
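Finally, the Bayesian approach. As a deliberately simple, self-contained illustration of "collect until the evidence is convincing" I'll use a coin-flip example rather than a t-test: H0 says the coin is fair, H1 puts a uniform prior on the bias, and the Bayes factor is updated after every flip until it passes 10 or 1/10.

```python
# Bayesian optional stopping: flip a coin until the evidence is convincing.
import numpy as np
from scipy.special import betaln   # log of the Beta function

rng = np.random.default_rng(3)
true_theta = 0.7                   # the coin is actually biased
heads, n = 0, 0

while True:
    heads += int(rng.random() < true_theta)
    n += 1
    # log marginal likelihoods of the observed sequence
    log_ml_h1 = betaln(heads + 1, n - heads + 1)   # uniform prior on theta
    log_ml_h0 = n * np.log(0.5)                    # point null: theta = 0.5
    bf10 = np.exp(log_ml_h1 - log_ml_h0)
    if bf10 >= 10 or bf10 <= 1 / 10 or n >= 1000:
        break

print(f"stopped after n = {n} flips, BF10 = {bf10:.1f}")
```

Note that even under optional stopping a Bayes factor can occasionally end up convincingly in the wrong direction, which is exactly the Type 1 error concern raised above: the evidence is quantified, but the error rate is not directly controlled.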
Nowadays, it's becoming increasingly important to justify the sample size of the study that you are designing. In the past, we've seen that people designed underpowered studies yielding inaccurate estimates, and in the future this really needs to change. The sample size is a very important aspect of the design of a new study. Don't ignore it.