The goal of sampling is to obtain the best possible estimate of a population value within the limits of our budget and our time. Suppose we've decided on a sampling method for our study, preferably a probability sampling method if this is at all possible. The question that remains is how many elements we need to sample in order to get an accurate estimate of the population value. An easy answer would be: as large a sample as we can afford, because as sample size increases, the margin of error decreases. Accidental over- or under-representation of certain elements will be less extreme and less likely. In other words, a bigger sample is always better in terms of accuracy.

But this doesn't mean we should all collect samples consisting of tens of thousands of elements, because as the sample size grows, the decrease in the margin of error becomes smaller and smaller. At a certain point, the cost of collecting more elements outweighs the decrease in the margin of error. Say we want to estimate the proportion of votes for candidate A in an upcoming election, and suppose we have a sample of 500 eligible voters. The error won't be cut in half if we double the sample to 1,000 elements; the decrease will be much smaller, because the margin of error shrinks with the square root of the sample size.

Note that it's the absolute size of the sample that matters, not the relative size. It doesn't matter whether we're estimating election results in Amsterdam, with slightly more than half a million eligible voters, or national elections with more than 13 million voters. As long as both samples are randomly selected, the margin of error will be the same, all other things being equal. This seems very counterintuitive, but it's true nonetheless.

Of course, there are other factors to consider when deciding on sample size. The variability of the population is an important one. Heterogeneity, or strong variation in the population on the property of interest, results in a larger margin of error, all other things being equal.
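The diminishing returns described above can be sketched with the standard normal-approximation formula for the margin of error of a proportion. This is a minimal illustration, not part of the original example's calculations; the value p = 0.5 is the worst-case (maximum-variability) assumption often used for election polls.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst-case variability: p = 0.5 maximizes p * (1 - p).
moe_500 = margin_of_error(0.5, 500)    # roughly +/- 4.4 percentage points
moe_1000 = margin_of_error(0.5, 1000)  # roughly +/- 3.1 percentage points

# Doubling the sample shrinks the error by a factor of 1/sqrt(2), about 0.71,
# not by half: the margin of error scales with 1 / sqrt(n).
print(round(moe_1000 / moe_500, 2))  # → 0.71
```

Note that the population size never appears in the formula, which is exactly why the absolute sample size, not the relative size, determines the margin of error (for populations much larger than the sample).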
If values in the population vary widely, then a sample is more likely to accidentally over- or underestimate the true population value. If the population is more homogeneous, meaning the property of interest takes on a narrow, limited set of values, then the sample value will automatically lie closer to the population value. If a population is more homogeneous, we can sample more efficiently: all other things being equal, we can achieve a smaller margin of error with the same sample size, or, conversely, obtain the same margin of error with a smaller sample.

If a probability sampling method is used, we can determine what margin of error we're willing to accept at a given confidence level. We might say that we want our sample estimate of election results to differ by no more than 5% from the final results in 95% of cases if we were to sample repeatedly. We, or rather a computer, can then calculate exactly what sample size we need to obtain this margin of error at this confidence level. This does require that we use random sampling and that we can estimate the variability in the population, for example based on previous studies, old census data, or just a best guess if necessary.

I'll mention one other important factor to consider when determining the sample size: it's a good idea to plan ahead and compensate for non-response. Non-response refers to elements in the sample that cannot be contacted, refuse to participate, fail to complete the study, or provide invalid responses. If the response rate can be estimated from previous or comparable research, we can take non-response into account and sample extra elements to compensate for the expected loss of elements due to non-response.