There are many sophisticated probability sampling methods. I'll discuss two methods that go beyond the basic idea of random sampling, but are still relatively simple. These are stratified random sampling and multi-stage cluster sampling.

In stratified random sampling, we divide the population into mutually exclusive strata. We sample from each stratum separately, using simple random sampling. The separately sampled elements are added together to form the final sample. Stratified random sampling is useful for two reasons. First, it allows us to ensure that, at least in terms of the sampled strata, our sample is representative. This means sub-populations are represented in the sample in exactly the same proportion as they appear in the population. With simple random sampling, we can expect the sample to be representative in the long run, but due to chance, in any particular sample, strata might be over- or underrepresented. Second, stratification is useful because it can make sampling more efficient. This means, all other things being equal, that we achieve a smaller margin of error with the same sample size. Stratifying only increases efficiency if the strata differ strongly from each other relative to the differences within each stratum.

Imagine we want to sample the quality of cat food produced on an assembly line. The line produces cat food made with fish and cat food made with beef. Suppose the average quality of beef cat food is higher than that of fish cat food. Also, the quality varies relatively little when we consider each type of food separately. Under these circumstances, we will obtain a more accurate estimate of the population's food quality if we stratify on food type. This is because quality is related to food type. Even a small overrepresentation of one food type can distort our overall estimate of food quality. Stratifying prevents this distortion.
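To make the procedure concrete, here is a minimal Python sketch of stratified random sampling in which each stratum is sampled in proportion to its size. Everything in it is an assumption for illustration: the function name `stratified_sample`, the population of 700 beef and 300 fish items, and the quality scores are made up and are not data from the lecture.

```python
import random


def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a simple random sample within each stratum, sized in
    proportion to the stratum's share of the population."""
    rng = random.Random(seed)

    # Divide the population into mutually exclusive strata.
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)

    # Sample each stratum separately, then add the sub-samples together.
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from sample_size
        # when the stratum shares do not divide evenly.
        n = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical population: (food type, quality score) pairs.
data_rng = random.Random(0)
population = (
    [("beef", data_rng.gauss(8.0, 0.5)) for _ in range(700)]
    + [("fish", data_rng.gauss(6.0, 0.5)) for _ in range(300)]
)

sample = stratified_sample(population, lambda item: item[0], 100, seed=1)
estimate = sum(quality for _, quality in sample) / len(sample)
print(len(sample), estimate)
```

With these made-up numbers, every sample of 100 contains exactly 70 beef items and 30 fish items, so the food types can never be over- or underrepresented by chance.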
If the quality does not differ between food types, then overrepresentation of one food type will not distort the overall estimate, and stratification will not improve efficiency. It is important to realize that stratified sampling requires that we know which stratum each element belongs to. If we can identify strata, then we also know their size. As a consequence, the size of our sub-samples does not have to correspond to the size of the strata. We can calculate a representative estimate by weighting the sub-samples according to stratum size. Why would we do this? Well, suppose our stratum of fish cat food is relatively small or is known to vary strongly in quality. In both cases, our estimate of the quality of fish cat food might be much less likely to be accurate than that of beef cat food. It might be worth it to take a bigger sample of fish cat food so we have a better chance of getting an accurate estimate. Of course, this means overrepresenting fish cat food. We can correct for this overrepresentation by weighting the separate estimates of fish and beef cat food according to their stratum sizes when combining them into an overall estimate of food quality. This way, the sample value is representative, efficient and more likely to be accurate.

Let's turn to multi-stage cluster sampling, the final type of random sampling I want to discuss. Multi-stage cluster sampling allows us to use random sampling without going bankrupt. Consider sampling frames that consist of all inhabitants, students or eligible voters in a certain country. If we were to randomly select elements from these frames, we would have to travel all over the country. In most cases, this is just too expensive. A solution is to randomly sample in stages by first selecting clusters of elements. Say we want to sample math performance in the population of all Dutch students currently in their third year of secondary education. We start by forming a sampling frame of all school districts.
This is the first stage, where students are clustered in districts. We randomly select a very small sample of school districts. We can use stratification to make sure we include districts in urban and rural areas. In the second stage, we randomly select schools from the previously selected districts. Students are now clustered in schools. In the third stage, third-year math classes are randomly sampled from the previously selected schools. We can even include a fourth stage where students are randomly sampled from the previously selected classes. Stratification can be used in all of these stages.

Multi-stage cluster sampling makes random sampling feasible. But the margin of error is harder to determine because the probability of being included in the sample is no longer the same for all elements, as it was with simple random sampling. Also, cluster sampling is usually associated with a larger margin of error, even if stratified sampling is used to increase efficiency. However, these disadvantages are generally more than outweighed by the reduction in cost and effort.
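The four stages described above can be sketched in Python as nested rounds of simple random sampling. This is only an illustration under stated assumptions: the function name `multistage_sample`, the nested-dictionary sampling frame, the identifiers, and the counts (10 districts, 5 schools per district, 3 classes per school, 20 students per class) are all hypothetical, and stratification within stages is left out to keep the sketch short.

```python
import random


def multistage_sample(districts, n_districts, n_schools, n_classes,
                      n_students, seed=None):
    """Sample districts, then schools within them, then classes,
    then students, using simple random sampling at every stage."""
    rng = random.Random(seed)
    sampled = []
    # Stage 1: randomly select districts.
    for district in rng.sample(sorted(districts), n_districts):
        schools = districts[district]
        # Stage 2: randomly select schools within each selected district.
        for school in rng.sample(sorted(schools), min(n_schools, len(schools))):
            classes = schools[school]
            # Stage 3: randomly select classes within each selected school.
            for cls in rng.sample(sorted(classes), min(n_classes, len(classes))):
                students = classes[cls]
                # Stage 4: randomly select students within each selected class.
                sampled.extend(rng.sample(students, min(n_students, len(students))))
    return sampled


# Hypothetical frame: district -> school -> class -> list of student ids.
frame = {
    f"D{d}": {
        f"S{s}": {
            f"C{c}": [f"D{d}-S{s}-C{c}-P{p:02d}" for p in range(20)]
            for c in range(3)
        }
        for s in range(5)
    }
    for d in range(10)
}

students = multistage_sample(frame, n_districts=2, n_schools=2,
                             n_classes=1, n_students=5, seed=42)
print(len(students), sorted({s.split("-")[0] for s in students}))
```

In this toy frame, the 20 sampled students come from only 2 districts and 4 schools, which is where the savings in travel and effort come from. Because clusters here are all the same size, every student still has the same inclusion probability; with clusters of unequal size that would no longer hold, which is one reason the margin of error is harder to determine.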