Date: 2023-04-02

In this video, we are going to discuss a different method of sampling called stratified sampling. As I said, in simple random sampling one might require a large sample to make meaningful assessments about the population. Stratified sampling, on the other hand, makes it a little easier and less resource-intensive to get good inferences about the population.

What do we do in stratified sampling? We divide the entire population into a few strata. For example, if you are interested in the number of books that a person in a city has read over a period of time, we can divide the city into different localities and sample locality by locality: a few people from locality A ("How many books have you read?"), a few people from locality B, and a few people from locality C. This way I am likely to get a good representation of each of these localities, which generally translates into a good representation of the entire population, and the sample becomes a good sample. For example, in the figure on the slide, the entire population is divided into three strata, and we can sample two people from each stratum and make excellent inferences about the population.

One of the biggest challenges in stratified sampling is the question of how to stratify the population. For example, we can divide the entire city into localities. Another way of dividing the population could be by education: do people have a high-school level of education, are they college graduates, or do they have a PhD? This could probably be a better way of stratifying the entire city. In that case, maybe we can collect a sample from each of these strata separately. However, again, care has to be taken.
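The locality-based scheme described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the lecture: the city, its three equally sized localities, the book counts, and the helper name `stratified_sample` are all invented for the example.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw `per_stratum` people at random from every stratum."""
    rng = random.Random(seed)
    # Group the population by stratum (here: by locality).
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    # Sample independently inside each stratum.
    sample = []
    for name in sorted(strata):
        members = strata[name]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical city: 100 people in each of three localities, invented book counts.
rng = random.Random(42)
book_ranges = {"A": (0, 3), "B": (2, 6), "C": (5, 12)}
city = [
    {"locality": loc, "books": rng.randint(low, high)}
    for loc, (low, high) in book_ranges.items()
    for _ in range(100)
]

# Two people per locality, as in the slide with three strata.
sample = stratified_sample(city, lambda p: p["locality"], per_stratum=2)
estimate = sum(p["books"] for p in sample) / len(sample)
print("localities sampled:", [p["locality"] for p in sample])
print("estimated mean books:", estimate)
```

Because the invented localities are the same size, a plain average over the sample is a reasonable estimate. If the strata differed in size, each stratum's average would need to be weighted by its share of the population.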
If the strata are not chosen in a smart manner, you could bring in more biases, and your sample could be less useful than it would have been under simple random sampling. However, if we are smart about stratifying the population, then one can make excellent inferences using stratified sampling, potentially with a much smaller sample size than simple random sampling would need.

The next type of sampling we are going to discuss is called convenience sampling. In some cases, when there is a resource constraint, it is not possible to do simple random sampling or even stratified sampling. In that case, convenience sampling serves as an alternative. Again, this is a type of sampling that could bring in biases, and one has to be careful before doing convenience sampling.

Let us say a person is interested in identifying the average number of books that people in a certain city read in a month. Simple random sampling would correspond to going to different houses and different places across the city and asking a random person, “How many books did you read in the last month?” In contrast, somebody might just set up a stall outside a mall and ask the visitors of the mall, “How many books did you read over the last month?” Well, this does not make everyone in the population equally likely to provide data for your study. The data is going to be a little biased towards people who go to the mall. However, if you think that going to a mall is unlikely to be related to the number of books a person has read over a month, then this type of sampling could be really useful and could give results as good as simple random sampling would. For example, in this picture, assume the people on the border are easily accessible; then one could collect data only from the people towards the border and not consider the people more towards the center.
This could give a significant reduction in costs, but we have to be careful to check whether this sample indeed serves as a good representation of the population. As an example of the type of bias that this could give, let's assume that the people have moved towards the right. For example, this is a situation where a person has set up a stall outside a library and is surveying people on how many books they have read over the last month. Well, it is convenient to put a stall outside a library; however, people who have been to a library, or are walking out of or into a library, are likely to have read more books than the average person. Which means that, while it is convenient and this type of sampling has reduced your cost of collecting information about the number of books that people have read, we have introduced a systematic bias.

Similarly, in the quality-checking scenario, it is possible that in many cases people check the quality of the material that is closest to the packaging. Maybe products that are closest to the packaging are more likely to have systematic damage. As a result, they are more likely to be consistently of inferior quality than the rest of the products. Well, it is convenient to sample the product that is closer to the packaging; however, that might bring in systematic biases. One has to be careful about how representative the sample is of the entire population; however, if cost concerns are primary, convenience sampling could be of help.
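To make the library example concrete, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration and not data from the lecture: the population size, the 10% share of library visitors, and the book-count ranges are invented.

```python
import random

rng = random.Random(7)

# Hypothetical population: about 10% visit the library and tend to read more.
population = []
for _ in range(10_000):
    visits_library = rng.random() < 0.10
    books = rng.randint(3, 10) if visits_library else rng.randint(0, 4)
    population.append({"visits_library": visits_library, "books": books})


def mean_books(people):
    """Average number of books read per person in the given group."""
    return sum(p["books"] for p in people) / len(people)


true_mean = mean_books(population)

# Simple random sample: every person is equally likely to be asked.
srs = rng.sample(population, 200)

# Convenience sample: a stall outside the library reaches only its visitors.
library_visitors = [p for p in population if p["visits_library"]]
convenience = rng.sample(library_visitors, 200)

print("population mean:        ", round(true_mean, 2))
print("simple random sample:   ", round(mean_books(srs), 2))
print("library stall (biased): ", round(mean_books(convenience), 2))
```

Under these invented numbers, the simple random sample lands close to the population mean, while the library stall overestimates it. Collecting more responses at the stall would not fix this, because the error comes from who is reachable and not from the sample size.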