Date: 2023-04-02

So far, we have talked about simple random sampling, which is one of the most expensive types of sampling, followed by stratified sampling, which could introduce some bias but could be a little more cost-effective than simple random sampling. Then we talked about convenience sampling, which has the potential for even more bias but could give you even more cost savings. Now we are going to talk about voluntary sampling, which could bring even more bias than convenience sampling but is even more cost-effective.

In voluntary sampling, the statistician does not go to different people and ask each of them for data. Instead, a question is generally posed in a common location: perhaps on a notice board asking people to mail in their reply, on your website asking people to fill in a form, or through an email distribution list asking people to fill in a survey. Here, people volunteer information, which can then be used to make inferences about the population. If there are no biases, that is, if there is no incentive or disincentive to disclose the information or answer the survey, this is going to be amazing, because it is going to be as random as simple random sampling.

But again, how can biases creep into voluntary sampling? For example, let us go back to the same old question of a person who wants to find the number of books a person in a city reads on average. Now, suppose I post the question “How many books did you read in the last month?” and expect people to volunteer the information. Who are the people who are more likely to answer the question? Well, I would expect that bibliophiles, who like to read a lot of books, are more likely to answer this question. Why?
Because there is a positive social notion associated with reading books and a negative one associated with not reading them. People who read a lot of books and love to brag about it are more likely to fill in your survey, while people who do not like reading are probably not interested in filling in the survey and admitting that they have not read a book. You have introduced systematic bias. Similarly, with the income question, people in an extremely low income bracket may not be happy to disclose their income and look poor, and people in an extremely high income bracket may also not want to disclose their income, so as not to make others envious. In all these cases, we might introduce systematic bias into the sample, and the sample can turn out to be not at all representative of the population. However, if your question is neutral and not likely to bring in any of these biases, then voluntary sampling can be a really cost-effective method of getting data from the population.

Finally, we are going to talk about snowball sampling, which is another way of sampling. This has little to do with cost-effectiveness and more to do with disclosing information. For example, let us say a person is interested in doing a study among drug users. There is a strong social stigma against revealing who is a drug user, or you might not even know where the population is. You may not even know who the drug users, past or present, are from whom you could collect data. The way sampling is done in this context is called snowball sampling. Again, this is subject to extreme bias, but it is the only way to sample in some cases where people are not willing to reveal whether they belong to the population or not. Snowball sampling works as follows: you typically begin with one person. In this case, you typically begin with a person whom you already know to be a drug user.
Then it is likely that this person knows one or a few other drug users in the city. Now you have a new potential respondent, and you collect the required data from that person. You also ask that person to point you to one or more other people. In this way, you build a network of who knows whom and collect data from a sample. You hope the sample is reasonably diverse and that it mimics the properties of a simple random sample.

Again, the bias could be that person A knows B, B knows C, and C knows only A. This means that instead of getting a representative sample from the entire population, you might be collecting a sample only from a small clique. In the context of drug users, for example, maybe you collect data only from a small group of friends who talk to each other, or perhaps from people who buy drugs from the same dealers. But in many cases, we live with this bias and still do snowball sampling, simply because we do not have a better alternative for finding the sample, or even for finding out who belongs to the population of interest and who does not.
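
To make the clique problem concrete, here is a minimal sketch of snowball sampling in Python. It is only an illustration under stated assumptions: the referral network, the names A to F, and the function `snowball_sample` are all hypothetical, and real referral chains would be far messier. The idea is simply to start from one known person, collect their data, and follow whoever they point to.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Collect up to max_size people by following referrals from the seeds.

    referrals maps each person to the people they can point us to.
    (Hypothetical helper for illustration, not a standard library routine.)
    """
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)            # collect this person's data
        for contact in referrals.get(person, []):
            if contact not in seen:       # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sampled


# Hypothetical referral network: A knows B, B knows C, and C knows only A.
# D, E and F belong to the same population but are unknown to the clique.
referrals = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],
    "D": ["E"],
    "E": ["D", "F"],
    "F": [],
}

sample = snowball_sample(referrals, seeds=["A"], max_size=5)
never_reached = sorted(set(referrals) - set(sample))

print(sample)         # ['A', 'B', 'C']
print(never_reached)  # ['D', 'E', 'F']
```

Even though we asked for up to five people, the sample stops at three, because the referrals loop back to A. Half of this toy population can never be reached from that starting person, which is exactly the kind of bias we accept when there is no better way to find the population.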