So what we're trying to do with sampling is draw inferences or conclusions about what the truth is. Let's say we take a country like Mali and an indicator like the prevalence of exclusive breastfeeding. If we went to every mother of a baby aged zero to five months in Mali and asked her what her child consumed the previous day, we would have a pretty accurate and precise measure of the prevalence of exclusive breastfeeding in that population, right? Because we would have data on all the babies in that population. But we don't have the time or the money to do that. So instead of going to everyone in the population, we draw a sample of the population. And we try to draw that sample in such a way that it gives us an estimate of the indicator that is reasonably close to the truth. So we draw a sample, we make some observations that are imperfect based on the sample, and then we draw an inference about the truth in the population. One type of sampling is probability sampling, and that's what I'm going to be talking about for the rest of this course. Probability sampling is the gold standard approach for sample surveys, and it means that every element, so in the case of household surveys, every household in the survey population, has a known, non-zero probability of being sampled. So what does a non-zero probability of being sampled mean? That just means every household in the survey population, in the population that you're interested in drawing inferences about, has the possibility of being sampled. So the sampling design should give every household in the survey population the opportunity to be sampled. A known probability of selection just means that we need to be able to calculate the probability of selection for every household in the survey population. That sounds complicated, but it's actually not, and we'll show you how we get to that probability of selection.
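To make "known, non-zero probability of selection" a bit more concrete, here is a minimal sketch in Python. It assumes the simplest possible design, a simple random sample drawn without replacement from a complete list of households. The household IDs, the frame size of 2,000, and the sample size of 200 are all made up for illustration, and the function name `draw_simple_random_sample` is hypothetical. Under this design every household has the same probability of selection, n divided by N.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n households from the frame without replacement.

    Returns the sampled households and the probability of selection,
    which under simple random sampling is n / N for every household.
    """
    rng = random.Random(seed)          # seeded so the draw can be reproduced
    sample = rng.sample(frame, n)      # every household can be picked
    prob_selection = n / len(frame)    # known and non-zero for everyone
    return sample, prob_selection


# Hypothetical frame: a list of 2,000 household IDs (illustrative only).
frame = [f"HH-{i:04d}" for i in range(1, 2001)]

sample, prob = draw_simple_random_sample(frame, n=200, seed=1)
print(len(sample), "households sampled")
print("probability of selection for each household:", prob)
```

In real household surveys the design is usually more complicated than this, so the probability is built up in stages, but the idea is the same: you have to be able to write down the probability for every household.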
So why are we talking so much about probability sampling? Why is it important? Probability sampling is designed to produce a sample that represents the population, right? So this reduces selection bias, and it helps ensure that the estimate that you're getting out of your survey is reflective of what is actually happening in the population. We also need probability sampling in order to calculate the uncertainty around estimates, so the standard errors and confidence intervals around your estimates. The calculation of standard errors and confidence intervals actually makes a number of assumptions about sampling. And so if you have not used probability sampling, you are violating some of those assumptions when you are estimating your standard errors and confidence intervals. And then probability sampling is important in order to weight the sample appropriately. We'll talk a little bit about weighting later in this course. But essentially we use probabilities of selection to weight the sample. So often when you're sampling, you may be oversampling in certain areas, or the probabilities of selection may be different in different areas. And in order to get an estimate that again looks like the population, you will need to weight the sample, and in order to do that, you need to know the probability of selection. What is the cost of not using probability sampling, right? So if I decide not to do this, it's too much bother, we don't have the technical expertise, it costs too much money, what are the consequences? First of all, you may have biased estimates, right? So your results may not reflect the true health status of the population, the actual coverage in the population. What does that mean for, say, an evaluation? You can end up concluding either that your program had an impact when in fact it didn't, or that your program did not have an impact when it did. Even if you're just using a survey for sort of program planning purposes, right?
To refine your program, the reason to do the survey is to understand what's happening in the population. So if you get estimates that are not reflecting what's happening in the population, it's not going to be particularly useful for your program planning. So what do we need for a probability sample? We need a sampling frame, a list of all the households in the survey population. So if you're doing your survey, say, in Simiyu Region of Tanzania, the sampling frame would be a list of all of the households in Simiyu Region. Now, if you're doing your survey in, say, 10 villages, you can develop that kind of sampling frame, that's not a problem. But if you're doing a large-scale survey in a district or a region or an entire country, it's not feasible to develop a sampling frame of all of the households in that population, right? So instead, we use an alternative approach called cluster sampling, which we'll talk about a little bit later in this module.
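Going back to the point about weighting, here is a minimal sketch in Python of why you need the probability of selection. It assumes a small survey where we do have a full sampling frame, split into two areas, and where one area has been oversampled. All of the numbers are made up for illustration, the function name `weighted_prevalence` is hypothetical, and it assumes each sampled household contributes exactly one baby. Each observation gets a weight equal to one over its probability of selection.

```python
def weighted_prevalence(areas):
    """Estimate prevalence using inverse-probability-of-selection weights."""
    weighted_yes = 0.0
    weighted_total = 0.0
    for area in areas:
        # Known probability of selection in this area: n / N.
        prob = area["sampled"] / area["frame_size"]
        weight = 1.0 / prob
        weighted_yes += weight * area["yes"]
        weighted_total += weight * area["sampled"]
    return weighted_yes / weighted_total


# Hypothetical data: Area B is small but oversampled (illustrative only).
areas = [
    {"name": "Area A", "frame_size": 1000, "sampled": 100, "yes": 40},
    {"name": "Area B", "frame_size": 200, "sampled": 100, "yes": 80},
]

# Ignoring the design treats every sampled household the same.
unweighted = sum(a["yes"] for a in areas) / sum(a["sampled"] for a in areas)
weighted = weighted_prevalence(areas)

print(f"unweighted estimate: {unweighted:.3f}")
print(f"weighted estimate:   {weighted:.3f}")
```

With these made-up numbers, the unweighted estimate is pulled toward the oversampled area, while the weighted estimate gives each area the share it actually has in the population. Without known probabilities of selection, there is no way to compute those weights.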