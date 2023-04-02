In this video, we are going to talk about interval estimation. Let us first find out what interval estimation means. In point estimation, we take a random sample from a population and compute a sample statistic as an estimate of a population parameter. Now we know that the probability that this estimate is exactly equal to the parameter is practically zero. Here, we want to find out how confident we are that the estimate is reasonably close to the population parameter. For instance, if we take the sample mean as an estimate of a population mean, we want to find out how confident we are that the sample mean we calculate is reasonably close to the population mean. To make the problem a little more concrete, let us take an example. Suppose each of the 1,000 employees who worked for a company five years back has a certain number of shares in their possession. We are interested in finding out details about the number of shares that these employees hold. If all the employees were willing to talk to us, we would know that the mean number of shares of the company that they held was 274.21 with a standard deviation of 122.65, but in most real-world situations, we are not likely to get this data. What we can do is collect data from a subset of these employees. Let us suppose we collect data from 25 employees chosen at random and we find that the sample mean, which is a point estimator of the population mean, is 288.04. The question that we're asking now is, how close is this value of 288.04 to the population mean? Let's see how we do this. Suppose that the sample mean here is a number on the number line; we do not know what the population mean is. What we will do is create an interval around the sample mean with a pre-specified width and hope that the population mean is within this interval. 
The width of this interval is a measure of how close we think the sample mean is to the population mean. The probability that the population mean is within such an interval is a measure of our confidence in our claim. Now notice that the interval is created around the sample mean, so if the sample mean is different, the interval will also change. The only thing that does not change is the width of this interval. Let us see what this means in practice. Here is the data about the number of shares held by all 1,000 employees. If we had all this data, we would see that the average number of shares held by employees is 274.21. However, we do not have all the data, but just a sample of 25 employees. Now here is a random sample of 25 employees from the population. We will use the sample mean as an estimator of the population mean. The sample mean, in this case, is 289.8. We've chosen an interval width of 20 here and this interval is symmetric around the sample mean. The interval that we're looking at is the interval from 279.8 to 299.8. The population mean is 274.21, although we do not know that while we're drawing the sample, and we see that the population mean does not lie within this interval. Now, note that as the sample changes, the population mean sometimes lies within this interval and sometimes does not. As we see, more often than not, it does not lie within the interval. Now how often does the population mean lie inside the interval that we have created? To answer this, we look at what happens if we generate 1,000 samples and see how often the population mean falls inside the interval. We represent this as a percentage of the number of samples generated. Let us see what happens when we generate the samples. We can see that this percentage quickly stabilizes around 30 percent. Let us wait for the 1,000 samples to be generated. The percentage does not change much. 
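The experiment described here can be sketched in a few lines of Python. This is only a rough sketch: the actual share data is not available, so the population below is a hypothetical stand-in of 1,000 normally distributed values with the quoted mean and standard deviation, and the names `coverage_rate` and `population` are made up for illustration.

```python
import random
import statistics


def coverage_rate(population, sample_size, width, num_samples=1000, seed=0):
    """Fraction of samples whose interval (sample mean +/- width/2)
    contains the true population mean."""
    rng = random.Random(seed)
    pop_mean = statistics.fmean(population)
    half_width = width / 2
    hits = 0
    for _ in range(num_samples):
        # Draw employees at random, without replacement.
        sample = rng.sample(population, sample_size)
        sample_mean = statistics.fmean(sample)
        if sample_mean - half_width <= pop_mean <= sample_mean + half_width:
            hits += 1
    return hits / num_samples


# ASSUMPTION: a synthetic population standing in for the real share data.
pop_rng = random.Random(42)
population = [pop_rng.gauss(274.21, 122.65) for _ in range(1000)]

rate = coverage_rate(population, sample_size=25, width=20)
print(f"Population mean inside the interval in {rate:.1%} of samples")
```

With a stand-in population like this one, the printed percentage should land in the neighborhood of the 30 percent seen in the video, though the exact figure depends on the random seed and on the assumed shape of the population.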
We see that if we draw a sample of size 25 and keep the interval width at 20, the population mean will lie within that interval approximately 30 percent of the time. Let us now increase the width of the interval to 40, while we keep the sample size unchanged. As the interval width increases, we would think that the percentage of times that the population mean lies inside that interval will increase. Let us see whether that is the case. You would be correct: the percentage of times that the population mean lies within the interval jumps from 30 percent to approximately 60 percent when we increase the interval width to 40. Finally, let us return to the interval width of 20 and increase the sample size. We'll increase the sample size to 100 and see what happens. As the sample size increases, the sampling distribution becomes narrower and the sample mean is more often close to the population mean than before, so it is likely that the population mean will lie inside that interval more often. Let's see. We see that when the sample size increases, keeping the interval width the same, the percentage of times that the population mean lies within that interval increases from the original 30 percent to approximately 55 percent. We have seen from our experiment that if the width of the interval increases, keeping the sample size constant, the chance that the population parameter will be inside that interval increases. We have also seen that if we keep the width of the interval constant and increase the sample size, the chance of the population parameter being inside that interval also increases. The question is, what is the minimum width of the interval so that the population mean will be inside that interval with probability alpha, regardless of the value of the sample mean? This interval is called a 100 alpha percent confidence interval. 
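The three percentages from the experiment can also be approximated without simulation. The sketch below assumes that the sample mean is roughly normally distributed around the population mean with standard error sigma divided by the square root of the sample size; the video does not state that assumption, and it also ignores the small correction for sampling without replacement from a finite population. The function name `approx_coverage` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 122.65  # population standard deviation quoted earlier


def approx_coverage(width, sample_size, sigma=SIGMA):
    """Approximate chance that the population mean falls within
    sample mean +/- width/2, ASSUMING a normal sampling distribution."""
    standard_error = sigma / sqrt(sample_size)
    z = (width / 2) / standard_error
    return 2 * NormalDist().cdf(z) - 1


# The three settings tried in the experiment: (interval width, sample size).
for width, n in [(20, 25), (40, 25), (20, 100)]:
    print(f"width={width}, n={n}: {approx_coverage(width, n):.1%}")
```

Under this approximation, doubling the interval width and quadrupling the sample size have the same effect, because both double the ratio of the half-width to the standard error. That is in line with the experiment, where both changes moved the percentage from about 30 percent to somewhere between 55 and 60 percent.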
For instance, if we want a 90 percent confidence interval, we want to find out the minimum width of the interval so that the population mean will be inside that interval with probability 0.9, regardless of the value of the sample mean.
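To make the 90 percent case concrete, here is a minimal sketch of how that minimum width could be computed. It assumes the population standard deviation is known and the sample mean is approximately normally distributed, neither of which has been established in the video so far. The helper name `min_interval_width` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def min_interval_width(confidence, sigma, sample_size):
    """Smallest symmetric interval width around the sample mean that
    contains the population mean with the given probability,
    ASSUMING a known sigma and a normal sampling distribution."""
    # Two-sided critical value, about 1.645 for 90 percent confidence.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return 2 * z * sigma / sqrt(sample_size)


width_90 = min_interval_width(0.90, sigma=122.65, sample_size=25)
print(f"90% interval width for n=25: {width_90:.1f} shares")
```

Under these assumptions the width for a sample of 25 employees comes out at roughly 80 shares, which is much wider than the widths of 20 and 40 tried in the experiment. That fits the observation that those narrower intervals contained the population mean only about 30 and 60 percent of the time.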