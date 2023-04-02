In this video, we're going to look at properties of estimators. These properties depend on the sampling distributions of the estimators that we look at. Now, as you would recall, these were the distributions of different sample estimators: the sample mean, the sample median, the sample standard deviation, and the sample range. So what are the properties that we would want an estimator to have so that we can use it effectively to estimate the population parameter?
Now the first thing that we would want an estimator to be is unbiased. A point estimator is said to be unbiased if it neither systematically overestimates nor systematically underestimates the population parameter. This is so that we know that when we're estimating a population parameter based on a particular estimator, any overestimation or underestimation is simply because of the sample that we chose and not because of the estimator. To check for unbiasedness, we verify that the expected value of the point estimator equals the population parameter that it estimates. Now, we see this happening very nicely when we look at a point estimator for means. If we look at the sample mean as an estimator for the population mean, we find that the sampling distribution has this nice symmetric shape around the population parameter. There is a theoretical justification for why this is so, and it is as follows. Now let us suppose that you choose a particular set of points to be in the sample. Since choosing a particular value does not influence the choice of any other value in the sample, all these sample values are independent. Now take a look at the first value. The first value is chosen at random from the population, so is the second value, so is the third value, and so are all the values of the sample. So all the values in the sample are also identically distributed.
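As a side note, once the sample values are taken to be independent draws from the same population, the expected value of the sample mean can be worked out directly from the linearity of expectation. The symbols here are notation introduced for this sketch and are not from the video: X_1, ..., X_n are the sample values, n is the sample size, X-bar is the sample mean, and mu is the population mean.

```latex
E[\bar{X}]
  = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
  = \frac{1}{n}\cdot n\mu
  = \mu
```

This step only needs each value to have expected value mu, so it holds for any sample size.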
Therefore, the values in the sample are all independent and identically distributed and hence, the central limit theorem that we looked at earlier applies. Now the central limit theorem tells us that if we choose a sample in this manner, then, for a reasonably large sample, the sample average will be approximately normally distributed around the population mean. In other words, the expected value of the sample mean is equal to the population mean and hence, the sample mean is an unbiased estimator of the population mean. On the other hand, if we look at the sampling distribution of the sample standard deviation, then we see that this distribution is not symmetric around the population parameter. We see that the sampling distribution is lopsided and there are a lot more values to the left of the population parameter than to the right. So if we use this estimator, we systematically underestimate the population standard deviation. Therefore, the sample standard deviation is not an unbiased estimator of the population standard deviation; it is a biased estimator. Luckily, we have a known correction factor that will turn this biased estimator into an unbiased estimator.
Another property that we look for in a point estimator is efficiency. Now, assume that we have two estimators and we want to find out which of the two we should choose to estimate a population parameter. We will choose the one with the higher relative efficiency, where a point estimator is said to have a higher relative efficiency than another if it has a smaller standard error than the other. To take an example, let us look at the sample mean as an estimator for the population mean. This is the distribution of the sample mean around the population mean.
Now, if we choose another hypothetical estimator whose distribution looks like this, then we see that if we compare the two, the distribution of that alternate estimator is more widely spread around the population parameter than that of the sample mean. The standard error of the sample mean is smaller than the standard error of that other estimator and hence, we say that the relative efficiency of the sample mean as an estimator for the population mean is higher than that of the alternate estimator.
A third property that we look at is called consistency. A point estimator is said to be consistent if the values of the point estimator tend to be closer to the population parameter as the sample size increases. So in this case, we are looking at a single estimator, but we are changing the sample size. Now, let us see what happens to the sample estimator if we increase the sample size, in terms of the sample mean and the sample standard deviation. Here we again look at the sampling distribution of the sample mean when we keep the sample size at 25. Let us run this experiment. Look at the spread of this distribution, and now we will compare the spread with what happens when we increase the sample size to 100. Now we run the experiment again, this time with samples of size 100. Even at this point, you must be noticing that the standard error of this sampling distribution is less than that of the sampling distribution when the sample size was 25. There we have it. This sampling distribution, if you would notice, is much less spread out than the sampling distribution that we had when the sample size was 25. We now look at the same experiment, but this time with sample standard deviations. Initially we'll start off with a sample size of 25, generate 1,000 samples, and find the sampling distribution based on these 1,000 samples. As we saw earlier, the sampling distribution here is biased.
It is negatively biased in that there is more mass to the left of the population parameter than to the right. Take a look at the standard error, that is, the standard deviation of the sampling distribution. We will compare that to the case when the number of points in the sample increases to 100. Let us increase the sample size to 100, and run the experiment again. You can observe that the standard error of this sampling distribution is already much smaller than it was when the sample size was 25. Let us complete this experiment and see whether that holds. Yes, we see that at the end of generating 1,000 samples, the standard error is much smaller, even for standard deviations, when a sample of size 100 was chosen compared to when a sample of size 25 was chosen.
Now the diagrams that you see here are the sampling distributions of the sample mean, with a sample size of 25 on the left and a sample size of 100 on the right. You can see that the diagram on the left is much more spread out than the diagram on the right. In other words, if we choose a random value from the distribution on the left and a random value from the distribution on the right, then the value from the distribution on the right has a higher probability of being close to the population mean. In that way, we see that the sample mean is a consistent estimator of the population mean. As the sample size increases, the values of the sample mean become closer and closer to the population mean. We also saw the same thing happening when we looked at the sample standard deviation. Notice that both of these are biased estimates of the population standard deviation. Still, the values in the distribution on the left are more spread out than the values in the distribution on the right, and therefore this estimator is a consistent estimator.
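The experiment described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is taken to be normal with mean 50 and standard deviation 10 (the video does not say what population it uses), and the helper names `sampling_distribution` and `standard_error` are made up for this sketch. It draws 1,000 samples at each sample size and reports the spread of the resulting estimates.

```python
import random
import statistics


def sampling_distribution(estimator, sample_size, num_samples=1000,
                          mu=50.0, sigma=10.0, seed=0):
    """Return the estimator's value on each of num_samples random samples.

    The Normal(mu, sigma) population is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(mu, sigma) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def standard_error(estimates):
    """Standard deviation of the simulated sampling distribution."""
    return statistics.stdev(estimates)


results = {}
for name, estimator in [("mean", statistics.fmean),
                        ("std dev", statistics.stdev)]:
    for n in (25, 100):
        estimates = sampling_distribution(estimator, n)
        results[(name, n)] = standard_error(estimates)
        print(f"{name:8s} n={n:3d}  standard error = {results[(name, n)]:.3f}")
```

With a population standard deviation of 10, theory puts the standard error of the sample mean at about 10/sqrt(25) = 2 for samples of size 25 and 10/sqrt(100) = 1 for samples of size 100, so the simulated figures should land near those values.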
With this estimator too, the values that you get when you increase the sample size are closer to the population parameter. This, for instance, is not observed when you're looking at the sampling distribution of the sample median. Here are two distributions of the sample median. The one above is for a sample size of 25 and the one below is for a sample size of 100. If you notice, even as the sample size increases, the distribution does not show any tendency of coming closer to the population parameter and therefore, in this experiment, the sample median does not behave as a consistent estimator of the population median.
In summary, what do we want from a good estimator? We want a good estimator to be unbiased, which means that the expected value of the estimator is the same as the parameter that it estimates. We would also want it to be relatively efficient, which means that its standard error should be the smallest among all the candidate estimators that we have. We also want it to be consistent, which means that as the sample size increases, the estimates come closer and closer to the population parameter.
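The first two properties in this summary can also be checked with a small simulation. The sketch below rests on several assumptions that are not from the video: a normal population with mean 50 and standard deviation 10, the sample median standing in for the "alternate estimator" of the population mean, and the correction factor being the usual n - 1 divisor, which is exactly unbiased for the variance (the square of the standard deviation).

```python
import random
import statistics

MU, SIGMA = 50.0, 10.0       # assumed population parameters
N, NUM_SAMPLES = 25, 2000    # sample size and number of simulated samples

rng = random.Random(42)
means, medians, var_div_n, var_div_n_minus_1 = [], [], [], []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))
    var_div_n.append(statistics.pvariance(sample))         # divides by n
    var_div_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

# Unbiasedness: compare the average estimate with the true parameter.
avg_mean = statistics.fmean(means)
avg_var_div_n = statistics.fmean(var_div_n)
avg_var_div_n_minus_1 = statistics.fmean(var_div_n_minus_1)
print(f"average sample mean        = {avg_mean:.2f} (population mean {MU})")
print(f"average variance, / n      = {avg_var_div_n:.2f}")
print(f"average variance, / (n-1)  = {avg_var_div_n_minus_1:.2f} "
      f"(population variance {SIGMA ** 2})")

# Relative efficiency: the estimator with the smaller standard error wins.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)
print(f"standard error of mean     = {se_mean:.3f}")
print(f"standard error of median   = {se_median:.3f}")
```

Under these assumptions the divide-by-n variance should average out below the population variance of 100, the divide-by-(n - 1) version should sit close to it, and the sample mean should show a smaller standard error than the sample median.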