The bootstrap pushes this a bit further. It makes it possible to use Monte Carlo sampling even in situations where I cannot draw as many samples as I wish. Let's look at a simple situation, where I have an estimate theta hat, and I want to know what the standard error of theta hat is.

To explain how the bootstrap works, let's first look at the so-called plug-in principle. In the example we had earlier, we were interested in the average height of all people in the US, and we estimated it with the average height theta hat of 100 randomly selected people. This simple step already illustrates the plug-in principle. We can't compute the population mean because there are over 300 million people and we cannot possibly measure all of their heights. So what we do is plug in a sample of size 100 in place of the population, and we simply compute the mean of the sample instead of the mean of the whole population.

So let's look at what we did here in terms of histograms. There was a population histogram, which is the histogram of the heights of all the people in the United States. The task was to compute the average of that histogram, and that's impossible to do. So what we did is we drew a sample of size 100, we looked at the histogram of the sample, and we used the average of the sample histogram in place of the average of the population histogram. The reason this works is that the histogram of the sample tends to look very similar to the histogram of the population. That's really the key idea behind the bootstrap, and we will see how this idea can be used in all kinds of complicated situations.

The bootstrap uses both this plug-in principle and Monte Carlo simulation to approximate quantities of interest, such as the standard error of a statistic. To explain how it works, remember how we used Monte Carlo simulation to estimate the standard error of a statistic. If I can draw a sample X_1 to X_n, then I can compute my estimate theta hat based on that sample. If I can sample as many times as I wish, then I can repeat this process many times, say 1,000 times, and I get 1,000 copies of my estimator. We saw earlier, when we discussed Monte Carlo, that the standard deviation of these 1,000 estimates is close to the standard error of my estimator, and that's simply because of the law of large numbers. But remember, the caveat in Monte Carlo was that we only have one sample and we cannot sample as many times as we wish.

The trick that the bootstrap uses is the plug-in principle: it simply simulates from the sample histogram instead of from the population histogram. In other words, the bootstrap pretends that the sample histogram is the population histogram, and then simply uses Monte Carlo. So, how does that work? Drawing a sample from the sample histogram means drawing with replacement from the n numbers X_1 to X_n. Let's call those draws X_1 star to X_n star. That means X_1 star is drawn at random from these n numbers X_1 to X_n, and likewise X_2 star is drawn at random, and X_3 star is drawn at random. Such a bootstrap sample, X_1 star to X_n star, consists entirely of numbers from the original sample X_1 to X_n: some of the X's may come up several times, and others may not come up at all.

Now what the bootstrap does is draw capital B bootstrap samples and compute the estimator for each bootstrap sample. So we draw a bootstrap sample, just as explained above, simply by drawing with replacement from the original data. We evaluate our estimator, and we call this thing theta 1 hat star.
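To make this one step concrete, here is a minimal sketch of drawing a single bootstrap sample. The transcript shows no code, so this is an illustration in Python with NumPy; the height data are hypothetical, generated just so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original sample X_1, ..., X_n: heights (in cm) of
# n = 100 randomly selected people.
x = rng.normal(loc=170, scale=10, size=100)

# One bootstrap sample X_1*, ..., X_n*: draw n values from x
# WITH replacement. Some of the original values may come up
# several times, others not at all.
x_star = rng.choice(x, size=len(x), replace=True)

# Evaluate the estimator on the bootstrap sample: theta_1 hat star.
theta_star_1 = x_star.mean()
```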
Then we repeat the whole process capital B times, say 1,000 times, and we come up with 1,000 copies of these estimators. Then we use these 1,000 copies to approximate the quantity of interest, just as we did in the example of Monte Carlo simulation. For example, we would approximate the standard error of theta hat by the standard deviation of these 1,000 estimates.

So, in other words, the bootstrap uses two approximations. In the first approximation, it replaces the population histogram by the sample histogram. In the second approximation, it uses Monte Carlo to approximate the quantity of interest, relying on the law of large numbers.
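Putting the pieces together, here is a minimal sketch of the full procedure, again in Python with NumPy as an assumed setup. The function name bootstrap_se, the hypothetical height data, and the choice of B = 1,000 replications are all illustrative, not part of the original lecture.

```python
import numpy as np

def bootstrap_se(x, estimator, B=1000, seed=0):
    """Approximate the standard error of estimator(x) using B bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # theta_b hat star for b = 1, ..., B: the estimator evaluated on
    # B samples drawn with replacement from the original data.
    theta_star = np.array([
        estimator(rng.choice(x, size=n, replace=True)) for _ in range(B)
    ])
    # The standard deviation of the B bootstrap estimates approximates
    # the standard error of the estimator (by the law of large numbers).
    return theta_star.std(ddof=1)

# Hypothetical data: heights (in cm) of 100 randomly selected people.
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=100)

theta_hat = heights.mean()               # the plug-in estimate of the mean
se_hat = bootstrap_se(heights, np.mean)  # bootstrap estimate of its standard error
print(f"theta hat = {theta_hat:.2f}, estimated standard error = {se_hat:.3f}")
```

Note how the two approximations show up directly in the code: sampling from x with replacement is the plug-in step (the sample histogram stands in for the population histogram), and looping B times and taking a standard deviation is the Monte Carlo step.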