Welcome. In this lecture you will learn the statistical properties of ordinary least squares. To derive statistical properties of OLS, we need to make assumptions on the data generating process. These assumptions are similar to the ones discussed in previous lectures on simple regression. The first assumption is that the data are related by means of a linear relation. The next two assumptions are that the values of the explanatory factors are non-random, whereas the unobserved error terms are random with mean zero. Two further assumptions are that the variance of the error terms is the same for each observation, and that the error terms of different observations are uncorrelated. Each observation then contains the same amount of uncertainty, and this uncertainty is randomly distributed over the various observations. The final assumption is that the postulated model in Assumption 1 is correct, in the sense that beta is the same for all observations, and that Assumption 4 is also correct, with unknown values of the parameters beta and sigma squared.

These six assumptions are reasonable in many applications. In other cases, some of the assumptions may not be realistic and need to be relaxed. Econometrics has a wide variety of models and methods for such more general situations, and some of these cases will be discussed in other lectures of this course.

Now I invite you to prove that Assumptions 4 and 5 give the variance-covariance matrix of the n times 1 vector epsilon as shown on the slide. This result follows by a direct calculation. You may wish to consult the Building Blocks for further information on the variance-covariance matrix of a vector of random variables.

We will first show that the OLS estimator is unbiased. The core idea is to express the OLS estimator in terms of epsilon, as the assumptions specify the statistical properties of epsilon. I therefore invite you to answer the following test question. The answer is shown on the slide.
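For reference, the results on these slides can be written out as follows. This is a standard reconstruction, since the slides themselves are not reproduced in this transcript: Assumptions 4 and 5 (together with the zero-mean Assumption 3) give the variance-covariance matrix of epsilon, and substituting Assumption 1 into the OLS formula expresses b in terms of epsilon.

```latex
% Assumptions 3-5: mean zero, equal variances, zero correlations
\operatorname{Var}(\varepsilon) = E(\varepsilon \varepsilon') = \sigma^2 I_n .

% Assumption 1 (y = X\beta + \varepsilon) substituted into b = (X'X)^{-1}X'y:
b = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon ,

% so that, with X non-random (Assumption 2) and E(\varepsilon) = 0 (Assumption 3),
E(b) = \beta + (X'X)^{-1}X' \, E(\varepsilon) = \beta .
```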
The main idea is to use the well-known OLS formula for b in terms of the data X and y, and to use Assumption 1 to express y in terms of epsilon. We can now compute the expected value of b by means of the steps shown on the slide, where each step shows which assumptions are used.

Next, we compute the k times k variance-covariance matrix of b. The main step is, again, to express b in terms of epsilon, as was done before. With this expression, the result is obtained by a sequence of steps shown on the slide. I invite you to take some time to verify these steps. The OLS estimator b has k components, so the variance-covariance matrix has dimensions k times k. The variances are on the diagonal, and the covariances are on the off-diagonal entries of this matrix.

Under Assumptions 1 through 6, the unknown parameters of our model are beta and sigma squared. In the previous lecture, we provided intuitive arguments to estimate sigma squared by the sum of squared residuals divided by the number of degrees of freedom. We will now show that this estimator is unbiased if Assumptions 1 through 6 hold true. We present the proof in two parts: first the main ideas, and then the mathematical details. The latter part is optional because it is not needed for the sequel.

The main idea of the proof is again to express the random variable of interest in terms of epsilon, as the assumptions are on epsilon. Applying results of the previous lecture, we find that the residual vector e is equal to M times epsilon. The details are on the slide, and I invite you to take some time to verify these steps. It then follows rather easily that the variance-covariance matrix of the residual vector e is equal to sigma squared times M. We need the expected value of the sum of squared residuals, and the so-called trace trick states that this is equal to the trace of the variance-covariance matrix of e.
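The two matrix facts behind the trace trick can be checked numerically. The following sketch (with arbitrary illustrative dimensions, not taken from the lecture) verifies that tr(AB) equals tr(BA) even though AB and BA differ, and that the residual-maker matrix M = I - X(X'X)^(-1)X' from the previous lecture has trace n - k:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fact 1: tr(AB) = tr(BA), even though AB (4x4) and BA (6x6) are not equal.
A = rng.normal(size=(4, 6))
B = rng.normal(size=(6, 4))
print(np.trace(A @ B), np.trace(B @ A))  # the two traces coincide

# Fact 2: M = I - X(X'X)^{-1}X' has trace n - k, because
# tr(X(X'X)^{-1}X') = tr((X'X)^{-1}X'X) = tr(I_k) = k.
n, k = 30, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(M))  # n - k = 27
```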
It can be shown that the outcome is n minus k times sigma squared, so that the expected value of the sum of squared residuals divided by n minus k is equal to sigma squared. Details of the proof are shown on the next slide which, I repeat, is optional. The crucial idea in the last step of the proof was to use the trace of a square matrix. In general, the matrix product A times B differs from the matrix product B times A, but both products have the same trace. This trace trick gives the result. I invite you to check the steps shown on the slide to get additional appreciation of the power of matrices.

We derived expressions for the mean and variance of the OLS estimator b. Under Assumptions 1 through 6, the data are partly random, because of the unobserved effects epsilon on y. Because the OLS coefficients b depend on y, these coefficients are also random. Now it is very important to realize that we get a single outcome of b, namely, the one computed from the observed data y and X. We cannot repeat the experiment and average the results. In the wage example, we cannot ask the employees to redo their lives to get different education levels, let alone another gender.

Because we get only a single outcome of b, it is important to maximize the chance that this single outcome is close to the DGP parameter beta. The smaller the variance of b, the larger this chance. For this reason, it is important to use efficient estimators, that is, estimators that have the smallest possible variance. We then have the most confidence that our estimate is close to the truth. Under Assumptions 1 through 6, OLS is the best possible estimator in the sense that it is efficient in the class of all linear unbiased estimators. This result is called the Gauss-Markov theorem. It means that any other linear unbiased estimator has a larger variance than OLS.
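Although we cannot repeat the experiment with real data, we can do so in a simulation. The following sketch (with made-up values of n, beta, and sigma squared, not from the lecture) holds X fixed, draws many epsilon vectors, and confirms that b averages to beta, that the sample variance-covariance matrix of b is close to sigma squared times (X'X) inverse, and that e'e divided by n minus k averages to sigma squared:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma2 = 50, 2, 4.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # fixed regressors (Assumption 2)
beta = np.array([1.0, 0.5])                               # DGP parameter (made-up values)

XtX_inv = np.linalg.inv(X.T @ X)
b_draws, s2_draws = [], []
for _ in range(20000):
    eps = rng.normal(0.0, np.sqrt(sigma2), n)  # mean zero, homoskedastic, uncorrelated
    y = X @ beta + eps                          # Assumption 1: linear model
    b = XtX_inv @ X.T @ y                       # OLS estimator
    e = y - X @ b                               # residual vector
    b_draws.append(b)
    s2_draws.append(e @ e / (n - k))            # s^2 = e'e / (n - k)

b_mean = np.mean(b_draws, axis=0)
print(b_mean)                        # close to beta: unbiasedness of b
print(np.mean(s2_draws))             # close to sigma^2: unbiasedness of s^2
print(np.cov(np.array(b_draws).T))   # close to sigma^2 * (X'X)^{-1}
```

In reality we observe only one draw from this loop; the simulation merely illustrates the distribution over hypothetical repeated samples.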
Because the variance-covariance matrix has dimensions k times k, we say that one such matrix is larger than another one if the difference is positive semi-definite. This means, in particular, that the OLS estimator bj of each individual parameter beta j has the smallest variance among all linear unbiased estimators.

Now I invite you to do the training exercise to practice the topics of this lecture. You can find this exercise on the website. This concludes our lecture on statistical properties.
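As a postscript, the matrix ordering used in the Gauss-Markov theorem can be written out as follows (a standard formulation, not shown in the transcript):

```latex
% For any linear unbiased estimator \tilde{b} with E(\tilde{b}) = \beta,
\operatorname{Var}(\tilde{b}) - \operatorname{Var}(b) \succeq 0
\quad \text{(positive semi-definite),}

% which implies, taking the j-th diagonal element,
\operatorname{Var}(\tilde{b}_j) \ge \operatorname{Var}(b_j)
\quad \text{for } j = 1, \ldots, k .
```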