Welcome. In this lecture, you will learn how to evaluate whether a model is actually a good model. Suppose you used the techniques from lectures 3.2 and 3.3 and obtained estimates for the parameters of some model. How do you know whether the model is satisfactory? We will turn to a number of tests you can use to evaluate the model.

In lecture 3.3, we started with a linear model and extended it to a non-linear model by adding square and interaction terms. Suppose you want to test whether the linear model is good enough or whether these extra terms should be added. A simple idea is to study the joint significance of the gamma coefficients on the squared and interaction terms. The key challenge here is that this model contains many parameters. Here we have even been fairly modest by only considering squares, but of course more powers can be added, which multiplies the number of parameters.

Fortunately, there is an easy way to reduce the number of parameters. We simply include powers of the fitted y values based on the linear model, instead of the square and interaction terms. The test for non-linearity is then on the joint significance of the gammas in this model. The test here is written in general form, with p powers and thus p gamma coefficients. Under the null of a correct linear specification, the gammas are 0, and the test is an F-test. The number of restrictions is p, and the total number of parameters in the unrestricted model is k+p, such that the degrees of freedom are p and n-k-p. The F distribution is, however, approximate, as y hat is not a usual fixed regressor.

The test is called RESET, which stands for Regression Specification Error Test. Strictly speaking, the null is that of correct specification, which is more general than simply the null of linearity. For this reason the test is a general mis-specification test, which the name RESET also alludes to.
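As a concrete illustration, here is a minimal sketch of the RESET idea on simulated data where the linear specification is in fact correct. The simulated data, the choice p=2, and all function and variable names are my own assumptions for the sketch, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3  # n observations, k parameters (intercept + 2 regressors)

# Simulated data that truly follow a linear model, so the RESET null holds.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ssr(X, y):
    """Sum of squared OLS residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

def reset_F(X, y, p):
    """RESET: add yhat^2, ..., yhat^(p+1) (yhat itself is collinear with X)
    and F-test the joint significance of their p coefficients."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    Xu = np.column_stack([X] + [yhat ** (j + 2) for j in range(p)])
    S_r, S_u = ssr(X, y), ssr(Xu, y)
    # Approximately F(p, n-k-p) under the null of correct specification.
    return ((S_r - S_u) / p) / (S_u / (len(y) - k - p))

F = reset_F(X, y, p=2)
```

Since the data were generated from the linear model, the statistic should typically stay below conventional critical values here.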
To examine the gains of this more restrictive approach, I ask you to derive the number of parameters of this model for p=1, compared to a model where squares and interactions of the explanatory variables are added. In the above model, we only have the k usual parameters plus p=1 extra, so in total k+1 parameters are to be estimated. Had we included squares and cross-terms instead, we would get the usual k beta parameters, the k-1 squared terms (note that the square of the intercept is simply the intercept), and (k-1)(k-2)/2 interaction terms.

Now we turn to two tests that are both based on the idea that there is some possible break in the sample, with which the full sample can be split into two groups: one before and one after the break. In the first break test, we write a model for the first and a model for the second group. We write n1 for the number of observations in the first group, and n2 for the number of observations in the second group. In both groups, we have similar models, and the only difference is that the parameter beta changes from beta1 to beta2.

These two models can be written in one framework using vector and matrix notation. We stack the y1 and y2 vectors, make a block structure of X1 and X2 to get a new, larger X matrix, and also stack the beta and disturbance vectors. We can simply call the stacked y a new vector y, and the new X matrix some new X, such that linear regression can be used again to estimate beta1 and beta2.

The idea of the Chow break test is that we test a restricted set-up, where beta1=beta2, against this unrestricted set-up. Under the null, this is an F-test, which is given here. As usual, e denotes residuals; the subscript r stands for the residuals from the restricted model, and u for the residuals from the unrestricted model. The degrees of freedom are k, the number of restrictions imposed in the restricted model, and n-2k, which is the number of observations minus the total number of parameters in the unrestricted model.
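The Chow break test can be sketched in a few lines on simulated data; here I deliberately build a break into the slope so the test should reject. The data-generating choices and names are assumptions of the sketch, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, k = 60, 40, 2

# Two groups with similar models but a genuine break: the slope changes.
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
y1 = X1 @ np.array([1.0, 1.0]) + rng.normal(size=n1)
y2 = X2 @ np.array([1.0, 3.0]) + rng.normal(size=n2)

def ssr(X, y):
    """Sum of squared OLS residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Restricted model: one beta for the pooled sample (beta1 = beta2).
S0 = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
# Unrestricted model: separate regressions per group; the SSRs simply add up.
S1, S2 = ssr(X1, y1), ssr(X2, y2)

n = n1 + n2
F = ((S0 - S1 - S2) / k) / ((S1 + S2) / (n - 2 * k))  # ~ F(k, n-2k) under the null
```

With the built-in slope change from 1 to 3, the statistic comes out far above conventional critical values, so the null of no break is rejected.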
In this particular case, it turns out that the unrestricted residuals can be split into two groups: the residuals from the first group and the residuals from the second group. In fact, the residuals from the first group are based only on data for the first group, and similarly for the second group. With this result, the unrestricted sum of squared residuals is simply the sum of squared residuals in the first group plus the sum of squared residuals in the second group. For notational convenience we write this as S1 plus S2. Now the F-test can be expressed as at the bottom of the slide here, where we have written S0 for the sum of squared residuals in the restricted model.

The Chow break test assumes that only the parameter vector beta changes across the two samples, but the rest of the model structure remains the same. The second break test is a variant of the Chow break test and relaxes this assumption. The test equation is given here. Now, I invite you to test your familiarity with our notation and examine how many parameters there are to be estimated in the above specification.

The sum runs over n2 elements. The dummy D(ji) is defined to be one if observation i is equal to j and zero otherwise. There is thus exactly one dummy for each of the n2 observations in group 2. In total there are the usual k parameters in the vector beta plus n2 gamma parameters. Because of these dummies, the fit in the second sample will be perfect. The residuals for all observations i in the second group are equal to 0, as any deviation of x(i)'beta from y(i) is already captured by gamma(i). The F-test for the joint significance of all the gammas then simplifies to the expression given at the bottom here. Compared to the Chow break test, the S2 term drops out, as the second sum of squared residuals is equal to 0. The number of restrictions imposed in the restricted model is n2, as all the gammas are set equal to 0.
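Because the dummies make the fit in the second group perfect, the Chow forecast statistic needs only S0 and S1, which keeps a sketch very short. Here I simulate a stable relationship (same beta in both groups), so the null should hold; the data and names are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, k = 80, 10, 2

# A stable relationship: both groups share the same beta, so the null holds.
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
beta = np.array([1.0, 2.0])
y1 = X1 @ beta + rng.normal(size=n1)
y2 = X2 @ beta + rng.normal(size=n2)

def ssr(X, y):
    """Sum of squared OLS residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Restricted (pooled) model over all n1+n2 observations.
S0 = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
# Unrestricted model: the n2 dummies give a perfect group-2 fit, so S2 = 0
# and only the group-1 regression contributes.
S1 = ssr(X1, y1)

F = ((S0 - S1) / n2) / (S1 / (n1 - k))  # ~ F(n2, n1-k) under the null
```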
The degrees of freedom in the denominator are equal to the total number of observations n, which is n1+n2, minus the total number of parameters in the unrestricted model, which is n2+k. Thus the denominator degrees of freedom are n1-k. If the test statistic is large, the second group of observations does not fit the pattern of the first group well, and we reject the null of constant model structure. The interpretation is that the test examines whether the relationship in the first sample can be used to forecast the relationship in the second sample, hence the name Chow forecast test.

We always do our very best to specify a good model, but of course this is not always easy. We should always perform checks on the chosen model specification, for example by studying the residuals. We often assume, for example in the t- and F-tests, that the disturbances are normally distributed. We can test the validity of this assumption by studying the distribution of the residuals. Ideally, this distribution should resemble the nice bell-shaped curve of the normal distribution, which is symmetric and does not have thick tails.

The test for normality is based on the third and fourth moments, which are the skewness and kurtosis that were discussed in the building blocks. If the skewness and kurtosis of the residuals differ too much from those of the normal distribution, which are zero and three respectively, we reject the null that the disturbances are normally distributed. The Jarque-Bera test is based on this idea and is given on this slide. If normality is rejected, further inspection of the model is typically required.

Now, I invite you to do the training exercise to practice the topics of this lecture. You can find this exercise on the website. This concludes our lecture on model evaluation.
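As a closing illustration of the Jarque-Bera test discussed above, here is a minimal sketch of the statistic JB = n/6 * (S^2 + (K-3)^2/4), applied to simulated residuals. The simulated inputs and names are my own assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
e = rng.normal(size=500)  # stand-in for regression residuals

def jarque_bera(e):
    """JB = n/6 * (S^2 + (K-3)^2/4), with S the sample skewness and K the
    sample kurtosis of the residuals; ~ chi-squared(2) under normality."""
    n = len(e)
    d = e - e.mean()
    s = np.sqrt(np.mean(d**2))
    S = np.mean(d**3) / s**3          # skewness: 0 for the normal
    K = np.mean(d**4) / s**4          # kurtosis: 3 for the normal
    return n / 6 * (S**2 + (K - 3) ** 2 / 4)

JB = jarque_bera(e)                                  # normal residuals
JB_fat = jarque_bera(rng.standard_t(3, size=2000))   # fat-tailed residuals
```

For the fat-tailed t(3) residuals the sample kurtosis is well above 3, so JB_fat far exceeds the 5% chi-squared(2) critical value of 5.99 and normality is rejected.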