Welcome. In this lecture, you will learn more about statistical testing. You will see how to use observations to test a hypothesis on a parameter. So what do the terms hypothesis, critical value and p-value mean? This lecture discusses these topics.

We finished lecture S1 with estimates for the mean and variance of stock market returns. We assumed an IID normal distribution for a set of 26 yearly returns on the stock market and calculated a sample mean of 9.6% and a sample standard deviation of 17.9%. Suppose that you consider investing in the stock market. You then expect to earn a return equal to mu percent every year. Of course, you hope to make a profit. However, a friend claims that the expected return on the stock market is 0. Perhaps your friend is right. How can you use a statistical test to evaluate this claim?

A statistical hypothesis is an assertion about one or more parameters of the distribution of a random variable. Examples are that the mean mu is equal to 0, that it is nonnegative or larger than 5%, or that the standard deviation sigma is between 5 and 15%. We want to test one hypothesis, the null hypothesis, against another one, the alternative hypothesis. We denote the null hypothesis by H0 and the alternative by H1. So H0 can be mu = 0 and H1 that mu is unequal to 0. In this lecture, I focus on null hypotheses that are equality restrictions, though the methodology applies more generally.

A statistical test uses the observations to determine the statistical support for a hypothesis. It needs a test statistic t, which is a function of the vector of observations y, and a critical region C. If the value of the test statistic falls in the critical region, we reject the null hypothesis in favor of the alternative; if not, we say that we do not reject the null hypothesis. Note that we do not say that we accept the null hypothesis.
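The reject/do-not-reject decision rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the sample values and the critical value c are made up for the example.

```python
import statistics

def two_sided_decision(sample, c):
    """Reject H0: mu = 0 when the sample mean falls below -c or beyond c."""
    m = statistics.mean(sample)
    return "reject H0" if m < -c or m > c else "do not reject H0"

# Made-up returns and a made-up critical value c, purely for illustration:
# a sample mean far from 0 falls in the critical region, one close to 0 does not.
print(two_sided_decision([0.21, 0.15, 0.30, 0.18], c=0.10))    # reject H0
print(two_sided_decision([0.02, -0.01, 0.01, -0.03], c=0.10))  # do not reject H0
```

Note that "do not reject" is the careful wording from the lecture: the function never returns "accept H0".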
Suppose that we want to test the null hypothesis that mu is equal to 0 against the alternative that it is unequal to 0, with the variance sigma-squared known. For a test statistic we use the sample mean. We define the critical region as the range below minus c and beyond c, with c a positive constant. The constant c is called the critical value. If the sample mean falls below minus c or beyond c, we reject the null hypothesis. The sample mean is then too far away from 0 for the null hypothesis to be true. If it falls between minus c and c, we do not reject it.

The result of a statistical test is always one of four possibilities, as you can see on the slide. The null hypothesis being true or false is in the columns and the test outcome is in the rows. If H0 is false and the test rejects it, we call the outcome a true positive. If H0 is true and the test does not reject it, we call it a true negative. If H0 is true but the test rejects it, the outcome is a false positive or a type I error. If H0 is false but the test does not reject it, the outcome is a false negative or a type II error.

The probability of a type I error, that is, the probability of rejecting while the null hypothesis is true, is called the size of the test or the significance level. The probability of rejecting while the null is false is called the power of the test. We prefer tests with small size and large power. A smaller critical region means that we need larger deviations from the null hypothesis for a rejection, so the significance level decreases. However, this also means that the power of the test goes down. So in determining the critical region, we have to make a trade-off between size and power.

Let's apply this to our previous example with n observations. You test whether mu is equal to 0 based on the sample mean. You know from building block S1 that the sample mean follows a normal distribution with mean mu and variance sigma-squared divided by n.
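The size-power trade-off can be made concrete numerically. A minimal sketch using only the Python standard library: the standard normal CDF is built from the error function, sigma = 17.9% and n = 26 are borrowed from the lecture's stock market numbers (with sigma treated as known), while the critical values and the alternative mean mu = 10% are arbitrary choices for illustration.

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def size_of_test(c, sigma, n):
    """P(reject | mu = 0): the sample mean M ~ N(0, sigma^2/n), reject when |M| > c."""
    se = sigma / math.sqrt(n)
    return phi(-c / se) + (1.0 - phi(c / se))

def power_of_test(c, mu, sigma, n):
    """P(reject | true mean mu): now M ~ N(mu, sigma^2/n)."""
    se = sigma / math.sqrt(n)
    return phi((-c - mu) / se) + (1.0 - phi((c - mu) / se))

# Enlarging c (shrinking the critical region) lowers the size,
# but the power against mu = 10% drops as well.
sigma, n = 0.179, 26
for c in (0.05, 0.10):
    print(c, round(size_of_test(c, sigma, n), 4),
          round(power_of_test(c, 0.10, sigma, n), 4))
```

Running the loop shows both columns decreasing as c grows, which is exactly the trade-off described above.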
The critical region is below minus c and beyond c. I ask you to calculate the size of the test. You can see the answer on the slide. You need to calculate the probability of rejection, conditional on the null hypothesis being true. This is the probability that m is at most minus c, plus the probability that m is at least c, conditional on mu being 0. We transform m to a standard normal distribution and substitute that mu equals 0. We use the fact that a normal distribution is symmetric about its mean to find the result. The calculation of the power for a given value of mu unequal to 0 proceeds similarly.

Three points deserve some more attention here. First, in the calculations we move to the standard normal distribution. Usually, we formulate test statistics such that they have a standard distribution that does not depend on unknown parameters. For example, we test with the standardized mean. Second, we call the test of the example two-sided. The null hypothesis is rejected for very negative or very positive values of m, so large deviations on the left-hand or the right-hand side of the distribution lead to rejection. The test of mu equal to 0 against the alternative of mu being positive has a critical region beyond c. This test is one-sided. Third, we mostly specify the size of the test first and then determine the corresponding critical value c. Typical values for the size are 1, 5 and 10%.

It is also possible to calculate the so-called p-value of the value of the test statistic. The p-value is the minimum size for which the value of the statistic leads to rejection. In the test of mu equal to 0 against mu unequal to 0, the standardized mean equals 2.1. If you fix the standardized critical value at 2.1 and use the distribution under the null hypothesis, the size equals 3.6%. Many software packages can compute these probabilities. Now, a question for you. As you can verify, for all sizes larger than 3.6% you reject H0, whereas for all sizes smaller than 3.6% you do not reject.
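The 3.6% can be verified with a one-line computation. This is a minimal sketch using only the Python standard library; the standard normal CDF is obtained from the error function.

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(z):
    """Minimum size at which a standardized mean z leads to rejection:
    P(|Z| >= |z|) under the standard normal null distribution."""
    return 2.0 * (1.0 - phi(abs(z)))

print(round(two_sided_p_value(2.1), 3))  # 0.036, the 3.6% from the lecture
```

For the one-sided test against mu positive, you would instead use the single tail 1 - phi(z).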
We can also say that the mean is significantly different from 0 at the 5% level, but not at the 1% level. In the example so far, the variance sigma-squared was given. In reality, this is mostly not the case. It means that we cannot calculate the standardized mean, because we do not know the value of sigma-squared. Instead, we use the unbiased estimator s-squared. The resulting statistic t is called the t-statistic. The t-statistic does not follow a normal distribution, because s in the denominator is a random variable. Instead, it follows a t-distribution. I show on the slides that the t-statistic is the ratio of a standard normal random variable and the square root of a chi-square random variable divided by its degrees of freedom n - 1. Moreover, these two variables are independent. As discussed in building block B2, this ratio follows a t-distribution with n - 1 degrees of freedom.

Let's consider the example now with the variance unknown. Suppose that the t-statistic equals 2.1 and is based on 15 observations. The t-statistic then follows the Student's t-distribution with 14 degrees of freedom. The p-value based on this distribution equals 5.4%. So the null hypothesis, mu = 0, is rejected at a significance level of 10%, but not at 5% or 1%. Under the assumption that the variance is known, the p-value was 3.6%. If the variance has to be estimated, the uncertainty is larger, and this leads to larger p-values. When the uncertainty is larger, we need larger critical values for a fixed size to reject the null hypothesis.

We can also test hypotheses on the variance. Suppose that we want to test that the true variance is equal to a specific value sigma-squared_0 versus the variance being larger. As a test statistic we consider the standardized variance estimator. This statistic does not contain any unknown parameters and has a chi-square distribution with n - 1 degrees of freedom. So we use this distribution to calculate the size and the p-value of the test.
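The 5.4% can be reproduced without a statistics package by integrating the Student's t density numerically. This is an illustrative sketch, not the lecture's method: the density formula is standard, but the integration bound and step count are arbitrary choices that happen to give enough accuracy here.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_sf(t, df, upper=60.0, steps=20000):
    """P(T > t) by Simpson integration of the density over [t, upper];
    the tail beyond upper is negligible for moderate df."""
    h = (upper - t) / steps
    s = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

# Lecture values: t = 2.1 with 14 degrees of freedom, two-sided test.
print(round(2 * t_sf(2.1, 14), 3))  # 0.054, larger than the 0.036 under known variance
```

Comparing with the normal-based 3.6% makes the lecture's point visible: the fatter tails of the t-distribution, reflecting the extra uncertainty from estimating sigma, enlarge the p-value.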
To test whether the variances of two independent distributions are equal, we can also use the variance estimators. We use the result from building block P2 that the ratio of two independent chi-square distributed random variables, each divided by its degrees of freedom, follows an F-distribution. The test statistic is formed by the ratio of the estimated variances divided by the true variances. Under the null hypothesis that the true variances are equal, these cancel from the test statistic, which becomes the ratio of the two estimators. It follows an F-distribution with n_2 - 1 and n_1 - 1 degrees of freedom.

Let's finish with the stock market example. The estimated mean and standard deviation were 9.6 and 17.9%. The t-statistic for the mean being equal to 0 equals 2.75. The one-sided p-value equals 0.54%. So for all significance levels beyond 0.54%, we reject the null hypothesis in favor of the mean being positive. The standard deviation of the stock market return is a measure of the risk of investing in the stock market. Suppose you want to limit your risk, measured by the standard deviation, to 25%. You test H0 that the standard deviation is equal to 25% against the alternative that it is smaller. How would you decide? You can see the answer on the slide. The test statistic has a value of 12.74, which falls inside the critical region from 0 to 14.61. So we reject the null hypothesis that the standard deviation equals 25%. The p-value of the test equals 2.1%.

With this question, we finish our building block on testing. I invite you to do the training exercise to practice the topics of this lecture. You can find it on the website.
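The variance test in the stock market example can be checked numerically. This sketch builds the chi-square CDF from the power series of the regularized lower incomplete gamma function, using only the Python standard library; the number of series terms is an arbitrary choice that is more than enough for these arguments.

```python
import math

def lower_gamma_p(a, x, terms=400):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0:
        return 0.0
    total, term = 0.0, 1.0 / a
    for n in range(1, terms):
        total += term
        term *= x / (a + n)
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """CDF of a chi-square distribution with df degrees of freedom."""
    return lower_gamma_p(df / 2.0, x / 2.0)

# Lecture values: statistic 12.74 with n - 1 = 25 degrees of freedom.
# The alternative is sigma < 25%, so the test rejects in the lower tail
# and the p-value is the lower-tail probability.
print(round(chi2_cdf(12.74, 25), 3))   # close to the 2.1% reported on the slide
print(round(chi2_cdf(14.61, 25), 3))   # 0.05: 14.61 is the 5% critical value
```

Since 12.74 lies below the 5% critical value 14.61, the statistic falls inside the critical region, matching the rejection on the slide.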