A study presented at the American Academy of Neurology states that more than 40% of retired National Football League players had signs of traumatic brain injury based on sensitive MRI scans. When you read a study like this, you see that the researchers are making a claim about a population proportion based on a sample. If we were to challenge this finding, we would be conducting a hypothesis test about the population proportion. The steps are no different from what we have seen when testing a hypothesis about the mean. There are a few differences in notation, as well as in how to calculate the test statistic, and that is what we will learn in this lesson.
So let's start with the first step, which is stating the null and the alternate hypotheses. Using the claim made about NFL players, let p represent the proportion of players who had signs of traumatic brain injury. Looking at the current belief, which is that more than 40% have this problem, the correct notation would be p is greater than 0.40. But I can't make this the null hypothesis. Why not? Do you remember? We have to have the equal sign in the null hypothesis. When our current belief doesn't allow for that, it becomes the alternate hypothesis. And the null will be its complement, which is p is less than or equal to 0.40. These hypotheses lead to a one-sided test. There are a lot of legal implications of such studies, and studies like these will not go unchallenged. But challenging it means taking yet another sample, and possibly using different tests, in order to reject the findings.
Every month, the Bureau of Labor Statistics releases many reports. One such report is our nation's unemployment rate. For the month of March 2016, the average unemployment rate was reported as 5%. Some economists disagree with how the government reports this number, and they think it's actually worse, meaning that it's higher than what is being reported.
Now, let p represent the proportion of unemployed, and the government is saying that for March it was 5%. The challenge is that the proportion of unemployed is higher. So it's easier to write the alternate hypothesis first, which is p is greater than 5%, and then the null, which is p is less than or equal to 5%. Once again, these hypotheses lead to a one-sided test.
You may find that one-tailed tests are a lot more common than two-tailed tests, but let me show you an example of a two-tailed test. Dextrose is a prescription medication which is used to increase a person's blood sugar. One form of the medication is a 30% dextrose injection. Imagine you were in charge of quality control for this. What would you like to see if you took a sample and measured the concentration of its active ingredient? Let p represent the percentage of the active ingredient in each vial. In this case the label says 30%. You want it to be 30%, right? No more, no less. Anything different can result in disastrous consequences. So the null hypothesis should state that what is in the vial matches the label, so p is equal to 30%. And the samples are taken to test this hypothesis, which means the alternate is p is not equal to 30%. This will be a case of a two-tailed test.
Now consider the following example. An HR director is concerned about her aging workforce and what will happen to the company if these employees decide to retire at about the same time. The manager believes that at least 18% of the employees are at or near retirement age. To check the validity of this belief, a sample of 500 employees was examined, which showed the proportion to be 15%. Testing at the 5% level of significance, is the manager right? Now, let's help this manager analyze the data. Begin by stating the hypotheses. The manager believes that at least 18% are at or near retirement age. This translates to a null hypothesis of p is greater than or equal to 18%; again, the belief is that at least 18% are at or near retirement age.
And the alternate is p is less than 18%. Testing at the 5% level of significance means that alpha is 0.05. Now we are ready to calculate the p-value for our sample, which tells us the chance of getting a sample that differs from the hypothesized proportion by as much as this one if the null hypothesis is true. This is based on the sample size of 500 and the sample proportion, p hat, of 15%. Assuming the null is true, the sampling distribution of the sample proportion is approximately normal, centered at p of 0.18, or 18%. This is the hypothesized proportion, which is denoted as p0.
Next is the standard error of these sample proportions. Again, if we sampled over and over, the distribution of sample proportions would have a spread, which is its standard error. The standard error is calculated based on this equation, which we learned in the previous course when we learned about the central limit theorem: the square root of p0 times 1 minus p0, divided by n. Substituting the values, with 18% as p0, the hypothesized proportion, and n of 500, we get the value of 0.0172.
Now we need to find the z value, which represents how many standard errors this sample proportion is from the hypothesized proportion of 18%. For testing the population proportion, the test statistic z is calculated as follows: p hat minus p0, divided by the standard error. Substituting the values, the sample proportion p hat is 0.15, the hypothesized proportion p0 is 0.18, and the denominator is the standard error we just calculated on the last slide to be 0.0172. This gives us a z value of -1.744. So our sample proportion falls 1.744 standard errors away, on the left side, from the 0.18 hypothesized proportion.
What are the chances of finding a sample like we did? The answer to this will be the p-value. To find the p-value, we will use the NORM.S.DIST function in Excel, which has two arguments, z and cumulative. We just calculated z to be -1.744. Cumulative is true for us, and we enter it as 1. And we get a p-value of 0.041.
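The retirement-age calculation can also be reproduced outside Excel. Here is a minimal sketch in Python, assuming the standard error is the square root of p0 times 1 minus p0 over n, which is consistent with the 0.0172 quoted in the lecture. The function name one_proportion_z_test is our own and is not part of the lesson; NormalDist().cdf from the standard library plays the role of NORM.S.DIST with cumulative set to true.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n):
    """Return (standard error, z, lower-tail area) for a one-proportion z test.

    Hypothetical helper for illustration; the standard error uses the
    hypothesized proportion p0, because the null is assumed to be true.
    """
    se = sqrt(p0 * (1 - p0) / n)   # spread of sample proportions under the null
    z = (p_hat - p0) / se          # standard errors between p_hat and p0
    lower = NormalDist().cdf(z)    # area to the left of z, like NORM.S.DIST(z, 1)
    return se, z, lower


# HR example from the lecture: 500 employees, 15% observed, 18% hypothesized.
se, z, p_value = one_proportion_z_test(p_hat=0.15, p0=0.18, n=500)
print(round(se, 4), round(z, 3), round(p_value, 3))
```

Carrying full precision gives a z of about -1.746 rather than -1.744, because the lecture rounds the standard error to 0.0172 before dividing. The p-value is about 0.04 either way, so the comparison with alpha of 0.05 is unaffected.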
Now that we have completed steps 1 through 3, we are ready to make a decision. Since the p-value of 0.041 is less than alpha of 0.05, we reject the null hypothesis. Based on our sample, we can't support the manager's belief that at least 18% of the employees are at or near retirement age.
Now let's practice together. A drug company claims that only 2% of patients suffer any major side effects from taking one of their drugs. You are asked to check the validity of this claim at the 1% level of significance. You start by surveying 1,000 patients and find that 26 have reported major side effects. What is your conclusion? Start by stating the hypotheses and the value of alpha. We start by defining p as the proportion of patients who had major side effects. Based on what you see, the claim is that p is 2%. But we are not just interested in saying that it's not 2%. We are interested in knowing if the drug company is lowballing this figure. In other words, we are investigating whether the proportion is higher than 2%, which means the alternate to the claim is that p is greater than 2%. Thus the null is p is less than or equal to 2%, and alpha is 0.01.
So now let's move on to finding the p-value. In our study, 26 out of 1,000 reported major side effects, which means our sample proportion is 2.6%. Now we can calculate the z value, and that is 1.36. Since our z value is positive, Excel returns the probability as 0.913. Again, this is the area to the left of z that is being returned. We are interested in the area in the tail. That is the p-value, the probability of finding a sample like ours, that many or more standard errors away from the center of the distribution. In this case, that will be 1 - 0.913, or 0.087.
So now that you have this, you can make a decision. What is your conclusion? Reject or not reject the null hypothesis? Since the p-value is larger than 0.01, we will not reject the null hypothesis. So we retain the belief that no more than 2% of people taking this medication will suffer major side effects.
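The drug example is an upper-tail test, so the cumulative area has to be subtracted from 1. As a rough sketch of those steps in Python, under the same assumption that the standard error is built from the hypothesized proportion p0, and with variable names that are ours rather than the lecture's:

```python
from math import sqrt
from statistics import NormalDist

# Figures from the practice problem in the lecture.
n, side_effects = 1000, 26
p0, alpha = 0.02, 0.01

p_hat = side_effects / n              # sample proportion, 0.026
se = sqrt(p0 * (1 - p0) / n)          # standard error under the null
z = (p_hat - p0) / se                 # test statistic

area_left = NormalDist().cdf(z)       # what NORM.S.DIST(z, 1) returns
p_value = 1 - area_left               # upper tail, since the alternate is p > 2%

reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject null: {reject_null}")
```

Without rounding z to 1.36 first, the p-value comes out at about 0.088 instead of 0.087. It is still well above alpha of 0.01, so the decision not to reject the null is the same.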
Before we end this topic, I want to make you aware of a few things. Statistical significance does not mean that you have made an important or meaningful discovery. Alpha sets the probability of a type one error, rejecting a null hypothesis which is actually true. Choose this value keeping both errors in mind. Remember, reducing alpha will increase the probability of a type two error, not rejecting a null hypothesis when it's false.
Not rejecting the null hypothesis does not mean it is true. Remember Lance Armstrong: there was suspicion around him using drugs, and it took more than a decade to actually reject the null hypothesis, in his case the assumption that he was innocent and not using anything. The reason scientists were not able to reject this innocence hypothesis was that they didn't have tests good enough to detect the illegal performance enhancers he was using.
The size of the sample affects the p-value of a test. With enough data, a trivial difference from the null hypothesis leads to a statistically significant outcome, and with too little data, a meaningful difference can fail to reach significance. Which means not every statistically significant result has substantive importance.
If you really understand what is going on when we do hypothesis testing, you can design your questions and sampling better, which will yield more meaningful results. Understanding the principles of hypothesis testing also means that you will be able to legitimately question studies that have failed to do these tests correctly and thus produced erroneous insights. The more you learn about which factors influence the strength of the statistical methods used, the more likely it is that you have investigated the right questions and arrived at the right solution.
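The earlier point that sample size affects the p-value can be illustrated with a small sketch. The numbers here are made up for illustration and do not come from the lecture: a hypothesized proportion of 50%, an observed proportion of 51%, and four hypothetical sample sizes. The helper upper_tail_p_value is our own name.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(p_hat, p0, n):
    """Upper-tail p-value for a one-proportion z test (illustrative helper)."""
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return 1 - NormalDist().cdf(z)


# Hypothetical values: the same one-point difference at every sample size.
p0, p_hat = 0.50, 0.51
sample_sizes = [100, 1_000, 10_000, 100_000]

p_values = [upper_tail_p_value(p_hat, p0, n) for n in sample_sizes]
for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>7}: p-value = {p:.4f}")
```

With these made-up numbers, the one-point difference is nowhere near significant at n of 100 but clears the 5% level by n of 10,000, even though the difference itself has not changed.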