0:12

In 1972, as part of a study on gender discrimination, 48 male bank

supervisors were each given the same personnel file, and asked to judge whether

the person should be promoted to a branch manager job that was described as routine.

The files were identical except that half of the supervisors had files showing

the person was male while the other half had files showing the person was female.

It was randomly determined which supervisors got male applications and

which got female applications.

Of the 48 files reviewed, 35 were promoted.

The study is testing whether females are unfairly discriminated against.

Let's take a look at the data.

The percentage of males promoted is 21 out of 24, roughly 88%.

And the percentage of females promoted is 14 out of 24, roughly 58%.

So there's a considerable difference between the proportions of males and

females promoted in this study.
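
If you'd like to check that arithmetic yourself, here is a minimal sketch in Python. The counts come straight from the study; the variable names are just ones picked for this illustration.

```python
# Counts reported in the study: 21 of 24 male files and 14 of 24 female files promoted.
promoted = {"male": 21, "female": 14}
group_size = 24

# Promotion rate in each group.
rates = {group: count / group_size for group, count in promoted.items()}

# Observed difference in promotion rates (male minus female).
observed_diff = rates["male"] - rates["female"]

print(f"male: {rates['male']:.1%}, female: {rates['female']:.1%}")
print(f"observed difference: {observed_diff:.1%}")
```

The difference comes out just under 30%, which is the number the rest of the discussion refers to.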

1:10

There are two possible explanations as to what might be going on in this study.

And these are our two competing claims.

One, there is nothing going on.

Promotion and gender are independent.

There's no gender discrimination, and

the observed difference in proportions is simply due to chance.

This is our null hypothesis.

And two, there is something going on.

Promotion and gender are dependent on each other.

There is gender discrimination, and

the observed difference in proportions is not due to chance.

This is the alternative hypothesis.

2:08

If the data were likely to have occurred under the assumption that the null

hypothesis were true, then we would fail to reject the null hypothesis, and

state that the evidence is not sufficient to suggest that the defendant is guilty.

Note that when this happens, the jury returns with a verdict of not guilty.

The jury does not say the defendant is innocent,

just that there is not enough evidence to convict.

The defendant may in fact be innocent, but the jury has no way of being sure.

Said statistically, we fail to reject the null hypothesis.

We never declare the null hypothesis to be true,

because we do not know and cannot prove whether it's true or not.

Therefore, we also never say that we would accept the null hypothesis.

If the data were very unlikely to have occurred under the null hypothesis, then the evidence raises

more than a reasonable doubt in our minds about the null hypothesis, and hence

we reject the null hypothesis in favor of the alternative hypothesis of guilty.

In a trial, the burden of proof is on the prosecution.

In a hypothesis test, the burden of proof is on the unusual claim.

The null hypothesis is the ordinary state of affairs, the status quo.

So it's the alternative hypothesis that we must consider unusual, and for

which we must gather evidence.

3:30

So to recap,

we start with a null hypothesis that represents the status quo.

We also have an alternative hypothesis that represents our research question,

in other words, what we're testing for.

We conduct a hypothesis test under the assumption that the null hypothesis is

true, either via simulation or using theoretical methods.

If the test results suggest that the data do not provide convincing evidence for

the alternative hypothesis, we stick with the null hypothesis.

If they do, then we reject the null hypothesis in favor of the alternative.

4:05

So if you have a deck of playing cards handy,

you can actually conduct the simulation yourself with me.

Remember, the objective is to conduct a simulation under the assumption that

the null hypothesis is true.

In other words, assuming there is no gender discrimination,

and that any differences in promotion rates that are observed

are simply due to chance.

First, we're going to let a face card represent a not-promoted file, and

a non-face card represent a promoted file.

We're going to start by setting aside the jokers.

4:40

There are 52 cards in a deck, however, only 48 files in our experiment.

To simulate the experiment,

we need to remove some cards to hit a total sample size of 48.

We take cards out in such a way that, if we let

a face card represent a not-promoted file and a non-face card represent a promoted file,

the distribution of face and

non-face cards matches the distribution of the not-promoted and promoted files.

So, we're also going to take out three aces.

5:46

The same number as the observations in our study.

Number cards represent files that were promoted, and there are 35 of them.

And face cards represent files that were not promoted, and there are 13 of those.

Then, we shuffle the cards and

deal them into two groups of size 24, representing males and females.

Note that random shuffling is what simulates this idea of

leaving things up to chance.
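
If you don't have a deck handy, one shuffle-and-deal can be sketched in Python like this. It's only a sketch: the function name `deal_once` and the seed are choices made for this illustration, and a list of labels stands in for the 35 number cards and 13 face cards.

```python
import random

def deal_once(rng):
    """Shuffle 48 files and deal two piles of 24; return male rate minus female rate."""
    # 35 promoted files (number cards) and 13 not-promoted files (face cards).
    files = ["promoted"] * 35 + ["not promoted"] * 13
    rng.shuffle(files)  # leaving things up to chance

    # The first 24 files play the male pile, the remaining 24 the female pile.
    male_pile, female_pile = files[:24], files[24:]
    male_rate = male_pile.count("promoted") / 24
    female_rate = female_pile.count("promoted") / 24
    return male_rate - female_rate

rng = random.Random(0)  # arbitrary seed, only so the run can be repeated
diff = deal_once(rng)
print(f"simulated difference: {diff:.1%}")
```

Just like with the cards, a different shuffle will usually give a different difference.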

6:38

Let's go through the results of my simulation together.

If you have been following along with your own deck of cards,

you might have different results than mine since the shuffling and

splitting into two piles was done completely randomly.

Since we're randomly splitting the promoted files into two groups, we would

expect to see no difference between the proportions of male and female promotions,

in other words, between the proportions of number cards in the male and female piles.

That being said, the simulated difference may not be exactly zero.

In this case, we had 18 number cards in the male pile,

which yields a 75% promotion rate among the males.

And there are 17 number cards in the female pile,

yielding a 70.8% promotion rate.

The difference between the simulated promotion rates is what we want to

keep track of.

We expect this number to be zero on average, but we also expect it to vary, and

we want to know how much it varies so that we can compare our original difference of

about 30% to the distribution of differences simulated under the assumption of

independence between promotion decisions and gender.

In this case, we calculated a difference of 4.2%.

So, we note that, before we proceed to the next simulation.

8:28

It doesn't really matter which one you're calling male versus female.

So let's just say this is our male pile, and this is our female pile.

The next step is going to be to determine how many files were promoted in each pile,

which means we need to count the number of number cards in each pile.

Among the males, I'm counting one, two, three,

four, five, six, seven, eight,

nine, ten, 11, 12, 13, 14, 15, 16, 17.

So we have 17 out of 24 males promoted,

which should leave 18 out of 24 females promoted.

In the next step we need to calculate the proportions and take the difference and

note that on our dot plot.

And we would repeat this many, many times to build a simulation distribution.

So how do we ultimately make a decision?

If the results from the simulations look like the data,

then we decide that the difference between the proportions of promoted files,

between males and females, could plausibly be due to chance,

and that we have no convincing evidence that promotion and gender are dependent.

If, on the other hand, the results from the simulations do not look like the data,

then we decide that the observed difference in the promotion rates

was unlikely to have happened just by chance, and

that it can be attributed to an actual effect of gender.

In other words, we conclude that these data provide evidence of

a dependency between promotion decisions and gender.

If we repeat the simulation many times, and record the simulated differences in

proportions of males and females promoted, we can build a distribution like this one.

For example, here we have a dot plot of the distribution of

the simulated differences in promotion rates based on a hundred simulations.

While we showed earlier how to simulate this experiment using playing cards,

we should note that the task of the simulation is best left up to computation.

It's faster and less prone to errors.
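
Here is one way you might hand the repetition over to a computer, as a rough sketch in Python. The function name `simulate_differences`, the seed, and the text-based dot plot are all assumptions made for this illustration, not the software used to produce the plot in the lecture.

```python
import random
from collections import Counter

def simulate_differences(n_sims, seed=None):
    """Repeat the shuffle-and-deal n_sims times; return the simulated differences."""
    rng = random.Random(seed)
    files = [1] * 35 + [0] * 13  # 1 = promoted, 0 = not promoted
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(files)
        male_promoted = sum(files[:24])    # first pile of 24
        female_promoted = sum(files[24:])  # second pile of 24
        diffs.append((male_promoted - female_promoted) / 24)
    return diffs

diffs = simulate_differences(100, seed=1)  # a hundred simulations, arbitrary seed

# A crude text dot plot: one star per simulation at each difference value.
counts = Counter(round(d, 3) for d in diffs)
for value in sorted(counts):
    print(f"{value:+.3f} {'*' * counts[value]}")
```

Each row of stars is one possible difference, and the longest rows should sit near zero.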

The distribution is centered at zero, which we can also think about as the null value,

since according to the null hypothesis,

there should be no difference between the promotion rates of males and females,

yielding a difference of zero.

We can see from the distribution of the simulated differences in promotion rates

that it is very rare to get a difference as high as 30%,

the observed difference from the original data,

if in fact gender does not play a part in promotion decisions.

The low likelihood of this event, or a difference even more extreme,

suggests that promotion decisions may not be independent of gender, and so

we would reject the null hypothesis.

Our conclusion is then that these data show convincing evidence of an association

between gender and promotion decisions made by male bank supervisors.

11:33

Then we simulated the experiment.

Assuming that the null hypothesis were true, we evaluated the probability of

observing an outcome at least as extreme as the one observed in the original data.

And since this probability was low,

we decided to reject the null hypothesis in favor of the alternative.

The probability of observing data

at least as extreme as the data observed in the original study,

under the assumption that the null hypothesis is true, is called the p-value,

one of the commonly used criteria for

making decisions between competing hypotheses.
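
To make that definition concrete, here is a hedged sketch in Python of how a p-value could be estimated by simulation. It assumes "at least as extreme" means at least 21 promoted files landing in the male pile, which is the same as a difference at least as large as the observed one because the total of 35 promoted files is fixed. The function name `simulated_p_value`, the number of simulations, and the seed are choices made for this illustration.

```python
import random

def simulated_p_value(observed_male_promoted, n_sims=10_000, seed=None):
    """Estimate the share of shuffles at least as extreme as the observed data."""
    rng = random.Random(seed)
    files = [1] * 35 + [0] * 13  # 1 = promoted, 0 = not promoted
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(files)
        male_promoted = sum(files[:24])  # promoted files dealt to the male pile
        # Assumption: only differences favoring males count as extreme (one-sided).
        if male_promoted >= observed_male_promoted:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

p_value = simulated_p_value(21, seed=2)  # 21 male promotions in the original data
print(f"estimated p-value: {p_value:.3f}")
```

The estimate moves a little from run to run, but it should stay small, which matches what we saw in the dot plot.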

We will continue our discussion on p-values and

hypothesis tests in future units as well and learn various methods for

conducting hypothesis tests for various types of data.