Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

34 ratings

From the lesson

Two Binomials

In this module we'll be covering some methods for looking at two binomials. This includes the odds ratio, relative risk and risk difference. We'll be discussing mostly confidence intervals in this module and will develop the delta method, the tool used to create these confidence intervals. After you've watched the videos and tried the homework, take a crack at the quiz!

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay. So, for the relative risk, our PA hat worked out to be 11 over 20, which is 0.55. PB hat worked out to be 5 over 20, which is 0.25. So the relative risk is 0.55 over 0.25. I always think it's a good habit to write that we're comparing A over B in the relative risk, just to remind ourselves what order we divide in. And it works out to be 2.2. That's quite a large indication of a difference, but is it actually statistically significant? Is it something that would be of interest, in the sense that it could be more than just a chance association?

Okay. So, let's calculate the standard error of the log relative risk. Here, I plug into the formula and I get 0.44. The interval for the log relative risk is then log 2.2, the log of our relative risk, plus or minus 1.96, the standard normal 97.5th quantile, times 0.44. That gives us negative 0.07 to 1.65. On the log scale, we're interested in whether or not this interval contains zero. If we exponentiate it back, the interval for the relative risk is 0.93 to 5.21, which again shows an indication that drug A has a greater propensity for side effects than drug B. But it isn't exactly significant, because this interval contains one, and on the log scale the interval contains zero. Of course, because log is a monotonic function, if the interval contains 0 on the log scale, it will contain 1 on the natural scale and vice versa. So whether you check for 0 on the log scale or 1 on the unlogged scale, you will always get the identical answer.

Okay, let's go over the odds ratio, for A divided by B. Let's just do this cross-product formula: 11 times 15 divided by 9 times 5. That gives us 3.67. The standard error of the log odds ratio is then the square root of the sum of one over each of the cell counts. That works out to be 0.68, so the interval for the log odds ratio is log 3.67 plus or minus 1.96 times 0.68. That works out to be negative 0.04 to 2.64. The interval for the odds ratio is 0.96 to 14.01, so this is on the natural scale.

Okay. And then, just to finish off our thinking about this problem, consider the risk difference as well. Here, I think it's a good idea to note that you're subtracting A minus B: PA hat minus PB hat. That works out to be 0.30. The standard error of the risk difference is given in this formula here. It works out to be 0.15, and the interval is again given here.
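The arithmetic above can be reproduced with a short Python sketch. It assumes the usual large-sample (delta method) standard errors for the log relative risk and log odds ratio and the Wald standard error for the risk difference; the function name `wald_intervals` and the returned dictionary keys are illustrative, not from the lecture.

```python
import math

def wald_intervals(x_a, n_a, x_b, n_b, z=1.96):
    """Point estimates and approximate 95% intervals for two binomials.

    x_a, x_b: event counts in groups A and B; n_a, n_b: group sizes.
    z: standard normal 97.5th quantile (1.96, as used in the lecture).
    """
    p_a, p_b = x_a / n_a, x_b / n_b

    # Relative risk, A over B. The interval is built on the log scale
    # and then exponentiated back to the natural scale.
    rr = p_a / p_b
    se_log_rr = math.sqrt((1 - p_a) / (p_a * n_a) + (1 - p_b) / (p_b * n_b))
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    # Odds ratio from the cross-product of the 2x2 cell counts.
    a, b = x_a, n_a - x_a  # group A: events, non-events
    c, d = x_b, n_b - x_b  # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))

    # Risk difference, A minus B, with the interval on the natural scale.
    rd = p_a - p_b
    se_rd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    return {
        "rr": rr, "se_log_rr": se_log_rr, "rr_ci": rr_ci,
        "or": odds_ratio, "se_log_or": se_log_or, "or_ci": or_ci,
        "rd": rd, "se_rd": se_rd, "rd_ci": rd_ci,
    }

# Counts from the lecture: 11 of 20 on drug A, 5 of 20 on drug B.
results = wald_intervals(11, 20, 5, 20)
for name, value in results.items():
    print(name, value)
```

Because the code carries the standard errors at full precision rather than rounding them to 0.44 and 0.68, the upper endpoints come out slightly different from the slide values (roughly 5.18 rather than 5.21 for the relative risk, and roughly 14.03 rather than 14.01 for the odds ratio).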

There are issues with the risk difference formula as well. We covered some of that before, and showed that you can maybe improve on its performance a little bit by, for example, adding one to every cell. That was covered in the last lecture, I believe. The final thing I wanted to show were two plots, just to finish some thoughts from the last lecture, where we talked about Bayesian analysis. So, if you look back to the last lecture, what we did was postulate a prior for P1 and P2 that were independent beta priors. We found that if we did that, then we got independent beta posteriors for P1 and P2. We saw that an efficient way to explore the posterior was to do a simulation from it, and that would allow us to calculate things like the posterior mean, the posterior variance and so on. So, if you go back to that lecture, you'll hopefully be reminded of exactly what we did.
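As a reminder of what that simulation looks like, here is a minimal Python sketch. The prior parameters are an assumption: the transcript does not say which beta priors were used, so the code defaults to uniform Beta(1, 1) priors. The helper name `posterior_draws` is hypothetical.

```python
import random
import statistics

def posterior_draws(x, n, alpha=1.0, beta=1.0, n_draws=50_000, rng=None):
    """Draw from the Beta(alpha + x, beta + n - x) posterior of a binomial proportion.

    alpha, beta: prior parameters (ASSUMED uniform Beta(1, 1); not from the lecture).
    """
    rng = rng or random.Random()
    return [rng.betavariate(alpha + x, beta + n - x) for _ in range(n_draws)]

rng = random.Random(2024)  # fixed seed so the run is repeatable
draws_a = posterior_draws(11, 20, rng=rng)  # drug A: 11 events out of 20
draws_b = posterior_draws(5, 20, rng=rng)   # drug B: 5 events out of 20

# Monte Carlo estimates of the posterior mean and variance.
mean_a = statistics.fmean(draws_a)
mean_b = statistics.fmean(draws_b)
var_a = statistics.variance(draws_a)
var_b = statistics.variance(draws_b)

print("posterior mean PA:", mean_a, "variance:", var_a)
print("posterior mean PB:", mean_b, "variance:", var_b)
```

Under the assumed uniform priors the exact posterior means are 12/22 and 6/22, so the simulated means should sit close to those values. With different priors the numbers will shift.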

When you conduct these posterior simulations, you get a PA and a PB, or P1 and P2. You get lots and lots of pairs of those things that represent draws from the posterior distribution. It's just a convenient numerical way to investigate the posterior. If you take the arithmetic mean of those posterior draws, you get the posterior mean for PA and the posterior mean for PB. You could then get the posterior mean for the risk difference, or the posterior mean for the odds ratio, by taking every pair, PA and PB, and calculating the risk difference or the odds ratio, and so on.

Well, here, what I did is calculate the risk ratio for every PA, PB pair simulated from the posterior. I then simply plotted a histogram of them. This is just an approximation of the posterior, where the accuracy of the approximation depends only on how many Monte Carlo samples I elected for the computer to generate. So, if I let the computer run for a really long time, I get a near perfect representation of the posterior. In this case it is the posterior for the risk ratio. So again, I took PA and divided it by PB, and then for all of those pairs I plotted a density estimate. And there's the density. This gives you a lot of information about where the evidence concerning the risk ratio, the relative risk, lies. Here I drew the blue lines for the 95% credible interval, where there's 2.5% below the lower blue line and 2.5% above the upper blue line, and I put a reference line at 1 in this case. And as we saw when we calculated the [UNKNOWN] interval, the lower end point is just below one.
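A rough Python sketch of the risk ratio calculation follows. It again assumes uniform Beta(1, 1) priors, which the transcript does not confirm, so the endpoints need not match the blue lines on the slide. The helper `equal_tail_interval` is a hypothetical name, and the plotting step is left out.

```python
import random

def equal_tail_interval(draws, level=0.95):
    """Equal-tail credible interval: cut (1 - level) / 2 of the draws from each tail."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1 - tail) * (len(ordered) - 1))]
    return lower, upper

rng = random.Random(7)  # fixed seed so the run is repeatable
n_draws = 100_000       # more draws give a closer approximation of the posterior

rr_draws = []
for _ in range(n_draws):
    # Independent beta posteriors under ASSUMED Beta(1, 1) priors.
    p_a = rng.betavariate(1 + 11, 1 + 9)   # drug A: 11 events, 9 non-events
    p_b = rng.betavariate(1 + 5, 1 + 15)   # drug B: 5 events, 15 non-events
    rr_draws.append(p_a / p_b)             # risk ratio for this pair

rr_lower, rr_upper = equal_tail_interval(rr_draws)
rr_median = sorted(rr_draws)[n_draws // 2]

print("posterior median of the risk ratio:", rr_median)
print("95% equal-tail credible interval:", (rr_lower, rr_upper))
```

The list `rr_draws` is what would be passed to a histogram or density plot, with the two interval endpoints and the value 1 drawn as reference lines.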

So if you were interested in something like significance, you wouldn't get significance. But I think the posterior displays a lot of information, and gives you a lot more than just a confidence interval or the result of a hypothesis test or, even worse, just the asterisks on how significant the P value is, which some software gives. Okay. On the next slide, what I'm showing is the same calculation done for the odds ratio. This is just to give you a flavor of what the output, or the desired output, from a simple Bayesian analysis would be. The posterior is the quantity that you would use to investigate the relative proportions, either through the odds ratio or through the relative risk.
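The same idea for the odds ratio can be sketched in a few lines of Python. As before, the uniform Beta(1, 1) priors are an assumption rather than something stated in the lecture, and the variable names are illustrative. The sketch also reports the posterior probability that the odds ratio exceeds 1, one example of the extra information a posterior gives beyond an interval.

```python
import random

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

rng = random.Random(11)  # fixed seed so the run is repeatable
n_draws = 100_000

# Odds ratio (A over B) for each simulated pair, under ASSUMED Beta(1, 1) priors.
or_draws = [
    odds(rng.betavariate(1 + 11, 1 + 9)) / odds(rng.betavariate(1 + 5, 1 + 15))
    for _ in range(n_draws)
]

ordered = sorted(or_draws)
or_median = ordered[n_draws // 2]
or_interval = (ordered[int(0.025 * (n_draws - 1))],
               ordered[int(0.975 * (n_draws - 1))])

# Fraction of draws where drug A has the higher odds of side effects.
prob_or_above_one = sum(draw > 1 for draw in or_draws) / n_draws

print("posterior median of the odds ratio:", or_median)
print("95% equal-tail credible interval:", or_interval)
print("posterior probability that the odds ratio exceeds 1:", prob_or_above_one)
```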
