A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation





From the lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests that compare means, proportions, or incidence rates across more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings.

This will be a very short section, just to give you an idea of how you might compute a sample size to achieve a desired level of precision, or margin of error, when we're looking at population comparison measures. We'll deal with this more in the next section, where instead of targeting a specific margin of error, we design the study to have what's called a certain power. Power is a function of the margin of error, but it's the more commonly used way to design studies with 2 or more group comparisons. This section will just give us some insight into what influences that process.

So upon completion of this lecture section, you will have a general idea of how to create a table relating sample sizes to precision for an estimate of a population comparison quantity. We'll talk about this in the context of differences, not ratios. We'll deal with ratios in the next section, because they're easier to handle in the context of power.

So again, in order to justify the funding request for a larger study, or really any study, a researcher needs to demonstrate both that the study allows for estimating outcomes with a good margin of error and that the study can be performed within the requested budget.

Designing a study so that the results have a certain margin of error requires some speculation about what the study results will be before the study is done, so it's a little bit of a tricky situation. Where can this information come from? It can come from researcher knowledge, from pilot studies, or from other research done on different populations, or it can be guesswork.

So let's look at one example of designing a study to achieve a certain margin of error based on the results from a preliminary study. Here's a summary of data on charges by sex, based on a random sample of 500 carotid endarterectomy procedures performed in the state of Maryland in 1995.

Suppose we were doing the study today (we would ultimately convert everything to today's dollars), and we wanted to design a study with a lower margin of error than this one for estimating the mean difference in charges between males and females. You may recall that the estimated mean difference in charges between males and females who had this procedure was negative $473, but the result was not statistically significant, presumably because there's a lot of variability in the individual charges within each of the 2 sex groups.

Let's look at the margin of error we used before to construct a confidence interval for this difference from the study results, but let's represent it more generically, as a function of our 2 group sizes. The margin of error is 2 standard errors; that's what we add to and subtract from the estimate to get the confidence interval. In this case, it would be estimated by taking 2 times our formula for the standard error of a difference in means. So the margin of error depends on the sample sizes through their presence in the denominators of the standard error computation.
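Written out, the margin of error just described is the following. The symbols are our own shorthand, not necessarily the slide's notation: $\bar{x}_1, \bar{x}_2$ are the sample mean charges, $s_1, s_2$ the sample standard deviations, and $n_1, n_2$ the group sizes. With equal group sizes, only the sum of the two variances matters.

```latex
\text{ME} = 2\,\widehat{SE}(\bar{x}_1 - \bar{x}_2)
          = 2\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\qquad\text{and, if } n_1 = n_2 = n:\qquad
\text{ME} = 2\sqrt{\frac{s_1^2 + s_2^2}{n}}
```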

So we could run the numbers for different potential study sizes that we might plan. I'll look at the situation with equal sizes first: suppose we planned to sample separately from males and females, so that we'd ultimately end up with the same number of each in our study. We could then estimate the margin of error for a given study from its sample size, using the standard deviation estimates from the previous study, at least as a starting point, as the inputs to our standard error formula.

So if we wanted to do a study with 1,000 persons in each group, what would we expect our margin of error to be? Using the preliminary information from the earlier study, we'd expect it to be 2 estimated standard errors, where the standard error of the difference is based on 1,000 people in each of the 2 groups. If you do the math, we get a margin of error of plus or minus $410, so we'd be able to estimate this mean difference within plus or minus $410.
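The calculation can be sketched in Python. The transcript does not list the pilot standard deviations, so rather than assuming values, this sketch backs out the combined variance $s_1^2 + s_2^2$ implied by the quoted ±$410 at 1,000 per group. The helper name `margin_of_error_diff_means` is hypothetical, and the equal split of variance between the groups is purely illustrative (with equal group sizes, only the sum matters).

```python
import math

def margin_of_error_diff_means(sd1, sd2, n1, n2):
    """Approximate 95% margin of error (2 standard errors) for a difference in means."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return 2 * se

# Pilot SDs are not given in the transcript, so infer s1^2 + s2^2 from the quoted result:
# 410 = 2 * sqrt((s1^2 + s2^2) / 1000)  =>  s1^2 + s2^2 = 1000 * (410 / 2)^2
combined_var = 1000 * (410 / 2) ** 2
print(combined_var)  # 42025000.0

# Hypothetical: split the combined variance equally between males and females.
sd_each = math.sqrt(combined_var / 2)
print(round(margin_of_error_diff_means(sd_each, sd_each, 1000, 1000)))  # 410
```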

Again, if we were doing the study today, we'd want to convert this to current dollars, since these charges are in mid-1990s dollars, but we could use the consumer price index or some other measure to do so.

Suppose we decide that this is still too much uncertainty. We want more precision in estimating the mean difference in average charges, because this would help hospitals plan the expected charges they would levy on incoming patients, and help insurers, both public and private, figure out what the needs may be.

So if we increased the sample size in both groups to, say, 1,800, what would that do to our estimated margin of error? If we run the numbers through this formula, we reduce the estimated margin of error by a little over $100, to plus or minus $306.
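A quick check of this figure, again assuming only the combined variance $s_1^2 + s_2^2$ implied by the ±$410 result (our inference; the helper name is hypothetical). Because the margin of error scales with $1/\sqrt{n}$, going from 1,000 to 1,800 per group shrinks it by a factor of $\sqrt{1000/1800} \approx 0.75$.

```python
import math

def margin_of_error_equal_n(combined_var, n):
    """2 * SE of a difference in means with n per group; combined_var = s1^2 + s2^2."""
    return 2 * math.sqrt(combined_var / n)

# Combined variance implied by the transcript's +/- $410 at 1,000 per group
combined_var = 1000 * (410 / 2) ** 2

for n in (1000, 1800):
    print(n, round(margin_of_error_equal_n(combined_var, n)))
# 1000 410
# 1800 306
```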

So our confidence interval from a study of this size would be formed by taking our estimated mean difference and adding and subtracting approximately $306. It's approximate because we're estimating from the previous study's results, and the new study may give slightly different results, but we'd create the confidence interval by adding and subtracting roughly $306.

Instead of playing around and asking what would happen if both samples were 1,800, or both were 700, we could postulate what we want our margin of error to be. We might say we want to be able to estimate the mean difference in average charges between females and males within plus or minus, say, $150 or $230, however we choose it.

If we did that, we could solve this equation for the number we need in each group to get that desired margin of error. Because we have equal numbers in both groups, it's a relatively easy equation to solve. So just know that this is possible.
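Solving $\text{ME} = 2\sqrt{(s_1^2 + s_2^2)/n}$ for $n$ gives $n = 4(s_1^2 + s_2^2)/\text{ME}^2$, rounded up to a whole person. The sketch below applies this to the lecture's example targets of ±$150 and ±$230, still assuming the combined variance inferred from the ±$410 result; the function name is hypothetical.

```python
import math

def n_per_group_for_margin(combined_var, margin):
    """Smallest equal group size n with 2 * sqrt(combined_var / n) <= margin.

    combined_var is s1^2 + s2^2. Solving margin = 2 * sqrt(combined_var / n)
    gives n = 4 * combined_var / margin^2, rounded up.
    """
    return math.ceil(4 * combined_var / margin**2)

# Combined variance implied by the transcript's +/- $410 at 1,000 per group (inferred)
combined_var = 1000 * (410 / 2) ** 2

for margin in (150, 230):
    print(margin, n_per_group_for_margin(combined_var, margin))
# 150 7472
# 230 3178
```

As a sanity check, asking for ±$410 returns exactly 1,000 per group, the study size we started from.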

If we were doing this, we could easily construct a table like the following for our grant application, listing some possibilities for our margin of error: plus or minus $200, plus or minus $250, and plus or minus $300. We might use our pilot or original study results as the middle guess, if you will, for the standard deviations in both groups, and then also consider a scenario where the variability is slightly lower in both groups and one where it's higher. We could then fill in this table using the approach from the previous slides, and give a range of sample sizes necessary, under different assumptions about variation, to achieve our desired margin of error. That helps justify why we're asking for so much money to do the study.
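A minimal sketch of such a table. The ±$200, ±$250, and ±$300 margins come from the lecture; the "lower" and "higher" variability scenarios (every SD scaled by 0.8 or 1.2) are hypothetical choices of ours, and the pilot combined variance is again inferred from the ±$410 result rather than read from the slides.

```python
import math

def n_per_group_for_margin(combined_var, margin):
    """Equal group size needed so that 2 * sqrt(combined_var / n) <= margin."""
    return math.ceil(4 * combined_var / margin**2)

# Pilot combined variance s1^2 + s2^2, inferred from +/- $410 at n = 1,000 per group
pilot_var = 1000 * (410 / 2) ** 2

# Hypothetical scenarios: both group SDs 20% lower, as in the pilot, or 20% higher.
# Scaling every SD by f scales the combined variance by f**2.
sd_scenarios = {"lower SD (x0.8)": 0.8, "pilot SD": 1.0, "higher SD (x1.2)": 1.2}
margins = (200, 250, 300)

table = {
    label: [n_per_group_for_margin(pilot_var * f**2, m) for m in margins]
    for label, f in sd_scenarios.items()
}

# Print one row per variability scenario, one column per target margin of error
print("scenario".ljust(18) + "".join(f"±${m}".rjust(10) for m in margins))
for label, sizes in table.items():
    print(label.ljust(18) + "".join(str(n).rjust(10) for n in sizes))
```

Each cell is the number needed in each group; the spread across rows shows how sensitive the request is to the variability assumption.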

This approach could easily be modified to allow for unequal sample sizes as well.
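One way to do that, sketched under our own assumptions: fix the allocation ratio $k = n_2/n_1$, substitute $n_2 = k n_1$ into the margin-of-error formula, and solve, giving $n_1 = 4(s_1^2 + s_2^2/k)/\text{ME}^2$. The equal split of the inferred pilot variance between the groups and the function name are hypothetical.

```python
import math

def group_sizes_for_margin(var1, var2, margin, k):
    """Group sizes (n1, n2 = k * n1) so that 2 * sqrt(var1/n1 + var2/n2) <= margin.

    Substituting n2 = k * n1 and solving gives n1 = 4 * (var1 + var2 / k) / margin^2.
    """
    n1 = math.ceil(4 * (var1 + var2 / k) / margin**2)
    n2 = math.ceil(k * n1)
    return n1, n2

# Hypothetical: split the inferred pilot combined variance equally between the groups.
var_each = 1000 * (410 / 2) ** 2 / 2

for k in (1, 2):
    n1, n2 = group_sizes_for_margin(var_each, var_each, 300, k)
    achieved = 2 * math.sqrt(var_each / n1 + var_each / n2)
    print(k, n1, n2, round(achieved, 1))
```

With equal variances, a 2:1 allocation needs 1,401 + 2,802 = 4,203 people in total, versus 1,868 + 1,868 = 3,736 for equal groups, so unequal allocation usually makes sense only when one group is cheaper or easier to sample.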

The reason I'm not spending much time on the computations in this section is that, again, studies comparing 2 or more groups are usually not designed to have a specific precision for the confidence interval. Rather, they're designed to have another characteristic that is a function of the precision, called power, which we'll talk about in the next section.

We could take a very similar approach, though, if we wanted to estimate the sample sizes necessary to have a desired margin of error for a risk difference, or difference in proportions. If we had estimates of what our study outcomes would be, that is, the resulting proportions in each of the groups we're comparing, we could play around with different sample sizes, either equal or unequal depending on how we want to design our study, to see the impact on the margin of error.
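For a risk difference, the same recipe uses the standard error $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$. In this sketch the anticipated proportions (30% vs. 20%), the 500-per-group study, and the ±0.03 target are hypothetical values chosen for illustration, not results from the lecture.

```python
import math

def margin_of_error_risk_diff(p1, p2, n1, n2):
    """Approximate 95% margin of error (2 standard errors) for p1 - p2."""
    return 2 * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def n_per_group_risk_diff(p1, p2, margin):
    """Equal group size giving a margin of error no larger than `margin`."""
    return math.ceil(4 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin**2)

# Hypothetical anticipated proportions (not from the lecture): 30% vs. 20%
p1, p2 = 0.30, 0.20
print(round(margin_of_error_risk_diff(p1, p2, 500, 500), 3))  # 0.054
print(n_per_group_risk_diff(p1, p2, 0.03))  # 1645
```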

For ratios, though, this process is a bit more difficult, because thinking about the margin of error requires you to translate precision on the original scale into precision on the log scale.
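To see why, here is a sketch for a relative risk using the usual large-sample standard error of $\ln(\widehat{RR})$, $\sqrt{(1-p_1)/(n_1 p_1) + (1-p_2)/(n_2 p_2)}$. The proportions and group sizes are hypothetical. The interval is symmetric (±2 SE) on the log scale, but once exponentiated it is asymmetric around the ratio, so a single "plus or minus" target on the ratio scale doesn't translate directly.

```python
import math

def rr_confidence_interval(p1, p2, n1, n2):
    """Approximate 95% CI for the relative risk p1/p2, built on the log scale.

    Uses the large-sample standard error
    SE(ln RR) = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2)).
    """
    log_rr = math.log(p1 / p2)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    margin_log = 2 * se  # symmetric margin of error on the log scale
    return math.exp(log_rr - margin_log), p1 / p2, math.exp(log_rr + margin_log)

# Hypothetical proportions and group sizes (not from the lecture)
lower, rr, upper = rr_confidence_interval(0.30, 0.20, 500, 500)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
print(f"distance below: {rr - lower:.2f}, distance above: {upper - rr:.2f}")
```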

This is one of the reasons that we deal with study design and sample size through another window on precision, something called power. This is really how we go about designing studies to have certain characteristics when we're comparing 2 or more populations. Power is a function of the standard error, and a smaller standard error, and hence more precision, will translate into better results on the power scale. So these things are all interrelated, but it's a little more intuitive and easier to think about power when designing studies to compare 2 or more groups. That's what we'll tackle in lecture set 13.

