A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 2B: Effect Modification (Interaction)

Effect modification (Interaction), unlike confounding, is a phenomenon of "nature" and cannot be controlled by study design choice. However, it can be investigated in a manner similar to that of confounding. This set of lectures will define and give examples of effect modification, and compare and contrast it with confounding.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this next lecture set, Lecture 5, we're going to talk

about a phenomenon called effect modification, or interaction.

Which, like confounding, involves interpreting an outcome/exposure, or

two-variable, relationship with regard to another variable behind the scenes.

But, the type of effect on this relationship that we are looking for

is different than when we were dealing with confounding.

So in this lecture set, we will define this concept of effect modification,

compare and contrast it with confounding, and give multiple examples of how to

investigate effect modification if it's part of the research process.

And what conclusions to make when you look at the results of the investigation.

So in this first section, we're going to start with the introduction to effect

modification and give two examples.

In section B, we'll follow up with more examples.

And we'll reconnect in section C to reaffirm our

discussion about the differences between confounding and effect modification.

So hopefully, by the end of this lecture section,

you'll be able to define the concept of effect modification, and

explain how it is different than confounding.

And, you'll also gain exposure to two examples of

investigating effect modification as part of a research question.

So we talked in Lecture 4 about confounding, and the idea of

confounding is that the nature of an association can differ substantially in magnitude,

and sometimes even reverse direction, when we look at

it on the whole versus adjusting it for the potential influence of

other factors that are related to both things we're associating.

So confounding can occur, as we know, when the association between

what we generally call an exposure x and an outcome y

is distorted by another variable or set of variables.

In that case the overall association we look at ignoring those other

variables versus the overall association we get after adjusting for

those other confounding variables will differ.

Effect modification has to do not with the relationship between a third variable

and each of the two variables we're interested in associating,

but with its relationship to the relationship between those two variables.

So effect modification occurs when the nature of the association can differ,

sometimes substantially,

across different subpopulations of an overall population.

So for example, an association can differ for males and females.

A drug could work for females and have harmful effects for males.

An association between blood pressure and physical activity may differ by age group.

So when the association between an exposure x and

an outcome y depends on at least one other variable,

even if this other variable is not directly related to either x or

y, then we have a phenomenon called effect modification.

So just diagrammatically, this occurs when the relationship between Y and

X depends on the value of a third variable Z.

So notice in this picture I've drawn an arrow between Z and

the line connecting X and Y, to indicate that Z affects the association between Y and

X if it's an effect modifier.

If we ignore this effect modifier, or fail to look for it,

this can result in estimating one overall outcome/exposure relationship, either

unadjusted or adjusted,

comparing everyone in one number, when we

really need to estimate separate outcome/exposure relationships for

different values of our effect modifier to tell the story of the phenomenon.

So, some examples of results with effect modification.

Suppose we're looking at comorbidities from a disease, and

surgery works much better

at reducing these than drug therapy among younger patients, but among

older patients surgery is notably worse than drug therapy.

So there's not one clear overall association between the outcome and

the type of treatment people get.

We wouldn't want to estimate one overall association between surgery compared to

drug therapy for everyone together

if we saw evidence in our data that the relationship between the outcome and

surgery was different for younger patients than it was for older patients.

Another example came to light recently when we were gearing up for

a potential outbreak of the swine flu, and vaccines were being tested and prepared.

And it was found that the vaccine provided immunity for those who got it.

But in order to get the same immunity, women needed a larger dose,

about twice the amount of the same vaccine that men did, to get the same effect.

So if we had ignored that, we may have ended up

underdosing women, for example,

if we gave the same doses to everyone, and

that would affect their immunity versus that of males.

So, just as a diagram, here's my attempt at a picture of effect modification.

Here, for each level of Z,

this potential effect modifier, we need to draw a separate picture of

the association between Y and X, because it depends on this other variable, Z.

So if Z has two levels, that is, it's binary, it might be males or females.

If Z was ethnicity, there might be three or four different groups,

and we'd have to compare the association between Y and X across those groups

to see if at least some of the associations were different than others,

if we were looking for whether the association between Y and

X was modified by ethnicity.

So what's the solution for effect modification?

Well, it's actually not a problem per se.

But in order to assess whether there is effect modification,

one must look at Y/X associations

separately for each level of the potential effect modifier and

decide whether at least some of the associations differ.

So this can be done by stratified analysis, separating our

data into different subgroups based on levels of the variable Z,

and analyzing the Y/X association separately in each and comparing them somehow.

Or, we'll see when we get into the realm of multiple regression, we can do

this systematically within the context of the models we're using to analyze data.
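
The stratified approach described here can be sketched in a few lines of Python. This is a minimal illustration, assuming a binary outcome and binary exposure summarized as counts within each level of Z. The function name `relative_risk` and all of the counts are hypothetical and are not taken from any study discussed in this lecture.

```python
import math

def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper) using a large-sample CI on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts per level of Z, NOT from the lecture:
# (events among exposed, n exposed, events among unexposed, n unexposed)
strata = {
    "Z = 0": (30, 100, 15, 100),
    "Z = 1": (12, 100, 24, 100),
}

# One Y/X association per level of Z, to be compared side by side.
results = {level: relative_risk(*counts) for level, counts in strata.items()}
for level, (rr, lo, hi) in results.items():
    print(f"{level}: RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Looking at the stratum-specific estimates and their intervals side by side is the informal version of the comparison; a formal test needs the regression tools covered later.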

Effect modification is a function of nature or circumstance or setting.

It's not a function of study design per se and

as such cannot be controlled by study design.

Whereas with confounding, we have ways to minimize the potential for confounding.

Randomization is the best way,

and there are other methods as well.

However, we cannot do that here.

We cannot eliminate the potential for effect modification by anything we

do in the study design, so randomization won't take care of that.

And we'll show an example of what I mean shortly.

However, if you wanted to design a study with enough power to

estimate separate Y/X relationships with reasonable precision for

differing levels of Z as part of your main research ideas,

you may want to design the study to be able to estimate these with some

desired precision.

So for example, if you wanted to compare the association between

pulmonary functioning and exercise level separately for smokers and nonsmokers.

You may want to design a study and sample in a way that you get enough smokers and

enough nonsmokers from your overall population of interest to be able to

detect a difference in those associations at some level.

So just a few things: confounding is not necessary for

effect modification to occur, and effect modification is not necessary for

confounding to occur. These do not depend on each other.

Some studies will yield outcome/exposure relationships that

are neither confounded nor modified.

Some studies will yield Y/X relationships that are confounded but not modified,

some that are not confounded but are modified, and

sometimes they'll occur together.

But we'll see, with some examples, that if this is the case interaction or

effect modification trumps confounding.

In other words, we won't want to report one overall association between

the outcome and exposure adjusted for other variables.

We'll want to split that adjusted association out for

the different levels of our effect modifier.

So to get the ball rolling with the first example, let's look at

the results from the PBC trial we looked at before in Statistical Reasoning 1 and 2.

So this is the famous data set from the Mayo Clinic where we had 312 patients with

primary biliary cirrhosis

of the liver. They were randomized to receive either the drug DPCA or a placebo.

And the main research question of interest was whether there was a difference

in the risk of death during the follow-up period for the two treatment groups.

So the study was designed to see if there was a

difference in,

in fact reduced, mortality for those who received the drug versus the placebo.

And the treatment assignment was randomized, so what does this mean about

the potential for confounding by, for example, patient sex?

Well, we'll explore this in detail in the next slide, but hopefully your

knee-jerk reaction is that randomization should minimize the potential for confounding.

What does that mean about the possibility of an interaction between treatment and

sex,

or effect modification by sex?

Well, like I said, study design has no impact on reducing or

increasing the potential for effect modification.

So the fact that this is randomized doesn't tell us anything about that.

So we're going to have to, if we want to investigate it,

we're going to have to do so, despite the fact that this was a randomized study.

So, let's just go for the overall analysis.

We saw that the overall analysis was such that the relative risk of death for

patients on DPCA compared to patients on placebo in the study was 1.06.

So patients on the drug had slightly elevated mortality.

But after accounting for sampling variability, there was no

clear conclusion that mortality was increased or decreased by the drug.

And this was a

reasonably powered study to detect an interesting reduction in mortality,

and the end result was that there was no association found between

mortality and the drug.

So could this death and treatment association have been confounded by sex?

Well, the necessary conditions for that are that the outcome, death,

and the predictor, treatment, would both have to be related to sex.

So it's very possible that sex was related to death.

Maybe males were more likely to die with this disease than females, or vice versa.

And there's nothing we can do to change that association.

But because the treatment was randomized, that

should even out the sex distribution between the two groups,

and there shouldn't be any difference in the sex distribution between the DPCA and

placebo groups.

So these are the necessary conditions, but hopefully with randomization,

by study design, we would remove this link here,

the association between

the exposure and this third variable, sex.

So let's just go over this.

The crude relative risk of death for patients on DPCA was, we said, 1.06.

It was not statistically significant.

This was based on comparing the overall proportion who died in

the DPCA group to the proportion dying in the placebo group.

So if we went and compared everybody in the DPCA group, whether they died or

not, to everybody in the placebo group, whether they died or

not, to get the relative risk

or incidence rate ratio, this comparison should not be impacted or distorted

or negated by a differing distribution of males and females in the two groups.

And, in fact, in the study,

the majority of enrollees were female, about 70% versus 30% male, and

that distribution was similar in the DPCA and placebo groups.

So that overall association should not be distorted by a differing distribution.

And, in fact, if you were to compute the sex-adjusted relative risk,

it was almost equivalent

to the unadjusted relative risk or

incidence rate ratio, with almost exactly the same confidence interval.

So thus far, as we expect, DPCA was not associated with better or worse

patient survival, and as we expect this was not confounded by sex distribution.

And hopefully it was not confounded by any other

variable because of the randomization process.

However, as a researcher I might be curious.

Well, maybe I'm missing something if I analyze everyone together,

even if I adjusted and didn't see any change,

so there was no confounding.

That doesn't necessarily mean that the overall relationship between mortality and

DPCA is the same for males and females.

So I might be thinking, could the treatment potentially affect mortality differently

for men and women?

And this question is really about a possible interaction between sex and

treatment or, in the language of effect modification, sex

modifying the effect of treatment.

Well, at this point, we can't answer this question because we only have estimates of

one overall association for everyone, unadjusted and adjusted for sex.

But that adjusted comparison

holds for the comparison of males to males and females to females.

And it's the same number, so

long as the sex we're comparing is the same in the two groups.

So what we're going to have to do, if we want to investigate whether or

not the relationship between death

and the treatment is modified by sex,

is actually look at separate associations between

mortality and treatment for each sex.

So if we look at the incidence rate ratio,

I shorthanded it here by calling it a relative risk,

but I'll put an I here to indicate incidence rate ratio, since this did

involve differential follow-up time. For females,

the incidence rate ratio is one:

no difference in risk in

the sample between the DPCA and placebo groups.

And the result is not statistically significant.

For males, there's a slightly higher association in the sample, but

if you look at the confidence intervals for that and for females,

there's a lot of overlap.

So the potential take, simply by looking at

the estimates, is that for females there was no impact of the drug on mortality,

and for males the drug actually increased mortality.

So even though at the estimation level it might look like there's

some difference, in terms of the conclusions we'd make about the drug,

that wouldn't change, in that it's not a thing to give to these patients.

But

when you look at the uncertainty in these two estimates, there's a lot of overlap.

So, and we'll see how to do this in regression,

we could formally test whether these estimated associations are different, and

if we did so, we would not find a difference.
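
As a rough stand-in for that formal test, one simple large-sample approach compares two independent ratio estimates on the log scale. This is not the regression-based test the lecture refers to. It is a sketch that assumes each estimate comes with a 95% confidence interval built on the log scale. The helper names and every number below are hypothetical and are not the PBC trial results.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of a log ratio from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def compare_ratios(ratio_a, ci_a, ratio_b, ci_b):
    """Large-sample z-test that two independent ratio estimates are equal."""
    diff = math.log(ratio_a) - math.log(ratio_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z_stat = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value

# Hypothetical stratum-specific estimates and intervals, NOT from the lecture:
z_stat, p_value = compare_ratios(1.0, (0.6, 1.67), 1.5, (0.5, 4.5))
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

With wide, heavily overlapping intervals like these made-up ones, the test does not flag a difference, which mirrors the reasoning applied to the estimates for females and males.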

So, I'm going to make a decision here and say we've investigated, but

we found no evidence of effect modification by sex.

So we weren't missing anything when we

compared all males and females together in our overall

association estimates before.

But we wouldn't have been able to answer the question of whether there

was effect modification unless we looked at the separate estimates by sex.

So this just summarizes what we saw throughout the different levels of

the analysis.

So if we just stopped here,

if we only estimated the crude association, we

couldn't assess either confounding or effect modification.

If we went and

adjusted and looked at the adjusted estimate, we could assess confounding.

But if we just stopped there, we'd have no insight into effect modification by sex.

And it's only if we took it to here and looked at

outcome/exposure relationships separately by sex that we could answer that question

about a differential association between mortality and treatment by sex.

So again, conclusions based on what we

saw here are essentially no effect modification and no confounding:

no confounding by sex because this trial was randomized, so we expected that,

and then, upon further investigation, no effect modification.

Let's look at another example, just to keep this ball rolling.

And we'll do, like I say, several more examples next section.

But here's an interesting data set.

It's a small data set,

which, you know, will give us power issues.

But it's kind of fun to look at and explore nevertheless.

This is data on elevation and

percentage of dead or badly damaged trees from 64 Appalachian sites in the US.

The Appalachian region in the US runs along the East Coast from north to south.

And this study was not designed to look at different associations by region,

north or south, as only eight of the 64 sites are southern sites.

And the study design is observational, because our outcome of interest

is the percentage of dead or badly damaged trees by characteristics of the sites, and

we can't randomize trees to sites.

So one of the things they measured

as part of the data set was the elevation of the site.

And here's an overall scatterplot.

This outcome was not a binary outcome.

The unit of observation here was a site.

And on each site they actually documented all the trees and

took the percentage on the site that were dead or badly damaged.

So, as measured by site,

this is a percentage that runs potentially from zero to 100.

So here's what we can see in this picture.

This is kind of a tough-looking scatterplot to figure out what's going on.

Many of us may see different things the more we look at it.

So the first thing we might do, just because we've got a continuous outcome and

a continuous predictor,

is consider estimating an overall regression line to

get the unadjusted association

between tree damage and elevation. And if we do,

and you may have envisioned this given the spread of the points,

the slope comes in slightly positive and not statistically significant.

So this estimates that there's a 0.01 percentage point increase

in the percent of trees damaged per

meter increase in elevation.

So actually that's not as small of an effect as it

sounds, because it's per meter, but it's not statistically significant.

However, one of the things you can do with the computer, since we

have another variable stored in this data,

which is the region each site is from, is actually label each of these points

with the region it comes from, and

it's actually kind of interesting.

What do you notice? In this cluster of points over here,

the majority are from northern sites.

This is sort of what's going on in northern sites, and

the majority of the southern sites,

I'm not circling them all perfectly, are down here.

That's a better way to circle it.

So let's think about this.

The southern region trees tend to be at higher elevation than the northern.

They also tend to have less damage.

If you go back and look at that scatter plot.

So it actually looked like, I should say, damage was not related to

elevation when we first assessed it.

But region is related to damage, and region is related to elevation.

So the tools are in place here, with damage and elevation and region.

The tools are in place for

confounding, because region is related to both these things.

So let's first consider this as an issue.

So if we go back and actually estimate an adjusted slope, adjusted for region,

we estimate a common association between damage and elevation for the northern and

southern regions.

But as we'll see when we get into multiple regression,

these lines have different intercepts,

such that the vertical distance between these lines represents the overall

difference, at any elevation, between damage in northern sites and

damage in southern sites.

But these lines are parallel and what do you notice about these?

Well, they're both now increasing.

This suggests that once we adjust for

the regional differences in damage and elevation,

we get an overall positive, statistically

significant association between damage and elevation.

So it may look, upon first pass, and certainly this gets at

the picture better than fitting this overall crude association here,

like our overall

association had been confounded by region,

and that could certainly happen, because the fact that southern sites had higher elevation

and had lower damage was pulling down the overall association.

And when we adjusted for that, we got

a positive association in both the regions.

But the question I might want to investigate is, well,

this looks like a reasonable fit to these points labeled North.

But down here,

I'm wondering a little if we should be fitting a positive association.

Admittedly we only have a few points, but I wonder whether we should be fitting a positive

association between percent of trees damaged and elevation for southern sites as well.

So I want to take this one step further.

We could stop here, and say there was evidence of confounding and

when we adjusted we found evidence of a positive association.

As opposed to the null association we found ignoring region, but I'm curious.

I want to go ahead and estimate separate relationships between

percent of trees damaged and elevation by region and see what we get.

And if we do that, the slope for northern sites turns out to be more positive

in magnitude than it was when we combined both sites in the adjusted estimate.

And the estimated slope for southern sites, between damage and

elevation, turns out to be negative.
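
To see how a crude slope can hide opposite slopes within groups, here is a small sketch using ordinary least squares on made-up points. The data, the group labels, and the helper names `ols_slope` and `slope_for` are all hypothetical. These are not the Appalachian tree measurements, and the slopes they produce are not the lecture's estimates.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Made-up (x, y, group) records for illustration only.
records = [
    (1, 2, "north"), (2, 4, "north"), (3, 6, "north"),
    (4, 3, "south"), (5, 2, "south"), (6, 1, "south"),
]

def slope_for(rows):
    """Fit the slope of y on x using only the given records."""
    return ols_slope([r[0] for r in rows], [r[1] for r in rows])

# One slope ignoring group, then a separate slope within each group.
crude_slope = slope_for(records)
slopes_by_group = {
    g: slope_for([r for r in records if r[2] == g])
    for g in ("north", "south")
}
print("crude:", crude_slope)
print("by group:", slopes_by_group)
```

In this toy data the group-specific slopes have opposite signs, and the single crude slope describes neither group well.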

And so I'll put the numbers on top here.

So here's the unadjusted slope.

Again, we saw it was 0.01 and it was not statistically significant.

If we estimate separate slopes for northern sites versus southern,

the slope for northern sites is positive, 0.09, and statistically significant.

The slope for the southern sites is negative 0.02, and unfortunately

it's not statistically significant.

But keep in mind, we only had 8 observations in southern sites.

So what do you think about this?

Well if you look at the confidence intervals for these, they don't overlap.

And the vibe I'm getting here is that I'm not sure about this null finding here,

because there are only 8 sites.

This appears to be a real positive finding for northern sites.

When we adjusted before, the adjusted association was

a weighted average of this estimate for the north and this estimate for the south.

And the adjusted estimate was dominated by the estimate for

the north, since there were so many more points.
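
The weighted-average idea can be illustrated with the two slopes quoted above. The weights here are an assumption made for illustration: they are simply the site counts (64 sites in total, 8 of them southern, so 56 northern). A regression fit weights each stratum by how much information it carries about the slope, so this sketch shows why the north dominates but is not expected to reproduce the lecture's adjusted estimate.

```python
def weighted_average(estimates, weights):
    """Weighted average of stratum-specific estimates."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total

# Slopes quoted in the lecture for northern and southern sites.
slopes = [0.09, -0.02]

# ASSUMPTION: weights proportional to the number of sites (56 north, 8 south).
# Regression uses different weights, so this is illustrative only.
site_counts = [56, 8]

pooled = weighted_average(slopes, site_counts)
print(f"weighted average slope: {pooled:.3f}")
```

Because the northern stratum carries most of the weight, the combined value sits much closer to the northern slope than to the southern one.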

So, just to recap, let's talk about what we've got here.

When we look at the overall association of damage and

elevation, with its 95% confidence interval, it was positive but

relatively small in magnitude and not statistically significant.

When we adjusted,

we saw an increase in the magnitude, and it became statistically significant.

So if we just stopped here and compared the unadjusted and adjusted, we'd say there

was confounding, and that would be a reasonable response to this.

However, suppose we said, wait a minute, I'm not sure I

feel comfortable with this. The adjusted estimate compares

the average difference in percent of damaged trees per meter of elevation for

two groups of sites in the same region, whether they be north or south.

So as long as we're comparing them in the same region,

the same number applies.

I'm not sure I'm comfortable with that based on what we saw in the scatter plot.

So we took it one step further.

And estimated separate associations between damage and

elevation for the north and the south.

And we saw that the north was again positive and statistically significant.

For the south, the estimate was negative.

It was not statistically significant.

And while I don't have a lot of

comfort about the precision of the association amongst

the southern sites especially,

you know, based on these data and these data alone, I think I'd be willing, and

I'd want to talk with environmental researchers, et cetera,

but I think I'd be willing to take it one step further and

say there was some evidence.

Notice these confidence intervals don't overlap, so

even though this wasn't significant, and was very imprecise,

its upper end-point didn't actually reach the lower end-point for the North.

I might say, there's some evidence of effect modification by region.

And we could argue about that.

But we couldn't even have this discussion if we had stopped here

and simply adjusted.

So if we're really interested in effect modification as a research question,

we have to go the extra step and look at separate outcome/exposure relationships.

And if I found evidence of effect modification, then I'm no longer

concerned about presenting the adjusted estimate and talking about confounding.

Because what I'm claiming is that, that is no longer an issue because I

really need to present separate estimates by region.

So, we'll look at four more examples of this investigation process in

formulating conclusions in section B.

But, in summary, effect modification or interaction occurs when the relation

between two variables Y and X depends on the level of the third variable Z.

Effect modification can only be ascertained by looking at

separate estimates and confidence intervals

for the Y/X relationship by different levels of Z.

And we'll see when we get into the realm of multiple regression,

we can do this efficiently,

and actually get a p-value to test whether any of

the Y/X relationships differ across at least two of the levels of Z.

So we'll be able to do this more efficiently when interested.
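
One common way to write this down is a regression model with a product term. The sketch below assumes a linear regression and a binary Z coded 0/1. The beta symbols are notation introduced here for illustration and are not taken from the lecture slides.

```latex
\begin{aligned}
E[Y \mid X, Z] &= \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \times Z) \\
\text{slope of } Y \text{ on } X \text{ when } Z = 0 &: \quad \beta_1 \\
\text{slope of } Y \text{ on } X \text{ when } Z = 1 &: \quad \beta_1 + \beta_3
\end{aligned}
```

Under these assumptions, testing whether $\beta_3 = 0$ asks whether the Y/X slope differs between the two levels of Z, and setting $\beta_3 = 0$ gives back the adjusted model with a single common slope and different intercepts.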

But if we were simply to compare unadjusted and

adjusted outcome/exposure relationships, adjusting for Z,

we can do this to assess confounding, but not to assess effect modification.

Now this doesn't mean that in every study you do, you have to

investigate potential effect modification for many different factors.

A lot of times the reason for

investigating is based on a prior hypothesis by the researchers.

In advance they decide they want to investigate whether the association

they're interested in is modified by a, b, or c.

But you can't investigate it unless you plan to and look for it.
