In this final section, I'm going to discuss subgroup analyses, another kind of statistical analysis that's frequently performed in clinical trials. In the context of clinical trials, subgroup analyses are analyses where we're looking to see if there are varying treatment effects in different subsets of patients.

So why do we want to do subgroup analyses? Well, we'd like to check the consistency of the treatment effect across subgroups. For instance, we'd like to know if the effect is the same in women and men, or in children and adults. Or we might think that the treatment effect would vary for people with less severe disease versus those with more severe disease. The two most common methods for doing subgroup analyses are to do a stratified analysis or to do a test of interaction. And finally, when we talk about doing a series of subgroup analyses, we need to think about the problem of multiple comparisons.

To do a subgroup analysis by stratification, you estimate the treatment effect separately in each of the subgroups. So you estimate an effect in men and you estimate an effect in women. Using this method, you can test whether there is a significant effect in men, and you can test whether there's a significant effect in women. But what you cannot do is test whether the effect in men differs from the effect in women. The important point here is that testing whether there's a treatment effect separately in a series of subgroups is not the same as testing whether the treatment effect differs across those subgroups.

The only way to make a statement about whether the treatment effects are different in one subgroup versus another is to do a formal test for interaction. To do this test, we build a statistical model that includes main effects for our treatment group and for our subgroups, and we use interaction terms to test for interactions between treatment and subgroup. In this way, we can see if the treatment effects vary by subgroup.
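To make the distinction between the two approaches concrete, here is a minimal sketch with made-up event counts (not from any real trial). For a binary outcome, the treatment effect in each stratum is a risk difference, and a simple z-test for interaction asks whether the two risk differences differ by more than chance would allow:

```python
import math
from statistics import NormalDist

def risk_difference(events, n):
    """Event proportion in one arm and its standard error."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

def stratum_effect(treat_events, treat_n, ctrl_events, ctrl_n):
    """Treatment effect (risk difference) and its SE within one subgroup."""
    p1, se1 = risk_difference(treat_events, treat_n)
    p0, se0 = risk_difference(ctrl_events, ctrl_n)
    return p1 - p0, math.sqrt(se1**2 + se0**2)

def interaction_test(effect_a, se_a, effect_b, se_b):
    """Z-test for whether the treatment effects in two subgroups differ."""
    z = (effect_a - effect_b) / math.sqrt(se_a**2 + se_b**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical counts: events / patients per arm, within each subgroup
eff_men, se_men = stratum_effect(30, 100, 45, 100)      # risk difference -0.15
eff_women, se_women = stratum_effect(40, 100, 44, 100)  # risk difference -0.04
z, p = interaction_test(eff_men, se_men, eff_women, se_women)
```

Note that each within-stratum effect can be tested on its own, but only the interaction p-value speaks to whether the effect in men differs from the effect in women, which is exactly the distinction described above.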
We can also use that model to estimate the treatment effects in the various subgroups, as we could with the stratification method.

Frequently, when we're doing subgroup analyses in a clinical trial, we aren't doing just one subgroup analysis. So we aren't only interested in seeing whether the effects differ in men and women; we're also interested to see if the effects differ across people with different co-morbid conditions at baseline, or people who are taking different concomitant medications. So in practice, we might do a whole series of subgroup analyses. And when significance tests are performed in multiple subgroups, the overall Type I error rate for the subgroup analyses is inflated. That means that, by chance alone, the probability that the p-value for a subgroup difference is less than 0.05 in one or more subgroups is greater than 5%. So the more subgroup analyses we do, the more likely it is that, by chance alone, we will find a statistically significant difference. This is problematic if you use statistical significance as the only criterion for judging whether or not a difference is real.

To illustrate this point, I've included a graph from a publication in the New England Journal of Medicine that shows the probability of getting a false positive, that is, getting a p-value less than the cutoff just by chance alone. On the y-axis we have the probability of a false positive, and on the x-axis we have the number of subgroup tests that were performed. You can see that as you do more tests, the probability of getting a false positive increases. In fact, if you do 40 subgroup tests, the probability of getting at least one false positive is close to 90%, the probability of getting at least two false positives is about 60%, and the probability of getting at least three false positives is close to 30%.
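Those numbers follow from simple binomial arithmetic. If we assume the subgroup tests are independent (an idealization; tests within one trial are usually somewhat correlated) and each is performed at the 0.05 level, the chance of at least k false positives among n tests is one minus the binomial probability of fewer than k:

```python
from math import comb

def prob_at_least_k_false_positives(n_tests, k, alpha=0.05):
    """P(at least k of n_tests independent null tests reach p < alpha by chance)."""
    return 1 - sum(comb(n_tests, j) * alpha**j * (1 - alpha)**(n_tests - j)
                   for j in range(k))

# Reproduces the figures quoted above for 40 subgroup tests:
print(prob_at_least_k_false_positives(40, 1))  # ~0.87, "close to 90%"
print(prob_at_least_k_false_positives(40, 2))  # ~0.60, "about 60%"
print(prob_at_least_k_false_positives(40, 3))  # ~0.32, "close to 30%"
```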
So the cautionary note here is that we shouldn't get overly excited about significant findings that pop up when we're doing a series of subgroup analyses. Instead, we need to interpret these with appropriate skepticism.

That's not to say that we shouldn't do subgroup analyses. Clinical trials represent a significant investment of time and money for the investigators and the sponsors, and also potential health risks for the participants. So it's our obligation to gain the maximum amount of knowledge that we can from the data that we collect. And it's important to determine if there are subgroups of people who are more or less likely to be helped by an intervention. So even if we think that the subgroup analyses are exploratory and not definitive, they can help to guide future research priorities, and they're important.

But how do we perform and report subgroup analyses in a valid and transparent way? Well, as much as possible, the subgroup analyses should be prespecified, and if there are important subgroups, investigators might consider inflating their sample size so that they have decent power within those subgroups. It's important in publications that investigators report the number of separate subgroup analyses that were performed, so that the reader has some idea about the probability of false positives. In some cases, people will adjust for multiple comparisons; that is, they'll make an adjustment in the level of significance that's required to declare a difference significant. And finally, epidemiologists and statisticians really stress the importance of reporting confidence intervals instead of, or in addition to, the p-values, again so that the reader can see the uncertainty in the estimate of the subgroup effect. These are some guidelines that summarize how to proceed cautiously with subgroup analyses.
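Two of those guidelines, adjusting the significance threshold and reporting confidence intervals, are easy to make concrete. The sketch below uses a Bonferroni adjustment, which is just one of several possible adjustment methods, and a 1.96 multiplier that assumes the effect estimate is approximately normal; the p-values shown are made up for illustration:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which subgroup p-values remain significant after a Bonferroni adjustment."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

def confidence_interval(effect, se, z=1.96):
    """Approximate 95% confidence interval for a subgroup treatment effect."""
    return (effect - z * se, effect + z * se)

# With 5 subgroup tests the threshold drops to 0.01, so a nominal
# p = 0.03 no longer counts as significant:
print(bonferroni_significant([0.002, 0.03, 0.2, 0.5, 0.8]))
```

Reporting the interval from `confidence_interval` alongside (or instead of) each p-value lets the reader see how imprecise a subgroup estimate is, rather than just whether it crossed an arbitrary threshold.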
And this brings us to the end of the lecture on analysis issues. In this lecture, we've reviewed analysis philosophy, such as intention to treat, and difficulties in the analysis, such as missing data and subgroup analyses.