There are several different types of analyses performed during and at the end of a trial. The types we'll discuss are preplanned versus ad hoc analyses, sensitivity analyses, and subgroup analyses.

Preplanned analyses should be used to address your hypotheses. This includes the primary hypothesis, which is the focus of the trial; secondary hypotheses, which cover known quantities of interest; and exploratory hypotheses, which cover topics of potential interest. It is important to control the type I error, the chance of falsely concluding a success. For your preplanned hypotheses, the more comparisons you do, the more likely such a mistake is to happen; for example, with five independent tests each at the 0.05 level, the chance of at least one false positive is 1 − 0.95^5, or about 23%. How you control the type I error will determine the interpretation of your findings. To make a definitive conclusion, you should control the type I error. For exploratory or hypothesis-generating tests, you may choose not to.

Ad hoc analyses are unplanned. They are not specified in the protocol and may arise from emerging trends in the data, from information from outside sources such as other trials or studies, or from safety issues that raise the question of whether participants are being put at undue risk. Typically they are exploratory in nature and do not control the type I error. That means the interpretation of the findings should be exploratory or hypothesis-generating: you should not overstate your conclusions, and they will need confirmation, either by doing another study or by analyzing existing data. The important thing to consider is not whether we look, because we all do, but how we interpret those results.

Sensitivity analyses are a stress test of your inference and conclusions: are my conclusions robust to a variety of factors? If they are, you have more confidence in your findings. If they are not, you have to discuss whether the scenarios that changed your conclusions are likely. Typically this involves reanalyzing using different approaches, assumptions, or outcomes. We do not typically adjust for multiple comparisons here; we are not making a new conclusion, just commenting on how robust we think our original conclusion was.

When do we do sensitivity analyses? The main focus is on the primary outcome, because that is your main conclusion. But they may also be important for secondary or exploratory outcomes and for safety data, especially if you plan to use the results of those analyses for planning future studies or making recommendations on how a drug or device should be used.

How do you present these findings? If your sensitivity analyses show that your conclusions are robust, you can typically report that the findings were similar under a range of sensitivity analyses and detail them in a supplement. If your findings were not robust, however, they should take a larger role in your manuscript or report, so that the audience is fully aware of the potential issues with your conclusion. A sketch of one simple sensitivity analysis follows.
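As a concrete illustration, here is a minimal sketch of one such reanalysis, comparing an unadjusted estimate of a treatment effect to a covariate-adjusted one. The data are simulated, and the variable names (outcome, treatment, age, severity) and effect sizes are hypothetical, not from any trial discussed here.

```python
# A minimal sketch of one common sensitivity analysis: checking whether the
# estimated treatment effect is robust to covariate adjustment.
# All data below are simulated; the variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # 1 = intervention, 0 = control
    "age": rng.normal(60, 10, n),
    "severity": rng.normal(0, 1, n),
})
# Simulated outcome with a true treatment effect of -2.0
df["outcome"] = (10 - 2.0 * df["treatment"] + 0.05 * df["age"]
                 + 1.5 * df["severity"] + rng.normal(0, 2, n))

# Primary (unadjusted) analysis versus a covariate-adjusted reanalysis
unadjusted = smf.ols("outcome ~ treatment", data=df).fit()
adjusted = smf.ols("outcome ~ treatment + age + severity", data=df).fit()

for label, fit in [("unadjusted", unadjusted), ("adjusted", adjusted)]:
    est = fit.params["treatment"]
    lo, hi = fit.conf_int().loc["treatment"]
    print(f"{label:>10}: effect = {est:5.2f}, 95% CI ({lo:5.2f}, {hi:5.2f})")
```

If both models tell the same story, you can report that the conclusion was robust to this analytic choice; if they do not, the discrepancy belongs in the main presentation of your results.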
Some common types of sensitivity analyses involve analytic choices, such as: the definition of an outcome; alternate modeling techniques, for example parametric or model-based versus non-parametric approaches, or the choice of the type of regression; alternate modeling parameters, such as using an adjusted model versus an unadjusted model, changing the assumptions about the correlation or clustering between measurements, or using alternate models to describe the shape of the relationship; and the underlying assumptions you make about factors such as cost or utilization. You can also use sensitivity analyses to address data issues such as non-compliance or protocol violations, baseline imbalances in patient characteristics between the two groups, outliers, and violations of modeling assumptions. Finally, sensitivity analyses are a very important tool for evaluating missing data: you can change the assumptions about the type and pattern of missingness, as well as the analysis techniques you use to address it.

Subgroup analyses are also very important. They examine the effect of an intervention across subgroups of interest and determine whether the effect is consistent. If it is not, you may identify subgroups that have different effects. These analyses can be pre-specified, in which case you are doing hypothesis testing, or ad hoc, in which case you are doing hypothesis generation. It is better to restrict the definition of your subgroups to values collected prior to the intervention; there is less potential for bias if the subgroup categories cannot be influenced by the intervention. For example, consider a bad outcome that follows a lack of adherence. This could be due to one of two causes: the lack of adherence led to the bad outcome, or the participant was not adherent because the medication was not working, in which case the root cause was the lack of efficacy of the medication. Examples of common choices for subgroup analyses include the baseline value of the outcome, age, gender, race, and disease severity.

There are pros and cons to consider when performing subgroup analyses. On the plus side, if you have a heterogeneous population, you may find a subgroup with a benefit or harm that differs from the overall intervention effect; a good example of this is the relationship between the BRCA gene and breast cancer survival. Subgroup analyses can also be useful for generating hypotheses and helping you develop future studies, for example by identifying an appropriate study population focused on those for whom the intervention worked. However, you have to be careful: it is very easy to get caught up in data dredging. It is important to establish a scientific rationale for your findings, not just do a random search of all possibilities. There is a famous phrase, "lies, damned lies, and statistics," and that may be because, with enough effort, you can make any dataset support your argument. You need to have that scientific backing. It is also important to consider the impact of multiple comparisons: the more tests you do, the more likely you are to find some subgroup significant just by chance, a false positive that inflates your type I error. The simulation below illustrates how easily this happens.
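To make the multiple-comparisons risk concrete, here is a small simulation, entirely illustrative and not part of the lecture, in which the treatment has no effect in any subgroup, yet a search across 20 subgroups regularly turns up a "significant" one.

```python
# An illustrative simulation of how testing many subgroups inflates the
# type I error. There is no true treatment effect in any subgroup, yet
# searching 20 subgroups frequently "finds" one at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_subgroups, n_per_arm = 1000, 20, 50

false_positive_trials = 0
for _ in range(n_trials):
    found = False
    for _ in range(n_subgroups):
        control = rng.normal(0, 1, n_per_arm)   # no effect anywhere
        treated = rng.normal(0, 1, n_per_arm)
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            found = True
    false_positive_trials += found

# With 20 independent tests, roughly 1 - 0.95**20, about 64%, of trials
# will show at least one spurious subgroup finding.
print(f"Trials with >= 1 'significant' subgroup: "
      f"{false_positive_trials / n_trials:.0%}")
```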
There are two goals behind subgroup analysis. The first is to look at the consistency of the effect across multiple subgroups. To answer this aim, we focus on graphical and numeric summaries as opposed to formal statistical tests. An example is the forest plot on the right side of your screen. It shows the treatment effect both overall, in the top line, and for a number of different subgroups, such as age, BMI, and hormone receptor status. Overall, the treatment worked better than the control across all subgroups, as demonstrated by the fact that the point estimates and most of the confidence intervals fall to the left-hand side of the vertical line of no effect. The hazard ratios and the p-values testing whether the subgroups differ are presented on the right-hand side of the plot. Your conclusion from this analysis is whether the pattern is homogeneous, that is, consistent, or heterogeneous.

In addition to looking at whether the pattern is homogeneous, we can also perform formal statistical analyses. The gold standard is a test of interaction. You can think of this as a gateway test: it determines whether the effect of the intervention differs between two or more subgroups. If it is significant, there is a difference, and you can examine the effects within the individual subgroups, perhaps concluding that the drug worked for one and not for another. If it is not significant, it may mean one of two things: the effect is consistent for all subgroups, or we simply did not have enough power to detect a difference.

The other option would be to look at the effect size for each subgroup separately. This is problematic as a formal comparison: you are pretending that each subgroup is a separate trial while making a conclusion about their relative merits. We would not declare one intervention more effective than another without a direct comparison test, and we should not do that for subgroups either. That is not to say that we cannot look. It just means that we need to be careful, reserve our conclusions, and say that any findings are for hypothesis generation and future testing. Just because a test of interaction is not statistically significant does not mean that we cannot learn from it. It may still provide valuable information, either to guide our planning or to inform future testing. A sketch of the gateway logic follows.
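As an illustration of that gateway logic, here is a minimal sketch of a test of interaction followed by within-subgroup estimates only when the interaction is significant. The data are simulated and the variable names (outcome, treatment, sex) are hypothetical; a real trial would pre-specify both the subgroup and the analysis model.

```python
# A minimal sketch of a test of interaction as a "gateway" test, assuming a
# continuous outcome. All data are simulated; variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "sex": rng.choice(["female", "male"], n),
})
# Simulated outcome in which the treatment effect differs by sex
effect = np.where(df["sex"] == "female", -3.0, -0.5)
df["outcome"] = 10 + effect * df["treatment"] + rng.normal(0, 2, n)

# The treatment-by-subgroup interaction term is the gateway test
fit = smf.ols("outcome ~ treatment * sex", data=df).fit()
p_interaction = fit.pvalues["treatment:sex[T.male]"]
print(f"Interaction p-value: {p_interaction:.4f}")

# Only if the interaction is significant do we examine subgroup effects
if p_interaction < 0.05:
    for level in ["female", "male"]:
        sub = smf.ols("outcome ~ treatment",
                      data=df[df["sex"] == level]).fit()
        lo, hi = sub.conf_int().loc["treatment"]
        print(f"{level}: effect = {sub.params['treatment']:5.2f}, "
              f"95% CI ({lo:5.2f}, {hi:5.2f})")
```

If the interaction is not significant, we would stop at the gateway and treat any within-subgroup differences as hypothesis-generating only.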