So, in this section, we'll tie up some loose ends and talk about the linearity assumption with regard to multiple Cox regression and the inclusion of continuous predictors. We'll also talk briefly about prediction using the results from multiple Cox regression. So, upon completion of this lecture section you will be able to: speak about the linearity assumption related to using continuous predictors in the multiple Cox regression model; expect to see comments about investigating this linearity assumption in articles where continuous predictors are used in simple and multiple Cox regression models; discuss potential strategies for choosing the final Cox regression model; and appreciate that survival curves can be estimated from the results of multiple Cox regression models for various subgroups based on their X values.
So, let's go back to our PBC trial example, the randomized clinical trial on 312 patients with primary biliary cirrhosis studied at the Mayo Clinic. Again, we've looked at this many times, but the primary question of interest was the relationship between mortality and randomized treatment, and there were some other predictors of interest as well. So, one question would be, what is the association between mortality and bilirubin level at enrollment after adjusting for patient age, sex, and treatment group? We've done this by looking at a multiple Cox regression that included these four predictors. I made the choice to include bilirubin as a continuous predictor in milligrams per deciliter. But the question I had to think about before doing that and presenting the results is, can we quantify this adjusted association with bilirubin, keeping bilirubin as continuous in the model? So, when we looked previously at the unadjusted and adjusted associations, we presented results from an unadjusted model and an adjusted model that both included bilirubin as a continuous predictor.
We got an adjusted hazard ratio which was equal to its unadjusted counterpart, so it appears that the association was not confounded by treatment, age, or sex. The adjusted model suggested that there was an estimated 16 percent increase in the relative hazard of mortality for each one milligram per deciliter increase in bilirubin among persons with the same treatment, age, and sex. So, this is what the model looked like for these data. There was a slope for treatment of 0.10; if we exponentiate that, we get the adjusted hazard ratio comparing mortality for those on treatment to those on placebo. There were the three slopes that can be exponentiated to get the adjusted hazard ratios for the three non-reference age categories, and then the slope of bilirubin in this model was 0.15. Then, the slope for sex was negative 0.52, where sex was coded as a one for females. So, again, the slope for bilirubin in this model is Beta five hat equals 0.15, so the adjusted hazard ratio of death for two groups of patients with bilirubin levels that differ by one milligram per deciliter is e to the Beta five hat equals e to the 0.15, or 1.16, and this is adjusted for treatment, age, and sex because this is a multiple Cox regression model.
So, the linearity assumption in that multiple Cox regression model is that the relationship between the log hazard of death and bilirubin measured in milligrams per deciliter is linear in nature after adjustment for treatment group, sex, and age. How can we assess whether that linearity assumption is reasonable? There's no easy visual tool like an adjusted variable plot, like we had with multiple linear regression, to help with this assessment. There's no lowess plot that we can use for Cox regression, and lowess plots are hard to adjust for other predictors in any case.
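Going back to the slopes for a moment, here is a small sketch of the exponentiation step, using only the three slopes quoted above (the age-category slopes aren't given numerically here, so they're left out). The labels in the dictionary are just names I've made up for illustration; they don't come from the dataset.

```python
import math

# Slopes (adjusted log hazard ratios) quoted in the lecture.
# The keys are illustrative labels, not variable names from the PBC data.
slopes = {
    "treatment (drug vs. placebo)": 0.10,
    "bilirubin (per 1 mg/dL)": 0.15,
    "sex (female vs. male)": -0.52,
}

# Exponentiating a slope gives the corresponding adjusted hazard ratio.
hazard_ratios = {name: math.exp(b) for name, b in slopes.items()}

for name, hr in hazard_ratios.items():
    print(f"{name}: adjusted HR = {hr:.2f}")
```

The bilirubin line comes out at about 1.16, which is where the "16 percent increase per one milligram per deciliter" statement comes from.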
So, like we did with multiple logistic regression, we can use an empirical approach to assess whether it is reasonable to model the predictor as continuous or not in a multiple Cox regression model. We can categorize continuous predictors into groups, refit the model with that predictor categorized, and see if the difference in the log hazards between consecutive ordinal groups is similar in a model that includes the other predictors. So, in the Cox regression model, if linearity holds, we would expect, when we categorize bilirubin, to see the change in the log hazard with increasing ordinal categories of bilirubin be similar across those categories in a model that also included treatment, sex, and age.
So, in order to review this empirically, I went and created quartiles of bilirubin, four roughly equal-sized groups based on the percentiles, the 25th, 50th, et cetera, and I created three indicators: X_5 is equal to one for bilirubin quartile two and zero otherwise, X_6 is equal to one for bilirubin quartile three and zero otherwise, and X_7 is equal to one for quartile four and zero otherwise. So, here are the coefficients that we get for those three non-reference bilirubin categories. So, let's just look at the results now and home in on the bilirubin part. This now is the slope for treatment. It's numerically slightly different than what it was when bilirubin was coded as continuous, but the association is still, and I'm not showing you this, not statistically significant. Again, it indicates a higher adjusted estimated hazard of mortality for those who got treatment versus placebo, but again, not significant. Here are the coefficients for the age categories, each of the non-reference age categories compared to the reference of the first quartile, and these are numerically slightly different than what they were when we included bilirubin as continuous.
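As an aside, the quartile indicators X_5, X_6, and X_7 described above could be built along these lines. This is only a sketch: the bilirubin values are invented, the function name is my own, and real software would typically handle ties and cut-point conventions for you.

```python
import statistics

def quartile_indicators(values):
    """Return the three quartile cut points and one (x5, x6, x7) tuple per value."""
    # Cut points at the 25th, 50th, and 75th percentiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    rows = []
    for v in values:
        if v <= q1:
            quartile = 1  # reference group: all three indicators are zero
        elif v <= q2:
            quartile = 2
        elif v <= q3:
            quartile = 3
        else:
            quartile = 4
        rows.append((int(quartile == 2), int(quartile == 3), int(quartile == 4)))
    return (q1, q2, q3), rows

# Made-up bilirubin values in mg/dL, for illustration only.
bilirubin = [0.5, 0.7, 0.9, 1.1, 1.4, 2.0, 3.2, 4.5]
cuts, indicators = quartile_indicators(bilirubin)

for value, (x5, x6, x7) in zip(bilirubin, indicators):
    print(value, x5, x6, x7)
```

Each person gets at most one indicator equal to one, and the first quartile is the reference group with all three equal to zero. The refit model then uses these three indicators in place of the single continuous bilirubin term.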
But again, I'm not showing you this, but if you compare the estimates side by side with their confidence intervals, they're similar in magnitude and the confidence intervals overlap. Then, here are the three slopes for bilirubin quartiles two, three, and four respectively, and then the slope for sex. So, let's home in on the piece that has to do with bilirubin and discuss it briefly.
So, here's the piece that has to do with bilirubin. It's 0.45 times the indicator of being in bilirubin quartile two, plus 1.61 times the indicator of being in quartile three (that should be X_6 here), plus 2.74 times the indicator of being in quartile four. So, the adjusted difference in the log hazard between quartile two and quartile one is 0.45. The adjusted difference in the log hazard for quartile three compared to the same reference quartile is 1.61. So, the difference between these two is a little bit greater than one; it's equal to 1.16. Then, the difference for the fourth quartile compared to the same reference, the first, is 2.74, and the difference between that and the preceding third quartile is 2.74 minus 1.61, which is again greater than one; it's 1.13. So, it turns out that the jump in log hazard from quartile two to three is almost identical to the jump from quartile three to four, but those are both larger than the jump from quartile one to quartile two, which is only 0.45.
So, by modeling this as continuous, we might be missing an acceleration in the process that relates bilirubin to mortality as the levels get higher. By fitting a line to it, we may overestimate what happens at lower bilirubin levels and underestimate what happens at higher levels, but we wouldn't miss the overall gestalt, which is that there is a consistent increase in the log hazard of death with increasing levels of bilirubin after adjusting for treatment, age, and sex.
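The arithmetic behind that comparison is just successive differences of the quartile coefficients. Here's a minimal sketch using the three coefficients quoted above, with quartile one set to zero because it's the reference group; the variable names are my own.

```python
# Adjusted log hazard for each bilirubin quartile relative to quartile one,
# as quoted in the lecture. Quartile one is the reference, so its value is 0.
log_hazard_vs_q1 = [0.0, 0.45, 1.61, 2.74]

# Step in log hazard between consecutive quartiles: Q1->Q2, Q2->Q3, Q3->Q4.
steps = [
    round(upper - lower, 2)
    for lower, upper in zip(log_hazard_vs_q1, log_hazard_vs_q1[1:])
]

print(steps)  # [0.45, 1.16, 1.13]
```

If the steps were all roughly the same size, the categorized model would be telling the same story as the line; here the first step is noticeably smaller than the other two. Keep in mind the quartiles aren't equally spaced in milligrams per deciliter, so this is a rough empirical check rather than a formal test.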
So, if I were working with an expert, a hepatologist or somebody else, I would seek their opinion on which approach was more biologically relevant for modeling this. Including bilirubin as continuous may not be optimal, but it certainly is not going to miss the big idea that the log hazard increases substantially with increasing bilirubin.
So, if you were to work on a data analytic team or analyze your own data, and you got a dataset, how would you choose, with a time-to-event outcome, what your final multiple regression model would be? Would you keep all X's in? You first have to come up with a working hypothesis or question. We want to see how these 10 things are related to survival, for example. But in many cases, the idea of model building can be a little overwhelming to start. I have all these possible predictors, there may be some confounding, I can look at that, et cetera, but what constitutes the best final regression model? That really depends, like with the other types of regression, on the goals of the research.
So, if the goal is to maximize the precision of the adjusted estimates, those being the adjusted log hazard ratios and ultimately the adjusted hazard ratios, it makes sense to keep only those predictors that are statistically significant in the final model, so that you don't have to estimate slopes for things that don't add information about the outcome after accounting for the other predictors in the model. Keeping them would compromise the precision of the estimates for the predictors that are associated, because we'd be estimating more things with the same amount of data, where some of these things don't add information. If the goal is to present results comparable to the results of similar analyses presented by other researchers on similar or different populations, you would want to present at least one model that includes the same predictor set as the other research, even if some of the predictors that they used are not statistically significant with your data.
So, if the goal is to show what happens to the magnitude of an association with different levels of adjustment, you may present the results from several models that include different subsets or combinations of adjustment variables. So, the first column would have all the unadjusted results. The second column might be adjusted for demographic characteristics. The third might be adjusted for sociological characteristics. The fourth might be adjusted for biological characteristics. Then, the fifth column may have all adjustment factors in one model, so that the reader could look at the association between the survival outcome and any specific predictor and see what happens to it when comparing the unadjusted to the different adjusted results. They might get some sense of what types of factors are confounding that relationship, if any.
But if the goal is prediction, well, it's a slightly more complicated story, and we can't give it a full treatment in this course, but we will discuss briefly the ideas of prediction and how to estimate survival curves from Cox regression results. So, we talked about this for simple Cox regression; we'll just extend the idea to multiple Cox regression. We can use the results of the regression to get estimated cumulative survival curves for a given set of x values used in the regression. How to do this is mathematically involved. But what you could do for display purposes is use the computer to estimate these curves and then show different curves for several specific value sets of x1 through xp, and I'll show some examples of that. So, how would you get the estimated cumulative survival? Here's the math part; if you're not familiar with calculus, or you're not into the math, don't worry about it, I'll talk about it conceptually. But I do want to show the math for those who are interested and may do further statistical courses. So, how would you do that given a set of predictor values x1 through xp?
Well, you have this equation that estimates the log hazard of the outcome occurring at any given time in the follow-up period as a function of the predictors x1 through xp and time. So, what the computer would do for us is fill in, at any given time, the intercept piece here, and then this would be added to the linear combination of slopes times their specific predictor values. This would turn out to be a number, and this would be the estimated log hazard at that given point in time for the group defined by their x values. Then this could be translated into an estimated hazard at that time for the group given their x values.
Then, just like we did with simple Cox regression, in order to estimate the survival curves, we use the fact that survival depends on the cumulative hazard, or risk, incurred in that group up through the time we're looking at. So, if we're looking at 30 weeks, the cumulative hazard will involve the risk from time zero up to 30 weeks for the group defined by these x values. This is found by integrating this function over time. You can think of integration as just summing up these time-specific hazards from time zero up to the current time that's being assessed, and getting the cumulative risk incurred by this group up through the time it's being computed for. Then the survival at that time is a function of the cumulative hazard: it's the exponentiated version of the negative of that cumulative hazard. So, this is just FYI, and if you don't follow the math, that's fine, but just think of this conceptually: the more risk a group has incurred up through a given time, the lower their survival beyond that time. So, more risk equals lower survival.
So, let's look at some predicted survival curves based on these Cox regression results for the PBC study. Again, we have a model here; this is the multiple model that includes treatment, age, bilirubin, and sex.
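Before turning to those curves, here is a rough numerical sketch of the steps just described: hazard at each time, cumulative hazard by summing up over time, then survival as e to the negative cumulative hazard. To be clear about what's assumed: the baseline hazard function below is completely made up (in practice the software estimates it from the data), the bilirubin values of 1 and 6 milligrams per deciliter are arbitrary, and all other predictors are taken to be at their reference levels. Only the 0.15 slope comes from the lecture.

```python
import math

def survival_curve(times, baseline_hazard, linear_predictor):
    """Approximate S(t) at each time for a group with a given linear predictor.

    hazard(t) = baseline_hazard(t) * exp(linear_predictor)
    H(t)      = integral of hazard from 0 to t (trapezoid rule here)
    S(t)      = exp(-H(t))
    """
    multiplier = math.exp(linear_predictor)
    survival = []
    cumulative = 0.0
    prev_t = 0.0
    prev_h = baseline_hazard(0.0) * multiplier
    for t in times:
        h = baseline_hazard(t) * multiplier
        cumulative += 0.5 * (prev_h + h) * (t - prev_t)  # add risk incurred since prev_t
        survival.append(math.exp(-cumulative))
        prev_t, prev_h = t, h
    return survival

def baseline(t):
    """Hypothetical baseline hazard per year, invented for illustration."""
    return 0.02 + 0.005 * t

times = [1, 2, 4, 8, 12]  # years of follow-up
slope_bilirubin = 0.15    # from the lecture's adjusted model

low = survival_curve(times, baseline, slope_bilirubin * 1.0)   # bilirubin 1 mg/dL
high = survival_curve(times, baseline, slope_bilirubin * 6.0)  # bilirubin 6 mg/dL

for t, s_low, s_high in zip(times, low, high):
    print(f"year {t}: S_low = {s_low:.3f}, S_high = {s_high:.3f}")
```

The group with the larger linear predictor accumulates risk faster at every time point, so its curve sits below the other one throughout, which is the "more risk equals lower survival" idea in numbers.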
Using the computer, I could create some estimated survival curves for different subgroups based on their x values. So, I can't show all possible survival curves for all possible unique combinations of x values, but we could look at this. This is something that might be presented in a paper to show what the impact of bilirubin, for example, is on survival among different age groups. So, side by side here, I have age-specific survival curves. This was the age at the start of the study for a group of female patients with low bilirubin who were in the drug group, DPCA, and next to it is a group of female patients with high bilirubin on the same drug. So, we show this separately by age group; both groups are female, and both groups are on the same treatment.
So, you'll recall we noted that the hazard ratio associated with bilirubin was large, a 16 percent increase in the relative hazard of death per one milligram per deciliter increase. What I did here, for display purposes, was take bilirubin, put it into quartiles, and then show what the impact would be for these specific quartiles given the numerical values for them. So, you can see that, certainly, in both cases, the older the age group, the worse the survival. But for comparable age groups with low bilirubin versus high, you can see the survival is much better for the low bilirubin groups than the high bilirubin groups. In the worst case scenario, for the oldest age group with low bilirubin, survival after 12 years is on the order of 60 percent, so 60 percent make it beyond 12 years. Compare that to almost nobody surviving beyond 12 years in the group with high bilirubin at baseline. Let's look at another example of using Cox regression results to present predicted survival curves.
This is from the analysis of predictors of infant mortality, including gestational age; whether the mother was given, in this randomized study, beta carotene, vitamin A, or placebo; the sex of the child; and maternal parity. I'm going to use the results from model three, which includes all four predictors, to look at some predicted survival curves for different subgroups of children. So, here are some examples of estimated survival curves based on the multiple Cox regression results from model three. So, we can see these are split out by gestational age groups. We're looking at males with two to four older siblings whose mothers were randomized to the placebo arm in this left graphic, and in the right graphic, we're looking at female infants with two to four older siblings whose mothers were randomized to the placebo arm. We know that sex had very little to do with survival either before or after adjusting for these other things. So, we can see very clearly, again, that gestational age drives survival and that being preterm was a huge risk factor, but this puts an absolute percentage on the risk as opposed to just the relative comparison that we get from the hazard ratios from the model. So, it's nice sometimes to display the predicted survival curves for some of the subgroups involved to get a sense of the absolute nature and magnitude of the risk of the outcome, above and beyond what we have from the relative hazard ratio comparisons.
I had mentioned before that there is uncertainty in the intercept estimate as a function of time. Confidence intervals for these curves can be created, so I'm showing here just two of the gestational age groups, as opposed to all five on one graph, because it gets messy with the confidence intervals. So, I'm looking at those who were 36 to 38 weeks and those who were less than 36 weeks. Here are their estimated survival curves and the confidence intervals.
The uncertainty in these estimates comes from the uncertainty in the intercept for the model, as well as the uncertainty in the slopes for the factors that are used to estimate these. All that uncertainty gets transformed when we transform back from the log hazard scale to the survival scale. It's complex, but confidence intervals for these curves can be created.
Just thinking about using the resulting curves for prediction: to predict, for example, survival for infants who were not enrolled in the study and to triage them, we'd want to get a sense of how well these curves predict. Just like we've seen for the other types of regression, measures of model prediction evaluated using the same data used to fit the model are overly optimistic. There are ways that we can't get into here to assess how well these curves predict for the given dataset. But in order to do it properly, we would want to do some calibration, where we either use another set of comparable data and see how well the model fit on this particular set predicts for the other set of data or, in the absence of that, if there's enough data in the original dataset, split the data randomly into two subsets and use one of the subsets to fit the Cox model and the other to evaluate its predictive power. Again, we don't show how to evaluate the predictive power of a Cox model because it's beyond the scope of what we can do in this course, but just again, this principle of not using the data that was used to create the model to validate the model makes for a better assessment.
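The random split itself is the easy part, and it could look something like the sketch below. The function name, the 50/50 fraction, and the record IDs are all my own choices for illustration; fitting the Cox model on one half and scoring it on the other is the part that's beyond this course and isn't shown.

```python
import random

def split_for_validation(records, fit_fraction=0.5, seed=0):
    """Randomly split records into a model-fitting set and a held-out set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_fit = int(len(shuffled) * fit_fraction)
    return shuffled[:n_fit], shuffled[n_fit:]

# Hypothetical subject IDs standing in for rows of a dataset.
subject_ids = list(range(1, 101))
fit_set, holdout_set = split_for_validation(subject_ids)

print(len(fit_set), len(holdout_set))  # 50 50
```

The key property is that no subject appears in both pieces, so the held-out set plays the role of data the model has never seen.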
So, in summary, what we've talked about here is that when including a continuous potential predictor in the Cox model, there's an underlying linearity assumption involved in the model. This is that the relationship between the log hazard and the continuous predictor is linear in nature after adjusting for the other predictors in the model, and we showed how to investigate that empirically. We also showed that the results from multiple Cox regression can be used to produce estimated survival curves for groups given a specific set of predictor values. We reinforced the idea that if one wants to build a predictive model and evaluate how well it predicts for the population from which the sample came, it's best to validate the prediction on another dataset from the same population that was not used to fit the model.