So now let's look at some examples of simple Cox regression where the predictor is continuous. We will continue to interpret the slopes from simple Cox regression models as log hazard ratios and the exponentiated slopes as hazard ratios. However, there's a structural assumption we'll have to investigate when giving the Cox regression model a continuous predictor. So we'll show how to empirically assess whether the relationship between the log hazard of the outcome and the continuous predictor is in fact linear.

So let's go back to our data from the randomized trial at the Mayo Clinic, where 309 patients with primary biliary cirrhosis were randomized to receive either a drug or a placebo. They were followed for up to 12 years to see whether they died or were censored during the follow-up period. One question we may have, above and beyond the impact of the drug on survival (which we looked at in the last section, and previously in the first term as well), is this: what is the association between mortality and the bilirubin level at enrollment? Bilirubin was measured continuously in units of milligrams per deciliter. Can we quantify the association between mortality and bilirubin, keeping bilirubin as continuous and not categorizing it, for example?

So let's just look at the distribution of the bilirubin levels in this sample for context. Here's a boxplot of the bilirubin levels, and we can see it's a right-skewed distribution, with the majority of the levels being lower and then some extremely high values for perhaps the sicker patients in the sample. So if we were to do a Cox regression for these data, let's for a moment assume the relationship between the log hazard of mortality and bilirubin is linear in nature. We'll come back and assess that after we fit a model that assumes it. So let's first fit a model of the following form.
We're going to estimate the log hazard of mortality at time t, for a given value of x1, which is bilirubin level, as a function of an intercept that varies over time plus a slope times the continuous value of bilirubin for a group. So the slope beta 1, again, is the difference in our left-hand side, the log hazard of mortality at a specific time t, for two groups who differ by 1 unit, 1 milligram per deciliter, in bilirubin. So beta 1 is the log hazard for a group with bilirubin levels of b + 1, for example, minus the log hazard for a group with bilirubin levels 1 unit less, of b, at any specific time in the follow-up period. And again, if the slope is a difference in log hazards, we can re-express it as the log of the ratio of hazards. So the slope is the estimated log hazard ratio of mortality at any point in the follow-up period for two groups who differ by 1 unit in bilirubin: the group with higher bilirubin relative to the group with lower bilirubin.

So the results for these data look like this. We get an equation that estimates that the log hazard of mortality at time t is a function of this intercept, which varies over time. At any given time, we start with that and then add 0.15 times the bilirubin level in milligrams per deciliter for the group we're looking at. So the slope here, beta 1 hat, is equal to positive 0.15. So as bilirubin increases, the risk of mortality in the follow-up period increases. This is the log hazard ratio of mortality, the difference in the log hazard of mortality between two groups whose bilirubin levels differ by 1 unit, at any point in the follow-up period. If we were to exponentiate this, we would get the estimated hazard ratio. So the estimated hazard ratio for two groups who differ by 1 unit in bilirubin is e raised to the slope, the 0.15 power, and it's approximately equal to 1.16.
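For reference, the model and the slope interpretation described above can be written out as below. The notation (lambda hat for the estimated hazard, x1 for bilirubin in mg/dL, b for an arbitrary bilirubin level) follows the spoken description; the exact typesetting is an assumption, since the slide itself is not part of the transcript.

```latex
\begin{align*}
\ln \hat{\lambda}(t \mid x_1) &= \ln \hat{\lambda}_0(t) + \hat{\beta}_1 x_1 \\
\hat{\beta}_1 &= \ln \hat{\lambda}(t \mid x_1 = b + 1) - \ln \hat{\lambda}(t \mid x_1 = b)
  = \ln\!\left( \frac{\hat{\lambda}(t \mid x_1 = b + 1)}{\hat{\lambda}(t \mid x_1 = b)} \right) \\
\hat{\beta}_1 &= 0.15 \quad\Rightarrow\quad \widehat{HR} = e^{0.15} \approx 1.16
\end{align*}
```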
So each 1 milligram per deciliter increase in bilirubin is associated with a 16% increase in the risk of mortality across the up to 12-year follow-up period. The intercept here, the log of lambda hat naught of t, is a baseline quantity. Technically, it estimates the natural log of the hazard of death as a function of time for a group with bilirubin levels of 0 milligrams per deciliter. So it is a placeholder function of time, but there is no real group of persons anywhere, including in our sample, who have bilirubin levels of 0. So this does not estimate a function that applies to any particular group in our sample. But it establishes the shape of the death risk over time, from which we can compute the value for other groups based on their bilirubin levels by adding the appropriate multiple of the slope to this risk at any given time, on the log scale.

So let's try and work with this. What if we wanted to compare the hazard of mortality for persons with bilirubin levels that differed by something other than 1 milligram per deciliter? We saw the hazard ratio for a 1 unit difference was 1.16, but what if we wanted to compare a group with high bilirubin levels, 3.5 milligrams per deciliter, to a group with 0.8 milligrams per deciliter? How would this play out with the results we have? Well, we could write out this equation at any given time for both bilirubin levels. For the group with bilirubin levels of 3.5 milligrams per deciliter, the log hazard at any given time is equal to the intercept, the log of the baseline hazard evaluated at that specific time, plus the slope of 0.15 times the bilirubin level of 3.5. When bilirubin is equal to 0.8 milligrams per deciliter, it's a similar operation. We start with the same intercept, the log of the baseline hazard function evaluated at the same time t, plus the slope of 0.15 times the bilirubin level of 0.8.
So consider the difference in these two log hazards evaluated at the same time, and again, it doesn't matter what time that is in the follow-up, as long as it's the same time. The intercepts cancel, so the difference is the slope of 0.15 times the difference in the bilirubin levels. The difference in these bilirubin levels is 2.7 milligrams per deciliter. So the log of the hazard ratio for the groups we're comparing, 3.5 versus 0.8, is 0.15, the slope, times that difference in bilirubin levels of 2.7, which is equal to 0.405. That is the log hazard ratio we're trying to estimate. If we exponentiate this to get the hazard ratio, we get a hazard ratio of approximately 1.5. The group with bilirubin levels of 3.5 milligrams per deciliter has 50% greater risk of death at any point in the follow-up period compared to the group with bilirubin levels of 0.8.

And just notice, again, the logic. It's the same thing we showed with odds ratios in logistic regression when comparing groups who differ by something other than 1 unit. We could do it on the log scale, like we just did, and then exponentiate the end result. Or we could start with the estimated hazard ratio for a 1 unit difference in bilirubin, 1.16, and then raise that to the 2.7 power, that is, to the difference in the two bilirubin levels we're comparing. They're mathematically equivalent, and this is also approximately equal to 1.5. So it doesn't matter how you do it; the end result will be the same.

Now, as I had said at the beginning, in fitting this model we were assuming that at any given time the relationship between the log hazard of mortality and bilirubin was linear in nature. This is a hard thing to assess visually because we have multiple dimensions going on in our predictor set.
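The two routes to the 3.5 versus 0.8 comparison described above can be checked with a short sketch. The slope 0.15 and the two bilirubin levels come from the lecture; the variable names are illustrative only.

```python
import math

slope = 0.15          # estimated log hazard ratio per 1 mg/dL (from the lecture)
high, low = 3.5, 0.8  # bilirubin levels being compared, in mg/dL

difference = high - low          # 2.7 mg/dL
log_hr = slope * difference      # route 1: work on the log scale...
hr_from_log = math.exp(log_hr)   # ...then exponentiate once at the end

hr_from_power = math.exp(slope) ** difference  # route 2: raise the 1-unit HR to the 2.7 power
hr_from_rounded = 1.16 ** difference           # same route, starting from the rounded HR of 1.16

print(f"log HR: {log_hr:.3f}")
print(f"HR via log scale: {hr_from_log:.2f}")
print(f"HR via power: {hr_from_power:.2f}")
print(f"HR via rounded 1.16: {hr_from_rounded:.2f}")
```

Starting from the rounded 1.16 gives a slightly different second decimal, which is why carrying the unrounded slope through the calculation is usually the safer habit.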
Even though we technically have only one predictor, bilirubin level, there's also the time element that factors into the function that is the intercept. So there are really several dimensions to look at here: the behavior of the function over time and, with that, the relationship of this function to the bilirubin level. So there's no easy graphic we can look at to assess whether the relationship between the log hazard, regardless of time, and bilirubin is linear in nature. But we can use an empirical approach, and this is what analysts do when they're checking this assumption before deciding how to model the predictor, whether to keep it as continuous or to do something else.

What we can do is categorize the continuous predictor of interest into groups and see if the difference in the log hazard between consecutive ordinal groups is similar. So what do I mean by that? Let's look at bilirubin. We used it as continuous; now let me categorize it, arbitrarily, into four roughly equal-size groups by quartile. We'll use the first, or lowest, quartile of bilirubin as the reference, and we'll create indicators for the other three groups. x1 = 1 for persons with measurements in quartile 2 and 0 otherwise. x2 = 1 for persons with bilirubin measurements in quartile 3 and 0 otherwise. And similarly, x3 = 1 for persons with quartile 4 measurements of bilirubin and 0 otherwise. The reference group, quartile 1, is the group with all x's equal to 0.

So the resulting Cox regression equation is as follows. The log hazard at any given time, given bilirubin, is equal to the log of the baseline hazard, the hazard for the reference group evaluated at that specific time, + 0.4 times x1 + 1.5 times x2 + 2.6 times x3. If the relationship were truly linear, we'd expect to see a constant direction as bilirubin quartiles increase. We'd expect the log hazard to increase or decrease consistently across those levels.
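The indicator coding just described can be sketched as follows. This is only an illustration: the bilirubin values are made up rather than taken from the trial, and `quartile_indicators` is a hypothetical helper, not the software used in the lecture.

```python
from statistics import quantiles

def quartile_indicators(values):
    """Return (x1, x2, x3) for each value, with the lowest quartile as reference."""
    q1, q2, q3 = quantiles(values, n=4)  # the three quartile cut points
    return [
        (int(q1 < v <= q2),  # x1: quartile 2
         int(q2 < v <= q3),  # x2: quartile 3
         int(v > q3))        # x3: quartile 4
        for v in values
    ]

# Made-up bilirubin levels (mg/dL), right-skewed like the distribution in the lecture.
made_up_bilirubin = [0.5, 0.7, 0.9, 1.1, 1.4, 2.0, 3.3, 4.5, 7.1, 12.6, 17.4, 0.6]
indicators = quartile_indicators(made_up_bilirubin)

for level, row in zip(made_up_bilirubin, indicators):
    print(level, row)
```

Persons in the lowest quartile get (0, 0, 0), which is what makes that group the reference in the fitted equation.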
We certainly see a consistent increase, in that the difference between each respective quartile and the first quartile, the reference, gets larger with increasing quartiles. We'd also expect the difference between consecutive quartiles to be roughly the same. So the difference in the log hazard between the second quartile and the reference quartile, the first, at any given time is 0.4. The difference between the third quartile and that same reference quartile is 1.5. Hence the difference between the log hazard at any given time for the third quartile and the second quartile of bilirubin, the difference between 1.5 and 0.4, is 1.1. Similarly, the difference in the log hazard of mortality between those with the highest levels of bilirubin, the fourth quartile, and the reference, the first quartile, at any given time is 2.6. And the difference between this fourth quartile and the third quartile at any given time, the difference between 2.6 and 1.5, is also equal to 1.1.

So in a perfectly linear world, we would expect these three deltas between consecutive quartiles of bilirubin to be exactly the same. They are not; the first one, 0.4, is notably smaller than the other two. And so we could argue about whether we should model this as linear or not. If we do model it as linear, we would certainly capture the general gist, which is that increasing bilirubin is associated with increased risk of mortality. But we may tend to overestimate the degree of increase in the early quartiles and underestimate it in the later quartiles if we assume linearity. I'm going to say, though, that this looks pretty good from an empirical assessment, because it's somewhat arbitrary how we categorized bilirubin. And given only these two things to choose from, I would probably opt for the result that treats it as linear, because then we only have to estimate one slope, and we exploit the linearity in the data and get a more precise estimate of the relationship.
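The step-by-step comparison between consecutive quartiles can be reproduced with a few lines. This sketch uses the rounded slopes quoted for the categorical model (0.4, 1.5, 2.6), with 0 standing in for the reference quartile; the variable names are illustrative.

```python
# Log hazard relative to quartile 1 (the reference), as quoted in the lecture.
log_hazard_vs_reference = [0.0, 0.4, 1.5, 2.6]  # quartiles 1, 2, 3, 4

# Difference in log hazard between each quartile and the one just below it.
steps = [
    round(upper - lower, 2)
    for lower, upper in zip(log_hazard_vs_reference, log_hazard_vs_reference[1:])
]

for quartile, step in zip(range(2, 5), steps):
    print(f"quartile {quartile} vs quartile {quartile - 1}: {step}")
```

Under exact linearity, and if the quartile groups were evenly spaced in bilirubin, the three steps would be equal; here the first step is visibly smaller than the other two.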
Nevertheless, if we wanted to present the results in graphical form and show the Kaplan-Meier curves corresponding to different levels of bilirubin, we would need to do some categorization, even if we reported the hazard ratio per one unit difference from a model that assumed the relationship was linear on the log hazard scale. So here are the Kaplan-Meier survival curves for groups defined by bilirubin quartiles. And we can see it does shake out in an orderly way: the lower the quartile, the lower the bilirubin levels, the better the survival. So that corroborates what we saw overall, both when we assumed linearity and when we categorized bilirubin.

Let's look at another example, infant mortality and gestational age. This is the Nepalese cohort, the study where we first looked at whether vitamin supplementation of the mother during pregnancy was effective for reducing infant mortality. There was no evidence that it was. But let's see if there are other predictors of mortality that could be helpful moving forward in triaging children to better care after birth. So we have the sample of over 10,200 Nepalese newborns, and here's the distribution of their gestational ages. On the low end, we have several outliers at less than 30 weeks, and the values go all the way up to 46 weeks in the sample.

So I want to assess the relationship between infant mortality and gestational age. For the moment, I'm again going to assume the relationship between the log hazard of our outcome and our predictor is linear in that six-month follow-up period. And then we'll come back and assess that assumption empirically. So I'm going to fit a model using the computer that looks like this. We're going to relate the log hazard of death as a function of time and gestational age as follows. It's going to have an intercept component.
That's the log of some baseline risk as a function of time, plus the slope times the gestational age for any group of children. So the slope here compares the log hazard of death at any specific time in the follow-up period for two groups of children who differ by one unit, or one week, in gestational age. So the slope beta 1 hat, again, is the difference in the log hazard of death at time t for two groups of infants who differ by one week of gestational age. A difference in log hazards is equivalent to the log of the ratio of those hazards. So this is the log of the hazard ratio of mortality for two groups of children who differ by one week in gestational age.

When fitting this with the computer, we get a result that looks like this. We have a Cox regression with a negative slope. So the risk of mortality is decreasing with increasing gestational age, which makes sense. The slope is -0.13, and so this is the estimated log hazard ratio of mortality at any given point in the follow-up period for two groups of children whose gestational age differs by one week. To estimate the hazard ratio, we would exponentiate this to get a hazard ratio of 0.88. So per week of gestational age, the relative hazard of mortality is decreasing by 12%.

The intercept here, the ln(lambda hat 0(t)), is a baseline quantity. Technically, it estimates the natural log of the hazard of death as a function of time for a group of children with gestational age of zero weeks. This, of course, is not a real group of children, but it again establishes a starting point: a function over time from which, at any given time, we would get the corresponding log hazard for any given group of children by taking whatever this value is at that time and adding the slope, -0.13, times that group's gestational age. So again, how do we assess whether this linearity assumption between the log hazard of death and gestational age is reasonable?
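The step from a slope to a "percent change in hazard" statement works the same way for positive and negative slopes. A minimal sketch, using the two slopes quoted in the lecture (0.15 for bilirubin, -0.13 for gestational age); `percent_change_in_hazard` is a hypothetical helper name.

```python
import math

def percent_change_in_hazard(slope, units=1.0):
    """Percent change in hazard for a `units` difference in the predictor."""
    return (math.exp(slope * units) - 1.0) * 100.0

bilirubin_change = percent_change_in_hazard(0.15)   # per 1 mg/dL of bilirubin
gestation_change = percent_change_in_hazard(-0.13)  # per 1 week of gestational age

print(f"bilirubin: {bilirubin_change:+.0f}% per mg/dL")
print(f"gestational age: {gestation_change:+.0f}% per week")
```

A negative slope gives a hazard ratio below 1, so the percent change comes out negative, matching the 12% decrease per week described above.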
Well, there's no easy visual tool, like I said before, but what we can do is take an empirical approach: categorize the continuous predictor into groups and see if the difference in the ln(hazard) between consecutive ordinal groups is similar. So for these data, I could categorize gestational age into five groups: <36 weeks, 36 to 38 weeks, 38 to 39 weeks, 39 to 41 weeks, and 41+ weeks. Now, 36 weeks is the older definition of full term. It's since been updated to 37, but I used <36 weeks here. So I'm going to make the preterm group by the old definition, <36 weeks, the designated reference group, and then create four indicators, one for each of the other four gestational age categories. These would not yield groups of the same size, and the intervals are not exactly the same width. But if the relationship between the log hazard of mortality and gestational age is roughly linear, we'd expect to see similar jumps or decreases in the ln(hazard) with each increasing ordinal category.

So let's see what we get if we fit this regression. We have the intercept here, the log of the baseline hazard, which in this situation would be the log of the risk of mortality over time for the reference group, the preterm children. And then we have the slopes for each of the other categories. At any given point in time, to get the log hazard of mortality for children whose gestational age is between 36 and 38 weeks, we find out what it was for the reference group at that time and then add -0.91. So the difference at any given point in time between the log hazard for children with gestational age of 36 to 38 weeks and the reference group is -0.91, a decrease of 0.91. If we look at the next group, the group that's 38 to 39 weeks, the difference between its ln(hazard) at a given time and that of the same reference is -0.94. So not much of a difference between these two gestational age groups.
The difference in the ln(hazard) between the 38 to 39-week group and the 36 to 38-week group is -0.94 minus -0.91, or -0.03. And then similarly, if we look at the group whose gestational age is 39 to 41 weeks and compare them to the reference at any given point in time, we'd take the ln(hazard) at that point in time for the reference group and add -1.06. And that's only another difference of -0.12 between that and the previous gestational age group.

So what it looks like to me with these data is that it's not so much that increasing gestational age is similarly protective across each level of gestational age. It looks like once the child reaches full term, that's the biggest hit. That's the biggest reduction in mortality. And then there's not so much added benefit, per se, of being in the womb longer. And if the gestational age gets too long, we start to lose a little of that benefit. So it looks to me like the big story here is that the relationship is not particularly linear: the big jump is from preterm to full term, with negligible effects thereafter.

So given these results versus the one I fit before, I would choose this model. It's a clear winner, whereas it was more ambiguous in the situation with mortality and bilirubin. So I would not report the results with gestational age as continuous, but would instead report these results. And you can see clearly, if you look at the Kaplan-Meier curves by gestational age categories, that the only real difference visually is that those who are preterm have much poorer survival outcomes than the other gestational age groups. And that's perhaps the bigger message. If we were to stick with the model that treated gestational age as linear, we would assume a constant change in the ln(hazard) for any one-week difference in gestational age across the entire range, and we would miss out on this initial jump that attenuates thereafter.
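Putting the categorical slopes on the hazard ratio scale makes the "big initial jump, then little change" pattern easy to see. A small sketch using only the three slopes quoted in the lecture; the slope for the 41+ week group is not quoted in the transcript, so it is left out rather than guessed.

```python
import math

# Log hazard ratios relative to the <36 week reference group (from the lecture).
log_hr_vs_reference = {
    "36 to 38 weeks": -0.91,
    "38 to 39 weeks": -0.94,
    "39 to 41 weeks": -1.06,
}

# Hazard ratio of each group compared with the preterm reference group.
hr_vs_reference = {
    group: round(math.exp(slope), 2) for group, slope in log_hr_vs_reference.items()
}

# Hazard ratio of each group compared with the group just below it.
slopes = [0.0] + list(log_hr_vs_reference.values())
hr_vs_previous = [
    round(math.exp(upper - lower), 2) for lower, upper in zip(slopes, slopes[1:])
]

print(hr_vs_reference)
print(hr_vs_previous)
```

The first step cuts the hazard by more than half, while the later steps are much closer to 1, which is the non-linear pattern the categorical model picks up and the single-slope model would smooth over.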
Certainly, we are consumers of the results of these analyses for the most part. So I'm not expecting you to go ahead and check this assumption on a regular basis, especially if you're not the one doing the data analysis at the computer. But when you read a paper that uses Cox regression, you want to pay attention to the methods section. If the authors are considering candidate predictors that are continuous in nature, they should at least address the assessment of the linearity assumption as part of the process. And I would be much more comfortable if the end result included continuous predictors and the authors had noted that they checked the assumption of linearity for those predictors before including them as continuous in the models. If that is not specified and they don't make reference to it in the methods section, then I would be a little wary and concerned about whether that assumption was met or verified.

So in summary, again, slopes from simple Cox regression models with a continuous predictor continue, like they did when we had binary or categorical predictors, to have a ln(hazard ratio) interpretation, and these slopes can be exponentiated to estimate hazard ratios. The assumption of linearity between the ln(hazard) of the outcome and a continuous predictor x1 can be assessed. It can't be assessed graphically with any ease, but it can be assessed empirically by comparing the results from a Cox regression with x1 modeled as continuous to the results when the continuous values are categorized into several ordinal groups, and seeing whether, when categorized, the results bear out as we would expect if the relationship were linear in nature. In the next section, we'll come back to all these examples, and there will be no surprises here, but we'll look at creating confidence intervals for the slopes and the corresponding hazard ratios and getting a p-value for a hypothesis test.