So, let's look at some examples of the use of logistic regression in the public health and medical literature. This will give you an opportunity to interpret the results from simple and multiple logistic regression models presented in three published journal articles. The first one we'll start with is more of a clinical article, from JAMA Surgery, but one of the reasons I chose it is that I liked the way they described their methods. So, let's just give the context for this from the abstract. The importance of this study, they say, is that despite the increasing use of anti-tumor necrosis factor (TNF) therapy in ulcerative colitis, its effect on postoperative outcomes remains unclear, with many patients requiring surgical intervention despite optimal medical management. So, their objective here is to assess the association of preoperative use of anti-TNF agents with adverse postoperative outcomes. What they used was insurance claims data from a large national database to identify patients 18 years or older who had ulcerative colitis. These insured patients had inpatient and/or outpatient claims between January 1st, 2005 and December 31st, 2013 with Current Procedural Terminology codes for one of three different types of surgery: a subtotal colectomy or total abdominal colectomy, a total proctocolectomy with end ileostomy, or a combined total proctocolectomy and ileal pouch-anal anastomosis. So, these are three different groups. Only data regarding the first, or index, surgical admission within the time frame were extracted. Use of anti-TNF agents, corticosteroids, and immunomodulators within 90 days of surgery was identified using Healthcare Common Procedure Coding System codes.
So, what they were looking at as far as outcomes go were 90-day complications from the surgery, emergency department visits, and readmissions. Multivariable logistic regression was used to model these outcomes as a function of the covariates, and multivariable logistic regression is a synonym for multiple logistic regression. Their primary predictor was anti-TNF agent use, but they included other things to control for differences between those who received it and those who didn't, differences that may also be related to the outcomes. They had almost 2,500 patients, 2,476. A little over half, 55.75%, were men, and the mean age was 42.1 years, but there was some variability in the individual ages, as the standard deviation was 12.9. Among these patients, a little over a third, 950, underwent subtotal colectomy or total abdominal colectomy, 354 of the 2,476 underwent total proctocolectomy with end ileostomy, and another 47.3% received ileal pouch-anal anastomosis. So, these are the three different groups we have here. I'll come back to the results as we work our way through the main sections of the article now. So first, they did a nice job of describing their statistical approach. It is very standard to start with some unadjusted comparisons between the exposed and unexposed groups when there is a single exposure of interest, like anti-TNF agents in this case. So, they said Wilcoxon and chi-square tests, as appropriate, were used to compare preoperative variables and postoperative outcomes between patients receiving and patients not receiving anti-TNF agents in each surgical group. The Wilcoxon test is analogous to a t-test, but it is not comparing a summary statistic at the population level, like means.
Between two groups, it compares the distributions, but certainly they could have used the two-sample unpaired t-test here as well. Then there is the chi-square test for comparing binary or categorical outcomes between the two groups, those who got the anti-TNF agents and those who didn't. They go on to say that multivariable logistic regression, which is a synonym for multiple logistic regression, was used to model the occurrence of each outcome, and remember there are three outcomes, within 90 days by the following covariates: anti-TNF agent use, which was their primary predictor of interest, and age, sex, comorbidity index, malnutrition, failure to thrive, etc. They go on to say a linear association for comorbidity index was assumed. What they mean there is linear on the log-odds scale in the multiple logistic regression. They go on to say, though, that non-linear associations on the log-odds scale for age were examined in these models, using something called B-splines, which is something very similar to lowess. So, they allowed the relationship between the log odds of any of the outcomes and age to be flexible, and they found that even when doing that, there was evidence of a linear fit. So they said linear associations were found to be appropriate in all cases and were used in the final models. Odds ratios and 95% confidence intervals were reported for all factors in the models. P less than 0.05 was considered statistically significant, and 2-sided P values were used. They go on to say, "Among this subset of patients who did receive anti-TNF therapy, we examined whether outcomes differed according to time between anti-TNF agent use and surgery." They first used univariable, or simple, logistic regression to model the occurrence of each outcome as a function of time since infusion.
Time was continuous, and they looked at whether the relationship between the log odds of the outcome and time was linear or not. They looked at the possibility of non-linearity using B-splines, again a similar approach to lowess, but all non-linear curves were nonsignificant. So, there is a way to fit a more formal version of lowess where you can test whether there's evidence in the fitted model of deviation from linearity, and what they're saying is they looked at this but all non-linear curves were not significant. In other words, there was no advantage statistically to assuming a non-linear relationship, and so they ultimately went back and treated time as a linear predictor in the log-odds model. The estimated probabilities of each outcome as a function of time were then constructed from these models. Multivariable analysis could not be performed because the event rates of each outcome were too small in the group that received the anti-TNF agents. In addition, they say a sensitivity analysis that excluded patients undergoing emergency surgery was performed, given the inherent heterogeneity in disease severity and operative complexity in this population. All logistic models were refit after these exclusions, and they compared the results to the models that included those patients. They also say statistical analyses were conducted using SAS software version 9.4 and R software version 3.1.2. But here's what I wanted to focus on. What they're showing here, in this table, are only the adjusted odds ratios. There's information elsewhere in the article on some of the unadjusted associations, if one wants to look at confounding. Let's just look at what they found with anti-TNF agent use as the primary predictor.
They found that for all three outcomes, any complications, emergency department visits, and readmissions, those who got the anti-TNF agents had an estimated lower odds of having the outcome. But after accounting for sampling variability, the resulting confidence intervals all included the null value of one, and the P values for testing the null that there was no relationship between anti-TNF agents and these outcomes were all greater than 0.05. In fact, across these models they didn't find many statistically significant predictors of the outcomes after adjusting for each other. So, just to be clear about what I'm getting at, this model here comes from a multiple logistic regression that starts as the log odds of any complication equals some estimated intercept. Notice they don't provide the intercepts here, which means we could not, as the reader, take these results and estimate the probabilities of any complications for different groups given their predictor values. But most of the predictors are not statistically associated with the outcome, so that may not be that useful anyway. So, they have a slope times x1, which might be a one if they used anti-TNF agents and a zero if they did not. Then, we have a slope for age. Notice how they coded that, though: this was in 10-year increments, but this is age. Remember, they talked about testing for non-linearity, and they didn't find any reason not to assume linearity, so they went with that. Then we have another slope for sex, which is a 1 for females and a 0 for males as per their designation here, and so on and so forth. So, this is the basis for the models, but again, if you look across this table, there are very few things that they found were statistically associated with the outcome of complications. The second model looked like the first one, except they were estimating the log odds of an emergency department visit within 90 days postoperatively.
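Written out, the first of these models has roughly the following form. This is only a sketch in my own notation: the symbols $p$, $x_j$, $\hat\beta_j$, $\hat\gamma_k$, and $B_k$ are assumptions for illustration and do not appear in the article, and the remaining covariates are abbreviated with an ellipsis.

```latex
% p = probability of any complication within 90 days
% x_1 = 1 if anti-TNF agent used, 0 if not
% x_2 = age in 10-year increments
% x_3 = 1 if female, 0 if male
\log\!\left(\frac{p}{1-p}\right)
  = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3 + \cdots

% Each adjusted odds ratio in the table is an exponentiated slope:
\widehat{OR}_j = e^{\hat\beta_j}

% A predicted probability needs the intercept, which is not reported:
\hat p = \frac{e^{\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots}}
              {1 + e^{\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots}}

% The non-linearity check for age replaces the single linear term
% with a flexible B-spline expansion and asks whether the fit improves:
\hat\beta_2 x_2 \;\longrightarrow\; \sum_{k=1}^{K} \hat\gamma_k B_k(x_2)
```

Because every term except the intercept can be recovered from the table as the log of a reported odds ratio, the table is enough for comparing groups but not for predicting an individual probability.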
For the third outcome, readmission, they looked at the log odds and ultimately expressed things on the odds ratio scale in this table. But even in these adjusted analyses they didn't find many predictors of these outcomes for those who had subtotal colectomy or total abdominal colectomy. So, let's see how they summarize this. A total of 950 patients who underwent a subtotal colectomy or total abdominal colectomy procedure were identified, of whom 254, 26.7 percent, had claims for an anti-TNF agent within 90 days of surgery, given at a mean of 39.1 days prior. Patients receiving anti-TNF agents, compared with those with no anti-TNF agent use, were significantly younger, mean age 37.6 versus 42.4 years, P less than 0.001. So, this is a comparison of the mean age between these two groups, an unadjusted comparison. They also underwent fewer emergency surgical procedures, 33, 13 percent, versus 191, 27.4 percent, P less than 0.001, but did not differ regarding sex, comorbidity index, or malnutrition status. Significantly more patients receiving anti-TNF therapy, compared to those with no anti-TNF therapy use, had corticosteroid use and immunomodulator use. So, there were two more significant differences between those who got anti-TNF agents and those who didn't. They then go on to say, "In univariate," in other words unadjusted or simple logistic regression, patients receiving anti-TNF agents, compared with those with no anti-TNF agents, had fewer ED visits within 90 days of surgery: 31.1 percent versus 38.8 percent in the group who didn't get anti-TNF agents, and that's statistically significant. But there were no differences between these two groups for readmissions or complications. However, on multivariable analysis, when they did the multiple logistic regression, which we just looked at the results of, the receipt of therapy was not significantly associated with these outcomes, and that's what we were talking about in that last table.
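As a quick check on the unadjusted comparison just quoted, the two ED visit percentages can be turned into odds and then into an odds ratio. This is a minimal sketch: the function and variable names are mine, and the only inputs are the 31.1 and 38.8 percent figures read out above. With a single binary predictor, this ratio is what the exponentiated slope from a simple logistic regression estimates.

```python
def odds(p):
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing the exposed group to the unexposed group."""
    return odds(p_exposed) / odds(p_unexposed)


# ED visits within 90 days, as quoted from the article
p_anti_tnf = 0.311      # patients who received anti-TNF agents
p_no_anti_tnf = 0.388   # patients who did not

unadjusted_or = odds_ratio(p_anti_tnf, p_no_anti_tnf)
print(f"Unadjusted odds ratio for ED visits: {unadjusted_or:.2f}")
```

The result is about 0.71, that is, lower odds of an ED visit among those who received anti-TNF agents before any adjustment.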
They also did this for the other two types of surgery as well, but I'll just focus on this one for now. One thing they do say here, and I'm going to bring back something we've talked about, is that amongst those with subtotal colectomy or total abdominal colectomy who did receive an anti-TNF agent, "the timing of its most recent administration did not influence the occurrence of any adverse outcomes within 90 days." This is what they're talking about: remember that secondary analysis where they did a logistic regression of the log odds of each of the outcomes on the time since getting the anti-TNF agent, in only the subset of patients who got the anti-TNF agents. What they're showing here comes from that logistic regression, relating the log odds linearly to the day of most recent anti-TNF agent use in that 90-day period. They're showing the predicted probabilities from that logistic regression model and a confidence band around them, so each point on here estimates the predicted proportion of complications in patients who got a subtotal or total abdominal colectomy and were on anti-TNF agents, at each day in the 90-day period. So, they are basically taking the results from their linear logistic regression model relating complications to the day of most recent anti-TNF agent use, transforming those into predicted proportions or probabilities, and graphing that as a function of time. Again, these results were not statistically significant, however. In this portion of the graphic they're showing the outcome of total complications for the three different types of surgery. This is part of a larger graphic that went on to show it for the other two outcomes, ED visits and readmissions, as well.
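To make the transformation behind that graphic concrete, here is a minimal sketch of turning a fitted linear log-odds model into predicted probabilities across the 90-day window. The intercept and slope below are hypothetical placeholders that I made up, since the article does not report the fitted values, and the function names are mine. A confidence band would be built the same way, by passing the lower and upper limits of the estimated log odds through the same transformation.

```python
import math


def inv_logit(log_odds):
    """Transform a log odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_probability(intercept, slope, days):
    """Predicted probability when the log odds are linear in days."""
    return inv_logit(intercept + slope * days)


# HYPOTHETICAL coefficients, for illustration only (not from the article)
B0 = -1.0    # log odds of a complication at day 0
B1 = 0.002   # change in log odds per additional day

# Evaluate the curve every 15 days across the 90-day window
curve = [(d, predicted_probability(B0, B1, d)) for d in range(0, 91, 15)]
for day, prob in curve:
    print(f"day {day:2d}: predicted probability {prob:.3f}")
```

A slope close to zero, as in this made-up example, gives a nearly flat curve, which is what a non-significant association with time would look like on the probability scale.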
But I like this because they talked about investigating the linearity assumption in their methods section, found that there was no reason not to assume linearity, and now they're presenting the predicted probabilities from these regression models graphically.
So, here's another example, from Health Affairs. The title of this article was, "Mortality: The Gap Appears To Have Narrowed." It's about racial disparities in mortality. So, the abstract says, "Despite substantial attention to the greater likelihood of poor clinical outcomes among black versus white surgical patients, little is known about whether racial disparities in postoperative mortality in the United States have narrowed over time. Using nationwide Medicare inpatient claims for the period 2005 to 2014, we examined trends in the thirty-day postoperative mortality rates in black and white patients for five high-risk and three low-risk procedures. Overall, national mortality trends improved for both black and white patients, by 0.10 percent per year and 0.07 percent per year, respectively, which significantly narrowed the black-white difference." That's what we looked at in the additional examples for the lecture section on multiple linear regression. "The reduction occurred primarily within hospitals, rather than between hospitals. Certain subsets of hospitals, such as small hospitals in the Midwest or West that were not minority-serving, improved more than others. In spite of concerns that quality improvement efforts may widen disparities, these findings suggest that national racial disparities in surgical mortality are narrowing." So, one of the things they show in this, and I just point it out because they're comparing trajectories of mortality for black and white patients in different cohorts across time, is that they actually do a nice job of showing a comparison of the characteristics of the black and white patients.
They show this for the first year of data they were looking at, which was 2005, and then the last year, which was 2014. So, this is a nice summary table that gives you some sense of how the comparisons are similar or different over time. They go on to talk about their results here under the heading, "Hospital characteristics associated with improvements in mortality among black patients." That's just one of the outcomes they're interested in looking at. They say, "Across all procedures, we had complete data on 2,769 hospitals treating black patients. We found that 43.6 percent had improving mortality rates for black patients, 26.3 percent had no substantial change, and 30.1 percent had worsening mortality rates. Baseline mean mortality rates," and there's a little technical thing here, "after shrinkage estimation," which is slightly beyond the scope of the course, but we can think of it as comparing mean mortality rates, "did not differ substantially across the three groups of hospitals. In multivariate analysis," ostensibly the multiple logistic regression analysis, "we found that small and medium-size hospitals were more likely than large hospitals to be in the group whose mortality rates improved." They go on to list other associations that came out of that multiple logistic regression analysis. But let's just look at the table where they start presenting those results. So, they have a table here that says, "Hospital characteristics associated with improvement in mortality rates from 2005 to 2014 among black Medicare patients who had inpatient surgery." What they're doing in this table is showing the results of a multiple logistic regression. The outcome, on the log-odds scale, is at the hospital level: the log odds of improvement in mortality over time for black patients.
So, they looked at each hospital's trajectory of mortality over time, classified it as improving or not among the black patients, gave that a one or a zero for each hospital, and then used hospital-level data to relate this to the predictors. So, this is the natural log, of course. I wrote L-O-G for the first time since starting the course, but that is the natural log, so I will just be consistent and write log. So we have the intercept, plus the first predictor, which was hospital size. There are three categories, so we'd have two x's for that. Then the next predictor is region, and we'd have three more x's and three more slopes, et cetera. So, all the predictors are categorical or binary and they're all characteristics of the hospital. What they present is the multiply adjusted odds ratio for each hospital characteristic as it relates to the odds of improvement in mortality for black patients. The reference group for the hospital size comparisons was large hospitals. Small hospitals had three times the odds of large hospitals of having improved mortality rates over the time period for black patients, adjusting for regional differences, profit status, teaching status, urban location, et cetera. Medium hospitals also had higher odds compared to large hospitals, but not by as large an amount: the odds ratio was 1.6. They do not include confidence intervals, but they do include the p-values for each of these. I would have preferred to see confidence intervals for each of these odds ratios and just one overall p-value for this multi-categorical construct, but you can't have everything you want in an article.
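The structure just described can be sketched as follows. The indicator names and the symbols $\hat\beta_j$ are my own notation, not the authors', and since the reference region is not stated here, the three region indicators are left generic.

```latex
% p = probability that a hospital's mortality for black patients improved
% x_small, x_medium: indicators for hospital size (reference: large)
% x_region,1 .. x_region,3: indicators for region (reference not stated here)
\log\!\left(\frac{p}{1-p}\right)
  = \hat\beta_0
  + \hat\beta_1 x_{\text{small}} + \hat\beta_2 x_{\text{medium}}
  + \hat\beta_3 x_{\text{region},1} + \hat\beta_4 x_{\text{region},2}
  + \hat\beta_5 x_{\text{region},3} + \cdots

% Reported adjusted odds ratios, each relative to large hospitals:
e^{\hat\beta_1} \approx 3, \qquad e^{\hat\beta_2} \approx 1.6
```

A categorical predictor with $k$ levels contributes $k-1$ indicators, which is why three size categories give two slopes, and why a single overall test of all $k-1$ slopes together would summarize the construct better than separate p-values.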
They don't present the unadjusted odds ratios per se, but they do show the overall percentages of improvement in each of the three hospital types by size here, and they additionally do this for other characteristics as well. So, if we wanted to turn these percentages into odds and take the odds ratios of improvement for the small and the medium hospitals compared to the large, the unadjusted versions, we could, and then compare them to their multiple regression estimates. The other thing they don't provide for these models is the exponentiated intercept, so we can't take these and use them to predict whether a hospital would have an improvement in mortality given its characteristics. I'm guessing the model is not particularly good at the predictive level because these characteristics are not that specific, and furthermore, that's not really the point of what the researchers were doing here. They just wanted to look at what the associations were overall, to see if they could find general characteristics.
Finally, let's look at one last article, published in the American Journal of Public Health in 2015, looking at the subject of violence against women. The title of the article is "Violence Against Women in Selected Areas of the United States." The authors determined the prevalence of recent emotional, physical, and sexual violence against women and their associations with HIV-related risk factors in women living in the United States. So they performed an assessment of women ages 18 to 44 years with a history of unprotected sex and one or more personal or partner HIV risk factors in the past six months, in the period 2009 to 2010. They used multivariable, or multiple, logistic regression analysis to examine the associations with experiencing violence. So let's just read their methods here. It's nice when they tell us what they did, which the authors had done in the previous two articles as well. They say, "We conducted bivariate analyses."
So, bivariate analysis is a synonym for simple analysis, where we have one predictor. They say they conducted bivariate analyses between the covariates and each type of violence outcome, and there were several violence outcomes, including emotional abuse, physical violence, and sexual violence. They did this for each outcome, one predictor at a time, using logistic regression. For each of these four violence outcomes, the bivariate relations, the simple models, for which P was less than 0.01 were included in the multiple logistic regression analysis. So they used that as a threshold: if, on its own, the predictor had a p-value less than 0.01, it was added to the list of predictors to start with in the multiple logistic regression. In the multiple logistic regression analyses, associations with p-values less than 0.05 were considered statistically significant. Pairwise odds ratios with 95 percent confidence intervals were calculated to examine the association between types of violence. So, first they have a large table, and I'm only showing a portion of it because they had multiple predictors of interest. What they're showing in this table, they say, is bivariate analysis for any type of violence or abuse in the past six months, and they look at the outcomes of emotional, physical, and sexual violence, or at least one of the three. These are the unadjusted associations between each of the predictors and these outcomes. So, for example, if we look at this column, this first thing looks at the association between the odds of experiencing emotional violence and the age of the women.
So, each of these compares the respective age group versus 18 to 26 year olds, so 18 to 26 years old was the reference group. Both of the older groups had slightly lower odds than the youngest group, and the older they got the lower the odds, but these results were not statistically significant in the unadjusted version. Nor were there statistically significant differences by race: African-American women and Hispanic women, each compared with the reference group, ostensibly white women, both had lower odds in the sample, but these were not statistically significant. They do this for each of these unadjusted associations. They go down a long list, and there's more, since I'm only showing you part of a larger table, and they do this for all four outcomes. In their next table, which you could compare side by side with this one, they show the results from the multiple logistic regression. Now, I liked the fact that the table looks exactly the same in terms of its formatting. When things were not included, because they either did not meet the inclusion criteria from the simple regression analyses or were not ultimately statistically significant in the multivariate model, they still kept them in the table, but they put NI for not included. So race was not a statistically significant predictor after adjustment for any of these four outcomes, but they kept this placeholder in the table so that if you were to put this side by side with the unadjusted associations you could assess confounding.
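The two-stage rule the authors describe, a bivariate P below 0.01 to enter the multiple model and then P below 0.05 to count as significant, along with the NI placeholder, can be sketched like this. Every predictor name, p-value, and odds ratio below is a made-up illustration, not a number from the article, and the function names are mine.

```python
ENTRY_THRESHOLD = 0.01   # bivariate P needed to enter the multiple model
SIGNIFICANCE = 0.05      # P needed to be reported from the multiple model


def screen(bivariate_p):
    """Return the predictors whose bivariate p-value is below the entry threshold."""
    return [name for name, p in bivariate_p.items() if p < ENTRY_THRESHOLD]


def table_cell(name, candidates, adjusted):
    """Return the adjusted odds ratio as text, or 'NI' if the predictor was not included."""
    if name not in candidates:
        return "NI"                      # failed the bivariate screen
    odds_ratio, p_value = adjusted[name]
    if p_value >= SIGNIFICANCE:
        return "NI"                      # not significant after adjustment
    return f"{odds_ratio:.2f}"


# HYPOTHETICAL bivariate p-values and adjusted results, for illustration only
bivariate_p = {"age": 0.40, "race": 0.20,
               "food_insecurity": 0.0001, "unstable_housing": 0.004}
adjusted = {"food_insecurity": (2.5, 0.001), "unstable_housing": (1.3, 0.20)}

candidates = screen(bivariate_p)
cells = {name: table_cell(name, candidates, adjusted) for name in bivariate_p}
for name, cell in cells.items():
    print(f"{name:18s} {cell}")
```

Keeping a row for every predictor, even when its cell is NI, is what lets the adjusted table line up with the unadjusted one.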
So, something that jumps out here is that food insecurity is a strong positive predictor of having experienced violence, even after accounting for other characteristics of these women, for all four types of violence. It was also a strong predictor when it was the only thing considered, in the unadjusted analyses. The estimates and confidence intervals have attenuated a bit after adjustment for other characteristics, including things that are likely related to both food insecurity and violence, like whether or not they have stable housing, poor health status, et cetera, because those are all related economically. So these estimates attenuated a bit, but they still are large in terms of increased odds after adjustment, and they're still statistically significant. So anyway, these are just three examples of what I consider a good representation of some of the methods we've covered, with good explanations on the part of the authors of what they did to get the results and a nice presentation of the results. We won't always be so lucky as to see such nice presentations.