The dataset from BioStatistics contains information on the survival times of patients after a kidney surgery.

i. Fit a Cox model that contains all covariates including the interactions of the categorical covariates.State the relative risk of

(a) 50 year old black males compared to 40 year old white females,
(b) 50 year old white males compared to 40 year old white females and
(c) 50 year old black females compared to 40 year old white females.

ii. The continuous covariates are now set for the model. Have a look { based on an appropriate measure of fit { using the partial likelihood whether or not the categorical covariates and their interactions should be included in the model and determine the optimal model.

iii. Is there a benefit in including the continuous covariates non-linearly in the model?

iv. Does it make sense to stratify for the categorical covariates?

Data Set details: Time to death or on-study time Death indicator (0=alive, 1=dead) Gender (1=male, 2=female) Race (1=white, 2=black) Age in years

library(dplyr)
library(readr)
biostat <- read_table2("F:/TUD lecture/SA Lecture/R-Programming/biostat.csv")
Parsed with column specification:
cols(
  S.NO. = col_double(),
  Time = col_double(),
  Event = col_double(),
  Gender = col_double(),
  Race = col_double(),
  Age = col_double()
)
kidney_stat<- select(biostat, Time, Event, Gender, Race, Age)
kidney_stat<- mutate(kidney_stat, Gender=ifelse(Gender==1,"male","female"),
                     Race=ifelse(Race==1,"white","black"))
kidney_stat

1. Cox-PH model

library(survival)
cox_model<- coxph(Surv(Time,Event) ~ Gender + Race + Gender:Race + Age, data = kidney_stat)
summary(cox_model)
Call:
coxph(formula = Surv(Time, Event) ~ Gender + Race + Gender:Race + 
    Age, data = kidney_stat)

  n= 863, number of events= 140 

                          coef exp(coef)  se(coef)      z Pr(>|z|)    
Gendermale           -0.489404  0.612991  0.378127 -1.294    0.196    
Racewhite            -0.453794  0.635213  0.312225 -1.453    0.146    
Age                   0.050672  1.051978  0.007199  7.039 1.94e-12 ***
Gendermale:Racewhite  0.586803  1.798230  0.427308  1.373    0.170    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                     exp(coef) exp(-coef) lower .95 upper .95
Gendermale              0.6130     1.6313    0.2921     1.286
Racewhite               0.6352     1.5743    0.3445     1.171
Age                     1.0520     0.9506    1.0372     1.067
Gendermale:Racewhite    1.7982     0.5561    0.7783     4.155

Concordance= 0.673  (se = 0.022 )
Likelihood ratio test= 58.94  on 4 df,   p=5e-12
Wald test            = 53.17  on 4 df,   p=8e-11
Score (logrank) test = 55.77  on 4 df,   p=2e-11
#Taking coefficients:
b.gender<- cox_model$coefficients[1]
b.race<- cox_model$coefficients[2]
b.age<- cox_model$coefficients[3]
b.interaction<- cox_model$coefficients[4]

#Risk/ Hazard values:
#50 aged black male
male.black<- exp(b.gender+0+(b.age*50)+0)

#50 aged white male
male.white<- exp(b.gender+b.race+(b.age*50)+b.interaction)

#50 aged black female
female.black<- exp(0+0+(b.age*50)+0)

#40 aged white female
female.white<- exp(0+b.race+(b.age*40)+0)

a<-male.black/female.white
#
#Another method using predict function:
#predict(cox_model, newdata = data.frame(Gender = "male", Race = "black",Age = 50), type = "risk") /predict(cox_model, newdata = data.frame(Gender = "female",Race = "white",Age= 40), type = "risk")
#
b<-male.white/female.white
c<-female.black/female.white  
print("50 year old black males compared to 40 year old white females")
[1] "50 year old black males compared to 40 year old white females"
print(a)
Gendermale 
  1.601777 
print("50 year old white males compared to 40 year old white females")
[1] "50 year old white males compared to 40 year old white females"
print(b)
Gendermale 
  1.829646 
print("50 year old black females compared to 40 year old white females")
[1] "50 year old black females compared to 40 year old white females"
print(c)
    Age 
2.61305 

Summary:

i. 50 year old black males has higher risk rate when compared to 40 year old white females.
ii. 50 year old white males has higher risk rate when  compared to 40 year old white females.
iii. 50 year old black females has very higher risk rate when compared to 40 year old white females

2. Model Selection:

#AIC:
aic<- function(cox_model){
  (-2 * cox_model$loglik[2])+(2*length(cox_model$coefficients))
}

m1<-coxph(Surv(Time,Event) ~ Gender + Race + Gender:Race + Age, data = kidney_stat )
aic(m1)
[1] 1707.403
m2<-coxph(Surv(Time,Event) ~ Gender + Race + Age, data = kidney_stat )
aic(m2)
[1] 1707.281
m3<-coxph(Surv(Time,Event) ~ Gender + Age, data = kidney_stat )
aic(m3)
[1] 1705.577
m4<-coxph(Surv(Time,Event) ~ Race + Age, data = kidney_stat )
aic(m4)
[1] 1705.304
m5<-coxph(Surv(Time,Event) ~ Age, data = kidney_stat )
aic(m5)
[1] 1703.601

Thus from the AIC value we can choose the model with just age as covariate and remove all the other covariates that are not necessary to get the best fit.

3. Nonlinear covariates in the model

library(ggplot2)
non_lin <- coxph(Surv(Time, Event) ~ pspline(Age, method = "aic"), data = kidney_stat)
aic(non_lin)
[1] 1734.785
             

According to the AIC it’s not advisable to convert the age into non-linear and add in the model, as it worsen the fit of the model. But, also by visually plotting the age as non-linearly and can check the model whether by adding the age the model fits the data perfectly or not.

4. Stratifying all the categorical covariates

s.race <- coxph(Surv(Time, Event) ~ Age + strata(Race), data = kidney_stat)
aic(s.race)
[1] 1562.95
s.gender <- coxph(Surv(Time, Event) ~ Age + strata(Gender), data = kidney_stat)
aic(s.gender)
[1] 1519.41
s.genrac <- coxph(Surv(Time, Event) ~ Age + strata(Race) + strata(Gender),
                  data = kidney_stat)
aic(s.genrac)
[1] 1379.705
s.ps.gender <- coxph(Surv(Time, Event) ~ pspline(Age, method = "aic") + strata(Gender),
            data = kidney_stat)
aic(s.ps.gender)
[1] 1550.619

Here AIC is drastically comes down as the model gets stratified and the best fit model is achieved through stratifying the Race and Gender covariates.

Conclusion: By stratification we could achieved a better fit that non-stratified model, even the non-linea covariates will give better fit when it’s stratified.

LS0tDQp0aXRsZTogIktpZG5leSBTdXJnZXJ5IEFuYWx5c2lzIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NClRoZSBkYXRhc2V0IGZyb20gQmlvU3RhdGlzdGljcyBjb250YWlucyBpbmZvcm1hdGlvbiBvbiB0aGUgc3Vydml2YWwgdGltZXMgb2YgcGF0aWVudHMgYWZ0ZXIgYSBraWRuZXkgc3VyZ2VyeS4NCg0KICAgIGkuIEZpdCBhIENveCBtb2RlbCB0aGF0IGNvbnRhaW5zIGFsbCBjb3ZhcmlhdGVzIGluY2x1ZGluZyB0aGUgaW50ZXJhY3Rpb25zIG9mIHRoZSBjYXRlZ29yaWNhbCBjb3ZhcmlhdGVzLlN0YXRlIHRoZSByZWxhdGl2ZSByaXNrIG9mDQogICAgDQogICAgKGEpIDUwIHllYXIgb2xkIGJsYWNrIG1hbGVzIGNvbXBhcmVkIHRvIDQwIHllYXIgb2xkIHdoaXRlIGZlbWFsZXMsDQogICAgKGIpIDUwIHllYXIgb2xkIHdoaXRlIG1hbGVzIGNvbXBhcmVkIHRvIDQwIHllYXIgb2xkIHdoaXRlIGZlbWFsZXMgYW5kDQogICAgKGMpIDUwIHllYXIgb2xkIGJsYWNrIGZlbWFsZXMgY29tcGFyZWQgdG8gNDAgeWVhciBvbGQgd2hpdGUgZmVtYWxlcy4NCiAgICANCiAgICBpaS4gVGhlIGNvbnRpbnVvdXMgY292YXJpYXRlcyBhcmUgbm93IHNldCBmb3IgdGhlIG1vZGVsLiBIYXZlIGEgbG9vayB7IGJhc2VkIG9uIGFuIGFwcHJvcHJpYXRlIG1lYXN1cmUgb2YgZml0IHsgdXNpbmcgdGhlIHBhcnRpYWwgbGlrZWxpaG9vZCB3aGV0aGVyIG9yIG5vdCB0aGUgY2F0ZWdvcmljYWwgY292YXJpYXRlcyBhbmQgdGhlaXIgaW50ZXJhY3Rpb25zIHNob3VsZCBiZSBpbmNsdWRlZCBpbiB0aGUgbW9kZWwgYW5kIGRldGVybWluZSB0aGUgb3B0aW1hbCBtb2RlbC4NCg0KICAgIGlpaS4gSXMgdGhlcmUgYSBiZW5lZml0IGluIGluY2x1ZGluZyB0aGUgY29udGludW91cyBjb3ZhcmlhdGVzIG5vbi1saW5lYXJseSBpbiB0aGUgbW9kZWw/DQoNCiAgICBpdi4gRG9lcyBpdCBtYWtlIHNlbnNlIHRvIHN0cmF0aWZ5IGZvciB0aGUgY2F0ZWdvcmljYWwgY292YXJpYXRlcz8NCiAgICANCkRhdGEgU2V0IGRldGFpbHM6DQpUaW1lIHRvIGRlYXRoIG9yIG9uLXN0dWR5IHRpbWUgDQpEZWF0aCBpbmRpY2F0b3IgKDA9YWxpdmUsIDE9ZGVhZCkgDQpHZW5kZXIgKDE9bWFsZSwgMj1mZW1hbGUpIA0KUmFjZSAoMT13aGl0ZSwgMj1ibGFjaykgDQpBZ2UgaW4geWVhcnMgDQpgYGB7cn0NCmxpYnJhcnkoZHBseXIpDQpsaWJyYXJ5KHJlYWRyKQ0KYmlvc3RhdCA8LSByZWFkX3RhYmxlMigiRjovVFVEIGxlY3R1cmUvU0EgTGVjdHVyZS9SLVByb2dyYW1taW5nL2Jpb3N0YXQuY3N2IikNCg0Ka2lkbmV5X3N0YXQ8LSBzZWxlY3QoYmlvc3RhdCwgVGltZSwgRXZlbnQsIEdlbmRlciwgUmFjZSwgQWdlKQ0Ka2lkbmV5X3N0YXQ8LSBtdXRhdGUoa2lkbmV5X3N0YXQsIEdlbmRlcj1pZmVsc2UoR2VuZGVyPT0xLCJtYWxlIiwiZmVtYWxlIiksDQogICAgICAgICAgICAgICAgICAgICBSYWNlPWlmZWxzZShSYWNlPT0xLCJ3aGl0ZSIsImJsYWNrIikpDQpraWRuZXlfc3RhdA0KYGBgDQojIyMgMS4gQ294LVBIIG1vZGVsDQpgYGB7cn0NCmxpYnJhcnkoc3Vydml2YWwpDQpjb3hfbW9kZWw8LSBjb3hwaChTdXJ2KFRpbWUsRXZlbnQpIH4gR2VuZGVyICsgUmFjZSArIEdlbmRlcjpSYWNlICsgQWdlLCBkYXRhID0ga2lkbmV5X3N0YXQpDQpzdW1tYXJ5KGNveF9tb2RlbCkNCmBgYA0KYGBge3J9DQojVGFraW5nIGNvZWZmaWNpZW50czoNCmIuZ2VuZGVyPC0gY294X21vZGVsJGNvZWZmaWNpZW50c1sxXQ0KYi5yYWNlPC0gY294X21vZGVsJGNvZWZmaWNpZW50c1syXQ0KYi5hZ2U8LSBjb3hfbW9kZWwkY29lZmZpY2llbnRzWzNdDQpiLmludGVyYWN0aW9uPC0gY294X21vZGVsJGNvZWZmaWNpZW50c1s0XQ0KDQojUmlzay8gSGF6YXJkIHZhbHVlczoNCiM1MCBhZ2VkIGJsYWNrIG1hbGUNCm1hbGUuYmxhY2s8LSBleHAoYi5nZW5kZXIrMCsoYi5hZ2UqNTApKzApDQoNCiM1MCBhZ2VkIHdoaXRlIG1hbGUNCm1hbGUud2hpdGU8LSBleHAoYi5nZW5kZXIrYi5yYWNlKyhiLmFnZSo1MCkrYi5pbnRlcmFjdGlvbikNCg0KIzUwIGFnZWQgYmxhY2sgZmVtYWxlDQpmZW1hbGUuYmxhY2s8LSBleHAoMCswKyhiLmFnZSo1MCkrMCkNCg0KIzQwIGFnZWQgd2hpdGUgZmVtYWxlDQpmZW1hbGUud2hpdGU8LSBleHAoMCtiLnJhY2UrKGIuYWdlKjQwKSswKQ0KDQphPC1tYWxlLmJsYWNrL2ZlbWFsZS53aGl0ZQ0KIw0KI0Fub3RoZXIgbWV0aG9kIHVzaW5nIHByZWRpY3QgZnVuY3Rpb246DQojcHJlZGljdChjb3hfbW9kZWwsIG5ld2RhdGEgPSBkYXRhLmZyYW1lKEdlbmRlciA9ICJtYWxlIiwgUmFjZSA9ICJibGFjayIsQWdlID0gNTApLCB0eXBlID0gInJpc2siKSAvcHJlZGljdChjb3hfbW9kZWwsIG5ld2RhdGEgPSBkYXRhLmZyYW1lKEdlbmRlciA9ICJmZW1hbGUiLFJhY2UgPSAid2hpdGUiLEFnZT0gNDApLCB0eXBlID0gInJpc2siKQ0KIw0KYjwtbWFsZS53aGl0ZS9mZW1hbGUud2hpdGUNCmM8LWZlbWFsZS5ibGFjay9mZW1hbGUud2hpdGUgIA0KcHJpbnQoIjUwIHllYXIgb2xkIGJsYWNrIG1hbGVzIGNvbXBhcmVkIHRvIDQwIHllYXIgb2xkIHdoaXRlIGZlbWFsZXMiKQ0KcHJpbnQoYSkNCnByaW50KCI1MCB5ZWFyIG9sZCB3aGl0ZSBtYWxlcyBjb21wYXJlZCB0byA0MCB5ZWFyIG9sZCB3aGl0ZSBmZW1hbGVzIikNCnByaW50KGIpDQpwcmludCgiNTAgeWVhciBvbGQgYmxhY2sgZmVtYWxlcyBjb21wYXJlZCB0byA0MCB5ZWFyIG9sZCB3aGl0ZSBmZW1hbGVzIikNCnByaW50KGMpDQoNCmBgYA0KU3VtbWFyeToNCiAgICANCiAgICBpLiA1MCB5ZWFyIG9sZCBibGFjayBtYWxlcyBoYXMgaGlnaGVyIHJpc2sgcmF0ZSB3aGVuIGNvbXBhcmVkIHRvIDQwIHllYXIgb2xkIHdoaXRlIGZlbWFsZXMuDQogICAgaWkuIDUwIHllYXIgb2xkIHdoaXRlIG1hbGVzIGhhcyBoaWdoZXIgcmlzayByYXRlIHdoZW4gIGNvbXBhcmVkIHRvIDQwIHllYXIgb2xkIHdoaXRlIGZlbWFsZXMuDQogICAgaWlpLiA1MCB5ZWFyIG9sZCBibGFjayBmZW1hbGVzIGhhcyB2ZXJ5IGhpZ2hlciByaXNrIHJhdGUgd2hlbiBjb21wYXJlZCB0byA0MCB5ZWFyIG9sZCB3aGl0ZSBmZW1hbGVzDQoNCiMjIyAyLiBNb2RlbCBTZWxlY3Rpb246DQpgYGB7cn0NCiNBSUM6DQphaWM8LSBmdW5jdGlvbihjb3hfbW9kZWwpew0KICAoLTIgKiBjb3hfbW9kZWwkbG9nbGlrWzJdKSsoMipsZW5ndGgoY294X21vZGVsJGNvZWZmaWNpZW50cykpDQp9DQoNCm0xPC1jb3hwaChTdXJ2KFRpbWUsRXZlbnQpIH4gR2VuZGVyICsgUmFjZSArIEdlbmRlcjpSYWNlICsgQWdlLCBkYXRhID0ga2lkbmV5X3N0YXQgKQ0KYWljKG0xKQ0KbTI8LWNveHBoKFN1cnYoVGltZSxFdmVudCkgfiBHZW5kZXIgKyBSYWNlICsgQWdlLCBkYXRhID0ga2lkbmV5X3N0YXQgKQ0KYWljKG0yKQ0KbTM8LWNveHBoKFN1cnYoVGltZSxFdmVudCkgfiBHZW5kZXIgKyBBZ2UsIGRhdGEgPSBraWRuZXlfc3RhdCApDQphaWMobTMpDQptNDwtY294cGgoU3VydihUaW1lLEV2ZW50KSB+IFJhY2UgKyBBZ2UsIGRhdGEgPSBraWRuZXlfc3RhdCApDQphaWMobTQpDQptNTwtY294cGgoU3VydihUaW1lLEV2ZW50KSB+IEFnZSwgZGF0YSA9IGtpZG5leV9zdGF0ICkNCmFpYyhtNSkNCmBgYA0KVGh1cyBmcm9tIHRoZSBBSUMgdmFsdWUgd2UgY2FuIGNob29zZSB0aGUgbW9kZWwgd2l0aCBqdXN0IGFnZSBhcyBjb3ZhcmlhdGUgYW5kIHJlbW92ZSBhbGwgdGhlIG90aGVyIGNvdmFyaWF0ZXMgdGhhdCBhcmUgbm90IG5lY2Vzc2FyeSB0byBnZXQgdGhlIGJlc3QgZml0Lg0KDQojIyMgMy4gTm9ubGluZWFyIGNvdmFyaWF0ZXMgaW4gdGhlIG1vZGVsDQpgYGB7cn0NCmxpYnJhcnkoZ2dwbG90MikNCm5vbl9saW4gPC0gY294cGgoU3VydihUaW1lLCBFdmVudCkgfiBwc3BsaW5lKEFnZSwgbWV0aG9kID0gImFpYyIpLCBkYXRhID0ga2lkbmV5X3N0YXQpDQphaWMobm9uX2xpbikNCiAgICAgICAgICAgICANCmBgYA0KQWNjb3JkaW5nIHRvIHRoZSBBSUMgaXQncyBub3QgYWR2aXNhYmxlIHRvIGNvbnZlcnQgdGhlIGFnZSBpbnRvIG5vbi1saW5lYXIgYW5kIGFkZCBpbiB0aGUgbW9kZWwsIGFzIGl0IHdvcnNlbiB0aGUgZml0IG9mIHRoZSBtb2RlbC4gQnV0LCBhbHNvIGJ5IHZpc3VhbGx5IHBsb3R0aW5nIHRoZSBhZ2UgYXMgbm9uLWxpbmVhcmx5IGFuZCBjYW4gY2hlY2sgdGhlIG1vZGVsIHdoZXRoZXIgYnkgYWRkaW5nIHRoZSBhZ2UgdGhlIG1vZGVsIGZpdHMgdGhlIGRhdGEgcGVyZmVjdGx5IG9yIG5vdC4NCg0KIyMjIDQuIFN0cmF0aWZ5aW5nIGFsbCB0aGUgY2F0ZWdvcmljYWwgY292YXJpYXRlcw0KYGBge3J9DQpzLnJhY2UgPC0gY294cGgoU3VydihUaW1lLCBFdmVudCkgfiBBZ2UgKyBzdHJhdGEoUmFjZSksIGRhdGEgPSBraWRuZXlfc3RhdCkNCmFpYyhzLnJhY2UpDQpzLmdlbmRlciA8LSBjb3hwaChTdXJ2KFRpbWUsIEV2ZW50KSB+IEFnZSArIHN0cmF0YShHZW5kZXIpLCBkYXRhID0ga2lkbmV5X3N0YXQpDQphaWMocy5nZW5kZXIpDQpzLmdlbnJhYyA8LSBjb3hwaChTdXJ2KFRpbWUsIEV2ZW50KSB+IEFnZSArIHN0cmF0YShSYWNlKSArIHN0cmF0YShHZW5kZXIpLA0KICAgICAgICAgICAgICAgICAgZGF0YSA9IGtpZG5leV9zdGF0KQ0KYWljKHMuZ2VucmFjKQ0Kcy5wcy5nZW5kZXIgPC0gY294cGgoU3VydihUaW1lLCBFdmVudCkgfiBwc3BsaW5lKEFnZSwgbWV0aG9kID0gImFpYyIpICsgc3RyYXRhKEdlbmRlciksDQogICAgICAgICAgICBkYXRhID0ga2lkbmV5X3N0YXQpDQphaWMocy5wcy5nZW5kZXIpDQpgYGANCkhlcmUgQUlDIGlzIGRyYXN0aWNhbGx5IGNvbWVzIGRvd24gYXMgdGhlIG1vZGVsIGdldHMgc3RyYXRpZmllZCBhbmQgdGhlIGJlc3QgZml0IG1vZGVsIGlzIGFjaGlldmVkIHRocm91Z2ggc3RyYXRpZnlpbmcgdGhlIFJhY2UgYW5kIEdlbmRlciBjb3ZhcmlhdGVzLg0KDQpDb25jbHVzaW9uOg0KQnkgc3RyYXRpZmljYXRpb24gd2UgY291bGQgYWNoaWV2ZWQgYSBiZXR0ZXIgZml0IHRoYXQgbm9uLXN0cmF0aWZpZWQgbW9kZWwsIGV2ZW4gdGhlIG5vbi1saW5lYSBjb3ZhcmlhdGVzIHdpbGwgZ2l2ZSBiZXR0ZXIgZml0IHdoZW4gaXQncyBzdHJhdGlmaWVkLg0K