The dataset from BioStatistics contains information on the survival times of patients after a kidney surgery.
i. Fit a Cox model that contains all covariates including the interactions of the categorical covariates.State the relative risk of
(a) 50 year old black males compared to 40 year old white females,
(b) 50 year old white males compared to 40 year old white females and
(c) 50 year old black females compared to 40 year old white females.
ii. The continuous covariates are now set for the model. Have a look { based on an appropriate measure of fit { using the partial likelihood whether or not the categorical covariates and their interactions should be included in the model and determine the optimal model.
iii. Is there a benefit in including the continuous covariates non-linearly in the model?
iv. Does it make sense to stratify for the categorical covariates?
Data Set details: Time to death or on-study time Death indicator (0=alive, 1=dead) Gender (1=male, 2=female) Race (1=white, 2=black) Age in years
library(dplyr)
library(readr)
biostat <- read_table2("F:/TUD lecture/SA Lecture/R-Programming/biostat.csv")
Parsed with column specification:
cols(
S.NO. = col_double(),
Time = col_double(),
Event = col_double(),
Gender = col_double(),
Race = col_double(),
Age = col_double()
)
kidney_stat<- select(biostat, Time, Event, Gender, Race, Age)
kidney_stat<- mutate(kidney_stat, Gender=ifelse(Gender==1,"male","female"),
Race=ifelse(Race==1,"white","black"))
kidney_stat
library(survival)
cox_model<- coxph(Surv(Time,Event) ~ Gender + Race + Gender:Race + Age, data = kidney_stat)
summary(cox_model)
Call:
coxph(formula = Surv(Time, Event) ~ Gender + Race + Gender:Race +
Age, data = kidney_stat)
n= 863, number of events= 140
coef exp(coef) se(coef) z Pr(>|z|)
Gendermale -0.489404 0.612991 0.378127 -1.294 0.196
Racewhite -0.453794 0.635213 0.312225 -1.453 0.146
Age 0.050672 1.051978 0.007199 7.039 1.94e-12 ***
Gendermale:Racewhite 0.586803 1.798230 0.427308 1.373 0.170
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
Gendermale 0.6130 1.6313 0.2921 1.286
Racewhite 0.6352 1.5743 0.3445 1.171
Age 1.0520 0.9506 1.0372 1.067
Gendermale:Racewhite 1.7982 0.5561 0.7783 4.155
Concordance= 0.673 (se = 0.022 )
Likelihood ratio test= 58.94 on 4 df, p=5e-12
Wald test = 53.17 on 4 df, p=8e-11
Score (logrank) test = 55.77 on 4 df, p=2e-11
#Taking coefficients:
b.gender<- cox_model$coefficients[1]
b.race<- cox_model$coefficients[2]
b.age<- cox_model$coefficients[3]
b.interaction<- cox_model$coefficients[4]
#Risk/ Hazard values:
#50 aged black male
male.black<- exp(b.gender+0+(b.age*50)+0)
#50 aged white male
male.white<- exp(b.gender+b.race+(b.age*50)+b.interaction)
#50 aged black female
female.black<- exp(0+0+(b.age*50)+0)
#40 aged white female
female.white<- exp(0+b.race+(b.age*40)+0)
a<-male.black/female.white
#
#Another method using predict function:
#predict(cox_model, newdata = data.frame(Gender = "male", Race = "black",Age = 50), type = "risk") /predict(cox_model, newdata = data.frame(Gender = "female",Race = "white",Age= 40), type = "risk")
#
b<-male.white/female.white
c<-female.black/female.white
print("50 year old black males compared to 40 year old white females")
[1] "50 year old black males compared to 40 year old white females"
print(a)
Gendermale
1.601777
print("50 year old white males compared to 40 year old white females")
[1] "50 year old white males compared to 40 year old white females"
print(b)
Gendermale
1.829646
print("50 year old black females compared to 40 year old white females")
[1] "50 year old black females compared to 40 year old white females"
print(c)
Age
2.61305
Summary:
i. 50 year old black males has higher risk rate when compared to 40 year old white females.
ii. 50 year old white males has higher risk rate when compared to 40 year old white females.
iii. 50 year old black females has very higher risk rate when compared to 40 year old white females
#AIC:
aic<- function(cox_model){
(-2 * cox_model$loglik[2])+(2*length(cox_model$coefficients))
}
m1<-coxph(Surv(Time,Event) ~ Gender + Race + Gender:Race + Age, data = kidney_stat )
aic(m1)
[1] 1707.403
m2<-coxph(Surv(Time,Event) ~ Gender + Race + Age, data = kidney_stat )
aic(m2)
[1] 1707.281
m3<-coxph(Surv(Time,Event) ~ Gender + Age, data = kidney_stat )
aic(m3)
[1] 1705.577
m4<-coxph(Surv(Time,Event) ~ Race + Age, data = kidney_stat )
aic(m4)
[1] 1705.304
m5<-coxph(Surv(Time,Event) ~ Age, data = kidney_stat )
aic(m5)
[1] 1703.601
Thus from the AIC value we can choose the model with just age as covariate and remove all the other covariates that are not necessary to get the best fit.
library(ggplot2)
non_lin <- coxph(Surv(Time, Event) ~ pspline(Age, method = "aic"), data = kidney_stat)
aic(non_lin)
[1] 1734.785
According to the AIC it’s not advisable to convert the age into non-linear and add in the model, as it worsen the fit of the model. But, also by visually plotting the age as non-linearly and can check the model whether by adding the age the model fits the data perfectly or not.
s.race <- coxph(Surv(Time, Event) ~ Age + strata(Race), data = kidney_stat)
aic(s.race)
[1] 1562.95
s.gender <- coxph(Surv(Time, Event) ~ Age + strata(Gender), data = kidney_stat)
aic(s.gender)
[1] 1519.41
s.genrac <- coxph(Surv(Time, Event) ~ Age + strata(Race) + strata(Gender),
data = kidney_stat)
aic(s.genrac)
[1] 1379.705
s.ps.gender <- coxph(Surv(Time, Event) ~ pspline(Age, method = "aic") + strata(Gender),
data = kidney_stat)
aic(s.ps.gender)
[1] 1550.619
Here AIC is drastically comes down as the model gets stratified and the best fit model is achieved through stratifying the Race and Gender covariates.
Conclusion: By stratification we could achieved a better fit that non-stratified model, even the non-linea covariates will give better fit when it’s stratified.