Problem:
Hint: Results from the Cox-PH with constant baseline are equivalent to results of an Exponential-AFT model. W.K.T Beta_cox = - Beta_AFT (if scale(sigma/ dispersion parameter)=1)
Data set details:
horTh-hormonal therapy, a factor at two levels no and yes. age-of the patients in years. menostat-menopausal status, a factor at two levels pre (premenopausal) and post (postmenopausal). tsize-tumor size (in mm). tgrade-tumor grade, a ordered factor at levels I < II < III. pnodes-number of positive nodes. progrec-progesterone receptor (in fmol). estrec-estrogen receptor (in fmol). time-recurrence free survival time (in days). cens-censoring indicator (0- censored, 1- event).
library(dplyr)
data(GBSG2,package="TH.data")
head(GBSG2)
glm_model<- glm(cens~1, family = poisson(link = "log"), offset = log(time), data = GBSG2)
summary(glm_model)
Call:
glm(formula = cens ~ 1, family = poisson(link = "log"), data = GBSG2,
offset = log(time))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4357 -1.1071 -0.6851 1.0219 2.2833
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.85552 0.05783 -135.8 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 822.49 on 685 degrees of freedom
Residual deviance: 822.49 on 685 degrees of freedom
AIC: 1422.5
Number of Fisher Scoring iterations: 6
lambda_GLM<- exp(glm_model$coefficients)
print(paste("GLM Lambda value:",lambda_GLM))
[1] "GLM Lambda value: 0.000387606948405752"
library(survival)
aft_model<- survreg(Surv(time,cens)~1, dist = "exponential", data = GBSG2)
summary(aft_model)
Call:
survreg(formula = Surv(time, cens) ~ 1, data = GBSG2, dist = "exponential")
Value Std. Error z p
(Intercept) 7.8555 0.0578 136 <2e-16
Scale fixed at 1
Exponential distribution
Loglik(model)= -2647.8 Loglik(intercept only)= -2647.8
Number of Newton-Raphson Iterations: 4
n= 686
lambda_COX<- exp(-aft_model$coefficients)
print(paste("COX Lambda value:",lambda_COX))
[1] "COX Lambda value: 0.000387606948465559"
Thus both the Beta coefficient of Exponential Cox model and the GLM model is same.
Comparing again both by adding addition covariates
glm_model<- glm(cens~1 + horTh + tsize + progrec + estrec + menostat,
family = poisson(link = "log"), offset = log(time), data = GBSG2)
summary(glm_model)
Call:
glm(formula = cens ~ 1 + horTh + tsize + progrec + estrec + menostat,
family = poisson(link = "log"), data = GBSG2, offset = log(time))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8227 -1.0033 -0.5240 0.9523 2.4663
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.043e+00 1.563e-01 -51.462 < 2e-16 ***
horThyes -3.679e-01 1.280e-01 -2.876 0.00403 **
tsize 1.524e-02 3.596e-03 4.239 2.24e-05 ***
progrec -2.585e-03 5.792e-04 -4.463 8.10e-06 ***
estrec 7.556e-05 4.714e-04 0.160 0.87264
menostatPost 1.554e-01 1.247e-01 1.246 0.21263
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 822.49 on 685 degrees of freedom
Residual deviance: 765.96 on 680 degrees of freedom
AIC: 1376
Number of Fisher Scoring iterations: 6
lambda_GLM<- exp(glm_model$coefficients)
print(lambda_GLM)
(Intercept) horThyes tsize progrec estrec menostatPost
0.0003214781 0.6921560440 1.0153597609 0.9974187336 1.0000755624 1.1681472281
aft_model<- survreg(Surv(time,cens) ~ 1 + horTh + tsize + progrec + estrec + menostat,
dist = "exponential", data = GBSG2)
summary(aft_model)
Call:
survreg(formula = Surv(time, cens) ~ 1 + horTh + tsize + progrec +
estrec + menostat, data = GBSG2, dist = "exponential")
Value Std. Error z p
(Intercept) 8.04e+00 1.56e-01 51.46 < 2e-16
horThyes 3.68e-01 1.28e-01 2.88 0.004
tsize -1.52e-02 3.60e-03 -4.24 2.2e-05
progrec 2.58e-03 5.79e-04 4.46 8.1e-06
estrec -7.56e-05 4.71e-04 -0.16 0.873
menostatPost -1.55e-01 1.25e-01 -1.25 0.213
Scale fixed at 1
Exponential distribution
Loglik(model)= -2619.5 Loglik(intercept only)= -2647.8
Chisq= 56.53 on 5 degrees of freedom, p= 6.3e-11
Number of Newton-Raphson Iterations: 5
n= 686
lambda_COX<- exp(-aft_model$coefficients)
print(lambda_COX)
(Intercept) horThyes tsize progrec estrec menostatPost
0.0003214781 0.6921560440 1.0153597609 0.9974187336 1.0000755624 1.1681472281
Thus both the Beta coefficient of Exponential Cox model and the GLM model is same with covariates and without covariates. Further analyzing, statistically none of the covariates increasing the risk rate as all the coefficients all almost close.
library(survminer)
fit<-coxph(Surv(time,cens) ~ horTh + tsize + progrec + estrec + menostat+age,data = GBSG2)
cox.zph(fit)
chisq df p
horTh 0.300 1 0.5838
tsize 0.873 1 0.3501
progrec 5.938 1 0.0148
estrec 5.867 1 0.0154
menostat 5.465 1 0.0194
age 9.899 1 0.0017
GLOBAL 16.814 6 0.0100
#Proportional Hazard assumption - Schoenfeld residuals
ggcoxzph(cox.zph(fit))
# Testing influential observation - Deviance and dfbets residuals
ggcoxdiagnostics(fit, type= "dfbeta", ggtheme= theme_bw(), linear.predictions = F)
ggcoxdiagnostics(fit, type= "deviance", ggtheme= theme_bw(), linear.predictions = F)
#Testing of non-linearity - Martingale residuals
ggcoxfunctional(Surv(time, cens) ~ age+ log(age)+ sqrt(age), data = GBSG2)
ggcoxfunctional(Surv(time, cens) ~ time+ log(time)+ sqrt(time), data = GBSG2)
ggcoxfunctional(Surv(time, cens) ~ tsize+ log(tsize)+ sqrt(tsize), data = GBSG2)
Conclusion: 1. Proportional Hazard assumption: This model fails the proportional test if we include covariates like estrec, menostat and progrec that has p-value < 0.05, thus we must ignore these covariates and include only tsize and horTh 2. Influential Observation: We could that its is centered around 0 and few outliers are detected 3. Non-linearity testing: Non- linearity of Age and Time is tested and we could see non-linearity in the continuous variable