Problem:

  1. Estimate the Cox model with constant baseline hazard rate for the (German Breast Cancer Study Group 2)- GBSG2 from package “TH.data”" using the function glm() and compare the results to the ones from the function survreg().

Hint: Results from the Cox-PH with constant baseline are equivalent to results of an Exponential-AFT model. W.K.T Beta_cox = - Beta_AFT (if scale(sigma/ dispersion parameter)=1)

  1. Model diagnosis of Cox Ph model using residuals

Data set details:

horTh-hormonal therapy, a factor at two levels no and yes. age-of the patients in years. menostat-menopausal status, a factor at two levels pre (premenopausal) and post (postmenopausal). tsize-tumor size (in mm). tgrade-tumor grade, a ordered factor at levels I < II < III. pnodes-number of positive nodes. progrec-progesterone receptor (in fmol). estrec-estrogen receptor (in fmol). time-recurrence free survival time (in days). cens-censoring indicator (0- censored, 1- event).

library(dplyr)
data(GBSG2,package="TH.data")
head(GBSG2)

1. GLM model (log-linear poisson model)

glm_model<- glm(cens~1, family = poisson(link = "log"), offset = log(time), data = GBSG2)
summary(glm_model)

Call:
glm(formula = cens ~ 1, family = poisson(link = "log"), data = GBSG2, 
    offset = log(time))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4357  -1.1071  -0.6851   1.0219   2.2833  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -7.85552    0.05783  -135.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 822.49  on 685  degrees of freedom
Residual deviance: 822.49  on 685  degrees of freedom
AIC: 1422.5

Number of Fisher Scoring iterations: 6
lambda_GLM<- exp(glm_model$coefficients)
print(paste("GLM Lambda value:",lambda_GLM))
[1] "GLM Lambda value: 0.000387606948405752"

COX Model (Exponential)

library(survival)
aft_model<- survreg(Surv(time,cens)~1, dist = "exponential", data = GBSG2)
summary(aft_model)

Call:
survreg(formula = Surv(time, cens) ~ 1, data = GBSG2, dist = "exponential")
             Value Std. Error   z      p
(Intercept) 7.8555     0.0578 136 <2e-16

Scale fixed at 1 

Exponential distribution
Loglik(model)= -2647.8   Loglik(intercept only)= -2647.8
Number of Newton-Raphson Iterations: 4 
n= 686 
lambda_COX<- exp(-aft_model$coefficients)
print(paste("COX Lambda value:",lambda_COX))
[1] "COX Lambda value: 0.000387606948465559"

Thus both the Beta coefficient of Exponential Cox model and the GLM model is same.

Comparing again both by adding addition covariates

glm_model<- glm(cens~1 + horTh + tsize + progrec + estrec + menostat, 
                family = poisson(link = "log"), offset = log(time), data = GBSG2)
summary(glm_model)

Call:
glm(formula = cens ~ 1 + horTh + tsize + progrec + estrec + menostat, 
    family = poisson(link = "log"), data = GBSG2, offset = log(time))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8227  -1.0033  -0.5240   0.9523   2.4663  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -8.043e+00  1.563e-01 -51.462  < 2e-16 ***
horThyes     -3.679e-01  1.280e-01  -2.876  0.00403 ** 
tsize         1.524e-02  3.596e-03   4.239 2.24e-05 ***
progrec      -2.585e-03  5.792e-04  -4.463 8.10e-06 ***
estrec        7.556e-05  4.714e-04   0.160  0.87264    
menostatPost  1.554e-01  1.247e-01   1.246  0.21263    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 822.49  on 685  degrees of freedom
Residual deviance: 765.96  on 680  degrees of freedom
AIC: 1376

Number of Fisher Scoring iterations: 6
lambda_GLM<- exp(glm_model$coefficients)
print(lambda_GLM)
 (Intercept)     horThyes        tsize      progrec       estrec menostatPost 
0.0003214781 0.6921560440 1.0153597609 0.9974187336 1.0000755624 1.1681472281 
aft_model<- survreg(Surv(time,cens) ~ 1 + horTh + tsize + progrec + estrec + menostat, 
                    dist = "exponential", data = GBSG2)
summary(aft_model)

Call:
survreg(formula = Surv(time, cens) ~ 1 + horTh + tsize + progrec + 
    estrec + menostat, data = GBSG2, dist = "exponential")
                 Value Std. Error     z       p
(Intercept)   8.04e+00   1.56e-01 51.46 < 2e-16
horThyes      3.68e-01   1.28e-01  2.88   0.004
tsize        -1.52e-02   3.60e-03 -4.24 2.2e-05
progrec       2.58e-03   5.79e-04  4.46 8.1e-06
estrec       -7.56e-05   4.71e-04 -0.16   0.873
menostatPost -1.55e-01   1.25e-01 -1.25   0.213

Scale fixed at 1 

Exponential distribution
Loglik(model)= -2619.5   Loglik(intercept only)= -2647.8
    Chisq= 56.53 on 5 degrees of freedom, p= 6.3e-11 
Number of Newton-Raphson Iterations: 5 
n= 686 
lambda_COX<- exp(-aft_model$coefficients)
print(lambda_COX)
 (Intercept)     horThyes        tsize      progrec       estrec menostatPost 
0.0003214781 0.6921560440 1.0153597609 0.9974187336 1.0000755624 1.1681472281 

Thus both the Beta coefficient of Exponential Cox model and the GLM model is same with covariates and without covariates. Further analyzing, statistically none of the covariates increasing the risk rate as all the coefficients all almost close.

2. overall COX-PH model diagnosis

library(survminer)
fit<-coxph(Surv(time,cens) ~ horTh + tsize + progrec + estrec + menostat+age,data = GBSG2)
cox.zph(fit)
          chisq df      p
horTh     0.300  1 0.5838
tsize     0.873  1 0.3501
progrec   5.938  1 0.0148
estrec    5.867  1 0.0154
menostat  5.465  1 0.0194
age       9.899  1 0.0017
GLOBAL   16.814  6 0.0100
#Proportional Hazard assumption - Schoenfeld residuals
ggcoxzph(cox.zph(fit))


# Testing influential observation - Deviance and dfbets residuals
ggcoxdiagnostics(fit, type= "dfbeta", ggtheme= theme_bw(), linear.predictions = F)

ggcoxdiagnostics(fit, type= "deviance", ggtheme= theme_bw(), linear.predictions = F)


#Testing of non-linearity - Martingale residuals
ggcoxfunctional(Surv(time, cens) ~ age+ log(age)+ sqrt(age), data = GBSG2)

ggcoxfunctional(Surv(time, cens) ~ time+ log(time)+ sqrt(time), data = GBSG2)

ggcoxfunctional(Surv(time, cens) ~ tsize+ log(tsize)+ sqrt(tsize), data = GBSG2)

Conclusion: 1. Proportional Hazard assumption: This model fails the proportional test if we include covariates like estrec, menostat and progrec that has p-value < 0.05, thus we must ignore these covariates and include only tsize and horTh 2. Influential Observation: We could that its is centered around 0 and few outliers are detected 3. Non-linearity testing: Non- linearity of Age and Time is tested and we could see non-linearity in the continuous variable

LS0tDQp0aXRsZTogIkdMTSBtb2RlbCB2cyBDb3giDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KUHJvYmxlbToNCg0KMS4gRXN0aW1hdGUgdGhlIENveCBtb2RlbCB3aXRoIGNvbnN0YW50IGJhc2VsaW5lIGhhemFyZCByYXRlIGZvciB0aGUgKEdlcm1hbiBCcmVhc3QgQ2FuY2VyIFN0dWR5IEdyb3VwIDIpLSBHQlNHMiBmcm9tIHBhY2thZ2UgIlRILmRhdGEiIiB1c2luZyB0aGUgZnVuY3Rpb24gZ2xtKCkgYW5kIGNvbXBhcmUgdGhlIHJlc3VsdHMgdG8gdGhlIG9uZXMgZnJvbSB0aGUgZnVuY3Rpb24gc3VydnJlZygpLg0KDQpIaW50OiBSZXN1bHRzIGZyb20gdGhlIENveC1QSCB3aXRoIGNvbnN0YW50IGJhc2VsaW5lIGFyZSBlcXVpdmFsZW50IHRvIHJlc3VsdHMgb2YgYW4NCkV4cG9uZW50aWFsLUFGVCBtb2RlbC4gVy5LLlQgQmV0YV9jb3ggPSAtIEJldGFfQUZUIChpZiBzY2FsZShzaWdtYS8gZGlzcGVyc2lvbiBwYXJhbWV0ZXIpPTEpDQoNCjIuIE1vZGVsIGRpYWdub3NpcyBvZiBDb3ggUGggbW9kZWwgdXNpbmcgcmVzaWR1YWxzIA0KDQpEYXRhIHNldCBkZXRhaWxzOg0KDQpob3JUaC1ob3Jtb25hbCB0aGVyYXB5LCBhIGZhY3RvciBhdCB0d28gbGV2ZWxzIG5vIGFuZCB5ZXMuDQphZ2Utb2YgdGhlIHBhdGllbnRzIGluIHllYXJzLg0KbWVub3N0YXQtbWVub3BhdXNhbCBzdGF0dXMsIGEgZmFjdG9yIGF0IHR3byBsZXZlbHMgcHJlIChwcmVtZW5vcGF1c2FsKSBhbmQgcG9zdCAocG9zdG1lbm9wYXVzYWwpLg0KdHNpemUtdHVtb3Igc2l6ZSAoaW4gbW0pLg0KdGdyYWRlLXR1bW9yIGdyYWRlLCBhIG9yZGVyZWQgZmFjdG9yIGF0IGxldmVscyBJIDwgSUkgPCBJSUkuDQpwbm9kZXMtbnVtYmVyIG9mIHBvc2l0aXZlIG5vZGVzLg0KcHJvZ3JlYy1wcm9nZXN0ZXJvbmUgcmVjZXB0b3IgKGluIGZtb2wpLg0KZXN0cmVjLWVzdHJvZ2VuIHJlY2VwdG9yIChpbiBmbW9sKS4NCnRpbWUtcmVjdXJyZW5jZSBmcmVlIHN1cnZpdmFsIHRpbWUgKGluIGRheXMpLg0KY2Vucy1jZW5zb3JpbmcgaW5kaWNhdG9yICgwLSBjZW5zb3JlZCwgMS0gZXZlbnQpLg0KYGBge3J9DQpsaWJyYXJ5KGRwbHlyKQ0KZGF0YShHQlNHMixwYWNrYWdlPSJUSC5kYXRhIikNCmhlYWQoR0JTRzIpDQpgYGANCiMjIDEuIEdMTSBtb2RlbCAobG9nLWxpbmVhciBwb2lzc29uIG1vZGVsKQ0KYGBge3J9DQpnbG1fbW9kZWw8LSBnbG0oY2Vuc34xLCBmYW1pbHkgPSBwb2lzc29uKGxpbmsgPSAibG9nIiksIG9mZnNldCA9IGxvZyh0aW1lKSwgZGF0YSA9IEdCU0cyKQ0Kc3VtbWFyeShnbG1fbW9kZWwpDQoNCmxhbWJkYV9HTE08LSBleHAoZ2xtX21vZGVsJGNvZWZmaWNpZW50cykNCnByaW50KHBhc3RlKCJHTE0gTGFtYmRhIHZhbHVlOiIsbGFtYmRhX0dMTSkpDQpgYGANCiMjIENPWCBNb2RlbCAoRXhwb25lbnRpYWwpDQpgYGB7cn0NCmxpYnJhcnkoc3Vydml2YWwpDQphZnRfbW9kZWw8LSBzdXJ2cmVnKFN1cnYodGltZSxjZW5zKX4xLCBkaXN0ID0gImV4cG9uZW50aWFsIiwgZGF0YSA9IEdCU0cyKQ0Kc3VtbWFyeShhZnRfbW9kZWwpDQoNCmxhbWJkYV9DT1g8LSBleHAoLWFmdF9tb2RlbCRjb2VmZmljaWVudHMpDQpwcmludChwYXN0ZSgiQ09YIExhbWJkYSB2YWx1ZToiLGxhbWJkYV9DT1gpKQ0KYGBgDQpUaHVzIGJvdGggdGhlIEJldGEgY29lZmZpY2llbnQgb2YgRXhwb25lbnRpYWwgQ294IG1vZGVsIGFuZCB0aGUgR0xNIG1vZGVsIGlzIHNhbWUuDQoNCkNvbXBhcmluZyBhZ2FpbiBib3RoIGJ5IGFkZGluZyBhZGRpdGlvbiBjb3ZhcmlhdGVzDQpgYGB7cn0NCmdsbV9tb2RlbDwtIGdsbShjZW5zfjEgKyBob3JUaCArIHRzaXplICsgcHJvZ3JlYyArIGVzdHJlYyArIG1lbm9zdGF0LCANCiAgICAgICAgICAgICAgICBmYW1pbHkgPSBwb2lzc29uKGxpbmsgPSAibG9nIiksIG9mZnNldCA9IGxvZyh0aW1lKSwgZGF0YSA9IEdCU0cyKQ0Kc3VtbWFyeShnbG1fbW9kZWwpDQoNCmxhbWJkYV9HTE08LSBleHAoZ2xtX21vZGVsJGNvZWZmaWNpZW50cykNCnByaW50KGxhbWJkYV9HTE0pDQpgYGANCg0KYGBge3J9DQphZnRfbW9kZWw8LSBzdXJ2cmVnKFN1cnYodGltZSxjZW5zKSB+IDEgKyBob3JUaCArIHRzaXplICsgcHJvZ3JlYyArIGVzdHJlYyArIG1lbm9zdGF0LCANCiAgICAgICAgICAgICAgICAgICAgZGlzdCA9ICJleHBvbmVudGlhbCIsIGRhdGEgPSBHQlNHMikNCnN1bW1hcnkoYWZ0X21vZGVsKQ0KDQpsYW1iZGFfQ09YPC0gZXhwKC1hZnRfbW9kZWwkY29lZmZpY2llbnRzKQ0KcHJpbnQobGFtYmRhX0NPWCkNCg0KYGBgDQpUaHVzIGJvdGggdGhlIEJldGEgY29lZmZpY2llbnQgb2YgRXhwb25lbnRpYWwgQ294IG1vZGVsIGFuZCB0aGUgR0xNIG1vZGVsIGlzIHNhbWUgd2l0aCBjb3ZhcmlhdGVzIGFuZCB3aXRob3V0IGNvdmFyaWF0ZXMuIEZ1cnRoZXIgYW5hbHl6aW5nLCBzdGF0aXN0aWNhbGx5IG5vbmUgb2YgdGhlIGNvdmFyaWF0ZXMgaW5jcmVhc2luZyB0aGUgcmlzayByYXRlIGFzIGFsbCB0aGUgY29lZmZpY2llbnRzIGFsbCBhbG1vc3QgY2xvc2UuDQoNCiMjIyAyLiBvdmVyYWxsIENPWC1QSCBtb2RlbCBkaWFnbm9zaXMNCmBgYHtyIGZpZy53aWR0aD0xMyxmaWcuaGVpZ2h0PTYsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQpsaWJyYXJ5KHN1cnZtaW5lcikNCmZpdDwtY294cGgoU3Vydih0aW1lLGNlbnMpIH4gaG9yVGggKyB0c2l6ZSArIHByb2dyZWMgKyBlc3RyZWMgKyBtZW5vc3RhdCthZ2UsZGF0YSA9IEdCU0cyKQ0KY294LnpwaChmaXQpDQojUHJvcG9ydGlvbmFsIEhhemFyZCBhc3N1bXB0aW9uIC0gU2Nob2VuZmVsZCByZXNpZHVhbHMNCmdnY294enBoKGNveC56cGgoZml0KSkNCg0KIyBUZXN0aW5nIGluZmx1ZW50aWFsIG9ic2VydmF0aW9uIC0gRGV2aWFuY2UgYW5kIGRmYmV0cyByZXNpZHVhbHMNCmdnY294ZGlhZ25vc3RpY3MoZml0LCB0eXBlPSAiZGZiZXRhIiwgZ2d0aGVtZT0gdGhlbWVfYncoKSwgbGluZWFyLnByZWRpY3Rpb25zID0gRikNCmdnY294ZGlhZ25vc3RpY3MoZml0LCB0eXBlPSAiZGV2aWFuY2UiLCBnZ3RoZW1lPSB0aGVtZV9idygpLCBsaW5lYXIucHJlZGljdGlvbnMgPSBGKQ0KDQojVGVzdGluZyBvZiBub24tbGluZWFyaXR5IC0gTWFydGluZ2FsZSByZXNpZHVhbHMNCmdnY294ZnVuY3Rpb25hbChTdXJ2KHRpbWUsIGNlbnMpIH4gYWdlKyBsb2coYWdlKSsgc3FydChhZ2UpLCBkYXRhID0gR0JTRzIpDQpnZ2NveGZ1bmN0aW9uYWwoU3Vydih0aW1lLCBjZW5zKSB+IHRpbWUrIGxvZyh0aW1lKSsgc3FydCh0aW1lKSwgZGF0YSA9IEdCU0cyKQ0KZ2djb3hmdW5jdGlvbmFsKFN1cnYodGltZSwgY2VucykgfiB0c2l6ZSsgbG9nKHRzaXplKSsgc3FydCh0c2l6ZSksIGRhdGEgPSBHQlNHMikNCmBgYA0KDQpDb25jbHVzaW9uOg0KMS4gUHJvcG9ydGlvbmFsIEhhemFyZCBhc3N1bXB0aW9uOiBUaGlzIG1vZGVsIGZhaWxzIHRoZSBwcm9wb3J0aW9uYWwgdGVzdCBpZiB3ZSBpbmNsdWRlIGNvdmFyaWF0ZXMgbGlrZSBlc3RyZWMsIG1lbm9zdGF0IGFuZCBwcm9ncmVjIHRoYXQgaGFzIHAtdmFsdWUgPCAwLjA1LCB0aHVzIHdlIG11c3QgaWdub3JlIHRoZXNlIGNvdmFyaWF0ZXMgYW5kIGluY2x1ZGUgb25seSB0c2l6ZSBhbmQgaG9yVGgNCjIuIEluZmx1ZW50aWFsIE9ic2VydmF0aW9uOiBXZSBjb3VsZCB0aGF0IGl0cyBpcyBjZW50ZXJlZCBhcm91bmQgMCBhbmQgZmV3IG91dGxpZXJzIGFyZSBkZXRlY3RlZA0KMy4gTm9uLWxpbmVhcml0eSB0ZXN0aW5nOiBOb24tIGxpbmVhcml0eSBvZiBBZ2UgYW5kIFRpbWUgaXMgdGVzdGVkIGFuZCB3ZSBjb3VsZCBzZWUgbm9uLWxpbmVhcml0eSBpbiB0aGUgY29udGludW91cyB2YXJpYWJsZSAgIA==