Problem:
Freireich et al (1963) present a study of 42 Leukemia patients in remission. Half of the patients received an experimental treatment (6-MP), the other half a placebo. The goal of the study was to investigate the effect of treatment on the duration until relapse.
Hint: The data set is available as leuk2 from the package bpcp.
Compare the survivor functions for the two patient groups visually and interpret the results.
Now fit an exponential AFT model to the data and interpret the treatment effect. Hint: Function survreg in the survival package.
What assumptions are implicit in this modeling approach and how could these be checked (graphically)?
Repeat the analysis using the Weibull AFT model.
Dataset details: A data frame with 42 observations on the following variables.
1.time-time in remission (in weeks)
2.status-event status, 1 is relapse, 0 is censored
3.treatment-treatment group: either ‘placebo’ or ‘6-MP’
4.pair-pair id number
#install.packages("bpcp")
data("leuk2", package="bpcp")
head(leuk2,10)
leuk2$treatment <- relevel(leuk2$treatment, ref = "placebo")
library(survival)
library(GGally)
km<-survfit(Surv(time,status)~treatment,data= leuk2)
ggsurv(km)+
ylim(c(0,1))+
labs(x="Survival",y="Time")+
geom_hline(yintercept = 0.5, linetype=3)+
geom_vline(xintercept = (summary(km)$table[,"median",drop=TRUE]), linetype=3)+
theme_bw()
NA
NA
Median: Placebo=8 6-MP=23
The survival rates of patient who take Placebo is very less than patient taking 6-MP treatment, all placebo patient had experienced the event before half of the 6-MP patient had experienced the event.
aft<- survreg(Surv(time,status)~ treatment,data=leuk2, dist ="exponential",score= TRUE)
summary(aft)
Call:
survreg(formula = Surv(time, status) ~ treatment, data = leuk2,
dist = "exponential", score = TRUE)
Value Std. Error z p
(Intercept) 2.159 0.218 9.90 < 2e-16
treatment6-MP 1.527 0.398 3.83 0.00013
Scale fixed at 1
Exponential distribution
Loglik(model)= -108.5 Loglik(intercept only)= -116.8
Chisq= 16.49 on 1 degrees of freedom, p= 4.9e-05
Number of Newton-Raphson Iterations: 4
n= 42
print("Score/ Gradient value:")
[1] "Score/ Gradient value:"
aft$score
[1] 2.500654e-10 2.439244e-10
We can reject the Null hypothesis as the loglikelihood of both the models are same.
AFT Acceleration:
Expected Survival time= {(Estimated time | Treatment)/(Estimated time| No treatment)}=> t= exp(coefficient vale of 6-MP)
exp(coef(aft))
(Intercept) treatment6-MP
8.666667 4.602564
The survival rates of patient who’s undergoing 6-MP treatment has 4.6 time larger survival rate or accelerated/stretched than that of Placeba(no) treatment.
Assumptions: • Distributional assumptions (here T ~ Exp(lambda)) • Covariate effects ac-/decelerate the (expected) life time by a constant factor
Diagnostics: S(t) = exp(-(lambdat)) => log(-log(S(t))) = log(lambda) + log(t) => plot of the LHS (left hand side of the equation) vs. RHS (right hand side of the equation) should be a straight line with: 1. Intercept a= log(lambda)= -(coef[0]+ coef[1] I), where I=0 for Placeba, 1 for treatment. 2. Slope b=1
km<- broom::tidy(km)
head(km)
summary(km)
time n.risk n.event n.censor
Min. : 1.00 Min. : 1.00 Min. :0.000 Min. :0.0000
1st Qu.: 7.75 1st Qu.: 4.75 1st Qu.:0.000 1st Qu.:0.0000
Median :14.00 Median : 9.50 Median :1.000 Median :0.0000
Mean :15.07 Mean :10.00 Mean :1.071 Mean :0.4286
3rd Qu.:22.00 3rd Qu.:15.25 3rd Qu.:2.000 3rd Qu.:1.0000
Max. :35.00 Max. :21.00 Max. :4.000 Max. :2.0000
estimate std.error conf.high conf.low
Min. :0.0000 Min. :0.0708 Min. :0.3225 Min. :0.007032
1st Qu.:0.4314 1st Qu.:0.1280 1st Qu.:0.8074 1st Qu.:0.248788
Median :0.5994 Median :0.1854 Median :0.8960 Median :0.439394
Mean :0.5290 Mean : Inf Mean :0.8074 Mean :0.389119
3rd Qu.:0.7529 3rd Qu.:0.3003 3rd Qu.:0.9676 3rd Qu.:0.585919
Max. :0.9048 Max. : Inf Max. :1.0000 Max. :0.787535
NA's :1 NA's :1
strata
Length:28
Class :character
Mode :character
library(dplyr)
# compute log(-log(S(t))) and log(t)
km <- mutate(km,
lls = log(-log(estimate)),
lt= log(time))
str(km)
tibble [28 x 11] (S3: tbl_df/tbl/data.frame)
$ time : num [1:28] 1 2 3 4 5 8 11 12 15 17 ...
$ n.risk : num [1:28] 21 19 17 16 14 12 8 6 4 3 ...
$ n.event : num [1:28] 2 2 1 2 2 4 2 2 1 1 ...
$ n.censor : num [1:28] 0 0 0 0 0 0 0 0 0 0 ...
$ estimate : num [1:28] 0.905 0.81 0.762 0.667 0.571 ...
$ std.error: num [1:28] 0.0708 0.1059 0.122 0.1543 0.189 ...
$ conf.high: num [1:28] 1 0.996 0.968 0.902 0.828 ...
$ conf.low : num [1:28] 0.788 0.658 0.6 0.493 0.395 ...
$ strata : chr [1:28] "treatment=placebo" "treatment=placebo" "treatment=placebo" "treatment=placebo" ...
$ lls : num [1:28] -2.302 -1.554 -1.302 -0.903 -0.581 ...
$ lt : num [1:28] 0 0.693 1.099 1.386 1.609 ...
## intercept placebo group:
lam0 <- -coef(aft)[1]
lam0
(Intercept)
-2.159484
## intercept treatment group
lam1 <- -sum(coef(aft))
lam1
[1] -3.686098
aft_viz<-ggplot(km,aes(x=lt,y=lls, col= strata))+
geom_point()+
geom_step()+
geom_abline(intercept = lam0,slope=1)+
geom_abline(intercept = lam1,slope=1)+
labs(x="log(t)",y=expression(log(hat(Lambda)(t))),title = "Exponential distribution")+
ylim(c(-4,2))+
theme_bw()
aft_viz
The assumptions are almost and thus the straight linear line is achieve.
Same assumptions but using Weibull distribution: T~WB(lamda,p) S(t)=exp(=(lambdat)p) =>{(log(S(t))(1/p))/lambda}=t 1/lambda=exp((coef[0]+ coef[1] I))
Accelertion Factor: a= exp(coef[1])
weibull<- survreg(Surv(time,status)~ treatment,data=leuk2,
dist="weibull",score= TRUE)
summary(weibull)
Call:
survreg(formula = Surv(time, status) ~ treatment, data = leuk2,
dist = "weibull", score = TRUE)
Value Std. Error z p
(Intercept) 2.248 0.166 13.55 < 2e-16
treatment6-MP 1.267 0.311 4.08 4.5e-05
Log(scale) -0.312 0.147 -2.12 0.034
Scale= 0.732
Weibull distribution
Loglik(model)= -106.6 Loglik(intercept only)= -116.4
Chisq= 19.65 on 1 degrees of freedom, p= 9.3e-06
Number of Newton-Raphson Iterations: 5
n= 42
print("Score/ Gradient value:")
[1] "Score/ Gradient value:"
weibull$score
[1] -5.640003e-08 -3.161867e-09 5.083317e-08
Acceleration:
exp(coef(weibull)[2])
treatment6-MP
3.551374
No treatment(Placeba)~ 4* Treatment
Same as before with slope p=1/scale and log(-log(S(t))) = log(lambda) + plog(t) with assumption 1. Intercept a= log(lambda)= P -(coef[0]+ coef[1]* I), where I=0 for Placeba, 1 for treatment. 2. Slope p
#slope:
p=1/weibull$scale
## intercept placebo group:
wlam0 <- p*(-coef(weibull)[1])
wlam0
(Intercept)
-3.070704
## intercept treatment group
wlam1 <- p*(-sum(coef(aft)))
wlam1
[1] -5.034316
wei_viz<-ggplot(km,aes(x=lt,y=lls, col= strata))+
geom_point()+
geom_step()+
geom_abline(intercept = wlam0,slope=p)+
geom_abline(intercept = wlam1,slope=p)+
labs(x="log(t)",y=expression(log(hat(Lambda)(t))),title = "Weibull distribution")+
ylim(c(-5.5,2))+
theme_bw()
wei_viz
library(patchwork)
aft_viz/wei_viz
Both distributions we couldn’t find any strong violations, but there’s is some slight disagreement in the weibull model for treatment. Thus both models can be used since exponential model is just a special case of weibull model