Problem:
The function survival::survreg in R implements the estimation of transformation models. Fit a Weibull transformation model to the Melanoma data without any covariates, including only the intercept term Beta_0. -What are the estimates for the parameters lambda and alpha of the Weibull distribution? 1.1 Plot the estimated baseline hazard with a kernel density estimator (muhaz) 1.2 Plot the baseline survival function together with the Kaplan-Meier estimator, respectively.
Now include the covariates sex, thickness, ulcer and age into the model. Based on the parameter estimates interpret the influence of each covariate on the hazard rate.
Plot the estimated hazard rates for males and females jointly with the baseline hazard. To do so plug-in the mean values for the remaining covariates. Plot the survival curves for males and females jointly with the corresponding Kaplan-Meier estimates and the baseline survival function.
Now fit a Weibull AFT model using the covariate age. Model this covariate semi- parametrically using splines with degree 2. Use a P-spline basis (pspline{survival})and a B-spline basis (bs{splines}) and compare the two estimations graphically.
Covariates used: sex: 1 = male, 0 = female age: age in years thickness: tumor thickness in mm ulcer: 1 = presence, 0 = absence
data(Melanoma, package="MASS")
Melanoma$status= 1* (Melanoma$status == 1)
head(Melanoma)
Weibull Transformation model:
library(survival)
weibull<-survreg(Surv(time,status)~1, data=Melanoma)
summary(weibull)
Call:
survreg(formula = Surv(time, status) ~ 1, data = Melanoma)
Value Std. Error z p
(Intercept) 8.8669 0.1727 51.35 <2e-16
Log(scale) -0.0812 0.1190 -0.68 0.49
Scale= 0.922
Weibull distribution
Loglik(model)= -567.2 Loglik(intercept only)= -567.2
Number of Newton-Raphson Iterations: 6
n= 205
alpha<-1/weibull$scale
lambda<- exp(-coef(weibull))
print(paste("lambda",lambda))
[1] "lambda 0.000140975097542914"
print(paste("Alpha",alpha))
[1] "Alpha 1.08459840805882"
Kernel model using the muhaz function with the default kernel: “Epanechnikov”
library(muhaz)
ker_hazard<- muhaz(Melanoma$time, Melanoma$status)
str(ker_hazard)
List of 9
$ pin :List of 13
..$ times : int [1:205] 10 30 35 99 185 204 210 232 232 279 ...
..$ delta : num [1:205] 0 0 0 0 1 1 1 0 1 1 ...
..$ nobs : int 205
..$ min.time : num 0
..$ max.time : num 4124
..$ n.min.grid : num 51
..$ min.grid : num [1:51] 0 82.5 165 247.4 329.9 ...
..$ n.est.grid : num 101
..$ bw.pilot : num 230
..$ bw.smooth : num 1148
..$ method : int 2
..$ b.cor : num 2
..$ kernel.type: num 1
$ est.grid : num [1:101] 0 41.2 82.5 123.7 165 ...
$ haz.est : num [1:101] 0.000111 0.000111 0.000111 0.000111 0.000111 ...
$ bw.loc : num [1:51] 4593 235 1562 2698 3077 ...
$ bw.loc.sm: num [1:101] 2782 2696 2601 2501 2401 ...
$ msemin : num [1:51] 1.86e-06 4.50e-10 6.89e-10 6.08e-10 5.40e-10 ...
$ bias.min : num [1:51] 1.35e-03 4.42e-06 -2.82e-06 2.55e-06 1.86e-06 ...
$ var.min : num [1:51] 3.04e-08 4.30e-10 6.81e-10 6.01e-10 5.37e-10 ...
$ imse.opt : num 1e+30
- attr(*, "class")= chr "muhaz"
Hazard function:
library(dplyr)
library(tidyr)
hweibull <- function(grid, shape, scale = 1){
exp(dweibull(grid, shape = shape,scale = scale, log = TRUE) -
pweibull(grid, shape = shape,scale = scale,lower.tail = FALSE,log.p = TRUE))
}
bas_haz<- data.frame(grid = ker_hazard$est.grid, ramlau_estimation=ker_hazard$haz.est)[-1,]
bas_haz<-mutate(bas_haz,manual_estimation= hweibull(grid,scale = 1/lambda,shape = alpha))
bas_haz<- gather(bas_haz,Type_of_Estimation,Hazard, ramlau_estimation,manual_estimation)
extract<- group_by(bas_haz,Type_of_Estimation)
slice(extract,1:5)
Visualization:
library(ggplot2)
ggplot(bas_haz,aes(grid,Hazard))+
geom_line(aes(col=Type_of_Estimation,
lty=Type_of_Estimation),lwd=2)+
labs(x="time",y=expression(hat(lambda)(t)), title = "Weibull baseline Hazard rate vs Ramlau-Hansen estimation of Hazard rate")+
theme_bw()
Intially both the Ramlau(Non-parametric) estimation and Manually estimated hazard function increases but after it reaches the time point grid of 1500 manual estimation smoothly extrapolating whereas ramlau estimation goes nearby to 0 value.
Kaplan-Meier estimation and weibull Baseline survival function:
km<- broom::tidy(survfit(Surv(time,status)~1, data= Melanoma))
km<- rename(km, KM_estimation= estimate)
#Weibull SUrvival:
km<- mutate(km, Weibull_Survival =pweibull(time, shape= alpha, scale=1/lambda, lower.tail = FALSE))
km<- gather(km, Type_of_Estimation, Survival, KM_estimation, Weibull_Survival)
km
Visualization:
ggplot(km,aes(x=time, y= Survival))+
geom_line(aes(col= Type_of_Estimation),lwd=1.5)+
labs(x="time",y=expression(hat(S)(t)),title = "Kaplan-Meier estimation vs Weibull Baseline Survival rate")+
ylim(c(0,1))+
theme_bw()
Both the type of estimation estimates almost similar survival rate, but once there is no event observation to absorb weibull extrapolates towards the same direction and Kaplan-Meier moves horizontally.
Including all the covariates in the Weibull model like sex, thickness, ulcer and age into the model and need to analyze the influence of the each covariates
weibull<-update(weibull, .~.+sex+thickness+ulcer+age)
weibull
Call:
survreg(formula = Surv(time, status) ~ sex + thickness + ulcer +
age, data = Melanoma)
Coefficients:
(Intercept) sex thickness ulcer age
10.40924911 -0.33334418 -0.08476035 -0.97594460 -0.01343855
Scale= 0.8165441
Loglik(model)= -545.4 Loglik(intercept only)= -567.2
Chisq= 43.46 on 4 degrees of freedom, p= 8.29e-09
n= 205
#Hazard Ratio with 1 unit increase:
alpha<-1/weibull$scale
exp(-coef(weibull)*(alpha))[-1]
sex thickness ulcer age
1.504165 1.109383 3.304264 1.016594
#Survival Rate/ Acceleration effect:
exp(coef(weibull)[-1])
sex thickness ulcer age
0.7165235 0.9187324 0.3768362 0.9866513
We could see that there is a decrease in the survival rate from 1.0 and increase in hazard rate from 1.0, all by adding covariates the possibility of experiencing the event chances are higher than using only the intercepts as the effects of covariates affecting the survival rates.
Estimating the Hazard rate jointly for both males and females:
melanoma<- group_by(Melanoma,sex)
melanoma<- cbind(melanoma,intercept=1)
melanoma<- summarize_all(melanoma, funs(mean))
melanoma<- select(melanoma,intercept, sex, thickness, ulcer, age)
melanoma
beta<- coef(weibull)
scale<- as.matrix(melanoma) %*% beta
scale
[,1]
[1,] 9.141517
[2,] 8.514293
bas_haz<- select(bas_haz, grid)
bas_haz<-mutate(bas_haz,
baseline = hweibull(grid, shape = alpha,scale = exp(beta[1])),
male = hweibull(grid, shape = alpha, scale = exp(scale[2,])),
female = hweibull(grid, shape = alpha, scale = exp(scale[1,]))
)
bas_haz<-mutate(bas_haz,intercept= hweibull(grid,scale = 1/lambda,shape =1.084598))
bas_haz<- gather(bas_haz,Type, Hazard, baseline, male, female, intercept)
extract<- group_by(bas_haz,Type)
slice(extract,1:2)
ggplot(bas_haz,aes(grid,Hazard))+
geom_line(aes(col=Type),lwd=1.3)+
labs(x="time",y=expression(hat(lambda)(t)), title = "Hazard rate Comparison")+
theme_bw()
Generally, We could see that by including the covariates the hazard rate increases drastically. ALso, if we compare from females to males, the survival rate of female is higher than the male.
Estimating the Hazard rate jointly for both males and females:
km<- select(km,time)
km<-mutate(km, baseline = pweibull(time, shape = alpha,scale = exp(beta[1]),
lower.tail = FALSE),
male = pweibull(time, shape = alpha, scale = exp(scale[2,]),
lower.tail = FALSE),
female = pweibull(time, shape = alpha, scale = exp(scale[1,]),
lower.tail = FALSE))
km<- mutate(km, intercept =pweibull(time, shape=1.084598, scale=1/lambda,
lower.tail = FALSE))
km<- gather(km,Type,Survival, baseline:intercept)
extract<- group_by(bas_haz,Type)
slice(extract,1:2)
ggplot(km,aes(time,Survival))+
geom_line(aes(col=Type),lwd=1.3)+
#geom_step(aes(col=Type))+
labs(x="time",y=expression(hat(S)(t)), title = "Survival rate Comparison")+
ylim(c(0,1))+
theme_bw()
As stated above, as we add covariates to the model the probability of experiencing the event is higher and this leads to decrease in the survival rate.
library(splines)
knots<- quantile(Melanoma$age, p= seq(from=0.1, to= 0.9 , length=3))
b_spline<- survreg(Surv(time, status)~ bs(age, knots = knots, degree = 2),
data = Melanoma, dist = "weibull")
summary(b_spline)
Call:
survreg(formula = Surv(time, status) ~ bs(age, knots = knots,
degree = 2), data = Melanoma, dist = "weibull")
Value Std. Error z p
(Intercept) 9.1607 1.3715 6.68 2.4e-11
bs(age, knots = knots, degree = 2)1 -0.2483 1.7370 -0.14 0.886
bs(age, knots = knots, degree = 2)2 0.0429 1.3572 0.03 0.975
bs(age, knots = knots, degree = 2)3 -0.5256 1.4282 -0.37 0.713
bs(age, knots = knots, degree = 2)4 -0.9119 1.4648 -0.62 0.534
bs(age, knots = knots, degree = 2)5 -2.7511 1.5826 -1.74 0.082
Log(scale) -0.1553 0.1198 -1.30 0.195
Scale= 0.856
Weibull distribution
Loglik(model)= -561.7 Loglik(intercept only)= -567.2
Chisq= 10.86 on 5 degrees of freedom, p= 0.054
Number of Newton-Raphson Iterations: 7
n= 205
p_spline<- survreg(Surv(time, status)~ pspline(age,df=2),
data = Melanoma, dist = "weibull")
summary(p_spline)
Call:
survreg(formula = Surv(time, status) ~ pspline(age, df = 2),
data = Melanoma, dist = "weibull")
Value Std. Error z p
(Intercept) 9.12841 1.30779 6.98 3e-12
ps(age)3 0.00511 0.77500 0.01 0.99
ps(age)4 0.02591 1.21423 0.02 0.98
ps(age)5 -0.02581 1.34826 -0.02 0.98
ps(age)6 -0.30382 1.33207 -0.23 0.82
ps(age)7 -0.89470 1.31602 -0.68 0.50
ps(age)8 -1.77568 1.38933 -1.28 0.20
ps(age)9 -2.70542 1.70365 -1.59 0.11
Log(scale) -0.15107 0.11946 -1.26 0.21
Scale= 0.86
Weibull distribution
Loglik(model)= -562.4 Loglik(intercept only)= -567.2
Chisq= 9.63 on 1.5 degrees of freedom, p= 0.0041
Number of Newton-Raphson Iterations: 3 11
n= 205
Visualization:
age_ordered<- order(Melanoma$age)
spline<- data.frame("B-Splines" = predict(b_spline)[age_ordered],
"P-Splines" = predict(p_spline)[age_ordered])
spline<- stack(spline)
spline$Age<- rep(Melanoma$age[age_ordered], 2)
colnames(spline) <- c("Predict", "Fit", "Age")
ggplot(spline,aes(Age,Predict,col=Fit))+
geom_line(lwd=1.5)+
labs(x="Age",y="Prediction", title = "Spline Comparison")+
theme_bw()
Thus both the splines are goes down as we known that when age gets increased the life span gets decreased. Whereas, the Basis spline is little bit too complicated as it overfits the model, but Penalized spline gives us smooth curve.