Technical details

Data and notation

We assume that a true event time for individual $i$ ($i = 1, \dots, N$) exists, denoted $T_{i}^{*}$, but that in practice it may or may not be observed due to left, right, or interval censoring. Therefore, in practice we observe outcome data $D_{i} = \{T_{i}, T_{i}^{U}, T_{i}^{E}, d_{i}\}$ for individual $i$ where:

$T_{i}$ : the observed event or censoring time
$T_{i}^{U}$ : the observed upper limit for interval censored individuals
$T_{i}^{E}$ : the observed entry time (i.e. the time at which an individual became at risk for the event)

and $d_{i} \in \{0, 1, 2, 3\}$ denotes an event indicator taking value:

0 if individual $i$ was right censored (i.e. $T_{i}^{*} > T_{i}$ )
1 if individual $i$ was uncensored (i.e. $T_{i}^{*} = T_{i}$ )
2 if individual $i$ was left censored (i.e. $T_{i}^{*} < T_{i}$ )
3 if individual $i$ was interval censored (i.e. $T_{i} < T_{i}^{*} < T_{i}^{U}$ )

The hazard rate, cumulative hazard, and survival probability

The hazard of the event at time $t$ is the instantaneous rate of occurrence for the event at time $t$ . Mathematically, it is defined as:

\begin{aligned} h_{i} (t) = \lim_{Δ t \to 0} \frac{P (t \leq T_{i}^{*} < t + Δ t | T_{i}^{*} > t)}{Δ t} \end{aligned}

where $Δ t$ is the width of some small time interval. The numerator is the conditional probability of the individual experiencing the event during the time interval $[t, t + Δ t)$, given that they were still at risk of the event at time $t$. The denominator converts the conditional probability to a rate per unit of time. As $Δ t$ approaches the limit, the width of the interval approaches zero and the instantaneous event rate is obtained.

The cumulative hazard is defined as:

\begin{aligned} H_{i} (t) = \int_{s = 0}^{t} h_{i} (s) d s \end{aligned}

and the survival probability is defined as:

\begin{aligned} S_{i} (t) = \exp [- H_{i} (t)] = \exp [- \int_{s = 0}^{t} h_{i} (s) d s] \end{aligned}

It can be seen here that in the standard survival analysis setting there is a one-to-one relationship between each of the hazard, the cumulative hazard, and the survival probability. These quantities are also used to form the likelihood for the survival model described in the later sections.
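The one-to-one relationship can be checked numerically. The following is a minimal base R sketch, assuming a Weibull hazard $h (t) = γ t^{γ - 1} λ$ with illustrative values $λ = 0.1$ and $γ = 1.5$. The function names are ours and are not part of rstanarm.

```r
# Illustrative values only (assumed, not estimates from any fitted model)
lambda <- 0.1   # scale
gamma  <- 1.5   # shape

# Hazard h(t) for a Weibull model on the hazard scale
haz <- function(t) gamma * t^(gamma - 1) * lambda

# Cumulative hazard H(t): numerical integral of the hazard from 0 to t
cumhaz <- function(t) integrate(haz, lower = 0, upper = t, rel.tol = 1e-8)$value

# Survival probability S(t) = exp(-H(t))
surv <- function(t) exp(-cumhaz(t))

t0 <- 2
H_numeric <- cumhaz(t0)
H_closed  <- lambda * t0^gamma   # closed form for this Weibull hazard
S_numeric <- surv(t0)

# Going back the other way: h(t) = -d/dt log S(t), via a central difference
eps <- 1e-3
h_from_S <- -(log(surv(t0 + eps)) - log(surv(t0 - eps))) / (2 * eps)
```

Starting from any one of the three quantities, the other two can be recovered, which is what allows the likelihood to be written in terms of the hazard and survival functions alone.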

Hazard scale formulations

When basehaz is set equal to "exp", "weibull", "gompertz", "ms" (the default), or "bs" then the model is defined on the hazard scale as described by the following parameterisations.

We model the hazard of the event for individual $i$ at time $t$ using the regression model:

\begin{aligned} h_{i} (t) = h_{0} (t) \exp [η_{i} (t)] \end{aligned}

where $h_{0} (t)$ is the baseline hazard (i.e. the hazard for an individual with all covariates set equal to zero) at time $t$, and $η_{i} (t)$ denotes the linear predictor evaluated for individual $i$ at time $t$. For full generality, we allow the linear predictor to be time-varying; that is, it may be a function of time-varying covariates or time-varying coefficients (i.e. a time-varying hazard ratio). However, if there are no time-varying covariates or time-varying coefficients in the model, then the linear predictor reduces to a time-fixed quantity and the definition of the hazard function reduces to:

\begin{aligned} h_{i} (t) = h_{0} (t) \exp [η_{i}] \end{aligned}

Our linear predictor is defined as:

\begin{aligned} η_{i} (t) = β_{0} + \sum_{p = 1}^{P} β_{p} (t) x_{i p} (t) \end{aligned}

where $β_{0}$ denotes the intercept parameter, $x_{i p} (t)$ denotes the observed value of the $p^{th}$ $(p = 1, \dots, P)$ covariate for the $i^{th}$ $(i = 1, \dots, N)$ individual at time $t$, and $β_{p} (t)$ denotes the coefficient for the $p^{th}$ covariate.

The quantity $\exp (β_{p} (t))$ is referred to as a “hazard ratio”. The hazard ratio (HR) quantifies the relative increase in the hazard that is associated with a unit-increase in the relevant covariate, $x_{i p}$ ; e.g. a hazard ratio of 2 means that a unit-increase in the covariate leads to a doubling in the hazard (i.e. the instantaneous rate) of the event. The hazard ratio can be treated as a time-fixed quantity (i.e. proportional hazards) or time-varying quantity (i.e. non-proportional hazards), as described in later sections.
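To see why $\exp (β_{p} (t))$ acts as a ratio of hazards, consider two individuals who differ by one unit in the $p^{th}$ covariate and are otherwise identical. In the short derivation below, $x$ is the covariate value of the first individual and $c (t)$ stands for the part of the linear predictor that does not involve covariate $p$. Both symbols are introduced here only for illustration.

\begin{aligned} \frac{h_{0} (t) \exp [c (t) + β_{p} (t) (x + 1)]}{h_{0} (t) \exp [c (t) + β_{p} (t) x]} = \exp [β_{p} (t)] \end{aligned}

The baseline hazard and the remaining covariates cancel, so the ratio depends only on $β_{p} (t)$.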

Distributions

Exponential model (basehaz = "exp"): for scale parameter $λ_{i} (t) = \exp (η_{i} (t))$ we have:

$h_{i} (t) = λ_{i} (t)$

Weibull model (basehaz = "weibull"): for scale parameter $λ_{i} (t) = \exp (η_{i} (t))$ and shape parameter $γ > 0$ we have:

$h_{i} (t) = γ t^{γ - 1} λ_{i} (t)$

Gompertz model (basehaz = "gompertz"): for shape parameter $λ_{i} (t) = \exp (η_{i} (t))$ and scale parameter $γ > 0$ we have:

$h_{i} (t) = \exp (γ t) λ_{i} (t)$

M-splines model (basehaz = "ms", the default): letting $M (t; γ, k_{0}, δ)$ denote a degree $δ$ M-spline function with basis evaluated at a vector of knot locations $k_{0}$ and parameter vector $γ > 0$ we have:

$h_{i} (t) = M (t; γ, k_{0}, δ) \exp (η_{i} (t))$

The M-spline function is calculated using the method described in Ramsay (1988) and implemented in the splines2 R package (Wang and Yan (2018)). To ensure that the hazard function $h_{i} (t)$ is not constrained to zero at the origin (i.e. when $t$ approaches 0) the M-spline basis incorporates an intercept. To ensure identifiability of both the intercept parameter in the M-spline function and the intercept parameter in the linear predictor (i.e. $β_{0}$) we constrain the M-spline coefficients to a simplex, that is, $\sum_{j = 1}^{J} γ_{j} = 1$. The default degree in rstanarm is $δ = 3$; that is, cubic M-splines. However, this can be controlled by the user via the basehaz_ops argument. It is worthwhile noting that $δ = 0$ would correspond to a piecewise constant baseline hazard.

B-splines model (basehaz = "bs", for the log baseline hazard): letting $B (t; γ, k_{0}, δ)$ denote a degree $δ$ B-spline function with basis evaluated at a vector of knot locations $k_{0}$ and parameter vector $γ$ we have:

$h_{i} (t) = \exp (B (t; γ, k_{0}, δ) + η_{i} (t))$

The B-spline function is calculated using the method implemented in the splines2 R package (Wang and Yan (2018)). The B-spline basis does not require an intercept and therefore does not include one; any constant shift in the log hazard is fully captured via the intercept in the linear predictor (i.e. $β_{0}$).

Note: When the linear predictor is not time-varying (i.e. under proportional hazards) there is a closed form expression for the survival probability (except for the B-splines model); details shown in the appendix. However, when the linear predictor is time-varying (i.e. under non-proportional hazards) there is no closed form expression for the survival probability; instead, quadrature is used to evaluate the survival probability for inclusion in the likelihood. Extended details on the parameterisations are given in the appendix.
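As a rough illustration of the note above, the sketch below evaluates the survival probability for a Weibull hazard with a time-varying linear predictor by numerical integration. It uses base R's integrate() as a stand-in, since the quadrature rule used inside stan_surv is not described in this section. All parameter values and function names are assumptions made for the example.

```r
# Assumed illustrative values
gamma  <- 1.5        # Weibull shape
beta0  <- log(0.1)   # intercept, so exp(beta0) = 0.1
theta0 <- -0.5       # log hazard ratio at t = 0
theta1 <-  0.2       # change in the log hazard ratio per unit of time
x      <- 1          # covariate value

# Time-varying linear predictor eta(t) = beta0 + beta(t) * x
eta <- function(t) beta0 + (theta0 + theta1 * t) * x

# Weibull hazard on the hazard scale: h(t) = gamma * t^(gamma - 1) * exp(eta(t))
haz <- function(t) gamma * t^(gamma - 1) * exp(eta(t))

# Survival probability via numerical integration of the hazard
surv_quad <- function(t) exp(-integrate(haz, 0, t, rel.tol = 1e-8)$value)
S_tv <- surv_quad(3)

# With a time-fixed linear predictor the closed form
# S(t) = exp(-t^gamma * exp(eta)) is available, which gives a check
# on the numerical routine
eta_fixed <- beta0 + theta0 * x
haz_fixed <- function(t) gamma * t^(gamma - 1) * exp(eta_fixed)
S_numeric <- exp(-integrate(haz_fixed, 0, 3, rel.tol = 1e-8)$value)
S_closed  <- exp(-3^gamma * exp(eta_fixed))
```

Because the assumed log hazard ratio increases over time, the survival probability under the time-varying predictor is lower than under the time-fixed one at the same time point.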

Accelerated failure time formulations

When basehaz is set equal to "exp-aft" or "weibull-aft", the model is defined on the accelerated failure time scale as described by the following parameterisations.

Following Hougaard (1999), we model the survival probability for individual $i$ at time $t$ using the regression model:

\begin{aligned} S_{i} (t) = S_{0} (\int_{0}^{t} \exp [- η_{i} (u)] d u) \end{aligned}

where $S_{0} (t)$ is the baseline survival probability at time $t$, and $η_{i} (t)$ denotes the linear predictor evaluated for individual $i$ at time $t$. For full generality, we allow the linear predictor to be time-varying; that is, it may be a function of time-varying covariates or time-varying coefficients (i.e. a time-varying acceleration factor). However, if there are no time-varying covariates or time-varying coefficients in the model, then the linear predictor reduces to a time-fixed quantity (i.e. $η_{i} (t) = η_{i}$) and the definition of the survival probability reduces to:

\begin{aligned} S_{i} (t) = S_{0} (t \exp [- η_{i}]) \end{aligned}

Our linear predictor is defined as:

\begin{aligned} η_{i} (t) = β_{0}^{*} + \sum_{p = 1}^{P} β_{p}^{*} (t) x_{i p} (t) \end{aligned}

where $β_{0}^{*}$ denotes the intercept parameter, $x_{i p} (t)$ denotes the observed value of the $p^{th}$ $(p = 1, \dots, P)$ covariate for the $i^{th}$ $(i = 1, \dots, N)$ individual at time $t$, and $β_{p}^{*} (t)$ denotes the coefficient for the $p^{th}$ covariate.

The quantity $\exp (- β_{p}^{*} (t))$ is referred to as an “acceleration factor” and the quantity $\exp (β_{p}^{*} (t))$ is referred to as a “survival time ratio”. The acceleration factor (AF) quantifies the acceleration (or deceleration) of the event process that is associated with a unit-increase in the relevant covariate, $x_{i p}$; e.g. an acceleration factor of 0.5 means that a unit-increase in the covariate leads to an individual approaching the event at half the speed. If you find that somewhat confusing, then it may be easier to think about the survival time ratio. The survival time ratio (STR) is interpreted as the increase (or decrease) in the expected survival time that is associated with a unit-increase in the relevant covariate, $x_{i p}$; e.g. a survival time ratio of 2 (which is equivalent to an acceleration factor of 0.5) means that a unit-increase in the covariate leads to a doubling in the expected survival time. The survival time ratio is equal to the inverse of the acceleration factor (i.e. $STR = 1 / AF$).
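The relationship between the two quantities can be illustrated with a small base R sketch. It assumes a time-fixed exponential AFT model, $S (t) = \exp (- t \exp (- η))$, and an illustrative coefficient of $\log (2)$. The variable and function names are hypothetical.

```r
# Assumed illustrative AFT coefficient (not an estimate from any fitted model)
beta_star <- log(2)

af      <- exp(-beta_star)   # acceleration factor
stratio <- exp(beta_star)    # survival time ratio

# Under the time-fixed exponential AFT model S(t) = exp(-t * exp(-eta)),
# the expected survival time is the integral of S(t) over (0, Inf)
mean_surv <- function(eta) {
  integrate(function(t) exp(-t * exp(-eta)), lower = 0, upper = Inf)$value
}

eta0  <- 0.3   # linear predictor before the unit increase (assumed)
ratio <- mean_surv(eta0 + beta_star) / mean_surv(eta0)
```

Here a unit increase in the covariate adds `beta_star` to the linear predictor, and the ratio of expected survival times agrees with the survival time ratio.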

Distributions

Exponential model (basehaz = "exp-aft"): When the linear predictor is time-varying we have:

$S_{i} (t) = \exp (- \int_{0}^{t} \exp (- η_{i} (u)) d u)$
and when the linear predictor is time-fixed we have:
$S_{i} (t) = \exp (- t λ_{i})$
for scale parameter $λ_{i} = \exp (- η_{i})$ .
Weibull model (basehaz = "weibull-aft"): When the linear predictor is time-varying we have:

$S_{i} (t) = \exp (- {[\int_{0}^{t} \exp (- η_{i} (u)) d u]}^{γ})$
for shape parameter $γ > 0$ and when the linear predictor is time-fixed we have:
$S_{i} (t) = \exp (- t^{γ} λ_{i})$
for scale parameter $λ_{i} = \exp (- γ η_{i})$ and shape parameter $γ > 0$ .
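The time-fixed Weibull expression above follows from the general AFT definition. As a sketch, assume the baseline survival function is $S_{0} (t) = \exp (- t^{γ})$ (this form is implied by the expressions above rather than stated explicitly). Then:

\begin{aligned} S_{i} (t) = S_{0} (t \exp [- η_{i}]) = \exp (- {[t \exp (- η_{i})]}^{γ}) = \exp (- t^{γ} \exp (- γ η_{i})) = \exp (- t^{γ} λ_{i}) \end{aligned}

with $λ_{i} = \exp (- γ η_{i})$. Setting $γ = 1$ recovers the exponential case.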

Note: When the linear predictor is not time-varying (i.e. under time-fixed acceleration), there is a closed form expression for both the hazard function and survival function; details shown in the appendix. However, when the linear predictor is time-varying (i.e. under time-varying acceleration) there is no closed form expression for the hazard function or survival probability; instead, quadrature is used to evaluate the cumulative acceleration factor, which in turn is used to evaluate the hazard function and survival probability for inclusion in the likelihood. Extended details on the parameterisations are given in the appendix.

Time-fixed and time-varying effects of covariates

The coefficient $β_{p} (t)$ (i.e. the log hazard ratio) or $β_{p}^{*} (t)$ (i.e. log survival time ratio) can be treated as a time-fixed quantity (e.g. $β_{p} (t) = β_{p}$) or as a time-varying quantity. We refer to the latter as time-varying effects because the effect of the covariate is allowed to change as a function of time. In stan_surv time-varying effects are specified by using the tve function in the model formula. Note that in the following definitions we only refer to $β_{p} (t)$ (i.e. the log hazard ratio) but the same methodology applies to $β_{p}^{*} (t)$ (i.e. the log survival time ratio).

Without time-varying effects we have:

\begin{aligned} β_{p} (t) = θ_{p 0} \end{aligned}

such that $θ_{p 0}$ is a time-fixed log hazard ratio (or log survival time ratio).

With time-varying effects modelled using B-splines we have:

\begin{aligned} β_{p} (t) = θ_{p 0} + \sum_{m = 1}^{M} θ_{p m} B_{m} (t; k, δ) \end{aligned}

where $θ_{p 0}$ is a constant, $B_{m} (t; k, δ)$ is the $m^{th}$ $(m = 1, \dots, M)$ basis term for a degree $δ$ B-spline function evaluated at a vector of knot locations $k = \{k_{1}, \dots, k_{J}\}$, and $θ_{p m}$ is the $m^{th}$ B-spline coefficient. By default cubic B-splines are used (i.e. $δ = 3$). These allow the log hazard ratio (or log survival time ratio) to be modelled as a smooth function of time.

The degrees of freedom is equal to the number of additional parameters required to estimate a time-varying coefficient relative to a time-fixed coefficient. When a B-spline function is used to model the time-varying coefficient the degrees of freedom are $M = J + δ - 2$ where $J$ is the total number of knots (including boundary knots).
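The degrees of freedom formula can be checked by counting basis columns. The sketch below uses bs() from the splines package that ships with R as a stand-in for the splines2 routines, and it assumes the basis is built without an intercept column (which is what the formula $M = J + δ - 2$ implies). The knot locations are made up for illustration.

```r
library(splines)  # ships with base R

# Assumed example: J = 4 knots in total (2 boundary + 2 internal), cubic splines
boundary <- c(0, 5)
internal <- c(1.5, 3)
delta    <- 3
J        <- length(boundary) + length(internal)

# B-spline basis without an intercept column, evaluated on a grid of times
tt    <- seq(0, 5, by = 0.1)
basis <- bs(tt, knots = internal, degree = delta,
            Boundary.knots = boundary, intercept = FALSE)
M <- ncol(basis)   # number of basis terms, i.e. additional coefficients

# With boundary knots only (J = 2) a cubic basis has 3 columns
M_no_internal <- ncol(bs(tt, degree = delta, Boundary.knots = boundary))
```

In this example the basis has five columns, and dropping the internal knots leaves three, one per polynomial degree.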

With time-varying effects modelled using a piecewise constant function we have:

\begin{aligned} β_{p} (t) = θ_{p 0} + \sum_{m = 1}^{M} θ_{p m} I (k_{m + 1} < t \leq k_{m + 2}) \end{aligned}

where $I (x)$ is an indicator function taking value 1 if $x$ is true and 0 otherwise, $θ_{p 0}$ is a constant corresponding to the log hazard ratio (or log survival time ratio for AFT models) in the first time interval, $θ_{p m}$ is the deviation in the log hazard ratio (or log survival time ratio) between the first and $(m + 1)^{th}$ $(m = 1, \dots, M)$ time interval, and $k = \{k_{1}, \dots, k_{J}\}$ is a sequence of knot locations (i.e. break points) that includes the lower and upper boundary knots. This allows the log hazard ratio (or log survival time ratio) to be modelled as a piecewise constant function of time.

The degrees of freedom is equal to the number of additional parameters required to estimate a time-varying coefficient relative to a time-fixed coefficient. When a piecewise constant function is used to model the time-varying coefficient the degrees of freedom are $M = J - 2$ where $J$ is the total number of knots (including boundary knots).
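The piecewise constant definition translates almost line by line into code. The following base R sketch evaluates $β_{p} (t)$ for made-up knot locations and coefficients. The function name beta_pw and all values are assumptions for illustration only.

```r
# Assumed example: J = 4 knots (boundaries 0 and 5, internal knots 2 and 3.5)
k     <- c(0, 2, 3.5, 5)
theta <- c(-0.5, 0.7, 1.0)   # theta_p0, theta_p1, theta_p2 (illustrative values)

# beta_p(t) = theta_p0 + sum_m theta_pm * I(k_{m+1} < t <= k_{m+2})
beta_pw <- function(t, k, theta) {
  M <- length(k) - 2                 # degrees of freedom: M = J - 2
  stopifnot(length(theta) == M + 1)
  out <- rep(theta[1], length(t))
  for (m in seq_len(M)) {
    in_interval <- t > k[m + 1] & t <= k[m + 2]
    out <- out + theta[m + 1] * in_interval
  }
  out
}

beta_vals <- beta_pw(c(1, 2, 2.5, 4), k, theta)
```

Times in the first interval (up to and including the first internal knot) receive only $θ_{p 0}$, while later intervals add their own deviation to it.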

Default knot locations: The vector of knot locations $k = \{k_{1}, \dots, k_{J}\}$ includes a lower boundary knot $k_{1}$ at the earliest entry time (equal to zero if there isn’t delayed entry) and an upper boundary knot $k_{J}$ at the latest event or censoring time. The boundary knots cannot be changed by the user. Internal knot locations – that is $k_{2}, \dots, k_{(J - 1)}$ when $J \geq 3$ – can be explicitly specified by the user (see the knots argument to the tve function) or are determined by default. The default is to place the internal knots at equally spaced percentiles of the distribution of uncensored event times. When a B-spline function is specified, the tve function uses default values $M = 3$ (degrees of freedom) and $δ = 3$ (cubic splines) which in fact corresponds to a cubic B-spline function with no internal knots. When a piecewise constant function is specified, the tve function uses a default value of $M = 3$ (degrees of freedom) which corresponds to internal knots at the $25^{th}$, $50^{th}$, and $75^{th}$ percentiles of the distribution of the uncensored event times.
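The default knot placement described above can be mimicked in a few lines of base R. This is only an illustration of the description, using toy data and quantile() with its default settings. It is not the package's internal code, and the exact percentile calculation used by stan_surv may differ.

```r
# Toy data (assumed): event or censoring times and an event indicator
times  <- c(0.8, 1.2, 1.9, 2.4, 3.1, 3.3, 4.0, 4.6, 5.0, 5.0)
status <- c(1,   1,   0,   1,   1,   1,   0,   1,   0,   0)

# Boundary knots: earliest entry time (0 without delayed entry) and
# latest event or censoring time
boundary <- c(0, max(times))

# Internal knots at equally spaced percentiles of the uncensored event times;
# three internal knots correspond to the 25th, 50th and 75th percentiles
n_internal <- 3
probs      <- seq_len(n_internal) / (n_internal + 1)
internal   <- quantile(times[status == 1], probs = probs, names = FALSE)

knots <- c(boundary[1], internal, boundary[2])
```

Only the uncensored times enter the percentile calculation, so heavy censoring late in follow-up does not pull the internal knots towards the upper boundary.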

Note on subscripts: We have dropped the subscript $p$ from the knot locations $k$ and degree of the B-splines $δ$ discussed above. This is just for simplicity of the notation. In fact, if a model has time-varying effects estimated for more than one covariate, then each of these can be modelled using different knot locations and/or degree if the user desires.

Likelihood

Allowing for the three forms of censoring and potential delayed entry (i.e. left truncation) the likelihood for the survival model takes the form:

\begin{aligned} p (D_{i} | γ, β) = & {[h_{i} (T_{i})]}^{I (d_{i} = 1)} \\ & \times {[S_{i} (T_{i})]}^{I (d_{i} \in \{0, 1\})} \\ & \times {[1 - S_{i} (T_{i})]}^{I (d_{i} = 2)} \\ & \times {[S_{i} (T_{i}) - S_{i} (T_{i}^{U})]}^{I (d_{i} = 3)} \\ & \times {[S_{i} (T_{i}^{E})]}^{- 1} \end{aligned}
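To make the structure of the likelihood concrete, the following base R sketch computes the log-likelihood contribution of a single individual from user-supplied hazard and survival functions. The function loglik_i and its arguments are hypothetical names chosen for this example, and the exponential model with rate 0.2 is an assumed toy case.

```r
# Log-likelihood contribution for one individual, given hazard and survival
# functions. d: 0 = right censored, 1 = event, 2 = left censored,
# 3 = interval censored. t_entry = 0 means no delayed entry.
loglik_i <- function(t, t_upper = NA, t_entry = 0, d, haz, surv) {
  ll <- switch(as.character(d),
    "0" = log(surv(t)),
    "1" = log(haz(t)) + log(surv(t)),
    "2" = log(1 - surv(t)),
    "3" = log(surv(t) - surv(t_upper)),
    stop("d must be 0, 1, 2 or 3")
  )
  ll - log(surv(t_entry))   # left truncation: divide by S(entry time)
}

# Example with an exponential model, rate 0.2 (assumed value)
rate <- 0.2
haz  <- function(t) rep(rate, length(t))
surv <- function(t) exp(-rate * t)

ll_event    <- loglik_i(t = 2, d = 1, haz = haz, surv = surv)
ll_rcens    <- loglik_i(t = 2, d = 0, haz = haz, surv = surv)
ll_interval <- loglik_i(t = 2, t_upper = 3, d = 3, haz = haz, surv = surv)
ll_delayed  <- loglik_i(t = 2, t_entry = 1, d = 1, haz = haz, surv = surv)
```

With no delayed entry the final term is $\log S (0) = 0$ and drops out. With entry at time 1, the contribution is rescaled to condition on survival up to the entry time.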

Priors

The prior distribution for the so-called “auxiliary” parameters (i.e. the scalar parameter $γ$ for the Weibull and Gompertz models, or the coefficient vector $γ$ for the M-spline and B-spline models) is specified via the prior_aux argument to stan_surv. Choices of prior distribution include:

a Dirichlet prior is allowed for the M-spline coefficients $γ$
a half-normal, half-t, half-Cauchy or exponential prior is allowed for the Weibull shape parameter $γ$
a half-normal, half-t, half-Cauchy or exponential prior is allowed for the Gompertz scale parameter $γ$
a normal, t, or Cauchy prior is allowed for the B-spline coefficients $γ$

These choices are described in greater detail in the stan_surv or priors help file.

The prior distribution for the intercept parameter in the linear predictor is specified via the prior_intercept argument to stan_surv. Choices include the normal, t, or Cauchy distributions. The default is a normal distribution with mean zero and scale 20. Note that – internally (but not in the reported parameter estimates) – the prior is placed on the intercept after centering the predictors at their sample means and after applying a constant shift of $\log (\frac{E}{T})$ where $E$ is the total number of events and $T$ is the total follow up time. For example, a prior specified by the user as prior_intercept = normal(0,20) is in fact not centered on an intercept of zero when all predictors are at their sample means, but rather, it is centered on the log crude event rate when all predictors are at their means. This is intended to help with numerical stability and sampling, but does not impact on the reported estimates (i.e. the intercept is back-transformed before being returned to the user).
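The constant shift mentioned above is the log of the crude event rate. A minimal sketch of that calculation on toy data (assumed values, not the breast cancer data) is:

```r
# Toy data (assumed): event or censoring times and an event indicator
times  <- c(0.8, 1.2, 1.9, 2.4, 3.1, 3.3, 4.0, 4.6, 5.0, 5.0)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 0, 0)

E       <- sum(status)   # total number of events
T_total <- sum(times)    # total follow-up time

# Shift applied internally to the intercept: log crude event rate
log_crude_rate <- log(E / T_total)
```

A prior of normal(0, 20) on the shifted intercept is therefore centred at `log_crude_rate` on the scale of the original intercept, when all predictors are at their means.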

The choice of prior distribution for the time-fixed coefficients $θ_{p 0}$ ($p = 1, \dots, P$) is specified via the prior argument to stan_surv. This can be any of the standard prior distributions allowed for regression coefficients in the rstanarm package; see the priors vignette and the stan_surv help file for details.

The additional coefficients required for estimating time-varying effects (i.e. the B-spline coefficients or the interval-specific deviations in the piecewise constant function) are given a random walk prior of the form $θ_{p, 1} \sim N (0, 1)$ and $θ_{p, m} \sim N (θ_{p, m - 1}, τ_{p})$ for $m = 2, . . ., M$ , where $M$ is the total number of cubic B-spline basis terms. The prior distribution for the hyperparameter $τ_{p}$ is specified via the prior_smooth argument to stan_surv. Lower values of $τ_{p}$ lead to a less flexible (i.e. smoother) function. Choices of prior distribution for the hyperparameter $τ_{p}$ include an exponential, half-normal, half-t, or half-Cauchy distribution, and these are detailed in the stan_surv help file.
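The effect of $τ_{p}$ on smoothness can be seen by drawing coefficients from the random walk prior. The base R sketch below assumes that $τ_{p}$ is a standard deviation (the usual convention in Stan, but not stated explicitly above). The function name draw_rw is hypothetical.

```r
# Draw one set of coefficients from the random walk prior
#   theta_1 ~ N(0, 1), theta_m ~ N(theta_{m-1}, tau) for m = 2, ..., M.
# Assumption: tau is treated as a standard deviation.
draw_rw <- function(M, tau) {
  theta <- numeric(M)
  theta[1] <- rnorm(1, mean = 0, sd = 1)
  for (m in seq_len(M)[-1]) {
    theta[m] <- rnorm(1, mean = theta[m - 1], sd = tau)
  }
  theta
}

set.seed(123)
theta_flat   <- draw_rw(M = 6, tau = 0)     # tau = 0: no change between terms
theta_smooth <- draw_rw(M = 6, tau = 0.1)   # small steps between terms
theta_wiggly <- draw_rw(M = 6, tau = 2)     # large steps between terms
```

In the limiting case $τ_{p} = 0$ every coefficient equals the first one, while larger values allow neighbouring coefficients to differ by more, giving a more flexible function.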

Usage examples

Example: A flexible parametric proportional hazards model

We will use the German Breast Cancer Study Group dataset (see ?rstanarm-datasets for details and references). In brief, the data consist of $N = 686$ patients with primary node positive breast cancer recruited between 1984-1989. The primary response is time to recurrence or death. Median follow-up time was 1084 days. Overall, there were 299 (44%) events and the remaining 387 (56%) individuals were right censored. We concern our analysis here with a 3-category baseline covariate for cancer prognosis (good/medium/poor).

First, let us load the data and fit the proportional hazards model

mod1 <- stan_surv(formula = Surv(recyrs, status) ~ group, 
                  data    = bcancer, 
                  chains  = CHAINS, 
                  cores   = CORES, 
                  seed    = SEED,
                  iter    = ITER)

The model here is estimated using the default cubic M-splines (with 5 degrees of freedom) for modelling the baseline hazard. Since there are no time-varying effects in the model (i.e. we did not wrap any covariates in the tve() function) there is a closed form expression for the cumulative hazard and survival function and so the model is relatively fast to fit. Specifically, the model takes ~3.5 sec for each MCMC chain based on the default 2000 (1000 warm up, 1000 sampling) MCMC iterations.

We can easily obtain the estimated hazard ratios for the 3-category group covariate using the generic print method for stansurv objects, as follows

print(mod1, digits = 3)

stan_surv
 baseline hazard: M-splines on hazard scale
 formula:         Surv(recyrs, status) ~ group
 observations:    686
 events:          299 (43.6%)
 right censored:  387 (56.4%)
 delayed entry:   no
------
                Median MAD_SD exp(Median)
(Intercept)     -0.671  0.202     NA     
groupMedium      0.798  0.200  2.222     
groupPoor        1.585  0.169  4.880     
m-splines-coef1  0.001  0.001     NA     
m-splines-coef2  0.006  0.005     NA     
m-splines-coef3  0.147  0.029     NA     
m-splines-coef4  0.268  0.055     NA     
m-splines-coef5  0.102  0.078     NA     
m-splines-coef6  0.214  0.142     NA     
m-splines-coef7  0.235  0.164     NA     

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

We see from this output that individuals in the groups with Poor or Medium prognosis have much higher rates of recurrence or death relative to the group with Good prognosis (as we might expect!). The hazard of the event in the Poor prognosis group is approximately 4.9-fold higher than the hazard in the Good prognosis group. Similarly, the hazard of the event in the Medium prognosis group is approximately 2.2-fold higher than the hazard in the Good prognosis group.
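The exp(Median) column in the printed output can be reproduced, up to rounding of the displayed medians, by exponentiating the posterior medians of the log hazard ratios. A small sketch using the values copied from the output above:

```r
# Posterior medians of the log hazard ratios, copied from the printed output
log_hr <- c(groupMedium = 0.798, groupPoor = 1.585)

# Hazard ratios relative to the Good prognosis group
hr <- exp(log_hr)
round(hr, 2)
```

Small discrepancies in the third decimal place are expected because the printed medians are themselves rounded.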

It may also be of interest to compare the different types of the baseline hazard we could potentially use. Here, we will fit a series of models, each with a different baseline hazard specification

mod1_exp      <- update(mod1, basehaz = "exp")
mod1_weibull  <- update(mod1, basehaz = "weibull")
mod1_gompertz <- update(mod1, basehaz = "gompertz")
mod1_bspline  <- update(mod1, basehaz = "bs")
mod1_mspline1 <- update(mod1, basehaz = "ms")
mod1_mspline2 <- update(mod1, basehaz = "ms", basehaz_ops = list(df = 10))

and then plot the baseline hazards with 95% posterior uncertainty limits using the generic plot method for stansurv objects (note that the default plot for stansurv objects is the estimated baseline hazard). We will write a little helper function to adjust the y-axis limits, add a title, and centre the title, on each plot, as follows

library(ggplot2)

plotfun <- function(model, title) {
  plot(model, plotfun = "basehaz") +              # plot baseline hazard
    coord_cartesian(ylim = c(0,0.4)) +            # adjust y-axis limits
    labs(title = title) +                         # add plot title
    theme(plot.title = element_text(hjust = 0.5)) # centre plot title
}

p_exp      <- plotfun(mod1_exp,      title = "Exponential")
p_weibull  <- plotfun(mod1_weibull,  title = "Weibull")
p_gompertz <- plotfun(mod1_gompertz, title = "Gompertz")
p_bspline  <- plotfun(mod1_bspline,  title = "B-splines with df = 5")
p_mspline1 <- plotfun(mod1_mspline1, title = "M-splines with df = 5")
p_mspline2 <- plotfun(mod1_mspline2, title = "M-splines with df = 10")

bayesplot::bayesplot_grid(p_exp,
                          p_weibull,
                          p_gompertz,
                          p_bspline,
                          p_mspline1,
                          p_mspline2,
                          grid_args = list(ncol = 3))

We can also compare the fit of these models using the loo method for stansurv objects

compare_models(loo(mod1_exp),
               loo(mod1_weibull),
               loo(mod1_gompertz),
               loo(mod1_bspline),
               loo(mod1_mspline1),
               loo(mod1_mspline2))


Model comparison: 
(ordered by highest ELPD)

              elpd_diff se_diff
mod1_mspline1   0.0       0.0  
mod1_bspline   -1.2       1.3  
mod1_mspline2  -1.9       1.7  
mod1_weibull  -18.7       5.5  
mod1_gompertz -32.3       6.0  
mod1_exp      -36.7       5.6

where we see that models with a flexible parametric (spline-based) baseline hazard fit the data best followed by the standard parametric (Weibull, Gompertz, exponential) models. Roughly speaking, the B-spline and M-spline models seem to fit the data equally well since the differences in elpd or looic between the models are very small relative to their standard errors. Moreover, increasing the degrees of freedom for the M-splines from 5 to 10 doesn’t seem to improve the fit (that is, the default degrees of freedom df = 5 seems to provide sufficient flexibility to model the baseline hazard).
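One informal way to read the comparison table is to express each difference in units of its standard error. The sketch below does this with the values copied from the printed output. The threshold of 2 standard errors is an assumed rule of thumb used here for illustration, not a formal test.

```r
# elpd differences and their standard errors, copied from the printed output
cmp <- data.frame(
  model     = c("mod1_mspline1", "mod1_bspline", "mod1_mspline2",
                "mod1_weibull", "mod1_gompertz", "mod1_exp"),
  elpd_diff = c(0.0, -1.2, -1.9, -18.7, -32.3, -36.7),
  se_diff   = c(0.0, 1.3, 1.7, 5.5, 6.0, 5.6)
)

# Size of each difference in units of its standard error
# (the reference model has se_diff = 0, so its ratio is set to 0)
cmp$ratio <- ifelse(cmp$se_diff > 0, abs(cmp$elpd_diff) / cmp$se_diff, 0)

# Informal threshold of 2 standard errors (assumed rule of thumb)
cmp$clearly_worse <- cmp$ratio > 2
cmp
```

Under this heuristic the three spline-based models are not distinguishable from one another, while the Weibull, Gompertz, and exponential models are flagged as fitting worse.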

After fitting the survival model, we often want to estimate the predicted survival function for individuals with different covariate patterns. Here, let us estimate the predicted survival function between 0 and 5 years for an individual in each of the prognostic groups. To do this, we can use the posterior_survfit method for stansurv objects, and its associated plot method. First let us construct the prediction (covariate) data

nd <- data.frame(group = c("Good", "Medium", "Poor"))
head(nd)

   group
1   Good
2 Medium
3   Poor

and then we will generate the posterior predictions

ps <- posterior_survfit(mod1, newdata = nd, times = 0, extrapolate = TRUE,
                        control = list(edist = 5))
head(ps)

stan_surv predictions
 num. individuals: 3 
 prediction type:  event free probability 
 standardised?:    no 
 conditional?:     no 

  id cond_time   time median  ci_lb  ci_ub
1  1        NA 0.0000 1.0000 1.0000 1.0000
2  1        NA 0.3571 0.9966 0.9937 0.9979
3  1        NA 0.7143 0.9838 0.9769 0.9882
4  1        NA 1.0714 0.9604 0.9463 0.9711
5  1        NA 1.4286 0.9316 0.9085 0.9486
6  1        NA 1.7857 0.9034 0.8750 0.9271

Here we note that the id variable in the data frame of posterior predictions identifies which row of newdata the predictions correspond to. For demonstration purposes we have also shown a couple of other arguments in the posterior_survfit call, namely

the times = 0 argument says that we want to predict at time = 0 (i.e. baseline) for each individual in the newdata (this is the default anyway)
the extrapolate = TRUE argument says that we want to extrapolate forward from time 0 (this is also the default)
the control = list(edist = 5) identifies the control of the extrapolation; this is saying extrapolate the survival function forward from time 0 for a distance of 5 time units (the default would have been to extrapolate as far as the largest event or censoring time in the estimation dataset, which is 7.28 years in the bcancer data).

Let us now plot the survival predictions. We will relabel the id variable with meaningful labels identifying the covariate profile of each new individual in our prediction data

panel_labels <- c('1' = "Good", '2' = "Medium", '3' = "Poor")
plot(ps) + 
  ggplot2::facet_wrap(~ id, labeller = ggplot2::labeller(id = panel_labels))

We can see from the plot that predicted survival is worst for patients with a Poor prognosis, and best for patients with a Good prognosis, as we would expect based on our previous model estimates.

Alternatively, if we wanted to obtain and plot the predicted hazard function for each individual in our new data (instead of their survival function), then we just need to specify type = "haz" in our posterior_survfit call (the default is type = "surv"), as follows

ph <- posterior_survfit(mod1, newdata = nd, type = "haz")
plot(ph) + 
  ggplot2::facet_wrap(~ id, labeller = ggplot2::labeller(id = panel_labels))

We can quite clearly see in the plot the assumption of proportional hazards. We can also see that the hazard is highest in the Poor prognosis group (i.e. worst survival) and the hazard is lowest in the Good prognosis group (i.e. best survival). This corresponds to what we saw in the plot of the survival functions previously.

Example: Non-proportional hazards modelled using B-splines

To demonstrate the implementation of time-varying effects in stan_surv we will use a simulated dataset, generated using the simsurv package (Brilleman, 2018).

We will simulate a dataset with $N = 200$ individuals with event times generated under the following Weibull hazard function

\begin{aligned} h_{i} (t) = γ t^{γ - 1} λ \exp (β (t) x_{i}) \end{aligned}

with scale parameter $λ = 0.1$, shape parameter $γ = 1.5$, binary baseline covariate $X_{i} \sim Bern (0.5)$, and time-varying log hazard ratio $β (t) = - 0.5 + 0.2 t$. We will enforce administrative censoring at 5 years if an individual’s simulated event time is >5 years.

# load package
library(simsurv)

# set seed for reproducibility
set.seed(999111)

# simulate covariate data
covs <- data.frame(id  = 1:200, 
                   trt = rbinom(200, 1L, 0.5))

# simulate event times
dat  <- simsurv(lambdas = 0.1, 
                gammas  = 1.5, 
                betas   = c(trt = -0.5),
                tde     = c(trt = 0.2),
                x       = covs, 
                maxt    = 5)

# merge covariate data and event times
dat  <- merge(dat, covs)

# examine first few rows of data
head(dat)

  id eventtime status trt
1  1  3.202809      1   0
2  2  4.907130      1   1
3  3  4.453174      1   1
4  4  2.566302      1   0
5  5  5.000000      0   1
6  6  4.262667      1   1

Now that we have our simulated dataset, let us fit a model with time-varying hazard ratio for trt

mod2 <- stan_surv(formula = Surv(eventtime, status) ~ tve(trt), 
                  data    = dat, 
                  chains  = CHAINS, 
                  cores   = CORES, 
                  seed    = SEED,
                  iter    = ITER)

The tve function is used in the model formula to state that we want a time-varying effect (i.e. a time-varying coefficient) to be estimated for the variable trt. By default, a cubic B-spline basis with 3 degrees of freedom (i.e. two boundary knots placed at the limits of the range of event times, but no internal knots) is used for modelling the time-varying log hazard ratio. If we wanted to change the degree, knot locations, or degrees of freedom for the B-spline function we can specify additional arguments to the tve function.

For example, to model the time-varying log hazard ratio using quadratic B-splines with 4 degrees of freedom (i.e. two boundary knots placed at the limits of the range of event times, as well as two internal knots placed – by default – at the 33.3rd and 66.6th percentiles of the distribution of uncensored event times) we could specify the model formula as

Surv(eventtime, status) ~ tve(trt, df = 4, degree = 2)

Let us now plot the estimated time-varying hazard ratio from the fitted model. We can do this using the generic plot method for stansurv objects, for which we can specify the plotfun = "tve" argument. (Note that in this case, there is only one covariate in the model with a time-varying effect, but if there were others, we could specify which covariate(s) we want to plot the time-varying effect for by specifying the pars argument to the plot call).

plot(mod2, plotfun = "tve")

From the plot, we can see how the hazard ratio (i.e. the effect of treatment on the hazard of the event) changes as a function of time. The treatment appears to be protective during the first few years following baseline (i.e. HR < 1), and then the treatment appears to become harmful after about 4 years post-baseline. Thankfully, this is a reflection of the model we simulated under!

The plot shows a large amount of uncertainty around the estimated time-varying hazard ratio. This is to be expected, since we only simulated a dataset of 200 individuals of which only around 70% experienced the event before being censored at 5 years. So, there is very little data (i.e. very few events) with which to reliably estimate the time-varying hazard ratio. We can also see this reflected in the differences between our data generating model and the estimates from our fitted model. In our data generating model, the time-varying hazard ratio equals 1 (i.e. the log hazard ratio equals 0) at 2.5 years, but in our fitted model the median estimate for our time-varying hazard ratio equals 1 at around 3 years. This is a reflection of the large amount of sampling error, due to our simulated dataset being so small.
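The crossing point of the data generating model quoted above can be verified directly from the simulation settings $β (t) = - 0.5 + 0.2 t$. A short base R sketch (the function name beta_t is ours):

```r
# True time-varying log hazard ratio used in the simulation
beta_t <- function(t) -0.5 + 0.2 * t

# Time at which the log hazard ratio crosses zero (hazard ratio of 1)
t_cross <- uniroot(beta_t, interval = c(0, 5))$root

# True hazard ratio at baseline and at the administrative censoring time
hr_0 <- exp(beta_t(0))
hr_5 <- exp(beta_t(5))
```

The true hazard ratio therefore moves from below 1 at baseline to above 1 by the end of follow-up, which is the pattern the fitted model is trying to recover.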

Example: Non-proportional hazards modelled using a piecewise constant function

In the previous example we showed how non-proportional hazards can be modelled by using a smooth B-spline function for the time-varying log hazard ratio. This is the default approach when the tve function is used to estimate a time-varying effect for a covariate in the model formula. However, another approach for modelling a time-varying log hazard ratio is to use a piecewise constant function. If we want to use a piecewise constant for the time-varying log hazard ratio (instead of the smooth B-spline function) then we just have to specify the type argument to the tve function.

We will again simulate some survival data using the simsurv package to show how a piecewise constant hazard ratio can be estimated using stan_surv.

Similar to the previous example, we will simulate event times under a Weibull hazard function with scale parameter $λ = 0.1$ , shape parameter $γ = 1.5$ , and binary baseline covariate $X_{i} \sim Bern (0.5)$ , this time for a dataset with $N = 500$ individuals. However, in this example our time-varying log hazard ratio will be defined as $β (t) = - 0.5 + 0.7 \times I (t > 2.5)$ where $I (X)$ is the indicator function taking the value 1 if $X$ is true and 0 otherwise. This corresponds to a piecewise constant log hazard ratio with just two “pieces” or time intervals. The first time interval is $[0, 2.5]$ during which the true hazard ratio is $\exp (- 0.5) = 0.61$ . The second time interval is $(2.5, \infty)$ during which the true hazard ratio is $\exp (- 0.5 + 0.7) = 1.22$ . Our example uses only two time intervals for simplicity, but in general we could easily have considered more (although it would have required a couple of additional lines of code to simulate the data). We will again enforce administrative censoring at 5 years if an individual’s simulated event time is >5 years.

# load package
library(simsurv)

# set seed for reproducibility
set.seed(888222)

# simulate covariate data
covs <- data.frame(id  = 1:500, 
                   trt = rbinom(500, 1L, 0.5))

# simulate event times
dat  <- simsurv(lambdas = 0.1, 
                gammas  = 1.5, 
                betas   = c(trt = -0.5),
                tde     = c(trt = 0.7),
                tdefun  = function(t) (t > 2.5),
                x       = covs, 
                maxt    = 5)

# merge covariate data and event times
dat  <- merge(dat, covs)

# examine first few rows of data
head(dat)

  id eventtime status trt
1  1  4.045842      1   0
2  2  5.000000      0   1
3  3  5.000000      0   1
4  4  5.000000      0   1
5  5  2.773858      1   0
6  6  1.007948      1   0
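As a quick sanity check on the data generating model described above, the true piecewise constant hazard ratio can be evaluated directly. This is only an illustrative sketch in base R; the helper name true_loghr is hypothetical and is not part of simsurv or rstanarm.

```r
# True log hazard ratio from the data generating model:
# beta(t) = -0.5 + 0.7 * I(t > 2.5)
true_loghr <- function(t) -0.5 + 0.7 * (t > 2.5)

# Evaluate at one time point inside each of the two intervals
times <- c(1, 4)
loghr <- true_loghr(times)
hr    <- exp(loghr)

# Hazard ratio is below 1 before 2.5 years and above 1 afterwards
round(data.frame(time = times, loghr = loghr, hr = hr), 2)
```

These are the values we would hope to recover (up to sampling error) when we fit the model.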

We now estimate a model with a piecewise constant time-varying effect for the covariate trt as

mod3 <- stan_surv(formula = Surv(eventtime, status) ~ 
                    tve(trt, type = "pw", knots = 2.5),
                  data    = dat, 
                  chains  = CHAINS, 
                  cores   = CORES, 
                  seed    = SEED,
                  iter    = ITER)

This time we specify some additional arguments to the tve function, so that our time-varying effect corresponds to the true data generating model used to simulate our event times. Specifically, we specify type = "pw" to say that we want the time-varying effect (i.e. the time-varying log hazard ratio) to be estimated using a piecewise constant function and knots = 2.5 says that we only want one internal knot placed at the time $t = 2.5$ .

We can again use the generic plot function with argument plotfun = "tve" to examine our estimated hazard ratio for treatment

plot(mod3, plotfun = "tve")

Here we see that the estimated hazard ratio reasonably reflects our true data generating model (i.e. a hazard ratio of $\approx 0.6$ during the first time interval and a hazard ratio of $\approx 1.2$ during the second time interval) although there is a slight discrepancy due to the sampling variation in the simulated event times.

Example: Hierarchical survival models

To demonstrate the estimation of a hierarchical model for survival data in stan_surv we will use the frail dataset (see help("rstanarm-datasets") for a description). The frail dataset contains simulated event times for 200 patients clustered within 20 hospital sites (10 patients per hospital site). The event times are simulated from a parametric proportional hazards model under the following assumptions: (i) a constant (i.e. exponential) baseline hazard rate of 0.1; (ii) a fixed treatment effect with log hazard ratio of 0.3; and (iii) a site-specific random intercept (specified on the log hazard scale) drawn from a N(0,1) distribution.

Let’s look at the first few rows of the data:

head(frail)

  id site trt         b eventtime status
1  1    1   0 0.4229517 0.9058188      1
2  2    1   1 0.4229517 5.9190576      1
3  3    1   0 0.4229517 7.8525219      1
4  4    1   0 0.4229517 1.2066141      1
5  5    1   1 0.4229517 1.1703645      1
6  6    1   0 0.4229517 2.6209007      1

To fit a hierarchical model for clustered survival data we use a formula syntax similar to what is used in the lme4 R package (Bates et al. (2015)). Let’s consider the following model (which aligns with the model used to generate the simulated data):

mod_randint <- stan_surv(
  formula = Surv(eventtime, status) ~ trt + (1 | site),
  data    = frail,
  basehaz = "exp",
  chains  = CHAINS, 
  cores   = CORES, 
  seed    = SEED,
  iter    = ITER)

The model contains a baseline covariate for treatment (0 or 1) as well as a site-specific intercept to allow for correlation in the event times for patients from the same site. We’ve called the model object mod_randint to denote the fact that it includes a site-specific (random) intercept. Let’s examine the parameter estimates from the model:

print(mod_randint, digits = 2)

stan_surv
 baseline hazard: exponential
 formula:         Surv(eventtime, status) ~ trt + (1 | site)
 observations:    200
 events:          152 (76%)
 right censored:  48 (24%)
 delayed entry:   no
------
            Median MAD_SD exp(Median)
(Intercept) -2.30   0.31     NA      
trt          0.46   0.19   1.59      

Error terms:
 Groups Name        Std.Dev.
 site   (Intercept) 1.15    
Num. levels: site 20 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

We see that the estimated log hazard ratio for treatment ( ${\hat{β}}_{(trt)} = 0.46$ ) is a bit larger than the “true” log hazard ratio used in the data generating model ( $β_{(trt)} = 0.3$ ). The estimated baseline hazard rate is $\exp (- 2.30) = 0.10$ , which is very close to the baseline hazard rate used in the data generating model ( $0.1$ ). The differences between the estimated parameters and the true parameters from the data generating model are consistent with sampling noise: for example, the true log hazard ratio lies within one MAD_SD of the posterior median.
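The back-transformation from the log hazard scale can be reproduced by hand. The sketch below simply exponentiates the rounded posterior medians copied from the printed output above; it does not touch the fitted model object, and the vector name est is hypothetical.

```r
# Posterior medians copied from the printed output (rounded to 2 digits)
est <- c(intercept = -2.30, trt = 0.46)

# exp(intercept) is the estimated baseline hazard rate;
# exp(trt) is the estimated hazard ratio for treatment
round(exp(est), 2)
```

Because the inputs are already rounded, the results may differ slightly in the last digit from values computed on the full posterior draws.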

If this were a real analysis, we might wonder whether the site-specific intercepts are necessary! Well, we can assess that by fitting an alternative model that does not include the site-specific intercepts and comparing it to the model we just estimated. We will compare the models using the loo function. We first need to fit the model without the site-specific intercept. To do this, we will just use the generic update method for stansurv objects, since all we are changing is the model formula:

mod_fixed <- update(mod_randint, formula. = Surv(eventtime, status) ~ trt)

Let’s calculate the loo for both these models and compare them:

loo_fixed   <- loo(mod_fixed)
loo_randint <- loo(mod_randint)
compare_models(loo_fixed, loo_randint)


Model comparison: 
(negative 'elpd_diff' favors 1st model, positive favors 2nd) 

elpd_diff        se 
     56.8       9.7

We see strong evidence in favour of the model with the site-specific intercepts!

But let’s not quite finish there. What if we want to generalise the random effects structure further? For instance, is the site-specific intercept enough? Perhaps we should consider estimating both a site-specific intercept and a site-specific treatment effect. We have minimal data to estimate such a model (recall that there are only 20 sites and 10 patients per site) but for the sake of demonstration we will forge on nonetheless. Let’s fit a model with both a site-specific intercept and a site-specific coefficient for the covariate trt (i.e. treatment):

mod_randtrt <- update(mod_randint, formula. = 
                        Surv(eventtime, status) ~ trt + (trt | site)) 
print(mod_randtrt, digits = 2)

stan_surv
 baseline hazard: exponential
 formula:         Surv(eventtime, status) ~ trt + (trt | site)
 observations:    200
 events:          152 (76%)
 right censored:  48 (24%)
 delayed entry:   no
------
            Median MAD_SD exp(Median)
(Intercept) -2.27   0.26     NA      
trt          0.47   0.20   1.60      

Error terms:
 Groups Name        Std.Dev. Corr 
 site   (Intercept) 1.104         
        trt         0.443    -0.23
Num. levels: site 20 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

We see that we have an estimated standard deviation for the site-specific intercepts and the site-specific coefficients for trt, as well as the estimated correlation between those site-specific parameters.

Let’s now compare all three of these models based on loo:

loo_randtrt <- loo(mod_randtrt)
compare_models(loo_fixed, loo_randint, loo_randtrt)


Model comparison: 
(ordered by highest ELPD)

            elpd_diff se_diff
mod_randint   0.0       0.0  
mod_randtrt  -0.4       0.8  
mod_fixed   -56.8       9.7

It appears that the model with just a site-specific intercept is the best fitting model. It is much better than the model without a site-specific intercept, and slightly better than the model with both a site-specific intercept and a site-specific treatment effect. In other words, including a site-specific intercept appears important, but including a site-specific treatment effect is not. This conclusion is reassuring, because it aligns with the data generating model we used to simulate the data!
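One informal way to read the comparison table is to express each difference in ELPD relative to its standard error. The sketch below does this using the numbers copied from the printed output above. Treating a difference of several standard errors as "clear" is only a rough heuristic and an assumption on our part, not a formal test, and the data frame cmp is hypothetical.

```r
# Values copied from the printed model comparison above
cmp <- data.frame(
  model     = c("mod_randint", "mod_randtrt", "mod_fixed"),
  elpd_diff = c(0.0, -0.4, -56.8),
  se_diff   = c(0.0,  0.8,   9.7)
)

# Difference in ELPD expressed in units of its standard error
# (undefined for the reference model, which has se_diff = 0)
cmp$ratio <- ifelse(cmp$se_diff > 0, cmp$elpd_diff / cmp$se_diff, NA)
cmp
```

The model without site-specific intercepts is worse by several standard errors, whereas the model with a site-specific treatment effect differs from the random intercept model by less than one standard error.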

References

Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 2015;67(1):1–48.

Brilleman S. (2018) simsurv: Simulate Survival Data. R package version 0.2.2.

Hougaard P. Fundamentals of Survival Data. Biometrics 1999;55:13–22.

Ramsay JO. Monotone Regression Splines in Action. Statistical Science 1988;3(4):425–461.

Wang W, Yan J. (2018) splines2: Regression Spline Functions and Classes. R package version 0.2.8.

Appendix A: Parameterisations on the hazard scale

When basehaz is set equal to "exp", "weibull", "gompertz", "ms" (the default), or "bs" then the model is defined on the hazard scale using the following parameterisations.

Exponential model

The exponential model is parameterised with scale parameter $λ_{i} = \exp (η_{i})$ where $η_{i} = β_{0} + \sum_{p = 1}^{P} β_{p} x_{i p}$ denotes our linear predictor.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = λ_{i} \\ & = \exp (η_{i}) \\ H_{i} (T_{i}) & = T_{i} λ_{i} \\ & = T_{i} \exp (η_{i}) \\ S_{i} (T_{i}) & = \exp (- T_{i} λ_{i}) \\ & = \exp (- T_{i} \exp (η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (- T_{i} λ_{i}) \\ & = 1 - \exp (- T_{i} \exp (η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- T_{i} λ_{i}) - \exp (- T_{i}^{U} λ_{i}) \\ & = \exp (- T_{i} \exp (η_{i})) - \exp (- T_{i}^{U} \exp (η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = \log λ_{i} \\ & = η_{i} \\ \log H_{i} (T_{i}) & = \log (T_{i}) + \log λ_{i} \\ & = \log (T_{i}) + η_{i} \\ \log S_{i} (T_{i}) & = - T_{i} λ_{i} \\ & = - T_{i} \exp (η_{i}) \\ \log F_{i} (T_{i}) & = \log (1 - \exp (- T_{i} λ_{i})) \\ & = \log (1 - \exp (- T_{i} \exp (η_{i}))) \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (- T_{i} λ_{i}) - \exp (- T_{i}^{U} λ_{i})] \\ & = \log [\exp (- T_{i} \exp (η_{i})) - \exp (- T_{i}^{U} \exp (η_{i}))] \end{aligned} \end{aligned}

The definition of $λ$ for the baseline is:

\begin{aligned} \begin{aligned} λ_{0} = \exp (β_{0}) ⟺ β_{0} = \log (λ_{0}) \end{aligned} \end{aligned}
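The exponential parameterisation above can be written out and checked against R's built-in exponential distribution functions. This is a minimal sketch in base R; the function names and the numeric values for the intercept and coefficient are hypothetical and chosen only for illustration.

```r
# Hazard, cumulative hazard and survival under the exponential PH model,
# with scale parameter lambda_i = exp(eta_i)
exp_haz    <- function(t, eta) rep(exp(eta), length(t))
exp_cumhaz <- function(t, eta) t * exp(eta)
exp_surv   <- function(t, eta) exp(-exp_cumhaz(t, eta))

# Hypothetical linear predictor: beta_0 = log(0.1), beta_1 = 0.3, x = 1
eta   <- log(0.1) + 0.3 * 1
times <- c(0.5, 2, 5)

# The survival function should agree with pexp() using rate = exp(eta)
all.equal(exp_surv(times, eta),
          pexp(times, rate = exp(eta), lower.tail = FALSE))
```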

Weibull model

The Weibull model is parameterised with scale parameter $λ_{i} = \exp (η_{i})$ and shape parameter $γ > 0$ where $η_{i} = β_{0} + \sum_{p = 1}^{P} β_{p} x_{i p}$ denotes our linear predictor.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = γ T_{i}^{γ - 1} λ_{i} \\ & = γ T_{i}^{γ - 1} \exp (η_{i}) \\ H_{i} (T_{i}) & = T_{i}^{γ} λ_{i} \\ & = T_{i}^{γ} \exp (η_{i}) \\ S_{i} (T_{i}) & = \exp (- T_{i}^{γ} λ_{i}) \\ & = \exp (- T_{i}^{γ} \exp (η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (- {(T_{i})}^{γ} λ_{i}) \\ & = 1 - \exp (- {(T_{i})}^{γ} \exp (η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- {(T_{i})}^{γ} λ_{i}) - \exp (- {(T_{i}^{U})}^{γ} λ_{i}) \\ & = \exp (- {(T_{i})}^{γ} \exp (η_{i})) - \exp (- {(T_{i}^{U})}^{γ} \exp (η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = \log (γ) + (γ - 1) \log (T_{i}) + \log λ_{i} \\ & = \log (γ) + (γ - 1) \log (T_{i}) + η_{i} \\ \log H_{i} (T_{i}) & = γ \log (T_{i}) + \log λ_{i} \\ & = γ \log (T_{i}) + η_{i} \\ \log S_{i} (T_{i}) & = - T_{i}^{γ} λ_{i} \\ & = - T_{i}^{γ} \exp (η_{i}) \\ \log F_{i} (T_{i}) & = \log (1 - \exp (- {(T_{i})}^{γ} λ_{i})) \\ & = \log (1 - \exp (- {(T_{i})}^{γ} \exp (η_{i}))) \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (- {(T_{i})}^{γ} λ_{i}) - \exp (- {(T_{i}^{U})}^{γ} λ_{i})] \\ & = \log [\exp (- {(T_{i})}^{γ} \exp (η_{i})) - \exp (- {(T_{i}^{U})}^{γ} \exp (η_{i}))] \end{aligned} \end{aligned}

The definition of $λ$ for the baseline is:

\begin{aligned} \begin{aligned} λ_{0} = \exp (β_{0}) ⟺ β_{0} = \log (λ_{0}) \end{aligned} \end{aligned}
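The Weibull parameterisation used here differs from the one used by R's dweibull and pweibull functions, which are parameterised by a shape and a scale $b$ such that $S (t) = \exp (- (t / b)^{γ})$ . The sketch below, in base R with hypothetical function names and parameter values, checks the formulas above by converting with $b = λ_{i}^{- 1 / γ}$ .

```r
# Hazard, cumulative hazard and survival under the Weibull PH model,
# with scale parameter lambda_i = exp(eta_i) and shape parameter gamma
wei_haz    <- function(t, eta, gamma) gamma * t^(gamma - 1) * exp(eta)
wei_cumhaz <- function(t, eta, gamma) t^gamma * exp(eta)
wei_surv   <- function(t, eta, gamma) exp(-wei_cumhaz(t, eta, gamma))

# Hypothetical values: beta_0 = log(0.1), beta_1 = -0.5, x = 1
eta   <- log(0.1) - 0.5
gamma <- 1.5
times <- c(0.5, 2, 5)

# Convert to R's (shape, scale) parameterisation
scale_r <- exp(eta)^(-1 / gamma)
s_r     <- pweibull(times, shape = gamma, scale = scale_r, lower.tail = FALSE)
f_r     <- dweibull(times, shape = gamma, scale = scale_r)

all.equal(wei_surv(times, eta, gamma), s_r)      # survival agrees
all.equal(wei_haz(times, eta, gamma), f_r / s_r) # hazard = density / survival
```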

Gompertz model

The Gompertz model is parameterised with shape parameter $λ_{i} = \exp (η_{i})$ and scale parameter $γ > 0$ where $η_{i} = β_{0} + \sum_{p = 1}^{P} β_{p} x_{i p}$ denotes our linear predictor.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (γ T_{i}) λ_{i} \\ & = \exp (γ T_{i}) \exp (η_{i}) \\ H_{i} (T_{i}) & = \frac{\exp (γ T_{i}) - 1}{γ} λ_{i} \\ & = \frac{\exp (γ T_{i}) - 1}{γ} \exp (η_{i}) \\ S_{i} (T_{i}) & = \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i}) \\ & = \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i}) \\ & = 1 - \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i}) - \exp (\frac{- (\exp (γ T_{i}^{U}) - 1)}{γ} λ_{i}) \\ & = \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i})) - \exp (\frac{- (\exp (γ T_{i}^{U}) - 1)}{γ} \exp (η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = γ T_{i} + \log λ_{i} \\ & = γ T_{i} + η_{i} \\ \log H_{i} (T_{i}) & = \log (\exp (γ T_{i}) - 1) - \log (γ) + \log λ_{i} \\ & = \log (\exp (γ T_{i}) - 1) - \log (γ) + η_{i} \\ \log S_{i} (T_{i}) & = \frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i} \\ & = \frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i}) \\ \log F_{i} (T_{i}) & = \log (1 - \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i})) \\ & = \log (1 - \exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i}))) \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} λ_{i}) - \exp (\frac{- (\exp (γ T_{i}^{U}) - 1)}{γ} λ_{i})] \\ & = \log [\exp (\frac{- (\exp (γ T_{i}) - 1)}{γ} \exp (η_{i})) - \exp (\frac{- (\exp (γ T_{i}^{U}) - 1)}{γ} \exp (η_{i}))] \end{aligned} \end{aligned}

The definition of $λ$ for the baseline is:

\begin{aligned} \begin{aligned} λ_{0} = \exp (β_{0}) ⟺ β_{0} = \log (λ_{0}) \end{aligned} \end{aligned}
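Base R has no built-in Gompertz distribution, but the closed-form cumulative hazard above can be checked by numerically integrating the hazard. The following sketch uses R's integrate function for that purpose; the function names and parameter values are hypothetical.

```r
# Hazard and cumulative hazard under the Gompertz model as defined above
gom_haz    <- function(t, eta, gamma) exp(gamma * t) * exp(eta)
gom_cumhaz <- function(t, eta, gamma) (exp(gamma * t) - 1) / gamma * exp(eta)
gom_surv   <- function(t, eta, gamma) exp(-gom_cumhaz(t, eta, gamma))

# Hypothetical parameter values
eta   <- log(0.05)
gamma <- 0.4

# Numerically integrate the hazard from 0 to 3 and compare with
# the closed-form cumulative hazard
H_num <- integrate(gom_haz, lower = 0, upper = 3,
                   eta = eta, gamma = gamma)$value
all.equal(H_num, gom_cumhaz(3, eta, gamma), tolerance = 1e-6)
```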

M-spline model

The M-spline model is parameterised with vector of regression coefficients $θ > 0$ for the baseline hazard and with covariate effects introduced through a linear predictor $η_{i} = \sum_{p = 1}^{P} β_{p} x_{i p}$ . Note that there is no intercept in the linear predictor since it is absorbed into the baseline hazard spline function.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = M (T_{i}; θ, k_{0}) \exp (η_{i}) \\ H_{i} (T_{i}) & = I (T_{i}; θ, k_{0}) \exp (η_{i}) \\ S_{i} (T_{i}) & = \exp (- I (T_{i}; θ, k_{0}) \exp (η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (- I (T_{i}; θ, k_{0}) \exp (η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- I (T_{i}; θ, k_{0}) \exp (η_{i})) - \exp (- I (T_{i}^{U}; θ, k_{0}) \exp (η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = \log (M (T_{i}; θ, k_{0})) + η_{i} \\ \log H_{i} (T_{i}) & = \log (I (T_{i}; θ, k_{0})) + η_{i} \\ \log S_{i} (T_{i}) & = - I (T_{i}; θ, k_{0}) \exp (η_{i}) \\ \log F_{i} (T_{i}) & = \log [1 - \exp (- I (T_{i}; θ, k_{0}) \exp (η_{i}))] \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (- I (T_{i}; θ, k_{0}) \exp (η_{i})) - \exp (- I (T_{i}^{U}; θ, k_{0}) \exp (η_{i}))] \end{aligned} \end{aligned}

where $M (t; θ, k_{0})$ denotes a cubic M-spline function evaluated at time $t$ with regression coefficients $θ$ and basis evaluated using the vector of knot locations $k_{0}$ . Similarly, $I (t; θ, k_{0})$ denotes a cubic I-spline function (i.e. integral of an M-spline) evaluated at time $t$ with regression coefficients $θ$ and basis evaluated using the vector of knot locations $k_{0}$ .

B-spline model

The B-spline model is parameterised with vector of regression coefficients $θ$ for the log baseline hazard and with covariate effects introduced through a linear predictor $η_{i} = \sum_{p = 1}^{P} β_{p} x_{i p}$ . Note that there is no intercept in the linear predictor since it is absorbed into the spline function.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (B (T_{i}; θ, k_{0}) + η_{i}) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = B (T_{i}; θ, k_{0}) + η_{i} \end{aligned} \end{aligned}

The cumulative hazard, survival function, and CDF for the B-spline model cannot be calculated analytically. Instead, the model is only defined analytically on the hazard scale and quadrature is used to evaluate the following:

\begin{aligned} \begin{aligned} H_{i} (T_{i}) & = \int_{0}^{T_{i}} h_{i} (u) d u \\ S_{i} (T_{i}) & = \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) \\ F_{i} (T_{i}) & = 1 - \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) - \exp (- \int_{0}^{T_{i}^{U}} h_{i} (u) d u) \end{aligned} \end{aligned}
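To illustrate the idea, the sketch below defines a log hazard using a cubic B-spline basis from the splines package (which ships with R) and evaluates the cumulative hazard and survival probability by numerical integration. The knot locations and coefficients are hypothetical, and R's integrate function is used here only as a stand-in for the quadrature rule used internally by stan_surv.

```r
library(splines)  # ships with base R

knots  <- c(1, 2.5)  # hypothetical internal knots
bknots <- c(0, 5)    # hypothetical boundary knots

# log h_i(t) = B(t; theta, k_0) + eta_i
log_haz <- function(t, theta, eta) {
  basis <- bs(t, knots = knots, degree = 3, intercept = TRUE,
              Boundary.knots = bknots)
  as.vector(basis %*% theta) + eta
}
haz <- function(t, theta, eta) exp(log_haz(t, theta, eta))

# H_i(t) by quadrature, then S_i(t) = exp(-H_i(t))
cum_haz <- function(t, theta, eta) {
  integrate(haz, lower = 0, upper = t, theta = theta, eta = eta)$value
}
surv <- function(t, theta, eta) exp(-cum_haz(t, theta, eta))

# Six coefficients: 2 internal knots + cubic degree + intercept
theta <- c(-2.6, -2.4, -2.2, -2.0, -1.9, -1.8)
eta   <- 0.3
c(H = cum_haz(4, theta, eta), S = surv(4, theta, eta))
```

A useful check is that when all elements of theta are equal to the same constant, the B-spline basis sums to one and the model reduces to a constant hazard, for which the cumulative hazard is known exactly.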

Extension to time-varying coefficients (i.e. non-proportional hazards)

We can extend the previous model formulations to allow for time-varying coefficients (i.e. non-proportional hazards). The time-varying linear predictor is introduced on the hazard scale. That is, $η_{i}$ in our previous model definitions is instead replaced by $η_{i} (t)$ . This leads to an analytical form for the hazard and log hazard. However, in general, there is no longer a closed form expression for the cumulative hazard, survival function, or CDF. Therefore, when the linear predictor includes time-varying coefficients, quadrature is used to evaluate the following:

\begin{aligned} \begin{aligned} H_{i} (T_{i}) & = \int_{0}^{T_{i}} h_{i} (u) d u \\ S_{i} (T_{i}) & = \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) \\ F_{i} (T_{i}) & = 1 - \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- \int_{0}^{T_{i}} h_{i} (u) d u) - \exp (- \int_{0}^{T_{i}^{U}} h_{i} (u) d u) \end{aligned} \end{aligned}
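As an illustration, consider a Weibull baseline hazard combined with the piecewise constant log hazard ratio $β (t) = - 0.5 + 0.7 \times I (t > 2.5)$ from the earlier simulated example. The sketch below evaluates the cumulative hazard by quadrature, splitting the integral at the discontinuity. For this special case an exact expression is also available, which lets us check the numerical result. The function names are hypothetical, and integrate is again only a stand-in for the quadrature used by stan_surv.

```r
lambda <- 0.1  # Weibull scale
gamma  <- 1.5  # Weibull shape
brk    <- 2.5  # time at which the log hazard ratio changes

# Time-varying log hazard ratio and the resulting hazard for covariate x
beta_t <- function(t) -0.5 + 0.7 * (t > brk)
haz    <- function(t, x) gamma * t^(gamma - 1) * lambda * exp(beta_t(t) * x)

# Cumulative hazard by quadrature, split at the discontinuity in beta(t)
cum_haz_quad <- function(t, x) {
  if (t <= brk) return(integrate(haz, 0, t, x = x)$value)
  integrate(haz, 0, brk, x = x)$value + integrate(haz, brk, t, x = x)$value
}

# Exact cumulative hazard, available because beta(t) is piecewise constant
cum_haz_exact <- function(t, x) {
  lambda * exp(-0.5 * x) * pmin(t, brk)^gamma +
    lambda * exp(0.2 * x) * (pmax(t, brk)^gamma - brk^gamma)
}

c(quadrature = cum_haz_quad(4, x = 1), exact = cum_haz_exact(4, x = 1))
```

With a smooth (e.g. B-spline) time-varying coefficient there is generally no such exact expression, which is why quadrature is needed.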

Appendix B: Parameterisations under accelerated failure times

When basehaz is set equal to "exp-aft" or "weibull-aft" then the model is defined on the accelerated failure time scale using the following parameterisations.

Exponential model

The exponential model is parameterised with scale parameter $λ_{i} = \exp (- η_{i})$ where $η_{i} = β_{0}^{*} + \sum_{p = 1}^{P} β_{p}^{*} x_{i p}$ denotes our linear predictor.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = λ_{i} \\ & = \exp (- η_{i}) \\ H_{i} (T_{i}) & = T_{i} λ_{i} \\ & = T_{i} \exp (- η_{i}) \\ S_{i} (T_{i}) & = \exp (- T_{i} λ_{i}) \\ & = \exp (- T_{i} \exp (- η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (- T_{i} λ_{i}) \\ & = 1 - \exp (- T_{i} \exp (- η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- T_{i} λ_{i}) - \exp (- T_{i}^{U} λ_{i}) \\ & = \exp (- T_{i} \exp (- η_{i})) - \exp (- T_{i}^{U} \exp (- η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = \log λ_{i} \\ & = - η_{i} \\ \log H_{i} (T_{i}) & = \log (T_{i}) + \log λ_{i} \\ & = \log (T_{i}) - η_{i} \\ \log S_{i} (T_{i}) & = - T_{i} λ_{i} \\ & = - T_{i} \exp (- η_{i}) \\ \log F_{i} (T_{i}) & = \log (1 - \exp (- T_{i} λ_{i})) \\ & = \log (1 - \exp (- T_{i} \exp (- η_{i}))) \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (- T_{i} λ_{i}) - \exp (- T_{i}^{U} λ_{i})] \\ & = \log [\exp (- T_{i} \exp (- η_{i})) - \exp (- T_{i}^{U} \exp (- η_{i}))] \end{aligned} \end{aligned}

The definition of $λ$ for the baseline is:

\begin{aligned} \begin{aligned} λ_{0} = \exp (- β_{0}^{*}) ⟺ β_{0}^{*} = - \log (λ_{0}) \end{aligned} \end{aligned}

The relationships between coefficients under the PH (unstarred) and AFT (starred) parameterisations are as follows:

\begin{aligned} \begin{aligned} β_{0} & = - β_{0}^{*} \\ β_{p} & = - β_{p}^{*} \end{aligned} \end{aligned}

Lastly, the general form for the hazard function and survival function under an AFT model with acceleration factor $\exp (- η_{i})$ can be used to derive the exponential AFT model defined here by setting $h_{0} (t) = 1$ , $S_{0} (t) = \exp (- t)$ , and $λ_{i} = \exp (- η_{i})$ :

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (- η_{i}) h_{0} (T_{i} \exp (- η_{i})) \\ & = \exp (- η_{i}) \\ & = λ_{i} \end{aligned} \end{aligned}

\begin{aligned} \begin{aligned} S_{i} (T_{i}) & = S_{0} (T_{i} \exp (- η_{i})) \\ & = \exp (- T_{i} \exp (- η_{i})) \\ & = \exp (- T_{i} λ_{i}) \end{aligned} \end{aligned}

Weibull model

The Weibull model is parameterised with scale parameter $λ_{i} = \exp (- γ η_{i})$ and shape parameter $γ > 0$ where $η_{i} = β_{0}^{*} + \sum_{p = 1}^{P} β_{p}^{*} x_{i p}$ denotes our linear predictor.

For individual $i$ we have:

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = γ T_{i}^{γ - 1} λ_{i} \\ & = γ T_{i}^{γ - 1} \exp (- γ η_{i}) \\ H_{i} (T_{i}) & = T_{i}^{γ} λ_{i} \\ & = T_{i}^{γ} \exp (- γ η_{i}) \\ S_{i} (T_{i}) & = \exp (- T_{i}^{γ} λ_{i}) \\ & = \exp (- T_{i}^{γ} \exp (- γ η_{i})) \\ F_{i} (T_{i}) & = 1 - \exp (- {(T_{i})}^{γ} λ_{i}) \\ & = 1 - \exp (- {(T_{i})}^{γ} \exp (- γ η_{i})) \\ S_{i} (T_{i}) - S_{i} (T_{i}^{U}) & = \exp (- {(T_{i})}^{γ} λ_{i}) - \exp (- {(T_{i}^{U})}^{γ} λ_{i}) \\ & = \exp (- {(T_{i})}^{γ} \exp (- γ η_{i})) - \exp (- {(T_{i}^{U})}^{γ} \exp (- γ η_{i})) \end{aligned} \end{aligned}

or on the log scale:

\begin{aligned} \begin{aligned} \log h_{i} (T_{i}) & = \log (γ) + (γ - 1) \log (T_{i}) + \log λ_{i} \\ & = \log (γ) + (γ - 1) \log (T_{i}) - γ η_{i} \\ \log H_{i} (T_{i}) & = γ \log (T_{i}) + \log λ_{i} \\ & = γ \log (T_{i}) - γ η_{i} \\ \log S_{i} (T_{i}) & = - T_{i}^{γ} λ_{i} \\ & = - T_{i}^{γ} \exp (- γ η_{i}) \\ \log F_{i} (T_{i}) & = \log (1 - \exp (- {(T_{i})}^{γ} λ_{i})) \\ & = \log (1 - \exp (- {(T_{i})}^{γ} \exp (- γ η_{i}))) \\ \log (S_{i} (T_{i}) - S_{i} (T_{i}^{U})) & = \log [\exp (- {(T_{i})}^{γ} λ_{i}) - \exp (- {(T_{i}^{U})}^{γ} λ_{i})] \\ & = \log [\exp (- {(T_{i})}^{γ} \exp (- γ η_{i})) - \exp (- {(T_{i}^{U})}^{γ} \exp (- γ η_{i}))] \end{aligned} \end{aligned}

The definition of $λ$ for the baseline is:

\begin{aligned} \begin{aligned} λ_{0} = \exp (- γ β_{0}^{*}) ⟺ β_{0}^{*} = \frac{- \log (λ_{0})}{γ} \end{aligned} \end{aligned}

The relationships between coefficients under the PH (unstarred) and AFT (starred) parameterisations are as follows:

\begin{aligned} \begin{aligned} β_{0} & = - γ β_{0}^{*} \\ β_{p} & = - γ β_{p}^{*} \end{aligned} \end{aligned}

Lastly, the general form for the hazard function and survival function under an AFT model with acceleration factor $\exp (- η_{i})$ can be used to derive the Weibull AFT model defined here by setting $h_{0} (t) = γ t^{γ - 1}$ , $S_{0} (t) = \exp (- t^{γ})$ , and $λ_{i} = \exp (- γ η_{i})$ :

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (- η_{i}) h_{0} (T_{i} \exp (- η_{i})) \\ & = \exp (- η_{i}) γ {(T_{i} \exp (- η_{i}))}^{γ - 1} \\ & = \exp (- γ η_{i}) γ T_{i}^{γ - 1} \\ & = λ_{i} γ T_{i}^{γ - 1} \end{aligned} \end{aligned}

\begin{aligned} \begin{aligned} S_{i} (T_{i}) & = S_{0} (T_{i} \exp (- η_{i})) \\ & = \exp (- (T_{i} \exp (- η_{i}))^{γ}) \\ & = \exp (- T_{i}^{γ} [\exp (- η_{i})]^{γ}) \\ & = \exp (- T_{i}^{γ} \exp (- γ η_{i})) \\ & = \exp (- T_{i}^{γ} λ_{i}) \end{aligned} \end{aligned}
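The correspondence between the AFT and PH parameterisations of the Weibull model can be verified numerically. In the sketch below, the AFT coefficients and the shape parameter are hypothetical values chosen for illustration. We compute the survival probability once through the AFT form $S_{0} (t \exp (- η_{i}))$ and once through the PH form with coefficients $β = - γ β^{*}$ .

```r
gamma     <- 1.5                             # shape parameter
beta_star <- c(intercept = 1.2, trt = 0.4)   # hypothetical AFT coefficients
x         <- 1                               # covariate value

# AFT form: S_i(t) = S_0(t * exp(-eta_i)) with S_0(t) = exp(-t^gamma)
eta_aft  <- beta_star[["intercept"]] + beta_star[["trt"]] * x
surv_aft <- function(t) exp(-(t * exp(-eta_aft))^gamma)

# PH form: S_i(t) = exp(-t^gamma * exp(eta_i)) with beta = -gamma * beta_star
beta_ph <- -gamma * beta_star
eta_ph  <- beta_ph[["intercept"]] + beta_ph[["trt"]] * x
surv_ph <- function(t) exp(-t^gamma * exp(eta_ph))

times <- c(0.5, 2, 5)
all.equal(surv_aft(times), surv_ph(times))
```

Setting gamma equal to 1 in this sketch gives the corresponding check for the exponential model, where the PH and AFT coefficients differ only in sign.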

Extension to time-varying coefficients (i.e. time-varying acceleration factors)

We can extend the previous model formulations to allow for time-varying coefficients (i.e. time-varying acceleration factors).

The so-called “unmoderated” survival probability for an individual at time $t$ is defined as the baseline survival probability at time $t$ , i.e. $S_{i} (t) = S_{0} (t)$ . With a time-fixed acceleration factor, the survival probability for a so-called “moderated” individual is defined as the baseline survival probability but evaluated at “time $t$ multiplied by the acceleration factor $\exp (- η_{i})$ ”. That is, the survival probability for the moderated individual is $S_{i} (t) = S_{0} (t \exp (- η_{i}))$ .

However, with time-varying acceleration we cannot simply multiply time by a fixed (acceleration) constant. Instead, we must integrate the function for the time-varying acceleration factor over the interval $0$ to $t$ . In other words, we must evaluate:

\begin{aligned} \begin{aligned} S_{i} (t) = S_{0} (\int_{0}^{t} \exp (- η_{i} (u)) d u) \end{aligned} \end{aligned}

as described by Hougaard (1999).

Hougaard also gives a general expression for the hazard function under time-varying acceleration, as follows:

\begin{aligned} \begin{aligned} h_{i} (t) = \exp (- η_{i} (t)) h_{0} (\int_{0}^{t} \exp (- η_{i} (u)) d u) \end{aligned} \end{aligned}

Note that the hazard at time $t$ is in fact a function of the full history of covariates and parameters (i.e. the linear predictor) from time $0$ up until time $t$ . This is different to the hazard scale formulation of time-varying effects (i.e. non-proportional hazards). Under the hazard scale formulation with time-varying effects, the survival probability is a function of the full history between times $0$ and $t$ , but the hazard is not; instead, the hazard is only a function of covariates and parameters as defined at the current time. This is particularly important to consider when fitting accelerated failure time models with time-varying effects in the presence of delayed entry (i.e. left truncation).

For the exponential distribution, this leads to:

\begin{aligned} \begin{aligned} S_{i} (T_{i}) & = S_{0} (\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u) \\ & = \exp (- \int_{0}^{T_{i}} \exp (- η_{i} (u)) d u) \end{aligned} \end{aligned}

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (- η_{i} (T_{i})) h_{0} (\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u) \\ & = \exp (- η_{i} (T_{i})) \end{aligned} \end{aligned}

and for the Weibull distribution, this leads to:

\begin{aligned} \begin{aligned} S_{i} (T_{i}) & = S_{0} (\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u) \\ & = \exp (- {[\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u]}^{γ}) \end{aligned} \end{aligned}

\begin{aligned} \begin{aligned} h_{i} (T_{i}) & = \exp (- η_{i} (T_{i})) h_{0} (\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u) \\ & = \exp (- η_{i} (T_{i})) γ {[\int_{0}^{T_{i}} \exp (- η_{i} (u)) d u]}^{γ - 1} \end{aligned} \end{aligned}

The general expressions for the hazard and survival function under an AFT model with a time-varying linear predictor are used to evaluate the likelihood for the accelerated failure time model in stan_surv when time-varying effects are specified in the model formula. Specifically, quadrature is used to evaluate the cumulative acceleration factor $\int_{0}^{t} \exp (- η_{i} (u)) d u$ and this is then substituted into the relevant expressions for the hazard and survival.
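The following sketch illustrates this calculation for a Weibull AFT model, using the baseline functions $h_{0} (t) = γ t^{γ - 1}$ and $S_{0} (t) = \exp (- t^{γ})$ together with the general expressions given by Hougaard. The form of the time-varying linear predictor, the parameter values, and the function names are all hypothetical, and R's integrate function is only a stand-in for the quadrature used by stan_surv. As a check, the hazard is compared with a finite difference approximation to $- \frac{d}{d t} \log S_{i} (t)$ .

```r
gamma <- 1.5  # Weibull shape parameter

# Hypothetical time-varying linear predictor eta_i(u)
eta_t <- function(u) 1.0 + 0.1 * u

# Cumulative acceleration factor: integral from 0 to t of exp(-eta_i(u)) du
cum_accel <- function(t) {
  integrate(function(u) exp(-eta_t(u)), lower = 0, upper = t)$value
}

# Survival: S_0 evaluated at the cumulative acceleration factor
surv <- function(t) exp(-cum_accel(t)^gamma)

# Hazard: exp(-eta_i(t)) * h_0 evaluated at the cumulative acceleration factor
haz <- function(t) exp(-eta_t(t)) * gamma * cum_accel(t)^(gamma - 1)

# Check that the hazard equals -d/dt log S(t), using a central difference
t0 <- 2
h  <- 1e-4
fd <- -(log(surv(t0 + h)) - log(surv(t0 - h))) / (2 * h)
all.equal(haz(t0), fd, tolerance = 1e-6)
```

For this particular linear predictor the cumulative acceleration factor also has a closed form, $\exp (- 1) (1 - \exp (- 0.1 t)) / 0.1$ , which can be used to confirm the quadrature result.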

Estimating Survival (Time-to-Event) Models with rstanarm

Sam Brilleman

2019-06-18
