Restricted Mean Survival Time Estimation Methodology
Larry F. León
Source: vignettes/articles/weightedrmst_methodology.Rmd
Introduction
In clinical trials comparing two treatments on a time-to-event endpoint, the hazard ratio (HR) from a Cox proportional hazards model is the most commonly reported summary measure. However, the HR has well-known limitations: it requires the proportional hazards (PH) assumption for straightforward interpretation, and even when PH holds, the HR does not have a direct clinical interpretation in units of time (Uno et al. 2014). When PH is violated—as is increasingly recognized in, for example, immuno-oncology trials with delayed treatment effects or crossing survival curves—the estimated HR depends on the censoring distribution and may not converge to a meaningful population parameter (Royston and Parmar 2013).
The restricted mean survival time (RMST) provides an attractive alternative. The RMST up to a truncation time $\tau$ is defined as:
$$\mu(\tau) = E[\min(T, \tau)] = \int_0^{\tau} S(t)\,dt,$$
that is, the area under the survival curve $S(t) = P(T > t)$ of the event time $T$ from $0$ to $\tau$. This quantity has a clear clinical interpretation as the average time alive (or event-free) during the first $\tau$ units of follow-up. The RMST is model-free, requires no proportional hazards assumption, and is estimable directly from the Kaplan–Meier (KM) curve (Irwin 1949; Kaplan and Meier 1958).
Rather than evaluating the RMST at a single fixed $\tau$, Zhao et al. (2016) proposed studying the RMST curve as a function of $\tau$ across the range of follow-up, along with inference procedures (including simultaneous confidence bands) for a single RMST curve and for the difference of two RMST curves. This approach provides a comprehensive, temporally resolved picture of how the treatment benefit (or harm) accumulates over time.
This vignette reviews the RMST methodology as implemented in the
weightedsurv package, emphasizing the cumulative RMST curve
approach and its extension to weighted (propensity-score-adjusted)
settings via the cumulative_rmst_bands function.
RMST: Definition and Motivation
Definition and Relationship to the Survival Function
Let $T$ denote a non-negative random variable representing the time to an event of interest, with survival function $S(t) = P(T > t)$. The restricted mean survival time up to a truncation time $\tau$ is:
$$\mu(\tau) = E[\min(T, \tau)] = \int_0^{\tau} S(t)\,dt.$$
Geometrically, $\mu(\tau)$ is the area under the survival curve on the interval $[0, \tau]$.
The complementary quantity is the restricted mean time lost (RMTL):
$$\mathrm{RMTL}(\tau) = \tau - \mu(\tau) = \int_0^{\tau} \{1 - S(t)\}\,dt,$$
which is the area above the survival curve up to $\tau$. The RMTL represents the average time lost to the event during the interval $[0, \tau]$.
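The area-under-the-curve identity follows from writing the truncated time as an integral of indicators and exchanging expectation and integration. The sketch below writes $\mu(\tau)$ for the RMST and adds an exponential special case purely as an illustration; the rate $\lambda$ is an assumption introduced here, not part of the package.

```latex
\begin{aligned}
E[\min(T,\tau)]
  &= E\left[\int_0^{\tau} I(T > t)\,dt\right]
   = \int_0^{\tau} P(T > t)\,dt
   = \int_0^{\tau} S(t)\,dt, \\
S(t) = e^{-\lambda t}
  \;&\Rightarrow\;
  \mu(\tau) = \frac{1 - e^{-\lambda\tau}}{\lambda},
  \qquad
  \mathrm{RMTL}(\tau) = \tau - \frac{1 - e^{-\lambda\tau}}{\lambda}.
\end{aligned}
```

As $\tau \to \infty$ the exponential RMST tends to the unrestricted mean $1/\lambda$, which is a quick sanity check on the formula.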
Advantages Over the Hazard Ratio
The RMST has several advantages as a measure of treatment effect (Uno et al. 2014; Royston and Parmar 2013):
- Model-free: No parametric or semi-parametric model is required.
- Time-scale interpretation: Expressed in units of time, making the clinical significance transparent (e.g., “patients on treatment live on average 2.1 months longer during the first 40 months of follow-up”).
- Valid under non-proportional hazards: Unlike the HR, the RMST difference retains a clear interpretation regardless of the shape of the underlying hazard functions.
- Sensitivity to overall separation: When two survival curves separate for an extended period but eventually converge (e.g., crossing hazards), the HR-based test may fail to detect the difference, whereas the RMST captures the integrated separation of the curves.
As demonstrated by Uno et al. (2014), scenarios with crossing hazards can produce a non-significant logrank test and wide confidence interval for the HR, yet the RMST difference is statistically significant and clinically meaningful because it captures the cumulative separation of the survival curves.
Contrast Measures Based on RMST
For comparing two groups with survival functions $S_0(t)$ (control) and $S_1(t)$ (experimental) and corresponding RMSTs $\mu_0(\tau)$ and $\mu_1(\tau)$, three natural contrast measures can be defined:
- RMST difference: $\Delta(\tau) = \mu_1(\tau) - \mu_0(\tau)$, representing the gain (or loss) in mean event-free time.
- RMST ratio: $\mu_1(\tau) / \mu_0(\tau)$.
- RMTL ratio: $\{\tau - \mu_1(\tau)\} / \{\tau - \mu_0(\tau)\}$, which has an interpretation analogous to a risk ratio for the time lost.
The RMST difference equals $\int_0^{\tau} \{S_1(t) - S_0(t)\}\,dt$, the integrated area between the two survival curves up to $\tau$. This makes it geometrically intuitive: a positive $\Delta(\tau)$ means the experimental group survives longer on average during the first $\tau$ units.
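As a small worked example of the three contrasts, the sketch below uses exponential survival in both arms, where the RMST has a closed form. The hazard rates and the truncation time are hypothetical values chosen only for illustration.

```r
# Closed-form RMST for an exponential survival time with rate `lambda`:
# mu(tau) = integral_0^tau exp(-lambda * t) dt = (1 - exp(-lambda * tau)) / lambda
rmst_exp <- function(lambda, tau) (1 - exp(-lambda * tau)) / lambda

tau     <- 40      # truncation time in months (illustrative)
lambda0 <- 0.05    # control-arm hazard rate (illustrative)
lambda1 <- 0.03    # experimental-arm hazard rate (illustrative)

mu0 <- rmst_exp(lambda0, tau)
mu1 <- rmst_exp(lambda1, tau)

contrasts <- c(
  rmst_diff  = mu1 - mu0,                  # gain in mean event-free time
  rmst_ratio = mu1 / mu0,                  # relative gain in time alive
  rmtl_ratio = (tau - mu1) / (tau - mu0)   # relative reduction in time lost
)
print(round(contrasts, 3))
```

With a lower hazard in the experimental arm, the difference is positive, the RMST ratio exceeds one, and the RMTL ratio falls below one.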
Estimation
Nonparametric RMST Estimation via the Kaplan–Meier Curve
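The RMST is estimated by the area under the KM estimator $\hat S(t)$ up to $\tau$:
$$\hat\mu(\tau) = \int_0^{\tau} \hat S(t)\,dt.$$
Since $\hat S(t)$ is a step function, $\hat\mu(\tau)$ reduces to a sum of rectangular areas:
$$\hat\mu(\tau) = \sum_{j=0}^{D} \hat S(t_j)\,(t_{j+1} - t_j),$$
where $t_1 < \cdots < t_D$ are the distinct event times up to $\tau$, with the convention $t_0 = 0$ and $t_{D+1} = \tau$.
By the uniform consistency of the KM estimator (Gill 1983), $\hat\mu(\tau)$ is uniformly consistent for $\mu(\tau)$ over the interval $[0, \tau_{\max}]$, where $\tau_{\max}$ is a time point at which the probability of remaining under observation is still positive.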
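The rectangle-sum form of the estimator can be sketched in a few lines of base R. The function name `km_rmst` and the toy data are hypothetical; this is an illustration of the formula, not the code used by weightedsurv or survRM2.

```r
# Kaplan-Meier RMST by summing rectangles under the step function.
# `time`: observed follow-up times; `status`: 1 = event, 0 = censored.
km_rmst <- function(time, status, tau) {
  event_times <- sort(unique(time[status == 1 & time <= tau]))
  # Number at risk just before, and number of events at, each event time
  n_risk  <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(event_times, function(t) sum(time == t & status == 1), numeric(1))
  surv    <- cumprod(1 - n_event / n_risk)   # S-hat at t_1, ..., t_D
  grid    <- c(0, event_times, tau)          # t_0 = 0, ..., t_{D+1} = tau
  heights <- c(1, surv)                      # S-hat on [t_j, t_{j+1})
  sum(heights * diff(grid))
}

# Toy data (illustrative only)
time   <- c(2, 3, 3, 5, 7, 8, 10, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)
km_rmst(time, status, tau = 10)
```

Without censoring, the KM curve is the empirical survival function, so the estimate reduces to the sample mean of $\min(T_i, \tau)$; this gives a convenient check of the implementation.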
Variance Estimation
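The variance of $\hat\mu(\tau)$ is obtained by recognizing that the RMST estimator is a smooth functional of the KM estimator. By the counting process representation:
$$\hat S(t) - S(t) \approx -S(t) \int_0^{t} \frac{dM(u)}{Y(u)},$$
where $M(t) = N(t) - \int_0^{t} Y(u)\,d\Lambda(u)$ is the counting process martingale, $N(t) = \sum_{i=1}^{n} I(X_i \le t, \delta_i = 1)$, $Y(t) = \sum_{i=1}^{n} I(X_i \ge t)$, and $\Lambda(t)$ is the cumulative hazard, with $X_i$ the observed follow-up time and $\delta_i$ the event indicator for subject $i$ (Fleming and Harrington 1991, 98).
Applying the functional $\delta$-method via integration over $[0, \tau]$, the asymptotic variance of $\hat\mu(\tau)$ can be estimated by:
$$\widehat{\mathrm{Var}}\{\hat\mu(\tau)\} = \sum_{j=1}^{D} \left\{\int_{t_j}^{\tau} \hat S(t)\,dt\right\}^2 \frac{d_j}{Y_j\,(Y_j - d_j)},$$
where $d_j$ is the number of events at the $j$th distinct event time $t_j \le \tau$ and $Y_j$ is the number at risk just prior to $t_j$.
This formula follows from the same martingale central limit theorem that underpins Greenwood’s formula for $\widehat{\mathrm{Var}}\{\hat S(t)\}$, and is the standard variance estimator for RMST implemented in the survRM2 package and in weightedsurv.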
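A minimal base-R sketch of a Greenwood-type plug-in variance for the KM-based RMST follows. It assumes the usual form in which each event time contributes the squared area under the KM curve from that time to the truncation time, multiplied by d / (Y (Y - d)). The function name `km_rmst_var` is hypothetical, and the code is not taken from survRM2 or weightedsurv.

```r
# Greenwood-type plug-in variance for the KM-based RMST (illustrative sketch).
# `time`: observed follow-up times; `status`: 1 = event, 0 = censored.
km_rmst_var <- function(time, status, tau) {
  tj <- sort(unique(time[status == 1 & time <= tau]))   # distinct event times
  Y  <- vapply(tj, function(t) sum(time >= t), numeric(1))
  d  <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))
  S  <- cumprod(1 - d / Y)

  # Area under S-hat from each t_j to tau: reverse cumulative sum of rectangles
  widths      <- diff(c(tj, tau))
  area_to_tau <- rev(cumsum(rev(S * widths)))

  # The term is undefined (0 / 0) when everyone at risk fails; drop it
  keep <- Y > d
  list(
    rmst = sum(c(1, S) * diff(c(0, tj, tau))),
    var  = sum(area_to_tau[keep]^2 * d[keep] / (Y[keep] * (Y[keep] - d[keep])))
  )
}

# Toy data (illustrative only)
time   <- c(2, 3, 3, 5, 7, 8, 10, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)
fit <- km_rmst_var(time, status, tau = 10)
c(rmst = fit$rmst, se = sqrt(fit$var))
```

In the uncensored case this variance equals the sum of squared deviations of $\min(T_i, \tau)$ from their mean, divided by $n^2$, which offers a simple numerical check.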
The Cumulative RMST Curve
Motivation
While the RMST at a fixed $\tau$ is useful for study design and overall summarization, the choice of $\tau$ is often somewhat arbitrary. Zhao et al. (2016) proposed studying the entire RMST curve $\mu(\tau)$ as a function of $\tau$, rather than evaluating it at a single time point. The corresponding RMST difference curve:
$$\Delta(\tau) = \mu_1(\tau) - \mu_0(\tau), \quad \tau \in [0, \tau_{\max}],$$
reveals how the treatment benefit accumulates over follow-up. This temporal profile is clinically informative:
- An RMST difference that grows linearly suggests a sustained, constant survival advantage.
- An RMST difference that flattens indicates that the benefit was concentrated in an early period.
- An RMST difference that accelerates indicates a late-emerging benefit, as in delayed treatment effects.
This approach avoids the need to pre-specify a single $\tau$ and provides a richer characterization of the treatment effect trajectory than any single summary statistic.
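The shapes described above can be illustrated with known survival functions. The sketch below assumes a hypothetical delayed-effect scenario (identical hazards for the first 6 months, then a halved hazard in the experimental arm) and integrates the survival difference numerically; all rates are illustrative.

```r
# Control arm: constant hazard 0.05 per month (illustrative)
S0 <- function(t) exp(-0.05 * t)
# Experimental arm: same hazard until month 6, then hazard 0.025 (illustrative)
S1 <- function(t) ifelse(t <= 6, exp(-0.05 * t),
                         exp(-0.05 * 6 - 0.025 * (t - 6)))

taus <- seq(0, 36, by = 6)

# RMST difference = integrated area between the two survival curves up to tau
rmst_diff <- vapply(taus, function(tau) {
  if (tau == 0) return(0)
  integrate(function(t) S1(t) - S0(t), lower = 0, upper = tau)$value
}, numeric(1))

data.frame(
  tau       = taus,
  surv_diff = round(S1(taus) - S0(taus), 3),
  rmst_diff = round(rmst_diff, 3)
)
```

The RMST difference stays at zero while the curves coincide and then grows by increasing amounts per 6-month interval, which is the accelerating pattern expected under a delayed treatment effect.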
Asymptotic Theory for the RMST Curve
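By the functional $\delta$-method applied to the uniform consistency and weak convergence of the KM process, $\sqrt{n}\{\hat\mu(\tau) - \mu(\tau)\}$ converges weakly to a zero-mean Gaussian process on $[0, \tau_{\max}]$, where $\tau_{\max}$ is a time point at which the probability of remaining under observation is still positive (Zhao et al. 2016).
For the two-sample case, let $\hat\Delta(\tau) = \hat\mu_1(\tau) - \hat\mu_0(\tau)$ and $n = n_0 + n_1$. Then:
$$\sqrt{n}\{\hat\Delta(\tau) - \Delta(\tau)\} \xrightarrow{\;d\;} \mathcal{G}(\tau),$$
a zero-mean Gaussian process whose distribution can be approximated by the perturbation-resampling (martingale resampling) method.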
Simultaneous Confidence Bands via Martingale Resampling
The Perturbation-Resampling Principle
The key to constructing simultaneous confidence bands for the RMST curve is to approximate the distribution of the limiting Gaussian process. Following Lin, Wei, and Ying (1993), Parzen, Wei, and Ying (1994), and Zhao et al. (2016), the perturbation-resampling method proceeds as follows.
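The distribution of $\hat S(t) - S(t)$ is approximated, conditional on the data, by:
$$-\hat S(t) \sum_{i=1}^{n} G_i \int_0^{t} \frac{dN_i(u)}{Y(u)},$$
where $G_1, \ldots, G_n$ are i.i.d. standard normal random variables independent of the data, and $\hat S(\cdot)$, $N_i(\cdot)$, $Y(\cdot)$ denote the observed quantities, with $N_i(t) = I(X_i \le t, \delta_i = 1)$ the counting process of subject $i$.
The critical insight, formalized by Dobler, Beyersmann, and Pauly (2017), is that the martingale increments $dM_i(u)$ are represented by independent random variables $G_i$ multiplied by the observable counting process increments $dN_i(u)$, with unknown parameters replaced by consistent estimators.
For the RMST curve, one then considers the random process over $\tau \in [0, \tau_{\max}]$:
$$W^*(\tau) = -\int_0^{\tau} \hat S(t) \left\{\sum_{i=1}^{n} G_i \int_0^{t} \frac{dN_i(u)}{Y(u)}\right\} dt,$$
whose conditional distribution (given the data) approximates that of $\hat\mu(\tau) - \mu(\tau)$ (Zhao et al. 2016, equation (1)).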
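One way to see how a perturbed survival process turns into a perturbed RMST process is to interchange the order of integration. The sketch below assumes standard normal multipliers $G_i$, subject-level counting processes $N_i$, the at-risk process $Y$, observed times $X_i$ with event indicators $\delta_i$, and writes $W^*(\tau)$ for the perturbed RMST increment; this notation is chosen for illustration and may differ from the package internals.

```latex
\begin{aligned}
W^*(\tau)
  &= -\int_0^{\tau} \hat S(t)
      \left\{ \sum_{i=1}^{n} G_i \int_0^{t} \frac{dN_i(u)}{Y(u)} \right\} dt \\
  &= -\sum_{i=1}^{n} G_i \int_0^{\tau}
      \left\{ \int_u^{\tau} \hat S(t)\,dt \right\} \frac{dN_i(u)}{Y(u)} \\
  &= -\sum_{i:\,\delta_i = 1,\; X_i \le \tau}
      G_i \, \frac{\int_{X_i}^{\tau} \hat S(t)\,dt}{Y(X_i)} .
\end{aligned}
```

Each subject with an observed event before $\tau$ contributes a single term, so one realization of the whole curve needs only one draw of $G_1, \ldots, G_n$. Given the data, the variance of $W^*(\tau)$ is the sum of the squared coefficients, which matches the Greenwood-type variance up to replacing $Y_j (Y_j - d_j)$ by $Y_j^2$.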
Pointwise Confidence Intervals
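Let $\hat\sigma(\tau)$ denote the standard deviation estimate for the distribution of the perturbed process $W^*(\tau)$ (the resampling approximation to $\hat\mu(\tau) - \mu(\tau)$), obtained as the empirical standard deviation across $B$ independent realizations of $\{G_i\}$. For any $\tau$, a two-sided $100(1-\alpha)\%$ pointwise confidence interval for $\mu(\tau)$ is:
$$\hat\mu(\tau) \pm z_{1-\alpha/2}\,\hat\sigma(\tau),$$
where $z_{1-\alpha/2}$ is the standard normal quantile.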
Simultaneous Confidence Bands
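The simultaneous, equal-precision $100(1-\alpha)\%$ confidence band for $\mu(\tau)$ over $[\tau_1, \tau_2]$ is:
$$\hat\mu(\tau) \pm c_{\alpha}\,\hat\sigma(\tau),$$
where $\hat\sigma(\tau)$ is the resampling-based standard deviation estimate, $W^*(\tau)$ is the perturbed process, and the critical value $c_{\alpha}$ is chosen such that:
$$P\left\{\sup_{\tau \in [\tau_1, \tau_2]} \frac{|W^*(\tau)|}{\hat\sigma(\tau)} \le c_{\alpha} \,\Big|\, \text{data}\right\} = 1 - \alpha.$$
This is estimated empirically from $B$ independent realizations of the perturbation weights. The time interval $[\tau_1, \tau_2]$ is chosen such that the variance estimate is strictly positive at $\tau_1$ and subjects remain at risk at $\tau_2$, ensuring statistical validity. In practice, the weightedsurv package determines this interval from the observed quantiles of the event times (via the qtau parameter).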
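The resampling steps can be sketched for a single group in base R. The code assumes standard normal multipliers and simulated data, and it uses the fact that the perturbed RMST process is a weighted sum over observed events, with weight minus (area under the KM curve from the event time to the truncation time) divided by the number at risk. All object names are hypothetical, and this is an illustration rather than the weightedsurv implementation.

```r
set.seed(1)

# Simulated toy data (illustrative only)
n      <- 200
t_ev   <- rexp(n, rate = 0.05)
t_cens <- runif(n, min = 10, max = 60)
time   <- pmin(t_ev, t_cens)
status <- as.integer(t_ev <= t_cens)

taus  <- seq(5, 30, by = 1)   # evaluation grid over [tau_1, tau_2]
B     <- 500                  # number of perturbation draws
alpha <- 0.05

# Kaplan-Meier estimate as a right-continuous step function
tj   <- sort(unique(time[status == 1]))
Y    <- vapply(tj, function(t) sum(time >= t), numeric(1))
d    <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))
Sfun <- stepfun(tj, c(1, cumprod(1 - d / Y)))

# Area under S-hat on [a, b]
area <- function(a, b) {
  knots <- c(a, tj[tj > a & tj < b], b)
  sum(Sfun(knots[-length(knots)]) * diff(knots))
}

rmst_hat <- vapply(taus, function(tau) area(0, tau), numeric(1))

# Kernel matrix: one row per observed event, one column per tau
ev  <- which(status == 1)
Y_i <- vapply(time[ev], function(t) sum(time >= t), numeric(1))
K <- sapply(taus, function(tau) {
  vapply(seq_along(ev), function(i) {
    x <- time[ev[i]]
    if (x <= tau) -area(x, tau) / Y_i[i] else 0
  }, numeric(1))
})

# B realizations of W*(tau); each row uses fresh N(0, 1) multipliers
G <- matrix(rnorm(B * length(ev)), nrow = B)
W <- G %*% K

sigma_hat <- apply(W, 2, sd)                                   # pointwise SD
sup_stat  <- apply(sweep(abs(W), 2, sigma_hat, "/"), 1, max)   # sup |W*| / sigma
c_alpha   <- unname(quantile(sup_stat, 1 - alpha))             # band critical value

bands <- data.frame(
  tau      = taus,
  rmst     = rmst_hat,
  pw_lower = rmst_hat - qnorm(1 - alpha / 2) * sigma_hat,
  pw_upper = rmst_hat + qnorm(1 - alpha / 2) * sigma_hat,
  sb_lower = rmst_hat - c_alpha * sigma_hat,
  sb_upper = rmst_hat + c_alpha * sigma_hat
)
head(bands)
```

Because the same multipliers drive the whole curve within each draw, the supremum statistic reflects the correlation across truncation times, which is what distinguishes the simultaneous band from a collection of pointwise intervals.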
Two-Sample Simultaneous Bands
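For the RMST difference curve $\Delta(\tau) = \mu_1(\tau) - \mu_0(\tau)$, the procedure is analogous. Let $\hat\sigma_{\Delta}(\tau)$ denote the standard deviation estimate for the perturbed difference process $W^*_{\Delta}(\tau) = W^*_1(\tau) - W^*_0(\tau)$, obtained from $B$ independent realizations of the perturbation weights $\{G_i\}$. The simultaneous $100(1-\alpha)\%$ confidence band is:
$$\hat\Delta(\tau) \pm c^{\Delta}_{\alpha}\,\hat\sigma_{\Delta}(\tau),$$
where $c^{\Delta}_{\alpha}$ satisfies:
$$P\left\{\sup_{\tau \in [\tau_1, \tau_2]} \frac{|W^*_{\Delta}(\tau)|}{\hat\sigma_{\Delta}(\tau)} \le c^{\Delta}_{\alpha} \,\Big|\, \text{data}\right\} = 1 - \alpha.$$
This band provides uniform coverage: if the band excludes zero for an interval $[a, b] \subseteq [\tau_1, \tau_2]$, one can conclude that the treatment difference is significant over that entire range, simultaneously, at the $\alpha$ level. This is particularly useful for equivalence/noninferiority assessment, as demonstrated by Zhao et al. (2016) using the VALIANT cardiovascular trial.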
Extension to Weighted RMST Estimation
Causal Framework
In observational studies where treatment assignment depends on baseline covariates, the RMST must be estimated for counterfactual survival functions. Under the potential outcomes framework with treatment $A \in \{0, 1\}$ and potential survival time $T^{(a)}$, the population-level RMST is:
$$\mu^{(a)}(\tau) = E[\min(T^{(a)}, \tau)] = \int_0^{\tau} S^{(a)}(t)\,dt,$$
where $S^{(a)}(t) = P(T^{(a)} > t)$ is the counterfactual survival function.
Identification of $\mu^{(a)}(\tau)$ from observed data requires the same causal assumptions as for the weighted KM estimator: ignorability, random censoring, positivity, and consistency (see the companion vignette weightedkm_methodology for details).
Inverse Probability of Treatment Weighting
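Given a propensity score $e(L) = P(A = 1 \mid L)$ for baseline covariates $L$, the IPTW approach constructs weighted KM curves using stabilized weights:
$$w_i = \frac{A_i\,P(A = 1)}{e(L_i)} + \frac{(1 - A_i)\,P(A = 0)}{1 - e(L_i)}.$$
The weighted RMST estimator is simply the area under the weighted KM curve:
$$\hat\mu^{w}_{a}(\tau) = \int_0^{\tau} \hat S^{w}_{a}(t)\,dt,$$
where $\hat S^{w}_{a}(t)$ is the weighted KM estimator as described in the weightedkm_methodology vignette. The weighted RMST difference $\hat\Delta^{w}(\tau) = \hat\mu^{w}_{1}(\tau) - \hat\mu^{w}_{0}(\tau)$ estimates the causal RMST contrast $\mu^{(1)}(\tau) - \mu^{(0)}(\tau)$.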
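Stabilized weights of this kind can be computed from a fitted propensity model. The sketch below assumes a logistic regression propensity model on two hypothetical covariates (`x1`, `x2`) in simulated data; the column name `sw.weights` is chosen to match the workflow example in this vignette, and nothing here is taken from the package source.

```r
set.seed(42)

# Simulated observational data (illustrative only)
n     <- 500
x1    <- rnorm(n)                  # hypothetical continuous confounder
x2    <- rbinom(n, 1, 0.4)         # hypothetical binary confounder
treat <- rbinom(n, 1, plogis(-0.3 + 0.8 * x1 - 0.5 * x2))
df    <- data.frame(treat, x1, x2)

# Propensity score model: P(treat = 1 | x1, x2)
ps_fit <- glm(treat ~ x1 + x2, family = binomial(), data = df)
ps     <- fitted(ps_fit)           # estimated propensity scores
p_trt  <- mean(df$treat)           # marginal probability of treatment

# Stabilized weights: marginal probability over conditional probability
df$sw.weights <- ifelse(df$treat == 1,
                        p_trt / ps,
                        (1 - p_trt) / (1 - ps))

# Diagnostics: stabilized weights should average close to 1 in each arm
tapply(df$sw.weights, df$treat, mean)
summary(df$sw.weights)
```

Stabilizing by the marginal treatment probability keeps the weights centered near one, which typically reduces variance relative to unstabilized inverse-probability weights. Very large weights are a sign of positivity problems and are worth inspecting before any weighted survival analysis.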
Weighted Resampling for Simultaneous Bands
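The martingale resampling procedure extends naturally to the weighted setting. The weighted version of the resampled statistic for group $a$ replaces unweighted counting processes with their weighted counterparts:
$$W^{w*}_{a}(\tau) = \sum_{i:\,A_i = a} G_i \int_0^{\tau} K_a(u; \tau)\,\frac{w_i\,dN_i(u)}{Y^{w}_{a}(u)},$$
where $G_i$ are i.i.d. standard normal random variables, $Y^{w}_{a}(u) = \sum_{i:\,A_i = a} w_i\,I(X_i \ge u)$ is the weighted risk set, and $K_a(u; \tau) = -\int_u^{\tau} \hat S^{w}_{a}(t)\,dt$ is the kernel function appropriate for the integrated RMST functional.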
The weightedsurv package implements this weighted
resampling directly in the cumulative_rmst_bands function
when a weight.name argument is supplied, enabling
propensity-score-adjusted RMST curves with simultaneous confidence
bands.
Connection to Weighted KM Survival Differences
Complementary Perspectives
The simultaneous confidence bands for KM survival differences $\hat S_1(t) - \hat S_0(t)$ (computed by plotKM.band_subgroups) and the RMST difference curve $\hat\Delta(\tau)$ (computed by cumulative_rmst_bands) provide complementary perspectives:
- The survival difference at time $t$ captures the instantaneous gap in the probability of surviving beyond $t$.
- The RMST difference at time $\tau$ captures the cumulative, integrated gap in mean survival over $[0, \tau]$.
The RMST curve smooths the survival difference process through integration, which can make the treatment effect trajectory more stable and interpretable. When the survival difference fluctuates around zero (e.g., crossing survival curves), the RMST difference may still accumulate in one direction, providing a clearer signal.
As demonstrated in the weightedsurv_examples vignette, the typical workflow is to display both the survival difference bands and the cumulative RMST bands side-by-side: the former from plotKM.band_subgroups and the latter from cumulative_rmst_bands, using the shared fit object returned by plotKM.band_subgroups.
Testing Based on Weighted KM Differences
The WKM test statistic of Uno et al. (2015) provides a formal testing framework that directly compares two survival functions via weighted integration of the standardized KM difference:
$$\mathrm{WKM} = \int_0^{\tau} \hat w(t)\,\hat Z(t)\,dt,$$
where $\hat Z(t)$ is the standardized difference of two KM curves at each time point, and $\hat w(t)$ is a data-adaptive weight function. Uno et al. (2015) showed that their automatically-weighted tests outperform the logrank and other classical tests under many non-PH alternatives (including early, middle, late, and crossing differences) while maintaining competitive power under PH. The null distribution is obtained via the same perturbation-resampling framework used for RMST inference.
This testing approach is the natural companion to RMST estimation: the test provides formal evidence of a survival difference, while the RMST curve quantifies the magnitude and temporal profile of that difference.
Implementation in weightedsurv
The cumulative RMST methodology is implemented through the
cumulative_rmst_bands function, which works in concert with
the other analysis functions in the package.
cumulative_rmst_bands
Purpose. Computes cumulative RMST difference curves with pointwise and simultaneous confidence bands for two treatment groups, optionally incorporating subject-specific weights.
Key arguments:
- `df`: Data frame with survival data.
- `fit`: A fitted model object (typically the `$fit_itt` component returned by `plotKM.band_subgroups`).
- `tte.name`, `event.name`, `treat.name`: Column names for time-to-event, event indicator, and treatment indicator.
- `weight.name`: (Optional) Column name for subject-specific weights (e.g., stabilized propensity-score weights).
- `draws_sb`: Number of perturbation-resampling draws for simultaneous confidence bands (e.g., 1000).
- `xlab`: Label for the x-axis (typically `"months"`).
- `rmst_max_cex`: Character expansion factor for annotation of the RMST at the maximum truncation time.
Output. The function returns the estimated RMST difference curve along with pointwise and simultaneous confidence bands, and produces a publication-quality plot. The plot displays the estimated RMST difference as a solid line, pointwise 95% confidence intervals as dashed lines, and the simultaneous confidence band as a shaded region.
Typical Workflow
The standard analysis pipeline, as demonstrated in the
weightedsurv_examples vignette, follows this pattern:
1. Prepare data via `df_counting` or `get_dfcounting`, optionally with a `weight.name` for propensity-score-weighted analyses.
2. Compute survival difference bands via `plotKM.band_subgroups`, which returns a `fit` object alongside the KM difference plot.
3. Compute cumulative RMST bands via `cumulative_rmst_bands`, passing the `fit` object from step 2.
```r
# `df` is the analysis data frame prepared in step 1.

# Step 2: Compute survival difference bands
temp <- plotKM.band_subgroups(
  df = df, tte.name = "tte", treat.name = "treat",
  event.name = "event", weight.name = "sw.weights",
  draws.band = 1000, qtau = 0.025
)

# Step 3: Compute cumulative RMST bands, reusing the fit from step 2
get_bands <- cumulative_rmst_bands(
  df = df, fit = temp$fit_itt,
  tte.name = "tte", event.name = "event",
  treat.name = "treat", weight.name = "sw.weights",
  draws_sb = 1000, xlab = "months"
)
```

Running both steps on the same fit object ensures that the KM difference analysis and the RMST analysis share the same underlying fitted model and resampling framework, providing internally consistent inference.
Applications
RMST Under Non-Proportional Hazards
The RMST framework is especially valuable in clinical trials where the PH assumption is questionable. Royston and Parmar (2011) advocated for the RMST difference as a primary analysis measure when PH is in doubt, and Uno et al. (2014) demonstrated its practical advantages in oncology trials.
The cumulative RMST curve adds further diagnostic power: by visualizing $\hat\Delta(\tau)$ as a function of $\tau$, investigators can directly observe whether the treatment effect is immediate, delayed, or transient. This information is obscured by a single HR or even a single-$\tau$ RMST.
Equivalence and Noninferiority Assessment
Zhao et al. (2016) highlighted the utility of simultaneous RMST bands for equivalence and noninferiority trials. If the simultaneous band for $\Delta(\tau)$ lies within a pre-specified margin over the clinically relevant time interval, one can conclude equivalence at the $\alpha$ level, simultaneously for all $\tau$ in the interval. This is more informative than evaluating equivalence at a single pre-specified $\tau$.
Observational Studies with Propensity-Score Weighting
The weighted RMST analysis enables causal inference from
observational data when combined with IPTW. The
weightedsurv_examples vignette demonstrates this with the
Rotterdam breast cancer dataset, where propensity-score-weighted RMST
curves are compared to those from the randomized GBSG trial as a form of
external validation.
Summary
The restricted mean survival time provides a clinically interpretable, model-free, and robust summary of treatment effects in survival analysis. The key methodological elements are:
- RMST definition as the area under the survival curve, estimable from the KM estimator without parametric assumptions.
- Cumulative RMST curves that reveal the temporal profile of the treatment effect across all truncation times $\tau$, rather than at a single pre-specified point.
- Simultaneous confidence bands via the perturbation-resampling (martingale resampling) method, providing uniform inference over the entire time range of interest.
- Weighted extension for propensity-score-adjusted RMST estimation from observational data, using the same IPTW framework as the weighted KM estimator.
The weightedsurv package implements this methodology
through the cumulative_rmst_bands function, which
integrates seamlessly with the survival difference analysis provided by
plotKM.band_subgroups and the broader counting-process
infrastructure of the package.