Weighted Kaplan-Meier Estimation Methodology
Larry F. León
Source: vignettes/articles/weightedkm_methodology.Rmd

Introduction
In clinical trials and observational studies, the Kaplan–Meier (KM) estimator is the most widely used nonparametric method for estimating survival functions from censored time-to-event data. When treatment groups are formed by randomization, the standard KM estimator provides consistent estimation of treatment-specific survival distributions. However, in observational studies—where treatment assignment may depend on baseline covariates—direct KM comparisons can be misleading due to confounding.
Inverse probability of treatment weighting (IPTW) provides a principled approach to address confounding by creating a pseudo-population in which baseline covariates are balanced across treatment groups. The weighted Kaplan–Meier estimator, constructed by applying propensity-score-derived weights to the at-risk and event counting processes, yields consistent estimation of population-level (counterfactual) survival distributions under standard causal assumptions (Cole and Hernán 2004; Austin 2014).
This article reviews the methodology underlying weighted KM
estimation as implemented in the weightedsurv package,
connecting the classical counting-process framework to the causal
inference formulation. We draw extensively on the theoretical
foundations established by Deng and Wang
(2025) for the adjusted Nelson–Aalen estimator, as well as the
martingale resampling approach of Dobler,
Beyersmann, and Pauly (2017) used in weightedsurv
for simultaneous confidence bands.
Notation and Setup
Observed Data
We observe $n$ independent subjects from two treatment groups ($j = 0$ for control, $j = 1$ for active treatment). For each subject $i$, the observed data are $(A_i, Z_i, X_i, \Delta_i)$, where:
- $A_i \in \{0, 1\}$: treatment indicator,
- $Z_i$: vector of baseline covariates,
- $X_i = \min(T_i, C_i)$: observed follow-up time (the minimum of the event time $T_i$ and censoring time $C_i$),
- $\Delta_i = I(T_i \le C_i)$: event indicator.
Counting Process Formulation
In the counting process framework (Fleming and Harrington 1991), the individual-level processes are:
- At-risk process: $Y_i(t) = I(X_i \ge t)$,
- Event counting process: $N_i(t) = I(X_i \le t,\, \Delta_i = 1)$.
For the two groups $j = 0, 1$, the aggregate processes are:
$$N_j(t) = \sum_{i: A_i = j} N_i(t), \qquad Y_j(t) = \sum_{i: A_i = j} Y_i(t).$$
The Nelson–Aalen estimator of the cumulative hazard $\Lambda_j(t)$ for group $j$ is
$$\hat\Lambda_j(t) = \int_0^t \frac{dN_j(s)}{Y_j(s)},$$
and the Kaplan–Meier estimator is obtained via the product-limit transformation:
$$\hat S_j(t) = \prod_{s \le t} \left\{ 1 - d\hat\Lambda_j(s) \right\}.$$
Under random censoring within each treatment group, both estimators are consistent for their respective group-specific survival functions.
Standard (Unweighted) Kaplan–Meier Estimator
Definition
The Kaplan–Meier estimator (Kaplan and Meier 1958) is defined at the ordered event times $t_1 < t_2 < \cdots$ as:
$$\hat S_j(t) = \prod_{k: t_k \le t} \left( 1 - \frac{d_{jk}}{Y_j(t_k)} \right),$$
where $d_{jk}$ denotes the number of events in group $j$ at time $t_k$ and $Y_j(t_k)$ is the number at risk just before $t_k$.
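To make the product-limit arithmetic concrete, here is a minimal numeric sketch (in Python for portability; the package itself is R) on a made-up six-subject sample:

```python
import numpy as np

# Toy right-censored sample for one group: observed times X_i and
# event indicators delta_i (1 = event, 0 = censored).
X = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 8.0])
delta = np.array([1, 1, 0, 1, 0, 1])

event_times = np.unique(X[delta == 1])   # ordered distinct event times t_k
km_path, na_increments = [], []
s = 1.0
for t in event_times:
    Y = np.sum(X >= t)                   # number at risk just before t
    d = np.sum((X == t) & (delta == 1))  # number of events at t
    na_increments.append(d / Y)          # Nelson-Aalen increment dN/Y
    s *= 1.0 - d / Y                     # product-limit step
    km_path.append(s)

S_km = np.array(km_path)                 # Kaplan-Meier survival at each t_k
Lambda_hat = np.cumsum(na_increments)    # Nelson-Aalen cumulative hazard
```

At each event time the loop uses the number at risk just before that time, matching the definitions above.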
Martingale Representation
When the subjects in group $j$ share a common hazard $\lambda_j(t)$, the counting process martingale for subject $i$ is defined as:
$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\, \lambda_j(s)\, ds,$$
where $\lambda_j(t)$ is the hazard function and $\Lambda_j(t) = \int_0^t \lambda_j(s)\, ds$ is the cumulative hazard.
The process is a zero-mean martingale with respect to the natural filtration. This representation is central to deriving the variance of the KM estimator and forms the basis for the resampling methods described below.
Causal Framework for Weighted Estimation
Potential Outcomes and Causal Estimands
Following the potential outcomes framework, let $T^{(j)}$ denote the potential survival time under treatment $j$. The population-level counterfactual survival function is:
$$S_j(t) = P\{T^{(j)} > t\},$$
which represents the survival probability that would be observed if the entire population were assigned treatment . This is the target estimand for weighted KM estimation.
Identifying Assumptions
Identification of $S_j(t)$ from observed data requires the following assumptions, formalized for example by Deng and Wang (2025):
Assumption 1 (Ignorability). $T^{(j)} \perp\!\!\!\perp A \mid Z$, for $j = 0, 1$. Treatment assignment is independent of the potential outcomes given baseline covariates.
Assumption 2 (Completely random censoring). $C \perp\!\!\!\perp T^{(j)}$, for $j = 0, 1$. The censoring time is unconditionally independent of the event time. This assumption is essential for nonparametric estimation. As discussed by Deng and Wang (2025), it may be relaxed to conditional random censoring, but at the cost of requiring additional modeling of the censoring mechanism.
Assumption 3 (Positivity). $c \le e(Z) \le 1 - c$ for a constant $c \in (0, 1/2)$, where $e(Z) = P(A = 1 \mid Z)$ is the propensity score. Every subject has a positive probability of receiving either treatment.
Assumption 4 (Consistency). $T = A\,T^{(1)} + (1 - A)\,T^{(0)}$. The observed outcome equals the potential outcome corresponding to the treatment actually received.
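Under Assumptions 1–4, a short iterated-expectation argument (sketched here for treatment $j = 1$, ignoring censoring for clarity) shows why inverse weighting by the propensity score recovers the counterfactual survival function:

```latex
\begin{aligned}
E\!\left[\frac{\mathbf{1}\{A=1\}\,\mathbf{1}\{T>t\}}{e(Z)}\right]
&= E\!\left[\frac{\mathbf{1}\{T^{(1)}>t\}}{e(Z)}\,
      P\{A=1 \mid T^{(1)}, Z\}\right]
  && \text{(consistency, tower property)}\\
&= E\!\left[\frac{\mathbf{1}\{T^{(1)}>t\}}{e(Z)}\, e(Z)\right]
  && \text{(ignorability)}\\
&= P\{T^{(1)}>t\} = S_1(t).
  && \text{(positivity: } e(Z) > 0\text{)}
\end{aligned}
```

The analogous identity with weights $1/\{1 - e(Z)\}$ identifies $S_0(t)$.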
Weighted Kaplan–Meier Estimation via IPTW
Propensity Score and IPW Weights
The propensity score $e(Z) = P(A = 1 \mid Z)$ summarizes the covariate information relevant to treatment assignment (Rosenbaum and Rubin 1983). The inverse probability weights are defined as:
$$w_i = \frac{A_i}{e(Z_i)} + \frac{1 - A_i}{1 - e(Z_i)},$$
so that $w_i = 1/e(Z_i)$ for treated subjects and $w_i = 1/\{1 - e(Z_i)\}$ for controls.
Stabilized weights. To reduce variability, one commonly uses stabilized weights:
$$sw_i = \frac{\pi A_i}{e(Z_i)} + \frac{(1 - \pi)(1 - A_i)}{1 - e(Z_i)},$$
where $\pi = P(A = 1)$ is the marginal treatment probability. In practice, both $\pi$ and $e(Z)$ are estimated from the data. When stabilized weights are used, the weighted risk set at baseline equals the original sample size, which facilitates interpretation (Cole and Hernán 2004).
Weighted Counting Processes
By applying IPW weights to the individual counting processes, we construct weighted aggregate processes:
$$N_j^w(t) = \sum_{i: A_i = j} w_i\, N_i(t), \qquad Y_j^w(t) = \sum_{i: A_i = j} w_i\, Y_i(t),$$
where $w_i$ is the propensity score weight for subject $i$.
Adjusted Nelson–Aalen Estimator
The adjusted Nelson–Aalen estimator for the counterfactual cumulative hazard under treatment $j$ is defined as (Deng and Wang 2025; Winnett and Sasieni 2002):
$$\hat\Lambda_j^w(t) = \int_0^t \frac{dN_j^w(s)}{Y_j^w(s)}.$$
This is equivalent to the standard Nelson–Aalen formula applied to the weighted counting and at-risk processes.
Weighted KM Estimator
The weighted Kaplan–Meier estimator is obtained via the product-limit:
$$\hat S_j^w(t) = \prod_{s \le t} \left\{ 1 - d\hat\Lambda_j^w(s) \right\}.$$
When there is a single terminal event (no competing risks), the weighted KM estimator and the survival function derived from the adjusted Nelson–Aalen estimator are asymptotically equivalent (Deng and Wang 2025).
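The weighted product-limit construction can be sketched numerically. The helper below is illustrative only (Python; not the package API) and reduces to the ordinary KM estimator when all weights equal 1:

```python
import numpy as np

def weighted_km(X, delta, w):
    """Product-limit estimator applied to weighted risk sets / event counts."""
    times = np.unique(X[delta == 1])
    surv, s = [], 1.0
    for t in times:
        Yw = np.sum(w[X >= t])                    # weighted number at risk
        dNw = np.sum(w[(X == t) & (delta == 1)])  # weighted events at t
        s *= 1.0 - dNw / Yw
        surv.append(s)
    return times, np.array(surv)

X = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 8.0])
delta = np.array([1, 1, 0, 1, 0, 1])

# With unit weights, the weighted estimator reduces to the ordinary KM curve.
_, S_unit = weighted_km(X, delta, np.ones(6))

# Up-weighting one subject shifts the curve, as IPTW weights would.
_, S_wtd = weighted_km(X, delta, np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0]))
```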
Practical Weight Computation
In the weightedsurv package, propensity score weights
are computed as stabilized IPTW weights. A practical recipe, following
Cole and Hernán (2004) and as implemented
in the weightedsurv_examples vignette, is:
- Fit a propensity score model: $\hat e(Z_i) = \hat P(A_i = 1 \mid Z_i)$ via logistic regression.
- Compute the marginal probability: $\hat\pi = \hat P(A_i = 1)$ from an intercept-only model.
- Calculate stabilized weights:
$$\widehat{sw}_i = \frac{\hat\pi A_i}{\hat e(Z_i)} + \frac{(1 - \hat\pi)(1 - A_i)}{1 - \hat e(Z_i)}.$$
- (Optional) Truncate extreme weights to reduce variance, for example at the 5th and 95th percentiles.
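A numeric sketch of this recipe (Python, with a single binary covariate so the logistic-regression fit reduces to within-stratum treated proportions; all data and names here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
Z = rng.integers(0, 2, size=n)                    # one binary baseline covariate
A = rng.binomial(1, np.where(Z == 1, 0.7, 0.3))   # confounded treatment assignment

# Step 1: propensity model. With a single binary covariate, the logistic-
# regression MLE of e(Z) equals the treated proportion within each stratum.
e_hat = np.where(Z == 1, A[Z == 1].mean(), A[Z == 0].mean())

# Step 2: marginal treatment probability from an intercept-only model.
pi_hat = A.mean()

# Step 3: stabilized weights.
sw = np.where(A == 1, pi_hat / e_hat, (1 - pi_hat) / (1 - e_hat))

# Step 4 (optional): truncate extreme weights at chosen percentiles.
lo, hi = np.quantile(sw, [0.05, 0.95])
sw_trunc = np.clip(sw, lo, hi)

# With this saturated model the stabilized weights sum exactly to n,
# the baseline risk-set property noted above.
```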
Asymptotic Theory
Known Propensity Score
When the propensity score is known (as in a randomized trial with known allocation probabilities), Deng and Wang (2025) show that the weighted counting process admits a martingale decomposition. Specifically, for each subject $i$ in treatment group $j$:
$$w_i N_i(t) = \int_0^t w_i Y_i(s)\, d\Lambda_j(s) + w_i M_i(t),$$
where $M_i(t)$ is a martingale. This yields the finite-sample unbiasedness of $\hat\Lambda_j^w(t)$ for $\Lambda_j(t)$.
The variance of $\hat\Lambda_j^w(t)$ follows from martingale theory and can be estimated by
$$\widehat{\operatorname{Var}}\{\hat\Lambda_j^w(t)\} = \int_0^t \frac{\sum_{i: A_i = j} w_i^2\, dN_i(s)}{\{Y_j^w(s)\}^2}.$$
Estimated Propensity Score
In practice, the propensity score is unknown and must be estimated. A key contribution of Deng and Wang (2025) is deriving the influence function for $\hat\Lambda_j^w(t)$ when using an estimated propensity score $\hat e(Z)$. If the estimated propensity score is regular and asymptotically linear (RAL), the additional variance term due to propensity score estimation can be characterized.
The estimator admits an asymptotically linear expansion:
$$\sqrt{n}\left\{ \hat\Lambda_j^w(t) - \Lambda_j(t) \right\} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_{ij}(t) + o_p(1),$$
where $\psi_{ij}(t)$ is the influence function, which incorporates both the martingale term (from the counting process) and the correction term (from estimating $e(Z)$).
Importantly, Deng and Wang (2025) find through simulation that the additional variance contribution from the estimated propensity score is typically small. This is because the correction term takes the form of a weighted martingale whose expectation is nearly zero.
Variance Estimation in weightedsurv
In the weightedsurv package, standard errors for the
weighted KM estimator are computed via a robust variance formula
analogous to Greenwood’s formula, applied to the weighted counting
processes. These are validated against the adjustedCurves
package and survfit with case-weights (see the
weightedsurv_examples vignette for cross-checks).
For settings involving complex weighting and the simultaneous inference procedures described below, variance estimation via martingale resampling is preferred.
Inference for Survival Differences
Pointwise Confidence Intervals
For comparing two treatment groups, the survival difference $\Delta(t) = S_1(t) - S_0(t)$ is estimated by $\hat\Delta(t) = \hat S_1^w(t) - \hat S_0^w(t)$. Pointwise $(1 - \alpha)$ confidence intervals are constructed as:
$$\hat\Delta(t) \pm z_{1 - \alpha/2}\, \hat\sigma(t),$$
where $\hat\sigma^2(t)$ is obtained from the (independent) variance estimates of $\hat S_1^w(t)$ and $\hat S_0^w(t)$.
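A minimal sketch of this Wald construction, using hypothetical point estimates and variances at a single time point (Python):

```python
from statistics import NormalDist

# Hypothetical weighted KM summaries at one time point t:
S1, var1 = 0.62, 0.0009    # treated arm: estimate and variance of S1_hat(t)
S0, var0 = 0.48, 0.0012    # control arm
alpha = 0.05

diff = S1 - S0
se = (var1 + var0) ** 0.5  # independent arms, so the variances add
z = NormalDist().inv_cdf(1 - alpha / 2)
ci = (diff - z * se, diff + z * se)
```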
Simultaneous Confidence Bands via Martingale Resampling
Pointwise confidence intervals provide valid inference at any single time point, but the survival difference is evaluated across a range of time points $[\tau_1, \tau_2]$. Simultaneous confidence bands guarantee coverage uniformly over such an interval.
The weightedsurv package implements simultaneous
confidence bands using the martingale resampling approach of Dobler, Beyersmann, and Pauly (2017).
Martingale Resampling Principle
The key idea is to approximate the distribution of the centered process:
$$W_n(t) = \sqrt{n}\left\{ \hat\Delta(t) - \Delta(t) \right\}$$
by generating resampled versions of the process. Under the null (or conditionally on the observed data), the martingale increments are replaced by their observable counterparts multiplied by independent standard normal variates.
The resampled process replaces each subject's unobservable martingale increment $dM_i(s)$ by $G_i\, dN_i(s)$; for the group-$j$ cumulative hazard process, the building block of the survival difference, this gives:
$$\hat W_{n,j}^{(b)}(t) = \sqrt{n} \sum_{i: A_i = j} \int_0^t G_i^{(b)}\, \frac{dN_i(s)}{Y_j(s)},$$
where $G_1^{(b)}, \ldots, G_n^{(b)}$ are i.i.d. standard normal random variables, independent of the data.
The observable counting process increments serve as proxies for the unobservable martingale increments, and the Gaussian multipliers capture the stochastic variability. Unknown nuisance parameters are replaced by consistent estimates (“plug-in”).
Algorithm
The procedure, as implemented in plotKM.band_subgroups
and KM_diff, proceeds as follows:
- Compute the observed survival difference $\hat\Delta(t_k)$ at each event time $t_k$ within the quantile-trimmed range (controlled by the qtau parameter).
- For $b = 1, \ldots, B$ resamples:
  - Generate independent variates $G_1^{(b)}, \ldots, G_n^{(b)} \sim N(0, 1)$.
  - Compute the resampled survival difference process $\hat\Delta^{(b)}(t)$ by perturbing the counting process increments.
- For each resample $b$, compute the supremum statistic $\sup_{t_k} |\hat\Delta^{(b)}(t_k)| / \hat\sigma(t_k)$.
- The simultaneous band is:
$$\hat\Delta(t) \pm c_{1-\alpha}\, \hat\sigma(t),$$
where $c_{1-\alpha}$ is the $(1 - \alpha)$ quantile of the supremum statistic over the $B$ resamples.
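The critical-value mechanics reduce to a few lines once per-subject increment contributions are available. The sketch below uses a synthetic stand-in matrix xi for those contributions, so it illustrates only the sup-statistic and quantile steps, not the package's internal construction:

```python
import numpy as np

rng = np.random.default_rng(7)
n, B, alpha = 200, 1000, 0.05

# xi[i, k]: stand-in for subject i's estimated influence-process value at
# grid time t_k (in practice built from weighted martingale increments).
xi = np.cumsum(rng.normal(size=(n, 25)), axis=1)

sigma = np.sqrt(np.mean(xi ** 2, axis=0))      # plug-in pointwise scale

# Multiplier resampling: perturb each subject's contribution by G_i ~ N(0,1).
sup_stats = np.empty(B)
for b in range(B):
    G = rng.normal(size=n)
    W_b = (G[:, None] * xi).mean(axis=0) * np.sqrt(n)  # resampled process
    sup_stats[b] = np.max(np.abs(W_b) / sigma)         # studentized supremum

c = np.quantile(sup_stats, 1 - alpha)          # simultaneous critical value
```

The resulting band is wider than the pointwise interval because the critical value c exceeds the pointwise quantile of roughly 1.96; that gap is the price of uniform coverage over the time grid.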
Extension to Weighted Setting
The resampling framework naturally accommodates IPTW weights. In the
weighted case, the counting process increments are replaced by their
weighted counterparts, and the variance proxy accounts for the
propensity-score weights. This is operationally handled in
weightedsurv by passing a weight.name argument
to the relevant functions (e.g., plotKM.band_subgroups,
KM_diff, cumulative_rmst_bands).
Cumulative RMST Inference
RMST Definition
The restricted mean survival time (RMST) up to a truncation time $\tau$ is defined as:
$$\mu_j(\tau) = \int_0^\tau S_j(t)\, dt.$$
The RMST difference $\mu_1(\tau) - \mu_0(\tau)$ is an interpretable, model-free summary of the treatment effect (Uno et al. 2014).
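Because the KM curve is a step function, this integral is a finite sum of rectangle areas. A small Python sketch (the rmst helper is illustrative, not the package API):

```python
import numpy as np

def rmst(times, surv, tau):
    """Area under a step survival curve on [0, tau].

    times: jump points of the KM curve; surv: value of S(t) from each
    jump point up to the next one; S(t) = 1 before the first jump.
    """
    grid = np.concatenate(([0.0], times[times < tau], [tau]))
    steps = np.concatenate(([1.0], surv[times < tau]))  # S on each interval
    return float(np.sum(steps * np.diff(grid)))

times = np.array([2.0, 3.0, 5.0, 8.0])
surv = np.array([5/6, 2/3, 4/9, 0.0])  # a toy KM curve

mu = rmst(times, surv, tau=6.0)

# Evaluating rmst on a grid of tau values traces out a cumulative RMST
# curve; differencing two arms gives the RMST-difference curve.
curve = [rmst(times, surv, tau) for tau in np.linspace(1.0, 8.0, 8)]
```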
Cumulative RMST Curves
Rather than evaluating the RMST at a single $\tau$, the cumulative_rmst_bands function in weightedsurv computes the difference $\hat\mu_1(\tau) - \hat\mu_0(\tau)$ as a function of $\tau$ across the entire follow-up range. This provides a “cumulative RMST curve” that reveals how the treatment benefit (or harm) accumulates over time.
Simultaneous confidence bands for the RMST difference curve are obtained by the same martingale resampling approach, applied to the integrated survival difference process.
Connection to Related Approaches
Relationship to the Adjusted Nelson–Aalen Estimator
As formalized by Deng and Wang (2025),
the adjusted Nelson–Aalen estimator and the weighted KM estimator are
asymptotically equivalent for a single terminal event. The Nelson–Aalen
approach avoids product limits in the estimator and connects directly to
the hazard-based identification results. The weightedsurv
package works primarily with the product-limit (KM) form, which is more
commonly used in clinical practice and has the natural interpretation as
an estimated probability.
Weighted Risk Set Estimators
An alternative approach to estimating treatment-specific survival distributions from sequential randomization designs was proposed by Guo and Tsiatis (2005). The weighted risk set (WRS) estimator uses inverse probability weights within the risk sets of a Nelson–Aalen-type estimator. Miyahara and Wahed (2010) proposed weighted KM estimators (with both fixed and time-dependent weights) for two-stage treatment regimes and showed that both forms are asymptotically unbiased.
These WRS and WKM estimators share the same fundamental structure as the IPTW KM estimator: weight each subject’s contribution to the risk set and event count by an inverse probability weight reflecting the probability of following the treatment regime of interest.
Doubly-Robust Estimators
Bai, Tsiatis, and O’Brien (2013)
developed doubly-robust estimators for treatment-specific survival
distributions that are consistent if either the propensity
score model or a model for the survival distribution as a function of
covariates is correctly specified. These augmented IPTW estimators can
achieve greater efficiency than pure IPTW methods. While the current
weightedsurv implementation focuses on IPTW, the
counting-process infrastructure is amenable to such extensions.
Covariate-Adjusted Log-Rank Tests
In the randomized trial setting, Ye, Shao, and
Yi (2024) developed covariate-adjusted log-rank tests with
guaranteed efficiency gains over the unadjusted test. A key result is
the universal applicability of their method to different randomization
schemes (simple randomization, stratified permuted block, Pocock–Simon
minimization). The weighted log-rank statistics in
weightedsurv are complementary: while the package’s
Fleming–Harrington class statistics target non-proportional hazards
alternatives, the covariate-adjusted framework of Ye, Shao, and Yi (2024) targets efficiency gains
through covariate adjustment under any hazard structure.
Adjusted Nelson–Aalen with Retrospective Matching
Winnett and Sasieni (2002) developed a weighted Nelson–Aalen estimator for retrospectively matched data, with stratum-specific random effects. Their work highlighted that ignoring heterogeneity from matched strata leads to variance underestimation—a cautionary result relevant to any weighted survival analysis where the source of weights introduces additional variability.
Covariate-Adjusted Logrank Tests and Working Models
Kong and Slud (1997) showed how to
construct robust covariate-adjusted logrank statistics by substituting
maximum partial likelihood estimators from various working models,
providing valid hypothesis tests through robust variance estimators even
under model misspecification. This work provides theoretical foundation
for the approach taken in weightedsurv where working models
need not be exactly correctly specified.
Sample Size Formulae for Weighted Designs
Li and Murphy (2011) developed sample size formulae for two-stage randomized trials with survival outcomes based on both a weighted KM estimator of survival probabilities at a fixed time-point and a weighted version of the log-rank test. Their conservative formulae demonstrate that the weighted KM framework extends naturally to the design phase of clinical trials.
Implementation in weightedsurv
The weighted KM methodology described above is implemented through
several interconnected functions in the weightedsurv
package. The key functions and their roles are:
- df_counting / get_dfcounting: Prepare the counting-process dataset from raw survival data. When a weight.name argument is supplied, these functions construct the weighted risk sets and event counts needed for weighted KM estimation.
- plot_weighted_km: Plots weighted (or unweighted) KM survival curves for two groups, with optional confidence intervals, risk tables, and summary statistics (log-rank p-value, Cox HR).
- KM_diff: Computes the KM survival difference $\hat S_1(t) - \hat S_0(t)$, with both pointwise and simultaneous confidence intervals. Accepts arbitrary non-negative weights for IPTW.
- plotKM.band_subgroups: Produces publication-ready plots of the survival difference with simultaneous confidence bands via martingale resampling.
- cumulative_rmst_bands: Computes cumulative RMST difference curves with simultaneous confidence bands, again using the martingale resampling framework.
For comprehensive examples demonstrating both the randomized trial
(GBSG) and observational (Rotterdam) settings, including
cross-validation of weighted SEs against the adjustedCurves
and survfit packages, see the
weightedsurv_examples vignette.
Summary
The weighted KM estimator provides a principled nonparametric approach to estimating population-level survival distributions from observational data via IPTW. The methodology rests on:
- Causal identification via the ignorability, completely random censoring, positivity, and consistency assumptions.
- Counting process formulation enabling martingale-based variance estimation and resampling inference.
- Martingale resampling for simultaneous confidence bands, extending naturally to the weighted setting and to cumulative RMST curves.
The weightedsurv package provides an integrated
implementation of these methods, with careful attention to numerical
validation against established packages.