Biomarker Treatment Effects in the Weibull AFT/Cox Framework
Theory, Estimands, and Causal Foundations
ForestSearch Package Documentation
2026-02-20
Source: vignettes/articles/biomarker_effects.Rmd
Introduction
This vignette develops the mathematical framework for biomarker-dependent treatment effects used in the ForestSearch simulation machinery. The central idea is a piecewise-linear spline model embedded in a Weibull regression that is simultaneously an accelerated failure time (AFT) model and a Cox proportional hazards model. This dual representation allows treatment effects to be specified as causal log-hazard-ratios on the Cox scale while survival times are generated on the AFT scale.
Three treatment effect estimands are defined for summarizing biomarker-modulated heterogeneity:
- the Average Hazard Ratio (AHR) — a geometric-mean summary on the log-hazard scale,
- the Controlled Direct Effect (CDE) — a ratio of average hazards on the natural scale (Aalen, Cook & Roysland, 2015),
- the Marginal (causal) HR — a population-level Cox estimate from stacked potential outcomes.
The first two are deterministic functions of the data-generating mechanism (DGM); the third serves as a stochastic validation target. Their theoretical relationships, conditions for agreement and divergence, and causal standing are developed below.
The Weibull AFT/Cox Model
Weibull Distribution Essentials
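For the Weibull distribution with shape parameter $\nu > 0$ and scale parameter $\theta > 0$, the density, survival, hazard, and cumulative hazard functions are:

$$f(t) = \nu\,\theta^{-\nu}\,t^{\nu-1}\exp\{-(t/\theta)^{\nu}\}, \quad S(t) = \exp\{-(t/\theta)^{\nu}\}, \quad \lambda(t) = \nu\,\theta^{-\nu}\,t^{\nu-1}, \quad \Lambda(t) = (t/\theta)^{\nu}.$$

For $\nu < 1$ the hazard is strictly decreasing; for $\nu > 1$ it is strictly increasing. A useful distributional identity is that the cumulative hazard evaluated at the random variable $T$ is unit-exponential:

$$Q = \Lambda(T) = (T/\theta)^{\nu} \sim \mathrm{Exp}(1).$$

Probability-Integral Property. This follows because $S(T) = \exp\{-\Lambda(T)\} \sim \mathrm{Uniform}(0, 1)$, so $\Lambda(T) = -\log S(T)$ is unit-exponential.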
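The unit-exponential identity can be checked by simulation. The sketch below uses base R's `rweibull()`, whose shape/scale parameterization is the standard one with cumulative hazard $(t/\text{scale})^{\text{shape}}$. The names `nu` and `theta` and their values are illustrative assumptions, not ForestSearch settings.

```r
# Monte Carlo check: the cumulative hazard evaluated at T should be Exp(1).
set.seed(1)
nu    <- 1.5   # Weibull shape (illustrative value)
theta <- 2.0   # Weibull scale (illustrative value)

t_sim <- rweibull(1e5, shape = nu, scale = theta)
q     <- (t_sim / theta)^nu        # cumulative hazard at the simulated times

# An Exp(1) variable has mean 1 and variance 1
c(mean = mean(q), var = var(q))

# Formal comparison with the Exp(1) distribution function
ks <- ks.test(q, "pexp", rate = 1)
ks$p.value
```

A sample mean and variance both near 1 are what the identity predicts; the Kolmogorov-Smirnov test is only a rough diagnostic here.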
The Dual AFT / Cox Representation
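Writing $Q = (T/\theta)^{\nu} \sim \mathrm{Exp}(1)$, with $\nu$ and $\theta$ the Weibull shape and scale, gives $\log Q = \nu(\log T - \log\theta)$, which rearranges to the AFT form:

$$\log T = \log\theta + \tau\varepsilon,$$

where $\tau = 1/\nu$ is the AFT scale parameter and $\varepsilon = \log Q$ has the standard extreme-value distribution with density $f(\varepsilon) = \exp(\varepsilon - e^{\varepsilon})$.
Incorporating a covariate vector $x$, the Cox (hazard) parameterization is:

$$\lambda(t \mid x) = \lambda_0(t)\exp(x^{\top}\beta),$$

and the corresponding AFT parameterization is:

$$\log T = \mu + x^{\top}\gamma + \tau\varepsilon, \tag{3}$$

where the two coefficient vectors are linked by the AFT-to-hazard transformation:

$$\beta = -\gamma/\tau. \tag{4}$$

AFT / Hazard-Scale Transformation. R's `survreg` estimates $(\hat\mu, \hat\gamma, \hat\tau)$ on the AFT scale; hazard-ratio coefficients $\hat\beta = -\hat\gamma/\hat\tau$ are obtained via (4).
This transformation is the linchpin of the framework: potential outcomes are simulated on the AFT scale (3), while treatment effects are interpreted on the hazard-ratio scale via $\beta = -\gamma/\tau$.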
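As a minimal sketch of the dual representation, the code below simulates uncensored times on the AFT scale, fits the model with `survival::survreg()`, and converts the treatment coefficient to the hazard scale by dividing by the fitted `scale` and flipping the sign. The parameter values and variable names (`mu`, `gamma`, `tau`, `obs_time`) are assumptions chosen for illustration; the Cox fit is included only as a cross-check.

```r
library(survival)

set.seed(2)
n     <- 5000
treat <- rbinom(n, 1, 0.5)
mu    <- 2; gamma <- 0.4; tau <- 0.7     # illustrative AFT parameters

eps      <- log(rexp(n))                 # standard extreme-value errors
obs_time <- exp(mu + gamma * treat + tau * eps)
event    <- rep(1, n)                    # no censoring in this sketch

# AFT fit: coefficients are on the log-time scale, fit$scale estimates tau
fit_aft  <- survreg(Surv(obs_time, event) ~ treat, dist = "weibull")
beta_hat <- -coef(fit_aft)[["treat"]] / fit_aft$scale

# Cox fit of the same data targets the same log-hazard-ratio
fit_cox  <- coxph(Surv(obs_time, event) ~ treat)
beta_cox <- coef(fit_cox)[["treat"]]

c(true = -gamma / tau, from_aft = beta_hat, from_cox = beta_cox)
```

With a positive AFT coefficient (longer survival under treatment), the converted log-hazard-ratio is negative, in line with the transformation's sign flip.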
Biomarker-Dependent Treatment Effects
The Two-Phase Spline Model
To induce treatment effects that vary with a continuous biomarker $z$, the Cox linear predictor is specified as a piecewise-linear spline with a single interior knot at $k$:

$$\eta(A, z) = \beta_1 A + \beta_2 z + \beta_3 zA + \beta_4 (z-k)_+ + \beta_5 (z-k)_+ A, \tag{5}$$

where $A \in \{0, 1\}$ denotes the treatment indicator and $(u)_+ = \max(u, 0)$. The five terms have the following roles:
| Parameter | Term | Role |
|---|---|---|
| $\beta_1$ | $A$ | Main treatment effect (intercept of log-HR) |
| $\beta_2$ | $z$ | Prognostic biomarker effect (both arms) |
| $\beta_3$ | $zA$ | Biomarker $\times$ treatment interaction (slope of log-HR for $z \le k$) |
| $\beta_4$ | $(z-k)_+$ | Spline term: change in prognostic slope above $k$ |
| $\beta_5$ | $(z-k)_+ A$ | Spline term: change in treatment-effect slope above $k$ |
The Causal Log-Hazard-Ratio Function
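Under the potential outcomes framework, let $\lambda(t \mid a, z)$ denote the hazard function for a subject with biomarker level $z$ had they followed treatment $a \in \{0, 1\}$. The causal log-hazard-ratio at biomarker level $z$ is:

$$\beta(z) = \log\frac{\lambda(t \mid 1, z)}{\lambda(t \mid 0, z)} = \beta_1 + \beta_3 z + \beta_5 (z-k)_+, \tag{6}$$

with $(z-k)_+ = \max(z-k, 0)$. Because the baseline hazard $\lambda_0(t)$ cancels in the ratio, $\beta(z)$ is free of $t$, a direct consequence of the proportional hazards structure conditional on $z$. Expanding (6) piecewise:

$$\beta(z) = \begin{cases} \beta_1 + \beta_3 z, & z \le k, \\ (\beta_1 - \beta_5 k) + (\beta_3 + \beta_5)\,z, & z > k. \end{cases}$$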
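The cancellation of the prognostic terms can be seen numerically. The following sketch codes the five-term linear predictor and the resulting log-HR as two small R functions; the function names (`lin_pred`, `loghr_z`) and all coefficient values are hypothetical and are not taken from ForestSearch or from the working example.

```r
# Five-term spline linear predictor; b = c(b1, b2, b3, b4, b5), knot k
lin_pred <- function(a, z, b, k) {
  zk <- pmax(z - k, 0)                       # (z - k)_+
  b[1] * a + b[2] * z + b[3] * z * a + b[4] * zk + b[5] * zk * a
}

# Causal log-HR: only the treatment-related terms survive the difference
loghr_z <- function(z, b1, b3, b5, k) {
  b1 + b3 * z + b5 * pmax(z - k, 0)
}

b <- c(0.5, 0.1, -0.05, 0.05, -0.3)          # hypothetical coefficients
k <- 5
z_grid <- c(0, 2, 5, 8)

diff_pred <- lin_pred(1, z_grid, b, k) - lin_pred(0, z_grid, b, k)
direct    <- loghr_z(z_grid, b[1], b[3], b[5], k)
cbind(z = z_grid, diff_pred, direct)         # the two columns agree
```

The prognostic coefficients `b[2]` and `b[4]` can be changed freely without altering either column, which is the sense in which the log-HR depends only on the treatment-related terms.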
Anchor-Point Parameterization
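Rather than specifying $(\beta_1, \beta_3, \beta_5)$ directly, the framework specifies the log-HR at three interpretable anchor points, $\beta(0)$, $\beta(k)$, and $\beta(\zeta)$ for some $\zeta > k$, and solves:

$$\beta_1 = \beta(0), \qquad \beta_3 = \frac{\beta(k) - \beta(0)}{k}, \qquad \beta_5 = \frac{\beta(\zeta) - \beta(k)}{\zeta - k} - \beta_3. \tag{7}$$

Verification. These formulas are verified by substitution. For instance, evaluating $\beta(\zeta)$ from (6) gives $\beta_1 + \beta_3\zeta + \beta_5(\zeta - k)$; solving for $\beta_5$ and using $\beta(k) = \beta_1 + \beta_3 k$ recovers the third expression in (7) exactly. The same check holds at $z = 0$ and $z = k$, confirming internal consistency.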
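The substitution check can also be run as a round trip in code. This sketch assumes the three anchors sit at $z = 0$, the knot $k$, and a point `zeta` above the knot; the helper name `solve_anchors` and the anchor hazard ratios are hypothetical, not the values of the working example.

```r
# Solve for the treatment-related spline coefficients from three anchor log-HRs
solve_anchors <- function(loghr_0, loghr_k, loghr_zeta, k, zeta) {
  stopifnot(k > 0, zeta > k)
  b1 <- loghr_0                                   # log-HR at z = 0
  b3 <- (loghr_k - loghr_0) / k                   # slope below the knot
  b5 <- (loghr_zeta - loghr_k) / (zeta - k) - b3  # change in slope above the knot
  c(b1 = b1, b3 = b3, b5 = b5)
}

# Hypothetical anchors: HR 2.0 at z = 0, 1.2 at the knot, 0.5 at zeta
k <- 5; zeta <- 10
anchors_hr <- c(2.0, 1.2, 0.5)
co <- solve_anchors(log(anchors_hr[1]), log(anchors_hr[2]), log(anchors_hr[3]),
                    k = k, zeta = zeta)

# Round trip: evaluate the spline log-HR back at the anchor points
z_anchor  <- c(0, k, zeta)
hr_at_anc <- exp(co[["b1"]] + co[["b3"]] * z_anchor +
                 co[["b5"]] * pmax(z_anchor - k, 0))
hr_at_anc   # reproduces the anchor HRs up to floating-point error
```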
Connection to Potential-Outcome Means
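There is an important link between the log-HR and the conditional means of the potential log-survival times. Define $\mu_a(z) = E[\log T(a) \mid Z = z]$ under the AFT model (3), with AFT scale parameter $\tau$. Then:

$$\beta(z) = \frac{\mu_0(z) - \mu_1(z)}{\tau}.$$

Derivation. On the AFT scale the treatment-related terms in the linear predictor contribute $-\tau\,\beta(z)\,a$ to $\log T(a)$. Hence the difference in conditional means between control ($a = 0$) and treatment ($a = 1$) at biomarker level $z$ is:

$$\mu_0(z) - \mu_1(z) = \tau\,\beta(z),$$

which gives $\beta(z) = \{\mu_0(z) - \mu_1(z)\}/\tau$ as required. This confirms that $\beta(z)$ measures the causal difference in expected log-survival between the two treatment arms, scaled by $1/\tau$.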
Sign Convention. Throughout this vignette, $\beta(z) < 0$ means treatment is beneficial (HR $< 1$), while $\beta(z) > 0$ means treatment is detrimental (HR $> 1$). This follows the standard Cox model convention where negative log-HR favors the experimental arm.
Extension to Prognostic Factors
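When an additional baseline prognostic factor $w$ is included, the linear predictor becomes:

$$\eta(A, z, w) = \beta_1 A + \beta_2 z + \beta_3 zA + \beta_4 (z-k)_+ + \beta_5 (z-k)_+ A + \beta_w w.$$

The prognostic term $\beta_w w$ does not interact with treatment, so the causal log-HR remains:

$$\beta(z) = \beta_1 + \beta_3 z + \beta_5 (z-k)_+,$$

i.e., it depends on $z$ but not on $w$. When prognostic factors do interact with treatment, the log-HR would depend on $(z, w)$ jointly, and the AHR/CDE definitions below would average over both.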
Working Example
A representative configuration for an oncology biomarker setting:
| Anchor | HR | Interpretation | ||
|---|---|---|---|---|
| Low | Strongly detrimental | |||
| Knot | Modestly detrimental | |||
| High | Strongly beneficial |
This profile produces a biomarker-modulated treatment effect that transitions from harm at low biomarker levels to substantial benefit at high levels, crossing the null (HR $= 1$) between the knot and the high anchor. The overall average hazard ratio across the full biomarker distribution is below 1, indicating net benefit in the population.
Treatment Effect Estimands
Three estimands are used to summarize treatment effects across biomarker-defined sub-populations. Each captures a different aspect of treatment effect heterogeneity.
Average Hazard Ratio (AHR)
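The biomarker AHR is the exponentiated mean of the causal log-HR $\beta(z)$ across a biomarker-threshold sub-population $S$:

$$\mathrm{AHR}(S) = \exp\{E[\beta(Z) \mid Z \in S]\}.$$

The AHR is the geometric mean of the individual causal hazard ratios $e^{\beta(z_i)}$ across the sub-population. In ForestSearch's super-population data, this is computed from the `loghr_po` column as `exp(mean(loghr_po))` over the rows in $S$.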
Key properties.
- Deterministic: depends only on the model coefficients and covariate distribution, not on the random error $\varepsilon$.
- Reproducible: identical across simulation replications for a fixed super-population.
- Dual causal status: as a geometric-mean ratio of potential-outcome hazards, the AHR achieves both individual-level and population-level causal interpretability under the Fay & Li (2024) taxonomy. This is because $\log$ linearizes the ratio, so $E[\log(\lambda_1/\lambda_0)] = E[\log\lambda_1] - E[\log\lambda_0]$, and the order of comparison and summarization is irrelevant.
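A minimal sketch of the AHR calculation follows. It assumes a vector of individual log-HRs analogous to the `loghr_po` column; the biomarker grid, the log-HR profile, and the helper `ahr()` are hypothetical stand-ins rather than ForestSearch objects.

```r
# Hypothetical super-population: biomarker on an even grid, spline log-HR
z        <- seq(0, 10, length.out = 1001)
loghr_po <- 0.5 - 0.05 * z - 0.3 * pmax(z - 5, 0)

# AHR = exp(mean of individual log-HRs) within a subgroup
ahr <- function(loghr, keep = rep(TRUE, length(loghr))) {
  exp(mean(loghr[keep]))
}

ahr_all  <- ahr(loghr_po)                  # full population
ahr_high <- ahr(loghr_po, keep = z >= 5)   # biomarker-threshold subgroup

# Same quantity written explicitly as a geometric mean of individual HRs
gm_all <- prod(exp(loghr_po))^(1 / length(loghr_po))

c(ahr_all = ahr_all, gm_all = gm_all, ahr_high = ahr_high)
```

No random errors enter the calculation, which is why the AHR is identical across replications once the super-population is fixed.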
Controlled Direct Effect (CDE)
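The CDE is the ratio of average exponentiated log-hazards under treatment versus control (Aalen, Cook & Roysland, 2015), taken over a sub-population $S$:

$$\mathrm{CDE}(S) = \frac{E[\exp\{\eta_1(Z)\} \mid Z \in S]}{E[\exp\{\eta_0(Z)\} \mid Z \in S]},$$

where $\eta_a(z)$ is the Cox linear predictor at treatment $a$ and biomarker $z$.
In ForestSearch, this is computed from the `theta_1` and `theta_0` columns as `mean(exp(theta_1)) / mean(exp(theta_0))` over the rows in $S$.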
Key properties.
- Deterministic (like the AHR): no dependence on $\varepsilon$.
- Natural-scale averaging: by averaging on the hazard (exponential) scale rather than the log-hazard scale, the CDE gives more weight to subjects with larger absolute hazard contributions.
- Differs from AHR under heterogeneity: Jensen's inequality implies $\mathrm{CDE}(S) \neq \mathrm{AHR}(S)$ in general, with the discrepancy growing as within-subgroup treatment-effect variability increases.
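The gap between the two estimands can be illustrated with a small deterministic example. The vectors `theta_0` and `theta_1` below mimic the columns of the same name, but the biomarker grid and all coefficients are hypothetical; the second half repeats the calculation with a constant log-HR to show the two summaries agreeing.

```r
z <- seq(0, 10, length.out = 1001)   # hypothetical biomarker grid
k <- 5

theta_0 <- 0.1 * z + 0.05 * pmax(z - k, 0)          # control-arm linear predictor
loghr   <- 0.5 - 0.05 * z - 0.3 * pmax(z - k, 0)    # heterogeneous log-HR
theta_1 <- theta_0 + loghr                          # treated-arm linear predictor

cde <- mean(exp(theta_1)) / mean(exp(theta_0))      # natural-scale averaging
ahr <- exp(mean(loghr))                             # log-scale averaging
c(cde = cde, ahr = ahr)                             # differ under heterogeneity

# Constant treatment effect: the two summaries coincide
theta_1_const <- theta_0 - 0.2
cde_const <- mean(exp(theta_1_const)) / mean(exp(theta_0))
ahr_const <- exp(mean(theta_1_const - theta_0))
c(cde_const = cde_const, ahr_const = ahr_const)
```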
Marginal (Causal) Hazard Ratio
The marginal HR for a subgroup $S$ is obtained by fitting a Cox model to stacked potential outcomes: a dataset where each subject contributes two rows (one under treatment, one under control), both with event indicator equal to 1:

$$\mathrm{HR}_{\mathrm{marg}}(S) = \exp(\hat\beta_S),$$

where $\hat\beta_S$ is the coefficient from `coxph(Surv(time, event) ~ treat)` fit to the stacked data within $S$.
Key properties.
- Stochastic: depends on the realized error terms $\varepsilon$, introducing Monte Carlo variability across simulation replications.
- Population-level: the Cox partial likelihood implicitly weights subjects by their risk-set contributions, targeting a population-averaged effect.
- Validation role: serves as the DGM-level ground truth against which per-replicate estimator bias is assessed.
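The stacked-data fit can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: both potential outcomes share one error draw per subject, there is no prognostic effect in the control arm, and the log-HR profile, sample size, and AFT parameters are hypothetical.

```r
library(survival)

set.seed(5)
n   <- 4000
z   <- runif(n, 0, 10)                            # hypothetical biomarker
tau <- 0.7; mu <- 3                               # illustrative AFT parameters
loghr <- 0.5 - 0.05 * z - 0.3 * pmax(z - 5, 0)    # hypothetical log-HR profile

eps    <- log(rexp(n))                    # one error per subject (assumption)
log_t0 <- mu + tau * eps                  # potential outcome under control
log_t1 <- mu - tau * loghr + tau * eps    # AFT shift is -tau times the log-HR

# Two rows per subject, all uncensored
stacked <- data.frame(
  time  = exp(c(log_t1, log_t0)),
  event = 1,
  treat = rep(c(1, 0), each = n),
  z     = rep(z, times = 2)
)

fit_all  <- coxph(Surv(time, event) ~ treat, data = stacked)
fit_high <- coxph(Surv(time, event) ~ treat, data = stacked, subset = z >= 5)

hr_all  <- exp(coef(fit_all)[["treat"]])
hr_high <- exp(coef(fit_high)[["treat"]])
c(marginal_all = hr_all, marginal_high = hr_high, ahr_all = exp(mean(loghr)))
```

Changing the seed changes the marginal HRs but not the AHR, which reflects the stochastic versus deterministic distinction noted in the properties above.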
When the Three Estimands Agree — and When They Diverge
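Under a constant treatment effect (no heterogeneity), the individual log-HR is identical for all subjects, $\beta(z) \equiv \beta_1$, and all three measures coincide:

$$\mathrm{AHR}(S) = \mathrm{CDE}(S) = \mathrm{HR}_{\mathrm{marg}}(S) = e^{\beta_1}.$$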
Under heterogeneous treatment effects, three sources of divergence arise:
- Jensen's inequality separates AHR from CDE: $\exp\{E[\beta(Z)]\} \neq E[e^{\eta_1(Z)}]/E[e^{\eta_0(Z)}]$ unless all $\beta(z_i)$ are identical.
- Cox partial-likelihood weighting separates the marginal HR from the AHR, because risk-set membership creates implicit non-uniform weights.
- Monte Carlo variability affects the marginal HR (which depends on $\varepsilon$) but not the AHR or CDE.
The AHR is the recommended primary estimand for simulation studies because it is deterministic, directly interpretable as a geometric-mean HR, and achieves dual causal status. The marginal HR serves as a validation target, and the CDE provides a complementary natural-scale perspective.
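The relationship between the AHR and CDE can be made concrete with a short worked calculation. The notation is assumed for this illustration: $\eta_0$ and $\eta_1$ are the Cox linear predictors under control and treatment, and $\beta = \eta_1 - \eta_0$. When $\beta$ is constant,

$$\mathrm{CDE} = \frac{E[e^{\eta_0 + \beta}]}{E[e^{\eta_0}]} = e^{\beta}\,\frac{E[e^{\eta_0}]}{E[e^{\eta_0}]} = e^{\beta} = \exp\{E[\beta]\} = \mathrm{AHR}.$$

As a contrasting toy case, take $\eta_0 = 0$ for everyone and let $\beta$ equal $+b$ or $-b$ with probability one half each. Then

$$\mathrm{AHR} = \exp\{\tfrac{1}{2}b - \tfrac{1}{2}b\} = 1, \qquad \mathrm{CDE} = \tfrac{1}{2}\left(e^{b} + e^{-b}\right) = \cosh(b) \ge 1,$$

with equality only at $b = 0$, so the two summaries separate as soon as the effect varies.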
Causal Foundations
Why the AFT Model Serves as the Causal DGM
Both Aalen, Cook & Roysland (2015) and Fay & Li (2024) identify the AFT model as a natural causal foundation for survival analysis. The Weibull AFT scale-change parameter $e^{\gamma}$, with $\gamma$ the treatment coefficient on the AFT scale, satisfies:

$$e^{\gamma} = \frac{\mathrm{GM}\{T(1)\}}{\mathrm{GM}\{T(0)\}},$$

where $\mathrm{GM}\{T(a)\} = \exp\{E[\log T(a)]\}$ is the geometric mean of the potential survival time $T(a)$. Crucially, this geometric-mean ratio is simultaneously:
- Individual-level causal: it equals $\exp\{E[\log T(1) - \log T(0)]\}$, with comparison formed within individuals before averaging.
- Population-level causal: it is identifiable from marginal distributions in a randomized trial.
This dual status, unique among common survival ratio estimands, is why the ForestSearch DGM is built on the Weibull AFT. The biomarker log-HR $\beta(z)$ inherits this property at each biomarker level.
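The "compare then summarize" versus "summarize then compare" equivalence is easy to verify numerically. In the sketch below the potential survival times are hypothetical (the individual effect is made to depend on the control outcome so that effects are heterogeneous); the arithmetic-mean versions are shown only as a contrast.

```r
set.seed(6)
n  <- 1e4
t0 <- rweibull(n, shape = 1.4, scale = 10)   # hypothetical control outcomes
t1 <- t0 * exp(0.6 - 0.04 * t0)              # hypothetical heterogeneous effect

gm <- function(x) exp(mean(log(x)))          # geometric mean

ratio_of_gms <- gm(t1) / gm(t0)   # population-level: summarize, then compare
gm_of_ratios <- gm(t1 / t0)       # individual-level: compare, then summarize

# Arithmetic means do not commute in this way
ratio_of_means <- mean(t1) / mean(t0)
mean_of_ratios <- mean(t1 / t0)

c(ratio_of_gms = ratio_of_gms, gm_of_ratios = gm_of_ratios,
  ratio_of_means = ratio_of_means, mean_of_ratios = mean_of_ratios)
```

The first two entries agree to floating-point precision for any joint distribution of the potential outcomes, whereas the last two generally differ.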
The AHR as a Causally Valid Functional
The ForestSearch AHR is defined from individual potential-outcome log-HR differences:

$$\mathrm{AHR} = \exp\Big\{\frac{1}{n}\sum_{i=1}^{n}\beta_i\Big\},$$

where $\beta_i = \eta_1(z_i) - \eta_0(z_i)$ is the causal log-HR of subject $i$. Because this is a geometric mean of individual causal log-hazard differences, it falls within the class of estimands that achieves both individual-level and population-level causal status under the Fay & Li (2024) taxonomy.
This property is not shared by the instantaneous Cox HR $\lambda_1(t)/\lambda_0(t)$, which suffers from conditioning-set distortion at each event time (the “collider” argument of Hernan 2010; Aalen et al. 2015; Martinussen 2022): among survivors at time $t$, the treatment and frailty are no longer independent even in a randomized trial.
Proportional Hazards and Marginal Non-PH
The Weibull DGM generates data satisfying proportional hazards conditional on the biomarker. Under conditional PH, the Cox HR is a valid population-level causal estimand (Fay & Li 2024; Beyersmann, Schmoor & Schumacher 2025). However, marginally — averaging over the biomarker distribution — PH need not hold when treatment effects are heterogeneous. This is precisely why the framework reports AHR and CDE (marginal summaries) rather than a single Cox HR as the primary characterization of biomarker-modulated treatment effects.
The Abrahamowicz et al. (2025) simulations further support this design: frailty-induced bias in Cox HR estimates is substantial only under very strong unmeasured susceptibility. Since the ForestSearch DGM has no unmodeled frailty, the Cox HR within simulations targets the correct conditional parameter, while the AHR provides the appropriate marginal summary for real-data applications where unmeasured heterogeneity may exist.
Cox HR as an Operational Tool for Subgroup Selection
In the ForestSearch workflow, per-replicate Cox HR estimates and thresholds (e.g., ) serve as operational selection criteria for identifying candidate subgroups. This is distinct from their role as causal estimands. The recent literature (Edelmann 2025; Martinussen 2022; Fay & Li 2024) notes that the instantaneous Cox HR at time $t$ is a causal effect for the baseline population but not necessarily for the population at risk at $t$.
Within the ForestSearch framework, this distinction is addressed as follows:
- The true treatment effects are defined through $\beta(z)$, a baseline causal quantity.
- Subgroup selection uses the Cox HR as an efficient screening tool.
- Subgroup evaluation uses the AHR and CDE, which avoid the conditioning-set problem.
- Bootstrap bias correction (`hr.H.bc`) and the decomposition of error into subgroup misidentification versus finite-sample noise provide further safeguards.
Computation Pipeline
The complete pipeline from model specification to estimand computation is summarized below.
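| Step | Operation | Formula |
|---|---|---|
| 1 | Fit Weibull AFT | $\log T = \mu + x^{\top}\gamma + \tau\varepsilon$ |
| 2 | Hazard-scale transform | $\beta = -\gamma/\tau$ |
| 3 | Individual log-hazards | $\eta_a(z_i)$ for $a \in \{0, 1\}$ |
| 4 | Individual causal log-HR | $\beta_i = \eta_1(z_i) - \eta_0(z_i)$ |
| 5a | AHR (subgroup $S$) | $\exp\{\operatorname{mean}_{i \in S}\,\beta_i\}$ |
| 5b | CDE (subgroup $S$) | $\operatorname{mean}_{i \in S}\,e^{\eta_1(z_i)} \,/\, \operatorname{mean}_{i \in S}\,e^{\eta_0(z_i)}$ |
| 5c | Marginal HR (subgroup $S$) | $\exp(\hat\beta_S)$ from stacked potential outcomes |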
In ForestSearch, Steps 1–4 are performed by `generate_aft_dgm_flex()`, which stores the hazard-scale coefficients $\beta = -\gamma/\tau$ in `dgm$model_params$b0` and the individual-level quantities (`loghr_po`, `lin_pred_0`, `lin_pred_1`) in `dgm$df_super`. Steps 5a–5b are computed by `cox_ahr_cde_analysis()`; Step 5c is computed within `calculate_hazard_ratios()`.
Consistency Across the ForestSearch Vignettes. The mathematical pipeline above (AFT model $\to$ hazard-scale transformation (4) $\to$ individual log-HR $\to$ AHR/CDE aggregation) is consistent across the MRCT analysis document, the `treatment_effect_definitions` vignette, and the `causal_effects_brief_review` vignette. All three use the same sign conventions ($\beta < 0 \Rightarrow$ beneficial), the same transformation $\beta = -\gamma/\tau$, and the same averaging operations for AHR and CDE.
The correspondences are:
| This vignette | treatment_effect_definitions | causal_effects_brief_review |
|---|---|---|
| $\beta(z)$ | `loghr_po` | Individual-level AFT causal quantity |
| AHR | | Geometric-mean HR with dual causal status |
| CDE | | Natural-scale complement to AHR |
| $\beta = -\gamma/\tau$ | | AFT-to-hazard transformation |
References
Aalen, O.O., Cook, R.J. & Roysland, K. (2015). Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis, 21(4), 579–593.
Abrahamowicz, M., Beauchamp, M.-E., Roberts, E.K. & Taylor, J.M.G. (2025). Revisiting the hazards of hazard ratios through simulations and case studies. European Journal of Epidemiology, 40, 611–629.
Beyersmann, J., Schmoor, C. & Schumacher, M. (2025). Hazards constitute key quantities for analyzing, interpreting and understanding time-to-event data. Biometrical Journal, 67, e70057.
Edelmann, D. (2025). Revisiting hazard ratios: Can we define causal estimands for time-dependent treatment effects? Biometrical Journal, 67, e70100.
Fay, M.P. & Li, F. (2024). Causal interpretation of the hazard ratio in randomized clinical trials. Clinical Trials, 21(5), 623–635.
Hernan, M.A. (2010). The hazards of hazard ratios. Epidemiology, 21(1), 13–15.
Leon, L.F., Jemielita, T., Guo, Z., Marceau West, R. & Anderson, K.M. (2024). Exploratory subgroup identification in the heterogeneous Cox model. Statistics in Medicine. doi:10.1002/sim.10163.
Martinussen, T. (2022). Causality and the Cox regression model. Annual Review of Statistics and Its Application, 9, 249–259.
Prentice, R.L. & Aragaki, A.K. (2022). Intention-to-treat comparisons in randomized trials. Statistical Science, 37(3), 380–393.