Diagnose Censoring Consistency Between DGM Source Data and Simulated Data

Compares the censoring distribution observed in the data used to build the DGM against the censoring generated by simulate_from_dgm. Reports censoring rates, censoring-time quantiles, and Kaplan-Meier (KM) median censoring times, and flags discrepancies that exceed rate_tol or median_tol.

Usage

check_censoring_dgm(
  sim_data,
  dgm,
  treat_var = "treat_sim",
  rate_tol = 0.1,
  median_tol = 0.25,
  verbose = TRUE
)

Arguments

sim_data: A data.frame returned by simulate_from_dgm.
dgm: An "aft_dgm_flex" object from generate_aft_dgm_flex. The super population (dgm$df_super) provides reference censoring times and event indicators on the DGM time scale.
treat_var: Character. Name of the treatment column in sim_data used for arm-stratified comparisons. Default "treat_sim".
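rate_tol: Numeric. Absolute tolerance on the proportion scale: a censoring-rate discrepancy is flagged when the absolute difference between the simulated and reference rates exceeds rate_tol. Default 0.10 (10 percentage points).
median_tol: Numeric. Relative tolerance for flagging a discrepancy in the KM median censoring time, judged from the ratio of the simulated to the reference median. Default 0.25 (25 percent).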
verbose: Logical. If TRUE, prints the full diagnostic table. Default TRUE.
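
The two tolerances could be applied along the following lines. This is a minimal sketch, not the package's implementation: the helper names flag_rate and flag_median are hypothetical, and the symmetric rule for median_tol (flag when the ratio falls outside [1 / (1 + median_tol), 1 + median_tol]) is an assumption. The input values are the overall rates and KM medians from the example output on this page.

```r
# Hypothetical helpers for illustration; not part of the package.

# Absolute tolerance on the proportion scale
flag_rate <- function(ref_rate, sim_rate, rate_tol = 0.10) {
  abs(sim_rate - ref_rate) > rate_tol
}

# Relative tolerance on the ratio sim / ref.
# ASSUMPTION: the rule is symmetric on the ratio scale.
flag_median <- function(ref_median, sim_median, median_tol = 0.25) {
  ratio <- sim_median / ref_median
  ratio > 1 + median_tol || ratio < 1 / (1 + median_tol)
}

# Overall values taken from the example output
rate_flagged   <- flag_rate(ref_rate = 0.553, sim_rate = 0.715)
median_flagged <- flag_median(ref_median = 54.05, sim_median = 31.62)
```

Under these assumptions both checks trigger for the example, which agrees with the two warnings printed there.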

Value

Invisibly returns a named list with the following elements:

rates: Data frame of censoring rates, overall and by arm.
quantiles: Data frame of censoring-time quantiles among censored subjects.
km_medians: Data frame of KM-based median censoring times.
flags: Character vector of triggered warnings; empty if none.
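
Because the result is returned invisibly, a simulation script can capture it and act on flags. The sketch below relies only on the documented flags element. The helper stop_if_flagged is hypothetical, and the two lists are hand-built stand-ins for what check_censoring_dgm(sim_data, dgm, verbose = FALSE) would return.

```r
# Hypothetical helper: abort a pipeline when any diagnostic flag is present.
stop_if_flagged <- function(res) {
  if (length(res$flags) > 0) {
    stop("Censoring diagnostics flagged:\n  ",
         paste(res$flags, collapse = "\n  "),
         call. = FALSE)
  }
  invisible(res)
}

# Stand-ins for the returned list (only `flags` is needed here)
res_clean   <- list(flags = character(0))
res_flagged <- list(flags = "Censoring rate: ref = 55.3%, sim = 71.5%")

stop_if_flagged(res_clean)  # passes silently

# Capture the error message instead of halting the script
outcome <- tryCatch(
  stop_if_flagged(res_flagged),
  error = function(e) conditionMessage(e)
)
```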

Details

The reference censoring distribution is derived from dgm$df_super, which is sampled with replacement from the data passed to generate_aft_dgm_flex(). The columns y (observed time) and event (event indicator) in df_super reflect the original observed censoring process on the DGM time scale.
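
The rate and quantile comparisons can be reproduced by hand from these columns. The following is a sketch on stand-in data, not the function's internals: the generator make_standin and its Weibull parameters are hypothetical, and the simulated-side column names c_time and event_sim are taken from the header of the example output.

```r
set.seed(7)

# Hypothetical generator for stand-in data. In practice, use dgm$df_super
# and the data frame returned by simulate_from_dgm().
make_standin <- function(n, cens_scale) {
  t_event <- rweibull(n, shape = 1.2, scale = 90)
  t_cens  <- rweibull(n, shape = 2.0, scale = cens_scale)
  data.frame(
    time  = pmin(t_event, t_cens),
    cens  = t_cens,
    event = as.integer(t_event <= t_cens)
  )
}
ref <- make_standin(5000, cens_scale = 60)
sim <- make_standin(200,  cens_scale = 40)

df_super <- data.frame(y = ref$time, event = ref$event)
sim_data <- data.frame(c_time = sim$cens, event_sim = sim$event)

# Censoring rate: proportion of subjects whose event indicator is 0
rates <- c(
  ref = mean(df_super$event == 0),
  sim = mean(sim_data$event_sim == 0)
)

# Censoring-time quantiles among censored subjects only
probs <- c(0.25, 0.50, 0.75, 0.90)
q_ref <- quantile(df_super$y[df_super$event == 0], probs)
q_sim <- quantile(sim_data$c_time[sim_data$event_sim == 0], probs)

quantiles <- data.frame(
  Quantile = names(q_ref),
  Ref      = unname(q_ref),
  Sim      = unname(q_sim),
  Ratio    = unname(q_sim / q_ref)
)
```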

The KM median censoring time is estimated by reversing the event indicator (1 - event), so that events are treated as censored and censored observations become the event of interest. This gives a non-parametric estimate of the censoring-time distribution that accounts for subjects whose censoring time is unobserved because the event occurred first.
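
The reversed-indicator estimate can be sketched with the survival package. The data here are simulated purely for illustration (independent exponential event and censoring times, with a true median censoring time of 40 * log(2)); whether check_censoring_dgm computes the median in exactly this way is an assumption.

```r
library(survival)

set.seed(1)
n <- 500
t_event <- rexp(n, rate = 1 / 50)   # latent event times
t_cens  <- rexp(n, rate = 1 / 40)   # latent censoring times
y       <- pmin(t_event, t_cens)    # observed time
event   <- as.integer(t_event <= t_cens)

# Reverse KM: censoring becomes the "event"; actual events are treated
# as censoring the (unobserved) censoring time.
fit_cens  <- survfit(Surv(y, 1 - event) ~ 1)
km_median <- summary(fit_cens)$table[["median"]]

# Naive alternative: median observed time among censored subjects only.
# This ignores censoring times that were pre-empted by an event.
naive_median <- median(y[event == 0])
```

The naive median is computed only from subjects who happened to be censored before their event, so it tends to understate the censoring-time median. This is why the quantile table (censored subjects only) and the KM medians in the diagnostic output need not agree.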

Common causes of discrepancy:

1. Time-scale mismatch (for example, a DGM built on days with analysis_time in months). Check exp(dgm$model_params$mu) against your analysis_time.
2. A large cens_adjust that shifts censoring substantially away from the fitted model.
3. A short analysis_time or time_eos, which makes administrative censoring dominate the censoring process.
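
The first and third causes can be illustrated numerically. This is a sketch under stated assumptions: mu_cens is the value printed in the example output, while analysis_time = 40 and the Weibull parameters are hypothetical choices for illustration, not the settings used by simulate_from_dgm.

```r
# Cause 1: compare exp(mu) on the DGM scale with the analysis horizon.
mu_cens       <- 4.0492          # from the example output
exp_mu_cens   <- exp(mu_cens)    # censoring time scale, in DGM time units
analysis_time <- 40              # HYPOTHETICAL analysis horizon
scale_ratio   <- exp_mu_cens / analysis_time
# A ratio far above 1 (for example near 30, as with days versus months)
# suggests a time-scale mismatch.

# Cause 3: administrative censoring at analysis_time raises the censoring rate.
set.seed(42)
n <- 2000
t_event <- rweibull(n, shape = 1.5, scale = 90)   # hypothetical event times
t_cens  <- rweibull(n, shape = 2.0, scale = 57)   # hypothetical censoring times

cens_rate <- function(admin_time) {
  c_time <- pmin(t_cens, admin_time)  # censoring time capped at the horizon
  mean(c_time < t_event)              # proportion censored before the event
}

rate_no_admin <- cens_rate(Inf)
rate_admin    <- cens_rate(analysis_time)
```

Capping censoring times at the horizon can only move subjects from "event" to "censored", so rate_admin is never below rate_no_admin. It also compresses the upper censoring-time quantiles toward the horizon.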

Examples

# \donttest{
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)
check_censoring_dgm(sim_data, dgm = dgm)
#> 
#> =========================================================
#>   Censoring Diagnostic: DGM Reference vs Simulated
#> =========================================================
#>   DGM censoring type  : weibull
#>   DGM mu_cens         : 4.0492  [exp = 57.35 time units]
#>   DGM tau_cens        : 0.4363
#>   Reference n (super) : 5000
#>   Simulated n         : 200
#> 
#> --- 1. Censoring Rates ---
#>    Group Ref_rate Sim_rate     Diff
#>  Overall    55.3%    71.5% +16.2 pp
#>    Arm 0    54.5%    69.0% +14.5 pp
#>    Arm 1    56.2%    74.0% +17.8 pp
#> 
#> --- 2. Censoring-Time Quantiles (censored subjects only) ---
#>     Ref: y[event == 0]  |  Sim: c_time[event_sim == 0]
#>  Quantile   Ref   Sim Ratio
#>       25% 28.19 25.97 0.921
#>       50% 47.05 30.36 0.645
#>       75% 61.04 37.15 0.609
#>       90% 71.00 42.13 0.593
#> 
#> --- 3. KM Median Censoring Time (reversed event indicator) ---
#>    Group Ref_median Sim_median Ratio
#>  Overall   54.04517   31.61996 0.585
#>    Arm 0   53.94661   31.94067 0.592
#>    Arm 1   55.03080   31.39309 0.570
#> 
#> --- 4. Implied Time-Scale Check ---
#>   exp(params$mu)          = 86.40  [median outcome time, DGM scale]
#>   exp(params$censoring$mu)= 57.35  [median censoring time, DGM scale]
#>   If these values are implausibly large relative to your
#>   analysis_time, the DGM was likely built on a different time
#>   scale (e.g. days vs months).
#> 
#> [WARNING] Potential discrepancies detected:
#>   * Censoring rate: ref = 55.3%, sim = 71.5%  [|diff| = 16.2 pp > tol 10.0 pp]
#>   * KM median censoring: ref = 54.05, sim = 31.62  [ratio = 0.59, tol = 1.25]
#> 
#> Common causes:
#>   1. Time-scale mismatch: verify exp(dgm$model_params$mu) is
#>      plausible relative to analysis_time.
#>   2. cens_adjust shifting censoring substantially from fitted model.
#>   3. Short analysis_time / time_eos making admin censoring dominant.
#> =========================================================
#> 
# }