Diagnose Censoring Consistency Between DGM Source Data and Simulated Data
Source: R/simulate_from_dgm.R
check_censoring_dgm.Rd
Compares the censoring distribution observed in the data used to build the
DGM against the censoring generated by simulate_from_dgm.
Reports censoring rates, time quantiles, KM-based median censoring times,
and flags substantial discrepancies.
Usage
check_censoring_dgm(
sim_data,
dgm,
treat_var = "treat_sim",
rate_tol = 0.1,
median_tol = 0.25,
verbose = TRUE
)
Arguments
- sim_data
  A data.frame returned by simulate_from_dgm.
- dgm
  An "aft_dgm_flex" object from generate_aft_dgm_flex. The super population (dgm$df_super) provides reference censoring times and event indicators on the DGM time scale.
- treat_var
  Character. Name of the treatment column in sim_data used for arm-stratified comparisons. Default "treat_sim".
- rate_tol
  Numeric. Absolute tolerance (proportion scale) for flagging a censoring-rate discrepancy. Default 0.10 (10 pp).
- median_tol
  Numeric. Relative tolerance for flagging a KM median censoring-time discrepancy. Default 0.25 (25 percent).
- verbose
  Logical. If TRUE, prints the full diagnostic table. Default TRUE.
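The two tolerances can be illustrated with a minimal sketch. The helper flag_censoring() below is hypothetical and is not the package's internal code. It assumes rate_tol is applied to the absolute difference in censoring proportions, and that median_tol is applied symmetrically on the ratio scale, so a ratio outside [1 / (1 + median_tol), 1 + median_tol] is flagged. The second rule is a guess suggested by the "tol = 1.25" shown in the example output, not a documented rule. The input values are taken from the example output on this page.

```r
# Hypothetical helper illustrating how rate_tol and median_tol might be applied.
# This is NOT the implementation used by check_censoring_dgm().
flag_censoring <- function(ref_rate, sim_rate, ref_median, sim_median,
                           rate_tol = 0.10, median_tol = 0.25) {
  flags <- character(0)

  # Absolute tolerance on the proportion scale (0.10 = 10 percentage points).
  rate_diff <- sim_rate - ref_rate
  if (abs(rate_diff) > rate_tol) {
    flags <- c(flags, sprintf(
      "Censoring rate: |diff| = %.1f pp > tol %.1f pp",
      100 * abs(rate_diff), 100 * rate_tol
    ))
  }

  # Relative tolerance on the ratio of KM median censoring times.
  # Assumption: the rule is symmetric on the ratio scale.
  ratio <- sim_median / ref_median
  if (ratio > 1 + median_tol || ratio < 1 / (1 + median_tol)) {
    flags <- c(flags, sprintf(
      "KM median censoring: ratio = %.2f outside tolerance", ratio
    ))
  }

  flags
}

# Overall values from the example output: both checks are triggered.
flags <- flag_censoring(ref_rate = 0.553, sim_rate = 0.715,
                        ref_median = 54.05, sim_median = 31.62)
print(flags)
```

Loosening rate_tol or median_tol reduces the number of flags; it does not change the rates, quantiles, or medians that are reported.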
Value
Invisibly returns a named list. Elements are: rates (data
frame of censoring rates overall and by arm); quantiles (data
frame of censoring-time quantiles among censored subjects);
km_medians (data frame of KM-based median censoring times); and
flags (character vector of triggered warnings, empty if none).
Details
The reference censoring distribution is derived from dgm$df_super,
which is sampled with replacement from the data passed to
generate_aft_dgm_flex(). Columns y (observed time) and
event (event indicator) in df_super reflect the original
observed censoring process on the DGM time scale.
The KM median censoring time is estimated by reversing the event indicator
(1 - event), treating events as censored and censored observations
as the event of interest. This gives a non-parametric estimate of the
censoring time distribution unconfounded by event occurrence.
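The reversed-indicator estimate described above can be sketched on toy data with the survival package. The simulated event and censoring distributions, the sample size, and all variable names below are illustrative assumptions; they are not taken from the GBSG-based DGM or from the package's internals.

```r
library(survival)

set.seed(1)
n <- 2000

# Toy data: exponential event times and Weibull censoring times (assumed values).
t_event <- rexp(n, rate = 1 / 80)
c_time  <- rweibull(n, shape = 2, scale = 50)

y     <- pmin(t_event, c_time)          # observed time
event <- as.integer(t_event <= c_time)  # 1 = event observed, 0 = censored

# Reverse KM: censored observations become the "event" and events are censored.
fit_cens  <- survfit(Surv(y, 1 - event) ~ 1)
km_median <- unname(summary(fit_cens)$table["median"])

# Known median of the simulated Weibull censoring distribution, for comparison.
true_median <- 50 * log(2)^(1 / 2)

# Naive alternative: median of observed times among censored subjects only.
naive_median <- median(y[event == 0])

print(c(km = km_median, true = true_median, naive = naive_median))
```

The naive median uses only subjects whose censoring time came before their event time, so it is conditioned on event occurrence; the reversed-indicator KM estimate uses all subjects and avoids that selection.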
Common causes of discrepancy: (1) Time-scale mismatch, e.g. the DGM was
built on days but analysis_time is in months; check
exp(dgm$model_params$mu) against your analysis_time. (2) A large
cens_adjust shifting censoring substantially from the fitted model.
(3) A short analysis_time or time_eos making administrative censoring
dominate the censoring process.
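The time-scale check in cause (1) is simple arithmetic on the log scale, sketched below. The value mu_cens is the one printed in the example output on this page; mu_days, analysis_time, and the days-per-month constant are hypothetical values chosen only to illustrate a days-versus-months mismatch.

```r
# Value printed in the example output ("DGM mu_cens : 4.0492").
mu_cens <- 4.0492
exp(mu_cens)  # implied censoring time scale, about 57.35 time units

# Hypothetical mismatch: a DGM fitted on days, with analysis_time given in months.
days_per_month <- 365.25 / 12
mu_days        <- 7.0   # illustrative log-scale location on the day scale
analysis_time  <- 60    # illustrative analysis horizon in months

# Dividing time by a constant subtracts the log of that constant from mu.
mu_months <- mu_days - log(days_per_month)

# Compare the implied time scale with analysis_time before and after conversion.
ratio_days   <- exp(mu_days) / analysis_time
ratio_months <- exp(mu_months) / analysis_time
print(c(ratio_days = ratio_days, ratio_months = ratio_months))
```

A ratio far above 1 before conversion and near or below 1 after conversion is the pattern that would suggest a unit mismatch rather than a genuine difference in the censoring process.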
Examples
# \donttest{
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)
check_censoring_dgm(sim_data, dgm = dgm)
#>
#> =========================================================
#> Censoring Diagnostic: DGM Reference vs Simulated
#> =========================================================
#> DGM censoring type : weibull
#> DGM mu_cens : 4.0492 [exp = 57.35 time units]
#> DGM tau_cens : 0.4363
#> Reference n (super) : 5000
#> Simulated n : 200
#>
#> --- 1. Censoring Rates ---
#> Group Ref_rate Sim_rate Diff
#> Overall 55.3% 71.5% +16.2 pp
#> Arm 0 54.5% 69.0% +14.5 pp
#> Arm 1 56.2% 74.0% +17.8 pp
#>
#> --- 2. Censoring-Time Quantiles (censored subjects only) ---
#> Ref: y[event == 0] | Sim: c_time[event_sim == 0]
#> Quantile Ref Sim Ratio
#> 25% 28.19 25.97 0.921
#> 50% 47.05 30.36 0.645
#> 75% 61.04 37.15 0.609
#> 90% 71.00 42.13 0.593
#>
#> --- 3. KM Median Censoring Time (reversed event indicator) ---
#> Group Ref_median Sim_median Ratio
#> Overall 54.04517 31.61996 0.585
#> Arm 0 53.94661 31.94067 0.592
#> Arm 1 55.03080 31.39309 0.570
#>
#> --- 4. Implied Time-Scale Check ---
#> exp(params$mu) = 86.40 [median outcome time, DGM scale]
#> exp(params$censoring$mu)= 57.35 [median censoring time, DGM scale]
#> If these values are implausibly large relative to your
#> analysis_time, the DGM was likely built on a different time
#> scale (e.g. days vs months).
#>
#> [WARNING] Potential discrepancies detected:
#> * Censoring rate: ref = 55.3%, sim = 71.5% [|diff| = 16.2 pp > tol 10.0 pp]
#> * KM median censoring: ref = 54.05, sim = 31.62 [ratio = 0.59, tol = 1.25]
#>
#> Common causes:
#> 1. Time-scale mismatch: verify exp(dgm$model_params$mu) is
#> plausible relative to analysis_time.
#> 2. cens_adjust shifting censoring substantially from fitted model.
#> 3. Short analysis_time / time_eos making admin censoring dominant.
#> =========================================================
#>
# }