Diagnose Censoring Consistency Between DGM Source Data and Simulated Data
Source: R/simulate_from_dgm.R
Compares the censoring distribution observed in the data used to build the
DGM against the censoring generated by simulate_from_dgm.
Reports censoring rates, time quantiles, KM-based median censoring times,
and flags substantial discrepancies.
Usage
check_censoring_dgm(
sim_data,
dgm,
treat_var = "treat_sim",
rate_tol = 0.1,
median_tol = 0.25,
verbose = TRUE
)
Arguments
- sim_data
A data.frame returned by simulate_from_dgm.
- dgm
An "aft_dgm_flex" object from generate_aft_dgm_flex. The super population (dgm$df_super) provides reference censoring times and event indicators on the DGM time scale.
- treat_var
Character. Name of the treatment column in sim_data used for arm-stratified comparisons. Default "treat_sim".
- rate_tol
Numeric. Absolute tolerance (proportion scale) for flagging a censoring-rate discrepancy. Default 0.10 (10 pp).
- median_tol
Numeric. Relative tolerance for flagging a KM median censoring-time discrepancy. Default 0.25 (25 percent).
- verbose
Logical. If TRUE, prints the full diagnostic table. Default TRUE.
Value
Invisibly returns a named list. Elements are: rates (data
frame of censoring rates overall and by arm); quantiles (data
frame of censoring-time quantiles among censored subjects);
km_medians (data frame of KM-based median censoring times); and
flags (character vector of triggered warnings, empty if none).
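The tolerance logic behind flags can be illustrated with a minimal sketch. The helper flag_discrepancies below is hypothetical and is not the package's internal code; it assumes the rate comparison is an absolute difference on the proportion scale and that the median comparison is taken relative to the reference (DGM source) median.

```r
# Hypothetical helper for illustration only; not part of the package.
flag_discrepancies <- function(rate_ref, rate_sim, med_ref, med_sim,
                               rate_tol = 0.10, median_tol = 0.25) {
  flags <- character(0)

  # Absolute difference in censoring rates (proportion scale)
  rate_diff <- abs(rate_sim - rate_ref)
  if (rate_diff > rate_tol) {
    flags <- c(flags, sprintf("censoring rate differs by %.2f (tol %.2f)",
                              rate_diff, rate_tol))
  }

  # Relative difference in KM median censoring times
  # (assumption: relative to the reference median)
  rel_diff <- abs(med_sim - med_ref) / med_ref
  if (is.finite(rel_diff) && rel_diff > median_tol) {
    flags <- c(flags, sprintf("KM median differs by %.0f%% (tol %.0f%%)",
                              100 * rel_diff, 100 * median_tol))
  }

  flags
}

# Rates differ by 15 pp, medians by about 8 percent
flags_a <- flag_discrepancies(0.30, 0.45, 12, 13)
# Both within tolerance
flags_b <- flag_discrepancies(0.30, 0.32, 12, 13)
```

With the default tolerances, the first call would trigger only the rate flag and the second would return an empty character vector.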
Details
The reference censoring distribution is derived from dgm$df_super,
which is sampled with replacement from the data passed to
generate_aft_dgm_flex(). The columns y (observed time) and
event (event indicator) in df_super therefore reflect the original
observed censoring process on the DGM time scale.
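As a rough sketch of how reference censoring rates and quantiles can be read off columns like y and event, the example below uses a small mock data frame in place of dgm$df_super. The mock values and the treat column name are assumptions made for illustration; the actual structure of df_super may differ.

```r
# Mock stand-in for dgm$df_super (values and 'treat' column are assumptions)
df_super_mock <- data.frame(
  y     = c(2, 5, 7, 9, 12, 15),   # observed times
  event = c(1, 0, 1, 0, 0, 1),     # 1 = event, 0 = censored
  treat = c(0, 0, 0, 1, 1, 1)
)

is_cens <- df_super_mock$event == 0

# Censoring rate overall and by arm
cens_rate_overall <- mean(is_cens)
cens_rate_by_arm  <- tapply(is_cens, df_super_mock$treat, mean)

# Quantiles of observed time among censored subjects only
cens_q <- quantile(df_super_mock$y[is_cens], probs = c(0.25, 0.5, 0.75))
```

The same summaries computed on sim_data give the simulated side of the comparison.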
The KM median censoring time is estimated by reversing the event indicator
(1 - event), treating events as censored and censored observations
as the event of interest. This gives a non-parametric estimate of the
censoring time distribution unconfounded by event occurrence.
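The reversed-indicator approach can be sketched with the survival package on simulated data. This is an illustration under assumed exponential event and censoring times, not the function's internal code; all variable names are made up for the example.

```r
library(survival)

set.seed(1)
n <- 2000
event_time <- rexp(n, rate = 0.10)   # latent event times (assumed)
cens_time  <- rexp(n, rate = 0.05)   # latent censoring times (assumed)

y     <- pmin(event_time, cens_time)          # observed time
event <- as.integer(event_time <= cens_time)  # 1 = event, 0 = censored

# Reverse KM: censoring is the "event" of interest, actual events are censored
fit_cens <- survfit(Surv(y, 1 - event) ~ 1)
km_median_cens <- unname(quantile(fit_cens, probs = 0.5, conf.int = FALSE))

# Naive median among censored subjects only, for comparison
naive_median <- median(y[event == 0])
```

The naive median is computed only from subjects whose censoring time came before their event time, so it tends to understate the censoring-time median; the reverse KM estimate accounts for subjects removed from the risk set by events.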
Common causes of discrepancy: (1) time-scale mismatch (DGM built on days,
analysis_time in months); check exp(dgm$model_params$mu)
against your analysis_time. (2) Large cens_adjust shifting
censoring substantially from the fitted model. (3) Short
analysis_time or time_eos making administrative censoring
dominate the censoring process.
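The time-scale check in cause (1) can be sketched as follows. The dgm_mock list stands in for a real "aft_dgm_flex" object, and the factor-of-10 threshold is an arbitrary assumption chosen for illustration; only the field dgm$model_params$mu comes from the text above.

```r
# Mock stand-in for a DGM whose log-time intercept corresponds to about 365 (days)
dgm_mock <- list(model_params = list(mu = log(365)))

# Analysis horizon intended to be in months (assumed value)
analysis_time <- 24

# exp(mu) gives a characteristic time on the DGM time scale
baseline_scale <- exp(dgm_mock$model_params$mu)
scale_ratio    <- baseline_scale / analysis_time

# Illustrative rule of thumb: an order-of-magnitude gap suggests mixed units
scale_suspect <- scale_ratio > 10 || scale_ratio < 0.1
```

Here the ratio is roughly 15, which would be consistent with a DGM built on days being compared against an analysis_time expressed in months.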