Bootstrap Results for ForestSearch with Bias Correction
Source: R/bootstrap_analysis_dofuture.R (bootstrap_results.Rd)
Runs the bootstrap analysis for ForestSearch: fits Cox models on each bootstrap sample and computes bias-corrected estimates for constructing confidence intervals (see the vignette for references).
Usage
bootstrap_results(
  fs.est,
  df_boot_analysis,
  cox.formula.boot,
  nb_boots,
  show_three,
  H_obs,
  Hc_obs,
  seed = 8316951L
)
Arguments
- fs.est
List. ForestSearch results object from forestsearch. Must contain:
  - df.est: Data frame with the analysis data, including treat.recommend
  - confounders.candidate: Character vector of confounder names
  - args_call_all: List of the original forestsearch call arguments
- df_boot_analysis
Data frame. Bootstrap analysis data with the same structure as fs.est$df.est. Must contain columns for outcome, event, treatment, and the treat.recommend flag.
- cox.formula.boot
Formula. Cox model formula for the bootstrap, typically created by build_cox_formula. Should be of the form Surv(outcome, event) ~ treatment.
- nb_boots
Integer. Number of bootstrap samples to generate (e.g., 500-1000). More iterations reduce Monte Carlo error in the bias correction but increase computation time.
- show_three
Logical. If TRUE, prints detailed progress for the first three bootstrap iterations for debugging. Must be supplied (use FALSE for routine runs).
- H_obs
Numeric. Observed log hazard ratio for subgroup H (harm/questionable group, treat.recommend == 0) from the original sample. Used as the reference for bias correction.
- Hc_obs
Numeric. Observed log hazard ratio for subgroup H^c (complement/recommend group, treat.recommend == 1) from the original sample. Used as the reference for bias correction.
- seed
Integer. Random seed for reproducibility. Default: 8316951L. Must match the seed used in bootstrap_ystar so that the bootstrap indices align.
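H_obs and Hc_obs are log hazard ratios from Cox fits within each subgroup of the original sample. The sketch below shows one way to obtain them, assuming the survival package and simulated data; make_cox_formula is a hypothetical stand-in for build_cox_formula (whose actual signature and return value may differ), and fit_cox_models may compute these quantities differently.

```r
library(survival)

# Hypothetical stand-in for build_cox_formula(): returns Surv(outcome, event) ~ treatment
make_cox_formula <- function(outcome, event, treat) {
  stats::as.formula(sprintf("Surv(%s, %s) ~ %s", outcome, event, treat))
}

# Simulated data for illustration only (not from the package)
set.seed(1)
n <- 600
df <- data.frame(
  treat = rbinom(n, 1, 0.5),
  treat.recommend = rbinom(n, 1, 0.6)
)
# True treatment log HR: +0.4 where treat.recommend == 0, -0.5 where == 1
log_hr <- ifelse(df$treat.recommend == 0, 0.4, -0.5)
event_time <- rexp(n, rate = 0.1 * exp(log_hr * df$treat))
cens_time <- rexp(n, rate = 0.05)
df$time <- pmin(event_time, cens_time)
df$status <- as.integer(event_time <= cens_time)

cox_formula <- make_cox_formula("time", "status", "treat")

# Observed log hazard ratios (coxph coefficients) within H and H^c
H_obs <- unname(coef(coxph(cox_formula, data = df[df$treat.recommend == 0, ])))
Hc_obs <- unname(coef(coxph(cox_formula, data = df[df$treat.recommend == 1, ])))
c(H_obs = H_obs, Hc_obs = Hc_obs)
```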
Value
A data.table with one row per bootstrap iteration and the following columns:
- boot_id
Integer. Bootstrap iteration number (1 to nb_boots)
- H_biasadj_1
Bias-corrected estimate for H using method 1: H_obs - (Hstar_star - Hstar_obs)
- H_biasadj_2
Bias-corrected estimate for H using method 2: 2*H_obs - (H_star + Hstar_star - Hstar_obs)
- Hc_biasadj_1
Bias-corrected estimate for H^c using method 1
- Hc_biasadj_2
Bias-corrected estimate for H^c using method 2
- max_sg_est
Numeric. Maximum subgroup hazard ratio found
- L
Integer. Number of candidate factors evaluated
- max_count
Integer. Maximum number of factor combinations
- events_H_0
Integer. Number of events in the control arm of the original subgroup H, evaluated in the bootstrap sample
- events_H_1
Integer. Number of events in the treatment arm of the original subgroup H, evaluated in the bootstrap sample
- events_Hc_0
Integer. Number of events in the control arm of the original subgroup H^c, evaluated in the bootstrap sample
- events_Hc_1
Integer. Number of events in the treatment arm of the original subgroup H^c, evaluated in the bootstrap sample
- events_Hstar_0
Integer. Number of events in the control arm of the new subgroup H*, evaluated on the original data
- events_Hstar_1
Integer. Number of events in the treatment arm of the new subgroup H*, evaluated on the original data
- events_Hcstar_0
Integer. Number of events in the control arm of the new subgroup H^c*, evaluated on the original data
- events_Hcstar_1
Integer. Number of events in the treatment arm of the new subgroup H^c*, evaluated on the original data
- tmins_search
Numeric. Minutes spent on subgroup search in this iteration
- tmins_iteration
Numeric. Total minutes for this bootstrap iteration
- Pcons
Numeric. Consistency p-value for top subgroup
- hr_sg
Numeric. Hazard ratio for top subgroup
- N_sg
Integer. Sample size of top subgroup
- E_sg
Integer. Number of events in top subgroup
- K_sg
Integer. Number of factors defining top subgroup
- g_sg
Numeric. Subgroup group ID
- m_sg
Numeric. Subgroup index
- M.1
Character. First factor label
- M.2
Character. Second factor label
- M.3
Character. Third factor label
- M.4
Character. Fourth factor label
- M.5
Character. Fifth factor label
- M.6
Character. Sixth factor label
- M.7
Character. Seventh factor label
Rows where no valid subgroup was found will have NA for bias corrections.
The returned object has a "timing" attribute with summary statistics.
Note
This function runs its bootstrap iterations in a foreach loop with the %dofuture% operator. It requires:
- All functions in get_bootstrap_exports to be available in the parallel workers
- Packages listed in BOOTSTRAP_REQUIRED_PACKAGES to be installed
- A parallel backend set up via setup_parallel_SGcons
Bias Correction Methods
Two bias correction approaches are implemented:
Method 1 (Simple Optimism): $$H_{adj1} = H_{obs} - (H^*_{*} - H^*_{obs})$$ where \(H^*_{*}\) is the new subgroup HR on bootstrap data and \(H^*_{obs}\) is the new subgroup HR on original data.
Method 2 (Double Bootstrap): $$H_{adj2} = 2 \times H_{obs} - (H_{*} + H^*_{*} - H^*_{obs})$$ where \(H_{*}\) is the original subgroup HR on bootstrap data.
The corresponding names used in the Value section are:
- H_obs: original subgroup HR on the original data
- H_star: original subgroup HR on the bootstrap data
- Hstar_obs: new subgroup (found in the bootstrap sample) HR on the original data
- Hstar_star: new subgroup (found in the bootstrap sample) HR on the bootstrap data
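A worked example makes the two corrections concrete. The values below are made up for a single bootstrap iteration and treated as log hazard ratios (an assumption consistent with the H_obs argument description); bias_adj_1 and bias_adj_2 are hypothetical helpers that transcribe the formulas as written, not the package's internal code.

```r
# Transcriptions of the two bias-correction formulas (inputs on the log-HR scale)
bias_adj_1 <- function(H_obs, Hstar_star, Hstar_obs) {
  H_obs - (Hstar_star - Hstar_obs)
}
bias_adj_2 <- function(H_obs, H_star, Hstar_star, Hstar_obs) {
  2 * H_obs - (H_star + Hstar_star - Hstar_obs)
}

# Illustrative (made-up) values for one bootstrap iteration
H_obs <- 0.50; H_star <- 0.55; Hstar_star <- 0.70; Hstar_obs <- 0.40
adj1 <- bias_adj_1(H_obs, Hstar_star, Hstar_obs)          # 0.50 - 0.30 = 0.20
adj2 <- bias_adj_2(H_obs, H_star, Hstar_star, Hstar_obs)  # 1.00 - 0.85 = 0.15
```

Here the optimism term Hstar_star - Hstar_obs = 0.30 measures how much more extreme the bootstrap-selected subgroup looks on the data it was selected from than on the original data; both corrections subtract it from the observed estimate. Because the helpers use only vectorized arithmetic, passing whole columns of per-iteration values returns one corrected estimate per iteration.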
Computational Details
- Uses the doFuture backend for parallel execution (configured externally)
- Sets reproducible seeds: 8316951 + boot * 100 for each iteration
- Each bootstrap iteration runs the full ForestSearch pipeline, including variable selection, subgroup search, and consistency evaluation
- Sequential execution within each bootstrap iteration prevents nested parallelization
- Failed bootstrap iterations generate warnings but do not stop execution
- Confounders are removed from the bootstrap data to force fresh variable selection
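The per-iteration seeding rule, and the reason seed must match the one used in bootstrap_ystar, can be illustrated with a base-R sketch. iteration_seed and draw_boot_index are hypothetical helpers, and drawing row indices with sample.int is an assumption about how bootstrap samples are formed; the package may draw them differently.

```r
# Per-iteration seed, following the stated rule: seed + boot * 100
iteration_seed <- function(boot, seed = 8316951L) seed + boot * 100

# Hypothetical helper: bootstrap row indices for iteration `boot`
# (note: set.seed() resets the global RNG state)
draw_boot_index <- function(boot, n, seed = 8316951L) {
  set.seed(iteration_seed(boot, seed))
  sample.int(n, size = n, replace = TRUE)
}

# The same (boot, seed) pair yields the same indices no matter which worker
# runs the iteration or in what order iterations are scheduled
idx_a <- draw_boot_index(boot = 3, n = 50)
idx_b <- draw_boot_index(boot = 3, n = 50)
identical(idx_a, idx_b)
```

Under this scheme, two functions that share the seed reconstruct identical resamples for each boot, which is what keeps bootstrap rows aligned across separately computed outputs.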
Bootstrap Configuration
Each bootstrap iteration modifies the ForestSearch arguments to:
- Suppress output: details, showten_subgroups, plot.sg, and plot.grf all set to FALSE
- Force re-selection: grf_res and grf_cuts set to NULL
- Prevent nested parallelization: parallel_args$plan = "sequential", workers = 1
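The overrides listed above can be sketched as a plain list transformation. configure_boot_args is a hypothetical helper and the starting values in args_call_all are placeholders. One R detail matters here: args$grf_res <- NULL would drop the element entirely, so the sketch assigns list(NULL) to keep the names present with NULL values. Whether ForestSearch keeps NULL entries or drops them is an assumption.

```r
# Placeholder original call arguments (illustrative values only)
args_call_all <- list(
  details = TRUE, showten_subgroups = TRUE, plot.sg = TRUE, plot.grf = TRUE,
  grf_res = list(placeholder = TRUE), grf_cuts = c("age <= 65"),
  parallel_args = list(plan = "multisession", workers = 6)
)

# Hypothetical helper applying the per-iteration overrides
configure_boot_args <- function(args) {
  # Suppress output
  args[c("details", "showten_subgroups", "plot.sg", "plot.grf")] <- FALSE
  # Force re-selection: keep the names but set the values to NULL
  args[c("grf_res", "grf_cuts")] <- list(NULL)
  # Prevent nested parallelization inside each worker
  args$parallel_args <- utils::modifyList(
    args$parallel_args, list(plan = "sequential", workers = 1)
  )
  args
}

boot_args <- configure_boot_args(args_call_all)
str(boot_args)
```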
Performance Considerations
- Typical runtime: 1-5 seconds per bootstrap iteration
- For 1000 bootstraps with 6 workers: roughly 3-14 minutes total, assuming near-linear speedup (1000 iterations at 1-5 s each, spread over 6 workers)
- Memory usage scales with dataset size and the number of workers
- Consider reducing nb_boots for initial testing (e.g., 100)
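Total wall-clock time follows from simple arithmetic under an idealized assumption of near-linear speedup; real runs add scheduling and data-transfer overhead. estimate_minutes is a hypothetical helper for planning, not part of the package.

```r
# Idealized wall-clock estimate: iterations * seconds each / workers, in minutes
estimate_minutes <- function(nb_boots, sec_per_iter, workers) {
  nb_boots * sec_per_iter / workers / 60
}

# 1000 bootstraps at 1 s and at 5 s per iteration on 6 workers
round(estimate_minutes(1000, sec_per_iter = c(1, 5), workers = 6), 1)
# about 2.8 and 13.9 minutes
```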
Error Handling
The function handles three failure modes gracefully:
- Bootstrap sample creation fails: returns a row with all NA
- ForestSearch fails to run: warns and returns a row with all NA
- ForestSearch runs but finds no subgroup: returns a row with all NA
In all three cases the foreach loop can still combine results via rbind.
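The "always return a row" pattern can be sketched with tryCatch. In this minimal sketch, lapply stands in for foreach(...) %dofuture% so that it runs without extra packages; na_row and run_one are hypothetical helpers, the column set is trimmed, and iterations 2 and 3 simulate an error and a "no subgroup found" result, respectively.

```r
# Hypothetical all-NA row (column set trimmed for illustration)
na_row <- function(boot) {
  data.frame(boot_id = boot, H_biasadj_1 = NA_real_, H_biasadj_2 = NA_real_)
}

# Hypothetical single iteration
run_one <- function(boot) {
  tryCatch({
    if (boot == 2) stop("simulated ForestSearch failure")
    sg <- if (boot == 3) NULL else list(est = 0.1 * boot)  # boot 3: no subgroup
    if (is.null(sg)) {
      na_row(boot)  # no subgroup: NA row, no warning
    } else {
      data.frame(boot_id = boot, H_biasadj_1 = sg$est, H_biasadj_2 = 2 * sg$est)
    }
  }, error = function(e) {
    warning(sprintf("Bootstrap %d failed: %s", boot, conditionMessage(e)),
            call. = FALSE)
    na_row(boot)  # error: warn, then NA row
  })
}

# Every iteration returns a same-shaped row, so rbind always succeeds
res <- do.call(rbind, lapply(seq_len(4), run_one))
res
```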
See also
forestsearch_bootstrap_dofuture for the wrapper function that sets up parallelization and calls this function
build_cox_formula for creating the Cox formula
fit_cox_models for initial Cox model fitting
get_Cox_sg for Cox model fitting on subgroups
get_dfRes for processing bootstrap results into confidence intervals
bootstrap_ystar for generating the Ystar matrix
Examples
if (FALSE) { # \dontrun{
  # Typically called via forestsearch_bootstrap_dofuture()
  # Manual usage for debugging:

  # 1. Fit initial ForestSearch model
  fs_result <- forestsearch(
    df.analysis = mydata,
    outcome.name = "time",
    event.name = "status",
    treat.name = "treatment",
    confounders.name = c("age", "sex", "stage")
  )

  # 2. Build Cox formula
  cox_formula <- build_cox_formula("time", "status", "treatment")

  # 3. Get observed estimates
  cox_fits <- fit_cox_models(fs_result$df.est, cox_formula)

  # 4. Set up parallel backend
  library(doFuture)
  plan(multisession, workers = 6)

  # 5. Run bootstrap (note: this is already parallelized internally)
  boot_results <- bootstrap_results(
    fs.est = fs_result,
    df_boot_analysis = fs_result$df.est,
    cox.formula.boot = cox_formula,
    nb_boots = 100,
    show_three = TRUE,
    H_obs = cox_fits$H_obs,
    Hc_obs = cox_fits$Hc_obs
  )

  # 6. Check results
  summary(boot_results)
  # Proportion of bootstraps that found a subgroup
  mean(!is.na(boot_results$H_biasadj_2))
} # }