ForestSearch K-Fold Cross-Validation
Source: R/forestsearch_cross-validation.R
forestsearch_Kfold.Rd
This function assesses the stability and reproducibility of ForestSearch subgroup identification through cross-validation. For each fold:
Train ForestSearch on (K-1) folds
Apply the identified subgroup to the held-out fold
Compare predictions to the original full-data analysis
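The three steps above can be sketched in base R as follows. This is only an illustration of the general train/apply/compare loop, not the package's implementation: `fit_rule()` and `apply_rule()` are hypothetical stand-ins for the ForestSearch subgroup search, and the toy data frame and its `biomarker` column are assumptions.

```r
# Toy data (hypothetical; not from the package)
set.seed(8316951)
n <- 60
df <- data.frame(id = seq_len(n), biomarker = rnorm(n))

# Hypothetical stand-ins for a subgroup search:
# learn a cutoff on training data, then classify new observations with it
fit_rule <- function(train) median(train$biomarker)
apply_rule <- function(cutoff, newdata) as.integer(newdata$biomarker > cutoff)

# Random assignment of observations to K roughly equal folds
K <- 5
cvindex <- sample(rep(seq_len(K), length.out = n))

# Full-data analysis used as the reference
df$treat.recommend.original <- apply_rule(fit_rule(df), df)
df$treat.recommend <- NA_integer_

for (k in seq_len(K)) {
  test <- cvindex == k
  cutoff_k <- fit_rule(df[!test, ])                               # train on K-1 folds
  df$treat.recommend[test] <- apply_rule(cutoff_k, df[test, ])    # apply to held-out fold
}

# Compare CV predictions with the full-data predictions
agreement <- mean(df$treat.recommend == df$treat.recommend.original)
```

Each observation receives exactly one out-of-fold prediction, so `agreement` summarizes how often the cross-validated rule matches the full-data rule.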
Arguments
- fs.est
List. ForestSearch results object from forestsearch. Must contain df.est (data frame) and args_call_all (list of arguments).
- Kfolds
Integer. Number of folds (default: nrow(fs.est$df.est) for LOO).
- seedit
Integer. Random seed for fold assignment (default: 8316951).
- parallel_args
List. Parallelization configuration with elements:
plan: Character. One of "multisession", "multicore", "sequential"
workers: Integer. Number of parallel workers
show_message: Logical. Show parallel setup messages
- sg0.name
Character. Label for subgroup 0 (default: "Not recommend").
- sg1.name
Character. Label for subgroup 1 (default: "Recommend").
- details
Logical. Print progress details (default: FALSE).
Value
List with components:
- resCV
Data frame with CV predictions for each observation
- cv_args
Arguments used for CV ForestSearch calls
- timing_minutes
Execution time in minutes
- prop_SG_found
Percentage of folds where a subgroup was found
- sg_analysis
Original subgroup definition from full-data analysis
- sg0.name, sg1.name
Subgroup labels
- Kfolds
Number of folds used
- sens_summary
Named vector of sensitivity metrics (sens_H, sens_Hc, ppv_H, ppv_Hc)
- find_summary
Named vector of subgroup-finding metrics (Any, Exact, etc.)
Details
Performs K-fold cross-validation for ForestSearch, evaluating subgroup identification and agreement between training and test sets.
Cross-Validation Types
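Leave-One-Out (LOO): When Kfolds = nrow(df), each observation is held out once. This is the most thorough option but also the most computationally intensive, since ForestSearch is refit once per observation.
K-Fold: When Kfolds < nrow(df), data is split into K roughly equal folds. This offers a reasonable bias-variance tradeoff at lower computational cost.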
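To make the difference between the two settings concrete, the sketch below compares fold sizes for a small sample. `make_folds()` is a hypothetical helper written for this illustration; the package may assign folds differently.

```r
# Hypothetical helper: randomly assign n observations to Kfolds folds
make_folds <- function(n, Kfolds, seed = 8316951) {
  set.seed(seed)
  sample(rep(seq_len(Kfolds), length.out = n))
}

n <- 23

# K-fold: fold sizes differ by at most one observation
kfold_sizes <- as.integer(table(make_folds(n, Kfolds = 5)))

# LOO: every fold contains exactly one observation, so n models are fit
loo_sizes <- as.integer(table(make_folds(n, Kfolds = n)))
```

With `n = 23`, the 5-fold split requires 5 model fits and LOO requires 23, which is why LOO becomes costly as the sample size grows.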
Output Metrics
The returned resCV data frame contains:
treat.recommend: Prediction from CV model
treat.recommend.original: Prediction from full-data model
cvindex: Fold assignment
sg1, sg2: Subgroup definitions found in each fold
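As an illustration of how these columns can be compared, the sketch below cross-tabulates CV and full-data predictions on a small made-up data frame and derives sensitivity and positive predictive value for the "not recommend" group (coded 0 here), treating the full-data prediction as the reference. The data, the 0/1 coding, and these metric definitions are assumptions; the package's sens_summary may be computed differently.

```r
# Made-up data with the same column names as resCV (values are illustrative only)
resCV <- data.frame(
  treat.recommend          = c(1, 1, 0, 0, 1, 0, 1, 1),
  treat.recommend.original = c(1, 1, 0, 1, 1, 0, 0, 1),
  cvindex                  = c(1, 1, 2, 2, 3, 3, 4, 4)
)

# Rows: CV prediction; columns: full-data prediction
tab <- table(cv = resCV$treat.recommend,
             original = resCV$treat.recommend.original)

# Among observations the full-data model places in group 0,
# the share that the CV model also places in group 0
sens_group0 <- tab["0", "0"] / sum(tab[, "0"])

# Among observations the CV model places in group 0,
# the share that the full-data model also places in group 0
ppv_group0 <- tab["0", "0"] / sum(tab["0", ])

# Overall agreement between the two predictions
agreement <- mean(resCV$treat.recommend == resCV$treat.recommend.original)
```

The same calculation with `"1"` in place of `"0"` gives the corresponding values for the "recommend" group.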
See also
forestsearch for initial subgroup identification
forestsearch_KfoldOut for summarizing CV results
forestsearch_tenfold for repeated K-fold simulations
Examples
if (FALSE) { # \dontrun{
# Run initial ForestSearch
fs_result <- forestsearch(
  df.analysis = trial_data,
  outcome.name = "time",
  event.name = "status",
  treat.name = "treatment",
  confounders.name = c("age", "biomarker")
)

# Run 10-fold cross-validation
cv_results <- forestsearch_Kfold(
  fs.est = fs_result,
  Kfolds = 10,
  parallel_args = list(plan = "multisession", workers = 4),
  details = TRUE
)

# Summarize results
cv_summary <- forestsearch_KfoldOut(cv_results, outall = TRUE)
} # }