Evaluates candidate subgroups using split-sample consistency validation. For each candidate, repeatedly splits the data and checks whether the treatment effect direction is consistent across splits.
Usage
subgroup.consistency(
df,
hr.subgroups,
hr.threshold = 1,
hr.consistency = 1,
pconsistency.threshold = 0.9,
m1.threshold = Inf,
n.splits = 100,
details = FALSE,
by.risk = 12,
plot.sg = FALSE,
maxk = 7,
Lsg,
confs_labels,
sg_focus = "hr",
stop_Kgroups = 10,
stop_threshold = NULL,
showten_subgroups = FALSE,
pconsistency.digits = 2,
seed = 8316951,
checking = FALSE,
use_twostage = FALSE,
twostage_args = list(),
parallel_args = list()
)
Arguments
- df
Data frame containing the analysis dataset. Must include columns for outcome (Y), event indicator (Event), and treatment (Treat).
- hr.subgroups
Data.table of candidate subgroups from subgroup search, containing columns: HR, n, E, K, d0, d1, m0, m1, grp, and factor indicators.
- hr.threshold
Numeric. Minimum hazard ratio threshold for candidates. Default: 1.0
- hr.consistency
Numeric. Minimum HR required in each split for consistency. Default: 1.0
- pconsistency.threshold
Numeric. Minimum proportion of splits that must be consistent. Default: 0.9
- m1.threshold
Numeric. Maximum m1 threshold for filtering. Default: Inf
- n.splits
Integer. Number of splits for consistency evaluation. Default: 100
- details
Logical. Print progress details. Default: FALSE
- by.risk
Numeric. Risk interval for KM plots. Default: 12
- plot.sg
Logical. Generate subgroup plots. Default: FALSE
- maxk
Integer. Maximum number of factors in subgroup. Default: 7
- Lsg
List of subgroup parameters.
- confs_labels
Character vector mapping factor names to labels.
- sg_focus
Character. Subgroup selection criterion: "hr", "maxSG", or "minSG". Default: "hr"
- stop_Kgroups
Integer. Maximum number of candidates to evaluate. Default: 10
- stop_threshold
Numeric in [0, 1] or NULL. When a candidate subgroup's consistency probability (Pcons) meets or exceeds this threshold, evaluation stops early and the remaining candidates are skipped. Set to NULL to disable early stopping and evaluate all candidates up to stop_Kgroups. Default: NULL.
Note: Values > 1.0 are not permitted. To disable early stopping, use stop_threshold = NULL, not a value above 1.
Interaction with sg_focus:
- "hr", "maxSG", "minSG"
Early stopping is valid because candidates are sorted by a single criterion. The first candidate passing the threshold is optimal under that criterion.
- "hrMaxSG", "hrMinSG"
Should generally be NULL, because these compound criteria require comparing HR and size across all candidates. forestsearch() automatically resets to NULL with a warning for these.
For parallel execution, early stopping is checked after each batch completes, so some additional candidates beyond the first meeting the threshold may be evaluated. Use a smaller batch_size in parallel_args for finer-grained early stopping.
- showten_subgroups
Logical. If TRUE, prints up to 10 candidate subgroups after sorting by sg_focus, showing their rank, HR, sample size, events, and factor definitions. Useful for reviewing which candidates will be evaluated for consistency. Default: FALSE
- pconsistency.digits
Integer. Decimal places for consistency proportion. Default: 2
- seed
Integer. Random seed for reproducible consistency splits. Default: 8316951. Set to NULL for non-reproducible random splits. The seed is used both for sequential execution (via set.seed()) and parallel execution (via future.seed).
- checking
Logical. Enable additional validation checks. Default: FALSE
- use_twostage
Logical. Use two-stage adaptive algorithm. Default: FALSE
- twostage_args
List. Parameters for two-stage algorithm:
- n.splits.screen
Splits for Stage 1 screening. Default: 30
- screen.threshold
Consistency threshold for Stage 1. Default: auto
- batch.size
Splits per batch in Stage 2. Default: 20
- conf.level
Confidence level for early stopping. Default: 0.95
- min.valid.screen
Minimum valid Stage 1 splits. Default: 10
- parallel_args
List. Parallel processing configuration:
- plan
Future plan: "multisession", "multicore", or "sequential"
- workers
Number of parallel workers
- batch_size
Number of candidates to evaluate per batch. Smaller values provide finer-grained early stopping but may increase overhead. The default depends on the other arguments: when stop_threshold is set and sg_focus is "hr" or "minSG", it is 1 (stop as soon as the first candidate passes); for other sg_focus values with stop_threshold set, it is min(workers, n_candidates/4); when stop_threshold is NULL, it is workers*2 for efficiency.
- show_message
Print parallel config messages
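The interaction between stop_threshold and batch_size described above can be illustrated with a small base-R sketch. It is not the package's implementation: evaluate_in_batches() is a hypothetical helper, and the pcons vector holds made-up consistency proportions standing in for the real per-candidate evaluation. It only shows why a larger batch can evaluate candidates beyond the first one that meets the threshold.

```r
# Hypothetical helper (not part of the package): walk through candidates in
# batches and stop after the first batch that contains a candidate whose
# consistency proportion meets stop_threshold. Assumes at least one candidate.
evaluate_in_batches <- function(pcons, batch_size, stop_threshold = NULL) {
  n <- length(pcons)
  evaluated <- integer(0)
  for (start in seq(1, n, by = batch_size)) {
    batch <- start:min(start + batch_size - 1, n)
    evaluated <- c(evaluated, batch)  # a batch is always evaluated in full
    if (!is.null(stop_threshold) && any(pcons[batch] >= stop_threshold)) break
  }
  evaluated
}

# Made-up consistency proportions for 8 candidates, assumed already sorted
# by the sg_focus criterion.
pcons <- c(0.62, 0.71, 0.93, 0.88, 0.95, 0.40, 0.55, 0.97)

length(evaluate_in_batches(pcons, batch_size = 4, stop_threshold = 0.9))   # 4
length(evaluate_in_batches(pcons, batch_size = 1, stop_threshold = 0.9))   # 3
length(evaluate_in_batches(pcons, batch_size = 4, stop_threshold = NULL))  # 8
```

With batch_size = 1 the loop stops at the third candidate, the first to reach 0.9; with batch_size = 4 the fourth candidate is evaluated as well because the check only runs once the batch is complete.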
Value
A list containing:
- out_sg
Selected subgroup results
- sg_focus
Selection criterion used
- df_flag
Data frame with treatment recommendations
- sg.harm
Subgroup definition labels
- sg.harm.id
Subgroup membership indicator
- algorithm
"twostage" or "fixed"
- n_candidates_evaluated
Number of candidates actually evaluated
- n_candidates_total
Total candidates available
- n_passed
Number meeting consistency threshold
- early_stop_triggered
Logical indicating if early stop occurred
- early_stop_candidate
Index of candidate triggering early stop
- stop_threshold
Threshold used for early stopping
- seed
Random seed used for reproducibility (NULL if not set)
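Examples
The split-sample check summarised at the top of this page can be sketched in base R. This is a simplified illustration under stated assumptions, not the function's actual code: crude_hr() and split_consistency() are hypothetical helpers, the hazard ratio is approximated by a crude ratio of event rates rather than a fitted survival model, each split is a random 50/50 partition, and the data are simulated. Column names Y, Event, and Treat follow the df argument.

```r
# Hypothetical helper: crude hazard ratio as the ratio of events per unit of
# follow-up time (treated / control). A stand-in for a model-based estimate.
crude_hr <- function(d) {
  rate <- function(x) sum(x$Event) / sum(x$Y)
  rate(d[d$Treat == 1, ]) / rate(d[d$Treat == 0, ])
}

# Hypothetical helper: proportion of random 50/50 splits in which BOTH halves
# have a crude HR above hr_consistency. Splits where an HR cannot be computed
# are counted as not consistent (an assumption of this sketch).
split_consistency <- function(d, n_splits = 100, hr_consistency = 1,
                              seed = 8316951) {
  set.seed(seed)
  n <- nrow(d)
  flags <- vapply(seq_len(n_splits), function(i) {
    idx <- sample.int(n, size = floor(n / 2))
    hr1 <- crude_hr(d[idx, ])
    hr2 <- crude_hr(d[-idx, ])
    isTRUE(hr1 > hr_consistency && hr2 > hr_consistency)
  }, logical(1))
  mean(flags)
}

# Simulated subgroup: exponential event times, treatment doubles the hazard,
# uniform censoring.
set.seed(1)
n <- 400
treat <- rbinom(n, 1, 0.5)
time <- rexp(n, rate = 0.05 * ifelse(treat == 1, 2, 1))
cens <- runif(n, 0, 40)
df <- data.frame(Y = pmin(time, cens),
                 Event = as.integer(time <= cens),
                 Treat = treat)

p_cons <- split_consistency(df, n_splits = 100)
p_cons >= 0.9  # compare against a pconsistency.threshold of 0.9
```

Fixing the seed inside the helper mirrors the role of the seed argument: repeated calls on the same data give the same proportion.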