
Evaluates candidate subgroups using split-sample consistency validation. For each candidate, repeatedly splits the data and checks whether the treatment effect direction is consistent across splits.
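The split-sample idea can be illustrated in a few lines of R. This is not the package's implementation. `consistency_sketch()` is a hypothetical helper, and it assumes three things: each split randomly halves the candidate subgroup; `coxph(Surv(Y, Event) ~ Treat)` from the survival package is fitted in each half; and a split counts as consistent when both halves have HR >= hr.consistency. Pcons is then the proportion of consistent splits.

```r
library(survival)

# Hypothetical helper (not part of the package): split-sample consistency
# for one candidate subgroup. Assumes each split halves the subgroup at
# random and is consistent when HR >= hr.consistency in BOTH halves.
consistency_sketch <- function(sub_df, n.splits = 100, hr.consistency = 1,
                               seed = 8316951) {
  if (!is.null(seed)) set.seed(seed)  # reproducible splits (resets global RNG)
  n <- nrow(sub_df)
  flags <- vapply(seq_len(n.splits), function(i) {
    idx <- sample.int(n, size = floor(n / 2))
    halves <- list(sub_df[idx, ], sub_df[-idx, ])
    hrs <- vapply(halves, function(h) {
      fit <- coxph(Surv(Y, Event) ~ Treat, data = h)
      exp(unname(coef(fit)))  # treatment hazard ratio in this half
    }, numeric(1))
    isTRUE(all(hrs >= hr.consistency))  # NA (failed fit) counts as inconsistent
  }, logical(1))
  mean(flags)  # Pcons: proportion of consistent splits
}

# Simulated subgroup in which Treat = 1 has twice the hazard (HR = 2)
set.seed(1)
n <- 200
sim <- data.frame(Treat = rep(0:1, each = n / 2))
sim$Y <- rexp(n, rate = ifelse(sim$Treat == 1, 0.2, 0.1))
sim$Event <- 1
pcons <- consistency_sketch(sim, n.splits = 20)
pcons >= 0.9  # compare with a pconsistency.threshold-style cutoff
```

Because the helper calls set.seed(), repeated calls with the same seed produce the same splits. This mirrors the role of the seed argument for sequential execution.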

Usage

subgroup.consistency(
  df,
  hr.subgroups,
  hr.threshold = 1,
  hr.consistency = 1,
  pconsistency.threshold = 0.9,
  m1.threshold = Inf,
  n.splits = 100,
  details = FALSE,
  by.risk = 12,
  plot.sg = FALSE,
  maxk = 7,
  Lsg,
  confs_labels,
  sg_focus = "hr",
  stop_Kgroups = 10,
  stop_threshold = NULL,
  showten_subgroups = FALSE,
  pconsistency.digits = 2,
  seed = 8316951,
  checking = FALSE,
  use_twostage = FALSE,
  twostage_args = list(),
  parallel_args = list()
)

Arguments

df

Data frame containing the analysis dataset. Must include columns for outcome (Y), event indicator (Event), and treatment (Treat).

hr.subgroups

data.table of candidate subgroups from the subgroup search, containing columns HR, n, E, K, d0, d1, m0, m1, grp, and factor indicators.

hr.threshold

Numeric. Minimum hazard ratio threshold for candidates. Default: 1.0

hr.consistency

Numeric. Minimum HR required in each split for consistency. Default: 1.0

pconsistency.threshold

Numeric. Minimum proportion of splits that must be consistent. Default: 0.9

m1.threshold

Numeric. Upper threshold on m1 used to filter candidates. Default: Inf

n.splits

Integer. Number of splits for consistency evaluation. Default: 100

details

Logical. Print progress details. Default: FALSE

by.risk

Numeric. Time interval for the number-at-risk display in Kaplan-Meier (KM) plots. Default: 12

plot.sg

Logical. Generate subgroup plots. Default: FALSE

maxk

Integer. Maximum number of factors in subgroup. Default: 7

Lsg

List of subgroup parameters.

confs_labels

Character vector mapping factor names to labels.

sg_focus

Character. Subgroup selection criterion: "hr", "maxSG", "minSG", or the compound criteria "hrMaxSG" and "hrMinSG". Default: "hr"

stop_Kgroups

Integer. Maximum number of candidates to evaluate. Default: 10

stop_threshold

Numeric in [0, 1] or NULL. When a candidate subgroup's consistency probability (Pcons) meets or exceeds this threshold, evaluation stops early and the remaining candidates are skipped. Set to NULL to disable early stopping and evaluate all candidates up to stop_Kgroups. Default: NULL.

Note: Values > 1.0 are not permitted. To disable early stopping, use stop_threshold = NULL, not a value above 1.

Interaction with sg_focus:

"hr", "maxSG", "minSG"

Early stopping is valid because candidates are sorted by a single criterion. The first candidate passing the threshold is optimal under that criterion.

"hrMaxSG", "hrMinSG"

stop_threshold should generally be NULL, because these compound criteria require comparing HR and size across all candidates. forestsearch() automatically resets it to NULL, with a warning, for these criteria.

For parallel execution, early stopping is checked after each batch completes, so some additional candidates beyond the first meeting the threshold may be evaluated. Use a smaller batch_size in parallel_args for finer-grained early stopping.
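The early-stopping rule, including the extra evaluations caused by batching, can be sketched as follows. `early_stop_scan()` is a hypothetical helper, not the package's code. It takes Pcons values that are assumed to be already ordered by sg_focus and truncated to stop_Kgroups, and it evaluates them in batches. With batch_size = 1 it behaves like sequential execution.

```r
# Hypothetical sketch: early stopping over candidates already sorted by
# sg_focus and truncated to stop_Kgroups. pcons = each candidate's Pcons.
early_stop_scan <- function(pcons, stop_threshold = NULL, batch_size = 1) {
  if (!is.null(stop_threshold) && (stop_threshold < 0 || stop_threshold > 1)) {
    stop("stop_threshold must be in [0, 1]; use NULL to disable early stopping")
  }
  n <- length(pcons)
  if (n == 0L) return(list(n_evaluated = 0L, early_stop_candidate = NA_integer_))
  evaluated <- 0L
  for (s in seq(1L, n, by = batch_size)) {
    batch <- s:min(s + batch_size - 1L, n)
    evaluated <- max(batch)  # the whole batch finishes before the check
    if (!is.null(stop_threshold)) {
      hits <- batch[pcons[batch] >= stop_threshold]
      if (length(hits) > 0L) {
        return(list(n_evaluated = evaluated, early_stop_candidate = hits[1]))
      }
    }
  }
  list(n_evaluated = evaluated, early_stop_candidate = NA_integer_)
}

p <- c(0.72, 0.85, 0.93, 0.95, 0.60)
early_stop_scan(p, stop_threshold = 0.9, batch_size = 1)  # stops after candidate 3
early_stop_scan(p, stop_threshold = 0.9, batch_size = 4)  # evaluates 4, stop candidate 3
early_stop_scan(p, stop_threshold = NULL, batch_size = 2) # evaluates all 5
```

In the batched call, candidate 4 is evaluated even though candidate 3 already passed. This is the "additional candidates beyond the first meeting the threshold" effect described above.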

showten_subgroups

Logical. If TRUE, prints up to 10 candidate subgroups after sorting by sg_focus, showing their rank, HR, sample size, events, and factor definitions. Useful for reviewing which candidates will be evaluated for consistency. Default: FALSE
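To show what "sorted by sg_focus" could mean, the following sketch orders a toy candidate table and keeps the top ten rows. It is an illustration only. `order_candidates()` is hypothetical, the orderings ("hr" = largest HR first, "maxSG" = largest n first, "minSG" = smallest n first, with HR as a tie-breaker) are assumptions, and a plain data.frame stands in for the data.table hr.subgroups.

```r
# Hypothetical sketch: order candidates by sg_focus and add a rank column.
# Assumed orderings; column names HR, n, E, grp follow hr.subgroups.
order_candidates <- function(cands, sg_focus = c("hr", "maxSG", "minSG")) {
  sg_focus <- match.arg(sg_focus)
  ord <- switch(sg_focus,
    hr    = order(-cands$HR),
    maxSG = order(-cands$n, -cands$HR),
    minSG = order(cands$n, -cands$HR)
  )
  out <- cands[ord, , drop = FALSE]
  out$rank <- seq_len(nrow(out))
  rownames(out) <- NULL
  out
}

cands <- data.frame(HR = c(1.4, 2.1, 1.8), n = c(300, 120, 210),
                    E = c(90, 40, 70), grp = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
top10 <- head(order_candidates(cands, "hr"), 10)  # at most ten rows
print(top10[, c("rank", "HR", "n", "E", "grp")])
```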

pconsistency.digits

Integer. Decimal places for consistency proportion. Default: 2

seed

Integer. Random seed for reproducible consistency splits. Default: 8316951. Set to NULL for non-reproducible random splits. The seed is used both for sequential execution (via set.seed()) and parallel execution (via future.seed).

checking

Logical. Enable additional validation checks. Default: FALSE

use_twostage

Logical. Use two-stage adaptive algorithm. Default: FALSE

twostage_args

List. Parameters for two-stage algorithm:

n.splits.screen

Number of splits for Stage 1 screening. Default: 30

screen.threshold

Consistency threshold for Stage 1. Default: auto

batch.size

Splits per batch in Stage 2. Default: 20

conf.level

Confidence level for early stopping. Default: 0.95

min.valid.screen

Minimum valid Stage 1 splits. Default: 10
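The following is a rough sketch of how the two-stage parameters could interact. It is not the package's algorithm. `twostage_sketch()` and `wilson_ci()` are hypothetical, and the sketch makes several substitutions. The Wilson score interval stands in for whatever interval is used at conf.level. `split_fn()` stands in for one consistency split, returning TRUE, FALSE, or NA for an invalid split. n.splits is treated as the total split budget. screen.threshold is passed explicitly, because the "auto" rule is not described here.

```r
# Wilson score interval for a binomial proportion (stand-in for the
# interval used at conf.level; an assumption of this sketch).
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(lower = centre - half, upper = centre + half)
}

# Hypothetical two-stage sketch: screen, then add batches until the CI
# settles above or below pconsistency.threshold or the budget is used up.
twostage_sketch <- function(split_fn, n.splits = 100, pconsistency.threshold = 0.9,
                            n.splits.screen = 30, screen.threshold = 0.8,
                            batch.size = 20, conf.level = 0.95,
                            min.valid.screen = 10) {
  res <- replicate(n.splits.screen, split_fn())  # Stage 1: screening
  valid <- res[!is.na(res)]
  if (length(valid) < min.valid.screen || !isTRUE(mean(valid) >= screen.threshold)) {
    return(list(pass = FALSE, stage = 1L, n_used = length(res),
                pcons = if (length(valid) > 0L) mean(valid) else NA_real_))
  }
  while (length(res) < n.splits) {               # Stage 2: batches
    k <- min(batch.size, n.splits - length(res))
    res <- c(res, replicate(k, split_fn()))
    valid <- res[!is.na(res)]
    ci <- wilson_ci(sum(valid), length(valid), conf.level)
    if (ci[["lower"]] >= pconsistency.threshold ||
        ci[["upper"]] < pconsistency.threshold) break  # decision reached
  }
  pcons <- mean(valid)
  list(pass = pcons >= pconsistency.threshold, stage = 2L,
       n_used = length(res), pcons = pcons)
}

twostage_sketch(function() TRUE)   # passes after 30 + 20 splits
twostage_sketch(function() FALSE)  # dropped at screening after 30 splits
```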

parallel_args

List. Parallel processing configuration:

plan

Plan for the future package: "multisession", "multicore", or "sequential"

workers

Number of parallel workers

batch_size

Number of candidates to evaluate per batch. Smaller values provide finer-grained early stopping but may increase overhead. Default: When stop_threshold is set and sg_focus is "hr" or "minSG", defaults to 1 (stop immediately when first candidate passes). For other sg_focus values with stop_threshold, defaults to min(workers, n_candidates/4). When stop_threshold is NULL, defaults to workers*2 for efficiency.

show_message

Print parallel configuration messages
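The documented batch_size defaults can be restated as a small function, shown here with a parallel_args list built from the documented element names. `default_batch_size()` is hypothetical. Rounding n_candidates/4 down and enforcing a minimum of one are assumptions, because the text does not say how fractional values are handled.

```r
# Hypothetical restatement of the documented batch_size defaults.
default_batch_size <- function(workers, n_candidates, stop_threshold = NULL,
                               sg_focus = "hr") {
  if (is.null(stop_threshold)) {
    return(workers * 2)               # no early stopping: favour throughput
  }
  if (sg_focus %in% c("hr", "minSG")) {
    return(1)                         # stop as soon as one candidate passes
  }
  # Assumption: round down and keep at least one candidate per batch.
  max(1, min(workers, floor(n_candidates / 4)))
}

parallel_args <- list(
  plan = "multisession",
  workers = 4,
  batch_size = default_batch_size(4, 40, stop_threshold = 0.95, sg_focus = "maxSG"),
  show_message = TRUE
)
parallel_args$batch_size  # min(4, 40 / 4) = 4
```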

Value

A list containing:

out_sg

Selected subgroup results

sg_focus

Selection criterion used

df_flag

Data frame with treatment recommendations

sg.harm

Subgroup definition labels

sg.harm.id

Subgroup membership indicator

algorithm

"twostage" or "fixed"

n_candidates_evaluated

Number of candidates actually evaluated

n_candidates_total

Total candidates available

n_passed

Number of candidates meeting the consistency threshold

early_stop_triggered

Logical indicating if early stop occurred

early_stop_candidate

Index of candidate triggering early stop

stop_threshold

Threshold used for early stopping

seed

Random seed used for reproducibility (NULL if not set)