
Evaluates candidate subgroups using split-sample consistency validation. For each candidate, repeatedly splits the data and checks whether the treatment effect direction is consistent across splits.
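
The idea of split-sample consistency can be illustrated with a minimal sketch. This is not the package's implementation: the helper `split_consistency()`, the 50/50 random split, and the rule that both halves must reach the HR threshold are assumptions made for illustration, and the data are simulated. Only the column names (Y, Event, Treat) and the default seed come from this page.

```r
library(survival)

# Hypothetical helper (not part of the package): proportion of random
# 50/50 splits in which the estimated hazard ratio reaches `hr_min`
# in BOTH halves of the subgroup data `d`.
split_consistency <- function(d, n_splits = 100, hr_min = 1, seed = 8316951) {
  set.seed(seed)
  half_hr <- function(x) {
    fit <- coxph(Surv(Y, Event) ~ Treat, data = x)
    unname(exp(coef(fit)))
  }
  ok <- vapply(seq_len(n_splits), function(i) {
    idx <- sample(nrow(d), size = floor(nrow(d) / 2))
    hr1 <- half_hr(d[idx, ])
    hr2 <- half_hr(d[-idx, ])
    # isTRUE() treats a non-estimable HR (NA) as "not consistent"
    isTRUE(hr1 >= hr_min && hr2 >= hr_min)
  }, logical(1))
  mean(ok)
}

# Simulated subgroup in which treatment increases the hazard
set.seed(1)
n <- 400
Treat <- rbinom(n, 1, 0.5)
t_event <- rexp(n, rate = 0.05 * exp(0.7 * Treat))
t_cens <- runif(n, 0, 40)
sim <- data.frame(
  Y = pmin(t_event, t_cens),
  Event = as.integer(t_event <= t_cens),
  Treat = Treat
)

p <- split_consistency(sim, n_splits = 50)
p  # compare against pconsistency.threshold (default 0.9)
```

In this sketch, a candidate would be retained when `p` is at least `pconsistency.threshold`.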

Usage

subgroup.consistency(
  df,
  hr.subgroups,
  hr.threshold = 1,
  hr.consistency = 1,
  pconsistency.threshold = 0.9,
  m1.threshold = Inf,
  n.splits = 100,
  details = FALSE,
  by.risk = 12,
  plot.sg = FALSE,
  maxk = 7,
  Lsg,
  confs_labels,
  sg_focus = "hr",
  stop_Kgroups = 10,
  stop_threshold = NULL,
  showten_subgroups = FALSE,
  pconsistency.digits = 2,
  seed = 8316951,
  checking = FALSE,
  use_twostage = FALSE,
  twostage_args = list(),
  parallel_args = list()
)

Arguments

df

Data frame containing the analysis dataset. Must include columns for outcome (Y), event indicator (Event), and treatment (Treat).

hr.subgroups

A data.table of candidate subgroups produced by the subgroup search, containing the columns HR, n, E, K, d0, d1, m0, m1, grp, and the factor indicators.

hr.threshold

Numeric. Minimum hazard ratio threshold for candidates. Default: 1.0

hr.consistency

Numeric. Minimum HR required in each split for consistency. Default: 1.0

pconsistency.threshold

Numeric. Minimum proportion of splits that must be consistent. Default: 0.9

m1.threshold

Numeric. Maximum m1 threshold for filtering. Default: Inf

n.splits

Integer. Number of splits for consistency evaluation. Default: 100

details

Logical. Print progress details. Default: FALSE

by.risk

Numeric. Risk interval for KM plots. Default: 12

plot.sg

Logical. Generate subgroup plots. Default: FALSE

maxk

Integer. Maximum number of factors in subgroup. Default: 7

Lsg

List of subgroup parameters.

confs_labels

Character vector mapping factor names to labels.

sg_focus

Character. Subgroup selection criterion: "hr", "maxSG", or "minSG". Default: "hr"

stop_Kgroups

Integer. Maximum number of candidates to evaluate. Default: 10

stop_threshold

Numeric or NULL. If specified, evaluation stops once any subgroup achieves consistency >= stop_threshold. This enables early termination when a sufficiently consistent subgroup is found. Default: NULL (evaluate all candidates up to stop_Kgroups).

When combined with HR-based sorting (sg_focus = "hr"), this ensures the highest-HR subgroup meeting the threshold is identified efficiently.

Note: For parallel execution, early stopping is checked after each batch completes, so some additional candidates beyond the first meeting the threshold may be evaluated. Use a smaller batch_size in parallel_args for finer-grained early stopping.
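
The interaction between batching and early stopping described in the note can be sketched as follows. The function `evaluate_with_early_stop()` and the vector `pcons` are hypothetical: `pcons` stands in for the consistency proportion each candidate would obtain, already in sorted order, and the loop only mimics the "check after each batch" behaviour rather than reproducing the package's code.

```r
# Hypothetical illustration of batch-wise early stopping.
# Returns the indices of the candidates that end up being evaluated.
evaluate_with_early_stop <- function(pcons, stop_threshold, batch_size) {
  evaluated <- integer(0)
  batches <- split(seq_along(pcons), ceiling(seq_along(pcons) / batch_size))
  for (b in batches) {
    evaluated <- c(evaluated, b)                 # the whole batch is evaluated
    if (any(pcons[b] >= stop_threshold)) break   # checked only once the batch is done
  }
  evaluated
}

pcons <- c(0.80, 0.91, 0.97, 0.99, 0.85, 0.96)

evaluate_with_early_stop(pcons, 0.95, batch_size = 1)  # candidates 1, 2, 3
evaluate_with_early_stop(pcons, 0.95, batch_size = 4)  # candidates 1, 2, 3, 4
```

With `batch_size = 4`, candidate 4 is evaluated even though candidate 3 already meets the threshold, which is the extra work the note refers to.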

showten_subgroups

Logical. If TRUE, prints up to 10 candidate subgroups after sorting by sg_focus, showing their rank, HR, sample size, events, and factor definitions. Useful for reviewing which candidates will be evaluated for consistency. Default: FALSE

pconsistency.digits

Integer. Decimal places for consistency proportion. Default: 2

seed

Integer. Random seed for reproducible consistency splits. Default: 8316951. Set to NULL for non-reproducible random splits. The seed is used both for sequential execution (via set.seed()) and parallel execution (via future.seed).
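
As a small base-R illustration of what a fixed seed buys in the sequential case, the sketch below draws the same set of random half-samples twice. The helper `draw_splits()` is hypothetical and only shows the effect of set.seed(); it does not reflect how the package draws its splits.

```r
# Hypothetical helper: draw `n_splits` random half-samples of 1..n.
# With seed = NULL the RNG state is left untouched (non-reproducible).
draw_splits <- function(n, n_splits, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(n_splits), function(i) sample(n, size = floor(n / 2)))
}

a <- draw_splits(200, 5, seed = 8316951)
b <- draw_splits(200, 5, seed = 8316951)
identical(a, b)  # TRUE: same seed, same splits
```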

checking

Logical. Enable additional validation checks. Default: FALSE

use_twostage

Logical. Use two-stage adaptive algorithm. Default: FALSE

twostage_args

List. Parameters for two-stage algorithm:

n.splits.screen

Splits for Stage 1 screening. Default: 30

screen.threshold

Consistency threshold for Stage 1. Default: auto

batch.size

Splits per batch in Stage 2. Default: 20

conf.level

Confidence level for early stopping. Default: 0.95

min.valid.screen

Minimum valid Stage 1 splits. Default: 10
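
One way a batched second stage with a confidence-based stopping rule could work is sketched below. This is an assumption-laden illustration, not the package's algorithm: the function `stage2_sketch()`, the use of an exact binomial interval from stats::binom.test(), and the pass/fail rule are all hypothetical. Only the default values for batch.size (20) and conf.level (0.95) are taken from the list above.

```r
# Hypothetical Stage 2 loop. `split_ok()` stands in for the verdict of a
# single split (TRUE = consistent). Splits are run in batches, and the loop
# stops once the confidence interval for the consistency proportion lies
# entirely on one side of `threshold`.
stage2_sketch <- function(split_ok, threshold = 0.9, n_max = 100,
                          batch_size = 20, conf_level = 0.95) {
  n_done <- 0
  n_ok <- 0
  decision <- "undecided"
  while (n_done < n_max && decision == "undecided") {
    n_new <- min(batch_size, n_max - n_done)
    n_ok <- n_ok + sum(vapply(seq_len(n_new), function(i) split_ok(), logical(1)))
    n_done <- n_done + n_new
    ci <- binom.test(n_ok, n_done, conf.level = conf_level)$conf.int
    if (ci[1] >= threshold) decision <- "pass"   # clearly above the threshold
    if (ci[2] < threshold) decision <- "fail"    # clearly below the threshold
  }
  list(decision = decision, n_splits_used = n_done, pcons = n_ok / n_done)
}

# A candidate whose splits are consistent only about half of the time
set.seed(8316951)
res <- stage2_sketch(function() runif(1) < 0.5)
str(res)
```

Under a rule of this kind, clearly inconsistent candidates use far fewer than n.splits splits, which is the motivation for an adaptive scheme.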

parallel_args

List. Parallel processing configuration:

plan

Future plan: "multisession", "multicore", or "sequential"

workers

Number of parallel workers

batch_size

Number of candidates to evaluate per batch. Smaller values provide finer-grained early stopping but may increase overhead. The default depends on the other settings: 1 when stop_threshold is set and sg_focus is "hr" or "minSG" (stop as soon as the first candidate passes); min(workers, n_candidates/4) when stop_threshold is set with any other sg_focus; workers*2 when stop_threshold is NULL, for efficiency.

show_message

Print parallel config messages
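
The default rule for batch_size described above can be written out as a short function. The name `default_batch_size()` is hypothetical, and the rounding of n_candidates/4 (floor, with a minimum of 1) is an assumption, since this page does not say how a fractional value is handled.

```r
# Hypothetical restatement of the documented batch_size defaults.
default_batch_size <- function(stop_threshold, sg_focus, workers, n_candidates) {
  if (is.null(stop_threshold)) {
    return(workers * 2)                    # no early stopping: favour throughput
  }
  if (sg_focus %in% c("hr", "minSG")) {
    return(1)                              # stop as soon as the first candidate passes
  }
  # Assumption: round down and never go below one candidate per batch
  max(1, floor(min(workers, n_candidates / 4)))
}

default_batch_size(NULL, "hr", workers = 6, n_candidates = 40)     # 12
default_batch_size(0.95, "hr", workers = 6, n_candidates = 40)     # 1
default_batch_size(0.95, "maxSG", workers = 6, n_candidates = 40)  # 6
```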

Value

A list containing:

out_sg

Selected subgroup results

sg_focus

Selection criterion used

df_flag

Data frame with treatment recommendations

sg.harm

Subgroup definition labels

sg.harm.id

Subgroup membership indicator

algorithm

"twostage" or "fixed"

n_candidates_evaluated

Number of candidates actually evaluated

n_candidates_total

Total candidates available

n_passed

Number meeting consistency threshold

early_stop_triggered

Logical indicating if early stop occurred

early_stop_candidate

Index of candidate triggering early stop

stop_threshold

Threshold used for early stopping

seed

Random seed used for reproducibility (NULL if not set)

Examples

if (FALSE) { # \dontrun{
# Standard evaluation
result <- subgroup.consistency(
  df = trial_data,
  hr.subgroups = candidates,
  sg_focus = "hr",
  n.splits = 400,
  parallel_args = list(plan = "multisession", workers = 6)
)

# Show top 10 candidates before evaluation
result <- subgroup.consistency(
  df = trial_data,
  hr.subgroups = candidates,
  sg_focus = "hr",
  showten_subgroups = TRUE,  # Display candidates
  n.splits = 400
)

# With early stopping and custom batch size
result <- subgroup.consistency(
  df = trial_data,
  hr.subgroups = candidates,
  sg_focus = "hr",
  stop_threshold = 0.95,
  showten_subgroups = TRUE,
  parallel_args = list(
    plan = "multisession",
    workers = 6,
    batch_size = 2  # Check early stopping after every 2 candidates
  )
)
} # }