nvflare.app_opt.xgboost.recipes package

Module contents

class XGBBaggingRecipe(name: str, min_clients: int, training_mode: str = 'bagging', num_rounds: int | None = None, num_client_bagging: int | None = None, num_local_parallel_tree: int = 1, local_subsample: float = 0.8, learning_rate: float = 0.1, objective: str = 'binary:logistic', max_depth: int = 8, eval_metric: str = 'auc', tree_method: str = 'hist', use_gpus: bool = False, nthread: int = 16, lr_mode: str = 'uniform', save_name: str = 'xgboost_model.json', data_loader_id: str = 'dataloader', per_site_config: dict[str, dict] | None = None)[source]

Bases: Recipe

XGBoost Tree-Based Recipe for federated learning (supports Bagging and Cyclic modes).

This recipe implements tree-based federated XGBoost with two training modes:

  • Bagging: Each client trains a local sub-forest, and the sub-forests are aggregated on the server (federated Random Forest).

  • Cyclic: Clients train sequentially in rounds, each contributing to the global model.

Parameters:
  • name (str) – Name of the federated job.

  • min_clients (int) – The minimum number of clients for the job.

  • training_mode (str, optional) – Training mode (“bagging” or “cyclic”). Default is “bagging”.

  • num_rounds (int, optional) – Number of training rounds. Default is 1 for bagging, 100 for cyclic.

  • num_client_bagging (int, optional) – Number of clients for bagging. Default is min_clients.

  • num_local_parallel_tree (int, optional) – Number of parallel trees per client. Default is 1.

  • local_subsample (float, optional) – Subsample ratio for local training. Default is 0.8.

  • learning_rate (float, optional) – Learning rate for XGBoost. Default is 0.1.

  • objective (str, optional) – Learning objective. Default is “binary:logistic”.

  • max_depth (int, optional) – Maximum tree depth. Default is 8.

  • eval_metric (str, optional) – Evaluation metric. Default is “auc”.

  • tree_method (str, optional) – Tree construction method. Default is “hist”.

  • use_gpus (bool, optional) – Whether to use GPUs. Default is False.

  • nthread (int, optional) – Number of threads. Default is 16.

  • lr_mode (str, optional) – Learning rate mode (“uniform” or “scaled”). Default is “uniform”.

  • save_name (str, optional) – Model save name. Default is “xgboost_model.json”.

  • data_loader_id (str, optional) – ID of the data loader component. Default is “dataloader”.

  • per_site_config (dict, optional) – Per-site configuration mapping site names to config dicts. Each config dict must contain a 'data_loader' key with an XGBDataLoader instance, and can optionally include 'lr_scale' for scaled learning rate mode. Example: {"site-1": {"data_loader": CSVDataLoader(…), "lr_scale": 0.5}, "site-2": {…}}
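The two mode-dependent defaults described above (num_rounds and num_client_bagging) can be sketched in plain Python. The helper name resolve_tree_defaults is hypothetical and is not part of NVFlare; it only restates the documented defaults and does not reproduce the recipe's actual code.

```python
from typing import Optional


def resolve_tree_defaults(
    training_mode: str,
    min_clients: int,
    num_rounds: Optional[int] = None,
    num_client_bagging: Optional[int] = None,
) -> dict:
    """Hypothetical helper mirroring the documented defaults (not NVFlare code)."""
    if training_mode not in ("bagging", "cyclic"):
        raise ValueError(
            f"training_mode must be 'bagging' or 'cyclic', got {training_mode!r}"
        )
    if num_rounds is None:
        # Documented default: 1 round for bagging, 100 rounds for cyclic.
        num_rounds = 1 if training_mode == "bagging" else 100
    if num_client_bagging is None:
        # Documented default: same value as min_clients.
        num_client_bagging = min_clients
    return {
        "training_mode": training_mode,
        "num_rounds": num_rounds,
        "num_client_bagging": num_client_bagging,
    }
```

Calling resolve_tree_defaults("cyclic", 3) yields num_rounds=100, while the same call with "bagging" yields num_rounds=1. Explicitly passed values are left unchanged.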

Example

from nvflare.app_opt.xgboost.recipes import XGBBaggingRecipe
from nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader import CSVDataLoader
from nvflare.recipe import SimEnv

# Bagging mode (federated Random Forest) with uniform learning rate
recipe = XGBBaggingRecipe(
    name="random_forest",
    min_clients=3,
    training_mode="bagging",
    num_rounds=1,
    num_local_parallel_tree=5,
    local_subsample=0.5,
    per_site_config={
        "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data")},
        "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data")},
        "site-3": {"data_loader": CSVDataLoader(folder="/tmp/data")},
    },
)

# Or with scaled learning rate (data-size dependent)
recipe = XGBBaggingRecipe(
    name="random_forest_scaled",
    min_clients=3,
    training_mode="bagging",
    lr_mode="scaled",
    per_site_config={
        "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.5},
        "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.3},
        "site-3": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.2},
    },
)

env = SimEnv(num_clients=3)
run = recipe.execute(env)
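The scaled example above uses lr_scale values of 0.5, 0.3 and 0.2, which is consistent with each site's share of the total training samples. Assuming that interpretation (this page does not state how lr_scale should be chosen), a minimal sketch for deriving the values from per-site sample counts could look like this. The helper name and the sample counts are hypothetical.

```python
def lr_scales_from_counts(sample_counts: dict[str, int]) -> dict[str, float]:
    """Return each site's fraction of the total sample count.

    Assumption: lr_scale is meant to be proportional to local data size.
    """
    total = sum(sample_counts.values())
    if total <= 0:
        raise ValueError("sample_counts must contain at least one positive count")
    return {site: count / total for site, count in sample_counts.items()}


# Hypothetical sample counts per site.
scales = lr_scales_from_counts({"site-1": 5000, "site-2": 3000, "site-3": 2000})
```

Each resulting value can then be placed under the 'lr_scale' key of the matching per_site_config entry.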

This is the base class of a recipe. Recipes are implemented by jobs. A concrete recipe must provide the job that implements the recipe.

Parameters:

job – the job that implements the recipe.

configure()[source]

Configure the federated job for XGBoost tree-based training.

class XGBHorizontalRecipe(name: str, min_clients: int, num_rounds: int, early_stopping_rounds: int = 2, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', per_site_config: dict[str, dict] | None = None)[source]

Bases: Recipe

XGBoost Horizontal Federated Learning Recipe.

This recipe implements horizontal federated XGBoost using histogram-based algorithms. In horizontal federated learning, each client has different samples with the same features. The histogram-based approach enables efficient gradient boosting by computing histograms of gradients and hessians collaboratively across clients.

Parameters:
  • name (str) – Name of the federated job.

  • min_clients (int) – The minimum number of clients for the job.

  • num_rounds (int) – Number of boosting rounds.

  • early_stopping_rounds (int, optional) – Early stopping rounds. Default is 2.

  • use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.

  • secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured. When secure=True, client_ranks must be provided.

  • client_ranks (dict, optional) – Mapping of client names to ranks for secure training. Required when secure=True. Maps each client name to a unique rank (0-indexed). Example: {"site-1": 0, "site-2": 1, "site-3": 2}.

  • xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective='binary:logistic', eval_metric='auc', tree_method='hist', nthread=16.

  • data_loader_id (str, optional) – ID of the data loader component. Default is ‘dataloader’.

  • metrics_writer_id (str, optional) – ID of the metrics writer component. Default is ‘metrics_writer’.

  • per_site_config (dict) – Per-site configuration mapping site names to config dicts. Each config dict must contain a 'data_loader' key with an XGBDataLoader instance. Example: {"site-1": {"data_loader": CSVDataLoader(…)}, "site-2": {…}}
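The secure and client_ranks requirements above can be checked before the recipe is constructed. The following is a minimal sketch, not NVFlare code: the function name is hypothetical, and reading "unique rank (0-indexed)" as "ranks are exactly 0..n-1" is an assumption.

```python
from typing import Optional


def check_client_ranks(
    secure: bool, client_ranks: Optional[dict], clients: list[str]
) -> None:
    """Hypothetical pre-flight check for the documented client_ranks rules."""
    if not secure:
        return  # client_ranks is only documented as required for secure training
    if not client_ranks:
        raise ValueError("client_ranks must be provided when secure=True")
    missing = [name for name in clients if name not in client_ranks]
    if missing:
        raise ValueError(f"clients without a rank: {missing}")
    ranks = sorted(client_ranks[name] for name in clients)
    # Assumption: unique, 0-indexed ranks means exactly 0, 1, ..., n-1.
    if ranks != list(range(len(clients))):
        raise ValueError(f"ranks must be unique and 0-indexed, got {ranks}")
```

For example, check_client_ranks(True, {"site-1": 0, "site-2": 1}, ["site-1", "site-2"]) passes silently, while omitting client_ranks with secure=True raises ValueError.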

Example

from nvflare.app_opt.xgboost.recipes import XGBHorizontalRecipe
from nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader import CSVDataLoader
from nvflare.recipe import SimEnv

# Build per-site configuration with data loaders
per_site_config = {
    "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data/horizontal_xgb_data")},
    "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data/horizontal_xgb_data")},
}

# Create recipe
recipe = XGBHorizontalRecipe(
    name="xgb_higgs_horizontal",
    min_clients=2,
    num_rounds=100,
    xgb_params={
        "max_depth": 8,
        "eta": 0.1,
        "objective": "binary:logistic",
        "eval_metric": "auc",
    },
    per_site_config=per_site_config,
)

# Run simulation with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)

Note

  • Data loaders must be configured via per_site_config parameter.

  • TensorBoard tracking is automatically configured for both server and clients.

  • Executor and metrics components are automatically added to all clients.
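The xgb_params description states only that the listed defaults are used when xgb_params is None. It does not say whether a partially specified dict is merged with those defaults. To avoid depending on that, one option is to merge on the caller side and pass a complete dict. The sketch below assumes nothing about the recipe internals; with_documented_defaults is a hypothetical helper name.

```python
from typing import Optional

# Defaults as listed in the xgb_params parameter description.
DOCUMENTED_DEFAULTS = {
    "max_depth": 8,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 16,
}


def with_documented_defaults(overrides: Optional[dict] = None) -> dict:
    """Return a full parameter dict: documented defaults plus caller overrides."""
    params = dict(DOCUMENTED_DEFAULTS)  # copy so the module-level dict stays intact
    params.update(overrides or {})
    return params


xgb_params = with_documented_defaults({"max_depth": 6, "eval_metric": "logloss"})
```

The resulting xgb_params dict can be passed as the xgb_params argument, so every parameter sent to xgboost.train() is spelled out explicitly.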

configure()[source]

Configure the federated job for XGBoost histogram-based training.

class XGBVerticalRecipe(name: str, min_clients: int, num_rounds: int, label_owner: str, early_stopping_rounds: int = 3, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', in_process: bool = True, model_file_name: str = 'test.model.json', per_site_config: dict[str, dict] | None = None)[source]

Bases: Recipe

XGBoost Vertical Federated Learning Recipe.

This recipe implements vertical federated XGBoost where different clients have different features for the same samples. In vertical FL, data is split by columns (features) rather than rows (samples).

Key concepts:

  • Vertical split: Each client has different features, same sample IDs

  • Label owner: Only one client has the target labels

  • PSI required: Private Set Intersection must be run first to align sample IDs

  • Histogram-based: Uses histogram_v2 algorithm for vertical collaboration
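To make the vertical split concrete, here is a small illustration with made-up in-memory data. A plain set intersection stands in for PSI purely to show what alignment means; real PSI computes the intersection without revealing non-shared IDs, which this sketch does not attempt. All column names and values are hypothetical.

```python
# A toy table: three samples, two features, one label.
rows = [
    {"uid": 1, "f1": 0.2, "f2": 1.5, "label": 0},
    {"uid": 2, "f1": 0.7, "f2": 0.3, "label": 1},
    {"uid": 3, "f1": 0.1, "f2": 2.2, "label": 1},
]

# Vertical split: each site holds different columns for (mostly) the same uids.
# site_1 is the label owner; site_2 never sees the label column.
site_1 = [{"uid": r["uid"], "f1": r["f1"], "label": r["label"]} for r in rows]
site_2 = [{"uid": r["uid"], "f2": r["f2"]} for r in rows[1:]]  # lacks uid 1

# Stand-in for PSI: keep only the sample IDs present at both sites.
shared_ids = sorted({r["uid"] for r in site_1} & {r["uid"] for r in site_2})

# After alignment, both sites hold the same samples in the same order.
aligned_1 = [r for r in site_1 if r["uid"] in shared_ids]
aligned_2 = [r for r in site_2 if r["uid"] in shared_ids]
```

Here shared_ids contains only the IDs held by both sites, and the aligned lists line up row by row even though each site still holds different columns.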

Parameters:
  • name (str) – Name of the federated job.

  • min_clients (int) – The minimum number of clients for the job.

  • num_rounds (int) – Number of boosting rounds.

  • label_owner (str) – Client ID that owns the labels (e.g., ‘site-1’). Must be in format ‘site-X’.

  • early_stopping_rounds (int, optional) – Early stopping rounds. Default is 3.

  • use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.

  • secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured.

  • client_ranks (dict, optional) – Mapping of client names to unique ranks (0-indexed). Example: {"site-1": 0, "site-2": 1, "site-3": 2}. In vertical mode, the label owner must be assigned rank 0. If client_ranks is omitted, the recipe assigns the label owner rank 0 and assigns the remaining clients by name. For secure training, provide client_ranks when a stable secure-rank mapping is required.

  • xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective='binary:logistic', eval_metric='auc', tree_method='hist', nthread=16.

  • data_loader_id (str, optional) – ID of the data loader component. Default is ‘dataloader’.

  • metrics_writer_id (str, optional) – ID of the metrics writer component. Default is ‘metrics_writer’.

  • in_process (bool, optional) – Whether to run in-process (required for vertical). Default is True.

  • model_file_name (str, optional) – Model file name. Default is ‘test.model.json’.

  • per_site_config (dict, optional) – Per-site configuration mapping site names to config dicts. Each config dict must contain a 'data_loader' key with an XGBDataLoader instance. Example: {"site-1": {"data_loader": VerticalDataLoader(…)}, "site-2": {…}}
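The default rank assignment described under client_ranks (label owner gets rank 0, remaining clients are assigned by name) can be sketched as follows. This is an illustration only: the function name is hypothetical, and interpreting "by name" as plain lexicographic sorting is an assumption about the recipe's behavior.

```python
def default_vertical_ranks(clients: list[str], label_owner: str) -> dict[str, int]:
    """Hypothetical sketch: label owner first, then the rest sorted by name."""
    if label_owner not in clients:
        raise ValueError(f"label_owner {label_owner!r} is not among the clients")
    # Assumption: "by name" means lexicographic order of the client names.
    others = sorted(name for name in clients if name != label_owner)
    return {name: rank for rank, name in enumerate([label_owner, *others])}


ranks = default_vertical_ranks(["site-1", "site-2", "site-3"], label_owner="site-2")
```

Note that lexicographic order places "site-10" before "site-2". Passing client_ranks explicitly avoids relying on any particular ordering.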

Example

from nvflare.app_opt.xgboost.recipes import XGBVerticalRecipe
from vertical_data_loader import VerticalDataLoader
from nvflare.recipe import SimEnv

# Step 1: Run PSI first (separate job) to get intersection files
# ... PSI job execution ...

# Step 2: Build the per-site data loaders, then create the vertical XGBoost recipe
per_site_config = {
    "site-1": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-1/higgs.data.csv",
            psi_path="/tmp/psi/site-1/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
    "site-2": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-2/higgs.data.csv",
            psi_path="/tmp/psi/site-2/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
}

recipe = XGBVerticalRecipe(
    name="xgb_vertical",
    min_clients=2,
    num_rounds=100,
    label_owner="site-1",  # Only site-1 has labels
    per_site_config=per_site_config,
)

# Step 3: Run with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)

Note

  • PSI must be run first to compute sample intersection across clients

  • Only one client should be designated as label_owner

  • All clients must have overlapping sample IDs (after PSI)

  • Uses histogram_v2 algorithm with data_split_mode=1 (vertical)

  • Data loaders must be configured via per_site_config parameter

  • Executor and metrics components are automatically added to all clients

  • TensorBoard tracking is automatically configured
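Several of the requirements listed for this recipe can be verified before a job is created. The following is a minimal sketch of such a pre-flight check. The function is hypothetical and not part of NVFlare, the 'site-X' test is a simple prefix check, and comparing the number of configured sites against min_clients is an added sanity check rather than a documented rule.

```python
def check_vertical_config(
    label_owner: str, per_site_config: dict[str, dict], min_clients: int
) -> list[str]:
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    problems = []
    if not label_owner.startswith("site-"):
        problems.append(f"label_owner {label_owner!r} is not in 'site-X' format")
    if label_owner not in per_site_config:
        problems.append(f"label_owner {label_owner!r} has no entry in per_site_config")
    for site, config in per_site_config.items():
        # Each per-site config dict is documented to require a 'data_loader' key.
        if "data_loader" not in config:
            problems.append(f"{site} is missing the 'data_loader' key")
    if len(per_site_config) < min_clients:
        problems.append(
            f"only {len(per_site_config)} site(s) configured, "
            f"but min_clients={min_clients}"
        )
    return problems
```

An empty list means none of these checks failed. It does not guarantee that PSI has been run or that the data loaders point at valid files.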

configure()[source]

Configure the federated job for vertical XGBoost training.