nvflare.app_opt.xgboost.recipes package

Module contents

class XGBBaggingRecipe(name: str, min_clients: int, training_mode: str = 'bagging', num_rounds: int | None = None, num_client_bagging: int | None = None, num_local_parallel_tree: int = 1, local_subsample: float = 0.8, learning_rate: float = 0.1, objective: str = 'binary:logistic', max_depth: int = 8, eval_metric: str = 'auc', tree_method: str = 'hist', use_gpus: bool = False, nthread: int = 16, lr_mode: str = 'uniform', save_name: str = 'xgboost_model.json', data_loader_id: str = 'dataloader', per_site_config: dict[str, dict] | None = None)

Bases: Recipe

XGBoost Tree-Based Recipe for federated learning (supports Bagging and Cyclic modes).

Recipe parameters, including xgb_params and nested per_site_config values, must never contain actual secrets. Read secrets from site environment variables or mounted files; references are supported only where documented in nvflare.recipe.secrets.

This recipe implements tree-based federated XGBoost with two training modes:

- Bagging: Each client trains a local sub-forest, and the sub-forests are aggregated on the server (federated Random Forest).
- Cyclic: Clients train sequentially in rounds, each contributing to the global model.
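The two modes can be pictured as different training schedules. The sketch below only illustrates the description above and is not NVFlare's scheduler: the training_schedule helper is hypothetical, and it assumes that a cyclic round visits every client once in a fixed order.

```python
def training_schedule(mode, clients, num_rounds):
    """Return a list of steps; each step lists the clients that train together.

    Hypothetical helper for illustration only, not part of NVFlare.
    """
    if mode == "bagging":
        # Every round, all clients train in parallel and the server
        # aggregates their local sub-forests.
        return [list(clients) for _ in range(num_rounds)]
    if mode == "cyclic":
        # Assumption: within a round, clients train one after another,
        # each extending the model handed on by the previous client.
        return [[client] for _ in range(num_rounds) for client in clients]
    raise ValueError(f"unknown training mode: {mode!r}")


clients = ["site-1", "site-2", "site-3"]
bagging_steps = training_schedule("bagging", clients, num_rounds=1)
cyclic_steps = training_schedule("cyclic", clients, num_rounds=2)
print(bagging_steps)
print(cyclic_steps)
```

In the bagging schedule a single step contains all three sites, while the cyclic schedule has one site per step.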

Parameters:

name (str) – Name of the federated job.
min_clients (int) – The minimum number of clients for the job.
training_mode (str, optional) – Training mode (“bagging” or “cyclic”). Default is “bagging”.
num_rounds (int, optional) – Number of training rounds. Default is 1 for bagging, 100 for cyclic.
num_client_bagging (int, optional) – Number of clients for bagging. Default is min_clients.
num_local_parallel_tree (int, optional) – Number of parallel trees per client. Default is 1.
local_subsample (float, optional) – Subsample ratio for local training. Default is 0.8.
learning_rate (float, optional) – Learning rate for XGBoost. Default is 0.1.
objective (str, optional) – Learning objective. Default is “binary:logistic”.
max_depth (int, optional) – Maximum tree depth. Default is 8.
eval_metric (str, optional) – Evaluation metric. Default is “auc”.
tree_method (str, optional) – Tree construction method. Default is “hist”.
use_gpus (bool, optional) – Whether to use GPUs. Default is False.
nthread (int, optional) – Number of threads. Default is 16.
lr_mode (str, optional) – Learning rate mode (“uniform” or “scaled”). Default is “uniform”.
save_name (str, optional) – Model save name. Default is “xgboost_model.json”.
data_loader_id (str, optional) – ID of the data loader component. Default is “dataloader”.
per_site_config (dict, optional) – Deprecated constructor form of per-site configuration. New code should call set_per_site_config(recipe, config) immediately after construction.
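The mode-dependent defaults for num_rounds and num_client_bagging can be summarized in a few lines. This is a minimal sketch of the documented defaults, assuming they are resolved from the other arguments when left as None; resolve_tree_defaults is a hypothetical helper, not the recipe's own code.

```python
def resolve_tree_defaults(training_mode, min_clients, num_rounds=None, num_client_bagging=None):
    """Fill in the documented defaults (hypothetical helper, for illustration)."""
    if training_mode not in ("bagging", "cyclic"):
        raise ValueError(f"training_mode must be 'bagging' or 'cyclic', got {training_mode!r}")
    if num_rounds is None:
        # Documented default: 1 round for bagging, 100 rounds for cyclic.
        num_rounds = 1 if training_mode == "bagging" else 100
    if num_client_bagging is None:
        # Documented default: the same value as min_clients.
        num_client_bagging = min_clients
    return num_rounds, num_client_bagging


bagging_defaults = resolve_tree_defaults("bagging", min_clients=3)
cyclic_defaults = resolve_tree_defaults("cyclic", min_clients=3)
explicit = resolve_tree_defaults("cyclic", min_clients=3, num_rounds=20, num_client_bagging=2)
print(bagging_defaults, cyclic_defaults, explicit)
```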

Example

from nvflare.app_opt.xgboost.recipes import XGBBaggingRecipe
from nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader import CSVDataLoader
from nvflare.recipe import SimEnv, set_per_site_config

# Bagging mode (federated Random Forest) with uniform learning rate
recipe = XGBBaggingRecipe(
    name="random_forest",
    min_clients=3,
    training_mode="bagging",
    num_rounds=1,
    num_local_parallel_tree=5,
    local_subsample=0.5,
)
set_per_site_config(
    recipe,
    {
        "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data")},
        "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data")},
        "site-3": {"data_loader": CSVDataLoader(folder="/tmp/data")},
    },
)

# Or with scaled learning rate (data-size dependent)
recipe = XGBBaggingRecipe(
    name="random_forest_scaled",
    min_clients=3,
    training_mode="bagging",
    lr_mode="scaled",
)
set_per_site_config(
    recipe,
    {
        "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.5},
        "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.3},
        "site-3": {"data_loader": CSVDataLoader(folder="/tmp/data"), "lr_scale": 0.2},
    },
)

env = SimEnv(num_clients=3)
run = recipe.execute(env)
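The lr_scale values in the scaled example sum to 1. One plausible way to derive such values, assuming that "data-size dependent" means each site's share of the total training samples, is sketched below. The lr_scales_from_sizes helper and the sample counts are hypothetical.

```python
def lr_scales_from_sizes(sample_counts):
    """Map each site to its share of the total sample count.

    Hypothetical helper; assumes lr_scale is proportional to local data size.
    """
    total = sum(sample_counts.values())
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return {site: count / total for site, count in sample_counts.items()}


# Made-up sample counts, chosen to reproduce the values used in the example.
sample_counts = {"site-1": 5000, "site-2": 3000, "site-3": 2000}
scales = lr_scales_from_sizes(sample_counts)
print(scales)
```

Each resulting value can then be passed as the "lr_scale" entry of the corresponding site in the per-site configuration.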

This is the base class of a recipe. Recipes are implemented by jobs. A concrete recipe must provide the job that implements it.

Security contract – no secrets in recipe parameters:

Recipe parameters (train_args, task_args, eval_args, per_site_config, config overrides, dicts passed to add_client_config/add_server_config, exec params, etc.) can be written in clear text into generated job configuration. These parameters and their nested values must never contain actual passwords, API keys, tokens, private keys, or other credentials. Instead, read secrets from site environment variables or mounted secret files inside your code, or pass a placeholder created with nvflare.recipe.secrets.secret_ref() or nvflare.recipe.secrets.secret_file_ref() at a supported runtime boundary. See nvflare.recipe.secrets for the supported parameter locations.
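Reading a secret inside site code, rather than passing it as a recipe parameter, might look like the sketch below. The read_secret helper and the DB_PASSWORD and DB_PASSWORD_FILE variable names are hypothetical; the recipe parameters carry only the variable name, never the value.

```python
import os


def read_secret(env_var, file_env_var=None):
    """Return a secret from an environment variable, or from the file whose
    path is stored in a second environment variable (hypothetical helper)."""
    value = os.environ.get(env_var)
    if value:
        return value
    path = os.environ.get(file_env_var) if file_env_var else None
    if path:
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    raise RuntimeError(f"secret not configured: set {env_var} on the site")


# For demonstration only: a real site sets this outside the code.
os.environ["DB_PASSWORD"] = "example-only"

# The parameters hold the name of the variable, not the secret itself.
recipe_params = {"db_password_env": "DB_PASSWORD"}
password = read_secret(recipe_params["db_password_env"], "DB_PASSWORD_FILE")
print("secret loaded:", bool(password))
```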

Before export or run, recipes scan their parameters with heuristics and emit nvflare.recipe.secrets.PotentialSecretWarning when a value looks like an actual secret. The scan is best-effort: absence of a warning does not prove a parameter is safe to share.
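To see why such a scan can only be best-effort, consider the toy scanner below. It is not NVFlare's implementation: LooksLikeSecretWarning is a hypothetical stand-in for PotentialSecretWarning, and the key-name pattern is an assumed heuristic chosen for illustration.

```python
import re
import warnings


class LooksLikeSecretWarning(UserWarning):
    """Hypothetical stand-in for nvflare.recipe.secrets.PotentialSecretWarning."""


# Assumed heuristic: flag string values stored under credential-like key names.
SUSPICIOUS_KEY = re.compile(r"(password|passwd|secret|token|api[_-]?key)", re.IGNORECASE)


def scan_params(params, path=""):
    """Walk nested dicts and lists; warn on suspicious entries and return their paths."""
    findings = []
    if isinstance(params, dict):
        for key, value in params.items():
            child = f"{path}.{key}" if path else str(key)
            if isinstance(value, str) and value and SUSPICIOUS_KEY.search(str(key)):
                warnings.warn(f"{child} looks like a secret", LooksLikeSecretWarning, stacklevel=2)
                findings.append(child)
            else:
                findings.extend(scan_params(value, child))
    elif isinstance(params, (list, tuple)):
        for i, item in enumerate(params):
            findings.extend(scan_params(item, f"{path}[{i}]"))
    return findings


params = {"site-1": {"api_key": "abc123", "max_depth": 8}, "note": "hunter2"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = scan_params(params)
print(flagged)
```

The value stored under "note" is not flagged even though it could be a password, which is exactly the limitation described above.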

Parameters:

job – The job that implements the recipe.

configure(): Configure the federated job for XGBoost tree-based training.

class XGBHorizontalRecipe(name: str, min_clients: int, num_rounds: int, early_stopping_rounds: int = 2, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', per_site_config: dict[str, dict] | None = None)

Bases: Recipe

XGBoost Horizontal Federated Learning Recipe.

Recipe parameters, including xgb_params and nested per_site_config values, must never contain actual secrets. Read secrets from site environment variables or mounted files; references are supported only where documented in nvflare.recipe.secrets.

This recipe implements horizontal federated XGBoost using histogram-based algorithms. In horizontal federated learning, each client has different samples with the same features. The histogram-based approach enables efficient gradient boosting by computing histograms of gradients and hessians collaboratively across clients.
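The collaborative histogram idea can be illustrated with plain Python. The sketch below is a simplified picture, not the protocol used by NVFlare or XGBoost: it assumes all clients already share the same bin edges, uses the standard logistic-loss gradient (p - y) and hessian p(1 - p), and the function names are hypothetical.

```python
import bisect
import math


def local_histogram(features, labels, margins, bin_edges):
    """Sum logistic-loss gradients and hessians per feature bin on one client."""
    n_bins = len(bin_edges) + 1
    grad = [0.0] * n_bins
    hess = [0.0] * n_bins
    for x, y, margin in zip(features, labels, margins):
        p = 1.0 / (1.0 + math.exp(-margin))    # predicted probability
        b = bisect.bisect_right(bin_edges, x)  # index of the bin containing x
        grad[b] += p - y
        hess[b] += p * (1.0 - p)
    return grad, hess


def merge_histograms(histograms):
    """Element-wise sum of per-client (grad, hess) histograms."""
    grads, hesses = zip(*histograms)
    return [sum(col) for col in zip(*grads)], [sum(col) for col in zip(*hesses)]


bin_edges = [1.0, 2.0]  # three bins shared by all clients (assumed given)
client_a = local_histogram([0.5, 1.5], [0, 1], [0.0, 0.0], bin_edges)
client_b = local_histogram([1.5, 2.5], [1, 0], [0.0, 0.0], bin_edges)
global_grad, global_hess = merge_histograms([client_a, client_b])
print(global_grad, global_hess)
```

Only the per-bin sums leave each client, so the merged histogram matches what a single machine would compute over the combined rows.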

Parameters:

name (str) – Name of the federated job.
min_clients (int) – The minimum number of clients for the job.
num_rounds (int) – Number of boosting rounds.
early_stopping_rounds (int, optional) – Early stopping rounds. Default is 2.
use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.
secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured. When secure=True, client_ranks must be provided.
client_ranks (dict, optional) – Mapping of client names to ranks for secure training. Required when secure=True. Maps each client name to a unique rank (0-indexed). Example: {“site-1”: 0, “site-2”: 1, “site-3”: 2}.
xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective='binary:logistic', eval_metric='auc', tree_method='hist', nthread=16.
data_loader_id (str, optional) – ID of the data loader component. Default is ‘dataloader’.
metrics_writer_id (str, optional) – ID of the metrics writer component. Default is ‘metrics_writer’.
per_site_config (dict, optional) – Deprecated constructor form of per-site configuration. New code should call set_per_site_config(recipe, config) immediately after construction.
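The relationship between secure and client_ranks can be expressed as a small check. This is a sketch under assumptions, not the recipe's own validation: validate_client_ranks is a hypothetical helper, and requiring the ranks to be contiguous from 0 is an assumed reading of "unique rank (0-indexed)".

```python
def validate_client_ranks(clients, secure, client_ranks=None):
    """Check a client-to-rank mapping against the documented constraints.

    Hypothetical helper for illustration only.
    """
    if client_ranks is None:
        if secure:
            raise ValueError("client_ranks must be provided when secure=True")
        return None
    missing = sorted(set(clients) - set(client_ranks))
    if missing:
        raise ValueError(f"no rank assigned to: {missing}")
    ranks = sorted(client_ranks[c] for c in clients)
    # Assumption: ranks are exactly 0, 1, ..., n-1 with no gaps or repeats.
    if ranks != list(range(len(clients))):
        raise ValueError("ranks must be unique and 0-indexed")
    return dict(client_ranks)


clients = ["site-1", "site-2", "site-3"]
ranks = validate_client_ranks(clients, secure=True, client_ranks={"site-1": 0, "site-2": 1, "site-3": 2})
print(ranks)
```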

Example

from nvflare.app_opt.xgboost.recipes import XGBHorizontalRecipe
from nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader import CSVDataLoader
from nvflare.recipe import SimEnv, set_per_site_config

# Build per-site configuration with data loaders
per_site_config = {
    "site-1": {"data_loader": CSVDataLoader(folder="/tmp/data/horizontal_xgb_data")},
    "site-2": {"data_loader": CSVDataLoader(folder="/tmp/data/horizontal_xgb_data")},
}

# Create recipe
recipe = XGBHorizontalRecipe(
    name="xgb_higgs_horizontal",
    min_clients=2,
    num_rounds=100,
    xgb_params={
        "max_depth": 8,
        "eta": 0.1,
        "objective": "binary:logistic",
        "eval_metric": "auc",
    },
)
set_per_site_config(recipe, per_site_config)

# Run simulation with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)

Note

Data loaders must be configured with set_per_site_config before export or execution.
TensorBoard tracking is automatically configured for the server and configured sites.
Executor and metrics components are automatically added to each configured site.

configure(): Configure the federated job for XGBoost histogram-based training.

class XGBVerticalRecipe(name: str, min_clients: int, num_rounds: int, label_owner: str, early_stopping_rounds: int = 3, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', in_process: bool = True, model_file_name: str = 'test.model.json', per_site_config: dict[str, dict] | None = None)

Bases: Recipe

XGBoost Vertical Federated Learning Recipe.

Recipe parameters, including xgb_params and nested per_site_config values, must never contain actual secrets. Read secrets from site environment variables or mounted files; references are supported only where documented in nvflare.recipe.secrets.

This recipe implements vertical federated XGBoost where different clients have different features for the same samples. In vertical FL, data is split by columns (features) rather than rows (samples).

Key concepts:

- Vertical split: Each client has different features for the same sample IDs.
- Label owner: Only one client has the target labels.
- PSI required: Private Set Intersection must be run first to align sample IDs.
- Histogram-based: Uses the histogram_v2 algorithm for vertical collaboration.
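The concepts above can be made concrete with a toy data layout. In this sketch a plain set intersection stands in for PSI, so it is not private and only shows the result of the alignment step; the sample IDs, feature names, and the align helper are all hypothetical.

```python
# Each site holds different columns for partly overlapping sample IDs.
site_1 = {  # label owner: features plus the target label
    "u1": {"f0": 0.2, "label": 1},
    "u2": {"f0": 0.7, "label": 0},
    "u3": {"f0": 0.1, "label": 1},
}
site_2 = {  # features only, no labels
    "u2": {"f1": 3.5},
    "u3": {"f1": 1.2},
    "u4": {"f1": 9.9},
}


def align(rows, shared_ids):
    """Keep only the shared samples, in the same order on every site."""
    return [rows[uid] for uid in shared_ids]


# Stand-in for PSI: a real deployment computes this without revealing the ID sets.
shared_ids = sorted(set(site_1) & set(site_2))

aligned_1 = align(site_1, shared_ids)
aligned_2 = align(site_2, shared_ids)
print(shared_ids)
print(aligned_1)
print(aligned_2)
```

After alignment, row i on every site refers to the same sample, and only the label owner's rows carry a label.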

Parameters:

name (str) – Name of the federated job.
min_clients (int) – The minimum number of clients for the job.
num_rounds (int) – Number of boosting rounds.
label_owner (str) – Client ID that owns the labels (e.g., ‘site-1’). Must be in format ‘site-X’.
early_stopping_rounds (int, optional) – Early stopping rounds. Default is 3.
use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.
secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured.
client_ranks (dict, optional) – Mapping of client names to unique ranks (0-indexed). Example: {“site-1”: 0, “site-2”: 1, “site-3”: 2}. In vertical mode, the label owner must be assigned rank 0. If client_ranks is omitted, the recipe assigns the label owner rank 0 and assigns the remaining clients by name. For secure training, provide client_ranks when a stable secure-rank mapping is required.
xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective='binary:logistic', eval_metric='auc', tree_method='hist', nthread=16.
data_loader_id (str, optional) – ID of the data loader component. Default is ‘dataloader’.
metrics_writer_id (str, optional) – ID of the metrics writer component. Default is ‘metrics_writer’.
in_process (bool, optional) – Whether to run in-process (required for vertical). Default is True.
model_file_name (str, optional) – Model file name. Default is ‘test.model.json’.
per_site_config (dict, optional) – Deprecated constructor form of per-site configuration. New code should call set_per_site_config(recipe, config) immediately after construction.
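The default rank assignment described for client_ranks can be sketched as follows. This is an illustration under assumptions rather than the recipe's code: assign_vertical_ranks is a hypothetical helper, and "by name" is assumed to mean lexicographic order of the client names.

```python
def assign_vertical_ranks(clients, label_owner):
    """Give the label owner rank 0 and order the remaining clients by name.

    Hypothetical helper for illustration only.
    """
    if label_owner not in clients:
        raise ValueError(f"label_owner {label_owner!r} is not in the client list")
    # Assumption: the remaining clients are sorted lexicographically.
    others = sorted(c for c in clients if c != label_owner)
    return {client: rank for rank, client in enumerate([label_owner] + others)}


default_ranks = assign_vertical_ranks(["site-1", "site-2", "site-3"], label_owner="site-2")
print(default_ranks)
```

Passing an explicit client_ranks mapping avoids depending on this ordering when a stable secure-rank mapping is required.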

Example

from nvflare.app_opt.xgboost.recipes import XGBVerticalRecipe
from vertical_data_loader import VerticalDataLoader
from nvflare.recipe import SimEnv, set_per_site_config

# Step 1: Run PSI first (separate job) to get intersection files
# ... PSI job execution ...

# Step 2: Create vertical XGBoost recipe and configure site data loaders
per_site_config = {
    "site-1": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-1/higgs.data.csv",
            psi_path="/tmp/psi/site-1/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
    "site-2": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-2/higgs.data.csv",
            psi_path="/tmp/psi/site-2/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
}
recipe = XGBVerticalRecipe(
    name="xgb_vertical",
    min_clients=2,
    num_rounds=100,
    label_owner="site-1",  # Only site-1 has labels
)
set_per_site_config(recipe, per_site_config)

# Step 3: Run with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)

Note

PSI must be run first to compute the sample intersection across clients.
Only one client should be designated as label_owner.
All clients must have overlapping sample IDs (after PSI).
Uses the histogram_v2 algorithm with data_split_mode=1 (vertical).
Data loaders must be configured with set_per_site_config before export or execution.
Executor and metrics components are automatically added to each configured site.
TensorBoard tracking is automatically configured.

configure(): Configure the federated job for vertical XGBoost training.