nvflare.app_opt.xgboost.recipes.vertical module

class XGBVerticalRecipe(name: str, min_clients: int, num_rounds: int, label_owner: str, early_stopping_rounds: int = 3, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', in_process: bool = True, model_file_name: str = 'test.model.json', per_site_config: dict[str, dict] | None = None)

Bases: Recipe

XGBoost Vertical Federated Learning Recipe.

Recipe parameters, including xgb_params and nested per_site_config values, must never contain actual secrets. Read secrets from site environment variables or mounted files; references are supported only where documented in nvflare.recipe.secrets.

This recipe implements vertical federated XGBoost where different clients have different features for the same samples. In vertical FL, data is split by columns (features) rather than rows (samples).

Key concepts:

- Vertical split: Each client has different features for the same sample IDs.
- Label owner: Only one client has the target labels.
- PSI required: Private Set Intersection (PSI) must be run first to align sample IDs.
- Histogram-based: Uses the histogram_v2 algorithm for vertical collaboration.
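To make the column-wise split concrete, here is a toy sketch in plain Python that does not use NVFlare. The site names, feature names, and sample IDs are invented for illustration. An ordinary set intersection stands in for PSI, which arrives at the same shared IDs without revealing the IDs that are not shared.

```python
# Toy illustration of a vertical split; all data here is made up.
# Each site holds different feature columns, keyed by sample ID.
site_1 = {  # label owner: holds a feature and the target label
    "u1": {"age": 34, "label": 1},
    "u2": {"age": 51, "label": 0},
    "u3": {"age": 29, "label": 1},
}
site_2 = {  # holds a different feature and no labels
    "u2": {"income": 72000},
    "u3": {"income": 48000},
    "u4": {"income": 95000},
}

# Stand-in for PSI: a plain (non-private) intersection of sample IDs.
shared_ids = sorted(site_1.keys() & site_2.keys())

# After alignment, each site keeps only the shared rows, in the same order.
site_1_aligned = [site_1[uid] for uid in shared_ids]
site_2_aligned = [site_2[uid] for uid in shared_ids]

print(shared_ids)  # ['u2', 'u3']
```

Only the rows for u2 and u3 survive alignment, and only site_1 carries a label column, which is the situation the label_owner parameter describes.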

Parameters:

name (str) – Name of the federated job.
min_clients (int) – The minimum number of clients for the job.
num_rounds (int) – Number of boosting rounds.
label_owner (str) – Client ID that owns the labels (e.g., 'site-1'). Must be in the format 'site-X'.
early_stopping_rounds (int, optional) – Early stopping rounds. Default is 3.
use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.
secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured.
client_ranks (dict, optional) – Mapping of client names to unique ranks (0-indexed). Example: {"site-1": 0, "site-2": 1, "site-3": 2}. In vertical mode, the label owner must be assigned rank 0. If client_ranks is omitted, the recipe assigns the label owner rank 0 and assigns the remaining clients by name. For secure training, provide client_ranks when a stable secure-rank mapping is required.
xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective='binary:logistic', eval_metric='auc', tree_method='hist', nthread=16.
data_loader_id (str, optional) – ID of the data loader component. Default is 'dataloader'.
metrics_writer_id (str, optional) – ID of the metrics writer component. Default is 'metrics_writer'.
in_process (bool, optional) – Whether to run in-process (required for vertical). Default is True.
model_file_name (str, optional) – Model file name. Default is 'test.model.json'.
per_site_config (dict, optional) – Deprecated constructor form of per-site configuration. New code should call set_per_site_config(recipe, config) immediately after construction.
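The rank rules described for client_ranks can be sketched as two small helpers. Both functions are hypothetical and are not part of NVFlare. The sketch assumes that "by name" means sorted order and that ranks are contiguous from 0, neither of which is confirmed here.

```python
def default_client_ranks(clients, label_owner):
    """Hypothetical helper: label owner gets rank 0, the rest follow sorted by name."""
    if label_owner not in clients:
        raise ValueError(f"label owner {label_owner!r} is not in the client list")
    others = sorted(c for c in clients if c != label_owner)
    return {name: rank for rank, name in enumerate([label_owner] + others)}


def check_client_ranks(client_ranks, label_owner):
    """Hypothetical check of the constraints described for client_ranks."""
    if client_ranks.get(label_owner) != 0:
        raise ValueError("the label owner must be assigned rank 0")
    # Assumption: ranks are unique and contiguous, starting at 0.
    if sorted(client_ranks.values()) != list(range(len(client_ranks))):
        raise ValueError("ranks must be unique and 0-indexed")


ranks = default_client_ranks(["site-3", "site-1", "site-2"], label_owner="site-2")
check_client_ranks(ranks, label_owner="site-2")
print(ranks)  # {'site-2': 0, 'site-1': 1, 'site-3': 2}
```

Passing an explicit mapping that has been checked this way is one means of keeping ranks stable across runs, which matters most when secure=True.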

Example

from nvflare.app_opt.xgboost.recipes import XGBVerticalRecipe
from vertical_data_loader import VerticalDataLoader
from nvflare.recipe import SimEnv, set_per_site_config

# Step 1: Run PSI first (separate job) to get intersection files
# ... PSI job execution ...

# Step 2: Create vertical XGBoost recipe and configure site data loaders
per_site_config = {
    "site-1": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-1/higgs.data.csv",
            psi_path="/tmp/psi/site-1/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
    "site-2": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-2/higgs.data.csv",
            psi_path="/tmp/psi/site-2/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
}
recipe = XGBVerticalRecipe(
    name="xgb_vertical",
    min_clients=2,
    num_rounds=100,
    label_owner="site-1",  # Only site-1 has labels
)
set_per_site_config(recipe, per_site_config)

# Step 3: Run with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)
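The example above relies on the default xgb_params. As a minimal sketch of customizing them, the snippet below starts from the defaults documented for xgb_params and overrides two values. The name DEFAULT_XGB_PARAMS is illustrative and is not an NVFlare constant. Whether the recipe merges a partial dict with its defaults is not stated here, so the sketch builds a complete dict.

```python
# Defaults as listed in the xgb_params parameter description.
# DEFAULT_XGB_PARAMS is a local, illustrative name (not an NVFlare constant).
DEFAULT_XGB_PARAMS = {
    "max_depth": 8,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 16,
}

# Copy the defaults and override only what should change.
xgb_params = {**DEFAULT_XGB_PARAMS, "max_depth": 6, "eta": 0.05}

# The complete dict would then be passed to the constructor, for example:
# recipe = XGBVerticalRecipe(
#     name="xgb_vertical",
#     min_clients=2,
#     num_rounds=100,
#     label_owner="site-1",
#     xgb_params=xgb_params,
# )
```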

Note

PSI must be run first to compute sample intersection across clients
Only one client should be designated as label_owner
All clients must have overlapping sample IDs (after PSI)
Uses histogram_v2 algorithm with data_split_mode=1 (vertical)
Data loaders must be configured with set_per_site_config before export or execution
Executor and metrics components are automatically added to each configured site
TensorBoard tracking is automatically configured

This is the base class for recipes. A recipe is implemented by a job, and a concrete recipe must provide that job.

Security contract – no secrets in recipe parameters:

Recipe parameters (train_args, task_args, eval_args, per_site_config, config overrides, dicts passed to add_client_config/add_server_config, exec params, etc.) can be written in clear text into generated job configuration. These parameters and their nested values must never contain actual passwords, API keys, tokens, private keys, or other credentials. Instead, read secrets from site environment variables or mounted secret files inside your code, or pass a placeholder created with nvflare.recipe.secrets.secret_ref() or nvflare.recipe.secrets.secret_file_ref() at a supported runtime boundary. See nvflare.recipe.secrets for the supported parameter locations.

Before export or run, recipes scan their parameters with heuristics and emit nvflare.recipe.secrets.PotentialSecretWarning when a value looks like an actual secret. The scan is best-effort: absence of a warning does not prove a parameter is safe to share.
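As a minimal sketch of the environment-variable approach described above, the snippet below reads a credential inside site code at runtime, so the recipe parameters carry only the name of the variable. The function, the variable name MY_SITE_API_TOKEN, and the site_params dict are hypothetical. This is not the secret_ref() mechanism from nvflare.recipe.secrets.

```python
import os


def load_api_token(env_var="MY_SITE_API_TOKEN"):
    """Read a credential from the site's environment at runtime.

    The variable name is a hypothetical example.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return token


# Parameters that may end up in generated job configuration hold only
# non-secret values: here, the *name* of the variable, never its content.
site_params = {"token_env_var": "MY_SITE_API_TOKEN"}
```

Site code would call load_api_token(site_params["token_env_var"]) when it needs the credential, so the value itself never appears in clear-text configuration.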

Parameters:

job – The job that implements the recipe.

configure(): Configure the federated job for vertical XGBoost training.