nvflare.app_opt.xgboost.recipes.vertical module
- class XGBVerticalRecipe(name: str, min_clients: int, num_rounds: int, label_owner: str, early_stopping_rounds: int = 3, use_gpus: bool = False, secure: bool = False, client_ranks: dict | None = None, xgb_params: dict | None = None, data_loader_id: str = 'dataloader', metrics_writer_id: str = 'metrics_writer', in_process: bool = True, model_file_name: str = 'test.model.json', per_site_config: dict[str, dict] | None = None)
Bases: Recipe
XGBoost Vertical Federated Learning Recipe.
This recipe implements vertical federated XGBoost where different clients have different features for the same samples. In vertical FL, data is split by columns (features) rather than rows (samples).
Key concepts:
- Vertical split: Each client has different features, same sample IDs
- Label owner: Only one client has the target labels
- PSI required: Private Set Intersection must be run first to align sample IDs
- Histogram-based: Uses histogram_v2 algorithm for vertical collaboration
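To make the vertical split concrete, the following is a minimal sketch in plain Python, not NVFlare code. The table, its column names, and the `vertical_split` helper are hypothetical and exist only for illustration. It shows each site holding different columns for the same sample IDs, with labels kept at one site only.

```python
# Toy table keyed by sample ID. The column names are made up for illustration.
full_table = {
    "u1": {"age": 34, "income": 52000, "label": 1},
    "u2": {"age": 51, "income": 61000, "label": 0},
    "u3": {"age": 27, "income": 48000, "label": 1},
}


def vertical_split(table, columns_by_site):
    """Give each site its own subset of columns for every sample ID."""
    return {
        site: {uid: {col: row[col] for col in cols} for uid, row in table.items()}
        for site, cols in columns_by_site.items()
    }


# site-1 plays the label owner: it is the only site that receives "label".
site_data = vertical_split(
    full_table,
    {"site-1": ["age", "label"], "site-2": ["income"]},
)

for site, rows in site_data.items():
    print(site, rows)
```

Both sites end up with the same set of sample IDs but disjoint feature columns, which is the situation the recipe expects after PSI.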
- Parameters:
name (str) – Name of the federated job.
min_clients (int) – The minimum number of clients for the job.
num_rounds (int) – Number of boosting rounds.
label_owner (str) – Client ID that owns the labels (e.g., ‘site-1’). Must be in format ‘site-X’.
early_stopping_rounds (int, optional) – Early stopping rounds. Default is 3.
use_gpus (bool, optional) – Whether to use GPUs for training. Default is False.
secure (bool, optional) – Enable secure training with Homomorphic Encryption (HE). Default is False. Requires encryption plugins to be installed and configured.
client_ranks (dict, optional) – Mapping of client names to unique ranks (0-indexed). Example: {“site-1”: 0, “site-2”: 1, “site-3”: 2}. In vertical mode, the label owner must be assigned rank 0. If client_ranks is omitted, the recipe assigns the label owner rank 0 and assigns the remaining clients by name. For secure training, provide client_ranks when a stable secure-rank mapping is required.
xgb_params (dict, optional) – XGBoost parameters passed to xgboost.train(). If None, uses default params. Default params: max_depth=8, eta=0.1, objective=‘binary:logistic’, eval_metric=‘auc’, tree_method=‘hist’, nthread=16.
data_loader_id (str, optional) – ID of the data loader component. Default is ‘dataloader’.
metrics_writer_id (str, optional) – ID of the metrics writer component. Default is ‘metrics_writer’.
in_process (bool, optional) – Whether to run in-process (required for vertical). Default is True.
model_file_name (str, optional) – Model file name. Default is ‘test.model.json’.
per_site_config (dict, optional) – Per-site configuration mapping site names to config dicts. Each config dict must contain ‘data_loader’ key with XGBDataLoader instance. Example: {“site-1”: {“data_loader”: VerticalDataLoader(…)}, “site-2”: {…}}
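The descriptions of client_ranks, xgb_params, and per_site_config above can be sketched as a few plain-Python helpers for preparing arguments before the recipe is constructed. Every helper name here is hypothetical and none of them is part of NVFlare. Two assumptions are baked in: "by name" is read as lexicographic order, and, because the documentation does not say whether a partial xgb_params dict is merged with the defaults, the sketch always builds a complete dict.

```python
# Hypothetical helpers; none of these functions exist in NVFlare.

# Defaults copied from the xgb_params description.
DEFAULT_XGB_PARAMS = {
    "max_depth": 8,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 16,
}


def make_xgb_params(**overrides):
    """Return a complete parameter dict: documented defaults plus overrides."""
    params = dict(DEFAULT_XGB_PARAMS)
    params.update(overrides)
    return params


def default_client_ranks(clients, label_owner):
    """Label owner gets rank 0; the rest are ranked in sorted-name order.

    Sorting lexicographically is an assumption about what "by name" means.
    """
    if label_owner not in clients:
        raise ValueError(f"label owner {label_owner!r} is not in the client list")
    ranks = {label_owner: 0}
    others = sorted(c for c in clients if c != label_owner)
    for rank, client in enumerate(others, start=1):
        ranks[client] = rank
    return ranks


def check_client_ranks(client_ranks, label_owner):
    """Check the two documented constraints: unique ranks, label owner at 0."""
    if client_ranks.get(label_owner) != 0:
        raise ValueError("the label owner must be assigned rank 0")
    if len(set(client_ranks.values())) != len(client_ranks):
        raise ValueError("client ranks must be unique")


def find_config_problems(per_site_config):
    """List sites whose config dict lacks the required 'data_loader' key."""
    return [
        f"{site}: missing 'data_loader'"
        for site, cfg in per_site_config.items()
        if "data_loader" not in cfg
    ]


clients = ["site-1", "site-2", "site-3"]
client_ranks = default_client_ranks(clients, label_owner="site-2")
check_client_ranks(client_ranks, label_owner="site-2")

xgb_params = make_xgb_params(max_depth=6, nthread=4)

# object() stands in for a real data loader instance.
problems = find_config_problems({"site-1": {"data_loader": object()}, "site-2": {}})
```

Passing an explicit client_ranks mapping built this way keeps the rank assignment stable across runs, which the client_ranks description recommends for secure training.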
Example
from nvflare.app_opt.xgboost.recipes import XGBVerticalRecipe
from nvflare.recipe import SimEnv
from vertical_data_loader import VerticalDataLoader

# Step 1: Run PSI first (separate job) to get intersection files
# ... PSI job execution ...

# Step 2: Create vertical XGBoost recipe with per-site data loaders
per_site_config = {
    "site-1": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-1/higgs.data.csv",
            psi_path="/tmp/psi/site-1/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
    "site-2": {
        "data_loader": VerticalDataLoader(
            data_split_path="/tmp/data/site-2/higgs.data.csv",
            psi_path="/tmp/psi/site-2/intersection.txt",
            id_col="uid",
            label_owner="site-1",
            train_proportion=0.8,
        )
    },
}

recipe = XGBVerticalRecipe(
    name="xgb_vertical",
    min_clients=2,
    num_rounds=100,
    label_owner="site-1",  # Only site-1 has labels
    per_site_config=per_site_config,
)

# Step 3: Run with explicit client list
clients = list(per_site_config.keys())
env = SimEnv(clients=clients)
run = recipe.execute(env)
Note
PSI must be run first to compute sample intersection across clients
Only one client should be designated as label_owner
All clients must have overlapping sample IDs (after PSI)
Uses histogram_v2 algorithm with data_split_mode=1 (vertical)
Data loaders must be configured via per_site_config parameter
Executor and metrics components are automatically added to all clients
TensorBoard tracking is automatically configured
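The requirement that all clients share the same sample IDs after PSI can be illustrated with a plain set intersection. This is only a stand-in: a real PSI protocol computes the intersection without revealing non-shared IDs, which this sketch does not attempt. The `align_to_intersection` helper and the sample data are hypothetical.

```python
def align_to_intersection(rows_by_id, intersection_ids):
    """Keep only rows whose ID is in the intersection, in sorted ID order."""
    return {uid: rows_by_id[uid] for uid in sorted(intersection_ids) if uid in rows_by_id}


# Each site starts with a different set of sample IDs.
site1_rows = {"u1": {"age": 34}, "u2": {"age": 51}, "u4": {"age": 40}}
site2_rows = {"u2": {"income": 61000}, "u3": {"income": 48000}, "u4": {"income": 70000}}

# NOT private: a plain intersection used only to show the result PSI produces.
common_ids = set(site1_rows) & set(site2_rows)

site1_aligned = align_to_intersection(site1_rows, common_ids)
site2_aligned = align_to_intersection(site2_rows, common_ids)

print(list(site1_aligned))
print(list(site2_aligned))
```

After alignment both sites hold rows for the same IDs in the same order, so row i at one site describes the same sample as row i at the other.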
This is the base class of a recipe. Recipes are implemented by jobs. A concrete recipe must provide the job that implements it.
- Parameters:
job – the job that implements the recipe.