Federated Learning for XGBoost
Overview
This guide demonstrates how to use NVIDIA FLARE (NVFlare) to train XGBoost models in a federated learning environment. It showcases multiple collaboration strategies with varying levels of security.
NVFlare provides the following advantages:
Secure training with Homomorphic Encryption (HE), protecting local histograms and gradients from the federated server and passive parties.
Lifecycle management of XGBoost processes
Reliable messaging that can overcome network glitches
Training over complex networks with relays
This guide covers several federated XGBoost configurations:
Horizontal Collaboration: Histogram-based and tree-based approaches (non-secure and secure)
Vertical Collaboration: Histogram-based approach (non-secure and secure with Homomorphic Encryption)
What is XGBoost?
XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm that uses decision/regression trees for classification and regression tasks. It excels particularly with tabular data and remains widely used due to its:
High performance on structured data
Explainability of predictions
Computational efficiency
These examples use DMLC XGBoost, which provides:
GPU acceleration capabilities
Distributed and federated learning support
Optimized gradient boosting implementations
Federated Learning Modes
Horizontal Federated Learning
In horizontal collaboration, each participant has:
Same features (columns) across all sites
Different data samples (rows) at each site
Equal status as label owners
Example: Multiple hospitals each have complete patient records (all features), but different patients.
Vertical Federated Learning
In vertical collaboration, each participant has:
Different features (columns) at each site
Same data samples (rows) across all sites
One “active party” (label owner) and multiple “passive parties”
Example: A bank and a retailer have data about the same customers, but different attributes (financial vs. shopping behavior).
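The difference between the two modes can be illustrated with a toy table. This is only a sketch: the column names, values, and helper functions are made up for illustration and are not part of NVFlare or XGBoost.

```python
# Toy table: each row is one sample, each column one feature (label last).
columns = ["age", "income", "visits", "label"]
rows = [
    [34, 52000, 3, 0],
    [51, 61000, 7, 1],
    [29, 48000, 1, 0],
    [45, 75000, 5, 1],
]


def horizontal_split(rows, num_sites):
    """Give each site a disjoint subset of the rows, with all columns."""
    return [rows[i::num_sites] for i in range(num_sites)]


def vertical_split(rows, columns, site_columns):
    """Give each site every row, but only the columns it owns."""
    shards = []
    for names in site_columns:
        idx = [columns.index(name) for name in names]
        shards.append([[row[i] for i in idx] for row in rows])
    return shards


# Horizontal: two sites, same columns, different samples.
h_shards = horizontal_split(rows, num_sites=2)

# Vertical: the first site (active party) also holds the label column.
v_shards = vertical_split(rows, columns, [["age", "income", "label"], ["visits"]])
```

In the horizontal shards every site can compute gradients on its own because it holds labels; in the vertical shards only the first site can, which is why it plays the active-party role.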
Supported Training Modes
When running with NVFlare, all XGBoost communications are local and messages are forwarded through NVFlare’s communication infrastructure. The encryption is handled in XGBoost by encryption plugins, which are external components that can be installed at runtime.
NVFlare supports federated training in the following 4 modes:
Horizontal without HE-based security protection - Histogram-based or tree-based (tree-based is secured by removing “sum_hessian” values before transmission)
Vertical without HE-based security protection - Histogram-based
Horizontal with HE - Histogram-based (histograms secured against federated server)
Vertical with HE - Histogram-based (gradients secured against passive parties)
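In histogram-based horizontal training, each client summarizes its data as per-bin sums of gradients and hessians, and the server adds those summaries together. The following is a minimal plaintext sketch of that exchange; the function names, bin edges, and values are hypothetical, and real XGBoost histograms are built internally rather than by user code.

```python
def local_histogram(feature_values, gradients, hessians, bin_edges):
    """Sum the gradients and hessians of the samples that fall into each bin."""
    hist = [[0.0, 0.0] for _ in range(len(bin_edges) + 1)]
    for x, g, h in zip(feature_values, gradients, hessians):
        b = sum(x >= edge for edge in bin_edges)  # index of the bin containing x
        hist[b][0] += g
        hist[b][1] += h
    return hist


def aggregate(histograms):
    """Element-wise sum of per-site histograms (the server-side step)."""
    total = [[0.0, 0.0] for _ in histograms[0]]
    for hist in histograms:
        for b, (g, h) in enumerate(hist):
            total[b][0] += g
            total[b][1] += h
    return total


bin_edges = [1.0, 2.0]  # three bins: x < 1, 1 <= x < 2, x >= 2
site1 = local_histogram([0.2, 1.5], [0.5, -0.25], [0.25, 0.25], bin_edges)
site2 = local_histogram([0.7, 2.5], [-0.5, 0.75], [0.25, 0.25], bin_edges)
global_hist = aggregate([site1, site2])
```

The secure horizontal mode keeps the same sum but performs it on encrypted histograms, so the server never sees `site1` or `site2` in the clear.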
Security Risks and Mitigations
Risks
Federated XGBoost faces three main security risks:
Model Statistics Leakage: The default XGBoost JSON model contains “sum_hessian” statistics that enable model inversion attacks to recover data distributions. (Reference: TimberStrike)
Histogram Leakage: Gradient histograms can be exploited to reconstruct data distributions. The same model statistics of “sum_hessian” can be derived from histograms. (Reference: TimberStrike)
Gradient Leakage: Sample-wise gradients may reveal label information. (Reference: SecureBoost)
Attack Surface
The attack surface varies by collaboration mode and party role:
Server: Depending on the collaboration mode, the server may have access to
The local model:
Horizontal tree-based:
Model Statistics Leakage over each client’s data distribution
Local histograms:
Horizontal histogram-based / vertical histogram-based:
Histogram Leakage over each client / passive party’s data distribution
Sample-wise gradients:
Vertical histogram-based:
Gradient Leakage over active party’s label information
Clients: Depending on the collaboration mode, the clients may have access to
The aggregated global model:
Horizontal tree-based:
Model Statistics Leakage over global data distribution
Global histograms:
Horizontal histogram-based:
Histogram Leakage over global data distribution
Local histograms:
Vertical histogram-based:
Histogram Leakage over each passive party’s data distribution on active party
Sample-wise gradients:
Gradient Leakage over active party’s label information on passive parties
Mitigations
The following table summarizes the available mitigations for different collaboration scenarios:
| Collaboration Mode | Algorithm | Data Exchange | Risk Mitigated | Security Measure | Implementation |
|---|---|---|---|---|---|
| Horizontal | Tree-based | Clients send locally boosted trees to server; server combines and distributes trees back to clients | Model statistics leakage on both server and clients | Remove “sum_hessian” values from JSON model | Removed before clients send local trees to server |
| Horizontal | Histogram-based | Clients send local histograms to server; server aggregates to global histogram and distributes it back to clients | Histogram leakage on server (client-side leakage remains) | Encrypt histograms | Local histograms encrypted before transmission |
| Vertical | Histogram-based | Active party computes gradients; routed by server, passive parties receive gradients, compute histograms, and send them back to active party through server | Histogram leakage on server (active-party-side leakage remains); gradient leakage on both server and passive parties | Primary: Encrypt gradients; Secondary: Mask feature ownership in split values | Gradients encrypted before being sent to passive parties |
Notes:
Vertical histogram-based:
Primary goal: Protect sample gradients from passive parties (critical)
Secondary goal: Hide split values from non-feature owners (desirable but lower risk)
The remaining two risks will be discussed in the Advanced Topics: Future Security Scenarios section.
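The vertical mitigation relies on additively homomorphic encryption: passive parties can sum encrypted gradients into histogram bins without being able to read any individual gradient. The following toy Paillier sketch illustrates that property only. The primes are far too small for real security, the fixed-point scale and all function names are assumptions made for this example, and it is not how the NVFlare encryption plugins are implemented.

```python
import math
import random

# Two well-known Mersenne primes; illustration only, NOT secure key sizes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse (Python 3.8+)
SCALE = 10**6         # fixed-point scale used to encode float gradients


def encrypt(value):
    """Encrypt a float as a Paillier ciphertext with generator g = N + 1."""
    m = round(value * SCALE) % N  # negative values wrap around modulo N
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def decrypt(c):
    """Recover the float; values above N/2 are decoded as negatives."""
    m = (pow(c, PHI, N2) - 1) // N * MU % N
    if m > N // 2:
        m -= N
    return m / SCALE


# Active party: owns the labels, computes and encrypts per-sample gradients.
gradients = [0.25, -0.5, 0.125, -0.75]
encrypted = [encrypt(g) for g in gradients]

# Passive party: knows which bin each sample falls into, sums ciphertexts only.
bins = [0, 1, 0, 1]
enc_hist = {}
for c, b in zip(encrypted, bins):
    enc_hist[b] = add(enc_hist[b], c) if b in enc_hist else c

# Active party: decrypts the per-bin sums to evaluate candidate splits.
hist = {b: decrypt(c) for b, c in enc_hist.items()}
```

Because every ciphertext uses fresh randomness, two equal gradients encrypt to different values, so the passive party cannot group samples by gradient either.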
TimberStrike Attack Analysis
TimberStrike is a model inversion attack that exploits sum_hessian values and tree structure to estimate training data distributions. Empirical results vary significantly with dataset scale:
| Dataset | Samples | Features | Reconstruction Accuracy |
|---|---|---|---|
| Diabetes (toy) | 768 | 8 | 65.80% |
| CreditCard (realistic) | 284,807 | 30 | 8.72% |
Note
The above results were obtained before NVFlare’s sum_hessian removal—i.e., with full model statistics available to the attacker. With NVFlare’s built-in protection enabled (see below), TimberStrike’s primary information source is eliminated, which is expected to substantially degrade attack performance. “Reconstruction accuracy” is a distance-tolerance metric (not exact recovery); see the TimberStrike paper for the precise definition.
Risk Assessment
On practical datasets (CreditCard), TimberStrike achieves <10% accuracy even with sum_hessian available. To put this in perspective, we use NeMo SafeSynthesizer as a reference. SafeSynthesizer is a privacy-focused synthetic data generation tool purpose-built for compliance (GDPR, HIPAA), with built-in membership inference protection and optional differential privacy guarantees. Even with these privacy safeguards, its synthetic data still achieves 51.98% proximity to real samples, because preserving data utility requires some statistical similarity. TimberStrike’s 8.72% falls well below this reference point. Acceptable privacy levels are inherently data-dependent; users are encouraged to run similar comparisons on their own datasets.
Protection
Built-in: NVFlare removes sum_hessian from model transmissions in horizontal tree-based mode, eliminating the attack’s primary information source.
Additional: Increase min_child_weight to raise the minimum sum of instance weight (hessian) required per leaf, resulting in coarser tree structure with fewer splits. The TimberStrike paper shows that tree depth (and by extension, number of splits) directly impacts reconstruction accuracy, so reducing tree granularity is expected to limit information exposure. Optimal values are task-dependent; refer to the paper for analysis of the privacy-utility trade-off. This parameter can be added to xgb_params in the recipe:
recipe = XGBHorizontalRecipe(
    name="xgb_higgs_horizontal",
    min_clients=2,
    num_rounds=100,
    xgb_params={
        "max_depth": 8,
        "eta": 0.1,
        "objective": "binary:logistic",
        "eval_metric": "auc",
        "min_child_weight": 100,  # increase for coarser trees and reduced privacy exposure
    },
    per_site_config=per_site_config,
)
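Conceptually, the built-in protection amounts to dropping the per-node sum_hessian statistics from the serialized model before it leaves the client. The sketch below shows the idea on a toy dictionary that only mimics the nesting of an XGBoost JSON model; the helper name and the toy values are assumptions, and this is not NVFlare’s actual implementation.

```python
def strip_key(obj, key="sum_hessian"):
    """Return a copy of a JSON-like structure with every `key` entry removed."""
    if isinstance(obj, dict):
        return {k: strip_key(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [strip_key(v, key) for v in obj]
    return obj


# Toy stand-in for a model loaded with json.load(); values are made up.
toy_model = {
    "learner": {
        "gradient_booster": {
            "model": {
                "trees": [
                    {"split_indices": [0, 0, 0], "sum_hessian": [12.0, 5.0, 7.0]},
                    {"split_indices": [1, 0, 0], "sum_hessian": [12.0, 9.0, 3.0]},
                ]
            }
        }
    }
}

sanitized = strip_key(toy_model)
```

The tree structure and split information are preserved, so the sanitized model still describes the same splits; only the per-node sample-mass statistics are gone.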
Closest Reconstructed Samples (CreditCard)
Each example below shows the closest match (minimum distance) from its respective method. Note that these are different source records, shown to illustrate the reconstruction quality of each method independently.
TimberStrike (8.72% accuracy):
Original: [-27.0, -25.3, -12.1, -1.53, -3.67, -1.82, -3.34, -26.6, 1.08, -0.42, 3.61, -5.42, ...]
Reconstructed: [-30.0, -29.2, -10.5, 7.60, 2.20, -0.11, 4.55, -5.84, 5.50, 4.38, 3.07, 1.26, ...]
SafeSynthesizer (51.98% accuracy):
Original: [2.06, -0.03, -1.06, 0.42, -0.13, -1.21, 0.20, -0.35, 0.51, 0.07, -0.70, 0.54, ...]
Reconstructed: [2.06, -0.05, -1.07, 0.41, -0.12, -1.20, 0.20, -0.34, 0.50, 0.06, -0.68, 0.53, ...]
TimberStrike shows substantial deviations even on its closest match (e.g., feature 4: -1.53 → 7.60), while SafeSynthesizer’s closest match differs by only 0.01–0.02 per feature yet remains privacy-compliant by design. This suggests that TimberStrike’s reconstructions may not constitute a meaningful privacy risk for datasets like CreditCard.
GPU Acceleration
Federated XGBoost supports two levels of GPU acceleration:
1. XGBoost GPU Training
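Enable GPU-accelerated training by setting device='cuda' together with tree_method='hist' in the XGBoost parameters. This applies to XGBoost 2.0 and later; earlier releases used tree_method='gpu_hist', which is now deprecated.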
Performance: Up to 4.15x speedup vs. CPU training (GPU XGBoost Blog)
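In XGBoost 2.0 and later, GPU training is normally requested through the device parameter, while tree_method stays at hist; the older tree_method='gpu_hist' spelling is deprecated there. A minimal sketch of switching between the two follows. The helper name is hypothetical and the remaining values simply mirror the xgb_params used elsewhere in this guide.

```python
def make_xgb_params(use_gpu):
    """Build an xgb_params dict for CPU or GPU histogram training."""
    params = {
        "max_depth": 3,
        "eta": 0.1,
        "objective": "binary:logistic",
        "eval_metric": "auc",
        "tree_method": "hist",
    }
    # XGBoost >= 2.0 selects the device separately from the tree method.
    params["device"] = "cuda" if use_gpu else "cpu"
    return params


cpu_params = make_xgb_params(use_gpu=False)
gpu_params = make_xgb_params(use_gpu=True)
```

Only the device entry differs between the two dictionaries, so the same job definition can be reused on sites with and without GPUs.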
2. GPU-Accelerated Homomorphic Encryption (HE)
NVFlare provides GPU acceleration for HE operations using specialized encryption plugins.
Performance: Up to 36.5x speedup vs. CPU encryption (NVFlare Secure XGBoost Blog)
We will refer to these as “CPU/GPU XGBoost” and “CPU/GPU Encryption”.
Security Implementation Matrix
The following table shows which security measures are supported across different hardware configurations:
| Collaboration Mode | Security Goal | CPU XGBoost + CPU Encryption | CPU XGBoost + GPU Encryption | GPU XGBoost + CPU Encryption | GPU XGBoost + GPU Encryption |
|---|---|---|---|---|---|
| Horizontal | Histogram protection against server | ✅ | N/A* | ✅ | N/A* |
| Vertical | Primary: Gradient protection | ✅ | ✅ | ✅ | ✅ |
| Vertical | Secondary: Split value masking | ✅ | ✅ | ❌ | ❌ |
*Note: Horizontal histogram encryption is not computationally intensive (encrypting histogram vectors), so GPU encryption is not needed.
Implementation Notes:
Vertical mode primary goal (gradient protection): Fully supported across all configurations
Vertical mode secondary goal (split value masking): Only supported with CPU XGBoost
Advanced Topics: Future Security Scenarios
The following security scenarios are not currently implemented in our solution. Users should be aware that plaintext histogram communication can reveal data distribution information, which may enable data reconstruction attacks as stated above. On the other hand, similar statistics can also be derived from common practices such as federated statistics. Because attack potency depends on multiple factors, including data complexity, model hyperparameters, and the data distribution information available to the attacker, the practical implications of a given type of attack can vary significantly. This is still an open and active research area.
Potential Future Enhancements to Protect Against All Parties
| Collaboration Mode | Algorithm | Remaining Security Risk | Possible Approach | Challenges |
|---|---|---|---|---|
| Horizontal | Histogram-based | Histogram leakage over global data distribution on clients (in addition to server as addressed above) | Confidential computing, advanced HE | HE compatibility issue [*] with server performing calculations and distributing only final splits |
| Vertical | Histogram-based | Histogram leakage over each passive party’s data distribution on active party (in addition to histogram leakage on server, and gradient leakage on server and passive parties as addressed above) | Local data preprocessing and anonymization, confidential computing, advanced HE | HE compatibility issue [*] with passive parties performing calculations and sending only final splits |
Prerequisites
Required Python Packages
NVFlare 2.7.2 or above:
pip install nvflare~=2.7.2
Federated Secure XGBoost, which can be installed from the binary build with this command:
pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl
Note
The xgboost build environment may depend on specific numpy versions that require Python < 3.12.
Alternatively, to get the most current build of XGBoost:
pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/`curl -s https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/meta.json | grep -o 'xgboost-2\.2.*whl'|sed -e 's/+/%2B/'`
The TenSEAL package is needed for horizontal secure training:
pip install tenseal
The ipcl_python package is required for vertical secure training when the nvflare plugin is used. It is not needed when the cuda_paillier plugin is used.
pip install ipcl-python
This package is only available for Python 3.8 on PyPI. For other versions of Python, it needs to be installed from GitHub:
pip install git+https://github.com/intel/pailliercryptolib_python.git@development
System Environments
To support secure training, several homomorphic encryption libraries are used. These libraries require an Intel CPU or an NVIDIA GPU.
Linux is the preferred OS. It has been tested extensively under Ubuntu 22.04.
The following docker image is recommended for GPU training:
nvcr.io/nvidia/pytorch:24.03-py3
Building Encryption Plugins
The secure training requires encryption plugins, which need to be built from the source code for your specific environment.
To build the plugins, check out the NVFlare source code from https://github.com/NVIDIA/NVFlare and follow the instructions in this document.
NVFlare Provisioning
For horizontal secure training, the NVFlare system must be provisioned with a homomorphic encryption context. The HEBuilder in project.yml is used to achieve this.
An example configuration can be found at secure_project.yml.
This is a snippet of the secure_project.yml file with the HEBuilder:
api_version: 3
name: secure_project
description: NVIDIA FLARE sample project yaml file for CIFAR-10 example
participants:
  ...
builders:
  - path: nvflare.lighter.impl.workspace.WorkspaceBuilder
    args:
      template_file: master_template.yml
  - path: nvflare.lighter.impl.template.TemplateBuilder
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
      config_folder: config
  - path: nvflare.lighter.impl.he.HEBuilder
    args:
      poly_modulus_degree: 8192
      coeff_mod_bit_sizes: [60, 40, 40]
      scale_bits: 40
      scheme: CKKS
  - path: nvflare.lighter.impl.cert.CertBuilder
  - path: nvflare.lighter.impl.signature.SignatureBuilder
Data Preparation
Data must be properly formatted for federated XGBoost training based on the collaboration mode.
Horizontal Training
For horizontal training, the datasets on all clients must share the same columns (features). Each client has different data samples (rows).
Vertical Training
For vertical training, the datasets on all clients contain different columns (features), but must share overlapping rows (data samples). The label column is typically assigned to site-1 (the “active party”) by default.
For more details on vertical split preprocessing, refer to the Vertical XGBoost Example.
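The row-overlap requirement means that, before training, every site must keep only the samples that all sites have in common and order them identically. The sketch below shows that alignment with a plain set intersection on made-up sample IDs; the function name and data are hypothetical, and a real deployment would normally compute the intersection with a privacy-preserving method rather than by sharing IDs in the clear.

```python
def align_rows(site_tables):
    """Keep only sample IDs present at every site, in one shared sorted order."""
    shared = set.intersection(*(set(table) for table in site_tables))
    ordered = sorted(shared)
    return [[table[i] for i in ordered] for table in site_tables], ordered


# Each site maps sample ID -> its own feature columns (toy values).
bank = {"u1": [700, 0], "u2": [640, 1], "u4": [710, 0]}  # credit score, label
retailer = {"u2": [12], "u3": [4], "u4": [30]}           # purchase count

aligned, ids = align_rows([bank, retailer])
```

After alignment, row k at every site refers to the same sample, which is what lets gradients computed by the active party be matched to features held by passive parties.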
XGBoost Plugin Configuration
XGBoost requires an encryption plugin to handle secure training. Two plugins are available:
cuda_paillier: The default plugin. This plugin uses GPU for cryptographic operations.
nvflare: This plugin forwards data locally to the NVFlare process for encryption.
Note
All clients must use the same plugin. If different clients use different plugins, the behavior of federated XGBoost is undefined and the job may crash.
The cuda_paillier plugin requires NVIDIA GPUs that support compute capability 7.0 or higher. Also, CUDA 12.2 or 12.4 must be installed. Please refer to https://developer.nvidia.com/cuda-gpus for more information.
The two included plugins differ only in vertical secure training. For horizontal secure training, both plugins work exactly the same way, forwarding the data to NVFlare for encryption.
Plugin Configuration by Training Mode
Vertical (Non-secure)
No plugin is needed.
Horizontal (Non-secure)
No plugin is needed.
Vertical Secure
Both plugins can be used for vertical secure training.
The default cuda_paillier plugin is preferred because it uses GPU for faster cryptographic operations.
Note
cuda_paillier plugin requires NVIDIA GPUs that support compute capability 7.0 or higher. Please refer to https://developer.nvidia.com/cuda-gpus for more information.
If you see the following errors in the log, it means either no GPU is detected or the GPU does not meet the requirements:
CUDA runtime API error no kernel image is available for execution on the device at line 241 in file /my_home/nvflare-internal/processor/src/cuda-plugin/paillier.h
2024-07-01 12:19:15,683 - SimulatorClientRunner - ERROR - run_client_thread error: EOFError:
In this case, the nvflare plugin can be used to perform encryption on CPUs, which requires the ipcl-python package.
The plugin can be configured in the local/resources.json file on clients:
{
  "federated_plugin": {
    "name": "nvflare",
    "path": "/opt/libs/libnvflare.so"
  }
}
Here, name is the plugin name and path is the full path of the plugin, including the library file name. The path is optional; it defaults to the library distributed with NVFlare for the plugin.
The following environment variables can be used to override the values in the JSON:
export NVFLARE_XGB_PLUGIN_NAME=nvflare
export NVFLARE_XGB_PLUGIN_PATH=/opt/libs/libnvflare.so
Note
When running with the NVFlare simulator, the plugin must be configured using environment variables, as it does not support resources.json.
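The precedence described in this section (environment variables override resources.json, and the path falls back to the bundled library) can be sketched as follows. The function is hypothetical and only models the documented behavior; it is not NVFlare’s own lookup code. The default name cuda_paillier is taken from the plugin list in this guide.

```python
import json
import os

DEFAULT_PLUGIN = "cuda_paillier"  # named as the default plugin in this guide


def resolve_plugin(resources_text, env=os.environ):
    """Return (name, path); a path of None means 'use the bundled library'."""
    cfg = {}
    if resources_text:
        cfg = json.loads(resources_text).get("federated_plugin", {})
    name = env.get("NVFLARE_XGB_PLUGIN_NAME") or cfg.get("name") or DEFAULT_PLUGIN
    path = env.get("NVFLARE_XGB_PLUGIN_PATH") or cfg.get("path")
    return name, path


resources = '{"federated_plugin": {"name": "nvflare", "path": "/opt/libs/libnvflare.so"}}'

from_json = resolve_plugin(resources, env={})
overridden = resolve_plugin(resources, env={"NVFLARE_XGB_PLUGIN_NAME": "cuda_paillier"})
simulator = resolve_plugin(None, env={"NVFLARE_XGB_PLUGIN_NAME": "nvflare"})
```

The last call mirrors the simulator case, where no resources.json is read and the environment variables are the only source of configuration.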
Horizontal Secure
The plugin setup is the same as vertical secure.
This mode requires the tenseal package regardless of which plugin is used. The NVFlare system must be provisioned with a TenSEAL context. See NVFlare Provisioning for details.
For the simulator, the TenSEAL context generated by provisioning needs to be copied to the startup folder:
simulator_workspace/startup/client_context.tenseal
For example,
nvflare provision -p secure_project.yml -w /tmp/poc_workspace
mkdir -p /tmp/simulator_workspace/startup
cp /tmp/poc_workspace/secure_project/prod_00/site-1/startup/client_context.tenseal /tmp/simulator_workspace/startup
The server_context.tenseal file is not needed.
Job Configuration
Controller
On the server side, the following controller must be configured in workflows:
nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController
Even though XGBoost training is performed on the clients, the parameters are configured on the server so that all clients share the same configuration. The XGBoost parameters are documented at https://xgboost.readthedocs.io/en/stable/python/python_intro.html#setting-parameters
num_rounds: Number of training rounds.
data_split_mode: Same as XGBoost data_split_mode parameter, 0 for horizontal, 1 for vertical.
secure_training: If true, XGBoost will train in secure mode using the plugin.
xgb_params: The training parameters defined in this dict are passed to XGBoost as its params argument (the booster parameters).
xgb_options: This dict contains other optional parameters passed to XGBoost. Currently, only early_stopping_rounds is supported.
client_ranks: A dict that maps client name to rank.
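The controller arguments listed above can also be assembled programmatically, which helps keep client_ranks consistent with the list of sites. This is a sketch under assumptions: the helper name is hypothetical, the xgb_params values mirror the examples in this guide, and assigning ranks in list order is a choice made here for illustration.

```python
import json


def controller_args(sites, vertical, secure, num_rounds=3):
    """Build the `args` dict for the XGBFedController workflow entry."""
    if len(set(sites)) != len(sites):
        raise ValueError("site names must be unique")
    return {
        "num_rounds": num_rounds,
        "data_split_mode": 1 if vertical else 0,  # 0 = horizontal, 1 = vertical
        "secure_training": secure,
        "xgb_options": {"early_stopping_rounds": 2},
        "xgb_params": {
            "max_depth": 3,
            "eta": 0.1,
            "objective": "binary:logistic",
            "eval_metric": "auc",
            "tree_method": "hist",
            "nthread": 1,
        },
        # Ranks follow the order of `sites`: the first site gets rank 0.
        "client_ranks": {name: rank for rank, name in enumerate(sites)},
    }


args = controller_args(["site-1", "site-2"], vertical=True, secure=True)
args_json = json.dumps(args, indent=2)  # ready to paste into the workflow entry
```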
Executor
On the client side, the following executor must be configured in executors:
nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor
Only one parameter is required for the executor:
data_loader_id: The component ID of Data Loader
Data Loader
On the client side, a data loader must be configured in the components. The CSVDataLoader can be used if the data is pre-processed. For example,
{
  "id": "dataloader",
  "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
  "args": {
    "folder": "/opt/dataset/vertical_xgb_data"
  }
}
If the data requires any special processing, a custom loader can be implemented. The loader must implement the XGBDataLoader interface.
Job Examples
Vertical Training
Here are the configuration files for a vertical secure training job. If encryption is not needed, just change the secure_training arg to false.
config_fed_server.json:
{
  "format_version": 2,
  "num_rounds": 3,
  "workflows": [
    {
      "id": "xgb_controller",
      "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
      "args": {
        "num_rounds": "{num_rounds}",
        "data_split_mode": 1,
        "secure_training": true,
        "xgb_options": {
          "early_stopping_rounds": 2
        },
        "xgb_params": {
          "max_depth": 3,
          "eta": 0.1,
          "objective": "binary:logistic",
          "eval_metric": "auc",
          "tree_method": "hist",
          "nthread": 1
        },
        "client_ranks": {
          "site-1": 0,
          "site-2": 1
        }
      }
    }
  ]
}
config_fed_client.json:
{
  "format_version": 2,
  "executors": [
    {
      "tasks": [
        "config",
        "start"
      ],
      "executor": {
        "id": "Executor",
        "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
        "args": {
          "data_loader_id": "dataloader"
        }
      }
    }
  ],
  "components": [
    {
      "id": "dataloader",
      "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
      "args": {
        "folder": "/opt/dataset/vertical_xgb_data"
      }
    }
  ]
}
Horizontal Training
The configuration for horizontal training is the same as vertical except data_split_mode is 0 and the data loader must point to horizontal split data.
config_fed_server.json:
{
  "format_version": 2,
  "num_rounds": 3,
  "workflows": [
    {
      "id": "xgb_controller",
      "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
      "args": {
        "num_rounds": "{num_rounds}",
        "data_split_mode": 0,
        "secure_training": true,
        "xgb_options": {
          "early_stopping_rounds": 2
        },
        "xgb_params": {
          "max_depth": 3,
          "eta": 0.1,
          "objective": "binary:logistic",
          "eval_metric": "auc",
          "tree_method": "hist",
          "nthread": 1
        },
        "client_ranks": {
          "site-1": 0,
          "site-2": 1
        },
        "in_process": true
      }
    }
  ]
}
config_fed_client.json:
{
  "format_version": 2,
  "executors": [
    {
      "tasks": [
        "config",
        "start"
      ],
      "executor": {
        "id": "Executor",
        "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
        "args": {
          "data_loader_id": "dataloader",
          "in_process": true
        }
      }
    }
  ],
  "components": [
    {
      "id": "dataloader",
      "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
      "args": {
        "folder": "/data/xgboost_secure/dataset/horizontal_xgb_data"
      }
    }
  ]
}
Pre-Trained Models
To continue training from a pre-trained model, place the model in the job folder as custom/model.json.
Every site should share the same model.json. The result of previous training with the same dataset can be used as the input model.
When a pre-trained model is detected, NVFlare prints the following line in the log:
INFO - Pre-trained model is used: /tmp/nvflare/poc/example_project/prod_00/site-1/startup/../996ac44f-e784-4117-b365-24548f1c490d/app_site-1/custom/model.json
Performance Tuning
Timeouts
For secure training, the HE operations are very slow. If a large dataset is used, several timeout values need to be adjusted.
The XGBoost messages are transferred between client and server using Reliable Messages (ReliableMessage). The following parameters in the executor arguments control the timeout behavior:
per_msg_timeout: Timeout in seconds for each message.
tx_timeout: Timeout for the whole transaction in seconds. This is the total time to wait for a response, accounting for all retry attempts.
{
  "format_version": 2,
  "executors": [
    {
      "tasks": [
        "config",
        "start"
      ],
      "executor": {
        "id": "Executor",
        "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
        "args": {
          "data_loader_id": "dataloader",
          "per_msg_timeout": 300.0,
          "tx_timeout": 900.0,
          "in_process": true
        }
      }
    }
  ],
  ...
}
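How the two timeouts interact can be pictured as a retry loop: each attempt waits at most per_msg_timeout, and attempts stop once the tx_timeout budget is used up. The sketch below simulates that relationship with a hypothetical send_once callable and simulated time; it is an illustration of the documented semantics, not the ReliableMessage implementation.

```python
def send_with_retries(send_once, per_msg_timeout, tx_timeout):
    """Retry until a reply arrives or the transaction budget is exhausted.

    send_once(timeout) is a hypothetical callable that returns a reply, or
    None if no reply arrived within `timeout` seconds. Time is simulated:
    every failed attempt is charged its full timeout.
    """
    elapsed = 0.0
    attempts = 0
    while elapsed < tx_timeout:
        timeout = min(per_msg_timeout, tx_timeout - elapsed)
        attempts += 1
        reply = send_once(timeout)
        if reply is not None:
            return reply, attempts
        elapsed += timeout
    raise TimeoutError(f"no reply after {attempts} attempts ({elapsed:.0f}s)")


def make_flaky_sender(failures):
    """Return a sender that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def send_once(timeout):
        if state["left"] > 0:
            state["left"] -= 1
            return None
        return "ok"

    return send_once


# With the values from the configuration above, up to three attempts fit.
reply, attempts = send_with_retries(make_flaky_sender(2), 300.0, 900.0)
```

Under this model, raising per_msg_timeout gives slow HE operations more time per attempt, while raising tx_timeout allows more retries before the transaction is abandoned.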
Number of Clients
The default configuration can handle only 20 clients. If more clients are involved in the training, increase the rm_max_request_workers parameter, as in the following configuration:
{
  "format_version": 2,
  "num_rounds": 3,
  "rm_max_request_workers": 100,
  ...
}