Federated Learning for XGBoost

Overview

This guide demonstrates how to use NVIDIA FLARE (NVFlare) to train XGBoost models in a federated learning environment. It showcases multiple collaboration strategies with varying levels of security.

NVFlare provides the following advantages:

Secure training with Homomorphic Encryption (HE), protecting local histograms and gradients from the federated server and passive parties.
Lifecycle management of XGBoost processes
Reliable messaging that can overcome network glitches
Training over complex networks with relays

This guide covers several federated XGBoost configurations:

Horizontal Collaboration: Histogram-based and tree-based approaches (non-secure and secure)
Vertical Collaboration: Histogram-based approach (non-secure and secure with Homomorphic Encryption)

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm that uses decision/regression trees for classification and regression tasks. It excels particularly with tabular data and remains widely used due to its:

High performance on structured data
Explainability of predictions
Computational efficiency

These examples use DMLC XGBoost, which provides:

GPU acceleration capabilities
Distributed and federated learning support
Optimized gradient boosting implementations

Federated Learning Modes

Horizontal Federated Learning

In horizontal collaboration, each participant has:

Same features (columns) across all sites
Different data samples (rows) at each site
Equal status as label owners

Example: Multiple hospitals each have complete patient records (all features), but different patients.

Vertical Federated Learning

In vertical collaboration, each participant has:

Different features (columns) at each site
Same data samples (rows) across all sites
One “active party” (label owner) and multiple “passive parties”

Example: A bank and a retailer have data about the same customers, but different attributes (financial vs. shopping behavior).

Supported Training Modes

When running with NVFlare, all XGBoost communications are local and messages are forwarded through NVFlare’s communication infrastructure. The encryption is handled in XGBoost by encryption plugins, which are external components that can be installed at runtime.

NVFlare supports federated training in the following 4 modes:

Horizontal without HE-based security protection - Histogram-based or tree-based (tree-based is secured by removing “sum_hessian” values before transmission)
Vertical without HE-based security protection - Histogram-based
Horizontal with HE - Histogram-based (histograms secured against federated server)
Vertical with HE - Histogram-based (gradients secured against passive parties)

Security Risks and Mitigations

Risks

Federated XGBoost faces three main security risks:

Model Statistics Leakage: The default XGBoost JSON model contains “sum_hessian” statistics that enable model inversion attacks to recover data distributions. (Reference: TimberStrike)
Histogram Leakage: Gradient histograms can be exploited to reconstruct data distributions. The same model statistics of “sum_hessian” can be derived from histograms. (Reference: TimberStrike)
Gradient Leakage: Sample-wise gradients may reveal label information. (Reference: SecureBoost)

Attack Surface

The attack surface varies by collaboration mode and party role:

Server: Depending on the collaboration mode, the server may have access to

The local model:
- Horizontal tree-based:
  - Model Statistics Leakage over each client’s data distribution
Local histograms:
- Horizontal histogram-based / vertical histogram-based:
  - Histogram Leakage over each client / passive party’s data distribution
Sample-wise gradients:
- Vertical histogram-based:
  - Gradient Leakage over active party’s label information

Clients: Depending on the collaboration mode, the clients may have access to

The aggregated global model:
- Horizontal tree-based:
  - Model Statistics Leakage over global data distribution
Global histograms:
- Horizontal histogram-based:
  - Histogram Leakage over global data distribution
Local histograms:
- Vertical histogram-based:
  - Histogram Leakage over each passive party’s data distribution on active party
Sample-wise gradients:
- Gradient Leakage over active party’s label information on passive parties

Mitigations

The following table summarizes the available mitigations for different collaboration scenarios:

Mitigations by Collaboration Mode
Collaboration Mode	Algorithm	Data Exchange	Risk Mitigated	Security Measure	Implementation
Horizontal	Tree-based	Clients send locally boosted trees to server; server combines and distributes trees back to clients	Model statistics leakage on both server and clients	Remove “sum_hessian” values from JSON model	Removed before clients send local trees to server
Horizontal	Histogram-based	Clients send local histograms to server; server aggregates to global histogram and distributes it back to clients	Histogram leakage on server (client-side remain)	Encrypt histograms	Local histograms encrypted before transmission
Vertical	Histogram-based	Active party computes gradients; routed by server, passive parties receive gradients, compute histograms, and send them back to active party through server	Histogram leakage on server (active party-side remain), Gradient leakage on both server and passive parties	Primary: Encrypt gradients; Secondary: Mask feature ownership in split values	Gradients encrypted before sending out to passive parties

Notes:

Vertical histogram-based:
- Primary goal: Protect sample gradients from passive parties (critical)
- Secondary goal: Hide split values from non-feature owners (desirable but lower risk)
The remaining two risks will be discussed in the Advanced Topics: Future Security Scenarios section.

TimberStrike Attack Analysis

TimberStrike is a model inversion attack that exploits sum_hessian values and tree structure to estimate training data distributions. Empirical results vary significantly with dataset scale:

Reconstruction Accuracy Results
Dataset	Samples	Features	Reconstruction Accuracy
Diabetes (toy)	768	8	65.80%
CreditCard (realistic)	284,807	30	8.72%

Note

The above results were obtained before NVFlare’s sum_hessian removal—i.e., with full model statistics available to the attacker. With NVFlare’s built-in protection enabled (see below), TimberStrike’s primary information source is eliminated, which is expected to substantially degrade attack performance. “Reconstruction accuracy” is a distance-tolerance metric (not exact recovery); see the TimberStrike paper for the precise definition.

Risk Assessment

On practical datasets (CreditCard), TimberStrike achieves <10% accuracy even with sum_hessian available. To put this in perspective, we use NeMo SafeSynthesizer as a reference. SafeSynthesizer is a privacy-focused synthetic data generation tool purpose-built for compliance (GDPR, HIPAA), with built-in membership inference protection and optional differential privacy guarantees. Even with these privacy safeguards, its synthetic data still achieves 51.98% proximity to real samples, because preserving data utility requires some statistical similarity. TimberStrike’s 8.72% falls well below this reference point. Acceptable privacy levels are inherently data-dependent; users are encouraged to run similar comparisons on their own datasets.

Protection

Built-in: NVFlare removes sum_hessian from model transmissions in horizontal tree-based mode, eliminating the attack’s primary information source.
Additional: Increase min_child_weight to raise the minimum sum of instance weight (hessian) required per leaf, resulting in coarser tree structure with fewer splits. The TimberStrike paper shows that tree depth (and by extension, number of splits) directly impacts reconstruction accuracy, so reducing tree granularity is expected to limit information exposure. Optimal values are task-dependent; refer to the paper for analysis of the privacy-utility trade-off. This parameter can be added to xgb_params in the recipe:
```
recipe = XGBHorizontalRecipe(
    name="xgb_higgs_horizontal",
    min_clients=2,
    num_rounds=100,
    xgb_params={
        "max_depth": 8,
        "eta": 0.1,
        "objective": "binary:logistic",
        "eval_metric": "auc",
        "min_child_weight": 100,  # increase for coarser trees and reduced privacy exposure
    },
    per_site_config=per_site_config,
)
```

Closest Reconstructed Samples (CreditCard)

Each example below shows the closest match (minimum distance) from its respective method. Note that these are different source records, shown to illustrate the reconstruction quality of each method independently.

TimberStrike (8.72% accuracy):

Original:      [-27.0, -25.3, -12.1, -1.53, -3.67, -1.82, -3.34, -26.6, 1.08, -0.42, 3.61, -5.42, ...]
Reconstructed: [-30.0, -29.2, -10.5, 7.60, 2.20, -0.11, 4.55, -5.84, 5.50, 4.38, 3.07, 1.26, ...]

SafeSynthesizer (51.98% accuracy):

Original:      [2.06, -0.03, -1.06, 0.42, -0.13, -1.21, 0.20, -0.35, 0.51, 0.07, -0.70, 0.54, ...]
Reconstructed: [2.06, -0.05, -1.07, 0.41, -0.12, -1.20, 0.20, -0.34, 0.50, 0.06, -0.68, 0.53, ...]

TimberStrike shows substantial deviations even on its closest match (e.g., feature 4: -1.53 → 7.60), while SafeSynthesizer’s closest match differs by only 0.01–0.02 per feature yet remains privacy-compliant by design. This suggests TimberStrike’s reconstructions may not constitute a meaningful privacy risk for a given dataset like CreditCard.

GPU Acceleration

Federated XGBoost supports two levels of GPU acceleration:

1. XGBoost GPU Training

Enable GPU-accelerated training by setting tree_method='gpu_hist' when initializing the XGBoost model.

Performance: Up to 4.15x speedup vs. CPU training (GPU XGBoost Blog)

2. GPU-Accelerated Homomorphic Encryption (HE)

NVFlare provides GPU acceleration for HE operations using specialized encryption plugins.

Performance: Up to 36.5x speedup vs. CPU encryption (NVFlare Secure XGBoost Blog)

We will refer to these as “CPU/GPU XGBoost” and “CPU/GPU Encryption”.

Security Implementation Matrix

The following table shows which security measures are supported across different hardware configurations:

Security Implementation Matrix
Collaboration Mode	Security Goal	CPU XGBoost + CPU Encryption	CPU XGBoost + GPU Encryption	GPU XGBoost + CPU Encryption	GPU XGBoost + GPU Encryption
Horizontal	Histogram protection against server	✅	N/A*	✅	N/A*
Vertical	Primary: Gradient protection	✅	✅	✅	✅
Vertical	Secondary: Split value masking	✅	✅	❌	❌

*Note: Horizontal histogram encryption is not computationally intensive (encrypting histogram vectors), so GPU encryption is not needed.

Implementation Notes:

Vertical mode primary goal (gradient protection): Fully supported across all configurations
Vertical mode secondary goal (split value masking): Only supported with CPU XGBoost

Advanced Topics: Future Security Scenarios

The following security scenarios are not currently implemented in our solution. Users should be aware that plaintext histogram communication can reveal data distribution information, which may enable data reconstruction attacks as stated above. On the other hand, similar statistics can also be derived from common practices such as federated statistics. As the attack potency depends on multiple factors including data complexity, model hyperparameters, and the data distribution information that can be utilized, the corresponding indications of a certain type of attack can vary significantly. This is still an open and active research area.

Potential Future Enhancements to Protect Against All Parties

Future Security Scenarios
Collaboration Mode	Algorithm	Remaining Security Risk	Possible Approach	Challenges
Horizontal	Histogram-based	Histogram leakage over global data distribution on clients (in addition to server as addressed above)	Confidential computing, advanced HE	HE compatibility issue [*] with server performing calculations and distributing only final splits
Vertical	Histogram-based	Histogram leakage over each passive party’s data distribution on active party (in addition to Histogram leakage on server, and Gradient leakage on server and passive parties as addressed above)	Local data preprocessing and anonymization, confidential computing, advanced HE	HE compatibility issue [*]_ with passive parties performing calculations and sending only final splits

Prerequisites

Required Python Packages

NVFlare 2.7.2 or above,

pip install nvflare~=2.7.2

Federated Secure XGBoost, which can be installed from the binary build using this command,

pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/xgboost-2.2.0.dev0%2B4601688195708f7c31fcceeb0e0ac735e7311e61-py3-none-manylinux_2_28_x86_64.whl

Note

The xgboost build environment may depend on specific numpy versions that require Python < 3.12.

or in case you need to get the most current build of XGBoost,

pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/`curl -s https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/federated-secure/meta.json | grep -o 'xgboost-2\.2.*whl'|sed -e 's/+/%2B/'`

TenSEAL package is needed for horizontal secure training,

pip install tenseal

ipcl_python package is required for vertical secure training if nvflare plugin is used. This package is not needed if cuda_paillier plugin is used.

pip install ipcl-python

This package is only available for Python 3.8 on PyPI. For other versions of python, it needs to be installed from github,

pip install git+https://github.com/intel/pailliercryptolib_python.git@development

System Environments

To support secure training, several homomorphic encryption libraries are used. Those libraries require Intel CPU or Nvidia GPU.

Linux is the preferred OS. It’s tested extensively under Ubuntu 22.4.

The following docker image is recommended for GPU training:

nvcr.io/nvidia/pytorch:24.03-py3

Building Encryption Plugins

The secure training requires encryption plugins, which need to be built from the source code for your specific environment.

To build the plugins, check out the NVFlare source code from https://github.com/NVIDIA/NVFlare and follow the instructions in this document.

NVFlare Provisioning

For horizontal secure training, the NVFlare system must be provisioned with a homomorphic encryption context. The HEBuilder in project.yml is used to achieve this. An example configuration can be found at secure_project.yml.

This is a snippet of the secure_project.yml file with the HEBuilder:

api_version: 3
name: secure_project
description: NVIDIA FLARE sample project yaml file for CIFAR-10 example

participants:

...

builders:
- path: nvflare.lighter.impl.workspace.WorkspaceBuilder
    args:
    template_file: master_template.yml
- path: nvflare.lighter.impl.template.TemplateBuilder
- path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
    config_folder: config
- path: nvflare.lighter.impl.he.HEBuilder
    args:
    poly_modulus_degree: 8192
    coeff_mod_bit_sizes: [60, 40, 40]
    scale_bits: 40
    scheme: CKKS
- path: nvflare.lighter.impl.cert.CertBuilder
- path: nvflare.lighter.impl.signature.SignatureBuilder

Data Preparation

Data must be properly formatted for federated XGBoost training based on the collaboration mode.

Horizontal Training

For horizontal training, the datasets on all clients must share the same columns (features). Each client has different data samples (rows).

Vertical Training

For vertical training, the datasets on all clients contain different columns (features), but must share overlapping rows (data samples). The label column is typically assigned to site-1 (the “active party”) by default.

For more details on vertical split preprocessing, refer to the Vertical XGBoost Example.

XGBoost Plugin Configuration

XGBoost requires an encryption plugin to handle secure training. Two plugins are available:

cuda_paillier: The default plugin. This plugin uses GPU for cryptographic operations.
nvflare: This plugin forwards data locally to NVFlare process for encryption.

Note

All clients must use the same plugin. When different plugins are used in different clients, the behavior of federated XGBoost is undetermined, which can cause the job to crash.

The cuda_paillier plugin requires NVIDIA GPUs that support compute capability 7.0 or higher. Also, CUDA 12.2 or 12.4 must be installed. Please refer to https://developer.nvidia.com/cuda-gpus for more information.

The two included plugins are only different in vertical secure training. For horizontal secure training, both plugins work exactly the same by forwarding the data to NVFlare for encryption.

Plugin Configuration by Training Mode

Vertical (Non-secure)

No plugin is needed.

Horizontal (Non-secure)

No plugin is needed.

Vertical Secure

Both plugins can be used for vertical secure training.

The default cuda_paillier plugin is preferred because it uses GPU for faster cryptographic operations.

Note

cuda_paillier plugin requires NVIDIA GPUs that support compute capability 7.0 or higher. Please refer to https://developer.nvidia.com/cuda-gpus for more information.

If you see the following errors in the log, it means either no GPU is detected or the GPU does not meet the requirements:

CUDA runtime API error no kernel image is available for execution on the device at line 241 in file /my_home/nvflare-internal/processor/src/cuda-plugin/paillier.h
2024-07-01 12:19:15,683 - SimulatorClientRunner - ERROR - run_client_thread error: EOFError:

In this case, the nvflare plugin can be used to perform encryption on CPUs, which requires the ipcl-python package. The plugin can be configured in the local/resources.json file on clients:

{
    "federated_plugin": {
        "name": "nvflare",
        "path": "/opt/libs/libnvflare.so"
    }
}

Where name is the plugin name and path is the full path of the plugin including the library file name. The path is optional, the default value is the library distributed with NVFlare for the plugin.

The following environment variables can be used to override the values in the JSON,

export NVFLARE_XGB_PLUGIN_NAME=nvflare
export NVFLARE_XGB_PLUGIN_PATH=/opt/libs/libnvflare.so

Note

When running with the NVFlare simulator, the plugin must be configured using environment variables, as it does not support resources.json.

Horizontal Secure

The plugin setup is the same as vertical secure.

This mode requires the tenseal package for all plugins. The provisioning of NVFlare systems must include tenseal context. See NVFlare Provisioning for details.

For simulator, the tenseal context generated by provisioning needs to be copied to the startup folder,

simulator_workspace/startup/client_context.tenseal

For example,

nvflare provision -p secure_project.yml -w /tmp/poc_workspace
mkdir -p /tmp/simulator_workspace/startup
cp /tmp/poc_workspace/example_project/prod_00/site-1/startup/client_context.tenseal /tmp/simulator_workspace/startup

The server_context.tenseal file is not needed.

Job Configuration

Controller

On the server side, the following controller must be configured in workflows,

nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController

Even though the XGBoost training is performed on clients, the parameters are configured on the server so all clients share the same configuration. XGBoost parameters are defined here, https://xgboost.readthedocs.io/en/stable/python/python_intro.html#setting-parameters

num_rounds: Number of training rounds.
data_split_mode: Same as XGBoost data_split_mode parameter, 0 for horizontal, 1 for vertical.
secure_training: If true, XGBoost will train in secure mode using the plugin.
xgb_params: The training parameters defined in this dict are passed to XGBoost as params, the boost parameter.
xgb_options: This dict contains other optional parameters passed to XGBoost. Currently, only early_stopping_rounds is supported.
client_ranks: A dict that maps client name to rank.

Executor

On the client side, the following executor must be configured in executors,

nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor

Only one parameter is required for executor,

data_loader_id: The component ID of Data Loader

Data Loader

On the client side, a data loader must be configured in the components. The CSVDataLoader can be used if the data is pre-processed. For example,

{
    "id": "dataloader",
    "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
    "args": {
        "folder": "/opt/dataset/vertical_xgb_data"
    }
}

If the data requires any special processing, a custom loader can be implemented. The loader must implement the XGBDataLoader interface.

Job Examples

Vertical Training

Here are the configuration files for a vertical secure training job. If encryption is not needed, just change the secure_training arg to false.

:caption: config_fed_server.json

{
    "format_version": 2,
    "num_rounds": 3,
    "workflows": [
        {
            "id": "xgb_controller",
            "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
            "args": {
                "num_rounds": "{num_rounds}",
                "data_split_mode": 1,
                "secure_training": true,
                "xgb_options": {
                    "early_stopping_rounds": 2
                },
                "xgb_params": {
                    "max_depth": 3,
                    "eta": 0.1,
                    "objective": "binary:logistic",
                    "eval_metric": "auc",
                    "tree_method": "hist",
                    "nthread": 1
                },
                "client_ranks": {
                    "site-1": 0,
                    "site-2": 1
                }
            }
        }
    ]
}

:caption: config_fed_client.json

{
    "format_version": 2,
    "executors": [
        {
            "tasks": [
                "config",
                "start"
            ],
            "executor": {
                "id": "Executor",
                "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
                "args": {
                    "data_loader_id": "dataloader"
                }
            }
        }
    ],
    "components": [
        {
            "id": "dataloader",
            "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
            "args": {
                "folder": "/opt/dataset/vertical_xgb_data"
            }
        }
    ]
}

Horizontal Training

The configuration for horizontal training is the same as vertical except data_split_mode is 0 and the data loader must point to horizontal split data.

config_fed_server.json

 {
     "format_version": 2,
     "num_rounds": 3,
     "workflows": [
         {
             "id": "xgb_controller",
             "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_controller.XGBFedController",
             "args": {
                 "num_rounds": "{num_rounds}",
                 "data_split_mode": 0,
                 "secure_training": true,
                 "xgb_options": {
                     "early_stopping_rounds": 2
                 },
                 "xgb_params": {
                     "max_depth": 3,
                     "eta": 0.1,
                     "objective": "binary:logistic",
                     "eval_metric": "auc",
                     "tree_method": "hist",
                     "nthread": 1
                 },
                 "client_ranks": {
                     "site-1": 0,
                     "site-2": 1
                 },
                 "in_process": true
             }
         }
     ]
 }

config_fed_client.json

 {
     "format_version": 2,
     "executors": [
         {
             "tasks": [
                 "config",
                 "start"
             ],
             "executor": {
                 "id": "Executor",
                 "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
                 "args": {
                     "data_loader_id": "dataloader",
                     "in_process": true
                 }
             }
         }
     ],
     "components": [
         {
             "id": "dataloader",
             "path": "nvflare.app_opt.xgboost.histogram_based_v2.csv_data_loader.CSVDataLoader",
             "args": {
                 "folder": "/data/xgboost_secure/dataset/horizontal_xgb_data"
             }
         }
     ]
 }

Pre-Trained Models

To continue training using a pre-trained model, the model can be placed in the job folder with the path and name of custom/model.json.

Every site should share the same model.json. The result of previous training with the same dataset can be used as the input model.

When a pre-trained model is detected, NVFlare prints following line in the log:

INFO - Pre-trained model is used: /tmp/nvflare/poc/example_project/prod_00/site-1/startup/../996ac44f-e784-4117-b365-24548f1c490d/app_site-1/custom/model.json

Performance Tuning

Timeouts

For secure training, the HE operations are very slow. If a large dataset is used, several timeout values need to be adjusted.

The XGBoost messages are transferred between client and server using Reliable Messages (ReliableMessage). The following parameters in executor arguments control the timeout behavior:

per_msg_timeout: Timeout in seconds for each message.

tx_timeout: Timeout for the whole transaction in seconds. This is the total time to wait for a response, accounting for all retry attempts.

config_fed_client.json

 {
     "format_version": 2,
     "executors": [
         {
             "tasks": [
                 "config",
                 "start"
             ],
             "executor": {
                 "id": "Executor",
                 "path": "nvflare.app_opt.xgboost.histogram_based_v2.fed_executor.FedXGBHistogramExecutor",
                 "args": {
                     "data_loader_id": "dataloader",
                     "per_msg_timeout": 300.0,
                     "tx_timeout": 900.0,
                     "in_process": true
                 }
             }
         }
     ],
     ...
 }

Number of Clients

The default configuration can only handle 20 clients. This parameter needs to be adjusted if more clients are involved in the training:

config_fed_client.json

 {
     "format_version": 2,
     "num_rounds": 3,
     "rm_max_request_workers": 100,
     ...
 }

Federated Learning for XGBoost

Overview

What is XGBoost?

Federated Learning Modes

Horizontal Federated Learning

Vertical Federated Learning

Supported Training Modes

Security Risks and Mitigations

Risks

Attack Surface

Mitigations

TimberStrike Attack Analysis

Risk Assessment

Protection

Closest Reconstructed Samples (CreditCard)

GPU Acceleration

1. XGBoost GPU Training

2. GPU-Accelerated Homomorphic Encryption (HE)

Security Implementation Matrix

Advanced Topics: Future Security Scenarios

Potential Future Enhancements to Protect Against All Parties

Prerequisites

Required Python Packages

System Environments

Building Encryption Plugins

NVFlare Provisioning

Data Preparation

Horizontal Training

Vertical Training

XGBoost Plugin Configuration

Plugin Configuration by Training Mode

Vertical (Non-secure)

Horizontal (Non-secure)

Vertical Secure

Horizontal Secure

Job Configuration

Controller

Executor

Data Loader

Job Examples

Vertical Training

Horizontal Training

Pre-Trained Models

Performance Tuning

Timeouts

Number of Clients

Additional Resources