Reliable Federated XGBoost Design

Flare as XGBoost Launcher

NVFLARE serves as a launchpad to start the XGBoost system. Once started, the XGBoost system runs independently of FLARE, as illustrated in the following figure.

[Figure: XGBoost system launched by FLARE and then running independently (../../_images/loose_xgb.png)]

There are a few potential problems with this approach:

  • MPI-style communication assumes a near-perfect network, whereas plain gRPC connections over the internet can be unstable.

  • For each job, the XGBoost Server must open a port for clients to connect to. In real-world deployments, this adds the burden of asking IT to open an additional port. Even if IT allows a fixed port to be opened and that port is reused, multiple XGBoost jobs cannot run at the same time, because each XGBoost job requires its own port number.
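The port conflict in the last point can be reproduced with plain Python sockets standing in for two XGBoost Server jobs. This is an illustration only: `start_xgboost_server` is a hypothetical helper, and the "approved" port is simply whatever free port the OS hands to the first job, so the demo does not depend on any particular port number being available.

```python
import socket


def start_xgboost_server(port: int) -> socket.socket:
    """Hypothetical stand-in for an XGBoost Server job: bind and listen on `port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def demo_port_conflict() -> bool:
    """Return True if a second job cannot reuse the port held by the first job."""
    # Job 1 takes the "approved" port; port 0 lets the OS pick a free one for the demo.
    job1 = start_xgboost_server(0)
    approved_port = job1.getsockname()[1]
    try:
        # Job 2 tries to reuse the same fixed port while job 1 is still running.
        job2 = start_xgboost_server(approved_port)
        job2.close()
        return False
    except OSError:
        # Typically EADDRINUSE: the port is already taken by job 1.
        return True
    finally:
        job1.close()


if __name__ == "__main__":
    print("second job blocked:", demo_port_conflict())
```

While the first job holds the port, the second bind fails, which is why a single IT-approved port cannot be shared by concurrent XGBoost jobs.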

Flare as XGBoost Communicator

FLARE provides a highly flexible, scalable, and reliable communication mechanism. We improve the reliability of federated XGBoost by using FLARE as XGBoost's communicator, as shown here:

[Figure: FLARE as the communicator of XGBoost (../../_images/tight_xgb.png)]
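FLARE's reliability mechanism is not detailed in this section. As a rough illustration only, the sketch below shows one generic way to obtain reliable delivery over a lossy link: the sender retries under a fixed message ID until a reply arrives, and the receiver de-duplicates by that ID so a retried request is processed exactly once. This is an assumed, generic technique, not FLARE's implementation, and `LossyLink`, `Receiver`, and `reliable_send` are hypothetical names.

```python
import itertools
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LossyLink:
    """Simulated network hop that randomly drops requests and replies."""

    drop_rate: float
    rng: random.Random

    def transmit(
        self, deliver: Callable[[int, bytes], bytes], msg_id: int, payload: bytes
    ) -> Optional[bytes]:
        if self.rng.random() < self.drop_rate:
            return None  # request lost before reaching the receiver
        reply = deliver(msg_id, payload)
        if self.rng.random() < self.drop_rate:
            return None  # receiver processed it, but the reply was lost
        return reply


@dataclass
class Receiver:
    """Runs the handler once per message ID and replays the cached reply on retries."""

    handler: Callable[[bytes], bytes]
    replies: Dict[int, bytes] = field(default_factory=dict)
    handled: int = 0

    def deliver(self, msg_id: int, payload: bytes) -> bytes:
        if msg_id not in self.replies:
            self.handled += 1
            self.replies[msg_id] = self.handler(payload)
        return self.replies[msg_id]


_next_id = itertools.count()


def reliable_send(
    link: LossyLink, receiver: Receiver, payload: bytes, max_attempts: int = 50
) -> bytes:
    """Retry under the same message ID until a reply arrives or attempts run out."""
    msg_id = next(_next_id)
    for _ in range(max_attempts):
        reply = link.transmit(receiver.deliver, msg_id, payload)
        if reply is not None:
            return reply
    raise TimeoutError(f"message {msg_id} unacknowledged after {max_attempts} attempts")


if __name__ == "__main__":
    link = LossyLink(drop_rate=0.3, rng=random.Random(7))
    receiver = Receiver(handler=lambda p: p.upper())
    print(reliable_send(link, receiver, b"gradient-histogram"))
```

The de-duplication table is what makes retries safe: without it, a request whose reply was lost would be processed twice by the receiver.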

Detailed Design

The open-source Federated XGBoost (C++) uses gRPC as its communication protocol. To use FLARE as the communicator, we route XGBoost's gRPC messages through FLARE. To do so, we change the server endpoint of each XGBoost client to a local gRPC server (LGS) within the FLARE client.
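The endpoint change can be sketched as a configuration rewrite. Everything here is an assumption for illustration: `XGBoostClientConfig` and its fields are hypothetical (not XGBoost's real settings), the remote host name is made up, and the LGS is assumed to bind a loopback port chosen by the OS. A plain listening socket only reserves the address; a real LGS would be a gRPC server.

```python
import socket
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class XGBoostClientConfig:
    """Hypothetical subset of an XGBoost client's settings."""

    server_address: str  # "host:port" that the XGBoost client connects to
    rank: int            # hypothetical per-site identifier, kept unchanged


def open_lgs_socket() -> socket.socket:
    """Reserve a loopback-only port for a job's LGS (port 0 = let the OS pick)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen()
    return sock


def redirect_to_lgs(config: XGBoostClientConfig, lgs: socket.socket) -> XGBoostClientConfig:
    """Return a copy of `config` whose server endpoint is the local LGS."""
    host, port = lgs.getsockname()
    return replace(config, server_address=f"{host}:{port}")


if __name__ == "__main__":
    remote = XGBoostClientConfig(server_address="xgb-server.example.com:9091", rank=0)
    lgs = open_lgs_socket()
    print(redirect_to_lgs(remote, lgs))
    lgs.close()
```

Under this assumption, the XGBoost client only dials a loopback address, so no extra internet-facing port is needed per job, and two concurrent jobs on one site simply receive different local ports.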

[Figure: detailed message path between the XGBoost Clients and the XGBoost Server through FLARE (../../_images/fed_xgb_detail.png)]

As shown in this diagram, each site has a local gRPC server (LGS) that serves as the server endpoint for the XGBoost Client on that site. Similarly, the FL Server has a local gRPC client (LGC) that interacts with the XGBoost Server. The message path between the XGBoost Client and the XGBoost Server is as follows:

  1. The XGBoost Client generates a gRPC message and sends it to the LGS in the FLARE Client.

  2. FLARE Client forwards the message to the FLARE Server. This is a reliable FLARE message.

  3. FLARE Server uses the LGC to send the message to the XGBoost Server.

  4. XGBoost Server sends the response back to the LGC in the FLARE Server.

  5. FLARE Server sends the response back to the FLARE Client.

  6. FLARE Client sends the response back to the XGBoost Client via the LGS.
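The six steps can be traced with a small in-process sketch. Every class name below is hypothetical and not part of the NVFLARE or XGBoost APIs; synchronous method calls stand in for the gRPC calls and FLARE messages, so there is no networking, retrying, or concurrency. The shared `trace` list records which component handles the request, in order.

```python
from typing import List


class XGBoostServer:
    """Stand-in for the XGBoost Server process."""

    def __init__(self, trace: List[str]) -> None:
        self.trace = trace

    def handle(self, request: bytes) -> bytes:
        self.trace.append("xgboost_server")
        return b"response-to:" + request  # step 4: reply goes back to the LGC


class LocalGrpcClient:
    """LGC inside the FLARE Server; talks to the XGBoost Server."""

    def __init__(self, xgb_server: XGBoostServer, trace: List[str]) -> None:
        self.xgb_server = xgb_server
        self.trace = trace

    def call(self, request: bytes) -> bytes:
        self.trace.append("lgc")
        return self.xgb_server.handle(request)  # step 3: deliver to XGBoost Server


class FlareServer:
    def __init__(self, lgc: LocalGrpcClient, trace: List[str]) -> None:
        self.lgc = lgc
        self.trace = trace

    def on_message(self, request: bytes) -> bytes:
        self.trace.append("flare_server")
        response = self.lgc.call(request)  # step 3: hand the request to the LGC
        return response  # step 5: response travels back to the FLARE Client


class FlareClient:
    def __init__(self, flare_server: FlareServer, trace: List[str]) -> None:
        self.flare_server = flare_server
        self.trace = trace

    def forward(self, request: bytes) -> bytes:
        self.trace.append("flare_client")
        # step 2: in FLARE this hop is a reliable message; here it is a direct call
        return self.flare_server.on_message(request)


class LocalGrpcServer:
    """LGS inside the FLARE Client; the XGBoost Client's server endpoint."""

    def __init__(self, flare_client: FlareClient, trace: List[str]) -> None:
        self.flare_client = flare_client
        self.trace = trace

    def handle(self, request: bytes) -> bytes:
        self.trace.append("lgs")
        return self.flare_client.forward(request)  # step 6: reply returned via the LGS


class XGBoostClient:
    def __init__(self, endpoint: LocalGrpcServer) -> None:
        self.endpoint = endpoint

    def send(self, request: bytes) -> bytes:
        return self.endpoint.handle(request)  # step 1: request goes to the LGS


def build_pipeline(trace: List[str]) -> XGBoostClient:
    """Wire the components together in the order described by the six steps."""
    server = XGBoostServer(trace)
    lgc = LocalGrpcClient(server, trace)
    flare_server = FlareServer(lgc, trace)
    flare_client = FlareClient(flare_server, trace)
    lgs = LocalGrpcServer(flare_client, trace)
    return XGBoostClient(lgs)


if __name__ == "__main__":
    trace: List[str] = []
    client = build_pipeline(trace)
    print(client.send(b"histogram"), trace)
```

Assuming the XGBoost Server runs alongside the FLARE Server, the only hop in this chain that crosses the network is the FLARE Client to FLARE Server hop, which is exactly the hop FLARE makes reliable.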

Note that the XGBoost Client (C++) component can run either as a separate process or within the FLARE Client process.