Overview

Introduction

NVIDIA FLARE uses provisioning and admin clients to reduce the human coordination needed to set up a federated learning project. A provisioning tool can be configured to create a startup kit for each site in an encrypted package. These packages can then be delivered to each site ready to go, streamlining the process of provisioning, starting, and operating federated learning with a trusted setup.

Provision - Start - Operate

Provision

Lead IT generates the packages for the server / clients / admins, protected with passwords

Start

Each site's IT installs its own package, starts the services, and maps the data location

Operate

Lead scientists / administrators control the federated learning process: submit jobs to deploy applications, check statuses, and abort or shut down training

Provision: Configure and generate packages for the server, clients, and admins

One party leads the process of configuring the provisioning tool and using it to generate startup kits for each party in the federated learning training project:

Preparation for using the provisioning tool

After Installation, the provisioning tool is available via the provision command.

Provisioning a federated learning project

The Provisioning in NVIDIA FLARE page has details on the contents of the provisioning tool and the underlying NVIDIA FLARE Open Provision API, which you can use to customize configurations to fit your own requirements.

Edit the Project yaml file in the directory with the provisioning tool to meet your project requirements. Make sure the server, client sites, admin, orgs, enable_byoc settings, and all other settings are right for your project.

Then run the provision command (this assumes your project.yml is in the current working directory):

provision -p project.yml

By default, the generated startup kits are created in a directory prefixed with “prod_”, within a folder named after the project, inside the workspace folder created where the provision command is run.

Attention

To change configurations, it may be necessary to alter nvflare/lighter/impl/master_template.yml before running provision with your checked-out version of the code (make sure PYTHONPATH points to the location where you checked out the NVFlare repository).

You cannot directly edit the contents of the generated startup kits. They are signed by SignatureBuilder, so the system will detect if any of the files have been altered and may not run.

The console displays a list of zip files and their passwords. We suggest you copy the console output and “packages” folder to a safe location. The passwords shown below are for demonstration purposes only:

Project yaml file: /home/nvflare-venv/project.yml.
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ participant           ┃ org    ┃ destination               ┃ password         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ localhost             │ nvidia │ localhost.zip             │ Dby2BhwJdLKfStWl │
│ org1-a                │ org1   │ org1-a.zip                │ BlLXFKgWp0Qu28cS │
│ org1-b                │ org1   │ org1-b.zip                │ Lu6w0aCf1RhAqSlJ │
│ org2                  │ org2   │ org2.zip                  │ KdotOafkUl2ziRh5 │
│ org3                  │ org3   │ org3.zip                  │ t05cHjnd4WNSo62x │
│ admin@nvidia.com      │ nvidia │ admin@nvidia.com.zip      │ eyTrthBudD7noW1s │
│ researcher@nvidia.com │ nvidia │ researcher@nvidia.com.zip │ s52b8h9ToIRuALpx │
│ researcher@org1.com   │ org1   │ researcher@org1.com.zip   │ dmlt3ySsAeU0V9F7 │
│ researcher@org2.com   │ org2   │ researcher@org2.com.zip   │ GJS6eb410q0ijlCZ │
│ it@org2.com           │ org2   │ it@org2.com.zip           │ s3lYvaL2tqX0Wrjb │
└───────────────────────┴────────┴───────────────────────────┴──────────────────┘
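If you save the console output, the table can be split into per-participant records so that each password can be sent separately from its package. The following is a minimal sketch, assuming the console output keeps the box-drawing layout shown above; parse_provision_table is a hypothetical helper, not part of NVIDIA FLARE.

```python
def parse_provision_table(console_output):
    """Turn the provision console table into a list of per-participant records.

    Assumption: data rows start with the light vertical bar "│" and have four
    cells (participant, org, destination, password), as in the sample output.
    Header and border rows use other box-drawing characters and are skipped.
    """
    records = []
    for raw_line in console_output.splitlines():
        line = raw_line.strip()
        if not line.startswith("│"):
            continue
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if len(cells) != 4:
            continue
        participant, org, destination, password = cells
        records.append(
            {
                "participant": participant,
                "org": org,
                "destination": destination,
                "password": password,
            }
        )
    return records
```

Each record can then be used to send the zip file and its password through different channels.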

Tip

For security reasons, it is recommended to send the password to each participant separately from the package itself.

After generating packages: Distribute and extract

Please let each participant know that the packages are password protected. On Ubuntu, the following command can be used to extract the packages:

unzip -P $PASSWORD $ZIP_FILE -d $DIRECTORY_TO_EXTRACT_TO

Using -d $DIRECTORY_TO_EXTRACT_TO is optional, and without it, a “startup” folder will be extracted to the current directory the package is in. Either way, the parent folder containing this “startup” folder ($DIRECTORY_TO_EXTRACT_TO if the -d option was used) will be the server, client, or admin client workspace root directory, and the party running the package will need write access there.

Note

It is important that this “startup” folder is not renamed because the code relies upon this for operation. Please note that a “transfer” directory and deployed applications will be created at the level of this “startup” folder. See the section on Internal folder and file structures for NVIDIA FLARE below for more details.
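Where the unzip command is not available, the same extraction can be scripted. The following is a minimal sketch, assuming the package uses the legacy ZIP password protection that Python's zipfile module can decrypt; extract_startup_kit is a hypothetical helper name, not part of NVIDIA FLARE.

```python
import zipfile
from pathlib import Path


def extract_startup_kit(zip_path, password, dest_dir="."):
    """Extract a password-protected startup kit and return the workspace root.

    Assumption: the archive contains a top-level "startup" folder, as
    described above, and is encrypted with legacy ZIP encryption.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # zipfile expects the password as bytes
        archive.extractall(path=dest, pwd=password.encode("utf-8"))
    if not (dest / "startup").is_dir():
        raise FileNotFoundError(f"no 'startup' folder found under {dest}")
    return dest.resolve()
```

The returned directory is the workspace root, so the party running the package needs write access to it.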

Start: Instructions for each participant to start running FL with their startup kits

Attention

Please always safeguard .key files! These are the critical keys for secure communication!

Overseer ($OVERSEER_NAME.zip)

A single Overseer keeps track of all the FL servers and, through their Overseer Agents, tells all participants which FL server, or SP, is currently active.

After unzipping the package for the Overseer, run the start.sh file from the “startup” folder you unzipped to start the Overseer.

If clients from other machines cannot connect to the Overseer, make sure that the hostname (the name of the Overseer under participants in project.yml) specified when generating the startup kits in the provisioning process resolves to the correct IP. If the Overseer is on an internal network without a DNS hostname, on Ubuntu, an entry may need to be added to /etc/hosts with the internal IP and the hostname.

Federated learning servers ($SERVER_NAME.zip)

The server coordinates the federated learning training and is the main hub that all clients and admin clients connect to.

After unzipping the server package ($SERVER_NAME.zip), run the start.sh file from the “startup” folder you unzipped to start the server.

The rootCA.pem file is pointed to by “ssl_root_cert” in fed_server.json. If you plan to move/copy it to a different place, you will need to modify fed_server.json. The same applies to the other two files, server.crt and server.key.
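After moving certificate files, it can help to confirm that the configuration still points at files that exist. The following is a minimal sketch that makes no assumption about where “ssl_root_cert” sits inside the JSON: it searches the whole document for that key. It does assume that relative paths are resolved against the folder containing the config file; missing_cert_files is a hypothetical helper, not part of NVIDIA FLARE.

```python
import json
from pathlib import Path


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def missing_cert_files(config_path, key="ssl_root_cert"):
    """Return the configured paths under `key` that do not exist on disk."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    missing = []
    for value in find_values(config, key):
        if not isinstance(value, str):
            continue
        candidate = Path(value)
        if not candidate.is_absolute():
            # Assumption: relative paths are relative to the config file's folder
            candidate = config_path.parent / candidate
        if not candidate.is_file():
            missing.append(value)
    return missing
```

An empty list means every path found under the key exists. The same check can be pointed at fed_client.json.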

Note

When launching the FL server inside a Docker container with docker run, use --net=host to map the hostname into that container. For secure gRPC communication, the FL server has to bind to the hostname specified in the provisioning stage, so always make sure that the FL server can bind to that hostname. Additionally, the port that the server communicates on must not be blocked by any firewalls.

If clients from other machines cannot connect to the server, make sure that the hostname (name of the server under participants in project.yml) specified when generating the startup kits in the provisioning process resolves to the correct IP. If the FL server is on an internal network without a DNS hostname, on Ubuntu, an entry may need to be added to /etc/hosts with the internal IP and the hostname.
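A quick way to see what a hostname resolves to on a given machine is to ask the system resolver, which also consults /etc/hosts. The following is a minimal sketch; resolve_hostname is a hypothetical helper, and the resolver parameter exists only so the lookup can be swapped out.

```python
import socket


def resolve_hostname(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address the hostname resolves to, or None if it does not resolve."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror, raised on lookup failure, is a subclass of OSError
        return None
```

Running resolve_hostname with the server name from project.yml on a client machine shows whether the name resolves there, and to which IP.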

Federated learning client ($CLIENT_NAME.zip)

Each site participating in federated learning training is a client. Each package for a client is named after the client name specified when provisioning the project.

After unzipping the package (for details see After generating packages: Distribute and extract), run start.sh from the “startup” folder you unzipped to start the client.

Tip

You need to install the NVIDIA FLARE package before running the start.sh shell script. NVIDIA FLARE is available on PyPI and can be installed with python3 -m pip install nvflare.

Depending on the application that will be deployed later, your environment may need some additional Python packages. If you haven’t installed them, do so after you install NVIDIA FLARE. NVIDIA FLARE does not dictate your deep learning environment. It’s completely up to you to set it up.

Note

Coordination for where to mount the data may be needed depending on where the dataset is located in the application to be deployed.

The rootCA.pem file is pointed to by “ssl_root_cert” in fed_client.json. If you plan to move/copy it to a different place, you will need to modify fed_client.json. The same applies to the other two files, client.crt and client.key.

The client name in your submission to participate in this federated learning project is embedded in the CN field of the client certificate, which uniquely identifies the participant. As such, please safeguard its private key, client.key.

When a client successfully connects to the FL server, the server and that client will both log a token confirming that the client successfully connected:

Server:

2020-07-07 03:48:49,712 - ClientManager - INFO - Client: New client abcd@127.0.0.1 joined. Sent token: f279157b-df8c-aa1b-8560-2c43efa257bc.  Total clients: 1

Client:

2020-07-07 03:48:49,713 - FederatedClient - INFO - Successfully registered client:abcd for exampletraining. Got token:f279157b-df8c-aa1b-8560-2c43efa257bc
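To confirm that both sides logged the same token, the token can be pulled out of each log line and compared. The following is a minimal sketch, assuming the log lines keep the format shown above; extract_token is a hypothetical helper, not part of NVIDIA FLARE.

```python
import re

# Matches "token:" followed by an optional space and a UUID-shaped value,
# which covers both "Sent token: <id>" and "Got token:<id>".
TOKEN_PATTERN = re.compile(
    r"token:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_token(log_line):
    """Return the registration token found in a log line, or None if absent."""
    match = TOKEN_PATTERN.search(log_line)
    return match.group(1) if match else None
```

If extract_token returns the same value for the server line and the client line, the two entries refer to the same registration.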

If a connection cannot be made, the client will repeatedly try to connect and for each failure log:

Could not connect to server. Setting flag for stopping training. failed to connect to all addresses

If the server is up, you may need to troubleshoot with settings for firewall ports to make sure that the proper permissions are in place. This could require coordination between the lead IT and site IT personnel.
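One way to separate firewall problems from other failures is to test whether a plain TCP connection to the server's port can be opened from the client machine. The following is a minimal sketch; can_connect is a hypothetical helper, and the host and port must be taken from your own project configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

A False result while the server is running suggests a blocked port or a hostname issue rather than a problem with the startup kit. A True result only shows that the port is reachable, not that the secure connection will succeed.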

Federated learning administration client ($EMAIL.zip)

Each admin client will be able to connect and submit commands to the server. Each admin client package is named after the email specified when provisioning the project, and the same email will need to be entered for authentication when the admin client is launched.

Install the wheel package first with:

python3 -m pip install nvflare

After installation, you can run the fl_admin.sh file to start communicating with the FL server. The FL server must be running, and the admin client must be able to connect to it, for the admin client to start. At the User Name: prompt, enter the email that was used for that admin client when provisioning the project.

The rootCA.pem file is pointed to by “ca_cert” in fl_admin.sh. If you plan to move/copy it to a different place, you will need to modify the corresponding script. The same applies to the other two files, client.crt and client.key.

The email used to participate in this FL project is embedded in the CN field of the client certificate, which uniquely identifies the participant. As such, please safeguard its private key, client.key.

Attention

You will need write access in the directory containing the “startup” folder because the “transfer” directory for uploading files as well as directories created for federated learning runs will live here. For details, see Internal folder and file structures for NVIDIA FLARE.

Operate: Running federated learning as an administrator

Running federated learning from the administration client

With all connections between the FL server, FL clients, and administration clients open and all of the parties started successfully as described in the preceding section, Federated learning administration client ($EMAIL.zip), admin commands can be used to operate a federated learning project. The FLAdminAPI provides a way to programmatically issue commands to operate the system so it can be run with a script.

For a complete list of admin commands, see Operating NVFLARE - Admin Client, Commands, FLAdminAPI.

For examples of using the commands to operate an FL system, see the examples in the Quickstart section.

Internal folder and file structures for NVIDIA FLARE

Please refer to server workspace and client workspace for the folder and file structures on the server/client side.

Administrator side folder and file structure

/some_path_on_fl_admin/fl_administrator_workspace_root/
    startup/
        client.crt
        client.key
        fl_admin.sh
        readme.txt
        rootCA.pem
        signature.pkl
    transfer/
        application_for_uploading/
            config/
            models/
            resources/
        application2_for_uploading/
            config/
            models/
            resources/
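The layout above can be checked before launching the admin client. The following is a minimal sketch, assuming the startup folder should contain the files listed in the tree (readme.txt is left out of the check as an assumption that it is not needed at run time); the helper names are hypothetical.

```python
import os
from pathlib import Path

# File names taken from the administrator-side tree shown above
EXPECTED_STARTUP_FILES = (
    "client.crt",
    "client.key",
    "fl_admin.sh",
    "rootCA.pem",
    "signature.pkl",
)


def missing_startup_files(workspace_root):
    """Return the expected startup files that are absent under <root>/startup."""
    startup = Path(workspace_root) / "startup"
    return [name for name in EXPECTED_STARTUP_FILES if not (startup / name).is_file()]


def is_workspace_writable(workspace_root):
    """Return True if the current user can write to the workspace root."""
    return os.access(workspace_root, os.W_OK)
```

An empty list from missing_startup_files together with a True result from is_workspace_writable indicates that the folder structure and write access described above are in place.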