Launching NVIDIA FLARE with docker compose
If you want to get NVIDIA FLARE up and running as easily as possible, for example as a first-time user or when you need to demonstrate it on short notice, you can use the docker compose feature. All you need is a working docker environment.
The provisioning tool of NVIDIA FLARE includes a builder, DockerBuilder, that can create compose.yaml and other supporting files.
After provisioning, you can enter the result folder, normally workspace/example_project/prod_NN, and run docker compose build and docker compose up to start the overseer, servers and clients with docker compose.
Provisioning stage
First, check that your project.yml file contains the following section:
- path: nvflare.lighter.impl.docker.DockerBuilder
  args:
    base_image: python:3.8
    requirements_file: docker_compose_requirements.txt
This builder generates the necessary files during provisioning.
The base_image argument is the name of the base docker image used to create the runtime docker image for NVIDIA FLARE in the docker compose setting.
The requirements_file argument names a file listing additional python packages that will be installed after the nvflare package is installed in the runtime docker image. If you don't need to install any additional python packages, you can provide an empty file.
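As a quick sanity check before provisioning, the following sketch shows one way to confirm that a parsed project file lists the DockerBuilder. It assumes the file has already been loaded into a dictionary (for example with a YAML parser) and that builders are listed under a top-level builders key. The function name find_docker_builder and the sample dictionary are illustrative only and are not part of NVIDIA FLARE.

```python
DOCKER_BUILDER_PATH = "nvflare.lighter.impl.docker.DockerBuilder"


def find_docker_builder(project):
    """Return the args of the DockerBuilder entry, or None if it is absent.

    `project` is assumed to be the already-parsed content of project.yml.
    """
    for builder in project.get("builders", []):
        if builder.get("path") == DOCKER_BUILDER_PATH:
            return builder.get("args", {})
    return None


# Hypothetical parsed project.yml content, reduced to the relevant part.
project = {
    "builders": [
        {
            "path": DOCKER_BUILDER_PATH,
            "args": {
                "base_image": "python:3.8",
                "requirements_file": "docker_compose_requirements.txt",
            },
        }
    ]
}

args = find_docker_builder(project)
print(args["base_image"] if args else "DockerBuilder not configured")
```

If the function returns None, the section shown above still needs to be added to project.yml before provisioning.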
Post-provisioning stage
Run the provision command as usual, either in the new format, nvflare provision, or simply provision.
After the command completes, there should be a folder with a structure similar to the following:
$ tree -L 1
.
├── admin@nvidia.com
├── compose.yaml
├── nvflare_compose
├── nvflare_hc
├── overseer
├── server1
├── server2
├── site-1
└── site-2
8 directories, 1 file
The compose.yaml file is the key file for the docker compose command, and the nvflare_compose folder is the build context used to generate the runtime docker image during the docker compose build stage.
The nvflare_compose folder contains only two files, Dockerfile and requirements.txt.
You can modify them if necessary. For example, if you need to install additional binary packages with apt-get install, you can add them to the Dockerfile.
The requirements.txt file is a copy of the requirements_file you provided in the project.yml file.
Running docker compose
Inside the prod_NN folder, if this is the very first time you start docker compose for NVIDIA FLARE, run docker compose build to build the runtime docker image. If nothing has changed in Dockerfile and requirements.txt, you don't have to run that command again.
$ docker compose build
[+] Building 0.1s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 177B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.8 0.0s
=> [1/5] FROM docker.io/library/python:3.8 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 37B 0.0s
=> CACHED [2/5] RUN pip install -U pip 0.0s
=> CACHED [3/5] RUN pip install nvflare 0.0s
=> CACHED [4/5] COPY requirements.txt requirements.txt 0.0s
=> CACHED [5/5] RUN pip install -r requirements.txt 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:53a1463bd170b8bc213899037bbe4403f2d6f0d553cdd470805855f3968d19d4 0.0s
=> => naming to docker.io/library/nvflare-service 0.0s
After the runtime docker image is ready, you can run docker compose up to get one overseer, two servers and two sites running together. The ports for the overseer and servers are also opened. The overseer, server and client folders in the current prod_NN folder are mounted into the corresponding running docker containers. An internal folder is mounted by the servers to store shared snapshot information.
$ docker compose up
[+] Running 5/0
⠿ Container prod_02-site-1-1 Recreated 0.1s
⠿ Container prod_02-overseer-1 Recreated 0.1s
⠿ Container prod_02-server1-1 Recreated 0.1s
⠿ Container prod_02-server2-1 Recreated 0.1s
⠿ Container prod_02-site-2-1 Recreated 0.1s
Attaching to prod_02-overseer-1, prod_02-server1-1, prod_02-server2-1, prod_02-site-1-1, prod_02-site-2-1
prod_02-overseer-1 | [2022-09-23 16:00:58 +0000] [9] [INFO] Starting gunicorn 20.1.0
prod_02-overseer-1 | [2022-09-23 16:00:58 +0000] [9] [INFO] Listening at: https://0.0.0.0:8443 (9)
prod_02-overseer-1 | [2022-09-23 16:00:58 +0000] [9] [INFO] Using worker: nvflare.ha.overseer.worker.ClientAuthWorker
prod_02-overseer-1 | [2022-09-23 16:00:58 +0000] [12] [INFO] Booting worker with pid: 12
prod_02-server2-1 | 2022-09-23 16:00:59,103 - FederatedServer - INFO - starting secure server at server2:8102
prod_02-server2-1 | deployed FL server trainer.
prod_02-server2-1 | 2022-09-23 16:00:59,118 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server server2 on Port 8103
prod_02-server2-1 | 2022-09-23 16:00:59,119 - root - INFO - Server started
prod_02-server2-1 | 2022-09-23 16:00:59,121 - FederatedServer - INFO - Got the primary sp: server2 fl_port: 8102 SSID: 9ba168f0-6cf5-446b-bfd5-a1243dd195f8. Turning to hot.
prod_02-server1-1 | 2022-09-23 16:00:59,332 - FederatedServer - INFO - starting secure server at server1:8002
prod_02-server1-1 | deployed FL server trainer.
prod_02-server1-1 | 2022-09-23 16:00:59,346 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server server1 on Port 8003
prod_02-server1-1 | 2022-09-23 16:00:59,346 - root - INFO - Server started
prod_02-site-2-1 | Waiting for SP....
prod_02-site-2-1 | 2022-09-23 16:00:59,399 - FederatedClient - INFO - Got the new primary SP: server2:8102
prod_02-site-1-1 | Waiting for SP....
prod_02-site-1-1 | 2022-09-23 16:00:59,450 - FederatedClient - INFO - Got the new primary SP: server2:8102
prod_02-server2-1 | 2022-09-23 16:01:00,393 - ClientManager - INFO - Client: New client site-2@172.18.0.2 joined. Sent token: 3da72f67-3443-47ac-b059-76b0b314dd08. Total clients: 1
prod_02-site-2-1 | 2022-09-23 16:01:00,394 - FederatedClient - INFO - Successfully registered client:site-2 for project example_project. Token:3da72f67-3443-47ac-b059-76b0b314dd08 SSID:9ba168f0-6cf5-446b-bfd5-a1243dd195f8
prod_02-server2-1 | 2022-09-23 16:01:00,439 - ClientManager - INFO - Client: New client site-1@172.18.0.3 joined. Sent token: 5e0b1012-77e6-41a3-8af0-9fa86df8ef2e. Total clients: 2
prod_02-site-1-1 | 2022-09-23 16:01:00,440 - FederatedClient - INFO - Successfully registered client:site-1 for project example_project. Token:5e0b1012-77e6-41a3-8af0-9fa86df8ef2e SSID:9ba168f0-6cf5-446b-bfd5-a1243dd195f8
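Once the containers are up, it can help to confirm that the overseer and server ports are reachable. The following is a minimal sketch, not part of NVIDIA FLARE. The port numbers are taken from the startup log above and are assumed to be published on the docker host under the same numbers. The names port_is_open and check_services are illustrative.

```python
import socket

# Ports taken from the startup log above; adjust if your project.yml differs.
EXPECTED_PORTS = {
    "overseer": [8443],
    "server1": [8002, 8003],
    "server2": [8102, 8103],
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost"):
    """Map 'name:port' to reachability for every expected port on host."""
    return {
        f"{name}:{port}": port_is_open(host, port)
        for name, ports in EXPECTED_PORTS.items()
        for port in ports
    }


if __name__ == "__main__":
    for service, ok in check_services().items():
        print(f"{service} -> {'open' if ok else 'closed'}")
```

This only verifies that something accepts TCP connections on each port. It does not perform the TLS handshake or any NVIDIA FLARE authentication.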
Login with admin console
You can use the admin console to log in to this newly created NVIDIA FLARE system once your machine can resolve the host names of the overseer and servers to IP addresses. For example, if you are running docker compose on machine desktop1 with IP 192.168.1.101 and would like to run your admin console on machine desktop2, you will need to edit the /etc/hosts file on desktop2 to include this line:
192.168.1.101 overseer server1 server2
After this update, the admin console can find overseer, server1 and server2. If you name them differently in your project.yml file, for example myoverseer for the overseer, change that line to:
192.168.1.101 myoverseer server1 server2
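If the host names change often, the /etc/hosts line can be generated rather than typed by hand. The helper below is a small illustrative sketch. The name hosts_entry is hypothetical, and the function only builds the text of the line; it does not edit /etc/hosts.

```python
def hosts_entry(ip, hostnames):
    """Build one /etc/hosts line mapping ip to the given host names."""
    if not hostnames:
        raise ValueError("at least one host name is required")
    return f"{ip} {' '.join(hostnames)}"


print(hosts_entry("192.168.1.101", ["overseer", "server1", "server2"]))
# prints: 192.168.1.101 overseer server1 server2
```

The host names passed in should match the names used for the overseer and servers in project.yml.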
Logging in with the admin console then works as usual. Just run fl_admin.sh in the startup folder of the admin console.
$ ./admin@nvidia.com/startup/fl_admin.sh
User Name: admin@nvidia.com
Trying to obtain server address
Obtained server address: server1:8003
Trying to login, please wait ...
Logged into server at server1:8003
Type ? to list commands; type "? cmdName" to show usage of a command.
> check_status server
Engine status: stopped
---------------------
| JOB_ID | APP NAME |
---------------------
---------------------
Registered clients: 2
----------------------------------------------------------------------------
| CLIENT | TOKEN | LAST CONNECT TIME |
----------------------------------------------------------------------------
| site-2 | 7cfe5dce-00a5-4ffb-a5ad-d31dc050c5dd | Fri Sep 23 16:15:00 2022 |
| site-1 | 5435ccb6-9240-42b1-a48b-6290cc71d8d0 | Fri Sep 23 16:15:00 2022 |
----------------------------------------------------------------------------
Done [9729 usecs] 2022-09-23 09:15:12.137237
Ending docker compose
You can press CTRL-C to stop docker compose.