Proof Of Concept (POC) Command¶
The POC command allows users to try out the features of NVFlare in a proof of concept deployment on a single machine.
Syntax and Usage¶
The POC command was reorganized in version 2.4 into the subcommands prepare, prepare-jobs-dir, start, stop, and clean.
nvflare poc -h
usage: nvflare poc [-h] {prepare,prepare-jobs-dir,start,stop,clean} ...
options:
-h, --help show this help message and exit
poc:
{prepare,prepare-jobs-dir,start,stop,clean}
poc subcommand
prepare prepare poc environment by provisioning local project
prepare-jobs-dir prepare jobs directory
start start services in poc mode
stop stop services in poc mode
clean clean up poc workspace
nvflare poc prepare¶
The detailed options for nvflare poc prepare:
nvflare poc prepare -h
usage: nvflare poc prepare [-h] [-n [NUMBER_OF_CLIENTS]] [-c [CLIENTS ...]] [-he] [-i [PROJECT_INPUT]] [-d [DOCKER_IMAGE]] [-debug]
options:
-h, --help show this help message and exit
-n [NUMBER_OF_CLIENTS], --number_of_clients [NUMBER_OF_CLIENTS]
number of sites or clients, default to 2
-c [CLIENTS ...], --clients [CLIENTS ...]
Space separated client names. If specified, number_of_clients argument will be ignored.
-he, --he enable homomorphic encryption.
-i [PROJECT_INPUT], --project_input [PROJECT_INPUT]
project.yaml file path, If specified, 'number_of_clients','clients' and 'docker' specific options will be ignored.
-d [DOCKER_IMAGE], --docker_image [DOCKER_IMAGE]
generate docker.sh based on the docker_image, used in '--prepare' command. and generate docker.sh 'start/stop' commands will start with docker.sh
-debug, --debug debug is on
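The interaction between -n and -c can be sketched as a small POSIX shell function. poc_client_names is hypothetical and not part of NVFlare. The site-1 through site-N default naming is an assumption inferred from the example output later on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVFlare): models how -n and -c combine.
# Assumption: default client names follow the site-1 .. site-N pattern.
# Usage: poc_client_names NUMBER_OF_CLIENTS [CLIENT_NAME ...]
poc_client_names() {
  n="$1"
  shift
  if [ "$#" -gt 0 ]; then
    # Explicit names (-c) win; the count (-n) is ignored.
    echo "$*"
    return 0
  fi
  names=""
  i=1
  while [ "$i" -le "$n" ]; do
    names="$names site-$i"
    i=$((i + 1))
  done
  # Drop the leading space added by the loop.
  echo "${names# }"
}

# poc_client_names 2                        prints: site-1 site-2
# poc_client_names 5 hospital-a hospital-b  prints: hospital-a hospital-b
```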
nvflare poc prepare-jobs-dir¶
The detailed options for nvflare poc prepare-jobs-dir:
nvflare poc prepare-jobs-dir -h
usage: nvflare poc prepare-jobs-dir [-h] [-j [JOBS_DIR]] [-debug]
optional arguments:
-h, --help show this help message and exit
-j [JOBS_DIR], --jobs_dir [JOBS_DIR]
jobs directory
-debug, --debug debug is on
Note
The “-j” option is new in version 2.4 for linking to the jobs directory in the code base. Previously, you could optionally define an NVFLARE_HOME environment variable pointing to a local NVFlare directory, which was used to create a symbolic link from the transfer directory to the examples in the code base. For example, if the NVFlare GitHub repository is cloned under ~/projects, you could set NVFLARE_HOME=~/projects/NVFlare. If the NVFLARE_HOME environment variable was not set, you could manually copy the examples to the transfer directory.
Now, the “-j” option takes precedence over the NVFLARE_HOME environment variable, but NVFLARE_HOME can still be used.
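The precedence described in this note can be sketched as a shell function that reports which source would be used. jobs_source is hypothetical: it only models the order (-j, then NVFLARE_HOME, then manual copying), does not create links or copy files, and does not reflect which subdirectory NVFlare actually links to.

```shell
#!/bin/sh
# Hypothetical sketch (not NVFlare's implementation) of the precedence:
# an explicit -j value wins, then NVFLARE_HOME, otherwise copy examples manually.
# Usage: jobs_source JOBS_DIR_ARG   (pass "" when -j is not given)
jobs_source() {
  jobs_dir_arg="$1"
  if [ -n "$jobs_dir_arg" ]; then
    echo "jobs_dir: $jobs_dir_arg"
  elif [ -n "${NVFLARE_HOME:-}" ]; then
    echo "NVFLARE_HOME: $NVFLARE_HOME"
  else
    echo "manual copy required"
  fi
}
```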
nvflare poc start¶
The detailed options for nvflare poc start:
nvflare poc start -h
usage: nvflare poc start [-h] [-p [SERVICE]] [-ex [EXCLUDE]] [-gpu [GPU ...]] [-debug]
options:
-h, --help show this help message and exit
-p [SERVICE], --service [SERVICE]
participant, Default to all participants
-ex [EXCLUDE], --exclude [EXCLUDE]
exclude service directory during 'start', default to , i.e. nothing to exclude
-gpu [GPU ...], --gpu [GPU ...]
gpu device ids will be used as CUDA_VISIBLE_DEVICES. used for poc start command
-debug, --debug debug is on
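The help text describes -p as selecting one participant (default: all) and -ex as excluding a service directory. A hypothetical select_services function illustrates that filtering over the participant directories of a prepared project; it is a sketch of the described semantics, not NVFlare's code, and assumes each participant has its own directory under the production directory (for example, example_project/prod_00 in the output shown later).

```shell
#!/bin/sh
# Hypothetical sketch of -p / -ex selection over participant directories.
# Usage: select_services PROD_DIR SERVICE EXCLUDE
#   SERVICE: value of -p, "" means all participants
#   EXCLUDE: value of -ex, "" means nothing is excluded
select_services() {
  prod_dir="$1"
  service="$2"
  exclude="$3"
  for d in "$prod_dir"/*/; do
    # Skip the literal pattern when the directory is empty.
    [ -d "$d" ] || continue
    name=$(basename "$d")
    if [ -n "$service" ] && [ "$name" != "$service" ]; then continue; fi
    if [ -n "$exclude" ] && [ "$name" = "$exclude" ]; then continue; fi
    echo "$name"
  done
}
```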
nvflare poc stop¶
The detailed options for nvflare poc stop:
nvflare poc stop -h
usage: nvflare poc stop [-h] [-p [SERVICE]] [-ex [EXCLUDE]] [-debug]
options:
-h, --help show this help message and exit
-p [SERVICE], --service [SERVICE]
participant, Default to all participants
-ex [EXCLUDE], --exclude [EXCLUDE]
exclude service directory during 'stop', default to , i.e. nothing to exclude
-debug, --debug debug is on
nvflare poc clean¶
The detailed options for nvflare poc clean:
nvflare poc clean -h
usage: nvflare poc clean [-h] [-debug]
options:
-h, --help show this help message and exit
-debug, --debug debug is on
Set Up POC Workspace¶
Running the following command generates the POC startup kits in the default workspace, “/tmp/nvflare/poc”:
nvflare poc prepare
Starting in version 2.4, the file .nvflare/config.conf in the home directory (as obtained from Path.home()) stores the location of the POC workspace and the startup kit:
startup_kit {
path = /tmp/nvflare/poc/example_project/prod_00
}
poc_workspace {
path = /tmp/nvflare/poc
}
This config.conf file is created automatically the first time nvflare poc prepare is run.
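To see which paths are currently recorded, a naive reader for the layout shown above can help. read_conf_path is a hypothetical helper, not part of NVFlare; it uses $HOME as the shell counterpart of Path.home() on Linux and macOS, and it is a line-based sketch rather than a full HOCON parser, so it only handles the simple block/path layout shown.

```shell
#!/bin/sh
# Hypothetical helper: print the `path` value inside one block of config.conf.
# Assumes the simple "name {", "path = value", "}" layout shown above.
# Usage: read_conf_path SECTION [CONF_FILE]
read_conf_path() {
  section="$1"
  conf="${2:-$HOME/.nvflare/config.conf}"
  awk -v section="$section" '
    $1 == section && $2 == "{" { in_block = 1; next }
    in_block && $1 == "}"      { in_block = 0; next }
    in_block && $1 == "path"   { sub(/^[^=]*=[ \t]*/, ""); print; exit }
  ' "$conf"
}

# read_conf_path poc_workspace   prints the recorded POC workspace
# read_conf_path startup_kit     prints the recorded startup kit path
```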
Replace the Default POC Workspace¶
You can change the default POC workspace to any location by setting the environment variable NVFLARE_POC_WORKSPACE:
export NVFLARE_POC_WORKSPACE="/tmp/nvflare/poc2"
In this example, the POC workspace is set to “/tmp/nvflare/poc2”.
Alternatively, you can create the config.conf file at .nvflare/config.conf in the home directory and set the value of poc_workspace before running nvflare poc prepare. If the NVFLARE_POC_WORKSPACE environment variable is set, it takes precedence over config.conf.
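The lookup order can be sketched as a shell function: the environment variable first, then the poc_workspace value from config.conf, then the default /tmp/nvflare/poc. resolve_poc_workspace is hypothetical; the config value is passed in as an argument to keep the sketch self-contained, and treating an empty NVFLARE_POC_WORKSPACE as unset is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the POC workspace lookup order:
#   NVFLARE_POC_WORKSPACE > poc_workspace in config.conf > /tmp/nvflare/poc
# Assumption: an empty NVFLARE_POC_WORKSPACE is treated like an unset one.
# Usage: resolve_poc_workspace CONFIG_VALUE   (pass "" if config.conf has none)
resolve_poc_workspace() {
  config_value="$1"
  if [ -n "${NVFLARE_POC_WORKSPACE:-}" ]; then
    echo "$NVFLARE_POC_WORKSPACE"
  elif [ -n "$config_value" ]; then
    echo "$config_value"
  else
    echo "/tmp/nvflare/poc"
  fi
}
```

For example, after export NVFLARE_POC_WORKSPACE="/tmp/nvflare/poc2", resolve_poc_workspace /tmp/nvflare/poc prints /tmp/nvflare/poc2.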
The following command can be used to set the POC workspace:
nvflare config -pw <poc_workspace>
The startup kit directory can be set with the following command:
nvflare config -d <startup_dir>
or
nvflare config --startup_kit_dir <startup_dir>
Note that you need to run nvflare poc prepare again after changing the location.
Start Package(s)¶
Once the startup kits are generated with the prepare command, they are ready to be started. If you prepared the POC startup kits in the default workspace, start them from that same default workspace; otherwise, you need to specify the workspace you used.
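The example output below shows each participant's kit under <workspace>/example_project/prod_00/<participant>/startup. The hypothetical helpers poc_startup_dir and poc_kit_exists (not part of NVFlare) use that layout to check that a kit exists in the workspace you are about to start from. The project name example_project and the prod_00 directory are taken from the default output and may differ, for instance when a custom project.yaml is passed with -i.

```shell
#!/bin/sh
# Hypothetical helpers based on the layout seen in the example output.
# Assumption: default project name "example_project" and "prod_00".
# Usage: poc_startup_dir [WORKSPACE] PARTICIPANT   (pass "" for the default)
poc_startup_dir() {
  workspace="${1:-/tmp/nvflare/poc}"
  participant="$2"
  echo "$workspace/example_project/prod_00/$participant/startup"
}

# Succeeds only if the participant's startup directory exists in WORKSPACE.
poc_kit_exists() {
  [ -d "$(poc_startup_dir "$1" "$2")" ]
}

# poc_kit_exists /tmp/nvflare/poc site-1 || echo "site-1 kit not found"
```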
Start ALL Packages¶
Running the following command:
nvflare poc start
will start all clients (site-1 and site-2), the server, and the FLARE Console (also known as the Admin Client) located in the default workspace, “/tmp/nvflare/poc”.
Example Output
start_poc at /tmp/nvflare/poc, gpu_ids=[], excluded = [], services_list=[]
WORKSPACE set to /tmp/nvflare/poc/example_project/prod_00/site-2/startup/..
WORKSPACE set to /tmp/nvflare/poc/example_project/prod_00/server/startup/..
WORKSPACE set to /tmp/nvflare/poc/example_project/prod_00/site-1/startup/..
PYTHONPATH is /local/custom:
PYTHONPATH is /local/custom:
start fl because of no pid.fl
start fl because of no pid.fl
start fl because of no pid.fl
new pid 24462
new pid 24463
new pid 24461
Waiting for SP....
Waiting for SP....
2023-07-20 16:29:32,709 - Cell - INFO - server: creating listener on grpc://0:8002
2023-07-20 16:29:32,718 - Cell - INFO - site-1: created backbone external connector to grpc://localhost:8002
2023-07-20 16:29:32,718 - Cell - INFO - site-2: created backbone external connector to grpc://localhost:8002
2023-07-20 16:29:32,719 - ConnectorManager - INFO - 24462: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-07-20 16:29:32,719 - ConnectorManager - INFO - 24463: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-07-20 16:29:32,719 - Cell - INFO - server: created backbone external listener for grpc://0:8002
2023-07-20 16:29:32,719 - ConnectorManager - INFO - 24461: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-07-20 16:29:32,719 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:31953] is starting
2023-07-20 16:29:32,719 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:22614] is starting
2023-07-20 16:29:32,720 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:41710] is starting
Trying to obtain server address
Obtained server address: localhost:8003
Trying to login, please wait ...
2023-07-20 16:29:33,220 - Cell - INFO - site-1: created backbone internal listener for tcp://localhost:31953
2023-07-20 16:29:33,220 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://localhost:8002] is starting
2023-07-20 16:29:33,220 - Cell - INFO - site-2: created backbone internal listener for tcp://localhost:22614
2023-07-20 16:29:33,220 - Cell - INFO - server: created backbone internal listener for tcp://localhost:41710
2023-07-20 16:29:33,220 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE grpc://0:8002] is starting
2023-07-20 16:29:33,220 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://localhost:8002] is starting
2023-07-20 16:29:33,221 - FederatedClient - INFO - Wait for engine to be created.
2023-07-20 16:29:33,221 - FederatedClient - INFO - Wait for engine to be created.
2023-07-20 16:29:33,222 - ServerState - INFO - Got the primary sp: localhost fl_port: 8002 SSID: ebc6125d-0a56-4688-9b08-355fe9e4d61a. Turning to hot.
deployed FL server trainer.
2023-07-20 16:29:33,229 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 8003
2023-07-20 16:29:33,229 - root - INFO - Server started
2023-07-20 16:29:33,710 - ClientManager - INFO - Client: New client site-2@192.168.86.53 joined. Sent token: cbb4983f-c895-4364-8508-f58cca53dc31. Total clients: 1
2023-07-20 16:29:33,711 - ClientManager - INFO - Client: New client site-1@192.168.86.53 joined. Sent token: e70a1568-2025-4d47-8e64-e3d1a3667a22. Total clients: 2
2023-07-20 16:29:33,711 - FederatedClient - INFO - Successfully registered client:site-2 for project example_project. Token:cbb4983f-c895-4364-8508-f58cca53dc31 SSID:ebc6125d-0a56-4688-9b08-355fe9e4d61a
2023-07-20 16:29:33,712 - FederatedClient - INFO - Successfully registered client:site-1 for project example_project. Token:e70a1568-2025-4d47-8e64-e3d1a3667a22 SSID:ebc6125d-0a56-4688-9b08-355fe9e4d61a
2023-07-20 16:29:33,712 - FederatedClient - INFO - Got engine after 0.49114251136779785 seconds
2023-07-20 16:29:33,713 - FederatedClient - INFO - Got the new primary SP: grpc://localhost:8002
2023-07-20 16:29:33,714 - FederatedClient - INFO - Got engine after 0.49308180809020996 seconds
2023-07-20 16:29:33,714 - FederatedClient - INFO - Got the new primary SP: grpc://localhost:8002
Trying to login, please wait ...
Logged into server at localhost:8003 with SSID: ebc6125d-0a56-4688-9b08-355fe9e4d61a
Type ? to list commands; type "? cmdName" to show usage of a command.
>
Note
If you run nvflare poc start before running prepare, you will get the following error:
/tmp/nvflare/poc/project.yml is missing, make sure you have first run 'nvflare poc prepare'
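A wrapper script can catch this case before calling start. poc_ready is a hypothetical preflight check, not an NVFlare command; it only tests for the project.yml file named in the error above.

```shell
#!/bin/sh
# Hypothetical preflight check: fail early if the workspace was never prepared.
# Usage: poc_ready [WORKSPACE]   (defaults to /tmp/nvflare/poc)
poc_ready() {
  ws="${1:-/tmp/nvflare/poc}"
  if [ ! -f "$ws/project.yml" ]; then
    echo "$ws/project.yml is missing; run 'nvflare poc prepare' first" >&2
    return 1
  fi
  return 0
}

# poc_ready /tmp/nvflare/poc && nvflare poc start
```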
Note
If you run nvflare poc start after the server or any of the clients has already been started, you will get errors like:
There seems to be one instance, pid=12458, running. If you are sure it's not the case, please kill process 12458 and then remove daemon_pid.fl in /tmp/nvflare/poc/server/startup/..
There seems to be one instance, pid=12468, running. If you are sure it's not the case, please kill process 12468.
Note
If you prefer to run the FLARE Console in a different terminal, you can start everything else with:
nvflare poc start -ex admin@nvidia.com
Start the server only¶
nvflare poc start -p server
An example of successful output for starting a server:
WORKSPACE set to /tmp/nvflare/poc/example_project/prod_00/server/startup/..
start fl because of no pid.fl
new pid 26314
2023-07-20 16:35:49,591 - Cell - INFO - server: creating listener on grpc://0:8002
2023-07-20 16:35:49,596 - Cell - INFO - server: created backbone external listener for grpc://0:8002
2023-07-20 16:35:49,597 - ConnectorManager - INFO - 26314: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-07-20 16:35:49,597 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:36446] is starting
2023-07-20 16:35:50,098 - Cell - INFO - server: created backbone internal listener for tcp://localhost:36446
2023-07-20 16:35:50,098 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE grpc://0:8002] is starting
2023-07-20 16:35:50,100 - ServerState - INFO - Got the primary sp: localhost fl_port: 8002 SSID: ebc6125d-0a56-4688-9b08-355fe9e4d61a. Turning to hot.
deployed FL server trainer.
2023-07-20 16:35:50,107 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 8003
2023-07-20 16:35:50,107 - root - INFO - Server started
Start the FLARE Console (previously called the Admin Client)¶
nvflare poc start -p admin@nvidia.com
Start Clients with GPU Assignment¶
You can provide the GPU device IDs in a specific order, for example:
nvflare poc start -gpu 1 0 0 2
The system tries to match the clients with the given GPU devices in order. In this example, assuming four clients, site-1 is matched with GPU ID 1, site-2 with GPU ID 0, site-3 with GPU ID 0, and site-4 with GPU ID 2.
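The in-order matching can be illustrated with a POSIX shell sketch that pairs each client with the next GPU ID and prints the CUDA_VISIBLE_DEVICES value it would receive. assign_gpus is hypothetical, not NVFlare's implementation; how NVFlare handles more clients than IDs is not covered on this page, so the sketch simply stops assigning.

```shell
#!/bin/sh
# Hypothetical sketch of in-order GPU matching.
# Usage: assign_gpus "CLIENT1 CLIENT2 ..." GPU_ID ...
assign_gpus() {
  clients="$1"
  shift
  # Unquoted on purpose: split the client list on spaces.
  for c in $clients; do
    # Assumption: clients beyond the last GPU ID are left unassigned.
    [ "$#" -gt 0 ] || break
    echo "$c CUDA_VISIBLE_DEVICES=$1"
    shift
  done
}

# assign_gpus "site-1 site-2 site-3 site-4" 1 0 0 2 prints:
# site-1 CUDA_VISIBLE_DEVICES=1
# site-2 CUDA_VISIBLE_DEVICES=0
# site-3 CUDA_VISIBLE_DEVICES=0
# site-4 CUDA_VISIBLE_DEVICES=2
```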
If a requested GPU ID does not exist on the client machine, you will get an error like:
gpu_id provided is not available in the host machine, available GPUs are [0]
If no GPU IDs are specified and the host has GPUs, they are assigned to clients automatically. If the host has no GPUs, no assignment is made.
Tip
You can check the GPUs available with the following command (assuming you have NVIDIA GPUs with drivers installed):
nvidia-smi --list-gpus
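To validate IDs before launching, you can compare them against the host's GPU indices yourself. check_gpu_ids is a hypothetical helper (not part of NVFlare) that takes the available indices as its first argument and fails on the first requested ID not in that list, similar in spirit to the error shown above.

```shell
#!/bin/sh
# Hypothetical check of requested GPU IDs against the available ones.
# Usage: check_gpu_ids "AVAILABLE_IDS" GPU_ID ...
check_gpu_ids() {
  available="$1"
  shift
  for id in "$@"; do
    # Pad with spaces so "1" does not match inside "10".
    case " $available " in
      *" $id "*) ;;
      *)
        echo "GPU $id is not available; available GPUs: $available" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

On a machine with NVIDIA drivers, the available indices can come from nvidia-smi, for example:
check_gpu_ids "$(nvidia-smi --query-gpu=index --format=csv,noheader | tr '\n' ' ')" 1 0 0 2 && nvflare poc start -gpu 1 0 0 2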
Stop Package(s)¶
To stop all packages, run:
nvflare poc stop
Similarly, you can stop a specific package, for example:
nvflare poc stop -p server
Note that you may need to exit the FLARE Console yourself.
Clean Up¶
The clean command, added in version 2.2, deletes the POC workspace:
nvflare poc clean