Operating NVFLARE - Admin Client, Commands, FLAdminAPI

The FL system is operated through admin packages, which are the packages of type admin configured at provisioning. Each admin package contains the key and certificate files needed to connect to and authenticate with the server. Administration can be done through the included command prompt, started with fl_admin.sh, or programmatically through the FLAdminAPI.

Admin command prompt

After running fl_admin.sh, log in by following the prompt and entering the name of the participant that the admin package was provisioned for (in poc mode, use “admin” as both the name and the password).

Typing “help” or “?” displays a list of the commands with a brief description of each. Typing “?” before a command, as in “? check_status” or “?ls”, shows additional usage details for that command. The list below gives each command, an example of how it may be run, and a description.

Command

Example

Description

bye

bye

Exit from the client

help

help

Get command help information

lpwd

lpwd

Print local workspace root directory of the admin client

info

info

Show folder setup info (upload and download sources and destinations)

check_status

check_status server

The FL job id, FL server status, and the registered clients with their names and tokens are displayed. If training is running, the round information is also displayed.

check_status client

The name, token, and status of each connected client are displayed.

check_status client clientname

The name, token, and status of the specified client with clientname are displayed.

submit_job

submit_job job_folder_name

Submits the job to the server.

list_jobs

list_jobs

Lists the jobs on the server. (Options: [-n name_prefix] [-d] [job_id_prefix])

abort_job

abort_job job_id

Aborts the job of the specified job_id if it is running or dispatched

clone_job

clone_job job_id

Creates a copy of the specified job with a new job_id

abort

abort job_id client

Aborts the job for the specified job_id for all clients. Individual client jobs can be aborted by specifying clientname.

abort job_id server

Aborts the server job for the specified job_id.

abort_task

abort_task job_id clientname

Aborts the running task for the specified job ID and client.

download_job

download_job job_id

Download folder from the job store containing the job and workspace

delete_job

delete_job job_id

Delete the job from the job store

cat

cat server startup/fed_server.json -ns

Show content of a file (-n: number all output lines; -s: suppress repeated empty output lines)

cat clientname startup/docker.sh -bT

Show content of a file (-b: number nonempty output lines; -T: display TAB characters as ^I)

grep

grep server "info" -i log.txt

Search for a pattern in a file (-n: print line number; -i: ignore case)

head

head clientname log.txt

Print the first 10 lines of a file

head server log.txt -n 15

Print the first 15 lines of a file (-n: print the first N lines instead of the first 10)

tail

tail clientname log.txt

Print the last 10 lines of a file

tail server log.txt -n 15

Print the last 15 lines of a file (-n: output the last N lines instead of the last 10)

ls

ls server -alt

List files in workspace root directory (-a: all; -l: use a long listing format; -t: sort by modification time)

ls clientname -SR

List files in workspace root directory (-S: sort by file size; -R: list subdirectories recursively)

pwd

pwd server

Print the name of workspace root directory

pwd clientname

Print the name of workspace root directory

sys_info

sys_info server

Get system information

sys_info client clientname

Get system information. Individual clients can be queried by specifying clientname.

remove_client

remove_client clientname

Tells the server to release the client before the 10-minute timeout so that the client can rejoin after a manual restart.

restart

restart client

Restarts all of the clients. Individual clients can be restarted by specifying clientname.

restart server

Restarts the server. Clients will also be restarted. Note that the admin client will need to log in again after the server restarts.

shutdown

shutdown client

Shuts down all of the clients. Individual clients can be shut down by specifying clientname. Note that the shutdown may not be instant; the command can take some time to take effect.

shutdown server

Shuts down the active server. Clients must be shut down before the server is shut down. Note that this will not shut down the Overseer or other SPs.

get_active_sp

get_active_sp

Get information on the active SP (service provider or FL server).

list_sp

list_sp

Get the endpoint information of the active and available SPs, as of the last heartbeat.

promote_sp

promote_sp sp_end_point

Promote the specified SP to become the active SP (for example, promote_sp example1.com:8002:8003)

shutdown_system

shutdown_system

Shut down entire system by setting the system state to shutdown through the overseer

Note

The commands promote_sp and shutdown_system both go to the Overseer and have a different mechanism of authorization than the other commands sent to the FL server. The Overseer keeps track of a list of privileged users, configured to be admin users with the role of “super”. Only users owning certificates whose cn is in the privileged user list can call these commands.

Tip

The output of any command can be redirected into a file by using the greater-than symbol “>”. There must be no whitespace before the filename. For example, you may run sys_info server >serverinfo.txt. To save the output to the file without printing it, use two greater-than symbols “>>” instead: sys_info server >>serverinfo.txt.
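The redirect rule in the tip can be illustrated with a small parser. This is only a sketch of the syntax described here, not the admin client's own implementation: the function name split_redirect is hypothetical, and raising an error when whitespace precedes the filename is an assumption made for the example.

```python
def split_redirect(line):
    """Return (command, filename, echo) for an admin command line.

    echo is True when the output would still be printed (">" or no
    redirect) and False when it would only be saved (">>").
    """
    command, sep, rest = line.partition(">")
    if not sep:
        # No redirect at all: nothing is written to a file.
        return line.strip(), None, True
    echo = True
    if rest.startswith(">"):
        # ">>" means: save to the file without printing.
        echo = False
        rest = rest[1:]
    if not rest or rest[0].isspace():
        # Assumption for this sketch: reject whitespace before the filename.
        raise ValueError("no whitespace is allowed before the filename")
    return command.strip(), rest.strip(), echo
```

With this sketch, "sys_info server >serverinfo.txt" splits into the command "sys_info server" and the filename "serverinfo.txt" with printing kept on, while the ">>" form turns printing off.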

FLAdminAPI

FLAdminAPI is a wrapper for admin commands that can be issued by an admin client to the FL server. You can use a provisioned admin client’s certs and keys to initialize an instance of FLAdminAPI to programmatically submit commands to the FL server.

Initialization

It is recommended to use the FLAdminAPIRunner to initialize the API, or use it as a guide to write your own code to use the FLAdminAPI.

Unlike versions before NVIDIA FLARE 2.1, the FLAdminAPI now requires an overseer_agent to be provided. The FLAdminAPIRunner creates it automatically from the information in fed_admin.json in the startup directory of the provided admin_dir.

Logging in is now handled automatically. When there is a server cut-over, the overseer_agent provides the new SP endpoint information for the active server, and the FLAdminAPI re-authenticates so that commands are sent to the new active server.

The logout() function can be called to log out. Both login() and logout() are inherited from AdminAPI.

api.logout()

After using FLAdminAPI, the overseer_agent must be cleaned up with a call to:

api.overseer_agent.end()
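One way to make sure both cleanup calls always run, even when a command raises, is to wrap them in a context manager. The following is a minimal sketch: the helper name admin_session is hypothetical, and api is assumed to be an already initialized FLAdminAPI instance exposing logout() and overseer_agent.end() as described above.

```python
from contextlib import contextmanager


@contextmanager
def admin_session(api):
    """Yield the api, then always log out and end the overseer agent."""
    try:
        yield api
    finally:
        try:
            api.logout()
        finally:
            # Runs even if logout() raises, so the agent is always cleaned up.
            api.overseer_agent.end()
```

Any FLAdminAPI calls would then go inside a "with admin_session(api) as api:" block, with the cleanup happening on exit.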

Usage

See the example script run_fl.py in the CIFAR-10 example for how to use the FLAdminAPI with FLAdminAPIRunner.

You can use the example as inspiration to write your own code using the FLAdminAPI to operate your FL system.

Arguments and targets

The arguments required for each FLAdminAPI call are specified in FLAdminAPI.

When a call needs a target, the target is where the action should take place. A target argument given as a string can be either “server” or a specific client name. Where target_type is required and targets is an optional list, the call can be submitted to multiple targets:

  • If target_type is “server”, the command target is just the server and targets is ignored

  • If target_type is “client” and targets is empty, the command target is all clients

  • If target_type is “client” and targets is a list of strings, each a client name, the command target is all clients in the list targets

  • If target_type is “all” and the command supports it, the command target is the server and all clients
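The four rules above can be written out as a small function. This is an illustrative sketch only: resolve_targets and its all_clients parameter are hypothetical names, plain strings stand in for the target types, and the real FLAdminAPI resolves targets internally.

```python
def resolve_targets(target_type, targets=None, all_clients=()):
    """Return the list of names a command would be sent to."""
    if target_type == "server":
        # targets is ignored for the server.
        return ["server"]
    if target_type == "client":
        if not targets:
            # Empty or missing targets means every client.
            return list(all_clients)
        return list(targets)
    if target_type == "all":
        # Only valid for commands that support it.
        return ["server", *all_clients]
    raise ValueError(f"unsupported target_type: {target_type!r}")
```

For example, with clients site-1 and site-2 connected, a target_type of “client” and no targets resolves to both sites, while passing targets=["site-2"] narrows the command to that one client.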

Return Structure

FLAdminAPI calls return an FLAdminAPIResponse, a dictionary of key-value pairs consisting of a status of type APIStatus, a dictionary with details, and in some cases a raw response from the underlying call to AdminAPI (mainly useful for debugging).
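A reply of this shape can be unpacked as sketched below. The key names "status" and "details" are assumptions based on the description above ("raw" is the key named in the implementation notes), StandInStatus is a stand-in enum so the example runs without NVFLARE installed, and unpack_reply is a hypothetical helper.

```python
from enum import Enum


class StandInStatus(Enum):
    """Stand-in for APIStatus; not the NVFLARE enum."""
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"  # hypothetical member, for illustration only


def unpack_reply(reply, success=StandInStatus.SUCCESS):
    """Split a reply dictionary into (ok, details, raw)."""
    ok = reply.get("status") == success
    details = reply.get("details") or {}
    # "raw" is only present for some calls and error conditions.
    raw = reply.get("raw")
    return ok, details, raw
```

With the real API, the success value passed in would be APIStatus.SUCCESS rather than the stand-in.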

Implementation Notes

FLAdminAPI uses the underlying AdminAPI’s do_command() function to submit commands to the server. You can also call do_command() directly for commands that are not wrapped in an FLAdminAPI function. Replies from AdminAPI are included in the FLAdminAPI reply under the “raw” key for some calls and error conditions.

Additional and Complex Commands

The functions wait_until_server_status(), wait_until_client_status(), and wait_until_server_stats() are included with the FLAdminAPI in NVIDIA FLARE as examples of useful functions that can be built with other calls in a loop with logic. These examples wait until the provided callback returns True, with the option to specify a timeout and interval to check the status or stats. There is a default callback to evaluate the reply in the included functions, and additional kwargs passed in will be available to the callback. Custom callbacks can be provided to add logic to handle checking for other conditions. For these example functions, a timeout should be set in case there are any error conditions that result in the system being stuck in a state where the callback never returns True.
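The general pattern behind these functions, polling in a loop until a callback accepts the reply or a timeout expires, can be sketched as follows. This is not the NVFLARE implementation: wait_until, fetch, and the returned tuple are hypothetical, and fetch stands for any zero-argument call such as a status check.

```python
import time


def wait_until(fetch, callback, interval=1.0, timeout=None, **kwargs):
    """Poll fetch() until callback(reply, **kwargs) is true or timeout expires.

    Returns (True, reply) on success and (False, last_reply) on timeout.
    """
    start = time.monotonic()
    while True:
        reply = fetch()
        # Extra keyword arguments are passed through to the callback.
        if callback(reply, **kwargs):
            return True, reply
        if timeout is not None and time.monotonic() - start >= timeout:
            return False, reply
        time.sleep(interval)
```

Leaving timeout as None in this sketch loops forever if the callback never returns True, which is the stuck state the paragraph above warns about, so callers would normally pass a finite timeout.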

You can use the source code of these functions as inspiration to create your own functions or logic that makes use of other FLAdminAPI calls.

Questions

  1. Why do I get an error of “Command ___ not found in server or client cmds” even though I did not try any unspecified command?

    The underlying AdminAPI may not have successfully logged in and obtained the list of available commands to register from the server. Please make sure that the server is accessible and that the login is working.

  2. Why does the AdminAPI return status APIStatus.SUCCESS even though an error occurred after issuing the command?

    If you send a raw command to the underlying AdminAPI with do_command(), AdminAPI returns APIStatus.SUCCESS if the command was successfully sent to the server and a reply was obtained. FLAdminAPI’s calls interpret the underlying server reply and return a suitable status based on it.

  3. After a while with the same command, why do I get a SUCCESS from FLAdminAPI but the raw reply contains an error of “not authenticated - no user”?

    The server has a timeout after which login() must be called again in order for the underlying AdminAPI to be authenticated.
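The situation in question 3 can be handled by checking the raw reply and logging in again before retrying. The sketch below assumes the reply is a dictionary with a “raw” entry and that the error text can be found in its string form. call_with_relogin is a hypothetical helper, and call and relogin are caller-supplied functions (relogin would wrap whatever login() call your setup uses).

```python
def call_with_relogin(call, relogin, marker="not authenticated"):
    """Run call(); if the raw reply reports the marker, log in and retry once."""
    reply = call()
    if marker in str(reply.get("raw", "")):
        relogin()
        # Retry only once to avoid looping on a persistent login failure.
        reply = call()
    return reply
```

Retrying a single time keeps the helper from looping when the login itself keeps failing.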