Operating NVFLARE - Admin Client, Commands, FLAdminAPI¶
The FL system is operated through the admin packages configured at provisioning. Each admin package contains the key and
certificate files needed to connect to and authenticate with the server. Administration can be done through the included
command prompt, started with fl_admin.sh, or programmatically through the FLAdminAPI.
Admin command prompt¶
After running fl_admin.sh, log in by following the prompt and entering the name of the participant that the admin
package was provisioned for (or, in poc mode, “admin” as both the name and the password).
Typing “help” or “?” displays a list of the commands with a brief description of each. Typing “? ” before a command, as in “? check_status” or “?ls”, shows additional usage details for that command. The table below lists the commands, with an example of how each may be run and a description.
Command | Example | Description |
---|---|---|
bye | | Exit from the client |
help | | Get command help information |
lpwd | | Print local workspace root directory of the admin client |
info | | Show folder setup info (upload and download sources and destinations) |
check_status | | The FL job id, FL server status, and the registered clients with their names and tokens are displayed. If training is running, the round information is also displayed. |
check_status | | The name, token, and status of each connected client are displayed. |
check_status | | The name, token, and status of the specified client with clientname are displayed. |
submit_job | | Submits the job to the server. |
list_jobs | | Lists the jobs on the server. (Options: [-n name_prefix] [-d] [job_id_prefix]) |
abort_job | | Aborts the job of the specified job_id if it is running or dispatched |
clone_job | | Creates a copy of the specified job with a new job_id |
abort | | Aborts the job for the specified job_id for all clients. Individual client jobs can be aborted by specifying clientname. |
abort | | Aborts the server job for the specified job_id. |
abort_task | | Aborts the running task for the specified job ID and client. |
download_job | | Download folder from the job store containing the job and workspace |
delete_job | | Delete the job from the job store |
cat | | Show content of a file (-n: number all output lines; -s: suppress repeated empty output lines) |
cat | | Show content of a file (-b: number nonempty output lines; -T: display TAB characters as ^I) |
grep | | Search for a pattern in a file (-n: print line number; -i: ignore case) |
head | | Print the first 10 lines of a file |
head | | Print the first 15 lines of a file (-n: print the first N lines instead of the first 10) |
tail | | Print the last 10 lines of a file |
tail | | Print the last 15 lines of a file (-n: output the last N lines instead of the last 10) |
ls | | List files in workspace root directory (-a: all; -l: use a long listing format; -t: sort by modification time) |
ls | | List files in workspace root directory (-S: sort by file size; -R: list subdirectories recursively) |
pwd | | Print the name of workspace root directory |
pwd | | Print the name of workspace root directory |
sys_info | | Get system information |
sys_info | | Get system information. Individual clients can be queried by specifying clientname. |
remove_client | | Issue command for server to release client before the 10 minute timeout to allow client to rejoin after manual restart. |
restart | | Restarts all of the clients. Individual clients can be restarted by specifying clientname. |
restart | | Restarts the server. Clients will also be restarted. Note that the admin client will need to log in again after the server restarts. |
shutdown | | Shuts down all of the clients. Individual clients can be shut down by specifying clientname. Note that the command may take some time to take effect. |
shutdown | | Shuts down the active server. Clients must be shut down before the server is shut down. Note that this will not shut down the Overseer or other SPs. |
get_active_sp | | Get information on the active SP (service provider or FL server). |
list_sp | | Get the active and available SP endpoint information from the last heartbeat. |
promote_sp | | Promote a specified SP to become the active SP (promote_sp example1.com:8002:8003) |
shutdown_system | | Shut down entire system by setting the system state to shutdown through the overseer |
Note
The commands promote_sp and shutdown_system both go to the Overseer and use a different authorization mechanism
than the other commands, which are sent to the FL server. The Overseer keeps track of a list of privileged users,
configured to be admin users with the role of “super”. Only users owning certificates whose cn is in the privileged
user list can call these commands.
Tip
The output of any command can be redirected into a file with the greater-than symbol “>”; however, there must be no
whitespace between the symbol and the filename. For example, you may run sys_info server >serverinfo.txt. To only
save the output to the file without printing it, use two greater-than symbols “>>” instead:
sys_info server >>serverinfo.txt.
FLAdminAPI¶
FLAdminAPI is a wrapper for admin commands that can be issued by an admin client to the FL server. You can use a
provisioned admin client’s certs and keys to initialize an instance of FLAdminAPI to programmatically submit commands
to the FL server.
Initialization¶
It is recommended to use the FLAdminAPIRunner to initialize the API, or to use it as a guide for writing your own code
that uses the FLAdminAPI.
As of NVIDIA FLARE 2.1.0, the FLAdminAPI requires an overseer_agent to be provided. The FLAdminAPIRunner creates it
automatically from the information in fed_admin.json in the startup directory of the provided admin_dir.
Logging in is now handled automatically. When there is a server cutover, the overseer_agent provides the SP endpoint information for the new active server, and the FLAdminAPI reauthenticates so that commands are sent to the new active server.
The logout() function can be called to log out. Both login() and logout() are inherited from AdminAPI.
api.logout()
After using FLAdminAPI, the overseer_agent must be cleaned up with a call to:
api.overseer_agent.end()
Usage¶
See the example script run_fl.py in CIFAR-10 for an example of how to use the FLAdminAPI with FLAdminAPIRunner.
You can use the example as inspiration to write your own code using the FLAdminAPI to operate your FL system.
Arguments and targets¶
The arguments required for each FLAdminAPI call are specified in FLAdminAPI.
The target, when needed, is where the action should take place. A target argument given as a string can be a
single target: either “server” or a specific client name. Where target_type is required and targets is an optional
list, the call can be submitted to multiple targets:
- If target_type is “server”, the command target is just the server and targets is ignored
- If target_type is “client” and targets is empty, the command target is all clients
- If target_type is “client” and targets is a list of strings, each a client name, the command target is all clients in the list targets
- If target_type is “all” and the command supports it, the command target is the server and all clients
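The rules above can be sketched as a small pure function. This is only an illustration of the decision logic, not code from NVIDIA FLARE: resolve_targets, all_clients, and supports_all are hypothetical names, and the actual resolution is not done by a function like this.

```python
from typing import List, Optional


def resolve_targets(target_type: str,
                    targets: Optional[List[str]],
                    all_clients: List[str],
                    supports_all: bool = True) -> List[str]:
    """Return the names a command would reach under the rules listed above."""
    if target_type == "server":
        return ["server"]  # targets is ignored
    if target_type == "client":
        if not targets:
            return list(all_clients)  # empty or None means every client
        return list(targets)
    if target_type == "all":
        if not supports_all:
            raise ValueError("this command does not support target_type 'all'")
        return ["server"] + list(all_clients)
    raise ValueError(f"unknown target_type: {target_type!r}")


clients = ["site-1", "site-2", "site-3"]  # hypothetical client names
print(resolve_targets("server", ["site-1"], clients))
print(resolve_targets("client", None, clients))
print(resolve_targets("client", ["site-2"], clients))
print(resolve_targets("all", None, clients))
```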
Return Structure¶
FLAdminAPI calls return an FLAdminAPIResponse dictionary object of key value pairs consisting of a status of type APIStatus, a dictionary with details, and in some cases a raw response from the underlying call to AdminAPI (mainly useful for debugging).
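As a rough sketch of how such a reply might be inspected, the example below summarizes a reply dictionary for logging. It makes several assumptions: the key names “status” and “details” are guesses based on the description above (“raw” is the key named in the implementation notes), the APIStatus class here is a local stand-in rather than the library's enum, and ERROR_RUNTIME is an illustrative member.

```python
from enum import Enum


class APIStatus(Enum):
    """Local stand-in for the library's APIStatus (assumption)."""
    SUCCESS = "SUCCESS"
    ERROR_RUNTIME = "ERROR_RUNTIME"  # illustrative member


def summarize(reply: dict) -> str:
    """Turn a reply dictionary into a one-line summary for logs."""
    status = reply.get("status")
    details = reply.get("details", {})
    if status == APIStatus.SUCCESS:
        return f"ok: {details}"
    # The raw reply is only present for some calls and error conditions.
    raw = reply.get("raw", "<no raw reply>")
    name = status.name if status is not None else "no status"
    return f"failed ({name}): {details} | raw={raw}"


print(summarize({"status": APIStatus.SUCCESS, "details": {"message": "done"}}))
print(summarize({"status": APIStatus.ERROR_RUNTIME, "details": {}, "raw": "boom"}))
```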
Implementation Notes¶
FLAdminAPI uses the underlying AdminAPI’s do_command() function to submit commands to the server, and you
can also use this function directly for commands that are not wrapped in an FLAdminAPI function. Returns from AdminAPI
are included in the FLAdminAPI reply under the “raw” key for some calls and error conditions.
Additional and Complex Commands¶
The functions wait_until_server_status(), wait_until_client_status(), and wait_until_server_stats() are
included with the FLAdminAPI in NVIDIA FLARE as examples of useful functions that can be built with other calls in a
loop with logic. These examples wait until the provided callback returns True, with the option to specify a timeout and
an interval at which to check the status or stats. The included functions have a default callback that evaluates the
reply, and any additional kwargs passed in are available to the callback. Custom callbacks can be provided to add logic
that checks for other conditions. For these example functions, a timeout should be set in case an error condition
leaves the system stuck in a state where the callback never returns True.
You can use the source code of these functions as inspiration to create your own functions or logic that makes use of other FLAdminAPI calls.
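The polling pattern described above can be sketched generically as follows. This is not the source of the wait_until_* functions: wait_until, poll, fake_check_status, status_is, and the “server_status” key are hypothetical names chosen for the example, and the return value is a simplified dictionary rather than an FLAdminAPIResponse.

```python
import time
from typing import Any, Callable, Optional


def wait_until(poll: Callable[[], dict],
               callback: Callable[..., bool],
               interval: float = 1.0,
               timeout: Optional[float] = None,
               **kwargs: Any) -> dict:
    """Call poll() every `interval` seconds until callback(reply, **kwargs) is True."""
    start = time.monotonic()
    while True:
        reply = poll()
        if callback(reply, **kwargs):
            return {"done": True, "reply": reply}
        if timeout is not None and time.monotonic() - start >= timeout:
            # Give up so a stuck system does not block the caller forever.
            return {"done": False, "reply": reply}
        time.sleep(interval)


# A fake status source standing in for a real status call.
_states = iter(["starting", "starting", "stopped"])


def fake_check_status() -> dict:
    return {"server_status": next(_states)}


def status_is(reply: dict, expected: str = "stopped") -> bool:
    """Callback: True once the reply reports the expected status."""
    return reply["server_status"] == expected


result = wait_until(fake_check_status, status_is,
                    interval=0.01, timeout=1.0, expected="stopped")
print(result)
```

Extra keyword arguments such as expected are forwarded to the callback unchanged, which mirrors how the text describes kwargs being made available to it.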
Questions¶
Why do I get an error of “Command ___ not found in server or client cmds” even though I did not try any unspecified command?
The underlying AdminAPI may not have successfully logged in and obtained the list of available commands to register from the server. Please make sure that the server is accessible and that the login is working.
Why does the AdminAPI return status APIStatus.SUCCESS even though an error occurred after issuing the command?
If you send a raw command to the underlying AdminAPI with do_command(), AdminAPI returns APIStatus.SUCCESS if the command was successfully sent to the server and a reply was obtained. FLAdminAPI’s calls interpret the underlying server reply and return a suitable status based on it.
After a while with the same command, why do I get a SUCCESS from FLAdminAPI but the raw reply contains an error of “not authenticated - no user”?
The server has a timeout after which login() must be called again in order for the underlying AdminAPI to be authenticated.
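One possible way to handle this is to check the raw reply for the “not authenticated” message and log in again before retrying once. The sketch below only illustrates that idea: call_with_relogin and _FakeSession are hypothetical, the reply layout is simplified, and how you call login() depends on how your API instance was set up.

```python
from typing import Callable

NOT_AUTHENTICATED = "not authenticated"


def call_with_relogin(call: Callable[[], dict],
                      relogin: Callable[[], None]) -> dict:
    """Run call(); if the raw reply shows an expired session, log in and retry once."""
    reply = call()
    if NOT_AUTHENTICATED in str(reply.get("raw", "")):
        relogin()
        reply = call()
    return reply


class _FakeSession:
    """Stand-in for an API instance whose login has timed out (assumption)."""

    def __init__(self):
        self.authenticated = False
        self.logins = 0

    def login(self):
        self.authenticated = True
        self.logins += 1

    def check_status(self) -> dict:
        if not self.authenticated:
            return {"status": "SUCCESS", "raw": "not authenticated - no user"}
        return {"status": "SUCCESS", "details": {"server_status": "started"}}


session = _FakeSession()
reply = call_with_relogin(session.check_status, session.login)
print(session.logins, reply)
```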