NVIDIA FLARE Job CLI¶
The NVIDIA FLARE Job CLI
provides options to create and submit
jobs from a command line interface. See the NVFlare Job CLI Notebook
for a tutorial on how to use the Job CLI.
Command Usage¶
usage: nvflare job [-h] {list_templates,create,submit,show_variables} ...
options:
-h, --help show this help message and exit
job:
{list_templates,create,submit,show_variables}
job subcommand
list_templates show available job templates
create create job
submit submit job
show_variables show template variable values in configuration
Command examples¶
Lists Job Templates¶
The nvflare job list_templates
command lists the available job templates. The option
-d "<job_templates_dir>"
or --job_template_dir "<job_templates_dir>"
is the location of
the job_templates.
nvflare job list_templates -d "<NVFlare location>/job_templates"
The output should be similar to the following:
The following job templates are available:
----------------------------------------------------------------------------------------------------------------------
name Description Controller Type Execution API Type
----------------------------------------------------------------------------------------------------------------------
cyclic_cc_pt client-controlled cyclic workflow with PyTorch ClientAPI tra client client_api
cyclic_pt server-controlled cyclic workflow with PyTorch ClientAPI tra server client_api
psi_csv private-set intersection for csv data server Executor
sag_cross_np scatter & gather and cross-site validation using numpy server client executor
sag_cse_pt scatter & gather workflow and cross-site evaluation with PyT server client_api
sag_gnn scatter & gather workflow for gnn learning server client_api
sag_nemo Scatter and Gather Workflow for NeMo server client_api
sag_np scatter & gather workflow using numpy server client_api
sag_np_cell_pipe scatter & gather workflow using numpy server client_api
sag_np_metrics scatter & gather workflow using numpy server client_api
sag_pt scatter & gather workflow using pytorch server client_api
sag_pt_deploy_map SAG workflow with pytorch, deploy_map, site-specific configs server client_api
sag_pt_executor scatter & gather workflow and cross-site evaluation with PyT server Executor
sag_pt_he scatter & gather workflow using pytorch and homomorphic encr server client_api
sag_pt_mlflow scatter & gather workflow using pytorch with MLflow tracking server client_api
sag_pt_model_learner scatter & gather workflow and cross-site evaluation with PyT server ModelLearner
sag_tf scatter & gather workflow using TensorFlow server client_api
sklearn_kmeans scikit-learn KMeans model server client_api
sklearn_linear scikit-learn linear model server client_api
sklearn_svm scikit-learn SVM model server client_api
stats_df FedStats: tabular data with pandas server stats executor
stats_image FedStats: image intensity histogram server stats executor
swarm_cse_pt Swarm Learning with Cross-Site Evaluation with PyTorch client client_api
swarm_cse_pt_model_l Swarm Learning with Cross-Site Evaluation with PyTorch Model client ModelLearner
vertical_xgb vertical federated xgboost server Executor
xgboost_tree xgboost horizontal tree-based collaboration model server client_api
----------------------------------------------------------------------------------------------------------------------
View all the available templates at the FLARE Job Template Registry.
Setting job_template path¶
You can also use the nvflare job list_templates
command without the -d option. When the job templates directory
is not specified, the Job CLI will try to find the location with the following logic:
See if the NVFLARE_HOME
environment variable is set. If NVFLARE_HOME
is not empty, the Job CLI will look for the job templates at
${NVFLARE_HOME}/job_templates
.
If the NVFLARE_HOME
environment variable is not set, the Job CLI will look for the job_template
path in the config in the nvflare
hidden directory (located at ~/.nvflare/config.conf
). Once the -d <job_template_dir>
option is used, the job_template
value
in ~/.nvflare/config.conf
will be updated so you don’t need to specify -d
again.
If you want to change the job_template
path, you can directly edit this config file or use the nvflare config
command with the
-jt
or --job_templates_dir
option:
nvflare config -jt ../../job_templates
Create new job¶
The nvflare job create
command will allow you to create a new job based on a template, with options to replace variables in config files.
The options for usage are as follows:
usage: nvflare job create [-h] [-j [JOB_FOLDER]] [-w [TEMPLATE]] [-sd [SCRIPT_DIR]] [-f [CONFIG_FILE [CONFIG_FILE ...]]] [-debug] [-force]
optional arguments:
-h, --help show this help message and exit
-j [JOB_FOLDER], --job_folder [JOB_FOLDER]
job_folder path, default to ./current_job directory
-w [TEMPLATE], --template [TEMPLATE]
template name or template folder. You can use list_templates to see available jobs from job templates, pick name such as 'sag_pt' as template name. Alternatively, you can use the path to the job
template folder, such as job_templates/sag_pt
-sd [SCRIPT_DIR], --script_dir [SCRIPT_DIR]
script directory contains additional related files. All files or directories under this directory will be copied over to the custom directory.
-f [CONFIG_FILE [CONFIG_FILE ...]], --config_file [CONFIG_FILE [CONFIG_FILE ...]]
Training config file with corresponding optional key=value pairs. If key presents in the preceding config file, the value in the config file will be overwritten by the new value
-debug, --debug debug is on
-force, --force force create is on, if -force, overwrite existing configuration with newly created configurations
The -j
option or --job_folder
option is the path to the job folder to be created. If the job folder is not specified, the Job CLI will create a
current_job
folder in the current directory.
The -w
option or --template
option is the name of the template that the new job will be created from.
Show variables¶
The nvflare job show_variables
command can be used to show the variables in a job. The options for usage are as follows:
nvflare job show_variables -j <path/to/my_job>
Submit job with CLI¶
The nvflare job submit
command can be used to submit jobs:
usage: nvflare job submit [-h] [-j [JOB_FOLDER]] [-f [CONFIG_FILE ...]] [-debug]
options:
-h, --help show this help message and exit
-j [JOB_FOLDER], --job_folder [JOB_FOLDER]
job_folder path, default to ./current_job directory
-f [CONFIG_FILE ...], --config_file [CONFIG_FILE ...]
Training config file with corresponding optional key=value pairs. If key presents in the preceding config file, the value in the config file will be overwritten by the new value
-debug, --debug debug is on
In order to do this, it will need to know the location of the admin console
startup kit directory. In POC mode, this is set for the user automatically. For a provisioned setup, the user will need to set the path to
the startup kit for the Job CLI. The startup kit path is stored in the ~/.nvflare/config.conf
file in the nvflare hidden directory at
the user’s home directory. You can edit this path in the file and set it directly for example:
startup_kit {
path = /tmp/nvflare/poc/example_project/prod_00
}
Alternatively, you can use the nvflare config
command with the -d
or --startup_kit_dir
option to set the startup kit path:
nvflare config --startup_kit_dir /tmp/nvflare/poc/example_project/prod_00
With the startup kit directory path set, you can submit the job (this following example is from the
NVFlare Job CLI Notebook and
replaces several variables in the config_fed_server.conf
config file):
nvflare job submit -j /tmp/nvflare/my_job -f config_fed_server.conf num_rounds=1 app_config="--dataset_path /tmp/nvflare/data/cifar10"
Troubleshooting with the -debug flag¶
Since the nvflare job submit
command does not overwrite the job folder configuration during submission, it has to use a temp job folder.
If you want to check the final configs submited to the server or simply want to see the stack trace of the exception, you can use the -debug
flag.
With the -debug
flag, the nvflare job submit
command will not delete the temp job folder once it has finished job submission,
and it will also print the exception stack trace in case of failure.
When you submit a job with the -debug
flag, you should see a statement like the following after the message that the job was
submitted (the actual random folder name will vary):
in debug mode, job configurations can be examined in temp job directory '/tmp/tmpdnusoyzj'
You can look at the contents of the temp job folder for more information about the job submission. For example, you can look at the
config_fed_server.conf
file in the temp job folder to see if the final configuration is what you intended.
Advanced Job Configurations¶
For different configurations for different client sites, you can use the -f
option to specify the variables to change for each
config file for each client site.
For example, to change number of training rounds to 2, change default app_script from “cifar10.py” to “train.py” for both app_1 and app_2, and change the app_1 batch_size to 4, app_2 batch_size to 6 for sag_pt_deploy_map as in the NVFlare Job CLI Notebook:
nvflare job create \
-j /tmp/nvflare/my_job -w sag_pt_deploy_map \
-f app_server/config_fed_server.conf num_rounds=2 \
-f app_1/config_fed_client.conf app_script=train.py app_config="--batch_size 4" \
-f app_2/config_fed_client.conf app_script=train.py app_config="--batch_size 6" \
-sd ../hello-world/step-by-step/cifar10/code/fl
Note
The app names must be defined in the job template being used: in this case app_1
, app_2
, and app_server
,
are in sag_pt_deploy_map
.