NVIDIA FLARE Job CLI
The nvflare job command family is used to submit, inspect, monitor, and
manage federated learning jobs from an admin startup kit.
Before using server-connected job commands, either run nvflare poc prepare
or activate a registered startup kit with the Config Command:
nvflare config add project_admin /path/to/admin@nvidia.com
nvflare config use project_admin
Command Usage
nvflare job -h
usage: nvflare job [-h] ...
job subcommands:
  submit          submit job
  wait            wait for a job and return one final JSON envelope
  monitor         wait for a job and stream progress to stderr
  list            list jobs on the server
  abort           abort a running job
  meta            get metadata for a job
  logs            retrieve job logs from the server-side log store
  log-config      change logging configuration for a running job
  stats           show running job statistics
  download        download job result
  clone           clone an existing job
  delete          delete a job
  list_templates  [DEPRECATED] use 'nvflare recipe list'
  create          [DEPRECATED] use 'python job.py --export --export-dir <job_folder>' + 'nvflare job submit -j <job_folder>'
  show_variables  [DEPRECATED] use 'nvflare recipe list' or the Job Recipe API
Common Workflow
1. Export or prepare a job folder.
2. Submit the job with nvflare job submit -j <job_folder>.
3. In automation, wait for completion with nvflare job wait <job_id>. For interactive progress output, use nvflare job monitor <job_id>.
4. Inspect metadata, stats, or logs as needed.
5. Download, clone, abort, or delete the job when appropriate.
Startup Kit Selection
Server-connected job commands use this startup kit resolution order:
1. Optional --kit-id <id>: override the active startup kit for this command only by using a registered startup-kit ID.
2. Optional --startup-kit <path>: override the active startup kit for this command only by using an explicit admin startup-kit directory.
3. NVFLARE_STARTUP_KIT_DIR when set.
4. startup_kits.active from ~/.nvflare/config.conf.
If no source resolves to a valid admin startup kit, the command fails before connecting.
--kit-id and --startup-kit are not required. When provided, they take
precedence over the active startup kit for the current command only and do not
change the globally active startup kit. They are useful for scripts, notebooks,
and concurrent workflows that must not mutate ~/.nvflare/config.conf.
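The resolution order above can be modeled as a small function. This is only an illustrative sketch, not the CLI's implementation: the `registered` mapping (startup-kit ID to path), the `active` ID argument, and the choice to raise on an unknown ID are assumptions made for the example, and validation of the kit directory itself is omitted.

```python
import os
from typing import Mapping, Optional


def resolve_startup_kit(
    kit_id: Optional[str] = None,
    startup_kit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    registered: Optional[Mapping[str, str]] = None,
    active: Optional[str] = None,
) -> str:
    """Return the startup-kit path chosen by the documented precedence order."""
    env = os.environ if env is None else env
    registered = registered or {}  # hypothetical: registered ID -> kit path

    # 1. --kit-id: a registered ID overrides everything else for this command.
    if kit_id is not None:
        if kit_id not in registered:
            # Assumption: an unknown ID is an error rather than a fall-through.
            raise ValueError(f"unknown startup-kit ID: {kit_id}")
        return registered[kit_id]

    # 2. --startup-kit: an explicit admin startup-kit directory.
    if startup_kit:
        return startup_kit

    # 3. NVFLARE_STARTUP_KIT_DIR, when set.
    if env.get("NVFLARE_STARTUP_KIT_DIR"):
        return env["NVFLARE_STARTUP_KIT_DIR"]

    # 4. startup_kits.active from ~/.nvflare/config.conf (passed in as `active`).
    if active and active in registered:
        return registered[active]

    # No source resolved: the command fails before connecting.
    raise ValueError("no valid admin startup kit resolved")
```

Neither override mutates `registered` or `active`, which mirrors the statement that per-command overrides leave the globally active kit unchanged.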
Submit a Job
Use nvflare job submit to submit a pre-built NVFlare job folder:
nvflare job submit -j /tmp/nvflare/hello-pt
Submit options:
-j, --job_folder: job folder path. Defaults to ./current_job.
--study: submit into a named study when the server is configured for multi-study access. If omitted, the literal study name default is submitted.
--submit-token: caller-generated token for retry-safe submit and later recovery with nvflare job list --submit-token.
-debug, --debug: keep the temporary copied job folder for inspection.
--schema: print the command schema as JSON and exit.
Submit returns immediately with a job_id. It does not wait for terminal
job status.
To change job configuration values, edit the exported job files before
submission. Submit-time -f/--config_file overrides are not supported.
Examples:
nvflare config use project_admin
nvflare job submit -j /tmp/nvflare/hello-pt
nvflare job list --kit-id project_admin
nvflare job submit -j /tmp/nvflare/hello-pt --startup-kit /path/to/admin@nvidia.com
Registered startup kit paths must point to the admin startup kit directory
itself, not the broader prod_00 root.
Example JSON success response:
{"schema_version": "1", "status": "ok", "exit_code": 0, "data": {"job_id": "abc123"}}
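A script can pull the job_id out of this envelope with the standard json module. The sketch below parses the example response shown above; the function name `job_id_from_submit` is made up for illustration, and the error handling assumes only the envelope fields that appear in this document.

```python
import json


def job_id_from_submit(stdout_text: str) -> str:
    """Extract data.job_id from a successful submit envelope."""
    envelope = json.loads(stdout_text)
    if envelope.get("status") != "ok" or envelope.get("exit_code") != 0:
        # error_code is the field used by the error envelope in this document.
        raise RuntimeError(f"submit failed: {envelope.get('error_code')}")
    return envelope["data"]["job_id"]


example = '{"schema_version": "1", "status": "ok", "exit_code": 0, "data": {"job_id": "abc123"}}'
print(job_id_from_submit(example))  # prints abc123
```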
If the server is configured for studies, you can target one explicitly:
nvflare job submit -j /tmp/nvflare/my_job --study cancer_research
Retry-Safe Submit Tokens
Use --submit-token when an automated caller may retry a submit after a
timeout or lost client connection:
TOKEN=$(uuidgen)
nvflare job submit -j /tmp/nvflare/my_job \
--study cancer_research \
--submit-token "$TOKEN" \
--format json
--submit-token is optional. When provided, it must be generated by the
caller and is used as an idempotency and recovery value for one intended
submit. NVFlare does not auto-generate a submit token when the flag is omitted.
The token is not an authentication token, session token, startup-kit credential,
API key, or certificate secret. Normal startup-kit authentication and
authorization still apply.
Tokens must be non-empty, at most 128 characters, and use only letters,
numbers, ., _, :, or -.
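A caller can check a token against these rules before submitting. This is a minimal sketch, assuming that "letters" and "numbers" mean ASCII letters and digits; the server performs its own validation, and the names `TOKEN_PATTERN` and `is_valid_submit_token` are illustrative only.

```python
import re

# 1 to 128 characters drawn from ASCII letters, digits, '.', '_', ':' and '-'.
# Assumption: non-ASCII letters are not accepted.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9._:-]{1,128}")


def is_valid_submit_token(token: str) -> bool:
    """Return True when the token satisfies the documented character and length rules."""
    return TOKEN_PATTERN.fullmatch(token) is not None
```

A UUID such as the output of uuidgen passes this check, since it contains only hexadecimal digits and hyphens.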
Submit-token scope is the selected server/project context, study, submitter
identity, and token value. Reusing the same token with the same job content in
the same scope returns the existing job_id. Reusing it with different job
content fails with SUBMIT_TOKEN_CONFLICT. The same token may be used in a
different study because studies are separate job namespaces.
If a job created with --submit-token is later deleted, the server keeps the
submit record as job_deleted. A later submit or list lookup with the same
token returns SUBMIT_TOKEN_JOB_DELETED instead of silently recreating the
deleted job. Use a new submit token to submit the job again.
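The scope and deletion rules above can be illustrated with a toy in-memory model. This is not the server's implementation: the class name, the SHA-256 content digest, the generated job IDs, and the order in which the deleted and conflict checks run are all assumptions made for the sketch.

```python
import hashlib
from typing import Dict, Tuple


class SubmitTokenConflict(Exception):
    """Stands in for the SUBMIT_TOKEN_CONFLICT error."""


class SubmitTokenJobDeleted(Exception):
    """Stands in for the SUBMIT_TOKEN_JOB_DELETED error."""


class ToySubmitRegistry:
    """Toy model: one record per (project, study, submitter, token) scope."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str, str, str], dict] = {}
        self._counter = 0

    def submit(self, project: str, study: str, submitter: str, token: str, content: bytes) -> str:
        key = (project, study, submitter, token)
        digest = hashlib.sha256(content).hexdigest()  # assumed hashing scheme
        record = self._records.get(key)
        if record is None:
            # First use of this token in this scope: create a new job.
            self._counter += 1
            record = {"job_id": f"job-{self._counter}", "digest": digest, "deleted": False}
            self._records[key] = record
            return record["job_id"]
        if record["deleted"]:
            # The submit record is kept after deletion; the job is not recreated.
            raise SubmitTokenJobDeleted(record["job_id"])
        if record["digest"] != digest:
            raise SubmitTokenConflict(token)
        # Same scope and same content: return the existing job_id.
        return record["job_id"]

    def delete(self, job_id: str) -> None:
        for record in self._records.values():
            if record["job_id"] == job_id:
                record["deleted"] = True
```

In this model, reusing a token in a different study creates a separate record, which matches the statement that studies are separate job namespaces.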
The submitted job path should point to the job content root. When the submitted
artifact is a zip file with one wrapper directory around the job content, the
wrapper is ignored for submit-token content hashing so a normal
zip -r my_job.zip my_job/ archive matches submitting my_job/ directly.
Submitting the parent directory that contains my_job/ is different content
and may conflict when retried with the same token.
The token is stored only as server-owned submission metadata. It is not written
to the job’s meta.json; that file remains job-owned execution metadata such
as deploy_map, resource_spec, min_clients, and launcher settings.
If --submit-token is omitted, submit behavior is unchanged and each submit
creates a new job as before. The server still records the submitted job through
the normal job store and job history, but no retry-safe submit-token record is
created. The job cannot later be recovered with job list --submit-token
unless the original submit used a caller-provided token.
After a client-side timeout or session loss, recover the accepted job with
job list --submit-token:
nvflare job list --study cancer_research --submit-token "$TOKEN" --format json
If the recovered job was deleted, JSON output uses the normal error envelope:
{
"schema_version": "1",
"status": "error",
"exit_code": 4,
"error_code": "SUBMIT_TOKEN_JOB_DELETED",
"data": {
"job_id": "abc123",
"state": "job_deleted",
"deleted_time": "2026-04-30T10:00:00-07:00"
}
}
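An automated caller can branch on this error envelope to decide whether a fresh token is needed. The sketch below relies only on the fields shown in the example above; the function name and the shape of its return value are hypothetical.

```python
import json


def check_recovery(stdout_text: str) -> dict:
    """Report whether a token lookup hit a deleted job, based on the error envelope."""
    envelope = json.loads(stdout_text)
    data = envelope.get("data") or {}
    job_was_deleted = (
        envelope.get("status") == "error"
        and envelope.get("error_code") == "SUBMIT_TOKEN_JOB_DELETED"
    )
    return {
        # When True, the documented remedy is to submit again with a new token.
        "needs_new_token": job_was_deleted,
        "deleted_job_id": data.get("job_id") if job_was_deleted else None,
        "deleted_time": data.get("deleted_time") if job_was_deleted else None,
    }
```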
--submit-token is only for job submit and job list. To monitor,
download, abort, delete, or clone the recovered job, first resolve the
job_id with job list --submit-token and then use the normal job command.
Wait or Monitor a Job
Use nvflare job wait when a script or agent needs one final command result
after the job reaches a terminal state:
nvflare job wait <job_id>
nvflare job wait <job_id> --study cancer_research
nvflare job wait <job_id> --timeout 3600 --interval 5 --format json
job wait accepts:
job_id: job ID to wait for.
--timeout: max seconds to wait; must be greater than or equal to 0. Default: 0 (no timeout).
--interval: poll interval in seconds; must be greater than 0. Default: 2.
--study: wait for a job in a named study. Use the same study name used at submission time. If omitted, the literal study name default is used.
--schema: print the command schema as JSON and exit.
Unlike job monitor, job wait is the single-envelope automation command.
It does not stream progress lines. In JSON mode, stdout contains exactly one
final JSON envelope with the terminal job status and metadata; human-readable
diagnostics still go to stderr.
Exit behavior:
exit code 0: job finished successfully.
exit code 1: job reached a terminal failure state, such as FAILED, FINISHED_EXCEPTION, ABORTED, or ABANDONED.
exit code 2: connection, authentication, or authorization failure prevented waiting.
exit code 3: wait timeout.
This enables CI/CD-style chaining without parsing progress output:
JOB=$(nvflare job submit -j ./my_job --format json | jq -r .data.job_id)
nvflare job wait "$JOB" --format json && nvflare job download "$JOB"
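The same chain can be driven from Python by branching on the documented exit codes. This is a sketch under stated assumptions: the outcome labels are invented for the example, and the `run` parameter exists only so the function can be exercised without a live server; it defaults to subprocess.run.

```python
import subprocess

# Documented job wait exit codes mapped to illustrative labels.
EXIT_MEANINGS = {
    0: "success",
    1: "job_failed",
    2: "connection_or_auth_failure",
    3: "timeout",
}


def classify_wait_exit(code: int) -> str:
    """Translate a job wait exit code into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def wait_then_download(job_id: str, run=subprocess.run) -> str:
    """Wait for the job, then download results only when it finished successfully."""
    result = run(["nvflare", "job", "wait", job_id, "--format", "json"])
    outcome = classify_wait_exit(result.returncode)
    if outcome == "success":
        run(["nvflare", "job", "download", job_id], check=True)
    return outcome
```

Distinguishing "timeout" from "job_failed" lets a pipeline retry the wait in the first case and stop in the second, without parsing any progress output.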
Use nvflare job monitor when a human wants progress updates while waiting.
It streams status lines to stderr and returns the final result when the job
reaches a terminal state:
nvflare job monitor <job_id>
nvflare job monitor <job_id> --study cancer_research
nvflare job monitor <job_id> --timeout 3600 --format jsonl
Monitor options:
job_id: job ID to monitor.
--timeout: max seconds to wait; must be greater than or equal to 0. Default: 0 (no timeout).
--interval: poll interval in seconds; must be greater than 0. Default: 2.
--study: monitor a job in a named study. Use the same study name used at submission time. If omitted, the literal study name default is used.
--stats-target: where to fetch stats from. Choices: server, client, all. Default: server.
--metric: extra metric key to surface from stats. Repeatable.
--schema: print the command schema as JSON and exit.
job monitor exit behavior matches job wait:
exit code 0: job finished successfully
exit code 1: job reached a terminal failure state: FAILED, FINISHED_EXCEPTION, ABORTED, or ABANDONED
exit code 2: connection, authentication, or authorization failure prevented monitoring
exit code 3: monitor timeout
For automation that needs progress events, use --format jsonl. Each stdout
line is one complete JSON object. Progress events include terminal: false;
the final event always includes terminal: true. Timeout emits a final event
with status: "TIMEOUT" and exits with code 3. Successful terminal job
statuses such as FINISHED_OK are normalized to status: "COMPLETED" and
the raw server status is preserved in job_status. Connection,
authentication, and authorization failures emit a terminal error event with
status: "error" and the specific code in error_code.
Example JSONL terminal event:
{"schema_version":"1","event":"terminal","job_id":"abc123","status":"COMPLETED","job_status":"FINISHED_OK","terminal":true}
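A consumer of the JSONL stream can read stdout line by line and stop at the event marked terminal: true. The sketch below assumes only the fields shown in the example above; the progress event in the sample is hypothetical, since this document specifies only that progress events carry terminal: false.

```python
import json
from typing import Iterable, Optional


def final_event(lines: Iterable[str]) -> Optional[dict]:
    """Return the first event with terminal == true, or None if the stream ended early."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)  # each stdout line is one complete JSON object
        if event.get("terminal") is True:
            return event
    return None


sample_stream = [
    # Hypothetical progress event; only "terminal": false is documented.
    '{"schema_version":"1","event":"progress","job_id":"abc123","terminal":false}',
    '{"schema_version":"1","event":"terminal","job_id":"abc123","status":"COMPLETED","job_status":"FINISHED_OK","terminal":true}',
]
event = final_event(sample_stream)
print(event["status"], event["job_status"])  # prints COMPLETED FINISHED_OK
```

Reading `status` gives the normalized outcome, while `job_status` keeps the raw server value for callers that need it.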
List and Inspect Jobs
List jobs currently known to the server:
nvflare job list
Common list filters:
-n, --name: filter by job name prefix.
-i, --id: filter by job ID prefix.
-r, --reverse: reverse sort order.
-m, --max: maximum number of results to return.
--study: list jobs from a named study. If omitted, the literal study name default is used. Values such as all are passed through to the server unchanged.
--submit-token: find the job associated with a retry-safe submit token in the selected study. This is the recovery path after submitting with --submit-token.
--schema: print the command schema as JSON and exit.
Retrieve metadata for a single job:
nvflare job meta <job_id>
nvflare job meta <job_id> --study cancer_research
Use metadata to inspect job identity, lifecycle fields, and server-reported
status information after submission. Human output is grouped into a concise
summary; use --format json to retrieve the full raw metadata envelope.
All job-ID lookup and control commands accept --study. Use the same study
name used at submission time. If omitted, the command searches the literal
default study. If the job is not found, the error reports which study was
searched and suggests retrying with --study.
nvflare job meta also supports --schema.
Download, Clone, Abort, Delete
Download job results:
nvflare job download <job_id> -o ./downloads
nvflare job download <job_id> --study cancer_research -o ./downloads
nvflare job download <job_id> --study cancer_research --force
For automation, use JSON output:
nvflare job download <job_id> -o ./downloads --format json
The job must be in a terminal state before download. For a running job, wait first:
nvflare job wait <job_id> --study cancer_research
nvflare job download <job_id> --study cancer_research
The local destination defaults to ./<job_id>. If that directory already
exists, the command fails unless --force is specified. Use --force only
when replacing the existing local download is intended.
Human output remains concise and prints only the final download location. Use
--format json when agents or scripts need artifact discovery fields. The
JSON success response reports local paths on the machine running the CLI:
{
"schema_version": "1",
"status": "ok",
"exit_code": 0,
"data": {
"job_id": "abc123",
"download_path": "/abs/path/downloads/abc123",
"path": "/abs/path/downloads/abc123",
"artifact_discovery": "completed",
"artifacts": {
"global_model": "/abs/path/downloads/abc123/workspace/FL_global_model.pt",
"metrics_summary": "/abs/path/downloads/abc123/workspace/metrics_summary.json",
"client_logs": {
"site-1": "/abs/path/downloads/abc123/workspace/site-1/log.txt"
}
},
"missing_artifacts": []
}
}
download_path is the final local directory returned by the download API.
path is a backward-compatible alias for download_path when present.
artifacts contains local paths discovered under download_path. Agents
and scripts should use data.artifacts.* as the source of truth for
consumable files instead of assuming a server workspace layout or constructing
paths from download_path. missing_artifacts lists expected categories,
such as model, metrics, or client logs, that were not found locally. Missing
artifacts do not make the command fail when the download itself succeeds.
When artifact_discovery is skipped, the CLI did not have a local
directory to inspect, so artifacts and missing_artifacts are null
instead of claiming that expected artifacts were verified absent.
The server download protocol is unchanged; artifact discovery is a local CLI post-processing step after the result has been downloaded.
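A script that consumes the download envelope can flatten data.artifacts into named local paths instead of building paths by hand. This is a minimal sketch: the function name and the dotted key format are choices made for the example, and the input is a trimmed copy of the response shown above.

```python
from typing import Dict, Optional


def flatten_artifacts(artifacts: Optional[dict], prefix: str = "") -> Dict[str, str]:
    """Flatten nested artifact entries into {"dotted.name": "/local/path"}."""
    flat: Dict[str, str] = {}
    # artifacts is null when artifact_discovery is "skipped"; treat that as empty.
    for key, value in (artifacts or {}).items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_artifacts(value, prefix=f"{name}."))
        elif value is not None:
            flat[name] = value
    return flat


envelope = {
    "status": "ok",
    "data": {
        "job_id": "abc123",
        "artifact_discovery": "completed",
        "artifacts": {
            "global_model": "/abs/path/downloads/abc123/workspace/FL_global_model.pt",
            "client_logs": {"site-1": "/abs/path/downloads/abc123/workspace/site-1/log.txt"},
        },
        "missing_artifacts": [],
    },
}
paths = flatten_artifacts(envelope["data"]["artifacts"])
print(sorted(paths))  # prints ['client_logs.site-1', 'global_model']
```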
Clone an existing job:
nvflare job clone <job_id>
nvflare job clone <job_id> --study cancer_research
nvflare job clone clones the full server-side job for reuse. The current
CLI surface takes the source job_id, optional --study, and --schema.
It returns source_job_id and new_job_id. Use the returned new_job_id
to monitor or manage the cloned job.
Abort a running job:
nvflare job abort <job_id>
nvflare job abort <job_id> --study cancer_research
nvflare job abort <job_id> --force
Delete a job:
nvflare job delete <job_id>
nvflare job delete <job_id> --study cancer_research
nvflare job delete <job_id> --force
Notes:
abort and delete support --force to skip the confirmation prompt.
abort and delete search the selected study. If --study is omitted, default is used.
delete --format json returns job_id and submit_records_marked_deleted. When this count is nonzero, future use of the same submit token returns SUBMIT_TOKEN_JOB_DELETED.
download supports -o, --output-dir to choose the destination directory. Default: job-specific directory under the current working directory (./<job_id>).
clone, download, abort, and delete all support --schema.
Observability
Retrieve job logs from the server-side log store:
nvflare job logs <job_id>
nvflare job logs <job_id> --site site-1
nvflare job logs <job_id> --site all
nvflare job logs <job_id> --site all --tail 200
nvflare job logs <job_id> --site site-1 --since 2026-04-28T10:00:00
nvflare job logs <job_id> --site all --max-bytes 200000
nvflare job logs <job_id> --study cancer_research
job logs accepts:
--study: retrieve logs for a job in a named study. If omitted, job logs searches the default study. Use the same study name used for job submit or job list.
--site server: return the server job log. This is the default.
--site <client_name>: return that client’s job log after it has been streamed to and stored by the server.
--site all: return the server log and all client logs currently available in the server-side log store. If a known job site does not have stored log content, the JSON response includes it under unavailable.
--sites is accepted as an alias for --site but still selects one target value.
--tail N: return at most the last N log lines per site.
--since timestamp: return timestamped log lines at or after the timestamp when line timestamps are parseable. Continuation lines following an included timestamped line are included.
--max-bytes N: return at most N UTF-8 bytes per site.
job logs also supports --schema.
If no explicit bound is provided, job logs returns at most the last 500
lines per site. JSON output includes logs_truncated, per-site availability
and line/byte counts under sites, and the applied filters.
When any of --tail, --since, or --max-bytes is provided, the
default 500-line tail is disabled and filters.default_tail_applied is
false. The explicit bounds are applied in this order: --since,
--tail, then --max-bytes.
The bound options are applied by the CLI after the server returns the stored
log content. They bound the printed or JSON output from nvflare job logs;
they do not reduce the amount of log content requested from the server. If a
large log is already limited by the server-side maximum response size before it
reaches the CLI, these client-side bounds are applied to that returned content.
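The order in which the bounds apply can be sketched as a pure function over the lines already returned for one site. This is an illustration, not the CLI's code. It assumes that each timestamped line starts with a 19-character "YYYY-MM-DD HH:MM:SS" (or "T"-separated) timestamp, and that --max-bytes keeps the trailing bytes; neither detail is stated in this document.

```python
from datetime import datetime
from typing import List, Optional


def _line_time(line: str) -> Optional[datetime]:
    """Parse a leading timestamp; return None for continuation lines."""
    try:
        return datetime.fromisoformat(line[:19])
    except ValueError:
        return None


def bound_log(
    lines: List[str],
    since: Optional[datetime] = None,
    tail: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> str:
    """Apply --since, then --tail, then --max-bytes to one site's log lines."""
    if since is not None:
        kept, including = [], False
        for line in lines:
            ts = _line_time(line)
            if ts is not None:
                including = ts >= since
            # Continuation lines inherit the decision of the preceding timestamped line.
            if including:
                kept.append(line)
        lines = kept
    if tail is not None:
        lines = lines[-tail:] if tail > 0 else []
    text = "\n".join(lines)
    if max_bytes is not None:
        raw = text.encode("utf-8")
        if len(raw) > max_bytes:
            # Assumption: keep the end of the log; drop any partial UTF-8 character.
            text = raw[-max_bytes:].decode("utf-8", errors="ignore") if max_bytes > 0 else ""
    return text
```

Because every step runs on content the server has already returned, tightening a bound changes what is printed but not how much is transferred.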
In normal human output mode, job logs prints the log text directly. With
--site all, each site is separated by a short header. Use --format json
when a structured logs dictionary is needed for automation.
job logs does not provide a built-in grep option. Pipe or post-process
the returned content when text matching is needed.
Client logs are not fetched from client machines at command time. The command
asks the server for logs that were already streamed to the server during the
job. Streamed client logs are read from the server job workspace, where they are
stored as <client_name>/log.txt or <client_name>/log.json depending on
the configured log streamer; after the job workspace is archived, the same files
are read from the stored job workspace artifact.
To enable client job log streaming in a portable job, add the job-level log streamer and receiver components to the job definition:
from nvflare.app_common.logging.job_log_receiver import JobLogReceiver
from nvflare.app_common.logging.job_log_streamer import JobLogStreamer
# Assumes `recipe` is the job recipe object already constructed in job.py.
# Tails each client's job log.txt and streams it to the server.
recipe.job.to_clients(JobLogStreamer())
# Receives streamed log chunks on the server and stores them with the job.
recipe.job.to_server(JobLogReceiver())
System-level logging configuration in resources.json.default is separate
from this job-level opt-in. Some deployments may configure a server-side
JobLogReceiver globally, but including both components in the job makes the
job self-contained across POC and production deployments.
To stream structured JSON logs instead, configure the streamer with
JobLogStreamer(log_file_name="log.json"). nvflare job logs --format json
uses log.json when available and falls back to log.txt otherwise. Human
output prints readable text; if only log.json is available, the CLI renders
the JSON log records as text for display.
The examples/hello-world/hello-log-streaming example shows this pattern.
Change logging configuration for a running job:
nvflare job log-config <job_id> DEBUG
nvflare job log-config <job_id> concise
nvflare job log-config <job_id> msg_only
nvflare job log-config <job_id> DEBUG --study cancer_research
job log-config accepts:
positional level: DEBUG, INFO, WARNING, ERROR, CRITICAL
log modes: concise, msg_only, full, verbose, reload
--site: target site name or all. Default: all; specifying --site all explicitly is equivalent to omitting it.
--study: study containing the job. If omitted, default is used.
--schema: print the command schema as JSON and exit.
Show running job statistics:
nvflare job stats <job_id>
nvflare job stats <job_id> --study cancer_research
job stats supports --study to select the study containing the job, and
--site to target a specific site or all. The default site is all,
so specifying --site all explicitly is equivalent to omitting it.
It also supports --schema.
Recipe-Based Job Creation
The recommended way to create a new job folder is through the Job Recipe API or
an example job.py script that supports --export:
python job.py --export --export-dir /tmp/nvflare/hello-pt
nvflare job submit -j /tmp/nvflare/hello-pt
To discover built-in recipes, use:
nvflare recipe list
Deprecated commands:
nvflare job create: retained for compatibility. Prefer python job.py --export followed by nvflare job submit.
nvflare job list_templates: use nvflare recipe list.
nvflare job show_variables: use the Job Recipe API.
Current deprecation notes:
nvflare job create still exposes template- and config-oriented arguments for legacy workflows.
nvflare job list_templates and nvflare job show_variables remain available for backward compatibility but are not the preferred interfaces for recipe discovery or job-variable inspection.
JSON Output and Help
Add --format json anywhere after the subcommand for machine-readable output:
nvflare job meta <job_id> --format json
With --format json, stdout contains a single JSON envelope; human-readable
progress and diagnostics go to stderr.
Use --schema for machine-readable command discovery. --schema always
returns JSON regardless of --format, so --format json is not needed with it:
nvflare job submit --schema
nvflare job wait --schema
nvflare job monitor --schema
Schema fields such as mutating and idempotent describe the command as a
whole, not the effective behavior of one invocation. For example, job submit
reports idempotent: false because plain submission can create duplicate jobs
when retried after a timeout. It also reports retry_token.supported: true to
show that --submit-token makes retries safe for identical job content in the
same study by the same submitter. job list --submit-token is different:
there --submit-token is only a lookup filter, so retry_token.supported
remains false.
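An agent can combine these schema fields with its own invocation to decide whether a retry is safe. The sketch assumes the schema exposes `idempotent` as a top-level boolean and `retry_token` as an object with a `supported` key, as the dotted names above suggest; the exact layout of the --schema output is not shown in this document, and the function name is hypothetical.

```python
def retry_is_safe(schema: dict, using_submit_token: bool) -> bool:
    """Decide whether retrying this invocation is safe, given command-level schema fields."""
    if schema.get("idempotent"):
        # The command as a whole is idempotent, so any invocation can be retried.
        return True
    retry_token = schema.get("retry_token") or {}
    # A non-idempotent command is retry-safe only when it supports a retry
    # token and this particular invocation actually supplies one.
    return bool(retry_token.get("supported")) and using_submit_token


# Assumed shape for the job submit schema fields described above.
submit_schema = {"mutating": True, "idempotent": False, "retry_token": {"supported": True}}
print(retry_is_safe(submit_schema, using_submit_token=True))   # prints True
print(retry_is_safe(submit_schema, using_submit_token=False))  # prints False
```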
Human-readable argument errors print command help first, followed by the specific error and hint. JSON mode prints only the JSON error envelope.