NVIDIA FLARE Job CLI

The nvflare job command family is used to submit, inspect, monitor, and manage federated learning jobs from an admin startup kit.

Before using server-connected job commands, either run nvflare poc prepare or activate a registered startup kit with Config Command:

nvflare config add project_admin /path/to/admin@nvidia.com
nvflare config use project_admin

Command Usage

nvflare job -h

usage: nvflare job [-h]  ...

job subcommands:
  submit          submit job
  wait            wait for a job and return one final JSON envelope
  monitor         wait for a job and stream progress to stderr
  list            list jobs on the server
  abort           abort a running job
  meta            get metadata for a job
  logs            retrieve job logs from the server-side log store
  log-config      change logging configuration for a running job
  stats           show running job statistics
  download        download job result
  clone           clone an existing job
  delete          delete a job
  list_templates  [DEPRECATED] use 'nvflare recipe list'
  create          [DEPRECATED] use 'python job.py --export --export-dir <job_folder>' + 'nvflare job submit -j <job_folder>'
  show_variables  [DEPRECATED] use 'nvflare recipe list' or the Job Recipe API

Common Workflow

  1. Export or prepare a job folder.

  2. Submit the job with nvflare job submit -j <job_folder>.

  3. In automation, wait for completion with nvflare job wait <job_id>. For interactive progress output, use nvflare job monitor <job_id>.

  4. Inspect metadata, stats, or logs as needed.

  5. Download, clone, abort, or delete the job when appropriate.

Startup Kit Selection

Server-connected job commands use this startup kit resolution order:

  1. Optional --kit-id <id>: override the active startup kit for this command only by using a registered startup-kit ID.

  2. Optional --startup-kit <path>: override the active startup kit for this command only by using an explicit admin startup-kit directory.

  3. NVFLARE_STARTUP_KIT_DIR when set.

  4. startup_kits.active from ~/.nvflare/config.conf.

  5. If no source resolves to a valid admin startup kit, the command fails before connecting.

--kit-id and --startup-kit are not required. When provided, they take precedence over the active startup kit for the current command only and do not change the globally active startup kit. They are useful for scripts, notebooks, and concurrent workflows that must not mutate ~/.nvflare/config.conf.

Submit a Job

Use nvflare job submit to submit a pre-built NVFlare job folder:

nvflare job submit -j /tmp/nvflare/hello-pt

Submit options:

  • -j, --job_folder: job folder path. Defaults to ./current_job.

  • --study: submit into a named study when the server is configured for multi-study access. If omitted, the literal study name default is submitted.

  • --submit-token: caller-generated token for retry-safe submit and later recovery with nvflare job list --submit-token.

  • -debug, --debug: keep the temporary copied job folder for inspection.

  • --schema: print the command schema as JSON and exit.

Submit returns immediately with a job_id. It does not wait for terminal job status.

To change job configuration values, edit the exported job files before submission. Submit-time -f/--config_file overrides are not supported.

Examples:

nvflare config use project_admin
nvflare job submit -j /tmp/nvflare/hello-pt
nvflare job list --kit-id project_admin
nvflare job submit -j /tmp/nvflare/hello-pt --startup-kit /path/to/admin@nvidia.com

Registered startup kit paths must point to the admin startup kit directory itself, not the broader prod_00 root.

Example JSON success response:

{"schema_version": "1", "status": "ok", "exit_code": 0, "data": {"job_id": "abc123"}}

If the server is configured for studies, you can target one explicitly:

nvflare job submit -j /tmp/nvflare/my_job --study cancer_research

Retry-Safe Submit Tokens

Use --submit-token when an automated caller may retry a submit after a timeout or lost client connection:

TOKEN=$(uuidgen)
nvflare job submit -j /tmp/nvflare/my_job \
    --study cancer_research \
    --submit-token "$TOKEN" \
    --format json

--submit-token is optional. When provided, it must be generated by the caller and is used as an idempotency and recovery value for one intended submit. NVFlare does not auto-generate a submit token when the flag is omitted. The token is not an authentication token, session token, startup-kit credential, API key, or certificate secret. Normal startup-kit authentication and authorization still apply.

Tokens must be non-empty, at most 128 characters, and use only letters, numbers, ., _, :, or -.

Submit-token scope is the selected server/project context, study, submitter identity, and token value. Reusing the same token with the same job content in the same scope returns the existing job_id. Reusing it with different job content fails with SUBMIT_TOKEN_CONFLICT. The same token may be used in a different study because studies are separate job namespaces.

If a job created with --submit-token is later deleted, the server keeps the submit record as job_deleted. A later submit or list lookup with the same token returns SUBMIT_TOKEN_JOB_DELETED instead of silently recreating the deleted job. Use a new submit token to submit the job again.

The submitted job path should point to the job content root. When the submitted artifact is a zip file with one wrapper directory around the job content, the wrapper is ignored for submit-token content hashing so a normal zip -r my_job.zip my_job/ archive matches submitting my_job/ directly. Submitting the parent directory that contains my_job/ is different content and may conflict when retried with the same token.

The token is stored only as server-owned submission metadata. It is not written to the job’s meta.json; that file remains job-owned execution metadata such as deploy_map, resource_spec, min_clients, and launcher settings. If --submit-token is omitted, submit behavior is unchanged and each submit creates a new job as before. The server still records the submitted job through the normal job store and job history, but no retry-safe submit-token record is created. The job cannot later be recovered with job list --submit-token unless the original submit used a caller-provided token.

After a client-side timeout or session loss, recover the accepted job with job list --submit-token:

nvflare job list --study cancer_research --submit-token "$TOKEN" --format json

If the recovered job was deleted, JSON output uses the normal error envelope:

{
  "schema_version": "1",
  "status": "error",
  "exit_code": 4,
  "error_code": "SUBMIT_TOKEN_JOB_DELETED",
  "data": {
    "job_id": "abc123",
    "state": "job_deleted",
    "deleted_time": "2026-04-30T10:00:00-07:00"
  }
}

--submit-token is only for job submit and job list. To monitor, download, abort, delete, or clone the recovered job, first resolve the job_id with job list --submit-token and then use the normal job command.

Wait or Monitor a Job

Use nvflare job wait when a script or agent needs one final command result after the job reaches a terminal state:

nvflare job wait <job_id>
nvflare job wait <job_id> --study cancer_research
nvflare job wait <job_id> --timeout 3600 --interval 5 --format json

job wait accepts:

  • job_id: job ID to wait for.

  • --timeout: max seconds to wait; must be greater than or equal to 0. Default: 0 (no timeout).

  • --interval: poll interval in seconds; must be greater than 0. Default: 2.

  • --study: wait for a job in a named study. Use the same study name used at submission time. If omitted, the literal study name default is used.

  • --schema: print the command schema as JSON and exit.

Unlike job monitor, job wait is the single-envelope automation command. It does not stream progress lines. In JSON mode, stdout contains exactly one final JSON envelope with the terminal job status and metadata; human-readable diagnostics still go to stderr.

Exit behavior:

  • exit code 0: job finished successfully.

  • exit code 1: job reached a terminal failure state, such as FAILED, FINISHED_EXCEPTION, ABORTED, or ABANDONED.

  • exit code 2: connection, authentication, or authorization failure prevented waiting.

  • exit code 3: wait timeout.

This enables CI/CD-style chaining without parsing progress output:

JOB=$(nvflare job submit -j ./my_job --format json | jq -r .data.job_id)
nvflare job wait $JOB --format json && nvflare job download $JOB

Use nvflare job monitor when a human wants progress updates while waiting. It streams status lines to stderr and returns the final result when the job reaches a terminal state:

nvflare job monitor <job_id>
nvflare job monitor <job_id> --study cancer_research
nvflare job monitor <job_id> --timeout 3600 --format jsonl

Monitor options:

  • job_id: job ID to monitor.

  • --timeout: max seconds to wait; must be greater than or equal to 0. Default: 0 (no timeout).

  • --interval: poll interval in seconds; must be greater than 0. Default: 2.

  • --study: monitor a job in a named study. Use the same study name used at submission time. If omitted, the literal study name default is used.

  • --stats-target: where to fetch stats from. Choices: server, client, all. Default: server.

  • --metric: extra metric key to surface from stats. Repeatable.

  • --schema: print the command schema as JSON and exit.

job monitor exit behavior matches job wait:

  • exit code 0: job finished successfully

  • exit code 1: job reached a terminal failure state: FAILED, FINISHED_EXCEPTION, ABORTED, or ABANDONED

  • exit code 2: connection, authentication, or authorization failure prevented monitoring

  • exit code 3: monitor timeout

For automation that needs progress events, use --format jsonl. Each stdout line is one complete JSON object. Progress events include terminal: false; the final event always includes terminal: true. Timeout emits a final event with status: "TIMEOUT" and exits with code 3. Successful terminal job statuses such as FINISHED_OK are normalized to status: "COMPLETED" and the raw server status is preserved in job_status. Connection, authentication, and authorization failures emit a terminal error event with status: "error" and the specific code in error_code.

Example JSONL terminal event:

{"schema_version":"1","event":"terminal","job_id":"abc123","status":"COMPLETED","job_status":"FINISHED_OK","terminal":true}

List and Inspect Jobs

List jobs currently known to the server:

nvflare job list

Common list filters:

  • -n, --name: filter by job name prefix.

  • -i, --id: filter by job ID prefix.

  • -r, --reverse: reverse sort order.

  • -m, --max: maximum number of results to return.

  • --study: list jobs from a named study. If omitted, the literal study name default is used. Values such as all are passed through to the server unchanged.

  • --submit-token: find the job associated with a retry-safe submit token in the selected study. This is the recovery path after submitting with --submit-token.

  • --schema: print the command schema as JSON and exit.

Retrieve metadata for a single job:

nvflare job meta <job_id>
nvflare job meta <job_id> --study cancer_research

Use metadata to inspect job identity, lifecycle fields, and server-reported status information after submission. Human output is grouped into a concise summary; use --format json to retrieve the full raw metadata envelope.

All job-ID lookup and control commands accept --study. Use the same study name used at submission time. If omitted, the command searches the literal default study. If the job is not found, the error reports which study was searched and suggests retrying with --study.

nvflare job meta also supports --schema.

Download, Clone, Abort, Delete

Download job results:

nvflare job download <job_id> -o ./downloads
nvflare job download <job_id> --study cancer_research -o ./downloads
nvflare job download <job_id> --study cancer_research --force

For automation, use JSON output:

nvflare job download <job_id> -o ./downloads --format json

The job must be in a terminal state before download. For a running job, wait first:

nvflare job wait <job_id> --study cancer_research
nvflare job download <job_id> --study cancer_research

The local destination defaults to ./<job_id>. If that directory already exists, the command fails unless --force is specified. Use --force only when replacing the existing local download is intended.

Human output remains concise and prints only the final download location. Use --format json when agents or scripts need artifact discovery fields. The JSON success response reports local paths on the machine running the CLI:

{
  "schema_version": "1",
  "status": "ok",
  "exit_code": 0,
  "data": {
    "job_id": "abc123",
    "download_path": "/abs/path/downloads/abc123",
    "path": "/abs/path/downloads/abc123",
    "artifact_discovery": "completed",
    "artifacts": {
      "global_model": "/abs/path/downloads/abc123/workspace/FL_global_model.pt",
      "metrics_summary": "/abs/path/downloads/abc123/workspace/metrics_summary.json",
      "client_logs": {
        "site-1": "/abs/path/downloads/abc123/workspace/site-1/log.txt"
      }
    },
    "missing_artifacts": []
  }
}

download_path is the final local directory returned by the download API. path is a backward-compatible alias for download_path when present.

artifacts contains local paths discovered under download_path. Agents and scripts should use data.artifacts.* as the source of truth for consumable files instead of assuming a server workspace layout or constructing paths from download_path. missing_artifacts lists expected categories, such as model, metrics, or client logs, that were not found locally. Missing artifacts do not make the command fail when the download itself succeeds. When artifact_discovery is skipped, the CLI did not have a local directory to inspect, so artifacts and missing_artifacts are null instead of claiming that expected artifacts were verified absent.

The server download protocol is unchanged; artifact discovery is a local CLI post-processing step after the result has been downloaded.

Clone an existing job:

nvflare job clone <job_id>
nvflare job clone <job_id> --study cancer_research

nvflare job clone clones the full server-side job for reuse. The current CLI surface takes the source job_id, optional --study, and --schema. It returns source_job_id and new_job_id. Use the returned new_job_id to monitor or manage the cloned job.

Abort a running job:

nvflare job abort <job_id>
nvflare job abort <job_id> --study cancer_research
nvflare job abort <job_id> --force

Delete a job:

nvflare job delete <job_id>
nvflare job delete <job_id> --study cancer_research
nvflare job delete <job_id> --force

Notes:

  • abort and delete support --force to skip the confirmation prompt.

  • abort and delete search the selected study. If omitted, default is used.

  • delete --format json returns job_id and submit_records_marked_deleted. When this count is nonzero, future use of the same submit token returns SUBMIT_TOKEN_JOB_DELETED.

  • download supports -o, --output-dir to choose the destination directory. Default: job-specific directory under the current working directory (./<job_id>).

  • clone, download, abort, and delete all support --schema.

Observability

Retrieve job logs from the server-side log store:

nvflare job logs <job_id>
nvflare job logs <job_id> --site site-1
nvflare job logs <job_id> --site all
nvflare job logs <job_id> --site all --tail 200
nvflare job logs <job_id> --site site-1 --since 2026-04-28T10:00:00
nvflare job logs <job_id> --site all --max-bytes 200000
nvflare job logs <job_id> --study cancer_research

job logs accepts:

  • --study: retrieve logs for a job in a named study. If omitted, job logs searches the default study. Use the same study name used for job submit or job list.

  • --site server: return the server job log. This is the default.

  • --site <client_name>: return that client’s job log after it has been streamed to and stored by the server.

  • --site all: return the server log and all client logs currently available in the server-side log store. If a known job site does not have stored log content, the JSON response includes it under unavailable.

  • --sites is accepted as an alias for --site but still selects one target value.

  • --tail N: return at most the last N log lines per site.

  • --since timestamp: return timestamped log lines at or after the timestamp when line timestamps are parseable. Continuation lines following an included timestamped line are included.

  • --max-bytes N: return at most N UTF-8 bytes per site.

  • job logs also supports --schema.

If no explicit bound is provided, job logs returns at most the last 500 lines per site. JSON output includes logs_truncated, per-site availability and line/byte counts under sites, and the applied filters. When any of --tail, --since, or --max-bytes is provided, the default 500-line tail is disabled and filters.default_tail_applied is false. The explicit bounds are applied in this order: --since, --tail, then --max-bytes.

The bound options are applied by the CLI after the server returns the stored log content. They bound the printed or JSON output from nvflare job logs; they do not reduce the amount of log content requested from the server. If a large log is already limited by the server-side maximum response size before it reaches the CLI, these client-side bounds are applied to that returned content.

In normal human output mode, job logs prints the log text directly. With --site all, each site is separated by a short header. Use --format json when a structured logs dictionary is needed for automation.

job logs does not provide a built-in grep option. Pipe or post-process the returned content when text matching is needed.

Client logs are not fetched from client machines at command time. The command asks the server for logs that were already streamed to the server during the job. Streamed client logs are read from the server job workspace, where they are stored as <client_name>/log.txt or <client_name>/log.json depending on the configured log streamer; after the job workspace is archived, the same files are read from the stored job workspace artifact.

To enable client job log streaming in a portable job, add the job-level log streamer and receiver components to the job definition:

from nvflare.app_common.logging.job_log_receiver import JobLogReceiver
from nvflare.app_common.logging.job_log_streamer import JobLogStreamer

# Tails each client's job log.txt and streams it to the server.
recipe.job.to_clients(JobLogStreamer())

# Receives streamed log chunks on the server and stores them with the job.
recipe.job.to_server(JobLogReceiver())

System-level logging configuration in resources.json.default is separate from this job-level opt-in. Some deployments may configure a server-side JobLogReceiver globally, but including both components in the job makes the job self-contained across POC and production deployments.

To stream structured JSON logs instead, configure the streamer with JobLogStreamer(log_file_name="log.json"). nvflare job logs --format json uses log.json when available and falls back to log.txt otherwise. Human output prints readable text; if only log.json is available, the CLI renders the JSON log records as text for display.

The examples/hello-world/hello-log-streaming example shows this pattern.

Change logging configuration for a running job:

nvflare job log-config <job_id> DEBUG
nvflare job log-config <job_id> concise
nvflare job log-config <job_id> msg_only
nvflare job log-config <job_id> DEBUG --study cancer_research

job log-config accepts:

  • positional level: DEBUG, INFO, WARNING, ERROR, CRITICAL

  • log modes: concise, msg_only, full, verbose, reload

  • --site: target site name or all. Default: all; specifying --site all explicitly is equivalent to omitting it.

  • --study: study containing the job. If omitted, default is used.

  • --schema: print the command schema as JSON and exit

Show running job statistics:

nvflare job stats <job_id>
nvflare job stats <job_id> --study cancer_research

job stats supports --study to select the study containing the job, and --site to target a specific site or all. The default site is all, so specifying --site all explicitly is equivalent to omitting it. It also supports --schema.

Recipe-Based Job Creation

The recommended way to create a new job folder is through the Job Recipe API or an example job.py script that supports --export:

python job.py --export --export-dir /tmp/nvflare/hello-pt
nvflare job submit -j /tmp/nvflare/hello-pt

To discover built-in recipes, use:

nvflare recipe list

Deprecated commands:

  • nvflare job create: retained for compatibility. Prefer python job.py --export followed by nvflare job submit.

  • nvflare job list_templates: use nvflare recipe list.

  • nvflare job show_variables: use the Job Recipe API.

Current deprecation notes:

  • nvflare job create still exposes template- and config-oriented arguments for legacy workflows.

  • nvflare job list_templates and nvflare job show_variables remain available for backward compatibility but are not the preferred interfaces for recipe discovery or job-variable inspection.

JSON Output and Help

Add --format json anywhere after the subcommand for machine-readable output:

nvflare job meta <job_id> --format json

--format json may be placed anywhere in the command after the subcommand name. stdout contains a single JSON envelope; human-readable progress and diagnostics go to stderr.

Use --schema for machine-readable command discovery. --schema always returns JSON regardless of --format, so the flag is not needed with it:

nvflare job submit --schema
nvflare job wait --schema
nvflare job monitor --schema

Schema fields such as mutating and idempotent describe the command as a whole, not the effective behavior of one invocation. For example, job submit reports idempotent: false because plain submission can create duplicate jobs when retried after a timeout. It also reports retry_token.supported: true to show that --submit-token makes retries safe for identical job content in the same study by the same submitter. job list --submit-token is different: there --submit-token is only a lookup filter, so retry_token.supported remains false.

Human-readable argument errors print command help first, followed by the specific error and hint. JSON mode prints only the JSON error envelope.