.. _job_cli:

#########################
NVIDIA FLARE Job CLI
#########################

The ``nvflare job`` command family is used to submit, inspect, monitor, and manage federated learning jobs from an admin startup kit.

Before using server-connected job commands, either run ``nvflare poc prepare`` or activate a registered startup kit with :ref:`config_command`:

.. code-block:: shell

   nvflare config add project_admin /path/to/admin@nvidia.com
   nvflare config use project_admin

***********************
Command Usage
***********************

.. code-block:: none

   nvflare job -h

   usage: nvflare job [-h] ...

   job subcommands:
       submit          submit job
       wait            wait for a job and return one final JSON envelope
       monitor         wait for a job and stream progress to stderr
       list            list jobs on the server
       abort           abort a running job
       meta            get metadata for a job
       logs            retrieve job logs from the server-side log store
       log-config      change logging configuration for a running job
       stats           show running job statistics
       download        download job result
       clone           clone an existing job
       delete          delete a job
       list_templates  [DEPRECATED] use 'nvflare recipe list'
       create          [DEPRECATED] use 'python job.py --export --export-dir ' + 'nvflare job submit -j '
       show_variables  [DEPRECATED] use 'nvflare recipe list' or the Job Recipe API

*****************
Common Workflow
*****************

1. Export or prepare a job folder.
2. Submit the job with ``nvflare job submit -j ``.
3. In automation, wait for completion with ``nvflare job wait ``. For interactive progress output, use ``nvflare job monitor ``.
4. Inspect metadata, stats, or logs as needed.
5. Download, clone, abort, or delete the job when appropriate.

*****************************
Startup Kit Selection
*****************************

Server-connected job commands use this startup kit resolution order:

1. Optional ``--kit-id ``: override the active startup kit for this command only by using a registered startup-kit ID.
2. Optional ``--startup-kit ``: override the active startup kit for this command only by using an explicit admin startup-kit directory.
3. ``NVFLARE_STARTUP_KIT_DIR`` when set.
4. ``startup_kits.active`` from ``~/.nvflare/config.conf``.
5. If no source resolves to a valid admin startup kit, the command fails before connecting.

``--kit-id`` and ``--startup-kit`` are not required. When provided, they take precedence over the active startup kit for the current command only and do not change the globally active startup kit. They are useful for scripts, notebooks, and concurrent workflows that must not mutate ``~/.nvflare/config.conf``.

****************
Submit a Job
****************

Use ``nvflare job submit`` to submit a pre-built NVFlare job folder:

.. code-block:: shell

   nvflare job submit -j /tmp/nvflare/hello-pt

Submit options:

- ``-j, --job_folder``: job folder path. Defaults to ``./current_job``.
- ``--study``: submit into a named study when the server is configured for multi-study access. If omitted, the literal study name ``default`` is submitted.
- ``--submit-token``: caller-generated token for retry-safe submit and later recovery with ``nvflare job list --submit-token``.
- ``-debug, --debug``: keep the temporary copied job folder for inspection.
- ``--schema``: print the command schema as JSON and exit.

Submit returns immediately with a ``job_id``. It does not wait for terminal job status.

To change job configuration values, edit the exported job files before submission. Submit-time ``-f/--config_file`` overrides are not supported.

Examples:

.. code-block:: shell

   nvflare config use project_admin
   nvflare job submit -j /tmp/nvflare/hello-pt
   nvflare job list --kit-id project_admin
   nvflare job submit -j /tmp/nvflare/hello-pt --startup-kit /path/to/admin@nvidia.com

Registered startup kit paths must point to the admin startup kit directory itself, not the broader ``prod_00`` root.

Example JSON success response:

.. code-block:: json

   {"schema_version": "1", "status": "ok", "exit_code": 0, "data": {"job_id": "abc123"}}

If the server is configured for studies, you can target one explicitly:

.. code-block:: shell

   nvflare job submit -j /tmp/nvflare/my_job --study cancer_research

Retry-Safe Submit Tokens
========================

Use ``--submit-token`` when an automated caller may retry a submit after a timeout or lost client connection:

.. code-block:: shell

   TOKEN=$(uuidgen)
   nvflare job submit -j /tmp/nvflare/my_job \
       --study cancer_research \
       --submit-token "$TOKEN" \
       --format json

``--submit-token`` is optional. When provided, it must be generated by the caller and is used as an idempotency and recovery value for one intended submit. NVFlare does not auto-generate a submit token when the flag is omitted.

The token is not an authentication token, session token, startup-kit credential, API key, or certificate secret. Normal startup-kit authentication and authorization still apply.

Tokens must be non-empty, at most 128 characters, and use only letters, numbers, ``.``, ``_``, ``:``, or ``-``.

Submit-token scope is the selected server/project context, study, submitter identity, and token value. Reusing the same token with the same job content in the same scope returns the existing ``job_id``. Reusing it with different job content fails with ``SUBMIT_TOKEN_CONFLICT``. The same token may be used in a different study because studies are separate job namespaces.

If a job created with ``--submit-token`` is later deleted, the server keeps the submit record as ``job_deleted``. A later submit or list lookup with the same token returns ``SUBMIT_TOKEN_JOB_DELETED`` instead of silently recreating the deleted job. Use a new submit token to submit the job again.

The submitted job path should point to the job content root.
When the submitted artifact is a zip file with one wrapper directory around the job content, the wrapper is ignored for submit-token content hashing so a normal ``zip -r my_job.zip my_job/`` archive matches submitting ``my_job/`` directly. Submitting the parent directory that contains ``my_job/`` is different content and may conflict when retried with the same token.

The token is stored only as server-owned submission metadata. It is not written to the job's ``meta.json``; that file remains job-owned execution metadata such as ``deploy_map``, ``resource_spec``, ``min_clients``, and launcher settings.

If ``--submit-token`` is omitted, submit behavior is unchanged and each submit creates a new job as before. The server still records the submitted job through the normal job store and job history, but no retry-safe submit-token record is created. The job cannot later be recovered with ``job list --submit-token`` unless the original submit used a caller-provided token.

After a client-side timeout or session loss, recover the accepted job with ``job list --submit-token``:

.. code-block:: shell

   nvflare job list --study cancer_research --submit-token "$TOKEN" --format json

If the recovered job was deleted, JSON output uses the normal error envelope:

.. code-block:: json

   {
     "schema_version": "1",
     "status": "error",
     "exit_code": 4,
     "error_code": "SUBMIT_TOKEN_JOB_DELETED",
     "data": {
       "job_id": "abc123",
       "state": "job_deleted",
       "deleted_time": "2026-04-30T10:00:00-07:00"
     }
   }

``--submit-token`` is only for ``job submit`` and ``job list``. To monitor, download, abort, delete, or clone the recovered job, first resolve the ``job_id`` with ``job list --submit-token`` and then use the normal job command.

***********************
Wait or Monitor a Job
***********************

Use ``nvflare job wait`` when a script or agent needs one final command result after the job reaches a terminal state:

.. code-block:: shell

   nvflare job wait
   nvflare job wait --study cancer_research
   nvflare job wait --timeout 3600 --interval 5 --format json

``job wait`` accepts:

- ``job_id``: job ID to wait for.
- ``--timeout``: max seconds to wait; must be greater than or equal to ``0``. Default: ``0`` (no timeout).
- ``--interval``: poll interval in seconds; must be greater than ``0``. Default: ``2``.
- ``--study``: wait for a job in a named study. Use the same study name used at submission time. If omitted, the literal study name ``default`` is used.
- ``--schema``: print the command schema as JSON and exit.

Unlike ``job monitor``, ``job wait`` is the single-envelope automation command. It does not stream progress lines. In JSON mode, stdout contains exactly one final JSON envelope with the terminal job status and metadata; human-readable diagnostics still go to stderr.

Exit behavior:

- exit code ``0``: job finished successfully.
- exit code ``1``: job reached a terminal failure state, such as ``FAILED``, ``FINISHED_EXCEPTION``, ``ABORTED``, or ``ABANDONED``.
- exit code ``2``: connection, authentication, or authorization failure prevented waiting.
- exit code ``3``: wait timeout.

This enables CI/CD-style chaining without parsing progress output:

.. code-block:: shell

   JOB=$(nvflare job submit -j ./my_job --format json | jq -r .data.job_id)
   nvflare job wait $JOB --format json && nvflare job download $JOB

Use ``nvflare job monitor`` when a human wants progress updates while waiting. It streams status lines to stderr and returns the final result when the job reaches a terminal state:

.. code-block:: shell

   nvflare job monitor
   nvflare job monitor --study cancer_research
   nvflare job monitor --timeout 3600 --format jsonl

Monitor options:

- ``job_id``: job ID to monitor.
- ``--timeout``: max seconds to wait; must be greater than or equal to ``0``. Default: ``0`` (no timeout).
- ``--interval``: poll interval in seconds; must be greater than ``0``. Default: ``2``.
- ``--study``: monitor a job in a named study. Use the same study name used at submission time. If omitted, the literal study name ``default`` is used.
- ``--stats-target``: where to fetch stats from. Choices: ``server``, ``client``, ``all``. Default: ``server``.
- ``--metric``: extra metric key to surface from stats. Repeatable.
- ``--schema``: print the command schema as JSON and exit.

``job monitor`` exit behavior matches ``job wait``:

- exit code ``0``: job finished successfully
- exit code ``1``: job reached a terminal failure state: ``FAILED``, ``FINISHED_EXCEPTION``, ``ABORTED``, or ``ABANDONED``
- exit code ``2``: connection, authentication, or authorization failure prevented monitoring
- exit code ``3``: monitor timeout

For automation that needs progress events, use ``--format jsonl``. Each stdout line is one complete JSON object. Progress events include ``terminal: false``; the final event always includes ``terminal: true``. Timeout emits a final event with ``status: "TIMEOUT"`` and exits with code ``3``. Successful terminal job statuses such as ``FINISHED_OK`` are normalized to ``status: "COMPLETED"`` and the raw server status is preserved in ``job_status``. Connection, authentication, and authorization failures emit a terminal error event with ``status: "error"`` and the specific code in ``error_code``.

Example JSONL terminal event:

.. code-block:: json

   {"schema_version":"1","event":"terminal","job_id":"abc123","status":"COMPLETED","job_status":"FINISHED_OK","terminal":true}

*********************
List and Inspect Jobs
*********************

List jobs currently known to the server:

.. code-block:: shell

   nvflare job list

Common list filters:

- ``-n, --name``: filter by job name prefix.
- ``-i, --id``: filter by job ID prefix.
- ``-r, --reverse``: reverse sort order.
- ``-m, --max``: maximum number of results to return.
- ``--study``: list jobs from a named study. If omitted, the literal study name ``default`` is used. Values such as ``all`` are passed through to the server unchanged.
- ``--submit-token``: find the job associated with a retry-safe submit token in the selected study. This is the recovery path after submitting with ``--submit-token``.
- ``--schema``: print the command schema as JSON and exit.

Retrieve metadata for a single job:

.. code-block:: shell

   nvflare job meta
   nvflare job meta --study cancer_research

Use metadata to inspect job identity, lifecycle fields, and server-reported status information after submission. Human output is grouped into a concise summary; use ``--format json`` to retrieve the full raw metadata envelope.

All job-ID lookup and control commands accept ``--study``. Use the same study name used at submission time. If omitted, the command searches the literal ``default`` study. If the job is not found, the error reports which study was searched and suggests retrying with ``--study``.

``nvflare job meta`` also supports ``--schema``.

******************************
Download, Clone, Abort, Delete
******************************

Download job results:

.. code-block:: shell

   nvflare job download -o ./downloads
   nvflare job download --study cancer_research -o ./downloads
   nvflare job download --study cancer_research --force

For automation, use JSON output:

.. code-block:: shell

   nvflare job download -o ./downloads --format json

The job must be in a terminal state before download. For a running job, wait first:

.. code-block:: shell

   nvflare job wait --study cancer_research
   nvflare job download --study cancer_research

The local destination defaults to ``./``. If that directory already exists, the command fails unless ``--force`` is specified. Use ``--force`` only when replacing the existing local download is intended.

Human output remains concise and prints only the final download location. Use ``--format json`` when agents or scripts need artifact discovery fields. The JSON success response reports local paths on the machine running the CLI:

.. code-block:: json

   {
     "schema_version": "1",
     "status": "ok",
     "exit_code": 0,
     "data": {
       "job_id": "abc123",
       "download_path": "/abs/path/downloads/abc123",
       "path": "/abs/path/downloads/abc123",
       "artifact_discovery": "completed",
       "artifacts": {
         "global_model": "/abs/path/downloads/abc123/workspace/FL_global_model.pt",
         "metrics_summary": "/abs/path/downloads/abc123/workspace/metrics_summary.json",
         "client_logs": {
           "site-1": "/abs/path/downloads/abc123/workspace/site-1/log.txt"
         }
       },
       "missing_artifacts": []
     }
   }

``download_path`` is the final local directory returned by the download API. ``path`` is a backward-compatible alias for ``download_path`` when present. ``artifacts`` contains local paths discovered under ``download_path``. Agents and scripts should use ``data.artifacts.*`` as the source of truth for consumable files instead of assuming a server workspace layout or constructing paths from ``download_path``.

``missing_artifacts`` lists expected categories, such as model, metrics, or client logs, that were not found locally. Missing artifacts do not make the command fail when the download itself succeeds. When ``artifact_discovery`` is ``skipped``, the CLI did not have a local directory to inspect, so ``artifacts`` and ``missing_artifacts`` are ``null`` instead of claiming that expected artifacts were verified absent.

The server download protocol is unchanged; artifact discovery is a local CLI post-processing step after the result has been downloaded.

Clone an existing job:

.. code-block:: shell

   nvflare job clone
   nvflare job clone --study cancer_research

``nvflare job clone`` clones the full server-side job for reuse. The current CLI surface takes the source ``job_id``, optional ``--study``, and ``--schema``. It returns ``source_job_id`` and ``new_job_id``. Use the returned ``new_job_id`` to monitor or manage the cloned job.

Abort a running job:

.. code-block:: shell

   nvflare job abort
   nvflare job abort --study cancer_research
   nvflare job abort --force

Delete a job:

.. code-block:: shell

   nvflare job delete
   nvflare job delete --study cancer_research
   nvflare job delete --force

Notes:

- ``abort`` and ``delete`` support ``--force`` to skip the confirmation prompt.
- ``abort`` and ``delete`` search the selected study. If omitted, ``default`` is used.
- ``delete --format json`` returns ``job_id`` and ``submit_records_marked_deleted``. When this count is nonzero, future use of the same submit token returns ``SUBMIT_TOKEN_JOB_DELETED``.
- ``download`` supports ``-o, --output-dir`` to choose the destination directory. Default: job-specific directory under the current working directory (``./``).
- ``clone``, ``download``, ``abort``, and ``delete`` all support ``--schema``.

**************
Observability
**************

Retrieve job logs from the server-side log store:

.. code-block:: shell

   nvflare job logs
   nvflare job logs --site site-1
   nvflare job logs --site all
   nvflare job logs --site all --tail 200
   nvflare job logs --site site-1 --since 2026-04-28T10:00:00
   nvflare job logs --site all --max-bytes 200000
   nvflare job logs --study cancer_research

``job logs`` accepts:

- ``--study``: retrieve logs for a job in a named study. If omitted, ``job logs`` searches the default study. Use the same study name used for ``job submit`` or ``job list``.
- ``--site server``: return the server job log. This is the default.
- ``--site ``: return that client's job log after it has been streamed to and stored by the server.
- ``--site all``: return the server log and all client logs currently available in the server-side log store. If a known job site does not have stored log content, the JSON response includes it under ``unavailable``.
- ``--sites`` is accepted as an alias for ``--site`` but still selects one target value.
- ``--tail N``: return at most the last N log lines per site.
- ``--since timestamp``: return timestamped log lines at or after the timestamp when line timestamps are parseable. Continuation lines following an included timestamped line are included.
- ``--max-bytes N``: return at most N UTF-8 bytes per site.
- ``job logs`` also supports ``--schema``.

If no explicit bound is provided, ``job logs`` returns at most the last 500 lines per site. JSON output includes ``logs_truncated``, per-site availability and line/byte counts under ``sites``, and the applied ``filters``. When any of ``--tail``, ``--since``, or ``--max-bytes`` is provided, the default 500-line tail is disabled and ``filters.default_tail_applied`` is ``false``. The explicit bounds are applied in this order: ``--since``, ``--tail``, then ``--max-bytes``.

The bound options are applied by the CLI after the server returns the stored log content. They bound the printed or JSON output from ``nvflare job logs``; they do not reduce the amount of log content requested from the server. If a large log is already limited by the server-side maximum response size before it reaches the CLI, these client-side bounds are applied to that returned content.

In normal human output mode, ``job logs`` prints the log text directly. With ``--site all``, each site is separated by a short header. Use ``--format json`` when a structured ``logs`` dictionary is needed for automation.

``job logs`` does not provide a built-in ``grep`` option. Pipe or post-process the returned content when text matching is needed.

Client logs are not fetched from client machines at command time. The command asks the server for logs that were already streamed to the server during the job. Streamed client logs are read from the server job workspace, where they are stored as ``/log.txt`` or ``/log.json`` depending on the configured log streamer; after the job workspace is archived, the same files are read from the stored job ``workspace`` artifact.

To enable client job log streaming in a portable job, add the job-level log streamer and receiver components to the job definition:

.. code-block:: python

   from nvflare.app_common.logging.job_log_receiver import JobLogReceiver
   from nvflare.app_common.logging.job_log_streamer import JobLogStreamer

   # Tails each client's job log.txt and streams it to the server.
   recipe.job.to_clients(JobLogStreamer())

   # Receives streamed log chunks on the server and stores them with the job.
   recipe.job.to_server(JobLogReceiver())

System-level logging configuration in ``resources.json.default`` is separate from this job-level opt-in. Some deployments may configure a server-side ``JobLogReceiver`` globally, but including both components in the job makes the job self-contained across POC and production deployments.

To stream structured JSON logs instead, configure the streamer with ``JobLogStreamer(log_file_name="log.json")``. ``nvflare job logs --format json`` uses ``log.json`` when available and falls back to ``log.txt`` otherwise. Human output prints readable text; if only ``log.json`` is available, the CLI renders the JSON log records as text for display.

The ``examples/hello-world/hello-log-streaming`` example shows this pattern.

Change logging configuration for a running job:

.. code-block:: shell

   nvflare job log-config DEBUG
   nvflare job log-config concise
   nvflare job log-config msg_only
   nvflare job log-config DEBUG --study cancer_research

``job log-config`` accepts:

- positional ``level``: ``DEBUG``, ``INFO``, ``WARNING``, ``ERROR``, ``CRITICAL``
- log modes: ``concise``, ``msg_only``, ``full``, ``verbose``, ``reload``
- ``--site``: target site name or ``all``. Default: ``all``; specifying ``--site all`` explicitly is equivalent to omitting it.
- ``--study``: study containing the job. If omitted, ``default`` is used.
- ``--schema``: print the command schema as JSON and exit

Show running job statistics:

.. code-block:: shell

   nvflare job stats
   nvflare job stats --study cancer_research

``job stats`` supports ``--study`` to select the study containing the job, and ``--site`` to target a specific site or ``all``. The default site is ``all``, so specifying ``--site all`` explicitly is equivalent to omitting it. It also supports ``--schema``.

***************************
Recipe-Based Job Creation
***************************

The recommended way to create a new job folder is through the Job Recipe API or an example ``job.py`` script that supports ``--export``:

.. code-block:: shell

   python job.py --export --export-dir /tmp/nvflare/hello-pt
   nvflare job submit -j /tmp/nvflare/hello-pt

To discover built-in recipes, use:

.. code-block:: shell

   nvflare recipe list

Deprecated commands:

- ``nvflare job create``: retained for compatibility. Prefer ``python job.py --export`` followed by ``nvflare job submit``.
- ``nvflare job list_templates``: use ``nvflare recipe list``.
- ``nvflare job show_variables``: use the Job Recipe API.

Current deprecation notes:

- ``nvflare job create`` still exposes template- and config-oriented arguments for legacy workflows.
- ``nvflare job list_templates`` and ``nvflare job show_variables`` remain available for backward compatibility but are not the preferred interfaces for recipe discovery or job-variable inspection.

*********************
JSON Output and Help
*********************

Add ``--format json`` anywhere after the subcommand for machine-readable output:

.. code-block:: shell

   nvflare job meta --format json

``--format json`` may be placed anywhere in the command after the subcommand name. stdout contains a single JSON envelope; human-readable progress and diagnostics go to stderr.

Use ``--schema`` for machine-readable command discovery. ``--schema`` always returns JSON regardless of ``--format``, so the flag is not needed with it:

.. code-block:: shell

   nvflare job submit --schema
   nvflare job wait --schema
   nvflare job monitor --schema

Schema fields such as ``mutating`` and ``idempotent`` describe the command as a whole, not the effective behavior of one invocation.
For example, ``job submit`` reports ``idempotent: false`` because plain submission can create duplicate jobs when retried after a timeout. It also reports ``retry_token.supported: true`` to show that ``--submit-token`` makes retries safe for identical job content in the same study by the same submitter. ``job list --submit-token`` is different: there ``--submit-token`` is only a lookup filter, so ``retry_token.supported`` remains ``false``.

Human-readable argument errors print command help first, followed by the specific error and hint. JSON mode prints only the JSON error envelope.
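
A script that consumes ``--format json`` output might handle the success and error envelopes along the following lines. This is a minimal sketch, not part of NVFlare: it assumes only the envelope fields that appear in the examples in this document (``status``, ``exit_code``, ``error_code``, ``data``), and ``parse_envelope`` and ``EnvelopeError`` are hypothetical helper names introduced for illustration.

.. code-block:: python

   import json


   class EnvelopeError(Exception):
       """Hypothetical exception for an error envelope (not an NVFlare class)."""

       def __init__(self, error_code, exit_code, data):
           super().__init__(f"{error_code} (exit code {exit_code})")
           self.error_code = error_code
           self.exit_code = exit_code
           self.data = data


   def parse_envelope(stdout_text):
       """Return the ``data`` field of a success envelope, or raise EnvelopeError.

       Assumes stdout holds exactly one JSON envelope, as described for JSON mode.
       """
       envelope = json.loads(stdout_text)
       if envelope.get("status") == "ok":
           return envelope.get("data")
       raise EnvelopeError(
           envelope.get("error_code"),
           envelope.get("exit_code"),
           envelope.get("data"),
       )


   # Example input copied from the submit success response shown earlier.
   ok = '{"schema_version": "1", "status": "ok", "exit_code": 0, "data": {"job_id": "abc123"}}'
   print(parse_envelope(ok)["job_id"])

Raising on any non-``ok`` status keeps the caller from treating an error envelope's ``data`` (for example the ``job_deleted`` record returned with ``SUBMIT_TOKEN_JOB_DELETED``) as a successful result.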