Live Log Streaming

FLARE can stream a job’s log files from each client to the server as the job runs, so an operator can tail -f the server-side copy in real time. Streamed logs are written under the server workspace and, when a job manager is available, automatically attached to the job’s persisted artifacts.

This page describes how the feature works and how to enable it. To opt out at a particular site, see Controlling Live Log Streaming in Site Configuration Metadata.

Overview

The feature is a simple producer / consumer pair built on top of FLARE’s existing object-streaming machinery:

  • The producer runs inside the client’s job subprocess. It tails one or more log files (log.txt, error_log.txt, custom files) and pushes new bytes to the server as they are written.

  • The consumer runs in the server process. It opens a destination file per stream and writes incoming chunks directly, so the file is readable with tail -f while the job is still running. When the stream ends successfully, the file is handed to the job manager for storage; if it ends with a non-OK return code (for example an idle timeout), the partial file is kept in place.

A third widget, SiteLogStreamer, lives in the client’s resources.json and saves users from declaring a streamer in every job config: it auto-injects a JobLogStreamer into each job before launch.

Components

JobLogStreamer (client, in-job)

JobLogStreamer runs inside the job subprocess (CLIENT_JOB). It belongs in the job-level client configuration (config_fed_client.json):

{
  "components": [
    {
      "id": "log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {}
    }
  ]
}

To stream more than one log file, declare one component per file:

{
  "components": [
    {
      "id": "log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {"log_file_name": "log.txt"}
    },
    {
      "id": "error_log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {"log_file_name": "error_log.txt"}
    }
  ]
}
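
When many files need streaming, the component list can be generated rather than written by hand. The following is a minimal sketch: streamer_components is a hypothetical helper, not part of FLARE, and it produces only the "components" fragment shown above, not a complete config_fed_client.json.

```python
import json

# Class path taken from the configuration examples above.
STREAMER_PATH = "nvflare.app_common.logging.job_log_streamer.JobLogStreamer"


def streamer_components(log_file_names):
    """Build one JobLogStreamer component entry per log file (hypothetical helper)."""
    components = []
    for name in log_file_names:
        # Derive a readable id from the file name, e.g. "error_log_streamer".
        stem = name.rsplit(".", 1)[0]
        components.append(
            {
                "id": f"{stem}_streamer",
                "path": STREAMER_PATH,
                "args": {"log_file_name": name},
            }
        )
    return {"components": components}


config = streamer_components(["log.txt", "error_log.txt"])
print(json.dumps(config, indent=2))
```

The generated ids match the hand-written example (log_streamer and error_log_streamer); two files with the same stem would need a different id scheme.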

Constructor arguments

log_file_name (str, default "log.txt")

Base name of the log file to stream. Must be a relative file name; absolute paths and .. traversal are rejected. The actual file is located by inspecting the active Python file handler and reusing its directory, so streaming works the same way under the simulator and in production without any workspace path arithmetic.
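
The two rules above (reject unsafe names, reuse the file handler’s directory) can be sketched with the standard library alone. Both function names are hypothetical, and the assumption that the handler of interest is a logging.FileHandler attached to the given logger is made only for illustration; the actual lookup in FLARE may differ.

```python
import logging
import os


def validate_log_file_name(name):
    """Reject absolute paths and '..' traversal (hypothetical helper)."""
    if not name or os.path.isabs(name):
        raise ValueError(f"log_file_name must be a relative file name: {name!r}")
    # Treat both separators as path separators so the check is platform-neutral.
    parts = name.replace("\\", "/").split("/")
    if ".." in parts:
        raise ValueError(f"log_file_name must not contain '..': {name!r}")
    return name


def find_log_dir(logger=None):
    """Return the directory of the first FileHandler on the logger, or None."""
    if logger is None:
        logger = logging.getLogger()  # root logger (assumption for this sketch)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            # baseFilename is the absolute path the handler writes to.
            return os.path.dirname(handler.baseFilename)
    return None
```

A caller would join find_log_dir() with the validated name to get the file to tail, which is why no workspace path needs to be computed.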

liveness_interval (float, default 10.0)

Seconds between heartbeat messages when the log file has produced no new bytes. Must be strictly less than the receiver’s idle_timeout so heartbeats keep the stream alive during quiet periods.

poll_interval (float, default 0.5)

Seconds between polls when the log file has no new data.
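
How poll_interval and liveness_interval interact is easiest to see in a tailing loop. This is a sketch under stated assumptions, not the JobLogStreamer implementation: send(kind, payload) is a hypothetical callable standing in for the stream transport, and the empty-payload heartbeat is an illustrative choice.

```python
import os
import tempfile
import threading
import time


def tail_file(path, send, stop_event, poll_interval=0.5, liveness_interval=10.0):
    """Push new bytes from `path` to `send`, with heartbeats during quiet periods."""
    last_sent = time.monotonic()
    with open(path, "rb") as f:
        while True:
            # Sample the stop flag before reading, so one full read always
            # happens after a stop request (the final drain).
            stopping = stop_event.is_set()
            chunk = f.read(64 * 1024)
            if chunk:
                send("data", chunk)
                last_sent = time.monotonic()
                continue  # keep reading until caught up with the writer
            if stopping:
                break
            if time.monotonic() - last_sent >= liveness_interval:
                send("heartbeat", b"")  # nothing new, but the sender is alive
                last_sent = time.monotonic()
            stop_event.wait(poll_interval)  # sleep, but wake early on stop


# Usage: tail a file that already has content, with stop requested up front,
# so the loop drains the file once and exits.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(log_path, "wb") as out:
    out.write(b"line 1\nline 2\n")

received = []
stop = threading.Event()
stop.set()
tail_file(log_path, lambda kind, payload: received.append((kind, payload)), stop)
print(received)
```

Waiting on the stop event instead of calling time.sleep lets a stop request interrupt the poll delay, so shutdown latency is not bounded by poll_interval.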

Lifecycle

The streamer reacts to three events:

  • START_RUN — opens the stream and starts a daemon tailing thread.

  • ABOUT_TO_END_RUN — signals the streaming thread to drain and stop, but does not block, so post-event log lines still land in the file and are picked up by the drain.

  • END_RUN — joins the streaming thread (with a 60-second timeout) so the server has received the EOF before the client’s client_run returns.
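
The three-event lifecycle above follows a common signal-then-join pattern. The sketch below illustrates it with a stand-in class; StreamerLifecycle is hypothetical, and the event names are plain strings used as placeholders rather than FLARE’s event constants.

```python
import threading

JOIN_TIMEOUT = 60.0  # seconds, matching the END_RUN timeout described above


class StreamerLifecycle:
    """Stand-in showing the three-event lifecycle; not the FLARE class."""

    def __init__(self, work):
        # `work(stop_event)` should return once stop is set and the log is drained.
        self._work = work
        self._stop = threading.Event()
        self._thread = None

    def handle_event(self, event_type):
        if event_type == "START_RUN":
            self._thread = threading.Thread(
                target=self._work, args=(self._stop,), daemon=True
            )
            self._thread.start()
        elif event_type == "ABOUT_TO_END_RUN":
            self._stop.set()  # signal only; do not block here
        elif event_type == "END_RUN" and self._thread is not None:
            self._thread.join(timeout=JOIN_TIMEOUT)
            if self._thread.is_alive():
                print("warning: streaming thread did not finish within timeout")


# Usage: the worker simply waits for the stop signal and records that it saw it.
finished = []


def worker(stop_event):
    stop_event.wait(5)
    finished.append(stop_event.is_set())


lifecycle = StreamerLifecycle(worker)
for event in ("START_RUN", "ABOUT_TO_END_RUN", "END_RUN"):
    lifecycle.handle_event(event)
print(finished)
```

Separating the signal from the join is what allows log lines written between the two events to be picked up by the final drain.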

JobLogReceiver (server)

JobLogReceiver opens a destination file per incoming stream and writes chunks as they arrive. It can be placed either in site-level resources so every job is covered, or in job-level configuration to scope the receiver to a single job.

Site-level (resources.json on the server, recommended):

{
  "components": [
    {
      "id": "log_receiver",
      "path": "nvflare.app_common.logging.job_log_receiver.JobLogReceiver",
      "args": {}
    }
  ]
}

Job-level (config_fed_server.json) — declare the same component there if you want the receiver to register only for that job. The widget keys off the SYSTEM_START event in system mode and START_RUN in job mode, so the underlying stream handler is registered exactly once.

In the Job API, you can attach a job-level receiver with:

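from nvflare.app_common.logging.job_log_receiver import JobLogReceiver
job.to_server(JobLogReceiver())  # job: an existing Job API object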

Constructor arguments

dest_dir (str, default None)

Directory where incoming log files are written. Defaults to the system temporary directory.

idle_timeout (float, default 30.0)

Seconds without any message (data or heartbeat) before the receiver declares the sender dead and closes the stream. Set to 0 to disable.
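
The relationship between idle_timeout and the sender’s liveness_interval can be expressed as two small checks. This is an illustrative sketch with hypothetical function names; whether the real receiver compares with > or >=, and how it treats negative values, is not specified here and is assumed.

```python
def is_idle(last_message_time, now, idle_timeout=30.0):
    """True when no data or heartbeat arrived within idle_timeout seconds.

    An idle_timeout of 0 disables the check (negative values are treated
    the same way in this sketch, which is an assumption).
    """
    if idle_timeout <= 0:
        return False
    return (now - last_message_time) > idle_timeout


def check_intervals(liveness_interval=10.0, idle_timeout=30.0):
    """Raise if heartbeats would be too sparse to keep the stream alive."""
    if idle_timeout > 0 and liveness_interval >= idle_timeout:
        raise ValueError("liveness_interval must be strictly less than idle_timeout")


check_intervals()  # the defaults (10.0 and 30.0) satisfy the constraint
print(is_idle(last_message_time=0.0, now=31.0))
```

With the defaults, a quiet but healthy sender produces a heartbeat every 10 seconds, so the 30-second timeout tolerates two consecutive missed heartbeats before the stream is closed.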

File layout

Each chunk is appended to:

{dest_dir}/{job_id}/{client_name}/{log_file_name}

so an operator can find logs while the job is still running. When the stream ends successfully, the file is handed to the job manager (if registered) for permanent storage; otherwise — for example under the simulator — it is moved into the job’s workspace run directory alongside the other artifacts.

If the stream ends with a non-OK return code (e.g. idle timeout), the partial file is retained at the path above and a warning is logged.
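
The layout above maps directly onto a path join plus an append-mode write. The helpers below are hypothetical and only illustrate the layout; in the real receiver the job ID and client name come from the trusted peer context, not from the caller.

```python
import os
import tempfile


def dest_path(job_id, client_name, log_file_name, dest_dir=None):
    """Build {dest_dir}/{job_id}/{client_name}/{log_file_name}."""
    base = dest_dir or tempfile.gettempdir()
    return os.path.join(base, job_id, client_name, log_file_name)


def append_chunk(path, chunk):
    """Append bytes; closing the file flushes them so `tail -f` can see them."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(chunk)


# Usage with placeholder identifiers.
root = tempfile.mkdtemp()
path = dest_path("job-123", "site-1", "log.txt", dest_dir=root)
append_chunk(path, b"first chunk\n")
append_chunk(path, b"second chunk\n")
print(path)
```

Opening in append mode for each chunk keeps the sketch short; a long-lived receiver would more likely keep one file handle open per stream and flush after each write.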

SiteLogStreamer (client, site widget)

SiteLogStreamer is a convenience widget that lives in the client’s resources.json and removes the need to declare a streamer in every job:

{
  "components": [
    {
      "id": "site_log_streamer",
      "path": "nvflare.app_common.logging.site_log_streamer.SiteLogStreamer",
      "args": {}
    }
  ]
}

On BEFORE_JOB_LAUNCH (after the deployed job config has been written to disk but before the subprocess starts), it reads the deployed config_fed_client.json, and if no JobLogStreamer is already declared it appends one with the configured arguments. The job subprocess then loads the modified config and runs JobLogStreamer as if the user had declared it explicitly.

When configured for error_log.txt, SiteLogStreamer also uploads a post-mortem snapshot from the client parent process on JOB_COMPLETED. This guarantees error-log delivery for failures that happen so early in the job that JobLogStreamer never reaches START_RUN.

The constructor takes the same log_file_name, liveness_interval and poll_interval arguments as JobLogStreamer; any non-default values are forwarded to the injected component.
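
The injection step can be sketched as a pure function over the parsed job config. This is not the SiteLogStreamer code: inject_streamer, the injected id, and the exact-path match used to detect an existing streamer are all assumptions, and the real widget reads and rewrites config_fed_client.json on disk.

```python
import copy

STREAMER_PATH = "nvflare.app_common.logging.job_log_streamer.JobLogStreamer"
# Defaults as documented for the JobLogStreamer constructor.
DEFAULTS = {"log_file_name": "log.txt", "liveness_interval": 10.0, "poll_interval": 0.5}


def inject_streamer(client_config, **configured):
    """Return a copy of the config with a JobLogStreamer appended if none is declared."""
    config = copy.deepcopy(client_config)
    components = config.setdefault("components", [])
    if any(c.get("path") == STREAMER_PATH for c in components):
        return config  # the job already declares one; leave it alone
    # Forward only arguments that differ from the defaults.
    args = {k: v for k, v in configured.items() if k in DEFAULTS and v != DEFAULTS[k]}
    components.append({"id": "log_streamer", "path": STREAMER_PATH, "args": args})
    return config


injected = inject_streamer({"components": []}, log_file_name="error_log.txt", poll_interval=0.5)
print(injected["components"][0]["args"])
```

Because the function is idempotent, running it again on an already-injected config leaves the component list unchanged.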

Site control

Live log streaming is enabled by default. A site can opt out by setting "allow_log_streaming": false in its resources.json. See Controlling Live Log Streaming for how each component behaves when streaming is disabled, including the server-side check that logs an error if a chunk arrives from a site that has disabled it.

Wire protocol

Streaming uses FLARE’s LogStreamer over the log_streaming channel with topic live_log. The stream context carries the trusted client name and job ID derived from the peer FL context, so filenames on disk reflect the actual sender — they cannot be spoofed by the streaming client through the stream payload. Each chunk is a sequence of log bytes; heartbeats are sent every liveness_interval seconds when no bytes have been written.

Behavior under abort

The streaming thread runs in a fresh FL context whose abort signal is never triggered, so an aborted job’s still-buffered log bytes can drain to the server before the run actually shuts down. Graceful shutdown is signaled exclusively through the streamer’s stop event, which is set in ABOUT_TO_END_RUN; the streaming thread is then joined in END_RUN. If the join exceeds 60 seconds, a warning is logged and the run continues to shut down. In that case the server sees the stream close with an idle-timeout return code rather than EOF, and retains whatever partial log has been written so far.