Live Log Streaming
FLARE can stream a job’s log files from each client to the server as the job
runs, so an operator can tail -f the server-side copy in real time.
Streamed logs are written under the server workspace and, when a job manager
is available, automatically attached to the job’s persisted artifacts.
This page describes how the feature works and how to enable it. To opt out at a particular site, see Controlling Live Log Streaming in Site Configuration Metadata.
Overview
The feature is a simple producer / consumer pair built on top of FLARE’s existing object-streaming machinery:
The producer runs inside the client’s job subprocess. It tails one or more log files (log.txt, error_log.txt, custom files) and pushes new bytes to the server as they are written.
The consumer runs in the server process. It opens a destination file per stream and writes incoming chunks directly, so the file is readable with tail -f while the job is still running. When the stream closes (normal end, abort, or idle timeout) the file is handed to the job manager for storage.
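The producer / consumer split can be illustrated with plain files. This is a minimal sketch, not FLARE’s implementation: read_new_bytes, append_chunk and demo are hypothetical helpers, and the real feature moves chunks over FLARE’s object-streaming layer rather than through a direct function call.

```python
import os
import tempfile


def read_new_bytes(path, offset):
    """Producer side: return (chunk, new_offset) for bytes written after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    return chunk, offset + len(chunk)


def append_chunk(dest_path, chunk):
    """Consumer side: append a chunk so the copy can be tailed while it grows."""
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(dest_path, "ab") as f:
        f.write(chunk)


def demo():
    """Write two log lines on the 'client' and mirror them to the 'server'."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "client", "log.txt")
        dst = os.path.join(tmp, "server", "log.txt")
        os.makedirs(os.path.dirname(src))
        offset = 0
        for line in (b"epoch 1\n", b"epoch 2\n"):
            with open(src, "ab") as f:
                f.write(line)
            chunk, offset = read_new_bytes(src, offset)
            append_chunk(dst, chunk)
        with open(dst, "rb") as f:
            return f.read()
```

Because the consumer opens the destination in append mode for every chunk, the server-side copy is always a prefix of the client’s log, which is what makes it safe to tail.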
A third widget, the site-level streamer (SiteLogStreamer), lives in the client’s
resources.json and saves users from declaring a streamer in every job
config: it auto-injects a JobLogStreamer into each job before launch.
Components
JobLogStreamer (client, in-job)
JobLogStreamer runs
inside the job subprocess (CLIENT_JOB). It belongs in the job-level
client configuration (config_fed_client.json):
{
  "components": [
    {
      "id": "log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {}
    }
  ]
}
To stream more than one log file, declare one component per file:
{
  "components": [
    {
      "id": "log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {"log_file_name": "log.txt"}
    },
    {
      "id": "error_log_streamer",
      "path": "nvflare.app_common.logging.job_log_streamer.JobLogStreamer",
      "args": {"log_file_name": "error_log.txt"}
    }
  ]
}
Constructor arguments
log_file_name (str, default "log.txt")
    Base name of the log file to stream. Must be a relative file name; absolute paths and .. traversal are rejected. The actual file is located by inspecting the active Python file handler and reusing its directory, so streaming works the same way under the simulator and in production without any workspace path arithmetic.
liveness_interval (float, default 10.0)
    Seconds between heartbeat messages when the log file has produced no new bytes. Must be strictly less than the receiver’s idle_timeout so heartbeats keep the stream alive during quiet periods.
poll_interval (float, default 0.5)
    Seconds between polls when the log file has no new data.
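The argument rules above can be expressed with the standard library alone. The sketch below is an illustration under assumptions, not the component’s actual code: the three function names are hypothetical, only the two rejections named above are checked, and an idle_timeout of 0 is treated as “no constraint” because the receiver documents 0 as disabling the timeout.

```python
import logging
import os


def is_safe_log_file_name(name):
    """True if name is relative and contains no '..' path segment."""
    if not name or os.path.isabs(name):
        return False
    parts = name.replace("\\", "/").split("/")
    return ".." not in parts


def heartbeats_keep_alive(liveness_interval, idle_timeout):
    """True if heartbeats arrive strictly more often than the idle timeout."""
    if idle_timeout <= 0:  # assumption: 0 disables the receiver's timeout
        return True
    return liveness_interval < idle_timeout


def find_log_dir(logger=None):
    """Return the directory of the first FileHandler on logger, or None."""
    if logger is None:
        logger = logging.getLogger()
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            # baseFilename is the absolute path the handler writes to
            return os.path.dirname(handler.baseFilename)
    return None
```

Deriving the directory from the handler rather than from a workspace path is what lets the same streamer configuration work under the simulator and in production.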
Lifecycle
The streamer fires on three events:
START_RUN: opens the stream and starts a daemon tailing thread.
ABOUT_TO_END_RUN: signals the streaming thread to drain and stop, but does not block, so post-event log lines still land in the file and are picked up by the drain.
END_RUN: joins the streaming thread (with a 60-second timeout) so the server has received the EOF before the client’s client_run returns.
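The three-event lifecycle maps naturally onto a daemon thread controlled by a stop event. The following is a rough sketch of that pattern only: TailWorker and its method names are hypothetical, and read_chunk and send stand in for the real file tailing and stream sending.

```python
import threading


class TailWorker:
    """Hypothetical stand-in for the streamer's tailing thread."""

    def __init__(self, read_chunk, send, poll_interval=0.5):
        self._read_chunk = read_chunk  # callable returning new bytes, or b""
        self._send = send              # callable that delivers one chunk
        self._poll_interval = poll_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def on_start_run(self):
        # START_RUN: begin tailing in the background
        self._thread.start()

    def on_about_to_end_run(self):
        # ABOUT_TO_END_RUN: signal only, never block the event handler
        self._stop.set()

    def on_end_run(self, timeout=60.0):
        # END_RUN: wait for the drain; returns False if the join timed out
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def _run(self):
        while not self._stop.is_set():
            chunk = self._read_chunk()
            if chunk:
                self._send(chunk)
            else:
                # Sleep until the next poll, waking early on the stop signal
                self._stop.wait(self._poll_interval)
        # Drain: forward anything still unread after the stop signal
        while True:
            chunk = self._read_chunk()
            if not chunk:
                break
            self._send(chunk)
```

Splitting “signal” from “join” across two events is the design point: lines logged between the two events are still delivered, because the drain loop runs after the stop signal is observed.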
JobLogReceiver (server)
JobLogReceiver opens a
destination file per incoming stream and writes chunks as they arrive. It can
be placed either in site-level resources so every job is covered, or
in job-level configuration to scope the receiver to a single job.
Site-level (resources.json on the server, recommended):
{
  "components": [
    {
      "id": "log_receiver",
      "path": "nvflare.app_common.logging.job_log_receiver.JobLogReceiver",
      "args": {}
    }
  ]
}
Job-level (config_fed_server.json) — declare the same component there if
you want the receiver to register only for that job. The widget keys off the
SYSTEM_START event in system mode and START_RUN in job mode, so the
underlying stream handler is registered exactly once.
In the Job API, you can attach a job-level receiver with:
job.to_server(JobLogReceiver())
Constructor arguments
dest_dir (str, default None)
    Directory where incoming log files are written. Defaults to the system temporary directory.
idle_timeout (float, default 30.0)
    Seconds without any message (data or heartbeat) before the receiver declares the sender dead and closes the stream. Set to 0 to disable.
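The idle_timeout rule amounts to tracking the time of the last message of any kind. A minimal sketch, assuming a hypothetical IdleWatch helper (not part of FLARE) with an injectable clock so the behavior can be checked without waiting:

```python
import time


class IdleWatch:
    """Tracks how long it has been since the last data or heartbeat message."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._last_seen = clock()

    def touch(self):
        """Record that a message (data or heartbeat) just arrived."""
        self._last_seen = self._clock()

    def expired(self):
        """True once the sender has been silent longer than idle_timeout."""
        if self.idle_timeout <= 0:  # 0 disables the check
            return False
        return self._clock() - self._last_seen > self.idle_timeout
```

With the defaults shown on this page (heartbeat every 10.0 seconds, timeout after 30.0), a healthy but quiet sender calls touch() about three times per timeout window, so expired() only becomes true when the sender has actually gone away.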
File layout
Each chunk is appended to:
{dest_dir}/{job_id}/{client_name}/{log_file_name}
so an operator can find logs while the job is still running. When the stream ends successfully, the file is handed to the job manager (if registered) for permanent storage; otherwise — for example under the simulator — it is moved into the job’s workspace run directory alongside the other artifacts.
If the stream ends with a non-OK return code (e.g. idle timeout), the partial file is retained at the path above and a warning is logged.
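For scripts that need to locate a streamed log, the layout above can be reproduced with a small helper. This is a sketch under assumptions: stream_dest_path is a hypothetical name, and the fallback to tempfile.gettempdir() mirrors the documented dest_dir default rather than quoting the receiver’s code.

```python
import os
import tempfile


def stream_dest_path(job_id, client_name, log_file_name, dest_dir=None):
    """Build {dest_dir}/{job_id}/{client_name}/{log_file_name}."""
    root = dest_dir if dest_dir is not None else tempfile.gettempdir()
    return os.path.join(root, job_id, client_name, log_file_name)
```

An operator could pass the result to tail -f while the job is running; the job ID and client name here are whatever the server recorded for the stream.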
SiteLogStreamer (client, site widget)
SiteLogStreamer is
a convenience widget that lives in the client’s resources.json and
removes the need to declare a streamer in every job:
{
  "components": [
    {
      "id": "site_log_streamer",
      "path": "nvflare.app_common.logging.site_log_streamer.SiteLogStreamer",
      "args": {}
    }
  ]
}
On BEFORE_JOB_LAUNCH (after the deployed job config has been written to
disk but before the subprocess starts), it reads the deployed
config_fed_client.json, and if no JobLogStreamer is already declared
it appends one with the configured arguments. The job subprocess then loads
the modified config and runs JobLogStreamer as if the user had declared
it explicitly.
When configured for error_log.txt, SiteLogStreamer also uploads a
post-mortem snapshot from the client parent process on JOB_COMPLETED.
This guarantees error-log delivery for failures that happen so early in the
job that JobLogStreamer never reaches START_RUN.
The constructor takes the same log_file_name, liveness_interval and
poll_interval arguments as JobLogStreamer; any non-default values
are forwarded to the injected component.
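The injection step can be pictured as a small transformation of the parsed config_fed_client.json. The sketch below is illustrative only: inject_streamer is a hypothetical function, detecting an existing streamer by its class path is an assumption, and so is the "log_streamer" id given to the injected component.

```python
import copy

STREAMER_PATH = "nvflare.app_common.logging.job_log_streamer.JobLogStreamer"


def inject_streamer(config, args=None):
    """Return a copy of config with a JobLogStreamer appended if none exists."""
    result = copy.deepcopy(config)
    components = result.setdefault("components", [])
    # Assumption: an existing streamer is recognized by its class path
    if any(c.get("path") == STREAMER_PATH for c in components):
        return result
    components.append(
        {
            "id": "log_streamer",  # assumed id, taken from the examples above
            "path": STREAMER_PATH,
            "args": dict(args or {}),  # only non-default values are forwarded
        }
    )
    return result
```

Because a job that already declares its own streamer is left untouched, a job-level configuration always takes precedence over the site-level defaults.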
Site control
Live log streaming is enabled by default. A site can opt out by setting
"allow_log_streaming": false in its resources.json; see
Controlling Live Log Streaming for the full description of how each component
behaves when streaming is disabled, including the server-side check that
logs an error if a chunk arrives from a site that has disabled it.
Wire protocol
Streaming uses FLARE’s LogStreamer over the
log_streaming channel with topic live_log. The stream context carries
the trusted client name and job ID derived from the peer FL context, so
filenames on disk reflect the actual sender — they cannot be spoofed by the
streaming client through the stream payload. Each chunk is a sequence of
log bytes; heartbeats are sent every liveness_interval seconds when no
bytes have been written.
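The sender’s choice between a data chunk and a heartbeat reduces to one decision per poll. A minimal sketch, assuming a hypothetical next_message helper and a made-up (kind, payload) tuple; the actual message encoding on the log_streaming channel is not described here.

```python
def next_message(chunk, now, last_sent, liveness_interval=10.0):
    """Decide what to send: ('data', bytes), ('heartbeat', b''), or None."""
    if chunk:
        # New log bytes always go out immediately
        return ("data", chunk)
    if now - last_sent >= liveness_interval:
        # Nothing new, but the receiver must hear from us before it times out
        return ("heartbeat", b"")
    return None
```

Here now and last_sent are timestamps in seconds from the same clock; the caller would update last_sent whenever a message of either kind is sent.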
Behavior under abort
The streaming thread runs in a fresh FL context whose abort signal is never
triggered, so an aborted job’s still-buffered log bytes can drain to the
server before the run actually shuts down. Graceful shutdown is signaled
exclusively via the streamer’s stop event, set in ABOUT_TO_END_RUN and
joined in END_RUN. If the join exceeds 60 seconds, a warning is logged
and the run continues to shut down — the server will see the stream close
with an idle-timeout return code rather than EOF, and will retain whatever
partial log has been written so far.