Recipe Metrics Artifacts
Built-in training aggregation recipes write standard metrics artifacts when the server workflow reports round-level aggregation metrics. These files make recipe results easier to consume from benchmark, reporting, and agent tooling without scraping server logs.
The artifacts are written under the server run directory:
metrics/
    metrics_summary.json
    round_metrics.jsonl
Recipes or workflows that do not report training aggregation metrics do not need
to create these files. This includes PSI, stats-only jobs, and standalone
cross-site validation. Cross-site validation continues to use its existing
cross_site_val/cross_val_results.json output.
Recipe Behavior
Users do not need to select this writer for supported built-in training aggregation recipes. The recipe setup installs it as part of the server configuration, and it writes files only when the workflow reports aggregation metrics.
This release does not expose a recipe argument to disable metrics artifacts. For custom jobs that should not write these artifacts, omit the metrics artifact writer from the server configuration.
Recorder Semantics
The metrics artifact writer is a recorder. It persists metrics and metadata that workflows, aggregators, and model selectors already produce:
- official aggregated metrics from the round aggregation result
- per-site metrics received from clients for each round
- official best metric metadata published by model-selection logic
- aggregation provenance, weights, and skipped values when available
It does not recompute metrics, select a best round, infer max/min policy, parse logs, or compute nonlinear metrics such as AUROC from pooled predictions.
Metric names are dynamic. Names such as auroc, accuracy, loss,
dice, rmse, or f1 are client or workflow metric keys, not hard-coded
schema fields.
Round numbers are recorded as provided by workflow metadata such as
AppConstants.CURRENT_ROUND or FLModel.current_round. They are 0-based by
default and are not renumbered by the writer.
metrics_summary.json
metrics_summary.json contains the final aggregated metrics from the last
completed metrics round and, when available, official best metric metadata from
the model selector.
Example:
{
  "schema_version": "1",
  "status": "metrics_reported",
  "job_name": "ames_fedavg",
  "metric_source": "client_reported_flmodel_metrics",
  "key_metric": {
    "name": "auroc",
    "mode": "max",
    "mode_source": "IntimeModelSelector.negate_key_metric"
  },
  "final_round": 2,
  "final_aggregated_metrics": [
    {
      "name": "auroc",
      "value": 0.7421
    },
    {
      "name": "train_loss",
      "value": 0.492
    }
  ],
  "best_round": 0,
  "best_metrics": [
    {
      "name": "auroc",
      "value": 0.7500010132169
    }
  ],
  "best_metric_source": "IntimeModelSelector",
  "best_metric_detail_source": "initial_metrics",
  "aggregation": {
    "method": "weighted_average",
    "weight_key": "NUM_STEPS_CURRENT_ROUND",
    "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
  },
  "round_metrics_file": "round_metrics.jsonl",
  "notes": [
    "Aggregated metrics are weighted averages of client-reported metric values.",
    "Nonlinear metrics are not recomputed from pooled predictions."
  ]
}
Best metric fields are optional. They are present only when a selector or workflow publishes explicit best-selection metadata. The writer does not infer a best round from metric values.
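A consumer of metrics_summary.json should treat the best metric fields as optional rather than assume they exist. The following is a minimal sketch of such a reader, assuming only the field names shown in the example above; the function name load_summary and the shape of the returned dictionary are illustrative choices, not part of the product.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_summary(path: Path) -> Dict[str, Any]:
    """Load metrics_summary.json and flatten its name/value lists.

    Hypothetical helper: only the JSON field names come from the
    documented example; everything else is an illustration.
    """
    with path.open("r", encoding="utf-8") as f:
        summary = json.load(f)

    final = {
        m["name"]: m["value"]
        for m in summary.get("final_aggregated_metrics", [])
    }

    # Best metric fields are optional. They are present only when a
    # selector or workflow published explicit best-selection metadata,
    # so no best round is inferred here when they are missing.
    best: Optional[Dict[str, Any]] = None
    if "best_round" in summary and "best_metrics" in summary:
        best = {
            "round": summary["best_round"],
            "metrics": {m["name"]: m["value"] for m in summary["best_metrics"]},
            "source": summary.get("best_metric_source"),
        }

    return {
        "final_round": summary.get("final_round"),
        "final": final,
        "best": best,
    }
```

When the best metric fields are absent, the "best" entry in the result is None, which mirrors the writer's behavior of not inferring a best round from metric values.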
round_metrics.jsonl
round_metrics.jsonl contains one JSON object per completed metrics round.
Each line records official aggregated metrics, per-site client metrics, optional
aggregation metadata, and skipped metric values.
Example line:
{
  "round": 0,
  "aggregated_metrics": [
    {
      "name": "auroc",
      "value": 0.7500010132169
    },
    {
      "name": "train_loss",
      "value": 0.4855
    }
  ],
  "sites": [
    {
      "name": "site-1",
      "metrics": [
        {
          "name": "train_loss",
          "value": 0.4707
        },
        {
          "name": "auroc",
          "value": 0.7380791446479046
        }
      ],
      "weight": 2911,
      "weight_key": "NUM_STEPS_CURRENT_ROUND"
    },
    {
      "name": "site-2",
      "metrics": [
        {
          "name": "train_loss",
          "value": 0.5003
        },
        {
          "name": "auroc",
          "value": 0.7619228817858955
        }
      ],
      "weight": 2911,
      "weight_key": "NUM_STEPS_CURRENT_ROUND"
    }
  ],
  "aggregation": {
    "method": "weighted_average",
    "weight_key": "NUM_STEPS_CURRENT_ROUND",
    "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
  },
  "skipped_metrics": [
    {
      "site": "site-1",
      "name": "debug_blob",
      "reason": "unsupported_type"
    }
  ]
}
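The aggregated values in the example line can be checked against the per-site values and weights. The sketch below shows one plausible reading of the weighted_average method with the finite_numeric_metrics_only_per_key_denominator policy: each metric key is averaged over only the sites that reported a finite numeric value for that key. This interpretation, the function names, and the choice to leave bool values out of the average are assumptions made for illustration; the writer itself records aggregation results and does not recompute them.

```python
import math
from typing import Any, Dict, List


def _is_finite_number(value: Any) -> bool:
    # bool is a subclass of int in Python; this sketch excludes bools
    # from averaging (an assumption, not documented behavior).
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and math.isfinite(value)


def weighted_average_per_key(sites: List[Dict[str, Any]]) -> Dict[str, float]:
    """Weighted average per metric name with a per-key denominator.

    A site contributes its weight to a key's denominator only if it
    reported a finite numeric value for that key.
    """
    sums: Dict[str, float] = {}
    denominators: Dict[str, float] = {}
    for site in sites:
        weight = site.get("weight")
        if not _is_finite_number(weight) or weight <= 0:
            continue
        for entry in site.get("metrics", []):
            name, value = entry["name"], entry["value"]
            if not _is_finite_number(value):
                continue
            sums[name] = sums.get(name, 0.0) + value * weight
            denominators[name] = denominators.get(name, 0.0) + weight
    return {name: sums[name] / denominators[name] for name in sums}
```

Applied to the two sites in the example, both with weight 2911, this reduces to a plain mean: (0.4707 + 0.5003) / 2 = 0.4855 for train_loss, and approximately 0.7500010132169 for auroc, matching the recorded aggregated_metrics.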
Dynamic metric names are stored as name values in arrays rather than as JSON
object keys. This avoids treating client-provided names as object structure in
downstream tools.
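Downstream tools that prefer dictionary lookups can convert the name/value arrays after parsing. The following sketch reads round_metrics.jsonl records line by line, assuming the field names shown in the example line; the helper names metrics_to_dict and iter_rounds are hypothetical, and rejecting duplicate names is a design choice of this sketch rather than documented writer behavior.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def metrics_to_dict(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert [{"name": ..., "value": ...}, ...] into {name: value}."""
    result: Dict[str, Any] = {}
    for entry in entries:
        name = entry["name"]
        if name in result:
            # Surface collisions instead of silently overwriting a value.
            raise ValueError(f"duplicate metric name: {name!r}")
        result[name] = entry["value"]
    return result


def iter_rounds(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one simplified record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield {
            "round": record["round"],  # as recorded; not renumbered here
            "aggregated": metrics_to_dict(record.get("aggregated_metrics", [])),
            "sites": {
                site["name"]: metrics_to_dict(site.get("metrics", []))
                for site in record.get("sites", [])
            },
            "skipped": record.get("skipped_metrics", []),
        }
```

Because iter_rounds accepts any iterable of strings, an open file object can be passed directly, for example: with open(path, encoding="utf-8") as f, then iterate over iter_rounds(f).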
Safe Metric Values
Clients are untrusted metric producers. The writer serializes only normalized JSON-safe scalar values and writes to fixed filenames under the server run directory.
Official aggregated metrics accept finite numeric values and bool values.
Per-site metrics accept finite numeric values, bool values, and bounded string
values. Unsupported objects, tensors, arrays, nested containers, oversized
values, NaN, and Infinity are skipped and reported in
skipped_metrics with a bounded reason record.
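The acceptance rules above can be illustrated with a small classifier. This is a sketch of the described policy, not the writer's actual implementation: the function name, the MAX_STRING_LENGTH bound, and the reason strings "non_finite" and "oversized" are all hypothetical. Only "unsupported_type" appears in the example output.

```python
import math
from typing import Any, Optional, Tuple

# Hypothetical bound; the real string limit is not stated in this document.
MAX_STRING_LENGTH = 256


def classify_metric_value(
    value: Any, allow_strings: bool
) -> Tuple[bool, Optional[str]]:
    """Return (accepted, skip_reason) for a candidate metric value.

    allow_strings=True models per-site metrics; allow_strings=False
    models official aggregated metrics, which accept only finite
    numeric values and bool values.
    """
    # Check bool first: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return True, None
    if isinstance(value, int):
        return True, None
    if isinstance(value, float):
        if math.isfinite(value):
            return True, None
        return False, "non_finite"  # NaN, Infinity, -Infinity
    if isinstance(value, str) and allow_strings:
        if len(value) <= MAX_STRING_LENGTH:
            return True, None
        return False, "oversized"
    # Tensors, arrays, nested containers, and other objects land here,
    # as do strings when they are not allowed.
    return False, "unsupported_type"
```

Returning a short reason code instead of echoing the rejected value keeps the skip record bounded regardless of what a client sends.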
Downloaded Artifacts
Metrics files are part of the normal downloaded job result when they exist. For automation, use the job download JSON output to find the downloaded local paths instead of constructing paths from the workspace layout:
nvflare job download <job_id> -o ./downloads --format json
Example response excerpt:
{
  "schema_version": "1",
  "status": "ok",
  "data": {
    "download_path": "/abs/path/downloads/abc123",
    "artifact_discovery": "completed",
    "artifacts": {
      "metrics_summary": "/abs/path/downloads/abc123/workspace/metrics/metrics_summary.json",
      "round_metrics": "/abs/path/downloads/abc123/workspace/metrics/round_metrics.jsonl"
    },
    "missing_artifacts": []
  }
}
metrics_summary and round_metrics are reported only when those files
exist in the downloaded result. round_metrics is optional because older jobs
and jobs without aggregation metrics do not create a per-round metrics file.
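For automation, the artifact paths can be read from the download response instead of being assembled by hand. The sketch below parses a response shaped like the excerpt above and treats both artifact entries as optional; the function name find_metrics_artifacts is hypothetical, and the code assumes the JSON response has already been captured as a string.

```python
import json
from typing import Dict, Optional


def find_metrics_artifacts(download_output: str) -> Dict[str, Optional[str]]:
    """Extract metrics artifact paths from the job download JSON output.

    Returns None for an artifact that the response does not list, which
    is expected for older jobs and jobs without aggregation metrics.
    """
    response = json.loads(download_output)
    if response.get("status") != "ok":
        raise RuntimeError(f"job download failed: status={response.get('status')!r}")

    artifacts = response.get("data", {}).get("artifacts", {})
    return {
        "metrics_summary": artifacts.get("metrics_summary"),
        "round_metrics": artifacts.get("round_metrics"),
    }
```

One way to obtain the input string is to run the nvflare job download command shown above with Python's subprocess.run, using capture_output=True and text=True, and pass its standard output to this function. That the JSON response is written to standard output is an assumption here.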