Recipe Metrics Artifacts

Built-in training aggregation recipes write standard metrics artifacts when the server workflow reports round-level aggregation metrics. These files make recipe results easier to consume from benchmark, reporting, and agent tooling without scraping server logs.

The artifacts are written under the server run directory:

metrics/
  metrics_summary.json
  round_metrics.jsonl

Recipes and workflows that do not report training aggregation metrics do not create these files. This includes PSI, stats-only jobs, and standalone cross-site validation. Cross-site validation continues to use its existing cross_site_val/cross_val_results.json output.

Recipe Behavior

Users do not need to select the metrics artifact writer for supported built-in training aggregation recipes. The recipe setup installs it as part of the server configuration, and it writes files only when the workflow reports aggregation metrics.

This release does not expose a recipe argument to disable metrics artifacts. For custom jobs that should not write these artifacts, omit the metrics artifact writer from the server configuration.

Recorder Semantics

The metrics artifact writer is a recorder. It persists metrics and metadata that workflows, aggregators, and model selectors already produce:

- official aggregated metrics from the round aggregation result
- per-site metrics received from clients for each round
- official best metric metadata published by model-selection logic
- aggregation provenance, weights, and skipped values when available

It does not recompute metrics, select a best round, infer a max/min policy, parse logs, or compute nonlinear metrics such as AUROC from pooled predictions.

Metric names are dynamic. Names such as auroc, accuracy, loss, dice, rmse, or f1 are client or workflow metric keys, not hard-coded schema fields.

Round numbers are recorded as provided by workflow metadata such as AppConstants.CURRENT_ROUND or FLModel.current_round. They are 0-based by default and are not renumbered by the writer.

`metrics_summary.json`

metrics_summary.json contains the final aggregated metrics from the last completed metrics round and, when available, official best metric metadata from the model selector.

Example:

{
  "schema_version": "1",
  "status": "metrics_reported",
  "job_name": "ames_fedavg",
  "metric_source": "client_reported_flmodel_metrics",
  "key_metric": {
    "name": "auroc",
    "mode": "max",
    "mode_source": "IntimeModelSelector.negate_key_metric"
  },
  "final_round": 2,
  "final_aggregated_metrics": [
    {
      "name": "auroc",
      "value": 0.7421
    },
    {
      "name": "train_loss",
      "value": 0.492
    }
  ],
  "best_round": 0,
  "best_metrics": [
    {
      "name": "auroc",
      "value": 0.7500010132169
    }
  ],
  "best_metric_source": "IntimeModelSelector",
  "best_metric_detail_source": "initial_metrics",
  "aggregation": {
    "method": "weighted_average",
    "weight_key": "NUM_STEPS_CURRENT_ROUND",
    "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
  },
  "round_metrics_file": "round_metrics.jsonl",
  "notes": [
    "Aggregated metrics are weighted averages of client-reported metric values.",
    "Nonlinear metrics are not recomputed from pooled predictions."
  ]
}

Best metric fields are optional. They are present only when a selector or workflow publishes explicit best-selection metadata. The writer does not infer a best round from metric values.
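Because the best metric fields are optional, a consumer should treat their absence as a normal case. The following is a minimal sketch of one way to read them, assuming the field names shown in the example above. The helper names best_key_metric and load_summary are hypothetical and are not part of any library API.

```python
import json
from pathlib import Path
from typing import Any, Optional


def best_key_metric(summary: dict[str, Any]) -> Optional[tuple[int, str, Any]]:
    """Return (best_round, metric_name, value), or None if no best metadata exists."""
    best_round = summary.get("best_round")
    best_metrics = summary.get("best_metrics")
    # Compare against None explicitly: round 0 is a valid best round.
    if best_round is None or not best_metrics:
        return None
    # Prefer the entry matching key_metric.name; fall back to the first entry.
    key_name = (summary.get("key_metric") or {}).get("name")
    for entry in best_metrics:
        if key_name is None or entry.get("name") == key_name:
            return best_round, entry["name"], entry["value"]
    return None


def load_summary(path: Path) -> dict[str, Any]:
    """Parse metrics_summary.json from the given path (hypothetical helper)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

The explicit None check matters because rounds are 0-based by default, so a truthiness test such as `if summary.get("best_round")` would wrongly discard a best round of 0.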

`round_metrics.jsonl`

round_metrics.jsonl contains one JSON object per completed metrics round. Each line records official aggregated metrics, per-site client metrics, optional aggregation metadata, and skipped metric values.

Example line (pretty-printed here for readability; in the file, each record occupies a single line):

{
  "round": 0,
  "aggregated_metrics": [
    {
      "name": "auroc",
      "value": 0.7500010132169
    },
    {
      "name": "train_loss",
      "value": 0.4855
    }
  ],
  "sites": [
    {
      "name": "site-1",
      "metrics": [
        {
          "name": "train_loss",
          "value": 0.4707
        },
        {
          "name": "auroc",
          "value": 0.7380791446479046
        }
      ],
      "weight": 2911,
      "weight_key": "NUM_STEPS_CURRENT_ROUND"
    },
    {
      "name": "site-2",
      "metrics": [
        {
          "name": "train_loss",
          "value": 0.5003
        },
        {
          "name": "auroc",
          "value": 0.7619228817858955
        }
      ],
      "weight": 2911,
      "weight_key": "NUM_STEPS_CURRENT_ROUND"
    }
  ],
  "aggregation": {
    "method": "weighted_average",
    "weight_key": "NUM_STEPS_CURRENT_ROUND",
    "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
  },
  "skipped_metrics": [
    {
      "site": "site-1",
      "name": "debug_blob",
      "reason": "unsupported_type"
    }
  ]
}
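The aggregated values in the example line can be checked against the per-site values. The sketch below recomputes a weighted average from a sites array. It rests on one reading of the policy name finite_numeric_metrics_only_per_key_denominator: that each metric is divided by the total weight of only those sites that reported a finite numeric value for it. That reading is an assumption, and the function is illustrative, not the aggregator's actual implementation.

```python
import math
from typing import Any


def _is_finite_number(value: Any) -> bool:
    # This sketch excludes bools, although bool is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def weighted_average(sites: list[dict[str, Any]]) -> dict[str, float]:
    """Recompute per-metric weighted averages from a round's `sites` array."""
    numerators: dict[str, float] = {}
    denominators: dict[str, float] = {}
    for site in sites:
        weight = site.get("weight")
        if not _is_finite_number(weight):
            continue
        for metric in site.get("metrics", []):
            value = metric.get("value")
            if not _is_finite_number(value):
                continue  # skipped values do not contribute to this key's denominator
            name = metric["name"]
            numerators[name] = numerators.get(name, 0.0) + weight * value
            denominators[name] = denominators.get(name, 0.0) + weight
    # Drop keys whose total weight is zero to avoid dividing by zero.
    return {n: numerators[n] / denominators[n] for n in numerators if denominators[n] > 0}
```

With the two sites from the example, both weighted 2911, train_loss works out to (0.4707 + 0.5003) / 2 = 0.4855, matching the recorded aggregated value.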

Dynamic metric names are stored as name values in arrays rather than as JSON object keys. This avoids treating client-provided names as object structure in downstream tools.
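A downstream tool that wants keyed access can convert the name/value arrays itself. The following is a minimal sketch, assuming the record layout shown in the example and one JSON object per physical line. The names metrics_to_dict and iter_rounds are hypothetical helpers.

```python
import json
from typing import Any, Iterable, Iterator


def metrics_to_dict(entries: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Convert [{"name": ..., "value": ...}, ...] into a plain dict."""
    result: dict[str, Any] = {}
    for entry in entries:
        name = entry.get("name")
        # Names originate from clients: accept only strings and
        # keep the first occurrence if a name is repeated.
        if isinstance(name, str) and name not in result:
            result[name] = entry.get("value")
    return result


def iter_rounds(lines: Iterable[str]) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (round, aggregated metrics) for each non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["round"], metrics_to_dict(record.get("aggregated_metrics", []))
```

Any iterable of strings works as input, so an open file handle for round_metrics.jsonl can be passed to iter_rounds directly.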

Safe Metric Values

Clients are untrusted metric producers. The writer serializes only normalized JSON-safe scalar values and writes to fixed filenames under the server run directory.

Official aggregated metrics accept finite numeric values and boolean values. Per-site metrics accept finite numeric values, boolean values, and bounded string values. Unsupported objects, tensors, arrays, nested containers, oversized values, NaN, and Infinity are skipped and reported in skipped_metrics with a bounded reason record.
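The acceptance rules above can be illustrated with a small classifier. This is a sketch under stated assumptions, not the writer's code: the 256-character string bound and the reason strings "non_finite" and "oversized" are hypothetical, and only "unsupported_type" appears in the example record.

```python
import math
from typing import Any, Optional

MAX_STRING_LENGTH = 256  # hypothetical bound; the writer's actual limit is not stated here


def skip_reason(value: Any, allow_strings: bool) -> Optional[str]:
    """Return None if the value is acceptable, else a short reason string.

    allow_strings is True for per-site metrics and False for aggregated metrics.
    """
    # Check bool first: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return None  # Python ints are always finite
    if isinstance(value, float):
        return None if math.isfinite(value) else "non_finite"  # hypothetical reason
    if isinstance(value, str) and allow_strings:
        return None if len(value) <= MAX_STRING_LENGTH else "oversized"  # hypothetical reason
    # Lists, dicts, tensors, and any other objects land here.
    return "unsupported_type"
```

Returning a short fixed reason instead of echoing the rejected value keeps the skipped_metrics record bounded even when a client sends a very large object.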

Downloaded Artifacts

Metrics files are part of the normal downloaded job result when they exist. For automation, use the job download JSON output to find the downloaded local paths instead of constructing paths from the workspace layout:

nvflare job download <job_id> -o ./downloads --format json

Example response excerpt:

{
  "schema_version": "1",
  "status": "ok",
  "data": {
    "download_path": "/abs/path/downloads/abc123",
    "artifact_discovery": "completed",
    "artifacts": {
      "metrics_summary": "/abs/path/downloads/abc123/workspace/metrics/metrics_summary.json",
      "round_metrics": "/abs/path/downloads/abc123/workspace/metrics/round_metrics.jsonl"
    },
    "missing_artifacts": []
  }
}

metrics_summary and round_metrics are reported only when those files exist in the downloaded result. round_metrics is optional because older jobs and jobs without aggregation metrics do not create a per-round metrics file.
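For automation, the response can be parsed to locate whichever metrics files were downloaded. The following is a minimal sketch, assuming the response shape shown in the excerpt above and assuming the CLI prints only the JSON document on standard output. The function names artifact_paths and download_and_locate are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional


def artifact_paths(response_text: str) -> dict[str, Optional[Path]]:
    """Map each metrics artifact name to its local path, or None if not reported."""
    response = json.loads(response_text)
    if response.get("status") != "ok":
        raise RuntimeError(f"job download failed: status={response.get('status')!r}")
    artifacts = (response.get("data") or {}).get("artifacts") or {}
    return {
        name: Path(artifacts[name]) if name in artifacts else None
        for name in ("metrics_summary", "round_metrics")
    }


def download_and_locate(job_id: str, out_dir: str = "./downloads") -> dict[str, Optional[Path]]:
    """Run the download command from this section and parse its JSON output."""
    # Assumption: stdout contains only the JSON response.
    completed = subprocess.run(
        ["nvflare", "job", "download", job_id, "-o", out_dir, "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return artifact_paths(completed.stdout)
```

A None entry means the artifact was not reported for that job, which is expected for older jobs and for jobs without aggregation metrics.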