.. _recipe_metrics_artifacts:

Recipe Metrics Artifacts
========================

Built-in training aggregation recipes write standard metrics artifacts when the
server workflow reports round-level aggregation metrics. These files make recipe
results easier to consume from benchmark, reporting, and agent tooling without
scraping server logs.

The artifacts are written under the server run directory:

.. code-block:: text

   metrics/
     metrics_summary.json
     round_metrics.jsonl

Recipes or workflows that do not report training aggregation metrics do not need
to create these files. This includes PSI, stats-only jobs, and standalone
cross-site validation. Cross-site validation continues to use its existing
``cross_site_val/cross_val_results.json`` output.

Recipe Behavior
---------------

Users do not need to select the metrics artifact writer for supported built-in
training aggregation recipes. The recipe setup installs it as part of the server
configuration, and it writes files only when the workflow reports aggregation
metrics.

This release does not expose a recipe argument to disable metrics artifacts.
For custom jobs that should not write these artifacts, omit the metrics artifact
writer from the server configuration.

Recorder Semantics
------------------

The metrics artifact writer is a recorder. It persists metrics and metadata that
workflows, aggregators, and model selectors already produce:

* official aggregated metrics from the round aggregation result
* per-site metrics received from clients for each round
* official best metric metadata published by model-selection logic
* aggregation provenance, weights, and skipped values when available

It does not recompute metrics, select a best round, infer a max/min policy,
parse logs, or compute nonlinear metrics such as AUROC from pooled predictions.

Metric names are dynamic. Names such as ``auroc``, ``accuracy``, ``loss``,
``dice``, ``rmse``, or ``f1`` are client or workflow metric keys, not hard-coded
schema fields.
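
Because metric names are dynamic, a consumer should discover them from the data
rather than hard-code them. The sketch below is a minimal illustration, assuming
the ``name``/``value`` array layout and the ``key_metric`` and
``final_aggregated_metrics`` fields shown in the ``metrics_summary.json``
example; ``metrics_to_dict`` and ``key_metric_value`` are hypothetical helpers,
not NVFlare APIs.

.. code-block:: python

   import json
   from typing import Any


   def metrics_to_dict(entries: list[dict[str, Any]]) -> dict[str, Any]:
       """Map dynamic metric names to values (hypothetical helper).

       Assumes each entry is a {"name": ..., "value": ...} object. If a name
       repeats, the last entry wins.
       """
       return {entry["name"]: entry["value"] for entry in entries}


   def key_metric_value(summary: dict[str, Any]) -> Any:
       """Return the final aggregated value of the key metric, or None if absent."""
       key = summary.get("key_metric", {}).get("name")
       if key is None:
           return None
       return metrics_to_dict(summary.get("final_aggregated_metrics", [])).get(key)


   # Illustrative excerpt of a metrics_summary.json document.
   summary = json.loads("""
   {
     "key_metric": {"name": "auroc", "mode": "max"},
     "final_aggregated_metrics": [
       {"name": "auroc", "value": 0.7421},
       {"name": "train_loss", "value": 0.492}
     ]
   }
   """)

   print(sorted(metrics_to_dict(summary["final_aggregated_metrics"])))  # ['auroc', 'train_loss']
   print(key_metric_value(summary))  # 0.7421

Keying the lookup on ``key_metric.name`` lets the same tooling handle ``auroc``,
``dice``, or any other metric a workflow reports.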
Round numbers are recorded as provided by workflow metadata such as
``AppConstants.CURRENT_ROUND`` or ``FLModel.current_round``. They are 0-based by
default and are not renumbered by the writer.

``metrics_summary.json``
------------------------

``metrics_summary.json`` contains the final aggregated metrics from the last
completed metrics round and, when available, official best metric metadata from
the model selector.

Example:

.. code-block:: json

   {
     "schema_version": "1",
     "status": "metrics_reported",
     "job_name": "ames_fedavg",
     "metric_source": "client_reported_flmodel_metrics",
     "key_metric": {
       "name": "auroc",
       "mode": "max",
       "mode_source": "IntimeModelSelector.negate_key_metric"
     },
     "final_round": 2,
     "final_aggregated_metrics": [
       { "name": "auroc", "value": 0.7421 },
       { "name": "train_loss", "value": 0.492 }
     ],
     "best_round": 0,
     "best_metrics": [
       { "name": "auroc", "value": 0.7500010132169 }
     ],
     "best_metric_source": "IntimeModelSelector",
     "best_metric_detail_source": "initial_metrics",
     "aggregation": {
       "method": "weighted_average",
       "weight_key": "NUM_STEPS_CURRENT_ROUND",
       "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
     },
     "round_metrics_file": "round_metrics.jsonl",
     "notes": [
       "Aggregated metrics are weighted averages of client-reported metric values.",
       "Nonlinear metrics are not recomputed from pooled predictions."
     ]
   }

Best metric fields are optional. They are present only when a selector or
workflow publishes explicit best-selection metadata. The writer does not infer a
best round from metric values.

``round_metrics.jsonl``
-----------------------

``round_metrics.jsonl`` contains one JSON object per completed metrics round.
Each line records official aggregated metrics, per-site client metrics, optional
aggregation metadata, and skipped metric values.

Example line, pretty-printed for readability (in the file, each object occupies
a single line):

.. code-block:: json

   {
     "round": 0,
     "aggregated_metrics": [
       { "name": "auroc", "value": 0.7500010132169 },
       { "name": "train_loss", "value": 0.4855 }
     ],
     "sites": [
       {
         "name": "site-1",
         "metrics": [
           { "name": "train_loss", "value": 0.4707 },
           { "name": "auroc", "value": 0.7380791446479046 }
         ],
         "weight": 2911,
         "weight_key": "NUM_STEPS_CURRENT_ROUND"
       },
       {
         "name": "site-2",
         "metrics": [
           { "name": "train_loss", "value": 0.5003 },
           { "name": "auroc", "value": 0.7619228817858955 }
         ],
         "weight": 2911,
         "weight_key": "NUM_STEPS_CURRENT_ROUND"
       }
     ],
     "aggregation": {
       "method": "weighted_average",
       "weight_key": "NUM_STEPS_CURRENT_ROUND",
       "metric_policy": "finite_numeric_metrics_only_per_key_denominator"
     },
     "skipped_metrics": [
       { "site": "site-1", "name": "debug_blob", "reason": "unsupported_type" }
     ]
   }

In this example both sites report the same weight, so each aggregated value
equals the plain mean of the two site values; for instance, ``train_loss`` is
(0.4707 + 0.5003) / 2 = 0.4855.

Dynamic metric names are stored as ``name`` values in arrays rather than as JSON
object keys. This avoids treating client-provided names as object structure in
downstream tools.

Safe Metric Values
------------------

Clients are untrusted metric producers. The writer serializes only normalized
JSON-safe scalar values and writes to fixed filenames under the server run
directory.

For official aggregated metrics, the writer accepts finite numeric values and
bool values. For per-site metrics, it accepts finite numeric values, bool
values, and bounded string values. Unsupported objects, tensors, arrays, nested
containers, oversized values, ``NaN``, and ``Infinity`` are skipped and reported
in ``skipped_metrics`` with a bounded reason record.

Downloaded Artifacts
--------------------

Metrics files are part of the normal downloaded job result when they exist. For
automation, use the job download JSON output to find the downloaded local paths
instead of constructing paths from the workspace layout:

.. code-block:: shell

   nvflare job download -o ./downloads --format json

Example response excerpt:

.. code-block:: json

   {
     "schema_version": "1",
     "status": "ok",
     "data": {
       "download_path": "/abs/path/downloads/abc123",
       "artifact_discovery": "completed",
       "artifacts": {
         "metrics_summary": "/abs/path/downloads/abc123/workspace/metrics/metrics_summary.json",
         "round_metrics": "/abs/path/downloads/abc123/workspace/metrics/round_metrics.jsonl"
       },
       "missing_artifacts": []
     }
   }

``metrics_summary`` and ``round_metrics`` are reported only when those files
exist in the downloaded result. ``round_metrics`` is optional because older jobs
and jobs without aggregation metrics do not create a per-round metrics file.
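
A tool can consume the download response directly instead of guessing paths.
The following is a minimal sketch, assuming the JSON response has been captured
as a string (for example, from the command's standard output) and has the shape
of the excerpt above. ``load_metrics_artifacts`` and ``metric_by_round`` are
hypothetical helpers, and the demo writes an illustrative
``round_metrics.jsonl`` to a temporary directory rather than using a real
download.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path
   from typing import Any


   def load_metrics_artifacts(download_response: str) -> dict[str, Any]:
       """Load the metrics artifacts listed in a job download JSON response.

       Treats each artifact key as optional, since a key is reported only
       when the corresponding file exists.
       """
       artifacts = json.loads(download_response).get("data", {}).get("artifacts", {})
       result: dict[str, Any] = {"summary": None, "rounds": []}

       summary_path = artifacts.get("metrics_summary")
       if summary_path:
           result["summary"] = json.loads(Path(summary_path).read_text(encoding="utf-8"))

       rounds_path = artifacts.get("round_metrics")
       if rounds_path:  # absent for older jobs and jobs without aggregation metrics
           with open(rounds_path, encoding="utf-8") as f:
               # JSON Lines: one object per non-empty line.
               result["rounds"] = [json.loads(line) for line in f if line.strip()]
       return result


   def metric_by_round(rounds: list[dict[str, Any]], name: str) -> list[tuple[int, Any]]:
       """Return (round, value) pairs for one aggregated metric, skipping rounds without it."""
       series = []
       for record in rounds:
           values = {m["name"]: m["value"] for m in record.get("aggregated_metrics", [])}
           if name in values:
               series.append((record["round"], values[name]))
       return series


   with tempfile.TemporaryDirectory() as tmp:
       # Illustrative round records, not real results.
       rounds_file = Path(tmp) / "round_metrics.jsonl"
       rounds_file.write_text(
           "\n".join(
               json.dumps({"round": r, "aggregated_metrics": [{"name": "auroc", "value": v}]})
               for r, v in [(0, 0.75), (1, 0.74)]
           )
           + "\n",
           encoding="utf-8",
       )
       # No "metrics_summary" key: in this hypothetical download, that file is absent.
       response = json.dumps(
           {"status": "ok", "data": {"artifacts": {"round_metrics": str(rounds_file)}}}
       )
       loaded = load_metrics_artifacts(response)

   print(loaded["summary"])  # None
   print(metric_by_round(loaded["rounds"], "auroc"))  # [(0, 0.75), (1, 0.74)]

Reading paths from ``data.artifacts`` keeps the tool independent of the
workspace layout, and checking each key separately handles jobs that produced
only some of the files.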