nvflare.recipe.utils module

add_cross_site_evaluation(recipe: Recipe, submit_model_timeout: int = 600, validation_timeout: int = 6000, participating_clients: List[str] | None = None)

Add cross-site evaluation to an existing recipe.

This utility automatically configures cross-site evaluation by:

  • Auto-detecting the framework from the recipe

  • Adding the appropriate model locator

  • Adding the CrossSiteModelEval controller

  • Adding ValidationJsonGenerator for results

  • Auto-adding the appropriate validator to clients (for NumPy recipes)

For standalone CSE without training, use NumpyCrossSiteEvalRecipe instead.

Note: This utility is designed for adding CSE to training recipes. If you call it on a CSE-only recipe (e.g., NumpyCrossSiteEvalRecipe), it will detect this and skip adding duplicate validators automatically.

WARNING: Do not call this function more than once on the same recipe instance. The function is not idempotent: a second call on the same recipe raises a RuntimeError to prevent duplicate component registration.
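The one-shot guard described in the warning can be pictured as a marker flag on the recipe. This is a minimal sketch, not NVFlare's implementation: `DemoRecipe`, `add_cse_once`, and the `_cse_added` marker are hypothetical names, and the registered component names are placeholders.

```python
class DemoRecipe:
    """Hypothetical stand-in for a Recipe; it only records registered components."""

    def __init__(self) -> None:
        self.components: list[str] = []


def add_cse_once(recipe: DemoRecipe) -> None:
    """Register placeholder CSE components, refusing a second call on the same recipe.

    The `_cse_added` marker attribute is an assumption made for this sketch.
    """
    if getattr(recipe, "_cse_added", False):
        raise RuntimeError("Cross-site evaluation has already been added to this recipe.")
    recipe.components += ["model_locator", "cse_controller", "json_generator"]
    recipe._cse_added = True


recipe = DemoRecipe()
add_cse_once(recipe)
try:
    add_cse_once(recipe)  # the second call is rejected before registering anything
except RuntimeError as err:
    print(err)
print(recipe.components)  # components were registered exactly once
```

A guard that raises, rather than silently returning, makes an accidental double call visible at job-construction time instead of producing a job with duplicated components.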

IMPORTANT for PyTorch: Your client training script must handle validation tasks by checking flare.is_evaluate() and returning metrics without training. Example pattern:

```python
import nvflare.client as flare

# In your client script (model, evaluate, and test_loader are defined by you):
flare.init()
while flare.is_running():
    input_model = flare.receive()
    model.load_state_dict(input_model.params)

    # Evaluate model (always required); evaluate() should return a dict of metrics
    metrics = evaluate(model, test_loader)

    # Handle CSE validation task
    if flare.is_evaluate():
        output_model = flare.FLModel(metrics=metrics)
        flare.send(output_model)
        continue  # Skip training for validation-only tasks

    # Normal training code here...
```

Example (NumPy - fully automatic):

```python
from nvflare.app_common.np.recipes import NumpyFedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = NumpyFedAvgRecipe(
    name="my-job",
    model=[1.0, 2.0, 3.0],
    min_clients=2,
    num_rounds=3,
    train_script="client.py",
)

# That's it! Framework auto-detected, validator auto-added
add_cross_site_evaluation(recipe)
```

Example (PyTorch - requires client script support):

```python
from nvflare.app_opt.pt.recipes import FedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = FedAvgRecipe(
    name="my-job",
    min_clients=2,
    num_rounds=3,
    model=MyModel(),  # MyModel is your own torch.nn.Module subclass
    train_script="client.py",
)

# Note: client.py must handle flare.is_evaluate() for validation
add_cross_site_evaluation(recipe)
```

Example (TensorFlow - Client API pattern, recommended):

```python
from nvflare.app_opt.tf.recipes import FedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = FedAvgRecipe(
    name="my-job",
    min_clients=2,
    num_rounds=3,
    model=MyTFModel(),  # MyTFModel is your own TensorFlow/Keras model class
    train_script="client.py",
)

# Note: client.py must handle flare.is_evaluate() for validation
add_cross_site_evaluation(recipe)
```

Example (TensorFlow - Component-based alternative):

```python
from nvflare.app_opt.tf.recipes import FedAvgRecipe
from nvflare.app_opt.tf.tf_validator import TFValidator
from nvflare.recipe.utils import add_cross_site_evaluation

# MyTFModel, my_model, and test_loader are your own model class, model instance, and validation data
recipe = FedAvgRecipe(
    name="my-job",
    min_clients=2,
    num_rounds=3,
    model=MyTFModel(),
    train_script="client.py",
)

add_cross_site_evaluation(recipe)

# Optional: manually add TFValidator for component-based validation
validator = TFValidator(model=my_model, data_loader=test_loader)
recipe.job.to_clients(validator, tasks=["validate"])
```

Parameters:
  • recipe – Recipe instance to augment with cross-site evaluation.

  • submit_model_timeout – Timeout (seconds) for submitting models to clients. Defaults to 600.

  • validation_timeout – Timeout (seconds) for validation tasks on clients. Defaults to 6000.

  • participating_clients – Optional list of client names to include in cross-site evaluation. If not provided, all clients connected at controller start are used.

Raises:
  • ValueError – If the recipe doesn’t have a framework attribute or uses an unsupported framework.

  • RuntimeError – If cross-site evaluation has already been added to this recipe.

Note

  • Currently supports PyTorch, NumPy, and TensorFlow frameworks.

  • NumPy recipes using `NumpyFedAvgRecipe`: Validators (NPValidator) are automatically added to clients to handle validation tasks. The function skips this step if an executor handling TASK_VALIDATION is already configured, which avoids duplicate validators in CSE-only recipes (like NumpyCrossSiteEvalRecipe).

  • Unified `FedAvgRecipe` with `framework=FrameworkType.NUMPY`: Uses the same Client API validation pattern as PyTorch and TensorFlow. Your client script should handle flare.is_evaluate() and return metrics for validation tasks.

  • PyTorch recipes: No separate validator component is needed. The client training script handles validation tasks through the Client API’s flare.is_evaluate() check. See the hello-pt example for the implementation pattern.

  • TensorFlow recipes: Like PyTorch recipes, these use the Client API pattern. The client script should handle validation tasks via the flare.is_evaluate() check.
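The notes above amount to a small decision table for how clients answer validation tasks. The sketch below encodes it as a plain function; `validation_strategy`, its boolean inputs, and the use of lowercase strings in place of `FrameworkType` values are assumptions for illustration, not the library's internals.

```python
from typing import Optional

# Assumption: plain strings stand in for FrameworkType members.
SUPPORTED = {"pytorch", "numpy", "tensorflow"}


def validation_strategy(
    framework: Optional[str],
    is_numpy_fedavg_recipe: bool,
    has_validation_executor: bool,
) -> str:
    """Decide how clients will answer CSE validation tasks (illustrative only)."""
    if framework is None:
        raise ValueError("Recipe has no framework attribute.")
    if framework not in SUPPORTED:
        raise ValueError(f"Unsupported framework: {framework}")
    if framework == "numpy" and is_numpy_fedavg_recipe:
        # NumpyFedAvgRecipe path: add NPValidator unless a TASK_VALIDATION executor exists
        return "existing validator" if has_validation_executor else "add NPValidator"
    # PyTorch, TensorFlow, and unified FedAvgRecipe(framework=NUMPY):
    # the client script handles validation via flare.is_evaluate()
    return "client API"


print(validation_strategy("numpy", True, False))
print(validation_strategy("numpy", True, True))
print(validation_strategy("pytorch", False, False))
```

Only the `NumpyFedAvgRecipe` branch depends on what is already configured; every other supported case defers to the client script.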

add_experiment_tracking(recipe: Recipe, tracking_type: str, tracking_config: dict | None = None, client_side: bool = False, server_side: bool = True)

Add experiment tracking to a recipe.

Adds tracking receivers to the server and/or clients to collect and log metrics during training.

Parameters:
  • recipe – Recipe instance to augment with experiment tracking.

  • tracking_type – Type of tracking to enable (“mlflow”, “tensorboard”, or “wandb”).

  • tracking_config – Optional configuration dict for the tracking receiver.

  • client_side – If True, add tracking to all clients (each client tracks locally). Default: False.

  • server_side – If True, add tracking to server (aggregates metrics from all clients). Default: True.

Examples

```python
# Server-side tracking (default: federated metrics)
add_experiment_tracking(recipe, "mlflow", {"tracking_uri": "..."})

# Client-side tracking only (each client tracks independently)
add_experiment_tracking(recipe, "mlflow", {...}, client_side=True, server_side=False)

# Both server and client tracking
add_experiment_tracking(recipe, "mlflow", {...}, client_side=True, server_side=True)
```
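How the `client_side` and `server_side` flags combine can be summarized with a hypothetical helper, `tracking_targets`, that only reports where receivers would be attached. It does not create real receivers, and its eager rejection of unknown tracker names is an assumption rather than documented behavior.

```python
from typing import List

SUPPORTED_TRACKERS = ("mlflow", "tensorboard", "wandb")


def tracking_targets(tracking_type: str, client_side: bool = False, server_side: bool = True) -> List[str]:
    """Return where tracking receivers would be attached (illustrative only)."""
    if tracking_type not in SUPPORTED_TRACKERS:
        # Assumption: this sketch rejects unknown tracker names eagerly.
        raise ValueError(f"Unsupported tracking_type: {tracking_type}")
    targets = []
    if server_side:
        targets.append("server")  # aggregates metrics streamed from all clients
    if client_side:
        targets.append("clients")  # each client logs its own metrics locally
    return targets


print(tracking_targets("mlflow"))
print(tracking_targets("mlflow", client_side=True, server_side=False))
print(tracking_targets("tensorboard", client_side=True))
```

The defaults mirror the signature: with no flags passed, only the server receives a tracking receiver.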

collect_non_local_scripts(job: FedJob) → List[str]

Collect scripts that don’t exist locally.

This utility function is used by ExecEnv subclasses to validate script resources before deployment. Scripts are considered “non-local” if they are absolute paths that don’t exist on the local machine.

Parameters:

job – The FedJob to check for non-local scripts.

Returns:

List of absolute script paths that don’t exist on the local machine.
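The "non-local" rule reduces to a filter over script paths. In this sketch, `find_non_local` is a hypothetical helper that assumes the script paths have already been collected from the job (how the real function walks a FedJob is not shown), and `/opt/remote_only/train.py` is a placeholder path assumed not to exist.

```python
import os
import tempfile
from typing import Iterable, List


def find_non_local(scripts: Iterable[str]) -> List[str]:
    """Return absolute script paths that do not exist on this machine (illustrative only)."""
    # Relative paths are treated as local files and are never reported here.
    return [s for s in scripts if os.path.isabs(s) and not os.path.exists(s)]


with tempfile.NamedTemporaryFile(suffix=".py") as existing:
    scripts = ["client.py", "/opt/remote_only/train.py", existing.name]
    print(find_non_local(scripts))  # only the missing absolute path is reported
```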

ensure_config_type_dict(config: Dict[str, Any] | None) → Dict[str, Any] | None

Ensure a component config dict has config_type ‘dict’ and is normalized for the config layer.

Used by FedOpt-style recipes for optimizer_args and lr_scheduler_args: those dicts have ‘path’ or ‘class_path’ plus ‘args’, and would otherwise be treated as component configs and instantiated during the config scan (e.g., torch.optim.SGD would be constructed without params). This function:

  • Accepts either ‘path’ or ‘class_path’ (for consistency with recipe model_config); if only ‘class_path’ is set, copies it to ‘path’ so the component builder and runtime code work unchanged.

  • Sets config_type to ‘dict’ when missing so the component builder does not instantiate at load time; the optimizer/scheduler is instantiated at runtime when params/optimizer are available.

Parameters:

config – A component-style config dict (e.g. {‘class_path’: ‘torch.optim.SGD’, ‘args’: {‘lr’: 1.0}} or {‘path’: ‘…’, ‘args’: {…}}) or None.

Returns:

A copy of config with config_type ‘dict’ if missing and path set from class_path if needed; None if config is None.
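The two normalization steps can be sketched as a pure function. `ensure_dict_config` is a hypothetical name; the deep copy and the choice to keep ‘class_path’ alongside ‘path’ are assumptions of this sketch.

```python
import copy
from typing import Any, Dict, Optional


def ensure_dict_config(config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Illustrative re-creation of the documented normalization steps."""
    if config is None:
        return None
    result = copy.deepcopy(config)  # never mutate the caller's dict
    if "path" not in result and "class_path" in result:
        result["path"] = result["class_path"]  # runtime code reads 'path'
    result.setdefault("config_type", "dict")  # prevents instantiation at load time
    return result


optimizer_args = {"class_path": "torch.optim.SGD", "args": {"lr": 1.0}}
print(ensure_dict_config(optimizer_args))
print(optimizer_args)  # the original dict is unchanged
```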

extract_persistor_id(result: Any) → str

prepare_initial_ckpt(initial_ckpt: str | None, job) → str | None

Prepare initial_ckpt for job deployment.

  • Relative path: treated as a local file. The file is bundled into the server app’s custom directory and the basename is returned for runtime resolution.

  • Absolute path: treated as a server-side (remote) path and returned as-is. The file is expected to exist on the server at runtime.

Parameters:
  • initial_ckpt – Checkpoint file path (absolute or relative).

  • job – BaseFedJob instance to add the file to.

Returns:

The checkpoint path to pass to the persistor:

  • None if initial_ckpt is None

  • Basename for relative paths (file is bundled into app/custom/)

  • Absolute path as-is for server-side checkpoints

Return type:

str | None
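The path rules above can be sketched without a real job object. `prepare_ckpt_path` and its `bundle` callback are hypothetical; the callback stands in for whatever the real function does to add the file to the job.

```python
import os
from typing import Callable, Optional


def prepare_ckpt_path(initial_ckpt: Optional[str], bundle: Callable[[str], None]) -> Optional[str]:
    """Illustrative version of the documented path rules.

    `bundle` is a hypothetical callback standing in for adding the file to the job.
    """
    if initial_ckpt is None:
        return None
    if os.path.isabs(initial_ckpt):
        return initial_ckpt  # server-side path: resolved on the server at runtime
    bundle(initial_ckpt)  # local file: shipped in the server app's custom directory
    return os.path.basename(initial_ckpt)


bundled = []
print(prepare_ckpt_path("checkpoints/init.pt", bundled.append))
print(prepare_ckpt_path("/workspace/models/init.pt", bundled.append))
print(bundled)  # only the relative path was bundled
```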

recipe_model_to_job_model(recipe_model: Dict[str, Any]) → Dict[str, Any]

Validate and convert recipe model dict (class_path) to job/config format (path).

Calls validate_dict_model_config() internally so callers do not need to validate separately. Recipes accept {“class_path”: “module.Class”, “args”: {…}} only. The Job API and config parsing expect {“path”: “module.Class”, “args”: {…}}.

Parameters:

recipe_model – Dict with ‘class_path’ and optional ‘args’.

Returns:

Dict with ‘path’ and ‘args’ for use by PTModel, persistors, etc.
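A minimal sketch of the format conversion, assuming validation only needs a string ‘class_path’. `to_job_model` is a hypothetical name, `my_pkg.models.Net` is a placeholder class path, and defaulting missing ‘args’ to an empty dict is an assumption.

```python
from typing import Any, Dict


def to_job_model(recipe_model: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative conversion from the recipe format to the job/config format."""
    class_path = recipe_model.get("class_path")
    if not isinstance(class_path, str):
        raise ValueError("model config must contain a string 'class_path'")
    # Rename the key; args are copied so the caller's dict is not shared.
    return {"path": class_path, "args": dict(recipe_model.get("args", {}))}


print(to_job_model({"class_path": "my_pkg.models.Net", "args": {"hidden": 64}}))
```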

resolve_initial_ckpt(initial_ckpt: str | None, prepared_initial_ckpt: str | None, job) → str | None

setup_custom_persistor(*, job, model_persistor=None) → str

validate_ckpt(ckpt: str | None) → None

Validate a checkpoint path if provided.

For absolute paths: no local existence check (file may be a server-side path). For relative paths: verifies the file exists locally (it will be bundled into the job).

Parameters:

ckpt – Checkpoint file path to validate (e.g. initial_ckpt or eval_ckpt).

Raises:

ValueError – If relative path does not exist locally.
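The two checkpoint rules can be sketched as follows. `check_ckpt` is a hypothetical helper, and `missing/model.pt` is a placeholder assumed not to exist in the working directory.

```python
import os
from typing import Optional


def check_ckpt(ckpt: Optional[str]) -> None:
    """Illustrative version of the documented checkpoint validation rules."""
    if ckpt is None or os.path.isabs(ckpt):
        return  # nothing to check, or a server-side path that may not exist locally
    if not os.path.exists(ckpt):
        raise ValueError(f"Checkpoint file not found locally: {ckpt}")


check_ckpt(None)
check_ckpt("/server/only/model.pt")  # accepted without a local existence check
try:
    check_ckpt("missing/model.pt")
except ValueError as err:
    print(err)
```

Skipping the existence check for absolute paths is what allows a job to reference a checkpoint that only exists on the server.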

validate_dict_model_config(model: Any) → None

Validate recipe dict model config structure.

Recipes accept model config with class_path (fully qualified class name). The job/config layer uses path; recipes use class_path only.

Parameters:

model – Model input to validate.

Raises:

ValueError – If dict config is missing ‘class_path’ or value is not a string.