nvflare.recipe.utils module
- add_cross_site_evaluation(recipe: Recipe, submit_model_timeout: int = 600, validation_timeout: int = 6000, participating_clients: List[str] | None = None)
Add cross-site evaluation to an existing recipe.
This utility automatically configures cross-site evaluation by:
  - Auto-detecting the framework from the recipe
  - Adding the appropriate model locator
  - Adding the CrossSiteModelEval controller
  - Adding ValidationJsonGenerator for results
  - Auto-adding the appropriate validator to clients (for NumPy recipes)
For standalone CSE without training, use NumpyCrossSiteEvalRecipe instead.
Note: This utility is designed for adding CSE to training recipes. If you call it on a CSE-only recipe (e.g., NumpyCrossSiteEvalRecipe), it will detect this and skip adding duplicate validators automatically.
WARNING: Do not call this function multiple times on the same recipe instance. This function is not idempotent: it raises a RuntimeError if called more than once on the same recipe, to prevent duplicate component registration.
IMPORTANT for PyTorch: Your client training script must handle validation tasks by checking flare.is_evaluate() and returning metrics without training. Example pattern:
```python
# In your client script:
import nvflare.client as flare

while flare.is_running():
    input_model = flare.receive()
    model.load_state_dict(input_model.params)

    # Evaluate model (always required)
    metrics = evaluate(model, test_loader)

    # Handle CSE validation task
    if flare.is_evaluate():
        output_model = flare.FLModel(metrics=metrics)
        flare.send(output_model)
        continue  # Skip training for validation-only tasks

    # Normal training code here…
```
- Example (NumPy - fully automatic):
```python
from nvflare.app_common.np.recipes import NumpyFedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = NumpyFedAvgRecipe(
    name="my-job", model=[1.0, 2.0, 3.0], min_clients=2, num_rounds=3, train_script="client.py"
)

# That's it! Framework auto-detected, validator auto-added
add_cross_site_evaluation(recipe)
```
- Example (PyTorch - requires client script support):
```python
from nvflare.app_opt.pt.recipes import FedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = FedAvgRecipe(
    name="my-job", min_clients=2, num_rounds=3, model=MyModel(), train_script="client.py"
)

# Note: client.py must handle flare.is_evaluate() for validation
add_cross_site_evaluation(recipe)
```
- Example (TensorFlow - Client API pattern, recommended):
```python
from nvflare.app_opt.tf.recipes import FedAvgRecipe
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = FedAvgRecipe(
    name="my-job", min_clients=2, num_rounds=3, model=MyTFModel(), train_script="client.py"
)

# Note: client.py must handle flare.is_evaluate() for validation
add_cross_site_evaluation(recipe)
```
- Example (TensorFlow - Component-based alternative):
```python
from nvflare.app_opt.tf.recipes import FedAvgRecipe
from nvflare.app_opt.tf.tf_validator import TFValidator
from nvflare.recipe.utils import add_cross_site_evaluation

recipe = FedAvgRecipe(
    name="my-job", min_clients=2, num_rounds=3, model=MyTFModel(), train_script="client.py"
)

add_cross_site_evaluation(recipe)

# Optional: manually add TFValidator for component-based validation
validator = TFValidator(model=my_model, data_loader=test_loader)
recipe.job.to_clients(validator, tasks=["validate"])
```
- Parameters:
recipe – Recipe instance to augment with cross-site evaluation.
submit_model_timeout – Timeout (seconds) for submitting models to clients. Defaults to 600.
validation_timeout – Timeout (seconds) for validation tasks on clients. Defaults to 6000.
participating_clients – Optional list of client names to include in cross-site evaluation. If not provided, all clients connected at controller start are used.
- Raises:
ValueError – If the recipe doesn’t have a framework attribute or uses an unsupported framework.
RuntimeError – If cross-site evaluation has already been added to this recipe.
Note
Currently supports PyTorch, NumPy, and TensorFlow frameworks.
NumPy recipes using `NumpyFedAvgRecipe`: Validators (NPValidator) are automatically added to clients to handle validation tasks. The function intelligently detects if validators are already configured by checking for executors handling TASK_VALIDATION, avoiding duplicates for CSE-only recipes (like NumpyCrossSiteEvalRecipe).
Unified `FedAvgRecipe` with `framework=FrameworkType.NUMPY`: Uses the same Client API validation pattern as PyTorch and TensorFlow. Your client script should handle flare.is_evaluate() and return metrics for validation tasks.
PyTorch recipes: No separate validator component is needed. The client training script handles validation tasks through the Client API’s flare.is_evaluate() check. See the hello-pt example for implementation pattern.
TensorFlow recipes: Similar to PyTorch, uses the Client API pattern. The client script should handle validation tasks via flare.is_evaluate() check.
- add_experiment_tracking(recipe: Recipe, tracking_type: str, tracking_config: dict | None = None, client_side: bool = False, server_side: bool = True)
Add experiment tracking to a recipe.
Adds tracking receivers to the server and/or clients to collect and log metrics during training.
- Parameters:
recipe – Recipe instance to augment with experiment tracking.
tracking_type – Type of tracking to enable (“mlflow”, “tensorboard”, or “wandb”).
tracking_config – Optional configuration dict for the tracking receiver.
client_side – If True, add tracking to all clients (each client tracks locally).
server_side – If True, add tracking to server (aggregates metrics from all clients). Default: True.
Examples
# Server-side tracking (default - federated metrics)
add_experiment_tracking(recipe, "mlflow", {"tracking_uri": "…"})

# Client-side tracking only (each client tracks independently)
add_experiment_tracking(recipe, "mlflow", {…}, client_side=True, server_side=False)

# Both server and client tracking
add_experiment_tracking(recipe, "mlflow", {…}, client_side=True, server_side=True)
- collect_non_local_scripts(job: FedJob) -> List[str]
Collect scripts that don’t exist locally.
This utility function is used by ExecEnv subclasses to validate script resources before deployment. Scripts are considered “non-local” if they are absolute paths that don’t exist on the local machine.
- Parameters:
job – The FedJob to check for non-local scripts.
- Returns:
List of absolute script paths that don’t exist on the local machine.
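The "non-local" rule described above can be sketched on a plain list of paths. This is a minimal illustration, not the library's implementation: `find_non_local_scripts` is a hypothetical helper, and how the real function extracts script paths from a FedJob is not shown here.

```python
import os
from typing import Iterable, List


def find_non_local_scripts(script_paths: Iterable[str]) -> List[str]:
    """Return the absolute paths that do not exist on this machine.

    Hypothetical helper: applies the documented rule to a list of paths
    instead of inspecting a FedJob.
    """
    return [p for p in script_paths if os.path.isabs(p) and not os.path.exists(p)]


root = os.path.abspath(os.sep)
candidates = [
    "client.py",  # relative path: never counted as non-local
    root,  # absolute path that exists locally
    os.path.join(root, "no_such_dir_for_cse_demo", "train.py"),  # absolute, missing
]
missing = find_non_local_scripts(candidates)
```

Only the last candidate is reported, because it is both absolute and absent from the local filesystem.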
- ensure_config_type_dict(config: Dict[str, Any] | None) -> Dict[str, Any] | None
Ensure a component config dict has config_type ‘dict’ and is normalized for the config layer.
Used by FedOpt-style recipes for optimizer_args and lr_scheduler_args: those dicts have ‘path’ or ‘class_path’ plus ‘args’, and would otherwise be treated as component configs and instantiated during config scan (e.g. torch.optim.SGD without params). This function:
  - Accepts either ‘path’ or ‘class_path’ (for consistency with recipe model_config); if only ‘class_path’ is set, copies it to ‘path’ so the component builder and runtime code work unchanged.
  - Sets config_type to ‘dict’ when missing so the component builder does not instantiate at load time; the optimizer/scheduler is instantiated at runtime when params/optimizer are available.
- Parameters:
config – A component-style config dict (e.g. {"class_path": "torch.optim.SGD", "args": {"lr": 1.0}} or {"path": "…", "args": {…}}) or None.
- Returns:
A copy of config with config_type ‘dict’ if missing and path set from class_path if needed; None if config is None.
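The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: `normalize_args_config` is a hypothetical stand-in, it keeps the original ‘class_path’ key alongside ‘path’, and it uses a deep copy; the library may differ on both points.

```python
import copy
from typing import Any, Dict, Optional


def normalize_args_config(config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch of the documented normalization rules."""
    if config is None:
        return None
    result = copy.deepcopy(config)  # assumption: deep copy, input left untouched
    if "path" not in result and "class_path" in result:
        result["path"] = result["class_path"]  # mirror class_path into path
    result.setdefault("config_type", "dict")  # only set when missing
    return result


optimizer_args = {"class_path": "torch.optim.SGD", "args": {"lr": 1.0}}
normalized = normalize_args_config(optimizer_args)
```

In this sketch `normalized` gains ‘path’ and ‘config_type’ entries while `optimizer_args` is left unchanged.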
- prepare_initial_ckpt(initial_ckpt: str | None, job) -> str | None
Prepare initial_ckpt for job deployment.
Relative path: treated as a local file. The file is bundled into the server app’s custom directory and the basename is returned for runtime resolution.
Absolute path: treated as a server-side (remote) path and returned as-is. The file is expected to exist on the server at runtime.
- Parameters:
initial_ckpt – Checkpoint file path (absolute or relative).
job – BaseFedJob instance to add the file to.
- Returns:
The checkpoint path to pass to the persistor:
  - None if initial_ckpt is None
  - Basename for relative paths (file is bundled into app/custom/)
  - Absolute path as-is for server-side checkpoints
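The return-value rules above can be sketched without a job object. This is a minimal illustration: `resolve_ckpt_reference` is a hypothetical helper that reproduces only the path decision, and it omits the step that bundles a relative file into the server app's custom directory.

```python
import os
from typing import Optional


def resolve_ckpt_reference(initial_ckpt: Optional[str]) -> Optional[str]:
    """Hypothetical sketch: decide which path string the persistor would receive."""
    if initial_ckpt is None:
        return None
    if os.path.isabs(initial_ckpt):
        return initial_ckpt  # server-side path, returned as-is
    return os.path.basename(initial_ckpt)  # local file, resolved by basename at runtime


server_side = os.path.join(os.path.abspath(os.sep), "srv", "models", "init.pt")
local_name = resolve_ckpt_reference(os.path.join("checkpoints", "init.pt"))
remote_name = resolve_ckpt_reference(server_side)
```

Here `local_name` is `"init.pt"` and `remote_name` equals `server_side` unchanged.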
- recipe_model_to_job_model(recipe_model: Dict[str, Any]) -> Dict[str, Any]
Validate and convert recipe model dict (class_path) to job/config format (path).
Calls validate_dict_model_config() internally so callers do not need to validate separately. Recipes accept {"class_path": "module.Class", "args": {…}} only. The Job API and config parsing expect {"path": "module.Class", "args": {…}}.
- Parameters:
recipe_model – Dict with ‘class_path’ and optional ‘args’.
- Returns:
Dict with ‘path’ and ‘args’ for use by PTModel, persistors, etc.
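The key rename described above can be sketched as follows. This is a minimal illustration, not the library's code: `to_job_model` is a hypothetical helper, the inline check stands in for the validation step, and defaulting a missing ‘args’ to an empty dict is an assumption.

```python
from typing import Any, Dict


def to_job_model(recipe_model: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: convert recipe format (class_path) to job format (path)."""
    class_path = recipe_model.get("class_path")
    if not isinstance(class_path, str):
        raise ValueError("model config must contain a string 'class_path'")
    # Assumption: a missing 'args' entry becomes an empty dict.
    return {"path": class_path, "args": dict(recipe_model.get("args") or {})}


job_model = to_job_model({"class_path": "my_pkg.models.Net", "args": {"hidden": 64}})
```

The module and argument names in the usage line (`my_pkg.models.Net`, `hidden`) are placeholders.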
- resolve_initial_ckpt(initial_ckpt: str | None, prepared_initial_ckpt: str | None, job) -> str | None
- validate_ckpt(ckpt: str | None) -> None
Validate a checkpoint path if provided.
For absolute paths: no local existence check (file may be a server-side path). For relative paths: verifies the file exists locally (it will be bundled into the job).
- Parameters:
ckpt – Checkpoint file path to validate (e.g. initial_ckpt or eval_ckpt).
- Raises:
ValueError – If relative path does not exist locally.
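The validation rule above can be sketched as follows. This is a minimal illustration: `check_ckpt` is a hypothetical helper, and using `os.path.exists` for the local check is an assumption about how existence is tested.

```python
import os
from typing import Optional


def check_ckpt(ckpt: Optional[str]) -> None:
    """Hypothetical sketch of the documented checkpoint validation rule."""
    if ckpt is None:
        return  # nothing to validate
    if os.path.isabs(ckpt):
        return  # may be a server-side path, so no local check
    if not os.path.exists(ckpt):  # assumption: plain existence test
        raise ValueError(f"checkpoint file does not exist locally: {ckpt}")


# An absolute path passes even when the file is absent on this machine.
check_ckpt(os.path.join(os.path.abspath(os.sep), "no_such_dir_for_cse_demo", "model.pt"))
```

A relative path that does not exist in the current working directory would raise ValueError in this sketch.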
- validate_dict_model_config(model: Any) -> None
Validate recipe dict model config structure.
Recipes accept model config with class_path (fully qualified class name). The job/config layer uses path; recipes use class_path only.
- Parameters:
model – Model input to validate.
- Raises:
ValueError – If dict config is missing ‘class_path’ or value is not a string.
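The check described above can be sketched as follows. This is a minimal illustration: `check_model_config` is a hypothetical helper, and letting non-dict inputs (for example an instantiated model object) pass through unchecked is an assumption.

```python
from typing import Any


def check_model_config(model: Any) -> None:
    """Hypothetical sketch: validate the dict form of a recipe model config."""
    if not isinstance(model, dict):
        return  # assumption: only dict configs are checked here
    if "class_path" not in model:
        raise ValueError("dict model config is missing 'class_path'")
    if not isinstance(model["class_path"], str):
        raise ValueError("'class_path' must be a string")


check_model_config({"class_path": "my_pkg.models.Net", "args": {}})  # passes silently
```

A dict that uses the job-layer key ‘path’ instead of ‘class_path’ would be rejected by this sketch, matching the statement that recipes use class_path only.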