nvflare.app_opt.pt.recipes.scaffold module

class ScaffoldRecipe(*, name: str = 'scaffold', model: Any | dict[str, Any] | None = None, initial_ckpt: str | None = None, min_clients: int, num_rounds: int = 2, train_script: str, train_args: str = '', launch_external_process: bool = False, command: str = 'python3 -u', server_expected_format: ExchangeFormat = ExchangeFormat.NUMPY, params_transfer_type: TransferType = TransferType.FULL, server_memory_gc_rounds: int = 0, client_memory_gc_rounds: int = 0, cuda_empty_cache: bool = False)

Bases: Recipe

A recipe for implementing Scaffold in NVFlare.

Implements the training algorithm proposed in Karimireddy et al. “SCAFFOLD: Stochastic Controlled Averaging for Federated Learning” (https://arxiv.org/abs/1910.06378).

Client script requirement: Unlike FedAvgRecipe, the client script must use PTScaffoldHelper (nvflare.app_opt.pt.scaffold): call init(model), model_update() during training, terms_update() after training, and include meta[AlgorithmConstants.SCAFFOLD_CTRL_DIFF] = scaffold_helper.get_delta_controls() in the FLModel sent back to the server. A standard flare.receive/send loop without PTScaffoldHelper will cause server-side aggregation to fail.
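To make the purpose of these helper calls more concrete, the following is a minimal sketch of the client-side arithmetic described in the SCAFFOLD paper, written in plain Python. It is not PTScaffoldHelper's code and does not use any NVFlare API; the function name `scaffold_local_round`, its arguments, and the toy objective are all hypothetical. Roughly, the corrected step is what model_update() is for, the control-variate refresh is what terms_update() is for, and the returned difference plays the role of get_delta_controls().

```python
# Illustrative sketch of the SCAFFOLD client-side math (Karimireddy et al.).
# Not NVFlare's PTScaffoldHelper; every name here is hypothetical.

def scaffold_local_round(global_w, c_global, c_local, grad_fn, lr, local_steps):
    """Run corrected local SGD steps; return (weights, new c_local, delta_c)."""
    w = list(global_w)
    for _ in range(local_steps):
        g = grad_fn(w)
        # Corrected step: w <- w - lr * (g - c_local + c_global)
        w = [
            wi - lr * (gi - cl + cg)
            for wi, gi, cl, cg in zip(w, g, c_local, c_global)
        ]
    # Control-variate refresh ("option II" in the paper):
    # c_local_new = c_local - c_global + (global_w - w) / (local_steps * lr)
    c_local_new = [
        cl - cg + (gw - wi) / (local_steps * lr)
        for cl, cg, gw, wi in zip(c_local, c_global, global_w, w)
    ]
    # Difference reported to the server alongside the model update
    delta_c = [new - old for new, old in zip(c_local_new, c_local)]
    return w, c_local_new, delta_c


# Toy objective f(w) = 0.5 * (w - 3)^2, so the gradient is w - 3.
def grad_fn(w):
    return [wi - 3.0 for wi in w]


new_w, new_c, delta_c = scaffold_local_round(
    [0.0], [0.0], [0.0], grad_fn, lr=0.1, local_steps=1
)
```

In a real client script the equivalent bookkeeping is delegated to PTScaffoldHelper, and the control difference is attached to the returned FLModel under meta[AlgorithmConstants.SCAFFOLD_CTRL_DIFF] as described above.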

This recipe sets up a complete federated learning workflow with a Scaffold controller.

Parameters:
  • name – Name of the federated learning job. Defaults to “scaffold”.

  • model – Initial model to start federated training with. Can be:
      - nn.Module instance
      - Dict config: {"class_path": "module.ClassName", "args": {"param": value}}
      - None: no initial model

  • initial_ckpt – Absolute path to a pre-trained checkpoint file. The file may not exist locally as it could be on the server. Used to load initial weights. Note: PyTorch requires model when using initial_ckpt (for architecture).

  • min_clients – Minimum number of clients required to start a training round. Required (the signature defines no default).

  • num_rounds – Number of federated training rounds to execute. Defaults to 2.

  • train_script – Path to the training script that will be executed on each client. Required (the signature defines no default).

  • train_args – Command line arguments to pass to the training script. Defaults to “”.

  • server_memory_gc_rounds – Run memory cleanup (gc.collect + malloc_trim) every N rounds on server. Set to 0 to disable. Defaults to 0.
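As an illustration of the dict form accepted by the model parameter, the sketch below shows one plausible way a {"class_path": ..., "args": {...}} config could be turned into an object. This is an assumption made for illustration, not NVFlare's actual loading code; `instantiate_from_config` is a hypothetical helper, and a standard-library class stands in for a model class so the snippet runs without PyTorch.

```python
import importlib
from typing import Any


def instantiate_from_config(config: dict[str, Any]) -> Any:
    """Hypothetical helper: build an object from a class_path/args config."""
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    # Pass the "args" mapping as keyword arguments; default to no arguments.
    return cls(**config.get("args", {}))


# fractions.Fraction is only a stand-in for a model class such as an nn.Module.
obj = instantiate_from_config(
    {"class_path": "fractions.Fraction", "args": {"numerator": 3, "denominator": 4}}
)
```

The practical point of the dict form is that the job configuration can name the model class and its constructor arguments without holding a live model instance.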

Example

```python
recipe = ScaffoldRecipe(
    name="my_scaffold_job",
    model=pretrained_model,
    min_clients=2,
    num_rounds=10,
    train_script="client.py",
    train_args="--epochs 5 --batch_size 32",
)
```

This is the base class of a recipe. Recipes are implemented by jobs; a concrete recipe must provide the job that implements it.

Parameters:
  • job – The job that implements the recipe.