nvflare.app_opt.sklearn.recipes package
Submodules
Module contents
- class KMeansFedAvgRecipe(*, name: str = 'kmeans_fedavg', min_clients: int, num_rounds: int = 5, n_clusters: int = 3, model_path: str | None = None, train_script: str, train_args: str = '', launch_external_process: bool = False, command: str = 'python3 -u', per_site_config: dict[str, dict] | None = None, key_metric: str = 'metrics')
Bases: FedAvgRecipe
A recipe for Federated K-Means Clustering with Scikit-learn.
This recipe implements federated K-Means clustering using a mini-batch aggregation strategy. The aggregation follows the scheme defined in MiniBatchKMeans where each client’s results are treated as a mini-batch for updating global centers.
The recipe configures:
- A federated job with initial n_clusters parameter
- FedAvg controller for coordinating training rounds
- Custom KMeansAssembler for mini-batch center aggregation
- CollectAndAssembleModelAggregator for combining client updates
- Script runners for client-side training execution
Training Process:
- Round 0: Each client generates initial centers using k-means++. The server collects all initial centers and performs one round of k-means to generate the initial global centers.
- Subsequent rounds: Each client trains a local MiniBatchKMeans model starting from global centers. The server aggregates center and count information to update global centers using the mini-batch update rule.
- Parameters:
name – Name of the federated learning job. Defaults to “kmeans_fedavg”.
min_clients – Minimum number of clients required to start a training round.
num_rounds – Number of federated training rounds to execute. Defaults to 5.
n_clusters – Number of clusters for K-Means. Defaults to 3.
model_path – Absolute path to a saved model file (.joblib). If provided, the file must exist at runtime. Used to load previously saved cluster centers.
train_script – Path to the training script that will be executed on each client.
train_args – Command line arguments to pass to the training script.
launch_external_process – Whether to launch the script in an external process. Defaults to False.
command – If launch_external_process=True, command to run script (prepended to script). Defaults to “python3 -u”.
per_site_config – Per-site configuration for the federated learning job. Dictionary mapping site names to configuration dicts. If not provided, the same configuration will be used for all clients.
key_metric – Metric used to determine if the model is globally best. If validation metrics are a dict, key_metric selects the metric used for global model selection. Defaults to “metrics” (which corresponds to the homogeneity score sent by the K-Means client).
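The relationship between train_args, per_site_config, and command can be pictured with a small sketch. The helpers `resolve_train_args` and `build_command` are hypothetical and not part of NVFlare. The sketch assumes that a site's own "train_args" entry overrides the shared value and that the command is prepended to the script path, as the parameter descriptions above suggest.

```python
import shlex


def resolve_train_args(site, train_args="", per_site_config=None):
    """Return the training arguments for one site (hypothetical helper).

    Assumption: a site's own "train_args" entry overrides the shared value;
    sites without an entry fall back to the shared value.
    """
    site_cfg = (per_site_config or {}).get(site, {})
    return site_cfg.get("train_args", train_args)


def build_command(command, train_script, train_args):
    """Prepend the launcher command to the script and its arguments."""
    return shlex.split(command) + [train_script] + shlex.split(train_args)


per_site_config = {
    "site-1": {"train_args": "--data_path /tmp/data/site1.csv"},
}
shared_args = "--data_path /tmp/data/iris.csv"

site1_args = resolve_train_args("site-1", shared_args, per_site_config)
site2_args = resolve_train_args("site-2", shared_args, per_site_config)
site1_cmd = build_command("python3 -u", "src/kmeans_fl.py", site1_args)
```

Here site-1 receives its own data path, while site-2, which has no entry, falls back to the shared arguments.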
Example
Basic usage with same config for all clients:
```python
recipe = KMeansFedAvgRecipe(
    name="kmeans_iris",
    min_clients=3,
    num_rounds=5,
    n_clusters=3,
    train_script="src/kmeans_fl.py",
    train_args="--data_path /tmp/data/iris.csv",
)

from nvflare.recipe import SimEnv

env = SimEnv(num_clients=3)
run = recipe.execute(env)
print("Result:", run.get_result())
```
Per-site configuration:
```python
from nvflare.app_opt.sklearn import KMeansFedAvgRecipe

recipe = KMeansFedAvgRecipe(
    name="kmeans_iris",
    min_clients=3,
    num_rounds=5,
    n_clusters=3,
    train_script="src/kmeans_fl.py",
    per_site_config={
        "site-1": {"train_args": "--data_path /tmp/data/site1.csv --train_start 0 --train_end 50"},
        "site-2": {"train_args": "--data_path /tmp/data/site2.csv --train_start 50 --train_end 100"},
        "site-3": {"train_args": "--data_path /tmp/data/site3.csv --train_start 100 --train_end 150"},
    },
)
```
Note
This recipe uses a custom KMeansAssembler that implements the mini-batch K-Means aggregation logic. The assembler maintains historical center and count information across rounds for proper weighted averaging.
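The count-weighted update described in this note can be illustrated with a minimal sketch. This is not the KMeansAssembler source. The function `minibatch_update` and its argument layout are assumptions made for illustration, and the sketch assumes that cluster index j refers to the same cluster on every client, because each client starts from the same global centers.

```python
def minibatch_update(global_centers, global_counts, client_results):
    """Fold client (centers, counts) pairs into the global centers.

    global_centers: list of centers, each a list of floats
    global_counts:  number of samples absorbed so far by each center
    client_results: list of (centers, counts) tuples, one per client
    """
    new_centers = []
    new_counts = []
    for j, (center, count) in enumerate(zip(global_centers, global_counts)):
        # Start from the historical center, weighted by its historical count.
        weighted_sum = [count * x for x in center]
        total = count
        for client_centers, client_counts in client_results:
            n = client_counts[j]
            weighted_sum = [s + n * x for s, x in zip(weighted_sum, client_centers[j])]
            total += n
        # Leave a center unchanged if it has never received any samples.
        if total > 0:
            new_centers.append([s / total for s in weighted_sum])
        else:
            new_centers.append(list(center))
        new_counts.append(total)
    return new_centers, new_counts


centers, counts = minibatch_update(
    global_centers=[[0.0, 0.0], [10.0, 10.0]],
    global_counts=[10, 10],
    client_results=[
        ([[1.0, 1.0], [10.0, 12.0]], [10, 5]),
        ([[2.0, 0.0], [10.0, 8.0]], [20, 5]),
    ],
)
```

Carrying the counts forward means that later rounds move a well-populated center less than an early round would, which is the behavior the mini-batch rule is meant to provide.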
This is the base class of a recipe. Recipes are implemented by jobs. A concrete recipe must provide the job that implements it.
- Parameters:
job – The job that implements the recipe.
- class SVMFedAvgRecipe(*, name: str = 'svm_fedavg', min_clients: int, kernel: Literal['linear', 'poly', 'rbf', 'sigmoid'] = 'rbf', model_path: str | None = None, train_script: str, train_args: str = '', launch_external_process: bool = False, command: str = 'python3 -u', per_site_config: dict[str, dict] | None = None, key_metric: str = 'AUC')
Bases: FedAvgRecipe
A recipe for Federated SVM with Scikit-learn.
This recipe implements federated SVM training using support vector aggregation. Unlike iterative algorithms, SVM training only requires one round:
- Round 0: Each client trains a local SVM and sends their support vectors
- Server aggregates all support vectors and trains a global SVM
- Round 1: Clients validate using the global support vectors
The recipe configures:
- A federated job with kernel parameter
- FedAvg controller (2 rounds)
- CollectAndAssembleModelAggregator with SVMAssembler for support vector aggregation
- Script runners for client-side training execution
Training Process:
- Round 0 (Training): Each client trains a local SVM on their data and extracts support vectors. The server collects all support vectors, trains a global SVM, and extracts the global support vectors.
- Round 1 (Validation): Each client validates using the global support vectors.
- Parameters:
name – Name of the federated learning job. Defaults to “svm_fedavg”.
min_clients – Minimum number of clients required to start a training round.
kernel – Kernel type for SVM. Options: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’. Defaults to ‘rbf’.
model_path – Absolute path to a saved model file (.joblib). If provided, the file must exist at runtime. Used to load previously saved support vectors.
train_script – Path to the training script that will be executed on each client.
train_args – Command line arguments to pass to the training script.
launch_external_process – Whether to launch the script in an external process. Defaults to False.
command – If launch_external_process=True, command to run script (prepended to script). Defaults to “python3 -u”.
per_site_config – Per-site configuration for the federated learning job. Dictionary mapping site names to configuration dicts. If not provided, the same configuration will be used for all clients.
key_metric – Metric used to determine if the model is globally best. If validation metrics are a dict, key_metric selects the metric used for global model selection. Defaults to “AUC” (which corresponds to the ROC AUC score sent by the SVM client in round 1).
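The role of key_metric can be pictured as follows. The helpers `select_metric` and `best_round` are hypothetical and are not NVFlare functions. The sketch assumes that a higher score is better and that the metric values shown are made-up illustration data.

```python
def select_metric(metrics, key_metric):
    """Pick the scalar used for model selection (hypothetical helper)."""
    if isinstance(metrics, dict):
        return metrics[key_metric]
    return metrics  # already a single number


def best_round(round_metrics, key_metric):
    """Return (round_index, score) of the highest-scoring round."""
    scores = [select_metric(m, key_metric) for m in round_metrics]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Illustrative values only; not results from a real run.
round_metrics = [
    {"AUC": 0.91, "accuracy": 0.88},
    {"AUC": 0.94, "accuracy": 0.86},
]
best_idx, best_score = best_round(round_metrics, key_metric="AUC")
```

With key_metric="AUC" the second entry is chosen, even though the first has the higher accuracy.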
Example
Basic usage with same config for all clients:
```python
recipe = SVMFedAvgRecipe(
    name="svm_cancer",
    min_clients=3,
    kernel="rbf",
    train_script="client.py",
    train_args="--data_path /tmp/data/cancer.csv",
)

from nvflare.recipe import SimEnv

env = SimEnv(num_clients=3)
run = recipe.execute(env)
print("Result:", run.get_result())
```
Per-site configuration:
```python
from nvflare.app_opt.sklearn import SVMFedAvgRecipe

recipe = SVMFedAvgRecipe(
    name="svm_cancer",
    min_clients=3,
    kernel="rbf",
    train_script="client.py",
    per_site_config={
        "site-1": {"train_args": "--data_path /tmp/data/site1.csv --train_start 0 --train_end 100"},
        "site-2": {"train_args": "--data_path /tmp/data/site2.csv --train_start 100 --train_end 200"},
        "site-3": {"train_args": "--data_path /tmp/data/site3.csv --train_start 200 --train_end 300"},
    },
)
```
Note
This recipe uses CollectAndAssembleModelAggregator with SVMAssembler for support vector aggregation. The training only requires one round since SVM is not an iterative algorithm in the federated setting. A second round is included for validation purposes.
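The two-round flow can be simulated in a single process with plain scikit-learn. This is a rough sketch of the idea, not the SVMAssembler implementation. The synthetic dataset, the three-way split, and all variable names are assumptions. As a simplification, the clients here score the server's fitted model directly instead of rebuilding an SVM from the global support vectors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

# Synthetic binary-classification data, split evenly across three simulated clients.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
client_data = list(zip(np.array_split(X, 3), np.array_split(y, 3)))

# Round 0, client side: train a local SVM and keep only its support vectors.
client_updates = []
for X_local, y_local in client_data:
    local_svm = SVC(kernel="rbf").fit(X_local, y_local)
    idx = local_svm.support_  # indices of the support vectors in X_local
    client_updates.append((X_local[idx], y_local[idx]))

# Round 0, server side: pool the support vectors and train a global SVM on them.
pooled_X = np.concatenate([sv for sv, _ in client_updates])
pooled_y = np.concatenate([labels for _, labels in client_updates])
global_svm = SVC(kernel="rbf").fit(pooled_X, pooled_y)

# Round 1, client side: validate the global model on local data.
aucs = [
    roc_auc_score(y_local, global_svm.decision_function(X_local))
    for X_local, y_local in client_data
]
```

Only the support vectors and their labels leave each client in this sketch, which is why a single training round is enough.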
- class SklearnFedAvgRecipe(*, name: str = 'sklearn_fedavg', min_clients: int, num_rounds: int = 2, model_params: dict | None = None, model_path: str | None = None, train_script: str, train_args: str = '', aggregator: Aggregator | None = None, aggregator_data_kind: DataKind = DataKind.WEIGHTS, launch_external_process: bool = False, command: str = 'python3 -u', per_site_config: dict[str, dict] | None = None, key_metric: str = 'accuracy', launch_once: bool = True, shutdown_timeout: float = 0.0)
Bases: FedAvgRecipe
A recipe for implementing Federated Averaging (FedAvg) with Scikit-learn.
This recipe sets up a complete federated learning workflow with memory-efficient InTime aggregation, specifically designed for scikit-learn models.
The recipe configures:
- A federated job with initial parameters
- FedAvg controller with InTime aggregation for memory efficiency
- Optional early stopping and model selection
- Script runners for client-side training execution
- Parameters:
name – Name of the federated learning job. Defaults to “sklearn_fedavg”.
min_clients – Minimum number of clients required to start a training round.
num_rounds – Number of federated training rounds to execute. Defaults to 2.
model_params – Model hyperparameters as a dictionary. For SGDClassifier, can include: n_classes, learning_rate, eta0, loss, penalty, fit_intercept, etc.
model_path – Optional absolute path to a saved model file (.joblib, .pkl). If provided, the model is loaded from this path at runtime (file must exist). Takes precedence over model_params when loading.
train_script – Path to the training script that will be executed on each client.
train_args – Command line arguments to pass to the training script.
aggregator – Custom aggregator for combining client updates. If None, uses InTimeAccumulateWeightedAggregator with aggregator_data_kind.
aggregator_data_kind – Data kind to use for the aggregator. Defaults to DataKind.WEIGHTS.
launch_external_process – Whether to launch the script in an external process. Defaults to False.
command – If launch_external_process=True, command to run script (prepended to script). Defaults to “python3 -u”.
per_site_config – Per-site configuration for the federated learning job. Dictionary mapping site names to configuration dicts. If not provided, the same configuration will be used for all clients.
key_metric – Metric used to determine if the model is globally best. If validation metrics are a dict, key_metric selects the metric used for global model selection. Defaults to “accuracy”.
launch_once – Whether the external process will be launched only once at the beginning or on each task. Only used if launch_external_process is True. Defaults to True.
shutdown_timeout – If provided, will wait for this number of seconds before shutdown. Only used if launch_external_process is True. Defaults to 0.0.
Example
Basic usage with same config for all clients:
```python
recipe = SklearnFedAvgRecipe(
    name="sklearn_linear",
    min_clients=5,
    num_rounds=50,
    model_params={
        "n_classes": 2,
        "learning_rate": "constant",
        "eta0": 1e-4,
        "loss": "log_loss",
        "penalty": "l2",
        "fit_intercept": 1,
    },
    train_script="client.py",
    train_args="--data_path /tmp/data/HIGGS.csv",
)

from nvflare.recipe import SimEnv

env = SimEnv(num_clients=5)
run = recipe.execute(env)
print("Result:", run.get_result())
```
Per-site configuration:
```python
from nvflare.app_opt.sklearn import SklearnFedAvgRecipe

recipe = SklearnFedAvgRecipe(
    name="sklearn_linear",
    min_clients=3,
    num_rounds=50,
    model_params={"n_classes": 2, "learning_rate": "constant", "eta0": 1e-4},
    train_script="client.py",
    per_site_config={
        "site-1": {"train_args": "--data_path /tmp/data/site1.csv"},
        "site-2": {"train_args": "--data_path /tmp/data/site2.csv"},
        "site-3": {"train_args": "--data_path /tmp/data/site3.csv"},
    },
)
```
Note
By default, this recipe implements the standard FedAvg algorithm where model updates are aggregated using weighted averaging based on the number of training samples provided by each client.
If you want to use a custom aggregator, you can pass it in the aggregator parameter. The custom aggregator must be a subclass of the Aggregator class.
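Sample-weighted averaging with in-time accumulation can be sketched in a few lines of plain Python. The class `RunningWeightedAverage` is a hypothetical stand-in, not NVFlare's InTimeAccumulateWeightedAggregator and not a subclass of Aggregator. It only illustrates why folding each update into a running sum avoids holding every client's weights in memory at once.

```python
class RunningWeightedAverage:
    """Accumulate a sample-weighted average one client update at a time."""

    def __init__(self):
        self._weighted_sum = None
        self._total_samples = 0

    def accept(self, weights, n_samples):
        """Fold one client's parameter vector into the running sum."""
        if self._weighted_sum is None:
            self._weighted_sum = [0.0] * len(weights)
        self._weighted_sum = [
            s + n_samples * w for s, w in zip(self._weighted_sum, weights)
        ]
        self._total_samples += n_samples

    def result(self):
        """Return the weighted average of everything accepted so far."""
        if self._total_samples == 0:
            raise ValueError("no client updates were accepted")
        return [s / self._total_samples for s in self._weighted_sum]


agg = RunningWeightedAverage()
agg.accept([1.0, 0.0], n_samples=100)  # client with 100 training samples
agg.accept([3.0, 4.0], n_samples=300)  # client with 300 training samples
global_weights = agg.result()
```

The client with 300 samples pulls the result three times as hard as the client with 100, and the accumulator stores only one vector regardless of how many clients report.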