nvflare.app_common.workflows.cyclic_ctl module
- class CyclicController(num_rounds: int = 5, task_assignment_timeout: int = 10, persistor_id='persistor', shareable_generator_id='shareable_generator', task_name='train', task_check_period: float = 0.5, persist_every_n_rounds: int = 1, snapshot_every_n_rounds: int = 1, order: str | List[str] = 'FIXED', allow_early_termination=False)
Bases: Controller
A sample implementation that demonstrates how to use the relay method for cyclic federated learning.
- Parameters:
num_rounds (int, optional) – number of rounds this controller should perform. Defaults to 5.
task_assignment_timeout (int, optional) – timeout (in seconds) used to determine whether a client has failed to request the task assigned to it. Defaults to 10.
persistor_id (str, optional) – id of the persistor so this controller can save a global model. Defaults to “persistor”.
shareable_generator_id (str, optional) – id of shareable generator. Defaults to “shareable_generator”.
task_name (str, optional) – the task name that clients know how to handle. Defaults to “train”.
task_check_period (float, optional) – interval for checking status of tasks. Defaults to 0.5.
persist_every_n_rounds (int, optional) – persist the global model every n rounds. Defaults to 1. If n is 0, the global model is not persisted.
snapshot_every_n_rounds (int, optional) – persist the server state every n rounds. Defaults to 1. If n is 0, the server state is not persisted.
order (Union[str, List[str]], optional) –
The order of relay.
If a string is provided:
“FIXED”: Same order for every round.
“RANDOM”: Random order for every round.
“RANDOM_WITHOUT_SAME_IN_A_ROW”: Shuffled order, no repetition in consecutive rounds.
If a list of strings is provided, it represents a custom order for relay.
allow_early_termination (bool, optional) – whether to allow clients to terminate the workflow early. Defaults to False.
- Raises:
TypeError – when any input argument does not have the correct type
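The `order` options above can be illustrated with a small standalone sketch. This is not NVFlare's implementation: the helper `relay_order` and its `last_client` and `rng` parameters are hypothetical, and the reading of “RANDOM_WITHOUT_SAME_IN_A_ROW” used here (the client that ended the previous round does not start the next one) is an assumption.

```python
import random
from typing import List, Optional, Union


def relay_order(
    clients: List[str],
    order: Union[str, List[str]] = "FIXED",
    last_client: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the relay order for one round (illustrative, not NVFlare code)."""
    rng = rng or random.Random()
    if isinstance(order, list):
        # Custom order: every name must be a known client.
        unknown = [name for name in order if name not in clients]
        if unknown:
            raise ValueError(f"unknown clients in custom order: {unknown}")
        return list(order)
    if order == "FIXED":
        return list(clients)
    shuffled = list(clients)
    rng.shuffle(shuffled)
    if order == "RANDOM":
        return shuffled
    if order == "RANDOM_WITHOUT_SAME_IN_A_ROW":
        # Assumed meaning: the client that finished the previous round
        # does not also start this one, so rotate it to the end.
        if len(shuffled) > 1 and shuffled[0] == last_client:
            shuffled.append(shuffled.pop(0))
        return shuffled
    raise ValueError(f"unsupported order: {order!r}")
```

Calling `relay_order(["site-1", "site-2", "site-3"], "RANDOM_WITHOUT_SAME_IN_A_ROW", last_client="site-2")` returns a shuffled list whose first entry is never `"site-2"`.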
- control_flow(abort_signal: Signal, fl_ctx: FLContext)
This is the control logic for the RUN.
NOTE: this is running in a separate thread, and its life is the duration of the RUN.
- Parameters:
fl_ctx – the FL context
abort_signal – the abort signal. If triggered, this method stops waiting and returns to the caller.
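As a rough illustration of how a round loop might honor the abort signal and the `persist_every_n_rounds` setting, here is a minimal standalone sketch. `StubSignal`, `RunLog`, `should_persist`, `run_rounds`, and the `abort_at` parameter are hypothetical names used only for this example; the real controller's loop may differ, for instance in whether it also persists after the final round.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubSignal:
    """Hypothetical stand-in for the abort signal passed to control_flow."""
    triggered: bool = False


@dataclass
class RunLog:
    rounds_done: List[int] = field(default_factory=list)
    persisted_after: List[int] = field(default_factory=list)


def should_persist(round_index: int, every_n: int) -> bool:
    """True when a checkpoint is due after the given zero-based round."""
    # every_n == 0 disables persistence entirely.
    return every_n > 0 and (round_index + 1) % every_n == 0


def run_rounds(
    num_rounds: int,
    persist_every_n_rounds: int,
    abort_signal: StubSignal,
    abort_at: Optional[int] = None,
) -> RunLog:
    """Run up to num_rounds rounds, stopping early if the signal fires."""
    log = RunLog()
    for current_round in range(num_rounds):
        if abort_at is not None and current_round == abort_at:
            abort_signal.triggered = True  # simulate an external abort
        if abort_signal.triggered:
            break  # stop waiting and return to the caller
        log.rounds_done.append(current_round)
        if should_persist(current_round, persist_every_n_rounds):
            log.persisted_after.append(current_round)
    return log
```

With `num_rounds=5` and `persist_every_n_rounds=2`, this sketch records checkpoints after rounds 1 and 3 (zero-based); with `persist_every_n_rounds=0` it records none.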
- get_persist_state(fl_ctx: FLContext) → dict
Generate data from state to be persisted.
- Parameters:
fl_ctx – FLContext
- Returns:
A dict of serializable data to persist
- process_result_of_unknown_task(client: Client, task_name: str, client_task_id: str, result: Shareable, fl_ctx: FLContext)
Process result when no task is found for it.
This is called when a result submission is received from a client, but no standing task can be found for it (from the task queue).
This could happen when:
- the client’s submission is too late
- the task is already completed
- the Controller lost the task, e.g. the Server is restarted
- Parameters:
client – the client that the result comes from
task_name – the name of the task
client_task_id – ID of the task
result – the result from the client
fl_ctx – the FL context that comes with the client’s submission
- restore(state_data: dict, fl_ctx: FLContext)
Restore the state from persisted data.
- Parameters:
state_data – serialized persist data
fl_ctx – FLContext
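The pairing of `get_persist_state` and `restore` can be sketched as a simple round trip. The class `RoundState` and the fields `current_round` and `last_client` are hypothetical and chosen only for illustration; the `fl_ctx` argument of the real methods is omitted, and the keys the actual controller stores are not documented here.

```python
import json
from typing import Any, Dict, Optional


class RoundState:
    """Hypothetical holder for fields a cyclic workflow might persist."""

    def __init__(self, current_round: int = 0, last_client: Optional[str] = None):
        self.current_round = current_round
        self.last_client = last_client

    def get_persist_state(self) -> Dict[str, Any]:
        # Only plain, serializable values go into the snapshot.
        return {
            "current_round": self.current_round,
            "last_client": self.last_client,
        }

    def restore(self, state_data: Dict[str, Any]) -> None:
        # Keep the current value for any key missing from the snapshot.
        self.current_round = state_data.get("current_round", self.current_round)
        self.last_client = state_data.get("last_client", self.last_client)


# Round trip through JSON to show the snapshot is serializable.
saved = json.dumps(RoundState(3, "site-2").get_persist_state())
resumed = RoundState()
resumed.restore(json.loads(saved))
```

After the round trip, `resumed` carries the same round counter and last client as the object that produced the snapshot.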
- start_controller(fl_ctx: FLContext)
Starts the controller.
This method is called at the beginning of the RUN.
- Parameters:
fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and register your admin command modules.
- stop_controller(fl_ctx: FLContext)
Stops the controller.
This method is called right before the RUN is ended.
- Parameters:
fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and unregister your admin command modules.