nvflare.app_common.workflows.statistics_controller module¶

class StatisticsController(statistic_configs: Dict[str, dict], writer_id: str, wait_time_after_min_received: int = 1, result_wait_timeout: int = 10, precision=4, min_clients: int | None = None, enable_pre_run_task: bool = True)[source]¶

Bases: Controller

Parameters:

statistic_configs –
defines the statistics to be computed and each statistic's configuration. Each key is one of the statistic names sum, count, mean, stddev, or histogram; the value holds the arguments that statistic needs. All statistics except histogram require no arguments.

    "statistic_configs": {
        "count": {},
        "mean": {},
        "sum": {},
        "stddev": {},
        "histogram": {
            "*": {"bins": 20},
            "Age": {"bins": 10, "range": [0, 120]}
        }
    },

Histogram requires the following arguments: 1) the number of bins (buckets) of the histogram; 2) the histogram range values [min, max]. These arguments are different for each feature. Here are a few examples:

    "histogram": {
        "*": {"bins": 20},
        "Age": {"bins": 10, "range": [0, 120]}
    }

This configuration specifies that the feature 'Age' has a histogram with 10 bins over the range [0, 120). All other features use the default ("*") configuration, with bins = 20. Because the default configuration does not specify a histogram range, the Statistics controller must dynamically estimate the range for each of those features. The estimated global range (est. global min, est. global max) is then used as the histogram range.
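As a worked illustration of the 'Age' configuration above, the sketch below computes bin edges for 10 bins over [0, 120]. It assumes equal-width bins; the helper name bin_edges is hypothetical and is not part of NVFlare.

```python
from typing import List, Sequence


def bin_edges(bins: int, value_range: Sequence[float]) -> List[float]:
    """Return bins + 1 equally spaced edges spanning value_range.

    Hypothetical helper for illustration only; equal-width bins are an
    assumption, not a statement about the library's implementation.
    """
    lo, hi = value_range
    if bins <= 0:
        raise ValueError("bins must be positive")
    if hi <= lo:
        raise ValueError("range max must be greater than range min")
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# The "Age" entry from the configuration: 10 bins over [0, 120].
age_edges = bin_edges(10, [0, 120])
print(age_edges)
```

Under this assumption each 'Age' bin is 12 units wide.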

To dynamically estimate the histogram range, each client must provide its local min and max values so that the global min and max values can be calculated. To protect data privacy and avoid data leaks, noise is added to the local min/max values before they are sent to the controller. The controller therefore only receives estimated values, and the resulting global min/max are estimates or, more accurately, noised global min/max values.
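The estimation described above can be sketched as follows. This is a minimal illustration, not the library's noise mechanism: the functions add_noise and estimate_global_range, the noise_level parameter, and the choice to push each bound outward by a random fraction are all assumptions made for this example.

```python
import random
from typing import List, Tuple


def add_noise(value: float, is_min: bool, noise_level: float, rng: random.Random) -> float:
    """Push a local bound outward by a random amount (hypothetical scheme).

    The offset is scaled by max(|value|, 1.0) so that a bound of 0 is
    also perturbed.
    """
    delta = max(abs(value), 1.0) * rng.uniform(0.0, noise_level)
    return value - delta if is_min else value + delta


def estimate_global_range(
    local_ranges: List[Tuple[float, float]], noise_level: float = 0.1, seed: int = 0
) -> Tuple[float, float]:
    """Combine noised local (min, max) pairs into an estimated global range."""
    rng = random.Random(seed)
    noised = [
        (add_noise(lo, True, noise_level, rng), add_noise(hi, False, noise_level, rng))
        for lo, hi in local_ranges
    ]
    # The controller sees only the noised values, never the true local bounds.
    est_min = min(lo for lo, _ in noised)
    est_max = max(hi for _, hi in noised)
    return est_min, est_max


# Three hypothetical clients reporting local (min, max) for one feature.
local_ranges = [(18, 65), (21, 90), (3, 77)]
est_min, est_max = estimate_global_range(local_ranges)
print(est_min, est_max)
```

Because the noise in this sketch only moves bounds outward, the estimated range always covers the true global range, at the cost of slightly wider histogram bins.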

Here is another example:

    "histogram": {
        "density": {"bins": 10, "range": [0, 120]}
    }

In this example, there is no default ("*") histogram configuration for other features.

This works correctly if the dataset has only one feature, called "density", but fails if the dataset contains any other features.
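The lookup behavior described above (a feature-specific entry first, then the "*" default, otherwise a failure) can be sketched as follows. The function resolve_histogram_config is hypothetical and only mirrors the documented behavior; the exact error the library raises is not specified here, so ValueError is an assumption.

```python
from typing import Dict


def resolve_histogram_config(histogram_config: Dict[str, dict], feature_name: str) -> dict:
    """Pick the histogram settings for one feature (hypothetical helper)."""
    if feature_name in histogram_config:
        return histogram_config[feature_name]
    if "*" in histogram_config:
        return histogram_config["*"]
    # No feature-specific entry and no default: the configuration is incomplete.
    raise ValueError(f"no histogram configuration for feature '{feature_name}'")


with_default = {"*": {"bins": 20}, "Age": {"bins": 10, "range": [0, 120]}}
without_default = {"density": {"bins": 10, "range": [0, 120]}}

print(resolve_histogram_config(with_default, "Age"))         # feature-specific entry
print(resolve_histogram_config(with_default, "Weight"))      # falls back to "*"
print(resolve_histogram_config(without_default, "density"))  # only configured feature

try:
    resolve_histogram_config(without_default, "Weight")
    missing_feature_failed = False
except ValueError:
    missing_feature_failed = True
```

Including a "*" entry is the simplest way to keep a configuration valid when the dataset gains new features.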

In the following configuration, only the count, mean, and stddev statistics are specified, so the statistics controller will only set tasks to calculate these three statistics:

    "statistic_configs": {
        "count": {},
        "mean": {},
        "stddev": {}
    }

writer_id – ID of the StatisticsWriter. The StatisticsWriter saves the result to the output that it specifies.
wait_time_after_min_received – number of seconds to wait after results from the minimum number of clients have been received.
result_wait_timeout – number of seconds to wait until all results have been received. Note that this wait starts after min_clients have arrived, while waiting for the result-processing callback; it becomes important when the amount of data to be processed is large.
precision – number of digits of precision.
min_clients – if specified, the minimum number of clients to wait for before processing.

control_flow(abort_signal: Signal, fl_ctx: FLContext)[source]¶

This is the control logic for the RUN.

NOTE: this is running in a separate thread, and its life is the duration of the RUN.

Parameters:

fl_ctx – the FL context
abort_signal – the abort signal. If triggered, this method stops waiting and returns to the caller.

handle_client_errors(rc: str, client_task: ClientTask, fl_ctx: FLContext)[source]¶

post_fn(task_name: str, fl_ctx: FLContext)[source]¶

pre_run_task_flow(abort_signal: Signal, fl_ctx: FLContext)[source]¶

process_result_of_unknown_task(client: Client, task_name: str, client_task_id: str, result: Shareable, fl_ctx: FLContext)[source]¶

Process result when no task is found for it.

This is called when a result submission is received from a client, but no standing task can be found for it (from the task queue)

This could happen when:
- the client’s submission is too late
- the task is already completed
- the Controller lost the task, e.g. the Server is restarted

Parameters:

client – the client that the result comes from
task_name – the name of the task
client_task_id – ID of the task
result – the result from the client
fl_ctx – the FL context that comes with the client’s submission

results_cb(client_task: ClientTask, fl_ctx: FLContext)[source]¶

results_pre_run_cb(client_task: ClientTask, fl_ctx: FLContext)[source]¶

start_controller(fl_ctx: FLContext)[source]¶

Starts the controller.

This method is called at the beginning of the RUN.

Parameters:

fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and register your admin command modules.

statistics_task_flow(abort_signal: Signal, fl_ctx: FLContext, statistic_task: str)[source]¶

stop_controller(fl_ctx: FLContext)[source]¶

Stops the controller.

This method is called right before the RUN is ended.

Parameters:

fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and unregister your admin command modules.