nvflare.app_common.workflows.statistics_controller module

class StatisticsController(statistic_configs: Dict[str, dict], writer_id: str, wait_time_after_min_received: int = 1, result_wait_timeout: int = 10, precision=4, min_clients: int | None = None, enable_pre_run_task: bool = True)

Bases: Controller

Controller for Statistics.

Parameters:

statistic_configs – defines the statistics to be computed and the configuration of each one; see below for details.
writer_id – ID of the StatisticsWriter component. The StatisticsWriter saves the result to the output it is configured with.
wait_time_after_min_received – number of seconds to wait after results from the minimum number of clients have been received.
result_wait_timeout – number of seconds to wait for all results to be received. This wait starts after min_clients have responded and covers the result-processing callback, so it becomes important when the data to be processed is large.
precision – number of precision digits
min_clients – if specified, the minimum number of clients whose results must be received before processing starts.

For statistic_configs, the key is the name of a statistic (count, sum, mean, stddev, histogram or quantile) and the value holds the arguments that statistic needs. Statistics other than histogram and quantile take no arguments, so their value is an empty dict.

"statistic_configs": {
    "count": {},
    "mean": {},
    "sum": {},
    "stddev": {},
    "histogram": {
        "*": {"bins": 20},
        "Age": {"bins": 10, "range": [0, 120]}
    },
    "quantile": {
        "*": [25, 50, 75, 90],
        "Age": [50, 75, 95]
    }
},
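Because statistic_configs is passed to the constructor as a Dict[str, dict], the same configuration can be written as a Python dict. The sketch below adds a small sanity check for the rules described above. The function check_statistic_configs and the KNOWN_STATISTICS set are hypothetical helpers, not part of NVFlare, and the assumption that quantiles are percentages in [0, 100] is inferred only from the example values.

```python
from typing import Any, Dict

# Statistic names taken from the example above (assumption: the real
# controller may accept others).
KNOWN_STATISTICS = {"count", "sum", "mean", "stddev", "histogram", "quantile"}


def check_statistic_configs(configs: Dict[str, Any]) -> None:
    """Hypothetical sanity check for a statistic_configs dict (not an NVFlare API)."""
    for name, args in configs.items():
        if name not in KNOWN_STATISTICS:
            raise ValueError(f"unknown statistic: {name!r}")
        if name == "histogram":
            # Each feature entry (or "*") needs "bins"; "range" is optional.
            for feature, spec in args.items():
                if "bins" not in spec:
                    raise ValueError(f"histogram config for {feature!r} needs 'bins'")
                value_range = spec.get("range")
                if value_range is not None and not value_range[0] < value_range[1]:
                    raise ValueError(f"histogram range for {feature!r} must have min < max")
        elif name == "quantile":
            # Assumption: quantiles are given as percentages, as in the example.
            for feature, percents in args.items():
                if not all(0 <= p <= 100 for p in percents):
                    raise ValueError(f"quantiles for {feature!r} must lie in [0, 100]")
        elif args:
            raise ValueError(f"{name!r} takes no arguments")


statistic_configs = {
    "count": {},
    "mean": {},
    "sum": {},
    "stddev": {},
    "histogram": {"*": {"bins": 20}, "Age": {"bins": 10, "range": [0, 120]}},
    "quantile": {"*": [25, 50, 75, 90], "Age": [50, 75, 95]},
}
check_statistic_configs(statistic_configs)  # passes silently for the example config
```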

Histogram takes the following arguments:

bins – the number of bins (buckets) in the histogram
range – the histogram range as [min, max]; this is optional, as explained below

These arguments can differ from feature to feature. For example:

"histogram": {
                "*": {"bins": 20 },
                "Age": {"bins": 10, "range":[0,120]}
             }

This configuration specifies that the feature ‘Age’ uses 10 bins within the range 0 to 120. All other features use the default (“*”) configuration, with bins = 20. Because no range is specified for these features, the StatisticsController has to estimate the histogram range of each of them dynamically; the estimated global range (estimated global min, estimated global max) is then used as the histogram range.
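To make the bins and range arguments concrete, here is a minimal sketch of counting values into a histogram, assuming equal-width buckets that are half-open except for the last one (which includes max), and assuming values outside the range are skipped. The helpers bin_edges and histogram_counts are hypothetical; NVFlare's own client-side computation may differ in these details.

```python
from typing import List, Sequence


def bin_edges(bins: int, value_range: Sequence[float]) -> List[float]:
    """Edges of `bins` equal-width buckets covering [min, max]."""
    lo, hi = value_range
    if bins <= 0 or not lo < hi:
        raise ValueError("need bins > 0 and min < max")
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins)] + [float(hi)]


def histogram_counts(values: Sequence[float], bins: int, value_range: Sequence[float]) -> List[int]:
    """Count values per bucket: [edge_i, edge_i+1), except the last bucket also includes max.

    Values outside [min, max] are skipped (an assumption made for this sketch).
    """
    edges = bin_edges(bins, value_range)  # also validates the arguments
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # min(...) places v == max into the last bucket instead of a non-existent one.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


# The 'Age' entry from the example: {"bins": 10, "range": [0, 120]}
print(bin_edges(10, [0, 120]))  # buckets of width 12.0
print(histogram_counts([5, 12, 30, 119.9, 120, 130], 10, [0, 120]))
```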

To estimate the histogram range dynamically, each client must provide its local min and max values so that the controller can compute the global min and max values. To protect data privacy and avoid data leakage, noise is added to the local min/max values before they are sent to the controller. The controller therefore only receives estimated values, and the resulting global min/max are estimates; more precisely, they are noised global min/max values.
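The exchange described above can be sketched as follows. The noise scheme here is an assumption chosen for illustration, not necessarily the one NVFlare uses: each client pushes its local min down and its local max up by a random fraction of their magnitude, so the exact extremes are hidden but the noised range still covers the local data. The controller then takes the smallest noised min and the largest noised max. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def noised_local_range(values: Sequence[float], rng: random.Random,
                       low: float = 0.01, high: float = 0.05) -> Tuple[float, float]:
    """Client side: return (noised_min, noised_max) for one feature.

    Hypothetical noise scheme: each bound moves outward by a random fraction
    (between `low` and `high`) of its magnitude.
    """
    local_min, local_max = min(values), max(values)
    # max(abs(...), 1.0) guarantees a shift of at least `low` even for bounds near 0.
    min_shift = rng.uniform(low, high) * max(abs(local_min), 1.0)
    max_shift = rng.uniform(low, high) * max(abs(local_max), 1.0)
    return local_min - min_shift, local_max + max_shift


def estimate_global_range(client_ranges: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Controller side: the estimated global range spans all noised client ranges."""
    return min(lo for lo, _ in client_ranges), max(hi for _, hi in client_ranges)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    ages_per_site = {"site-1": [23.0, 35.5, 61.0], "site-2": [18.0, 44.0, 79.5]}
    ranges = [noised_local_range(ages, rng) for ages in ages_per_site.values()]
    # Slightly wider than the true global range (18.0, 79.5).
    print(estimate_global_range(ranges))
```

Pushing both bounds outward, rather than adding zero-mean noise, is a design choice of this sketch: it keeps every real value inside the estimated histogram range.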

Here is another example:

"histogram": {
                "density": {"bins": 10, "range":[0,120]}
             }

In this example, there is no default histogram configuration for other features.

This will work correctly if there is only one feature called “density” but will fail if there are other features in the dataset.
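The lookup rule behind these two examples can be sketched as a small resolver: a feature-specific entry is used if present, otherwise the “*” default, and if neither exists the configuration cannot be resolved. The function resolve_histogram_config and its use of KeyError are hypothetical; they only illustrate why the “density”-only configuration fails for other features.

```python
from typing import Any, Dict


def resolve_histogram_config(histogram_cfg: Dict[str, Dict[str, Any]], feature: str) -> Dict[str, Any]:
    """Hypothetical lookup: the feature-specific entry wins, then the "*" default."""
    if feature in histogram_cfg:
        return histogram_cfg[feature]
    if "*" in histogram_cfg:
        return histogram_cfg["*"]
    # No entry for this feature and no default: this is the failure case above.
    raise KeyError(f"no histogram config for feature {feature!r} and no '*' default")


with_default = {"*": {"bins": 20}, "Age": {"bins": 10, "range": [0, 120]}}
density_only = {"density": {"bins": 10, "range": [0, 120]}}

print(resolve_histogram_config(with_default, "Income"))  # falls back to {"bins": 20}
print(resolve_histogram_config(density_only, "density"))
```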

In the following configuration:

"statistic_configs": {
    "count": {},
    "mean": {},
    "stddev": {}
}

Only count, mean and stddev are specified, so the StatisticsController will only issue tasks to compute these three statistics.
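As an illustration of why these three statistics can be requested together, here is one common way to combine them from per-client partial results. It assumes a two-round exchange (clients first report count and sum; then, given the global mean, each reports its sum of squared deviations) and a sample (n - 1) denominator for stddev. Both are assumptions of this sketch, not a description of how StatisticsController schedules its tasks, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def global_count_and_mean(partials: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Combine per-client (count, sum) pairs into the global count and mean."""
    total_count = sum(c for c, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_count, total_sum / total_count


def global_stddev(sq_dev_sums: List[float], total_count: int) -> float:
    """Combine per-client sums of squared deviations from the *global* mean.

    Uses the sample (n - 1) denominator; this choice is an assumption.
    """
    if total_count < 2:
        raise ValueError("need at least two values for a sample stddev")
    return math.sqrt(sum(sq_dev_sums) / (total_count - 1))


if __name__ == "__main__":
    client_data = [[1.0, 2.0, 3.0], [4.0, 5.0]]
    # Round 1: each client reports only its local count and sum.
    count, mean = global_count_and_mean([(len(d), sum(d)) for d in client_data])
    # Round 2: the global mean is sent back; each client reports sum((x - mean) ** 2).
    sq_devs = [sum((x - mean) ** 2 for x in d) for d in client_data]
    print(count, mean, global_stddev(sq_devs, count))
```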

control_flow(abort_signal: Signal, fl_ctx: FLContext)

This is the control logic for the RUN.

NOTE: this method runs in a separate thread, and its lifetime is the duration of the RUN.

Parameters:

fl_ctx – the FL context
abort_signal – the abort signal. If triggered, this method stops waiting and returns to the caller.

handle_client_errors(rc: str, client_task: ClientTask, fl_ctx: FLContext)

post_fn(task_name: str, fl_ctx: FLContext)

pre_run_task_flow(abort_signal: Signal, fl_ctx: FLContext)

process_result_of_unknown_task(client: Client, task_name: str, client_task_id: str, result: Shareable, fl_ctx: FLContext)

Process result when no task is found for it.

This is called when a result submission is received from a client, but no standing task can be found for it in the task queue.

This could happen when:

the client’s submission is too late
the task is already completed
the Controller lost the task, e.g. the Server was restarted

Parameters:

client – the client that the result comes from
task_name – the name of the task
client_task_id – ID of the task
result – the result from the client
fl_ctx – the FL context that comes with the client’s submission

results_cb(client_task: ClientTask, fl_ctx: FLContext)

results_pre_run_cb(client_task: ClientTask, fl_ctx: FLContext)

start_controller(fl_ctx: FLContext)

Starts the controller.

This method is called at the beginning of the RUN.

Parameters:

fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and register your admin command modules.

statistics_task_flow(abort_signal: Signal, fl_ctx: FLContext, statistic_task: str)

stop_controller(fl_ctx: FLContext)

Stops the controller.

This method is called right before the RUN is ended.

Parameters:

fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and unregister your admin command modules.