nvflare.app_common.workflows.statistics_controller module

class StatisticsController(statistic_configs: Dict[str, dict], writer_id: str, wait_time_after_min_received: int = 1, result_wait_timeout: int = 10, precision=4, min_clients: int | None = None, enable_pre_run_task: bool = True)

Bases: Controller

Parameters:
  • statistic_configs

    Defines the statistics to be computed and each statistic’s configuration. Each key is a statistic name (one of sum, count, mean, stddev, histogram) and each value holds the arguments that statistic needs. All statistics except histogram require no arguments.

    "statistic_configs": {
        "count": {},
        "mean": {},
        "sum": {},
        "stddev": {},
        "histogram": {
            "*": {"bins": 20},
            "Age": {"bins": 10, "range": [0, 120]}
        }
    },

    Histogram requires the following arguments: 1) the number of bins or buckets of the histogram; 2) the histogram range values [min, max]. These arguments are different for each feature. Here are a few examples:

    "histogram": {
        "*": {"bins": 20},
        "Age": {"bins": 10, "range": [0, 120]}
    }

    This configuration specifies that feature ‘Age’ will have 10 histogram bins with a range of [0, 120). For all other features, the default ("*") configuration is used, with bins = 20. Because the default does not specify a histogram range, the statistics controller has to dynamically estimate the histogram range for each of those features. The estimated global range (estimated global min, estimated global max) is then used as the histogram range.

    To dynamically estimate such a histogram range, each client needs to provide its local min and max values so that the global min and max values can be calculated. To protect data privacy and avoid data leaks, noise is added to the local min/max values before they are sent to the controller. The controller therefore only gets ‘estimated’ values: the global min/max are estimates, or more accurately, noised global min/max values.

    Here is another example:

    "histogram": {
        "density": {"bins": 10, "range": [0, 120]}
    }

    In this example, there is no default histogram configuration for other features. This works correctly if the dataset has only one feature, called "density", but fails if the dataset contains other features.

    In the following configuration:

    "statistic_configs": {
        "count": {},
        "mean": {},
        "stddev": {}
    }

    only the count, mean and stddev statistics are specified, so the statistics controller will only set tasks to calculate these three statistics.

  • writer_id – ID of the StatisticsWriter. The StatisticsWriter saves the result to the output that it specifies.

  • wait_time_after_min_received – number of seconds to wait after responses have been received from the specified minimum number of clients.

  • result_wait_timeout – number of seconds to wait until all results have been received. Note that this wait starts after the min_clients have arrived, while we wait for the result-processing callback. It becomes important when the data to be processed is large.

  • precision – number of precision digits

  • min_clients – if specified, the minimum number of clients to wait for before processing.
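The per-feature lookup described for the histogram configuration can be sketched in plain Python. This is an illustration only, not NVFlare's implementation: the helpers `resolve_histogram_config` and `estimate_global_range`, the feature name "Income", and the numeric values are all hypothetical. The sketch also assumes that the global range is simply the smallest noised local min and the largest noised local max.

```python
from typing import Dict, List, Tuple

# Example value for the statistic_configs argument, taken from the description above.
statistic_configs: Dict[str, dict] = {
    "count": {},
    "mean": {},
    "sum": {},
    "stddev": {},
    "histogram": {
        "*": {"bins": 20},
        "Age": {"bins": 10, "range": [0, 120]},
    },
}


def resolve_histogram_config(hist_configs: Dict[str, dict], feature: str) -> dict:
    """Hypothetical helper: use the feature's own entry, else the "*" default."""
    if feature in hist_configs:
        return hist_configs[feature]
    if "*" in hist_configs:
        return hist_configs["*"]
    raise KeyError(f"no histogram configuration for feature '{feature}' and no '*' default")


def estimate_global_range(noised_local_ranges: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Hypothetical helper: combine already-noised local (min, max) pairs.

    Assumption: the global min is the smallest local min and the global max
    is the largest local max.
    """
    global_min = min(low for low, _ in noised_local_ranges)
    global_max = max(high for _, high in noised_local_ranges)
    return global_min, global_max


hist = statistic_configs["histogram"]
age_cfg = resolve_histogram_config(hist, "Age")        # explicit bins and range
income_cfg = resolve_histogram_config(hist, "Income")  # falls back to "*", no range

# "Income" has no configured range, so one is estimated from made-up noised client values.
income_range = income_cfg.get("range") or estimate_global_range([(9.5, 101.0), (12.0, 130.5)])
```

With a configuration that contains only a "density" entry and no "*" default, the same lookup raises `KeyError` for any other feature, which mirrors the failure case described above.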

control_flow(abort_signal: Signal, fl_ctx: FLContext)

This is the control logic for the RUN.

NOTE: this is running in a separate thread, and its life is the duration of the RUN.

Parameters:
  • fl_ctx – the FL context

  • abort_signal – the abort signal. If triggered, this method stops waiting and returns to the caller.
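The abort behaviour described above (stop waiting and return once the signal is triggered) follows a common polling pattern. The sketch below is a generic illustration, not the controller's actual code: `FakeSignal` is a hypothetical stand-in that assumes nothing more than a boolean `triggered` flag, and `wait_for_results` is a made-up helper name.

```python
import time
from typing import Callable


class FakeSignal:
    """Hypothetical stand-in for the abort signal; only a `triggered` flag is assumed."""

    def __init__(self) -> None:
        self.triggered = False

    def trigger(self) -> None:
        self.triggered = True


def wait_for_results(
    abort_signal: FakeSignal,
    is_done: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.01,
) -> str:
    """Poll until results are ready, the timeout expires, or the abort signal fires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if abort_signal.triggered:
            return "aborted"  # stop waiting and hand control back to the caller
        if is_done():
            return "done"
        time.sleep(poll_interval)
    return "timeout"
```

Checking the signal on every iteration keeps the wait responsive: the loop returns within one polling interval of the abort instead of running until the timeout.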

handle_client_errors(rc: str, client_task: ClientTask, fl_ctx: FLContext)
post_fn(task_name: str, fl_ctx: FLContext)
pre_run_task_flow(abort_signal: Signal, fl_ctx: FLContext)
process_result_of_unknown_task(client: Client, task_name: str, client_task_id: str, result: Shareable, fl_ctx: FLContext)

Process result when no task is found for it.

This is called when a result submission is received from a client, but no standing task can be found for it (from the task queue)

This could happen when:
  - the client’s submission is too late
  - the task is already completed
  - the Controller lost the task, e.g. the Server is restarted

Parameters:
  • client – the client that the result comes from

  • task_name – the name of the task

  • client_task_id – ID of the task

  • result – the result from the client

  • fl_ctx – the FL context that comes with the client’s submission

results_cb(client_task: ClientTask, fl_ctx: FLContext)
results_pre_run_cb(client_task: ClientTask, fl_ctx: FLContext)
start_controller(fl_ctx: FLContext)

Starts the controller.

This method is called at the beginning of the RUN.

Parameters:
  • fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and register your admin command modules.

statistics_task_flow(abort_signal: Signal, fl_ctx: FLContext, statistic_task: str)
stop_controller(fl_ctx: FLContext)

Stops the controller.

This method is called right before the RUN is ended.

Parameters:
  • fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and unregister your admin command modules.