nvflare.app_common.abstract.statistics_spec module

class Bin(low_value, high_value, sample_count)[source]

Bases: tuple

Create new instance of Bin(low_value, high_value, sample_count)

high_value: float

Alias for field number 1

low_value: float

Alias for field number 0

sample_count: float

Alias for field number 2

class BinRange(min_value, max_value)[source]

Bases: tuple

Create new instance of BinRange(min_value, max_value)

max_value: float

Alias for field number 1

min_value: float

Alias for field number 0

class DataType(value)[source]

Bases: IntEnum

An enumeration.

BYTES = 3
DATETIME = 5
FLOAT = 1
INT = 0
STRING = 2
STRUCT = 4
class Feature(feature_name, data_type)[source]

Bases: tuple

Create new instance of Feature(feature_name, data_type)

data_type: DataType

Alias for field number 1

feature_name: str

Alias for field number 0

class Histogram(hist_type, bins, hist_name)[source]

Bases: tuple

Create new instance of Histogram(hist_type, bins, hist_name)

bins: List[Bin]

Alias for field number 1

hist_name: str | None

Alias for field number 2

hist_type: HistogramType

Alias for field number 0

class HistogramType(value)[source]

Bases: IntEnum

An enumeration.

QUANTILES = 1
STANDARD = 0
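The Bin, Histogram, and HistogramType definitions above can be combined as in the following minimal sketch. The NamedTuple and IntEnum classes here are local stand-ins that mirror the fields documented on this page, so the snippet runs without NVFlare installed; in real code they would be imported from nvflare.app_common.abstract.statistics_spec. The feature name "age" and the counts are made up for illustration.

```python
from enum import IntEnum
from typing import List, NamedTuple, Optional


# Local stand-ins mirroring the documented fields (assumption: real code
# imports these from nvflare.app_common.abstract.statistics_spec).
class HistogramType(IntEnum):
    STANDARD = 0
    QUANTILES = 1


class Bin(NamedTuple):
    low_value: float
    high_value: float
    sample_count: float


class Histogram(NamedTuple):
    hist_type: HistogramType
    bins: List[Bin]
    hist_name: Optional[str]


# Made-up counts for three equal-width bins covering the range 0 to 30.
bins = [Bin(0.0, 10.0, 4), Bin(10.0, 20.0, 7), Bin(20.0, 30.0, 2)]
hist = Histogram(HistogramType.STANDARD, bins, "age")

# Fields are reachable by name or by tuple position (field number).
total = sum(b.sample_count for b in hist.bins)
first_low, first_high, first_count = hist[1][0]
```

Because these types are plain tuples, they unpack and compare like any other tuple, which is what the "Alias for field number" entries describe.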
class StatisticConfig(name, config)[source]

Bases: tuple

Create new instance of StatisticConfig(name, config)

config: dict

Alias for field number 1

name: str

Alias for field number 0

class Statistics[source]

Bases: InitFinalComponent, ABC

Init FLComponent.

The FLComponent is the base class of all FL Components. (executors, controllers, responders, filters, aggregators, and widgets are all FLComponents)

FLComponents have the capability to handle and fire events and contain various methods for logging.

abstract count(dataset_name: str, feature_name: str) → int[source]

Returns the record count for the given dataset and feature.

The count is always required because it is used for the data privacy min_count check.

Parameters:
  • dataset_name

  • feature_name

Returns: number of total records

Raises:

NotImplementedError
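A minimal sketch of what a count implementation might look like, written as a standalone function rather than a Statistics subclass so that it runs without NVFlare. The DATA dictionary and its contents are hypothetical; a real implementation would typically read from whatever data source was loaded in initialize().

```python
from typing import Dict, List

# Hypothetical in-memory data: dataset name -> feature name -> values.
DATA: Dict[str, Dict[str, List[float]]] = {
    "train": {"age": [23.0, 35.0, 41.0, 58.0], "income": [30.5, 42.0, 51.2, 77.9]},
    "test": {"age": [29.0, 62.0], "income": [38.0, 64.3]},
}


def count(dataset_name: str, feature_name: str) -> int:
    # Number of records available locally for this dataset/feature pair.
    return len(DATA[dataset_name][feature_name])


train_age_count = count("train", "age")
test_income_count = count("test", "income")
```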

failure_count(dataset_name: str, feature_name: str) → int[source]

Return failed count for given dataset and feature.

To perform data privacy min_count check, failure_count is always required.

Parameters:
  • dataset_name

  • feature_name

Returns: number of failed records; defaults to 0

abstract features() → Dict[str, List[Feature]][source]

Returns the Features for each dataset.

For example, if we have training and test datasets, the method will return {“train”: features1, “test”: features2}, where features1 and features2 are lists of Feature objects, each containing a feature name and a DataType.

Returns: Dict[<dataset_name>, List[Feature]]

Raises:

NotImplementedError
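The return shape described above can be sketched as follows. Feature and DataType are local stand-ins mirroring the definitions on this page (in real code they would be imported from nvflare.app_common.abstract.statistics_spec), and the column names "age" and "income" are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, NamedTuple


# Local stand-ins mirroring the documented definitions.
class DataType(IntEnum):
    INT = 0
    FLOAT = 1
    STRING = 2
    BYTES = 3
    STRUCT = 4
    DATETIME = 5


class Feature(NamedTuple):
    feature_name: str
    data_type: DataType


def features() -> Dict[str, List[Feature]]:
    # Hypothetical schema: both datasets expose the same two columns.
    columns = [Feature("age", DataType.INT), Feature("income", DataType.FLOAT)]
    return {"train": columns, "test": list(columns)}


result = features()
train_names = [f.feature_name for f in result["train"]]
```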

finalize(fl_ctx: FLContext)[source]

Called to finalize the Statistics calculator (close/release resources gracefully).

After this call, the Statistics calculator will be destroyed.

histogram(dataset_name: str, feature_name: str, num_of_bins: int, global_min_value: float, global_max_value: float) → Histogram[source]

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

  • num_of_bins – number of bins or buckets

  • global_min_value – global min value for the histogram range

  • global_max_value – global max value for the histogram range

Returns: histogram

Raises:

NotImplementedError – raised when the histogram statistic is configured but this method is not implemented. If the histogram is not configured to be calculated, there is no need to implement this method and NotImplementedError will not be raised.
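One way to compute the bins for this method is sketched below, assuming equal-width bins over the supplied global range; the page does not prescribe a binning scheme, so this is an illustration only. Bin is a local stand-in for the tuple documented above, and equal_width_bins is a hypothetical helper, not part of NVFlare. Using the same global range at every site presumably keeps the bin edges identical across sites, so that counts can be combined bin by bin.

```python
from typing import List, NamedTuple, Sequence


class Bin(NamedTuple):  # local stand-in for the documented Bin tuple
    low_value: float
    high_value: float
    sample_count: float


def equal_width_bins(
    values: Sequence[float],
    num_of_bins: int,
    global_min_value: float,
    global_max_value: float,
) -> List[Bin]:
    if num_of_bins <= 0 or global_max_value <= global_min_value:
        raise ValueError("need num_of_bins > 0 and global_max_value > global_min_value")
    width = (global_max_value - global_min_value) / num_of_bins
    counts = [0] * num_of_bins
    for v in values:
        if v < global_min_value or v > global_max_value:
            continue  # outside the agreed range; skipped in this sketch
        # Clamp so that v == global_max_value lands in the last bin.
        idx = min(int((v - global_min_value) / width), num_of_bins - 1)
        counts[idx] += 1
    return [
        Bin(global_min_value + i * width, global_min_value + (i + 1) * width, counts[i])
        for i in range(num_of_bins)
    ]


local_bins = equal_width_bins([1.0, 4.0, 5.0, 9.5, 10.0], 2, 0.0, 10.0)
```

In a real implementation the resulting list would be wrapped in a Histogram together with a HistogramType and an optional name.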

initialize(fl_ctx: FLContext)[source]

This is called when the client starts the run. At this point the server has not yet communicated with the Statistics calculator.

Parameters:

fl_ctx – FLContext of the running environment

max_value(dataset_name: str, feature_name: str) → float[source]

Returns the max value.

This method is only needed when the “histogram” statistic is configured and the histogram range is not specified, so that the range must be estimated dynamically from the clients’ local min/max values. This method returns the local max value. The actual max value is not returned directly to the FL server; the data privacy policy adds noise to the estimated value.

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

Returns: local max value

Raises:
  • NotImplementedError – raised when the histogram statistic is configured, the histogram range for the given feature is not specified, and this method is not implemented. If the histogram is not configured to be calculated, or the histogram range for the given feature is already specified, there is no need to implement this method and NotImplementedError will not be raised.

mean(dataset_name: str, feature_name: str) → float[source]

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

Returns: mean (average) value

Raises:
  • NotImplementedError – raised when the mean statistic is configured but this method is not implemented. If the mean is not configured to be calculated, there is no need to implement this method and NotImplementedError will not be raised.

min_value(dataset_name: str, feature_name: str) → float[source]

Returns the min value.

This method is only needed when the “histogram” statistic is configured and the histogram range is not specified, so that the range must be estimated dynamically from the clients’ local min/max values. This method returns the local min value. The actual min value is not returned directly to the FL server; the data privacy policy adds noise to the estimated value.

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

Returns: local min value

Raises:
  • NotImplementedError – raised when the histogram statistic is configured, the histogram range for the given feature is not specified, and this method is not implemented. If the histogram is not configured to be calculated, or the histogram range for the given feature is already specified, there is no need to implement this method and NotImplementedError will not be raised.
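The local min_value and max_value computations can be as simple as the following sketch, written as standalone functions over a hypothetical in-memory DATA dictionary. No noise is added here: per the descriptions above, that is left to the data privacy policy rather than to these methods.

```python
from typing import Dict, List

# Hypothetical in-memory data: dataset name -> feature name -> values.
DATA: Dict[str, Dict[str, List[float]]] = {
    "train": {"age": [23.0, 35.0, 41.0, 58.0]},
}


def min_value(dataset_name: str, feature_name: str) -> float:
    # Exact local minimum; any privacy noise is applied elsewhere.
    return min(DATA[dataset_name][feature_name])


def max_value(dataset_name: str, feature_name: str) -> float:
    # Exact local maximum; any privacy noise is applied elsewhere.
    return max(DATA[dataset_name][feature_name])


local_range = (min_value("train", "age"), max_value("train", "age"))
```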

pre_run(statistics: List[str], num_of_bins: Dict[str, int | None] | None, bin_ranges: Dict[str, List[float] | None] | None)[source]

This method is the initial handshake, in which the controller passes all the requested statistics configuration to the client.

Invocation of this method is optional and is configured via a controller argument. If configured, this method is called before all other statistic calculation methods.

Parameters:
  • statistics – list of statistics to be calculated (count, sum, etc.).

  • num_of_bins – if the histogram statistic is required, num_of_bins will be specified for each feature. “*” stands for the default feature. A None value means the number of bins for that feature is not specified.

  • bin_ranges – if the histogram statistic is required, bin_ranges for the features may be provided. If bin_ranges is None, no bin range is provided for any feature. If bin_ranges is not None but bin_ranges[‘feature_A’] is None, the user did not provide a bin range for ‘feature_A’.

Returns: Dict
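To make the argument shapes concrete, the sketch below shows values that match the documented types, plus two hypothetical helpers that interpret them. The feature names, the numbers, and the fallback to the “*” entry when a feature's own value is None are assumptions made for illustration; the page does not spell out how an implementation should resolve them.

```python
from typing import Dict, List, Optional

# Example arguments shaped like the documented pre_run parameters.
statistics: List[str] = ["count", "sum", "mean", "stddev", "histogram"]
num_of_bins: Optional[Dict[str, Optional[int]]] = {"*": 10, "age": 5, "income": None}
bin_ranges: Optional[Dict[str, Optional[List[float]]]] = {
    "age": [0.0, 120.0],
    "income": None,
}


def resolve_bins(feature_name: str) -> Optional[int]:
    # Hypothetical helper: prefer the feature's own entry, else the "*" default.
    if not num_of_bins:
        return None
    specific = num_of_bins.get(feature_name)
    return specific if specific is not None else num_of_bins.get("*")


def needs_local_range(feature_name: str) -> bool:
    # Hypothetical helper: with no range supplied for a feature, the local
    # min_value/max_value methods would be needed to estimate one.
    return bin_ranges is None or bin_ranges.get(feature_name) is None


age_bins = resolve_bins("age")
income_bins = resolve_bins("income")
```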

stddev(dataset_name: str, feature_name: str) → float[source]

Get local stddev value for given dataset and feature.

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

Returns: local standard deviation

Raises:
  • NotImplementedError – raised when the stddev statistic is configured but this method is not implemented. If the stddev is not configured to be calculated, there is no need to implement this method and NotImplementedError will not be raised.

sum(dataset_name: str, feature_name: str) → float[source]

Calculate local sums for given dataset and feature.

Parameters:
  • dataset_name

  • feature_name

Returns: sum of all records

Raises:
  • NotImplementedError – raised when the sum statistic is configured but this method is not implemented. If the sum is not configured to be calculated, there is no need to implement this method and NotImplementedError will not be raised.

variance_with_mean(dataset_name: str, feature_name: str, global_mean: float, global_count: float) → float[source]

Calculate the variance with the given mean and count values.

This is not the local variance based on the local mean. The calculation should be:

m = global mean
N = global count
variance = sum((x - m)^2) / (N - 1)

This is used to calculate the global standard deviation. Therefore, this method must be implemented if the stddev statistic is requested.

Parameters:
  • dataset_name – dataset name

  • feature_name – feature name

  • global_mean – global mean value

  • global_count – total count records across all sites

Returns: variance result

Raises:
  • NotImplementedError – raised when the stddev statistic is configured but this method is not implemented. If the stddev is not configured to be calculated, there is no need to implement this method and NotImplementedError will not be raised.
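The formula above can be checked with a small worked example. The sketch assumes two sites with made-up values and assumes the per-site results are combined by simple addition; under that assumption the sum equals the sample variance of the pooled data, and its square root is the global standard deviation. The standalone function below takes the values directly instead of dataset and feature names so that it runs without NVFlare.

```python
import math
import statistics
from typing import Sequence

# Made-up local data held by two sites.
site_a = [2.0, 4.0, 4.0]
site_b = [5.0, 5.0, 7.0, 9.0]


def variance_with_mean(values: Sequence[float], global_mean: float, global_count: float) -> float:
    # Local share of the global variance: squared deviations from the
    # *global* mean, divided by the *global* (N - 1).
    return sum((x - global_mean) ** 2 for x in values) / (global_count - 1)


global_count = len(site_a) + len(site_b)
global_mean = (sum(site_a) + sum(site_b)) / global_count

# Assumption: the per-site values are added together to form the global variance.
global_variance = variance_with_mean(site_a, global_mean, global_count) + variance_with_mean(
    site_b, global_mean, global_count
)
global_stddev = math.sqrt(global_variance)
```

Dividing by the global N - 1 at each site, rather than by the local count, is what makes the per-site values additive.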