nvflare.app_opt.statistics.df.df_core_statistics module
- class DFStatisticsCore(max_bin=None)
Bases: Statistics, ABC
Init FLComponent.
The FLComponent is the base class of all FL components (executors, controllers, responders, filters, aggregators, and widgets are all FLComponents).
FLComponents can handle and fire events, and they provide various methods for logging.
- count(dataset_name: str, feature_name: str) → int
- Returns the record count for the given dataset and feature.
The count is always required, because it is used for the data privacy min_count check.
- Parameters:
dataset_name
feature_name
Returns: total number of records
- Raises:
NotImplementedError
- features() → Dict[str, List[Feature]]
Returns the Features for each dataset.
For example, given training and test datasets, the method returns {"train": features1, "test": features2}, where features1 and features2 are lists of Feature objects, each holding a feature name and a DataType.
Returns: Dict[<dataset_name>, List[Feature]]
- Raises:
NotImplementedError
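The shape of the returned mapping can be sketched as follows. This is a minimal illustration only: the `Feature` and `DataType` classes below are hypothetical stand-ins defined locally so the snippet runs on its own, and the column names are invented.

```python
from enum import Enum
from typing import Dict, List, NamedTuple


class DataType(Enum):
    """Hypothetical stand-in for the library's DataType."""
    INT = 0
    FLOAT = 1
    STRING = 2


class Feature(NamedTuple):
    """Hypothetical stand-in: a feature name paired with its data type."""
    feature_name: str
    data_type: DataType


def features() -> Dict[str, List[Feature]]:
    # Invented columns; both datasets share the same schema in this example.
    cols = [Feature("age", DataType.INT), Feature("income", DataType.FLOAT)]
    return {"train": cols, "test": list(cols)}


result = features()
```

Each key is a dataset name, and each value lists the features available in that dataset.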
- histogram(dataset_name: str, feature_name: str, num_of_bins: int, global_min_value: float, global_max_value: float) → Histogram
- Parameters:
dataset_name – dataset name
feature_name – feature name
num_of_bins – number of bins or buckets
global_min_value – global min value for the histogram range
global_max_value – global max value for the histogram range
Returns: histogram
- Raises:
NotImplementedError – raised when the histogram statistic is configured but not implemented. If the histogram is not configured to be calculated, there is no need to implement this method, and NotImplementedError will not be raised.
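To illustrate how the parameters fit together, here is a minimal sketch of a local histogram over the global range. It assumes equal-width bins and returns plain (low, high, count) tuples; both choices are assumptions made for this example rather than a description of the library's Histogram type or its binning rules.

```python
from typing import List, Tuple


def local_histogram(
    values: List[float],
    num_of_bins: int,
    global_min_value: float,
    global_max_value: float,
) -> List[Tuple[float, float, int]]:
    """Count values in equal-width bins spanning the global range.

    Returns (low, high, count) tuples; this layout is a stand-in for
    illustration, not the library's Histogram type.
    """
    if num_of_bins <= 0:
        raise ValueError("num_of_bins must be positive")
    if global_max_value <= global_min_value:
        raise ValueError("global_max_value must exceed global_min_value")

    width = (global_max_value - global_min_value) / num_of_bins
    counts = [0] * num_of_bins
    for v in values:
        if v < global_min_value or v > global_max_value:
            continue  # outside the agreed range; skipped in this sketch
        # Clamp so that the global max falls into the last bin.
        idx = min(int((v - global_min_value) / width), num_of_bins - 1)
        counts[idx] += 1

    return [
        (global_min_value + i * width, global_min_value + (i + 1) * width, counts[i])
        for i in range(num_of_bins)
    ]


site_values = [1.0, 2.5, 4.0, 9.9, 10.0]
bins = local_histogram(site_values, num_of_bins=5, global_min_value=0.0, global_max_value=10.0)
```

Because every site in this sketch uses the same range and bin count, the bin edges line up and per-bin counts from different sites can be added directly.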
- max_value(dataset_name: str, feature_name: str) → float
This is needed for the histogram calculation; it is not used for reporting.
- mean(dataset_name: str, feature_name: str) → float
- Parameters:
dataset_name – dataset name
feature_name – feature name
Returns: mean (average) value
- Raises:
NotImplementedError – raised when the mean statistic is configured but not implemented. If the mean is not configured to be calculated, there is no need to implement this method, and NotImplementedError will not be raised.
- min_value(dataset_name: str, feature_name: str) → float
This is needed for the histogram calculation; it is not used for reporting.
- quantiles(dataset_name: str, feature_name: str, percents: List) → Dict
Returns the quantiles for the given dataset and feature.
- Parameters:
dataset_name
feature_name
percents – List[Int], e.g. [25, 50, 75], corresponding to p25, p50, p75
Returns: dict
- stddev(dataset_name: str, feature_name: str) → float
Get local stddev value for given dataset and feature.
- Parameters:
dataset_name – dataset name
feature_name – feature name
Returns: local standard deviation
- Raises:
NotImplementedError – raised when the stddev statistic is configured but not implemented. If the stddev is not configured to be calculated, there is no need to implement this method, and NotImplementedError will not be raised.
- sum(dataset_name: str, feature_name: str) → float
Calculate local sums for given dataset and feature.
- Parameters:
dataset_name
feature_name
Returns: sum of all records
- Raises:
NotImplementedError – raised when the sum statistic is configured but not implemented. If the sum is not configured to be calculated, there is no need to implement this method, and NotImplementedError will not be raised.
- variance_with_mean(dataset_name: str, feature_name: str, global_mean: float, global_count: float) → float
Calculate the variance with the given mean and count values.
This is not the local variance based on the local mean. The calculation should be:
m = global mean
N = global count
variance = sum((x - m)^2) / (N - 1)
This is used to calculate the global standard deviation. Therefore, this method must be implemented if the stddev statistic is requested.
- Parameters:
dataset_name – dataset name
feature_name – feature name
global_mean – global mean value
global_count – total count records across all sites
Returns: variance result
- Raises:
NotImplementedError – raised when the stddev statistic is configured but not implemented. If the stddev is not configured to be calculated, there is no need to implement this method, and NotImplementedError will not be raised.
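The formula above can be checked with a small pure-Python sketch. It operates on plain lists instead of named datasets and features, the two sites and their values are hypothetical, and it assumes the per-site results are combined by simple addition.

```python
import math
import statistics
from typing import List


def variance_with_mean(values: List[float], global_mean: float, global_count: float) -> float:
    """Local share of the global variance: sum((x - m)^2) / (N - 1)."""
    if global_count <= 1:
        raise ValueError("global_count must be greater than 1")
    return sum((x - global_mean) ** 2 for x in values) / (global_count - 1)


# Two hypothetical sites holding parts of the same feature.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0, 6.0, 7.0]

global_count = len(site_a) + len(site_b)
global_mean = (sum(site_a) + sum(site_b)) / global_count

# Adding the per-site results yields the sample variance of the pooled data.
global_variance = sum(
    variance_with_mean(site, global_mean, global_count) for site in (site_a, site_b)
)
global_stddev = math.sqrt(global_variance)
```

Each site divides by the global N - 1 rather than by its own count, which is why the per-site values add up to the pooled sample variance and why the result differs from a local variance computed around the local mean.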