fluke.evaluation

This module contains the definition of the evaluation classes used to evaluate models both client-side and server-side.

Evaluator

This class is the base class for all evaluators in fluke.

ClassificationEval

Evaluate a PyTorch model for classification.

interface fluke.evaluation.Evaluator

class fluke.evaluation.Evaluator(eval_every: int = 1)[source]

Bases: ABC

This class is the base class for all evaluators in fluke. An evaluator object should be used to perform the evaluation of a (federated) model.

Parameters:

eval_every (int) – The evaluation frequency expressed as the number of rounds between two evaluations. Defaults to 1, i.e., evaluate the model at each round.

eval_every

The evaluation frequency.

Type:

int

abstractmethod evaluate(round: int, model: Module, eval_data_loader: FastDataLoader, loss_fn: Module | None, additional_metrics: dict[str, Metric] | None = None, **kwargs) → dict[str, Any][source]

Evaluate the model.

Parameters:
  • round (int) – The current round.

  • model (Module) – The model to evaluate.

  • eval_data_loader (FastDataLoader) – The data loader to use for evaluation.

  • loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.

  • additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.

  • **kwargs – Additional keyword arguments.

Returns:

A dictionary containing the computed metrics.

Return type:

dict[str, Any]
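
For illustration, a minimal sketch of a concrete subclass is shown below. It is not part of fluke: it assumes that a FastDataLoader can be iterated over as (inputs, targets) batches and that the Evaluator constructor is reused unchanged.

    import torch

    from fluke.evaluation import Evaluator


    class LossOnlyEval(Evaluator):
        # Illustrative evaluator that reports only the average loss.

        def evaluate(self, round, model, eval_data_loader, loss_fn,
                     additional_metrics=None, **kwargs):
            if model is None or eval_data_loader is None or loss_fn is None:
                return {}
            model.eval()
            total, n = 0.0, 0
            with torch.no_grad():
                # Assumption: batches are (inputs, targets) pairs.
                for X, y in eval_data_loader:
                    total += loss_fn(model(X), y).item() * y.shape[0]
                    n += y.shape[0]
            return {"loss": total / max(n, 1)}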

class fluke.evaluation.ClassificationEval

class fluke.evaluation.ClassificationEval(eval_every: int, n_classes: int, **metrics: Metric)[source]

Bases: Evaluator

Evaluate a PyTorch model for classification. The metrics computed are accuracy, precision, recall, f1 and the loss according to the provided loss function loss_fn when calling the method evaluate(). Metrics are computed in both micro and macro fashion.

Parameters:
  • eval_every (int) – The evaluation frequency.

  • n_classes (int) – The number of classes.

  • **metrics (Metric) – The metrics to use for evaluation. If not provided, the default metrics are used: accuracy, macro_precision, macro_recall, macro_f1, micro_precision, micro_recall and micro_f1.

eval_every

The evaluation frequency.

Type:

int

n_classes

The number of classes.

Type:

int
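
A minimal construction sketch (the import path comes from this page; the argument values are illustrative):

    from fluke.evaluation import ClassificationEval

    # Evaluate every round on a 10-class problem with the default metrics
    # (accuracy, micro/macro precision, recall and f1).
    evaluator = ClassificationEval(eval_every=1, n_classes=10)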

add_metric(name: str, metric: Metric) → None[source]

Add a metric to the evaluator.

Parameters:
  • name (str) – The name of the metric.

  • metric (Metric) – The metric to add.
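
For instance, assuming the Metric objects are torchmetrics metrics (an assumption, not stated on this page), an extra metric could be registered on the evaluator constructed above as follows:

    from torchmetrics.classification import MulticlassAUROC  # assumed Metric type

    evaluator.add_metric("auroc", MulticlassAUROC(num_classes=10))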

evaluate(round: int, model: Module, eval_data_loader: FastDataLoader | Collection[FastDataLoader], loss_fn: Module | None = None, additional_metrics: dict[str, Metric] | None = None, device: device = device(type='cpu')) → dict[source]

Evaluate the model. The metrics computed are accuracy, precision, recall, f1 and the loss according to the provided loss function loss_fn. Metrics are computed in both micro and macro fashion.

Warning

The loss function loss_fn should be defined on the same device as the model. Moreover, it is assumed that the only arguments of the loss function are the predicted values and the true values.

Parameters:
  • round (int) – The current round.

  • model (torch.nn.Module) – The model to evaluate. If None, the method returns an empty dictionary.

  • eval_data_loader (Union[FastDataLoader, Collection[FastDataLoader]]) – The data loader(s) to use for evaluation. If None, the method returns an empty dictionary.

  • loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.

  • additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.

  • device (torch.device, optional) – The device to use for evaluation. Defaults to “cpu”.

Returns:

A dictionary containing the computed metrics.

Return type:

dict
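
A usage sketch of this call with the evaluator constructed above; global_model and test_loader (a FastDataLoader) are assumed to exist, and the result keys shown in the comment are only indicative:

    import torch

    results = evaluator.evaluate(
        round=3,
        model=global_model,
        eval_data_loader=test_loader,
        loss_fn=torch.nn.CrossEntropyLoss(),
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )
    # results is a dict, e.g. {"accuracy": ..., "macro_f1": ..., "loss": ...}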

class fluke.evaluation.PerformanceTracker

class fluke.evaluation.PerformanceTracker[source]

Bases: object

add(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], metrics: dict[str, float] | float, round: int = 0, client_id: int | None = None) → None[source]

Add performance metrics for a specific type and client.

Parameters:
  • perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to add.

  • metrics (dict[str, float] | float) – The performance metrics to add. If perf_type is “comm”, this should be a single float value representing the communication cost.

  • round (int, optional) – The current round. Defaults to 0.

  • client_id (int, optional) – The client ID for local performance metrics. Defaults to None for global metrics.
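
A small usage sketch (metric names and values are illustrative):

    from fluke.evaluation import PerformanceTracker

    tracker = PerformanceTracker()

    # Global (server-side) metrics for round 3.
    tracker.add("global", {"accuracy": 0.87, "loss": 0.41}, round=3)

    # Post-fit metrics of client 0 for the same round.
    tracker.add("post-fit", {"accuracy": 0.79}, round=3, client_id=0)

    # Communication cost is a single float.
    tracker.add("comm", 1.5e6, round=3)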

get(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int) → dict | float[source]

Get performance metrics for a specific type and round.

Parameters:
  • perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.

  • round (int) – The round for which to retrieve the metrics.

Raises:

ValueError – If the perf_type is unknown.

Returns:

The performance metrics for the specified type and round.

If perf_type is “comm” or “mem”, returns a float; otherwise, returns a dict.

Return type:

Union[dict, float]
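
Continuing the sketch above, retrieval mirrors what was added:

    tracker.get("global", round=3)   # -> {"accuracy": 0.87, "loss": 0.41}
    tracker.get("comm", round=3)     # -> a float (the communication cost)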

summary(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int, include_round: bool = True, force_round: bool = True) → dict | float[source]

Get the summary of the performance metrics for a specific type.

Summary metrics are computed as the mean of the metrics for the specified type and round. If perf_type is “comm”, the total communication cost is returned. If perf_type is “mem”, the memory usage is returned. If perf_type is “global”, the metrics are returned as they are.

Parameters:
  • perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.

  • round (int) – The round for which to compute the summary of the metrics.

  • include_round (bool, optional) – Whether to include the round number in the returned metrics. Defaults to True.

  • force_round (bool, optional) – If True, the method returns the metrics for the specified round when available, otherwise it falls back to the metrics of the latest round. Defaults to True.

Raises:

ValueError – If the perf_type is unknown or if there are no metrics for the specified type and round.

Returns:

The summary performance metrics for the specified type.

Return type:

Union[dict, float]
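
Continuing the same sketch, a summary aggregates the per-client metrics of a round, while "comm" yields the total cost:

    # Mean of the clients' post-fit metrics at round 3.
    tracker.summary("post-fit", round=3)

    # Total communication cost.
    tracker.summary("comm", round=3)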