fluke.evaluation
This module contains the definition of the evaluation classes used to evaluate models both client-side and server-side.
- Evaluator – This class is the base class for all evaluators in fluke.
- ClassificationEval – Evaluate a PyTorch model for classification.
interface fluke.evaluation.Evaluator
- class fluke.evaluation.Evaluator(eval_every: int = 1)
Bases: ABC
This class is the base class for all evaluators in fluke. An evaluator object should be used to perform the evaluation of a (federated) model.
- Parameters:
eval_every (int) – The evaluation frequency expressed as the number of rounds between two evaluations. Defaults to 1, i.e., evaluate the model at each round.
- abstractmethod evaluate(round: int, model: Module, eval_data_loader: FastDataLoader, loss_fn: Module | None, additional_metrics: dict[str, Metric] | None = None, **kwargs) → dict[str, Any]
Evaluate the model.
- Parameters:
round (int) – The current round.
model (Module) – The model to evaluate.
eval_data_loader (FastDataLoader) – The data loader to use for evaluation.
loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.
additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.
**kwargs – Additional keyword arguments.
- Returns:
A dictionary containing the computed metrics.
- Return type:
dict[str, Any]
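For illustration, below is a minimal sketch of a custom evaluator built on this interface. The top-k metric logic, and the assumption that a FastDataLoader yields (inputs, labels) batches, are ours, not part of fluke's API:

    import torch
    from fluke.evaluation import Evaluator

    class TopKAccuracyEval(Evaluator):
        # Toy evaluator computing top-k accuracy (illustrative only).
        def __init__(self, eval_every: int = 1, k: int = 5):
            super().__init__(eval_every=eval_every)
            self.k = k

        def evaluate(self, round, model, eval_data_loader, loss_fn=None,
                     additional_metrics=None, **kwargs):
            model.eval()
            correct, total = 0, 0
            with torch.no_grad():
                for X, y in eval_data_loader:  # assumes (inputs, labels) batches
                    topk = model(X).topk(self.k, dim=1).indices
                    correct += (topk == y.unsqueeze(1)).any(dim=1).sum().item()
                    total += y.size(0)
            return {f"top{self.k}_accuracy": correct / max(total, 1)}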
class fluke.evaluation.ClassificationEval
- class fluke.evaluation.ClassificationEval(eval_every: int, n_classes: int, **metrics: Metric)
Bases: Evaluator
Evaluate a PyTorch model for classification. The metrics computed are accuracy, precision, recall, f1, and the loss according to the provided loss function loss_fn when calling the method evaluate. Metrics are computed both in a micro and macro fashion.
- Parameters:
eval_every (int) – The evaluation frequency expressed as the number of rounds between two evaluations.
n_classes (int) – The number of classes.
**metrics (Metric) – Additional metrics to compute during the evaluation.
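For illustration, a minimal construction sketch. The extra auroc metric assumes that torchmetrics objects are compatible with the expected Metric type, which is an assumption, not something the documentation above guarantees:

    from torchmetrics.classification import MulticlassAUROC
    from fluke.evaluation import ClassificationEval

    evaluator = ClassificationEval(
        eval_every=1,                            # evaluate at every round
        n_classes=10,                            # e.g., a 10-class problem
        auroc=MulticlassAUROC(num_classes=10),   # extra metric via **metrics
    )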
- add_metric(name: str, metric: Metric) → None
Add a metric to the evaluator.
- Parameters:
name (str) – The name of the metric.
metric (Metric) – The metric to add.
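As an alternative to passing metrics at construction time, a metric can be registered afterwards. A short sketch continuing the example above, again assuming torchmetrics compatibility; the name "ece" is an arbitrary choice:

    from torchmetrics.classification import MulticlassCalibrationError

    # Register an extra metric on the evaluator built earlier.
    evaluator.add_metric("ece", MulticlassCalibrationError(num_classes=10))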
- evaluate(round: int, model: Module, eval_data_loader: FastDataLoader | Collection[FastDataLoader], loss_fn: Module | None = None, additional_metrics: dict[str, Metric] | None = None, device: device = device(type='cpu')) → dict
Evaluate the model. The metrics computed are accuracy, precision, recall, f1, and the loss according to the provided loss function loss_fn. Metrics are computed both in a micro and macro fashion.
Warning
The loss function loss_fn should be defined on the same device as the model. Moreover, it is assumed that the only arguments of the loss function are the predicted values and the true values.
- Parameters:
round (int) – The current round.
model (torch.nn.Module) – The model to evaluate. If None, the method returns an empty dictionary.
eval_data_loader (Union[FastDataLoader, Collection[FastDataLoader]]) – The data loader(s) to use for evaluation. If None, the method returns an empty dictionary.
loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.
additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.
device (torch.device, optional) – The device to use for evaluation. Defaults to “cpu”.
- Returns:
A dictionary containing the computed metrics.
- Return type:
dict
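A hypothetical usage sketch; here model stands for a torch.nn.Module and test_loader for a FastDataLoader built elsewhere (its constructor is not shown), and the result keys in the comment are illustrative:

    import torch

    # `model` and `test_loader` are assumed to exist already.
    results = evaluator.evaluate(
        round=3,
        model=model,
        eval_data_loader=test_loader,
        loss_fn=torch.nn.CrossEntropyLoss(),
        device=torch.device("cpu"),
    )
    print(results)  # e.g., {"accuracy": 0.91, "precision": ..., "f1": ..., "loss": ...}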
interface fluke.evaluation.PerformanceTracker
- class fluke.evaluation.PerformanceTracker
Bases: object
- add(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], metrics: dict[str, float] | float, round: int = 0, client_id: int | None = None) → None
Add performance metrics for a specific type and client.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to add.
metrics (dict[str, float] | float) – The performance metrics to add. If perf_type is “comm”, this should be a single float value representing the communication cost.
round (int, optional) – The current round. Defaults to 0.
client_id (int, optional) – The client ID for local performance metrics. Defaults to None for global metrics.
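A minimal usage sketch; the metric names and values below are illustrative only:

    from fluke.evaluation import PerformanceTracker

    tracker = PerformanceTracker()
    tracker.add("global", {"accuracy": 0.87, "loss": 0.42}, round=1)   # server-side metrics
    tracker.add("post-fit", {"accuracy": 0.83}, round=1, client_id=0)  # per-client metrics
    tracker.add("comm", 1024.0, round=1)                               # single float for "comm"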
- get(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int) → dict | float
Get performance metrics for a specific type and round.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.
round (int) – The round for which to retrieve the metrics.
- Raises:
ValueError – If the perf_type is unknown.
- Returns:
The performance metrics for the specified type and round. If perf_type is “comm” or “mem”, returns a float; otherwise, returns a dict.
- Return type:
dict | float
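Continuing the sketch above, retrieval mirrors insertion; the returned values shown in the comments follow from the example data and are illustrative:

    tracker.get("global", round=1)  # -> {"accuracy": 0.87, "loss": 0.42}
    tracker.get("comm", round=1)    # -> a float, e.g. 1024.0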
- summary(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int, include_round: bool = True, force_round: bool = True) → dict | float
Get the summary of the performance metrics for a specific type.
Summary metrics are computed as the mean of the metrics for the specified type and round. If perf_type is “comm”, the total communication cost is returned. If perf_type is “mem”, the memory usage is returned. If perf_type is “global”, the metrics are returned as they are.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.
round (int) – The round for which to compute the summary of the metrics.
include_round (bool, optional) – Whether to include the round number in the returned metrics. Defaults to True.
force_round (bool, optional) – If True, return the metrics for the specified round if they exist; otherwise, return the metrics for the latest round. Defaults to True.
- Raises:
ValueError – If the perf_type is unknown or if there are no metrics for the specified type and round.
- Returns:
The summary performance metrics for the specified type.
- Return type:
dict | float
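A hedged sketch continuing the tracker example above; the exact shape of the returned summary (e.g., whether it contains the round number) is an assumption here:

    # Mean over clients of the post-fit metrics at round 1.
    tracker.summary("post-fit", round=1, include_round=True)
    # Total communication cost as a single float.
    tracker.summary("comm", round=1)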