fluke.evaluation¶
This module contains the definition of the evaluation classes used to perform the evaluation of the model client-side and server-side.
Evaluator | This class is the base class for all evaluators in fluke. |
ClassificationEval | Evaluate a PyTorch model for classification. |
interface fluke.evaluation.Evaluator
- class fluke.evaluation.Evaluator(eval_every: int = 1)[source]¶
Bases: ABC

This class is the base class for all evaluators in fluke. An evaluator object should be used to perform the evaluation of a (federated) model.

- Parameters:
eval_every (int) – The evaluation frequency expressed as the number of rounds between two evaluations. Defaults to 1, i.e., evaluate the model at each round.
- abstractmethod evaluate(round: int, model: Module, eval_data_loader: FastDataLoader, loss_fn: Module | None, additional_metrics: dict[str, Metric] | None = None, **kwargs) dict[str, Any][source]¶
Evaluate the model.
- Parameters:
round (int) – The current round.
model (Module) – The model to evaluate.
eval_data_loader (FastDataLoader) – The data loader to use for evaluation.
loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.
additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.
**kwargs – Additional keyword arguments.
- Returns:
A dictionary containing the computed metrics.
- Return type:
dict[str, Any]
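The sketch below shows what a minimal concrete subclass could look like. It is illustrative only: the class name AccuracyEval, the accuracy-only metric, and the assumption that the data loader can be iterated as (inputs, labels) batches are not part of fluke's documented API.

```python
import torch

from fluke.evaluation import Evaluator


class AccuracyEval(Evaluator):  # hypothetical subclass, for illustration only
    """Toy evaluator that reports plain accuracy."""

    def evaluate(self, round, model, eval_data_loader, loss_fn=None,
                 additional_metrics=None, **kwargs):
        if model is None or eval_data_loader is None:
            return {}
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            # Assumes the loader yields (inputs, labels) batches.
            for X, y in eval_data_loader:
                correct += (model(X).argmax(dim=1) == y).sum().item()
                total += y.shape[0]
        return {"accuracy": correct / max(total, 1)}
```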
class fluke.evaluation.ClassificationEval
- class fluke.evaluation.ClassificationEval(eval_every: int, n_classes: int, **metrics: Metric)[source]¶
Bases: Evaluator

Evaluate a PyTorch model for classification. The metrics computed are accuracy, precision, recall, f1 and the loss according to the provided loss function loss_fn when calling the method evaluate. Metrics are computed both in a micro and macro fashion.

- Parameters:
eval_every (int) – The evaluation frequency expressed as the number of rounds between two evaluations.
n_classes (int) – The number of classes of the classification task.
**metrics (Metric) – Additional metrics to compute during evaluation.
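For example, an evaluator for a 10-class task that runs every round and also tracks an extra metric could be created as sketched below; using a torchmetrics object for the extra metric is an assumption about what a Metric argument looks like.

```python
from torchmetrics import AUROC

from fluke.evaluation import ClassificationEval

# Evaluate every round on a 10-class task; "auroc" is a user-chosen name for an extra metric.
evaluator = ClassificationEval(eval_every=1, n_classes=10,
                               auroc=AUROC(task="multiclass", num_classes=10))
```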
- add_metric(name: str, metric: Metric) None[source]¶
Add a metric to the evaluator.
- Parameters:
name (str) – The name of the metric.
metric (Metric) – The metric to add.
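Continuing the sketch above, a metric can also be registered after construction (again assuming a torchmetrics-style Metric object):

```python
from torchmetrics import CohenKappa

# "cohen_kappa" is the key under which the metric will appear in the evaluation results.
evaluator.add_metric("cohen_kappa", CohenKappa(task="multiclass", num_classes=10))
```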
- evaluate(round: int, model: Module, eval_data_loader: FastDataLoader | Collection[FastDataLoader], loss_fn: Module | None = None, additional_metrics: dict[str, Metric] | None = None, device: device = device(type='cpu')) dict[source]¶
Evaluate the model. The metrics computed are accuracy, precision, recall, f1 and the loss according to the provided loss function loss_fn. Metrics are computed both in a micro and macro fashion.

Warning

The loss function loss_fn should be defined on the same device as the model. Moreover, it is assumed that the only arguments of the loss function are the predicted values and the true values.

- Parameters:
round (int) – The current round.
model (torch.nn.Module) – The model to evaluate. If None, the method returns an empty dictionary.
eval_data_loader (Union[FastDataLoader, Collection[FastDataLoader]]) – The data loader(s) to use for evaluation. If None, the method returns an empty dictionary.
loss_fn (torch.nn.Module, optional) – The loss function to use for evaluation.
additional_metrics (dict[str, Metric], optional) – Additional metrics to use for evaluation. If provided, they are added to the default metrics.
device (torch.device, optional) – The device to use for evaluation. Defaults to “cpu”.
- Returns:
A dictionary containing the computed metrics.
- Return type:
dict
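A hedged sketch of a typical call follows; global_model and test_loader (a FastDataLoader) are placeholders assumed to exist, and the exact keys of the returned dictionary depend on the configured metrics.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

results = evaluator.evaluate(round=5,
                             model=global_model.to(device),  # placeholder model
                             eval_data_loader=test_loader,   # placeholder FastDataLoader
                             loss_fn=torch.nn.CrossEntropyLoss(),
                             device=device)
# results is a dict of the computed metrics, e.g. accuracy, precision, recall, f1 and loss.
```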
interface fluke.evaluation.PerformanceTracker
- class fluke.evaluation.PerformanceTracker[source]¶
Bases: object

- add(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], metrics: dict[str, float] | float, round: int = 0, client_id: int | None = None) None[source]¶
Add performance metrics for a specific type and client.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to add.
metrics (dict[str, float] | float) – The performance metrics to add. If perf_type is “comm”, this should be a single float value representing the communication cost.
round (int, optional) – The current round. Defaults to 0.
client_id (int, optional) – The client ID for local performance metrics. Defaults to None for global metrics.
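For instance, a sketch with placeholder values:

```python
from fluke.evaluation import PerformanceTracker

tracker = PerformanceTracker()

# Server-side (global) metrics for round 3.
tracker.add("global", {"accuracy": 0.91, "loss": 0.27}, round=3)

# Post-fit metrics for client 0 at the same round.
tracker.add("post-fit", {"accuracy": 0.88, "loss": 0.35}, round=3, client_id=0)

# Communication cost is a single float.
tracker.add("comm", 1048576.0, round=3)
```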
- get(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int) dict | float[source]¶
Get performance metrics for a specific type and round.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.
round (int) – The round for which to retrieve the metrics.
- Raises:
ValueError – If the perf_type is unknown.
- Returns:
The performance metrics for the specified type and round. If perf_type is “comm” or “mem”, returns a float; otherwise, returns a dict.
- Return type:
dict | float
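Continuing the sketch above:

```python
global_r3 = tracker.get("global", round=3)  # dict of metrics, e.g. {"accuracy": 0.91, "loss": 0.27}
comm_r3 = tracker.get("comm", round=3)      # float: the communication cost recorded for round 3
```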
- summary(perf_type: Literal['global', 'locals', 'pre-fit', 'post-fit', 'comm', 'mem'], round: int, include_round: bool = True, force_round: bool = True) dict | float[source]¶
Get the summary of the performance metrics for a specific type.
Summary metrics are computed as the mean of the metrics for the specified type and round. If perf_type is “comm”, the total communication cost is returned. If perf_type is “mem”, the memory usage is returned. If perf_type is “global”, the metrics are returned as they are.
- Parameters:
perf_type (Literal["global", "locals", "pre-fit", "post-fit", "comm", "mem"]) – The type of performance metrics to retrieve.
round (int) – The round for which to compute the summary of the metrics.
include_round (bool, optional) – Whether to include the round number in the returned metrics. Defaults to True.
force_round (bool, optional) – If True, the method will return the metrics for the specified round if it exists, otherwise it will return the metrics for the latest round. Defaults to True.
- Raises:
ValueError – If the perf_type is unknown or if there are no metrics for the specified type and round.
- Returns:
The summary performance metrics for the specified type.
- Return type:
dict | float
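Continuing the sketch above, the per-client post-fit metrics can be averaged and the communication cost totaled as follows:

```python
# Mean of the per-client post-fit metrics at round 3 (the round number is included by default).
avg_post_fit = tracker.summary("post-fit", round=3)

# Total communication cost (a single float).
total_comm = tracker.summary("comm", round=3)
```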