gossipy.data module

Submodules

Module contents

This module contains functions and classes to manage dataset loading and dispatching.

class gossipy.data.DataHandler

Bases: abc.ABC

Abstract class for data handlers.

A DataHandler class provides attributes and methods to manage a dataset. A subclass of DataHandler must implement the following methods:

abstract at(idx, eval_set=False)

Get a sample (or samples) from the training or evaluation set.

Parameters
  • idx (int or list[int]) – The index or indices of the sample(s) to get.

  • eval_set (bool, default=False) – If True, get the sample(s) from the evaluation set; otherwise, from the training set.

Returns

The sample(s) at the given index or indices in the training or evaluation set.

Return type

Any

abstract eval_size()

Get the number of examples in the evaluation set.

Returns

The size of the evaluation set of the dataset.

Return type

int

abstract get_eval_set()

Get the evaluation set of the dataset.

Returns

The evaluation set of the dataset.

Return type

tuple[Any, Any]

abstract get_train_set()

Get the training set of the dataset.

Returns

The training set of the dataset.

Return type

tuple[Any, Any]

abstract size(dim=0)

Get the size of the training set along a given dimension.

Parameters

dim (int, default=0) – The dimension along which to get the size of the dataset.

Returns

The size of the dataset along the given dimension.

Return type

int
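A minimal sketch of a concrete handler for an in-memory dataset, assuming samples are stored as plain Python lists of feature rows. To keep the example runnable without gossipy installed, it declares a stand-in abstract base, DataHandlerInterface, that mirrors the five abstract methods listed above; in real code you would subclass gossipy.data.DataHandler instead. The class name ListDataHandler, its constructor, and the choice to have at() return an (x, y) pair are hypothetical (the documented return type of at is only Any).

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple, Union

Index = Union[int, List[int]]


class DataHandlerInterface(ABC):
    """Stand-in mirroring the abstract methods of gossipy.data.DataHandler."""

    @abstractmethod
    def at(self, idx: Index, eval_set: bool = False) -> Any: ...

    @abstractmethod
    def size(self, dim: int = 0) -> int: ...

    @abstractmethod
    def get_train_set(self) -> Tuple[Any, Any]: ...

    @abstractmethod
    def get_eval_set(self) -> Tuple[Any, Any]: ...

    @abstractmethod
    def eval_size(self) -> int: ...


class ListDataHandler(DataHandlerInterface):
    """Hypothetical handler over in-memory lists of feature rows and labels."""

    def __init__(self, X_train: Sequence[Sequence[float]], y_train: Sequence[int],
                 X_eval: Sequence[Sequence[float]], y_eval: Sequence[int]) -> None:
        self.Xtr, self.ytr = list(X_train), list(y_train)
        self.Xte, self.yte = list(X_eval), list(y_eval)

    def at(self, idx: Index, eval_set: bool = False) -> Any:
        # Pick the split first, then index into it.
        X, y = (self.Xte, self.yte) if eval_set else (self.Xtr, self.ytr)
        if isinstance(idx, int):
            return X[idx], y[idx]
        # A list of indices returns the matching rows and labels as lists.
        return [X[i] for i in idx], [y[i] for i in idx]

    def size(self, dim: int = 0) -> int:
        # dim=0: number of training samples; dim=1: number of features.
        if dim == 0:
            return len(self.Xtr)
        if dim == 1:
            return len(self.Xtr[0]) if self.Xtr else 0
        raise ValueError(f"unsupported dimension: {dim}")

    def get_train_set(self) -> Tuple[Any, Any]:
        return self.Xtr, self.ytr

    def get_eval_set(self) -> Tuple[Any, Any]:
        return self.Xte, self.yte

    def eval_size(self) -> int:
        return len(self.Xte)


handler = ListDataHandler([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 1, 0],
                          [[0.7, 0.8]], [1])
print(handler.size(), handler.size(1), handler.eval_size())  # 3 2 1
print(handler.at([0, 2]))  # ([[0.1, 0.2], [0.5, 0.6]], [0, 0])
print(handler.at(0, eval_set=True))  # ([0.7, 0.8], 1)
```

Because the base class declares every method abstract, forgetting to implement one makes instantiation fail with a TypeError rather than surfacing later during training.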

gossipy.data.get_CIFAR10(path='./data', as_tensor=True)

Returns the CIFAR10 dataset.

The function downloads the dataset if it is not already present in path.

Parameters
  • path (str, default="./data") – Directory where the dataset is stored (and downloaded to, if missing).

  • as_tensor (bool, default=True) – If True, the dataset is returned as PyTorch tensors; otherwise, it is returned as NumPy arrays.

Returns

Tuple of training and test sets of the form \((X_{\text{train}}, y_{\text{train}}), (X_{\text{test}}, y_{\text{test}})\).

Return type

tuple[tuple[np.ndarray, list], tuple[np.ndarray, list]] or tuple[tuple[Tensor, Tensor], tuple[Tensor, Tensor]]
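The download-on-demand behavior described above is a common caching pattern. The sketch below illustrates it with the standard library only; ensure_dataset and the fetch callback are hypothetical stand-ins for the real download step (torchvision's dataset classes, for instance, accept a download flag), and this is not gossipy's actual implementation.

```python
import os
import tempfile
from typing import Callable, List


def ensure_dataset(path: str, fetch: Callable[[str], None]) -> bool:
    """Call fetch(path) only if nothing exists at path yet.

    Returns True if a download was triggered, False if a cached copy was reused.
    """
    if os.path.exists(path):
        return False  # Dataset already present: reuse it.
    os.makedirs(path)
    fetch(path)  # Hypothetical downloader that writes files into path.
    return True


calls: List[str] = []


def fake_fetch(path: str) -> None:
    # Stand-in for a real download: record the call and write a marker file.
    calls.append(path)
    with open(os.path.join(path, "data.bin"), "wb") as f:
        f.write(b"\x00")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data")
    first = ensure_dataset(target, fake_fetch)   # nothing cached yet, so fetch runs
    second = ensure_dataset(target, fake_fetch)  # the directory now exists
print(first, second, len(calls))  # True False 1
```

Checking only whether path exists keeps the logic simple, but it also means a partially downloaded directory is treated as complete; a more defensive variant would verify the expected files or checksums.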

gossipy.data.get_FashionMNIST(path='./data', as_tensor=True)

Returns the FashionMNIST dataset.

The function downloads the dataset if it is not already present in path.

Parameters
  • path (str, default="./data") – Directory where the dataset is stored (and downloaded to, if missing).

  • as_tensor (bool, default=True) – If True, the dataset is returned as PyTorch tensors; otherwise, it is returned as NumPy arrays.

Returns

Tuple of training and test sets of the form \((X_{\text{train}}, y_{\text{train}}), (X_{\text{test}}, y_{\text{test}})\).

Return type

tuple[tuple[np.ndarray, list], tuple[np.ndarray, list]] or tuple[tuple[Tensor, Tensor], tuple[Tensor, Tensor]]

gossipy.data.load_classification_dataset(name_or_path, normalize=True, as_tensor=True)

Loads a classification dataset.

A dataset can be loaded either from a file in svmlight format or by name, choosing one of the following: iris, breast, digits, wine, reuters, spambase, sonar, ionosphere, abalone, banknote.
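Each line of an svmlight (libsvm) file holds a label followed by sparse index:value pairs, optionally ending with a "#" comment. A minimal pure-Python reader, assuming 1-based feature indices and a known number of features, could look like the following; parse_svmlight is a hypothetical helper, and in practice scikit-learn's sklearn.datasets.load_svmlight_file handles the format (including zero-based index detection) more robustly.

```python
from typing import List, Tuple


def parse_svmlight(lines: List[str], n_features: int) -> Tuple[List[List[float]], List[float]]:
    """Parse svmlight/libsvm lines into dense feature rows and labels.

    Assumes 1-based feature indices and ignores trailing '#' comments.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank or comment-only lines
        label, *pairs = line.split()
        row = [0.0] * n_features  # features not listed are implicitly zero
        for pair in pairs:
            key, value = pair.split(":")
            if key == "qid":
                continue  # query ids are not features
            row[int(key) - 1] = float(value)
        X.append(row)
        y.append(float(label))
    return X, y


X, y = parse_svmlight(["1 1:0.5 3:2.0", "-1 2:1.5  # negative example"], n_features=3)
print(X)  # [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]]
print(y)  # [1.0, -1.0]
```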

Parameters
  • name_or_path (str) – The name of the dataset or the path to the dataset.

  • normalize (bool, default=True) – Whether to normalize (standard scaling) the data or not.

  • as_tensor (bool, default=True) – Whether to return the data as PyTorch tensors or as NumPy arrays.

Returns

A tuple containing the data and the labels with the specified type.

Return type

tuple[torch.Tensor, torch.Tensor] or tuple[np.ndarray, np.ndarray]
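With normalize=True the data are standard-scaled: each feature column is shifted to zero mean and divided by its standard deviation, \(x' = (x - \mu) / \sigma\). The sketch below implements that transform in pure Python, assuming the population standard deviation and leaving constant columns centered but unscaled (both are the conventions of scikit-learn's StandardScaler). Whether gossipy applies exactly these conventions is an assumption, and standard_scale is a hypothetical helper.

```python
import math
from typing import List


def standard_scale(X: List[List[float]]) -> List[List[float]]:
    """Return a copy of X with each column scaled to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Population standard deviation (divide by n, not n - 1).
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(d)]
    # A constant column has std 0; keep its scale at 1 to avoid dividing by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


Z = standard_scale([[1.0, 10.0], [3.0, 10.0]])
print(Z)  # [[-1.0, 0.0], [1.0, 0.0]]
```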

gossipy.data.load_recsys_dataset(name, path='.')

Loads a recommender-system (recsys) dataset.

Currently, only the following MovieLens datasets are supported: ml-100k, ml-1m, ml-10m and ml-20m.

Parameters
  • name (str) – The name of the dataset.

  • path (str, default=".") – The path in which to download the dataset.

Returns

A tuple containing the ratings, the number of users and the number of items. Ratings are represented as a dictionary mapping each user id to a list of (item id, rating) tuples.

Return type

tuple[dict[int, list[tuple[int, float]]], int, int]
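The returned ratings structure can be illustrated by building it from raw MovieLens-style records. In the ml-100k u.data file each line is a tab-separated user id, item id, rating and timestamp. The sketch below, using the hypothetical helper build_ratings on a few toy records, groups such records by user and remaps ids to contiguous 0-based indices; the remapping is an assumption made here for illustration, not documented behavior of load_recsys_dataset.

```python
from typing import Dict, List, Tuple

Ratings = Dict[int, List[Tuple[int, float]]]


def build_ratings(lines: List[str], sep: str = "\t") -> Tuple[Ratings, int, int]:
    """Group (user, item, rating) records by user.

    Returns (ratings, n_users, n_items), with ids remapped to 0..n-1
    in order of first appearance.
    """
    user_ids: Dict[str, int] = {}
    item_ids: Dict[str, int] = {}
    ratings: Ratings = {}
    for line in lines:
        user, item, rating = line.strip().split(sep)[:3]  # ignore the timestamp
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        ratings.setdefault(u, []).append((i, float(rating)))
    return ratings, len(user_ids), len(item_ids)


raw = ["7\t42\t3\t0", "5\t13\t4\t0", "7\t13\t5\t0"]
ratings, n_users, n_items = build_ratings(raw)
print(ratings)  # {0: [(0, 3.0), (1, 5.0)], 1: [(1, 4.0)]}
print(n_users, n_items)  # 2 2
```

For ml-1m, whose ratings.dat separates fields with "::", the same helper can be called with sep="::".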