gossipy.data.handler module

Module contents

class gossipy.data.handler.ClassificationDataHandler(X, y, X_te=None, y_te=None, test_size=0.2, seed=42)

Bases: gossipy.data.DataHandler

Handler for classification data.

The handler provides methods to access the data and to split it into training and evaluation sets.

Parameters
  • X (np.ndarray or torch.Tensor) – The data set examples matrix. If X_te is not None, the data set is assumed to be already split into training and evaluation sets, and test_size is ignored.

  • y (np.ndarray or torch.Tensor) – The data set labels.

  • X_te (np.ndarray or torch.Tensor, default=None) – The evaluation set examples matrix.

  • y_te (np.ndarray or torch.Tensor, default=None) – The evaluation set labels.

  • test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set.

  • seed (int, default=42) – The seed used to split the data set into training and evaluation sets.
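The interaction between X_te, test_size and seed can be sketched as follows. This is a minimal, standard-library illustration that assumes a seeded shuffle followed by a hold-out of round(test_size * n) examples. It is not gossipy's actual implementation, and the helper name split_train_eval is hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def split_train_eval(X: Sequence, y: Sequence,
                     X_te: Optional[Sequence] = None,
                     y_te: Optional[Sequence] = None,
                     test_size: float = 0.2,
                     seed: int = 42) -> Tuple[list, list, list, list]:
    """Hypothetical helper mirroring the documented constructor semantics."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of examples")
    # An explicit evaluation set means the data are already split:
    # test_size is ignored.
    if X_te is not None:
        if y_te is None:
            raise ValueError("y_te is required when X_te is given")
        return list(X), list(y), list(X_te), list(y_te)
    # Otherwise shuffle indices with a seeded RNG (assumed strategy) so the
    # split is reproducible, then hold out a test_size fraction.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_te = round(test_size * len(X))
    te_idx, tr_idx = idx[:n_te], idx[n_te:]
    return ([X[i] for i in tr_idx], [y[i] for i in tr_idx],
            [X[i] for i in te_idx], [y[i] for i in te_idx])


X = [[float(i), float(i) + 1.0] for i in range(10)]
y = [i % 2 for i in range(10)]
Xtr, ytr, Xte, yte = split_train_eval(X, y, test_size=0.2, seed=42)
print(len(Xtr), len(Xte))  # 8 2
```

Because the shuffle is driven by seed, repeated construction with the same seed yields the same partition, which is what makes the seed parameter useful for reproducible experiments.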

at(idx, eval_set=False)

Get the data set example and label at the given index or list of indices.

Parameters
  • idx (int or list of int) – The index or list of indices of the data set examples to get.

  • eval_set (bool, default=False) – If True, the data set example and label are retrieved from the evaluation set. Otherwise, they are retrieved from the training set.

Return type

Tuple[ndarray, int]

Returns

  • X (np.ndarray) – The data set example.

  • y (int) – The data set label.
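The documented at semantics can be illustrated on plain Python lists. In the sketch below, the class name TinyHandler and its attribute names are hypothetical. The documentation above does not specify how gossipy represents the result for a list of indices (for example, as an array slice), so this sketch simply returns lists.

```python
from typing import List, Union


class TinyHandler:
    """Hypothetical stand-in for a handler with separate train/eval sets."""

    def __init__(self, Xtr, ytr, Xte, yte):
        self.Xtr, self.ytr = list(Xtr), list(ytr)
        self.Xte, self.yte = list(Xte), list(yte)

    def at(self, idx: Union[int, List[int]], eval_set: bool = False):
        # Select the evaluation split if requested, else the training split.
        X, y = (self.Xte, self.yte) if eval_set else (self.Xtr, self.ytr)
        if isinstance(idx, int):
            return X[idx], y[idx]
        # A list of indices returns the matching examples and labels.
        return [X[i] for i in idx], [y[i] for i in idx]


h = TinyHandler([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]], [0, 1, 0],
                [[6.0, 7.0]], [1])
print(h.at(1))                 # ([2.0, 3.0], 1)
print(h.at([0, 2]))            # ([[0.0, 1.0], [4.0, 5.0]], [0, 0])
print(h.at(0, eval_set=True))  # ([6.0, 7.0], 1)
```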

eval_size()

Get the number of examples in the evaluation set.

Returns

The number of examples in the evaluation set.

Return type

int

get_eval_set()

Get the evaluation set of the dataset.

Returns

The evaluation set of the dataset.

Return type

tuple[Any, Any]

get_train_set()

Get the training set of the dataset.

Returns

The training set of the dataset.

Return type

tuple[Any, Any]

size(dim=0)

Get the size of the training set along a given dimension.

Parameters

  • dim (int, default=0) – The dimension along which to get the size of the training set.

Returns

The size of the training set along the given dimension.

Return type

int
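For array-backed data, size(dim) plausibly corresponds to the length of the training matrix along axis dim, as with ndarray.shape[dim]: examples for dim=0 and features for dim=1. The sketch below assumes that reading and computes it for a rectangular nested list. Both shape_of and size here are hypothetical helpers, not part of gossipy.

```python
from typing import Any, List, Tuple


def shape_of(data: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list (hypothetical helper)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        # Descend into the first element; stop on an empty list.
        data = data[0] if data else None
    return tuple(shape)


def size(Xtr: List[List[float]], dim: int = 0) -> int:
    # Size of the training set along dimension `dim` (assumed shape[dim]).
    return shape_of(Xtr)[dim]


Xtr = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(size(Xtr))         # 2 (examples)
print(size(Xtr, dim=1))  # 3 (features)
```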

class gossipy.data.handler.ClusteringDataHandler(X, y)

Bases: gossipy.data.handler.ClassificationDataHandler

Handler for clustering (unsupervised) data.

The handler provides methods to access the data. The evaluation set is the training set.

Parameters
  • X (Union[np.ndarray, torch.Tensor]) – The data set examples matrix.

  • y (Union[np.ndarray, torch.Tensor]) – The data set labels.

eval_size()

Get the number of examples in the evaluation set.

Returns

The number of examples in the evaluation set.

Return type

int

get_eval_set()

Get the evaluation set of the dataset.

Returns

The evaluation set of the dataset.

Return type

tuple[Any, Any]
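Because the evaluation set of ClusteringDataHandler is the training set itself, get_eval_set and eval_size simply mirror the training data. The following is a minimal sketch of that reading. The class name is hypothetical, and plain lists stand in for arrays.

```python
class TinyClusteringHandler:
    """Hypothetical stand-in: no split, evaluation reuses the training data."""

    def __init__(self, X, y):
        self.Xtr, self.ytr = list(X), list(y)

    def get_train_set(self):
        return self.Xtr, self.ytr

    def get_eval_set(self):
        # The evaluation set is the training set.
        return self.get_train_set()

    def eval_size(self) -> int:
        return len(self.Xtr)


h = TinyClusteringHandler([[0.0], [0.1], [5.0], [5.1]], [0, 0, 1, 1])
print(h.get_eval_set() == h.get_train_set())  # True
print(h.eval_size())                          # 4
```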

class gossipy.data.handler.RecSysDataHandler(ratings, n_users, n_items, test_size=0.2, seed=42)

Bases: gossipy.data.DataHandler

Handler for recommendation system data.

The handler provides methods to access the rating data.

Parameters
  • ratings (Dict[int, List[Tuple[int, float]]]) – The user-item ratings, mapping each user to its list of (item, rating) pairs.

  • n_users (int) – The number of users.

  • n_items (int) – The number of items.

  • test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set. The division is performed user-wise, i.e., for each user a subset of its ratings is selected as the evaluation set.

  • seed (int, default=42) – The seed used to split the data set into training and evaluation sets.
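The user-wise split described for test_size can be sketched as follows: each user's rating list is shuffled with a seeded generator, and a test_size fraction of it is held out for evaluation. The function name userwise_split and the rounding rule are assumptions made for illustration. gossipy may shuffle, round, or treat users with very few ratings differently.

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[int, List[Tuple[int, float]]]


def userwise_split(ratings: Ratings, test_size: float = 0.2,
                   seed: int = 42) -> Tuple[Ratings, Ratings]:
    """Hypothetical per-user train/eval split of (item, rating) pairs."""
    rng = random.Random(seed)
    train: Ratings = {}
    evals: Ratings = {}
    for user in sorted(ratings):
        items = list(ratings[user])
        rng.shuffle(items)
        # Hold out a test_size fraction of this user's own ratings.
        n_te = round(test_size * len(items))
        evals[user] = items[:n_te]
        train[user] = items[n_te:]
    return train, evals


ratings = {
    0: [(0, 4.0), (1, 3.5), (2, 5.0), (3, 2.0), (4, 1.0)],
    1: [(1, 2.5), (3, 4.5), (4, 3.0), (5, 4.0), (6, 5.0)],
}
train, evals = userwise_split(ratings, test_size=0.2, seed=42)
print({u: len(r) for u, r in evals.items()})  # {0: 1, 1: 1}
```

Splitting per user, rather than over the pooled ratings, guarantees that every user with enough ratings appears in both the training and the evaluation data.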

at(idx, eval_set=False)

Get a sample (or samples) from the training or evaluation set.

Parameters
  • idx (int or list[int]) – The index or indices of the sample(s) to get.

  • eval_set (bool, default=False) – If True, the sample(s) are retrieved from the evaluation set. Otherwise, they are retrieved from the training set.

Returns

The sample(s) at the given index(ices) in the training/evaluation set.

Return type

Any

eval_size()

Get the number of examples in the evaluation set.

Returns

The number of examples in the evaluation set.

Return type

int

get_eval_set()

Get the evaluation set of the dataset.

Returns

The evaluation set of the dataset.

Return type

tuple[Any, Any]

get_train_set()

Get the training set of the dataset.

Returns

The training set of the dataset.

Return type

tuple[Any, Any]

size()

Get the size of the training set.

Returns

The size of the training set.

Return type

int

class gossipy.data.handler.RegressionDataHandler(X, y, X_te=None, y_te=None, test_size=0.2, seed=42)

Bases: gossipy.data.handler.ClassificationDataHandler

Handler for regression data.

The handler provides methods to access the data and to split it into training and evaluation sets.

Parameters
  • X (np.ndarray or torch.Tensor) – The data set examples matrix. If X_te is not None, the data set is assumed to be already split into training and evaluation sets, and test_size is ignored.

  • y (np.ndarray or torch.Tensor) – The data set target values.

  • X_te (np.ndarray or torch.Tensor, default=None) – The evaluation set examples matrix.

  • y_te (np.ndarray or torch.Tensor, default=None) – The evaluation set target values.

  • test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set.

  • seed (int, default=42) – The seed used to split the data set into training and evaluation sets.

at(idx, eval_set=False)

Get the data set example and label at the given index or list of indices.

Parameters
  • idx (int or list of int) – The index or list of indices of the data set examples to get.

  • eval_set (bool, default=False) – If True, the data set example and label are retrieved from the evaluation set. Otherwise, they are retrieved from the training set.

Return type

Tuple[ndarray, float]

Returns

  • X (np.ndarray) – The data set example.

  • y (float) – The data set target value.
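RegressionDataHandler reuses the classification interface. As the return type Tuple[ndarray, float] indicates, however, the value returned by at alongside the example is a real-valued target, not an integer label. The following sketch illustrates that difference. The name TinyRegressionHandler and the use of plain lists instead of arrays are assumptions.

```python
from typing import List, Tuple


class TinyRegressionHandler:
    """Hypothetical stand-in: examples paired with real-valued targets."""

    def __init__(self, Xtr: List[List[float]], ytr: List[float],
                 Xte: List[List[float]], yte: List[float]):
        # Targets are stored as floats, matching Tuple[ndarray, float].
        self.Xtr, self.ytr = Xtr, [float(v) for v in ytr]
        self.Xte, self.yte = Xte, [float(v) for v in yte]

    def at(self, idx: int, eval_set: bool = False) -> Tuple[List[float], float]:
        X, y = (self.Xte, self.yte) if eval_set else (self.Xtr, self.ytr)
        return X[idx], y[idx]


h = TinyRegressionHandler([[1.0], [2.0]], [1, 4], [[3.0]], [9])
x, target = h.at(1)
print(x, target)               # [2.0] 4.0
print(h.at(0, eval_set=True))  # ([3.0], 9.0)
```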