gossipy.data.handler module
Module contents
- class gossipy.data.handler.ClassificationDataHandler(X, y, X_te=None, y_te=None, test_size=0.2, seed=42)
Bases: gossipy.data.DataHandler
Handler for classification data.
The handler provides methods to access the data and to split it into training and evaluation sets.
- Parameters
X (np.ndarray or torch.Tensor) – The data set examples matrix. If X_te is not None, the data set is assumed to be already split into training and evaluation sets (test_size will be ignored).
y (np.ndarray or torch.Tensor) – The data set labels.
X_te (np.ndarray or torch.Tensor, default=None) – The evaluation set examples matrix.
y_te (np.ndarray or torch.Tensor, default=None) – The evaluation set labels.
test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set.
seed (int, default=42) – The seed used to split the data set into training and evaluation set.
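The interaction between X_te, test_size and seed described above can be sketched in plain Python. This is a minimal illustration of the documented semantics, not gossipy's implementation: the helper split_train_eval is hypothetical, it works on lists rather than arrays, and the shuffle-then-slice strategy and the rounding of the evaluation size are assumptions.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple

Split = Tuple[List[Any], List[Any], List[Any], List[Any]]


def split_train_eval(X: Sequence[Any], y: Sequence[Any],
                     X_te: Optional[Sequence[Any]] = None,
                     y_te: Optional[Sequence[Any]] = None,
                     test_size: float = 0.2, seed: int = 42) -> Split:
    """Hypothetical helper mirroring the documented parameter semantics."""
    if X_te is not None:
        # The data is already split, so test_size is ignored.
        y_eval = list(y_te) if y_te is not None else []
        return list(X), list(y), list(X_te), y_eval
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # the same seed gives the same split
    n_eval = int(round(len(X) * test_size))  # assumption: round to nearest
    eval_idx, train_idx = idx[:n_eval], idx[n_eval:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in eval_idx], [y[i] for i in eval_idx])


X = [[float(i), float(i) * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_tr, y_tr, X_ev, y_ev = split_train_eval(X, y, test_size=0.2, seed=42)
print(len(X_tr), len(X_ev))  # 8 2
```

Passing an explicit evaluation set (X_te, y_te) skips the random split entirely, which is useful when a benchmark ships with a fixed test partition.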
- at(idx, eval_set=False)
Get the data set example and label at the given index or list of indices.
- Parameters
- Return type
- Returns
X (np.ndarray) – The data set example.
y (int) – The data set label.
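The access pattern of at (a single index or a list of indices, taken from either the training or the evaluation set) can be illustrated with a small stand-in class. TinyHandler is hypothetical and uses plain lists; it only mimics the documented call shape and is not the library's code.

```python
from typing import Any, List, Sequence, Tuple, Union


class TinyHandler:
    """Hypothetical stand-in for a data handler; not gossipy's class."""

    def __init__(self, X_tr: Sequence[Any], y_tr: Sequence[Any],
                 X_te: Sequence[Any], y_te: Sequence[Any]) -> None:
        self.X_tr, self.y_tr = X_tr, y_tr
        self.X_te, self.y_te = X_te, y_te

    def at(self, idx: Union[int, List[int]],
           eval_set: bool = False) -> Tuple[Any, Any]:
        # Use the evaluation split when eval_set is True, else the training split.
        X, y = (self.X_te, self.y_te) if eval_set else (self.X_tr, self.y_tr)
        if isinstance(idx, list):
            # A list of indices returns the matching examples and labels.
            return [X[i] for i in idx], [y[i] for i in idx]
        return X[idx], y[idx]


handler = TinyHandler([[0.0], [1.0], [2.0]], [0, 1, 0], [[9.0]], [1])
single = handler.at(1)
batch = handler.at([0, 2])
held_out = handler.at(0, eval_set=True)
```

Here single is one (example, label) pair from the training set, batch holds two examples with their labels, and held_out comes from the evaluation set.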
- eval_size()
Get the number of examples of the evaluation set.
- Returns
The size of the evaluation set of the dataset.
- Return type
int
- get_eval_set()
Get the evaluation set of the dataset.
- Returns
The evaluation set of the dataset.
- Return type
tuple[Any, Any]
- class gossipy.data.handler.ClusteringDataHandler(X, y)
Bases: gossipy.data.handler.ClassificationDataHandler
Handler for clustering (unsupervised) data.
The handler provides methods to access the data. The evaluation set is the training set.
- Parameters
X (Union[np.ndarray, torch.Tensor]) – The data set examples matrix.
y (Union[np.ndarray, torch.Tensor]) – The data set labels.
- eval_size()
Get the number of examples of the evaluation set.
- Returns
The size of the evaluation set of the dataset.
- Return type
int
- class gossipy.data.handler.RecSysDataHandler(ratings, n_users, n_items, test_size=0.2, seed=42)
Bases: gossipy.data.DataHandler
Handler for recommendation system data.
The handler provides methods to access the rating data.
- Parameters
ratings (Dict[int, List[Tuple[int, float]]]) – The user-item ratings.
n_users (int) – The number of users.
n_items (int) – The number of items.
test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set. The split is performed user-wise, i.e., for each user a subset of that user's ratings is selected as the evaluation set.
seed (int, default=42) – The seed used to split the data set into training and evaluation set.
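A user-wise split as described for test_size can be sketched as follows. The helper split_user_wise is hypothetical, and the per-user shuffle and the rounding down of the evaluation count are assumptions rather than the library's documented behaviour; only the ratings layout (a dict mapping each user to a list of (item, rating) pairs) comes from the parameter description above.

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[int, List[Tuple[int, float]]]


def split_user_wise(ratings: Ratings, test_size: float = 0.2,
                    seed: int = 42) -> Tuple[Ratings, Ratings]:
    """Hypothetical helper: hold out a fraction of each user's ratings."""
    rng = random.Random(seed)
    train: Ratings = {}
    evaluation: Ratings = {}
    for user in sorted(ratings):  # fixed order keeps the split reproducible
        items = list(ratings[user])
        rng.shuffle(items)
        n_eval = int(len(items) * test_size)  # assumption: round down
        evaluation[user] = items[:n_eval]
        train[user] = items[n_eval:]
    return train, evaluation


ratings = {
    0: [(0, 5.0), (1, 3.0), (2, 4.0), (3, 1.0), (4, 2.0)],
    1: [(1, 4.0), (3, 2.0), (4, 5.0), (0, 1.0), (2, 3.0)],
}
train, evaluation = split_user_wise(ratings, test_size=0.2, seed=42)
```

With five ratings per user and test_size=0.2, each user keeps four ratings for training and contributes one to the evaluation set, so every user is represented in both partitions.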
- at(idx, eval_set=False)
Get a sample (or samples) from the training/test set.
- eval_size()
Get the number of examples of the evaluation set.
- Returns
The size of the evaluation set of the dataset.
- Return type
int
- get_eval_set()
Get the evaluation set of the dataset.
- Returns
The evaluation set of the dataset.
- Return type
tuple[Any, Any]
- class gossipy.data.handler.RegressionDataHandler(X, y, X_te=None, y_te=None, test_size=0.2, seed=42)
Bases: gossipy.data.handler.ClassificationDataHandler
Handler for regression data.
The handler provides methods to access the data and to split it into training and evaluation sets.
- Parameters
X (np.ndarray or torch.Tensor) – The data set examples matrix. If X_te is not None, the data set is assumed to be already split into training and evaluation sets (test_size will be ignored).
y (np.ndarray or torch.Tensor) – The data set labels.
X_te (np.ndarray or torch.Tensor, default=None) – The evaluation set examples matrix.
y_te (np.ndarray or torch.Tensor, default=None) – The evaluation set labels.
test_size (float, default=0.2) – The size of the evaluation set as a fraction of the data set.
seed (int, default=42) – The seed used to split the data set into training and evaluation set.
- at(idx, eval_set=False)
Get the data set example and label at the given index or list of indices.
- Parameters
- Return type
- Returns
X (np.ndarray) – The data set example.
y (int) – The data set label.