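rectorch.samplers
Class list
- Sampler: Sampler base class.
- DataSampler: Standard sampler that returns batches without any particular constraint.
- ConditionedDataSampler: Data sampler with conditioned filtering used by the rectorch.models.CMultiVAE model.
- BalancedConditionedDataSampler: Sub-sampled version of the ConditionedDataSampler.
- EmptyConditionedDataSampler: Data sampler that returns unconditioned batches used by the rectorch.models.CMultiVAE model.
- CFGAN_TrainingSampler: Sampler used for training the generator and discriminator of the CFGAN model.
- SVAE_Sampler: Sampler used for training SVAE.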
The samplers module contains the definitions of sampler classes that are useful when training neural network-based models.
The samplers module is inspired by the torch.utils.data.DataLoader class which, however, is not really efficient because it retrieves a single example at a time. The idea behind the samplers defined in this module is to process the data set in batches, which greatly improves efficiency.
Each new sampler must extend the base class Sampler, implementing all the abstract special methods, in particular samplers.Sampler.__len__() and samplers.Sampler.__iter__().
class rectorch.samplers.Sampler(*args, **kargs)
Bases: object
Sampler base class.
A sampler is meant to be used as a generator of batches useful in training neural networks.
Note
Each new sampler must extend this base class, implementing all the abstract special methods, in particular rectorch.samplers.Sampler.__len__() and rectorch.samplers.Sampler.__iter__().
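To make the contract concrete, here is a minimal sketch of a custom sampler. It does not import rectorch: the Sampler class below is a hypothetical stand-in built on Python's abc module (whether the real base class uses abc or simply raises NotImplementedError is an assumption left open here), and RangeSampler is an illustrative subclass that yields consecutive batches of row indexes.

```python
from abc import ABC, abstractmethod


class Sampler(ABC):
    """Hypothetical stand-in for rectorch.samplers.Sampler (illustration only)."""

    @abstractmethod
    def __len__(self):
        """Return the number of batches in one pass over the data."""

    @abstractmethod
    def __iter__(self):
        """Yield the batches one at a time."""


class RangeSampler(Sampler):
    """Hypothetical sampler yielding consecutive batches of row indexes."""

    def __init__(self, n_rows, batch_size=1):
        self.n_rows = n_rows
        self.batch_size = batch_size

    def __len__(self):
        # Ceiling division: the last batch may be smaller than batch_size.
        return -(-self.n_rows // self.batch_size)

    def __iter__(self):
        for start in range(0, self.n_rows, self.batch_size):
            yield list(range(start, min(start + self.batch_size, self.n_rows)))


sampler = RangeSampler(n_rows=5, batch_size=2)
print(len(sampler))   # 3
print(list(sampler))  # [[0, 1], [2, 3], [4]]
```

In this sketch __len__ counts batches rather than examples, so a training loop can use len(sampler) to know how many steps an epoch takes.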
class rectorch.samplers.DataSampler(sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)
Bases: rectorch.samplers.Sampler
This is a standard sampler that returns batches without any particular constraint.
Batches are randomly sampled with the defined size (i.e., batch_size). If shuffle is set to False, then the sampler returns batches in the same order as in the original data set. When sparse_data_te is defined, each returned batch is a tuple with the training part of the batch and its test/validation counterpart. Otherwise, if sparse_data_te is None, then the second element of the yielded tuple will be None.
Parameters
- sparse_data_tr : scipy.sparse.csr_matrix
  The training sparse user-item rating matrix.
- sparse_data_te : scipy.sparse.csr_matrix [optional]
  The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.
- batch_size : int [optional]
  The size of the batches, by default 1.
- shuffle : bool [optional]
  Whether the data set must be randomly shuffled before creating the batches, by default True.
Attributes
- sparse_data_tr : scipy.sparse.csr_matrix
  See sparse_data_tr parameter.
- sparse_data_te : scipy.sparse.csr_matrix
  See sparse_data_te parameter.
- batch_size : int
  See batch_size parameter.
- shuffle : bool
  See shuffle parameter.
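The batching behaviour described for DataSampler can be sketched as follows. This is an illustrative helper, not rectorch's implementation: data_batches is a hypothetical name, the seed argument is an addition for reproducibility, and batches come out as dense NumPy arrays (the real sampler may convert them to another type, such as PyTorch tensors). It relies only on SciPy's row indexing of csr_matrix and its toarray() method.

```python
import numpy as np
from scipy import sparse


def data_batches(sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True, seed=None):
    """Hypothetical sketch of DataSampler-style batching (not rectorch's code)."""
    n_rows = sparse_data_tr.shape[0]
    idxlist = np.arange(n_rows)
    if shuffle:
        np.random.default_rng(seed).shuffle(idxlist)
    for start in range(0, n_rows, batch_size):
        rows = idxlist[start:start + batch_size]  # last batch may be smaller
        # A single fancy-indexing call extracts the whole batch of rows at once.
        batch_tr = sparse_data_tr[rows].toarray()
        batch_te = sparse_data_te[rows].toarray() if sparse_data_te is not None else None
        yield batch_tr, batch_te


tr = sparse.csr_matrix(np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]], dtype=np.float32))
batches = list(data_batches(tr, batch_size=2, shuffle=False))
print(len(batches))         # 2
print(batches[0][0].shape)  # (2, 3)
print(batches[1][1])        # None
```

Slicing many rows with one indexing call is what makes batch-level sampling cheaper than fetching and collating one user at a time.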
class rectorch.samplers.ConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)
Bases: rectorch.samplers.Sampler
Data sampler with conditioned filtering used by the rectorch.models.CMultiVAE model.
This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During the training, each user must be conditioned over all the possible conditions (actually, only the ones that the user knows), so the training set must be modified accordingly.
Parameters
- iid2cids : dict (key int - value list of int)
  Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to with the inner id, and conditions with an integer in the range [0, n_cond - 1].
- n_cond : int
  Number of possible conditions.
- sparse_data_tr : scipy.sparse.csr_matrix
  The training sparse user-item rating matrix.
- sparse_data_te : scipy.sparse.csr_matrix [optional]
  The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.
- batch_size : int [optional]
  The size of the batches, by default 1.
- shuffle : bool [optional]
  Whether the data set must be randomly shuffled before creating the batches, by default True.
References
- [CVAE] Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141
Attributes
- iid2cids : dict (key int - value list of int)
  See iid2cids parameter.
- n_cond : int
  See n_cond parameter.
- sparse_data_tr : scipy.sparse.csr_matrix
  See sparse_data_tr parameter.
- sparse_data_te : scipy.sparse.csr_matrix
  See sparse_data_te parameter.
- batch_size : int
  See batch_size parameter.
- shuffle : bool
  See shuffle parameter.
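The description says the training set is expanded so that each user is paired with every condition they know, and "conditioned filtering" suggests that the target is restricted to the items valid for that condition. The sketch below illustrates that reading under stated assumptions: ratings are held in a plain dict instead of a csr_matrix, a user's known conditions are taken to be the union of the conditions of the items they rated, and conditioned_examples is a hypothetical helper, not part of rectorch.

```python
def conditioned_examples(user_items, iid2cids):
    """Hypothetical sketch: expand each user into one example per known condition.

    user_items: dict user -> list of rated item ids (stand-in for csr_matrix rows).
    iid2cids:   dict item -> list of condition ids, as in the iid2cids parameter.
    Returns (user, condition, filtered_items) triples, where filtered_items keeps
    only the user's items that are valid for that condition.
    """
    examples = []
    for user, items in user_items.items():
        # Assumption: the conditions a user "knows" are those of the items they rated.
        known = sorted({c for i in items for c in iid2cids.get(i, [])})
        for c in known:
            filtered = [i for i in items if c in iid2cids.get(i, [])]
            examples.append((user, c, filtered))
    return examples


iid2cids = {0: [0], 1: [0, 1], 2: [1]}
user_items = {0: [0, 2], 1: [1]}
for ex in conditioned_examples(user_items, iid2cids):
    print(ex)
# (0, 0, [0])
# (0, 1, [2])
# (1, 0, [1])
# (1, 1, [1])
```

The number of examples grows with the number of conditions each user knows, which is the cost that BalancedConditionedDataSampler reduces by sub-sampling.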
class rectorch.samplers.BalancedConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, subsample=0.2)
Bases: rectorch.samplers.ConditionedDataSampler
Sub-sampled version of the ConditionedDataSampler.
This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During the training, each user must be conditioned over all the possible conditions (actually, only the ones that the user knows), so the training set must be modified accordingly. This sampler avoids creating all the possible user-condition pairs by means of sub-sampling. The extent of this sub-sampling is defined by the parameter subsample. The prefix 'Balanced' is due to the way the sub-sampling is performed: for each condition c, only a subsample fraction of the training examples (u, c), i.e., of the users u conditioned by c, is created.
Parameters
- iid2cids : dict (key int - value list of int)
  Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to with the inner id, and conditions with an integer in the range [0, n_cond - 1].
- n_cond : int
  Number of possible conditions.
- sparse_data_tr : scipy.sparse.csr_matrix
  The training sparse user-item rating matrix.
- sparse_data_te : scipy.sparse.csr_matrix [optional]
  The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.
- batch_size : int [optional]
  The size of the batches, by default 1.
- shuffle : bool [optional]
  Whether the data set must be randomly shuffled before creating the batches, by default True.
- subsample : float [optional]
  The sub-sampling ratio, i.e., the fraction of conditioned training examples that is created. It must be a float in (0, 1], by default 0.2.
References
- [CVAE] Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141
Attributes
- iid2cids : dict (key int - value list of int)
  See iid2cids parameter.
- n_cond : int
  See n_cond parameter.
- sparse_data_tr : scipy.sparse.csr_matrix
  See sparse_data_tr parameter.
- sparse_data_te : scipy.sparse.csr_matrix
  See sparse_data_te parameter.
- batch_size : int
  See batch_size parameter.
- shuffle : bool
  See shuffle parameter.
- subsample : float
  See subsample parameter.
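Reading the sub-sampling as a per-condition operation, it can be sketched like this. The helper balanced_pairs is hypothetical, as are its input format (a dict mapping each user to the set of conditions they know) and its rounding rule (keep int(len(users) * subsample) users per condition, but at least one); the actual sampler may differ in these details.

```python
import random


def balanced_pairs(user_conditions, subsample=0.2, seed=None):
    """Hypothetical sketch of per-condition sub-sampling of (user, condition) pairs.

    user_conditions: dict user -> set of conditions known by that user.
    For every condition c, a random `subsample` fraction of the users that
    know c is kept. This is not rectorch's actual implementation.
    """
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in (0, 1]")
    rng = random.Random(seed)
    cond2users = {}
    for user, conds in user_conditions.items():
        for c in conds:
            cond2users.setdefault(c, []).append(user)
    pairs = []
    for c in sorted(cond2users):
        users = cond2users[c]
        # Rounding rule (floor, but at least one user) is an assumption of this sketch.
        k = max(1, int(len(users) * subsample))
        pairs.extend((u, c) for u in rng.sample(users, k))
    return pairs


user_conditions = {u: {0, 1} for u in range(10)}
pairs = balanced_pairs(user_conditions, subsample=0.2, seed=0)
print(len(pairs))  # 4: two users for each of the two conditions
```

Sampling per condition keeps every condition represented in proportion to how many users know it, instead of letting users with many conditions dominate the expanded training set.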
class rectorch.samplers.EmptyConditionedDataSampler(cond_size, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)
Bases: rectorch.samplers.Sampler
Data sampler that returns unconditioned batches used by the rectorch.models.CMultiVAE model.
This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. This sampler is very similar to DataSampler, with the exception that a zero matrix of size batch_size \(\times\) n_cond is appended to each yielded batch.
Parameters
- n_cond : int
  Number of possible conditions.
- sparse_data_tr : scipy.sparse.csr_matrix
  The training sparse user-item rating matrix.
- sparse_data_te : scipy.sparse.csr_matrix [optional]
  The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.
- batch_size : int [optional]
  The size of the batches, by default 1.
- shuffle : bool [optional]
  Whether the data set must be randomly shuffled before creating the batches, by default True.
References
- [CVAE] Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141
Attributes
- n_cond : int
  See n_cond parameter.
- sparse_data_tr : scipy.sparse.csr_matrix
  See sparse_data_tr parameter.
- sparse_data_te : scipy.sparse.csr_matrix
  See sparse_data_te parameter.
- batch_size : int
  See batch_size parameter.
- shuffle : bool
  See shuffle parameter.
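A minimal sketch of the zero-condition padding, assuming dense NumPy batches and that the zero block is concatenated along the column axis, after the item columns. append_empty_condition is a hypothetical helper; it sizes the zero block from the batch's actual number of rows, so it also covers a last batch smaller than batch_size.

```python
import numpy as np


def append_empty_condition(batch, n_cond):
    """Hypothetical helper: append an all-zero condition block to a dense batch.

    batch: array of shape (n_users_in_batch, n_items).
    Returns an array of shape (n_users_in_batch, n_items + n_cond).
    """
    zeros = np.zeros((batch.shape[0], n_cond), dtype=batch.dtype)
    return np.hstack([batch, zeros])


batch = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
out = append_empty_condition(batch, n_cond=4)
print(out.shape)  # (2, 7)
```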
class rectorch.samplers.CFGAN_TrainingSampler(sparse_data_tr, batch_size=64)
Bases: rectorch.samplers.Sampler
Sampler used for training the generator and discriminator of the CFGAN model.
The peculiarity of this sampler (see [CFGAN] for more details) is that batches are continuously picked at random from the whole training set.
Parameters
- sparse_data_tr : scipy.sparse.csr_matrix
  The training sparse user-item rating matrix.
- batch_size : int [optional]
  The size of the batches, by default 64.
References
- [CFGAN] Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, and Jung-Tae Lee. 2018. CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 137–146. DOI: https://doi.org/10.1145/3269206.3271743
Attributes
- sparse_data_tr : scipy.sparse.csr_matrix
  See sparse_data_tr parameter.
- batch_size : int
  See batch_size parameter.
- idxlist : list of int
  Shuffled list of indexes. After an iteration over the sampler, or after a call to the next() function, idxlist contains, in its first batch_size positions, the indexes of the examples contained in the current batch.
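The idxlist behaviour described above can be sketched as an endless iterator. RandomBatchSampler is a hypothetical class, not rectorch's, and it yields only row indexes; a real sampler would use them to slice sparse_data_tr. The seed argument is an addition for reproducibility.

```python
import random


class RandomBatchSampler:
    """Hypothetical sketch of CFGAN-style sampling (not rectorch's class).

    Every call to next() reshuffles idxlist and returns its first batch_size
    entries, so batches are drawn at random from the whole training set and
    the iterator never runs out.
    """

    def __init__(self, n_rows, batch_size=64, seed=None):
        self.idxlist = list(range(n_rows))
        self.batch_size = batch_size
        self._rng = random.Random(seed)

    def __iter__(self):
        return self

    def __next__(self):
        self._rng.shuffle(self.idxlist)
        # After the shuffle, the first batch_size positions hold the current batch.
        return self.idxlist[:self.batch_size]


sampler = RandomBatchSampler(n_rows=10, batch_size=4, seed=0)
batch = next(sampler)
print(len(batch))  # 4
```

Reshuffling the full list costs O(n) per batch; random.sample would be cheaper, but it would not leave the current batch at the front of idxlist as the attribute description states.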
class rectorch.samplers.SVAE_Sampler(num_items, dict_data_tr, dict_data_te=None, pred_type='next_k', k=1, shuffle=True, is_training=True)
Bases: rectorch.samplers.Sampler
Sampler used for training SVAE.
This sampler yields pairs (x, y), where x is the tensor of indexes of the positive items, and y is the target tensor with the (multi-hot) ground truth items. This sampler is characterized by batches of size one (a single user at a time). Given a user (batch) u, the returned ground truth tensor is a 3D tensor of dimension \(1 \times (|\mathcal{I}_u|-1) \times m\), where \(\mathcal{I}_u\) is the set of items rated by u, and m is the number of items. This tensor represents the ground truth for u over time: each slice of the tensor corresponds to a different time step among all the possible time steps for this specific user.
Parameters
- num_items : int
  Number of items.
- dict_data_tr : dict (key - int, value - list of int)
  Dictionary containing the training set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order.
- dict_data_te : dict (key - int, value - list of int) or None [optional]
  Dictionary containing the test part of the data set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order, by default None. If None, it is not considered in the batch creation; otherwise it is used in the construction of the ground truth. Note that dict_data_te must be set only in the case of validation/test, i.e., when is_training is False.
- pred_type : str in the set {'next_k', 'next', 'postfix'} [optional]
  The variant of loss used by the model, by default 'next_k'. If 'next', then only the next item must be predicted; if 'next_k', the next k items are considered in the ground truth; otherwise (='postfix'), all the remaining items are taken as ground truth.
- k : int [optional]
  The number of items to predict in the 'next_k' variant, by default 1. This parameter is ignored when pred_type is not 'next_k'.
- shuffle : bool [optional]
  Whether the data set must be randomly shuffled before creating the batches, by default True.
- is_training : bool [optional]
  Whether the sampler is used during training, by default True.
Attributes
See the Parameters section.
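The shape of the ground truth tensor and the three pred_type variants can be illustrated with a small NumPy sketch. svae_ground_truth is a hypothetical helper: it assumes that x holds all the user's items except the last one and that slice t of y marks the items following position t. This matches the \(1 \times (|\mathcal{I}_u|-1) \times m\) shape stated above but is not taken from rectorch's code.

```python
import numpy as np


def svae_ground_truth(items, num_items, pred_type="next_k", k=1):
    """Hypothetical sketch of the SVAE target tensor for a single user.

    items: the user's rated items in (e.g., chronological) order.
    Returns x (indexes of the input items) and y of shape
    (1, len(items) - 1, num_items), where slice t is a multi-hot vector of the
    items to predict after seeing items[: t + 1].
    """
    x = np.array(items[:-1])
    steps = len(items) - 1
    y = np.zeros((1, steps, num_items), dtype=np.float32)
    for t in range(steps):
        if pred_type == "next":
            targets = items[t + 1:t + 2]
        elif pred_type == "next_k":
            targets = items[t + 1:t + 1 + k]
        elif pred_type == "postfix":
            targets = items[t + 1:]
        else:
            raise ValueError(f"unknown pred_type: {pred_type!r}")
        y[0, t, targets] = 1.0
    return x, y


x, y = svae_ground_truth([3, 0, 2, 1], num_items=5, pred_type="next_k", k=2)
print(y.shape)  # (1, 3, 5)
print(y[0].astype(int).tolist())
# [[1, 0, 1, 0, 0], [0, 1, 1, 0, 0], [0, 1, 0, 0, 0]]
```

With 'next_k', the windows near the end of the sequence contain fewer than k items, because fewer items remain to be predicted.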