rectorch.samplers

Class list

`rectorch.samplers.Sampler`(*args, **kargs)	Sampler base class.
`rectorch.samplers.DataSampler`(sparse_data_tr)	This is a standard sampler that returns batches without any particular constraint.
`rectorch.samplers.ConditionedDataSampler`(…)	Data sampler with conditioned filtering used by the `rectorch.models.CMultiVAE` model.
`rectorch.samplers.BalancedConditionedDataSampler`(…)	Sub-sampled version of the `ConditionedDataSampler`.
`rectorch.samplers.EmptyConditionedDataSampler`(…)	Data sampler that returns unconditioned batches used by the `rectorch.models.CMultiVAE` model.
`rectorch.samplers.CFGAN_TrainingSampler`(…)	Sampler used for training the generator and discriminator of the CFGAN model.
`rectorch.samplers.SVAE_Sampler`(num_items, …)	Sampler used for training SVAE.

The samplers module contains the definitions of sampler classes that are useful when training neural network-based models.

The samplers module is inspired by the torch.utils.data.DataLoader class which, however, is not very efficient because it retrieves a single example at a time. The samplers defined in this module instead process the data set in batches, which greatly improves efficiency. Each new sampler must extend the base class Sampler, implementing all the abstract special methods, in particular samplers.Sampler.__len__() and samplers.Sampler.__iter__().

class rectorch.samplers.Sampler(*args, **kargs)

Bases: object

Sampler base class.

A sampler is meant to be used as a generator of batches useful in training neural networks.

Note

Each new sampler must extend this base class, implementing all the abstract special methods, in particular rectorch.samplers.Sampler.__len__() and rectorch.samplers.Sampler.__iter__().
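A minimal sketch of the contract described in the note, assuming that implementing __len__ and __iter__ is all a subclass needs. To keep the snippet runnable without rectorch installed, it defines a stand-in Sampler class that mirrors the documented signature; in real code you would subclass rectorch.samplers.Sampler instead. ListBatchSampler is a hypothetical example and is not part of rectorch.

```python
import math
import random


class Sampler:
    """Stand-in for rectorch.samplers.Sampler (assumption: same contract)."""

    def __init__(self, *args, **kargs):
        pass

    def __len__(self):
        raise NotImplementedError()

    def __iter__(self):
        raise NotImplementedError()


class ListBatchSampler(Sampler):
    """Hypothetical sampler yielding batches from a plain Python list."""

    def __init__(self, data, batch_size=1, shuffle=True):
        super().__init__()
        self.data = data
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __len__(self):
        # Number of batches, counting a possibly smaller last batch.
        return math.ceil(len(self.data) / self.batch_size)

    def __iter__(self):
        idxlist = list(range(len(self.data)))
        if self.shuffle:
            random.shuffle(idxlist)
        for start in range(0, len(idxlist), self.batch_size):
            yield [self.data[i] for i in idxlist[start:start + self.batch_size]]
```

For example, ListBatchSampler(list(range(10)), batch_size=4, shuffle=False) has length 3 and yields [0, 1, 2, 3], [4, 5, 6, 7] and [8, 9].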

class rectorch.samplers.DataSampler(sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)

Bases: rectorch.samplers.Sampler

This is a standard sampler that returns batches without any particular constraint.

Batches of the given size (i.e., batch_size) are returned in random order. If shuffle is set to False, the sampler returns the batches in the same order as in the original data set. When sparse_data_te is defined, each returned batch is a tuple with the training part of the batch and its test/validation counterpart. Otherwise, if sparse_data_te is None, the second element of the yielded tuple is None.
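The batching behaviour described above can be sketched as a plain generator. This is a hypothetical illustration, not rectorch's code: iterate_batches is an invented name, and it yields dense NumPy arrays for simplicity, whereas the exact batch type produced by the real DataSampler is not specified here. For n users it produces ceil(n / batch_size) batches, the last one possibly smaller.

```python
import numpy as np
from scipy.sparse import csr_matrix


def iterate_batches(sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True):
    """Yield (train_batch, test_batch_or_None) pairs as dense NumPy arrays."""
    n_rows = sparse_data_tr.shape[0]
    idxlist = np.arange(n_rows)
    if shuffle:
        # The same permutation is used for the training and test rows.
        np.random.shuffle(idxlist)
    for start in range(0, n_rows, batch_size):
        rows = idxlist[start:start + batch_size]
        data_tr = sparse_data_tr[rows].toarray()
        data_te = None if sparse_data_te is None else sparse_data_te[rows].toarray()
        yield data_tr, data_te


tr = csr_matrix(np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]], dtype=np.float32))
batches = list(iterate_batches(tr, batch_size=2, shuffle=False))
```

Here batches contains two pairs: the first training batch holds rows 0 and 1, the second holds row 2, and the second element of each pair is None because no test matrix was given.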

Parameters

sparse_data_tr : scipy.sparse.csr_matrix

The training sparse user-item rating matrix.

sparse_data_te : scipy.sparse.csr_matrix [optional]

The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

batch_size : int [optional]

The size of the batches, by default 1.

shuffle : bool [optional]

Whether the data set must be randomly shuffled before creating the batches, by default True.

Attributes

sparse_data_tr : scipy.sparse.csr_matrix

See sparse_data_tr parameter.

sparse_data_te : scipy.sparse.csr_matrix

See sparse_data_te parameter.

batch_size : int

See batch_size parameter.

shuffle : bool

See shuffle parameter.

class rectorch.samplers.ConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)

Bases: rectorch.samplers.Sampler

Data sampler with conditioned filtering used by the rectorch.models.CMultiVAE model.

This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During training, each user must be conditioned on every possible condition (in practice, only the conditions associated with the items the user has rated), so the training set must be expanded accordingly.

Parameters

iid2cids : dict (key int - value list of int)

Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to by their inner id, and conditions by an integer in the range [0, n_cond - 1].

n_cond : int

Number of possible conditions.

sparse_data_tr : scipy.sparse.csr_matrix

The training sparse user-item rating matrix.

sparse_data_te : scipy.sparse.csr_matrix [optional]

The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

batch_size : int [optional]

The size of the batches, by default 1.

shuffle : bool [optional]

Whether the data set must be randomly shuffled before creating the batches, by default True.
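The user-condition expansion can be sketched as follows, assuming that a user's conditions are the union of the conditions of the items they rated, as suggested by the iid2cids parameter. The helper names (user_conditions, expand_pairs, condition_one_hot) are hypothetical; the sketch only enumerates the (user, condition) pairs and encodes a condition as a one-hot vector. It is not the rectorch implementation and does not show how the targets are filtered.

```python
import numpy as np
from scipy.sparse import csr_matrix


def user_conditions(sparse_data_tr, iid2cids):
    """Map each user (row) to the set of conditions of the items they rated."""
    r2cond = {}
    for u in range(sparse_data_tr.shape[0]):
        items = sparse_data_tr[u].nonzero()[1]  # column indexes = rated items
        conds = set()
        for i in items:
            conds.update(iid2cids.get(int(i), []))
        r2cond[u] = conds
    return r2cond


def expand_pairs(r2cond):
    """List every (user, condition) training pair in a deterministic order."""
    return [(u, c) for u in sorted(r2cond) for c in sorted(r2cond[u])]


def condition_one_hot(c, n_cond):
    """One-hot vector encoding condition c in {0, ..., n_cond - 1}."""
    vec = np.zeros(n_cond, dtype=np.float32)
    vec[c] = 1.0
    return vec


tr = csr_matrix(np.array([[1, 1, 0], [0, 0, 1]]))
iid2cids = {0: [0], 1: [0, 2], 2: [1]}
pairs = expand_pairs(user_conditions(tr, iid2cids))
```

User 0 rated items 0 and 1, so it is paired with conditions 0 and 2; user 1 rated item 2 and is paired with condition 1, giving pairs == [(0, 0), (0, 2), (1, 1)].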

References

CVAE: Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

Attributes

iid2cids : dict (key int - value list of int)

See iid2cids parameter.

n_cond : int

See n_cond parameter.

sparse_data_tr : scipy.sparse.csr_matrix

See sparse_data_tr parameter.

sparse_data_te : scipy.sparse.csr_matrix

See sparse_data_te parameter.

batch_size : int

See batch_size parameter.

shuffle : bool

See shuffle parameter.

class rectorch.samplers.BalancedConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, subsample=0.2)

Bases: rectorch.samplers.ConditionedDataSampler

Sub-sampled version of the ConditionedDataSampler.

This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During training, each user must be conditioned on every possible condition (in practice, only the conditions associated with the items the user has rated), so the training set must be expanded accordingly. This sampler avoids creating all the possible user-condition pairs by sub-sampling them. The extent of this sub-sampling is defined by the parameter subsample. The prefix 'Balanced' is due to the way the sub-sampling is performed: given a user u, for each condition c only a subsample fraction of the training samples is created for u conditioned on c.

Parameters

iid2cids : dict (key int - value list of int)

Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to by their inner id, and conditions by an integer in the range [0, n_cond - 1].

n_cond : int

Number of possible conditions.

sparse_data_tr : scipy.sparse.csr_matrix

The training sparse user-item rating matrix.

sparse_data_te : scipy.sparse.csr_matrix [optional]

The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

batch_size : int [optional]

The size of the batches, by default 1.

shuffle : bool [optional]

Whether the data set must be randomly shuffled before creating the batches, by default True.

subsample : float [optional]

The fraction of the user-condition training samples to keep for each condition. It must be a float in the interval (0, 1], by default 0.2.
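One possible reading of the balanced sub-sampling is sketched below: the (user, condition) pairs are grouped by condition and a random subsample fraction of each group is kept, so that every condition is reduced by the same proportion. This interpretation, the helper name balanced_subsample, and the choice to keep at least one pair per condition are assumptions of the sketch, not rectorch's documented behaviour.

```python
import random
from collections import defaultdict


def balanced_subsample(pairs, subsample=0.2, seed=None):
    """Keep, for every condition, a random `subsample` fraction of its pairs."""
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in (0, 1]")
    rng = random.Random(seed)
    by_cond = defaultdict(list)
    for u, c in pairs:
        by_cond[c].append((u, c))
    kept = []
    for c in sorted(by_cond):
        group = by_cond[c]
        # Design choice of this sketch: keep at least one pair per condition.
        n_keep = max(1, round(subsample * len(group)))
        kept.extend(rng.sample(group, n_keep))
    return kept


pairs = [(u, 0) for u in range(10)] + [(u, 1) for u in range(5)]
kept = balanced_subsample(pairs, subsample=0.2, seed=0)
```

With subsample=0.2, condition 0 keeps 2 of its 10 pairs and condition 1 keeps 1 of its 5, so the relative balance between conditions is preserved.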

References

CVAE: Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

Attributes

iid2cids : dict (key int - value list of int)

See iid2cids parameter.

n_cond : int

See n_cond parameter.

sparse_data_tr : scipy.sparse.csr_matrix

See sparse_data_tr parameter.

sparse_data_te : scipy.sparse.csr_matrix

See sparse_data_te parameter.

batch_size : int

See batch_size parameter.

shuffle : bool

See shuffle parameter.

subsample : float

See subsample parameter.

class rectorch.samplers.EmptyConditionedDataSampler(cond_size, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)

Bases: rectorch.samplers.Sampler

Data sampler that returns unconditioned batches used by the rectorch.models.CMultiVAE model.

This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. It is very similar to DataSampler, with the exception that a zero matrix of size batch_size \(\times\) cond_size is appended to each yielded batch.
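A minimal sketch of the zero-padding step, assuming the zero block is concatenated along the feature (column) axis of a dense training batch. append_empty_condition is a hypothetical helper, and the NumPy arrays stand in for whatever batch type the real sampler yields.

```python
import numpy as np


def append_empty_condition(batch, cond_size):
    """Append an all-zero (batch_size x cond_size) block to a dense batch."""
    zeros = np.zeros((batch.shape[0], cond_size), dtype=batch.dtype)
    return np.hstack([batch, zeros])


batch = np.ones((2, 3), dtype=np.float32)  # 2 users, 3 items
padded = append_empty_condition(batch, cond_size=4)
```

The padded batch has shape (2, 7): the original 3 item columns followed by 4 condition columns that are all zero, i.e., "no condition".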

Parameters

cond_size : int

Number of possible conditions.

sparse_data_tr : scipy.sparse.csr_matrix

The training sparse user-item rating matrix.

sparse_data_te : scipy.sparse.csr_matrix [optional]

The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

batch_size : int [optional]

The size of the batches, by default 1.

shuffle : bool [optional]

Whether the data set must be randomly shuffled before creating the batches, by default True.

References

CVAE: Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

Attributes

cond_size : int

See cond_size parameter.

sparse_data_tr : scipy.sparse.csr_matrix

See sparse_data_tr parameter.

sparse_data_te : scipy.sparse.csr_matrix

See sparse_data_te parameter.

batch_size : int

See batch_size parameter.

shuffle : bool

See shuffle parameter.

class rectorch.samplers.CFGAN_TrainingSampler(sparse_data_tr, batch_size=64)

Bases: rectorch.samplers.Sampler

Sampler used for training the generator and discriminator of the CFGAN model.

The peculiarity of this sampler (see [CFGAN] for more details) is that batches are continuously sampled at random from the whole training set.

Parameters

sparse_data_tr : scipy.sparse.csr_matrix

The training sparse user-item rating matrix.

batch_size : int [optional]

The size of the batches, by default 64.

References

CFGAN: Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, and Jung-Tae Lee. 2018. CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 137–146. DOI: https://doi.org/10.1145/3269206.3271743

Attributes

sparse_data_tr : scipy.sparse.csr_matrix

See sparse_data_tr parameter.

batch_size : int

See batch_size parameter.

idxlist : list of int

Shuffled list of indexes. After an iteration over the sampler, or after a call to the next() function, the first batch_size positions of idxlist contain the indexes of the examples in the current batch.
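The endless random sampling, together with the idxlist behaviour described above, can be sketched as an iterator whose __next__ reshuffles all indexes and takes the first batch_size of them. RandomBatchStream is a hypothetical name; the sketch stores idxlist as a NumPy array rather than a list and returns dense NumPy batches, which are assumptions rather than rectorch's exact types.

```python
import numpy as np
from scipy.sparse import csr_matrix


class RandomBatchStream:
    """Hypothetical endless random-batch sampler in the style of CFGAN training."""

    def __init__(self, sparse_data_tr, batch_size=64, seed=None):
        self.sparse_data_tr = sparse_data_tr
        self.batch_size = batch_size
        self.idxlist = np.arange(sparse_data_tr.shape[0])
        self._rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Reshuffle all indexes; the first batch_size entries form the batch.
        self._rng.shuffle(self.idxlist)
        rows = self.idxlist[:self.batch_size]
        return self.sparse_data_tr[rows].toarray()


stream = RandomBatchStream(csr_matrix(np.eye(5, dtype=np.float32)), batch_size=3, seed=0)
batch = next(stream)
```

The iterator never raises StopIteration, so the training loop decides how many batches to draw; every batch is a fresh random draw from the whole training set.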

class rectorch.samplers.SVAE_Sampler(num_items, dict_data_tr, dict_data_te=None, pred_type='next_k', k=1, shuffle=True, is_training=True)

Bases: rectorch.samplers.Sampler

Sampler used for training SVAE.

This sampler yields pairs (x, y), where x is the tensor of indexes of the positive items and y is the target tensor with the (multi-hot) ground-truth items. Batches have size one (a single user at a time). Given a user (batch) u, the returned ground-truth tensor is a 3D tensor of dimension \(1 \times (|\mathcal{I}_u|-1) \times m\), where \(\mathcal{I}_u\) is the set of items rated by u and m is the number of items. This tensor represents the ground truth for u over time: each slice of the tensor corresponds to a different time step among all the possible time steps of this specific user.

Parameters

num_items : int

Number of items.

dict_data_tr : dict (key - int, value - list of int)

Dictionary containing the training set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order.

dict_data_te : dict (key - int, value - list of int) or None [optional]

Dictionary containing the test part of the data set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order, by default None. If None, it is not considered in the batch creation; otherwise, it is used in the construction of the ground truth. Note that dict_data_te must be set only in the case of validation/test, i.e., when is_training is False.

pred_type : str in the set {'next_k', 'next', 'postfix'} [optional]

The variant of loss used by the model, by default 'next_k'. If 'next', only the next item must be predicted; if 'next_k', the next k items are considered in the ground truth; otherwise (i.e., 'postfix'), all the remaining items are taken as ground truth.

k : int [optional]

The number of items to predict in the 'next_k' variant, by default 1. This parameter is ignored when pred_type is not 'next_k'.

shuffle : bool [optional]

Whether the data set must be randomly shuffled before creating the batches, by default True.

is_training : bool [optional]

Whether the sampler is used during training, by default True.
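The construction of a single training pair can be sketched as follows for one user's ordered item list, following the tensor shape described above and the three pred_type variants. svae_example is a hypothetical helper that uses NumPy arrays in place of tensors; it covers only the training case (no dict_data_te) and is not the rectorch implementation.

```python
import numpy as np


def svae_example(items, num_items, pred_type="next_k", k=1):
    """Build (x, y) for one user from their ordered list of item indexes.

    x has shape (n - 1,) and holds the input items; y has shape
    (1, n - 1, num_items) and slice t is the multi-hot target for step t.
    """
    n = len(items)
    x = np.asarray(items[:-1], dtype=np.int64)
    y = np.zeros((1, n - 1, num_items), dtype=np.float32)
    for t in range(n - 1):
        future = items[t + 1:]  # items that follow step t
        if pred_type == "next":
            targets = future[:1]
        elif pred_type == "next_k":
            targets = future[:k]
        elif pred_type == "postfix":
            targets = future
        else:
            raise ValueError(f"unknown pred_type: {pred_type!r}")
        y[0, t, targets] = 1.0
    return x, y


x, y = svae_example([3, 0, 2, 1], num_items=5, pred_type="next_k", k=2)
```

For the sequence [3, 0, 2, 1] with k=2, x is [3, 0, 2] and y has shape (1, 3, 5); the targets at the three steps are items {0, 2}, {2, 1} and {1}, respectively.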

Attributes

See the Parameters section.