rectorch.samplers

Class list

rectorch.samplers.Sampler(*args, **kargs)

Sampler base class.

rectorch.samplers.DataSampler(sparse_data_tr)

This is a standard sampler that returns batches without any particular constraint.

rectorch.samplers.ConditionedDataSampler(…)

Data sampler with conditioned filtering used by the rectorch.models.CMultiVAE model.

rectorch.samplers.BalancedConditionedDataSampler(…)

Sub-sampled version of the ConditionedDataSampler.

rectorch.samplers.EmptyConditionedDataSampler(…)

Data sampler that returns unconditioned batches used by the rectorch.models.CMultiVAE model.

rectorch.samplers.CFGAN_TrainingSampler(…)

Sampler used for training the generator and discriminator of the CFGAN model.

rectorch.samplers.SVAE_Sampler(num_items, …)

Sampler used for training SVAE.

The samplers module contains definitions of sampler classes useful when training neural network-based models.

The samplers module is inspired by the torch.utils.data.DataLoader class which, however, is not very efficient because it outputs a single example at a time. The samplers defined in this module instead process the data set in batches, which greatly improves efficiency. Each new sampler must extend the base class Sampler and implement all the abstract special methods, in particular samplers.Sampler.__len__() and samplers.Sampler.__iter__().

class rectorch.samplers.Sampler(*args, **kargs)[source]

Bases: object

Sampler base class.

A sampler is meant to be used as a generator of batches useful in training neural networks.

Note

Each new sampler must extend this base class and implement all the abstract special methods, in particular rectorch.samplers.Sampler.__len__() and rectorch.samplers.Sampler.__iter__().
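
As an illustration of the note above, here is a minimal sketch of a custom sampler. The Sampler class below is a stand-in defined only for this example (it is not the rectorch implementation), RangeBatchSampler is a hypothetical name, and __len__ is assumed to return the number of batches.

```python
class Sampler:
    """Stand-in base class for this sketch (assumed interface, not rectorch code)."""

    def __len__(self):
        raise NotImplementedError

    def __iter__(self):
        raise NotImplementedError


class RangeBatchSampler(Sampler):
    """Hypothetical sampler that yields consecutive batches of example indexes."""

    def __init__(self, n_examples, batch_size=1):
        self.n_examples = n_examples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch (ceiling division).
        return -(-self.n_examples // self.batch_size)

    def __iter__(self):
        for start in range(0, self.n_examples, self.batch_size):
            stop = min(start + self.batch_size, self.n_examples)
            yield list(range(start, stop))


sampler = RangeBatchSampler(n_examples=5, batch_size=2)
batches = list(sampler)
```

A training loop can then iterate over the sampler directly and use len(sampler) to report progress.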

class rectorch.samplers.DataSampler(sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)[source]

Bases: rectorch.samplers.Sampler

This is a standard sampler that returns batches without any particular constraint.

Batches of the defined size (i.e., batch_size) are returned in random order. If shuffle is set to False, the sampler returns batches in the same order as in the original data set. When sparse_data_te is defined, each returned batch is a tuple with the training part of the batch and its test/validation counterpart. Otherwise, if sparse_data_te is None, the second element of the yielded tuple is None.

Parameters
  • sparse_data_tr : scipy.sparse.csr_matrix
    • The training sparse user-item rating matrix.

  • sparse_data_te : scipy.sparse.csr_matrix [optional]
    • The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

  • batch_size : int [optional]
    • The size of the batches, by default 1.

  • shuffle : bool [optional]
    • Whether the data set must be randomly shuffled before creating the batches, by default True.

    Attributes
  • sparse_data_tr : scipy.sparse.csr_matrix
    • See sparse_data_tr parameter.

  • sparse_data_te : scipy.sparse.csr_matrix
    • See sparse_data_te parameter.

  • batch_size : int
    • See batch_size parameter.

  • shuffle : bool
    • See shuffle parameter.
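
The batching behaviour described for DataSampler can be sketched as follows. This is a simplified illustration, not the library code: plain lists of rows stand in for the scipy.sparse.csr_matrix inputs, iterate_batches is a hypothetical helper, and the seed argument is an assumption added only to make the example reproducible.

```python
import random


def iterate_batches(data_tr, data_te=None, batch_size=1, shuffle=True, seed=None):
    """Yield (training batch, test batch or None) pairs of rows."""
    n_rows = len(data_tr)
    if data_te is not None and len(data_te) != n_rows:
        raise ValueError("data_te must have the same number of rows as data_tr")

    order = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(order)

    for start in range(0, n_rows, batch_size):
        idx = order[start:start + batch_size]
        batch_tr = [data_tr[i] for i in idx]
        # Without a test matrix, the second element of the tuple is None.
        batch_te = None if data_te is None else [data_te[i] for i in idx]
        yield batch_tr, batch_te


data_tr = [[1, 0], [0, 1], [1, 1]]
ordered = list(iterate_batches(data_tr, batch_size=2, shuffle=False))
```

With shuffle=False the rows come back in their original order; with a test matrix, the same row indexes are used for both halves of each tuple.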

    class rectorch.samplers.ConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)[source]

    Bases: rectorch.samplers.Sampler

    Data sampler with conditioned filtering used by the rectorch.models.CMultiVAE model.

    This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During the training, each user must be conditioned over all the possible conditions (actually the ones that the user knows) so the training set must be modified accordingly.

    Parameters
  • iid2cids : dict (key int - value list of int)
    • Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to by their inner id, and conditions by an integer in the range [0, n_cond - 1].

  • n_cond : int
    • Number of possible conditions.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • The training sparse user-item rating matrix.

  • sparse_data_te : scipy.sparse.csr_matrix [optional]
    • The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

  • batch_size : int [optional]
    • The size of the batches, by default 1.

  • shuffle : bool [optional]
    • Whether the data set must be randomly shuffled before creating the batches, by default True.

    References

    CVAE

    Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

    Attributes
  • iid2cids : dict (key int - value list of int)
    • See iid2cids parameter.

  • n_cond : int
    • See n_cond parameter.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • See sparse_data_tr parameter.

  • sparse_data_te : scipy.sparse.csr_matrix
    • See sparse_data_te parameter.

  • batch_size : int
    • See batch_size parameter.

  • shuffle : bool
    • See shuffle parameter.
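
To make the idea of conditioning each user on the conditions they know more concrete, the sketch below enumerates user-condition pairs from an iid2cids mapping. It assumes that a user "knows" a condition when at least one of their rated items carries it; that reading, the helper names user_condition_pairs and one_hot, and the list-of-item-ids input are assumptions for illustration, not the library's internals.

```python
def user_condition_pairs(user_items, iid2cids):
    """Return (user, condition) pairs for every condition a user knows.

    user_items[u] is the list of item ids rated by user u.
    """
    pairs = []
    for user, items in enumerate(user_items):
        # Collect the conditions attached to any item rated by this user.
        known = sorted({c for item in items for c in iid2cids.get(item, [])})
        pairs.extend((user, cond) for cond in known)
    return pairs


def one_hot(cond, n_cond):
    """Encode a condition id in [0, n_cond - 1] as a one-hot list."""
    vec = [0] * n_cond
    vec[cond] = 1
    return vec


iid2cids = {0: [0], 1: [0, 1], 2: [2]}
user_items = [[0, 1], [2]]
pairs = user_condition_pairs(user_items, iid2cids)
```

Each pair can then be turned into one training example, for instance by pairing the user's rating row with one_hot(cond, n_cond).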

    class rectorch.samplers.BalancedConditionedDataSampler(iid2cids, n_cond, sparse_data_tr, sparse_data_te=None, batch_size=1, subsample=0.2)[source]

    Bases: rectorch.samplers.ConditionedDataSampler

    Sub-sampled version of the ConditionedDataSampler.

    This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. During the training, each user must be conditioned over all the possible conditions (actually the ones that the user knows) so the training set must be modified accordingly. This sampler avoids creating all possible user-condition pairs via sub-sampling. The extent of this sub-sampling is defined by the parameter subsample. The prefix ‘Balanced’ refers to the way the sub-sampling is performed: given a user u, for each condition c only a subsample fraction of the training samples is created for u conditioned by c.

    Parameters
  • iid2cids : dict (key int - value list of int)
    • Dictionary that maps each item to the list of all valid conditions for that item. Items are referred to by their inner id, and conditions by an integer in the range [0, n_cond - 1].

  • n_cond : int
    • Number of possible conditions.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • The training sparse user-item rating matrix.

  • sparse_data_te : scipy.sparse.csr_matrix [optional]
    • The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

  • batch_size : int [optional]
    • The size of the batches, by default 1.

  • shuffle : bool [optional]
    • Whether the data set must be randomly shuffled before creating the batches, by default True.

  • subsample : float [optional]
    • The sub-sampling fraction, i.e., the fraction of conditioned training samples that is created. It must be a float in the interval (0, 1], by default 0.2.

    References

    CVAE

    Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

    Attributes
  • iid2cids : dict (key int - value list of int)
    • See iid2cids parameter.

  • n_cond : int
    • See n_cond parameter.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • See sparse_data_tr parameter.

  • sparse_data_te : scipy.sparse.csr_matrix
    • See sparse_data_te parameter.

  • batch_size : int
    • See batch_size parameter.

  • shuffle : bool
    • See shuffle parameter.

  • subsample : float
    • See subsample parameter.
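
The description of the balanced sub-sampling leaves some room for interpretation. The sketch below shows one possible reading, in which the user-condition pairs are grouped by condition and a subsample fraction of each group is kept, so that every condition remains represented. The helper name subsample_pairs, the seed argument, and the rounding rule (ceiling) are assumptions for illustration only.

```python
import math
import random


def subsample_pairs(pairs, subsample=0.2, seed=None):
    """Keep a `subsample` fraction of the (user, condition) pairs per condition."""
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in the interval (0, 1]")

    rng = random.Random(seed)
    users_by_cond = {}
    for user, cond in pairs:
        users_by_cond.setdefault(cond, []).append(user)

    kept = []
    for cond, users in sorted(users_by_cond.items()):
        # Rounding up guarantees at least one pair per condition.
        n_keep = math.ceil(subsample * len(users))
        kept.extend((user, cond) for user in rng.sample(users, n_keep))
    return kept


all_pairs = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
kept_pairs = subsample_pairs(all_pairs, subsample=0.5, seed=0)
```

With subsample=1 every pair is kept, which corresponds to the full set of user-condition pairs.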

    class rectorch.samplers.EmptyConditionedDataSampler(cond_size, sparse_data_tr, sparse_data_te=None, batch_size=1, shuffle=True)[source]

    Bases: rectorch.samplers.Sampler

    Data sampler that returns unconditioned batches used by the rectorch.models.CMultiVAE model.

    This data sampler is useful when training the rectorch.models.CMultiVAE model described in [CVAE]. This sampler is very similar to DataSampler, with the exception that each yielded batch has a zero matrix of size batch_size \(\times\) n_cond appended to it.

    Parameters
  • n_cond : int
    • Number of possible conditions.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • The training sparse user-item rating matrix.

  • sparse_data_te : scipy.sparse.csr_matrix [optional]
    • The test sparse user-item rating matrix. The shape of this matrix must be the same as sparse_data_tr. By default None.

  • batch_size : int [optional]
    • The size of the batches, by default 1.

  • shuffle : bool [optional]
    • Whether the data set must be randomly shuffled before creating the batches, by default True.

    References

    CVAE

    Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational Autoencoder for top-N item recommendation, 2020. arXiv pre-print: https://arxiv.org/abs/2004.11141

    Attributes
  • n_cond : int
    • See n_cond parameter.

  • sparse_data_tr : scipy.sparse.csr_matrix
    • See sparse_data_tr parameter.

  • sparse_data_te : scipy.sparse.csr_matrix
    • See sparse_data_te parameter.

  • batch_size : int
    • See batch_size parameter.

  • shuffle : bool
    • See shuffle parameter.
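
The effect of the appended zero matrix can be sketched as follows, assuming the zeros are concatenated column-wise so that each row grows by n_cond entries. Plain lists stand in for tensors, and append_empty_condition is a hypothetical helper, not a rectorch function.

```python
def append_empty_condition(batch, n_cond):
    """Append n_cond zeros to every row of a batch (the 'no condition' encoding)."""
    return [list(row) + [0] * n_cond for row in batch]


batch = [[1, 0, 1], [0, 1, 0]]
unconditioned = append_empty_condition(batch, n_cond=2)
```

The result has shape batch_size x (n_items + n_cond), matching the width of a conditioned batch whose condition part is all zeros.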

    class rectorch.samplers.CFGAN_TrainingSampler(sparse_data_tr, batch_size=64)[source]

    Bases: rectorch.samplers.Sampler

    Sampler used for training the generator and discriminator of the CFGAN model.

    The peculiarity of this sampler (see [CFGAN] for more details) is that batches are continuously picked at random from the whole training set.

    Parameters
  • sparse_data_tr : scipy.sparse.csr_matrix
    • The training sparse user-item rating matrix.

  • batch_size : int [optional]
    • The size of the batches, by default 64.

    References

    CFGAN

    Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, and Jung-Tae Lee. 2018. CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 137–146. DOI: https://doi.org/10.1145/3269206.3271743

    Attributes
  • sparse_data_tr : scipy.sparse.csr_matrix
    • See sparse_data_tr parameter.

  • batch_size : int
    • See batch_size parameter.

  • idxlist : list of int
    • Shuffled list of indexes. After an iteration over the sampler, or after a call to the next() function, the idxlist contains, in the first batch_size positions, the indexes of the examples that are contained in the current batch.
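
The continuous random picking and the role of idxlist can be illustrated with the sketch below. It is a simplified stand-in, not the library code: RandomBatchPicker is a hypothetical class, it works on row indexes only, and the seed argument is an assumption added for reproducibility.

```python
import itertools
import random


class RandomBatchPicker:
    """Endless iterator that draws a fresh random batch of indexes at every step."""

    def __init__(self, n_examples, batch_size=64, seed=None):
        self.batch_size = batch_size
        self.idxlist = list(range(n_examples))
        self._rng = random.Random(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Reshuffle all indexes; the current batch is the first batch_size entries.
        self._rng.shuffle(self.idxlist)
        return self.idxlist[:self.batch_size]


picker = RandomBatchPicker(n_examples=10, batch_size=4, seed=0)
first = next(picker)
more = list(itertools.islice(picker, 3))
```

Because the iterator never raises StopIteration, the training loop decides how many batches to draw, for example with itertools.islice as above.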

    class rectorch.samplers.SVAE_Sampler(num_items, dict_data_tr, dict_data_te=None, pred_type='next_k', k=1, shuffle=True, is_training=True)[source]

    Bases: rectorch.samplers.Sampler

    Sampler used for training SVAE.

    This sampler yields pairs (x, y) where x is the tensor of indexes of the positive items, and y is the target tensor with the (multi-hot) ground truth items. This sampler is characterized by batches of size one (a single user at a time). Given a user (batch) u, the returned ground truth tensor is a 3D tensor of dimension \(1 \times (|\mathcal{I}_u| - 1) \times m\), where \(\mathcal{I}_u\) is the set of items rated by u, and m is the number of items. This tensor represents the ground truth for u over time, and each slice of the tensor corresponds to a different time step among all the possible time steps for this specific user.

    Parameters
  • num_items : int
    • Number of items.

  • dict_data_tr : dict (key - int, value - list of int)
    • Dictionary containing the training set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order.

  • dict_data_te : dict (key - int, value - list of int) or None [optional]
    • Dictionary containing the test part of the data set. Keys are the users, while the values are the lists of items rated by the users in a specific (often chronological) order, by default None. If None, it is not considered in the batch creation; otherwise it is used in the construction of the ground truth. Note that dict_data_te must be set only in the case of validation/test, i.e., when is_training is False.

  • pred_type : str in the set {'next_k', 'next', 'postfix'} [optional]
    • The variant of loss used by the model, by default 'next_k'. If 'next' then only the next item must be predicted, if 'next_k' the next k items are considered in the ground truth, otherwise (= 'postfix') all the remaining items are taken as ground truth.

  • k : int [optional]
    • The number of items to predict in the 'next_k' variant, by default 1. This parameter is not considered when pred_type is not 'next_k'.

  • shuffle : bool [optional]
    • Whether the data set must be randomly shuffled before creating the batches, by default True.

  • is_training : bool [optional]
    • Whether the sampler is used during training, by default True.

    Attributes
  • See *Parameters* section.
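
The three pred_type variants can be illustrated with a small sketch that builds the multi-hot ground truth for a single user in the training case. It uses plain lists instead of tensors, omits the leading batch dimension of size one, and ignores dict_data_te; build_targets is a hypothetical helper, not part of rectorch.

```python
def build_targets(items, num_items, pred_type="next_k", k=1):
    """Return (x, y) for one user's ordered item list.

    x holds the first len(items) - 1 item ids; y has one multi-hot row per time step.
    """
    targets = []
    for t in range(len(items) - 1):
        if pred_type == "next":
            future = items[t + 1:t + 2]      # only the next item
        elif pred_type == "next_k":
            future = items[t + 1:t + 1 + k]  # the next k items
        elif pred_type == "postfix":
            future = items[t + 1:]           # all the remaining items
        else:
            raise ValueError(f"unknown pred_type: {pred_type!r}")
        row = [0] * num_items
        for item in future:
            row[item] = 1
        targets.append(row)
    return items[:-1], targets


x, y = build_targets([3, 0, 2], num_items=4, pred_type="next_k", k=2)
```

For a user with three rated items and four items overall, y has two rows of length four, in line with the \((|\mathcal{I}_u| - 1) \times m\) shape given in the class description.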