
r"""This module includes the training algorithm for a bunch of state-of-the-art recommender systems.

Each new model must be a sub-class of the abstract class :class:`RecSysModel`. Moreover,
if the model is a standard neural network (NN) then it is advisable to inherit from
:class:`TorchNNTrainer` that offers a good base structure to develop a new NN training algorithm.
In these first releases of **rectorch** all models will be located in this module, but in the future
we plan to improve the structure of the library creating sub-modules.

Currently the implemented training algorithms are:

* :class:`MultiDAE`: Denoising Autoencoder for Collaborative Filtering with Multinomial prior (in the paper *Mult-DAE*) [VAE]_;
* :class:`MultiVAE`: Variational Autoencoder for Collaborative Filtering with Multinomial prior (in the paper *Mult-VAE*) [VAE]_;
* :class:`CMultiVAE`: Conditioned Variational Autoencoder (in the paper *C-VAE*) [CVAE]_;
* :class:`CFGAN`: Collaborative Filtering with Generative Adversarial Networks [CFGAN]_;
* :class:`EASE`: Embarrassingly Shallow Autoencoders for Sparse Data [EASE]_;
* :class:`ADMM_Slim`: ADMM SLIM: Sparse Recommendations for Many Users [ADMMS]_;
* :class:`SVAE`: Sequential Variational Autoencoders for Collaborative Filtering [SVAE]_.

A generic Variational Autoencoder trainer (:class:`VAE`) is also implemented. It is based on the
classic *cross-entropy* reconstruction loss plus the KL (Kullback-Leibler) regularization loss.

See Also
--------
Modules:
:mod:`nets <rectorch.nets>`
:mod:`samplers <rectorch.samplers>`

References
----------
.. [VAE] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018.
   Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018
   World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering
   Committee, Republic and Canton of Geneva, CHE, 689–698.
   DOI: https://doi.org/10.1145/3178876.3186150
.. [CVAE] Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational
   Autoencoder for top-N item recommendation, 2020. arXiv pre-print:
   https://arxiv.org/abs/2004.11141
.. [CFGAN] Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, and Jung-Tae Lee. 2018.
   CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks.
   In Proceedings of the 27th ACM International Conference on Information and Knowledge
   Management (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 137–146.
   DOI: https://doi.org/10.1145/3269206.3271743
.. [EASE] Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data.
   In The World Wide Web Conference (WWW ’19). Association for Computing Machinery,
   New York, NY, USA, 3251–3257. DOI: https://doi.org/10.1145/3308558.3313710
.. [ADMMS] Harald Steck, Maria Dimakopoulou, Nickolai Riabov, and Tony Jebara. 2020.
   ADMM SLIM: Sparse Recommendations for Many Users. In Proceedings of the 13th International
   Conference on Web Search and Data Mining (WSDM ’20). Association for Computing Machinery,
   New York, NY, USA, 555–563. DOI: https://doi.org/10.1145/3336191.3371774
.. [SVAE] Noveen Sachdeva, Giuseppe Manco, Ettore Ritacco, and Vikram Pudi. 2019.
   Sequential Variational Autoencoders for Collaborative Filtering. In Proceedings of the Twelfth
   ACM International Conference on Web Search and Data Mining (WSDM ’19). Association for Computing
   Machinery, New York, NY, USA, 600–608. DOI: https://doi.org/10.1145/3289600.3291007
"""
import logging
import os
import time
import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim
from .evaluation import ValidFunc, evaluate

__all__ = ['RecSysModel', 'TorchNNTrainer', 'AETrainer', 'VAE', 'MultiVAE', 'MultiDAE',\
    'CMultiVAE', 'EASE', 'CFGAN', 'ADMM_Slim', 'SVAE']

logger = logging.getLogger(__name__)


class RecSysModel():
    r"""Abstract base class that any Recommendation model must inherit from.
    """

    def train(self, train_data, **kwargs):
        r"""Training procedure.

        This method is meant to execute the whole training phase. Once the method ends, the
        model should be ready to be queried for predictions.

        Parameters
        ----------
        train_data : :class:`rectorch.samplers.Sampler` or :class:`scipy.sparse.csr_matrix` or\
            :class:`torch.Tensor`
            This object represents the training data. If the training procedure is based on
            mini-batches, then ``train_data`` should be a :class:`rectorch.samplers.Sampler`.
        **kwargs : :obj:`dict` [optional]
            These are the potential keyword parameters useful to the model for performing the
            training.

        Raises
        ------
        :class:`NotImplementedError`
            Raised when not implemented in the sub-class.
        """
        raise NotImplementedError()

    def predict(self, x, *args, **kwargs):
        r"""Perform the prediction using a trained model.

        The prediction is performed over a generic input ``x`` and the method should be callable
        after the training procedure (:meth:`RecSysModel.train`).

        Parameters
        ----------
        x : :class:`rectorch.samplers.Sampler` or :class:`scipy.sparse.csr_matrix` or\
            :class:`torch.Tensor`
            The input for which the prediction has to be computed.
        *args : :obj:`list` [optional]
            These are the potential additional parameters useful to the model for performing
            the prediction.
        **kwargs : :obj:`dict` [optional]
            These are the potential keyword parameters useful to the model for performing the
            prediction.

        Raises
        ------
        :class:`NotImplementedError`
            Raised when not implemented in the sub-class.
        """
        raise NotImplementedError()

    def save_model(self, filepath, *args, **kwargs):
        r"""Save the model to file.

        Parameters
        ----------
        filepath : :obj:`str`
            String representing the path to the file to save the model.
        *args : :obj:`list` [optional]
            These are the potential additional parameters useful to the model for saving the
            model.
        **kwargs : :obj:`dict` [optional]
            These are the potential keyword parameters useful to the model for saving the model.

        Raises
        ------
        :class:`NotImplementedError`
            Raised when not implemented in the sub-class.
        """
        raise NotImplementedError()

    def load_model(self, filepath, *args, **kwargs):
        r"""Load the model from file.

        Parameters
        ----------
        filepath : :obj:`str`
            String representing the path to the file where the model is saved.
        *args : :obj:`list` [optional]
            These are the potential additional parameters useful to the model for loading the
            model.
        **kwargs : :obj:`dict` [optional]
            These are the potential keyword parameters useful to the model for loading the
            model.

        Raises
        ------
        :class:`NotImplementedError`
            Raised when not implemented in the sub-class.
        """
        raise NotImplementedError()
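# Hedged usage sketch (not part of rectorch): a toy, non-personalized popularity
# recommender that implements the RecSysModel interface above. It only illustrates
# which methods a new model must provide; the class name and scoring logic are
# hypothetical placeholders.
class _PopularityExample(RecSysModel):
    """Toy model that scores every item by its overall popularity in the training set."""

    def __init__(self):
        self.item_scores = None

    def train(self, train_data, **kwargs):
        # ``train_data`` is assumed to be a scipy.sparse.csr_matrix (users x items).
        self.item_scores = np.asarray(train_data.sum(axis=0)).ravel()

    def predict(self, x, *args, **kwargs):
        # The same popularity scores are returned for every user in the batch ``x``.
        return (np.tile(self.item_scores, (x.shape[0], 1)), )

    def save_model(self, filepath, *args, **kwargs):
        np.save(filepath, self.item_scores)

    def load_model(self, filepath, *args, **kwargs):
        self.item_scores = np.load(filepath)
        return self.item_scores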
class TorchNNTrainer(RecSysModel):
    r"""Abstract class representing a neural network-based model.

    This base class assumes that the model can be trained using a standard backpropagation
    procedure. It is not meant to manage complex training patterns, such as the alternate
    training of more than one network as done with Generative Adversarial Networks. Thus, it
    assumes that there is a neural network (i.e., :class:`torch.nn.Module`) for which the
    parameters must be learned.

    Parameters
    ----------
    net : :class:`torch.nn.Module`
        The neural network architecture.
    learning_rate : :obj:`float` [optional]
        The learning rate for the optimizer, by default 1e-3.

    Attributes
    ----------
    network : :class:`torch.nn.Module`
        The neural network architecture.
    learning_rate : :obj:`float`
        The learning rate for the optimizer.
    optimizer : :class:`torch.optim.Optimizer`
        Optimizer used for performing the training.
    device : :class:`torch.device`
        Device where the pytorch tensors are saved.
    """
    def __init__(self, net, learning_rate=1e-3):
        self.network = net
        self.learning_rate = learning_rate
        self.optimizer = None #to be initialized in the sub-classes
        if next(self.network.parameters()).is_cuda:
            self.device = torch.device("cuda")
        else:
            self.device = torch.device("cpu")
[docs] def loss_function(self, prediction, ground_truth, *args, **kwargs): r"""The loss function that the model wants to minimize. Parameters ---------- prediction : :class:`torch.Tensor` The prediction tensor. ground_truth : :class:`torch.Tensor` The ground truth tensor that the model should have reconstructed correctly. *args : :obj:`list` [optional] These are the potential additional parameters useful to the model for computing the loss. **kwargs : :obj:`dict` [optional] These are the potential keyword parameters useful to the model for computing the loss. Raises ------ :class:`NotImplementedError` Raised when not implemeneted in the sub-class. """ raise NotImplementedError()
[docs] def train(self, train_data, valid_data=None, valid_metric=None, valid_func=ValidFunc(evaluate), num_epochs=100, verbose=1, **kwargs): r"""Training of a neural network-based model. Parameters ---------- train_data : :class:`rectorch.samplers.Sampler` The sampler object that load the training set in mini-batches. valid_data : :class:`rectorch.samplers.Sampler` [optional] The sampler object that load the validation set in mini-batches, by default ``None``. If the model does not have any validation procedure set this parameter to ``None``. valid_metric : :obj:`str` [optional] The metric used during the validation to select the best model, by default ``None``. If ``valid_data`` is not ``None`` then ``valid_metric`` must be not ``None``. To see the valid strings for the metric please see the module :mod:`metrics`. valid_func : :class:`evaluation.ValidFunc` [optional] The validation function, by default a standard validation procedure, i.e., :func:`evaluation.evaluate`. num_epochs : :obj:`int` [optional] Number of training epochs, by default 100. verbose : :obj:`int` [optional] The level of verbosity of the logging, by default 1. The level can have any integer value greater than 0. However, after reaching a maximum (that depends on the size of the training set) verbosity higher values will not have any effect. Raises ------ :class:`NotImplementedError` Raised when not implemeneted in the sub-class. """ raise NotImplementedError()
[docs] def train_epoch(self, epoch, train_data, *args, **kwargs): r"""Training of a single epoch. Parameters ---------- epoch : :obj:`int` Epoch's number. train_data : :class:`rectorch.samplers.Sampler` The sampler object that load the training set in mini-batches. *args : :obj:`list` [optional] These are the potential additional parameters useful to the model for performing the training. **kwargs : :obj:`dict` [optional] These are the potential keyword parameters useful to the model for performing the training. Raises ------ :class:`NotImplementedError` Raised when not implemeneted in the sub-class. """ raise NotImplementedError()
[docs] def train_batch(self, epoch, tr_batch, te_batch, *args, **kwargs): r"""Training of a single batch. Parameters ---------- epoch : :obj:`int` Epoch's number. tr_batch : :class:`torch.Tensor` Traning part of the current batch. te_batch : :class:`torch.Tensor` or ``None`` Test part of the current batch, if any, otherwise ``None``. *args : :obj:`list` [optional] These are the potential additional parameters useful to the model for performing the training on the batch. **kwargs : :obj:`dict` [optional] These are the potential keyword parameters useful to the model for performing the training on the batch. Raises ------ :class:`NotImplementedError` Raised when not implemeneted in the sub-class. """ raise NotImplementedError()
    def predict(self, x, *args, **kwargs):
        raise NotImplementedError()
def __str__(self): s = self.__class__.__name__ + "(\n" for k, v in self.__dict__.items(): sv = "\n".join([" "+line for line in str(str(v)).split("\n")])[2:] s += " %s = %s,\n" % (k, sv) s = s[:-2] + "\n)" return s def __repr__(self): return str(self)
class AETrainer(TorchNNTrainer):
    r"""Base class for Autoencoder-based models.

    Parameters
    ----------
    ae_net : :class:`torch.nn.Module`
        The autoencoder neural network.
    learning_rate : :obj:`float` [optional]
        The learning rate for the optimizer, by default 1e-3.

    Attributes
    ----------
    optimizer : :class:`torch.optim.Adam`
        The optimizer is an Adam optimizer with default parameters and learning rate equal to
        ``learning_rate``.
    other attributes : see the base class :class:`TorchNNTrainer`.
    """
    def __init__(self, ae_net, learning_rate=1e-3):
        super(AETrainer, self).__init__(ae_net, learning_rate)
        self.optimizer = optim.Adam(self.network.parameters(), lr=learning_rate)

    def loss_function(self, prediction, ground_truth):
        r"""Vanilla Autoencoder loss function.

        This is the standard Mean Squared Error (squared L2 norm) loss, that is

        :math:`\mathcal{L} = \frac{1}{L} \sum_{i=1}^L (x_i - y_i)^2`

        where L is the batch size, and x and y are the ground truth and the prediction,
        respectively.

        Parameters
        ----------
        prediction : :class:`torch.Tensor`
            The reconstructed input, i.e., the output of the autoencoder.
            It is meant to be the reconstruction over a batch.
        ground_truth : :class:`torch.Tensor`
            The input, and hence the target tensor. It is meant to be a batch size input.

        Returns
        -------
        :class:`torch.Tensor`
            Tensor (:math:`1 \times 1`) representing the average loss incurred over the input
            batch.
        """
        return torch.nn.MSELoss()(ground_truth, prediction)
[docs] def train(self, train_data, valid_data=None, valid_metric=None, valid_func=ValidFunc(evaluate), num_epochs=100, verbose=1): try: for epoch in range(1, num_epochs + 1): self.train_epoch(epoch, train_data, verbose) if valid_data is not None: assert valid_metric is not None, \ "In case of validation 'valid_metric' must be provided" valid_res = valid_func(self, valid_data, valid_metric) mu_val = np.mean(valid_res) std_err_val = np.std(valid_res) / np.sqrt(len(valid_res)) logger.info('| epoch %d | %s %.3f (%.4f) |', epoch, valid_metric, mu_val, std_err_val) except KeyboardInterrupt: logger.warning('Handled KeyboardInterrupt: exiting from training early')
[docs] def train_epoch(self, epoch, train_loader, verbose=1): self.network.train() train_loss = 0 partial_loss = 0 epoch_start_time = time.time() start_time = time.time() log_delay = max(10, len(train_loader) // 10**verbose) for batch_idx, (data, gt) in enumerate(train_loader): partial_loss += self.train_batch(data, gt) if (batch_idx+1) % log_delay == 0: elapsed = time.time() - start_time logger.info('| epoch %d | %d/%d batches | ms/batch %.2f | loss %.2f |', epoch, (batch_idx+1), len(train_loader), elapsed * 1000 / log_delay, partial_loss / log_delay) train_loss += partial_loss partial_loss = 0.0 start_time = time.time() total_loss = (train_loss + partial_loss) / len(train_loader) time_diff = time.time() - epoch_start_time logger.info("| epoch %d | loss %.4f | total time: %.2fs |", epoch, total_loss, time_diff)
    def train_batch(self, tr_batch, te_batch=None):
        r"""Training of a single batch.

        Parameters
        ----------
        tr_batch : :class:`torch.Tensor`
            Training part of the current batch.
        te_batch : :class:`torch.Tensor` or ``None`` [optional]
            Test part of the current batch, if any, otherwise ``None``, by default ``None``.

        Returns
        -------
        :obj:`float`
            The loss incurred in the batch.
        """
        data_tensor = tr_batch.view(tr_batch.shape[0], -1).to(self.device)
        self.optimizer.zero_grad()
        recon_batch = self.network(data_tensor)
        loss = self.loss_function(recon_batch, data_tensor)
        loss.backward()
        self.optimizer.step()
        return loss.item()
[docs] def predict(self, x, remove_train=True): r"""Perform the prediction using a trained Autoencoder. Parameters ---------- x : :class:`torch.Tensor` The input for which the prediction has to be computed. remove_train : :obj:`bool` [optional] Whether to remove the training set from the prediction, by default True. Removing the training items means set their scores to :math:`-\infty`. Returns ------- recon_x, : :obj:`tuple` with a single element recon_x : :class:`torch.Tensor` The reconstructed input, i.e., the output of the autoencoder. It is meant to be the reconstruction over the input batch ``x``. """ self.network.eval() with torch.no_grad(): x_tensor = x.to(self.device) recon_x = self.network(x_tensor) if remove_train: recon_x[tuple(x_tensor.nonzero().t())] = -np.inf return (recon_x, )
[docs] def save_model(self, filepath, cur_epoch): r"""Save the model to file. Parameters ---------- filepath : :obj:`str` String representing the path to the file to save the model. cur_epoch : :obj:`int` The last training epoch. """ state = {'epoch': cur_epoch, 'state_dict': self.network.state_dict(), 'optimizer': self.optimizer.state_dict() } self._save_checkpoint(filepath, state)
def _save_checkpoint(self, filepath, state): logger.info("Saving model checkpoint to %s...", filepath) torch.save(state, filepath) logger.info("Model checkpoint saved!")
[docs] def load_model(self, filepath): r"""Load the model from file. Parameters ---------- filepath : :obj:`str` String representing the path to the file where the model is saved. Returns ------- :obj:`dict` A dictionary that summarizes the state of the model when it has been saved. Note: not all the information about the model are stored in the saved 'checkpoint'. """ assert os.path.isfile(filepath), "The checkpoint file %s does not exist." %filepath logger.info("Loading model checkpoint from %s...", filepath) checkpoint = torch.load(filepath) self.network.load_state_dict(checkpoint['state_dict']) self.optimizer.load_state_dict(checkpoint['optimizer']) logger.info("Model checkpoint loaded!") return checkpoint
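# Hedged usage sketch (not part of rectorch): typical wiring of an AETrainer.
# ``ae_net`` is assumed to be a torch.nn.Module autoencoder (e.g., from rectorch.nets),
# the samplers are assumed to be rectorch.samplers.Sampler objects, and the metric
# string is assumed to follow the rectorch.metrics naming (e.g., "ndcg@100").
def _example_aetrainer_usage(ae_net, train_sampler, valid_sampler):
    trainer = AETrainer(ae_net, learning_rate=1e-3)
    trainer.train(train_sampler, valid_data=valid_sampler, valid_metric="ndcg@100",
                  num_epochs=50, verbose=1)
    trainer.save_model("ae_chkpt.pth", cur_epoch=50)   # checkpoint weights + optimizer
    trainer.load_model("ae_chkpt.pth")                 # restore the saved checkpoint
    return trainer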
[docs]class VAE(AETrainer): r"""Class representing a standard Variational Autoencoder. The learning follows standard backpropagation minimizing the loss described in [KINGMA]_. See :meth:`VAE.loss_function` for more details. Notes ----- Parameters and Attributes are the same as in the base class :class:`AETrainer`. References ---------- .. [KINGMA] Kingma, Diederik P and Welling, Max Auto-Encoding Variational Bayes, 2013. arXiv pre-print: https://arxiv.org/abs/1312.6114. """
[docs] def loss_function(self, recon_x, x, mu, logvar): r"""Standard VAE loss function. This method implements the loss function described in [KINGMA]_ assuming a Gaussian latent, that is: :math:`\mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_{KL}` where :math:`\mathcal{L}_{rec} = -\frac{1}{L}\ \sum_{l} E_{\sim q_{\theta}(z | x_{i})}[\log p(x_{i} | z^{(i, l)})]` and :math:`\mathcal{L}_{KL} = -\frac{1}{2} \sum_{j=1}^{J}\ [1+\log (\sigma_{i}^{2})-\sigma_{i}^{2}-\mu_{i}^{2}]` with J is the dimension of the latent vector z, and L is the number of samples stochastically drawn according to reparameterization trick. Parameters ---------- recon_x : :class:`torch.Tensor` The reconstructed input, i.e., the output of the variational autoencoder. It is meant to be the reconstruction over a batch. x : :class:`torch.Tensor` The input, and hence the target tensor. It is meant to be a batch size input. mu : :class:`torch.Tensor` The mean part of latent space for the given ``x``. Together with ``logvar`` represents the representation of the input ``x`` before the reparameteriation trick. logvar : :class:`torch.Tensor` The (logarithm of the) variance part of latent space for the given ``x``. Together with ``mu`` represents the representation of the input ``x`` before the reparameteriation trick. Returns ------- :class:`torch.Tensor` Tensor (:math:`1 \times 1`) representing the average loss incurred over the input batch. References ---------- .. [KINGMA] Kingma, Diederik P and Welling, Max Auto-Encoding Variational Bayes, 2013. arXiv pre-print: https://arxiv.org/abs/1312.6114. """ BCE = F.binary_cross_entropy(recon_x, x) KLD = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)) return BCE + KLD
[docs] def train_batch(self, tr_batch, te_batch=None): data_tensor = tr_batch.view(tr_batch.shape[0], -1).to(self.device) self.optimizer.zero_grad() recon_batch, mu, var = self.network(data_tensor) loss = self.loss_function(recon_batch, data_tensor, mu, var) loss.backward() self.optimizer.step() return loss.item()
[docs] def predict(self, x, remove_train=True): r"""Perform the prediction using a trained Variational Autoencoder. Parameters ---------- x : :class:`torch.Tensor` The input batch tensor for which the prediction must be computed. remove_train : :obj:`bool` [optional] Whether to remove the training set from the prediction, by default True. Removing the training items means set their scores to :math:`-\infty`. Returns ------- recon_x, mu, logvar : :obj:`tuple` recon_x : :class:`torch.Tensor` The reconstructed input, i.e., the output of the variational autoencoder. It is meant to be the reconstruction over the input batch ``x``. mu : :class:`torch.Tensor` The mean part of latent space for the given ``x``. Together with ``logvar`` represents the representation of the input ``x`` before the reparameteriation trick. logvar : :class:`torch.Tensor` The (logarithm of the) variance part of latent space for the given ``x``. Together with ``mu`` represents the representation of the input ``x`` before the reparameteriation trick. """ self.network.eval() with torch.no_grad(): x_tensor = x.to(self.device) recon_x, mu, logvar = self.network(x_tensor) if remove_train: recon_x[tuple(x_tensor.nonzero().t())] = -np.inf return recon_x, mu, logvar
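# Hedged numerical sketch of what VAE.loss_function computes on a toy batch. The tensors
# below are random placeholders; only the loss arithmetic (BCE + KL) is exercised.
def _example_vae_loss():
    recon_x = torch.sigmoid(torch.randn(4, 10))        # reconstructions in (0, 1)
    x = torch.randint(0, 2, (4, 10)).float()           # binary ground-truth profiles
    mu, logvar = torch.zeros(4, 2), torch.zeros(4, 2)  # latent stats of a 2-d code
    bce = F.binary_cross_entropy(recon_x, x)
    kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    # With mu = 0 and logvar = 0 the KL term is exactly 0, so the loss reduces to the BCE.
    return bce + kld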
class MultiDAE(AETrainer):
    r"""Denoising Autoencoder with multinomial likelihood for collaborative filtering.

    This model has been proposed in [VAE]_ as a baseline method to compare with Mult-VAE.
    The model represents a standard denoising autoencoder in which the data is assumed to be
    multinomially distributed.

    Parameters
    ----------
    mdae_net : :class:`torch.nn.Module`
        The autoencoder neural network.
    lam : :obj:`float` [optional]
        The regularization hyper-parameter :math:`\lambda` as defined in [VAE]_, by default 0.2.
    learning_rate : :obj:`float` [optional]
        The learning rate for the optimizer, by default 1e-3.

    References
    ----------
    .. [VAE] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018.
       Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018
       World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering
       Committee, Republic and Canton of Geneva, CHE, 689–698.
       DOI: https://doi.org/10.1145/3178876.3186150
    """
    def __init__(self, mdae_net, lam=0.2, learning_rate=1e-3):
        super(MultiDAE, self).__init__(mdae_net, learning_rate)
        self.optimizer = optim.Adam(self.network.parameters(),
                                    lr=self.learning_rate,
                                    weight_decay=0.001)
        self.lam = lam

    def loss_function(self, recon_x, x):
        r"""Multinomial likelihood denoising autoencoder loss.

        Since the model assumes a multinomial distribution over the input, the reconstruction
        loss must be modified with respect to a vanilla VAE. In particular, the MultiDAE loss
        function is a combination of a reconstruction loss and a regularization loss, i.e.,

        :math:`\mathcal{L}(\mathbf{x}_{u} ; \theta, \phi) =\
        \mathcal{L}_{rec}(\mathbf{x}_{u} ; \theta, \phi) + \lambda\
        \mathcal{L}_{reg}(\mathbf{x}_{u} ; \theta, \phi)`

        where

        :math:`\mathcal{L}_{rec}(\mathbf{x}_{u} ; \theta, \phi) =\
        \mathbb{E}_{q_{\phi}(\mathbf{z}_{u} | \mathbf{x}_{u})}[\log p_{\theta}\
        (\mathbf{x}_{u} | \mathbf{z}_{u})]`

        and

        :math:`\mathcal{L}_{reg}(\mathbf{x}_{u} ; \theta, \phi) = \| \theta \|_2 + \| \phi \|_2`,

        with :math:`\mathbf{x}_u` the input vector and :math:`\mathbf{z}_u` the latent vector
        representing the user *u*.

        Parameters
        ----------
        recon_x : :class:`torch.Tensor`
            The reconstructed input, i.e., the output of the autoencoder.
            It is meant to be the reconstruction over a batch.
        x : :class:`torch.Tensor`
            The input, and hence the target tensor. It is meant to be a batch size input.

        Returns
        -------
        :class:`torch.Tensor`
            Tensor (:math:`1 \times 1`) representing the average loss incurred over the input
            batch.
        """
        BCE = -torch.mean(torch.sum(F.log_softmax(recon_x, 1) * x, -1))
        l2_reg = 0
        for W in self.network.parameters():
            l2_reg += W.norm(2)
        return BCE + self.lam * l2_reg
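# Hedged numerical sketch of the multinomial reconstruction term used by
# MultiDAE.loss_function (and reused by MultiVAE below): the log-softmax of the decoder
# logits is weighted by the (binary) input profile and averaged over the batch.
# Shapes and values are illustrative only.
def _example_multinomial_loss():
    logits = torch.randn(4, 10)               # unnormalized decoder outputs
    x = torch.randint(0, 2, (4, 10)).float()  # binary user profiles
    neg_ll = -torch.mean(torch.sum(F.log_softmax(logits, 1) * x, -1))
    return neg_ll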
class MultiVAE(VAE):
    r"""Variational Autoencoder for Collaborative Filtering.

    MultiVAE (dubbed Mult-VAE in [VAE]_) is a vanilla VAE in which the data distribution is
    assumed to be multinomial and the objective function is an under-regularized version of
    the standard VAE loss function. Specifically, the Kullback-Leibler divergence term is
    weighted by a hyper-parameter (:math:`\beta`) that has been shown to improve the
    recommendation quality when < 1. So, the regularization term is weighted less, giving the
    model more freedom in representing the input in the latent space. More details about this
    loss are given in :meth:`MultiVAE.loss_function`.

    Parameters
    ----------
    mvae_net : :class:`torch.nn.Module`
        The variational autoencoder neural network.
    beta : :obj:`float` [optional]
        The :math:`\beta` hyper-parameter of Multi-VAE. When ``anneal_steps > 0`` then this
        value is the value to anneal starting from 0, otherwise ``beta`` will be fixed to the
        given value for the duration of the training. By default 1.
    anneal_steps : :obj:`int` [optional]
        Number of annealing steps for reaching the target value ``beta``, by default 0.
        0 means that no annealing will be performed and the regularization parameter will be
        fixed to ``beta``.
    learning_rate : :obj:`float` [optional]
        The learning rate for the optimizer, by default 1e-3.

    Attributes
    ----------
    anneal_steps : :obj:`int`
        See ``anneal_steps`` parameter.
    annealing : :obj:`bool`
        Whether the annealing is active or not. It is implicitly set to ``True`` if
        ``anneal_steps > 0``, otherwise it is set to ``False``.
    gradient_updates : :obj:`int`
        Number of gradient updates since the beginning of the training. Once
        ``gradient_updates >= anneal_steps``, the annealing is complete and the :math:`\beta`
        used in the loss function is ``beta``.
    beta : :obj:`float`
        See ``beta`` parameter.
    optimizer : :class:`torch.optim.Adam`
        The optimizer is an Adam optimizer with default parameters and learning rate equal to
        ``learning_rate``.
    other attributes : see the base class :class:`VAE`.

    References
    ----------
    .. [VAE] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018.
       Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018
       World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering
       Committee, Republic and Canton of Geneva, CHE, 689–698.
       DOI: https://doi.org/10.1145/3178876.3186150
    """
    def __init__(self, mvae_net, beta=1., anneal_steps=0, learning_rate=1e-3):
        super(MultiVAE, self).__init__(mvae_net, learning_rate=learning_rate)
        self.optimizer = optim.Adam(self.network.parameters(),
                                    lr=learning_rate,
                                    weight_decay=0.0)
        self.anneal_steps = anneal_steps
        self.annealing = anneal_steps > 0
        self.gradient_updates = 0.
        self.beta = beta
[docs] def loss_function(self, recon_x, x, mu, logvar, beta=1.0): r"""VAE for collaborative filtering loss function. MultiVAE assumes a multinomial distribution over the input and this is reflected in the loss function. The loss is a :math:`\beta` ELBO (Evidence Lower BOund) in which the regularization part is weighted by a hyper-parameter :math:`\beta`. Moreover, as in MultiDAE, the reconstruction loss is based on the multinomial likelihood. Specifically, the loss function of MultiVAE is defined as: :math:`\mathcal{L}_{\beta}(\mathbf{x}_{u} ; \theta, \phi)=\ \mathbb{E}_{q_{\phi}(\mathbf{z}_{u} | \mathbf{x}_{u})}[\log p_{\theta}\ (\mathbf{x}_{u} | \mathbf{z}_{u})]-\beta \cdot \operatorname{KL}(q_{\phi}\ (\mathbf{z}_{u} | \mathbf{x}_{u}) \| p(\mathbf{z}_{u}))` Parameters ---------- recon_x : :class:`torch.Tensor` The reconstructed input, i.e., the output of the variational autoencoder. It is meant to be the reconstruction over a batch. x : :class:`torch.Tensor` The input, and hence the target tensor. It is meant to be a batch size input. mu : :class:`torch.Tensor` The mean part of latent space for the given ``x``. Together with ``logvar`` represents the representation of the input ``x`` before the reparameteriation trick. logvar : :class:`torch.Tensor` The (logarithm of the) variance part of latent space for the given ``x``. Together with ``mu`` represents the representation of the input ``x`` before the reparameteriation trick. beta : :obj:`float` [optional] The current :math:`\beta` regularization hyper-parameter, by default 1.0. Returns ------- :class:`torch.Tensor` Tensor (:math:`1 \times 1`) representing the average loss incurred over the input batch. """ BCE = -torch.mean(torch.sum(F.log_softmax(recon_x, 1) * x, -1)) KLD = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)) return BCE + beta * KLD
[docs] def train_batch(self, tr_batch, te_batch=None): data_tensor = tr_batch.view(tr_batch.shape[0], -1).to(self.device) if te_batch is None: gt_tensor = data_tensor else: gt_tensor = te_batch.view(te_batch.shape[0], -1).to(self.device) if self.annealing: anneal_beta = min(self.beta, 1. * self.gradient_updates / self.anneal_steps) else: anneal_beta = self.beta self.optimizer.zero_grad() recon_batch, mu, var = self.network(data_tensor) loss = self.loss_function(recon_batch, gt_tensor, mu, var, anneal_beta) loss.backward() self.optimizer.step() self.gradient_updates += 1. return loss.item()
    def train(self,
              train_data,
              valid_data=None,
              valid_metric=None,
              valid_func=ValidFunc(evaluate),
              num_epochs=200,
              best_path="chkpt_best.pth",
              verbose=1):
        r"""Training procedure for Multi-VAE.

        The training of MultiVAE is pretty much the same as for a standard VAE, the only
        difference being the (possible) annealing of the hyper-parameter :math:`\beta`. This
        model also offers the possibility of keeping track of the best performing model in
        validation by setting the validation data (``valid_data``) and metric
        (``valid_metric``). The best model will be saved in the file indicated in the parameter
        ``best_path``.

        Parameters
        ----------
        train_data : :class:`rectorch.samplers.Sampler`
            The sampler object that loads the training set in mini-batches.
        valid_data : :class:`rectorch.samplers.Sampler` [optional]
            The sampler object that loads the validation set in mini-batches, by default
            ``None``. If the model does not have any validation procedure set this parameter
            to ``None``.
        valid_metric : :obj:`str` [optional]
            The metric used during the validation to select the best model, by default ``None``.
            If ``valid_data`` is not ``None`` then ``valid_metric`` must be not ``None``.
            To see the valid strings for the metric please see the module :mod:`metrics`.
        valid_func : :class:`evaluation.ValidFunc` [optional]
            The validation function, by default a standard validation procedure, i.e.,
            :func:`evaluation.evaluate`.
        num_epochs : :obj:`int` [optional]
            Number of training epochs, by default 200.
        best_path : :obj:`str` [optional]
            String representing the path where to save the best performing model on the
            validation set. By default ``"chkpt_best.pth"``.
        verbose : :obj:`int` [optional]
            The level of verbosity of the logging, by default 1. The level can have any integer
            value greater than 0. However, after reaching a maximum (that depends on the size
            of the training set) higher verbosity values will not have any effect.
        """
        try:
            best_perf = -1. #Assume the higher the better >= 0
            for epoch in range(1, num_epochs + 1):
                self.train_epoch(epoch, train_data, verbose)
                if valid_data:
                    assert valid_metric is not None, \
                        "In case of validation 'valid_metric' must be provided"
                    valid_res = valid_func(self, valid_data, valid_metric)
                    mu_val = np.mean(valid_res)
                    std_err_val = np.std(valid_res) / np.sqrt(len(valid_res))
                    logger.info('| epoch %d | %s %.3f (%.4f) |',
                                epoch, valid_metric, mu_val, std_err_val)
                    if best_perf < mu_val:
                        self.save_model(best_path, epoch)
                        best_perf = mu_val
        except KeyboardInterrupt:
            logger.warning('Handled KeyboardInterrupt: exiting from training early')
[docs] def save_model(self, filepath, cur_epoch): state = {'epoch': cur_epoch, 'state_dict': self.network.state_dict(), 'optimizer': self.optimizer.state_dict(), 'gradient_updates': self.gradient_updates } self._save_checkpoint(filepath, state)
[docs] def load_model(self, filepath): checkpoint = super().load_model(filepath) self.gradient_updates = checkpoint['gradient_updates'] return checkpoint
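# Hedged usage sketch (not part of rectorch): typical Mult-VAE training with beta-annealing.
# ``mvae_net`` is assumed to be a suitable VAE network (e.g., from rectorch.nets), the
# samplers are assumed to be rectorch.samplers objects, and the metric string is assumed to
# follow the rectorch.metrics naming.
def _example_multivae_training(mvae_net, train_sampler, valid_sampler):
    # beta is annealed linearly from 0 to 0.2 over the first 200000 gradient updates.
    model = MultiVAE(mvae_net, beta=0.2, anneal_steps=200000, learning_rate=1e-3)
    model.train(train_sampler,
                valid_data=valid_sampler,
                valid_metric="ndcg@100",
                num_epochs=200,
                best_path="chkpt_best.pth")
    model.load_model("chkpt_best.pth")  # reload the best model found in validation
    return model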
class CMultiVAE(MultiVAE):
    r"""Conditioned Variational Autoencoder for collaborative filtering.

    The Conditioned Variational Autoencoder (C-VAE) for constrained top-N item recommendation
    can recommend items that have to satisfy a given condition. The architecture is similar to
    a standard VAE in which the condition vector is fed into the encoder.
    The loss function can be seen in two ways:

    - the same as in :class:`MultiVAE` but with a different target reconstruction. In fact, the\
      network has to reconstruct only those items satisfying a specific condition;
    - a modified loss which performs the filtering by itself.

    More details about the loss function are given in the paper [CVAE]_.

    The training process is almost identical to the one of :class:`MultiVAE` but the sampler
    must be a :class:`samplers.ConditionedDataSampler`.

    Notes
    -----
    For parameters and attributes please refer to :class:`MultiVAE`.

    References
    ----------
    .. [CVAE] Tommaso Carraro, Mirko Polato and Fabio Aiolli. Conditioned Variational
       Autoencoder for top-N item recommendation, 2020. arXiv pre-print:
       https://arxiv.org/abs/2004.11141
    """
    def __init__(self, cmvae_net, beta=1., anneal_steps=0, learning_rate=1e-3):
        super(CMultiVAE, self).__init__(cmvae_net,
                                        beta=beta,
                                        anneal_steps=anneal_steps,
                                        learning_rate=learning_rate)
[docs] def predict(self, x, remove_train=True): self.network.eval() cond_dim = self.network.cond_dim with torch.no_grad(): x_tensor = x.to(self.device) recon_x, mu, logvar = self.network(x_tensor) if remove_train: recon_x[tuple(x_tensor[:, :-cond_dim].nonzero().t())] = -np.inf return recon_x, mu, logvar
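# Hedged sketch of the input layout CMultiVAE.predict works with: the last ``cond_dim``
# columns of ``x`` hold the condition vector (here assumed one-hot), while the remaining
# columns hold the user profile; this mirrors the slicing x[:, :-cond_dim] used above and
# the layout produced by samplers.ConditionedDataSampler. ``cmvae`` is a trained CMultiVAE;
# sizes and the one-hot encoding are illustrative assumptions.
def _example_cmultivae_predict(cmvae, user_profiles, n_conditions, condition_id):
    condition = torch.zeros(user_profiles.shape[0], n_conditions)
    condition[:, condition_id] = 1.
    x = torch.cat([user_profiles, condition], dim=1)
    recon_x, _, _ = cmvae.predict(x)  # training items (profile columns) are set to -inf
    return recon_x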
class EASE(RecSysModel):
    r"""Embarrassingly Shallow AutoEncoders for Sparse Data (EASE) model.

    This model has been proposed in [EASE]_ and it can be summarized as follows.
    Given the rating matrix :math:`\mathbf{X} \in \mathbb{R}^{n \times m}` with *n* users and
    *m* items, EASE aims at solving the following optimization problem:

    :math:`\min_{\mathbf{B}} \|\mathbf{X}-\mathbf{X} \mathbf{B}\|_{F}^{2}+\
    \lambda \cdot\|\mathbf{B}\|_{F}^{2}`

    subject to :math:`\operatorname{diag}(\mathbf{B})=0`,

    where :math:`\mathbf{B} \in \mathbb{R}^{m \times m}` acts like a kernel matrix between
    items. Then, a prediction for a user-item pair *(u,j)* will be computed by
    :math:`S_{u j}=\mathbf{X}_{u,:} \cdot \mathbf{B}_{:, j}`.

    It can be shown that :math:`\mathbf{B}` can be estimated in closed form by computing

    :math:`\hat{\mathbf{B}}=(\mathbf{X}^{\top} \mathbf{X}+\lambda \mathbf{I})^{-1} \cdot\
    (\mathbf{X}^{\top} \mathbf{X}-\mathbf{I}^\top \gamma)`

    where :math:`\gamma \in \mathbb{R}^m` is the vector of Lagrangian multipliers, and
    :math:`\mathbf{I}` is the identity matrix.

    Parameters
    ----------
    lam : :obj:`float` [optional]
        The regularization hyper-parameter, by default 100.

    Attributes
    ----------
    lam : :obj:`float`
        See ``lam`` parameter.
    model : :class:`numpy.ndarray`
        Represents the model, i.e., the score matrix **S**. If the model has not been trained
        yet, ``model`` is set to ``None``.

    References
    ----------
    .. [EASE] Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data.
       In The World Wide Web Conference (WWW ’19). Association for Computing Machinery,
       New York, NY, USA, 3251–3257. DOI: https://doi.org/10.1145/3308558.3313710
    """
    def __init__(self, lam=100.):
        self.lam = lam
        self.model = None

    def train(self, train_data):
        """Training of the EASE model.

        Parameters
        ----------
        train_data : :class:`scipy.sparse.csr_matrix`
            The training data.
        """
        logger.info("EASE - start training (lam=%.4f)", self.lam)
        X = train_data.toarray()
        G = np.dot(X.T, X)
        logger.info("EASE - linear kernel computed")
        diag_idx = np.diag_indices(G.shape[0])
        G[diag_idx] += self.lam
        P = np.linalg.inv(G)
        del G
        B = P / (-np.diag(P))
        B[diag_idx] = 0
        del P
        self.model = np.dot(X, B)
        logger.info("EASE - training complete")

    def predict(self, ids_te_users, test_tr, remove_train=True):
        r"""Prediction using the EASE model.

        For the EASE model the prediction list for a user *u* is computed as
        :math:`S_{u}=\mathbf{X}_{u,:} \cdot \mathbf{B}`. However, in the **rectorch**
        implementation the prediction is simply a look-up in the pre-computed score matrix *S*.

        Parameters
        ----------
        ids_te_users : array_like
            List of the test user indexes.
        test_tr : :class:`scipy.sparse.csr_matrix`
            Training portion of the test users.
        remove_train : :obj:`bool` [optional]
            Whether to remove the training set from the prediction, by default True. Removing
            the training items means setting their scores to :math:`-\infty`.

        Returns
        -------
        pred, : :obj:`tuple` with a single element
            pred : :class:`numpy.ndarray`
                The items' score (on the columns) for each user (on the rows).
        """
        pred = self.model[ids_te_users, :]
        if remove_train:
            pred[test_tr.nonzero()] = -np.inf
        return (pred, )
[docs] def save_model(self, filepath): state = {'lambda': self.lam, 'model': self.model } logger.info("Saving EASE model to %s...", filepath) np.save(filepath, state) logger.info("Model saved!")
[docs] def load_model(self, filepath): assert os.path.isfile(filepath), "The model file %s does not exist." %filepath logger.info("Loading EASE model from %s...", filepath) state = np.load(filepath, allow_pickle=True)[()] self.lam = state["lambda"] self.model = state["model"] logger.info("Model loaded!") return state
def __str__(self): s = "EASE(lambda=%.4f" % self.lam if self.model is not None: s += ", model size=(%d, %d))" %self.model.shape else: s += ") - not trained yet!" return s def __repr__(self): return str(self)
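# Runnable toy example (a sketch, not part of rectorch): EASE trained on a tiny binary
# interaction matrix. Only numpy/scipy are needed besides what this module already imports;
# the matrix values and lambda are illustrative.
def _example_ease():
    from scipy.sparse import csr_matrix
    X = csr_matrix(np.array([[1, 0, 1, 0],
                             [1, 1, 0, 0],
                             [0, 1, 1, 1],
                             [0, 0, 1, 1]], dtype=float))
    ease = EASE(lam=10.)
    ease.train(X)
    # Score the first two users, masking out their training items with -inf.
    (pred, ) = ease.predict(ids_te_users=[0, 1], test_tr=X[[0, 1]], remove_train=True)
    return pred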
class CFGAN(RecSysModel):
    r"""A Generic Collaborative Filtering Framework based on Generative Adversarial Networks.

    This recommender system is based on GANs, where the (conditioned) generator aims at
    generating the ratings of users while the discriminator tries to discriminate between
    genuine user profiles and generated ones.
    The two networks try to optimize the following loss functions:

    - Discriminator (D):
        :math:`J^{D}=-\sum_{u}\left(\log D(\mathbf{r}_{u} | \
        \mathbf{c}_{u})+\log (1-D(\hat{\mathbf{r}}_{u}\
        \odot(\mathbf{e}_{u}+\mathbf{k}_{u}) | \mathbf{c}_{u}))\right)`
    - Generator (G):
        :math:`J^{G}=\sum_{u}\left(\log (1-D(\hat{\mathbf{r}}_{u}\
        \odot(\mathbf{e}_{u}+\mathbf{k}_{u}) | \mathbf{c}_{u}))+\
        \alpha \cdot \sum_{j}(r_{uj}-\hat{r}_{uj})^{2}\right)`

    where :math:`\mathbf{c}_u` is the condition vector, :math:`\mathbf{k}_u` is an n-dimensional
    indicator vector such that :math:`k_{uj} = 1` iff *j* is a negative sampled item,
    :math:`\hat{\mathbf{r}}_u` is a fake user profile, and :math:`\mathbf{e}_u` is the masking
    vector to remove the non-rated items. For more details please refer to the original paper
    [CFGAN]_.

    Parameters
    ----------
    generator : :class:`torch.nn.Module`
        The generator neural network. The expected architecture is
        :math:`[m, l_1, \dots, l_h, m]` where *m* is the number of items and
        :math:`l_i, i \in [1,h]` is the number of nodes of the hidden layer *i*.
    discriminator : :class:`torch.nn.Module`
        The discriminator neural network. The expected architecture is
        :math:`[2m, l_1, \dots, l_h, 1]` where *m* is the number of items and
        :math:`l_i, i \in [1,h]` is the number of nodes of the hidden layer *i*.
    alpha : :obj:`float` [optional]
        The ZR-coefficient, by default .1.
    s_pm : :obj:`float` [optional]
        The sampling parameter for the partial-masking approach, by default .7.
    s_zr : :obj:`float` [optional]
        The sampling parameter for the zero-reconstruction regularization, by default .5.
    learning_rate : :obj:`float` [optional]
        The optimization learning rate, by default 0.001.

    Attributes
    ----------
    generator : :class:`torch.nn.Module`
        See ``generator`` parameter.
    discriminator : :class:`torch.nn.Module`
        See ``discriminator`` parameter.
    alpha : :obj:`float`
        See ``alpha`` parameter.
    s_pm : :obj:`float`
        See ``s_pm`` parameter.
    s_zr : :obj:`float`
        See ``s_zr`` parameter.
    learning_rate : :obj:`float`
        See ``learning_rate`` parameter.
    opt_g : :class:`torch.optim.Adam`
        Optimizer used for performing the training of the generator.
    opt_d : :class:`torch.optim.Adam`
        Optimizer used for performing the training of the discriminator.

    References
    ----------
    .. [CFGAN] Dong-Kyu Chae, Jin-Soo Kang, Sang-Wook Kim, and Jung-Tae Lee. 2018.
       CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial
       Networks. In Proceedings of the 27th ACM International Conference on Information and
       Knowledge Management (CIKM ’18). Association for Computing Machinery, New York, NY, USA,
       137–146. DOI: https://doi.org/10.1145/3269206.3271743
    """
    def __init__(self, generator, discriminator, alpha=.1, s_pm=.7, s_zr=.5, learning_rate=0.001):
        self.generator = generator
        self.discriminator = discriminator
        #TODO: check this
        # pylint: disable=fixme
        if next(self.generator.parameters()).is_cuda:
            self.device = torch.device("cuda")
        else:
            self.device = torch.device("cpu")
        self.s_pm = s_pm
        self.s_zr = s_zr
        self.loss = torch.nn.BCELoss()
        self.regularization_loss = torch.nn.MSELoss(reduction="sum")
        self.learning_rate = learning_rate
        self.alpha = alpha
        self.n_items = self.generator.input_dim
        self.opt_g = optim.Adam(self.generator.parameters(), lr=self.learning_rate)
        self.opt_d = optim.Adam(self.discriminator.parameters(), lr=self.learning_rate)
[docs] def train(self, train_data, valid_data=None, valid_metric=None, valid_func=ValidFunc(evaluate), num_epochs=1000, g_steps=5, d_steps=5, verbose=1): r"""Training procedure of CFGAN. The training works in an alternate way between generator and discriminator. The number of training batches in each epochs are ``g_steps`` and ``d_steps``, respectively. Parameters ---------- train_data : :class:`samplers.CFGAN_TrainingSampler` The sampler object that load the training set in mini-batches. valid_data : :class:`samplers.DataSampler` [optional] The sampler object that load the validation set in mini-batches, by default ``None``. If the model does not have any validation procedure set this parameter to ``None``. valid_metric : :obj:`str` [optional] The metric used during the validation to select the best model, by default ``None``. If ``valid_data`` is not ``None`` then ``valid_metric`` must be not ``None``. To see the valid strings for the metric please see the module :mod:`metrics`. valid_func : :class:`evaluation.ValidFunc` [optional] The validation function, by default a standard validation procedure, i.e., :func:`evaluation.evaluate`. num_epochs : :obj:`int` [optional] Number of training epochs, by default 1000. g_steps : :obj:`int` [optional] Number of steps for a generator epoch, by default 5. d_steps : :obj:`int` [optional] Number of steps for a discriminator epoch, by default 5. verbose : :obj:`int` [optional] The level of verbosity of the logging, by default 1. The level can have any integer value greater than 0. However, after reaching a maximum (that depends on the size of the training set) verbosity higher values will not have any effect. """ self.discriminator.train() self.generator.train() start_time = time.time() log_delay = max(10, num_epochs // 10**verbose) loss_d, loss_g = 0, 0 try: for epoch in range(1, num_epochs+1): for _ in range(1, g_steps+1): loss_g += self.train_gen_batch(next(train_data).to(self.device)) for _ in range(1, d_steps+1): loss_d += self.train_disc_batch(next(train_data).to(self.device)) if epoch % log_delay == 0: loss_g /= (g_steps * log_delay) loss_d /= (d_steps * log_delay) elapsed = time.time() - start_time logger.info('| epoch {} | ms/batch {:.2f} | loss G {:.6f} | loss D {:.6f} |' .format(epoch, elapsed * 1000 / log_delay, loss_g, loss_d)) start_time = time.time() loss_g, loss_d = 0, 0 if valid_data is not None: assert valid_metric is not None, \ "In case of validation 'valid_metric' must be provided" valid_res = valid_func(self, valid_data, valid_metric) mu_val = np.mean(valid_res) std_err_val = np.std(valid_res) / np.sqrt(len(valid_res)) logger.info('| epoch %d | %s %.3f (%.4f) |', epoch, valid_metric, mu_val, std_err_val) except KeyboardInterrupt: logger.warning('Handled KeyboardInterrupt: exiting from training early')
[docs] def train_gen_batch(self, batch): r"""Training on a single batch for the generator. Parameters ---------- batch : :class:`torch.Tensor` The current batch. Returns ------- :obj:`float` The loss incurred in the current batch by the generator. """ batch = batch.to(self.device) real_label = torch.ones(batch.shape[0], 1).to(self.device) mask = batch.clone() size = int(self.s_pm * self.n_items) for u in range(batch.shape[0]): rand_it = np.random.choice(self.n_items, size, replace=False) mask[u, rand_it] = 1 mask = mask.to(self.device) if self.alpha > 0: mask_zr = batch.clone() size = int(self.s_zr * self.n_items) for u in range(batch.shape[0]): rand_it = np.random.choice(self.n_items, size, replace=False) mask_zr[u, rand_it] = 1 mask_zr = mask_zr.to(self.device) fake_data = self.generator(batch) g_reg_loss = 0 if self.alpha > 0: g_reg_loss = self.regularization_loss(fake_data, mask_zr) fake_data = torch.mul(fake_data, mask) disc_on_fake = self.discriminator(fake_data, batch) g_loss = self.loss(disc_on_fake, real_label) g_loss = g_loss + self.alpha * g_reg_loss self.opt_g.zero_grad() g_loss.backward() self.opt_g.step() return g_loss.item()
[docs] def train_disc_batch(self, batch): r"""Training on a single batch for the discriminator. Parameters ---------- batch : :class:`torch.Tensor` The current batch. Returns ------- :obj:`float` The loss incurred in the batch by the discriminator. """ batch = batch.to(self.device) real_label = torch.ones(batch.shape[0], 1).to(self.device) fake_label = torch.zeros(batch.shape[0], 1).to(self.device) mask = batch.clone() size = int(self.s_pm * self.n_items) for u in range(batch.shape[0]): rand_it = np.random.choice(self.n_items, size, replace=False) mask[u, rand_it] = 1 mask = mask.to(self.device) disc_on_real = self.discriminator(batch, batch) d_loss_real = self.loss(disc_on_real, real_label) fake_data = self.generator(batch) fake_data = torch.mul(fake_data, mask) disc_on_fake = self.discriminator(fake_data, batch) d_loss_fake = self.loss(disc_on_fake, fake_label) d_loss = d_loss_real + d_loss_fake self.opt_d.zero_grad() d_loss.backward() self.opt_d.step() return d_loss.item()
[docs] def predict(self, x, remove_train=True): self.generator.eval() with torch.no_grad(): x_tensor = x.to(self.device) pred = self.generator(x_tensor) if remove_train: pred[tuple(x_tensor.nonzero().t())] = -np.inf return (pred, )
def __str__(self): s = self.__class__.__name__ + "(\n" for k, v in self.__dict__.items(): sv = "\n".join([" "+line for line in str(str(v)).split("\n")])[2:] s += " %s = %s,\n" % (k, sv) s = s[:-2] + "\n)" return s def __repr__(self): return str(self)
[docs] def save_model(self, filepath, cur_epoch): state = {'epoch': cur_epoch, 'state_dict_g': self.generator.state_dict(), 'state_dict_d': self.discriminator.state_dict(), 'optimizer_g': self.opt_g.state_dict(), 'optimizer_d': self.opt_g.state_dict() } logger.info("Saving CFGAN model to %s...", filepath) torch.save(state, filepath) logger.info("Model saved!")
[docs] def load_model(self, filepath): assert os.path.isfile(filepath), "The checkpoint file %s does not exist." %filepath logger.info("Loading model checkpoint from %s...", filepath) checkpoint = torch.load(filepath) self.generator.load_state_dict(checkpoint['state_dict_g']) self.discriminator.load_state_dict(checkpoint['state_dict_d']) self.opt_g.load_state_dict(checkpoint['optimizer_g']) self.opt_d.load_state_dict(checkpoint['optimizer_d']) logger.info("Model checkpoint loaded!") return checkpoint
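# Hedged wiring sketch (not part of rectorch): minimal generator/discriminator modules with
# the interface CFGAN relies on above, i.e., the generator exposes ``input_dim`` and maps a
# user profile to item scores, while the discriminator takes the (masked) fake profile
# together with the real profile used as condition. The real networks live in rectorch.nets;
# these classes, sizes, and the sampler behaviour are illustrative assumptions.
class _ToyCFGANGenerator(torch.nn.Module):
    def __init__(self, n_items, hidden=64):
        super().__init__()
        self.input_dim = n_items
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(n_items, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_items), torch.nn.Sigmoid())

    def forward(self, x):
        return self.layers(x)


class _ToyCFGANDiscriminator(torch.nn.Module):
    def __init__(self, n_items, hidden=64):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(2 * n_items, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1), torch.nn.Sigmoid())

    def forward(self, x, condition):
        return self.layers(torch.cat([x, condition], dim=1))


def _example_cfgan_training(train_sampler, n_items):
    # ``train_sampler`` is assumed to behave like samplers.CFGAN_TrainingSampler, i.e.,
    # ``next(train_sampler)`` yields a batch of user profiles as a torch.Tensor.
    gan = CFGAN(_ToyCFGANGenerator(n_items), _ToyCFGANDiscriminator(n_items),
                alpha=.1, s_pm=.7, s_zr=.5, learning_rate=1e-4)
    gan.train(train_sampler, num_epochs=100, g_steps=5, d_steps=5)
    return gan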
[docs]class ADMM_Slim(RecSysModel): r"""ADMM SLIM: Sparse Recommendations for Many Users. ADMM SLIM [ADMMS]_ is a model similar to SLIM [SLIM]_ in which the objective function is solved using Alternating Directions Method of Multipliers (ADMM). In particular, given the rating matrix :math:`\mathbf{X} \in \mathbb{R}^{n \times m}` with *n* users and *m* items, ADMM SLIM aims at solving the following optimization problem: :math:`\min_{B,C,\Gamma} \frac{1}{2}\|X-X B\|_{F}^{2}+\frac{\lambda_{2}}{2} \cdot\|B\|_{F}^{2}+\ \lambda_{1} \cdot\|C\|_{1} +\ \langle\Gamma, B-C\rangle_{F}+\frac{\rho}{2} \cdot\|B-C\|_{F}^{2}` with :math:`\textrm{diag}(B)=0`, :math:`\Gamma \in \mathbb{R}^{m \times m}`, and the entry of *C* are all greater or equal than 0. The prediction for a user-item pair *(u,j)* is then computed by :math:`S_{u j}=\mathbf{X}_{u,:} \cdot \mathbf{B}_{:, j}`. Parameters ---------- lambda1 : :obj:`float` [optional] Elastic net regularization hyper-parameters :math:`\lambda_1`, by default 5. lambda2 : :obj:`float` [optional] Elastic net regularization hyper-parameters :math:`\lambda_2`, by default 1e3. rho : :obj:`float` [optional] The penalty hyper-parameter :math:`\rho>0` that applies to :math:`\|B-C\|^2_F`, by default 1e5. nn_constr : :obj:`bool` [optional] Whether to keep the non-negativity constraint, by default ``True``. l1_penalty : :obj:`bool` [optional] Whether to keep the L1 penalty, by default ``True``. When ``l1_penalty = False`` then is like to set :math:`\lambda_1 = 0`. item_bias : :obj:`bool` [optional] Whether to model the item biases, by default ``False``. When ``item_bias = True`` then the scoring function for the user-item pair *(u,i)* becomes: :math:`S_{ui}=(\mathbf{X}_{u,:} - \mathbf{b})\cdot \mathbf{B}_{:, i} + \mathbf{b}_i`. Attributes ---------- See the parameters' section. References ---------- .. [ADMMS] Harald Steck, Maria Dimakopoulou, Nickolai Riabov, and Tony Jebara. 2020. ADMM SLIM: Sparse Recommendations for Many Users. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM ’20). Association for Computing Machinery, New York, NY, USA, 555–563. DOI: https://doi.org/10.1145/3336191.3371774 .. [SLIM] X. Ning and G. Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. In Proceedings of the IEEE 11th International Conference on Data Mining, Vancouver,BC, 2011, pp. 497-506. DOI: https://doi.org/10.1109/ICDM.2011.134. """ def __init__(self, lambda1=5., lambda2=1e3, rho=1e5, nn_constr=True, l1_penalty=True, item_bias=False): self.lambda1 = lambda1 self.lambda2 = lambda2 self.rho = rho self.nn_constr = nn_constr self.l1_penalty = l1_penalty self.item_bias = item_bias self.model = None
[docs] def train(self, train_data, num_iter=50, verbose=1): r"""Training of ADMM SLIM. The training procedure of ADMM SLIM highly depends on the setting of the hyper-parameters. By setting them in specific ways it is possible to define different variants of the algorithm. That are: 1. (Vanilla) ADMM SLIM - :math:`\lambda_1, \lambda_2, \rho>0`, :attr:`item_bias` = ``False``, and both :attr:`nn_constr` and :attr:`l1_penalty` set to ``True``; 2. ADMM SLIM w/o non-negativity constraint over C - :attr:`nn_constr` = ``False`` and :attr:`l1_penalty` set to ``True``; 3. ADMM SLIM w/o the L1 penalty - :attr:`l1_penalty` = ``False`` and :attr:`nn_constr` set to ``True``; 4. ADMM SLIM w/o L1 penalty and non-negativity constraint: :attr:`nn_constr` = :attr:`l1_penalty` = ``False``. All these variants can also be combined with the inclusion of the item biases by setting :attr:`item_bias` to ``True``. Parameters ---------- train_data : :class:`scipy.sparse.csr_matrix` The training data. num_iter : :obj:`int` [optional] Maximum number of training iterations, by default 50. This argument has no effect if both :attr:`nn_constr` and :attr:`l1_penalty` are set to ``False``. verbose : :obj:`int` [optional] The level of verbosity of the logging, by default 1. The level can have any integer value greater than 0. However, after reaching a maximum (that depends on the size of the training set) verbosity higher values will not have any effect. """ def _soft_threshold(a, k): return np.maximum(0., a - k) - np.maximum(0., -a - k) X = train_data.toarray() if self.item_bias: b = X.sum(axis=0) X = X - np.outer(np.ones(X.shape[0]), b) XtX = X.T.dot(X) logger.info("ADMM_Slim - linear kernel computed") diag_indices = np.diag_indices(XtX.shape[0]) XtX[diag_indices] += self.lambda2 + self.rho P = np.linalg.inv(XtX) logger.info("ADMM_Slim - inverse of XtX computed") if not self.nn_constr and not self.l1_penalty: C = np.eye(P.shape[0]) - P * np.diag(1. / np.diag(P)) else: XtX[diag_indices] -= self.lambda2 + self.rho B_aux = P.dot(XtX) Gamma = np.zeros(XtX.shape, dtype=float) C = np.zeros(XtX.shape, dtype=float) log_delay = max(5, num_iter // (10*verbose)) for j in range(num_iter): B_tilde = B_aux + P.dot(self.rho * C - Gamma) gamma = np.diag(B_tilde) / np.diag(P) B = B_tilde - P * np.diag(gamma) C = _soft_threshold(B + Gamma / self.rho, self.lambda1 / self.rho) if self.nn_constr and self.l1_penalty: C = np.maximum(C, 0.) elif self.nn_constr and not self.l1_penalty: C = np.maximum(B, 0.) Gamma += self.rho * (B - C) if not (j+1) % log_delay: logger.info("| iteration %d/%d |", j+1, num_iter) self.model = np.dot(X, C) if self.item_bias: self.model += b
[docs] def predict(self, ids_te_users, test_tr, remove_train=True): pred = self.model[ids_te_users, :] if remove_train: pred[test_tr.nonzero()] = -np.inf return (pred, )
[docs] def save_model(self, filepath): state = {'lambda1': self.lambda1, 'lambda2': self.lambda2, 'rho' : self.rho, 'model': self.model, 'nn_constr' : self.nn_constr, 'l1_penalty' : self.l1_penalty, 'item_bias' : self.item_bias } logger.info("Saving ADMM_Slim model to %s...", filepath) np.save(filepath, state) logger.info("Model saved!")
[docs] def load_model(self, filepath): assert os.path.isfile(filepath), "The model file %s does not exist." %filepath logger.info("Loading ADMM_Slim model from %s...", filepath) state = np.load(filepath, allow_pickle=True)[()] self.lambda1 = state["lambda1"] self.lambda2 = state["lambda2"] self.rho = state["rho"] self.nn_constr = state["nn_constr"] self.l1_penalty = state["l1_penalty"] self.item_bias = state["item_bias"] self.model = state["model"] logger.info("Model loaded!") return state
def __str__(self): s = "ADMM_Slim(lambda1=%.4f, lamdba2=%.4f" %(self.lambda1, self.lambda2) s += ", rho=%.4f" %self.rho s += ", non_negativity=%s" %self.nn_constr s += ", L1_penalty=%s" %self.l1_penalty s += ", item_bias=%s" %self.item_bias if self.model is not None: s += ", model size=(%d, %d))" %self.model.shape else: s += ") - not trained yet!" return s def __repr__(self): return str(self)
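# Runnable toy example (a sketch): ADMM_Slim on the same kind of tiny binary interaction
# matrix used for EASE above, with the vanilla variant of the algorithm (non-negativity
# constraint and L1 penalty both active). Hyper-parameter values are illustrative.
def _example_admm_slim():
    from scipy.sparse import csr_matrix
    X = csr_matrix(np.array([[1, 0, 1, 0],
                             [1, 1, 0, 0],
                             [0, 1, 1, 1],
                             [0, 0, 1, 1]], dtype=float))
    slim = ADMM_Slim(lambda1=1., lambda2=10., rho=100., nn_constr=True, l1_penalty=True)
    slim.train(X, num_iter=20)
    (pred, ) = slim.predict(ids_te_users=[2, 3], test_tr=X[[2, 3]], remove_train=True)
    return pred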
#TODO documentation
class SVAE(MultiVAE):
    r"""Sequential Variational Autoencoders for Collaborative Filtering.

    **UNDOCUMENTED** [SVAE]_

    Parameters
    ----------
    svae_net : :class:`torch.nn.Module`
        The sequential variational autoencoder neural network.
    beta : :obj:`float` [optional]
        The :math:`\beta` hyper-parameter of Multi-VAE. When ``anneal_steps > 0`` then this
        value is the value to anneal starting from 0, otherwise ``beta`` will be fixed to the
        given value for the duration of the training. By default 1.
    anneal_steps : :obj:`int` [optional]
        Number of annealing steps for reaching the target value ``beta``, by default 0.
        0 means that no annealing will be performed and the regularization parameter will be
        fixed to ``beta``.
    learning_rate : :obj:`float` [optional]
        The learning rate for the optimizer, by default 1e-3.

    References
    ----------
    .. [SVAE] Noveen Sachdeva, Giuseppe Manco, Ettore Ritacco, and Vikram Pudi. 2019.
       Sequential Variational Autoencoders for Collaborative Filtering. In Proceedings of the
       Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19).
       Association for Computing Machinery, New York, NY, USA, 600–608.
       DOI: https://doi.org/10.1145/3289600.3291007
    """
    def __init__(self, svae_net, beta=1., anneal_steps=0, learning_rate=1e-3):
        super(SVAE, self).__init__(svae_net,
                                   beta=beta,
                                   anneal_steps=anneal_steps,
                                   learning_rate=learning_rate)
        self.optimizer = optim.Adam(self.network.parameters(),
                                    lr=learning_rate,
                                    weight_decay=5e-3)
[docs] def loss_function(self, recon_x, x, mu, logvar, beta=1.0): likelihood_n = -torch.sum(torch.sum(F.log_softmax(recon_x, -1) * x.view(recon_x.shape), -1)) likelihood_d = float(torch.sum(x[0, :recon_x.shape[2]])) KLD = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)) return likelihood_n / likelihood_d + beta * KLD
[docs] def predict(self, x, remove_train=True): self.network.eval() with torch.no_grad(): x_tensor = x.to(self.device) recon_x, mu, logvar = self.network(x_tensor) if remove_train: recon_x[0, -1, x_tensor] = -np.inf return recon_x[:, -1, :], mu, logvar