rectorch.metrics

Module containing the definition of the evaluation metrics.

The metrics are implemented as static methods of the class Metrics. Up to now the following metrics are implemented: hit@k, mrr@k, nDCG@k, and recall@k.

See also

Modules: evaluation

class rectorch.metrics.Metrics[source]

The class Metrics contains metric functions.

All methods are static and no object of type Metrics is needed to compute the metrics.

static compute(pred_scores, ground_truth, metrics_list)[source]

Compute the given list of evaluation metrics.

The method computes all the metrics listed in metrics_list for all the users.

Parameters
  • pred_scores : numpy.array
    • The array with the predicted scores. Users are on the rows and items on the columns.

  • ground_truth : numpy.array
    • Binary array with the ground truth. 1 means the item is relevant for the user and 0 not relevant. Users are on the rows and items on the columns.

  • metrics_list : list of str
    • The list of metrics to compute. Metrics are indicated by strings formed in the following way (see the sketch after the example below):

      metric_name@k

      where metric_name must correspond to one of the method names without the suffix ‘_at_k’, and k is the corresponding parameter of the method and must be an integer value. For example: ndcg@10 is a valid metric name and it corresponds to the method ndcg_at_k() with k=10.

    Returns
  • dict of numpy.array
    • Dictionary with the results for each metric in metrics_list. Keys are strings representing the metrics, while each value is an array with the value of the metric computed on the users.

    Examples

    >>> import numpy as np
    >>> from rectorch.metrics import Metrics
    >>> scores = np.array([[4., 3., 2., 1., 0.]])
    >>> ground_truth = np.array([[1., 1., 0., 0., 1.]])
    >>> met_list = ["recall@2", "recall@3", "ndcg@2"]
    >>> Metrics.compute(scores, ground_truth, met_list)
    {'recall@2': array([1.]),
     'recall@3': array([0.66666667]),
     'ndcg@2': array([1.])}
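
    The string-to-method dispatch described above (metric_name@k mapped to metric_name_at_k(..., k)) can be reproduced in a few lines. Below is a minimal sketch, not the library's actual implementation; compute_from_strings is a hypothetical helper name:

        import numpy as np
        from rectorch.metrics import Metrics

        def compute_from_strings(pred_scores, ground_truth, metrics_list):
            # "name@k" maps to Metrics.<name>_at_k(pred_scores, ground_truth, k)
            results = {}
            for metric in metrics_list:
                name, k = metric.split("@")
                results[metric] = getattr(Metrics, name + "_at_k")(pred_scores, ground_truth, int(k))
            return results

        scores = np.array([[4., 3., 2., 1., 0.]])
        ground_truth = np.array([[1., 1., 0., 0., 1.]])
        # Expected to agree with the compute() example above:
        # {'recall@2': array([1.]), 'recall@3': array([0.66666667]), 'ndcg@2': array([1.])}
        print(compute_from_strings(scores, ground_truth, ["recall@2", "recall@3", "ndcg@2"]))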
    
    static hit_at_k(pred_scores, ground_truth, k=100)[source]

    Compute the hit at k.

    The Hit@k is 1 if at least one relevant item is in the top k scored items, and 0 otherwise.

    Parameters
  • pred_scores : numpy.array
    • The array with the predicted scores. Users are on the rows and items on the columns.

  • ground_truth : numpy.array
    • Binary array with the ground truth. 1 means the item is relevant for the user and 0 not relevant. Users are on the rows and items on the columns.

  • k : int [optional]
    • The number of top items to consider, by default 100.

    Returns
  • numpy.array
    • An array containing the hit@k value for each user.

    Examples

    >>> import numpy as np
    >>> from rectorch.metrics import Metrics
    >>> scores = np.array([[4., 3., 2., 1.]])
    >>> ground_truth = np.array([[0, 0, 1., 1.]])
    >>> Metrics.hit_at_k(scores, ground_truth, 3)
    array([1.])
    >>> Metrics.hit_at_k(scores, ground_truth, 2)
    array([0.])
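
    For clarity, the Hit@k computation can be expressed with plain numpy as in the sketch below (hit_at_k_sketch is an illustrative standalone function, not the rectorch implementation):

        import numpy as np

        def hit_at_k_sketch(pred_scores, ground_truth, k=100):
            # Indices of the k highest-scored items per user; their internal
            # order does not matter for Hit@k, so argpartition is enough.
            k = min(k, pred_scores.shape[1])
            top_k = np.argpartition(-pred_scores, k - 1, axis=1)[:, :k]
            rows = np.arange(pred_scores.shape[0])[:, None]
            # 1.0 if at least one of the top-k items is relevant, 0.0 otherwise.
            return (ground_truth[rows, top_k].sum(axis=1) > 0).astype(float)

        scores = np.array([[4., 3., 2., 1.]])
        ground_truth = np.array([[0., 0., 1., 1.]])
        print(hit_at_k_sketch(scores, ground_truth, 3))  # [1.] as in the example above
        print(hit_at_k_sketch(scores, ground_truth, 2))  # [0.]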
    
    static mrr_at_k(pred_scores, ground_truth, k=100)[source]

    Compute the Mean Reciprocal Rank (MRR).

    The MRR@k is the mean over all users of the reciprocal rank, that is, the inverse of the rank of the highest ranked relevant item in the top k, or 0 if no relevant item appears in the top k.

    Parameters
  • pred_scores : numpy.array
    • The array with the predicted scores. Users are on the rows and items on the columns.

  • ground_truth : numpy.array
    • Binary array with the ground truth. 1 means the item is relevant for the user and 0 not relevant. Users are on the rows and items on the columns.

  • k : int [optional]
    • The number of top items to consider, by default 100.

    Returns
  • numpy.array
    • An array containing the mrr@k value for each user.

    Examples

    >>> import numpy as np
    >>> from rectorch.metrics import Metrics
    >>> scores = np.array([[4., 2., 3., 1.], [1., 2., 3., 4.]])
    >>> ground_truth = np.array([[0, 0, 1., 1.], [0, 0, 1., 1.]])
    >>> Metrics.mrr_at_k(scores, ground_truth, 3)
    array([0.5, 1. ])
    >>> Metrics.mrr_at_k(scores, ground_truth, 1)
    array([0., 1.])
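
    A compact numpy sketch of the per-user reciprocal rank is shown below (mrr_at_k_sketch is an illustrative helper, not the rectorch implementation):

        import numpy as np

        def mrr_at_k_sketch(pred_scores, ground_truth, k=100):
            # Items of each user sorted by decreasing score, truncated to the top k.
            top_k = np.argsort(-pred_scores, axis=1)[:, :k]
            rows = np.arange(pred_scores.shape[0])[:, None]
            rel = ground_truth[rows, top_k]              # relevance in rank order
            first_hit = np.argmax(rel, axis=1) + 1       # rank of the first relevant item
            # Reciprocal rank, or 0 when no relevant item appears in the top k.
            return np.where(rel.sum(axis=1) > 0, 1.0 / first_hit, 0.0)

        scores = np.array([[4., 2., 3., 1.], [1., 2., 3., 4.]])
        ground_truth = np.array([[0., 0., 1., 1.], [0., 0., 1., 1.]])
        print(mrr_at_k_sketch(scores, ground_truth, 3))  # [0.5 1. ] as in the example above
        print(mrr_at_k_sketch(scores, ground_truth, 1))  # [0.  1. ]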
    
    static ndcg_at_k(pred_scores, ground_truth, k=100)[source]

    Compute the Normalized Discounted Cumulative Gain (nDCG).

    nDCG is a measure of ranking quality. nDCG measures the usefulness, or gain, of an item based on its position in the scoring list. The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.

    The nDCG is computed over the top-k items (out of \(m\)), where k is specified as a parameter, for all users independently. The nDCG@k (\(k \in [1,2,\dots,m]\)) is computed with the following formula:

    \(nDCG@k = \frac{\textrm{DCG}@k}{\textrm{IDCG}@k}\)

    where

    \(\textrm{DCG}@k = \sum\limits_{i=1}^k \frac{2^{rel_i}-1}{\log_2 (i+1)},\)

    \(\textrm{IDCG}@k = \sum\limits_{i=1}^{\min(k,R)} \frac{1}{\log_2 (i+1)}\)

    where \(rel_i \in \{0,1\}\) indicates whether the item at the i-th position in the ranking is relevant or not, and R is the number of relevant items.

    Parameters
  • pred_scores : numpy.array
    • The array with the predicted scores. Users are on the rows and items on the columns.

  • ground_truth : numpy.array
    • Binary array with the ground truth items. 1 means the item is relevant for the user and 0 not relevant. Users are on the rows and items on the columns.

  • k : int [optional]
    • The number of top items to consider, by default 100.

    Returns
  • numpy.array
    • An array containing the ndcg@k value for each user.

    Examples

    >>> import numpy as np
    >>> from rectorch.metrics import Metrics
    >>> scores = np.array([[4., 3., 2., 1.]])
    >>> ground_truth = np.array([[0, 0, 1., 1.]])
    >>> Metrics.ndcg_at_k(scores, ground_truth, 3)
    array([0.306573596])
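
    The formulas above translate directly into numpy. The sketch below (ndcg_at_k_sketch is an illustrative helper, not the rectorch implementation) assumes every user has at least one relevant item, so that IDCG@k is never zero:

        import numpy as np

        def ndcg_at_k_sketch(pred_scores, ground_truth, k=100):
            top_k = np.argsort(-pred_scores, axis=1)[:, :k]
            rows = np.arange(pred_scores.shape[0])[:, None]
            rel = ground_truth[rows, top_k]                   # rel_i in rank order
            discounts = 1.0 / np.log2(np.arange(2, k + 2))    # 1 / log2(i + 1), i = 1..k
            dcg = ((2.0 ** rel - 1.0) * discounts[:rel.shape[1]]).sum(axis=1)
            n_rel = ground_truth.sum(axis=1).astype(int)      # R, assumed > 0 for every user
            idcg = np.array([discounts[:min(k, r)].sum() for r in n_rel])
            return dcg / idcg

        scores = np.array([[4., 3., 2., 1.]])
        ground_truth = np.array([[0., 0., 1., 1.]])
        print(ndcg_at_k_sketch(scores, ground_truth, 3))  # [0.30657...] as in the example above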
    
    static recall_at_k(pred_scores, ground_truth, k=100)[source]

    Compute the Recall.

    The recall@k is the fraction of the relevant items that are successfully ranked in the top-k. The recall is computed over the top-k items (out of \(m\)), where k is specified as a parameter, for all users independently.

    Recall@k is computed as

    \(\textrm{recall}@k = \frac{\textrm{TP}}{\textrm{TP}+\textrm{FN}}\)

    where TP and FN are the true positive and the false negative retrieved items, respectively.

    Parameters
  • pred_scores : numpy.array
    • The array with the predicted scores. Users are on the rows and items on the columns.

  • ground_truth : numpy.array
    • Binary array with the ground truth. 1 means the item is relevant for the user and 0 not relevant. Users are on the rows and items on the columns.

  • k : int [optional]
    • The number of top items to consider, by default 100.

    Returns
  • numpy.array
    • An array containing the recall@k value for each user.

    Examples

    >>> import numpy as np
    >>> from rectorch.metrics import Metrics
    >>> scores = np.array([[4., 3., 2., 1.]])
    >>> ground_truth = np.array([[0, 0, 1., 1.]])
    >>> Metrics.recall_at_k(scores, ground_truth, 3)
    array([0.5])
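
    The following numpy sketch (recall_at_k_sketch is an illustrative helper, not the rectorch implementation) reproduces the value above. Following the recall@2 value of 1. in the compute() example, the denominator is assumed to be capped at k, i.e. min(k, R), and every user is assumed to have at least one relevant item:

        import numpy as np

        def recall_at_k_sketch(pred_scores, ground_truth, k=100):
            top_k = np.argsort(-pred_scores, axis=1)[:, :k]
            rows = np.arange(pred_scores.shape[0])[:, None]
            hits = ground_truth[rows, top_k].sum(axis=1)   # relevant items retrieved in the top-k
            n_rel = ground_truth.sum(axis=1)               # R, total relevant items per user
            # Denominator capped at k (assumption, consistent with the compute() example above).
            return hits / np.minimum(k, n_rel)

        scores = np.array([[4., 3., 2., 1.]])
        ground_truth = np.array([[0., 0., 1., 1.]])
        print(recall_at_k_sketch(scores, ground_truth, 3))  # [0.5]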