Functional metrics

Audio Metrics

pesq [func]

torchmetrics.functional.pesq(preds, target, fs, mode, keep_same_device=False)[source]

PESQ (Perceptual Evaluation of Speech Quality)

This is a wrapper for the pesq package [1]. Note that the input will be moved to the CPU to perform the metric calculation.

Note

Using this metric requires you to have pesq installed. Either install as pip install torchmetrics[audio] or pip install pesq

Parameters
  • preds (Tensor) – shape [...,time]

  • target (Tensor) – shape [...,time]

  • fs (int) – sampling frequency, should be 16000 or 8000 (Hz)

  • mode (str) – ‘wb’ (wide-band) or ‘nb’ (narrow-band)

  • keep_same_device (bool) – whether to move the pesq value to the device of preds

Return type

Tensor

Returns

pesq value of shape […]

Raises
  • ValueError – If pesq package is not installed

  • ValueError – If fs is not either 8000 or 16000

  • ValueError – If mode is not either 'wb' or 'nb'

Example

>>> from torchmetrics.functional.audio import pesq
>>> import torch
>>> g = torch.manual_seed(1)
>>> preds = torch.randn(8000)
>>> target = torch.randn(8000)
>>> pesq(preds, target, 8000, 'nb')
tensor(2.2076)
>>> pesq(preds, target, 16000, 'wb')
tensor(1.7359)

References

[1] https://github.com/ludlows/python-pesq

pit [func]

torchmetrics.functional.pit(preds, target, metric_func, eval_func='max', **kwargs)[source]

Permutation invariant training (PIT). This implements the Permutation Invariant Training method [1] from the speech separation field, which calculates audio metrics in a permutation-invariant way.

Parameters
  • preds (Tensor) – shape [batch, spk, …]

  • target (Tensor) – shape [batch, spk, …]

  • metric_func (Callable) – a metric function that accepts a batch of estimates and targets, i.e. metric_func(preds[:, i, ...], target[:, j, ...]), and returns a batch of metric tensors of shape [batch]

  • eval_func (str) – the function used to find the best permutation; can be 'min' or 'max', i.e. whether a smaller or a larger metric value is better.

  • kwargs (Dict[str, Any]) – additional args for metric_func

Return type

Tuple[Tensor, Tensor]

Returns

best_metric of shape [batch], best_perm of shape [batch]

Example

>>> import torch
>>> from torchmetrics.functional.audio import pit, pit_permutate, si_sdr
>>> # [batch, spk, time]
>>> preds = torch.tensor([[[-0.0579,  0.3560, -0.9604], [-0.1719,  0.3205,  0.2951]]])
>>> target = torch.tensor([[[ 1.0958, -0.1648,  0.5228], [-0.4100,  1.1942, -0.5103]]])
>>> best_metric, best_perm = pit(preds, target, si_sdr, 'max')
>>> best_metric
tensor([-5.1091])
>>> best_perm
tensor([[0, 1]])
>>> pit_permutate(preds, best_perm)
tensor([[[-0.0579,  0.3560, -0.9604],
         [-0.1719,  0.3205,  0.2951]]])
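The permutation semantics can be sketched directly: if the estimates are just the targets with the two speakers swapped (plus a little noise; the random signals and noise scale below are illustrative assumptions), pit recovers the swapped order:

>>> _ = torch.manual_seed(0)
>>> target = torch.randn(1, 2, 100)
>>> preds = target[:, [1, 0], :] + 0.1 * torch.randn(1, 2, 100)  # speakers swapped
>>> _, best_perm = pit(preds, target, si_sdr, 'max')
>>> best_perm
tensor([[1, 0]])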
References

[1] D. Yu, M. Kolbæk, Z.-H. Tan and J. Jensen, “Permutation Invariant Training of Deep Models for Speaker-Independent Multi-Talker Speech Separation,” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

si_sdr [func]

torchmetrics.functional.si_sdr(preds, target, zero_mean=False)[source]

Calculates the scale-invariant signal-to-distortion ratio (SI-SDR) metric. The SI-SDR value is in general considered an overall measure of how good a source sounds.

Parameters
  • preds (Tensor) – shape [...,time]

  • target (Tensor) – shape [...,time]

  • zero_mean (bool) – whether to zero-mean target and preds before computing the metric

Return type

Tensor

Returns

si-sdr value of shape […]

Example

>>> import torch
>>> from torchmetrics.functional.audio import si_sdr
>>> target = torch.tensor([3.0, -0.5, 2.0, 7.0])
>>> preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
>>> si_sdr_val = si_sdr(preds, target)
>>> si_sdr_val
tensor(18.4030)
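The scale invariance can be checked by rescaling the prediction, which leaves the value unchanged (continuing the example above; the value follows from the definition):

>>> si_sdr(10 * preds, target)
tensor(18.4030)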

References

[1] Le Roux, Jonathan, et al. “SDR half-baked or well done.” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019.

si_snr [func]

torchmetrics.functional.si_snr(preds, target)[source]

Scale-invariant signal-to-noise ratio (SI-SNR).

Parameters
  • preds (Tensor) – shape [...,time]

  • target (Tensor) – shape [...,time]
Return type

Tensor

Returns

si-snr value of shape […]

Example

>>> import torch
>>> from torchmetrics.functional.audio import si_snr
>>> target = torch.tensor([3.0, -0.5, 2.0, 7.0])
>>> preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
>>> si_snr_val = si_snr(preds, target)
>>> si_snr_val
tensor(15.0918)

References

[1] Y. Luo and N. Mesgarani, “TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 696-700, doi: 10.1109/ICASSP.2018.8462116.

snr [func]

torchmetrics.functional.snr(preds, target, zero_mean=False)[source]

Signal-to-noise ratio (SNR):

\text{SNR} = \frac{P_{signal}}{P_{noise}}

where P denotes the power of each signal. The SNR metric compares the level of the desired signal to the level of background noise. Therefore, a high value of SNR means that the audio is clear.

Parameters
  • preds (Tensor) – shape [...,time]

  • target (Tensor) – shape [...,time]

  • zero_mean (bool) – whether to zero-mean target and preds before computing the metric

Return type

Tensor

Returns

snr value of shape […]

Example

>>> import torch
>>> from torchmetrics.functional.audio import snr
>>> target = torch.tensor([3.0, -0.5, 2.0, 7.0])
>>> preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
>>> snr_val = snr(preds, target)
>>> snr_val
tensor(16.1805)

References

[1] Le Roux, Jonathan, et al. “SDR half-baked or well done.” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019.

stoi [func]

torchmetrics.functional.stoi(preds, target, fs, extended=False, keep_same_device=False)[source]

STOI (Short-Time Objective Intelligibility, see [2,3]), a wrapper for the pystoi package [1]. Note that the input will be moved to the CPU to perform the metric calculation.

An intelligibility measure that is highly correlated with the intelligibility of degraded speech signals, e.g. due to additive noise, single-/multi-channel noise reduction, binary masking, and vocoded speech as in CI simulations. The STOI measure is intrusive, i.e. a function of the clean and degraded speech signals. STOI may be a good alternative to the speech intelligibility index (SII) or the speech transmission index (STI) when you are interested in the effect of nonlinear processing of noisy speech (e.g. noise reduction or binary masking algorithms) on speech intelligibility. Description adapted from Cees Taal’s website (http://www.ceestaal.nl/code/).

Note

Using this metric requires you to have pystoi installed. Either install as pip install torchmetrics[audio] or pip install pystoi

Parameters
  • preds (Tensor) – shape [...,time]

  • target (Tensor) – shape [...,time]

  • fs (int) – sampling frequency (Hz)

  • extended (bool) – whether to use the extended STOI described in [4]

  • keep_same_device (bool) – whether to move the stoi value to the device of preds

Return type

Tensor

Returns

stoi value of shape […]

Raises

ValueError – If pystoi package is not installed

Example

>>> from torchmetrics.functional.audio import stoi
>>> import torch
>>> g = torch.manual_seed(1)
>>> preds = torch.randn(8000)
>>> target = torch.randn(8000)
>>> stoi(preds, target, 8000).float()
tensor(-0.0100)

References

[1] https://github.com/mpariente/pystoi

[2] C.H.Taal, R.C.Hendriks, R.Heusdens, J.Jensen ‘A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech’, ICASSP 2010, Texas, Dallas.

[3] C.H.Taal, R.C.Hendriks, R.Heusdens, J.Jensen ‘An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech’, IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[4] J. Jensen and C. H. Taal, ‘An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers’, IEEE Transactions on Audio, Speech and Language Processing, 2016.

Classification Metrics

accuracy [func]

torchmetrics.functional.accuracy(preds, target, average='micro', mdmc_average='global', threshold=0.5, top_k=None, subset_accuracy=False, num_classes=None, multiclass=None, ignore_index=None)[source]

Computes Accuracy

\text{Accuracy} = \frac{1}{N}\sum_i^N 1(y_i = \hat{y}_i)

Where y is a tensor of target values, and \hat{y} is a tensor of predictions.

For multi-class and multi-dimensional multi-class data with probability or logits predictions, the parameter top_k generalizes this metric to a Top-K accuracy metric: for each sample the top-K highest probability or logits items are considered to find the correct label.

For multi-label and multi-dimensional multi-class inputs, this metric computes the “global” accuracy by default, which counts all labels or sub-samples separately. This can be changed to subset accuracy (which requires all labels or sub-samples in the sample to be correctly predicted) by setting subset_accuracy=True.

Accepts all input types listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth labels

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • subset_accuracy (bool) –

    Whether to compute subset accuracy for multi-label and multi-dimensional multi-class inputs (has no effect for other input types).

    • For multi-label inputs, if the parameter is set to True, then all labels for each sample must be correctly predicted for the sample to count as correct. If it is set to False, then all labels are counted separately - this is equivalent to flattening inputs beforehand (i.e. preds = preds.flatten() and same for target).

    • For multi-dimensional multi-class inputs, if the parameter is set to True, then all sub-sample (on the extra axis) must be correct for the sample to be counted as correct. If it is set to False, then all sub-samples are counter separately - this is equivalent, in the case of label predictions, to flattening the inputs beforehand (i.e. preds = preds.flatten() and same for target). Note that the top_k parameter still applies in both cases, if set.

Raises
  • ValueError – If top_k parameter is set for multi-label inputs.

  • ValueError – If average is none of "micro", "macro", "weighted", "samples", "none", None.

  • ValueError – If mdmc_average is not one of None, "samplewise", "global".

  • ValueError – If average is set but num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

  • ValueError – If top_k is not an integer larger than 0.

Example

>>> import torch
>>> from torchmetrics.functional import accuracy
>>> target = torch.tensor([0, 1, 2, 3])
>>> preds = torch.tensor([0, 2, 1, 3])
>>> accuracy(preds, target)
tensor(0.5000)
>>> target = torch.tensor([0, 1, 2])
>>> preds = torch.tensor([[0.1, 0.9, 0], [0.3, 0.1, 0.6], [0.2, 0.5, 0.3]])
>>> accuracy(preds, target, top_k=2)
tensor(0.6667)
Return type

Tensor
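The effect of subset_accuracy on multi-label inputs can be sketched with hand-picked tensors (the values follow from the definitions above: 5 of 6 labels are correct, but only the first sample is fully correct):

>>> target = torch.tensor([[1, 0, 1], [1, 1, 1]])
>>> preds = torch.tensor([[1, 0, 1], [0, 1, 1]])
>>> accuracy(preds, target)
tensor(0.8333)
>>> accuracy(preds, target, subset_accuracy=True)
tensor(0.5000)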

auc [func]

torchmetrics.functional.auc(x, y, reorder=False)[source]

Computes Area Under the Curve (AUC) using the trapezoidal rule.

Parameters
  • x (Tensor) – x-coordinates, must be either increasing or decreasing

  • y (Tensor) – y-coordinates

  • reorder (bool) – if True, will reorder the arrays to make it either increasing or decreasing

Return type

Tensor

Returns

Tensor containing AUC score (float)

Raises
  • ValueError – If both x and y tensors are not 1d.

  • ValueError – If both x and y don’t have the same number of elements.

  • ValueError – If the x tensor is neither increasing nor decreasing.

Example

>>> from torchmetrics.functional import auc
>>> x = torch.tensor([0, 1, 2, 3])
>>> y = torch.tensor([0, 1, 2, 2])
>>> auc(x, y)
tensor(4.)
>>> auc(x, y, reorder=True)
tensor(4.)
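When x is not sorted the trapezoidal rule is not directly applicable; this is where reorder=True is needed. A sketch with the same points in scrambled order (the value follows from sorting by x before integrating):

>>> x = torch.tensor([0, 3, 1, 2])
>>> y = torch.tensor([0, 2, 1, 2])
>>> auc(x, y, reorder=True)
tensor(4.)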

auroc [func]

torchmetrics.functional.auroc(preds, target, num_classes=None, pos_label=None, average='macro', max_fpr=None, sample_weights=None)[source]

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)

For non-binary input, if the preds and target tensors have the same size the input will be interpreted as multilabel; if preds has one dimension more than the target tensor, the input will be interpreted as multiclass.

Note

If either the positive class or the negative class is completely missing in the target tensor, the AUROC score is meaningless and a score of 0 will be returned together with a warning.

Parameters
  • preds (Tensor) – predictions from model (logits or probabilities)

  • target (Tensor) – Ground truth labels

  • num_classes (Optional[int]) – integer with number of classes for multi-label and multiclass problems. Should be set to None for binary problems

  • pos_label (Optional[int]) – integer determining the positive class. Default is None, which for binary problems is translated to 1. For multiclass problems this argument should not be set, as it is iteratively changed in the range [0, num_classes-1]

  • average (Optional[str]) –

    • 'micro' computes metric globally. Only works for multilabel problems

    • 'macro' computes metric for each class and uniformly averages them

    • 'weighted' computes metric for each class and does a weighted-average, where each class is weighted by their support (accounts for class imbalance)

    • None computes and returns the metric per class

  • max_fpr (Optional[float]) – If not None, calculates standardized partial AUC over the range [0, max_fpr]. Should be a float between 0 and 1.

  • sample_weights (Optional[Sequence]) – sample weights for each data point

Raises
  • ValueError – If max_fpr is not a float in the range (0, 1].

  • RuntimeError – If PyTorch version is below 1.6 since max_fpr requires torch.bucketize which is not available below 1.6.

  • ValueError – If max_fpr is not set to None and the mode is not binary since partial AUC computation is not available in multilabel/multiclass.

  • ValueError – If average is none of None, "macro" or "weighted".

Example (binary case):
>>> import torch
>>> from torchmetrics.functional import auroc
>>> preds = torch.tensor([0.13, 0.26, 0.08, 0.19, 0.34])
>>> target = torch.tensor([0, 0, 1, 1, 1])
>>> auroc(preds, target, pos_label=1)
tensor(0.5000)
Example (multiclass case):
>>> preds = torch.tensor([[0.90, 0.05, 0.05],
...                       [0.05, 0.90, 0.05],
...                       [0.05, 0.05, 0.90],
...                       [0.85, 0.05, 0.10],
...                       [0.10, 0.10, 0.80]])
>>> target = torch.tensor([0, 1, 1, 2, 2])
>>> auroc(preds, target, num_classes=3)
tensor(0.7778)
Return type

Tensor

average_precision [func]

torchmetrics.functional.average_precision(preds, target, num_classes=None, pos_label=None, average='macro', sample_weights=None)[source]

Computes the average precision score.

Parameters
  • preds (Tensor) – predictions from model (logits or probabilities)

  • target (Tensor) – ground truth values

  • num_classes (Optional[int]) – integer with number of classes. Not necessary to provide for binary problems.

  • pos_label (Optional[int]) – integer determining the positive class. Default is None, which for binary problems is translated to 1. For multiclass problems this argument should not be set, as it is iteratively changed in the range [0, num_classes-1]

  • average (Optional[str]) –

    defines the reduction that is applied in the case of multiclass and multilabel input. Should be one of the following:

    • 'macro' [default]: Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'micro': Calculate the metric globally, across all samples and classes. Cannot be used with multiclass input.

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support.

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

  • sample_weights (Optional[Sequence]) – sample weights for each data point

Return type

Union[List[Tensor], Tensor]

Returns

tensor with average precision. If multiclass will return list of such tensors, one for each class

Example (binary case):
>>> import torch
>>> from torchmetrics.functional import average_precision
>>> pred = torch.tensor([0, 1, 2, 3])
>>> target = torch.tensor([0, 1, 1, 1])
>>> average_precision(pred, target, pos_label=1)
tensor(1.)
Example (multiclass case):
>>> pred = torch.tensor([[0.75, 0.05, 0.05, 0.05, 0.05],
...                      [0.05, 0.75, 0.05, 0.05, 0.05],
...                      [0.05, 0.05, 0.75, 0.05, 0.05],
...                      [0.05, 0.05, 0.05, 0.75, 0.05]])
>>> target = torch.tensor([0, 1, 3, 2])
>>> average_precision(pred, target, num_classes=5, average=None)
[tensor(1.), tensor(1.), tensor(0.2500), tensor(0.2500), tensor(nan)]

calibration_error [func]

torchmetrics.functional.calibration_error(preds, target, n_bins=15, norm='l1')[source]

Computes the Top-label Calibration Error

Three different norms are implemented, each corresponding to variations on the calibration error metric.

L1 norm (Expected Calibration Error)

\text{ECE} = \frac{1}{N}\sum_i^N \|p_i - c_i\|

Infinity norm (Maximum Calibration Error)

\text{MCE} = \max_{i} (p_i - c_i)

L2 norm (Root Mean Square Calibration Error)

\text{RMSCE} = \sqrt{\frac{1}{N}\sum_i^N (p_i - c_i)^2}

Where p_i is the top-1 prediction accuracy in bin i and c_i is the average confidence of predictions in bin i.

Parameters
  • preds (Tensor) – Model output probabilities.

  • target (Tensor) – Ground-truth target class labels.

  • n_bins (int) – Number of bins to use when computing the calibration error. Defaults to 15.

  • norm (str) – Norm used to compare empirical and expected probability bins. Defaults to 'l1', the Expected Calibration Error.

Return type

Tensor
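A small worked sketch with hand-picked binary probabilities (the value follows from binning the four confidences into the default 15 bins and averaging the per-bin accuracy/confidence gaps):

>>> import torch
>>> from torchmetrics.functional import calibration_error
>>> preds = torch.tensor([0.25, 0.55, 0.75, 0.90])
>>> target = torch.tensor([0, 0, 1, 1])
>>> calibration_error(preds, target, n_bins=15, norm='l1')
tensor(0.2875)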

cohen_kappa [func]

torchmetrics.functional.cohen_kappa(preds, target, num_classes, weights=None, threshold=0.5)[source]
Calculates Cohen’s kappa score that measures inter-annotator agreement.

It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Parameters
  • preds (Tensor) – (float or long tensor), Either a (N, ...) tensor with labels or (N, C, ...) where C is the number of classes, tensor with labels/probabilities

  • target (Tensor) – target (long tensor), tensor with shape (N, ...) with ground truth labels

  • num_classes (int) – Number of classes in the dataset.

  • weights (Optional[str]) –

    Weighting type to calculate the score. Choose from

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • threshold (float) – Threshold value for binary or multi-label probabilities. default: 0.5

Example:
>>> import torch
>>> from torchmetrics.functional import cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> cohen_kappa(preds, target, num_classes=2)
tensor(0.5000)
Return type

Tensor

confusion_matrix [func]

torchmetrics.functional.confusion_matrix(preds, target, num_classes, normalize=None, threshold=0.5, multilabel=False)[source]

Computes the confusion matrix. Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output or integer class values in prediction. Works with multi-dimensional preds and target, but it should be noted that additional dimensions will be flattened.

If preds and target are the same shape and preds is a float tensor, we use the threshold argument to convert into integer labels. This is the case for binary and multi-label probabilities or logits.

If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.

If working with multilabel data, setting the multilabel argument to True will make sure that a confusion matrix gets calculated per label.

Parameters
  • preds (Tensor) – (float or long tensor), Either a (N, ...) tensor with labels or (N, C, ...) where C is the number of classes, tensor with labels/logits/probabilities

  • target (Tensor) – target (long tensor), tensor with shape (N, ...) with ground truth labels

  • num_classes (int) – Number of classes in the dataset.

  • normalize (Optional[str]) –

    Normalization mode for confusion matrix. Choose from

    • None or 'none': no normalization (default)

    • 'true': normalization over the targets (most commonly used)

    • 'pred': normalization over the predictions

    • 'all': normalization over the whole matrix

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • multilabel (bool) – determines if data is multilabel or not.

Example (binary data):
>>> import torch
>>> from torchmetrics.functional import confusion_matrix
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> confusion_matrix(preds, target, num_classes=2)
tensor([[2., 0.],
        [1., 1.]])
Example (multiclass data):
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> confusion_matrix(preds, target, num_classes=3)
tensor([[1., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
Example (multilabel data):
>>> target = torch.tensor([[0, 1, 0], [1, 0, 1]])
>>> preds = torch.tensor([[0, 0, 1], [1, 0, 1]])
>>> confusion_matrix(preds, target, num_classes=3, multilabel=True)
tensor([[[1., 0.], [0., 1.]],
        [[1., 0.], [1., 0.]],
        [[0., 1.], [0., 1.]]])
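The normalize argument rescales these counts; for instance, row-normalizing the multiclass example above (a sketch, the values follow from the raw counts already shown):

>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> confusion_matrix(preds, target, num_classes=3, normalize='true')
tensor([[0.5000, 0.5000, 0.0000],
        [0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 1.0000]])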
Return type

Tensor

dice_score [func]

torchmetrics.functional.dice_score(preds, target, bg=False, nan_score=0.0, no_fg_score=0.0, reduction='elementwise_mean')[source]

Compute dice score from prediction scores.

Parameters
  • preds (Tensor) – estimated probabilities

  • target (Tensor) – ground-truth labels

  • bg (bool) – whether to also compute dice for the background

  • nan_score (float) – score to return, if a NaN occurs during computation

  • no_fg_score (float) – score to return, if no foreground pixel was found in target

  • reduction (str) –

    a method to reduce metric score over labels.

    • 'elementwise_mean': takes the mean (default)

    • 'sum': takes the sum

    • 'none': no reduction will be applied

Return type

Tensor

Returns

Tensor containing dice score

Example

>>> import torch
>>> from torchmetrics.functional import dice_score
>>> pred = torch.tensor([[0.85, 0.05, 0.05, 0.05],
...                      [0.05, 0.85, 0.05, 0.05],
...                      [0.05, 0.05, 0.85, 0.05],
...                      [0.05, 0.05, 0.05, 0.85]])
>>> target = torch.tensor([0, 1, 3, 2])
>>> dice_score(pred, target)
tensor(0.3333)

f1 [func]

torchmetrics.functional.f1(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes F1 metric. F1 corresponds to the harmonic mean of the precision and recall scores.

Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output or integer class values in prediction. Works with multi-dimensional preds and target.

If preds and target are the same shape and preds is a float tensor, we use the self.threshold argument to convert into integer labels. This is the case for binary and multi-label probabilities or logits.

If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.

The reduction method (how the F1 scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Example

>>> import torch
>>> from torchmetrics.functional import f1
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> f1(preds, target, num_classes=3)
tensor(0.3333)
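With average=None the per-class scores are returned instead (continuing the example above; the values follow from the per-class precision and recall):

>>> f1(preds, target, num_classes=3, average=None)
tensor([0.8000, 0.0000, 0.0000])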

fbeta [func]

torchmetrics.functional.fbeta(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes f_beta metric.

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output or integer class values in prediction. Works with multi-dimensional preds and target.

If preds and target are the same shape and preds is a float tensor, we use the self.threshold argument to convert into integer labels. This is the case for binary and multi-label logits or probabilities.

If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.

The reduction method (how the F-beta scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Example

>>> import torch
>>> from torchmetrics.functional import fbeta
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> fbeta(preds, target, num_classes=3, beta=0.5)
tensor(0.3333)

hamming_distance [func]

torchmetrics.functional.hamming_distance(preds, target, threshold=0.5)[source]

Computes the average Hamming distance (also known as Hamming loss) between targets and predictions:

\text{Hamming distance} = \frac{1}{N \cdot L} \sum_i^N \sum_l^L 1(y_{il} \neq \hat{y}_{il})

Where y is a tensor of target values, \hat{y} is a tensor of predictions, and \bullet_{il} refers to the l-th label of the i-th sample of that tensor.

This is the same as 1-accuracy for binary data, while for all other types of inputs it treats each possible label separately - meaning that, for example, multi-class data is treated as if it were multi-label.

Accepts all input types listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

Example

>>> import torch
>>> from torchmetrics.functional import hamming_distance
>>> target = torch.tensor([[0, 1], [1, 1]])
>>> preds = torch.tensor([[0, 1], [0, 1]])
>>> hamming_distance(preds, target)
tensor(0.2500)
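As noted above, multi-class labels are treated as if they were multi-label, each class counting as a separate label. A sketch with hand-picked labels (the value follows from comparing the one-hot encodings: 2 mismatches out of 9 label slots):

>>> target = torch.tensor([0, 1, 2])
>>> preds = torch.tensor([0, 1, 1])
>>> hamming_distance(preds, target)
tensor(0.2222)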
Return type

Tensor

hinge [func]

torchmetrics.functional.hinge(preds, target, squared=False, multiclass_mode=None)[source]

Computes the mean Hinge loss typically used for Support Vector Machines (SVMs).

In the binary case it is defined as:

\text{Hinge loss} = \max(0, 1 - y \times \hat{y})

Where y \in {-1, 1} is the target, and \hat{y} \in \mathbb{R} is the prediction.

In the multi-class case, when multiclass_mode=None (default), multiclass_mode=MulticlassMode.CRAMMER_SINGER or multiclass_mode="crammer-singer", this metric will compute the multi-class hinge loss defined by Crammer and Singer as:

\text{Hinge loss} = \max\left(0, 1 - \hat{y}_y + \max_{i \ne y} (\hat{y}_i)\right)

Where y \in {0, ..., \mathrm{C}} is the target class (where \mathrm{C} is the number of classes), and \hat{y} \in \mathbb{R}^\mathrm{C} is the predicted output per class.

In the multi-class case when multiclass_mode=MulticlassMode.ONE_VS_ALL or multiclass_mode='one-vs-all', this metric will use a one-vs-all approach to compute the hinge loss, giving a vector of C outputs where each entry pits that class against all remaining classes.

This metric can optionally output the mean of the squared hinge loss by setting squared=True

Only accepts inputs with preds shape of (N) (binary) or (N, C) (multi-class) and target shape of (N).

Parameters
  • preds (Tensor) – Predictions from model (as float outputs from decision function).

  • target (Tensor) – Ground truth labels.

  • squared (bool) – If True, this will compute the squared hinge loss. Otherwise, computes the regular hinge loss (default).

  • multiclass_mode (Union[str, MulticlassMode, None]) – Which approach to use for multi-class inputs (has no effect in the binary case). None (default), MulticlassMode.CRAMMER_SINGER or "crammer-singer", uses the Crammer Singer multi-class hinge loss. MulticlassMode.ONE_VS_ALL or "one-vs-all" computes the hinge loss in a one-vs-all fashion.

Raises
  • ValueError – If preds shape is not of size (N) or (N, C).

  • ValueError – If target shape is not of size (N).

  • ValueError – If multiclass_mode is not: None, MulticlassMode.CRAMMER_SINGER, "crammer-singer", MulticlassMode.ONE_VS_ALL or "one-vs-all".

Example (binary case):
>>> import torch
>>> from torchmetrics.functional import hinge
>>> target = torch.tensor([0, 1, 1])
>>> preds = torch.tensor([-2.2, 2.4, 0.1])
>>> hinge(preds, target)
tensor(0.3000)
Example (default / multiclass case):
>>> target = torch.tensor([0, 1, 2])
>>> preds = torch.tensor([[-1.0, 0.9, 0.2], [0.5, -1.1, 0.8], [2.2, -0.5, 0.3]])
>>> hinge(preds, target)
tensor(2.9000)
Example (multiclass example, one vs all mode):
>>> target = torch.tensor([0, 1, 2])
>>> preds = torch.tensor([[-1.0, 0.9, 0.2], [0.5, -1.1, 0.8], [2.2, -0.5, 0.3]])
>>> hinge(preds, target, multiclass_mode="one-vs-all")
tensor([2.2333, 1.5000, 1.2333])
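Setting squared=True squares each per-sample hinge term before averaging; for the binary example above the per-sample losses [0, 0, 0.9] become [0, 0, 0.81]:

>>> target = torch.tensor([0, 1, 1])
>>> preds = torch.tensor([-2.2, 2.4, 0.1])
>>> hinge(preds, target, squared=True)
tensor(0.2700)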
Return type

Tensor

iou [func]

torchmetrics.functional.iou(preds, target, ignore_index=None, absent_score=0.0, threshold=0.5, num_classes=None, reduction='elementwise_mean')[source]

Computes Jaccard index

J(A,B) = \frac{|A\cap B|}{|A\cup B|}

Where: A and B are both tensors of the same size, containing integer class values. They may be subject to conversion from input data (see description below).

Note that it is different from box IoU.

If preds and target are the same shape and preds is a float tensor, we use the threshold argument to convert into integer labels. This is the case for binary and multi-label probabilities.

If pred has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.

Parameters
  • preds (Tensor) – tensor containing predictions from model (probabilities, or labels) with shape [N, d1, d2, ...]

  • target (Tensor) – tensor containing ground truth labels with shape [N, d1, d2, ...]

  • ignore_index (Optional[int]) – optional int specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. Has no effect if given an int that is not in the range [0, num_classes-1], where num_classes is either given or derived from pred and target. By default, no index is ignored, and all classes are used.

  • absent_score (float) – score to use for an individual class, if no instances of the class index were present in pred AND no instances of the class index were present in target. For example, if we have 3 classes, [0, 0] for pred, and [0, 2] for target, then class 1 would be assigned the absent_score.

  • threshold (float) – Threshold value for binary or multi-label probabilities. default: 0.5

  • num_classes (Optional[int]) – Optionally specify the number of classes

  • reduction (str) –

    a method to reduce metric score over labels.

    • 'elementwise_mean': takes the mean (default)

    • 'sum': takes the sum

    • 'none': no reduction will be applied

Returns

IoU score – Tensor containing a single value if reduction is ‘elementwise_mean’, or one value per class if reduction is ‘none’

Return type

Tensor

Example

>>> import torch
>>> from torchmetrics.functional import iou
>>> target = torch.randint(0, 2, (10, 25, 25))
>>> pred = target.clone()
>>> pred[2:5, 7:13, 9:15] = 1 - pred[2:5, 7:13, 9:15]
>>> iou(pred, target)
tensor(0.9660)
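The absent_score case described under the parameters can be sketched directly: with 3 classes and class 1 present in neither tensor, class 1 receives absent_score (the remaining values follow from the per-class intersections and unions):

>>> iou(torch.tensor([0, 0]), torch.tensor([0, 2]), absent_score=0.0, num_classes=3, reduction='none')
tensor([0.5000, 0.0000, 0.0000])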

kl_divergence [func]

torchmetrics.functional.kl_divergence(p, q, log_prob=False, reduction='mean')[source]

Computes KL divergence

D_{KL}(P||Q) = \sum_{x\in\mathcal{X}} P(x) \log\frac{P(x)}{Q(x)}

Where P and Q are probability distributions; P usually represents a distribution over data and Q is often a prior or an approximation of P. It should be noted that the KL divergence is not symmetric, i.e. D_{KL}(P||Q) \neq D_{KL}(Q||P).

Parameters
  • p (Tensor) – data distribution with shape [N, d]

  • q (Tensor) – prior or approximate distribution with shape [N, d]

  • log_prob (bool) – bool indicating if the input is log-probabilities or probabilities. If given as probabilities, the input will be normalized to make sure the distributions sum to 1

  • reduction (Optional[str]) –

    Determines how to reduce over the N/batch dimension:

    • 'mean' [default]: Averages score across samples

    • 'sum': Sum score across samples

    • 'none' or None: Returns score per sample

Example

>>> import torch
>>> from torchmetrics.functional import kl_divergence
>>> p = torch.tensor([[0.36, 0.48, 0.16]])
>>> q = torch.tensor([[1/3, 1/3, 1/3]])
>>> kl_divergence(p, q)
tensor(0.0853)
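The asymmetry noted above can be checked by swapping the arguments (continuing the example; the value follows from the definition):

>>> kl_divergence(q, p)
tensor(0.0975)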
Return type

Tensor

matthews_corrcoef [func]

torchmetrics.functional.matthews_corrcoef(preds, target, num_classes, threshold=0.5)[source]

Calculates Matthews correlation coefficient that measures the general correlation or quality of a classification. In the binary case it is defined as:

MCC = \frac{TP*TN - FP*FN}{\sqrt{(TP+FP)*(TP+FN)*(TN+FP)*(TN+FN)}}

where TP, TN, FP and FN are respectively the true positives, true negatives, false positives and false negatives. Also works in the case of multi-label or multi-class input.

Parameters
  • preds (Tensor) – (float or long tensor), Either a (N, ...) tensor with labels or (N, C, ...) where C is the number of classes, tensor with labels/probabilities

  • target (Tensor) – target (long tensor), tensor with shape (N, ...) with ground truth labels

  • num_classes (int) – Number of classes in the dataset.

  • threshold (float) – Threshold value for binary or multi-label probabilities. default: 0.5

Example

>>> import torch
>>> from torchmetrics.functional import matthews_corrcoef
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> matthews_corrcoef(preds, target, num_classes=2)
tensor(0.5774)
Return type

Tensor

roc [func]

torchmetrics.functional.roc(preds, target, num_classes=None, pos_label=None, sample_weights=None)[source]

Computes the Receiver Operating Characteristic (ROC). Works with both binary, multiclass and multilabel input.

Note

If either the positive class or the negative class is completely missing in the target tensor, the ROC values are not well defined and a tensor of zeros will be returned (either fpr or tpr, depending on which class is missing) together with a warning.

Parameters
  • preds (Tensor) – predictions from model (logits or probabilities)

  • target (Tensor) – ground truth values

  • num_classes (Optional[int]) – integer with number of classes for multi-label and multiclass problems. Should be set to None for binary problems

  • pos_label (Optional[int]) – integer determining the positive class. Default is None, which for binary problems is translated to 1. For multiclass problems this argument should not be set, as it is iteratively changed in the range [0, num_classes-1]

  • sample_weights (Optional[Sequence]) – sample weights for each data point

Return type

Union[Tuple[Tensor, Tensor, Tensor], Tuple[List[Tensor], List[Tensor], List[Tensor]]]

Returns

3-element tuple containing

fpr:

tensor with false positive rates. If multiclass or multilabel, this is a list of such tensors, one for each class/label.

tpr:

tensor with true positive rates. If multiclass or multilabel, this is a list of such tensors, one for each class/label.

thresholds:

tensor with thresholds used for computing false and true positive rates. If multiclass or multilabel, this is a list of such tensors, one for each class/label.

Example (binary case):
>>> import torch
>>> from torchmetrics.functional import roc
>>> pred = torch.tensor([0, 1, 2, 3])
>>> target = torch.tensor([0, 1, 1, 1])
>>> fpr, tpr, thresholds = roc(pred, target, pos_label=1)
>>> fpr
tensor([0., 0., 0., 0., 1.])
>>> tpr
tensor([0.0000, 0.3333, 0.6667, 1.0000, 1.0000])
>>> thresholds
tensor([4, 3, 2, 1, 0])
Example (multiclass case):
>>> from torchmetrics.functional import roc
>>> pred = torch.tensor([[0.75, 0.05, 0.05, 0.05],
...                      [0.05, 0.75, 0.05, 0.05],
...                      [0.05, 0.05, 0.75, 0.05],
...                      [0.05, 0.05, 0.05, 0.75]])
>>> target = torch.tensor([0, 1, 3, 2])
>>> fpr, tpr, thresholds = roc(pred, target, num_classes=4)
>>> fpr
[tensor([0., 0., 1.]), tensor([0., 0., 1.]), tensor([0.0000, 0.3333, 1.0000]), tensor([0.0000, 0.3333, 1.0000])]
>>> tpr
[tensor([0., 1., 1.]), tensor([0., 1., 1.]), tensor([0., 0., 1.]), tensor([0., 0., 1.])]
>>> thresholds 
[tensor([1.7500, 0.7500, 0.0500]),
 tensor([1.7500, 0.7500, 0.0500]),
 tensor([1.7500, 0.7500, 0.0500]),
 tensor([1.7500, 0.7500, 0.0500])]
Example (multilabel case):
>>> from torchmetrics.functional import roc
>>> pred = torch.tensor([[0.8191, 0.3680, 0.1138],
...                      [0.3584, 0.7576, 0.1183],
...                      [0.2286, 0.3468, 0.1338],
...                      [0.8603, 0.0745, 0.1837]])
>>> target = torch.tensor([[1, 1, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1]])
>>> fpr, tpr, thresholds = roc(pred, target, num_classes=3, pos_label=1)
>>> fpr 
[tensor([0.0000, 0.3333, 0.3333, 0.6667, 1.0000]),
 tensor([0., 0., 0., 1., 1.]),
 tensor([0.0000, 0.0000, 0.3333, 0.6667, 1.0000])]
>>> tpr
[tensor([0., 0., 1., 1., 1.]), tensor([0.0000, 0.3333, 0.6667, 0.6667, 1.0000]), tensor([0., 1., 1., 1., 1.])]
>>> thresholds 
[tensor([1.8603, 0.8603, 0.8191, 0.3584, 0.2286]),
 tensor([1.7576, 0.7576, 0.3680, 0.3468, 0.0745]),
 tensor([1.1837, 0.1837, 0.1338, 0.1183, 0.1138])]

precision [func]

torchmetrics.functional.precision(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes Precision

\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}

Where \text{TP} and \text{FP} represent the number of true positives and false positives respectively. With the use of the top_k parameter, this metric can generalize to Precision@K.

The reduction method (how the precision scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Raises
  • ValueError – If average is not one of "micro", "macro", "weighted", "samples", "none" or None.

  • ValueError – If mdmc_average is not one of None, "samplewise", "global".

  • ValueError – If average is set but num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

Example

>>> import torch
>>> from torchmetrics.functional import precision
>>> preds  = torch.tensor([2, 0, 2, 1])
>>> target = torch.tensor([1, 1, 2, 0])
>>> precision(preds, target, average='macro', num_classes=3)
tensor(0.1667)
>>> precision(preds, target, average='micro')
tensor(0.2500)

precision_recall [func]

torchmetrics.functional.precision_recall(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes Precision and Recall

\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}

\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}

Where \text{TP}, \text{FN} and \text{FP} represent the number of true positives, false negatives and false positives respectively. With the use of the top_k parameter, this metric can generalize to Recall@K and Precision@K.

The reduction method (how the recall scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Returns

precision and recall. Their shape depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], they are a single element tensor

  • If average in ['none', None], they are a tensor of shape (C, ), where C stands for the number of classes

Return type

Tuple[Tensor, Tensor]

Raises
  • ValueError – If average is not one of "micro", "macro", "weighted", "samples", "none" or None.

  • ValueError – If mdmc_average is not one of None, "samplewise", "global".

  • ValueError – If average is set but num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

Example

>>> import torch
>>> from torchmetrics.functional import precision_recall
>>> preds  = torch.tensor([2, 0, 2, 1])
>>> target = torch.tensor([1, 1, 2, 0])
>>> precision_recall(preds, target, average='macro', num_classes=3)
(tensor(0.1667), tensor(0.3333))
>>> precision_recall(preds, target, average='micro')
(tensor(0.2500), tensor(0.2500))

precision_recall_curve [func]

torchmetrics.functional.precision_recall_curve(preds, target, num_classes=None, pos_label=None, sample_weights=None)[source]

Computes precision-recall pairs for different thresholds.

Parameters
  • preds (Tensor) – predictions from model (probabilities)

  • target (Tensor) – ground truth labels

  • num_classes (Optional[int]) – integer with number of classes for multi-label and multiclass problems. Should be set to None for binary problems

  • pos_label (Optional[int]) – integer determining the positive class. Default is None, which for binary problems is translated to 1. For multiclass problems this argument should not be set, as it is iteratively changed in the range [0, num_classes-1]

  • sample_weights (Optional[Sequence]) – sample weights for each data point

Return type

Union[Tuple[Tensor, Tensor, Tensor], Tuple[List[Tensor], List[Tensor], List[Tensor]]]

Returns

3-element tuple containing

precision:

tensor where element i is the precision of predictions with score >= thresholds[i] and the last element is 1. If multiclass, this is a list of such tensors, one for each class.

recall:

tensor where element i is the recall of predictions with score >= thresholds[i] and the last element is 0. If multiclass, this is a list of such tensors, one for each class.

thresholds:

Thresholds used for computing precision/recall scores

Raises
  • ValueError – If preds and target don’t have the same number of dimensions, or one additional dimension for preds.

  • ValueError – If the number of classes deduced from preds is not the same as the num_classes provided.

Example (binary case):
>>> from torchmetrics.functional import precision_recall_curve
>>> pred = torch.tensor([0, 1, 2, 3])
>>> target = torch.tensor([0, 1, 1, 0])
>>> precision, recall, thresholds = precision_recall_curve(pred, target, pos_label=1)
>>> precision
tensor([0.6667, 0.5000, 0.0000, 1.0000])
>>> recall
tensor([1.0000, 0.5000, 0.0000, 0.0000])
>>> thresholds
tensor([1, 2, 3])
Example (multiclass case):
>>> pred = torch.tensor([[0.75, 0.05, 0.05, 0.05, 0.05],
...                      [0.05, 0.75, 0.05, 0.05, 0.05],
...                      [0.05, 0.05, 0.75, 0.05, 0.05],
...                      [0.05, 0.05, 0.05, 0.75, 0.05]])
>>> target = torch.tensor([0, 1, 3, 2])
>>> precision, recall, thresholds = precision_recall_curve(pred, target, num_classes=5)
>>> precision   
[tensor([1., 1.]), tensor([1., 1.]), tensor([0.2500, 0.0000, 1.0000]),
 tensor([0.2500, 0.0000, 1.0000]), tensor([0., 1.])]
>>> recall
[tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0., 0.]), tensor([1., 0., 0.]), tensor([nan, 0.])]
>>> thresholds
[tensor([0.7500]), tensor([0.7500]), tensor([0.0500, 0.7500]), tensor([0.0500, 0.7500]), tensor([0.0500])]

recall [func]

torchmetrics.functional.recall(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes Recall

\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}

Where \text{TP} and \text{FN} represent the number of true positives and false negatives respectively. With the use of the top_k parameter, this metric can generalize to Recall@K.

The reduction method (how the recall scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Raises
  • ValueError – If average is not one of "micro", "macro", "weighted", "samples", "none" or None.

  • ValueError – If mdmc_average is not one of None, "samplewise", "global".

  • ValueError – If average is set but num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

Example

>>> from torchmetrics.functional import recall
>>> preds  = torch.tensor([2, 0, 2, 1])
>>> target = torch.tensor([1, 1, 2, 0])
>>> recall(preds, target, average='macro', num_classes=3)
tensor(0.3333)
>>> recall(preds, target, average='micro')
tensor(0.2500)

select_topk [func]

torchmetrics.utilities.data.select_topk(prob_tensor, topk=1, dim=1)[source]

Convert a probability tensor to binary by selecting top-k highest entries.

Parameters
  • prob_tensor (Tensor) – dense tensor of shape [..., C, ...], where C is in the position defined by the dim argument

  • topk (int) – number of highest entries to turn into 1s

  • dim (int) – dimension on which to compare entries

Return type

Tensor

Returns

A binary tensor of the same shape as the input tensor of type torch.int32

Example

>>> x = torch.tensor([[1.1, 2.0, 3.0], [2.0, 1.0, 0.5]])
>>> select_topk(x, topk=2)
tensor([[0, 1, 1],
        [1, 1, 0]], dtype=torch.int32)
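
The dim argument selects the axis along which entries are compared; a hedged continuation of the example above, keeping the single highest entry per column instead:

>>> select_topk(x, topk=1, dim=0)
tensor([[0, 1, 1],
        [1, 0, 0]], dtype=torch.int32)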

specificity [func]

torchmetrics.functional.specificity(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)[source]

Computes Specificity

\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}

Where \text{TN} and \text{FP} represent the number of true negatives and false positives respectively. With the use of the top_k parameter, this metric can generalize to Specificity@K.

The reduction method (how the specificity scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, or labels)

  • target (Tensor) – Ground truth values

  • average (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tn + fp).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold probability value for transforming probability predictions to binary (0,1) predictions, in the case of binary or multi-label inputs

  • top_k (Optional[int]) –

    Number of highest probability entries for each sample to convert to 1s - relevant only for inputs with probability predictions. If this parameter is set for multi-label inputs, it will take precedence over threshold. For (multi-dim) multi-class inputs, this parameter defaults to 1.

    Should be left unset (None) for inputs with label predictions.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Raises
  • ValueError – If average is not one of "micro", "macro", "weighted", "samples", "none" or None.

  • ValueError – If mdmc_average is not one of None, "samplewise", "global".

  • ValueError – If average is set but num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

Example

>>> from torchmetrics.functional import specificity
>>> preds  = torch.tensor([2, 0, 2, 1])
>>> target = torch.tensor([1, 1, 2, 0])
>>> specificity(preds, target, average='macro', num_classes=3)
tensor(0.6111)
>>> specificity(preds, target, average='micro')
tensor(0.6250)

stat_scores [func]

torchmetrics.functional.stat_scores(preds, target, reduce='micro', mdmc_reduce=None, num_classes=None, top_k=None, threshold=0.5, multiclass=None, ignore_index=None)[source]

Computes the number of true positives, false positives, true negatives, false negatives. Related to Type I and Type II errors and the confusion matrix.

The reduction method (how the statistics are aggregated) is controlled by the reduce parameter, and additionally by the mdmc_reduce parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • reduce (str) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Counts the statistics by summing over all [sample, class] combinations (globally). Each statistic is represented by a single integer.

    • 'macro': Counts the statistics for each class separately (over all samples). Each statistic is represented by a (C,) tensor. Requires num_classes to be set.

    • 'samples': Counts the statistics for each sample separately (over all classes). Each statistic is represented by a (N, ) 1d tensor.

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_reduce.

  • num_classes (Optional[int]) – Number of classes. Necessary for (multi-dimensional) multi-class or multi-label data.

  • ignore_index (Optional[int]) – Specify a class (label) to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and reduce='macro', the class statistics for the ignored class will all be returned as -1.

  • mdmc_reduce (Optional[str]) –

    Defines how the multi-dimensional multi-class inputs are handled. Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class (see Input types for the definition of input types).

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then the outputs are concatenated together. In each sample the extra axes ... are flattened to become the sub-sample axis, and statistics for each sample are computed by treating the sub-sample axis as the N axis for that sample.

    • 'global': In this case the N and ... dimensions of the inputs are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the reduce parameter applies as usual.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The metric returns a tensor of shape (..., 5), where the last dimension corresponds to [tp, fp, tn, fn, sup] (sup stands for support and equals tp + fn). The shape depends on the reduce and mdmc_reduce (in case of multi-dimensional multi-class data) parameters:

  • If the data is not multi-dimensional multi-class, then

    • If reduce='micro', the shape will be (5, )

    • If reduce='macro', the shape will be (C, 5), where C stands for the number of classes

    • If reduce='samples', the shape will be (N, 5), where N stands for the number of samples

  • If the data is multi-dimensional multi-class and mdmc_reduce='global', then

    • If reduce='micro', the shape will be (5, )

    • If reduce='macro', the shape will be (C, 5)

    • If reduce='samples', the shape will be (N*X, 5), where X stands for the product of sizes of all “extra” dimensions of the data (i.e. all dimensions except for C and N)

  • If the data is multi-dimensional multi-class and mdmc_reduce='samplewise', then

    • If reduce='micro', the shape will be (N, 5)

    • If reduce='macro', the shape will be (N, C, 5)

    • If reduce='samples', the shape will be (N, X, 5)

Raises
  • ValueError – If reduce is none of "micro", "macro" or "samples".

  • ValueError – If mdmc_reduce is none of None, "samplewise", "global".

  • ValueError – If reduce is set to "macro" and num_classes is not provided.

  • ValueError – If num_classes is set and ignore_index is not in the range [0, num_classes).

  • ValueError – If ignore_index is used with binary data.

  • ValueError – If inputs are multi-dimensional multi-class and mdmc_reduce is not provided.

Example

>>> from torchmetrics.functional import stat_scores
>>> preds  = torch.tensor([1, 0, 2, 1])
>>> target = torch.tensor([1, 1, 2, 0])
>>> stat_scores(preds, target, reduce='macro', num_classes=3)
tensor([[0, 1, 2, 1, 1],
        [1, 1, 1, 1, 2],
        [1, 0, 3, 0, 1]])
>>> stat_scores(preds, target, reduce='micro')
tensor([2, 2, 6, 2, 4])
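
As a sketch of the multi-dimensional multi-class shapes documented above (the input values are made up): with reduce='macro' and mdmc_reduce='samplewise', the result has shape (N, C, 5).

>>> preds  = torch.tensor([[0, 2, 1], [2, 1, 0]])  # [N=2, d=3] integer labels
>>> target = torch.tensor([[0, 1, 1], [2, 2, 0]])
>>> out = stat_scores(preds, target, reduce='macro', num_classes=3, mdmc_reduce='samplewise')
>>> out.shape
torch.Size([2, 3, 5])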

to_categorical [func]

torchmetrics.utilities.data.to_categorical(x, argmax_dim=1)[source]

Converts a tensor of probabilities to a dense label tensor.

Parameters
  • x (Tensor) – probabilities to get the categorical label [N, d1, d2, …]

  • argmax_dim (int) – dimension over which to apply argmax

Return type

Tensor

Returns

A tensor with categorical labels [N, d2, …]

Example

>>> x = torch.tensor([[0.2, 0.5], [0.9, 0.1]])
>>> to_categorical(x)
tensor([1, 0])

to_onehot [func]

torchmetrics.utilities.data.to_onehot(label_tensor, num_classes=None)[source]

Converts a dense label tensor to one-hot format.

Parameters
  • label_tensor (Tensor) – dense label tensor, with shape [N, d1, d2, …]

  • num_classes (Optional[int]) – number of classes C

Return type

Tensor

Returns

A one-hot label tensor with shape [N, C, d1, d2, …]

Example

>>> x = torch.tensor([1, 2, 3])
>>> to_onehot(x)
tensor([[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]])
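
Since to_categorical takes the argmax over the class dimension, it inverts to_onehot; a small sanity-check sketch:

>>> import torch
>>> from torchmetrics.utilities.data import to_onehot, to_categorical
>>> labels = torch.tensor([1, 2, 0])
>>> torch.equal(to_categorical(to_onehot(labels, num_classes=3)), labels)
True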

Image Metrics

image_gradients [func]

torchmetrics.functional.image_gradients(img)[source]

Computes the gradients of a given image using finite differences.

Parameters

img (Tensor) – An (N, C, H, W) input tensor where C is the number of image channels

Return type

Tuple[Tensor, Tensor]

Returns

Tuple of (dy, dx) with each gradient of shape [N, C, H, W]

Raises

Example

>>> from torchmetrics.functional import image_gradients
>>> image = torch.arange(0, 1*1*5*5, dtype=torch.float32)
>>> image = torch.reshape(image, (1, 1, 5, 5))
>>> dy, dx = image_gradients(image)
>>> dy[0, 0, :, :]
tensor([[5., 5., 5., 5., 5.],
        [5., 5., 5., 5., 5.],
        [5., 5., 5., 5., 5.],
        [5., 5., 5., 5., 5.],
        [0., 0., 0., 0., 0.]])

Note

The implementation follows the 1-step finite difference method, as in the TensorFlow implementation. The values are organized such that the gradient of [I(x+1, y) - I(x, y)] is at the (x, y) location.
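
A hedged continuation of the example above: by the same 1-step scheme along the width, the horizontal gradient dx is 1 everywhere except in the (zero-padded) last column.

>>> dx[0, 0, :, :]
tensor([[1., 1., 1., 1., 0.],
        [1., 1., 1., 1., 0.],
        [1., 1., 1., 1., 0.],
        [1., 1., 1., 1., 0.],
        [1., 1., 1., 1., 0.]])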

psnr [func]

torchmetrics.functional.psnr(preds, target, data_range=None, base=10.0, reduction='elementwise_mean', dim=None)[source]

Computes the peak signal-to-noise ratio.

Parameters
  • preds (Tensor) – estimated signal

  • target (Tensor) – ground truth signal

  • data_range (Optional[float]) – the range of the data. If None, it is determined from the data (max - min). data_range must be given when dim is not None.

  • base (float) – a base of a logarithm to use (default: 10)

  • reduction (str) –

    a method to reduce metric score over labels.

    • 'elementwise_mean': takes the mean (default)

    • 'sum': takes the sum

    • 'none': no reduction will be applied

  • dim (Union[int, Tuple[int, …], None]) – Dimensions to reduce PSNR scores over provided as either an integer or a list of integers. Default is None meaning scores will be reduced across all dimensions.

Return type

Tensor

Returns

Tensor with PSNR score

Raises

ValueError – If dim is not None and data_range is not provided.

Example

>>> from torchmetrics.functional import psnr
>>> pred = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
>>> target = torch.tensor([[3.0, 2.0], [1.0, 0.0]])
>>> psnr(pred, target)
tensor(2.5527)

Note

Half precision is only supported on GPU for this metric.
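
As a manual check of the example above (assuming the data range is inferred from the target as max - min), PSNR reduces to 10 * log10(data_range**2 / MSE):

>>> mse = torch.mean((pred - target) ** 2)    # 5.0
>>> data_range = target.max() - target.min()  # 3.0
>>> 10 * torch.log10(data_range ** 2 / mse)
tensor(2.5527)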

ssim [func]

torchmetrics.functional.ssim(preds, target, kernel_size=(11, 11), sigma=(1.5, 1.5), reduction='elementwise_mean', data_range=None, k1=0.01, k2=0.03)[source]

Computes the Structural Similarity Index Measure (SSIM).

Parameters
  • preds (Tensor) – estimated image

  • target (Tensor) – ground truth image

  • kernel_size (Sequence[int]) – size of the gaussian kernel (default: (11, 11))

  • sigma (Sequence[float]) – Standard deviation of the gaussian kernel (default: (1.5, 1.5))

  • reduction (str) –

    a method to reduce metric score over labels.

    • 'elementwise_mean': takes the mean (default)

    • 'sum': takes the sum

    • 'none': no reduction will be applied

  • data_range (Optional[float]) – Range of the image. If None, it is determined from the image (max - min)

  • k1 (float) – Parameter of SSIM. Default: 0.01

  • k2 (float) – Parameter of SSIM. Default: 0.03

Return type

Tensor

Returns

Tensor with SSIM score

Raises
  • TypeError – If preds and target don’t have the same data type.

  • ValueError – If preds and target don’t have BxCxHxW shape.

  • ValueError – If the length of kernel_size or sigma is not 2.

  • ValueError – If one of the elements of kernel_size is not an odd positive number.

  • ValueError – If one of the elements of sigma is not a positive number.

Example

>>> from torchmetrics.functional import ssim
>>> preds = torch.rand([16, 1, 16, 16])
>>> target = preds * 0.75
>>> ssim(preds, target)
tensor(0.9219)

Regression Metrics

cosine_similarity [func]

torchmetrics.functional.cosine_similarity(preds, target, reduction='sum')[source]

Computes the Cosine Similarity between targets and predictions:

cos_{sim}(x,y) = \frac{x \cdot y}{||x|| \cdot ||y||} =
\frac{\sum_{i=1}^n x_i y_i}{\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}}

where y is a tensor of target values, and x is a tensor of predictions.

Parameters
  • preds (Tensor) – Predicted tensor with shape (N,d)

  • target (Tensor) – Ground truth tensor with shape (N,d)

  • reduction (str) – The method of reducing along the batch dimension: 'sum', 'mean', or 'none' to return the individual scores

Example

>>> from torchmetrics.functional.regression import cosine_similarity
>>> target = torch.tensor([[1, 2, 3, 4],
...                        [1, 2, 3, 4]])
>>> preds = torch.tensor([[1, 2, 3, 4],
...                       [-1, -2, -3, -4]])
>>> cosine_similarity(preds, target, 'none')
tensor([ 1.0000, -1.0000])
Return type

Tensor

explained_variance [func]

torchmetrics.functional.explained_variance(preds, target, multioutput='uniform_average')[source]

Computes explained variance.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

  • multioutput (str) –

    Defines aggregation in the case of multiple output scores. Can be one of the following strings (default is 'uniform_average'):

    • 'raw_values' returns full set of scores

    • 'uniform_average' scores are uniformly averaged

    • 'variance_weighted' scores are weighted by their individual variances

Example

>>> from torchmetrics.functional import explained_variance
>>> target = torch.tensor([3, -0.5, 2, 7])
>>> preds = torch.tensor([2.5, 0.0, 2, 8])
>>> explained_variance(preds, target)
tensor(0.9572)
>>> target = torch.tensor([[0.5, 1], [-1, 1], [7, -6]])
>>> preds = torch.tensor([[0, 2], [-1, 2], [8, -5]])
>>> explained_variance(preds, target, multioutput='raw_values')
tensor([0.9677, 1.0000])
Return type

Union[Tensor, Sequence[Tensor]]
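
As a worked check of the first example above, assuming the standard definition ExplainedVariance = 1 - Var(target - preds) / Var(target): the errors are [0.5, -0.5, 0, -1] with variance 0.3125, the target variance is 7.2969, and 1 - 0.3125/7.2969 ≈ 0.9572, matching the output.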

mean_absolute_error [func]

torchmetrics.functional.mean_absolute_error(preds, target)[source]

Computes mean absolute error.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

Return type

Tensor

Returns

Tensor with MAE

Example

>>> from torchmetrics.functional import mean_absolute_error
>>> x = torch.tensor([0., 1, 2, 3])
>>> y = torch.tensor([0., 1, 2, 2])
>>> mean_absolute_error(x, y)
tensor(0.2500)

mean_absolute_percentage_error [func]

torchmetrics.functional.mean_absolute_percentage_error(preds, target)[source]

Computes mean absolute percentage error.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

Return type

Tensor

Returns

Tensor with MAPE

Note

The epsilon value is taken from scikit-learn’s implementation of MAPE.

Example

>>> from torchmetrics.functional import mean_absolute_percentage_error
>>> target = torch.tensor([1, 10, 1e6])
>>> preds = torch.tensor([0.9, 15, 1.2e6])
>>> mean_absolute_percentage_error(preds, target)
tensor(0.2667)
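
As a worked check (assuming the standard MAPE definition, with an epsilon guarding the denominator): the per-element absolute percentage errors are |1 - 0.9| / 1 = 0.1, |10 - 15| / 10 = 0.5 and |1e6 - 1.2e6| / 1e6 = 0.2, whose mean is 0.8 / 3 ≈ 0.2667, matching the output above.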

mean_squared_error [func]

torchmetrics.functional.mean_squared_error(preds, target, squared=True)[source]

Computes mean squared error.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

  • squared (bool) – returns RMSE value if set to False

Return type

Tensor

Returns

Tensor with MSE

Example

>>> from torchmetrics.functional import mean_squared_error
>>> x = torch.tensor([0., 1, 2, 3])
>>> y = torch.tensor([0., 1, 2, 2])
>>> mean_squared_error(x, y)
tensor(0.2500)
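
Setting squared=False returns the root mean squared error instead; for the example above this is sqrt(0.25):

>>> mean_squared_error(x, y, squared=False)
tensor(0.5000)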

mean_squared_log_error [func]

torchmetrics.functional.mean_squared_log_error(preds, target)[source]

Computes mean squared log error.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

Return type

Tensor

Returns

Tensor with MSLE

Example

>>> from torchmetrics.functional import mean_squared_log_error
>>> x = torch.tensor([0., 1, 2, 3])
>>> y = torch.tensor([0., 1, 2, 2])
>>> mean_squared_log_error(x, y)
tensor(0.0207)

Note

Half precision is only supported on GPU for this metric.

pearson_corrcoef [func]

torchmetrics.functional.pearson_corrcoef(preds, target)[source]

Computes the Pearson correlation coefficient.

Parameters
  • preds (Tensor) – estimated scores

  • target (Tensor) – ground truth scores

Example

>>> from torchmetrics.functional import pearson_corrcoef
>>> target = torch.tensor([3, -0.5, 2, 7])
>>> preds = torch.tensor([2.5, 0.0, 2, 8])
>>> pearson_corrcoef(preds, target)
tensor(0.9849)
Return type

Tensor

r2_score [func]

torchmetrics.functional.r2_score(preds, target, adjusted=0, multioutput='uniform_average')[source]

Computes the R^2 score, also known as the coefficient of determination:

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

where SS_{res}=\sum_i (y_i - f(x_i))^2 is the residual sum of squares, and SS_{tot}=\sum_i (y_i - \bar{y})^2 is the total sum of squares. The adjusted R^2 score can also be calculated, given by

R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-k-1}

where the parameter k (the number of independent regressors) should be provided as the adjusted argument.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

  • adjusted (int) – number of independent regressors for calculating adjusted r2 score. Default 0 (standard r2 score).

  • multioutput (str) –

    Defines aggregation in the case of multiple output scores. Can be one of the following strings (default is 'uniform_average'.):

    • 'raw_values' returns full set of scores

    • 'uniform_average' scores are uniformly averaged

    • 'variance_weighted' scores are weighted by their individual variances

Raises
  • ValueError – If both preds and targets are not 1D or 2D tensors.

  • ValueError – If len(preds) is less than 2, since at least 2 samples are needed to calculate the r2 score.

  • ValueError – If multioutput is not one of raw_values, uniform_average or variance_weighted.

  • ValueError – If adjusted is not an integer greater than 0.

Example

>>> from torchmetrics.functional import r2_score
>>> target = torch.tensor([3, -0.5, 2, 7])
>>> preds = torch.tensor([2.5, 0.0, 2, 8])
>>> r2_score(preds, target)
tensor(0.9486)
>>> target = torch.tensor([[0.5, 1], [-1, 1], [7, -6]])
>>> preds = torch.tensor([[0, 2], [-1, 2], [8, -5]])
>>> r2_score(preds, target, multioutput='raw_values')
tensor([0.9654, 0.9082])
Return type

Tensor
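
A hedged usage sketch of the adjusted score: with n = 4 samples and k = 1 regressor, the formula above gives 1 - (1 - 0.9486) * 3 / 2 ≈ 0.9229.

>>> target = torch.tensor([3, -0.5, 2, 7])
>>> preds = torch.tensor([2.5, 0.0, 2, 8])
>>> r2_score(preds, target, adjusted=1)
tensor(0.9229)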

spearman_corrcoef [func]

torchmetrics.functional.spearman_corrcoef(preds, target)[source]

Computes Spearman's rank correlation coefficient:

r_{spearman} = \frac{cov(rg_x, rg_y)}{\sigma_{rg_x} \sigma_{rg_y}}

where rg_x and rg_y are the ranks associated with the variables x and y. Spearman's correlation coefficient is equal to the standard Pearson correlation coefficient computed on the rank variables.

Parameters
  • preds (Tensor) – estimated scores

  • target (Tensor) – ground truth scores

Example

>>> from torchmetrics.functional import spearman_corrcoef
>>> target = torch.tensor([3, -0.5, 2, 7])
>>> preds = torch.tensor([2.5, 0.0, 2, 8])
>>> spearman_corrcoef(preds, target)
tensor(1.0000)
Return type

Tensor

symmetric_mean_absolute_percentage_error [func]

torchmetrics.functional.symmetric_mean_absolute_percentage_error(preds, target)[source]

Computes symmetric mean absolute percentage error (SMAPE):

\text{SMAPE} = \frac{2}{n}\sum_{i=1}^n\frac{| y_i - \hat{y_i} |}{\max(| y_i | + | \hat{y_i} |, \epsilon)}

Where y is a tensor of target values, and \hat{y} is a tensor of predictions.

Parameters
  • preds (Tensor) – estimated labels

  • target (Tensor) – ground truth labels

Return type

Tensor

Returns

Tensor with SMAPE.

Example

>>> from torchmetrics.functional import symmetric_mean_absolute_percentage_error
>>> target = torch.tensor([1, 10, 1e6])
>>> preds = torch.tensor([0.9, 15, 1.2e6])
>>> symmetric_mean_absolute_percentage_error(preds, target)
tensor(0.2290)
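
As a worked check against the formula above: the per-element terms are 0.1/1.9 ≈ 0.0526, 5/25 = 0.2 and 0.2e6/2.2e6 ≈ 0.0909, and (2/3) times their sum is ≈ 0.2290, matching the output.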

tweedie_deviance_score [func]

torchmetrics.functional.tweedie_deviance_score(preds, targets, power=0.0)[source]

Computes the Tweedie Deviance Score between targets and predictions:

deviance\_score(\hat{y},y) =
\begin{cases}
(\hat{y} - y)^2, & \text{for }power=0\\
2 * (y * log(\frac{y}{\hat{y}}) + \hat{y} - y),  & \text{for }power=1\\
2 * (log(\frac{\hat{y}}{y}) + \frac{y}{\hat{y}} - 1),  & \text{for }power=2\\
2 * (\frac{(max(y,0))^{2 - power}}{(1 - power)(2 - power)} - \frac{y(\hat{y})^{1 - power}}{1 - power} + \frac{(\hat{y})^{2 - power}}{2 - power}), & \text{otherwise}
\end{cases}

where y is a tensor of target values, and \hat{y} is a tensor of predictions.

Parameters
  • preds (Tensor) – Predicted tensor with shape (N,...)

  • targets (Tensor) – Ground truth tensor with shape (N,...)

  • power (float) –

    • power < 0 : Extreme stable distribution. (Requires: preds > 0.)

    • power = 0 : Normal distribution. (Requires: targets and preds can be any real numbers.)

    • power = 1 : Poisson distribution. (Requires: targets >= 0 and preds > 0.)

    • 1 < power < 2 : Compound Poisson distribution. (Requires: targets >= 0 and preds > 0.)

    • power = 2 : Gamma distribution. (Requires: targets > 0 and preds > 0.)

    • power = 3 : Inverse Gaussian distribution. (Requires: targets > 0 and preds > 0.)

    • otherwise : Positive stable distribution. (Requires: targets > 0 and preds > 0.)

Example

>>> from torchmetrics.functional import tweedie_deviance_score
>>> targets = torch.tensor([1.0, 2.0, 3.0, 4.0])
>>> preds = torch.tensor([4.0, 3.0, 2.0, 1.0])
>>> tweedie_deviance_score(preds, targets, power=2)
tensor(1.2083)
Return type

Tensor
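
As a manual check of the example (power=2, the Gamma case of the formula above), each element contributes 2 * (log(preds/targets) + targets/preds - 1):

>>> targets = torch.tensor([1.0, 2.0, 3.0, 4.0])
>>> preds = torch.tensor([4.0, 3.0, 2.0, 1.0])
>>> torch.mean(2 * (torch.log(preds / targets) + targets / preds - 1))
tensor(1.2083)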

Pairwise Metrics

pairwise_cosine_similarity [func]

torchmetrics.functional.pairwise_cosine_similarity(x, y=None, reduction=None, zero_diagonal=None)[source]

Calculates pairwise cosine similarity:

s_{cos}(x,y) = \frac{<x,y>}{||x|| \cdot ||y||}
             = \frac{\sum_{d=1}^D x_d \cdot y_d }{\sqrt{\sum_{d=1}^D x_d^2} \cdot \sqrt{\sum_{d=1}^D y_d^2}}

If both x and y are passed in, the calculation will be performed pairwise between the rows of x and y. If only x is passed in, the calculation will be performed between the rows of x.

Parameters
  • x (Tensor) – Tensor with shape [N, d]

  • y (Optional[Tensor]) – Tensor with shape [M, d], optional

  • reduction (Optional[str]) – reduction to apply along the last dimension. Choose between ‘mean’, ‘sum’ (applied along column dimension) or ‘none’, None for no reduction

  • zero_diagonal (Optional[bool]) – if the diagonal of the distance matrix should be set to 0. If only x is given this defaults to True else if y is also given it defaults to False

Return type

Tensor

Returns

A [N,N] matrix of distances if only x is given, else a [N,M] matrix

Example

>>> import torch
>>> from torchmetrics.functional import pairwise_cosine_similarity
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_cosine_similarity(x, y)
tensor([[0.5547, 0.8682],
        [0.5145, 0.8437],
        [0.5300, 0.8533]])
>>> pairwise_cosine_similarity(x)
tensor([[0.0000, 0.9989, 0.9996],
        [0.9989, 0.0000, 0.9998],
        [0.9996, 0.9998, 0.0000]])
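
Note that when only x is given, zero_diagonal defaults to True, which is why the self-similarities on the diagonal of the second example appear as 0.0000 rather than 1.0000; pass zero_diagonal=False to keep them.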

pairwise_euclidean_distance [func]

torchmetrics.functional.pairwise_euclidean_distance(x, y=None, reduction=None, zero_diagonal=None)[source]

Calculates pairwise euclidean distances:

d_{euc}(x,y) = ||x - y||_2 = \sqrt{\sum_{d=1}^D (x_d - y_d)^2}

If both x and y are passed in, the calculation will be performed pairwise between the rows of x and y. If only x is passed in, the calculation will be performed between the rows of x.

Parameters
  • x (Tensor) – Tensor with shape [N, d]

  • y (Optional[Tensor]) – Tensor with shape [M, d], optional

  • reduction (Optional[str]) – reduction to apply along the last dimension. Choose between ‘mean’, ‘sum’ (applied along column dimension) or ‘none’, None for no reduction

  • zero_diagonal (Optional[bool]) – if the diagonal of the distance matrix should be set to 0. If only x is given this defaults to True else if y is also given it defaults to False

Return type

Tensor

Returns

A [N,N] matrix of distances if only x is given, else a [N,M] matrix

Example

>>> import torch
>>> from torchmetrics.functional import pairwise_euclidean_distance
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_euclidean_distance(x, y)
tensor([[3.1623, 2.0000],
        [5.3852, 4.1231],
        [8.9443, 7.6158]])
>>> pairwise_euclidean_distance(x)
tensor([[0.0000, 2.2361, 5.8310],
        [2.2361, 0.0000, 3.6056],
        [5.8310, 3.6056, 0.0000]])

pairwise_linear_similarity [func]

torchmetrics.functional.pairwise_linear_similarity(x, y=None, reduction=None, zero_diagonal=None)[source]

Calculates pairwise linear similarity:

s_{lin}(x,y) = <x,y> = \sum_{d=1}^D x_d \cdot y_d

If both x and y are passed in, the calculation will be performed pairwise between the rows of x and y. If only x is passed in, the calculation will be performed between the rows of x.

Parameters
  • x (Tensor) – Tensor with shape [N, d]

  • y (Optional[Tensor]) – Tensor with shape [M, d], optional

  • reduction (Optional[str]) – reduction to apply along the last dimension. Choose between ‘mean’, ‘sum’ (applied along column dimension) or ‘none’, None for no reduction

  • zero_diagonal (Optional[bool]) – if the diagonal of the distance matrix should be set to 0. If only x is given this defaults to True else if y is also given it defaults to False

Return type

Tensor

Returns

A [N,N] matrix of distances if only x is given, else a [N,M] matrix

Example

>>> import torch
>>> from torchmetrics.functional import pairwise_linear_similarity
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_linear_similarity(x, y)
tensor([[ 2.,  7.],
        [ 3., 11.],
        [ 5., 18.]])
>>> pairwise_linear_similarity(x)
tensor([[ 0., 21., 34.],
        [21.,  0., 55.],
        [34., 55.,  0.]])

pairwise_manhatten_distance [func]

torchmetrics.functional.pairwise_manhatten_distance(x, y=None, reduction=None, zero_diagonal=None)[source]

Calculates pairwise Manhattan distance:

d_{man}(x,y) = ||x-y||_1 = \sum_{d=1}^D |x_d - y_d|

If both x and y are passed in, the calculation will be performed pairwise between the rows of x and y. If only x is passed in, the calculation will be performed between the rows of x.

Parameters
  • x (Tensor) – Tensor with shape [N, d]

  • y (Optional[Tensor]) – Tensor with shape [M, d], optional

  • reduction (Optional[str]) – reduction to apply along the last dimension. Choose between ‘mean’, ‘sum’ (applied along column dimension) or ‘none’, None for no reduction

  • zero_diagonal (Optional[bool]) – if the diagonal of the distance matrix should be set to 0. If only x is given this defaults to True else if y is also given it defaults to False

Return type

Tensor

Returns

A [N,N] matrix of distances if only x is given, else a [N,M] matrix

Example

>>> import torch
>>> from torchmetrics.functional import pairwise_manhatten_distance
>>> x = torch.tensor([[2, 3], [3, 5], [5, 8]], dtype=torch.float32)
>>> y = torch.tensor([[1, 0], [2, 1]], dtype=torch.float32)
>>> pairwise_manhatten_distance(x, y)
tensor([[ 4.,  2.],
        [ 7.,  5.],
        [12., 10.]])
>>> pairwise_manhatten_distance(x)
tensor([[0., 3., 8.],
        [3., 0., 5.],
        [8., 5., 0.]])

Retrieval

retrieval_average_precision [func]

torchmetrics.functional.retrieval_average_precision(preds, target)[source]

Computes average precision (for information retrieval), as explained in IR Average precision.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

Return type

Tensor

Returns

a single-value tensor with the average precision (AP) of the predictions preds w.r.t. the labels target.

Example

>>> from torchmetrics.functional import retrieval_average_precision
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_average_precision(preds, target)
tensor(0.8333)
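
For this example, sorting by score gives the relevance sequence [True, False, True]; precision at the relevant ranks is 1/1 and 2/3, so AP = (1 + 2/3) / 2 ≈ 0.8333.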

retrieval_reciprocal_rank [func]

torchmetrics.functional.retrieval_reciprocal_rank(preds, target)[source]

Computes reciprocal rank (for information retrieval). See Mean Reciprocal Rank

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

Return type

Tensor

Returns

a single-value tensor with the reciprocal rank (RR) of the predictions preds w.r.t. the labels target.

Example

>>> from torchmetrics.functional import retrieval_reciprocal_rank
>>> preds = torch.tensor([0.2, 0.3, 0.5])
>>> target = torch.tensor([False, True, False])
>>> retrieval_reciprocal_rank(preds, target)
tensor(0.5000)
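
Here the highest-scored document (0.5) is not relevant and the first relevant document appears at rank 2, so the reciprocal rank is 1/2.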

retrieval_precision [func]

torchmetrics.functional.retrieval_precision(preds, target, k=None)[source]

Computes the precision metric (for information retrieval). Precision is the fraction of relevant documents among all the retrieved documents.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised. If you want to measure Precision@K, k must be a positive integer.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

  • k (Optional[int]) – consider only the top k elements (default: None, which considers them all)

Return type

Tensor

Returns

a single-value tensor with the precision (at k) of the predictions preds w.r.t. the labels target.

Raises

ValueError – If the k parameter is neither None nor an integer larger than 0

Example

>>> from torchmetrics.functional import retrieval_precision
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_precision(preds, target, k=2)
tensor(0.5000)

retrieval_r_precision [func]

torchmetrics.functional.retrieval_r_precision(preds, target)[source]

Computes the r-precision metric (for information retrieval). R-Precision is the fraction of relevant documents among all the top k retrieved documents, where k is equal to the total number of relevant documents.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

Return type

Tensor

Returns

a single-value tensor with the r-precision of the predictions preds w.r.t. the labels target.

Example

>>> from torchmetrics.functional import retrieval_r_precision
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_r_precision(preds, target)
tensor(0.5000)

retrieval_recall [func]

torchmetrics.functional.retrieval_recall(preds, target, k=None)[source]

Computes the recall metric (for information retrieval). Recall is the fraction of relevant documents retrieved among all the relevant documents.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised. If you want to measure Recall@K, k must be a positive integer.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

  • k (Optional[int]) – consider only the top k elements (default: None, which considers them all)

Return type

Tensor

Returns

a single-value tensor with the recall (at k) of the predictions preds w.r.t. the labels target.

Raises

ValueError – If the k parameter is neither None nor an integer larger than 0

Example

>>> from torchmetrics.functional import retrieval_recall
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_recall(preds, target, k=2)
tensor(0.5000)

retrieval_fall_out [func]

torchmetrics.functional.retrieval_fall_out(preds, target, k=None)[source]

Computes the Fall-out (for information retrieval), as explained in IR Fall-out. Fall-out is the fraction of non-relevant documents retrieved among all the non-relevant documents.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised. If you want to measure Fall-out@K, k must be a positive integer.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

  • k (Optional[int]) – consider only the top k elements (default: None, which considers them all)

Return type

Tensor

Returns

a single-value tensor with the fall-out (at k) of the predictions preds w.r.t. the labels target.

Raises

ValueError – If the k parameter is neither None nor an integer larger than 0

Example

>>> from torchmetrics.functional import retrieval_fall_out
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_fall_out(preds, target, k=2)
tensor(1.)

retrieval_normalized_dcg [func]

torchmetrics.functional.retrieval_normalized_dcg(preds, target, k=None)[source]

Computes Normalized Discounted Cumulative Gain (for information retrieval).

preds and target should be of the same shape and live on the same device. target must be either bool or integers and preds must be float, otherwise an error is raised.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document relevance.

  • k (Optional[int]) – consider only the top k elements (default: None, which considers them all)

Return type

Tensor

Returns

a single-value tensor with the nDCG of the predictions preds w.r.t. the labels target.

Raises

ValueError – If the k parameter is neither None nor an integer larger than 0

Example

>>> from torchmetrics.functional import retrieval_normalized_dcg
>>> preds = torch.tensor([.1, .2, .3, 4, 70])
>>> target = torch.tensor([10, 0, 0, 1, 5])
>>> retrieval_normalized_dcg(preds, target)
tensor(0.6957)

retrieval_hit_rate [func]

torchmetrics.functional.retrieval_hit_rate(preds, target, k=None)[source]

Computes the hit rate (for information retrieval). The hit rate is 1.0 if there is at least one relevant document among all the top k retrieved documents.

preds and target should be of the same shape and live on the same device. If no target is True, 0 is returned. target must be either bool or integers and preds must be float, otherwise an error is raised. If you want to measure HitRate@K, k must be a positive integer.

Parameters
  • preds (Tensor) – estimated probabilities of each document to be relevant.

  • target (Tensor) – ground truth about each document being relevant or not.

  • k (Optional[int]) – consider only the top k elements (default: None, which considers them all)

Return type

Tensor

Returns

a single-value tensor with the hit rate (at k) of the predictions preds w.r.t. the labels target.

Raises

ValueError – If the k parameter is neither None nor an integer larger than 0

Example

>>> from torchmetrics.functional import retrieval_hit_rate
>>> from torch import tensor
>>> preds = tensor([0.2, 0.3, 0.5])
>>> target = tensor([True, False, True])
>>> retrieval_hit_rate(preds, target, k=2)
tensor(1.)

Text

bert_score [func]

torchmetrics.functional.bert_score(predictions, references, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, idf=False, device=None, max_length=512, batch_size=64, num_threads=4, return_hash=False, lang='en', rescale_with_baseline=False, baseline_path=None, baseline_url=None)[source]

BERTScore (Evaluating Text Generation with BERT) leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment on sentence-level and system-level evaluation. Moreover, BERTScore computes precision, recall, and F1 measure, which can be useful for evaluating different language generation tasks.

This implementation follows the original implementation from BERT_score.

Parameters
  • predictions (Union[List[str], Dict[str, Tensor]]) – Either an iterable of predicted sentences or a Dict[str, torch.Tensor] containing input_ids and attention_mask torch.Tensor.

  • references (Union[List[str], Dict[str, Tensor]]) – Either an iterable of target sentences or a Dict[str, torch.Tensor] containing input_ids and attention_mask torch.Tensor.

  • model_name_or_path (Optional[str]) – A name or a model path used to load transformers pretrained model.

  • num_layers (Optional[int]) – A layer of representation to use.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers = True, the argument num_layers is ignored.

  • model (Optional[Module]) – A user’s own model. Must be of torch.nn.Module instance.

  • user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the own model. This must be an instance with the __call__ method. This method must take an iterable of sentences (List[str]) and must return a python dictionary containing “input_ids” and “attention_mask” represented by torch.Tensor. It is up to the user’s model whether “input_ids” is a torch.Tensor of input ids or embedding vectors. This tokenizer must prepend an equivalent of [CLS] token and append an equivalent of [SEP] token as the transformers tokenizer does.

  • user_forward_fn (Optional[Callable[[Module, Dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with user_model. This function must take user_model and a python dictionary containing “input_ids” and “attention_mask” represented by torch.Tensor as an input and return the model’s output represented by a single torch.Tensor.

  • verbose (bool) – An indication of whether a progress bar to be displayed during the embeddings calculation.

  • idf (bool) – An indication of whether normalization using inverse document frequencies should be used.

  • device (Union[str, device, None]) – A device to be used for calculation.

  • max_length (int) – A maximum length of input sequences. Sequences longer than max_length are to be trimmed.

  • batch_size (int) – A batch size used for model processing.

  • num_threads (int) – A number of threads to use for a dataloader.

  • return_hash (bool) – An indication of whether the corresponding hash_code should be returned.

  • lang (str) – A language of input sentences. It is used when the scores are rescaled with a baseline.

  • rescale_with_baseline (bool) – An indication of whether bertscore should be rescaled with a pre-computed baseline. When a pretrained model from transformers model is used, the corresponding baseline is downloaded from the original bert-score package from BERT_score if available. In other cases, please specify a path to the baseline csv/tsv file, which must follow the formatting of the files from BERT_score

  • baseline_path (Optional[str]) – A path to the user’s own local csv/tsv file with the baseline scale.

  • baseline_url (Optional[str]) – A url path to the user’s own csv/tsv file with the baseline scale.

Return type

Dict[str, Union[List[float], str]]

Returns

Python dictionary containing the keys precision, recall and f1 with corresponding values.

Raises
  • ValueError – If len(predictions) != len(references).

  • ValueError – If tqdm package is required and not installed.

  • ValueError – If transformers package is required and not installed.

  • ValueError – If num_layer is larger than the number of the model layers.

  • ValueError – If invalid input is provided.

Example

>>> from torchmetrics.functional import bert_score
>>> predictions = ["hello there", "general kenobi"]
>>> references = ["hello there", "master kenobi"]
>>> bert_score(predictions=predictions, references=references, lang="en")  
{'precision': [0.99..., 0.99...],
 'recall': [0.99..., 0.99...],
 'f1': [0.99..., 0.99...]}

bleu_score [func]

torchmetrics.functional.bleu_score(reference_corpus, translate_corpus, n_gram=4, smooth=False)[source]

Calculate BLEU score of machine translated text with one or more references.

Parameters
  • reference_corpus (Sequence[Sequence[Sequence[str]]]) – An iterable of iterables of reference corpus

  • translate_corpus (Sequence[Sequence[str]]) – An iterable of machine translated corpus

  • n_gram (int) – Gram value ranged from 1 to 4 (Default 4)

  • smooth (bool) – Whether or not to apply smoothing – see [2]

Return type

Tensor

Returns

Tensor with BLEU Score

Example

>>> from torchmetrics.functional import bleu_score
>>> translate_corpus = ['the cat is on the mat'.split()]
>>> reference_corpus = [['there is a cat on the mat'.split(), 'a cat is on the mat'.split()]]
>>> bleu_score(reference_corpus, translate_corpus)
tensor(0.7598)

References

[1] BLEU: a Method for Automatic Evaluation of Machine Translation by Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu BLEU

[2] Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics by Chin-Yew Lin and Franz Josef Och Machine Translation Evolution

char_error_rate [func]

torchmetrics.functional.char_error_rate(predictions, references)[source]

Character error rate (CER) is a common metric of the performance of an automatic speech recognition system. This value indicates the percentage of characters that were incorrectly predicted. The lower the value, the better the performance of the ASR system, with a CER of 0 being a perfect score.

Parameters
  • predictions (Union[str, List[str]]) – Transcription(s) to score as a string or list of strings

  • references (Union[str, List[str]]) – Reference(s) for each speech input as a string or list of strings

Return type

Tensor

Returns

(Tensor) Character error rate

Examples

>>> from torchmetrics.functional import char_error_rate
>>> predictions = ["this is the prediction", "there is an other sample"]
>>> references = ["this is the reference", "there is another one"]
>>> char_error_rate(predictions=predictions, references=references)
tensor(0.3415)

rouge_score [func]

torchmetrics.functional.rouge_score(preds, targets, use_stemmer=False, rouge_keys=('rouge1', 'rouge2', 'rougeL', 'rougeLsum'))[source]

Calculate ROUGE score, used for automatic summarization.

Parameters
  • preds (Union[str, List[str]]) – An iterable of predicted sentences.

  • targets (Union[str, List[str]]) – An iterable of target sentences.

  • use_stemmer (bool) – Use Porter stemmer to strip word suffixes to improve matching.

  • rouge_keys (Union[str, Tuple[str, …]]) – A list of rouge types to calculate. Keys that are allowed are rougeL, rougeLsum, and rouge1 through rouge9.

Return type

Dict[str, Tensor]

Returns

Python dictionary of rouge scores for each input rouge key.

Example

>>> from torchmetrics.functional import rouge_score
>>> targets = "Is your name John".split()
>>> preds = "My name is John".split()
>>> from pprint import pprint
>>> pprint(rouge_score(preds, targets))  
{'rouge1_fmeasure': 0.25,
 'rouge1_precision': 0.25,
 'rouge1_recall': 0.25,
 'rouge2_fmeasure': 0.0,
 'rouge2_precision': 0.0,
 'rouge2_recall': 0.0,
 'rougeL_fmeasure': 0.25,
 'rougeL_precision': 0.25,
 'rougeL_recall': 0.25,
 'rougeLsum_fmeasure': 0.25,
 'rougeLsum_precision': 0.25,
 'rougeLsum_recall': 0.25}
Raises
  • ValueError – If the python package nltk is not installed.

  • ValueError – If any of the rouge_keys does not belong to the allowed set of keys.

References

[1] ROUGE: A Package for Automatic Evaluation of Summaries by Chin-Yew Lin. https://aclanthology.org/W04-1013/

sacre_bleu_score [func]

torchmetrics.functional.sacre_bleu_score(reference_corpus, translate_corpus, n_gram=4, smooth=False, tokenize='13a', lowercase=False)[source]

Calculate BLEU score [1] of machine translated text with one or more references. This implementation follows the behaviour of SacreBLEU [2] implementation from https://github.com/mjpost/sacrebleu.

Parameters
  • reference_corpus (Sequence[Sequence[str]]) – An iterable of iterables of reference corpus

  • translate_corpus (Sequence[str]) – An iterable of machine translated corpus

  • n_gram (int) – Gram value ranged from 1 to 4 (Default 4)

  • smooth (bool) – Whether or not to apply smoothing – see [2]

  • tokenize (Literal[‘none’, ‘13a’, ‘zh’, ‘intl’, ‘char’]) – Tokenization technique to be used. (Default ‘13a’) Supported tokenization: [‘none’, ‘13a’, ‘zh’, ‘intl’, ‘char’]

  • lowercase (bool) – If True, BLEU score over lowercased text is calculated.

Return type

Tensor

Returns

Tensor with BLEU Score

Example

>>> from torchmetrics.functional import sacre_bleu_score
>>> translate_corpus = ['the cat is on the mat']
>>> reference_corpus = [['there is a cat on the mat', 'a cat is on the mat']]
>>> sacre_bleu_score(reference_corpus, translate_corpus)
tensor(0.7598)

References

[1] BLEU: a Method for Automatic Evaluation of Machine Translation by Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu BLEU

[2] A Call for Clarity in Reporting BLEU Scores by Matt Post.

[3] Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics by Chin-Yew Lin and Franz Josef Och Machine Translation Evolution

wer [func]

torchmetrics.functional.wer(predictions, references, concatenate_texts=None)[source]

Word error rate (WER) is a common metric of the performance of an automatic speech recognition system. This value indicates the percentage of words that were incorrectly predicted. The lower the value, the better the performance of the ASR system with a WER of 0 being a perfect score.

Parameters
  • predictions (Union[str, List[str]]) – Transcription(s) to score as a string or list of strings

  • references (Union[str, List[str]]) – Reference(s) for each speech input as a string or list of strings

  • concatenate_texts (Optional[bool]) – Whether to concatenate all input texts or compute WER iteratively. This argument is deprecated in v0.6 and will be removed in v0.7.

Return type

Tensor

Returns

(Tensor) Word error rate

Examples

>>> from torchmetrics.functional import wer
>>> predictions = ["this is the prediction", "there is an other sample"]
>>> references = ["this is the reference", "there is another one"]
>>> wer(predictions=predictions, references=references)
tensor(0.5000)
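
As a worked check: the first pair needs one word substitution (prediction → reference) and the second needs three edits (two substitutions and one extra word), i.e. 4 errors against 8 reference words in total, giving 4/8 = 0.5.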