F-Beta Score

Module Interface

FBetaScore

class torchmetrics.FBetaScore(num_classes=None, beta=1.0, threshold=0.5, average='micro', mdmc_average=None, ignore_index=None, top_k=None, multiclass=None, **kwargs)

F-Beta Score.

Note

From v0.10, 'binary_*', 'multiclass_*' and 'multilabel_*' versions of each classification metric exist. Moving forward we recommend using these versions. This base metric will still work as it did prior to v0.10 until v0.11. From v0.11 the task argument introduced in this metric will be required and the general order of arguments may change, such that this metric will just function as a single entrypoint for calling the three specialized versions.

Computes F-score, specifically:

F_\beta = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Where \beta is some positive real factor. Works with binary, multiclass, and multilabel data. Accepts logit scores or probabilities from a model output or integer class values in prediction. Works with multi-dimensional preds and target.
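
As a quick illustration of the formula above, the following plain-Python sketch evaluates F_\beta for a single precision/recall pair. The helper name f_beta is hypothetical and not part of torchmetrics, and returning 0.0 when the denominator is zero is an assumption made only for this sketch.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Evaluate the F-beta formula for a single precision/recall pair."""
    if beta <= 0:
        raise ValueError("beta must be a positive real number")
    beta_sq = beta ** 2
    denom = beta_sq * precision + recall
    # Assumption for this sketch: report 0.0 when the score is undefined.
    if denom == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denom


# With precision 0.5 and recall 1.0, a larger beta rewards the high recall.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(0.5, 1.0, beta), 4))
# 0.5 0.5556
# 1.0 0.6667
# 2.0 0.8333
```

Values of beta above 1 weight recall more heavily, values below 1 weight precision more heavily.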

Forward accepts

  • preds (float or long tensor): (N, ...) or (N, C, ...) where C is the number of classes

  • target (long tensor): (N, ...)

If preds and target are the same shape and preds is a float tensor, we use the self.threshold argument to convert into integer labels. This is the case for binary and multi-label logits and probabilities.

If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.
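
The two conversion rules above can be illustrated in plain Python with nested lists standing in for tensors. This is only a sketch of the described behaviour, not the library's code: threshold_labels and argmax_labels are hypothetical helper names, and using >= for the threshold comparison is an assumption about how ties are resolved.

```python
def threshold_labels(preds, threshold=0.5):
    """Same-shape float preds (binary / multi-label): compare to threshold."""
    # Assumption: a value exactly equal to the threshold counts as positive.
    return [int(p >= threshold) for p in preds]


def argmax_labels(scores):
    """Preds with an extra class dimension: pick the highest-scoring class."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


binary_probs = [0.11, 0.22, 0.84, 0.73, 0.33, 0.92]
print(threshold_labels(binary_probs))  # [0, 0, 1, 1, 0, 1]

class_scores = [[0.16, 0.26, 0.58], [0.22, 0.61, 0.17], [0.71, 0.09, 0.20]]
print(argmax_labels(class_scores))  # [2, 1, 0]
```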

Parameters
  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • beta (float) – Beta coefficient in the F measure.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Raises

ValueError – If average is not one of "micro", "macro", "weighted", "none", None.

Example

>>> import torch
>>> from torchmetrics import FBetaScore
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> f_beta = FBetaScore(num_classes=3, beta=0.5)
>>> f_beta(preds, target)
tensor(0.3333)
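
To make the average options concrete, the sketch below recomputes the example above by hand from per-class counts, using plain lists instead of tensors. The helper names are hypothetical, the count-based form (1 + \beta^2) tp / ((1 + \beta^2) tp + \beta^2 fn + fp) is the same formula rewritten in terms of tp, fp and fn, and scoring an empty class as 0.0 is an assumption of this sketch.

```python
def class_counts(preds, target, num_classes):
    """Return per-class (tp, fp, fn) counts for integer label lists."""
    counts = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, target))
        fp = sum(p == c and t != c for p, t in zip(preds, target))
        fn = sum(p != c and t == c for p, t in zip(preds, target))
        counts.append((tp, fp, fn))
    return counts


def f_beta_from_counts(tp, fp, fn, beta):
    """F-beta written in terms of counts; 0.0 if the denominator is zero."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


target = [0, 1, 2, 0, 1, 2]
preds = [0, 2, 1, 0, 0, 1]
counts = class_counts(preds, target, num_classes=3)
per_class = [f_beta_from_counts(tp, fp, fn, beta=0.5) for tp, fp, fn in counts]

# 'micro': pool the counts over all classes, then compute one score.
micro = f_beta_from_counts(*(sum(col) for col in zip(*counts)), beta=0.5)
# 'macro': unweighted mean of the per-class scores.
macro = sum(per_class) / len(per_class)
# 'weighted': mean of the per-class scores weighted by support (tp + fn).
support = [tp + fn for tp, _, fn in counts]
weighted = sum(s * f for s, f in zip(support, per_class)) / sum(support)

print(round(micro, 4))  # 0.3333, the value returned in the example above
```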

Initializes internal Module state, shared by both nn.Module and ScriptModule.

compute()

Computes f-beta over state.

Return type

Tensor
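
compute() works on state that earlier update or forward calls have accumulated. The following plain-Python sketch shows that pattern for the binary case only. RunningFBeta is a hypothetical stand-in, and keeping running tp/fp/fn counts as the state is an assumption made for illustration, not a description of the library's internals.

```python
class RunningFBeta:
    """Hypothetical stand-in for a metric that keeps tp/fp/fn as state."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.tp = self.fp = self.fn = 0

    def update(self, preds, target):
        # Accumulate binary counts from one batch of 0/1 labels.
        for p, t in zip(preds, target):
            self.tp += int(p == 1 and t == 1)
            self.fp += int(p == 1 and t == 0)
            self.fn += int(p == 0 and t == 1)

    def compute(self):
        # Turn the accumulated state into a single score.
        b2 = self.beta ** 2
        denom = (1 + b2) * self.tp + b2 * self.fn + self.fp
        return (1 + b2) * self.tp / denom if denom else 0.0


metric = RunningFBeta(beta=2.0)
metric.update([0, 0, 1], [0, 1, 0])  # first batch
metric.update([1, 0, 1], [1, 0, 1])  # second batch
print(round(metric.compute(), 4))  # 0.6667
```

Because only counts are stored, the result does not depend on how the data is split into batches.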

BinaryFBetaScore

class torchmetrics.classification.BinaryFBetaScore(beta, threshold=0.5, multidim_average='global', ignore_index=None, validate_args=True, **kwargs)

Computes F-score metric for binary tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds (int or float tensor): (N, ...). If preds is a floating point tensor with values outside the [0,1] range, the input is treated as logits and sigmoid is applied per element. Additionally, the result is converted to an int tensor by thresholding with the value of threshold.

  • target (int tensor): (N, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.

Parameters
  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • threshold (float) – Threshold for transforming probability to binary {0,1} predictions

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Returns

If multidim_average is set to global, the metric returns a scalar value. If multidim_average is set to samplewise, the metric returns (N,) vector consisting of a scalar value per sample.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.classification import BinaryFBetaScore
>>> target = torch.tensor([0, 1, 0, 1, 0, 1])
>>> preds = torch.tensor([0, 0, 1, 1, 0, 1])
>>> metric = BinaryFBetaScore(beta=2.0)
>>> metric(preds, target)
tensor(0.6667)
Example (preds is float tensor):
>>> from torchmetrics.classification import BinaryFBetaScore
>>> target = torch.tensor([0, 1, 0, 1, 0, 1])
>>> preds = torch.tensor([0.11, 0.22, 0.84, 0.73, 0.33, 0.92])
>>> metric = BinaryFBetaScore(beta=2.0)
>>> metric(preds, target)
tensor(0.6667)
Example (multidim tensors):
>>> from torchmetrics.classification import BinaryFBetaScore
>>> target = torch.tensor([[[0, 1], [1, 0], [0, 1]], [[1, 1], [0, 0], [1, 0]]])
>>> preds = torch.tensor(
...     [
...         [[0.59, 0.91], [0.91, 0.99], [0.63, 0.04]],
...         [[0.38, 0.04], [0.86, 0.780], [0.45, 0.37]],
...     ]
... )
>>> metric = BinaryFBetaScore(beta=2.0, multidim_average='samplewise')
>>> metric(preds, target)
tensor([0.5882, 0.0000])
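
The difference between the two multidim_average modes can be checked by hand on the multidim example above. The sketch below uses plain nested lists and hypothetical helpers (binary_f_beta, flatten). The strict > comparison against the threshold and the 0.0 result for an empty denominator are assumptions of the sketch.

```python
def binary_f_beta(preds, target, beta, threshold=0.5):
    """F-beta for flat lists of probabilities and 0/1 targets."""
    labels = [int(p > threshold) for p in preds]
    tp = sum(y == 1 and t == 1 for y, t in zip(labels, target))
    fp = sum(y == 1 and t == 0 for y, t in zip(labels, target))
    fn = sum(y == 0 and t == 1 for y, t in zip(labels, target))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def flatten(nested):
    """Flatten one level of nesting into a flat list."""
    return [x for row in nested for x in row]


target = [[[0, 1], [1, 0], [0, 1]], [[1, 1], [0, 0], [1, 0]]]
preds = [
    [[0.59, 0.91], [0.91, 0.99], [0.63, 0.04]],
    [[0.38, 0.04], [0.86, 0.78], [0.45, 0.37]],
]

# 'samplewise': one score per entry along the N axis.
samplewise = [binary_f_beta(flatten(p), flatten(t), beta=2.0) for p, t in zip(preds, target)]
# 'global': flatten everything, including N, into one long batch.
global_score = binary_f_beta(
    flatten([flatten(p) for p in preds]), flatten([flatten(t) for t in target]), beta=2.0
)
print([round(s, 4) for s in samplewise])  # [0.5882, 0.0]
```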

compute()

Computes the final statistics.

Return type

Tensor

Returns

The F-beta score as a tensor. Its shape depends on the multidim_average parameter:

  • If multidim_average is set to global, a scalar tensor is returned

  • If multidim_average is set to samplewise, the shape will be (N,)

MulticlassFBetaScore

class torchmetrics.classification.MulticlassFBetaScore(beta, num_classes, top_k=1, average='macro', multidim_average='global', ignore_index=None, validate_args=True, **kwargs)

Computes F-score metric for multiclass tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds: (N, ...) (int tensor) or (N, C, ...) (float tensor). If preds is a floating point tensor, we apply torch.argmax along the C dimension to automatically convert probabilities/logits into an int tensor.

  • target (int tensor): (N, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.

Parameters
  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • num_classes (int) – Integer specifying the number of classes

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied over labels. Should be one of the following:

    • micro: Sum statistics over all labels

    • macro: Calculate statistics for each label and average them

    • weighted: Calculates statistics for each label and computes weighted average using their support

    • "none" or None: Calculates statistic for each label and applies no reduction

  • top_k (int) – Number of highest probability or logit score predictions considered to find the correct label. Only works when preds contain probabilities/logits.

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Returns

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

Return type

The returned shape depends on the average and multidim_average arguments

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.classification import MulticlassFBetaScore
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3)
>>> metric(preds, target)
tensor(0.7963)
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3, average=None)
>>> metric(preds, target)
tensor([0.5556, 0.8333, 1.0000])
Example (preds is float tensor):
>>> from torchmetrics.classification import MulticlassFBetaScore
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([
...   [0.16, 0.26, 0.58],
...   [0.22, 0.61, 0.17],
...   [0.71, 0.09, 0.20],
...   [0.05, 0.82, 0.13],
... ])
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3)
>>> metric(preds, target)
tensor(0.7963)
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3, average=None)
>>> metric(preds, target)
tensor([0.5556, 0.8333, 1.0000])
Example (multidim tensors):
>>> from torchmetrics.classification import MulticlassFBetaScore
>>> target = torch.tensor([[[0, 1], [2, 1], [0, 2]], [[1, 1], [2, 0], [1, 2]]])
>>> preds = torch.tensor([[[0, 2], [2, 0], [0, 1]], [[2, 2], [2, 1], [1, 0]]])
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3, multidim_average='samplewise')
>>> metric(preds, target)
tensor([0.4697, 0.2706])
>>> metric = MulticlassFBetaScore(beta=2.0, num_classes=3, multidim_average='samplewise', average=None)
>>> metric(preds, target)
tensor([[0.9091, 0.0000, 0.5000],
        [0.0000, 0.3571, 0.4545]])
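
The (N, C) table and the per-sample macro scores of the multidim example above can be reproduced by hand. This is a plain-Python sketch with a hypothetical helper, per_class_f_beta. It flattens the extra dimensions of each sample and, as an assumption of the sketch, scores a class with an empty denominator as 0.0.

```python
def per_class_f_beta(preds, target, num_classes, beta):
    """Per-class F-beta for flat lists of integer labels."""
    b2 = beta ** 2
    scores = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, target))
        fp = sum(p == c and t != c for p, t in zip(preds, target))
        fn = sum(p != c and t == c for p, t in zip(preds, target))
        denom = (1 + b2) * tp + b2 * fn + fp
        scores.append((1 + b2) * tp / denom if denom else 0.0)
    return scores


target = [[[0, 1], [2, 1], [0, 2]], [[1, 1], [2, 0], [1, 2]]]
preds = [[[0, 2], [2, 0], [0, 1]], [[2, 2], [2, 1], [1, 0]]]

per_sample = []
for p, t in zip(preds, target):
    flat_p = [x for row in p for x in row]  # extra dims of one sample
    flat_t = [x for row in t for x in row]
    per_sample.append(per_class_f_beta(flat_p, flat_t, num_classes=3, beta=2.0))

# average=None keeps the (N, C) table; 'macro' averages over the class axis.
macro = [sum(scores) / len(scores) for scores in per_sample]
print([[round(s, 4) for s in scores] for scores in per_sample])
# [[0.9091, 0.0, 0.5], [0.0, 0.3571, 0.4545]]
print([round(m, 4) for m in macro])  # [0.4697, 0.2706]
```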

compute()

Computes the final statistics.

Return type

Tensor

Returns

The F-beta score as a tensor. Its shape depends on the average and multidim_average parameters:

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

MultilabelFBetaScore

class torchmetrics.classification.MultilabelFBetaScore(beta, num_labels, threshold=0.5, average='macro', multidim_average='global', ignore_index=None, validate_args=True, **kwargs)

Computes F-score metric for multilabel tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds (int or float tensor): (N, C, ...). If preds is a floating point tensor with values outside the [0,1] range, the input is treated as logits and sigmoid is applied per element. Additionally, the result is converted to an int tensor by thresholding with the value of threshold.

  • target (int tensor): (N, C, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.

Parameters
  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • num_labels (int) – Integer specifying the number of labels

  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied over labels. Should be one of the following:

    • micro: Sum statistics over all labels

    • macro: Calculate statistics for each label and average them

    • weighted: Calculates statistics for each label and computes weighted average using their support

    • "none" or None: Calculates statistic for each label and applies no reduction

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Returns

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

Return type

The returned shape depends on the average and multidim_average arguments

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.classification import MultilabelFBetaScore
>>> target = torch.tensor([[0, 1, 0], [1, 0, 1]])
>>> preds = torch.tensor([[0, 0, 1], [1, 0, 1]])
>>> metric = MultilabelFBetaScore(beta=2.0, num_labels=3)
>>> metric(preds, target)
tensor(0.6111)
>>> metric = MultilabelFBetaScore(beta=2.0, num_labels=3, average=None)
>>> metric(preds, target)
tensor([1.0000, 0.0000, 0.8333])
Example (preds is float tensor):
>>> from torchmetrics.classification import MultilabelFBetaScore
>>> target = torch.tensor([[0, 1, 0], [1, 0, 1]])
>>> preds = torch.tensor([[0.11, 0.22, 0.84], [0.73, 0.33, 0.92]])
>>> metric = MultilabelFBetaScore(beta=2.0, num_labels=3)
>>> metric(preds, target)
tensor(0.6111)
>>> metric = MultilabelFBetaScore(beta=2.0, num_labels=3, average=None)
>>> metric(preds, target)
tensor([1.0000, 0.0000, 0.8333])
Example (multidim tensors):
>>> from torchmetrics.classification import MultilabelFBetaScore
>>> target = torch.tensor([[[0, 1], [1, 0], [0, 1]], [[1, 1], [0, 0], [1, 0]]])
>>> preds = torch.tensor(
...     [
...         [[0.59, 0.91], [0.91, 0.99], [0.63, 0.04]],
...         [[0.38, 0.04], [0.86, 0.780], [0.45, 0.37]],
...     ]
... )
>>> metric = MultilabelFBetaScore(num_labels=3, beta=2.0, multidim_average='samplewise')
>>> metric(preds, target)
tensor([0.5556, 0.0000])
>>> metric = MultilabelFBetaScore(num_labels=3, beta=2.0, multidim_average='samplewise', average=None)
>>> metric(preds, target)
tensor([[0.8333, 0.8333, 0.0000],
        [0.0000, 0.0000, 0.0000]])
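
In the multilabel case every label column is scored as its own binary problem before the average reduction is applied. The plain-Python sketch below recomputes the two-dimensional float example above. multilabel_f_beta is a hypothetical helper, and the strict > threshold comparison and the 0.0 fallback for an empty denominator are assumptions of the sketch.

```python
def multilabel_f_beta(preds, target, beta, threshold=0.5):
    """Per-label F-beta for (N, C) nested lists of probabilities/targets."""
    b2 = beta ** 2
    num_labels = len(target[0])
    scores = []
    for c in range(num_labels):
        col_p = [int(row[c] > threshold) for row in preds]  # thresholded column
        col_t = [row[c] for row in target]
        tp = sum(p == 1 and t == 1 for p, t in zip(col_p, col_t))
        fp = sum(p == 1 and t == 0 for p, t in zip(col_p, col_t))
        fn = sum(p == 0 and t == 1 for p, t in zip(col_p, col_t))
        denom = (1 + b2) * tp + b2 * fn + fp
        scores.append((1 + b2) * tp / denom if denom else 0.0)
    return scores


target = [[0, 1, 0], [1, 0, 1]]
preds = [[0.11, 0.22, 0.84], [0.73, 0.33, 0.92]]
per_label = multilabel_f_beta(preds, target, beta=2.0)
print([round(s, 4) for s in per_label])  # [1.0, 0.0, 0.8333]
print(round(sum(per_label) / len(per_label), 4))  # 0.6111 ('macro')
```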

compute()

Computes the final statistics.

Return type

Tensor

Returns

The F-beta score as a tensor. Its shape depends on the average and multidim_average parameters:

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

Functional Interface

fbeta_score

torchmetrics.functional.fbeta_score(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None, task=None, num_labels=None, multidim_average='global', validate_args=True)

F-Beta score.

Note

From v0.10, 'binary_*', 'multiclass_*' and 'multilabel_*' versions of each classification metric exist. Moving forward we recommend using these versions. This base metric will still work as it did prior to v0.10 until v0.11. From v0.11 the task argument introduced in this metric will be required and the general order of arguments may change, such that this metric will just function as a single entrypoint for calling the three specialized versions.

Computes f_beta metric.

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output or integer class values in prediction. Works with multi-dimensional preds and target.

If preds and target are the same shape and preds is a float tensor, we use the threshold argument to convert into integer labels. This is the case for binary and multi-label logits or probabilities.

If preds has an extra dimension as in the case of multi-class scores we perform an argmax on dim=1.

The reduction method (how the F-beta scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters
  • preds (Tensor) – Predictions from model (probabilities, logits or labels)

  • target (Tensor) – Ground truth values

  • beta (float) – beta coefficient

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied. Should be one of the following:

    • 'micro' [default]: Calculate the metric globally, across all samples and classes.

    • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).

    • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).

    • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.

    • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

    Note

    What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

    Note

    If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.

  • mdmc_average (Optional[str]) –

    Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

    • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

    • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.

    • 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

  • ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

  • num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • top_k (Optional[int]) –

    Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

    Should be left at default (None) for all other types of inputs.

  • multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter

  • If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned

  • If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Example

>>> import torch
>>> from torchmetrics.functional import fbeta_score
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> fbeta_score(preds, target, num_classes=3, beta=0.5)
tensor(0.3333)

binary_fbeta_score

torchmetrics.functional.classification.binary_fbeta_score(preds, target, beta, threshold=0.5, multidim_average='global', ignore_index=None, validate_args=True)

Computes F-score metric for binary tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds (int or float tensor): (N, ...). If preds is a floating point tensor with values outside the [0,1] range, the input is treated as logits and sigmoid is applied per element. Additionally, the result is converted to an int tensor by thresholding with the value of threshold.

  • target (int tensor): (N, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.
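
The preprocessing of float preds described above can be sketched in plain Python. to_binary_labels is a hypothetical helper, not a torchmetrics function. Treating the whole input as logits as soon as any value falls outside [0,1], and comparing with a strict >, are assumptions about details the text does not spell out.

```python
import math


def to_binary_labels(preds, threshold=0.5):
    """Convert float predictions to 0/1 labels as described above."""
    # Assumption: any value outside [0, 1] marks the whole input as logits.
    if any(p < 0.0 or p > 1.0 for p in preds):
        preds = [1.0 / (1.0 + math.exp(-p)) for p in preds]  # sigmoid
    return [int(p > threshold) for p in preds]


print(to_binary_labels([0.11, 0.22, 0.84, 0.73]))  # [0, 0, 1, 1]
print(to_binary_labels([-2.0, 0.3, 1.5, -0.1]))  # [0, 1, 1, 0]
```

Note that 0.3 maps to 1 in the second call: once the input is treated as logits, sigmoid(0.3) is about 0.57, which exceeds the default threshold.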

Parameters
  • preds (Tensor) – Tensor with predictions

  • target (Tensor) – Tensor with true labels

  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • threshold (float) – Threshold for transforming probability to binary {0,1} predictions

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Return type

Tensor

Returns

If multidim_average is set to global, the metric returns a scalar value. If multidim_average is set to samplewise, the metric returns (N,) vector consisting of a scalar value per sample.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.functional.classification import binary_fbeta_score
>>> target = torch.tensor([0, 1, 0, 1, 0, 1])
>>> preds = torch.tensor([0, 0, 1, 1, 0, 1])
>>> binary_fbeta_score(preds, target, beta=2.0)
tensor(0.6667)
Example (preds is float tensor):
>>> from torchmetrics.functional.classification import binary_fbeta_score
>>> target = torch.tensor([0, 1, 0, 1, 0, 1])
>>> preds = torch.tensor([0.11, 0.22, 0.84, 0.73, 0.33, 0.92])
>>> binary_fbeta_score(preds, target, beta=2.0)
tensor(0.6667)
Example (multidim tensors):
>>> from torchmetrics.functional.classification import binary_fbeta_score
>>> target = torch.tensor([[[0, 1], [1, 0], [0, 1]], [[1, 1], [0, 0], [1, 0]]])
>>> preds = torch.tensor(
...     [
...         [[0.59, 0.91], [0.91, 0.99], [0.63, 0.04]],
...         [[0.38, 0.04], [0.86, 0.780], [0.45, 0.37]],
...     ]
... )
>>> binary_fbeta_score(preds, target, beta=2.0, multidim_average='samplewise')
tensor([0.5882, 0.0000])

multiclass_fbeta_score

torchmetrics.functional.classification.multiclass_fbeta_score(preds, target, beta, num_classes, average='macro', top_k=1, multidim_average='global', ignore_index=None, validate_args=True)

Computes F-score metric for multiclass tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds: (N, ...) (int tensor) or (N, C, ...) (float tensor). If preds is a floating point tensor, we apply torch.argmax along the C dimension to automatically convert probabilities/logits into an int tensor.

  • target (int tensor): (N, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.

Parameters
  • preds (Tensor) – Tensor with predictions

  • target (Tensor) – Tensor with true labels

  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • num_classes (int) – Integer specifying the number of classes

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied over labels. Should be one of the following:

    • micro: Sum statistics over all labels

    • macro: Calculate statistics for each label and average them

    • weighted: Calculates statistics for each label and computes weighted average using their support

    • "none" or None: Calculates statistic for each label and applies no reduction

  • top_k (int) – Number of highest probability or logit score predictions considered to find the correct label. Only works when preds contain probabilities/logits.

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.
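
One way to read the ignore_index description is that positions whose target equals that value are dropped before any counting takes place. The sketch below illustrates this reading in plain Python. drop_ignored is a hypothetical helper, and the choice of -1 as the ignored value is only an example.

```python
def drop_ignored(preds, target, ignore_index):
    """Remove positions whose target equals ignore_index."""
    kept = [(p, t) for p, t in zip(preds, target) if t != ignore_index]
    return [p for p, _ in kept], [t for _, t in kept]


target = [2, 1, 0, 0, -1]
preds = [2, 1, 0, 1, 1]
kept_preds, kept_target = drop_ignored(preds, target, ignore_index=-1)
print(kept_preds, kept_target)  # [2, 1, 0, 1] [2, 1, 0, 0]
```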

Returns

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

Return type

The returned shape depends on the average and multidim_average arguments

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.functional.classification import multiclass_fbeta_score
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3)
tensor(0.7963)
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3, average=None)
tensor([0.5556, 0.8333, 1.0000])
Example (preds is float tensor):
>>> from torchmetrics.functional.classification import multiclass_fbeta_score
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([
...   [0.16, 0.26, 0.58],
...   [0.22, 0.61, 0.17],
...   [0.71, 0.09, 0.20],
...   [0.05, 0.82, 0.13],
... ])
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3)
tensor(0.7963)
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3, average=None)
tensor([0.5556, 0.8333, 1.0000])
Example (multidim tensors):
>>> from torchmetrics.functional.classification import multiclass_fbeta_score
>>> target = torch.tensor([[[0, 1], [2, 1], [0, 2]], [[1, 1], [2, 0], [1, 2]]])
>>> preds = torch.tensor([[[0, 2], [2, 0], [0, 1]], [[2, 2], [2, 1], [1, 0]]])
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3, multidim_average='samplewise')
tensor([0.4697, 0.2706])
>>> multiclass_fbeta_score(preds, target, beta=2.0, num_classes=3, multidim_average='samplewise', average=None)
tensor([[0.9091, 0.0000, 0.5000],
        [0.0000, 0.3571, 0.4545]])

multilabel_fbeta_score

torchmetrics.functional.classification.multilabel_fbeta_score(preds, target, beta, num_labels, threshold=0.5, average='macro', multidim_average='global', ignore_index=None, validate_args=True)

Computes F-score metric for multilabel tasks:

F_{\beta} = (1 + \beta^2) * \frac{\text{precision} * \text{recall}}{(\beta^2 * \text{precision}) + \text{recall}}

Accepts the following input tensors:

  • preds (int or float tensor): (N, C, ...). If preds is a floating point tensor with values outside the [0,1] range, the input is treated as logits and sigmoid is applied per element. Additionally, the result is converted to an int tensor by thresholding with the value of threshold.

  • target (int tensor): (N, C, ...)

The influence of the additional dimension ... (if present) will be determined by the multidim_average argument.

Parameters
  • preds (Tensor) – Tensor with predictions

  • target (Tensor) – Tensor with true labels

  • beta (float) – Weighting between precision and recall in calculation. Setting to 1 corresponds to equal weight

  • num_labels (int) – Integer specifying the number of labels

  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • average (Optional[Literal['micro', 'macro', 'weighted', 'none']]) –

    Defines the reduction that is applied over labels. Should be one of the following:

    • micro: Sum statistics over all labels

    • macro: Calculate statistics for each label and average them

    • weighted: Calculates statistics for each label and computes weighted average using their support

    • "none" or None: Calculates statistic for each label and applies no reduction

  • multidim_average (Literal['global', 'samplewise']) –

    Defines how additional dimensions ... should be handled. Should be one of the following:

    • global: Additional dimensions are flattened along the batch dimension

    • samplewise: Statistic will be calculated independently for each sample on the N axis. The statistics in this case are calculated over the additional dimensions.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Returns

  • If multidim_average is set to global:

    • If average='micro'/'macro'/'weighted', the output will be a scalar tensor

    • If average=None/'none', the shape will be (C,)

  • If multidim_average is set to samplewise:

    • If average='micro'/'macro'/'weighted', the shape will be (N,)

    • If average=None/'none', the shape will be (N, C)

Return type

The returned shape depends on the average and multidim_average arguments

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.functional.classification import multilabel_fbeta_score
>>> target = torch.tensor([[0, 1, 0], [1, 0, 1]])
>>> preds = torch.tensor([[0, 0, 1], [1, 0, 1]])
>>> multilabel_fbeta_score(preds, target, beta=2.0, num_labels=3)
tensor(0.6111)
>>> multilabel_fbeta_score(preds, target, beta=2.0, num_labels=3, average=None)
tensor([1.0000, 0.0000, 0.8333])
Example (preds is float tensor):
>>> from torchmetrics.functional.classification import multilabel_fbeta_score
>>> target = torch.tensor([[0, 1, 0], [1, 0, 1]])
>>> preds = torch.tensor([[0.11, 0.22, 0.84], [0.73, 0.33, 0.92]])
>>> multilabel_fbeta_score(preds, target, beta=2.0, num_labels=3)
tensor(0.6111)
>>> multilabel_fbeta_score(preds, target, beta=2.0, num_labels=3, average=None)
tensor([1.0000, 0.0000, 0.8333])
Example (multidim tensors):
>>> from torchmetrics.functional.classification import multilabel_fbeta_score
>>> target = torch.tensor([[[0, 1], [1, 0], [0, 1]], [[1, 1], [0, 0], [1, 0]]])
>>> preds = torch.tensor(
...     [
...         [[0.59, 0.91], [0.91, 0.99], [0.63, 0.04]],
...         [[0.38, 0.04], [0.86, 0.780], [0.45, 0.37]],
...     ]
... )
>>> multilabel_fbeta_score(preds, target, num_labels=3, beta=2.0, multidim_average='samplewise')
tensor([0.5556, 0.0000])
>>> multilabel_fbeta_score(preds, target, num_labels=3, beta=2.0, multidim_average='samplewise', average=None)
tensor([[0.8333, 0.8333, 0.0000],
        [0.0000, 0.0000, 0.0000]])