F1 Score

Module Interface

class torchmetrics.F1Score(num_classes=None, threshold=0.5, average='micro', mdmc_average=None, ignore_index=None, top_k=None, multiclass=None, compute_on_step=None, **kwargs)

Computes F1 metric.

The F1 score is the harmonic mean of precision and recall. It works with binary, multiclass, and multilabel data, and accepts logits or probabilities from a model output, or integer class labels, as predictions. Multi-dimensional preds and target are also supported.
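As a framework-free illustration of this definition (not the torchmetrics implementation), the sketch below computes F1 from hypothetical true-positive, false-positive and false-negative counts and checks that it matches the harmonic mean of precision and recall. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 for an empty denominator (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


tp, fp, fn = 2, 1, 0            # hypothetical counts
precision = tp / (tp + fp)      # 2/3
recall = tp / (tp + fn)         # 1.0
print(f1_from_counts(tp, fp, fn))        # 0.8
print(harmonic_mean(precision, recall))  # approximately 0.8
```

Working from counts avoids computing precision and recall separately, and the count form makes the zero-denominator case explicit.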

Forward accepts

preds (float or long tensor): (N, ...) or (N, C, ...) where C is the number of classes
target (long tensor): (N, ...)

If preds and target have the same shape and preds is a float tensor, self.threshold is used to convert preds into binary (0, 1) predictions. This is the case for binary and multi-label probabilities or logits.

If preds has an extra dimension, as in the case of multi-class scores, an argmax is taken over dim=1.
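The two conversion rules above can be sketched in plain Python on nested lists. This is an illustrative assumption of the logic, not torchmetrics' internal code: the helper name `to_labels` is hypothetical, it only handles a 1-D target, and the inclusive `>=` threshold comparison is an assumption.

```python
from typing import List, Sequence, Union

Preds = Union[Sequence[int], Sequence[float], Sequence[Sequence[float]]]


def to_labels(preds: Preds, target: Sequence[int], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper mirroring the two rules above for a 1-D target of length N."""
    if len(preds) != len(target):
        raise ValueError("preds and target must have the same N dimension")
    if not preds:
        return []
    first = preds[0]
    if isinstance(first, (list, tuple)):
        # (N, C) class scores: one extra dimension, so take the argmax over the class axis
        return [max(range(len(row)), key=lambda c: row[c]) for row in preds]
    if isinstance(first, float):
        # Same shape as target and floating point: binarise with the threshold
        # (assumed here to be an inclusive >= comparison)
        return [int(p >= threshold) for p in preds]
    # Already integer class labels: use them unchanged
    return [int(p) for p in preds]


print(to_labels([0.2, 0.7, 0.5], [0, 1, 1]))                  # [0, 1, 1]
print(to_labels([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], [1, 0]))  # [1, 0]
```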

Parameters

num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.
threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.
average (str) –
Defines the reduction that is applied. Should be one of the following:
- 'micro' [default]: Calculate the metric globally, across all samples and classes.
- 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
- 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
- 'none' or None: Calculate the metric for each class separately, and return the metric for every class.
- 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).
Note

What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.
mdmc_average (Optional[str]) –
Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:
- None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.
- 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.
- 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.
ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.
top_k (Optional[int]) – Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs. Should be left at default (None) for all other types of inputs.
multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.
compute_on_step (Optional[bool]) –
Forward only calls update() and returns None if this is set to False.

Deprecated since version v0.8: The argument no longer has any effect and will be removed in v0.9.
kwargs (Dict[str, Any]) – Additional keyword arguments, see Advanced metric settings for more info.
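To make the 'micro', 'macro', 'weighted' and 'none' reductions of the average parameter concrete, here is a plain-Python sketch for single-label multiclass integer labels, run on the same preds and target as the Example that follows. It is an illustration, not the library's code: all function names are hypothetical, and the sketch gives every class equal weight under 'macro', whereas the library's handling of classes absent from both preds and target may differ.

```python
from typing import List, Sequence, Tuple, Union


def per_class_counts(
    preds: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[List[int], List[int], List[int]]:
    """Per-class (tp, fp, fn) for single-label multiclass integer labels."""
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for p, t in zip(preds, target):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 for an empty class is an assumption


def f1_with_average(preds, target, num_classes: int, average="micro") -> Union[float, List[float]]:
    tp, fp, fn = per_class_counts(preds, target, num_classes)
    if average == "micro":
        # Pool the counts over all classes, then compute a single F1
        return f1(sum(tp), sum(fp), sum(fn))
    scores = [f1(a, b, c) for a, b, c in zip(tp, fp, fn)]
    if average in ("none", None):
        return scores  # one score per class, shape (C,)
    if average == "macro":
        return sum(scores) / num_classes  # equal weight for every class
    if average == "weighted":
        support = [a + c for a, c in zip(tp, fn)]  # support = tp + fn
        total = sum(support)
        return sum(s * w for s, w in zip(scores, support)) / total if total else 0.0
    raise ValueError(f"unsupported average: {average!r}")


target = [0, 1, 2, 0, 1, 2]
preds = [0, 2, 1, 0, 0, 1]
print(f1_with_average(preds, target, 3, "micro"))  # ~0.3333
print(f1_with_average(preds, target, 3, "none"))   # [0.8, 0.0, 0.0]
print(f1_with_average(preds, target, 3, "macro"))  # ~0.2667
```

For this data the micro result equals the 0.3333 shown in the Example, while macro averaging drops to about 0.2667 because classes 1 and 2 each score zero.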

Example

>>> import torch
>>> from torchmetrics import F1Score
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> f1 = F1Score(num_classes=3)
>>> f1(preds, target)
tensor(0.3333)


Functional Interface

torchmetrics.functional.f1_score(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)

Computes F1 metric. The F1 score is the harmonic mean of precision and recall, i.e. the F-beta score with beta = 1, which weights precision and recall equally.
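For reference, the standard F-beta score and its beta = 1 case are written out below, consistent with the beta argument of this function being ignored. The notation (P for precision, R for recall, TP/FP/FN for counts) is introduced here and is not part of the torchmetrics API.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad
F_\beta = (1 + \beta^2)\,\frac{P\,R}{\beta^2 P + R}

F_1 = \frac{2\,P\,R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

The count form on the right follows by substituting P and R and cancelling the common factor TP.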

Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output or integer class values in prediction. Works with multi-dimensional preds and target.

If preds and target have the same shape and preds is a float tensor, the threshold argument is used to convert preds into integer labels. This is the case for binary and multi-label probabilities or logits.

If preds has an extra dimension, as in the case of multi-class scores, an argmax is taken over dim=1.

The reduction method (how the per-class F1 scores are aggregated) is controlled by the average parameter, and additionally by the mdmc_average parameter in the multi-dimensional multi-class case. Accepts all inputs listed in Input types.

Parameters

preds (Tensor) – Predictions from model (probabilities, logits or labels).
target (Tensor) – Ground truth values.
beta (float) – Ignored; the F1 score always uses beta = 1.
average (str) –
Defines the reduction that is applied. Should be one of the following:
- 'micro' [default]: Calculate the metric globally, across all samples and classes.
- 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
- 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
- 'none' or None: Calculate the metric for each class separately, and return the metric for every class.
- 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).
Note

What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

Note

If 'none' and a given class doesn’t occur in the preds or target, the value for the class will be nan.
mdmc_average (Optional[str]) –
Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:
- None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.
- 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see Input types) as the N dimension within the sample, and computing the metric for the sample based on that.
- 'global': In this case the N and ... dimensions of the inputs (see Input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.
ignore_index (Optional[int]) – Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.
num_classes (Optional[int]) – Number of classes. Necessary for 'macro', 'weighted' and None average methods.
threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.
top_k (Optional[int]) –
Number of highest probability or logit score predictions considered to find the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

Should be left at default (None) for all other types of inputs.
multiclass (Optional[bool]) – Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter’s documentation section for a more detailed explanation and examples.
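The difference between the 'samplewise' and 'global' options of mdmc_average can be sketched in plain Python. This sketch assumes each sample's extra axes are already flattened into a list of integer labels and uses macro averaging with equal weight per class (dividing by num_classes); the library may treat classes absent from a sample differently, and the helper names are hypothetical.

```python
from typing import List, Sequence


def macro_f1(preds: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Macro F1 over single-label multiclass labels (equal weight per class)."""
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for p, t in zip(preds, target):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for a, b, c in zip(tp, fp, fn):
        denom = 2 * a + b + c
        scores.append(2 * a / denom if denom else 0.0)
    return sum(scores) / num_classes


def mdmc_macro_f1(preds: List[List[int]], target: List[List[int]],
                  num_classes: int, mdmc_average: str) -> float:
    """Hypothetical helper: preds/target hold N samples, each a flattened list of extra-axis labels."""
    if mdmc_average == "global":
        # Flatten N and the extra axes into one N_X axis, then average once
        flat_p = [x for sample in preds for x in sample]
        flat_t = [x for sample in target for x in sample]
        return macro_f1(flat_p, flat_t, num_classes)
    if mdmc_average == "samplewise":
        # Average within each sample, then take the mean over the N axis
        per_sample = [macro_f1(p, t, num_classes) for p, t in zip(preds, target)]
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"unsupported mdmc_average: {mdmc_average!r}")


# Two samples (N=2), each with four flattened "pixels", two classes
target = [[0, 0, 0, 1], [1, 1, 0, 1]]
preds = [[0, 0, 0, 0], [1, 1, 0, 1]]
print(mdmc_macro_f1(preds, target, 2, "samplewise"))  # ~0.714 (= 5/7)
print(mdmc_macro_f1(preds, target, 2, "global"))      # ~0.873 (= 55/63)
```

Samplewise averaging lets the poorly predicted first sample pull the mean down, whereas global averaging pools all pixels before computing per-class scores.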

Return type

Tensor

Returns

The shape of the returned tensor depends on the average parameter:

If average in ['micro', 'macro', 'weighted', 'samples'], a one-element tensor will be returned
If average in ['none', None], the shape will be (C,), where C stands for the number of classes

Example

>>> import torch
>>> from torchmetrics.functional import f1_score
>>> target = torch.tensor([0, 1, 2, 0, 1, 2])
>>> preds = torch.tensor([0, 2, 1, 0, 0, 1])
>>> f1_score(preds, target, num_classes=3)
tensor(0.3333)
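The effect of the top_k parameter on a micro-averaged score can be sketched in plain Python. This assumes, beyond what is stated above, that the k highest-scoring classes of each sample are turned into a multi-hot prediction in which every selected class counts as a positive, so a larger k can recover missed labels at the cost of extra false positives. The helper names and scores are hypothetical.

```python
from typing import List, Sequence


def topk_multihot(scores: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Mark the k highest-scoring classes of each row with 1 (hypothetical helper)."""
    out = []
    for row in scores:
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        out.append([1 if c in top else 0 for c in range(len(row))])
    return out


def micro_f1_topk(scores: Sequence[Sequence[float]], target: Sequence[int], k: int = 1) -> float:
    """Micro F1 over all (sample, class) pairs of the multi-hot top-k prediction."""
    tp = fp = fn = 0
    for row, t in zip(topk_multihot(scores, k), target):
        for c, predicted in enumerate(row):
            actual = c == t
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
target = [2, 0, 1]
print(micro_f1_topk(scores, target, k=1))  # ~0.333 (only sample 1 is correct)
print(micro_f1_topk(scores, target, k=2))  # ~0.667 (every label found, 3 extra positives)
```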