Cohen Kappa

Module Interface

CohenKappa

class torchmetrics.CohenKappa(num_classes, weights=None, threshold=0.5, **kwargs)[source]

Note

From v0.10 a 'binary_*', 'multiclass_*' and 'multilabel_*' version of each classification metric now exists. Moving forward we recommend using these versions. This base metric will still work as it did prior to v0.10 until v0.11. From v0.11 the task argument introduced in this metric will be required, and the general order of arguments may change, such that this metric will just function as a single entrypoint to calling the three specialized versions.

Calculates Cohen’s kappa score that measures inter-annotator agreement. It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.
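
As an illustration of the formula (a minimal sketch with plain torch, not part of the library), the unweighted score can be computed by hand from the observed agreement and the per-annotator priors, using the same data as the example below:

>>> import torch
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> p_o = (preds == target).float().mean()  # observed agreement
>>> prior_preds = torch.bincount(preds, minlength=2).float() / preds.numel()
>>> prior_target = torch.bincount(target, minlength=2).float() / target.numel()
>>> p_e = (prior_preds * prior_target).sum()  # chance agreement from the two empirical priors
>>> (p_o - p_e) / (1 - p_e)
tensor(0.5000)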

Works with binary, multiclass, and multilabel data. Accepts probabilities or logits from a model output, or integer class values as predictions. Works with multi-dimensional preds and target.

Forward accepts
  • preds (float or long tensor): (N, ...) or (N, C, ...) where C is the number of classes

  • target (long tensor): (N, ...)

If preds and target are the same shape and preds is a float tensor, we use the self.threshold argument to convert probabilities or logits into integer labels. This is the case for binary and multi-label probabilities or logits.

If preds has an extra dimension, as in the case of multi-class scores, we perform an argmax on dim=1.
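
Roughly, this conversion is equivalent to the following (a sketch of the behaviour described above, not the library's internal code):

>>> import torch
>>> probs = torch.tensor([0.2, 0.7, 0.4, 0.9])
>>> (probs >= 0.5).long()  # same-shape float preds: thresholded with self.threshold
tensor([0, 1, 0, 1])
>>> scores = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
>>> scores.argmax(dim=1)  # (N, C) float preds: argmax over the class dimension
tensor([1, 0])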

Parameters
  • num_classes (int) – Number of classes in the dataset.

  • weights (Optional[str]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • threshold (float) – Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example

>>> import torch
>>> from torchmetrics import CohenKappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> cohenkappa = CohenKappa(num_classes=2)
>>> cohenkappa(preds, target)
tensor(0.5000)

compute()[source]

Computes Cohen's kappa score.

Return type

Tensor

update(preds, target)[source]

Update state with predictions and targets.

Parameters
  • preds (Tensor) – Predictions from model

  • target (Tensor) – Ground truth values

Return type

None
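
When evaluating over several batches, update() can be called once per batch and compute() returns the score over the accumulated state. A minimal sketch (splitting the example above into two batches; assuming, as for other confusion-matrix based metrics, that state accumulates across calls, the result matches the single-call example):

>>> import torch
>>> from torchmetrics import CohenKappa
>>> cohenkappa = CohenKappa(num_classes=2)
>>> for preds, target in [(torch.tensor([0, 1]), torch.tensor([1, 1])),
...                       (torch.tensor([0, 0]), torch.tensor([0, 0]))]:
...     cohenkappa.update(preds, target)
>>> cohenkappa.compute()
tensor(0.5000)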

BinaryCohenKappa

class torchmetrics.classification.BinaryCohenKappa(threshold=0.5, ignore_index=None, weights=None, validate_args=True, **kwargs)[source]

Calculates Cohen’s kappa score that measures inter-annotator agreement for binary tasks. It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Accepts the following input tensors:

  • preds (int or float tensor): (N, ...). If preds is a floating point tensor with values outside the [0,1] range, we consider the input to be logits and will auto-apply sigmoid per element. Additionally, we convert to an int tensor by thresholding with the value in threshold (see the sketch below).

  • target (int tensor): (N, ...)

Additional dimension ... will be flattened into the batch dimension.
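
As a rough sketch of that preprocessing (an illustration of the description above, not the library's internal implementation):

>>> import torch
>>> logits = torch.tensor([-1.2, 3.0, -0.4, 1.5])  # values outside [0, 1] are treated as logits
>>> probs = torch.sigmoid(logits)  # sigmoid applied per element
>>> (probs >= 0.5).long()  # thresholded with the default threshold=0.5
tensor([0, 1, 0, 1])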

Parameters
  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • weights (Optional[Literal['linear', 'quadratic', 'none']]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.classification import BinaryCohenKappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> metric = BinaryCohenKappa()
>>> metric(preds, target)
tensor(0.5000)
Example (preds is float tensor):
>>> import torch
>>> from torchmetrics.classification import BinaryCohenKappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0.35, 0.85, 0.48, 0.01])
>>> metric = BinaryCohenKappa()
>>> metric(preds, target)
tensor(0.5000)

MulticlassCohenKappa

class torchmetrics.classification.MulticlassCohenKappa(num_classes, ignore_index=None, weights=None, validate_args=True, **kwargs)[source]

Calculates Cohen’s kappa score that measures inter-annotator agreement for multiclass tasks. It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Accepts the following input tensors:

  • preds: (N, ...) (int tensor) or (N, C, ...) (float tensor). If preds is a floating point tensor we apply torch.argmax along the C dimension to automatically convert probabilities/logits into an int tensor.

  • target (int tensor): (N, ...)

Additional dimension ... will be flattened into the batch dimension.

Parameters
  • num_classes (int) – Integer specifying the number of classes

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • weights (Optional[Literal['linear', 'quadratic', 'none']]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.classification import MulticlassCohenKappa
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> metric = MulticlassCohenKappa(num_classes=3)
>>> metric(preds, target)
tensor(0.6364)
Example (preds is float tensor):
>>> import torch
>>> from torchmetrics.classification import MulticlassCohenKappa
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([
...   [0.16, 0.26, 0.58],
...   [0.22, 0.61, 0.17],
...   [0.71, 0.09, 0.20],
...   [0.05, 0.82, 0.13],
... ])
>>> metric = MulticlassCohenKappa(num_classes=3)
>>> metric(preds, target)
tensor(0.6364)
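
The weights argument controls how strongly larger disagreements are penalised. As a hedged illustration (computed by hand with the standard quadratically weighted kappa definition for the data above; passing weights='quadratic' to MulticlassCohenKappa should agree with this value):

>>> import torch
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> num_classes = 3
>>> confmat = torch.zeros(num_classes, num_classes)
>>> for t, p in zip(target, preds):
...     confmat[t, p] += 1
>>> idx = torch.arange(num_classes)
>>> w = (idx[:, None] - idx[None, :]).float() ** 2  # quadratic disagreement penalty
>>> expected = torch.outer(confmat.sum(dim=1), confmat.sum(dim=0)) / confmat.sum()
>>> 1 - (w * confmat).sum() / (w * expected).sum()
tensor(0.8000)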

Functional Interface

cohen_kappa

torchmetrics.functional.cohen_kappa(preds, target, num_classes, weights=None, threshold=0.5, task=None, ignore_index=None, validate_args=True)[source]

Note

From v0.10 a 'binary_*', 'multiclass_*' and 'multilabel_*' version of each classification metric now exists. Moving forward we recommend using these versions. This base metric will still work as it did prior to v0.10 until v0.11. From v0.11 the task argument introduced in this metric will be required, and the general order of arguments may change, such that this metric will just function as a single entrypoint to calling the three specialized versions.

Calculates Cohen’s kappa score that measures inter-annotator agreement.

It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Parameters
  • preds (Tensor) – (float or long tensor) Either a (N, ...) tensor with labels, or a (N, C, ...) tensor with labels/probabilities, where C is the number of classes

  • target (Tensor) – (long tensor) Tensor with shape (N, ...) containing ground truth labels

  • num_classes (int) – Number of classes in the dataset.

  • weights (Optional[Literal['linear', 'quadratic', 'none']]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • threshold (float) – Threshold value for binary or multi-label probabilities.

  • task (Optional[Literal['binary', 'multiclass', 'multilabel']]) – If given, routes the call to the corresponding specialized version described in the note above. Will be required from v0.11.

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Example

>>> import torch
>>> from torchmetrics.functional import cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> cohen_kappa(preds, target, num_classes=2)
tensor(0.5000)
Return type

Tensor

binary_cohen_kappa

torchmetrics.functional.classification.binary_cohen_kappa(preds, target, threshold=0.5, weights=None, ignore_index=None, validate_args=True)[source]

Calculates Cohen’s kappa score that measures inter-annotator agreement for binary tasks. It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Accepts the following input tensors:

  • preds (int or float tensor): (N, ...). If preds is a floating point tensor with values outside the [0,1] range, we consider the input to be logits and will auto-apply sigmoid per element. Additionally, we convert to an int tensor by thresholding with the value in threshold.

  • target (int tensor): (N, ...)

Additional dimension ... will be flattened into the batch dimension.

Parameters
  • preds (Tensor) – Tensor with predictions

  • target (Tensor) – Tensor with true labels

  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • weights (Optional[Literal['linear', 'quadratic', 'none']]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.functional.classification import binary_cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> binary_cohen_kappa(preds, target)
tensor(0.5000)
Example (preds is float tensor):
>>> import torch
>>> from torchmetrics.functional.classification import binary_cohen_kappa
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0.35, 0.85, 0.48, 0.01])
>>> binary_cohen_kappa(preds, target)
tensor(0.5000)
Return type

Tensor

multiclass_cohen_kappa

torchmetrics.functional.classification.multiclass_cohen_kappa(preds, target, num_classes, weights=None, ignore_index=None, validate_args=True)[source]

Calculates Cohen’s kappa score that measures inter-annotator agreement for multiclass tasks. It is defined as

\kappa = (p_o - p_e) / (1 - p_e)

where p_o is the empirical probability of agreement and p_e is the expected agreement when both annotators assign labels randomly. Note that p_e is estimated using a per-annotator empirical prior over the class labels.

Accepts the following input tensors:

  • preds: (N, ...) (int tensor) or (N, C, ...) (float tensor). If preds is a floating point tensor we apply torch.argmax along the C dimension to automatically convert probabilities/logits into an int tensor.

  • target (int tensor): (N, ...)

Additional dimension ... will be flattened into the batch dimension.

Parameters
  • preds (Tensor) – Tensor with predictions

  • target (Tensor) – Tensor with true labels

  • num_classes (int) – Integer specifying the number of classes

  • weights (Optional[Literal['linear', 'quadratic', 'none']]) –

    Weighting type to calculate the score. Choose from:

    • None or 'none': no weighting

    • 'linear': linear weighting

    • 'quadratic': quadratic weighting

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • validate_args (bool) – bool indicating if input arguments and tensors should be validated for correctness. Set to False for faster computations.

Example (preds is int tensor):
>>> import torch
>>> from torchmetrics.functional.classification import multiclass_cohen_kappa
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([2, 1, 0, 1])
>>> multiclass_cohen_kappa(preds, target, num_classes=3)
tensor(0.6364)
Example (preds is float tensor):
>>> import torch
>>> from torchmetrics.functional.classification import multiclass_cohen_kappa
>>> target = torch.tensor([2, 1, 0, 0])
>>> preds = torch.tensor([
...   [0.16, 0.26, 0.58],
...   [0.22, 0.61, 0.17],
...   [0.71, 0.09, 0.20],
...   [0.05, 0.82, 0.13],
... ])
>>> multiclass_cohen_kappa(preds, target, num_classes=3)
tensor(0.6364)
Return type

Tensor