Fleiss Kappa

Module Interface

class torchmetrics.nominal.FleissKappa(mode='counts', **kwargs)[source]

Calculate Fleiss' kappa, a statistical measure of inter-rater agreement.

\[\kappa = \frac{\bar{p} - \bar{p_e}}{1 - \bar{p_e}}\]

where \(\bar{p}\) is the mean agreement probability over all raters and \(\bar{p_e}\) is the mean agreement probability that would be expected if ratings were assigned at random. A score of 1 indicates complete agreement among the raters; a score of 0 or below indicates no agreement beyond what would be expected by chance.
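To make the formula concrete, here is a minimal stdlib-only sketch of the computation from a counts table, independent of torchmetrics (the function name and table layout are chosen for illustration; this is not the library's implementation):

```python
def fleiss_kappa_from_counts(counts):
    """Compute Fleiss' kappa from a [n_samples, n_categories] table where
    counts[i][j] is the number of raters that assigned sample i to
    category j. Every row must sum to the same number of raters."""
    n_samples = len(counts)
    n_raters = sum(counts[0])
    total = n_samples * n_raters

    # p_bar: mean over samples of the fraction of rater pairs that agree.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_samples

    # p_e: chance agreement, the sum of squared marginal category probabilities.
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(counts[0]))
    )
    return (p_bar - p_e) / (1 - p_e)
```

On the classic worked example from Fleiss (1971) with 10 samples, 5 categories, and 14 raters, this yields kappa of roughly 0.21; with every rater agreeing on every sample it returns exactly 1.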

As input to forward and update the metric accepts the following input:

  • ratings (Tensor): Ratings of shape [n_samples, n_categories] or [n_samples, n_categories, n_raters], depending on mode. If mode is counts, ratings must be an integer tensor containing the number of raters that chose each category. If mode is probs, ratings must be a floating-point tensor containing the probability or logits that each rater chose each category.

As output of forward and compute the metric returns the following output:

  • fleiss_k (Tensor): A float scalar tensor with the calculated Fleiss’ kappa score.

Parameters:
  • mode (Literal['counts', 'probs']) – Whether ratings will be provided as counts or probabilities.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
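The probs mode can be understood as a reduction to counts: each rater's highest-scoring category is presumably taken as that rater's vote, and votes are tallied per sample. A stdlib-only sketch of that reduction (the function name is illustrative, not part of the torchmetrics API):

```python
def probs_to_counts(probs):
    """Reduce ratings of shape [n_samples, n_categories, n_raters]
    (score of each category per rater) to a [n_samples, n_categories]
    count table by taking each rater's argmax category as their vote."""
    n_categories = len(probs[0])
    counts = []
    for sample in probs:  # sample: [n_categories][n_raters]
        n_raters = len(sample[0])
        row = [0] * n_categories
        for r in range(n_raters):
            # collect rater r's score for every category, vote for the max
            scores = [sample[c][r] for c in range(n_categories)]
            row[scores.index(max(scores))] += 1
        counts.append(row)
    return counts
```

For example, one sample with 3 categories and 2 raters, where rater 0 favors category 0 and rater 1 favors category 1, reduces to the count row [1, 1, 0].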

Example

>>> # Ratings are provided as counts
>>> import torch
>>> from torchmetrics.nominal import FleissKappa
>>> _ = torch.manual_seed(42)
>>> ratings = torch.randint(0, 10, size=(100, 5)).long()  # 100 samples, 5 categories, 10 raters
>>> metric = FleissKappa(mode='counts')
>>> metric(ratings)
tensor(0.0089)

Example

>>> # Ratings are provided as probabilities
>>> import torch
>>> from torchmetrics.nominal import FleissKappa
>>> _ = torch.manual_seed(42)
>>> ratings = torch.randn(100, 5, 10).softmax(dim=1)  # 100 samples, 5 categories, 10 raters
>>> metric = FleissKappa(mode='probs')
>>> metric(ratings)
tensor(-0.0105)
plot(val=None, ax=None)[source]

Plot a single or multiple values from the metric.

Parameters:
  • val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.

  • ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot will be added to that axis.

Return type:

Tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.nominal import FleissKappa
>>> metric = FleissKappa(mode="probs")
>>> metric.update(torch.randn(100, 5, 10).softmax(dim=1))
>>> fig_, ax_ = metric.plot()
../_images/fleiss_kappa-1.png
>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.nominal import FleissKappa
>>> metric = FleissKappa(mode="probs")
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.randn(100, 5, 10).softmax(dim=1)))
>>> fig_, ax_ = metric.plot(values)
../_images/fleiss_kappa-2.png

Functional Interface

torchmetrics.functional.nominal.fleiss_kappa(ratings, mode='counts')[source]

Calculate Fleiss' kappa, a statistical measure of inter-rater agreement.

\[\kappa = \frac{\bar{p} - \bar{p_e}}{1 - \bar{p_e}}\]

where \(\bar{p}\) is the mean agreement probability over all raters and \(\bar{p_e}\) is the mean agreement probability that would be expected if ratings were assigned at random. A score of 1 indicates complete agreement among the raters; a score of 0 or below indicates no agreement beyond what would be expected by chance.

Parameters:
  • ratings (Tensor) – Ratings of shape [n_samples, n_categories] or [n_samples, n_categories, n_raters], depending on mode. If mode is counts, ratings must be an integer tensor containing the number of raters that chose each category. If mode is probs, ratings must be a floating-point tensor containing the probability or logits that each rater chose each category.

  • mode (Literal['counts', 'probs']) – Whether ratings will be provided as counts or probabilities.

Return type:

Tensor

Example

>>> # Ratings are provided as counts
>>> import torch
>>> from torchmetrics.functional.nominal import fleiss_kappa
>>> _ = torch.manual_seed(42)
>>> ratings = torch.randint(0, 10, size=(100, 5)).long()  # 100 samples, 5 categories, 10 raters
>>> fleiss_kappa(ratings)
tensor(0.0089)

Example

>>> # Ratings are provided as probabilities
>>> import torch
>>> from torchmetrics.functional.nominal import fleiss_kappa
>>> _ = torch.manual_seed(42)
>>> ratings = torch.randn(100, 5, 10).softmax(dim=1)  # 100 samples, 5 categories, 10 raters
>>> fleiss_kappa(ratings, mode='probs')
tensor(-0.0105)