
BLEU Score

Module Interface

class torchmetrics.BLEUScore(n_gram=4, smooth=False, weights=None, **kwargs)[source]

Calculate BLEU score of machine-translated text with one or more references.
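For reference, the metric follows the corpus-level BLEU definition from [1] (a standard formulation, restated here for convenience):

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}

where p_n is the modified n-gram precision, w_n are the n-gram weights (the weights parameter, uniform 1/N by default), N corresponds to n_gram, and the brevity penalty BP is computed from the candidate length c and the effective reference length r.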

As input to forward and update the metric accepts the following input:

  • preds (Sequence): An iterable of machine-translated text (the hypothesis corpus)

  • target (Sequence): An iterable of iterables of reference text (one iterable of reference translations per hypothesis)

As output of forward and update the metric returns the following output:

  • bleu (Tensor): A tensor with the BLEU Score

Parameters
  • n_gram (int) – Gram value ranging from 1 to 4

  • smooth (bool) – Whether to apply smoothing – see [2]

  • weights (Optional[Sequence[float]]) – Weights used for unigrams, bigrams, etc. to calculate the BLEU score. If not provided, uniform weights are used.

  • kwargs (Any) – Additional keyword arguments passed to the base Metric class.

Raises

ValueError – If weights is not None and its length is not equal to n_gram.

Example

>>> from torchmetrics import BLEUScore
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> bleu = BLEUScore()
>>> bleu(preds, target)
tensor(0.7598)
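
The metric also supports the standard TorchMetrics accumulation API; as a minimal sketch reusing the inputs above, statistics can be accumulated batch by batch with update and aggregated with compute:

>>> bleu = BLEUScore()
>>> bleu.update(preds, target)  # accumulate n-gram statistics for this batch
>>> score = bleu.compute()      # aggregate BLEU over all batches seen so far
>>> bleu.reset()                # clear the internal state for the next evaluation run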


Functional Interface

torchmetrics.functional.bleu_score(preds, target, n_gram=4, smooth=False, weights=None)[source]

Calculate BLEU score of machine-translated text with one or more references.

Parameters
  • preds (Union[str, Sequence[str]]) – An iterable of machine-translated text (the hypothesis corpus)

  • target (Sequence[Union[str, Sequence[str]]]) – An iterable of iterables of reference text (one iterable of reference translations per hypothesis)

  • n_gram (int) – Gram value ranging from 1 to 4

  • smooth (bool) – Whether to apply smoothing – see [2]

  • weights (Optional[Sequence[float]]) – Weights used for unigrams, bigrams, etc. to calculate the BLEU score. If not provided, uniform weights are used (see the weighted example below).

Return type

Tensor

Returns

Tensor with BLEU Score

Raises
  • ValueError – If preds and target have different lengths.

  • ValueError – If weights is not None and its length is not equal to n_gram.

Example

>>> from torchmetrics.functional import bleu_score
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> bleu_score(preds, target)
tensor(0.7598)
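
As an illustrative sketch of the n_gram and weights parameters (the weight values below are arbitrary, chosen only to satisfy the constraint that the number of weights equals n_gram):

>>> score = bleu_score(preds, target, n_gram=2, weights=[0.75, 0.25])

smooth=True applies the smoothing described in [2] and can be combined with either the functional or the module-based call.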

References

[1] BLEU: a Method for Automatic Evaluation of Machine Translation, by Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.

[2] Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, by Chin-Yew Lin and Franz Josef Och.