Sacre BLEU Score
Module Interface
- class torchmetrics.SacreBLEUScore(n_gram=4, smooth=False, tokenize='13a', lowercase=False, weights=None, **kwargs)
Calculate BLEU score of machine translated text with one or more references. This implementation follows the behaviour of SacreBLEU.
The SacreBLEU implementation differs from the NLTK BLEU implementation in tokenization techniques.
As input to forward and update, the metric accepts the following input:
preds (Sequence): An iterable of machine translated corpus
target (Sequence): An iterable of iterables of reference corpus
As output of forward and compute, the metric returns the following output:
sacre_bleu (Tensor): A tensor with the SacreBLEU Score
- Parameters
tokenize (Literal['none', '13a', 'zh', 'intl', 'char']) – Tokenization technique to be used. Supported tokenization: ['none', '13a', 'zh', 'intl', 'char']
lowercase (bool) – If True, BLEU score over lowercased text is calculated.
kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
weights (Optional[Sequence[float]]) – Weights used for unigrams, bigrams, etc. to calculate BLEU score. If not provided, uniform weights are used.
- Raises
ValueError – If tokenize is not one of 'none', '13a', 'zh', 'intl' or 'char'.
ValueError – If tokenize is set to 'intl' and regex is not installed.
ValueError – If weights is not None and its length is not equal to n_gram.
Example
>>> from torchmetrics import SacreBLEUScore
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> sacre_bleu = SacreBLEUScore()
>>> sacre_bleu(preds, target)
tensor(0.7598)
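To see where the value 0.7598 in this example comes from, the following is a minimal pure-Python sketch of BLEU for a single sentence. It is not the library's code: it assumes plain whitespace tokenization (which should coincide with '13a' on this lowercase, punctuation-free input), no smoothing, and uniform weights. The helper names `ngrams` and `bleu_sketch` are hypothetical.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(pred, refs, n_gram=4):
    """Single-sentence BLEU: whitespace tokens, no smoothing, uniform weights."""
    pred_tokens = pred.split()
    refs_tokens = [ref.split() for ref in refs]

    log_precisions = []
    for n in range(1, n_gram + 1):
        pred_counts = ngrams(pred_tokens, n)
        # Per n-gram, the largest count found in any single reference.
        max_ref_counts = Counter()
        for ref in refs_tokens:
            max_ref_counts |= ngrams(ref, n)
        # Clip each predicted n-gram count by its reference maximum.
        clipped = sum((pred_counts & max_ref_counts).values())
        total = sum(pred_counts.values())
        if clipped == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(log(clipped / total))

    # Brevity penalty against the reference closest in length (ties: shorter).
    pred_len = len(pred_tokens)
    ref_len = min((len(r) for r in refs_tokens),
                  key=lambda length: (abs(length - pred_len), length))
    brevity_penalty = min(1.0, exp(1 - ref_len / pred_len))

    return brevity_penalty * exp(sum(log_precisions) / n_gram)


preds = ['the cat is on the mat']
target = [['there is a cat on the mat', 'a cat is on the mat']]
score = bleu_sketch(preds[0], target[0])
print(round(score, 4))  # 0.7598
```

The clipped precisions here are 5/6, 4/5, 3/4 and 2/3 for n = 1 to 4, so their geometric mean is (1/3)^(1/4) ≈ 0.7598, and the brevity penalty is 1 because the prediction and the closest reference both have six tokens.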
Additional References:
Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, by Chin-Yew Lin and Franz Josef Och.
Functional Interface
- torchmetrics.functional.sacre_bleu_score(preds, target, n_gram=4, smooth=False, tokenize='13a', lowercase=False, weights=None)
Calculate the BLEU score [1] of machine translated text with one or more references. This implementation follows the behaviour of the SacreBLEU [2] implementation at https://github.com/mjpost/sacrebleu.
- Parameters
preds (Sequence[str]) – An iterable of machine translated corpus
target (Sequence[Sequence[str]]) – An iterable of iterables of reference corpus
tokenize (Literal['none', '13a', 'zh', 'intl', 'char']) – Tokenization technique to be used. Supported tokenization: ['none', '13a', 'zh', 'intl', 'char']
lowercase (bool) – If True, BLEU score over lowercased text is calculated.
weights (Optional[Sequence[float]]) – Weights used for unigrams, bigrams, etc. to calculate BLEU score. If not provided, uniform weights are used.
- Return type
Tensor
- Returns
Tensor with the BLEU score
- Raises
ValueError – If the preds and target corpora have different lengths.
ValueError – If weights is not None and its length is not equal to n_gram.
Example
>>> from torchmetrics.functional import sacre_bleu_score
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> sacre_bleu_score(preds, target)
tensor(0.7598)
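The role of the weights parameter can be sketched in plain Python as a weighted geometric mean of the per-order n-gram precisions. This is an illustration under assumptions, not the library's code: `combine_precisions` is a hypothetical helper, the brevity penalty is left out (it equals 1 for the example sentence), and the four precisions are the clipped values 5/6, 4/5, 3/4 and 2/3 worked out by hand for the example above under whitespace tokenization.

```python
from math import exp, log


def combine_precisions(precisions, weights=None):
    """Weighted geometric mean of n-gram precisions (hypothetical helper)."""
    n_gram = len(precisions)
    if weights is None:
        # Default described in the docs: uniform weights over all orders.
        weights = [1.0 / n_gram] * n_gram
    if len(weights) != n_gram:
        # Mirrors the documented ValueError for a mismatched weights length.
        raise ValueError(
            f"Expected {n_gram} weights, one per n-gram order, got {len(weights)}"
        )
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is zero here
    return exp(sum(w * log(p) for w, p in zip(weights, precisions)))


# Clipped precisions for 'the cat is on the mat', orders 1 to 4 (assumed).
precisions = [5 / 6, 4 / 5, 3 / 4, 2 / 3]

uniform = combine_precisions(precisions)
unigram_only = combine_precisions(precisions, weights=[1.0, 0.0, 0.0, 0.0])

print(round(uniform, 4))       # 0.7598
print(round(unigram_only, 4))  # 0.8333
```

Putting all the weight on unigrams reduces the score to the unigram precision alone, which is why it rises from 0.7598 to 0.8333 for this sentence.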
References
[1] BLEU: a Method for Automatic Evaluation of Machine Translation, by Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
[2] A Call for Clarity in Reporting BLEU Scores, by Matt Post.
[3] Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, by Chin-Yew Lin and Franz Josef Och.