Sacre BLEU Score
Module Interface
- class torchmetrics.SacreBLEUScore(n_gram=4, smooth=False, tokenize='13a', lowercase=False, compute_on_step=None, **kwargs)[source]
Calculate BLEU score [1] of machine-translated text with one or more references. This implementation follows the behaviour of the SacreBLEU [2] implementation from https://github.com/mjpost/sacrebleu.
The SacreBLEU implementation differs from the NLTK BLEU implementation in tokenization techniques.
- Parameters
  - n_gram (int) – Gram value ranged from 1 to 4.
  - smooth (bool) – Whether to apply smoothing.
  - tokenize (Literal['none', '13a', 'zh', 'intl', 'char']) – Tokenization technique to be used. Supported tokenization: ['none', '13a', 'zh', 'intl', 'char'].
  - lowercase (bool) – If True, the BLEU score is calculated over lowercased text.
  - compute_on_step (Optional[bool]) – Forward only calls update() and returns None if this is set to False. Deprecated since version v0.8: Argument has no use anymore and will be removed in v0.9.
  - kwargs – Additional keyword arguments, see Advanced metric settings for more info.
- Raises
  - ValueError – If tokenize is not one of 'none', '13a', 'zh', 'intl' or 'char'.
  - ValueError – If tokenize is set to 'intl' and the regex package is not installed.
Example
>>> from torchmetrics import SacreBLEUScore
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> metric = SacreBLEUScore()
>>> metric(preds, target)
tensor(0.7598)
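To make the example's tensor(0.7598) less opaque, here is a minimal, self-contained sketch of what a BLEU score computes: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. It assumes plain whitespace tokenization and no smoothing, and the function name is ours for illustration only; it is not torchmetrics' internal implementation.

```python
import math
from collections import Counter

def bleu_sketch(pred, refs, n_gram=4):
    """Illustrative BLEU: clipped n-gram precisions + brevity penalty.

    Assumes whitespace tokenization and no smoothing; a sketch of the
    score's definition, not torchmetrics' internals.
    """
    pred_toks = pred.split()
    refs_toks = [r.split() for r in refs]

    log_precisions = []
    for n in range(1, n_gram + 1):
        pred_ngrams = Counter(
            tuple(pred_toks[i:i + n]) for i in range(len(pred_toks) - n + 1)
        )
        # Clip each predicted n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs_toks:
            ref_ngrams = Counter(
                tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)
            )
            for g, c in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in pred_ngrams.items())
        total = sum(pred_ngrams.values())
        if clipped == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: compare against the reference length closest to the prediction's.
    c = len(pred_toks)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs_toks)[1]
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / n_gram)

score = bleu_sketch('the cat is on the mat',
                    ['there is a cat on the mat', 'a cat is on the mat'])
print(round(score, 4))  # 0.7598, matching the example above
```

With the example sentences, the 1- through 4-gram precisions are 5/6, 4/5, 3/4 and 2/3, the brevity penalty is 1 (the prediction matches the closest reference length), and (1/3)^(1/4) ≈ 0.7598.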
References
[1] BLEU: a Method for Automatic Evaluation of Machine Translation by Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
[2] A Call for Clarity in Reporting BLEU Scores by Matt Post.
Functional Interface
- torchmetrics.functional.sacre_bleu_score(preds, target, n_gram=4, smooth=False, tokenize='13a', lowercase=False)[source]
Calculate BLEU score [1] of machine-translated text with one or more references. This implementation follows the behaviour of the SacreBLEU [2] implementation from https://github.com/mjpost/sacrebleu.
- Parameters
  - preds (Sequence[str]) – An iterable of machine-translated corpus.
  - target (Sequence[Sequence[str]]) – An iterable of iterables of reference corpus.
  - n_gram (int) – Gram value ranged from 1 to 4.
  - smooth (bool) – Whether to apply smoothing.
  - tokenize (Literal['none', '13a', 'zh', 'intl', 'char']) – Tokenization technique to be used. Supported tokenization: ['none', '13a', 'zh', 'intl', 'char'].
  - lowercase (bool) – If True, the BLEU score is calculated over lowercased text.
- Return type
  Tensor
- Returns
  Tensor with BLEU score
Example
>>> from torchmetrics.functional import sacre_bleu_score
>>> preds = ['the cat is on the mat']
>>> target = [['there is a cat on the mat', 'a cat is on the mat']]
>>> sacre_bleu_score(preds, target)
tensor(0.7598)
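The example above incurs no length penalty because the prediction is as long as the closest reference. The brevity-penalty term that BLEU applies to shorter predictions can be illustrated in isolation; the helper below is a small sketch of that standard formula (the function name is ours, not part of the torchmetrics API).

```python
import math

def brevity_penalty(pred_len, ref_len):
    """BLEU's brevity penalty: 1.0 when the prediction is at least as long
    as the (closest) reference, exp(1 - r/c) when it is shorter."""
    if pred_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / pred_len)

print(brevity_penalty(6, 6))            # 1.0: no penalty, as in the example above
print(round(brevity_penalty(3, 6), 4))  # 0.3679: a half-length prediction is scaled by exp(-1)
```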
References
[1] BLEU: a Method for Automatic Evaluation of Machine Translation by Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu.
[2] A Call for Clarity in Reporting BLEU Scores by Matt Post.