BERT Score
Module Interface
- class torchmetrics.text.bert.BERTScore(model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, idf=False, device=None, max_length=512, batch_size=64, num_threads=4, return_hash=False, lang='en', rescale_with_baseline=False, baseline_path=None, baseline_url=None, **kwargs)
BERTScore, introduced in "BERTScore: Evaluating Text Generation with BERT", leverages pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment on sentence-level and system-level evaluation. BERTScore also computes precision, recall, and F1 measures, which can be useful for evaluating different language generation tasks.
This implementation follows the original implementation from BERT_score.
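The matching step described above can be sketched in plain Python. This is a minimal illustration, not the torchmetrics implementation: the helper names `cosine` and `greedy_match_scores` are hypothetical, token embeddings are passed in as plain lists of floats, and no IDF weighting or baseline rescaling is applied.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_scores(cand: List[Vector], ref: List[Vector]) -> Dict[str, float]:
    """Hypothetical helper: greedy cosine matching between token embeddings."""
    # sim[i][j] is the similarity of candidate token i and reference token j.
    sim = [[cosine(c, r) for r in ref] for c in cand]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(cand)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(row[j] for row in sim) for j in range(len(ref))) / len(ref)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In this sketch a candidate that covers only part of the reference keeps a high precision but loses recall, which is why the metric reports all three values.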
- Parameters
preds – An iterable of predicted sentences.
target – An iterable of target sentences.
model_name_or_path (Optional[str]) – A name or a model path used to load a transformers pretrained model.
num_layers (Optional[int]) – A layer of representation to use.
all_layers (bool) – Whether the representations from all of the model’s layers should be used. If all_layers = True, the argument num_layers is ignored.
model (Optional[Module]) – A user’s own model. Must be a torch.nn.Module instance.
user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. This must be an instance with the __call__ method. This method must take an iterable of sentences (List[str]) and must return a Python dictionary containing “input_ids” and “attention_mask” represented by torch.Tensor. It is up to the user’s model whether “input_ids” is a torch.Tensor of input ids or of embedding vectors. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as a transformers tokenizer does.
user_forward_fn (Optional[Callable[[Module, Dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with model. This function must take the model and a Python dictionary containing “input_ids” and “attention_mask” represented by torch.Tensor as input, and return the model’s output as a single torch.Tensor.
verbose (bool) – Whether a progress bar should be displayed during the embeddings’ calculation.
idf (bool) – Whether normalization using inverse document frequencies should be used.
device (Union[str, device, None]) – A device to be used for calculation.
max_length (int) – A maximum length of input sequences. Sequences longer than max_length are trimmed.
num_threads (int) – A number of threads to use for a dataloader.
return_hash (bool) – Whether the corresponding hash_code should be returned.
lang (str) – A language of input sentences. It is used when the scores are rescaled with a baseline.
rescale_with_baseline (bool) – Whether bertscore should be rescaled with a pre-computed baseline. When a pretrained model from transformers is used, the corresponding baseline is downloaded from the original bert-score package from BERT_score if available. In other cases, please specify a path to the baseline csv/tsv file, which must follow the formatting of the files from BERT_score.
baseline_path (Optional[str]) – A path to the user’s own local csv/tsv file with the baseline scale.
baseline_url (Optional[str]) – A url path to the user’s own csv/tsv file with the baseline scale.
kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
- Returns
Python dictionary containing the keys precision, recall and f1 with corresponding values.
Example
>>> from torchmetrics.text.bert import BERTScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> bertscore = BERTScore()
>>> score = bertscore(preds, target)
>>> from pprint import pprint
>>> rounded_score = {k: [round(v, 3) for v in vv] for k, vv in score.items()}
>>> pprint(rounded_score)
{'f1': [1.0, 0.996], 'precision': [1.0, 0.996], 'recall': [1.0, 0.996]}
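The contract for user_tokenizer and user_forward_fn described in the parameter list can be illustrated with a small sketch. Everything defined here is hypothetical and for illustration only: `WhitespaceTokenizer`, `TinyEncoder`, the special-token ids, and the toy vocabulary are not part of torchmetrics. The optional `max_length` argument on `__call__` is an assumption, added in case the library passes a length limit to a user tokenizer.

```python
from typing import Dict, List, Optional

import torch
from torch import Tensor, nn


class WhitespaceTokenizer:
    """Hypothetical tokenizer: whitespace split plus a fixed word-to-id vocabulary."""

    # Assumed special-token ids standing in for [CLS], [SEP], padding and unknown words.
    CLS, SEP, PAD, UNK = 0, 1, 2, 3

    def __init__(self, vocab: Dict[str, int], max_length: int = 16) -> None:
        self.vocab = vocab
        self.max_length = max_length

    def __call__(self, sentences: List[str], max_length: Optional[int] = None) -> Dict[str, Tensor]:
        limit = max_length or self.max_length
        input_ids = torch.full((len(sentences), limit), self.PAD, dtype=torch.long)
        attention_mask = torch.zeros_like(input_ids)
        for row, sentence in enumerate(sentences):
            # Leave room for the [CLS] and [SEP] equivalents, trimming longer inputs.
            words = sentence.lower().split()[: limit - 2]
            ids = [self.CLS] + [self.vocab.get(w, self.UNK) for w in words] + [self.SEP]
            input_ids[row, : len(ids)] = torch.tensor(ids, dtype=torch.long)
            attention_mask[row, : len(ids)] = 1
        return {"input_ids": input_ids, "attention_mask": attention_mask}


class TinyEncoder(nn.Module):
    """Hypothetical model: one embedding vector per token id."""

    def __init__(self, vocab_size: int, dim: int = 8) -> None:
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids: Tensor) -> Tensor:
        return self.embedding(input_ids)


def user_forward_fn(model: nn.Module, batch: Dict[str, Tensor]) -> Tensor:
    """Take the model and the tokenizer's dictionary, return a single tensor."""
    return model(batch["input_ids"])
```

Under these assumptions, the three objects would be passed to BERTScore through the model, user_tokenizer and user_forward_fn arguments, so that no pretrained transformers model needs to be loaded.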
- compute()
Calculate BERT scores.
- update(preds, target)
Store predictions/references for computing BERT scores. Sentences must be stored in tokenized form so that DDP mode works.
Functional Interface
- torchmetrics.functional.text.bert.bert_score(preds, target, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, idf=False, device=None, max_length=512, batch_size=64, num_threads=4, return_hash=False, lang='en', rescale_with_baseline=False, baseline_path=None, baseline_url=None)
BERTScore, introduced in "BERTScore: Evaluating Text Generation with BERT", leverages pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity.
It has been shown to correlate with human judgment on sentence-level and system-level evaluation. BERTScore also computes precision, recall, and F1 measures, which can be useful for evaluating different language generation tasks.
This implementation follows the original implementation from BERT_score.
- Parameters
preds (Union[List[str], Dict[str, Tensor]]) – Either an iterable of predicted sentences or a Dict[input_ids, attention_mask].
target (Union[List[str], Dict[str, Tensor]]) – Either an iterable of target sentences or a Dict[input_ids, attention_mask].
model_name_or_path (Optional[str]) – A name or a model path used to load a transformers pretrained model.
num_layers (Optional[int]) – A layer of representation to use.
all_layers (bool) – Whether the representations from all of the model’s layers should be used. If all_layers = True, the argument num_layers is ignored.
model (Optional[Module]) – A user’s own model. Must be a torch.nn.Module instance.
user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. This must be an instance with the __call__ method. This method must take an iterable of sentences (List[str]) and must return a Python dictionary containing "input_ids" and "attention_mask" represented by torch.Tensor. It is up to the user’s model whether "input_ids" is a torch.Tensor of input ids or of embedding vectors. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as a transformers tokenizer does.
user_forward_fn (Optional[Callable[[Module, Dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with model. This function must take the model and a Python dictionary containing "input_ids" and "attention_mask" represented by torch.Tensor as input, and return the model’s output as a single torch.Tensor.
verbose (bool) – Whether a progress bar should be displayed during the embeddings’ calculation.
idf (bool) – Whether normalization using inverse document frequencies should be used.
device (Union[str, device, None]) – A device to be used for calculation.
max_length (int) – A maximum length of input sequences. Sequences longer than max_length are trimmed.
num_threads (int) – A number of threads to use for a dataloader.
return_hash (bool) – Whether the corresponding hash_code should be returned.
lang (str) – A language of input sentences. It is used when the scores are rescaled with a baseline.
rescale_with_baseline (bool) – Whether bertscore should be rescaled with a pre-computed baseline. When a pretrained model from transformers is used, the corresponding baseline is downloaded from the original bert-score package from BERT_score if available. In other cases, please specify a path to the baseline csv/tsv file, which must follow the formatting of the files from BERT_score.
baseline_path (Optional[str]) – A path to the user’s own local csv/tsv file with the baseline scale.
baseline_url (Optional[str]) – A url path to the user’s own csv/tsv file with the baseline scale.
- Return type
- Returns
Python dictionary containing the keys precision, recall and f1 with corresponding values.
- Raises
ValueError – If len(preds) != len(target).
ModuleNotFoundError – If tqdm package is required and not installed.
ModuleNotFoundError – If transformers package is required and not installed.
ValueError – If num_layers is larger than the number of the model layers.
ValueError – If invalid input is provided.
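The effect of rescale_with_baseline can be sketched as a linear rescaling of each score against a baseline value, following the formula used by the original bert-score package, (score - baseline) / (1 - baseline). The helper name `rescale_scores` and the baseline numbers in the usage line are hypothetical; real baselines come from the csv/tsv files mentioned above.

```python
from typing import Dict, List


def rescale_scores(scores: Dict[str, List[float]], baseline: Dict[str, float]) -> Dict[str, List[float]]:
    """Hypothetical helper: map each raw score x to (x - b) / (1 - b) for its baseline b."""
    return {
        key: [(value - baseline[key]) / (1.0 - baseline[key]) for value in values]
        for key, values in scores.items()
    }


# Made-up baseline values, for illustration only.
rescaled = rescale_scores({"f1": [1.0, 0.9]}, {"f1": 0.8})
```

A perfect score stays at 1.0 after rescaling, while scores near the baseline move toward 0, which spreads the values over a more readable range without changing their ranking.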
Example
>>> from torchmetrics.functional.text.bert import bert_score
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> score = bert_score(preds, target)
>>> from pprint import pprint
>>> rounded_score = {k: [round(v, 3) for v in vv] for k, vv in score.items()}
>>> pprint(rounded_score)
{'f1': [1.0, 0.996], 'precision': [1.0, 0.996], 'recall': [1.0, 0.996]}
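The idf option weights tokens by inverse document frequency, so that rare tokens count for more than common ones. The sketch below shows one way such weights could be computed and applied. It assumes plus-one smoothing modeled on the original bert-score package, which may differ from what torchmetrics does internally; the helper names `idf_weights` and `idf_weighted_average` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weights(corpus: List[List[str]]) -> Dict[str, float]:
    """Hypothetical helper: smoothed IDF, log((M + 1) / (df + 1)), over M tokenized sentences."""
    num_docs = len(corpus)
    # Count each token once per sentence it appears in.
    doc_freq = Counter(token for sentence in corpus for token in set(sentence))
    return {token: math.log((num_docs + 1) / (df + 1)) for token, df in doc_freq.items()}


def idf_weighted_average(
    tokens: List[str], scores: List[float], idf: Dict[str, float], default: float
) -> float:
    """Average per-token scores, weighting each token by its IDF value."""
    weights = [idf.get(token, default) for token in tokens]
    total = sum(weights)
    if total == 0:
        # All weights are zero: fall back to the unweighted mean.
        return sum(scores) / len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With these weights, a token that occurs in every sentence gets weight 0 and no longer influences the sentence score.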