
BERT Score

Module Interface

class torchmetrics.text.bert.BERTScore(model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, idf=False, device=None, max_length=512, batch_size=64, num_threads=4, return_hash=False, lang='en', rescale_with_baseline=False, baseline_path=None, baseline_url=None, compute_on_step=None, **kwargs)

BERTScore (from the paper "BERTScore: Evaluating Text Generation with BERT") leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment on sentence-level and system-level evaluation. Moreover, BERTScore computes precision, recall, and F1 measure, which can be useful for evaluating different language generation tasks.

This implementation follows the original implementation from BERT_score.
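The matching step described above can be illustrated with a minimal pure-Python sketch. It assumes the greedy matching scheme from the BERTScore paper (each token is paired with its most similar counterpart) and uses made-up 2-dimensional vectors in place of real BERT embeddings. The helper names `cosine` and `greedy_bert_score` are hypothetical and are not part of torchmetrics.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bert_score(cand_emb, ref_emb):
    """Return (precision, recall, f1) for one candidate/reference pair."""
    # Pairwise similarity matrix: rows are candidate tokens, columns are reference tokens.
    sim = [[cosine(c, r) for r in ref_emb] for c in cand_emb]
    # Precision: every candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(cand_emb)
    # Recall: every reference token is matched to its most similar candidate token.
    recall = sum(max(row[j] for row in sim) for j in range(len(ref_emb))) / len(ref_emb)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Toy "embeddings", invented for illustration only.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_bert_score(candidate, reference)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In the metric itself the vectors come from the selected model layer, and the optional idf weighting and baseline rescaling are applied on top of this basic scheme.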

Parameters
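  • preds – An iterable of predicted sentences. Passed to update() or to the metric call, not to the constructor.

  • target – An iterable of target sentences. Passed to update() or to the metric call, not to the constructor.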

  • model_name_or_path (Optional[str]) – A name or a model path used to load a transformers pretrained model.

  • num_layers (Optional[int]) – A layer of representation to use.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers = True, the argument num_layers is ignored.

  • model (Optional[Module]) – A user’s own model. Must be a torch.nn.Module instance.

  • user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. It must be an object with a __call__ method that takes an iterable of sentences (List[str]) and returns a Python dictionary containing “input_ids” and “attention_mask”, each represented by a torch.Tensor. Whether “input_ids” holds input ids or embedding vectors is up to the user’s model. The tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as transformers tokenizers do.

  • user_forward_fn (Optional[Callable[[Module, Dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with the user’s own model (the model argument). This function must take the model and a Python dictionary containing “input_ids” and “attention_mask”, each represented by a torch.Tensor, as input and return the model’s output as a single torch.Tensor.

  • verbose (bool) – An indication of whether a progress bar should be displayed during the embeddings calculation.

  • idf (bool) – An indication of whether normalization using inverse document frequencies should be used.

  • device (Union[str, device, None]) – A device to be used for calculation.

  • max_length (int) – A maximum length of input sequences. Sequences longer than max_length are to be trimmed.

  • batch_size (int) – A batch size used for model processing.

  • num_threads (int) – A number of threads to use for a dataloader.

  • return_hash (bool) – An indication of whether the corresponding hash_code should be returned.

  • lang (str) – A language of input sentences.

  • rescale_with_baseline (bool) – An indication of whether bertscore should be rescaled with a pre-computed baseline. When a pretrained model from transformers is used, the corresponding baseline is downloaded from the original BERT_score package if available. In other cases, please specify a path to the baseline csv/tsv file, which must follow the formatting of the files from BERT_score.

  • baseline_path (Optional[str]) – A path to the user’s own local csv/tsv file with the baseline scale.

  • baseline_url (Optional[str]) – A url path to the user’s own csv/tsv file with the baseline scale.

  • compute_on_step (Optional[bool]) –

    Forward only calls update() and returns None if this is set to False.

    Deprecated since version v0.8: Argument has no use anymore and will be removed in v0.9.

  • kwargs (Dict[str, Any]) – Additional keyword arguments, see Advanced metric settings for more info.

Returns

Python dictionary containing the keys precision, recall and f1 with corresponding values.

Example

>>> from torchmetrics.text.bert import BERTScore
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> bertscore = BERTScore()
>>> score = bertscore(preds, target)
>>> from pprint import pprint
>>> rounded_score = {k: [round(v, 3) for v in vv] for k, vv in score.items()}
>>> pprint(rounded_score)
{'f1': [1.0, 0.996], 'precision': [1.0, 0.996], 'recall': [1.0, 0.996]}
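The idf option weights each token by its inverse document frequency, so rare tokens contribute more to the score than common ones. The sketch below is an illustration under stated assumptions, not the library's code: it uses whitespace-style token lists instead of model sub-word tokens, computes frequencies over the reference sentences, and applies plus-one smoothing as in the original BERTScore description. The helper names `idf_weights` and `idf_weighted_mean` are hypothetical.

```python
import math
from collections import Counter


def idf_weights(reference_sentences):
    """Map each token to log((M + 1) / (df + 1)) over M tokenized reference sentences."""
    num_sentences = len(reference_sentences)
    doc_freq = Counter()
    for tokens in reference_sentences:
        # Count each token at most once per sentence (document frequency).
        doc_freq.update(set(tokens))
    return {
        token: math.log((num_sentences + 1) / (count + 1))
        for token, count in doc_freq.items()
    }


def idf_weighted_mean(tokens, token_scores, idf):
    """Average per-token similarity scores, weighting each token by its idf."""
    weights = [idf[token] for token in tokens]
    return sum(w * s for w, s in zip(weights, token_scores)) / sum(weights)


# Toy reference corpus and made-up per-token similarity scores.
references = [["hello", "there"], ["master", "kenobi"], ["hello", "kenobi"]]
idf = idf_weights(references)
weighted = idf_weighted_mean(["hello", "there"], [1.0, 0.5], idf)
print(idf["hello"] < idf["there"])  # "hello" is more frequent, so it gets the smaller weight
```

With these weights, the plain average used for recall and precision becomes a weighted average, which is what idf=True is meant to enable.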


compute()

Calculate BERT scores.

Return type

Dict[str, Union[List[float], str]]

Returns

Python dictionary containing the keys precision, recall and f1 with corresponding values.
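Because the returned dictionary holds one score per sentence pair, a system-level number has to be aggregated by the caller. The following sketch averages each list and skips any string entry, which the return type allows (for example when return_hash=True). The sample dictionary is hand-written from the example output above; the "hash" key name and its value are placeholders assumed for illustration, and `corpus_level` is a hypothetical helper.

```python
from statistics import mean


def corpus_level(scores):
    """Average every list-valued entry; ignore string entries such as a hash code."""
    return {
        key: mean(values)
        for key, values in scores.items()
        if isinstance(values, list)
    }


# Hand-written sample mimicking the documented output format.
sample_scores = {
    "precision": [1.0, 0.996],
    "recall": [1.0, 0.996],
    "f1": [1.0, 0.996],
    "hash": "placeholder-hash-code",  # assumed key name, placeholder value
}
summary = corpus_level(sample_scores)
print(sorted(summary))
```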

update(preds, target)

Store predictions/references for computing BERT scores. Sentences must be stored in tokenized form so that DDP mode works.

Parameters
  • preds (List[str]) – An iterable of predicted sentences.

  • target (List[str]) – An iterable of reference sentences.

Return type

None

Functional Interface

torchmetrics.functional.text.bert.bert_score(preds, target, model_name_or_path=None, num_layers=None, all_layers=False, model=None, user_tokenizer=None, user_forward_fn=None, verbose=False, idf=False, device=None, max_length=512, batch_size=64, num_threads=4, return_hash=False, lang='en', rescale_with_baseline=False, baseline_path=None, baseline_url=None)

BERTScore (from the paper "BERTScore: Evaluating Text Generation with BERT") leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity.

It has been shown to correlate with human judgment on sentence-level and system-level evaluation. Moreover, BERTScore computes precision, recall, and F1 measure, which can be useful for evaluating different language generation tasks.

This implementation follows the original implementation from BERT_score.

Parameters
  • preds (Union[List[str], Dict[str, Tensor]]) – Either an iterable of predicted sentences or a Dict[input_ids, attention_mask].

  • target (Union[List[str], Dict[str, Tensor]]) – Either an iterable of target sentences or a Dict[input_ids, attention_mask].

  • model_name_or_path (Optional[str]) – A name or a model path used to load a transformers pretrained model.

  • num_layers (Optional[int]) – A layer of representation to use.

  • all_layers (bool) – An indication of whether the representation from all model’s layers should be used. If all_layers = True, the argument num_layers is ignored.

  • model (Optional[Module]) – A user’s own model.

  • user_tokenizer (Optional[Any]) – A user’s own tokenizer used with the user’s own model. It must be an object with a __call__ method that takes an iterable of sentences (List[str]) and returns a Python dictionary containing "input_ids" and "attention_mask", each represented by a torch.Tensor. Whether "input_ids" holds input ids or embedding vectors is up to the user’s model. This tokenizer must prepend an equivalent of the [CLS] token and append an equivalent of the [SEP] token, as transformers tokenizers do.

  • user_forward_fn (Optional[Callable[[Module, Dict[str, Tensor]], Tensor]]) – A user’s own forward function used in combination with the user’s own model (the model argument). This function must take the model and a Python dictionary containing "input_ids" and "attention_mask", each represented by a torch.Tensor, as input and return the model’s output as a single torch.Tensor.

  • verbose (bool) – An indication of whether a progress bar should be displayed during the embeddings calculation.

  • idf (bool) – An indication of whether normalization using inverse document frequencies should be used.

  • device (Union[str, device, None]) – A device to be used for calculation.

  • max_length (int) – A maximum length of input sequences. Sequences longer than max_length are to be trimmed.

  • batch_size (int) – A batch size used for model processing.

  • num_threads (int) – A number of threads to use for a dataloader.

  • return_hash (bool) – An indication of whether the corresponding hash_code should be returned.

  • lang (str) – A language of input sentences. It is used when the scores are rescaled with a baseline.

  • rescale_with_baseline (bool) – An indication of whether bertscore should be rescaled with a pre-computed baseline. When a pretrained model from transformers is used, the corresponding baseline is downloaded from the original BERT_score package if available. In other cases, please specify a path to the baseline csv/tsv file, which must follow the formatting of the files from BERT_score.

  • baseline_path (Optional[str]) – A path to the user’s own local csv/tsv file with the baseline scale.

  • baseline_url (Optional[str]) – A url path to the user’s own csv/tsv file with the baseline scale.

Return type

Dict[str, Union[List[float], str]]

Returns

Python dictionary containing the keys precision, recall and f1 with corresponding values.
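For rescale_with_baseline, the original BERT_score package maps a raw score linearly so that the baseline becomes 0 while a perfect score stays at 1. The sketch below shows that mapping on plain Python values. The baseline numbers are invented for illustration (real ones come from the baseline csv/tsv file for the chosen model, layer and language), and the helper names are hypothetical.

```python
def rescale(score, baseline):
    """Linear rescaling: the baseline maps to 0.0 and a perfect score of 1.0 stays 1.0."""
    return (score - baseline) / (1.0 - baseline)


def rescale_scores(scores, baselines):
    """Apply the rescaling to every per-sentence value of each metric."""
    return {
        key: [rescale(value, baselines[key]) for value in values]
        for key, values in scores.items()
    }


# Raw scores in the documented output format; baseline values are made up.
raw_scores = {"precision": [1.0, 0.915], "recall": [1.0, 0.915], "f1": [1.0, 0.915]}
hypothetical_baselines = {"precision": 0.83, "recall": 0.83, "f1": 0.83}
rescaled = rescale_scores(raw_scores, hypothetical_baselines)
print({key: [round(v, 3) for v in values] for key, values in rescaled.items()})
```

Rescaling does not change the ranking of candidates; it only spreads the scores over a wider, more readable range.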

Raises

Example

>>> from torchmetrics.functional.text.bert import bert_score
>>> preds = ["hello there", "general kenobi"]
>>> target = ["hello there", "master kenobi"]
>>> score = bert_score(preds, target)
>>> from pprint import pprint
>>> rounded_score = {k: [round(v, 3) for v in vv] for k, vv in score.items()}
>>> pprint(rounded_score)
{'f1': [1.0, 0.996], 'precision': [1.0, 0.996], 'recall': [1.0, 0.996]}