CLIP Score

Module Interface

class torchmetrics.multimodal.clip_score.CLIPScore(model_name_or_path='openai/clip-vit-large-patch14', **kwargs)[source]

CLIP Score is a reference-free metric that can be used to evaluate the correlation between a generated caption for an image and the actual content of the image. It has been found to be highly correlated with human judgement. The metric is defined as:

\text{CLIPScore}(I, C) = \max(100 \cdot \cos(E_I, E_C), 0)

which corresponds to the cosine similarity between the visual CLIP embedding E_I for an image I and the textual CLIP embedding E_C for a caption C, scaled by 100 and clipped at 0. The score is bounded between 0 and 100; the closer to 100, the better.
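As a worked illustration of the formula only, the following minimal sketch evaluates it on toy vectors. It does not load a CLIP model: the plain Python lists stand in for the embeddings E_I and E_C, and the helper name `clip_score_from_embeddings` is hypothetical, not part of torchmetrics.

```python
import math
from typing import Sequence


def clip_score_from_embeddings(
    image_emb: Sequence[float], text_emb: Sequence[float]
) -> float:
    """Return max(100 * cos(E_I, E_C), 0) for one image/caption pair.

    Hypothetical helper for illustration; assumes both embeddings are
    non-zero vectors of the same dimension.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_c = math.sqrt(sum(b * b for b in text_emb))
    cosine = dot / (norm_i * norm_c)
    # Scale to the 0..100 range and clip negative similarities at 0.
    return max(100.0 * cosine, 0.0)


# Toy vectors, not real CLIP embeddings.
aligned = clip_score_from_embeddings([1.0, 0.0], [2.0, 0.0])
orthogonal = clip_score_from_embeddings([1.0, 0.0], [0.0, 3.0])
opposed = clip_score_from_embeddings([1.0, 0.0], [-1.0, 0.0])
print(aligned, orthogonal, opposed)  # 100.0 0.0 0.0
```

Vectors pointing in the same direction score 100 regardless of their length, while orthogonal or opposed vectors are clipped to 0.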

Note

Metric is not scriptable

Parameters
  • model_name_or_path (Literal[‘openai/clip-vit-base-patch16’, ‘openai/clip-vit-base-patch32’, ‘openai/clip-vit-large-patch14-336’, ‘openai/clip-vit-large-patch14’]) – String indicating the version of the CLIP model to use. Available models are “openai/clip-vit-base-patch16”, “openai/clip-vit-base-patch32”, “openai/clip-vit-large-patch14-336” and “openai/clip-vit-large-patch14”.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Raises

ModuleNotFoundError – If the transformers package is not installed or its version is lower than 4.10.0

Example

>>> import torch
>>> _ = torch.manual_seed(42)
>>> from torchmetrics.multimodal import CLIPScore
>>> metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
>>> score = metric(torch.randint(255, (3, 224, 224)), "a photo of a cat")
>>> print(score.detach())
tensor(25.0936)

compute()[source]

Compute the accumulated CLIP score.

Return type

Tensor

update(images, text)[source]

Update CLIP score on a batch of images and text.

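Parameters
  • images (Union[Tensor, List[Tensor]]) – Either a single [N, C, H, W] tensor or a list of [C, H, W] tensors

  • text (Union[str, List[str]]) – Either a single caption or a list of captions

Raises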
  • ValueError – If not all images have format [C, H, W]

  • ValueError – If the number of images and captions do not match

Return type

None
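The update()/compute() split can be pictured with the sketch below. It is an assumption-laden illustration, not the library's implementation: this page does not describe the internal state, so the running sum, the sample count, and the mean aggregation are all assumed, and the class name `RunningMeanScore` is hypothetical. Per-pair scores are passed in directly so the sketch runs without a CLIP model.

```python
from typing import List


class RunningMeanScore:
    """Hypothetical stand-in for a metric that accumulates over batches."""

    def __init__(self) -> None:
        # Assumed state: sum of per-pair scores and number of pairs seen.
        self.total = 0.0
        self.n_samples = 0

    def update(self, pair_scores: List[float]) -> None:
        """Fold one batch of per-pair scores into the running state."""
        self.total += sum(pair_scores)
        self.n_samples += len(pair_scores)

    def compute(self) -> float:
        """Return the mean score over everything seen so far."""
        if self.n_samples == 0:
            raise RuntimeError("compute() called before any update()")
        return self.total / self.n_samples


metric = RunningMeanScore()
metric.update([20.0, 30.0])  # first batch: two image/caption pairs
metric.update([40.0])        # second batch: one pair
result = metric.compute()
print(result)  # 30.0
```

Under this assumption, calling update() once per batch and compute() at the end gives the same value as scoring all pairs in a single call.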

Functional Interface

torchmetrics.functional.multimodal.clip_score.clip_score(images, text, model_name_or_path='openai/clip-vit-large-patch14')[source]

CLIP Score is a reference-free metric that can be used to evaluate the correlation between a generated caption for an image and the actual content of the image. It has been found to be highly correlated with human judgement. The metric is defined as:

\text{CLIPScore}(I, C) = \max(100 \cdot \cos(E_I, E_C), 0)

which corresponds to the cosine similarity between the visual CLIP embedding E_I for an image I and the textual CLIP embedding E_C for a caption C, scaled by 100 and clipped at 0. The score is bounded between 0 and 100; the closer to 100, the better.

Note

Metric is not scriptable

Parameters
  • images (Union[Tensor, List[Tensor]]) – Either a single [N, C, H, W] tensor or a list of [C, H, W] tensors

  • text (Union[str, List[str]]) – Either a single caption or a list of captions

  • model_name_or_path (Literal[‘openai/clip-vit-base-patch16’, ‘openai/clip-vit-base-patch32’, ‘openai/clip-vit-large-patch14-336’, ‘openai/clip-vit-large-patch14’]) – String indicating the version of the CLIP model to use. Available models are “openai/clip-vit-base-patch16”, “openai/clip-vit-base-patch32”, “openai/clip-vit-large-patch14-336” and “openai/clip-vit-large-patch14”.

Raises
  • ModuleNotFoundError – If the transformers package is not installed or its version is lower than 4.10.0

  • ValueError – If not all images have format [C, H, W]

  • ValueError – If the number of images and captions do not match
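The two ValueError conditions above can be checked before calling the metric. The sketch below is a hypothetical pre-flight helper, not the library's own validation code: `check_clip_inputs` and its error messages are assumptions, and it works on shape tuples so it runs without torch. With real tensors, one would pass `tuple(img.shape)` for each image.

```python
from typing import List, Sequence, Tuple, Union


def check_clip_inputs(
    image_shapes: Sequence[Tuple[int, ...]], text: Union[str, List[str]]
) -> List[str]:
    """Validate per-image shapes and captions; return captions as a list.

    Hypothetical helper mirroring the documented error conditions.
    """
    # A single caption is treated as a one-element list.
    captions = [text] if isinstance(text, str) else list(text)
    # Every image must be three-dimensional: [C, H, W].
    if not all(len(shape) == 3 for shape in image_shapes):
        raise ValueError("Expected all images to have format [C, H, W]")
    # There must be exactly one caption per image.
    if len(image_shapes) != len(captions):
        raise ValueError(
            "Expected the number of images and captions to match, "
            f"got {len(image_shapes)} images and {len(captions)} captions"
        )
    return captions


captions = check_clip_inputs([(3, 224, 224)], "a photo of a cat")
print(captions)  # ['a photo of a cat']
```

Running the check up front surfaces a shape or count mismatch before any model weights are loaded.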

Example

>>> import torch
>>> _ = torch.manual_seed(42)
>>> from torchmetrics.functional.multimodal import clip_score
>>> score = clip_score(torch.randint(255, (3, 224, 224)), "a photo of a cat", "openai/clip-vit-base-patch16")
>>> print(score.detach())
tensor(24.4255)
Return type

Tensor
