Torch
Use pip install datachain[torch] and then import from datachain.torch to use the PyTorch functionality.
DataChain.to_pytorch converts a chain into a PyTorch Dataset for downstream tasks like model training or inference.
The classes and methods below help manipulate data from the chain for PyTorch.
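As a rough illustration of where to_pytorch fits, the sketch below wraps a chain in a standard PyTorch DataLoader. It assumes a DataChain release that provides DataChain.from_storage with the type="image" option and that torchvision is installed. The function name build_loader, its uri and batch_size arguments, and the 224x224 resize are illustrative choices, not part of the library.

```python
def build_loader(uri: str, batch_size: int = 16):
    """Sketch: stream images stored under `uri` through a PyTorch DataLoader.

    `build_loader` and its arguments are hypothetical names for illustration.
    """
    # Optional dependencies are imported lazily so this function can be
    # defined even when the torch extras are not installed.
    from torch.utils.data import DataLoader
    from torchvision import transforms

    from datachain import DataChain

    # Assumption: every image is resized to 224x224 and turned into a tensor.
    transform = transforms.Compose(
        [transforms.Resize((224, 224)), transforms.ToTensor()]
    )

    dataset = (
        DataChain.from_storage(uri, type="image")  # assumed entry point
        .select("file")
        .to_pytorch(transform=transform)
    )
    return DataLoader(dataset, batch_size=batch_size)
```

Calling build_loader with a storage URI of your own would return a loader that can be iterated in a training or inference loop like any other DataLoader.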
clip_similarity_scores
clip_similarity_scores(
    images: Union[None, Image, list[Image]],
    text: Union[None, str, list[str]],
    model: Any,
    preprocess: Callable,
    tokenizer: Callable,
    prob: bool = False,
    image_to_text: bool = True,
    device: Optional[Union[str, device]] = None,
) -> list[list[float]]
Calculate CLIP similarity scores between one or more images and/or text.
Parameters:
- images – Images to use as inputs.
- text – Text to use as inputs.
- model – Model from clip or open_clip packages.
- preprocess – Image preprocessor to apply.
- tokenizer – Text tokenizer.
- prob – Compute softmax probabilities.
- image_to_text – Whether to compute image-to-text (True) or text-to-image (False) scores. Ignored if only one of images or text is provided.
- device – Device to use. Defaults to None, which uses the model's device.
Example
Using https://github.com/openai/CLIP
>>> import clip
>>> from datachain.torch import clip_similarity_scores
>>> model, preprocess = clip.load("ViT-B/32")
>>> clip_similarity_scores(img, "cat", model, preprocess, clip.tokenize)
[[21.813]]
Using https://github.com/mlfoundations/open_clip
>>> import open_clip
>>> model, _, preprocess = open_clip.create_model_and_transforms(
...     "ViT-B-32", pretrained="laion2b_s34b_b79k"
... )
>>> tokenizer = open_clip.get_tokenizer("ViT-B-32")
>>> clip_similarity_scores(img, "cat", model, preprocess, tokenizer)
[[21.813]]
Using https://huggingface.co/docs/transformers/en/model_doc/clip
>>> from transformers import CLIPProcessor, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> clip_similarity_scores(
...     img, "cat", model, processor.image_processor, processor.tokenizer
... )
[[21.813]]
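List of images -> list of text
>>> clip_similarity_scores(
...     [img1, img2], ["cat", "dog"], model, preprocess, tokenizer
... )
[[21.813, 35.313], [83.123, 34.843]]
Other input combinations use the same call:
- Image -> list of text: pass one image and a list of strings.
- List of images -> text: pass a list of images and one string.
- List of images -> list of images: pass text=None.
- List of text -> list of text: pass images=None.
- Text -> list of images: pass image_to_text=False.
- Show scores as softmax probabilities: pass prob=True.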
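To make the prob and image_to_text options more concrete, here is a minimal pure-Python sketch of the usual CLIP scoring step. It assumes the scores are cosine similarities scaled by 100, as in the original CLIP examples; this is not DataChain's actual implementation, and sketch_scores, the scale argument, and the tiny embedding vectors are made up for illustration.

```python
import math


def _unit(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sketch_scores(image_emb, text_emb, prob=False, image_to_text=True, scale=100.0):
    """Toy stand-in for the scoring step; inputs are lists of embedding vectors.

    Assumption: scores are scaled cosine similarities. Hypothetical helper,
    not part of DataChain.
    """
    # image_to_text decides which side indexes the rows of the result.
    rows, cols = (image_emb, text_emb) if image_to_text else (text_emb, image_emb)
    rows = [_unit(r) for r in rows]
    cols = [_unit(c) for c in cols]
    scores = [[scale * sum(a * b for a, b in zip(r, c)) for c in cols] for r in rows]
    # prob=True turns each row into probabilities that sum to 1.
    return [_softmax(r) for r in scores] if prob else scores


images = [[1.0, 0.0], [0.0, 2.0]]  # two made-up "image" embeddings
texts = [[3.0, 0.0]]               # one made-up "text" embedding

print(sketch_scores(images, texts))                       # one row per image
print(sketch_scores(images, texts, image_to_text=False))  # one row per text
```

In this sketch, flipping image_to_text transposes the result, and prob=True normalizes along each row, so the choice of direction determines which candidates compete with each other in the softmax.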
Source code in datachain/lib/clip.py
convert_image
convert_image(
    img: Image,
    mode: str = "RGB",
    size: Optional[tuple[int, int]] = None,
    transform: Optional[Callable] = None,
    encoder: Optional[Callable] = None,
    device: Optional[Union[str, device]] = None,
) -> Union[Image, Tensor]
Resize, transform, and otherwise convert an image.
Parameters:
- img (Image) – PIL.Image object.
- mode (str, default: 'RGB') – PIL.Image mode.
- size (tuple[int, int], default: None) – Size in (width, height) pixels for resizing.
- transform (Callable, default: None) – Torchvision transform or Hugging Face processor to apply.
- encoder (Callable, default: None) – Encode image using model.
- device (str or device, default: None) – Device to use.
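A possible way to call convert_image is sketched below, using only the parameters documented above. It assumes torchvision is installed alongside the torch extras; to_model_input is a hypothetical helper name, and the 224x224 default size is an arbitrary choice for illustration.

```python
def to_model_input(img, size=(224, 224)):
    """Sketch: turn a PIL image into a tensor with convert_image.

    `to_model_input` is a hypothetical helper, not part of DataChain.
    """
    # Imported lazily because these are optional dependencies.
    from torchvision import transforms

    from datachain.torch import convert_image

    return convert_image(
        img,
        mode="RGB",                       # convert other PIL modes to 3 channels
        size=size,                        # (width, height), per the parameter docs
        transform=transforms.ToTensor(),  # PIL image -> float tensor
    )
```

Note that size is given as (width, height), while torchvision's ToTensor produces tensors laid out as (channels, height, width), so the two orderings differ for non-square sizes.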
Source code in datachain/lib/image.py
convert_images
convert_images(
    images: Union[Image, list[Image]],
    mode: str = "RGB",
    size: Optional[tuple[int, int]] = None,
    transform: Optional[Callable] = None,
    encoder: Optional[Callable] = None,
    device: Optional[Union[str, device]] = None,
) -> Union[list[Image], Tensor]
Resize, transform, and otherwise convert one or more images.
Parameters:
- images (Image or list[Image]) – PIL.Image object or list of objects.
- mode (str, default: 'RGB') – PIL.Image mode.
- size (tuple[int, int], default: None) – Size in (width, height) pixels for resizing.
- transform (Callable, default: None) – Torchvision transform or Hugging Face processor to apply.
- encoder (Callable, default: None) – Encode image using model.
- device (str or device, default: None) – Device to use.
Source code in datachain/lib/image.py
convert_text
convert_text(
    text: Union[str, list[str]],
    tokenizer: Optional[Callable] = None,
    tokenizer_kwargs: Optional[dict[str, Any]] = None,
    encoder: Optional[Callable] = None,
    device: Optional[Union[str, device]] = None,
) -> Union[str, list[str], Tensor]
Tokenize and otherwise transform text.
Parameters:
- text (str or list[str]) – Text to convert.
- tokenizer (Callable, default: None) – Tokenizer to apply to the text.
- tokenizer_kwargs (dict, default: None) – Additional kwargs to pass when calling tokenizer.
- encoder (Callable, default: None) – Encode text using model.
- device (str or device, default: None) – Device to use.
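The optional stages in convert_text can be pictured with the toy outline below. It is a sketch under assumptions, not DataChain's code: it assumes the text is returned unchanged when no tokenizer is given (consistent with the str and list[str] return types) and that the encoder runs on the tokenizer's output. The device step is left out, and sketch_convert_text and toy_tokenizer are hypothetical names.

```python
def sketch_convert_text(text, tokenizer=None, tokenizer_kwargs=None, encoder=None):
    """Toy outline of an optional tokenize-then-encode flow (not DataChain's code)."""
    if tokenizer is None:
        return text  # nothing to apply, so the text comes back unchanged
    tokens = tokenizer(text, **(tokenizer_kwargs or {}))
    return encoder(tokens) if encoder is not None else tokens


def toy_tokenizer(text, max_length=4):
    """Hypothetical whitespace tokenizer that truncates to max_length tokens."""
    batch = [text] if isinstance(text, str) else text
    return [s.lower().split()[:max_length] for s in batch]


print(sketch_convert_text("A photo of a cat"))
print(sketch_convert_text("A photo of a cat", toy_tokenizer, {"max_length": 3}))
```

With a real tokenizer, tokenizer_kwargs would typically carry options such as padding or truncation settings; which keys are valid depends on the tokenizer being used.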