Torch
Use pip install datachain[torch] and then import from datachain.torch to use the PyTorch functionality.
DataChain.to_pytorch converts a chain into a PyTorch Dataset for downstream tasks such as model training or inference.
The classes and methods below help manipulate data from the chain for PyTorch.
clip_similarity_scores
clip_similarity_scores(
images: Union[None, Image, list[Image]],
text: Union[None, str, list[str]],
model: Any,
preprocess: Callable,
tokenizer: Callable,
prob: bool = False,
image_to_text: bool = True,
device: Optional[Union[str, device]] = None,
) -> list[list[float]]
Calculate CLIP similarity scores between one or more images and/or text.
Parameters:
- images – Images to use as inputs.
- text – Text to use as inputs.
- model – Model from the clip or open_clip packages.
- preprocess – Image preprocessor to apply.
- tokenizer – Text tokenizer.
- prob – Compute softmax probabilities.
- image_to_text – Whether to compute image-to-text or text-to-image scores. Ignored if only one of images or text is provided.
- device – Device to use. Defaults to None, in which case the model's device is used.
Example
Using https://github.com/openai/CLIP
>>> import clip
>>> model, preprocess = clip.load("ViT-B/32")
>>> clip_similarity_scores(img, "cat", model, preprocess, clip.tokenize)
[[21.813]]
Using https://github.com/mlfoundations/open_clip
>>> import open_clip
>>> model, _, preprocess = open_clip.create_model_and_transforms(
... "ViT-B-32", pretrained="laion2b_s34b_b79k"
... )
>>> tokenizer = open_clip.get_tokenizer("ViT-B-32")
>>> clip_similarity_scores(img, "cat", model, preprocess, tokenizer)
[[21.813]]
Using https://huggingface.co/docs/transformers/en/model_doc/clip
>>> from transformers import CLIPProcessor, CLIPModel
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> clip_similarity_scores(
... img, "cat", model, processor.image_processor, processor.tokenizer
... )
[[21.813]]
List of images -> list of text
>>> clip_similarity_scores(
... [img1, img2], ["cat", "dog"], model, preprocess, tokenizer
... )
[[21.813, 35.313], [83.123, 34.843]]
Other input combinations follow the same call pattern:
- Image -> list of text
- List of images -> text
- List of images -> list of images (text is None)
- List of text -> list of text (images is None)
- Text -> list of images (image_to_text=False)
- Show scores as softmax probabilities (prob=True)
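To make the shape of the returned list[list[float]] concrete, here is a minimal pure-Python sketch of how prob and image_to_text change the score matrix. It is not datachain's implementation: toy_similarity_scores, the hand-written embeddings, and the scale of 100 are assumptions for illustration (the scale mirrors the logit scaling commonly used with CLIP), and softmax is assumed to be applied per row.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_similarity_scores(image_embs, text_embs, prob=False,
                          image_to_text=True, scale=100.0):
    """Hypothetical stand-in: one row per query, one column per candidate."""
    # image_to_text=True: rows are images, columns are texts (and vice versa).
    rows, cols = (image_embs, text_embs) if image_to_text else (text_embs, image_embs)
    scores = [[scale * cosine(r, c) for c in cols] for r in rows]
    if prob:
        # Each row becomes a probability distribution over the columns.
        scores = [softmax(row) for row in scores]
    return scores


# Two made-up image embeddings and three made-up text embeddings.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

scores = toy_similarity_scores(image_embs, text_embs)            # 2 rows x 3 columns
probs = toy_similarity_scores(image_embs, text_embs, prob=True)  # each row sums to 1
flipped = toy_similarity_scores(image_embs, text_embs, image_to_text=False)  # 3 x 2
```

With two images and three texts, the image-to-text result has two rows of three scores; passing image_to_text=False swaps the roles, giving three rows of two scores.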
Source code in datachain/lib/clip.py
convert_image
convert_image(
img: Image,
mode: str = "RGB",
size: Optional[tuple[int, int]] = None,
transform: Optional[Callable] = None,
encoder: Optional[Callable] = None,
device: Optional[Union[str, device]] = None,
) -> Union[Image, Tensor]
Resize, transform, and otherwise convert an image.
Parameters:
- img (Image) – PIL.Image object.
- mode (str, default: 'RGB') – PIL.Image mode.
- size (tuple[int, int], default: None) – Size in (width, height) pixels for resizing.
- transform (Callable, default: None) – Torchvision transform or huggingface processor to apply.
- encoder (Callable, default: None) – Encode image using model.
- device (str or device, default: None) – Device to use.
Source code in datachain/lib/image.py
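A common stumbling block with the size parameter is axis order: PIL sizes are (width, height), while a tensor produced by a torchvision ToTensor-style transform is laid out as (channels, height, width). The sketch below only illustrates that convention; expected_tensor_shape and MODE_CHANNELS are hypothetical helpers, not part of datachain, and the result assumes the transform behaves like ToTensor.

```python
# Number of bands for a few common PIL modes.
MODE_CHANNELS = {"1": 1, "L": 1, "P": 1, "RGB": 3, "RGBA": 4, "CMYK": 4}


def expected_tensor_shape(mode, size):
    """Return the (C, H, W) shape a ToTensor-style transform would yield.

    `size` follows the PIL convention (width, height), as in convert_image.
    """
    width, height = size
    return (MODE_CHANNELS[mode], height, width)


rgb_shape = expected_tensor_shape("RGB", (640, 480))  # (3, 480, 640)
gray_shape = expected_tensor_shape("L", (224, 224))   # (1, 224, 224)
```

Note how the 640-pixel width ends up in the last position of the shape, not the second.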
convert_images
convert_images(
images: Union[Image, list[Image]],
mode: str = "RGB",
size: Optional[tuple[int, int]] = None,
transform: Optional[Callable] = None,
encoder: Optional[Callable] = None,
device: Optional[Union[str, device]] = None,
) -> Union[list[Image], Tensor]
Resize, transform, and otherwise convert one or more images.
Parameters:
- images (Image or list[Image]) – PIL.Image object or list of objects.
- mode (str, default: 'RGB') – PIL.Image mode.
- size (tuple[int, int], default: None) – Size in (width, height) pixels for resizing.
- transform (Callable, default: None) – Torchvision transform or huggingface processor to apply.
- encoder (Callable, default: None) – Encode image using model.
- device (str or device, default: None) – Device to use.
Source code in datachain/lib/image.py
convert_text
convert_text(
text: Union[str, list[str]],
tokenizer: Optional[Callable] = None,
tokenizer_kwargs: Optional[dict[str, Any]] = None,
encoder: Optional[Callable] = None,
device: Optional[Union[str, device]] = None,
) -> Union[str, list[str], Tensor]
Tokenize and otherwise transform text.
Parameters:
- text (str or list[str]) – Text to convert.
- tokenizer (Callable, default: None) – Tokenizer to use to tokenize objects.
- tokenizer_kwargs (dict, default: None) – Additional kwargs to pass when calling tokenizer.
- encoder (Callable, default: None) – Encode text using model.
- device (str or device, default: None) – Device to use.
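The interplay of tokenizer, tokenizer_kwargs, and encoder can be sketched without any model. The code below is a toy imitation under stated assumptions, not datachain's implementation: toy_tokenizer and toy_convert_text are hypothetical, tensors are replaced by plain lists, and device handling is left out. It assumes that text is returned unchanged when no tokenizer is given, which is what the Union[str, list[str], Tensor] return type suggests.

```python
def toy_tokenizer(texts, max_length=4, pad_id=0):
    """Hypothetical tokenizer: each word becomes its length, padded to max_length."""
    batch = []
    for item in texts:
        ids = [len(word) for word in item.split()][:max_length]
        batch.append(ids + [pad_id] * (max_length - len(ids)))
    return batch


def toy_convert_text(text, tokenizer=None, tokenizer_kwargs=None, encoder=None):
    """Toy imitation of the tokenize-then-encode flow (assumed, not the real code)."""
    if tokenizer is None:
        return text  # nothing to do: the text passes through unchanged
    texts = [text] if isinstance(text, str) else list(text)
    # tokenizer_kwargs are forwarded as keyword arguments to the tokenizer call.
    tokens = tokenizer(texts, **(tokenizer_kwargs or {}))
    return encoder(tokens) if encoder is not None else tokens


raw = toy_convert_text("a small cat")
tokens = toy_convert_text(
    "a small cat", tokenizer=toy_tokenizer, tokenizer_kwargs={"max_length": 5}
)
# A stand-in "encoder" that sums the token ids of each row.
encoded = toy_convert_text(
    ["a small cat", "dog"],
    tokenizer=toy_tokenizer,
    encoder=lambda batch: [sum(row) for row in batch],
)
```

Here raw is the original string, tokens is a single padded row of five ids, and encoded holds one value per input text.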