ZarrStore
ZarrStore is a DataModel that
points at a Zarr store root and provides methods for
inspecting and reading its arrays.
Unlike File, which represents a single byte stream, a Zarr store is
a tree of objects (metadata and chunks). ZarrStore rows are created when a
DataChain is initialized from Zarr stores with read_zarr(),
which collapses every object under a store root into a single row:
import datachain as dc

chain = dc.read_zarr("s3://bucket-name/data/")
for (store,) in chain.limit(1).to_iter("zarr"):
    print(store.get_info())
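The collapse into one row per store can be pictured with a small, self-contained sketch. It does not use DataChain and is not its implementation: `collapse_by_store_root` is a hypothetical helper, the object keys are made up, and treating the outermost directory that holds `zarr.json` (Zarr v3) or `.zgroup`/`.zarray` (Zarr v2) as the store root is an assumption about how roots might be detected.

```python
from collections import defaultdict
from posixpath import dirname

# Metadata file names defined by the Zarr v2 and v3 storage formats.
METADATA_NAMES = {".zgroup", ".zarray", "zarr.json"}


def collapse_by_store_root(keys):
    """Group object keys by the outermost directory that holds Zarr metadata.

    Hypothetical helper for illustration only.
    """
    # Every directory that directly contains a metadata file.
    marked = {dirname(k) for k in keys if k.rsplit("/", 1)[-1] in METADATA_NAMES}

    def root_of(key):
        parts = key.split("/")
        # Walk from the shortest prefix so nested groups fold into the outer store.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in marked:
                return prefix
        return None  # key is not inside any detected store

    rows = defaultdict(list)
    for key in keys:
        root = root_of(key)
        if root is not None:
            rows[root].append(key)
    return dict(rows)


# Made-up listing: one v3-style store and one v2-style store.
keys = [
    "data/a.zarr/zarr.json",
    "data/a.zarr/rgb/zarr.json",
    "data/a.zarr/rgb/c/0/0/0/0",
    "data/b.zarr/.zgroup",
    "data/b.zarr/depth/.zarray",
    "data/b.zarr/depth/0.0.0",
]
rows = collapse_by_store_root(keys)
for root, members in rows.items():
    print(root, len(members))
```

Six objects become two rows, one per store root, which is the shape of result the chain above iterates over.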
There are additional models for working with Zarr stores:
- ZarrInfo - summary metadata for a store (format, array paths, attributes).
- ZarrArray - a single array within a store; exposes shape, chunks, dtype, and attrs, reads data via read(), and creates lazy selections via select().
- ZarrSelection - a lazy, bounded region inside an array (e.g. one image frame) that can travel through a chain as a column and is materialized on demand via read() or rendered to image bytes via read_bytes().
For a complete example of Zarr processing with DataChain, see Embedding Zarr image frames - a pipeline that reads RGB camera frames from a directory of Zarr stores and encodes them with OpenCLIP.
ZarrStore
Bases: DataModel
A Zarr store root.
Unlike File (datachain.lib.file.File), a store is a tree of objects
(metadata and chunks) rather than a single byte stream, so it is modeled as
a plain DataModel. The nested file points at the store root prefix
and carries the storage credentials/catalog needed to read the store.
get_array
Return a single array by its path within the store.
Source code in datachain/lib/zarr.py
get_arrays
ZarrArray
Bases: DataModel
A single array within a ZarrStore.
read
Read array data, optionally restricted to a NumPy-style selection.
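As a rough illustration of what a NumPy-style selection conventionally means, the sketch below computes the shape that basic indexing (integers, slices, and a single Ellipsis) would produce. `selected_shape` is a hypothetical helper written with the standard library only; the source does not state which selection forms `read()` accepts, so this shows the general convention rather than DataChain's behavior.

```python
def selected_shape(shape, selection):
    """Shape produced by NumPy-style basic indexing (ints, slices, Ellipsis).

    Hypothetical helper for illustration only.
    """
    if not isinstance(selection, tuple):
        selection = (selection,)

    explicit = sum(1 for s in selection if s is not Ellipsis)
    if explicit > len(shape):
        raise IndexError("too many indices for array")

    # Expand a single Ellipsis into the full slices it stands for.
    expanded = []
    seen_ellipsis = False
    for s in selection:
        if s is Ellipsis:
            if seen_ellipsis:
                raise IndexError("only one Ellipsis is allowed")
            seen_ellipsis = True
            expanded.extend([slice(None)] * (len(shape) - explicit))
        else:
            expanded.append(s)

    # Axes not mentioned at the end are taken in full.
    expanded.extend([slice(None)] * (len(shape) - len(expanded)))

    out = []
    for size, s in zip(shape, expanded):
        if isinstance(s, slice):
            # slice.indices clamps start/stop/step to the axis length.
            out.append(len(range(*s.indices(size))))
        elif isinstance(s, int):
            if not -size <= s < size:
                raise IndexError("index out of bounds")
            # An integer index drops its axis from the result.
        else:
            raise TypeError(f"unsupported selection element: {s!r}")
    return tuple(out)


frames = (100, 480, 640, 3)  # made-up (N, H, W, C) array shape
print(selected_shape(frames, 0))
print(selected_shape(frames, (slice(0, 10), Ellipsis, 0)))
print(selected_shape(frames, (slice(None), slice(0, 480, 2))))
```

An integer removes an axis while a slice keeps it, which is why selecting one frame of an (N, H, W, C) array yields an (H, W, C) result.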
select
select(
index: int | list[int],
media: Literal["image", "audio", "video"] | None = None,
) -> ZarrSelection
Return a lazy ZarrSelection pointing at an item in this array.
index addresses the leading axes (e.g. i or [i] for one
frame of an (N, H, W, C) array). The region is read on demand via
ZarrSelection.read(), so the item can travel through a DataChain
as a column without materializing its bytes.
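To make "index addresses the leading axes" concrete, here is a minimal sketch of how such an index could be expanded into a full selection and the shape of the region it points at. `leading_index_to_region` is a hypothetical helper, not part of DataChain, and its rejection of negative indices is an assumption of this sketch.

```python
def leading_index_to_region(shape, index):
    """Expand a leading-axes index into a full selection tuple and region shape.

    Hypothetical helper for illustration only.
    """
    idx = [index] if isinstance(index, int) else list(index)
    if len(idx) > len(shape):
        raise IndexError("index has more entries than the array has axes")

    for axis, (i, size) in enumerate(zip(idx, shape)):
        # Assumption of this sketch: only non-negative, in-range indices.
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for axis {axis}")

    # Fixed positions on the leading axes, full slices on the remaining ones.
    selection = tuple(idx) + (slice(None),) * (len(shape) - len(idx))
    region_shape = tuple(shape[len(idx):])
    return selection, region_shape


shape = (100, 480, 640, 3)  # made-up (N, H, W, C) array
print(leading_index_to_region(shape, 7))
print(leading_index_to_region(shape, [7]))
print(leading_index_to_region((4, 10, 64, 64), [2, 5]))
```

Under these assumptions, `7` and `[7]` address the same frame, and a two-element index fixes the first two axes.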
ZarrSelection
Bases: DataModel
A lazy, bounded region inside a ZarrArray.
Points at a single item (or block) inside an array without reading it,
analogous to how File (datachain.lib.file.File) points at a byte stream.
index addresses the leading axes; read() materializes the region.
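The lazy-pointer idea can be sketched without DataChain. `LazyRegion` and `fake_loader` below are hypothetical stand-ins, and the store URI is made up. Unlike the `read()` documented here, which takes no arguments, the stand-in receives its loader explicitly so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class LazyRegion:
    """Hypothetical stand-in for a lazy selection; not DataChain's ZarrSelection."""

    store_uri: str
    array_path: str
    index: Tuple[int, ...]

    def read(self, loader: Callable[[str, str, Tuple[int, ...]], Any]) -> Any:
        # Data is fetched only here, when the caller asks for it.
        return loader(self.store_uri, self.array_path, self.index)


calls = []


def fake_loader(store_uri, array_path, index):
    """Record the request and return a placeholder 2x4 'frame'."""
    calls.append((store_uri, array_path, index))
    return [[0] * 4 for _ in range(2)]


# Building pointers is cheap: nothing is read at this point.
pointers = [LazyRegion("s3://bucket-name/data/a.zarr", "rgb", (i,)) for i in range(3)]
reads_before = len(calls)

# Only the pointer that is actually needed triggers a read.
frame = pointers[1].read(fake_loader)
print(reads_before, len(calls))
```

Each pointer carries only a location and an index, which is what lets it travel as a column value while the pixel data stays in storage until requested.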
read
read() -> Any
read_bytes
Render the selected region to encoded media bytes.
Only media="image" is supported for now: the region is read and
encoded with Pillow (e.g. PNG), so callers such as Studio can stream a
preview without materializing the image into the row.