File
File is a special DataModel,
which is automatically generated when a DataChain is created from files,
such as in dc.read_storage:
import datachain as dc
chain = dc.read_storage("gs://datachain-demo/dogs-and-cats")
chain.print_schema()
Output:
file: File@v1
    source: str
    path: str
    size: int
    version: str
    etag: str
    is_latest: bool
    last_modified: datetime
    location: Union[dict, list[dict], NoneType]
File classes include various metadata fields describing the underlying file,
along with methods to read and manipulate file contents.
File
Bases: DataModel
DataModel for reading binary files.
Attributes:
- source (str): The source of the file (e.g., 's3://bucket-name/').
- path (str): The path to the file (e.g., 'path/to/file.txt').
- size (int): The size of the file in bytes. Defaults to 0.
- version (str): The version of the file. Defaults to an empty string.
- etag (str): The ETag of the file. Defaults to an empty string.
- is_latest (bool): Whether the file is the latest version. Defaults to True.
- last_modified (datetime): The last modified timestamp of the file. Defaults to Unix epoch (1970-01-01T00:00:00).
- location (dict | list[dict]): The location of the file. Defaults to None.
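The fields and defaults listed above can be mirrored in a small standalone dataclass. This is only an illustrative sketch: FileRecordSketch is a hypothetical name and is not the library's File class, source and path are treated as required because no default is documented for them, and the UTC timezone on the epoch default is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Union

# Assumption: the documented epoch default is timezone-aware (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class FileRecordSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""

    source: str                      # e.g. 's3://bucket-name/'
    path: str                        # e.g. 'path/to/file.txt'
    size: int = 0                    # size in bytes
    version: str = ""
    etag: str = ""
    is_latest: bool = True
    last_modified: datetime = EPOCH
    location: Optional[Union[dict, List[dict]]] = None


record = FileRecordSketch(source="s3://bucket-name/", path="path/to/file.txt")
print(record)
```

Only source and path are passed here; every other field falls back to the default described in the attribute list.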
Source code in datachain/lib/file.py
as_audio_file
Convert the file to an AudioFile object.
as_image_file
as_image_file() -> ImageFile
Convert the file to an ImageFile object.
as_text_file
as_text_file() -> TextFile
Convert the file to a TextFile object.
as_video_file
as_video_file() -> VideoFile
Convert the file to a VideoFile object.
at
classmethod
Construct a File from a full URI in one call.
Example
file = File.at("s3://bucket/path/to/output.png")
with file.open("wb") as f:
    ...
export
export(
output: str | PathLike[str],
placement: ExportPlacement = "fullpath",
use_cache: bool = True,
link_type: Literal["copy", "symlink"] = "copy",
client_config: dict | None = None,
) -> None
Export file to new location.
get_destination_path
Returns the full destination path of the file when exporting to the given output, based on the export placement.
get_file_ext
get_file_stem
get_file_suffix
get_fs
get_fs_path
get_fs_path() -> str
Returns the file path with respect to the file scheme.
If normalize is True, the path is normalized to remove any redundant
separators and up-level references.
If the file scheme is "file", the path is converted to a local file path
using url2pathname. Otherwise, the original path with scheme is returned.
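The scheme handling described above can be approximated with the standard library alone. The sketch below is not the library's implementation: fs_path_sketch is a hypothetical helper, and both the way source and path are joined and the normalize keyword are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import url2pathname


def fs_path_sketch(source: str, path: str, normalize: bool = False) -> str:
    """Hypothetical helper mirroring the behaviour described in the text."""
    if normalize:
        # Collapse redundant separators and up-level references in the path.
        path = posixpath.normpath(path)
    # Assumption: source and path are joined with a single slash.
    full = f"{source.rstrip('/')}/{path}"
    parsed = urlparse(full)
    if parsed.scheme == "file":
        # Local files: convert the URL path into an OS-native file path.
        return url2pathname(parsed.path)
    # Any other scheme: keep the path together with its scheme.
    return full


print(fs_path_sketch("s3://bucket", "dir//sub/../a.txt", normalize=True))
print(fs_path_sketch("file:///tmp/data", "a.txt"))
```

The first call keeps the s3:// prefix and only normalizes the path part; the second drops the file scheme and yields a local path whose exact form depends on the operating system.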
get_full_name
[DEPRECATED] Use file.path directly instead.
Returns name with parent directories.
get_local_path
get_local_path() -> str | None
Return path to a file in a local cache.
Returns None if the file is not cached. Raises an exception if the cache is not set up.
get_uri
get_uri() -> str
open
open(
mode: str = "rb",
*,
client_config: dict[str, Any] | None = None,
**open_kwargs
) -> Iterator[Any]
Open the file and return a file-like object.
Supports both read ("rb", "r") and write modes (e.g. "wb", "w", "ab"). When opened in a write mode, metadata is refreshed after closing.
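The "refresh metadata after a write" behaviour can be pictured with a local-file analogy. The sketch below is not how the library does it: LocalFileSketch is a hypothetical class that works only on local paths and refreshes just the size field, but it shows the general pattern of a context-manager open() that updates metadata once a write-mode handle is closed.

```python
import os
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class LocalFileSketch:
    """Hypothetical local-only stand-in; tracks just the file size."""

    path: str
    size: int = 0

    @contextmanager
    def open(self, mode: str = "rb", **open_kwargs):
        # The inner call is the builtin open(), not this method.
        with open(self.path, mode, **open_kwargs) as handle:
            yield handle
        # The handle is closed here; refresh metadata for write-like modes.
        if any(flag in mode for flag in "wax+"):
            self.size = os.path.getsize(self.path)


tmp_dir = tempfile.mkdtemp()
sketch = LocalFileSketch(path=os.path.join(tmp_dir, "output.bin"))

with sketch.open("wb") as f:
    f.write(b"hello")
size_after_write = sketch.size

with sketch.open("rb") as f:
    content = f.read()
```

After the write block exits, size reflects the bytes on disk; opening the file in a read mode leaves the metadata untouched.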
read
read(length: int = -1)
read_bytes
read_bytes(length: int = -1)
read_text
Return file contents decoded as text.
Parameters:
- **open_kwargs (Any): Extra keyword arguments forwarded to open(mode="r", ...), e.g. encoding="utf-8", errors="ignore".
rebase
Rebase the file's URI from one base directory to another.
Parameters:
- old_base (str): Base directory to remove from the file's URI
- new_base (str): New base directory to prepend
- suffix (str, default: ''): Optional suffix to add before the file extension
- extension (str, default: ''): Optional new file extension (without dot)
Returns:
- str: Rebased URI with new base directory
Raises:
- ValueError: If old_base is not found in the file's URI
Examples:
>>> file = File(source="s3://bucket", path="data/2025-05-27/file.wav")
>>> file.rebase("s3://bucket/data", "s3://output-bucket/processed", extension="mp3")
's3://output-bucket/processed/2025-05-27/file.mp3'
>>> file.rebase("data/audio", "/local/output", suffix="_ch1", extension="npy")
'/local/output/file_ch1.npy'
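The string manipulation behind a rebase of this kind can be sketched without the library. rebase_sketch below is a hypothetical standalone function that works on a plain URI string; it follows the parameter descriptions above but is an assumption about the logic, not the library's code.

```python
import posixpath


def rebase_sketch(uri: str, old_base: str, new_base: str,
                  suffix: str = "", extension: str = "") -> str:
    """Hypothetical re-implementation of the documented rebase behaviour."""
    old = old_base.rstrip("/")
    idx = uri.find(old)
    if idx == -1:
        raise ValueError(f"{old_base!r} not found in {uri!r}")
    # Keep everything after the old base, relative to it.
    relative = uri[idx + len(old):].lstrip("/")
    directory, name = posixpath.split(relative)
    stem, ext = posixpath.splitext(name)
    # Replace the extension only when a new one is given (no leading dot).
    new_ext = f".{extension}" if extension else ext
    return posixpath.join(new_base.rstrip("/"), directory, f"{stem}{suffix}{new_ext}")


print(rebase_sketch(
    "s3://bucket/data/2025-05-27/file.wav",
    "s3://bucket/data",
    "s3://output-bucket/processed",
    extension="mp3",
))
```

The call reproduces the first documented example; a URI that does not contain old_base raises ValueError, matching the Raises entry.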
resolve
Resolve a File object by checking its existence and updating its metadata.
Returns:
- File (Self): The resolved File object with updated metadata.
save
Writes its content to the destination.
FileError
TarVFile
Bases: VFile
Virtual file model for files extracted from tar archives.
open
classmethod
Stream file from tar archive based on location in archive.
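Streaming a member from its location in an archive can be illustrated with the standard tarfile module. Everything in this sketch is an assumption made for illustration: the helper names, the in-memory archive, and the "offset"/"size" keys of the location dict are hypothetical and may not match what the library stores.

```python
import io
import tarfile


def build_tar(files: dict) -> bytes:
    """Build an uncompressed in-memory tar archive from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def index_locations(archive: bytes) -> dict:
    """Record where each member's data starts and how long it is."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {
            member.name: {"offset": member.offset_data, "size": member.size}
            for member in tar.getmembers()
            if member.isfile()
        }


def stream_member(archive: bytes, location: dict, chunk_size: int = 4096):
    """Yield the member's bytes in chunks, using only its offset and size."""
    stream = io.BytesIO(archive)
    stream.seek(location["offset"])
    remaining = location["size"]
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
        yield chunk


archive = build_tar({"a.txt": b"hello", "b.bin": b"x" * 1000})
locations = index_locations(archive)
print(b"".join(stream_member(archive, locations["a.txt"])))
```

Because the archive is uncompressed, a member's data can be read directly from its byte range without unpacking the rest of the archive.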