API Reference

DataChain's API is organized into several modules:

  • DataChain - Core chain operations and dataset management
  • Data Types - Supported data types and schema definitions
    • File - File handling and storage operations
    • TextFile - Text file
    • ImageFile - Image file
    • VideoFile - Video file
    • TarVFile - Virtual file model for files extracted from tar archives
    • ArrowRow - Working with Arrow-supported files
    • BBox - Bounding box data type
    • Pose - Pose data type
    • Segment - Segment data type
  • UDF - User-defined functions and transformations
  • Functions - Built-in functions for data manipulation and analysis
  • Torch - PyTorch data loading utilities
  • Toolkit - Functions for common DS/ML operations