
DataChain as a data harness

Claude Code, Cursor, and Codex are harnesses for code: they give the LLM repo context, dedicated tools, and persistent memory across sessions. That scaffolding is what lets an agent work with code reliably instead of guessing at it.

DataChain provides the same structure for data. Your object storage and databases take the place of the agent's repo, the Knowledge Base takes the place of the harness's notes file, and Data Memory takes the place of git history as operational memory. Both halves of the harness, code-side and data-side, feed the same agent.

[Figure: DataChain mirrors the code harness, for data]

Mapping

| Code harness | Data harness |
| --- | --- |
| Repo files | Files in object storage, tables in databases |
| `~/.claude/` notes, `CLAUDE.md` | Knowledge Base (`dc-knowledge/`) |
| git history, type signatures | Data Memory: typed, versioned datasets with lineage |
| Tools (Read, Edit, Bash) | DataChain operations (`read_storage`, `map`, `save`) |
| Test results in the working directory | Saved datasets in `.datachain/db` |