Jobs

DataChain Studio allows you to run DataChain scripts directly in the cloud, processing data from connected storage. Write your code in the Studio editor and execute it with configurable compute resources.

Key Features

  • Create and Run - Write and execute DataChain scripts in Studio
  • Monitor Jobs - Track job progress, view logs, and analyze results

How Jobs Work

Jobs in DataChain Studio let you execute data processing workflows:

Direct Script Execution

  • Write DataChain code directly in the Studio editor
  • Execute scripts against connected storage (S3, GCS, Azure)
  • Results are saved automatically as named datasets (see the sketch below)
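
For example, a minimal job script might read file metadata from a connected bucket, filter it, and save the result as a named dataset. This is a sketch using the open-source DataChain Python API; the bucket URI and dataset name are placeholders:

    from datachain import C, DataChain

    # Read file metadata from a connected bucket (placeholder URI).
    chain = DataChain.from_storage("s3://my-bucket/images/")

    # Keep only JPEG files under the prefix.
    chain = chain.filter(C("file.path").glob("*.jpg"))

    # Save the result as a named dataset in Studio.
    chain.save("jpeg-images")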

Configurable Compute

  • Select Python version for your environment
  • Configure number of workers for parallel processing
  • Set job priority for queue management
  • Specify custom requirements and environment variables (see the sketch below)
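
Environment variables defined in the job configuration are available to the script at runtime through the standard library. A minimal sketch; MY_API_TOKEN is a hypothetical name standing in for whatever variables you configure:

    import os

    # Variables set in the job configuration appear in the process
    # environment; MY_API_TOKEN is a placeholder name.
    token = os.environ.get("MY_API_TOKEN")
    if token is None:
        raise RuntimeError("MY_API_TOKEN is not set for this job")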

Job Lifecycle

1. Write Script

Write your DataChain code in the Studio editor using connected storage sources.

2. Configure Settings

Set Python version, workers, priority, and any required dependencies or environment variables.
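
Within a script, DataChain also exposes a settings() method for tuning how user-defined functions run. The sketch below assumes a parallel parameter for local multiprocessing; exact parameters may differ by library version, and the worker count from the job configuration governs distributed execution:

    from datachain import DataChain, File

    def size_kb(file: File) -> float:
        # Toy per-file computation to illustrate a UDF.
        return file.size / 1024

    chain = (
        DataChain.from_storage("s3://my-bucket/images/")  # placeholder URI
        .settings(parallel=4)  # assumed parameter: run the UDF in 4 processes
        .map(size_kb=size_kb)  # output column type inferred from annotation
        .save("image-sizes")
    )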

3. Execute

Submit the job to run on Studio's compute infrastructure with your specified configuration.

4. Monitor

View real-time logs, progress, and results as your job executes.
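
The log view includes your script's standard output, so plain print() calls are a simple way to surface progress from within a job:

    # Progress lines written to stdout appear in the Studio log view.
    for batch in range(1, 4):
        print(f"processing batch {batch}/3", flush=True)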

5. Review Results

Access processed data through the Studio interface, with datasets saved automatically.
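
A saved dataset can also be loaded back by name in a later script. A sketch assuming the DataChain API, where "jpeg-images" refers to the dataset saved in the earlier example:

    from datachain import DataChain

    # Load a previously saved dataset by name and preview a few rows.
    chain = DataChain.from_dataset("jpeg-images")
    chain.limit(5).show()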

Job States

  • QUEUED: Job is waiting in the execution queue
  • INIT: Job environment is being initialized
  • RUNNING: Job is actively processing data
  • COMPLETE: Job finished successfully
  • FAILED: Job encountered an error
  • CANCELED: Job was stopped by the user

Getting Started

  1. Connect your storage sources (S3, GCS, Azure)
  2. Write DataChain code in the Studio editor
  3. Configure job settings (Python version, workers, priority)
  4. Run your job and monitor execution
  5. View results in the data table
