Monitor Jobs
Track your DataChain job execution in real-time with Studio's monitoring interface.
Job Status Bar
At the top of the Studio interface, you'll see the current job status:
Status Display
- Workers: Shows active/total workers (e.g., "2 / 10 workers busy")
- Tasks: Displays running tasks count (e.g., "2 tasks")
- Execution Time: Shows how long the job has been running
Job States
- 🟡 QUEUED: Waiting in the execution queue
- 🔵 INIT: Setting up environment and dependencies
- 🟢 RUNNING: Actively processing data
- ✅ COMPLETE: Successfully finished
- ❌ FAILED: Encountered an error
- ⚫ CANCELED: Stopped by user
Real-time Logs
Logs Tab
Click the "Logs" tab to view real-time execution output:
Running job 7897833d-080c-464f-978b-59316886099a in cluster 'default'
Using cached virtualenv
Listing gs://datachain-demo: 269981 objects [00:16, 16568.50 objects/s]
Log Information
- Job ID and Cluster: Shows which cluster is running your job
- Environment Status: Indicates if using cached virtualenv or installing fresh
- Timestamped Entries: Real-time progress updates
- Error Messages: Stack traces for debugging failures
- Data Statistics: Files processed and rows handled
- Performance Metrics: Execution timing information
Dependencies Tab
View data lineage and dataset dependencies:
Dataset Lineage
The Dependencies tab shows a visual graph of data flow:
- Output Dataset: Your saved dataset (e.g.,
@[email protected]) - Shows version number
- Displays creator and timestamp
-
Indicates verification status
-
Source Storage: Connected storage sources (e.g.,
gs://datachain-demo/) - Shows storage path
- Displays who added the storage
-
Links to original data source
-
Data Flow: Visual arrows showing how data flows from source to output
This helps you: - Understand data lineage and provenance - Track which storages were used - Verify dataset versions - Debug data pipeline issues
Diagnostics Tab
View detailed job execution timeline and diagnostics:
Job Summary
At the top, see the overall job status:
- Execution Time: Total duration (hours:minutes:seconds)
- Status Icon: Checkmark for success, X for failure
Execution Details
Key job information:
- Started: Start timestamp with timezone (e.g.,
2025-10-18 07:48:27 GMT+5:45) - Finished: Completion timestamp
- Compute Cluster: Which cluster ran the job (e.g.,
default) - Job ID: Unique identifier for the job (e.g.,
7897833d-080c-464f-978b-59316886099a)
Execution Timeline
Detailed breakdown of each execution phase:
✓ Waiting in queue 2s
✓ Starting a worker 15s
✓ Initializing job 3s
✓ Installing dependencies 0s
✓ Waking up data warehouse 29s
✓ Running query 2m 35s
Each phase shows: - Checkmark: Indicates successful completion - Phase Name: What the system was doing - Duration: Time spent in that phase
Understanding Phase Durations
- Waiting in queue: Time before resources became available
- Starting a worker: Worker initialization and allocation
- Initializing job: Setting up job environment
- Installing dependencies: Installing Python packages from requirements.txt
- Waking up data warehouse: Activating data processing infrastructure
- Running query: Actual data processing time
This breakdown helps identify bottlenecks and optimize job performance.
Data Results
Data Tab
View processed results in the data table:
- Row Count: Shows processed rows (e.g., "20 of 270,345 rows")
- Columns: File paths, sizes, and metadata
- Sorting: Click column headers to sort
- Filtering: Use filters to find specific data
- Pagination: Navigate through large result sets
Files Tab
Browse processed files:
- File paths and names
- File sizes and types
- Metadata and attributes
- Quick preview capabilities
Job Controls
Stop Job
Click the stop button to cancel a running job: - Job will transition to CANCELING state - Current operations complete gracefully - Resources are cleaned up
Monitoring Job Progress
Progress Indicators
Track your job execution:
- Rows Processed: Current progress through dataset
- Processing Rate: Files or records per second
- Time Elapsed: How long the job has been running
- Estimated Completion: Projected finish time (when available)
Resource Usage
Monitor resource consumption:
- Workers Active: Number of parallel workers processing data
- Memory Usage: RAM consumption during processing
- Storage I/O: Data read/write operations
Troubleshooting
Common Issues
Job Stuck in QUEUED
- Check worker availability in status bar
- Verify team hasn't exceeded resource quotas
- Review job priority settings
INIT Failures
- Check requirements.txt for invalid packages
- Verify package versions are compatible
- Review error messages in Logs tab
RUNNING Failures
- Examine stack trace in Logs tab
- Verify storage credentials are valid
- Check storage paths are accessible
- Review error messages for specific issues
Storage Access Errors
- Verify credentials in account settings
- Check storage bucket permissions
- Ensure storage path exists
- Test storage connection separately
Debugging Workflow
- Check Diagnostics Tab: Review job completion status and execution timeline
- Identify Bottleneck: Look for phases with unusually long durations:
- Long "Starting a worker" time → Check cluster availability
- Long "Installing dependencies" → Review requirements.txt
- Long "Waking up data warehouse" → Contact support
- Long "Running query" → Optimize DataChain code
- Open Logs Tab: Look for error messages and stack traces
- Check Dependencies Tab: Verify data sources are connected correctly
- Test with Subset: Try with smaller data sample
- Contact Support: Provide Job ID from Diagnostics tab
Performance Optimization
Analyzing Execution Timeline
Use the Diagnostics tab to identify optimization opportunities:
Quick Queue Times (< 2m)
✓ Good - Your jobs are getting resources quickly
Long Worker Start (> 5m)
Possible causes: - High cluster demand - Cold start of compute resources
Slow Dependency Installation (> 3m)
Optimization tips: - Pin package versions in requirements.txt - Minimize number of dependencies
Extended Data Warehouse Wake (> 2m)
This is infrastructure initialization. If consistently slow: - Keep warehouse warm with regular jobs - Contact support for dedicated warehouse
Long Running Query Time
Optimize your DataChain code: - Filter data early to reduce volume - Use efficient DataChain operations - Increase worker count for large datasets - Batch operations appropriately
Monitoring Best Practices
- Compare Job Runs: Check Diagnostics across multiple runs to spot trends
- Track Phase Durations: Note which phases take longest
- Use Job ID: Reference Job ID when reporting issues
- Review Logs: Check for warnings about performance
Next Steps
- Set up webhook notifications for job status updates
- Configure team collaboration for shared job access
- Explore DataChain operations for optimization
- Review account settings for credentials