AWS AMI Installation
Prerequisites
DataChain Studio Images
Before installing, you need the DataChain Studio machine image (AMI) and access to the DataChain Studio Docker images. Both are provided by the DataChain team.
DNS
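Create a DNS record (for example, an A record) that points to the IP address of the EC2 instance. DataChain Studio will be served at this hostname. If you point the record at the instance's public IP address, consider attaching an Elastic IP, since an auto-assigned public IP changes when the instance is stopped and started.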
Installation
- Open the AWS Console
- Navigate to EC2 -> Instances
- Click Launch instances
- Provide a name for your EC2 instance
- Select datachain-studio-selfhosted from the AMI catalog
- Select an appropriate instance type:
  - Minimum requirements: 16 GB RAM, 4 vCPUs
  - Recommended requirements: 32 GB RAM, 8 vCPUs
- To enable SSH connections to the instance, select an existing key pair or create a new one. We recommend ED25519 keys.
- In the network settings, use the default VPC or select another one. Under the Firewall setting, create a new security group that allows SSH, HTTP, and HTTPS, or select an existing one with the same access.
Warning
Make sure your VPC has connectivity to your Git forge provider (GitHub.com, GitLab.com, Bitbucket.org) and your storage provider (S3, GCS, etc.) so that DataChain Studio can access these resources.
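A quick way to confirm this connectivity is to probe each provider from the instance once it is running. The sketch below assumes `curl` is installed; the endpoint list is illustrative (the public GitHub, GitLab, Bitbucket, S3, and GCS hosts), so substitute your own forge URL, for example a self-hosted GitLab, and any bucket-specific endpoints.

```shell
#!/usr/bin/env bash
# Reachability check for the Git forge and storage endpoints that
# DataChain Studio needs. Run it on the EC2 instance (or another host in
# the same subnet). The endpoint list is an example; adjust it.

# Succeeds if an HTTPS request to the URL gets any HTTP response.
# Without --fail, curl exits 0 even for 4xx/5xx replies, which is what we
# want here: any response proves the network path works.
check_endpoint() {
  curl -s -o /dev/null --connect-timeout 5 --max-time 10 "$1"
}

endpoints=(
  https://github.com
  https://gitlab.com
  https://bitbucket.org
  https://s3.amazonaws.com
  https://storage.googleapis.com
)

for url in "${endpoints[@]}"; do
  if check_endpoint "$url"; then
    echo "reachable:   $url"
  else
    echo "UNREACHABLE: $url"
  fi
done
```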
- Configure storage:
  - Use at least 100 GB of EBS storage
  - Consider the gp3 volume type, which provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size
  - Enable EBS encryption
- Launch the instance
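The console steps above can also be scripted with the AWS CLI. This is a minimal sketch rather than an official procedure: the AMI ID, key pair name, security group ID, and root device name are hypothetical placeholders, and `m5.2xlarge` is just one instance type that meets the recommended 8 vCPUs and 32 GB RAM. It only prints the command unless you set `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: launch the DataChain Studio instance with the AWS CLI.
# All IDs below are hypothetical placeholders; replace them.
AMI_ID="${AMI_ID:-ami-0123456789abcdef0}"      # datachain-studio-selfhosted AMI
INSTANCE_TYPE="${INSTANCE_TYPE:-m5.2xlarge}"   # 8 vCPUs, 32 GiB RAM
KEY_NAME="${KEY_NAME:-datachain-studio}"       # existing key pair
SG_ID="${SG_ID:-sg-0123456789abcdef0}"         # allows SSH, HTTP, HTTPS
# Root device name of the AMI; look it up with:
#   aws ec2 describe-images --image-ids "$AMI_ID" \
#     --query 'Images[0].RootDeviceName' --output text
ROOT_DEVICE="${ROOT_DEVICE:-/dev/sda1}"

# Print the command (shell-quoted) instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%q ' "$@"; printf '\n'
  else
    "$@"
  fi
}

# Without --subnet-id, the instance goes into the default VPC's default
# subnet; add --subnet-id to use another VPC.
launch_studio_instance() {
  # 100 GiB encrypted gp3 root volume, as recommended above.
  local bdm
  bdm="[{\"DeviceName\":\"$ROOT_DEVICE\",\"Ebs\":{\"VolumeSize\":100,\"VolumeType\":\"gp3\",\"Encrypted\":true}}]"
  run aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --security-group-ids "$SG_ID" \
    --block-device-mappings "$bdm" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datachain-studio}]'
}

launch_studio_instance
```

To create an ED25519 key pair from the CLI instead of the console, `aws ec2 create-key-pair --key-name datachain-studio --key-type ed25519 --query KeyMaterial --output text > datachain-studio.pem` saves the private key locally; restrict it with `chmod 400` before using it with `ssh -i`.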
Configuration
Once the instance is running, you need to configure DataChain Studio:
Initial Setup
- SSH into the instance using the key pair you selected at launch.
- Navigate to the configuration directory, /opt/datachain-studio.
- Copy the example configuration file to config.yml.
- Edit config.yml.
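The steps above can be sketched as a small script. It assumes the configuration lives in /opt/datachain-studio (the path shown under Getting Help); the example file name `config.example.yml` and the SSH login user are hypothetical, so confirm the actual names with `ls` and with the DataChain team. The script never overwrites an existing config.yml, so it is safe to re-run.

```shell
#!/usr/bin/env bash
# Run on the instance as root, e.g. `sudo bash init-config.sh`, after
# connecting with something like (the user name is a placeholder):
#   ssh -i ~/.ssh/<your-key> <user>@your-studio-domain.com

STUDIO_DIR="${STUDIO_DIR:-/opt/datachain-studio}"
# Hypothetical name of the example configuration shipped with the AMI.
EXAMPLE_CONFIG="${EXAMPLE_CONFIG:-config.example.yml}"

# Create DIR/config.yml from DIR/EXAMPLE unless config.yml already exists.
init_config() {
  local dir="$1" example="$2"
  if [ -f "$dir/config.yml" ]; then
    echo "keeping existing $dir/config.yml"
  elif [ -f "$dir/$example" ]; then
    cp "$dir/$example" "$dir/config.yml"
    echo "created $dir/config.yml from $example"
  else
    echo "example file $dir/$example not found" >&2
    return 1
  fi
}

# Only act when the Studio directory exists (i.e. on the instance itself),
# then open the file in $EDITOR (vi if unset).
if [ -d "$STUDIO_DIR" ]; then
  init_config "$STUDIO_DIR" "$EXAMPLE_CONFIG" && "${EDITOR:-vi}" "$STUDIO_DIR/config.yml"
fi
```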
Configuration Parameters
Edit the following parameters in config.yml:
# Basic configuration
domain: your-studio-domain.com
ssl:
  enabled: true
  cert_path: /etc/ssl/certs/studio.crt
  key_path: /etc/ssl/private/studio.key

# Database configuration
database:
  host: localhost
  port: 5432
  name: datachain_studio
  user: studio
  password: your-secure-password

# Storage configuration
storage:
  type: s3
  bucket: your-studio-bucket
  region: us-east-1
  access_key: your-access-key
  secret_key: your-secret-key

# Git forge configuration
git:
  github:
    enabled: true
    app_id: your-github-app-id
    private_key_path: /etc/studio/github-private-key.pem
  gitlab:
    enabled: true
    url: https://gitlab.com
    app_id: your-gitlab-app-id
    secret: your-gitlab-secret
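The placeholder values above, such as `your-secure-password`, should be replaced with strong random values. One way to generate them, assuming OpenSSL is available on the instance, is sketched below. Quoting the value in config.yml keeps YAML from reading it as a number in the unlikely case that it looks like one.

```shell
#!/usr/bin/env bash
# Generate a random secret: 32 random bytes printed as 64 hex characters.
gen_secret() {
  openssl rand -hex 32
}

db_password="$(gen_secret)"
# Print a ready-to-paste line for the database section of config.yml.
# The value is quoted so YAML always treats it as a string.
printf '  password: "%s"\n' "$db_password"
```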
SSL Configuration
- Upload your SSL certificate and private key to the instance
- Update ssl.cert_path and ssl.key_path in config.yml to point to them
- Ensure proper file permissions, in particular that the private key is not readable by other users
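A sketch of the permission and consistency checks, assuming the certificate and key paths from the example configuration and that OpenSSL is installed. The 644/600 modes are a common convention, not a documented DataChain Studio requirement; whichever user the Studio services run as must still be able to read the key. The pair check compares the public key embedded in the certificate with the one derived from the private key.

```shell
#!/usr/bin/env bash
# Paths from the ssl section of config.yml (adjust if you changed them).
CERT_PATH="${CERT_PATH:-/etc/ssl/certs/studio.crt}"
KEY_PATH="${KEY_PATH:-/etc/ssl/private/studio.key}"

# Succeeds if the certificate's public key matches the private key.
tls_pair_matches() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Certificate world-readable, private key readable by its owner only,
# then confirm the two files belong together.
secure_tls_files() {
  local cert="$1" key="$2"
  chmod 644 "$cert" && chmod 600 "$key" || return 1
  if tls_pair_matches "$cert" "$key"; then
    echo "certificate and key match"
  else
    echo "certificate and key do NOT match" >&2
    return 1
  fi
}

# On the instance, run as root (e.g. `sudo bash secure-tls.sh`).
if [ -f "$CERT_PATH" ] && [ -f "$KEY_PATH" ]; then
  secure_tls_files "$CERT_PATH" "$KEY_PATH"
fi
```

To upload the files in the first place, something like `scp -i ~/.ssh/<your-key> studio.crt studio.key <user>@your-studio-domain.com:/tmp/`, followed by `sudo mv` into the configured paths and `sudo chown root:root` on both files, works; the login user depends on the AMI.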
Start Services
- Start the DataChain Studio services
- Check the service status
- View the logs, for example with sudo journalctl -u datachain-studio
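A minimal sketch of these three steps, assuming the services are managed by the systemd unit `datachain-studio` (the unit name used in the `journalctl` command under Getting Help). If your installation manages the containers differently, adapt the commands. Setting `DRY_RUN=1` prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Unit name taken from the Getting Help section; override if it differs.
UNIT="${UNIT:-datachain-studio}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: studio_ctl start|status|logs
studio_ctl() {
  case "$1" in
    start)  run sudo systemctl start "$UNIT" ;;
    status) run sudo systemctl status "$UNIT" --no-pager ;;
    logs)   run sudo journalctl -u "$UNIT" -f ;;
    *)      echo "usage: studio_ctl start|status|logs" >&2; return 2 ;;
  esac
}
```

After sourcing the file, run `studio_ctl start`, then `studio_ctl status`; `studio_ctl logs` follows the log until you press Ctrl+C.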
Verification
- Access DataChain Studio at https://your-studio-domain.com (the domain set in config.yml)
- Check that all services are running
- Verify database connectivity
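These checks can be scripted. The sketch assumes `curl` and `getent` are available and uses the example domain and database settings from config.yml. For the database it uses PostgreSQL's `pg_isready` when installed, otherwise a plain TCP connect through bash's /dev/tcp; whether the database port is reachable from the host, rather than only inside a container, depends on your installation.

```shell
#!/usr/bin/env bash
# Values from the example config.yml; override via environment variables.
STUDIO_HOST="${STUDIO_HOST:-your-studio-domain.com}"
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"

# DNS: succeeds if the hostname resolves.
check_dns() {
  getent hosts "$1" >/dev/null
}

# HTTPS: prints the status code; fails on no response (000, which also
# covers TLS certificate errors) or on a 5xx reply.
check_https() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$1/")"
  echo "HTTP $code"
  [ "$code" != "000" ] && [ "$code" -lt 500 ]
}

# Database: pg_isready when installed, otherwise check the TCP port is open.
check_db() {
  if command -v pg_isready >/dev/null; then
    pg_isready -h "$1" -p "$2" -t 3 >/dev/null
  else
    timeout 3 bash -c ": </dev/tcp/$1/$2" 2>/dev/null
  fi
}

if check_dns "$STUDIO_HOST"; then echo "DNS: ok"; else echo "DNS: FAILED"; fi
if check_https "$STUDIO_HOST"; then echo "HTTPS: ok"; else echo "HTTPS: FAILED"; fi
if check_db "$DB_HOST" "$DB_PORT"; then echo "DB: ok"; else echo "DB: FAILED"; fi
```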
Security Considerations
Network Security
- Use security groups to restrict access, for example by limiting SSH to trusted IP ranges
- Enable VPC flow logs for monitoring
- Consider using AWS WAF for web application protection
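As an example of restricting access with security groups, the sketch below replaces an SSH rule open to 0.0.0.0/0 with one limited to a trusted CIDR range, leaving HTTP and HTTPS untouched. The security group ID and CIDR are placeholders, it assumes the open SSH rule exists, and it only prints the AWS CLI commands unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
SG_ID="${SG_ID:-sg-0123456789abcdef0}"      # placeholder security group ID
ADMIN_CIDR="${ADMIN_CIDR:-203.0.113.0/24}"  # placeholder: your office/VPN range

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Rough sanity check: IPv4/prefix shape, and not the "open to everyone" range.
valid_admin_cidr() {
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'
  [[ "$1" =~ $re ]] && [ "$1" != "0.0.0.0/0" ]
}

restrict_ssh() {
  local sg="$1" cidr="$2"
  valid_admin_cidr "$cidr" || { echo "refusing CIDR: $cidr" >&2; return 1; }
  # Add the narrow rule first so SSH access is never fully lost.
  run aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol tcp --port 22 --cidr "$cidr"
  run aws ec2 revoke-security-group-ingress --group-id "$sg" --protocol tcp --port 22 --cidr 0.0.0.0/0
}

restrict_ssh "$SG_ID" "$ADMIN_CIDR"
```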
Data Security
- Enable EBS encryption
- Use IAM roles instead of access keys where possible
- Regularly rotate secrets and keys
- Enable CloudTrail for audit logging
Backup Strategy
- Set up automated EBS snapshots
- Configure database backups
- Test restore procedures regularly
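A sketch of a database backup with simple retention, assuming the PostgreSQL client tools are installed on the host and can reach the database configured in config.yml (if the database runs in a container, you may need to run `pg_dump` inside it instead). The backup directory and retention count are placeholders. Dumps use pg_dump's custom format, which `pg_restore` reads, so restoring one into a scratch database is a straightforward way to test the procedure. For EBS, `aws ec2 create-snapshot --volume-id <volume-id>` takes a one-off snapshot, and Amazon Data Lifecycle Manager can schedule them.

```shell
#!/usr/bin/env bash
BACKUP_DIR="${BACKUP_DIR:-/var/backups/datachain-studio}"  # placeholder
KEEP="${KEEP:-7}"                                           # dumps to retain
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
DB_NAME="${DB_NAME:-datachain_studio}"
DB_USER="${DB_USER:-studio}"

# Write a timestamped custom-format dump and print its path.
# The password can be supplied via PGPASSWORD or ~/.pgpass.
backup_db() {
  mkdir -p "$BACKUP_DIR" || return 1
  local file="$BACKUP_DIR/$DB_NAME-$(date -u +%Y%m%dT%H%M%SZ).dump"
  pg_dump -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -Fc -f "$file" "$DB_NAME" && echo "$file"
}

# Delete all but the newest KEEP dumps. Timestamped names sort
# chronologically, so a reverse name sort lists the newest first.
prune_backups() {
  local dir="$1" keep="$2"
  ls -1 "$dir"/*.dump 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}

# Run only when asked, e.g. from a nightly cron job:
#   sudo bash studio-backup.sh run
if [ "${1:-}" = run ]; then
  backup_db && prune_backups "$BACKUP_DIR" "$KEEP"
fi
```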
Troubleshooting
Common Issues
Services won't start:
- Check configuration file syntax
- Verify SSL certificate paths and permissions
- Check Docker service status

Cannot access Studio:
- Verify DNS resolution
- Check security group rules
- Confirm SSL certificate validity

Database connection issues:
- Check database service status
- Verify connection parameters
- Check database logs
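Several of these checks can be run in one pass on the instance. This is a sketch under assumptions: the config path matches the Getting Help section, YAML syntax is checked with Python's PyYAML module if it is installed, Docker runs as the systemd service `docker`, the certificate path is the one from the example configuration, and 14 days is an arbitrary expiry warning window.

```shell
#!/usr/bin/env bash
CONFIG="${CONFIG:-/opt/datachain-studio/config.yml}"
CERT_PATH="${CERT_PATH:-/etc/ssl/certs/studio.crt}"

# YAML syntax check; returns 2 if python3/PyYAML is not available.
yaml_ok() {
  python3 -c 'import yaml' 2>/dev/null || { echo "python3/PyYAML not available; skipping" >&2; return 2; }
  python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$1"
}

# Succeeds if the certificate is still valid DAYS days from now.
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Run a check and print ok/FAILED together with the command.
report() { if "$@"; then echo "ok:     $*"; else echo "FAILED: $*"; fi; }

report yaml_ok "$CONFIG"
report cert_valid_for "$CERT_PATH" 14
report systemctl is-active --quiet docker
report getent hosts your-studio-domain.com
```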
Getting Help
- Check service logs:
  sudo journalctl -u datachain-studio
- Review configuration:
  sudo cat /opt/datachain-studio/config.yml
- Contact support with instance details and error messages
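Before sending configuration or logs to support, remove secrets. The sketch below collects recent service logs, basic instance details, and a copy of config.yml in which the values of secret-looking keys (`password`, `secret`, `secret_key`, `access_key`) are replaced. That key list is an assumption based on the example configuration, so extend it if your file holds other credentials, and review the bundle before sharing it.

```shell
#!/usr/bin/env bash
CONFIG="${CONFIG:-/opt/datachain-studio/config.yml}"
UNIT="${UNIT:-datachain-studio}"

# Read YAML on stdin; replace the value of secret-looking keys with
# <redacted>, keeping indentation and key names readable.
redact_config() {
  sed -E 's/^([[:space:]]*(password|secret|secret_key|access_key):).*$/\1 <redacted>/'
}

# Collect a support bundle in a new temporary directory; prints its path.
collect_bundle() {
  local out
  out="$(mktemp -d)"
  uname -a > "$out/uname.txt"
  df -h > "$out/disk.txt"
  free -h > "$out/memory.txt" 2>&1
  sudo journalctl -u "$UNIT" -n 500 --no-pager > "$out/studio.log" 2>&1
  sudo cat "$CONFIG" | redact_config > "$out/config.redacted.yml"
  echo "$out"
}
```

After sourcing the file on the instance, `collect_bundle` prints the directory to attach to your support request.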