Git Forges Configuration
This section covers how to configure DataChain Studio to integrate with Git hosting providers (forges), including GitHub, GitLab, and Bitbucket.
Overview
DataChain Studio supports integration with multiple Git forges to enable:
- Repository Access: Connect to Git repositories for code and data
- Authentication: OAuth-based user authentication
- Webhook Integration: Automatic job triggering on Git events
- Team Management: Synchronize teams and permissions
Supported Git Forges
- GitHub - GitHub.com and GitHub Enterprise Server
- GitLab - GitLab.com and self-hosted GitLab instances
- Bitbucket - Bitbucket Cloud and Bitbucket Server
General Configuration
All Git forge integrations share common configuration patterns:
Basic Configuration Structure
global:
  git:
    # GitHub configuration
    github:
      enabled: true
      appId: "your-app-id"
      privateKey: "your-private-key"
      webhookSecret: "your-webhook-secret"
    # GitLab configuration
    gitlab:
      enabled: true
      url: "https://gitlab.com"
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
      webhookSecret: "your-webhook-secret"
    # Bitbucket configuration
    bitbucket:
      enabled: true
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
      webhookSecret: "your-webhook-secret"
Common Configuration Options
All Git forges support these common options:
git:
  <forge-name>:
    enabled: true|false
    # Authentication settings
    clientId: "oauth-client-id"
    clientSecret: "oauth-client-secret"
    # Webhook configuration
    webhookSecret: "webhook-secret-key"
    webhookEvents:
      - push
      - pull_request
      - release
    # SSL/TLS settings
    ssl:
      verify: true
      caCertificate: |
        -----BEGIN CERTIFICATE-----
        ... custom CA certificate ...
        -----END CERTIFICATE-----
    # Rate limiting
    rateLimit:
      requestsPerHour: 5000
      burstSize: 100
    # Timeout settings
    timeout:
      connect: 30s
      read: 60s
      write: 30s
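The rateLimit settings above do not say how requestsPerHour and burstSize interact. One common model is a token bucket, and it is assumed here rather than taken from Studio's documentation. In that model the bucket holds at most burstSize tokens, refills at requestsPerHour / 3600 tokens per second, and each API call spends one token. The bash sketch below implements this model. The names try_acquire and its millisecond clock argument are hypothetical; they let the logic be exercised without real waiting.

```bash
#!/usr/bin/env bash
# Hypothetical token-bucket model of the rateLimit settings (not Studio's implementation).
# Tokens are stored in thousandths so integer arithmetic keeps the fractional refill.

REQUESTS_PER_HOUR=5000   # rateLimit.requestsPerHour
BURST_SIZE=100           # rateLimit.burstSize

capacity_milli=$(( BURST_SIZE * 1000 ))
tokens_milli=$capacity_milli   # the bucket starts full
last_ms=0                      # time of the previous call, in milliseconds

# try_acquire NOW_MS: refill for the time elapsed since the last call, then spend
# one token if available. Returns 0 if the request may proceed, 1 if it must wait.
try_acquire() {
  local now_ms=$1
  # requestsPerHour/3600000 tokens per ms == requestsPerHour/3600 milli-tokens per ms
  local earned=$(( (now_ms - last_ms) * REQUESTS_PER_HOUR / 3600 ))
  last_ms=$now_ms
  tokens_milli=$(( tokens_milli + earned ))
  if (( tokens_milli > capacity_milli )); then
    tokens_milli=$capacity_milli   # never hold more than burstSize tokens
  fi
  if (( tokens_milli >= 1000 )); then
    tokens_milli=$(( tokens_milli - 1000 ))
    return 0
  fi
  return 1
}
```

With the example values, a client can send 100 requests back to back. After that it can sustain roughly 1.4 requests per second (5000 / 3600).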
Multi-Forge Configuration
DataChain Studio can be configured to work with multiple Git forges simultaneously:
global:
  git:
    # Primary forge
    github:
      enabled: true
      appId: "123456"
      privateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        ... GitHub App private key ...
        -----END RSA PRIVATE KEY-----
    # Secondary forge for internal repositories
    gitlab:
      enabled: true
      url: "https://gitlab.internal.company.com"
      clientId: "internal-gitlab-client-id"
      clientSecret: "internal-gitlab-secret"
    # Additional forge for specific teams
    bitbucket:
      enabled: true
      clientId: "bitbucket-client-id"
      clientSecret: "bitbucket-secret"
Authentication Flow
OAuth 2.0 Flow
All Git forges use OAuth 2.0 for authentication:
- User Authorization: User authorizes DataChain Studio to access their Git forge account
- Code Exchange: Studio exchanges the authorization code for an access token
- Token Storage: Access tokens are securely stored and used for API calls
- Token Refresh: Tokens are automatically refreshed when needed
Configuration Requirements
Each forge requires specific OAuth application setup:
- Redirect URIs: Must include Studio's callback URLs
- Scopes: Appropriate permissions for repository and user access
- Webhook URLs: For receiving Git events
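The redirect URI sent in the authorization request has to match a URI registered with the forge. When it is passed as a query parameter, it should be percent-encoded. The bash sketch below builds a GitHub authorization URL. It uses GitHub's real authorize endpoint and the callback path from this page's OAuth test command (https://studio.yourcompany.com/auth/github/callback). The helper names urlencode and github_authorize_url are hypothetical. Which scopes Studio actually needs is not stated here, so the scope argument is optional. Other forges use their own endpoints and extra parameters; GitLab, for example, also requires response_type=code.

```bash
#!/usr/bin/env bash
# Sketch: build an OAuth authorization URL with percent-encoded parameters.

# urlencode STRING: percent-encode everything except RFC 3986 unreserved
# characters. Handles ASCII input only, which covers URLs and client IDs.
urlencode() {
  local LC_ALL=C
  local s=$1 out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      # "'$c" makes printf use the character's numeric code
      *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;
    esac
  done
  printf '%s\n' "$out"
}

# github_authorize_url CLIENT_ID REDIRECT_URI [SCOPE]
github_authorize_url() {
  local url="https://github.com/login/oauth/authorize"
  url+="?client_id=$(urlencode "$1")"
  url+="&redirect_uri=$(urlencode "$2")"
  if [ -n "${3:-}" ]; then
    url+="&scope=$(urlencode "$3")"
  fi
  printf '%s\n' "$url"
}
```

For example, github_authorize_url "YOUR_CLIENT_ID" "https://studio.yourcompany.com/auth/github/callback" prints a URL that can be opened in a browser. A real client should also send an unguessable state value and check it on the callback.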
Webhook Configuration
Automatic Webhook Setup
DataChain Studio can automatically configure webhooks:
git:
  <forge-name>:
    webhooks:
      autoSetup: true
      events:
        - push
        - pull_request
        - release
      # Custom webhook settings
      ssl:
        verify: true
      contentType: "application/json"
      secret: "webhook-secret-key"
Manual Webhook Configuration
For manual webhook setup, configure each repository with:
- Payload URL: https://studio.yourcompany.com/api/webhooks/<forge-name>
- Content Type: application/json
- Secret: Your configured webhook secret
- Events: push, pull_request, release (event names differ between forges; GitLab, for example, calls pull requests merge requests)
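The webhook secret lets the receiver confirm that a delivery really came from the forge. GitHub sends an X-Hub-Signature-256 header containing sha256= followed by the hex HMAC-SHA256 of the raw request body, keyed with the secret. GitLab instead sends the secret itself in the X-Gitlab-Token header. The bash sketch below shows both checks using openssl. The function names are hypothetical, this is not Studio's code, and Bitbucket's scheme is not covered.

```bash
#!/usr/bin/env bash
# Sketch of webhook authenticity checks (illustrative, not Studio's implementation).

# github_signature SECRET BODY: the value GitHub would send in X-Hub-Signature-256.
# BODY must be the exact bytes received; re-serialized JSON will not match.
github_signature() {
  local digest
  # openssl prints "<label>= <hex>", so keep only the last field
  digest=$(printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}')
  printf 'sha256=%s\n' "$digest"
}

# verify_github SECRET BODY HEADER_VALUE: succeed only if the signature matches.
# Production code should compare in constant time; plain equality keeps this short.
verify_github() {
  [ "$(github_signature "$1" "$2")" = "$3" ]
}

# verify_gitlab SECRET HEADER_VALUE: GitLab sends the secret token verbatim.
verify_gitlab() {
  [ "$1" = "$2" ]
}
```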
Security Configuration
SSL/TLS Configuration
For self-hosted Git forges with custom certificates:
git:
  gitlab:
    url: "https://gitlab.internal.company.com"
    ssl:
      verify: true
      caCertificate: |
        -----BEGIN CERTIFICATE-----
        ... your internal CA certificate ...
        -----END CERTIFICATE-----
Access Control
Configure repository access patterns:
git:
  <forge-name>:
    access:
      # Repository filtering
      repositories:
        allowed:
          - "org/allowed-repo"
          - "org/*-public"
        blocked:
          - "org/sensitive-repo"
      # User/organization filtering
      organizations:
        allowed:
          - "your-org"
          - "partner-org"
        blocked:
          - "external-org"
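This page does not define how the allowed and blocked lists combine. A common convention, assumed here, is that a blocked match always wins, a non-empty allowed list admits only matching entries, and patterns such as org/*-public behave like shell globs. The bash sketch below applies that rule to the repository lists from the example above. The arrays and the functions matches_any and is_repo_allowed are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of repository filtering under assumed semantics (not confirmed for Studio):
# blocked always wins; an empty allowed list allows everything that is not blocked.

ALLOWED_REPOS=("org/allowed-repo" "org/*-public")
BLOCKED_REPOS=("org/sensitive-repo")

# matches_any VALUE PATTERN...: true if VALUE matches any glob PATTERN.
matches_any() {
  local value=$1 pattern
  shift
  for pattern in "$@"; do
    # The unquoted right-hand side of == is matched as a glob pattern.
    if [[ $value == $pattern ]]; then
      return 0
    fi
  done
  return 1
}

# is_repo_allowed OWNER/NAME
is_repo_allowed() {
  local repo=$1
  if matches_any "$repo" "${BLOCKED_REPOS[@]}"; then
    return 1   # an explicit block overrides any allow pattern
  fi
  if (( ${#ALLOWED_REPOS[@]} == 0 )); then
    return 0   # no allow-list: everything not blocked is allowed
  fi
  matches_any "$repo" "${ALLOWED_REPOS[@]}"
}
```

The organizations lists could be checked the same way against the owner part of the name, ${repo%%/*}.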
Error Handling and Retry Logic
Configure resilient Git forge connections:
git:
  <forge-name>:
    retry:
      enabled: true
      maxAttempts: 3
      initialDelay: 1s
      maxDelay: 30s
      exponentialBackoff: true
    circuitBreaker:
      enabled: true
      failureThreshold: 10
      recoveryTimeout: 60s
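The bash sketch below gives a rough picture of what these settings imply. It computes capped exponential backoff delays and tracks a simple circuit breaker that counts consecutive failures. Studio's exact semantics are not documented here, for example whether failures must be consecutive or whether jitter is added. Treat the function names and rules as assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical model of the retry and circuitBreaker settings (values from the YAML above).
MAX_ATTEMPTS=3          # retry.maxAttempts
INITIAL_DELAY=1         # retry.initialDelay, seconds
MAX_DELAY=30            # retry.maxDelay, seconds
FAILURE_THRESHOLD=10    # circuitBreaker.failureThreshold
RECOVERY_TIMEOUT=60     # circuitBreaker.recoveryTimeout, seconds

# backoff_delay N: seconds to wait after failed attempt N (1-based),
# doubling each time and capped at MAX_DELAY.
backoff_delay() {
  local attempt=$1 delay=$INITIAL_DELAY i
  for (( i = 1; i < attempt && delay < MAX_DELAY; i++ )); do
    delay=$(( delay * 2 ))
  done
  if (( delay > MAX_DELAY )); then delay=$MAX_DELAY; fi
  echo "$delay"
}

# with_retry CMD...: run CMD up to MAX_ATTEMPTS times, sleeping between failures.
with_retry() {
  local attempt
  for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < MAX_ATTEMPTS )); then
      sleep "$(backoff_delay "$attempt")"
    fi
  done
  return 1
}

# Circuit breaker state: consecutive failures and when the circuit opened.
failures=0
opened_at=-1   # -1 means the circuit is closed

# breaker_allows NOW: may a request be sent at time NOW (seconds)?
breaker_allows() {
  local now=$1
  if (( opened_at < 0 )); then return 0; fi                       # closed
  if (( now - opened_at >= RECOVERY_TIMEOUT )); then return 0; fi  # allow a trial request
  return 1                                                         # open: fail fast
}

# breaker_record NOW STATUS: STATUS 0 is a success, anything else a failure.
breaker_record() {
  local now=$1 status=$2
  if (( status == 0 )); then
    failures=0
    opened_at=-1
  else
    failures=$(( failures + 1 ))
    if (( failures >= FAILURE_THRESHOLD )); then opened_at=$now; fi
  fi
}
```

With these values the two retries wait 1 s and 2 s, so maxDelay only matters for larger maxAttempts. After 10 straight failures, calls fail fast for 60 s. A trial request then either closes the circuit or reopens it.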
Monitoring and Alerting
Monitor Git forge integrations:
monitoring:
  gitForges:
    enabled: true
    healthChecks:
      enabled: true
      interval: 30s
      timeout: 10s
    metrics:
      - apiCalls
      - responseTime
      - errorRate
      - webhookDelivery
    alerts:
      - name: "Git Forge API Error Rate High"
        condition: "error_rate > 5%"
        duration: "5m"
        severity: "warning"
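The alert rule reads as "fire when the error rate stays above 5% for 5 minutes". If samples arrive every 30 seconds, that means 10 consecutive samples. Using the health-check interval as the sampling interval is an assumption. The bash sketch below evaluates the rule over hypothetical errors/total samples. It compares errors * 100 with 5 * total so that no floating point is needed.

```bash
#!/usr/bin/env bash
# Hypothetical evaluation of: condition "error_rate > 5%", duration "5m".
THRESHOLD_PCT=5
DURATION_S=300                          # duration: "5m"
INTERVAL_S=30                           # assumed sampling interval (healthChecks.interval)
WINDOW=$(( DURATION_S / INTERVAL_S ))   # number of consecutive samples: 10

# over_threshold ERRORS TOTAL: true if ERRORS/TOTAL is strictly above THRESHOLD_PCT percent.
over_threshold() {
  local errors=$1 total=$2
  (( total > 0 && errors * 100 > THRESHOLD_PCT * total ))
}

# should_alert "ERRORS/TOTAL" ...: samples oldest first; fire only if each of
# the last WINDOW samples exceeds the threshold.
should_alert() {
  local samples=("$@") n=$# i errors total
  (( n >= WINDOW )) || return 1
  for (( i = n - WINDOW; i < n; i++ )); do
    errors=${samples[i]%/*}
    total=${samples[i]#*/}
    over_threshold "$errors" "$total" || return 1
  done
  return 0
}
```

Requiring the whole window to stay above the threshold keeps a single bad sample from paging anyone. This matches how the duration field is usually interpreted.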
Testing Configuration
Connectivity Testing
Test Git forge connections:
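# Test GitHub connection (-k skips TLS certificate verification; drop it once Studio's certificate is trusted)
curl -k https://studio.yourcompany.com/api/git/github/test
# Test GitLab connection
curl -k https://studio.yourcompany.com/api/git/gitlab/test
# Test webhook delivery. If Studio verifies signatures (a webhook secret is configured), send a valid
# X-Hub-Signature-256 header: the HMAC-SHA256 of the exact body, keyed with the secret
PAYLOAD='{"zen": "Test webhook"}'
SIGNATURE=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "your-webhook-secret" | awk '{print $NF}')
curl -X POST https://studio.yourcompany.com/api/webhooks/github \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: ping" \
  -H "X-Hub-Signature-256: sha256=$SIGNATURE" \
  -d "$PAYLOAD"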
Configuration Validation
Validate configuration before deployment:
# Validate Helm configuration
helm template datachain-studio ./chart \
  --values values.yaml \
  --dry-run
# Test the OAuth flow (open this URL in a browser to sign in and confirm the redirect back to Studio)
curl "https://github.com/login/oauth/authorize?client_id=YOUR_CLIENT_ID&redirect_uri=https://studio.yourcompany.com/auth/github/callback"
Troubleshooting
Common Issues
OAuth authentication failures:
- Verify the client ID and secret
- Check the redirect URI configuration
- Ensure the required scopes are granted
Webhook delivery failures:
- Verify that the webhook secret matches on both sides
- Check that the webhook URL is reachable from the forge
- Review the webhook event configuration
API rate limiting:
- Monitor API usage
- Cache API responses where possible
- Configure the rate limit settings
Debug Commands
# Check Git forge connectivity in the backend logs
kubectl logs -f deployment/datachain-studio-backend -n datachain-studio | grep -i git
# Test the OAuth flow: forward the frontend to http://localhost:8080 and sign in from a browser
kubectl port-forward service/datachain-studio-frontend 8080:80 -n datachain-studio
# Verify the backend can reach the GitHub API (outbound connectivity; requires curl in the backend image)
kubectl exec deployment/datachain-studio-backend -n datachain-studio -- curl -I https://api.github.com
Migration Between Forges
When migrating between Git forges:
- Export Configuration: Back up existing Git forge settings
- Configure New Forge: Set up authentication with new provider
- Update Repositories: Migrate repository connections
- Test Integration: Verify all functionality works
- Update Webhooks: Reconfigure webhook endpoints
- Cleanup: Remove old forge configuration
Next Steps
Choose your Git forge for detailed configuration:
- GitHub Configuration - Set up GitHub.com or GitHub Enterprise
- GitLab Configuration - Configure GitLab.com or self-hosted GitLab
- Bitbucket Configuration - Integrate with Bitbucket Cloud or Server
For additional configuration options:
- SSL/TLS Configuration for secure connections
- CA Certificates for custom certificate authorities
- Troubleshooting Guide for common issues