pipeline create
Create a pipeline to update datasets in Studio.
Synopsis
Description
Creates a pipeline in Studio to update the specified datasets. The pipeline automatically includes all necessary jobs to update the datasets based on their dependencies.
Each dataset name can optionally include a version suffix in the form name@version. If no version is specified, the latest version is used.
The pipeline is created in a paused state so it can be reviewed. Use datachain pipeline resume to start execution.
Dataset names can be provided in fully qualified format (e.g., @namespace.project.name) or as a short name. Short names use the default project and namespace from Studio.
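The naming rules above can be sketched as a small shell helper that assembles a dataset argument from its parts. This is only an illustration: dataset_spec is a hypothetical function, not part of datachain, and it assumes the fully qualified form is exactly @namespace.project.name with an optional @version suffix, as described above.

```shell
# Hypothetical helper (not part of datachain): build a dataset argument.
# Usage: dataset_spec NAME [VERSION] [NAMESPACE PROJECT]
# The body runs in a subshell so its variables do not leak into the caller.
dataset_spec() (
  name="$1"
  version="${2:-}"
  namespace="${3:-}"
  project="${4:-}"

  spec="$name"
  # Fully qualified form: @namespace.project.name
  if [ -n "$namespace" ] && [ -n "$project" ]; then
    spec="@${namespace}.${project}.${name}"
  fi
  # Optional version suffix: name@version
  if [ -n "$version" ]; then
    spec="${spec}@${version}"
  fi
  printf '%s\n' "$spec"
)

# Possible use (commented out because it contacts Studio):
# datachain pipeline create "$(dataset_spec final_result 1.0.9 amritghimire default)"
```

Quoting the resulting argument is a reasonable precaution, since the value starts with @ and contains dots.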
Arguments
dataset [dataset ...] - Dataset name(s). Can be fully qualified (e.g., @namespace.project.name) or short names. Optionally include a version suffix: name@version. Multiple datasets can be specified.
Options
-t TEAM, --team TEAM - Team to create the pipeline for (default: from config)
-h, --help - Show the help message and exit
-v, --verbose - Be verbose
-q, --quiet - Be quiet
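As a rough sketch of how the team option combines with dataset arguments, the wrapper below prints the command line it would run instead of executing it. The function name pipeline_create_cmd and the team name my-team are hypothetical placeholders; only the --team option and the dataset argument forms come from this page.

```shell
# Hypothetical dry-run wrapper: print the command instead of running it.
# Usage: pipeline_create_cmd TEAM DATASET [DATASET ...]
pipeline_create_cmd() (
  team="$1"
  shift
  # Rebuild the positional parameters as the full command line.
  set -- datachain pipeline create --team "$team" "$@"
  printf '%s\n' "$*"
)

pipeline_create_cmd my-team final_result_new "final_result_updated@1.0.9"
```

Printing the command first allows a quick check of the dataset names and versions before anything is created in Studio.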
Examples
- Create a pipeline for a single dataset with a specific version:
  datachain pipeline create "@[email protected]"
- Create a pipeline for multiple datasets:
  datachain pipeline create "@amritghimire.default.final_result@1.0.9" "final_result_new" "final_result_updated@1.0.9"
  This creates a pipeline that updates:
  - Version 1.0.9 of @amritghimire.default.final_result
  - Latest version of final_result_new (using default namespace and project)
  - Version 1.0.9 of final_result_updated (using default namespace and project)
- Create a pipeline for a dataset using the latest version: