Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

Linking Existing Pipelines

Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. This is not as a copy, but as the actual pipeline, so any changes to the pipeline are atomatically propagated to and from any project which has this pipeline linked.

You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your bundle or activation code.

Activation codes are tokens which allow you to run your analyses and are used for accounting and allocating the appropriate resources. ICA will automatically determine the best matching activation code, but this can be overwritten if needed.

If you unlink a pipeline it removes the pipline from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.

There is no way to permanently delete a pipeline.

Create a Pipeline

Pipelines are created and stored within projects.

Navigate to Projects > your_project > Flow > Pipelines > +Create.
Select Nextflow (XML / JSON) , CWL Graphical or CWL code (XML / JSON) to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.

Pipelines use the latest tool definition when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. In order to update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it back to the pipeline.

Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.

Pipeline Status

You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.

Status

Draft

Description

Fully editable draft.

Status

Release Candidate

Description

The pipeline is ready for release. Editing is locked but the pipeline can be cloned (top right in the details view) to create a new version.

Status

Released

Description

The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version.

Pipeline Properties

The following sections describe the properties that can be configured in each tab of the pipeline editor.

Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.

CWL Graphical

Details
Documentation
Definition
Analysis Report
Metadata Model
Report

CWL Code

Details
Documentation
Inputform files (JSON) or XML Configuration (XML)
CWL Files
Metadata Model
Report

Nextflow Code

Details
Documentation
Inputform Files (JSON) or XML Configuration (XML)
Nextflow files
Metadata Model
Report

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:

Details

The details tab provides options for configuring basic information about the pipeline.

Field

Entry

Code

The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.

Nextflow Version

User selectable Nextflow version available only for Nextflow pipelines

Documentation

The Documentation tab provides is the place where you explain how your pipeline works to users. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.

Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

Description

Machine profiles

Compute types available to use with Tools in the pipeline.

Shared settings

Settings for pipelines used in more than one tool.

Reference files

Descriptions of reference files used in the pipeline.

Input files

Descriptions of input files used in the pipeline.

Output files

Descriptions of output files used in the pipeline.

Tool

Details about the tool selected in the visualization panel.

Tool repository

A list of tools available to be used in the pipeline.

In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.

XML Configuration / JSON Inputform Files (Code)

This page is used to specify all relevant information about the pipeline parameters.

There is a limit of 200 reports per report pattern which will be shown when you have multiple reports matching your regular expression.

Compute Resources

Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.

By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.

For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.

Scratch space

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch.

For Nextflow pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB' will reserve 1 TiB.
For CWL, adding - class: ResourceRequirement tmpdirMin: 5000 to your requirements section will reserve 5000 MiB for CWL.

Avoid the following as it does not align with ICAv2 scratch space configuration.

Container overlay tmp path: /tmp
Legacy paths: /ephemeral
Environment Variables ($TMPDIR, $TEMP and $TMP)
Bash Command mktemp
CWL runtime.tmpdir

Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

Compute Type

CPUs

Mem (GiB)

Nextflow (pod.value)

CWL (type, size)

standard-small

standard, small

standard-medium

standard, medium

standard-large

standard, large

standard-xlarge

standard, xlarge

standard-2xlarge

128

standard-2xlarge

standard, 2xlarge

standard-3xlarge

256

standard-3xlarge

standard, 3xlarge

hicpu-small

hicpu, small

hicpu-medium

hicpu, medium

hicpu-large

144

hicpu-large

hicpu, large

himem-small

himem, small

himem-medium

128

himem-medium

himem, medium

himem-large

384

himem-large

himem, large

himem-xlarge²

700

himem-xlarge

himem, xlarge

hiio-small

hiio, small

hiio-medium

hiio, medium

fpga2-medium¹

256

fpga2-medium

fpga2,medium

fpga-medium

244

fpga-medium

fpga, medium

fpga-large³

976

fpga-large

fpga, large

transfer-small⁴

transfer-small

transfer, small

transfer-medium ⁴

transfer-medium

transfer, medium

transfer-large⁴

transfer-large

transfer, large

¹ DRAGEN pipelines running on fpga2 compute type will incur a DRAGEN License cost of 0.10 iCredits per gigabase of data processed, with additional discounts as shown below.

80 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 80 to 160 gigabase per sample - 20% discount - 0.08 iCredits per gigabase
> 160 to 240 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 240 to 320 gigabase per sample - 40% discount - 0.06 iCredits per gigabase
> 320 and more gigabase per sample - 50% discount - 0.05 iCredits per gigabase

If your DRAGEN job fails, no DRAGEN license cost will be charged.

(2) The compute type himem-xlarge has low availability.

(3) The compute type fpga-large is only available in the US (use1) region. This compute type is not recommended as it suffers from low availability and offers little performance benefit over fpga-medium at significant additional cost.

(4) The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.

Analysis Report (Graphical)

The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.

[Optional] Import widgets from another pipeline.
1. Select Import from other pipeline.
2. Select the pipeline that contains the report you want to copy.
3. Select an import option: Replace current report or Append to current report.
4. Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Widget
Settings
Title
Add and format title text.
Analysis details
Add heading text and select the analysis metadata details to display.
Free text
Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.
Inline viewer
Add options to view the content of an analysis output file.
Analysis comments
Add comments that can be edited after an analysis has been performed.
Input details
Add heading text and select the input details to display. The widget includes an option to group details by input name.
Project details
Add heading text and select the project details to display.
Page break
Add a page break widget where page breaks should appear between report sections.
Select Save.

By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and offers only limited performance benefits because it is not local to the compute node.

For better integration, use shared storage available at /ces. This shared storage is used when writing files with relative paths.

Daemon sets and system processes consume approximately 1 CPU and 2 GiB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

Compute Type

CPUs

Mem (GiB)

Nextflow (pod.value)

CWL (type, size)

standard-small

standard, small

standard-medium

standard, medium

standard-large

standard, large

standard-xlarge

standard, xlarge

standard-2xlarge

128

standard-2xlarge

standard, 2xlarge

hicpu-small

hicpu, small

hicpu-medium

hicpu, medium

hicpu-large

144

hicpu-large

hicpu, large

himem-small

himem, small

himem-medium

128

himem-medium

himem, medium

himem-large

384

himem-large

himem, large

himem-xlarge²

700

himem-xlarge

himem, xlarge

hiio-small

hiio, small

hiio-medium

hiio, medium

fpga2-medium¹

256

fpga2-medium

fpga2,medium

fpga-medium

244

fpga-medium

fpga, medium

fpga-large³

976

fpga-large

fpga, large

transfer-small⁴

transfer-small

transfer, small

transfer-medium⁴

transfer-medium

transfer, medium

transfer-large⁴

transfer-large

transfer, large

¹ DRAGEN pipelines running on fpga2 compute type will incur a DRAGEN License cost of 0.10 iCredits per gigabase of data processed, with additional discounts as shown below.

80 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 80 to 160 gigabase per sample - 20% discount - 0.08 iCredits per gigabase
> 160 to 240 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 240 to 320 gigabase per sample - 40% discount - 0.06 iCredits per gigabase
> 320 and more gigabase per sample - 50% discount - 0.05 iCredits per gigabase

If your DRAGEN job fails, no DRAGEN license cost will be charged.

(*2) The compute type himem-xlarge has low availability.

Free Text Placeholders

The following placeholders can be used to insert project data.

Placeholder

Description

[[BB_PROJECT_NAME]]

The project name.

[[BB_PROJECT_OWNER]]

The project owner.

[[BB_PROJECT_DESCRIPTION]]

The project short description.

[[BB_PROJECT_INFORMATION]]

The project information.

[[BB_PROJECT_LOCATION]]

The project location.

[[BB_PROJECT_BILLING_MODE]]

The project billing mode.

[[BB_PROJECT_DATA_SHARING]]

The project data sharing settings.

[[BB_REFERENCE]]

The analysis reference.

[[BB_USERREFERENCE]]

The user analysis reference.

[[BB_PIPELINE]]

The name of the pipeline.

[[BB_USER_OPTIONS]]

The analysis user options.

[[BB_TECH_OPTIONS]]

The analysis technical options. Technical options include the TECH suffix and are not visible to end users.

[[BB_ALL_OPTIONS]]

All analysis options. Technical options include the TECH suffix and are not visible to end users.

[[BB_SAMPLE]]

The sample.

[[BB_REQUEST_DATE]]

The analysis request date.

[[BB_START_DATE]]

The analysis start date.

[[BB_DURATION]]

The analysis duration.

[[BB_REQUESTOR]]

The user requesting analysis execution.

[[BB_RUNSTATUS]]

The status of the analysis.

[[BB_ENTITLEMENTDETAIL]]

The used entitlement detail.

[[BB_METADATA:path]]

The value or list of values of a metadata field or multi-value fields.

Nextflow/CWL Files (Code)

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)

If the file type is not recognized, it will default to text display. This can result in the application interpreting binary files as text when trying to display the contents.

Main.nf (Nextflow code)

The Nextflow project main script.

Nextflow.config (Nextflow code)

The Nextflow configuration settings.

Workflow.cwl (CWL code)

The Common Workflow Language main script.

Adding Files

Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.

Metadata Model

See Metadata Models

Report

Here patterns for detecting report files in the analysis output can be defined. On opening an analysis result window of this pipeline, an additional tab will display these report files. The goal is to provide a pipeline-specific user-friendly representation of the analysis result.

To add a report select the + symbol on the left side. Provide your report with a unique name, a regular expression matching the report and optionally, select the format of the report. This must be the source format of the report data generated during the analysis.

There is a limit of 20 reports per report pattern which will be shown when you have multiple reports matching your regular expression.

Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

Select Projects > your_project > Flow > Pipelines.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Configure analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page.
- Requested—The analysis is scheduled to begin.
- In Progress—The analysis is in progress.
- Succeeded—The analysis is complete.
- Failed —The analysis has failed.
- Aborted — The analysis was aborted before completing.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.

Analysis Tab

The Analysis tab provides options for configuring basic information about the analysis.

Field

Entry

User Reference

The unique analysis name.

User tags

One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.

Pricing

Select a subscription to which the analysis will be charged.

Notification

Enter your email address if you want to be notified when the analysis completes.

Output Folder

Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen).

Input

Select the input files to use in the analysis. (max. 50,000)

Settings

Provide input settings.

Aborting Analyses

You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).

View Analysis Results

You can view analysis results on the Analyses page or in the output_folder on the Data page.

Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Details tab, select the square symbol right of the output files.
From the output files view, expand the list and select an output file.
1. If you want to add or remove any user or technical tags, you can do so from the data details view.
2. If you want to download the file, select Schedule download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
- Details - View information on the pipeline configuration.
- Steps - stderr and stdout information
- Nextflow Timeline - Nextflow process execution timeline.
- Nextflow Execution - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.
- Report - Shows the reports defined on the pipeline report tab.

PreviousReference Data NextNextflow

Last updated 1 month ago

Was this helpful?