# Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

## Linking Existing Pipelines

Linking a pipeline (**Projects > your\_project > Flow > Pipelines > Link**) adds that pipeline to your project. This links the actual pipeline rather than a copy, so any changes to the pipeline are automatically propagated to and from every project which has this pipeline linked.

You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your [bundle](https://help.connected.illumina.com/connected-analytics/home/h-bundles) or activation code.

{% hint style="info" %}
**Activation codes** are tokens which allow you to run your analyses and are used for accounting and allocating the appropriate resources. ICA will automatically determine the best matching activation code, but this can be overridden if needed.
{% endhint %}

If you **unlink a pipeline**, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.

{% hint style="info" %}
There is no way to permanently delete a pipeline.
{% endhint %}

***

## Create a Pipeline

Pipelines are created and stored within projects.

1. Navigate to **Projects > your\_project > Flow > Pipelines > +Create**.
2. Select **Nextflow** (XML / JSON / Git), **CWL Graphical**, or **CWL Code** (XML / JSON / Git) to create a new pipeline.
3. Configure pipeline settings in the pipeline property tabs.
4. When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
5. Select Save.

{% hint style="warning" %}
Pipelines use the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it.
{% endhint %}

{% hint style="info" %}
Individual pipeline files are limited to 20 MB. If you need more than that, split your content over multiple files.
{% endhint %}

### Pipeline Statuses

For pipeline authors sharing and distributing their pipelines, the **draft**, **released**, **deprecated**, and **archived** statuses provide a structured framework for managing pipeline availability, user communication, and transition planning. To change the pipeline status, select it at **Projects > your\_project > Flow > Pipelines > your\_pipeline > change status**.

{% hint style="info" %}
You can edit pipelines while they are in *Draft* status. Once they move away from draft, pipelines can no longer be edited. Pipelines can be cloned (top right in the details view) to create a new editable version.
{% endhint %}

<table data-view="cards"><thead><tr><th>Status</th><th>Purpose</th><th>Best Practice</th></tr></thead><tbody><tr><td><strong>Draft</strong></td><td>Use the draft status while <strong>developing or testing</strong> a pipeline version internally.</td><td>Only share draft pipelines with collaborators who are actively involved in development.</td></tr><tr><td><strong>Released</strong></td><td>The released status signals that a pipeline is stable and ready <strong>for general use</strong>.</td><td>Share your pipeline when it is ready for broad use. Ensure users have access to current documentation and know where to find support or updates. Releasing a pipeline is only possible if all tools of that pipeline are in released status.</td></tr><tr><td><strong>Deprecated</strong></td><td>Deprecation is used when a pipeline version is <strong>scheduled for retirement or replacement</strong>. Deprecated pipelines <strong>cannot be linked</strong> to bundles, but will not be unlinked from existing bundles. Users who already have access will <strong>still be able to start analyses</strong>. You can add a message (max 256 chars) when deprecating pipelines.</td><td>Deprecate in advance of archiving a pipeline, making sure the new pipeline is available in the same bundle as the deprecated pipeline. This allows the pipeline author to link the new or alternative pipeline in the deprecation message field.</td></tr><tr><td><strong>Archived</strong></td><td>Archiving a pipeline version removes it from active use; users <strong>can no longer launch</strong> <strong>analyses</strong>. Archived pipelines cannot be linked to bundles, but are not automatically unlinked from bundles or projects. You can add a message (max 256 chars) when archiving pipelines.</td><td>Warn users in advance: Deprecate the pipeline before archiving to allow existing users time to transition. Use the archive message to point users to the new or alternative pipeline.</td></tr></tbody></table>

***

## Pipeline Properties

The following sections describe the properties that can be configured in each tab of the pipeline editor.

Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For **CWL**, you can choose how to define the pipeline; **Nextflow** is always defined in code mode.

<table data-view="cards"><thead><tr><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>CWL Graphical</strong></td><td><ul><li>Details</li><li>Documentation</li><li>Definition</li><li>Analysis Report</li><li>Metadata Model</li><li>Report</li></ul></td><td></td></tr><tr><td><strong>CWL Code</strong></td><td><ul><li>Details</li><li>Documentation</li><li>Inputform files (JSON) or XML Configuration (XML)</li><li>CWL Files</li><li>Metadata Model</li><li>Report</li></ul></td><td></td></tr><tr><td><strong>Nextflow Code</strong></td><td><ul><li>Details</li><li>Documentation</li><li>Inputform Files (JSON) or XML Configuration (XML)</li><li>Nextflow files</li><li>Metadata Model</li><li>Report</li></ul></td><td></td></tr></tbody></table>

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:

* [Nextflow](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/pi-nextflow)
* [CWL](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/pi-cwl)

***

### Details

The details tab provides options for configuring basic information about the pipeline.

<table><thead><tr><th width="216">Field</th><th>Entry</th></tr></thead><tbody><tr><td>Code (pipeline name)</td><td>The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.</td></tr><tr><td>Nextflow Version</td><td>User selectable Nextflow version available only for Nextflow pipelines</td></tr><tr><td>Description</td><td>A short description of the pipeline.</td></tr><tr><td>Status</td><td>The <a href="#pipeline-statuses">release status</a> of the pipeline.</td></tr><tr><td>Proprietary</td><td>Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.</td></tr><tr><td>Storage size</td><td>User selectable <a href="../../../reference/r-pricing#data-storage">storage size</a> for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.</td></tr><tr><td>Links</td><td>External reference links. (max 100 chars as name and 2048 chars as link)</td></tr></tbody></table>

The following information becomes visible when viewing the pipeline details.

<table><thead><tr><th width="217">Field</th><th>Entry</th></tr></thead><tbody><tr><td>ID</td><td>Unique identifier of the pipeline.</td></tr><tr><td>URN</td><td>Identification of the pipeline in Uniform Resource Name (URN) format.</td></tr></tbody></table>

The **clone** action is shown at the top-right of the pipeline details. Cloning a pipeline allows you to make modifications without impacting the original pipeline, and you become the owner of the cloned pipeline. The clone must be given a unique name, because **the name must be unique per tenant** across all projects. You may nevertheless see the same pipeline name twice when a pipeline linked from another tenant is cloned under that same name in your tenant: each name is still unique within its own tenant, but you see both pipelines in yours.

When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.

### Documentation

The Documentation tab is where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab is empty.

### Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

<table><thead><tr><th width="219">Menu</th><th>Description</th></tr></thead><tbody><tr><td>Machine profiles</td><td><a href="#compute">Compute types</a> available to use with Tools in the pipeline.</td></tr><tr><td>Shared settings</td><td>Settings for pipelines used in more than one tool.</td></tr><tr><td>Reference files</td><td>Descriptions of reference files used in the pipeline.</td></tr><tr><td>Input files</td><td>Descriptions of input files used in the pipeline.</td></tr><tr><td>Output files</td><td>Descriptions of output files used in the pipeline.</td></tr><tr><td>Tool</td><td>Details about the tool selected in the visualization panel.</td></tr><tr><td>Tool repository</td><td>A list of tools available to be used in the pipeline.</td></tr></tbody></table>

{% hint style="info" %}
In graphical mode, you can **drag and drop inputs** into the visualization panel to connect them to the tools. Make sure to **connect the input icons to the tool before editing the input details** in the component menu. Required tool inputs are indicated by a yellow connector.

**Safari is not supported** as a browser for graphical editing.
{% endhint %}

{% hint style="warning" %}
When creating a graphical CWL pipeline, **do not use spaces in the input field names, use underscores instead**. The API normalizes input names when running the analysis to prevent issues with special characters (such as accented letters) by replacing them with their more common (unaccented) counterparts. Part of this normalization includes replacing spaces in names with underscores. This normalization is applied to file input names, reference file input names, step IDs, and step parameters.

You will encounter the error ICA\_API\_004 "No value found for required input parameter" when trying to run an API analysis on a graphical pipeline that has been designed with spaces in input parameters.
{% endhint %}
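As an illustration, the normalization described above can be approximated in a few lines of Python. This is a sketch of the documented behavior, not the exact routine the API uses:

```python
import unicodedata

def normalize_input_name(name: str) -> str:
    """Approximate the documented input-name normalization:
    replace accented letters with their unaccented counterparts
    and replace spaces with underscores. Illustrative only."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace spaces with underscores.
    return unaccented.replace(" ", "_")
```

For example, `normalize_input_name("référence genome")` yields `"reference_genome"`, which is why an input named with spaces in the UI will not match the original name when addressed through the API.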

### XML Configuration / JSON Inputform Files (Code)

This tab is used to specify all relevant information about the pipeline parameters.


### Compute Resources

#### Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

* For each compute type, the `standard` (default - AWS on-demand) or `economy` (AWS spot instance) tiers can be selected.
* When selecting an **fpga** instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
* When no type is specified, the default type of compute node is `standard-small`.

{% hint style="info" %}
You can see **which resources were used** in the different analysis steps at **Projects > your\_project > Flow > Analyses > your\_analysis > Steps tab**. (For child steps, these are displayed on the parent step)
{% endhint %}

By default, compute nodes have no scratch space. Requesting scratch space is an advanced setting and should only be used when absolutely necessary, as it incurs additional costs and may offer only limited performance benefits because the scratch volume is not local to the compute node.

For simplicity and better integration, consider using the shared storage available at `/ces`, which is provided with the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.

<details>

<summary>Scratch space notes</summary>

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is `/scratch`.

* For Nextflow, `pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'` will reserve 1 TiB.
* For CWL, adding a `ResourceRequirement` with `tmpdirMin: 5000` to your requirements section will reserve 5000 MiB.

<mark style="color:red;">**Avoid the following**</mark> as it does not align with ICAv2 scratch space configuration.

* Container overlay tmp path: `/tmp`
* Legacy paths: `/ephemeral`
* Environment Variables ($TMPDIR, $TEMP and $TMP)
* Bash Command `mktemp`
* CWL `runtime.tmpdir`

</details>
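As a sketch of how the scratch-space annotation above might appear in practice, the following hypothetical Nextflow process reserves 1 TiB of scratch space and writes its temporary files under `/scratch` (the process name, tool, and sizes are illustrative, not part of ICA):

```groovy
// Hypothetical example: reserve scratch space via the pod annotation
// documented above and direct temporary files to /scratch.
process SORT_LARGE_BAM {
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    input:
    path bam

    output:
    path 'sorted.bam'

    script:
    """
    # /scratch is the reserved scratch path; use it for temp files only.
    samtools sort -T /scratch/sort_tmp -o sorted.bam ${bam}
    """
}
```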

#### Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

| Compute Type                 | CPUs | Mem (GiB) | Nextflow (`pod.value`) | CWL (`type, size`) |
| ---------------------------- | ---- | --------- | ---------------------- | ------------------ |
| standard-small               | 2    | 8         | standard-small         | standard, small    |
| standard-medium              | 4    | 16        | standard-medium        | standard, medium   |
| standard-large               | 8    | 32        | standard-large         | standard, large    |
| standard-xlarge              | 16   | 64        | standard-xlarge        | standard, xlarge   |
| standard-2xlarge             | 32   | 128       | standard-2xlarge       | standard, 2xlarge  |
| standard-3xlarge             | 64   | 256       | standard-3xlarge       | standard, 3xlarge  |
| hicpu-small                  | 16   | 32        | hicpu-small            | hicpu, small       |
| hicpu-medium                 | 36   | 72        | hicpu-medium           | hicpu, medium      |
| hicpu-large                  | 72   | 144       | hicpu-large            | hicpu, large       |
| himem-small                  | 8    | 64        | himem-small            | himem, small       |
| himem-medium                 | 16   | 128       | himem-medium           | himem, medium      |
| himem-large                  | 48   | 384       | himem-large            | himem, large       |
| himem-xlarge<sup>2</sup>     | 92   | 700       | himem-xlarge           | himem, xlarge      |
| hiio-small                   | 2    | 16        | hiio-small             | hiio, small        |
| hiio-medium                  | 4    | 32        | hiio-medium            | hiio, medium       |
| fpga2-medium<sup>1</sup>     | 24   | 256       | fpga2-medium           | fpga2, medium      |
| fpga2-large<sup>1</sup>      | 48   | 512       | fpga2-large            | fpga2, large       |
| gpu-small                    | 8    | 61        | gpu-small              | gpu, small         |
| gpu-medium                   | 32   | 244       | gpu-medium             | gpu, medium        |
| transfer-small<sup>3</sup>   | 4    | 10        | transfer-small         | transfer, small    |
| transfer-medium<sup>3</sup>  | 8    | 15        | transfer-medium        | transfer, medium   |
| transfer-large<sup>3</sup>   | 16   | 30        | transfer-large         | transfer, large    |

{% hint style="warning" %} <sup>1</sup> **DRAGEN pipelines running on fpga2** compute type will incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with additional **discounts** as shown below.

* **80 or less** gigabase per sample - no discount - 0.10 iCredits per gigabase
* **> 80 to 160** gigabase per sample - 20% discount - 0.08 iCredits per gigabase
* **> 160 to 240** gigabase per sample - 30% discount - 0.07 iCredits per gigabase
* **> 240 to 320** gigabase per sample - 40% discount - 0.06 iCredits per gigabase
* **more than 320** gigabase per sample - 50% discount - 0.05 iCredits per gigabase

**DRAGEN Iterative gVCF Genotyper (iGG)** will incur a **license cost** of **0.6216 iCredits per gigabase**. For example, a sample of 3.3 gigabase human reference will result in 2 iCredits per sample. The associated **Compute costs** will be based on the compute instance chosen.

The **ORA (Original Read Archive) compression pipeline** is part of the DRAGEN platform. It performs lossless genomic data compression to reduce the size of FASTQ and FASTQ.GZ files (up to 4-6x smaller) while preserving data integrity with internal checksum verification. The ORA compression pipeline has a **license cost** of **0.017 iCredits per input Gbase**; decompression does not have an associated license cost.
{% endhint %}

{% hint style="warning" %}
The ***DRAGEN\_Map\_Align*** **pipeline running on fpga2** has the standard **DRAGEN license cost** of 0.10 iCredits per gigabase processed, but replaces the standard volume discounts with the **discounts** shown below.

* **10 or less** gigabase per sample - no discount - 0.10 iCredits per gigabase
* **> 10 to 25** gigabase per sample - 30% discount - 0.07 iCredits per gigabase
* **> 25 to 60** gigabase per sample - 70% discount - 0.03 iCredits per gigabase
* **more than 60** gigabase per sample - 85% discount - 0.015 iCredits per gigabase
{% endhint %}

{% hint style="info" %}
(2) The compute type **himem-xlarge** has low availability.
{% endhint %}

{% hint style="danger" %}
FPGA1 instances were decommissioned on Nov 1st 2025. Please migrate to F2 for improved capacity and performance with up to 40% reduced turnaround time for analysis.
{% endhint %}

{% hint style="info" %}
(3) The transfer compute type size is selected based on the storage size chosen for the analysis and is used during upload and download system tasks.
{% endhint %}
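To illustrate how the Nextflow `pod.value` column of the table above is used, a process can request a compute type via a pod annotation. The annotation key below is taken from ICA's Nextflow documentation; verify it (and the CWL equivalent) against the language-specific pages linked earlier:

```groovy
process ALIGN_READS {
    // Request the himem-small compute type (8 CPUs, 64 GiB) using
    // the value from the Nextflow column of the compute types table.
    // The process body is omitted; this only shows the annotation.
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'himem-small'

    // ... inputs, outputs, and script go here ...
}
```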

### Nextflow/CWL Files (Code)

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

* DIFF (.diff)
* GROOVY (.groovy .nf)
* JAVASCRIPT (.js .javascript)
* JSON (.json)
* SH (.sh)
* SQL (.sql)
* TXT (.txt)
* XML (.xml)
* YAML (.yaml .cwl)

{% hint style="info" %}
If the file type is not recognized, it will default to text display. This can result in the application interpreting binary files as text when trying to display the contents.
{% endhint %}

#### Main.nf (Nextflow code)

The Nextflow project main script.

#### Nextflow\.config (Nextflow code)

The Nextflow configuration settings.

#### Workflow\.cwl (CWL code)

The Common Workflow Language main script.

#### Adding Files

Multiple files can be added by selecting the **+Create** option at the bottom of the screen to make pipelines more modular and manageable.

### Metadata Model

See [Metadata Models](https://help.connected.illumina.com/connected-analytics/home/h-metadatamodels)

### Report

Here patterns for detecting report files in the analysis output can be defined. On opening an analysis result window of this pipeline, **an additional tab will display these report files.** The goal is to provide a pipeline-specific user-friendly representation of the analysis result.

To add a report, select the **+ symbol** on the left side. Give the report a unique name and a regular expression matching the report file, and optionally select the format of the report. This must be the source format of the report data generated during the analysis.

{% hint style="info" %}
There is a limit of 20 reports per report pattern which will be shown when you have multiple reports matching your regular expression.
{% endhint %}
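For example, a report pattern intended to match a MultiQC HTML report might use a regular expression like the one below. The pattern and the file names are illustrative, not prescribed by ICA:

```python
import re

# Hypothetical report pattern: any HTML file with "multiqc" in its path.
report_pattern = re.compile(r".*multiqc.*\.html$")

# Illustrative analysis output paths.
outputs = [
    "results/multiqc/multiqc_report.html",
    "results/alignment/sample1.bam",
]
# Only files matching the pattern would appear on the report tab.
reports = [f for f in outputs if report_pattern.match(f)]
```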

***

## Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

1. Select **Projects > your\_project > Flow > Pipelines.**
2. Select the pipeline you want to run, or open its pipeline details.
3. Select **Start Analysis**.
4. Configure [analysis settings](#analysis-settings).
5. Select **Start Analysis**.
6. View the analysis status on the Analyses page.
   * **Requested**—The analysis is scheduled to begin.
   * **In Progress**—The analysis is in progress.
   * **Succeeded**—The analysis is complete.
   * **Failed** —The analysis has failed.
   * **Aborted** — The analysis was aborted before completing.
7. To end an analysis, select **Abort**.
8. To perform a completed analysis again, select **Re-run**.

#### Analysis Settings

The Start Analysis screen provides the configuration options for the analysis.

<table><thead><tr><th width="196">Field</th><th>Entry</th></tr></thead><tbody><tr><td>User Reference</td><td>The unique analysis name.</td></tr><tr><td>Pipeline</td><td>This is not editable, but provides a link to the pipeline in case you want to look up details of the pipeline.</td></tr><tr><td>User tags (optional)</td><td>One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.</td></tr><tr><td>Notification (optional)</td><td>Enter your email address if you want to be notified when the analysis completes.</td></tr><tr><td>Output Folder<sup>1</sup></td><td>Select a folder in which the <strong>output folder of the analysis</strong> should be located. When <strong>no folder is selected</strong>, the output folder will be located in the <strong>root of the project</strong>.<br><br>When you open the folder selection dialog, you have the option to <strong>create a new folder</strong> (bottom of the screen). You can create nested folders by using the <code>folder/subfolder</code> syntax.<br><em>Do not use a / before the first folder or after the last subfolder in the folder creation dialog.</em></td></tr><tr><td>Logs Folder</td><td><p>Select a folder in which the <strong>logs of the analysis</strong> should be located. When <strong>no logs folder is selected</strong>, the logs will be stored as a <strong>subfolder of the output folder</strong>. When a logs folder is selected which is different from the output folder, the outputs and logs folders are separated.<br>Files that already exist in the logs folder will be overwritten with new versions.<br><br>When you open the folder selection dialog, you have the option to <strong>create a new folder</strong> (bottom of the screen). 
You can create nested folders by using the <code>folder/subfolder</code> syntax.</p><p><em><strong>Note:</strong> Choose a folder that is empty and not in use for other analyses, as files will be overwritten.</em></p><p><br><em><strong>Note:</strong> Do not use a / before the first folder or after the last subfolder in the folder creation dialog.</em></p></td></tr><tr><td>Input</td><td>Select the input files to use in the analysis. (max. 50,000)</td></tr><tr><td>Settings (optional)</td><td>Provide input settings.</td></tr><tr><td>Resources</td><td>Select the storage size for your analysis. The available storage sizes depend on your selected Pricing subscription. See <a href="../../../reference/r-pricing#data-storage">Storage</a> for more information.</td></tr></tbody></table>

<sup>1</sup> When using the API, you can [redirect analysis outputs](https://help.connected.illumina.com/connected-analytics/project/f-analyses#analysis-output-mappings) to be outside of the current project.

## Aborting Analyses

You can abort a running analysis from either the analysis overview (**Projects > your\_project > Flow > Analyses > your\_analysis > Manage > Abort**) or from the analysis details (**Projects > your\_project > Flow > Analyses > your\_analysis > Details tab > Abort**).

## View Analysis Results

You can view analysis results on the Analyses page or in the output folder on the Data page. You can also **rerun your analysis** from here.

1. Select a project, and then select the **Projects > your\_project > Flow > Analyses** page.
2. Select the desired analysis.
3. From the output files tab, expand the list if needed and select an output file.
   * If you want to **add or remove** any user or technical **tags**, you can do so from the data details view.
   * If you want to **download** the file, select Download.
   * To see the data in the data view where you can easily navigate between files, select **Open in data**.
4. To preview the file, select the **View** tab.

To see more details of your analyses, return to **Projects > your\_project > Flow > Analyses > your\_analysis**. The following tabs will be visible (depending on pipeline):

* **Details** - View information on the pipeline configuration.
* **Output files** - View the output of the Analysis.
* **Steps** - stderr and stdout information.
* **CWL** - The CWL pipeline definition.
* **Nextflow timeline** - Nextflow process execution timeline.
* **Nextflow execution** - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.
* **Report** - Shows the reports defined on the [pipeline report](#report-code) tab.

{% hint style="info" %}
Other tabs can be available depending on your chosen pipeline.
{% endhint %}
