# Nextflow

ICA supports running pipelines defined using [Nextflow](https://www.nextflow.io/). See [this tutorial](https://help.connected.illumina.com/connected-analytics/tutorials/nextflow/nextflow-dragen-pipeline) for an example.

To run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.

## System Information

{% hint style="warning" %}
Version 20.10 on Illumina Connected Analytics will be obsoleted on April 22nd, 2026. After this date, all existing pipelines using Nextflow v20.10 will no longer run.

See [Planned Obsolescence Notice](https://illumina.seismic.com/Link/Content/DC7Wg3dQRBp928fPCMMJM3X3fP98)
{% endhint %}

<table data-header-hidden><thead><tr><th width="162.61328125"></th><th></th></tr></thead><tbody><tr><td>Nextflow version</td><td>22.04 (deprecated ⚠️), 24.10 (supported ✅), 25.10 (default ⭐)</td></tr><tr><td>Executor</td><td>Kubernetes</td></tr></tbody></table>

The following table shows when each Nextflow version is

* default (⭐): this version is proposed when creating a new Nextflow pipeline.
* supported (✅): this version can be selected when you do not want the default Nextflow version.
* deprecated (⚠️): this version cannot be selected for new pipelines, but pipelines using this version will still work.
* removed (❌): this version cannot be selected when creating new pipelines, and pipelines using this version will no longer work.

The switchover happens in the **January** release of that year.

<table><thead><tr><th width="116.43359375">Nextflow Version</th><th width="157" align="center">2025</th><th align="center">2026</th><th align="center">2027</th><th align="center">2028</th></tr></thead><tbody><tr><td>v20.10</td><td align="center">⚠️</td><td align="center">❌</td><td align="center">❌</td><td align="center">❌</td></tr><tr><td>v22.04</td><td align="center">✅</td><td align="center">⚠️</td><td align="center">❌</td><td align="center">❌</td></tr><tr><td>v24.10</td><td align="center">⭐</td><td align="center">✅</td><td align="center">✅</td><td align="center">✅</td></tr><tr><td>v25.10</td><td align="center">​</td><td align="center">⭐</td><td align="center">⭐</td><td align="center">✅</td></tr><tr><td>v26.10</td><td align="center">​</td><td align="center">​</td><td align="center">✅</td><td align="center">⭐</td></tr><tr><td>v27.10</td><td align="center">​</td><td align="center">​</td><td align="center">​</td><td align="center">✅</td></tr></tbody></table>

### Nextflow Version

You can select the Nextflow version while building a pipeline as follows:

<table data-header-hidden><thead><tr><th width="136.5">interface</th><th>Location</th></tr></thead><tbody><tr><td>GUI</td><td>Select the Nextflow version at <strong>Projects > your_project > flow > pipelines > your_pipeline > Details tab</strong>.</td></tr><tr><td>API</td><td>Select the Nextflow version by setting it in the optional field "<code>pipelineLanguageVersionId</code>". When not set, a default Nextflow version will be used for the pipeline.</td></tr></tbody></table>

## Compute Type

To specify a compute type for a Nextflow process, you can either define the cpu and memory (recommended) or use the [predefined](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/..#compute-types) compute type sizes (required for specific hardware such as FPGA2).

{% hint style="info" %}
Do not mix these definition methods within the same process; use either one method or the other.
{% endhint %}

### CPU and Memory

Specify the task resources using Nextflow directives in the workflow script (.nf) or in the configuration file (nextflow\.config): `cpus` defines the number of CPU cores allocated to the process, and `memory` defines the amount of RAM that will be allocated.

**Process file** example

```nf
process ALIGN {
    cpus 4
    memory '16 GB'
    script:
    """
    your_command_here
    """
}
```

**Configuration file** example

```nextflow
process {
    withName: ALIGN {
        cpus = 4
        memory = '16 GB'
    }
}
```

ICA will convert the required resources to the correct predefined size. This enables porting public Nextflow pipelines without configuration changes.

### Predefined Sizes

To use the predefined sizes, use the [pod directive](https://www.nextflow.io/docs/latest/process.html#process-pod) within each process. Set the `annotation` to `scheduler.illumina.com/presetSize` and the `value` to the desired compute type. The default compute type, when this directive is not specified, is `standard-small` (2 CPUs and 8 GB of memory).

For example, if you want to use [FPGA 2 medium](https://help.connected.illumina.com/connected-analytics/reference/r-pricing#compute), you need to add the line below

```groovy
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
```

{% hint style="info" %}
Often, there is a need to select the compute size for a process dynamically based on user input or other factors. The Kubernetes executor used on ICA does not use the `cpus` and `memory` directives, so instead, you can dynamically set the `pod` directive, as mentioned [here](https://www.nextflow.io/docs/latest/process.html#dynamic-directives). e.g.

```groovy
process foo {
    // Assuming that params.compute_size is set to a valid size such as 'standard-small', 'standard-medium', etc.
    pod annotation: 'scheduler.illumina.com/presetSize', value: "${params.compute_size}"
}
```

It can also be specified in the [configuration file](https://www.nextflow.io/docs/latest/config.html). See the example configuration below:

```groovy
process {
    // Set the default pod
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'standard-small'
    ]

    withName: 'big_memory_process' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'himem-large'
        ]
    }

    // Use an FPGA2 instance for dragen processes
    withLabel: 'dragen' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'fpga2-medium'
        ]
    }
}
```

{% endhint %}

### Standard vs Economy

#### Concept

For each compute type, you can choose between two lifecycle tiers:

* `scheduler.illumina.com/lifecycle: standard` - [**AWS on-demand**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-on-demand-instances.html) (default)
* `scheduler.illumina.com/lifecycle: economy` - [**AWS spot instances**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html)

<table><thead><tr><th width="124.84375"></th><th>On-Demand Instance</th><th>Spot Instance</th></tr></thead><tbody><tr><td>Pricing</td><td>Fixed <a href="../../../reference/r-pricing">price</a> per second with 60-second minimum.</td><td><a href="../../../reference/r-pricing">Cheaper</a> than On-Demand.</td></tr><tr><td>Availability</td><td>Guaranteed capacity with Full control of starting, stopping, and terminating.</td><td>Not guaranteed. Depends on unused AWS capacity. Can be terminated and reclaimed by AWS when the capacity is needed for other processes with <strong>2 minutes notice</strong>.</td></tr><tr><td>Best for</td><td>Ideal for critical workloads and urgent scaling needs.</td><td>Best for cost optimization and non-critical workloads as interruptions can occur any time.</td></tr></tbody></table>

#### Configuration

You can switch to economy in the process itself with the pod directive or in the nextflow\.config file.

Process example

```groovy
process foo {
    pod annotation: 'scheduler.illumina.com/lifecycle', value: "economy"
}
```

nextflow\.config example

```groovy
process {
    withName: 'PROCESS_NAME' {
        pod = [
            annotation: 'scheduler.illumina.com/lifecycle',
            value     : 'economy'
        ]
    }
}
```

## Inputs

Inputs are specified via the [JSON-based input form](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/json-based-input-forms) or [XML input form](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/pi-inputform). The `code` specified in the input form corresponds to the field in the `params` object that is available in the workflow. Refer to the [tutorial](https://help.connected.illumina.com/connected-analytics/tutorials/nextflow/nextflow-dragen-pipeline) for an example.
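For example, if the input form defines a file input whose code is `read1` and a text setting whose code is `sample_id` (both hypothetical names), the values arrive on `params` and can be used directly in the workflow:

```groovy
// 'read1' and 'sample_id' are hypothetical input form codes
workflow {
    // Each form field's code becomes a key on the params object
    reads_ch = Channel.fromPath(params.read1)
    reads_ch.view { f -> "Sample ${params.sample_id}: ${f}" }
}
```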

## Outputs

Outputs for Nextflow pipelines are uploaded from the `out` folder in the attached shared filesystem. The [`publishDir` directive](https://www.nextflow.io/docs/latest/process.html#publishdir) can be used to symlink (recommended), copy or move data to the correct folder. Symlinking is faster and does not increase storage cost as it creates a file pointer instead of copying or moving data. Data will be uploaded to the ICA project after the pipeline execution completes.

```groovy
publishDir 'out', mode: 'symlink'
```
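Within a process, the directive sits alongside the output declaration; the process and file names below are illustrative:

```groovy
process SUMMARIZE {
    // Symlink results into the 'out' folder; ICA uploads them after the run
    publishDir 'out', mode: 'symlink'

    output:
    path 'summary.txt'

    script:
    """
    your_command_here > summary.txt
    """
}
```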

<details>

<summary>Nextflow version 20.10.10 (Deprecated)</summary>

**Version 20.10 will be obsoleted on April 22nd, 2026. After this date, all existing pipelines using Nextflow v20.10 will no longer be able to run.**

For Nextflow version 20.10.10 on ICA, using the "copy" method in the `publishDir` directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the `publishDir` process due to insufficient disk space, resulting in incomplete output delivery.

Solutions:

1. Use "[symlink](https://help.ica.illumina.com/project/p-flow/f-pipelines/pi-nextflow#outputs)" instead of "copy" in the `publishDir` directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
2. Use Nextflow 22.04 or later and enable the "[failOnError](https://www.nextflow.io/docs/latest/process.html#publishdir)" `publishDir` option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.

</details>

## Nextflow Configuration

During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see [Nextflow Configuration documentation](https://www.nextflow.io/docs/latest/config.html)). When creating a Nextflow pipeline, use the nextflow\.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.

![nextflowconfig-0](https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-91589a17250ffaba63396eed4fd29dfa17957071%2Fnextflowconfig-0purple.png?alt=media)

{% hint style="info" %}
If no Docker image is specified, Ubuntu will be used as the default.
{% endhint %}
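To pin a specific image instead of relying on the default, you can set the standard Nextflow `container` directive in the nextflow\.config; the image names below are placeholders, not ICA-provided images:

```groovy
// Default container for all processes (placeholder image name)
process.container = 'ubuntu:22.04'

// Override per label (placeholder registry/image)
process {
    withLabel: 'aligner' {
        container = 'my-registry/my-aligner:1.0'
    }
}
```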

The following configuration settings will be ignored if provided as they are overridden by the system:

```yaml
executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file
```

## Best Practices

### Process Time

Setting a timeout of between 2 and 4 times the expected processing time with the [**time**](https://www.nextflow.io/docs/latest/reference/process.html#process-time) directive for processes or tasks ensures that no stuck processes remain running indefinitely. Stuck processes keep incurring costs for the occupied resources, so if a process cannot complete within that timespan, it is safer and more economical to end the process and retry.
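As a sketch, a process expected to run for about one hour could be capped at three hours (2-4x the expected time) and retried once; the retry settings here are an assumption, not an ICA requirement:

```groovy
process LONG_STEP {
    // Expected runtime is ~1 hour, so cap at 3 hours (2-4x expected)
    time '3h'
    // Optionally retry once before failing the run
    errorStrategy 'retry'
    maxRetries 1

    script:
    """
    your_command_here
    """
}
```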

### Sample Sheet File Ingestion

When you want to use a sample sheet that references files as Nextflow input, add an extra input to the pipeline. This extra input lets the user select the files mentioned in the sample sheet from their project. At run time, those files are staged into the working directory, so when Nextflow parses the sample sheet and looks for those files without paths, it will find them there. You cannot use file paths in a sample sheet without selecting the files in the input form, because files are only passed as file/folder ids in the API payload when the analysis is launched.

You can include public data such as **http urls** because Nextflow is able to download those. Nextflow is also able to download publicly accessible **S3 urls** (s3://...). You can **not** use Illumina's **urn**:ilmn:ica:region:... structure.
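As an illustration, assuming a sample sheet with hypothetical columns `sample_id` and `fastq`, where the `fastq` column holds bare file names, the staged files can be resolved like this:

```groovy
// Columns 'sample_id' and 'fastq' are hypothetical
Channel
    .fromPath(params.samplesheet)
    .splitCsv(header: true)
    // Bare file names resolve because the extra pipeline input stages
    // the selected files into the working directory at run time
    .map { row -> tuple(row.sample_id, file(row.fastq)) }
    .set { samples_ch }
```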

### Migrating from FPGA to FPGA2

New versions of existing **DRAGEN workflows** have been created to **support F2 (FPGA2)** instances as F1 (FPGA) instances have been decommissioned. Please consult the [DRAGEN BSSH/ICA end of life roadmap](https://help.dragen.illumina.com/reference/eol-transition#dragen-bssh-ica-workflows-end-of-life-roadmap) for more information. You will need to migrate your pipelines from FPGA to FPGA2.

As long as your pipeline is still in [draft](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/..#pipeline-statuses) status, you can update it with the FPGA2 configuration, but once the pipeline has been [released](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/..#pipeline-statuses), you need to **clone and edit the pipeline** as it is protected against editing. Cloning the pipeline is done at **projects > your\_project > Flow > Pipelines > open pipeline details > Clone** (top right). Setting the compute resources can be done in the **.nf** file directly or in the **nextflow\.config** file.

#### .nf

```groovy
process DRAGEN_PROCESS {
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'

    script:
    """
    your_command_here
    """
}
```

#### nextflow\.config

```groovy
process {
    withLabel: 'dragen' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'fpga2-medium'
        ]
    }
}
```
