1 of 6

Nextflow Pipeline

In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.

This tutorial references the example in the Nextflow documentation.

Create the pipeline

The first step in creating a pipeline is to create a project. For instructions on creating a project, see the page. In this tutorial, we'll use a project called "Getting Started".

After creating the project, select the project from the Projects view to enter the project. Within the project, navigate to the Flow > Pipelines view in the left navigation pane. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating the Nextflow pipeline.

In the Nextflow pipeline creation view, the Information tab is used to add information about the pipeline. Add values for the required Code (unique pipeline name) and Description fields.

Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.
Add the publishDir directive with value 'out' to the reverse process.
Modify the reverse process to write the output to a file test.txt instead of stdout.

The description of the pipeline from the linked Nextflow docs:

This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows, receives these files and it simply reverses their content by using the rev command line tool.

Syntax example:

Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we won't need to add any additional definition files. Paste the following definition into the text editor:

Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.

Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.

With the definition added and the input form defined, the pipeline is complete.

On the Documentation tab, you can fill out additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.

Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.

Launch the pipeline

To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".

Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to the Analyses view and click the button to Start Analysis. Next, select your pipeline from the list. Alternatively you can start your pipeline from Projects > your_project > Flow > Pipelines > Start new analysis.

In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.

Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.

With the required information set, click the button to Start Analysis.

Monitor Analysis

After launching the pipeline, navigate to the Analyses view in the left navigation pane.

The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.

Click the analysis record to enter the analysis details view.

Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Do note that this may take considerable time if it is your first analysis because of the required resource management. (in our example, the analysis took 28 minutes)

From the analysis details view, the logs produced by each process within the nextflow pipeline are accessible via the Logs tab.

View Results

Analysis outputs are written to an output directory in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}. (1)

Inside of the analysis output directory are the files output by the analysis processes written to the 'out' directory. In this tutorial, the file test.txt (2) is written to by the reverse process. Navigating into the analysis output directory, clicking into the test.txt file details, and opening the VIEW tab (3) shows the output file contents.

The "Download" button (4) can be used to download the data to the local machine.

Nextflow DRAGEN Pipeline

In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in ICA GUI. More information about Nextflow on ICA can be found here. For this example, we will implement the alignment and variant calling example from this DRAGEN support page for Paired-End FASTQ Inputs.

Prerequisites

The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the Projects page. In this tutorial, we'll use a project called Getting Started.

After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 3.9.5. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects page, clicking on Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.

In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values. For the customized DRAGEN pipeline, Nextflow Version must be changed to 22.04.3.

Next, add the Nextflow pipeline definition by navigating to the Nextflow files > MAIN.NF tab. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.

nextflow.enable.dsl = 2

process DRAGEN {

    // The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
    container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    // ICA will upload everything in the "out" folder to cloud storage 
    publishDir 'out', mode: 'symlink'

    input:
        tuple path(read1), path(read2)
        val sample_id
        path ref_tar

    output:
        stdout emit: result
        path '*', emit: output

    script:
        """
        set -ex
        mkdir -p /scratch/reference
        tar -C /scratch/reference -xf ${ref_tar}
        
        /opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
        /opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
            --output-directory ./ \\
            -1 ${read1} \\
            -2 ${read2} \\
            --intermediate-results-dir /scratch \\
            --output-file-prefix ${sample_id} \\
            --RGID ${sample_id} \\
            --RGSM ${sample_id} \\
            --ref-dir /scratch/reference \\
            --enable-variant-caller true
        """
}

workflow {
    DRAGEN(
        Channel.of([file(params.read1), file(params.read2)]),
        Channel.of(params.sample_id),
        Channel.fromPath(params.ref_tar)
    )
}

To specify a compute type for a Nextflow process, use the pod directive within each process.

Outputs for Nextflow pipelines are uploaded from the out directory in the attached shared filesystem. The publishDir directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.

Refer to the ICA help page for details on ICA specific attributes within the Nextflow definition.

Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found in Input Form page.

This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.

Paste the following XML input form into the XML CONFIGURATION text editor.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 1</pd:label>
            <pd:description>FASTQ Read 1</pd:description>
        </pd:dataInput>
        <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 2</pd:label>
            <pd:description>FASTQ Read 2</pd:description>
        </pd:dataInput>
        <pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
            <pd:label>Reference</pd:label>
            <pd:description>Reference TAR</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description></pd:description>
            <pd:tool code="generalparameters">
                <pd:label>General Parameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
                    <pd:label>Sample ID</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

The dataInputs section specifies file inputs, which will be mounted when the workflow executes. Parameters defined under the steps section refer to string and other input types.

Each of the dataInputs and parameters can be accessed in the Nextflow within the workflow's params object named according to the code defined in the XML (e.g. params.sample_id).

Running the pipeline

If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.

Go to the pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by red "*" sign and click on Start Analysis button.

You can monitor the run from the analysis page.

Once the Status changes to Succeeded, you can click on the run to access the results page.

Useful Links

Nextflow: Scatter-gather Method

Nextflow offers support for Scatter-gather pattern natively. The initial example uses this pattern by splitting the FASTA file into chunks to channel records in the task splitSequences, then by processing these chunks in the task reverse.

In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.

First, we present the individual processes. Select +Nextflow files > + Create file and label the file split.nf. Copy and paste the following definition.

process split {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path("split.*.tsv")
    
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
    }

Next, select +Create file and name the file sort.nf. Copy and paste the following definition.

process sort {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path '*.sorted.tsv'
    
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}

Select +Create file again and label the file merge.nf. Copy and paste the following definition.

process merge {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'

    publishDir 'out', mode: 'symlink'
    
    input:
    path x
    
    output:
    path 'merged.tsv'
    
    """
    cat $x > merged.tsv
    """
}

Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.

nextflow.enable.dsl=2
 
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
 
 
params.myinput = "test.test"
 
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}

Here, the operators flatten and collect are used to transform the emitting channels. The Flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel to a List and return the resulting object as a sole emission.

Finally, copy and paste the following XML configuration into the XML Configuration tab.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="myinput" format="TSV" type="FILE" required="true" multiValue="false">
            <pd:label>myinput</pd:label>
            <pd:description></pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

Running the pipeline

Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by red "*" sign and click on Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.

Select Projects > your_project > Flow > Analyses, and open the Logs tab. From the log files, it is clear that in the first step, the input file is split into multiple chunks, then these chunks are sorted and merged.

Nextflow: Pipeline Lift

Nextflow Pipeline Liftover

nextflow-to-icav2-config

This is an to help develop Nextflow pipelines that will run successfully on ICA. There are some syntax bugs that may get introduced in your Nextflow code. One suggestion is to run the steps as described below and then open these files in VisualStudio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch upon first glance.

This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.

Some examples of Nextflow pipelines that have been lifted over with this repo can be found .

Some additional examples of ICA-ported Nextflow pipelines are .

Some additional repos that can help with your ICA experience can be found below:

Relaunch pipeline analysis and
Monitor your analysis run in ICA and troubleshoot
Wrap a WDL-based workflow in a
Wrap a Nextflow-based workflow in a

Local testing your Nextflow pipeline after using these scripts

This will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subworkflow and module files), this may be more appropriate. Any and all comments are welcome.

What do these scripts do?

What these scripts do:

Parse configuration files and the Nextflow scripts (main.nf, workflows, subworkflows, modules) of a pipeline and update the configuration of the pipeline with pod directives to tell ICA what compute instance to run

Strips out parameters that ICA utilizes for workflow orchestration
Migrates manifest closure to conf/base.ica.config file
Ensures that docker is enabled

Adds workflow.onError (main.nf, workflows, subworkflows, modules) to aid troubleshooting
Modifies the processes that reference scripts and tools in the bin/ directory of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Additional edits to ensure your pipeline runs more smoothly on ICA

ICA concepts to better understand ICA liftover of Nextflow pipelines

Nextflow workflows on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow workflows
Nextflow processes will need to contain a reference to a container --- a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process.

Using these scripts

The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3

This has:

nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA

You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:

where pwd is your $HOME directory

Prerequitsites

STEP 0 Github credentials

STEP 1 [OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest

If you have a specific pipeline from Github, you can skip this statement below.

You'll first need to download the python module from nf-core via a pip install nf-core command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.

You can choose which pipelines to git clone, but as a convenience, the wrapper nf-core.conversion_wrapper.R will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from the ICA domain of your choosing.

STEP 2: Obtain API key file

STEP 3: Create a project in ICA

STEP 4: Download and configure the ICA CLI (see STEP 2):

The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' ( This will redirect you to https://illumina.ica.com/ica).

Let's do some liftovers

GIT_HUB_URL can be specified to grab pipeline code from github. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.

Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.

In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.

git clone nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}

A detailed step-by-step breakdown of what `nf-core.conversion_wrapper.R` does for each Nextflow pipeline

A Nextflow schema JSON is generated by nf-core's python library nf-core
nf-core can be installed via a pip install nf-core command

Step 2: Create a `nextflow.config` and a `base config` file so that it is compatible with ICA.

This script will update your configuration files so that it integrates better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.

Step 3: Add helper-debug code and other modifications to your Nextflow pipeline

This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copy work directory back to ICA if an analysis fails). It also allows for ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in your bin/ directory of your pipeline $projectDir.

Step 4: Update XML to add parameter options --- if your pipeline uses/could use iGenomes

You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.

Step 5: Sanity check your pipeline code to see if it is valid prior to uploading it into ICA

[NOTE: 04-10-2023] Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0

Step 6: Create a pipeline in ICA by using the following helper script `nf-core.create_ica_pipeline.R`

Developer mode --- if you plan to develop or modify a pipeline in ICA

Add the flag --developer-mode to the command line above if you have custom groovy libraries or modules files referenced in your workflow. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameters project_dir and files under input_files. This will ensure that these files and directories will be placed in the $workflow.launchDir when the pipeline is invoked.

How to run a pipeline in ICA via CLI

As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:

There will be a corrsponding JSON file (i.e. a file with a file extension *ICAv2_CLI_template.json) that saves these values that one could modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.

Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.

If you have a previously successful analysis with your pipeline, you may find this approach more useful.

Creating your own tests/pipeline runs via the CLI

Where possible, these scripts search for config files that refer to a test (i.e. test.config,test_full.config,test*config) and creates a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended. By default, this parameter is set to false.

When set to true, these test config files are loaded in your main nextflow.config.

Additional todos

Nextflow: Pipeline Lift: RNASeq

How to lift a simple NextFlow pipeline?

In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.

This approach is applicable in situations where your main.nf file contains all your pipeline logic and illustrates what the liftover process would look like.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.

How to modify the main.nf file

The XML configuration

In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads directory. Navigate to the XML Configuration tab and paste the following:

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

Running the pipeline

Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Nextflow CLI Workflow

In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).

Installation

Please refer to for installing ICA CLI. To authenticate, please follow the steps in the page.

Tutorial project

In this tutorial, we will create in ICA. The workflow includes four processes: index creation, quantification, FastQC, and MultiQC. We will also upload a Docker container to the ICA Docker repository for use within the workflow.

The 'main.nf' file defines the workflow that orchestrates various RNASeq analysis processes.

main.nf

The script uses the following tools:

Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports

Docker image upload

docker pull nextflow/rnaseq-nf

Create a tarball of the image to upload to ICA.

Following are lists of commands that you can use to upload the tarball to your project.

Add the image to the ICA Docker repository

The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).

Change the format for the image tarball to DOCKER:

Navigate to Projects > <your_project> Data
Check the checkbox for the uploaded tarball
Click on "Manage" dropdown
Click on "Change format" In the new popup window, select "DOCKER" format and hit save.

To add this image to the ICA Docker repository, first click on "All Projects" to go back to the home page.

From the ICA home page, click on the "Docker Repository" page under "System Settings"
Click the "+ New" button to open the "New Docker Image" window.
In the new window, click on the "Select a file with DOCKER format"

This will open a new window that lets you select the above tarball.

Select the region (US, EU, CA) your project is in.
Select your project. You can start typing the name in the textbox to filter it.
The bottom pane will show the "Data" section of the selected project. If you have the docker image in subfolders, browse the folders to locate the file. Once found, click on the checkbox corresponding to the image and press "Select".

You will be taken back to the "New Docker image" window. The "Data" and "Name" fields will have been populated based on the imported image. You can edit the "Name" field to rename it. For this tutorial, we will change the name to "rnaseq". Select the region, and give it a version number, and description. Click on "Save".

If you have the images hosted in other repositories, you can add them as external image by clicking the "+ New external image" button and completing the form as shown in the example below.

After creating a new docker image, you can double click on the image to get the container URL for the nextflow configuration file.

Nextflow configuration file

Create a configuration file called "nextflow.config" in the same directory as the main.nf file above. Use the URL copied above to add the process.container line in the config file.

Parameters file

An empty form looks as follows:

The input files are specified within a single dataInputs node with individual input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc..

For this tutorial, we do not have any settings parameters but it requires multiple file inputs. The parameters.xml file looks as follows:

Use the following commands to create the pipeline with the above workflow in your project.

If not already in the project context, enter it by using the following command:

icav2 enter <PROJECT NAME or ID>

Create pipeline using icav2 project pipelines create nextflow Example:

If you prefer to organize the processes in different folders/files, you can use --other parameter to upload the different processes as additional files. Example:

Example command to run the pipeline from CLI:

You can get the pipeline id under "ID" column by running the following command:

You can get the file ids under "ID" column by running the following commands:

Additional Resources:

Nextflow CLI Workflow

In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).

Installation

Please refer to for installing ICA CLI. To authenticate, please follow the steps in the page.

Tutorial project

The 'main.nf' file defines the workflow that orchestrates various RNASeq analysis processes.

main.nf

The script uses the following tools:

Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports

We need a Docker container consisting of these tools. You can refer to the section in the help page to build your own docker image with the required tools. For the sake of this tutorial, we will use the container from the

Docker image upload

With in your computer, download the image required for this project using the following command.

docker pull nextflow/rnaseq-nf

Create a tarball of the image to upload to ICA.

docker save nextflow/rnaseq-nf > cont_rnaseq.tar

Following are lists of commands that you can use to upload the tarball to your project.

# Enter the project context
icav2 enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /

Add the image to the ICA Docker repository

The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).

Change the format for the image tarball to DOCKER:

Navigate to Projects > <your_project> Data
Check the checkbox for the uploaded tarball
Click on "Manage" dropdown
Click on "Change format" In the new popup window, select "DOCKER" format and hit save.

To add this image to the ICA Docker repository, first click on "All Projects" to go back to the home page.

From the ICA home page, click on the "Docker Repository" page under "System Settings"
Click the "+ New" button to open the "New Docker Image" window.
In the new window, click on the "Select a file with DOCKER format"

This will open a new window that lets you select the above tarball.

Select the region (US, EU, CA) your project is in.
Select your project. You can start typing the name in the textbox to filter it.
The bottom pane will show the "Data" section of the selected project. If you have the docker image in subfolders, browse the folders to locate the file. Once found, click on the checkbox corresponding to the image and press "Select".

If you have the images hosted in other repositories, you can add them as external image by clicking the "+ New external image" button and completing the form as shown in the example below.

After creating a new docker image, you can double click on the image to get the container URL for the nextflow configuration file.

Nextflow configuration file

Create a configuration file called "nextflow.config" in the same directory as the main.nf file above. Use the URL copied above to add the process.container line in the config file.

process.container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'

You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the page for a list of available compute types.

process {
    container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value: 'standard-small'
    ]  
}

Parameters file

The parameters file defines the workflow input parameters. Refer to the for detailed information for creating correctly formatted parameters files.

An empty form looks as follows:

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
   <dataInputs>
   </dataInputs>
   <steps>
   </steps>
</pipeline>

For this tutorial, we do not have any settings parameters but it requires multiple file inputs. The parameters.xml file looks as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
   <pd:dataInputs>
       <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 1</pd:label>
           <pd:description>FASTQ Read 1</pd:description>
       </pd:dataInput>
       <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 2</pd:label>
           <pd:description>FASTQ Read 2</pd:description>
       </pd:dataInput>
       <pd:dataInput code="transcriptome_file" format="FASTA" type="FILE" required="true" multiValue="false">
           <pd:label>Transcript</pd:label>
           <pd:description>Transcript faster</pd:description>
       </pd:dataInput>
   </pd:dataInputs>
   <pd:steps/>
</pd:pipeline>

Use the following commands to create the pipeline with the above workflow in your project.

If not already in the project context, enter it by using the following command:

icav2 enter <PROJECT NAME or ID>

Create pipeline using icav2 project pipelines create nextflow Example:

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --storage-size small --description 'cli nextflow pipeline'

If you prefer to organize the processes in different folders/files, you can use --other parameter to upload the different processes as additional files. Example:

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf --other fastqc.nf:filename=processes/fastqc.nf --other multiqc.nf:filename=processes/multiqc.nf --storage-size small --description 'cli nextflow pipeline'

You can refer to page to explore options to automate this process.

Refere to for details on running the pipeline from CLI.

Example command to run the pipeline from CLI:

icav2 projectpipelines start nextflow <pipeline_id> --input read1:<read1_file_id> --input read2:<read2_file_id> --input transcriptome_file:<transcriptome_file_id> --storage-size small --user-reference demo_run

You can get the pipeline id under "ID" column by running the following command:

icav2 projectpipelines list

You can get the file ids under "ID" column by running the following commands:

icav2 projectdata list

Please refer to command help (icav2 [command] --help) to determine available flags to filter output of above commands if necessary. You can also refer to page for available flags for the icav2 commands.

For more help on uploading data to ICA, please refer to the page.

Additional Resources:

Nextflow Pipeline

In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.

This tutorial references the example in the Nextflow documentation.

Create the pipeline

The first step in creating a pipeline is to create a project. For instructions on creating a project, see the page. In this tutorial, we'll use a project called "Getting Started".

In the Nextflow pipeline creation view, the Information tab is used to add information about the pipeline. Add values for the required Code (unique pipeline name) and Description fields.

Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the example from the Nextflow documentation. Modifications to the pipeline definition from the nextflow documentation include:

Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.
Add the publishDir directive with value 'out' to the reverse process.
Modify the reverse process to write the output to a file test.txt instead of stdout.

The description of the pipeline from the linked Nextflow docs:

This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows, receives these files and it simply reverses their content by using the rev command line tool.

Resources: For each process, you can use the and to set the . ICA will then determine the best matching compute type based on those settings. Suppose you set memory '10240 GB' and cpus 6, then ICA will determine you need standard-large ICA Compute Type.

Syntax example:

process iwantstandardsmallresources {
    cpus 2
    memory '8 GB'
    ...

#!/usr/bin/env nextflow

params.in = "$HOME/sample.fa"

sequences = file(params.in)
SPLIT = (System.properties['os.name'] == 'macOS' ? 'gcsplit' : 'csplit')

process splitSequences {

    container 'public.ecr.aws/lts/ubuntu:22.04'

    input:
    file 'input.fa' from sequences

    output:
    file 'seq_*' into records

    """
    $SPLIT input.fa '%^>%' '/^>/' '{*}' -f seq_
    """

}

process reverse {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    publishDir 'out'

    input:
    file x from records
    
    output:
    file 'test.txt'

    """
    cat $x | rev > test.txt
    """
}

Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>fasta file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

With the definition added and the input form defined, the pipeline is complete.

Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.

Launch the pipeline

Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the . Download the file and unzip to decompress the FASTA file.

In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.

Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.

With the required information set, click the button to Start Analysis.

Monitor Analysis

After launching the pipeline, navigate to the Analyses view in the left navigation pane.

Click the analysis record to enter the analysis details view.

From the analysis details view, the logs produced by each process within the nextflow pipeline are accessible via the Logs tab.

View Results

Analysis outputs are written to an output directory in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}. (1)

The "Download" button (4) can be used to download the data to the local machine.

Nextflow: Pipeline Lift

Nextflow Pipeline Liftover

nextflow-to-icav2-config

This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.

Some examples of Nextflow pipelines that have been lifted over with this repo can be found .

Some additional examples of ICA-ported Nextflow pipelines are .

Some additional repos that can help with your ICA experience can be found below:

Relaunch pipeline analysis and
Monitor your analysis run in ICA and troubleshoot
Wrap a WDL-based workflow in a
Wrap a Nextflow-based workflow in a

Local testing your Nextflow pipeline after using these scripts

What do these scripts do?

What these scripts do:

Parse configuration files and the Nextflow scripts (main.nf, workflows, subworkflows, modules) of a pipeline and update the configuration of the pipeline with pod directives to tell ICA what compute instance to run

Strips out parameters that ICA utilizes for workflow orchestration
Migrates manifest closure to conf/base.ica.config file
Ensures that docker is enabled

Adds workflow.onError (main.nf, workflows, subworkflows, modules) to aid troubleshooting
Modifies the processes that reference scripts and tools in the bin/ directory of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Generates parameter XML file based on nextflow_schema.json, nextflow.config, conf/ `- Take a look at to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability
Additional edits to ensure your pipeline runs more smoothly on ICA

ICA concepts to better understand ICA liftover of Nextflow pipelines

Nextflow workflows on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow workflows
Nextflow processes will need to contain a reference to a container --- a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process.
- A table of instance types and the associated CPU + Memory specs can be found under a table named Compute Types

These scripts have been made to be compatible with workflows, so you may find the concepts from the documentation here a better starting point.

Using these scripts

The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3

This has:

nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA

You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:

docker run -itv `pwd`:`pwd` -e HOME=`pwd` -u $(id -u):$(id -g) keng404/nextflow-to-icav2-config:0.0.3 /bin/bash

where pwd is your $HOME directory

Prerequitsites

STEP 0 Github credentials

STEP 1 [OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest

If you have a specific pipeline from Github, you can skip this statement below.

STEP 2: Obtain API key file

Next, you'll need an API key file for ICA that can be generated using the instructions .

STEP 3: Create a project in ICA

Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these to create a project via the ICA GUI.

STEP 4: Download and configure the ICA CLI (see STEP 2):

Install ICA CLI by following these .

A table of all CLI releases for mac, linux, and windows can be found .

Let's do some liftovers

Rscript nf-core.conversion_wrapper.R --input {PIPELINE_JSON_FILE} --staging_directory {DIRECTORY_WHERE_NF_CORE_PIPELINES_ARE_LOCATED} --run-scripts {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}  --intermediate-copy-template {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}/dummy_template.txt --create-pipeline-in-ica --api-key-file {API_KEY_FILE} --ica-project-name {ICA_PROJECT_NAME} --nf-core-mode 

[OPTIONAL PARAMETER]
--git-repos {GIT_HUB_URL}
--pipeline-dirs {LOCAL_DIRECTORY_WITH_NEXTFLOW_PIPELINE}

Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.

In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.

git clone nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}

A detailed step-by-step breakdown of what `nf-core.conversion_wrapper.R` does for each Nextflow pipeline

Step 1: Generate an XML file from nf-core pipeline (your pipeline has a )

Rscript create_xml/nf-core.json_to_params_xml.R --json {PATH_TO_SCHEMA_JSON}

A Nextflow schema JSON is generated by nf-core's python library nf-core
nf-core can be installed via a pip install nf-core command

nf-core schema build -d {PATH_NF-CORE_DIR}

Step 2: Create a `nextflow.config` and a `base config` file so that it is compatible with ICA.

Rscript ica_nextflow_config.test.R --config-file {DEFAULT_NF_CONFIG} [OPTIONAL: --base-config-files  {BASE_CONFIG}] [--is-simple-config]

Step 3: Add helper-debug code and other modifications to your Nextflow pipeline

Rscript develop_mode.downstream.R  --config-file {DEFAULT_NF_CONFIG} --nf-script {MAIN_NF_SCRIPT} --other-workflow-scripts {OTHER_NF_SCRIPT1 } --other-workflow-scripts {OTHER_NF_SCRIPT2} ...  --other-workflow-scripts {OTHER_NF_SCRIPT_N}

Step 4: Update XML to add parameter options --- if your pipeline uses/could use iGenomes

Rscript update_xml_based_on_additional_configs.R --config-file {DEFAULT_NF_CONFIG} --parameters-xml {PARAMETERS_XML}

You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.

Step 5: Sanity check your pipeline code to see if it is valid prior to uploading it into ICA

Rscript testing_pipelines/test_nextflow_script.R --nextflow-script {MAIN_NF_SCRIPT} --docker-image nextflow/nextflow:22.04.3 --nextflow-config {DEFAULT_NF_CONFIG}

[NOTE: 04-10-2023] Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0

Step 6: Create a pipeline in ICA by using the following helper script `nf-core.create_ica_pipeline.R`

Rscript nf-core.create_ica_pipeline.R --nextflow-script {NF_SCRIPT} --workflow-language nextflow --parameters-xml {PARAMETERS_XML} --nf-core-mode --ica-project-name {NAME} --pipeline-name {NAME} --api-key-file {PATH_TO_API_KEY_FILE}

Developer mode --- if you plan to develop or modify a pipeline in ICA

How to run a pipeline in ICA via CLI

As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:

Rscript create_cli_templates_from_xml.R --workflow-language {xml or nextflow} --parameters-xml {PATH_TO_PARAMETERS_XML}

Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.

If you have a previously successful analysis with your pipeline, you may find this approach more useful.

Relaunch pipeline analysis and .

Creating your own tests/pipeline runs via the CLI

When set to true, these test config files are loaded in your main nextflow.config.

Additional todos

Nextflow DRAGEN Pipeline

Prerequisites

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.

nextflow.enable.dsl = 2

process DRAGEN {

    // The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
    container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    // ICA will upload everything in the "out" folder to cloud storage 
    publishDir 'out', mode: 'symlink'

    input:
        tuple path(read1), path(read2)
        val sample_id
        path ref_tar

    output:
        stdout emit: result
        path '*', emit: output

    script:
        """
        set -ex
        mkdir -p /scratch/reference
        tar -C /scratch/reference -xf ${ref_tar}
        
        /opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
        /opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
            --output-directory ./ \\
            -1 ${read1} \\
            -2 ${read2} \\
            --intermediate-results-dir /scratch \\
            --output-file-prefix ${sample_id} \\
            --RGID ${sample_id} \\
            --RGSM ${sample_id} \\
            --ref-dir /scratch/reference \\
            --enable-variant-caller true
        """
}

workflow {
    DRAGEN(
        Channel.of([file(params.read1), file(params.read2)]),
        Channel.of(params.sample_id),
        Channel.fromPath(params.ref_tar)
    )
}

To specify a compute type for a Nextflow process, use the pod directive within each process.

Refer to the ICA help page for details on ICA specific attributes within the Nextflow definition.

Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found in Input Form page.

This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.

Paste the following XML input form into the XML CONFIGURATION text editor.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 1</pd:label>
            <pd:description>FASTQ Read 1</pd:description>
        </pd:dataInput>
        <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 2</pd:label>
            <pd:description>FASTQ Read 2</pd:description>
        </pd:dataInput>
        <pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
            <pd:label>Reference</pd:label>
            <pd:description>Reference TAR</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description></pd:description>
            <pd:tool code="generalparameters">
                <pd:label>General Parameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
                    <pd:label>Sample ID</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

The dataInputs section specifies file inputs, which will be mounted when the workflow executes. Parameters defined under the steps section refer to string and other input types.

Each of the dataInputs and parameters can be accessed in the Nextflow within the workflow's params object named according to the code defined in the XML (e.g. params.sample_id).

Running the pipeline

If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.

Go to the pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by red "*" sign and click on Start Analysis button.

You can monitor the run from the analysis page.

Once the Status changes to Succeeded, you can click on the run to access the results page.