# Nextflow DRAGEN Pipeline

In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in the ICA GUI. More information about Nextflow on ICA can be found on the [Nextflow pipelines help page](https://help.ica.illumina.com/project/p-flow/f-pipelines/pi-nextflow). For this example, we will implement the Paired-End FASTQ Inputs alignment and variant calling example from this [DRAGEN support page](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/AligningVariantCallingExamples_fDG_dtREF.htm).

## Prerequisites

The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the [Projects](https://help.ica.illumina.com/home/h-projects) page. In this tutorial, we'll use a project called *Getting Started*.

After a project has been created, a DRAGEN bundle must be linked to the project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click `Edit` in the Project Details page. From here, you can link a *DRAGEN Demo Tool* bundle into the project. The bundle selected here determines the DRAGEN version that you have access to. For this tutorial, you can link *DRAGEN Demo Bundle 4.0.3*. Once the bundle has been linked to your project, you can access the docker image and version by navigating back to the All **Projects** overview page, clicking **System Settings > Docker Repository**, and double-clicking the docker image *dragen-ica-4.0.3*. The URL of this docker image will be used later in the `container` directive for your DRAGEN process defined in Nextflow.

## Creating the pipeline

Select **Projects > your\_project > Flow > Pipelines**. From the **Pipelines** view, click **+Create Pipeline > Nextflow > XML based** to start creating a Nextflow pipeline.

![](https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-3c7d2d02cdb1baa000cd411db2c4de0939e75a5c%2Ftutorial-nextflowpipeline-1purple%20\(1\).png?alt=media)

In the Nextflow pipeline creation view, the **Details** tab is used to add information about the pipeline. Add values for the required *Code* (pipeline name) and *Description* fields. *Nextflow Version* and *Storage size* default to preassigned values.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-ba9345a607060018136f4ed0c81e9394da930db8%2Fimage%20(97).png?alt=media" alt=""><figcaption></figcaption></figure>

Next, add the Nextflow pipeline definition by navigating to **Nextflow files > main files > main.nf**. You will see a text editor. Copy and paste the following definition into the text editor. Modify the `container` directive by replacing the current URL with the URL of the docker image *dragen-ica-4.0.3* that you looked up in the Docker Repository.

```groovy
nextflow.enable.dsl = 2

process DRAGEN {

    // The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
    container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    // ICA will upload everything in the "out" folder to cloud storage 
    publishDir 'out', mode: 'symlink'

    input:
        tuple path(read1), path(read2)
        val sample_id
        path ref_tar

    output:
        stdout emit: result
        path '*', emit: output

    script:
        """
        set -ex
        mkdir -p /scratch/reference
        tar -C /scratch/reference -xf ${ref_tar}
        
        /opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
        /opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
            --output-directory ./ \\
            -1 ${read1} \\
            -2 ${read2} \\
            --intermediate-results-dir /scratch \\
            --output-file-prefix ${sample_id} \\
            --RGID ${sample_id} \\
            --RGSM ${sample_id} \\
            --ref-dir /scratch/reference \\
            --enable-variant-caller true
        """
}

workflow {
    DRAGEN(
        Channel.of([file(params.read1), file(params.read2)]),
        Channel.of(params.sample_id),
        Channel.fromPath(params.ref_tar)
    )
}
```

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-b7827d60f251095d2c1c9e176aa532faddb4d107%2Fimage%20(60).png?alt=media" alt=""><figcaption></figcaption></figure>

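To specify a compute type for a Nextflow process, use the [pod](https://www.nextflow.io/docs/latest/process.html#process-pod) directive within each process. In the definition above, the `scheduler.illumina.com/presetSize` annotation requests the `fpga2-medium` compute type, and the `volumes.illumina.com/scratchSize` annotation requests `1TiB` of scratch space.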
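Because the `pod` directive is set per process, different processes in the same pipeline can request different compute types. The following is a minimal sketch of that idea, not part of the DRAGEN pipeline: the process name `PRINT_MESSAGE` is hypothetical, and the preset size `standard-small` is an assumption that should be checked against the compute types listed on the ICA help page.

```groovy
nextflow.enable.dsl = 2

// Hypothetical helper process, not part of the DRAGEN pipeline above.
process PRINT_MESSAGE {

    // Assumption: 'standard-small' is a valid preset size in your ICA environment.
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    output:
        stdout

    script:
        """
        echo "This step does not need an FPGA node"
        """
}

workflow {
    PRINT_MESSAGE()
    // Print the captured standard output of the process.
    PRINT_MESSAGE.out.view()
}
```

Keeping lightweight steps on a smaller compute type reserves the FPGA node for the DRAGEN process itself.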

Outputs for Nextflow pipelines are uploaded from the `out` folder in the attached shared filesystem. The [publishDir](https://www.nextflow.io/docs/latest/process.html#publishdir) directive specifies the output folder for a given process. Only data placed in the `out` folder through the `publishDir` directive is uploaded to the ICA project after the pipeline finishes executing.
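As an illustration of which files reach the `out` folder, the sketch below uses a hypothetical process `COUNT_LINES` with a hypothetical file input `params.input_file` (neither is part of this tutorial's pipeline). In Nextflow, `publishDir` only publishes files matched by the process `output` declarations, so the intermediate file here stays in the work directory.

```groovy
nextflow.enable.dsl = 2

// Hypothetical process illustrating which files end up in the 'out' folder.
process COUNT_LINES {

    // Same publishDir settings as the DRAGEN process in this tutorial.
    publishDir 'out', mode: 'symlink'

    input:
        path input_file

    output:
        // Only files matched by an output declaration are published.
        path 'line_count.txt'

    script:
        """
        # Intermediate file: not declared as output, so it is not published.
        sort ${input_file} > sorted.tmp
        wc -l < sorted.tmp > line_count.txt
        """
}

workflow {
    // Assumption: params.input_file is a hypothetical file input.
    COUNT_LINES(Channel.fromPath(params.input_file))
}
```

The DRAGEN process above declares `path '*'` as output instead, which publishes every file written to its working directory.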

Refer to the [ICA help page](https://help.ica.illumina.com/project/p-flow/f-pipelines/pi-nextflow) for details on ICA specific attributes within the Nextflow definition.

Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found on the [Input Form](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/pi-inputform) page.

This pipeline takes two FASTQ files, one *reference file* and one *sample\_id* parameter as input.

Paste the following XML input form into the XML CONFIGURATION text editor.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 1</pd:label>
            <pd:description>FASTQ Read 1</pd:description>
        </pd:dataInput>
        <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 2</pd:label>
            <pd:description>FASTQ Read 2</pd:description>
        </pd:dataInput>
        <pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
            <pd:label>Reference</pd:label>
            <pd:description>Reference TAR</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description></pd:description>
            <pd:tool code="generalparameters">
                <pd:label>General Parameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
                    <pd:label>Sample ID</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>
```

Click the **Simulate** button (at the bottom of the text editor) to preview the launch form fields.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-d200c1fdbd7288514d39e7dc705d8601024fca94%2Fimage%20(61).png?alt=media" alt=""><figcaption></figcaption></figure>

Click the `Save` button to save the changes.

The `dataInputs` section specifies file inputs, which will be mounted when the pipeline executes. Parameters defined under the `steps` section refer to string and other input types.

Each of the `dataInputs` and `parameters` can be accessed in the Nextflow definition through the `params` object, under the name given by the `code` attribute in the XML (e.g. `params.sample_id`).
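To make the mapping concrete, here is a small sketch that reads the four codes from the XML input form above (`read1`, `read2`, `ref_tar`, `sample_id`) and logs them. The emptiness check and the log messages are illustrative additions, not something ICA requires.

```groovy
nextflow.enable.dsl = 2

workflow {
    // 'sample_id' is the code of the parameter defined under <pd:steps>.
    if (!params.sample_id) {
        error "The sample_id parameter must not be empty"
    }

    // 'read1', 'read2' and 'ref_tar' are the codes of the <pd:dataInput> entries.
    log.info "Sample ID : ${params.sample_id}"
    log.info "Read 1    : ${params.read1}"
    log.info "Read 2    : ${params.read2}"
    log.info "Reference : ${params.ref_tar}"
}
```

If a `code` in the XML is renamed, the matching `params` reference in `main.nf` has to be renamed as well.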

## Running the pipeline

> If you have no test data available, you need to link the *DRAGEN Demo Bundle* to your project at **Projects > your\_project > Project Settings > Details > Linked Bundles**.

Go to the **Projects > your\_project > Flow > Pipelines** page from the left navigation pane. Select the pipeline you just created and click **Start Analysis**.

Fill in the required fields, which are marked with an asterisk (\*), and click the **Start Analysis** button.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-c81307ce4a0e19de8fe76ce7cff228aa92662630%2Fimage%20(117).png?alt=media" alt=""><figcaption></figcaption></figure>

### Results

You can monitor the run from the **Projects > your\_project > Flow > Analyses** page. Once the status changes to *Succeeded*, you can click on the run to access the results.

## Useful Links

* [Illumina DRAGEN Documentation](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/GettingStarted_fDG.htm)
* [Nextflow's official documentation](https://www.nextflow.io/)
