# Nextflow Pipeline

In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.

This tutorial references the [Basic pipeline](https://www.nextflow.io/example1.html) example in the Nextflow documentation.

## Create the pipeline

The first step in creating a pipeline is to create a [Project](https://help.ica.illumina.com/home/h-projects). In the example below, the project is named *Getting Started*.

### Pipeline

After creating your project,

1. **Open the project** at **Projects > your\_project**.
2. Navigate to the **Flow > Pipelines** view in the left navigation pane.
3. From the Pipelines view, click **+Create > Nextflow > XML based** to start creating the Nextflow pipeline.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-2d71ada2f5c782c7c6aa73c43cb0ea17dc27ff4b%2Fimage%20(112).png?alt=media" alt=""><figcaption></figcaption></figure>

In the Nextflow pipeline creation view, the Description field is used to add information about the pipeline. Fill in the required fields: Code (a unique pipeline name), Description, and Size.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-606d38bb1af4aaa5bb64ec3392f1a2c0bc4faba1%2Fimage%20(77).png?alt=media" alt=""><figcaption></figcaption></figure>

### Nextflow Files

Next, a Nextflow pipeline definition must be created. The pipeline in this example is a modified version of the Basic pipeline example from the Nextflow documentation.

The description of the pipeline from the linked Nextflow docs:

> This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq\_.
>
> The process that follows, receives these files and it simply reverses their content by using the rev command line tool.

Some modifications are made to the Nextflow pipeline; you do not need to apply these modifications by hand, as copyable code is provided below.

* Adding the `container` directive to each process with the desired Ubuntu image. If no Docker image is specified, `public.ecr.aws/lts/ubuntu:22.04_stable` is used as the default. If you want to use the latest image, use `container 'public.ecr.aws/lts/ubuntu:latest'`.
* Adding the `publishDir` directive with value `'out'` to the `reverse` process.
* Modifying the `reverse` process to write the output to a file `test.txt` instead of stdout.
* Creating a channel with the input file.

**Setting Process Resources**: For each process, you can use the [memory directive](https://www.nextflow.io/docs/latest/process.html#memory) and [cpus directive](https://www.nextflow.io/docs/latest/process.html#cpus) to set the [Compute Types](https://github.com/illumina-swi/ica-docs/blob/stage/docs/tutorials/f-pipelines.md#compute-types). ICA will then determine the best matching compute type based on those settings. For example, if you set `memory '10240 MB'` and `cpus 6`, ICA will select the `standard-large` compute type.

Syntax example:

```nf
process iwantstandardsmallresources {
    cpus 2
    memory '8 GB'
    ...
}
```

Navigate to the **Nextflow files > main.nf** tab to add the definition to the pipeline. Since this is a single file pipeline, we don't need to add any additional definition files. Paste the following definition into the text editor:

```nf
#!/usr/bin/env nextflow
params.in = "$HOME/sample.fa"

// -----------------------------
// Processes
// -----------------------------

// Split the file
process splitSequences {
    container 'public.ecr.aws/lts/ubuntu:latest'

    input:
    path input_fa

    output:
    path "seq_*"

    script:
    """
    awk '/^>/{f="seq_"++d} {print > f}' < ${input_fa}
    """
}

// Reverse the Sequence
process reverse {
    container 'public.ecr.aws/lts/ubuntu:latest'
    publishDir 'out'

    input:
    path x

    output:
    path "test.txt"

    script:
    """
    cat ${x} | rev > test.txt
    """
}

// -----------------------------
// Workflow block
// -----------------------------

workflow {
    // Create a channel with the input file
    sequences = Channel.fromPath(params.in)
    splitSequences(sequences) | reverse | view
}
```
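As a sanity check, the logic of the two processes can be reproduced outside Nextflow. The plain-Python sketch below (not part of the pipeline, purely illustrative) mimics the `awk` one-liner from `splitSequences` and the `rev` command from `reverse`, so you can predict the contents of `test.txt` for a small input. Note that `rev` reverses every line, including the `>` header lines.

```python
def split_sequences(fasta_text):
    """Mimic the awk one-liner: start a new seq_N chunk at each '>' header line."""
    chunks = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            chunks.append([])  # new chunk, like awk opening a new seq_N file
        chunks[-1].append(line)
    return ["\n".join(c) + "\n" for c in chunks]

def reverse(chunk):
    """Mimic `rev`: reverse the characters of every line, keeping line order."""
    return "\n".join(line[::-1] for line in chunk.splitlines()) + "\n"

sample = ">seq1\nACGT\n>seq2\nTTGA\n"
for chunk in split_sequences(sample):
    print(reverse(chunk), end="")
# Header lines are reversed too: '>seq1' becomes '1qes>'
```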

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-8a0aa127eb9b3d984bca5f7934a7c46cb9931982%2Fimage%20(1)%20(2).png?alt=media" alt=""><figcaption></figcaption></figure>

### Input Form

Next create the input form used when launching the pipeline. This is done in the **XML Configuration** tab. Since the pipeline takes in a single FASTA file as input, the input form includes a single file input.

Paste the following XML input form into the XML Configuration text editor:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>fasta file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>
```
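The `code` attribute of each `dataInput` is what ties the form to the pipeline: the selected file is exposed to the workflow under that name (here matching `params.in` in `main.nf`). As an illustrative check (not an ICA tool), the snippet below parses the form with Python's standard library and lists the declared input codes; note that the elements live under the `pd` namespace, so queries must supply it.

```python
import xml.etree.ElementTree as ET

# The input form XML from above, inlined for brevity.
FORM = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>fasta file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>"""

# Map the 'pd' prefix to its namespace URI so findall() can locate the elements.
NS = {"pd": "xsd://www.illumina.com/ica/cp/pipelinedefinition"}
root = ET.fromstring(FORM)
codes = [d.get("code") for d in root.findall(".//pd:dataInput", NS)]
print(codes)  # each code should match a params.<code> used in main.nf
```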

On the left you see the XML code; on the right, the input form simulation, which appears when you click the **Simulate** button at the bottom.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-7c25aa53791a998ad0d0ec4abb01e5263029fc42%2Fimage%20(79).png?alt=media" alt=""><figcaption></figcaption></figure>

Once the definition has been added and the input form has been defined, the pipeline is complete.

{% hint style="info" %}
On the **Documentation tab**, you can add additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.
{% endhint %}

Click the **Save** button at the top right. The pipeline will now be visible from the **Projects > your\_project > Pipelines** view within the project.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-ef84ce866a789fa5562ea1e480739ee285ee9a11%2Fimage%20(80).png?alt=media" alt=""><figcaption></figcaption></figure>

## Launch the pipeline

Before launching the pipeline, upload a FASTA file to use as input. For this tutorial, use a public FASTA file from the [UCSC Genome Browser](https://genome.ucsc.edu/). Download [chr1\_GL383518v1\_alt.fa.gz](https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr1_GL383518v1_alt.fa.gz) and unzip it to obtain the FASTA file.

To upload the FASTA file to the project, navigate to **Projects > your\_project > Data**. In the Data view, drag and drop the FASTA file from your local machine into the input section (2) in the browser. Once the upload completes, the file record appears in the Data explorer. The file format should be auto-detected as FASTA; if it is not, select the file and set the format manually via the **Manage** menu.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-badde74ad991b9ce508bf0d2271f502547a28bac%2Fimage%20(82).png?alt=media" alt=""><figcaption></figcaption></figure>

Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to **Projects > your\_project > Flow > Analyses** and click **Start**. Next, select your pipeline from the list.

{% hint style="info" %}
Alternatively you can start your pipeline from **Projects > your\_project > Flow > Pipelines > your\_pipeline > Start analysis**.
{% endhint %}

In the Launch Pipeline view, the input form fields are shown along with some required information to create the analysis.

With the required information set, click **Start Analysis**.

## Monitoring Analysis

After launching the pipeline, navigate to **Projects > your\_project > Flow > Analyses**.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-c7d08a7598db76a9f3fae71faf25a5576989af91%2Fimage%20(89).png?alt=media" alt=""><figcaption></figcaption></figure>

The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the *In Progress* status. Once the pipeline succeeds, the analysis record will show *Succeeded* as status.

{% hint style="info" %}
This may take considerable time for your first analysis because compute resources must be provisioned.
{% endhint %}

Once the analysis has succeeded, click the analysis details tab for more information.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-f8267b317ddd3c2ca7ab6664d54def5011b64ad2%2Fimage%20(91).png?alt=media" alt=""><figcaption></figcaption></figure>

From the analysis details view, the logs produced by each process within the pipeline are accessible via the **Steps** tab.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-54f6bbd7ea4f891e5e0a67e79f2edfb608b1ebd1%2Fimage%20(92).png?alt=media" alt=""><figcaption></figcaption></figure>

## View Results

Analysis outputs are written to an output folder in the project with the naming convention `{Analysis User Reference}-{Pipeline Code}-{GUID}`. (1)

Inside the analysis output folder are the files generated by the analysis processes, written to the `out` folder. In this tutorial, the file `test.txt` (2) is written by the `reverse` process. Navigating to the analysis output folder, opening the `test.txt` file details, and selecting the VIEW tab (3) shows the output file contents.

Use the **Download** button (4) to download the data to your local machine.

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-bbaedf4795c121a272ffdafd545f503791ddf821%2Fimage%20(93).png?alt=media" alt=""><figcaption></figcaption></figure>
