# Nextflow CLI

In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).

## Installation

Refer to [these instructions](https://help.ica.illumina.com/command-line-interface/cli-installation) to install the ICA CLI. To authenticate, follow the steps on the [Authentication](https://help.ica.illumina.com/command-line-interface/cli-authentication) page.

## Tutorial project

In this tutorial, we will create the [Simple RNA-Seq](https://training.nextflow.io/latest/archive/basic_training/rnaseq_pipeline/) pipeline in ICA, which includes four processes:

* index creation
* quantification
* FastQC
* MultiQC

We will also upload a Docker container to the ICA Docker repository for use within the pipeline.

### main.nf

The `main.nf` file defines the pipeline, which orchestrates the RNA-Seq analysis processes.

```nf
nextflow.enable.dsl = 2

process INDEX {
   input:
       path transcriptome_file

   output:
       path 'salmon_index'

   script:
       """
       salmon index -t $transcriptome_file -i salmon_index
       """
}

process QUANTIFICATION {
   publishDir 'out', mode: 'symlink'

   input:
       path salmon_index
       tuple path(read1), path(read2)
       val(quant)

   output:
       path "$quant"

   script:
       """
       salmon quant --libType=U -i $salmon_index -1 $read1 -2 $read2 -o $quant
       """
}

process FASTQC {

   input:
       tuple path(read1), path(read2)

   output:
       path "fastqc_logs"

   script:
       """
       mkdir fastqc_logs
       fastqc -o fastqc_logs -f fastq -q ${read1} ${read2}
       """
}

process MULTIQC {
   publishDir 'out', mode:'symlink'

   input:
       path '*'

   output:
       path 'multiqc_report.html'

   script:
       """
       multiqc .
       """
}

workflow {
   index_ch = INDEX(Channel.fromPath(params.transcriptome_file))
   quant_ch = QUANTIFICATION(index_ch, Channel.of([file(params.read1), file(params.read2)]),Channel.of("quant"))
   fastqc_ch = FASTQC(Channel.of([file(params.read1), file(params.read2)]))
   MULTIQC(quant_ch.mix(fastqc_ch).collect())
}
```

The script uses the following tools:

* **Salmon**: Tool for quantifying transcript abundance from RNA-seq data
* **FastQC**: QC tool for sequencing data
* **MultiQC**: Tool to aggregate and summarize QC reports

We need a Docker container that includes these tools. For this tutorial, we will use the container from the original tutorial. To build your own Docker image with the required tools, refer to the ["Build and push to ICA your own Docker image"](https://help.connected.illumina.com/connected-analytics/cli-cwl/cwl-graphical-pipeline#build-and-push-to-ica-your-own-docker-image) section.

## Docker image upload

With [Docker installed](https://docs.docker.com/desktop/) on your computer, download the image required for this project using the following command.

`docker pull nextflow/rnaseq-nf`

Create a tarball of the image to upload to ICA.

```
docker save nextflow/rnaseq-nf > cont_rnaseq.tar
```
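Before uploading, it can help to confirm that the tarball was written correctly. The following is a minimal sketch, not part of the ICA tooling: the function name `check_image_tarball` is hypothetical, and the check assumes the archive carries a top-level `manifest.json` entry, as archives written by `docker save` do.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an image tarball before uploading it.
# check_image_tarball is a hypothetical helper, not an icav2 or docker command.
check_image_tarball() {
    local tarball="$1"
    # The file must exist and be non-empty.
    [ -s "$tarball" ] || { echo "missing or empty: $tarball" >&2; return 1; }
    # Assumption: archives written by `docker save` list a manifest.json entry.
    tar -tf "$tarball" | grep -qx 'manifest.json' || {
        echo "no manifest.json in $tarball" >&2; return 1; }
    echo "ok: $tarball"
}
```

For example, `check_image_tarball cont_rnaseq.tar` prints `ok: cont_rnaseq.tar` when the archive looks complete and returns a non-zero status otherwise.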

Use the following commands to upload the tarball to your project.

```
# Enter the project context
icav2 enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /
```

**Add the image to the ICA Docker repository**

The uploaded image can be added to the ICA Docker repository from the ICA Graphical User Interface (GUI).

Change the format for the image tarball to DOCKER:

1. Navigate to **Projects > your\_project > Data**.
2. Check the checkbox for the uploaded tarball.
3. Click on **Manage > Change format**.
4. In the new popup window, select "DOCKER" format and save.

To add this image to the ICA Docker repository, first click **Projects** to go back to the home page.

1. From the ICA home page, click **System Settings > Docker Repository > Create > Image**.
2. This opens a new window where you select the region (US, EU, CA) in which your project is located and, from the bottom pane, the Docker image.
3. Edit the Name field to rename the image. For this tutorial, we will change the name to "rnaseq". Select the region, give the image a version number and description, and click "Save".

{% hint style="info" %}
If your images are hosted in other repositories, you can add them as external images by using **System Settings > Docker Repository > Create > External Image**.
{% endhint %}

After creating a new Docker image, you can click on the image to get the container URL (under Regions) for the Nextflow configuration file.

#### Nextflow configuration file

Create a configuration file called `nextflow.config` in the same folder as the main.nf file above. Use the container URL copied above to add the `process.container` line to the config file.

{% code overflow="wrap" %}

```
process.container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
```

{% endcode %}

You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the [Compute Types](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines#compute-resources) page for a list of available compute types.

{% code overflow="wrap" %}

```
process {
    container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value: 'standard-small'
    ]  
}
```

{% endcode %}
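If you create this file for several images or compute types, the configuration above can be generated from a template. The following is a minimal sketch under stated assumptions: `write_nextflow_config` is a hypothetical helper, and its arguments (container URL, compute type, output path) are illustrative names. The generated content mirrors the example configuration above.

```shell
#!/usr/bin/env bash
# Sketch: write nextflow.config from a container URL and a compute type.
# write_nextflow_config is a hypothetical helper, not an icav2 command.
write_nextflow_config() {
    local container_url="$1"
    local preset_size="${2:-standard-small}"   # defaults to the tutorial's compute type
    local out="${3:-nextflow.config}"          # defaults to the current folder
    cat > "$out" <<EOF
process {
    container = '${container_url}'
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value: '${preset_size}'
    ]
}
EOF
}
```

Calling `write_nextflow_config '<container_url>'` writes `nextflow.config` to the current folder with the 'standard-small' compute type; pass a second argument to select another compute type.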

#### Parameters file

The parameters file defines the pipeline input parameters. Refer to the [JSON](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/json-based-input-forms) or [XML](https://help.connected.illumina.com/connected-analytics/project/p-flow/f-pipelines/pi-inputform) input form pages for details on creating correctly formatted parameters files.

An empty form looks as follows:

{% code overflow="wrap" %}

```
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
   <dataInputs>
   </dataInputs>
   <steps>
   </steps>
</pipeline>
```

{% endcode %}

The input files are specified within a single **dataInputs** node, with each input file specified in a separate **dataInput** node. Settings (as opposed to files) are specified within the **steps** node. Settings represent any non-file input to the pipeline, including but not limited to strings, booleans, and integers.

This tutorial does not use any settings parameters, but it requires multiple file inputs. The parameters.xml file looks as follows:

{% code overflow="wrap" %}

```
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
   <pd:dataInputs>
       <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 1</pd:label>
           <pd:description>FASTQ Read 1</pd:description>
       </pd:dataInput>
       <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 2</pd:label>
           <pd:description>FASTQ Read 2</pd:description>
       </pd:dataInput>
       <pd:dataInput code="transcriptome_file" format="FASTA" type="FILE" required="true" multiValue="false">
           <pd:label>Transcript</pd:label>
           <pd:description>Transcript FASTA</pd:description>
       </pd:dataInput>
   </pd:dataInputs>
   <pd:steps/>
</pd:pipeline>
```

{% endcode %}
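The `code` values above (`read1`, `read2`, `transcriptome_file`) match the `params.*` names used in main.nf. The following sketch checks that correspondence before the pipeline is created. It assumes that each dataInput is exposed under the params name with the same spelling, as the matching names in this tutorial suggest. The function name `missing_inputs` is hypothetical, and because the check only looks at dataInput nodes, settings parameters would be reported as missing.

```shell
#!/usr/bin/env bash
# Sketch: list params.* names used in a Nextflow script that have no
# matching dataInput code in the parameters file.
# missing_inputs is a hypothetical helper, not an icav2 command.
missing_inputs() {
    local main_nf="$1" params_xml="$2" name
    # Collect the unique names that follow "params." in the pipeline script.
    grep -oE 'params\.[A-Za-z_][A-Za-z0-9_]*' "$main_nf" |
        sed 's/^params\.//' | sort -u |
        while read -r name; do
            # Report the name when no dataInput declares code="<name>".
            grep -q "dataInput code=\"$name\"" "$params_xml" || echo "$name"
        done
}
```

Running `missing_inputs main.nf parameters.xml` prints one name per line for each parameter without a matching dataInput, and prints nothing when all of them are covered.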

Use the following commands to create the pipeline with the above contents in your project.

If not already in the project context, enter it by using the following command:

`icav2 enter <PROJECT NAME or ID>`

Create the pipeline using `icav2 projectpipelines create nextflow`. Example:

{% code overflow="wrap" %}

```
icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --storage-size small --description 'cli nextflow pipeline'
```

{% endcode %}

If you prefer to organize the processes in different folders or files, you can use the `--other` parameter to upload them as additional files. Example:

{% code overflow="wrap" %}

```
icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf --other fastqc.nf:filename=processes/fastqc.nf --other multiqc.nf:filename=processes/multiqc.nf --storage-size small --description 'cli nextflow pipeline'
```

{% endcode %}
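When there are many process files, the repeated `--other` arguments can be assembled in a loop. The following is a minimal sketch: `build_other_flags` is a hypothetical helper, and it assumes file names without spaces and the `<local file>:filename=<path in pipeline>` form used in the example above.

```shell
#!/usr/bin/env bash
# Sketch: print one --other argument per process file.
# build_other_flags is a hypothetical helper, not an icav2 command.
build_other_flags() {
    local target_dir="$1" f flags=""
    shift
    for f in "$@"; do
        # Same form as the example above: <local file>:filename=<path in pipeline>
        flags="$flags --other $f:filename=$target_dir/$f"
    done
    # Drop the leading space before printing.
    echo "${flags# }"
}
```

For example, `build_other_flags processes index.nf quantification.nf` prints `--other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf`, which can be placed in the create command through an unquoted `$(...)` substitution.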

Refer to the [Nextflow: Pipeline Lift](https://help.connected.illumina.com/connected-analytics/tutorials/nextflow/nextflow_pipeline_liftovers) page to explore options for automating this process.

Refer to [Launch Pipelines on CLI](https://help.connected.illumina.com/connected-analytics/tutorials/launchpipecli) for details on running the pipeline from the CLI.

Example command to run the pipeline from the CLI:

{% code overflow="wrap" %}

```
icav2 projectpipelines start nextflow <pipeline_id> --input read1:<read1_file_id> --input read2:<read2_file_id> --input transcriptome_file:<transcriptome_file_id> --storage-size small --user-reference demo_run
```

{% endcode %}

You can find the pipeline ID in the "ID" column by running the following command:

```
icav2 projectpipelines list
```

You can find the file IDs in the "ID" column by running the following command:

```
icav2 projectdata list
```
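Once you have the IDs, a small wrapper can guard against launching with a placeholder left in place. The following is a minimal sketch: `build_start_cmd` is a hypothetical helper that only prints the start command shown above after checking that each ID is filled in. It does not call icav2 itself.

```shell
#!/usr/bin/env bash
# Sketch: print the start command after checking that every ID is filled in.
# build_start_cmd is a hypothetical helper, not an icav2 command.
build_start_cmd() {
    local pipeline_id="$1" read1_id="$2" read2_id="$3" transcriptome_id="$4" v
    for v in "$pipeline_id" "$read1_id" "$read2_id" "$transcriptome_id"; do
        # Reject empty values and leftover <placeholder> text.
        case "$v" in
            ""|"<"*">") echo "unset ID: '$v'" >&2; return 1 ;;
        esac
    done
    # Flags are copied from the example start command in this tutorial.
    echo "icav2 projectpipelines start nextflow $pipeline_id" \
         "--input read1:$read1_id --input read2:$read2_id" \
         "--input transcriptome_file:$transcriptome_id" \
         "--storage-size small --user-reference demo_run"
}
```

Review the printed command before running it; the function returns a non-zero status when any argument is empty or still looks like `<pipeline_id>`.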

Refer to the command help (`icav2 [command] --help`) for flags that filter the output of the commands above. The [Command Index](https://help.connected.illumina.com/connected-analytics/command-line-interface/cli-indexcommands) page also lists the available flags for the icav2 commands.

For more help on uploading data to ICA, please refer to the [Data Transfer options](https://help.connected.illumina.com/connected-analytics/command-line-interface/cli-datatransfer) page.
