# Pipeline Development in Bench (Experimental)

## Introduction

The **Pipeline Development Kit** in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in `/data/.software` (regardless of which Bench image is selected) and provides the following features:

* Import to Bench
  * From public nf-core pipelines
  * From existing ICA Flow Nextflow pipelines
* Run in Bench
* Modify and re-run in Bench, providing fast development iterations
* Deploy to Flow
* Launch validation in Flow

## Prerequisites

* Recommended **workspace size**: nf-core Nextflow pipelines typically require **4** or more **cores** to run.
* The pipeline development tools require
  * **Conda**, which is automatically installed by `pipeline-dev` if `conda-miniconda.installer.ica-userspace.sh` is present in the image.
  * **Nextflow** (version 24.10.2 is automatically installed using Conda, or you can use other versions).
  * **git** (automatically installed using Conda).
  * **jq** and **curl**, which are not installed automatically and should be made available in the image.
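If you want to check these prerequisites yourself before running `pipeline-dev`, the POSIX shell sketch below can help. It is not part of the pipeline-dev tools: the helper names `check_tools` and `check_cores` are hypothetical, and the threshold of 4 cores simply mirrors the recommendation above.

```shell
# Hypothetical helper: report which tools are missing from $PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: succeed if the workspace has at least $1 CPU cores.
check_cores() {
  wanted=$1
  cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
  echo "cores available: $cores (recommended: $wanted)"
  [ "$cores" -ge "$wanted" ]
}

# Check the tools listed above and the recommended core count.
if check_tools conda nextflow git jq curl && check_cores 4; then
  echo "prerequisites look OK"
fi
```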

## Nextflow Requirements / Best Practices

Pipeline development tools work best when the following items are defined:

* Nextflow profiles:
  * ***test*** profile, specifying inputs appropriate for a validation run
  * ***docker*** profile, instructing Nextflow to use Docker
* **nextflow\_schema.json**, as described [here](https://nf-co.re/docs/nf-core-tools/pipelines/schema). This schema is used to generate the launch UI in Flow. The nf-core CLI tool (installable via `pip install nf-core`) offers extensive help to create and maintain it.

ICA Flow adds one additional constraint: only the **output directory** `out` is automatically copied to the Project data when an ICA Flow analysis completes. The nf-core **`--outdir`** parameter should therefore be set to `--outdir=out` when running as a Flow pipeline.
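As an illustration of these conventions, the sketch below writes a minimal `nextflow.config` with a `test` profile, a `docker` profile and a default `outdir` of `out`. It is an assumption-laden example rather than an nf-core template: the `input` parameter name and the `assets/samplesheet_test.csv` path are hypothetical placeholders to adapt to your pipeline.

```shell
# Write a minimal, illustrative Nextflow config into ./nextflow-src.
# params.input and the sample sheet path are hypothetical placeholders.
mkdir -p nextflow-src
cat > nextflow-src/nextflow.config <<'EOF'
params {
    input  = null
    outdir = 'out'   // ICA Flow only copies the 'out' directory to the Project
}

profiles {
    docker {
        docker.enabled = true
    }
    test {
        params.input = "${projectDir}/assets/samplesheet_test.csv"
    }
}
EOF

# Typical invocation combining both profiles (not run here):
#   nextflow run nextflow-src/main.nf -profile test,docker --outdir out
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell does not expand `${projectDir}`; Nextflow resolves it when reading the config.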

## Pipeline Development Tools

{% hint style="info" %}
New Bench pipeline development tools only become active after a workspace reboot.
{% endhint %}

These tools are installed in `/data/.software` (which should be in your `$PATH`). The `pipeline-dev` script is the front-end to the other `pipeline-dev-*` tools.

`pipeline-dev` fulfils a number of roles:

* Checks that the environment contains the **required tools** (conda, nextflow, etc.) and offers to install them if needed.
* Checks that the **fast data mounts are present** (`/data/mounts/project`, etc.). It is useful to check this regularly, as the mounts are unmounted when a workspace is stopped and restarted.
* Redirects **stdout and stderr** to `.pipeline-dev.log`, with the history of log files kept as `.pipeline-dev.log.<log date>`.
* Launches the appropriate **sub-tool**.
* **Prints out errors** with backtrace, to help report issues.
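Because the fast data mounts disappear when a workspace is stopped and restarted, a quick manual check before a long run can be useful. This POSIX shell sketch is a heuristic of our own, not what `pipeline-dev` does internally: it only tests that each path exists and is non-empty, which assumes an unmounted mount point shows up as missing or empty. The helper name `check_mounts` is hypothetical.

```shell
# Heuristic check (hypothetical helper, not part of pipeline-dev):
# a mount point is considered present if the directory exists and is non-empty.
check_mounts() {
  status=0
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "ok:               $dir"
    else
      echo "missing or empty: $dir" >&2
      status=1
    fi
  done
  return "$status"
}

check_mounts /data/mounts/project || echo "fast data mounts not available"
```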

***

## Usage

### 1) Starting a new Project

A pipeline-dev project relies on the following **folder structure**, which is auto-generated when using the `pipeline-dev import*` tools.

{% hint style="warning" %}
If you start a project manually, you must follow the same folder structure.
{% endhint %}

* **Project base folder**
  * **nextflow-src**: Platform-agnostic Nextflow code, for example the GitHub contents of an nf-core pipeline, or your usual Nextflow source code.
    * **main.nf**
    * **nextflow\.config**
    * **nextflow\_schema.json**
  * **pipeline-dev.project-info**: contains project name, description, etc.
  * **nextflow-bench.config** (automatically generated when needed): contains definitions for Bench.
  * **ica-flow-config**: Directory of files used when deploying the pipeline to Flow.
    * **inputForm.json** (if not present, gets generated from nextflow-src/nextflow\_schema.json): input form as defined in ICA Flow.
    * **onSubmit.js**, **onRender.js** (optional, generated at the same time as inputForm.json): JavaScript code to go with the input form.
    * **launchPayload\_inputFormValues.json** (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
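If you start a project manually, a sketch like the following creates the skeleton. It is an assumption-based helper, not a pipeline-dev command: `scaffold_project` and the project name `my-pipeline` are placeholders. It only creates empty `main.nf` and `nextflow.config` files; `nextflow_schema.json` and `pipeline-dev.project-info` are left for you to add, since the format of the latter is not described on this page, and the generated files (`nextflow-bench.config`, `inputForm.json`, etc.) are left for the tools to produce.

```shell
# Hypothetical helper (not a pipeline-dev command): create the folder skeleton
# for a manually started project. Existing files are never overwritten.
scaffold_project() {
  base=$1
  mkdir -p "$base/nextflow-src" "$base/ica-flow-config"
  for f in main.nf nextflow.config; do
    # Create an empty placeholder only if the file does not exist yet.
    [ -e "$base/nextflow-src/$f" ] || : > "$base/nextflow-src/$f"
  done
  # Still to add yourself:
  #  - nextflow-src/nextflow_schema.json (e.g. with `nf-core pipelines schema build`
  #    in recent nf-core/tools versions)
  #  - pipeline-dev.project-info (format not described on this page)
  echo "skeleton created in $base"
}

scaffold_project my-pipeline
```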

#### Pipeline Sources

{% tabs %}
{% tab title="Starting from Scratch" %}
The above-mentioned project structure must be created manually. The nf-core CLI tools can help generate the `nextflow_schema.json`. The [Pipeline from Scratch](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/creating-a-pipeline-from-scratch) tutorial goes into more detail about this use case.
{% endtab %}

{% tab title="Importing nf-core Pipeline" %}

```
$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
```

A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the `nextflow-src` subdirectory.

The [nf-core Pipelines](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/nf-core-pipelines) tutorial goes into more detail about this use case.
{% endtab %}

{% tab title="Importing an Existing ICA Pipeline" %}

```
$ pipeline-dev import-from-flow [--analysis-id=…] 
```

A directory called `imported-flow-analysis` is created and the analysis and pipeline assets are downloaded into it.

The [Updating an Existing Flow Pipeline](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/updating-an-existing-flow-pipeline) tutorial goes into more detail about this use case.

{% hint style="info" %}
Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.
{% endhint %}
{% endtab %}
{% endtabs %}
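Whichever way the project was created, you can confirm it has the expected layout before running it. The sketch below is a hypothetical helper (`check_project_layout` is our name) that checks only for the presence of files from the folder structure above, not their contents. It accepts either `ica-flow-config/inputForm.json` or `nextflow-src/nextflow_schema.json`, since the input form can be generated from the schema; this is our reading of the structure, not a documented rule.

```shell
# Hypothetical helper: check that a pipeline-dev project directory contains
# the files described in the folder structure. Contents are not validated.
check_project_layout() {
  base=$1
  ok=0
  for f in nextflow-src/main.nf nextflow-src/nextflow.config pipeline-dev.project-info; do
    if [ ! -f "$base/$f" ]; then
      echo "missing: $base/$f" >&2
      ok=1
    fi
  done
  # The Flow input form comes from inputForm.json or from nextflow_schema.json.
  if [ ! -f "$base/ica-flow-config/inputForm.json" ] &&
     [ ! -f "$base/nextflow-src/nextflow_schema.json" ]; then
    echo "missing: inputForm.json or nextflow_schema.json in $base" >&2
    ok=1
  fi
  return "$ok"
}

if check_project_layout imported-flow-analysis; then
  echo "layout OK"
fi
```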

***

### 2) Running in Bench

```
$ pipeline-dev run-in-bench [--local|--sge] 
```

The optional parameters `--local` and `--sge` force execution on the local workspace node or on the workspace cluster (when available), respectively. Without them, the tool detects whether a cluster is present and uses it if so.

The script then **prints the full Nextflow command line and launches it**.

In case of errors, the full logs are available in `.pipeline-dev.log`.
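To inspect those logs, a sketch like the following lists the current log and its dated predecessors (the `.pipeline-dev.log.<log date>` files mentioned earlier) and shows the end of the current one. The helper name, the default of 50 lines and the assumption that the logs sit in the directory you pass (default: the current one) are ours; no particular date format is assumed.

```shell
# Hypothetical helper: list pipeline-dev logs (newest first) and show the end
# of the current log. Assumes the logs live in the given directory (default: .).
show_pipeline_dev_logs() {
  dir=${1:-.}
  lines=${2:-50}
  ls -1t "$dir"/.pipeline-dev.log* 2>/dev/null || {
    echo "no .pipeline-dev.log found in $dir" >&2
    return 1
  }
  if [ -f "$dir/.pipeline-dev.log" ]; then
    echo "--- last $lines lines of $dir/.pipeline-dev.log ---"
    tail -n "$lines" "$dir/.pipeline-dev.log"
  fi
}
```

For example, `show_pipeline_dev_logs . 100` lists the logs in the current directory and prints the last 100 lines of the current one.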

{% hint style="info" %}
Currently, not all corner cases are covered by command line options. Please start from the Nextflow command printed by the tool and extend it based on your specific needs.
{% endhint %}
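One way to follow this advice is to wrap the printed command so that extra options can be appended. The sketch below is hypothetical: `run_extended` and the `DRY_RUN` switch are our names, and the base command is only illustrative, so replace it with the exact command line printed by `pipeline-dev run-in-bench`. The appended `-resume` and `-with-report` are standard `nextflow run` options.

```shell
# Hypothetical wrapper: re-run a Nextflow command with extra options appended.
# The base command below is illustrative; replace it with the exact command
# line printed by `pipeline-dev run-in-bench`.
run_extended() {
  set -- nextflow run nextflow-src/main.nf -profile test,docker --outdir out "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"      # only print the command
  else
    "$@"           # actually run Nextflow
  fi
}

# Example: resume from cached results and write an HTML execution report.
DRY_RUN=1 run_extended -resume -with-report report.html
```

Drop `DRY_RUN=1` to actually launch the extended command.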

#### Output Example

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-ac7d6a666597b97d0270317299f53065f9d66dbd%2Fimage%20(14).png?alt=media" alt=""><figcaption><p>Nextflow output</p></figcaption></figure>

#### Container (Docker) images

Nextflow can run processes with or without Docker images. In the context of pipeline development, the pipeline-dev tools assume that Docker images are used, in particular when running with `nextflow -profile docker`.

In Nextflow, Docker images can be **specified at the *process* level**:

* This is done with the `container "<image_name:version>"` directive, which can be specified
  * in **nextflow config** files (preferred method when following the nf-core best practices)
  * or at the start of each process definition.
* Each process can use a different Docker image.
* **It is highly recommended to always specify an image.** If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
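To review which images a pipeline declares, a rough text search over the Nextflow sources can help. This is a plain-text heuristic of our own (the helper name `list_container_refs` is hypothetical): it reports lines that start with a `container` directive or set `process.container`, but it does not evaluate Nextflow expressions, so dynamic container strings such as those in nf-core modules are shown as written.

```shell
# Hypothetical helper: list lines in .nf and .config files that look like
# container declarations, so you can review which images a pipeline uses.
# Plain-text heuristic: Nextflow expressions are not evaluated.
list_container_refs() {
  src_dir=${1:-nextflow-src}
  find "$src_dir" -type f \( -name '*.nf' -o -name '*.config' \) \
    -exec grep -HnE -e '^[[:space:]]*container[[:space:]]' \
                    -e 'process\.container[[:space:]]*=' {} +
}
```

Run `list_container_refs nextflow-src` from the project base folder; each match is printed as `file:line:text`.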

Resources such as CPUs and memory can be specified as described [here](https://nf-co.re/docs/usage/getting_started/configuration#max-resources). See [containers](https://www.nextflow.io/docs/latest/container.html) or our [tutorials](#tutorials) for details about the Nextflow Docker syntax.

Bench can push/pull/create/modify Docker images, as described in [Containers](https://help.connected.illumina.com/connected-analytics/project/p-bench/containers-in-bench).

***

### 3) Deploying to ICA Flow

```
$ pipeline-dev deploy-as-flow-pipeline [--create|--update] 
```

This command does the following:

1. Generate the JSON file describing the **ICA Flow user interface**.
   * If *`ica-flow-config/inputForm.json`* doesn’t exist: generate it from *`nextflow-src/nextflow_schema.json`*.
2. Generate the JSON file containing the **validation launch inputs**.
   * If *`ica-flow-config/launchPayload_inputFormValues.json`* doesn’t exist: generate it from the `nextflow -profile test` inputs.
   * If **local files** are used as validation inputs or as default input values:
     * copy them to `/data/project/pipeline-dev-files/temp`.
     * get their ICA file ids.
     * use these file ids in the launch specifications.
   * If **remote files** are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
3. **Identify the pipeline name** to use for this new pipeline deployment:
   * If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.
   * Identify which already-deployed pipelines have the same base name, with or without suffixes that could indicate versioning (\_v\<number>, \_\<number>, \_\<date>).
   * Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the `--create/--update` parameters when specified, for scripting without user interactions.
4. A new **ICA Flow pipeline is created** (except when updating an existing pipeline).
   * The current Nextflow version in Bench is used to select the best Nextflow version to use in Flow.
5. `nextflow-src` **folder is uploaded** file by file as pipeline assets.
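The base-name matching in step 3 can be pictured with the sketch below. It is our approximation, not the tool's actual code: `base_pipeline_name` and `matching_pipelines` are hypothetical names, only one trailing suffix is stripped, and dates are assumed to be written either as plain digits (for example `20240501`) or as `YYYY-MM-DD`; the real tool may recognize other formats.

```shell
# Approximation of step 3: strip one versioning suffix from a pipeline name.
# Recognized suffixes (assumed): _v<number>, _YYYY-MM-DD, _<number>.
base_pipeline_name() {
  printf '%s\n' "$1" | sed -E 's/_(v[0-9]+|[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]+)$//'
}

# Read candidate pipeline names on stdin; print those sharing the base name of $1.
matching_pipelines() {
  target=$(base_pipeline_name "$1")
  while IFS= read -r name || [ -n "$name" ]; do
    if [ "$(base_pipeline_name "$name")" = "$target" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Prints rnaseq, rnaseq_v2 and rnaseq_20240501 (one per line).
printf 'rnaseq\nrnaseq_v2\nrnaseq_20240501\nrnaseq_custom\n' | matching_pipelines rnaseq_v3
```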

Output Example:

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-ba4eed3861c5d7a4c6d74186b94ad4f1de7e7a01%2Fimage%20(40).png?alt=media" alt=""><figcaption></figcaption></figure>

The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Opening the URL of the pipeline and clicking on **Start Analysis** shows the generated user interface:

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-4f94a0050120f7ad12694e59a99f1b05089f3313%2Fimage%20(16).png?alt=media" alt=""><figcaption></figcaption></figure>

***

### 4) Launching Validation in Flow

```
$ pipeline-dev launch-validation-in-flow 
```

The *`ica-flow-config/launchPayload_inputFormValues.json`* file generated in the previous step is submitted to ICA Flow to **start an analysis** with the same validation inputs as `nextflow -profile test`.

Output Example:

<figure><img src="https://3193631692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MWUqIqZhOK_i4HqCUpT%2Fuploads%2Fgit-blob-6875c17c22b216edf6d356f96674d56d7c671c1f%2Fimage%20(17).png?alt=media" alt=""><figcaption><p>launch-validation-in-flow</p></figcaption></figure>

The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
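The deployment and validation steps can be chained for non-interactive use, relying on the `--create`/`--update` parameters that skip the naming prompt. The wrapper below is a hypothetical sketch: `run`, `deploy_and_validate` and the `DRY_RUN` switch are our own names, it uses only the `pipeline-dev` subcommands and options shown on this page, and it assumes you run it from the project base folder.

```shell
# Hypothetical wrapper: deploy the current project to Flow without prompts,
# then launch the validation analysis. Assumed to run from the project folder.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"      # only print the command
  else
    "$@"
  fi
}

deploy_and_validate() {
  mode=${1:-}      # "create" a new pipeline or "update" the current one
  case $mode in
    create|update) ;;
    *) echo "usage: deploy_and_validate create|update" >&2; return 2 ;;
  esac
  # Launch validation only if the deployment succeeded.
  run pipeline-dev deploy-as-flow-pipeline "--$mode" &&
    run pipeline-dev launch-validation-in-flow
}
```

For example, `DRY_RUN=1 deploy_and_validate update` only prints the two commands, while `deploy_and_validate update` runs them.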

***

## Tutorials

* [Creating a Pipeline from Scratch](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/creating-a-pipeline-from-scratch)
* [nf-core Pipelines](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/nf-core-pipelines)
* [Updating an Existing Flow Pipeline](https://help.connected.illumina.com/connected-analytics/project/p-bench/pipeline-development-in-bench-experimental/updating-an-existing-flow-pipeline)
