Updating an Existing Flow Pipeline


Last updated 1 month ago


Introduction

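This tutorial shows you how to:

- Import an existing ICA Flow pipeline, with a supporting validation analysis, into Bench
- Replicate the execution in Bench
- Iterative development: modify and validate in Bench
  - Modify code
  - Modify Docker image contents (Dockerfile or interactive method)
- Deploy as a new Flow pipeline
- Run the validation test in Flow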

Preparation

Make sure you have access in ICA Flow to:

- the pipeline you want to work with
- an analysis exercising this pipeline, preferably with a short execution time, to use as a validation test

Start Bench Workspace

For this tutorial, the instance size depends on the flow you import and on whether you use a Bench cluster:

- When using a cluster, choose standard-small or standard-medium for the workspace master node.
- Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as these pipelines typically need 4 or more CPUs to run.

Then configure and start the workspace:

- Select the "single user workspace" permissions (also shown as "Access limited to workspace owner"), which allow you to deploy pipelines.
- Specify at least 100 GB of disk space.
- Optional: after choosing the image, enable a cluster with at least one standard-large instance type.
- Start the workspace, then (if applicable) start the cluster.

Import Existing Pipeline and Analysis to Bench

mkdir demo-flow-dev
cd demo-flow-dev

# Either pick the validation analysis from a list of successful analyses:
pipeline-dev import-from-flow
# or pass its ID directly:
pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f

The starting point is the ID of the analysis used as the pipeline validation test. The pipeline ID is obtained from the analysis metadata.

If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.

If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.
Pipeline input form and associated javascript are downloaded into the ica-flow-config sub-folder.
Analysis input specs are downloaded to the ica-flow-config/launchPayload_inputFormValues.json file.
The analysis inputs are converted into a "test" profile for Nextflow, stored (among other items) in nextflow_bench.conf.
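A small shell check can confirm that the import produced the layout described above. This is a sketch, not a pipeline-dev feature. The function name check_import_layout is hypothetical. It also assumes that nextflow_bench.conf sits at the top level of imported-flow-analysis, which the steps above do not state explicitly.

```shell
# Hypothetical helper: report which of the expected import artifacts exist.
# Assumption: nextflow_bench.conf is at the top level of the import folder.
check_import_layout() {
  local dir=${1:-imported-flow-analysis}
  local missing=0 path
  for path in \
      nextflow-src \
      ica-flow-config \
      ica-flow-config/launchPayload_inputFormValues.json \
      nextflow_bench.conf; do
    if [ -e "$dir/$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if anything is missing
  [ "$missing" -eq 0 ]
}

# Usage (from the demo-flow-dev folder):
#   check_import_layout imported-flow-analysis
```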

Result

Enter the number of the entry you want to use: 21
Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
Fetching pipeline files... main.nf
Fetching test inputs
New Json inputs detected
Resolving test input ids to /data/mounts/project paths
Fetching input form..
Pipeline "GWAS pipeline_1_2_1_20241215_130117" successfully imported.
pipeline name: GWAS pipeline_1_2_1_20241215_130117 
analysis name: Test GWAS pipeline_1_2_1_20241215_130117 
pipeline id : bb47d612-5906-4d5a-922e-541262c966df
analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
Suggested actions:
pipeline-dev run-in-bench 
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev run-in-flow

Run Validation Test in Bench

The following command runs this test profile. If a Bench cluster is active, the test runs on your Bench cluster; otherwise, it runs on the main workspace instance:

cd imported-flow-analysis
pipeline-dev run-in-bench

The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full Nextflow command is printed to stdout, so you can copy, paste, and adjust it if you need additional options.
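You may want to re-run with extra Nextflow options, for example -resume, which re-uses the cached results of tasks that already completed. One approach is to save the tool's output and pull out the printed command. The helper last_nextflow_command below is hypothetical. It assumes that pipeline-dev prints the command on a single line starting with "nextflow run", so check your actual output before relying on it.

```shell
# Hypothetical helper: print the last "nextflow run ..." line found in a saved log.
# Assumption: pipeline-dev prints the full command on a line of its own.
last_nextflow_command() {
  grep -E '^[[:space:]]*nextflow run ' "$1" | tail -n 1 | sed -E 's/^[[:space:]]+//'
}

# Usage:
#   pipeline-dev run-in-bench 2>&1 | tee run-in-bench.log
#   cmd=$(last_nextflow_command run-in-bench.log)
#   echo "$cmd -resume"   # review, then copy-paste to re-run with cached results
```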

Monitoring

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

- qstat to see which tasks are pending or running
- tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check whether the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

/data/demo $ tail /data/logs/sge-scaler.log.*
2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning, Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2
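The monitoring commands above can be wrapped in a small helper that picks the most recent scaler log automatically. This sketch assumes that the newest log file (by modification time) corresponds to the latest workspace reboot. The name latest_scaler_log is hypothetical, and watch is assumed to be installed in your workspace image.

```shell
# Hypothetical helper: print the most recently modified sge-scaler log.
# Assumption: the newest file by mtime matches the latest workspace reboot.
latest_scaler_log() {
  local logdir=${1:-/data/logs}
  ls -1t "$logdir"/sge-scaler.log.* 2>/dev/null | head -n 1
}

# Usage, in a second terminal while the pipeline runs:
#   tail -f "$(latest_scaler_log)"
#   watch -n 30 qstat      # refresh the pending/running task list every 30 s
```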

Data Locations

The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
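After a run, a quick summary of these locations can save some digging. The function below is a hypothetical convenience sketch that only reads the folder and file names listed above. Its search for the string "ERROR" in .nextflow.log* is a heuristic, not an exhaustive failure check.

```shell
# Hypothetical helper: summarize a finished Bench run from the folder it was launched in.
summarize_run() {
  local dir=${1:-.}
  echo "== Output files (outdir) =="
  find "$dir/outdir" -type f 2>/dev/null | sort
  echo "== Lines containing ERROR in .nextflow.log* =="
  grep -h "ERROR" "$dir"/.nextflow.log* 2>/dev/null || echo "(none found)"
  echo "== Last lines of output.log =="
  tail -n 5 "$dir/output.log" 2>/dev/null
}

# Usage:
#   summarize_run imported-flow-analysis
```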

Modify Pipeline

Nextflow files (located in the nextflow-src folder) are easy to modify. Depending on your environment (SSH access, or a Docker image with JupyterLab or VNC, with or without Visual Studio Code), you can use various source code editors.

code nextflow-src # Open in Visual Studio Code
code .            # Open current dir in Visual Studio Code
vi nextflow-src/main.nf

After modifying the source code, you can run a validation iteration with the same command as before:

pipeline-dev run-in-bench
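Each iteration edits files in place, so it can help to keep a pristine copy of the imported sources and review your edits before each validation run. The sketch below assumes plain cp and diff are enough for your needs (a git repository would work as well). The names snapshot_sources and show_source_changes are hypothetical, and nextflow-src.orig is an arbitrary location chosen outside nextflow-src.

```shell
# Hypothetical helpers: snapshot the imported sources once, then diff against them.
snapshot_sources() {
  local src=${1:-nextflow-src}
  [ -e "${src}.orig" ] || cp -R "$src" "${src}.orig"   # snapshot only once
}

show_source_changes() {
  local src=${1:-nextflow-src}
  diff -ru "${src}.orig" "$src"   # exit status 1 means "differences found"
}

# Usage:
#   snapshot_sources nextflow-src        # once, right after the import
#   vi nextflow-src/main.nf              # edit
#   show_source_changes nextflow-src     # review
#   pipeline-dev run-in-bench            # validate
```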

Identify Docker Image

Modifying the Docker image is the next step.

Nextflow (and ICA) allow Docker images to be specified in different places:

- in config files such as nextflow-src/nextflow.config
- in Nextflow code files, for example:

/data/demo-flow-dev $ head nextflow-src/main.nf
nextflow.enable.dsl = 2
process top_level_process {
container 'docker.io/ljanin/gwas-pipeline:1.2.1'

A search such as "grep container" may help locate the correct files.
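A slightly more targeted search lists only the image names. This sketch (list_container_images is a hypothetical name) recognizes the two literal forms shown here: container 'image' in code and container = "image" in config files. It will not find images chosen through closures, params, or variables, so those still need a manual look.

```shell
# Hypothetical helper: list distinct container images referenced literally in the sources.
list_container_images() {
  local src=${1:-nextflow-src}
  # Match `container 'img'` or `container = "img"`, then keep only the quoted image name
  grep -rhoE "container[[:space:]]*=?[[:space:]]*['\"][^'\"]+['\"]" "$src" \
    | sed -E "s/.*['\"]([^'\"]+)['\"]/\1/" \
    | sort -u
}

# Usage:
#   grep -rn container nextflow-src     # every line mentioning "container", with file:line
#   list_container_images nextflow-src  # just the image names
```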

Docker Image Update: Dockerfile Method

Use case: Update some of the software (mimalloc) by compiling a new version

IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
 
# Create directory for Dockerfile
mkdir dirForDockerfile
cd dirForDockerfile

# Create Dockerfile
cat <<EOF > Dockerfile
FROM ${IMAGE_BEFORE}
RUN mkdir /mimalloc-compile \
 && cd /mimalloc-compile \
 && git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
 && mkdir -p mimalloc/out/release \
 && cd mimalloc/out/release \
 && cmake ../.. \
 && make \
 && make install \
 && cd / \
 && rm -rf mimalloc-compile
EOF

# Build image
docker build -t ${IMAGE_AFTER} .

With the appropriate permissions, you can then "docker login" and "docker push" the new image.
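The login and push steps can be scripted once you know which registry hosts the image. In this sketch, image_registry and push_image are hypothetical helpers. The rule for detecting a registry host is an assumption: the first path component must contain "." or ":", or be localhost. That rule matches common image names such as docker.io/ljanin/gwas-pipeline, and anything else is treated as Docker Hub.

```shell
# Hypothetical helper: print the registry host of an image reference (default docker.io).
image_registry() {
  local image=$1 first=${1%%/*}
  case "$image" in
    */*)
      case "$first" in
        *.*|*:*|localhost) echo "$first"; return ;;
      esac ;;
  esac
  echo "docker.io"
}

# Hypothetical helper: log in to the image's registry, then push the image.
push_image() {
  local image=$1 registry
  registry=$(image_registry "$image")
  if [ "$registry" = "docker.io" ]; then
    docker login || return 1            # Docker Hub is the default registry
  else
    docker login "$registry" || return 1
  fi
  docker push "$image"
}

# Usage:
#   push_image "${IMAGE_AFTER}"
```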

Docker Image Update: Interactive Method

IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
docker run -it --rm ${IMAGE_BEFORE} bash
 
# Make some modifications
vi /scripts/plot_manhattan.py
<Fix "manhatten.png" into "manhattAn.png">
<Enter :wq to save and quit vi>

<Start another terminal (try Ctrl+Shift+T if using wezterm)>
# Identify container id

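# Save container changes into a new image layer
# (variables set in the first terminal are not defined in this one)
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
CONTAINER_ID=c18670335247   # example ID; use the one identified above
docker commit ${CONTAINER_ID} ${IMAGE_AFTER}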

With the appropriate permissions, you can then "docker login" and "docker push" the new image.
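The listing above leaves out the command that identifies the container ID. One possible approach is to filter docker ps on the image the container was started from. The names container_id_for_image and commit_container_from are hypothetical. The helper simply takes the newest matching container, so check the ID if several are running. Remember that IMAGE_BEFORE and IMAGE_AFTER also need to be set in the second terminal.

```shell
# Hypothetical helper: ID of the newest running container started from an image
# (docker ps lists the most recently created container first).
container_id_for_image() {
  docker ps --quiet --filter "ancestor=$1" | head -n 1
}

# Hypothetical helper: commit that container as a new image.
commit_container_from() {
  local image_before=$1 image_after=$2 id
  id=$(container_id_for_image "$image_before")
  if [ -z "$id" ]; then
    echo "no running container found for $image_before" >&2
    return 1
  fi
  docker commit "$id" "$image_after"
}

# Usage, from the second terminal while the `docker run -it` session is still open:
#   commit_container_from "${IMAGE_BEFORE}" "${IMAGE_AFTER}"
```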

Fun fact: VS Code with the "Dev Containers" extension lets you edit the files inside your running container.

Beware that this extension creates many temporary files in /tmp and in $HOME/.vscode-server. Don't include them in your image.
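To check for such leftovers before running docker commit, you can use docker diff, which lists every path added (A), changed (C), or deleted (D) in the container. This sketch filters that list down to /tmp and .vscode-server entries. The name unwanted_container_changes is hypothetical, and the two patterns are assumptions based on the paths mentioned above.

```shell
# Hypothetical helper: show added/changed paths that look like editor leftovers.
# `docker diff` prints one "A|C|D <path>" line per changed path.
unwanted_container_changes() {
  docker diff "$1" | grep -E '^[AC] (/tmp/|.*/\.vscode-server(/|$))'
}

# Usage:
#   unwanted_container_changes "${CONTAINER_ID}"
#   # then delete those paths inside the container before committing
```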

Update the nextflow code and/or configs to use the new image

# Use "|" as the sed delimiter, because the image names contain "/"
sed --in-place "s|${IMAGE_BEFORE}|${IMAGE_AFTER}|g" nextflow-src/main.nf
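The sed command above only edits main.nf. The image may also be referenced in nextflow.config or other files. In that case, a helper like the hypothetical update_image_refs below can update every file under nextflow-src that contains the old name. This sketch assumes GNU sed (as in the --in-place form above) and image names without "#" characters, since "#" is used as the sed delimiter.

```shell
# Hypothetical helper: replace an image name in every file under a folder.
update_image_refs() {
  local old=$1 new=$2 dir=${3:-nextflow-src} old_re file
  # Escape characters that are special in a sed regex ("." appears in image names)
  old_re=$(printf '%s\n' "$old" | sed 's/[][\.*^$]/\\&/g')
  grep -rlF -- "$old" "$dir" | while IFS= read -r file; do
    # "#" as delimiter, since image names contain "/" but not "#"
    sed --in-place "s#${old_re}#${new}#g" "$file"
    echo "updated: $file"
  done
}

# Usage:
#   update_image_refs "${IMAGE_BEFORE}" "${IMAGE_AFTER}" nextflow-src
```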

Validate your changes in Bench:

pipeline-dev run-in-bench

Deploy as Flow Pipeline

pipeline-dev deploy-as-flow-pipeline

After generating a few ICA-specific files (JSON input specs for the Flow launch UI, plus the list of inputs for the validation launch in the next step), the tool identifies which versions of the same pipeline have already been deployed. In ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what the tool checks.

It then asks whether to update an existing version or create a new one.

Choice: 2
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Result

/data/demo $ pipeline-dev deploy-as-flow-pipeline

Generating ICA input specs...
Extracting nf-core test inputs...
Deploying project nf-core/demo
- Currently being developed as: dev-nf-core-demo
- Last version updated in ICA:  dev-nf-core-demo_v3
- Next suggested version:       dev-nf-core-demo_v4

How would you like to deploy?
1. Update dev-nf-core-demo (current version)
2. Create dev-nf-core-demo_v4
3. Enter new name
4. Update dev-nf-core-demo_v3 (latest version updated in ICA)

Sending docs/images/nf-core-demo-subway.svg
Sending docs/images/nf-core-demo_logo_dark.png
Sending docs/images/nf-core-demo_logo_light.png
Sending docs/images/nf-core-demo-subway.png
Sending docs/README.md
Sending docs/output.md

Pipeline successfully deployed
- Id : 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
- URL: https://stage.v2.stratus.illumina.com/ica/projects/1873043/pipelines/26bc5aa5-0218-4e79-8a63-ee92954c6cd9

Suggested actions:
  pipeline-dev run-in-flow
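The version suggestion in this output follows the _vN naming convention described earlier. As an illustration only (this is not pipeline-dev's actual logic), next_version_name sketches how the next name can be derived from the names already deployed. The function name is an assumption, and so is the choice of _v1 when no versioned name exists yet.

```shell
# Hypothetical sketch of the "_vN" naming convention: given a base name and the
# names of already-deployed pipelines, print the next version name.
next_version_name() {
  local base=$1 max=0 name n
  shift
  for name in "$@"; do
    case "$name" in
      "${base}_v"*) n=${name#"${base}_v"} ;;
      *) continue ;;
    esac
    case "$n" in
      ''|*[!0-9]*) continue ;;   # ignore suffixes that are not plain integers
    esac
    n=$((10#$n))                 # force base 10 so "08" is not read as octal
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  echo "${base}_v$((max + 1))"
}

# Usage:
#   next_version_name dev-nf-core-demo dev-nf-core-demo dev-nf-core-demo_v3
```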

Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
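Because those copies live in your ICA project, you may want to review what is there and how much space it takes. The hypothetical summarize_temp_data helper below only lists and measures the folder named above. It does not delete anything.

```shell
# Hypothetical helper: list the files copied for validation launches and their total size.
summarize_temp_data() {
  local dir=${1:-/data/project/bench-pipeline-dev/temp-data}
  if [ ! -d "$dir" ]; then
    echo "no temp-data folder at $dir"
    return 0
  fi
  find "$dir" -type f | sort
  du -sh "$dir"
}

# Usage:
#   summarize_temp_data
```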

Result

/data/demo $ pipeline-dev launch-validation-in-flow

pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Getting Analysis Storage Id
Launching as ICA Flow Analysis...

ICA Analysis created:
- Name: Test dev-nf-core-demo_v4
- Id:   cadcee73-d975-435d-b321-5d60e9aec1ec
- Url:   https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec
