Updating an Existing Flow Pipeline
Introduction
This tutorial shows you how to
import an existing ICA Flow pipeline with a supporting validation analysis
monitor the execution
Iterative development: modify pipeline code and validate in Bench
Modify Nextflow code
Modify Docker image contents (Dockerfile or Interactive method)
Preparation
Make sure you have access in ICA Flow to:
the pipeline you want to work with
an analysis exercising this pipeline, preferably with a short execution time, to use as a validation test
Start Bench Workspace
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (also known as "Access limited to workspace owner"), which allow us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type.
Start the workspace, then (if applicable) also start the cluster
Import Existing Pipeline and Analysis to Bench
The starting point is the ID of the analysis that is used as the pipeline validation test (the pipeline ID is obtained from the analysis metadata).
If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subdirectory.
The analysis inputs are converted into a "test" profile for Nextflow, stored (among other items) in nextflow_bench.conf
Results
Run Validation Test in Bench
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:
Monitoring
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see which tasks are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
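The two monitoring commands above can be combined into a small helper. This is a minimal sketch, not part of the tooling: the function name latest_scaler_log is hypothetical, and it assumes that the newest file matching sge-scaler.log.* in /data/logs belongs to the latest workspace reboot.

```shell
#!/usr/bin/env bash
# Print the most recently modified scaler log in a directory.
# The directory defaults to /data/logs, the location named above.
# (latest_scaler_log is a hypothetical helper name.)
latest_scaler_log() {
    local log_dir="${1:-/data/logs}"
    # ls -t sorts by modification time, newest first
    ls -t "$log_dir"/sge-scaler.log.* 2>/dev/null | head -n 1
}

# Show pending and running tasks if the qstat client is available.
if command -v qstat >/dev/null 2>&1; then
    qstat
fi

# Show the last lines of the newest scaler log, if there is one.
log_file="$(latest_scaler_log)"
if [ -n "$log_file" ]; then
    tail -n 20 "$log_file"
fi
```

Replacing tail -n 20 with tail -f keeps following the log while the cluster scales.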
Data Locations
The output of the pipeline is in the outdir directory
Nextflow work files are under the work directory
Log files are .nextflow.log* and output.log
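To check quickly which of these locations exist after a run, a sketch such as the following may help. The function name show_run_files is hypothetical, and the sketch assumes that all four paths are relative to the directory from which the validation run was started.

```shell
#!/usr/bin/env bash
# Report which of the run locations listed above exist.
# Assumption: the paths are relative to the directory where the run was started.
# (show_run_files is a hypothetical helper name.)
show_run_files() {
    local run_dir="${1:-.}"
    local path
    for path in "$run_dir/outdir" "$run_dir/work" \
                "$run_dir"/.nextflow.log* "$run_dir/output.log"; do
        if [ -e "$path" ]; then
            echo "found: $path"
        else
            echo "missing: $path"
        fi
    done
}

# Inspect the current directory.
show_run_files .
```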
Modify Pipeline
Nextflow files (located in the nextflow-src directory) are easy to modify.
Depending on your environment (SSH access, or a Docker image with JupyterLab or VNC, with or without Visual Studio Code), various source code editors can be used.
After modifying the source code, you can run a validation iteration with the same command as before:
Identify Docker Image
Modifying the Docker image is the next step.
Nextflow (and ICA) allow the Docker images to be specified in different places:
in config files such as nextflow-src/nextflow.config
in Nextflow code files
grep container may help locate the correct files.
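A recursive search is one way to apply this. The sketch below is restricted to *.nf and *.config files, which is an assumption about where container settings live in your pipeline, and the function name find_container_lines is hypothetical.

```shell
#!/usr/bin/env bash
# List every line mentioning "container" in Nextflow code and config files.
# Assumption: container settings live in *.nf or *.config files.
# (find_container_lines is a hypothetical helper name.)
find_container_lines() {
    local src_dir="${1:-nextflow-src}"
    # -r: recurse into subdirectories, -n: print line numbers
    grep -rn --include='*.nf' --include='*.config' 'container' "$src_dir"
}

# Search the imported pipeline sources if they are present.
if [ -d nextflow-src ]; then
    find_container_lines nextflow-src || true   # grep exits 1 when nothing matches
fi
```

Each output line has the form file:line:content, which points to the place where the image name is set.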
Docker Image Update: Dockerfile Method
Use case: Update some of the software (mimalloc) by compiling a new version
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
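As an illustration of the build, login and push sequence, here is a minimal sketch. The registry host and image name are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set, so it can be reviewed before anything is pushed.

```shell
#!/usr/bin/env bash
# All names below are hypothetical placeholders; replace them with your own.
REGISTRY="registry.example.com"
IMAGE="$REGISTRY/my-team/my-tool:1.1-mimalloc"

# Dry run by default: commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run docker build -t "$IMAGE" .   # build from the Dockerfile in the current directory
run docker login "$REGISTRY"     # authenticate against the registry
run docker push "$IMAGE"         # upload the new image
```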
Docker Image Update: Interactive Method
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Beware that this extension creates many temporary files in /tmp and in $HOME/.vscode-server. Do not include them in your image.
Update the Nextflow code and/or configs to use the new image
Validate your changes in Bench:
Deploy as Flow Pipeline
After generating a few ICA-specific files (JSON input specs for the Flow launch UI and the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed. In ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here.
It then asks if we want to update the latest version or create a new one.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
Result
Run Validation Test in Flow
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
Result