
Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

Linking Existing Pipelines

Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project, not as a copy but as the actual pipeline, so any changes to the pipeline are automatically propagated to and from every project which has this pipeline linked.

You can link a pipeline if it is not already linked to your project and it comes from your tenant or is available in your bundle or activation code.

Activation codes are tokens which allow you to run your analyses and are used for accounting and for allocating the appropriate resources. ICA will automatically determine the best matching activation code, but this can be overridden if needed.

If you unlink a pipeline, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.

There is no way to permanently delete a pipeline.


Create a Pipeline

Pipelines are created and stored within projects.

  1. Navigate to Projects > your_project > Flow > Pipelines > +Create.

  2. Select Nextflow (XML / JSON), CWL Graphical, or CWL Code (XML / JSON) to create a new Pipeline.

  3. Configure pipeline settings in the pipeline property tabs.

  4. When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.

  5. Select Save.

Pipelines use the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it to the pipeline.

Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.

Pipeline Status

You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.

  • Draft: Fully editable draft.

  • Release Candidate: The pipeline is ready for release. Editing is locked, but the pipeline can be cloned (top right in the details view) to create a new version.

  • Released: The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version.


Pipeline Properties

The following sections describe the properties that can be configured in each tab of the pipeline editor.

Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice of how to define the pipeline; Nextflow is always defined in code mode.

CWL Graphical

  • Details

  • Documentation

  • Definition

  • Analysis Report

  • Metadata Model

  • Report

CWL Code

  • Details

  • Documentation

  • Inputform files (JSON) or XML Configuration (XML)

  • CWL Files

  • Metadata Model

  • Report

Nextflow Code

  • Details

  • Documentation

  • Inputform Files (JSON) or XML Configuration (XML)

  • Nextflow files

  • Metadata Model

  • Report

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:

  • Nextflow

  • CWL


Details

The details tab provides options for configuring basic information about the pipeline.

  • Code: The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.

  • Nextflow Version: User-selectable Nextflow version, available only for Nextflow pipelines.

  • Categories: One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.

  • Description: A short description of the pipeline.

  • Proprietary: Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.

  • Status: The release status of the pipeline.

  • Storage size: User-selectable storage size for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.

  • Family: A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remaining pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.

  • Version comment: A description of changes in the updated version.

  • Links: External reference links (maximum 100 characters for the name and 2048 characters for the link).

The following information becomes visible when viewing the pipeline details.

  • ID: Unique identifier of the pipeline.

  • URN: Identification of the pipeline in Uniform Resource Name (URN) format.

The clone action is shown at the top-right of the pipeline details. Cloning a pipeline allows you to make modifications without impacting the original pipeline; you become the owner of the cloned pipeline. The clone must be given a name that is unique within the tenant, across all projects. You may still see the same pipeline name twice when a pipeline linked from another tenant has been cloned under that name in your tenant; each name remains unique within its own tenant, but both are visible in your tenant.

When you clone a Nextflow pipeline, the configured Nextflow version is verified to prevent the use of deprecated versions.

Documentation

The Documentation tab is where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.

Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

  • Machine profiles: Compute types available to use with Tools in the pipeline.

  • Shared settings: Settings used by more than one tool in the pipeline.

  • Reference files: Descriptions of reference files used in the pipeline.

  • Input files: Descriptions of input files used in the pipeline.

  • Output files: Descriptions of output files used in the pipeline.

  • Tool: Details about the tool selected in the visualization panel.

  • Tool repository: A list of tools available to be used in the pipeline.

In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.

XML Configuration / JSON Inputform Files (Code)

This page is used to specify all relevant information about the pipeline parameters.

There is a limit of 200 reports per report pattern which will be shown when you have multiple reports matching your regular expression.

Compute Resources

Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

  • For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.

  • When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.

  • When no type is specified, the default type of compute node is standard-small.

By default, compute nodes have no scratch space. Adding scratch space is an advanced setting and should only be used when absolutely necessary, as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.

For simplicity and better integration, consider using the shared storage available at /ces, which is what is provided with the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.

Scratch space

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch (see the sketch after this list).

  • For Nextflow, the pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB' will reserve 1 TiB.

  • For CWL, adding - class: ResourceRequirement with tmpdirMin: 5000 to your requirements section will reserve 5000 MiB.
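
A minimal Nextflow sketch of a process that reserves scratch space and uses it as a temporary directory (the process name, file names, and command are illustrative):

process sort_large_file {
    // Reserve 1 TiB of scratch space, mounted at /scratch, for this process.
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    input:
    path input_file

    output:
    path 'sorted.txt'

    script:
    """
    sort --temporary-directory=/scratch ${input_file} > sorted.txt
    """
}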

Avoid the following as it does not align with ICAv2 scratch space configuration.

  • Container overlay tmp path: /tmp

  • Legacy paths: /ephemeral

  • Environment Variables ($TMPDIR, $TEMP and $TMP)

  • Bash Command mktemp

  • CWL runtime.tmpdir

Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

Compute Type        | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small      | 2    | 8         | standard-small       | standard, small
standard-medium     | 4    | 16        | standard-medium      | standard, medium
standard-large      | 8    | 32        | standard-large       | standard, large
standard-xlarge     | 16   | 64        | standard-xlarge      | standard, xlarge
standard-2xlarge    | 32   | 128       | standard-2xlarge     | standard, 2xlarge
standard-3xlarge    | 64   | 256       | standard-3xlarge     | standard, 3xlarge
hicpu-small         | 16   | 32        | hicpu-small          | hicpu, small
hicpu-medium        | 36   | 72        | hicpu-medium         | hicpu, medium
hicpu-large         | 72   | 144       | hicpu-large          | hicpu, large
himem-small         | 8    | 64        | himem-small          | himem, small
himem-medium        | 16   | 128       | himem-medium         | himem, medium
himem-large         | 48   | 384       | himem-large          | himem, large
himem-xlarge (2)    | 92   | 700       | himem-xlarge         | himem, xlarge
hiio-small          | 2    | 16        | hiio-small           | hiio, small
hiio-medium         | 4    | 32        | hiio-medium          | hiio, medium
fpga2-medium (1)    | 24   | 256       | fpga2-medium         | fpga2, medium
fpga-medium         | 16   | 244       | fpga-medium          | fpga, medium
fpga-large (3)      | 64   | 976       | fpga-large           | fpga, large
transfer-small (4)  | 4    | 10        | transfer-small       | transfer, small
transfer-medium (4) | 8    | 15        | transfer-medium      | transfer, medium
transfer-large (4)  | 16   | 30        | transfer-large       | transfer, large

(1) DRAGEN pipelines running on the fpga2 compute type incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with volume discounts as shown below.

  • 80 gigabases or less per sample - no discount - 0.10 iCredits per gigabase

  • > 80 to 160 gigabases per sample - 20% discount - 0.08 iCredits per gigabase

  • > 160 to 240 gigabases per sample - 30% discount - 0.07 iCredits per gigabase

  • > 240 to 320 gigabases per sample - 40% discount - 0.06 iCredits per gigabase

  • more than 320 gigabases per sample - 50% discount - 0.05 iCredits per gigabase

If your DRAGEN job fails, no DRAGEN license cost will be charged.

(2) The compute type himem-xlarge has low availability.

(3) The compute type fpga-large is only available in the US (use1) region. This compute type is not recommended as it suffers from low availability and offers little performance benefit over fpga-medium at significant additional cost.

(4) The transfer compute size is selected based on the selected storage size and is used during upload and download system tasks.

Analysis Report (Graphical)

The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.

  1. [Optional] Import widgets from another pipeline.

    1. Select Import from other pipeline.

    2. Select the pipeline that contains the report you want to copy.

    3. Select an import option: Replace current report or Append to current report.

    4. Select Import.

  2. From the Analysis Report tab, select Add widget, and then select a widget type.

  3. Configure widget details.

    • Title: Add and format title text.

    • Analysis details: Add heading text and select the analysis metadata details to display.

    • Free text: Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.

    • Inline viewer: Add options to view the content of an analysis output file.

    • Analysis comments: Add comments that can be edited after an analysis has been performed.

    • Input details: Add heading text and select the input details to display. The widget includes an option to group details by input name.

    • Project details: Add heading text and select the project details to display.

    • Page break: Add a page break widget where page breaks should appear between report sections.

  4. Select Save.


Free Text Placeholders

The following placeholders can be used to insert project data.

  • [[BB_PROJECT_NAME]]: The project name.

  • [[BB_PROJECT_OWNER]]: The project owner.

  • [[BB_PROJECT_DESCRIPTION]]: The project short description.

  • [[BB_PROJECT_INFORMATION]]: The project information.

  • [[BB_PROJECT_LOCATION]]: The project location.

  • [[BB_PROJECT_BILLING_MODE]]: The project billing mode.

  • [[BB_PROJECT_DATA_SHARING]]: The project data sharing settings.

  • [[BB_REFERENCE]]: The analysis reference.

  • [[BB_USERREFERENCE]]: The user analysis reference.

  • [[BB_PIPELINE]]: The name of the pipeline.

  • [[BB_USER_OPTIONS]]: The analysis user options.

  • [[BB_TECH_OPTIONS]]: The analysis technical options. Technical options include the TECH suffix and are not visible to end users.

  • [[BB_ALL_OPTIONS]]: All analysis options. Technical options include the TECH suffix and are not visible to end users.

  • [[BB_SAMPLE]]: The sample.

  • [[BB_REQUEST_DATE]]: The analysis request date.

  • [[BB_START_DATE]]: The analysis start date.

  • [[BB_DURATION]]: The analysis duration.

  • [[BB_REQUESTOR]]: The user requesting analysis execution.

  • [[BB_RUNSTATUS]]: The status of the analysis.

  • [[BB_ENTITLEMENTDETAIL]]: The used entitlement detail.

  • [[BB_METADATA:path]]: The value or list of values of a metadata field or multi-value fields.
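
For example, a Free text widget containing the line "Analysis [[BB_USERREFERENCE]] of pipeline [[BB_PIPELINE]] finished with status [[BB_RUNSTATUS]]." will render those placeholders with the values of the analysis being viewed.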

Nextflow/CWL Files (Code)

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

  • DIFF (.diff)

  • GROOVY (.groovy .nf)

  • JAVASCRIPT (.js .javascript)

  • JSON (.json)

  • SH (.sh)

  • SQL (.sql)

  • TXT (.txt)

  • XML (.xml)

  • YAML (.yaml .cwl)

If the file type is not recognized, it will default to text display. This can result in the application interpreting binary files as text when trying to display the contents.

Main.nf (Nextflow code)

The Nextflow project main script.

Nextflow.config (Nextflow code)

The Nextflow configuration settings.

Workflow.cwl (CWL code)

The Common Workflow Language main script.

Adding Files

Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.

Metadata Model

See Metadata Models

Report

On this tab, you can define patterns for detecting report files in the analysis output. When an analysis result window of this pipeline is opened, an additional tab displays these report files. The goal is to provide a pipeline-specific, user-friendly representation of the analysis result.

To add a report, select the + symbol on the left side. Give the report a unique name, a regular expression matching the report file, and optionally select the format of the report. The format must be the source format of the report data generated during the analysis.

When multiple files match your regular expression, a maximum of 20 reports per report pattern is shown.
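
For example, a report named MultiQC with the regular expression .*multiqc.*\.html and format HTML would surface a MultiQC HTML report produced by the analysis (the name and pattern are illustrative).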


Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

  1. Select Projects > your_project > Flow > Pipelines.

  2. Select the pipeline you want to run, or open its pipeline details.

  3. Select Start Analysis.

  4. Configure analysis settings.

  5. Select Start Analysis.

  6. View the analysis status on the Analyses page.

    • Requested—The analysis is scheduled to begin.

    • In Progress—The analysis is in progress.

    • Succeeded—The analysis is complete.

    • Failed —The analysis has failed.

    • Aborted — The analysis was aborted before completing.

  7. To end an analysis, select Abort.

  8. To perform a completed analysis again, select Re-run.

Analysis Tab

The Analysis tab provides options for configuring basic information about the analysis.

  • User Reference: The unique analysis name.

  • User tags: One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.

  • Pricing: Select a subscription to which the analysis will be charged.

  • Notification: Enter your email address if you want to be notified when the analysis completes.

  • Output Folder: Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen).

  • Input: Select the input files to use in the analysis. (max. 50,000)

  • Settings: Provide input settings.

Aborting Analyses

You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).

View Analysis Results

You can view analysis results on the Analyses page or in the output_folder on the Data page.

  1. Select a project, and then select the Flow > Analyses page.

  2. Select an analysis.

  3. On the Details tab, select the square symbol to the right of the output files.

  4. From the output files view, expand the list and select an output file.

    1. If you want to add or remove any user or technical tags, you can do so from the data details view.

    2. If you want to download the file, select Schedule download.

  5. To preview the file, select the View tab.

  6. Return to Flow > Analyses > your_analysis.

  7. View additional analysis result information on the following tabs:

    • Details - View information on the pipeline configuration.

    • Steps - stderr and stdout information

    • Nextflow Timeline - Nextflow process execution timeline.

    • Nextflow Execution - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.

    • Report - Shows the reports defined on the pipeline report tab.

Nextflow

ICA supports running pipelines defined using Nextflow. See this tutorial for an example.

In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.

System Information

  • Nextflow version: 20.10.0 (deprecated *), 22.04.3, 24.10.2 (Experimental)

  • Executor: Kubernetes

(*) Pipelines using 20.10.0 will still run after it is deprecated, but you will no longer be able to choose this version when creating new pipelines.

Nextflow Version

You can select the Nextflow version while building a pipeline as follows:

  • GUI: Select the Nextflow version at Projects > your_project > Flow > Pipelines > your_pipeline > Details tab.

  • API: Select the Nextflow version by setting the optional field "pipelineLanguageVersionId". When not set, a default Nextflow version will be used for the pipeline.

Compute Node

For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.

Compute Type

To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found in the Compute Types table above. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).

pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'

Often, there is a need to select the compute size for a process dynamically, based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu and memory directives, so instead you can set the pod directive dynamically, e.g.

process foo {
    // Assuming that params.compute_size is set to a valid size such as 'standard-small', 'standard-medium', etc.
    pod annotation: 'scheduler.illumina.com/presetSize', value: "${params.compute_size}"
}

It can also be specified in the configuration file, within the process scope. Example configuration file:

process {
    // Set the default pod annotation for all processes
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'standard-small'
    ]

    // Use a high-memory instance for a specific process
    withName: 'big_memory_process' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'himem-large'
        ]
    }

    // Use an FPGA instance for dragen processes
    withLabel: 'dragen' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'fpga-medium'
        ]
    }
}

Inputs

Inputs are specified via the XML input form or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.

Outputs

Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.

publishDir 'out', mode: 'symlink'
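
A minimal process sketch (hypothetical process and file names) showing publishDir in context; the declared output is symlinked into the out folder, from which ICA uploads results after the run completes:

process sayHello {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    // Symlink declared outputs into ./out for upload to the ICA project.
    publishDir 'out', mode: 'symlink'

    output:
    path 'greeting.txt'

    script:
    """
    echo "Hello from ICA" > greeting.txt
    """
}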

Nextflow version 20.10.10 (Deprecated)

For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir process due to insufficient disk space, resulting in incomplete output delivery.

Solutions:

  1. Use "symlink" instead of "copy" in the publishDir directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.

  2. Use Nextflow 22.04.0 or later and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.

Nextflow Configuration

During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.


If no Docker image is specified, Ubuntu will be used as default.
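
A minimal nextflow.config sketch (the container image is an example value; without a container setting, Ubuntu is used by default):

process {
    // Default container image used by all processes
    container = 'public.ecr.aws/lts/ubuntu:22.04'
}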

The following configuration settings will be ignored if provided as they are overridden by the system:

executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file

CWL

ICA supports running pipelines defined using Common Workflow Language (CWL).

Compute Type

To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.

requirements:
    ResourceRequirement:
        https://platform.illumina.com/rdf/ica/resources:type: fpga
        https://platform.illumina.com/rdf/ica/resources:size: small 
        https://platform.illumina.com/rdf/ica/resources:tier: standard

Reference Compute Types for available compute types and sizes.

The ICA Compute Type will be determined automatically based on CWL ResourceRequirement coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the Compute Types table).

For example, take the following ResourceRequirements:

requirements:
    ResourceRequirement:
      ramMin: 10240
      coresMin: 6

This would result in a best fit of standard-large ICA Compute Type request for the tool.

  • If the specified requirements can not be met by any of the presets, the task will be rejected and failed.

  • FPGA requirements can not be set by means of CWL ResourceRequirements.

  • The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.

Considerations

If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.

CWL Overrides

ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the CWL overrides feature.

In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.

icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
    "tool-fqTOfa.cwl": {
      "requirements": {
        "EnvVarRequirement": {
          "envDef": {
            "MESSAGE": "override_value"
          }
        }
      }
    }
  }
}' --type-input JSON --user-reference overrides-example

XML Input Form

Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.

The input form XML must adhere to the input form schema.

Empty Form

During the creation of a Nextflow pipeline the user is given an empty form to fill out.

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>

Files

The input files are specified within a single DataInputs node. Each individual input is specified in a separate DataInput node. A DataInput node contains the following attributes:

  • code: a unique ID. Required.

  • format: the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible (see the example below). Required.

  • type: FILE or DIRECTORY. Multiple entries are not allowed. Required.

  • required: whether this input is required for the execution of the pipeline. Required.

  • multiValue: whether multiple files are allowed as input. Required.

  • dataFilter: TBD. Optional.

Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.

Single file input

An example of a single file input which can be in a TXT, CSV, or FASTA format.

        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>

Folder as an input

To use a folder as an input the following form is required:

    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
         <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>

Multiple files as an input

For multiple files, set the attribute multiValue to true. The variable is then treated as a list [], so adapt your pipeline when changing from single value to multiValue (see the sketch after the example below).

<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
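
A minimal Nextflow sketch of consuming such a multiValue input (assuming the tumor_fastqs code from the example above):

// params.tumor_fastqs arrives as a list when multiValue="true"
tumor_fastq_ch = Channel.fromList(params.tumor_fastqs).map { file(it) }
tumor_fastq_ch.view { "Tumor FASTQ: ${it.name}" }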

Settings

Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:

  • code: unique id. This is the parameter name that is passed to the workflow

  • minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.

  • maxValues: how many values (at most) should be specified for this setting

  • classification: is this setting specified by the user?

In the code below a string setting with the identifier inp1 is specified.

    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>

Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.

Integers

For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.

<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>

Options

Options types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.

<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description>DRAGEN implements multiple segmentation algorithms, including Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <pd:option>CBS</pd:option>
        <pd:option>SLM</pd:option>
        <pd:option>HSLM</pd:option>
        <pd:option>ASLM</pd:option>
    </pd:optionsType>
    <pd:value>false</pd:value>
</pd:parameter>

Option types can also be used to specify a boolean, for example

<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>

Strings

For a string setting the following schema with an element stringType is to be used.

<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>

Booleans

For a boolean setting, booleanType can be used.

<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>

Limitations

One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for the File input and one for the String input. At the moment the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.

Below you can find both the main.nf and the XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.

nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    file(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    } 
}

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

JSON-Based input forms

Introduction

Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).

To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.

Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.

  • inputForm.json contains the actual input form which is rendered when starting the pipeline run.

  • onRender.js is triggered when a value is changed.

  • onSubmit.js is triggered when starting a pipeline via the GUI or API.

Use + Create to add additional files and Simulate to test your inputForms.

Script execution supports cross-field validation of values, hiding fields, making them required, and similar behavior based on value changes.


inputForm.json

The JSON schema allowing you to define the input parameters. See the InputForm.json Syntax section below for syntax details.

Parameter types

  • textbox: Corresponds to stringType in xml.

  • checkbox: A checkbox that supports the option of being required, so it can serve as an active consent feature (corresponds to booleanType in xml).

  • radio: A radio button group to select one from a list of choices. The values to choose from must be unique.

  • select: A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.

  • number: The value is of Number type in javascript and Double type in java (corresponds to doubleType in xml).

  • integer: Corresponds to java Integer.

  • data: Data such as files.

  • section: For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.

  • text: To display informational messages. No values are to be assigned to these fields.

  • fieldgroup: Can contain parameters or other groups. Allows repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. If you want the same elements multiple times in your form, combine them into a fieldgroup.
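
A minimal inputForm.json sketch combining a textbox, a data field, and a checkbox (the field ids, the regex, and the FASTQ format name are illustrative; the attributes used are described in the next section):

{
  "fields": [
    {
      "id": "sample_name",
      "type": "textbox",
      "label": "Sample name",
      "minValues": 1,
      "maxValues": 1,
      "regex": "^[A-Za-z0-9_-]+$",
      "regexErrorMessage": "Use only letters, digits, '_' or '-'."
    },
    {
      "id": "fastq",
      "type": "data",
      "label": "FASTQ file",
      "minValues": 1,
      "maxValues": 1,
      "dataFilter": {
        "dataType": "file",
        "dataFormat": ["FASTQ"]
      }
    },
    {
      "id": "run_qc",
      "type": "checkbox",
      "label": "Run QC",
      "value": true
    }
  ]
}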

Parameter Attributes

These attributes can be used to configure all parameter types.

  • label: The display label for this parameter. Optional but recommended; the id is used if missing.

  • minValues: The minimum number of values that must be present. Default when not set is 0. Set to >=1 to make the field required.

  • maxValues: The maximum number of values that may be present. Default when not set is 1.

  • minMaxValuesMessage: The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.

  • helpText: A helper text about the parameter. Displayed in smaller font with the parameter.

  • placeHolderText: An optional short hint (a word or short phrase) to aid the user when the field has no value.

  • value: The value of the parameter. Can be considered the default value.

  • minLength: Only applied on type="textbox". Value is a positive integer.

  • maxLength: Only applied on type="textbox". Value is a positive integer.

  • min: Minimal allowed value for 'integer' and 'number' types. For 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000. For 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.

  • max: Maximal allowed value for 'integer' and 'number' types, with the same limits as min.

  • choices: A list of choices, each with a "value", "text" (the label), "selected" (only one true supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on values of other fields. Parent and value must be unique; you cannot use the same value for both.

  • fields: The list of sub fields for type fieldgroup.

  • dataFilter: For defining the filtering when type is 'data'. nameFilter, dataFormat and dataType are additional properties.

  • regex: The regex pattern the value must adhere to. Only applied on type="textbox".

  • regexErrorMessage: The optional error message when the value does not adhere to the "regex". A default message is used if this parameter is not present. It is highly recommended to set this, as the default message shows the regex, which is typically very technical.

  • hidden: Makes this parameter hidden. Can be made visible later in onRender.js, or can be used to set hardcoded values of which the user should be aware.

  • disabled: Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.

  • emptyValuesAllowed: When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.

  • updateRenderOnChange: When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.

  • dropValueWhenDisabled: When this is present and true and the field has disabled set to true, the value is omitted during submit handling (from the onSubmit result).

Tree structure example

"choices" can be used for a single list or for a tree-structured list. See below for an example for how to set up a tree structure.

{
  "fields": [
    {
      "id": "myTreeList",
      "type": "select",
      "label": "Selection Tree Example",
      "choices": [
        {
          "text": "trunk",
          "value": "treetrunk"
        },
        {
          "text": "branch",
          "value": "treebranch",
          "parent":"treetrunk"
        },
        {
          "text": "leaf",
          "value": "treeleaf",
          "parent":"treebranch"
        },
        {
          "text": "bird",
          "value": "happybird",
          "parent":"treebranch"
        },
        {
          "text": "cat",
          "value": "cat",
          "parent": "treetrunk",
          "disabled": true
        }
      ],
      "minValues": 1,
      "maxValues": 3,
      "helpText": "This is a tree example"
    }
  ]
}

Experimental Features

  • Streamable inputs: Adding "streamable": true to an input field of type "data" makes it a streamable input.


onSubmit.js

The onSubmit.js javascript function receives an input object which holds information about the chosen values of the input form, the pipeline, and the pipeline execution request parameters. This javascript function is triggered not only when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.

Input parameters

  • settings: The value of the setting fields. Corresponds to settingValues in onRender.js. This is a map with the field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

  • settingValues: To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues, like in the onRender input.

  • pipeline: Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

  • analysis: Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.

  • storageSize: The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

  • storageSizeOptions: The list of storage sizes available to the user when creating an analysis. A list of StorageSize objects containing an 'id' and 'name' property.

  • analysisSettings: The input form json as saved in the pipeline, i.e. the original json, without any changes.

  • currentAnalysisSettings: The current input form json as rendered to the user. This can contain changes already applied by earlier onRender passes. Null in the first call, when context is 'Initial', or when the analysis is created through the CLI/API.

Return values (taken from the response object)

  • settings: The value of the setting fields. This allows modifying the values, applying defaults, or taking info from the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.

  • validationErrors: A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

  • analysisSettings: The input form json with potentially applied changes. The discovered changes will be applied in the UI when viewing the analysis.

AnalysisError

This is the object used for representing validation errors.

  • fieldId / FieldId: The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.

  • index / Index: The 0-based index of the value which is incorrect. Use this when a particular value of a multi-value field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [1, 2] is used.

  • message / Message: The error/warning message to display.
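
A minimal onSubmit.js sketch, assuming a settings field with id 'sample_name' is defined in the input form; the exact entry-point signature is an assumption and may differ in your environment:

function onSubmit(input) {
    // Collect validation errors to block submission when needed.
    const errors = [];
    const sampleName = input.settings['sample_name'];

    // Example business rule: forbid sample names starting with "tmp_".
    if (sampleName && sampleName.startsWith('tmp_')) {
        errors.push({
            fieldId: 'sample_name',
            message: 'Sample names starting with "tmp_" are not allowed.'
        });
    }

    // Settings not returned are treated as unmodified.
    return {
        validationErrors: errors
    };
}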


onRender.js

onRender.js receives an input object which contains information about the current state of the input form, the chosen values, and the field value change that triggered the onRender call. It also contains pipeline information. Changed objects are returned in the onRender return value object; any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as the changed field.

Input Parameters

  • context: "Initial" / "FieldChanged" / "Edited". Initial is the value when the form is first displayed, when a user opens the start run screen. FieldChanged is the value when a field with 'updateRenderOnChange' = true is changed by the user. Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or when editing the form during reruns.

  • changedFieldId: The id of the field that changed and which triggered this onRender call. The context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.

  • analysisSettings: The input form json as saved in the pipeline. This is the original json, without changes.

  • currentAnalysisSettings: The current input form json as rendered to the user. This can contain changes already applied by earlier onRender passes. Null in the first call, when context is Initial.

  • settingValues: The current value of all settings fields. This is a map with the field id as key and an array of field values as value for multi-value fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

  • pipeline: Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

  • analysis: Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.

  • storageSize: The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

  • storageSizeOptions: The list of storage sizes available to the user when creating an analysis. A list of StorageSize objects containing an 'id' and 'name' property.

Return values (taken from the response object)

  • analysisSettings: The input form json with potentially applied changes. The discovered changes will be applied in the UI.

  • settingValues: The current, potentially altered map of all setting values. These will be updated in the UI.

  • validationErrors: A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

  • validationWarnings: A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.

  • storageSize: The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.

Validation errors and validation warnings can use 'storageSize' as the fieldId to make an error appear on the storage size field. 'storageSize' is also the value of changedFieldId when the user alters the chosen storage size.

RenderMessage

This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.

  • fieldId / FieldId: The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.

  • index / Index: The 0-based index of the value which is incorrect. Use this when a particular value of a multi-value field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [1, 2] is used.

  • message / Message: The error/warning message to display.
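
A minimal onRender.js sketch, assuming a select field 'mode' with updateRenderOnChange set to true and a textbox 'custom_value' defined in the input form; the entry-point signature is an assumption:

function onRender(input) {
    // Work on the form as currently rendered, falling back to the original.
    const form = input.currentAnalysisSettings || input.analysisSettings;

    if (input.context === 'FieldChanged' && input.changedFieldId === 'mode') {
        const customField = form.fields.find(function (f) { return f.id === 'custom_value'; });
        if (customField) {
            // Only show the free-text field when the user picks 'custom'.
            customField.hidden = input.settingValues['mode'] !== 'custom';
        }
    }

    // Objects not returned are considered unmodified.
    return {
        analysisSettings: form
    };
}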

InputForm.json Syntax

{
  "$id": "#ica-pipeline-input-form",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ICA Pipeline Input Forms",
  "description": "Describes the syntax for defining input setting forms for ICA pipelines",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "fields": {
      "description": "The list of setting fields",
      "type": "array",
      "items": {
        "$ref": "#/definitions/ica_pipeline_input_form_field"
      }
    }
  },
  "required": [
    "fields"
  ],
  "definitions": {
    "ica_pipeline_input_form_field": {
      "$id": "#ica_pipeline_input_form_field",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id": {
          "description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
          "type": "string",
          "pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
        },
        "type": {
          "type": "string",
          "enum": [
            "textbox",
            "checkbox",
            "radio",
            "select",
            "number",
            "integer",
            "data",
            "section",
            "text",
            "fieldgroup"
          ]
        },
        "label": {
          "type": "string"
        },
        "minValues": {
          "description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
          "type": "integer",
          "minimum": 0
        },
        "maxValues": {
          "description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "minMaxValuesMessage": {
          "description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
          "type": "string"
        },
        "helpText": {
          "type": "string"
        },
        "placeHolderText": {
          "description": "An optional short hint (a word or short phrase) to aid the user when the field has no value.",
          "type": "string"
        },
        "value": {
          "description": "The value for the field. Can be an array for multi-value fields. For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. For 'integer' type values the value needs to be between -100000000000000000 and 100000000000000000."
        },
        "minLength": {
          "type": "integer",
          "minimum": 0
        },
        "maxLength": {
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "min": {
          "description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
          "type": "number"
        },
        "max": {
          "description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
          "type": "number"
        },
        "choices": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field_choice"
          }
        },
        "fields": {
          "description": "The list of setting sub fields for type fieldgroup",
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field"
          }
        },
        "dataFilter": {
          "description": "For defining the filtering when type is 'data'.",
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "nameFilter": {
              "description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
              "type": "string"
            },
            "dataFormat": {
              "description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
              "type": "array",
              "contains": {
                "type": "string"
              }
            },
            "dataType": {
              "description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
              "type": "string",
              "enum": [
                "file",
                "directory"
              ]
            }
          }
        },
        "regex": {
          "type": "string"
        },
        "regexErrorMessage": {
          "type": "string"
        },
        "hidden": {
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "emptyValuesAllowed": {
          "type": "boolean",
          "description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
        },
        "updateRenderOnChange": {
          "type": "boolean",
          "description": "When true, the onRender javascript function is triggered ech time the user changes the value of this field. Default is false."
        },
        "streamable": {
          "type": "boolean",
          "description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
        }
      },
      "required": [
        "id",
        "type"
      ],
      "allOf": [
        {
          "if": {
            "description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "textbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "checkbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "radio"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "select"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "number",
                  "integer"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "data"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "required": [
              "dataFilter"
            ],
            "propertyNames": {
              "not": {
                "enum": [
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "max",
                  "min",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "section",
                  "text"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "disabled",
                  "fields",
                  "updateRenderOnChange",
                  "classification",
                  "value",
                  "minValues",
                  "maxValues",
                  "minMaxValuesMessage",
                  "dataFilter",
                  "choices",
                  "regex",
                  "placeHolderText",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "fieldgroup"
                ]
              }
            },
            "required": [
              "type",
              "fields"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        }
      ]
    },
    "ica_pipeline_input_form_field_choice": {
      "$id": "#ica_pipeline_input_form_field_choice",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "value": {
        "description": "The value which will be set when selecting this choice. Must be unique over the choices within a field",
        },
        "text": {
          "description": "The display text for this choice, similar as the label of a field. ",
          "type": "string"
        },
        "selected": {
          "description": "Optional. When true, this choice value is picked as default selected value. 
          As in selected=true has precedence over an eventual set field 'value'. 
          For clarity it's better however not to use 'selected' but use field 'value' as is used to set default values for the other field types. 
          Only maximum 1 choice may have selected true.",
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "parent": {
          "description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
        }
      },
      "required": [
        "value",
        "text"
      ]
    }
  }
}

JSON Scatter Gather Pipeline

Let's create the Nextflow Scatter Gather pipeline with a JSON input form.

Pay close attention to uppercase and lowercase characters when creating pipelines.

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.

Nextflow files

split.nf

First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.

process split {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path("split.*.tsv")
    
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
}

sort.nf

Next, select +Create and name the file sort.nf. Copy and paste the following definition.

process sort {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path '*.sorted.tsv'
    
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}

merge.nf

Select +Create again and label the file merge.nf. Copy and paste the following definition.

process merge {
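  // concatenate all sorted chunks and publish the result to the 'out' folder, which ICA collects as the analysis output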
  cpus 1
  memory '512 MB'
 
  publishDir 'out', mode: 'move'
 
  input:
  path x
 
  output:
  path 'merged.tsv'
 
  """
  cat $x > merged.tsv
  """
}

main.nf

Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.

nextflow.enable.dsl=2
 
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
 
params.myinput = "test.test"
 
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}

Here, the operators flatten and collect are used to transform the output channels of the processes. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened, so each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a single emission.
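To see the effect of these two operators in isolation, the following standalone snippet (our illustration, not part of the pipeline) can be run with a local Nextflow installation and prints what each channel emits:

nextflow.enable.dsl=2

workflow {
    // flatten: two list emissions become four single-value emissions 1, 2, 3, 4
    Channel.of([1, 2], [3, 4]).flatten().view { "flattened: $it" }

    // collect: three single-value emissions become one list emission [1, 2, 3]
    Channel.of(1, 2, 3).collect().view { "collected: $it" }
}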

Inputform files

On the Inputform files tab, edit the inputForm.json to allow selection of a file.

inputForm.json

{
  "fields": [
    {
      "id": "myinput",
      "label": "myinput",
      "type": "data",
      "dataFilter": {
        "dataType": "file",
        "dataFormat": ["TSV"]
      },
      "maxValues": 1,
      "minValues": 1
    }
  ]
}

Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.

The onSubmit.js and onRender.js files can keep their default scripts; they are shown here for reference only.

onSubmit.js

function onSubmit(input) {
    var validationErrors = [];

    return {
        'settings': input.settings,
        'validationErrors': validationErrors
    };
}

onRender.js

function onRender(input) {

    var validationErrors = [];
    var validationWarnings = [];

    if (input.currentAnalysisSettings === null) {
        //null the first time; set it here so it can be used in the remainder of the JavaScript
        input.currentAnalysisSettings = input.analysisSettings;
    }

    switch(input.context) {
        case 'Initial': {
            renderInitial(input, validationErrors, validationWarnings);
            break;
        }
        case 'FieldChanged': {
            renderFieldChanged(input, validationErrors, validationWarnings);
            break;
        }
        case 'Edited': {
            renderEdited(input, validationErrors, validationWarnings);
            break;
        }
        default:
            return {};
    }

    return {
        'analysisSettings': input.currentAnalysisSettings,
        'settingValues': input.settingValues,
        'validationErrors': validationErrors,
        'validationWarnings': validationWarnings
    };
}

function renderInitial(input, validationErrors, validationWarnings) {
}

function renderEdited(input, validationErrors, validationWarnings) {
}

function renderFieldChanged(input, validationErrors, validationWarnings) {
}

function findField(input, fieldId){
    var fields = input.currentAnalysisSettings['fields'];
    for (var i = 0; i < fields.length; i++){
        if (fields[i].id === fieldId) {
            return fields[i];
        }
    }
    return null;
}

Click the Save button to save the changes.

Tips and Tricks

Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.

  • Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally as well as on ICA, so whenever possible, test locally before running in the cloud. For Nextflow, configuration files can be used to specify settings for either local or ICA execution. An example of advanced config usage is applying the scratch directive to a set of process names (or labels) so that they use the higher-performance local scratch storage attached to an instance instead of the shared network disk:

    process {
        withName: 'process1|process2|process3' { scratch = '/scratch/' }
        withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to the shared network disk
    }
  • When testing on the cloud, it is often beneficial to create scripts that automate the deployment, launching and monitoring process. This can be done either with the ICA CLI or with your own scripts that integrate with the REST API.

  • For scenarios in which instances are terminated prematurely and without warning (for example, while using spot instances), you can add a snippet like the following to 'nextflow.config' to retry each failing job up to four times (five attempts in total), with an increasing delay between tries.

    process {
        maxRetries = 4
        errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
    }

    Note: Adding the retry script where it is not needed might introduce additional delays.

  • When hardening a Nextflow pipeline to handle resource shortages (for example, exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet become available. It is best practice to use a dynamic retry with backoff, which increases the delay between attempts and gives the system time to provide the necessary resources; a sketch follows below.
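
    A minimal sketch of such a backoff for nextflow.config; the retry count and the exponential delay formula are illustrative choices, not ICA requirements:

    process {
        maxRetries = 4
        errorStrategy = {
            // exponential backoff: wait 1, 2, 4 and 8 minutes before retries 1 through 4
            sleep(Math.pow(2, task.attempt - 1) * 60000 as long)
            return 'retry'
        }
    }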

  • When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'; a minimal configuration sketch follows below.
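
    As a minimal sketch, a default container can be pinned for all processes in nextflow.config (a per-process container directive works as well); this example simply reuses the image mentioned above:

    process {
        container = 'public.ecr.aws/lts/ubuntu:22.04'
    }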

  • To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it goes to a 'Failed' state. This time starts counting as soon as the input data begins to download, which happens during the ICA 'Requested' step of the analysis, before the run goes to 'In Progress'. When tasks execute in parallel, only the longest-running task counts towards the limit. As an example, assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, taking 25, 28 and 30 minutes respectively. Upon completion, the outputs are uploaded for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (the longest of the three parallel tasks) + 1 = 86 minutes:

| Analysis task | request | queued | initializing | input download | single task | queue | parallel tasks | generating outputs | completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 96 hour limit | 1m (not counted) | 7m (not counted) | 2m (not counted) | 20m | 25m | 10m | 30m | 1m | - |
| Status in ICA | status requested | status queued | status initializing | status preparing inputs | status in progress | status in progress | status in progress | status generating outputs | status succeeded |

If there are no available resources or your project priority is low, the time before download commences will be substantially longer.

  • By default, Nextflow does not generate a trace report. To enable it, add the section below to your userNextflow.config file.

trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'
Useful Links

    • Nextflow on Kubernetes: Best Practices

    • The State of Kubernetes in Nextflow