1 of 11

Flow

Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.

You can configure the following components in Illumina Connected Analytics Flow:

Reference Data — Reference Data for Graphical CWL flows. See .
Pipelines — One or more tools configured to process input data and generate output files. See .
Analyses — Launched instance of a pipeline with selected input data. See .

Reference Data

To use a reference set from within a project, you have first to add it. From the project's page select Flow > Reference Data > Manage > +Add to project. Then select a reference set to add to your project. You can select the entire reference set, or click the arrow next to it to expand it. After expanding, scroll to the right, to see the individual reference files in the set. You can select individual reference files to add to your project, by checking the boxes next to them.

Note: Reference sets are only supported in Graphical CWL pipelines.

Copying Reference Data to other Regions

Navigate to Reference Data (outside of Project context).
Select the data set(s) you wish to add to another region and select Actions > Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by double-clicking on individual files in the Reference set and viewing Copy Details on the Data details tab.
Allow a few minutes for new copies to become available before use.

Note: You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects set in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.

Creating a Pipeline with Reference Data

To create a pipeline with a reference data use the CWL - graphical mode (important restriction: as of now you cannot use reference data for pipelines created in advanced mode). Use the reference data icon instead of regular input icon. On the right hand side use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end-user to choose from and a default selection. You can select more than 1 file, but you can only select 1 at a time (so, repeat process to select multiple reference files). If you only select 1 reference file, that file will be the only one users can use with your pipeline. In the screenshot a reference data with two options is presented.

If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings.

After clicking the magnifying glass icon the user can select from provided options.

Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

Create a Pipeline

Pipelines are created and stored within projects.

Navigate to Projects > your_project > Flow > Pipelines > +Create.
Select CWL Graphical, CWL code (XML / JSON) or Nextflow (XML / JSON) to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.

Pipelines use the latest tool definition when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. In order to update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it back to the pipeline.

Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.

You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.

Status

Description

Status

Description

Status

Description

Pipeline Properties

The following sections describe the tool properties that can be configured in each tab of the pipeline editor.

Graphical vs Code definition

Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:

Nextflow
CWL

Details

The details tab provides options for configuring basic information about the pipeline.

Field

Entry

Code

The name of the pipeline.

Nextflow Version

User selectable Nextflow version available only for Nextflow pipelines

Documentation

The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.

Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

Description

Machine profiles

Compute types available to use with Tools in the pipeline.

Shared settings

Settings for pipelines used in more than one tool.

Reference files

Descriptions of reference files used in the pipeline.

Input files

Descriptions of input files used in the pipeline.

Output files

Descriptions of output files used in the pipeline.

Tool

Details about the tool selected in the visualization panel.

Tool repository

A list of tools available to be used in the pipeline.

In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.

XML Configuration / JSON Inputform Files (code)

This page is used to specify all relevant information about the pipeline parameters.

Analysis Report (Graphical)

The Analysis Report tab provides options for configuring pipeline execution reports. The report is composed of widgets added to the tab.

Configure Pipeline Analysis Report (Graphical CWL Only)

The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.

[Optional] Import widgets from another pipeline.
1. Select Import from other pipeline.
2. Select the pipeline that contains the report you want to copy.
3. Select an import option: Replace current report or Append to current report.
4. Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Widget
Settings
Title
Add and format title text.
Analysis details
Add heading text and select the analysis metadata details to display.
Free text
Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.
Inline viewer
Add options to view the content of an analysis output file.
Analysis comments
Add comments that can be edited after an analysis has been performed.
Input details
Add heading text and select the input details to display. The widget includes an option to group details by input name.
Project details
Add heading text and select the project details to display.
Page break
Add a page break widget where page breaks should appear between report sections.
Select Save.

Free Text Placeholders

Placeholder

Description

[[BB_PROJECT_NAME]]

The project name.

[[BB_PROJECT_OWNER]]

The project owner.

[[BB_PROJECT_DESCRIPTION]]

The project short description.

[[BB_PROJECT_INFORMATION]]

The project information.

[[BB_PROJECT_LOCATION]]

The project location.

[[BB_PROJECT_BILLING_MODE]]

The project billing mode.

[[BB_PROJECT_DATA_SHARING]]

The project data sharing settings.

[[BB_REFERENCE]]

The analysis reference.

[[BB_USERREFERENCE]]

The user analysis reference.

[[BB_PIPELINE]]

The name of the pipeline.

[[BB_USER_OPTIONS]]

The analysis user options.

[[BB_TECH_OPTIONS]]

The analysis technical options. Technical options include the TECH suffix and are not visible to end users.

[[BB_ALL_OPTIONS]]

All analysis options. Technical options include the TECH suffix and are not visible to end users.

[[BB_SAMPLE]]

The sample.

[[BB_REQUEST_DATE]]

The analysis request date.

[[BB_START_DATE]]

The analysis start date.

[[BB_DURATION]]

The analysis duration.

[[BB_REQUESTOR]]

The user requesting analysis execution.

[[BB_RUNSTATUS]]

The status of the analysis.

[[BB_ENTITLEMENTDETAIL]]

The used entitlement detail.

[[BB_METADATA:path]]

The value or list of values of a metadata field or multi-value fields.

Metadata Model

See Metadata Models

Nextflow Files (code)

Main.nf (code)

The Nextflow project main script.

Nextflow.config (code)

The Nextflow configuration settings.

CWL files (code)

Workflow.cwl (code)

The Common Workflow Language main script.

+ Create (code)

Multiple files can be added to make pipelines more modular and manageable.

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)

Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.

By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.

For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.

Scratch space

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch.

For Nextflow pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB' will reserve 1 TiB.
For CWL, adding - class: ResourceRequirement tmpdirMin: 5000 to your requirements section will reserve 5GiB for CWL.

Avoid the following as it does not align with ICAv2 scratch space configuration.

Container overlay tmp path: /tmp
Legacy paths: /ephemeral
Environment Variables ($TMPDIR, $TEMP and $TMP)
Bash Command mktemp
CWL runtime.tmpdir

Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

Compute Type

CPUs

Mem (GB)

Nextflow (pod.value)

CWL (type, size)

standard-small

standard, small

standard-medium

standard, medium

standard-large

standard, large

standard-xlarge

standard, xlarge

standard-2xlarge

128

standard-2xlarge

standard, 2xlarge

hicpu-small

hicpu, small

hicpu-medium

hicpu, medium

hicpu-large

144

hicpu-large

hicpu, large

himem-small

himem, small

himem-medium

128

himem-medium

himem, medium

himem-large

384

himem-large

himem, large

himem-xlarge (*1)

768

himem-xlarge

himem, xlarge

hiio-small

hiio, small

hiio-medium

hiio, medium

~~fpga-small~~ (*2)

~~122~~

~~fpga-small~~

~~fpga, small~~

fpga-medium

244

fpga-medium

fpga, medium

fpga-large (*3)

976

fpga-large

fpga, large

transfer-small (*4)

transfer-small

transfer, small

transfer-medium (*4)

transfer-medium

transfer, medium

transfer-large (*4)

transfer-large

transfer, large

(*1) The compute type himem-xlarge has low availability.

(*2) The compute type fpga-small is not available. Use 'fpga-medium' instead.

(*3) The compute type fpga-large is only available in the US (use1) region. This compute type is not recommended as it suffers from low availability and offers little performance benefit over fpga-medium at significant additional cost.

(*4) The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.

Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

Select a project.
From the project menu, select Flow > Pipelines.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Configure analysis settings. See Analysis Properties.
Select Start Analysis.
View the analysis status on the Analyses page.
- Requested—The analysis is scheduled to begin.
- In Progress—The analysis is in progress.
- Succeeded—The analysis is complete.
- Failed and Failed Final—The analysis has failed or was aborted.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.

Alternatively, you can start a new analysis from Projects > <Your_Project> > Flow > Analyses

Select New Analysis
Select Pipeline
Configure analysis settings. See Analysis Properties.
Select Start Analysis.

Analysis Properties

The following sections describe the analysis properties that can be configured in each tab.

Analysis

The Analysis tab provides options for configuring basic information about the analysis.

Field

Entry

User Reference

The unique analysis name.

User tags

One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.

Entitlement Bundle

Select a subscription to charge the analysis to.

Input Files

Select the input files to use in the analysis. (max. 50,000)

Settings

Provide input settings.

View Analysis Results

You can view analysis results on the Analyses page or in the output_folder on the Data page.

Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Details tab, select the square symbol right of the output files.
From the output files view, expand the list and select an output file.
1. If you want to add or remove any user or technical tags, you can do so from the data details view.
2. If you want to download the file, select Schedule download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
- Details - View information on the pipeline configuration.
- Steps - stderr and stdout information
- Timeline Report - Nextflow process execution timeline.
- Execution Report - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.

Nextflow

ICA supports running pipelines defined using Nextflow. See this tutorial for an example.

In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.

System Information

Info

Details

Nextflow version

20.10.0, 22.04.3 (default)

Executor

Kubernetes

Nextflow Version

You can select the Nextflow version while building a pipeline as follows:

interface

GUI

Select the Nextflow version at Projects > your_project > flow > pipelines > your_pipeline > Details tab.

API

Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId". When not set, a default Nextflow version will be used for the pipeline.

Compute Node

For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.

Compute Type

To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found here. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).

pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'

Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu and memorydirectives, so instead, you can dynamically set the pod directive, as mentioned here. e.g.

process foo {
    // Assuming that params.compute_size is set to a valid size such as 'standard-small', 'standard-medium', etc.
    pod annotation: 'scheduler.illumina.com/presetSize', value: "${params.compute_size}"
}

Additionally, it can also be specified in the configuration file. Example configuration file:

// Set the default pod
pod = [
    annotation: 'scheduler.illumina.com/presetSize',
    value     : 'standard-small'
]

withName: 'big_memory_process' {
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'himem-large'
    ]
}

// Use an FPGA instance for dragen processes
withLabel: 'dragen' {
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'fpga-medium'
    ]
}

Inputs

Inputs are specified via the XML input form or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.

Outputs

Outputs for Nextflow pipelines are uploaded from the out directory in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.

publishDir 'out', mode: 'symlink'

For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir process due to insufficient disk space, resulting in incomplete output delivery.

Workarounds:

Use "symlink" instead of "copy" in the publishDir directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
Use the latest version of Nextflow supported (22.04.0) and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.

Nextflow Configuration

During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.

If no Docker image is specified, Ubuntu will be used as default.

The following configuration settings will be ignored if provided as they are overridden by the system:

executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file

CWL

ICA supports running pipelines defined using .

Compute Type

To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.

requirements:
    ResourceRequirement:
        https://platform.illumina.com/rdf/ica/resources:type: fpga
        https://platform.illumina.com/rdf/ica/resources:size: small 
        https://platform.illumina.com/rdf/ica/resources:tier: standard

Reference for available compute types and sizes.

The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the table).

For example, take the following ResourceRequirements:

requirements:
    ResourceRequirement:
      ramMin: 10240
      coresMin: 6

This would result in a best fit of standard-large ICA Compute Type request for the tool.

If the specified requirements can not be met by any of the presets, the task will be rejected and failed.
FPGA requirements can not be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.

Considerations

If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.

CWL Overrides

In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.

icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
  "tool-fqTOfa.cwl": {
    "requirements": {
      "EnvVarRequirement": {
        "envDef": {
          "MESSAGE": "override_value"
          }
        }                                       
       }
      }
    }
}' --type-input JSON --user-reference overrides-example

XML Input Form

Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.

The input form XML must adhere to the input form schema.

Empty Form

During the creation of a Nextflow pipeline the user is given an empty form to fill out.

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>

Files

The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains following attributes:

code: an unique id. Required.
format: specifying the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible: example below. Required.
type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.
required: is this input required for the execution of a pipeline? Required.
multiValue: are multiple files as an input allowed? Required.
dataFilter: TBD. Optional.

Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.

Single file input

An example of a single file input which can be in a TXT, CSV, or FASTA format.

        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>

Folder as an input

To use a folder as an input the following form is required:

    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
         <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>

Multiple files as an input

For multiple files, set the attribute multiValue to true. This will make it so the variable is considered to be of type list [], so adapt your pipeline when changing from single value to multiValue.

<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>

Settings

Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain following attributes:

code: unique id. This is the parameter name that is passed to the workflow
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting
classification: is this setting specified by the user?

In the code below a string setting with the identifier inp1 is specified.

    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>

Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.

Integers

For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.

<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>

Options

Options types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.

<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <option>CBS</option>
        <option>SLM</option>
        <option>HSLM</option>
        <option>ASLM</option>
    </pd:optionsType>
    <pd:value>false</pd:value>
</pd:parameter>

Option types can also be used to specify a boolean, for example

<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>

Strings

For a string setting the following schema with an element stringType is to be used.

<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>

Booleans

For a boolean setting, booleanType can be used.

<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>

Limitations

One known limitation of the schema presented above is the inability to specify a parameter that can be multiple type, e.g. File or String. One way to implement this requirement would be to define two optional parameters: one for File input and the second for String input. At the moment ICA UI doesn't validate whether at least one of these parameters is populated - this check can be done within the pipeline itself.

Below one can find both a main.nf and XML configuration of a generic pipeline with two optional inputs. One can use it as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither of both is used, the pipeline aborts with an informative error message.

nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    file(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    } 
}

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

JSON-Based input forms

Introduction

Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).

To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.

Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.

inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.

Use + Create to add additional files and Simulate to test your inputForms.

Scripting execution supports crossfield validation of the values, hiding fields, making them required, .... based on value changes.

inputForm.json

The json schema allowing you to define the input parameters. See the inputForm.json page for syntax details.

Parameter types

Type

Usage

textbox

Corresponds to stringType in xml.

checkbox

A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).

radio

A radio button group to select one from a list of choices. The values to choose from must be unique.

select

A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.

number

The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).

integer

Corresponds to java Integer.

data

Data such as files.

section

For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.

text

To display informational messages. No values are to be assigned to these fields.

fieldgroup

Can contain parameters or other groups. Allows to have repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. So if you want to have the same elements multiple times in your form, combine them into a fieldgroup.

Parameter Attributes

These attributes can be used to configure all parameter types.

Attribute

Purpose

label

The display label for this parameter. Optional but recommended, id will be used if missing.

minValues

The minimal amount of values that needs to be present. Default when not set is 0. Set to >=1 to make the field required.

maxValues

The maximal amount of values that need to be present. Default when not set is 1.

minMaxValuesMessage

The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.

helpText

A helper text about the parameter. Will be displayed in smaller font with the parameter.

placeHolderText

An optional short hint ( a word or short phrase) to aid the user when the field has no value.

value

The value of the parameter. Can be considered default value.

minLength

Only applied on type="textbox". Value is a positive integer.

maxLength

Only applied on type="textbox". Value is a positive integer.

min

Minimal allowed value for 'integer' and 'number' type.

for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.

max

Maximal allowed value for 'integer' and 'number' type.

for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.

choices

A list of choices with for each a "value", "text" (is label), "selected" (only 1 true supported), "disabled". "parent" can be used to build hierarchical choicetrees. "availableWhen" can be used for conditional presence of the choice based on values of other fields. Parent and value must be unique, you can not use the same value for both.

fields

The list of sub fields for type fieldgroup.

dataFilter

For defining the filtering when type is 'data'. nameFilter, dataFormat and dataType are additional properties.

regex

The regex pattern the value must adhere to. Only applied on type="textbox".

regexErrorMessage

The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.

hidden

Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.

disabled

Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.

emptyValuesAllowed

When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.

updateRenderOnChange

When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.

Tree structure example

"choices" can be used for a single list or for a tree-structured list. See below for an example for how to set up a tree structure.

{
  "fields": [
    {
      "id": "myTreeList",
      "type": "select",
      "label": "Selection Tree Example",
      "choices": [
        {
          "text": "trunk",
          "value": "treetrunk"
        },
        {
          "text": "branch",
          "value": "treebranch",
          "parent":"treetrunk"
        },
        {
          "text": "leaf",
          "value": "treeleaf",
          "parent":"treebranch"
        },
        {
          "text": "bird",
          "value": "happybird",
          "parent":"treebranch"
        },
        {
          "text": "cat",
          "value": "cat",
          "parent": "treetrunk",
          "disabled": true
        }
      ],
      "minValues": 1,
      "maxValues": 3,
      "helpText": "This is a tree example"
    }
  ]
}

onSubmit.js

The onSubmit.js javascript function receives an input object which holds information about the chosen values of the input form and the pipeline and pipeline execution request parameters. This javascript function is not only triggered when submitting a new pipeline execution request in the user interface, but also when submitting one through the rest API..

Input parameters

Value

Meaning

settings

The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

settingValues

To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.

pipeline

Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

analysis

Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.

storageSize

The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

storageSizeOptions

The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.

Return values (taken from the response object)

Value

Meaning

settings

The value of the setting fields. This allows modifying the values or applying defaults and such. Or taking info of the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be not modified.

validationErrors

A list of AnalysisError essages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

AnalysisError

Value

Meaning

fieldId / FieldId

The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSizeFieldid.

index / Index

The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.

message / Message

The error/warning message to display.

onRender.js

Receives an input object which contains information about the current state of the input form, the chosen values and the field value change that triggered the onrender call. It also contains pipeline information. Changed objects are present in the onRender return value object. Any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as changed field.

Input Parameters

context

"Initial"/"FieldChanged"/"Edited".

Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (Not yet supported in ICA) is used when a form is displayed later again, this is intended for draft runs or when editing the form during reruns.

changedFieldId

The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.

analysisSettings

The input form json as saved in the pipeline. This is the original json, without changes.

currentAnalysisSettings

The current input form json as rendered to the user. This can contain already applied changes form earlier onRender passes. Null in the first call, when context is Initial.

settingValues

The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

pipeline

Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

analysis

Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.

storageSize

The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

storageSizeOptions

The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.

Return values (taken from the response object)

Value

Meaning

analysisSettings

The input form json with potential applied changes. The discovered changes will be applied in the UI.

settingValues

The current, potentially altered map of all setting values. These will be updated in the UI.

validationErrors

A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

validationWarnings

A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.

storageSize

The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.

validation errors and validation warnings can use 'storageSize' as fieldId to let an error appear on the storage size field. 'storageSize' is the value of the changedFieldId when the user alters the chosen storage size.

RenderMessage

This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.

Value

Meaning

fieldId / FieldId

The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSizeFieldid.

index / Index

message / Message

The error/warning message to display.

InputForm.json Syntax

{
  "$id": "#ica-pipeline-input-form",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ICA Pipeline Input Forms",
  "description": "Describes the syntax for defining input setting forms for ICA pipelines",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "fields": {
      "description": "The list of setting fields",
      "type": "array",
      "items": {
        "$ref": "#/definitions/ica_pipeline_input_form_field"
      }
    }
  },
  "required": [
    "fields"
  ],
  "definitions": {
    "ica_pipeline_input_form_field": {
      "$id": "#ica_pipeline_input_form_field",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id": {
          "description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
          "type": "string",
          "pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
        },
        "type": {
          "type": "string",
          "enum": [
            "textbox",
            "checkbox",
            "radio",
            "select",
            "number",
            "integer",
            "data",
            "section",
            "text",
            "fieldgroup"
          ]
        },
        "label": {
          "type": "string"
        },
        "minValues": {
          "description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
          "type": "integer",
          "minimum": 0
        },
        "maxValues": {
          "description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "minMaxValuesMessage": {
          "description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
          "type": "string"
        },
        "helpText": {
          "type": "string"
        },
        "placeHolderText": {
          "description": "An optional short hint (a word or short phrase) to aid the user when the field has no value."
          "type": "string"
        },
        "value": {},
        "minLength": {
          "type": "integer",
          "minimum": 0
        },
        "maxLength": {
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "min": {
          "type": "number"
        },
        "max": {
          "type": "number"
        },
        "choices": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field_choice"
          }
        },
        "fields": {
          "description": "The list of setting sub fields for type fieldgroup",
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field"
          }
        },
        "dataFilter": {
          "description": "For defining the filtering when type is 'data'.",
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "nameFilter": {
              "description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
              "type": "string"
            },
            "dataFormat": {
              "description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
              "type": "array",
              "contains": {
                "type": "string"
              }
            },
            "dataType": {
              "description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
              "type": "string",
              "enum": [
                "file",
                "directory"
              ]
            }
          }
        },
        "regex": {
          "type": "string"
        },
        "regexErrorMessage": {
          "type": "string"
        },
        "hidden": {
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "emptyValuesAllowed": {
          "type": "boolean",
          "description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
        },
        "updateRenderOnChange": {
          "type": "boolean",
          "description": "When true, the onRender javascript function is triggered ech time the user changes the value of this field. Default is false."
        }
      },
      "required": [
        "id",
        "type"
      ],
      "allOf": [
        {
          "if": {
            "description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "textbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "checkbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "radio"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "select"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "number",
                  "integer"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "data"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "required": [
              "dataFilter"
            ],
            "propertyNames": {
              "not": {
                "enum": [
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "max",
                  "min",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "section",
                  "text"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "disabled",
                  "fields",
                  "updateRenderOnChange",
                  "classification",
                  "value",
                  "minValues",
                  "maxValues",
                  "minMaxValuesMessage",
                  "dataFilter",
                  "choices",
                  "regex",
                  "placeHolderText",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "fieldgroup"
                ]
              }
            },
            "required": [
              "type",
              "fields"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        }
      ]
    },
    "ica_pipeline_input_form_field_choice": {
      "$id": "#ica_pipeline_input_form_field_choice",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "value": {},
        "text": {
          "type": "string"
        },
        "selected": {
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "parent": {
          "description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
        }
      },
      "required": [
        "value",
        "text"
      ]
    }
  }
}

JSON Scatter Gather Pipeline

Let's create the with a JSON input form.

Pay close attention to uppercase and lowercase characters when creating pipelines.

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.

Nextflow files

split.nf

First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.

process split {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path("split.*.tsv")
    
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
    }

sort.nf

Next, select +Create and name the file sort.nf. Copy and paste the following definition.

process sort {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path '*.sorted.tsv'
    
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}

merge.nf

Select +Create again and label the file merge.nf. Copy and paste the following definition.

process merge {
  cpus 1
  memory '512 MB'
 
  publishDir 'out', mode: 'move'
 
  input:
  path x
 
  output:
  path 'merged.tsv'
 
  """
  cat $x > merged.tsv
  """
}

main.nf

Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.

nextflow.enable.dsl=2
 
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
 
params.myinput = "test.test"
 
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}

Here, the operators flatten and collect are used to transform the emitting channels. The Flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel to a List and return the resulting object as a sole emission.

Inputform files

On the Inputform files tab, edit the inputForm.json to allow selection of a file.

inputForm.json

{
  "fields": [
    {
      "id": "myinput",
      "label": "myinput",
      "type": "data",
      "dataFilter": {
        "dataType": "file",
        "dataFormat": ["TSV"]
      },
      "maxValues": 1,
      "minValues": 1
    }
  ]
}

Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.

The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.

onSubmit.js

function onSubmit(input) {
    var validationErrors = [];

    return {
        'settings': input.settings,
        'validationErrors': validationErrors
    };
}

onRender.js

function onRender(input) {

    var validationErrors = [];
    var validationWarnings = [];

    if (input.currentAnalysisSettings === null) {
        //null first time, to use it in the remainder of he javascript
        input.currentAnalysisSettings = input.analysisSettings;
    }

    switch(input.context) {
        case 'Initial': {
            renderInitial(input, validationErrors, validationWarnings);
            break;
        }
        case 'FieldChanged': {
            renderFieldChanged(input, validationErrors, validationWarnings);
            break;
        }
        case 'Edited': {
            renderEdited(input, validationErrors, validationWarnings);
            break;
        }
        default:
            return {};
    }

    return {
        'analysisSettings': input.currentAnalysisSettings,
        'settingValues': input.settingValues,
        'validationErrors': validationErrors,
        'validationWarnings': validationWarnings
    };
}

function renderInitial(input, validationErrors, validationWarnings) {
}

function renderEdited(input, validationErrors, validationWarnings) {
}

function renderFieldChanged(input, validationErrors, validationWarnings) {
}

function findField(input, fieldId){
    var fields = input.currentAnalysisSettings['fields'];
    for (var i = 0; i < fields.length; i++){
        if (fields[i].id === fieldId) {
            return fields[i];
        }
    }
    return null;
}

Click the Save button to save the changes.

Tips and Tricks

Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.

Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the to a set of process names (or labels) so that they use the higher performance local scratch storage attached to an instance instead of the shared network disk,
```
withName: 'process1|process2|process3' { scratch = '/scratch/' }
withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to shared network disk
```
When trying to test on the cloud, it's oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either using the or by creating your own scripts integrating with the REST API.
For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' enables five retries for each job, with increasing delays between each try.
```
process {
    maxRetries = 4
    errorStrategy = { sleep(task.attempt * 60000 as long); return'retry'} // Retry with increasing delay
}
```
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use which has an increasing backoff delay, allowing the system time to provide the necessary resources.
When publishing your Nextflow pipeline, make sure your have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. In case parallel tasks are executed, running time is counted once. As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, each of them taking 25, 28, and 30 minutes, respectively. Upon completion, this is followed by uploading the outputs for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (as the longest task out of three) + 1 = 86 minutes:

If there are no available resources or your project priority is low, the time before download commences will be substantially longer.

By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.

trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'

Useful Links

Analyses

An Analysis is the execution of a pipeline.

Starting Analyses

You can start an analysis from both the dedicated analysis screen or from the actual pipeline.

From Analyses

Navigate to Projects > Your_Project > Flow > Analyses.
Select Start.
Select a single Pipeline.
Configure the analysis settings.
Select Start Analysis.
Refresh to see the analysis status. See for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.

From Pipelines or Pipeline details

Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.
Select Start Analysis.
Configure analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page. See for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.

Rerunning Analyses

Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.

When rerunning an analysis, the user reference will be the original user reference (up to 231 characters), followed by _rerun_yyyy-MM-dd_HHmmss.

When there is an XML configuration change on a a pipline for which you want to rerun an analysis, ICA will display a warning and not fill out the parameters as it cannot guarantee their validity for the new XML.

Some restrictions apply when trying to rerun an analysis.

To rerun one or more analyses with te same settings:

Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.

To rerun a single analysis with modified parameters:

Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun. (at the top right)
Update the parameters you want to change.
Select Start Analysis The analysis will now be executed with the updated parameters.

Lifecycle

When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.

When an analysis is started, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.

Analysis steps logs

During the execution of an analysis, logs are produced for each process involved in the analysis lifecyle. In the analysis details view, the Steps tab is used to view the steps in near real time as they're produced in the running processes. A grid layout is used for analyses with more than 50 steps, a tiled view for analyses with 50 steps or less, though you can choose to also use the grid layout for those by means of the tile/grid button on the top right of the analysis log tab.

There are system processes involved in the lifeycle for all analyses (ie. downloading inputs, uploading outputs, etc.) and there are processes which are pipeline-specific, such as processes which execute the pipeline steps. The below table describes the system processes. You can choose to display or hide these system processes with the Show technical steps

Additional log entries will show for the processes which execute the steps defined in the pipeline.

Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.

The time between the Start Date and the End Date is used to calculate the duration. The time of the duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.

Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.

Log Files

In the analysis output folder, the ica_logs subfolder will contain the stdout and stderr files.

If you delete these files, no log information will be available on the analysis details > Steps tab.

Log Streaming

Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.

Analysis Output Mappings

Currently, this feature is only availabe when launching analyses via API.

Currently, only FOLDER type output mappings are supported

By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:

the source path on the local disk of the analysis execution environment, relative to the working directory
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project
the target path relative to the root of the project data to write the outputs

If the output directory already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis

Example

In this example, 2 analysis output mappings are specified. The analysis writes data during execution in the working directory at paths out/test and out/test2. The data contained in these folders are directed to project with ID 4d350d0f-88d8-4640-886d-5b8a23de7d81 and at paths /output-testing-01/ and /output-testing-02/ respectively, relative to the root of the project data.

The following demonstrates the construction of the request body to start an analysis with the output mappings described above:

```json
{
...
    "analysisOutput":
    [
        {
            "sourcePath": "out/test1",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-01/"
        },
        {
            "sourcePath": "out/test2",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-02/"
        }
    ]
}
```

When the analysis completes the outputs can be seen in the ICA UI, within the folders designated in the payload JSON during pipeline launch (output-testing-01 and output-testing-02).

Hyperlinking

If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link will be <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>. Likewise, workflow sessions will use the syntax <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.

Restrictions

Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). You can have up to 50 concurrent analyses running per tenant. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions.

Troubleshooting

When your analysis fails, open the analysis details view (Projects > your_project> Flow > Analyses > your_analysis) and select display failed steps. This will give you the steps view filtered on those steps that had non-0 exit codes. If there is only one failed step which has logfiles, the stderr of that step will be displayed.

Exit code 55 indicates analysis failure due to an external event such as spot termination or node draining. Retry the analysis.

Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

Create a Pipeline

Pipelines are created and stored within projects.

Navigate to Projects > your_project > Flow > Pipelines > +Create.
Select CWL Graphical, CWL code (XML / JSON) or Nextflow (XML / JSON) to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.

Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.

You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.

Status

Description

Status

Description

Status

Description

Pipeline Properties

The following sections describe the tool properties that can be configured in each tab of the pipeline editor.

Graphical vs Code definition

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:

Nextflow
CWL

Details

The details tab provides options for configuring basic information about the pipeline.

Field

Entry

Code

The name of the pipeline.

Nextflow Version

User selectable Nextflow version available only for Nextflow pipelines

Documentation

Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

Description

Machine profiles

Compute types available to use with Tools in the pipeline.

Shared settings

Settings for pipelines used in more than one tool.

Reference files

Descriptions of reference files used in the pipeline.

Input files

Descriptions of input files used in the pipeline.

Output files

Descriptions of output files used in the pipeline.

Tool

Details about the tool selected in the visualization panel.

Tool repository

A list of tools available to be used in the pipeline.

XML Configuration / JSON Inputform Files (code)

This page is used to specify all relevant information about the pipeline parameters.

Analysis Report (Graphical)

The Analysis Report tab provides options for configuring pipeline execution reports. The report is composed of widgets added to the tab.

Configure Pipeline Analysis Report (Graphical CWL Only)

The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.

[Optional] Import widgets from another pipeline.
1. Select Import from other pipeline.
2. Select the pipeline that contains the report you want to copy.
3. Select an import option: Replace current report or Append to current report.
4. Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Widget
Settings
Title
Add and format title text.
Analysis details
Add heading text and select the analysis metadata details to display.
Free text
Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.
Inline viewer
Add options to view the content of an analysis output file.
Analysis comments
Add comments that can be edited after an analysis has been performed.
Input details
Add heading text and select the input details to display. The widget includes an option to group details by input name.
Project details
Add heading text and select the project details to display.
Page break
Add a page break widget where page breaks should appear between report sections.
Select Save.

Free Text Placeholders

Placeholder

Description

[[BB_PROJECT_NAME]]

The project name.

[[BB_PROJECT_OWNER]]

The project owner.

[[BB_PROJECT_DESCRIPTION]]

The project short description.

[[BB_PROJECT_INFORMATION]]

The project information.

[[BB_PROJECT_LOCATION]]

The project location.

[[BB_PROJECT_BILLING_MODE]]

The project billing mode.

[[BB_PROJECT_DATA_SHARING]]

The project data sharing settings.

[[BB_REFERENCE]]

The analysis reference.

[[BB_USERREFERENCE]]

The user analysis reference.

[[BB_PIPELINE]]

The name of the pipeline.

[[BB_USER_OPTIONS]]

The analysis user options.

[[BB_TECH_OPTIONS]]

The analysis technical options. Technical options include the TECH suffix and are not visible to end users.

[[BB_ALL_OPTIONS]]

All analysis options. Technical options include the TECH suffix and are not visible to end users.

[[BB_SAMPLE]]

The sample.

[[BB_REQUEST_DATE]]

The analysis request date.

[[BB_START_DATE]]

The analysis start date.

[[BB_DURATION]]

The analysis duration.

[[BB_REQUESTOR]]

The user requesting analysis execution.

[[BB_RUNSTATUS]]

The status of the analysis.

[[BB_ENTITLEMENTDETAIL]]

The used entitlement detail.

[[BB_METADATA:path]]

The value or list of values of a metadata field or multi-value fields.

Metadata Model

See Metadata Models

Nextflow Files (code)

Main.nf (code)

The Nextflow project main script.

Nextflow.config (code)

The Nextflow configuration settings.

CWL files (code)

Workflow.cwl (code)

The Common Workflow Language main script.

+ Create (code)

Multiple files can be added to make pipelines more modular and manageable.

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)

Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.

Scratch space

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch.

For Nextflow pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB' will reserve 1 TiB.
For CWL, adding - class: ResourceRequirement tmpdirMin: 5000 to your requirements section will reserve 5GiB for CWL.

Avoid the following as it does not align with ICAv2 scratch space configuration.

Container overlay tmp path: /tmp
Legacy paths: /ephemeral
Environment Variables ($TMPDIR, $TEMP and $TMP)
Bash Command mktemp
CWL runtime.tmpdir

Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

Compute Type

CPUs

Mem (GB)

Nextflow (pod.value)

CWL (type, size)

standard-small

standard, small

standard-medium

standard, medium

standard-large

standard, large

standard-xlarge

standard, xlarge

standard-2xlarge

128

standard-2xlarge

standard, 2xlarge

hicpu-small

hicpu, small

hicpu-medium

hicpu, medium

hicpu-large

144

hicpu-large

hicpu, large

himem-small

himem, small

himem-medium

128

himem-medium

himem, medium

himem-large

384

himem-large

himem, large

himem-xlarge (*1)

768

himem-xlarge

himem, xlarge

hiio-small

hiio, small

hiio-medium

hiio, medium

~~fpga-small~~ (*2)

~~122~~

~~fpga-small~~

~~fpga, small~~

fpga-medium

244

fpga-medium

fpga, medium

fpga-large (*3)

976

fpga-large

fpga, large

transfer-small (*4)

transfer-small

transfer, small

transfer-medium (*4)

transfer-medium

transfer, medium

transfer-large (*4)

transfer-large

transfer, large

(*1) The compute type himem-xlarge has low availability.

(*2) The compute type fpga-small is not available. Use 'fpga-medium' instead.

(*4) The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.

Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

Select a project.
From the project menu, select Flow > Pipelines.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Configure analysis settings. See Analysis Properties.
Select Start Analysis.
View the analysis status on the Analyses page.
- Requested—The analysis is scheduled to begin.
- In Progress—The analysis is in progress.
- Succeeded—The analysis is complete.
- Failed and Failed Final—The analysis has failed or was aborted.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.

Alternatively, you can start a new analysis from Projects > <Your_Project> > Flow > Analyses

Select New Analysis
Select Pipeline
Configure analysis settings. See Analysis Properties.
Select Start Analysis.

Analysis Properties

The following sections describe the analysis properties that can be configured in each tab.

Analysis

The Analysis tab provides options for configuring basic information about the analysis.

Field

Entry

User Reference

The unique analysis name.

User tags

One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.

Entitlement Bundle

Select a subscription to charge the analysis to.

Input Files

Select the input files to use in the analysis. (max. 50,000)

Settings

Provide input settings.

View Analysis Results

You can view analysis results on the Analyses page or in the output_folder on the Data page.

Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Details tab, select the square symbol right of the output files.
From the output files view, expand the list and select an output file.
1. If you want to add or remove any user or technical tags, you can do so from the data details view.
2. If you want to download the file, select Schedule download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
- Details - View information on the pipeline configuration.
- Steps - stderr and stdout information
- Timeline Report - Nextflow process execution timeline.
- Execution Report - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.