A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. The pipeline is not copied: the project uses the actual pipeline, so any changes to it are automatically propagated to and from every project in which it is linked.
You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your bundle or activation code.
Unlinking a pipeline removes it from your project, but it remains in your tenant's list of pipelines, so it can be linked to other projects later.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines > +Create.
Select Nextflow (XML / JSON), CWL Graphical, or CWL code (XML / JSON) to create a new pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
A pipeline uses the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and adding it back.
Draft
Fully editable draft.
Release Candidate
The pipeline is ready for release. Editing is locked but the pipeline can be cloned (top right in the details view) to create a new version.
Released
The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version.
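The status rules above can be summarized in a small model. The following is a hypothetical Python sketch, not an ICA API: the `Status`, `Tool`, and `Pipeline` names are illustrative. It encodes only two documented rules (only drafts are editable, and a pipeline can be released only when all of its tools are released); it does not model the Draft to Release Candidate step or cloning.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    RELEASE_CANDIDATE = "Release Candidate"
    RELEASED = "Released"


@dataclass
class Tool:
    name: str
    status: Status


@dataclass
class Pipeline:
    name: str
    status: Status = Status.DRAFT
    tools: list[Tool] = field(default_factory=list)

    def is_editable(self) -> bool:
        # Only drafts can be edited; other statuses must be cloned instead.
        return self.status is Status.DRAFT

    def release(self) -> None:
        # A pipeline can only be released when every tool it uses is released.
        unreleased = [t.name for t in self.tools if t.status is not Status.RELEASED]
        if unreleased:
            raise ValueError(f"Unreleased tools: {', '.join(unreleased)}")
        self.status = Status.RELEASED
```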
The following sections describe the properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL, you can choose how to define the pipeline; Nextflow pipelines are always defined in code mode.
CWL Graphical
Details
Documentation
Definition
Analysis Report
Metadata Model
Report
CWL Code
Details
Documentation
Inputform files (JSON) or XML Configuration (XML)
CWL Files
Metadata Model
Report
Nextflow Code
Details
Documentation
Inputform Files (JSON) or XML Configuration (XML)
Nextflow files
Metadata Model
Report
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The details tab provides options for configuring basic information about the pipeline.
Code
The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.
Nextflow Version
User-selectable Nextflow version, available only for Nextflow pipelines.
Categories
One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.
Description
A short description of the pipeline.
Proprietary
Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.
Status
The release status of the pipeline.
Storage size
User-selectable storage size for running the pipeline. It must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.
Family
A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipelines, select Up or Down. The first pipeline listed is the default, and the remaining pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.
Version comment
A description of changes in the updated version.
Links
External reference links (maximum 100 characters for the name and 2048 characters for the link).
The following information becomes visible when viewing the pipeline details.
ID
Unique Identifier of the pipeline.
URN
Identification of the pipeline as a Uniform Resource Name.
The clone action is shown at the top right of the pipeline details. Cloning a pipeline lets you make modifications without affecting the original pipeline, and you become the owner of the clone. Because pipeline names must be unique across all projects of the tenant, you must give the clone a unique name. You may still see the same pipeline name twice when a pipeline linked from another tenant has been cloned under the same name in your tenant: each name is unique within its own tenant, but both are visible in yours.
When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.
The Documentation tab is where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab is empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
Machine profiles
Compute types available to use with Tools in the pipeline.
Shared settings
Settings shared by more than one tool in the pipeline.
Reference files
Descriptions of reference files used in the pipeline.
Input files
Descriptions of input files used in the pipeline.
Output files
Descriptions of output files used in the pipeline.
Tool
Details about the tool selected in the visualization panel.
Tool repository
A list of tools available to be used in the pipeline.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
This page is used to specify all relevant information about the pipeline parameters.
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default, AWS on-demand) or economy (AWS spot instance) tier can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using the shared storage available at /ces, which is provided with the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
| Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size) |
|---|---|---|---|---|
| standard-small | 2 | 8 | standard-small | standard, small |
| standard-medium | 4 | 16 | standard-medium | standard, medium |
| standard-large | 8 | 32 | standard-large | standard, large |
| standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge |
| standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge |
| standard-3xlarge | 64 | 256 | standard-3xlarge | standard, 3xlarge |
| hicpu-small | 16 | 32 | hicpu-small | hicpu, small |
| hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium |
| hicpu-large | 72 | 144 | hicpu-large | hicpu, large |
| himem-small | 8 | 64 | himem-small | himem, small |
| himem-medium | 16 | 128 | himem-medium | himem, medium |
| himem-large | 48 | 384 | himem-large | himem, large |
| himem-xlarge (*2) | 92 | 700 | himem-xlarge | himem, xlarge |
| hiio-small | 2 | 16 | hiio-small | hiio, small |
| hiio-medium | 4 | 32 | hiio-medium | hiio, medium |
| fpga2-medium (*1) | 24 | 256 | fpga2-medium | fpga2, medium |
| fpga-medium | 16 | 244 | fpga-medium | fpga, medium |
| fpga-large (*3) | 64 | 976 | fpga-large | fpga, large |
| transfer-small (*4) | 4 | 10 | transfer-small | transfer, small |
| transfer-medium (*4) | 8 | 15 | transfer-medium | transfer, medium |
| transfer-large (*4) | 16 | 30 | transfer-large | transfer, large |
(*1) DRAGEN pipelines running on the fpga2 compute type incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with the following volume discounts:
80 gigabases or less per sample: no discount, 0.10 iCredits per gigabase
More than 80 up to 160 gigabases per sample: 20% discount, 0.08 iCredits per gigabase
More than 160 up to 240 gigabases per sample: 30% discount, 0.07 iCredits per gigabase
More than 240 up to 320 gigabases per sample: 40% discount, 0.06 iCredits per gigabase
More than 320 gigabases per sample: 50% discount, 0.05 iCredits per gigabase
If your DRAGEN job fails, no DRAGEN license cost will be charged.
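The pricing above can be expressed as a small calculation. This is a minimal Python sketch with a hypothetical helper name, `dragen_license_cost`; it assumes that the rate of the band a sample falls into applies to all gigabases of that sample, since the text does not say whether the discounts are applied per band.

```python
# Rate bands from the list above: (upper bound in gigabases per sample, iCredits per gigabase).
# None marks the open-ended top band (more than 320 gigabases).
RATE_BANDS = [
    (80, 0.10),
    (160, 0.08),
    (240, 0.07),
    (320, 0.06),
    (None, 0.05),
]


def dragen_license_cost(gigabases: float, succeeded: bool = True) -> float:
    """Estimate the DRAGEN license cost in iCredits for one sample.

    Assumption: the band rate applies to the whole sample, not marginally per band.
    """
    if gigabases < 0:
        raise ValueError("gigabases must be non-negative")
    if not succeeded:
        return 0.0  # Failed DRAGEN jobs incur no license cost.
    for upper, rate in RATE_BANDS:
        if upper is None or gigabases <= upper:
            return gigabases * rate
    raise AssertionError("unreachable: the last band is open-ended")
```

Under this assumption, a 100-gigabase sample falls in the second band and costs 100 × 0.08 = 8 iCredits.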
The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.
[Optional] Import widgets from another pipeline.
Select Import from other pipeline.
Select the pipeline that contains the report you want to copy.
Select an import option: Replace current report or Append to current report.
Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Title
Add and format title text.
Analysis details
Add heading text and select the analysis metadata details to display.
Free text
Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.
Inline viewer
Add options to view the content of an analysis output file.
Analysis comments
Add comments that can be edited after an analysis has been performed.
Input details
Add heading text and select the input details to display. The widget includes an option to group details by input name.
Project details
Add heading text and select the project details to display.
Page break
Add a page break widget where page breaks should appear between report sections.
Select Save.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and offers only limited performance benefits because it is not local to the compute node.
For better integration, use the shared storage available at /ces. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GiB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
| Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size) |
|---|---|---|---|---|
| standard-small | 2 | 8 | standard-small | standard, small |
| standard-medium | 4 | 16 | standard-medium | standard, medium |
| standard-large | 8 | 32 | standard-large | standard, large |
| standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge |
| standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge |
| hicpu-small | 16 | 32 | hicpu-small | hicpu, small |
| hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium |
| hicpu-large | 72 | 144 | hicpu-large | hicpu, large |
| himem-small | 8 | 64 | himem-small | himem, small |
| himem-medium | 16 | 128 | himem-medium | himem, medium |
| himem-large | 48 | 384 | himem-large | himem, large |
| himem-xlarge (*2) | 92 | 700 | himem-xlarge | himem, xlarge |
| hiio-small | 2 | 16 | hiio-small | hiio, small |
| hiio-medium | 4 | 32 | hiio-medium | hiio, medium |
| fpga2-medium (*1) | 24 | 256 | fpga2-medium | fpga2, medium |
| fpga-medium | 16 | 244 | fpga-medium | fpga, medium |
| fpga-large (*3) | 64 | 976 | fpga-large | fpga, large |
| transfer-small (*4) | 4 | 10 | transfer-small | transfer, small |
| transfer-medium (*4) | 8 | 15 | transfer-medium | transfer, medium |
| transfer-large (*4) | 16 | 30 | transfer-large | transfer, large |
(*2) The compute type himem-xlarge has low availability.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
The Nextflow project main script.
The Nextflow configuration settings.
The Common Workflow Language main script.
Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.
See Metadata Models
Here you can define patterns for detecting report files in the analysis output. When you open an analysis result window of this pipeline, an additional tab displays these report files. The goal is to provide a pipeline-specific, user-friendly representation of the analysis result.
To add a report, select the + symbol on the left side. Provide a unique name for the report and a regular expression matching the report file, and optionally select the report format. The format must be the source format of the report data generated during the analysis.
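Before saving a report definition, it can help to test the regular expression against some expected output paths. The following Python sketch uses hypothetical report names and paths, and assumes the pattern is searched anywhere within each output file path; if ICA matches against the file name only or requires a full match, adjust the patterns accordingly.

```python
import re

# Hypothetical report definitions: unique report name -> regular expression.
REPORT_PATTERNS = {
    "multiqc": r"multiqc_report\.html$",
    "summary": r"/reports/[^/]+_summary\.csv$",
}


def match_reports(output_paths: list[str]) -> dict[str, list[str]]:
    """Return, for each report name, the output paths its pattern matches."""
    compiled = {name: re.compile(pattern) for name, pattern in REPORT_PATTERNS.items()}
    # Assumption: a pattern matches if it is found anywhere in the path (re.search).
    return {
        name: [path for path in output_paths if regex.search(path)]
        for name, regex in compiled.items()
    }
```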
Use the following instructions to start a new analysis for a single pipeline.
Select Projects > your_project > Flow > Pipelines.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Configure analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page.
Requested: The analysis is scheduled to begin.
In Progress: The analysis is in progress.
Succeeded: The analysis is complete.
Failed: The analysis has failed.
Aborted: The analysis was aborted before completing.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
The Analysis tab provides options for configuring basic information about the analysis.
User Reference
The unique analysis name.
User tags
One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Pricing
Select a subscription to which the analysis will be charged.
Notification
Enter your email address if you want to be notified when the analysis completes.
Output Folder
Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen).
Input
Select the input files to use in the analysis. (max. 50,000)
Settings
Provide input settings.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
You can view analysis results on the Analyses page or in the output_folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Details tab, select the square symbol to the right of the output files.
From the output files view, expand the list and select an output file.
If you want to add or remove any user or technical tags, you can do so from the data details view.
If you want to download the file, select Schedule download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
Details - View information on the pipeline configuration.
Steps - stderr and stdout information
Nextflow Timeline - Nextflow process execution timeline.
Nextflow Execution - Nextflow analysis report, showing the run times, commands, resource usage and tasks for Nextflow analyses.
Report - Shows the reports defined on the pipeline report tab.
ICA supports running pipelines defined using Nextflow. See this tutorial for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
Nextflow version
20.10.0 (deprecated *), 22.04.3, 24.10.2 (Experimental)
Executor
Kubernetes
(*) Pipelines that use 20.10.0 will continue to run after it is deprecated, but you will no longer be able to select it when creating new pipelines.
You can select the Nextflow version while building a pipeline as follows:
GUI
Select the Nextflow version at Projects > your_project > Flow > Pipelines > your_pipeline > Details tab.
API
Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId".
When not set, a default Nextflow version will be used for the pipeline.
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default, AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found here. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'
Inputs are specified via the XML input form or the JSON-based input form. The specified code in the XML corresponds to the field in the params object that is available in the workflow. Refer to the tutorial for an example.
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy, or move data to the correct folder. Data is uploaded to the ICA project after the pipeline execution completes.
publishDir 'out', mode: 'symlink'
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
The following configuration settings will be ignored if provided as they are overridden by the system:
executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file
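Because these settings are silently overridden, it can be useful to flag them in a nextflow.config before uploading it. The following Python sketch is a hypothetical helper with a deliberately simplified parser: it recognizes `scope.key = value` lines and one level of `scope { key = value }` blocks, but not comments, nested blocks, or the other Groovy syntax that Nextflow configuration files allow.

```python
import re

# Settings that ICA overrides, copied from the list above.
IGNORED = {
    "executor.name", "executor.queueSize",
    "k8s.namespace", "k8s.serviceAccount", "k8s.launchDir", "k8s.projectDir",
    "k8s.workDir", "k8s.storageClaimName", "k8s.storageMountPath",
    "trace.enabled", "trace.file", "trace.fields",
    "timeline.enabled", "timeline.file",
    "report.enabled", "report.file",
    "dag.enabled", "dag.file",
}

ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=")
BLOCK_START = re.compile(r"^\s*([A-Za-z_]\w*)\s*\{\s*$")


def find_ignored_settings(config_text: str) -> list[str]:
    """List the settings in a nextflow.config that ICA will ignore (simplified parser)."""
    found = []
    scope = None  # name of the enclosing 'scope { ... }' block, if any
    for line in config_text.splitlines():
        if scope is None and (m := BLOCK_START.match(line)):
            scope = m.group(1)
            continue
        if scope is not None and line.strip() == "}":
            scope = None
            continue
        if m := ASSIGNMENT.match(line):
            key = f"{scope}.{m.group(1)}" if scope else m.group(1)
            if key in IGNORED:
                found.append(key)
    return found
```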
ICA supports running pipelines defined using Common Workflow Language (CWL).
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.
requirements:
  ResourceRequirement:
    https://platform.illumina.com/rdf/ica/resources:type: fpga
    https://platform.illumina.com/rdf/ica/resources:size: medium
    https://platform.illumina.com/rdf/ica/resources:tier: standard
Refer to Compute Types for the available compute types and sizes.
The ICA Compute Type will be determined automatically based on CWL ResourceRequirement coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the Compute Types table).
For example, take the following ResourceRequirements:
requirements:
  ResourceRequirement:
    ramMin: 10240
    coresMin: 6
This would result in a best-fit ICA Compute Type request of standard-large for the tool.
If the specified requirements cannot be met by any of the presets, the task is rejected and fails.
FPGA requirements cannot be set by means of the CWL ResourceRequirement coresMin/coresMax and ramMin/ramMax values.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
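The best-fit selection can be illustrated with a small Python sketch. ICA's actual selection logic is not described here, so this version is an assumption: it keeps the presets from the Compute Types table that satisfy both minimums and picks the one with the fewest CPUs, then the least memory. FPGA and transfer presets are left out, and coresMax/ramMax are ignored. CWL expresses ramMin in mebibytes, so the value is converted to GiB before comparing. `best_fit` is a hypothetical helper name.

```python
from typing import Optional

# (name, CPUs, memory in GiB) for the general-purpose presets in the Compute Types table.
PRESETS = [
    ("standard-small", 2, 8),
    ("standard-medium", 4, 16),
    ("standard-large", 8, 32),
    ("standard-xlarge", 16, 64),
    ("standard-2xlarge", 32, 128),
    ("standard-3xlarge", 64, 256),
    ("hicpu-small", 16, 32),
    ("hicpu-medium", 36, 72),
    ("hicpu-large", 72, 144),
    ("himem-small", 8, 64),
    ("himem-medium", 16, 128),
    ("himem-large", 48, 384),
    ("himem-xlarge", 92, 700),
    ("hiio-small", 2, 16),
    ("hiio-medium", 4, 32),
]


def best_fit(cores_min: float, ram_min_mib: float) -> Optional[str]:
    """Return the assumed best-fit preset name, or None if no preset is large enough."""
    ram_min_gib = ram_min_mib / 1024  # CWL ramMin is expressed in mebibytes
    candidates = [
        (cpus, mem, name)
        for name, cpus, mem in PRESETS
        if cpus >= cores_min and mem >= ram_min_gib
    ]
    # Fewest CPUs first, then least memory (an assumed ordering).
    return min(candidates)[2] if candidates else None
```

With the example above, `best_fit(6, 10240)` returns `"standard-large"`, and a request that no preset can satisfy returns `None`, mirroring the rejection rule.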
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
ICA supports overriding workflow requirements at load time using Command Line Interface (CLI) with JSON input. Please refer to CWL documentation for more details on the CWL overrides feature.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
    "tool-fqTOfa.cwl": {
      "requirements": {
        "EnvVarRequirement": {
          "envDef": {
            "MESSAGE": "override_value"
          }
        }
      }
    }
  }
}' --type-input JSON --user-reference overrides-example
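Hand-writing nested JSON inside a single-quoted shell argument is error-prone. As an alternative, the input JSON can be generated with Python's standard json module. This sketch rebuilds the same command as the example above, reusing only the flags and values shown there; it prints a shell-quoted command line rather than running it.

```python
import json
import shlex

# Input JSON from the example: the ipFQ file input plus a cwltool:overrides entry
# that replaces the EnvVarRequirement of tool-fqTOfa.cwl at load time.
input_json = {
    "ipFQ": {"class": "File", "path": "test.fastq"},
    "cwltool:overrides": {
        "tool-fqTOfa.cwl": {
            "requirements": {
                "EnvVarRequirement": {"envDef": {"MESSAGE": "override_value"}}
            }
        }
    },
}

# Same icav2 invocation as above, as an argument list.
command = [
    "icav2", "projectpipelines", "start", "cwl", "cli-tutorial",
    "--data-id", "fil.a725a68301ee4e6ad28908da12510c25",
    "--input-json", json.dumps(input_json),
    "--type-input", "JSON",
    "--user-reference", "overrides-example",
]

# shlex.quote escapes each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

Passing the `command` list directly to `subprocess.run(command)` would avoid shell quoting altogether.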
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>
The input files are specified within a single dataInputs node. Each individual input is specified in a separate dataInput node, which has the following attributes:
code: a unique ID. Required.
format: the format of the input, such as FASTA, TXT, JSON, or UNKNOWN. Multiple formats can be given as a comma-separated list (see the example below). Required.
type: whether the input is a FILE or a DIRECTORY. Only one value is allowed. Required.
required: whether this input is required to run the pipeline. Required.
multiValue: whether multiple files are allowed as input. Required.
dataFilter: TBD. Optional.
In addition, a dataInput node has two child elements: label, for labelling the input, and description, for a free-text description of the input.
An example of a single file input which can be in a TXT, CSV, or FASTA format.
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
    <pd:label>Input file</pd:label>
    <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
To use a folder as an input the following form is required:
<pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
    <pd:label>fastq folder path</pd:label>
    <pd:description>Providing Fastq folder</pd:description>
</pd:dataInput>
For multiple files, set the multiValue attribute to true. The variable is then treated as a list ([]), so adapt your pipeline when changing an input from single-value to multiValue.
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in their filenames to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
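The attribute rules above can be checked locally before pasting a form into the XML Configuration tab. This is a hypothetical Python sketch using the standard xml.etree.ElementTree module; it does not replace validation against the ICA input form schema. It assumes boolean attributes use lowercase true/false, as in the examples, and it declares the pd prefix explicitly because the snippets above rely on a declaration in the enclosing pipeline element.

```python
import xml.etree.ElementTree as ET

PD_NS = "xsd://www.illumina.com/ica/cp/pipelinedefinition"
REQUIRED_ATTRS = ("code", "format", "type", "required", "multiValue")

# The folder example from above, wrapped in a dataInputs node that declares the pd prefix.
FORM = f"""<pd:dataInputs xmlns:pd="{PD_NS}">
    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
        <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>
</pd:dataInputs>"""


def check_data_inputs(xml_text: str) -> list[str]:
    """Return a list of problems found in dataInput nodes (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    for node in root.iter(f"{{{PD_NS}}}dataInput"):
        code = node.get("code", "<no code>")
        # Every mandatory attribute must be present.
        for attr in REQUIRED_ATTRS:
            if node.get(attr) is None:
                problems.append(f"{code}: missing required attribute {attr}")
        # type accepts exactly one value: FILE or DIRECTORY.
        if node.get("type") not in (None, "FILE", "DIRECTORY"):
            problems.append(f"{code}: type must be FILE or DIRECTORY")
        # required and multiValue are booleans.
        for flag in ("required", "multiValue"):
            if node.get(flag) not in (None, "true", "false"):
                problems.append(f"{code}: {flag} must be true or false")
    return problems
```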
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The nodes must follow the hierarchy steps > step > tool > parameter. The parameter node must contain the following attributes:
code: a unique ID. This is the parameter name that is passed to the workflow.
minValues: the minimum number of values that must be specified for this setting. If the setting is required, set minValues to 1.
maxValues: the maximum number of values that can be specified for this setting.
classification: whether this setting is specified by the user (USER in the examples below).
In the code below a string setting with the identifier inp1 is specified.
<pd:steps>
    <pd:step execution="MANDATORY" code="General">
        <pd:label>General</pd:label>
        <pd:description>General parameters</pd:description>
        <pd:tool code="generalparameters">
            <pd:label>generalparameters</pd:label>
            <pd:description></pd:description>
            <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                <pd:label>inp1</pd:label>
                <pd:description>first</pd:description>
                <pd:stringType/>
                <pd:value></pd:value>
            </pd:parameter>
        </pd:tool>
    </pd:step>
</pd:steps>
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>
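A similar check can be applied to settings. This hypothetical Python sketch reads the ht_seed_len parameter from above and checks candidate values against minValues/maxValues and the integerType minimumValue/maximumValue range, which it treats as inclusive (an assumption). The ICA UI performs its own validation; a check like this is mainly useful for testing a form or for validating values inside a pipeline.

```python
import xml.etree.ElementTree as ET

PD_NS = "xsd://www.illumina.com/ica/cp/pipelinedefinition"

# The ht_seed_len parameter from above; the pd prefix is declared so it parses on its own.
PARAM = f"""<pd:parameter xmlns:pd="{PD_NS}" code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>"""


def check_integer_values(xml_text: str, values: list[int]) -> list[str]:
    """Check candidate values against minValues/maxValues and the integerType range."""
    ns = {"pd": PD_NS}
    param = ET.fromstring(xml_text)
    problems = []
    # minValues and maxValues are mandatory attributes of a parameter node.
    lo, hi = int(param.attrib["minValues"]), int(param.attrib["maxValues"])
    if not lo <= len(values) <= hi:
        problems.append(f"expected {lo} to {hi} values, got {len(values)}")
    int_type = param.find("pd:integerType", ns)
    if int_type is None:
        return problems + ["parameter is not an integerType"]
    minimum, maximum = int_type.get("minimumValue"), int_type.get("maximumValue")
    for value in values:
        if minimum is not None and value < int(minimum):
            problems.append(f"{value} is below minimumValue {minimum}")
        if maximum is not None and value > int(maximum):
            problems.append(f"{value} is above maximumValue {maximum}")
    return problems
```

The same check can be run on the default in the value tag to catch a default that falls outside the allowed range.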
Options types can be used to present options in a drop-down list in the UI. The selected option is passed to the workflow as a string. However, this currently has no impact when launching from the API.
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description>DRAGEN implements multiple segmentation algorithms, including Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <pd:option>CBS</pd:option>
        <pd:option>SLM</pd:option>
        <pd:option>HSLM</pd:option>
        <pd:option>ASLM</pd:option>
    </pd:optionsType>
    <pd:value></pd:value>
</pd:parameter>
Option types can also be used for a choice between two values, much like a boolean, for example:
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>
For a string setting, use the following schema with a stringType element.
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>
For a boolean setting, use a booleanType element.
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>
One known limitation of the schema presented above is that a parameter cannot have multiple types, for example File or String. One way to implement this requirement is to define two optional parameters: one for File input and one for String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below are a main.nf and an XML configuration of a generic pipeline with two optional inputs, which you can use as a template for similar cases. If the file parameter is set, it is used. If the str parameter is set but file is not, str is used. If neither is set, the pipeline aborts with an informative error message.
nextflow.enable.dsl = 2
// Define parameters with default values (false means "not provided")
params.file = false
params.str = false
// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}
// Print the contents of the file input
process printInputs {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    input:
    path input_file
    script:
    """
    echo "File contents:"
    cat $input_file
    """
}
// Print the string input
process printInputs2 {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    input:
    val input_str
    script:
    """
    echo "String input: $input_str"
    """
}
workflow {
    if (params.file) {
        // The file parameter takes precedence when both are set
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        printInputs(file_ch)
    }
    else {
        // Fall back to the string parameter
        str_ch = Channel.of(params.str)
        str_ch.view()
        printInputs2(str_ch)
    }
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>
Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.
Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.
Use + Create to add additional files and Simulate to test your inputForms.
Scripting supports cross-field validation of values, hiding fields, making fields required, and similar behavior based on value changes.
The JSON schema that defines the input parameters. See the inputForm.json page for syntax details.
textbox
Corresponds to stringType in XML.
checkbox
A checkbox that can be made required, so it can serve as an active consent feature (corresponds to booleanType in XML).
radio
A radio button group to select one from a list of choices. The values to choose from must be unique.
select
A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.
number
The value is of Number type in JavaScript and Double type in Java (corresponds to doubleType in XML).
integer
Corresponds to Java Integer.
data
Data such as files.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
text
To display informational messages. No values are to be assigned to these fields.
fieldgroup
Can contain parameters or other groups. Use it for repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. If you want the same elements to appear multiple times in your form, combine them into a fieldgroup.
These attributes can be used to configure all parameter types.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimum number of values that must be present. Defaults to 0 when not set. Set to 1 or more to make the field required.
maxValues
The maximum number of values allowed. Defaults to 1 when not set.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
value
The value of the parameter. It acts as the default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Minimum allowed value for the 'integer' and 'number' types.
For 'integer' type fields, the minimum and maximum values are -100000000000000000 and 100000000000000000.
For 'number' type fields, the maximum precision is 15 significant digits and the exponent must be between -300 and +300.
max
Maximum allowed value for the 'integer' and 'number' types.
For 'integer' type fields, the minimum and maximum values are -100000000000000000 and 100000000000000000.
For 'number' type fields, the maximum precision is 15 significant digits and the exponent must be between -300 and +300.
choices
A list of choices. Each choice has a "value" and a "text" (its label), and optionally "selected" (at most one choice may be true) and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used to make a choice conditionally present based on the values of other fields. Parent and value must be unique; you cannot use the same value for both.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. Its properties are nameFilter, dataFormat and dataType.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message shown when the value does not adhere to the "regex". A default message is used if this parameter is not present. Setting it is highly recommended, as the default message shows the regex, which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
dropValueWhenDisabled
When this is present and true, and the field is disabled, the value is omitted during submit handling (from the onSubmit result).
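The choices rules above are easy to break as a choice list grows. The following is a minimal sketch of a hypothetical offline helper (validateChoices is not an ICA function) that checks a choices array for unique values, at most one selected choice, and parents that refer to another choice's value. It assumes that "parent" must match the value of another choice in the same list.

```javascript
// Hypothetical helper, not part of ICA: checks a "choices" array against the
// rules listed above before you paste it into inputForm.json.
function validateChoices(choices) {
  const problems = [];
  const values = new Set();
  let selectedCount = 0;
  choices.forEach((c, i) => {
    if (c.value === undefined || c.text === undefined) {
      problems.push(`choice ${i} is missing "value" or "text"`);
      return;
    }
    // Values must be unique within the field.
    if (values.has(c.value)) problems.push(`duplicate value: ${c.value}`);
    values.add(c.value);
    if (c.selected === true) selectedCount++;
  });
  // At most one choice may be pre-selected.
  if (selectedCount > 1) {
    problems.push(`${selectedCount} choices are selected, at most 1 is allowed`);
  }
  // A parent must be another choice's value, never the choice's own value.
  for (const c of choices) {
    if (c.parent === undefined) continue;
    if (c.parent === c.value) problems.push(`choice ${c.value} uses its own value as parent`);
    else if (!values.has(c.parent)) problems.push(`unknown parent: ${c.parent}`);
  }
  return problems;
}

// Example: a two-level choice tree with one default selection.
const tissueChoices = [
  { value: 'blood', text: 'Blood' },
  { value: 'pbmc', text: 'PBMC', parent: 'blood' },
  { value: 'tumor', text: 'Tumor', selected: true }
];
console.log(validateChoices(tissueChoices)); // []
```

An empty array means no rule was violated; each returned string describes one problem.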
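Streamable inputs
Adding "streamable":true to an input field of type "data" makes it a streamable input: the data input files are offered to the pipeline in streaming mode instead of being downloaded. This parameter is experimental.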
The onSubmit.js JavaScript function receives an input object that holds information about the chosen values of the input form, the pipeline, and the pipeline execution request parameters. This function is triggered not only when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
settings
The value of the setting fields. Corresponds to settingValues in onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues, like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form JSON as saved in the pipeline, that is, the original JSON without any changes.
currentAnalysisSettings
The current input form JSON as rendered to the user. This can contain changes already applied by earlier onRender passes. Null in the first call, when context is 'Initial', or when the analysis is created through the CLI/API.
settings
The value of the setting fields. This allows modifying the values, applying defaults, or using information from the pipeline or analysis input objects. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.
validationErrors
A list of AnalysisError messages representing validation errors. A pipeline execution request cannot be submitted while validation errors remain.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI when viewing the analysis.
AnalysisError is the object used for representing validation errors.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use storageSize as the fieldId.
index / Index
The zero-based index of the incorrect value. Use this when a particular value of a multi-value field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
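The following is a minimal onSubmit.js sketch that uses the input and return objects described above: it validates each value of a multi-value field, reports a bad value through fieldId and index, and fills in a default. The field ids "sampleNames" and "minQuality", the name pattern and the default of 20 are hypothetical values chosen for illustration.

```javascript
// Sketch of an onSubmit.js. The field ids "sampleNames" (multi-value textbox)
// and "minQuality" (integer) are hypothetical.
function onSubmit(input) {
  var validationErrors = [];
  var names = input.settings['sampleNames'];
  // Normalise to an array: single values may arrive unwrapped.
  if (names === null || names === undefined) {
    names = [];
  } else if (!Array.isArray(names)) {
    names = [names];
  }
  for (var i = 0; i < names.length; i++) {
    if (!/^[A-Za-z0-9_-]+$/.test(names[i])) {
      // "index" marks only the offending value instead of the whole field.
      validationErrors.push({
        fieldId: 'sampleNames',
        index: i,
        message: 'Sample names may only contain letters, digits, "_" and "-".'
      });
    }
  }
  // Apply a default when the user left "minQuality" empty.
  if (input.settings['minQuality'] === null || input.settings['minQuality'] === undefined) {
    input.settings['minQuality'] = 20;
  }
  return {
    'settings': input.settings,
    'validationErrors': validationErrors
  };
}
```

Because the function also runs for API submissions, checks placed here apply to runs started outside the UI as well.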
The onRender.js JavaScript function receives an input object that contains information about the current state of the input form, the chosen values, and the field value change that triggered the onRender call. It also contains pipeline information. Return changed objects in the onRender return value object; any object not present is considered unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as the changed field.
context
"Initial", "FieldChanged" or "Edited".
Initial is the value when the form is first displayed, when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (Not yet supported in ICA) is used when a form is displayed later again, this is intended for draft runs or when editing the form during reruns.
changedFieldId
The id of the field that changed and triggered this onRender call; context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form JSON as rendered to the user. This can contain changes already applied by earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
Validation errors and validation warnings can use 'storageSize' as fieldId to make an error appear on the storage size field. 'storageSize' is also the value of changedFieldId when the user alters the chosen storage size.
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use storageSize as the fieldId.
index / Index
The zero-based index of the incorrect value. Use this when a particular value of a multi-value field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
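The following is a minimal onRender.js sketch combining the input and return fields described above: it shows and requires a data field only while a checkbox is ticked, reports a cross-field error, and attaches a warning to the storage size when that changes. The field ids "useCustomReference" and "customReference" and the message texts are hypothetical; the checkbox is assumed to have "updateRenderOnChange": true in inputForm.json. It sticks to ES5 syntax, like the default scripts.

```javascript
// Sketch of an onRender.js. Hypothetical field ids: "useCustomReference"
// (checkbox with "updateRenderOnChange": true) and "customReference" (data).
function onRender(input) {
  var validationErrors = [];
  var validationWarnings = [];
  // currentAnalysisSettings is null on the first call: start from the saved form.
  var form = input.currentAnalysisSettings || input.analysisSettings;
  var useCustom = input.settingValues['useCustomReference'] === true;

  for (var i = 0; i < form.fields.length; i++) {
    if (form.fields[i].id === 'customReference') {
      form.fields[i].hidden = !useCustom;           // show only when ticked
      form.fields[i].minValues = useCustom ? 1 : 0; // required only when shown
    }
  }

  // Cross-field check between the checkbox and the file selection.
  var ref = input.settingValues['customReference'];
  var noRef = ref === null || ref === undefined || (Array.isArray(ref) && ref.length === 0);
  if (useCustom && noRef) {
    validationErrors.push({ fieldId: 'customReference', message: 'Select a reference file.' });
  }

  // Changing the storage size triggers onRender with changedFieldId "storageSize".
  if (input.changedFieldId === 'storageSize' && input.storageSize) {
    validationWarnings.push({
      fieldId: 'storageSize',
      message: 'Storage size changed to ' + input.storageSize.name + '; check that it fits your inputs.'
    });
  }

  return {
    'analysisSettings': form,
    'settingValues': input.settingValues,
    'validationErrors': validationErrors,
    'validationWarnings': validationWarnings
  };
}
```

Returning the modified form as analysisSettings is what makes the hidden and minValues changes visible in the UI.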
{
"$id": "#ica-pipeline-input-form",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ICA Pipeline Input Forms",
"description": "Describes the syntax for defining input setting forms for ICA pipelines",
"type": "object",
"additionalProperties": false,
"properties": {
"fields": {
"description": "The list of setting fields",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
}
},
"required": [
"fields"
],
"definitions": {
"ica_pipeline_input_form_field": {
"$id": "#ica_pipeline_input_form_field",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
"type": "string",
"pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
},
"type": {
"type": "string",
"enum": [
"textbox",
"checkbox",
"radio",
"select",
"number",
"integer",
"data",
"section",
"text",
"fieldgroup"
]
},
"label": {
"type": "string"
},
"minValues": {
"description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
"type": "integer",
"minimum": 0
},
"maxValues": {
"description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
"type": "integer",
"exclusiveMinimum": 0
},
"minMaxValuesMessage": {
"description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
"type": "string"
},
"helpText": {
"type": "string"
},
"placeHolderText": {
"description": "An optional short hint (a word or short phrase) to aid the user when the field has no value.",
"type": "string"
},
"value": {
"description": "The value for the field. Can be an array for multi-value fields. For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. For 'integer' type values the value needs to be between -100000000000000000 and 100000000000000000."
},
"minLength": {
"type": "integer",
"minimum": 0
},
"maxLength": {
"type": "integer",
"exclusiveMinimum": 0
},
"min": {
"description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"max": {
"description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"choices": {
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field_choice"
}
},
"fields": {
"description": "The list of setting sub fields for type fieldgroup",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
},
"dataFilter": {
"description": "For defining the filtering when type is 'data'.",
"type": "object",
"additionalProperties": false,
"properties": {
"nameFilter": {
"description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
"type": "string"
},
"dataFormat": {
"description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
"type": "array",
"contains": {
"type": "string"
}
},
"dataType": {
"description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
"type": "string",
"enum": [
"file",
"directory"
]
}
}
},
"regex": {
"type": "string"
},
"regexErrorMessage": {
"type": "string"
},
"hidden": {
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"emptyValuesAllowed": {
"type": "boolean",
"description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
},
"updateRenderOnChange": {
"type": "boolean",
"description": "When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false."
},
"streamable": {
"type": "boolean",
"description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
}
},
"required": [
"id",
"type"
],
"allOf": [
{
"if": {
"description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"textbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"checkbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"radio"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"select"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
"properties": {
"type": {
"enum": [
"number",
"integer"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"regex",
"regexErrorMessage",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"data"
]
}
},
"required": [
"type"
]
},
"then": {
"required": [
"dataFilter"
],
"propertyNames": {
"not": {
"enum": [
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"max",
"min",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"section",
"text"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"disabled",
"fields",
"updateRenderOnChange",
"classification",
"value",
"minValues",
"maxValues",
"minMaxValuesMessage",
"dataFilter",
"choices",
"regex",
"placeHolderText",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"fieldgroup"
]
}
},
"required": [
"type",
"fields"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
}
]
},
"ica_pipeline_input_form_field_choice": {
"$id": "#ica_pipeline_input_form_field_choice",
"type": "object",
"additionalProperties": false,
"properties": {
"value": {
"description": "The value which will be set when selecting this choice. Must be unique over the choices within a field"
},
"text": {
"description": "The display text for this choice, similar as the label of a field. ",
"type": "string"
},
"selected": {
"description": "Optional. When true, this choice value is picked as the default selected value. selected=true takes precedence over a 'value' set on the field. For clarity, it is better not to use 'selected' but to use the field's 'value', as is done to set default values for the other field types. At most 1 choice may have selected set to true.",
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"parent": {
"description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
}
},
"required": [
"value",
"text"
]
}
}
}
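Because the per-type restrictions in the schema are spread across many if/then blocks, the following minimal sketch collects them into one table and checks a list of fields against it. checkFields and FORBIDDEN are hypothetical names. The sketch covers only the required id and type, the per-type forbidden properties, the dataFilter requirement for 'data', and, following the descriptions, 'fields' as required for 'fieldgroup'. It does not replace validating inputForm.json with a real draft-07 JSON Schema validator such as Ajv.

```javascript
// Minimal sketch: re-implements only the per-type "not allowed" and "required"
// rules of the schema above. It is not a JSON Schema validator.
const NUMERIC = ['dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength', 'minLength'];
const DISPLAY_ONLY = ['disabled', 'fields', 'updateRenderOnChange', 'classification', 'value',
  'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'regex',
  'placeHolderText', 'regexErrorMessage', 'maxLength', 'minLength', 'max', 'min'];
const FORBIDDEN = {
  textbox: ['dataFilter', 'fields', 'choices', 'max', 'min'],
  checkbox: ['dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage',
    'maxLength', 'minLength', 'max', 'min'],
  radio: ['dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage',
    'maxLength', 'minLength', 'max', 'min'],
  select: ['dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max', 'min'],
  number: NUMERIC,
  integer: NUMERIC,
  data: ['fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'max', 'min',
    'maxLength', 'minLength'],
  section: DISPLAY_ONLY,
  text: DISPLAY_ONLY,
  fieldgroup: ['dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage',
    'maxLength', 'minLength', 'max', 'min']
};

function checkFields(fields, path = 'fields') {
  const problems = [];
  fields.forEach((field, i) => {
    const where = `${path}[${i}]`;
    if (!field.id || !field.type) {
      problems.push(`${where}: "id" and "type" are required`);
      return;
    }
    const forbidden = FORBIDDEN[field.type];
    if (!forbidden) {
      problems.push(`${where}: unknown type "${field.type}"`);
      return;
    }
    for (const key of Object.keys(field)) {
      if (forbidden.includes(key)) problems.push(`${where} (${field.id}): "${key}" not allowed for ${field.type}`);
    }
    if (field.type === 'data' && !field.dataFilter) {
      problems.push(`${where} (${field.id}): "dataFilter" is required`);
    }
    if (field.type === 'fieldgroup') {
      // Sub fields of a fieldgroup follow the same rules, so recurse into them.
      if (!Array.isArray(field.fields)) problems.push(`${where} (${field.id}): "fields" is required`);
      else problems.push(...checkFields(field.fields, `${where}.fields`));
    }
  });
  return problems;
}

// Example: a textbox must not declare "min".
console.log(checkFields([{ id: 'name', type: 'textbox', min: 1 }]));
```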
Let's create the Nextflow Scatter Gather pipeline with a JSON input form.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
process split {
    cpus 1
    memory '512 MB'
    input:
    path x
    output:
    path("split.*.tsv")
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
}
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
process sort {
    cpus 1
    memory '512 MB'
    input:
    path x
    output:
    path '*.sorted.tsv'
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}
Select +Create again and label the file merge.nf. Copy and paste the following definition.
process merge {
    cpus 1
    memory '512 MB'
    publishDir 'out', mode: 'move'
    input:
    path x
    output:
    path 'merged.tsv'
    """
    cat $x > merged.tsv
    """
}
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
nextflow.enable.dsl=2
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
params.myinput = "test.test"
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}
Here, the flatten and collect operators transform the emitted channels. The flatten operator transforms a channel so that every item of type Collection or Array is flattened and each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns it as a single emission.
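To make the scatter-gather data flow concrete, the following is a plain-JavaScript model of main.nf; it is not Nextflow and does not run processes. Channels are modelled as arrays of emissions and each chunk file as an array of TSV rows: split emits one item (the list of chunks), flatten() turns it into one item per chunk, sort runs once per chunk, and collect() gathers the sorted chunks into a single emission for merge. One simplification: in Nextflow, collect() gathers items in the order the sort tasks finish, so the chunk order in merged.tsv can vary, whereas this model keeps input order.

```javascript
// A plain-JavaScript model of the channel flow in main.nf (this is not Nextflow).
// Each line is one TSV row; chunk size 3 mirrors split -l3.
function splitLines(lines, chunkSize) {
  const chunks = [];
  for (let i = 0; i < lines.length; i += chunkSize) {
    chunks.push(lines.slice(i, i + chunkSize)); // one "split.NNN.tsv" per chunk
  }
  return chunks;
}

function sortChunk(chunk) {
  // Approximates sort -gk1,1: order rows numerically by the first column.
  const key = (row) => parseFloat(row.split('\t')[0]);
  return [...chunk].sort((a, b) => key(a) - key(b));
}

function scatterGather(lines) {
  const splitChannel = [splitLines(lines, 3)];    // split emits one item: the list of chunk files
  const flattened = splitChannel.flat();          // flatten(): one item per chunk file
  const sortedChannel = flattened.map(sortChunk); // sort runs once per chunk (scatter)
  const collected = [sortedChannel];              // collect(): a single list emission (gather)
  return collected[0].flat();                     // merge: cat the chunk files into merged.tsv
}

console.log(scatterGather(['9\tx', '1\ta', '2\tb', '3\tc']));
```

For this input the result is the rows 1, 2, 9, 3: each chunk is sorted, but because merge only concatenates the chunks with cat, merged.tsv is not globally sorted.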
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
{
    "fields": [
        {
            "id": "myinput",
            "label": "myinput",
            "type": "data",
            "dataFilter": {
                "dataType": "file",
                "dataFormat": ["TSV"]
            },
            "maxValues": 1,
            "minValues": 1
        }
    ]
}
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js files can keep their default scripts; they are shown here for reference only.
function onSubmit(input) {
    var validationErrors = [];
    return {
        'settings': input.settings,
        'validationErrors': validationErrors
    };
}
function onRender(input) {
    var validationErrors = [];
    var validationWarnings = [];
    if (input.currentAnalysisSettings === null) {
        //null the first time, set it to use it in the remainder of the javascript
        input.currentAnalysisSettings = input.analysisSettings;
    }
    switch(input.context) {
        case 'Initial': {
            renderInitial(input, validationErrors, validationWarnings);
            break;
        }
        case 'FieldChanged': {
            renderFieldChanged(input, validationErrors, validationWarnings);
            break;
        }
        case 'Edited': {
            renderEdited(input, validationErrors, validationWarnings);
            break;
        }
        default:
            return {};
    }
    return {
        'analysisSettings': input.currentAnalysisSettings,
        'settingValues': input.settingValues,
        'validationErrors': validationErrors,
        'validationWarnings': validationWarnings
    };
}
function renderInitial(input, validationErrors, validationWarnings) {
}
function renderEdited(input, validationErrors, validationWarnings) {
}
function renderFieldChanged(input, validationErrors, validationWarnings) {
}
function findField(input, fieldId){
    var fields = input.currentAnalysisSettings['fields'];
    for (var i = 0; i < fields.length; i++){
        if (fields[i].id === fieldId) {
            return fields[i];
        }
    }
    return null;
}
Click the Save button to save the changes.
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally as well as on ICA. When possible, test locally before running in the cloud. For Nextflow, configuration files can specify settings to be used either locally or on ICA. An example of advanced configuration is applying the scratch directive to a set of process names (or labels) so that they use the higher-performance local scratch storage attached to an instance instead of the shared network disk:
process {
    withName: 'process1|process2|process3' { scratch = '/scratch/' }
    withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to the shared network disk
}
When trying to test on the cloud, it's oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either using the ICA CLI or by creating your own scripts integrating with the REST API.
For scenarios in which instances are terminated prematurely without warning (for example, while using spot instances), you can add a configuration like the following to retry the job a certain number of times. Adding it to 'nextflow.config' enables up to four retries (five attempts in total) for each job, with an increasing delay before each retry.
process {
    maxRetries = 4
    errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
}
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow pipeline to handle resource shortages (for example, exit code 2147483647), an immediate retry will usually fail because the resources have not yet become available. Best practice is to use dynamic retry with backoff, where an increasing delay between attempts gives the system time to provide the necessary resources.
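The arithmetic behind these retry delays can be sketched as follows. linearDelays reproduces the configuration above (sleep(task.attempt * 60000) after each failed attempt, four retries); exponentialDelays is an illustrative capped doubling schedule for comparison, and its base and cap values are assumptions, not ICA or Nextflow defaults. The sketch only counts the four sleeps that precede a retry.

```javascript
// Delay before each retry, in milliseconds.
// Linear: what the config above does, sleep(task.attempt * 60000) after a failed attempt.
function linearDelays(maxRetries, stepMs) {
  return Array.from({ length: maxRetries }, (_, i) => (i + 1) * stepMs);
}

// Exponential with a cap: the delay doubles after each failure (illustrative alternative).
function exponentialDelays(maxRetries, baseMs, capMs) {
  return Array.from({ length: maxRetries }, (_, i) => Math.min(capMs, baseMs * 2 ** i));
}

const totalMinutes = (delays) => delays.reduce((sum, d) => sum + d, 0) / 60000;

const linear = linearDelays(4, 60000);                   // maxRetries = 4, step 60 s
const exponential = exponentialDelays(4, 60000, 480000); // base 60 s, cap 8 min
console.log(linear, totalMinutes(linear));               // [ 60000, 120000, 180000, 240000 ] 10
console.log(exponential, totalMinutes(exponential));     // [ 60000, 120000, 240000, 480000 ] 15
```

With the configuration above, a job that keeps failing waits 10 minutes in total across its four retries; a doubling schedule waits longer between the later attempts, which gives resources more time to free up.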
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' rather than relying on the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it goes to a 'Failed' state. The clock starts as soon as the input data begins downloading (status 'preparing inputs'), before the analysis goes to 'In Progress'. When tasks run in parallel, their running time is counted once, as the duration of the longest task. As an example, assume the initial period before the analysis is picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, taking 25, 28, and 30 minutes, respectively. Upon completion, uploading the outputs takes one minute. The overall analysis time is then 20 + 25 + 10 + 30 (the longest of the three parallel tasks) + 1 = 86 minutes:
| Analysis task | 96-hour limit | Status in ICA |
|---|---|---|
| request | 1m (not counted) | requested |
| queued | 7m (not counted) | queued |
| initializing | 2m (not counted) | initializing |
| input download | 20m | preparing inputs |
| single task | 25m | in progress |
| queue | 10m | in progress |
| parallel tasks | 30m | in progress |
| generating outputs | 1m | generating outputs |
| completed | - | succeeded |
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
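The worked example can be checked with a few lines of code. This sketch models each phase with a hypothetical counted flag and treats a list of durations as tasks running in parallel, counting only the longest; it assumes, as in the example, that the parallel tasks start together.

```javascript
// Models the worked example above. Phases before the input download are not
// counted, and tasks running in parallel count once, as their longest duration.
const LIMIT_MINUTES = 96 * 60;

function countedMinutes(phases) {
  return phases
    .filter((p) => p.counted)
    .reduce((sum, p) => sum + (Array.isArray(p.minutes) ? Math.max(...p.minutes) : p.minutes), 0);
}

const example = [
  { name: 'request', minutes: 1, counted: false },
  { name: 'queued', minutes: 7, counted: false },
  { name: 'initializing', minutes: 2, counted: false },
  { name: 'input download', minutes: 20, counted: true },
  { name: 'single task', minutes: 25, counted: true },
  { name: 'queue', minutes: 10, counted: true },
  { name: 'parallel tasks', minutes: [25, 28, 30], counted: true },
  { name: 'generating outputs', minutes: 1, counted: true }
];

const used = countedMinutes(example);
console.log(`${used} of ${LIMIT_MINUTES} minutes used`); // 86 of 5760 minutes used
```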
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'
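Once the trace report is enabled, a small script can summarise it. The following is a minimal sketch of a hypothetical helper (summariseTrace is not an ICA or Nextflow function) that counts tasks per status and lists failed tasks with their exit codes. It assumes the default tab-separated trace layout with a header row and that no field value contains a tab or newline; fields such as script may not satisfy that, so drop them from trace.fields or use a more robust parser if you rely on this. The sample data is made up.

```javascript
// Hypothetical helper: summarises a trace report written with the settings above.
// Assumes a tab-separated layout with a header row, and that no field value
// contains a tab or newline.
function summariseTrace(text) {
  const [headerLine, ...rows] = text.split('\n').filter((line) => line.trim() !== '');
  const header = headerLine.split('\t');
  const col = (name) => header.indexOf(name);
  const counts = {};
  const failed = [];
  for (const row of rows) {
    const cells = row.split('\t');
    const status = cells[col('status')];
    counts[status] = (counts[status] || 0) + 1;
    if (status === 'FAILED') {
      failed.push(`${cells[col('name')]} (exit ${cells[col('exit')]})`);
    }
  }
  return { counts, failed };
}

// Made-up three-task sample, reduced to the columns the helper reads.
const sample = [
  'task_id\tname\tstatus\texit',
  '1\tsplit (1)\tCOMPLETED\t0',
  '2\tsort (1)\tFAILED\t137',
  '3\tsort (1)\tCOMPLETED\t0'
].join('\n');
console.log(summariseTrace(sample));
```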