ICA supports running pipelines defined using Common Workflow Language (CWL).
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.
Refer to Compute Types for available compute types and sizes.
The ICA Compute Type will be determined automatically based on CWL ResourceRequirement coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the Compute Types table).
For example, take the following ResourceRequirements:
This would result in a best fit of standard-large ICA Compute Type request for the tool.
If the specified requirements cannot be met by any of the presets, the task will be rejected and failed.
FPGA requirements cannot be set by means of CWL ResourceRequirements.
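The "best fit" idea can be pictured with a short Python sketch. It is only an illustration: the preset list is a subset of the Compute Types table, the function name `best_fit` is hypothetical, and the tie-break (fewest CPUs, then least memory) is an assumption, since the exact selection rule used by ICA is not described here. CWL expresses ramMin in mebibytes; the sketch takes GB to match the table.

```python
from typing import NamedTuple, Optional


class Preset(NamedTuple):
    name: str
    cpus: int
    mem_gb: int


# Subset of the presets from the Compute Types table (FPGA is excluded,
# because it cannot be requested through ResourceRequirement).
PRESETS = [
    Preset("standard-small", 2, 8),
    Preset("standard-medium", 4, 16),
    Preset("standard-large", 8, 32),
    Preset("standard-xlarge", 16, 64),
    Preset("himem-small", 8, 64),
    Preset("hicpu-small", 16, 32),
]


def best_fit(cores_min: int, ram_min_gb: int) -> Optional[Preset]:
    """Return the smallest preset meeting both minimums, or None if none fits."""
    candidates = [
        p for p in PRESETS if p.cpus >= cores_min and p.mem_gb >= ram_min_gb
    ]
    # Assumed tie-break: fewest CPUs first, then least memory.
    return min(candidates, key=lambda p: (p.cpus, p.mem_gb), default=None)
```

With these assumptions, a request for 5 cores and 20 GB resolves to standard-large, and a request that no preset can satisfy returns None, which corresponds to the rejected task described above.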
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
ICA supports overriding workflow requirements at load time using Command Line Interface (CLI) with JSON input. Please refer to CWL documentation for more details on the CWL overrides feature.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
To use a reference set from within a project, you must first add it. From the project's page, select Flow > Reference Data > Manage > +Add to project. Then select a reference set to add to your project. You can select the entire reference set, or click the arrow next to it to expand it. After expanding, scroll to the right to see the individual reference files in the set. You can add individual reference files to your project by checking the boxes next to them.
Note: Reference sets are only supported in Graphical CWL pipelines.
Navigate to Reference Data (outside of Project context).
Select the data set(s) you wish to add to another region and select Actions > Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by double-clicking on individual files in the Reference set and viewing Copy Details on the Data details tab.
Allow a few minutes for new copies to become available before use.
Note: You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.
To create a pipeline with reference data, use the CWL graphical mode (important restriction: as of now you cannot use reference data for pipelines created in advanced mode). Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end user to choose from and a default selection. You can select more than one file, but only one at a time, so repeat the process to select multiple reference files. If you select only one reference file, that file will be the only one users can use with your pipeline. In the screenshot, a reference data input with two options is presented.
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings.
After clicking the magnifying glass icon the user can select from provided options.
A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines.
Select CWL or Nextflow to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
Pipelines use the latest tool definition when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. In order to update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it back to the pipeline.
Individual Pipeline files are limited to 100 Megabytes. If you need to add more than this, split your content over multiple files.
You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.
The following sections describe the tool properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL, you can choose how to define the pipeline; Nextflow is always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The Information tab provides options for configuring basic information about the pipeline.
The following information becomes visible when viewing the pipeline.
In addition, the clone function will be shown (top-right). When cloning a pipeline, you become the owner of the cloned pipeline.
The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
This page is used to specify all relevant information about the pipeline parameters.
The Analysis Report tab provides options for configuring pipeline execution reports. The report is composed of widgets added to the tab.
The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.
[Optional] Import widgets from another pipeline.
Select Import from other pipeline.
Select the pipeline that contains the report you want to copy.
Select an import option: Replace current report or Append to current report.
Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Select Save.
The Common Workflow Language main script.
The Nextflow configuration settings.
The Nextflow project main script.
Multiple files can be added to make pipelines more modular and manageable.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB of memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
* The compute type "fpga-small" is no longer available. Use 'fpga-medium' instead. fpga-large offers little performance benefit at additional cost.
** The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.
Use the following instructions to start a new analysis for a single pipeline.
Select a project.
From the project menu, select Flow > Pipelines.
Select the pipeline to run.
Select Start a New Analysis.
Configure analysis settings. See Analysis Properties.
Select Start Analysis.
View the analysis status on the Analyses page.
Requested—The analysis is scheduled to begin.
In Progress—The analysis is in progress.
Succeeded—The analysis is complete.
Failed and Failed Final—The analysis has failed or was aborted.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
The following sections describe the analysis properties that can be configured in each tab.
The Analysis tab provides options for configuring basic information about the analysis.
You can view analysis results on the Analyses page or in the output_folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Result tab, select an output file.
To preview the file, select the View tab.
Add or remove any user or technical tags, and then select Save.
To download, select Schedule for Download.
View additional analysis result information on the following tabs:
Details—View information on the pipeline configuration.
Logs—Download information on the pipeline process.
Status | Description |
---|---|
Draft | Fully editable draft. |
Release Candidate | The pipeline is ready for release. Editing is locked but the pipeline can be cloned (top right in the details view) to create a new version. |
Released | The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version. |
Pipeline type | Tabs |
---|---|
CWL Graphical | Information, Documentation, Definition, Analysis Report, Metadata Model |
CWL Code | Information, Documentation, XML Configuration, Metadata Model, workflow.cwl, New File |
Nextflow Code | Information, Documentation, XML Configuration, Metadata Model, nextflow.config, main.nf, New File |
Field | Entry |
---|---|
Code | The name of the pipeline. |
Categories | One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field. |
Description | A short description of the pipeline. |
Proprietary | Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline. |
Status | The release status of the pipeline. |
Storage size | User selectable storage size for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs. |
Family | A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline. |
Version comment | A description of changes in the updated version. |
Links | External reference links. (max 100 chars as name and 2048 chars as link) |
Field | Entry |
---|---|
ID | Unique Identifier of the pipeline. |
URN | Identification of the pipeline in Uniform Resource Name |
Nextflow Version | User selectable Nextflow version available only for Nextflow pipelines |
Menu | Description |
---|---|
Machine profiles | Compute types available to use with Tools in the pipeline. |
Shared settings | Settings for pipelines used in more than one tool. |
Reference files | Descriptions of reference files used in the pipeline. |
Input files | Descriptions of input files used in the pipeline. |
Output files | Descriptions of output files used in the pipeline. |
Tool | Details about the tool selected in the visualization panel. |
Tool repository | A list of tools available to be used in the pipeline. |
Widget | Settings |
---|---|
Title | Add and format title text. |
Analysis details | Add heading text and select the analysis metadata details to display. |
Free text | Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values. |
Inline viewer | Add options to view the content of an analysis output file. |
Analysis comments | Add comments that can be edited after an analysis has been performed. |
Input details | Add heading text and select the input details to display. The widget includes an option to group details by input name. |
Project details | Add heading text and select the project details to display. |
Page break | Add a page break widget where page breaks should appear between report sections. |
Placeholder | Description |
---|---|
[[BB_PROJECT_NAME]] | The project name. |
[[BB_PROJECT_OWNER]] | The project owner. |
[[BB_PROJECT_DESCRIPTION]] | The project short description. |
[[BB_PROJECT_INFORMATION]] | The project information. |
[[BB_PROJECT_LOCATION]] | The project location. |
[[BB_PROJECT_BILLING_MODE]] | The project billing mode. |
[[BB_PROJECT_DATA_SHARING]] | The project data sharing settings. |
[[BB_REFERENCE]] | The analysis reference. |
[[BB_USERREFERENCE]] | The user analysis reference. |
[[BB_PIPELINE]] | The name of the pipeline. |
[[BB_USER_OPTIONS]] | The analysis user options. |
[[BB_TECH_OPTIONS]] | The analysis technical options. Technical options include the TECH suffix and are not visible to end users. |
[[BB_ALL_OPTIONS]] | All analysis options. Technical options include the TECH suffix and are not visible to end users. |
[[BB_SAMPLE]] | The sample. |
[[BB_REQUEST_DATE]] | The analysis request date. |
[[BB_START_DATE]] | The analysis start date. |
[[BB_DURATION]] | The analysis duration. |
[[BB_REQUESTOR]] | The user requesting analysis execution. |
[[BB_RUNSTATUS]] | The status of the analysis. |
[[BB_ENTITLEMENTDETAIL]] | The used entitlement detail. |
[[BB_METADATA:path]] | The value or list of values of a metadata field or multi-value fields. |
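The way placeholders behave in a Free text widget can be pictured as plain text substitution. The Python sketch below is only an analogy, not how ICA renders reports: the `render` function and the sample values are hypothetical, and the parameterized [[BB_METADATA:path]] form is deliberately not handled.

```python
import re

# Matches simple placeholders such as [[BB_PROJECT_NAME]].
PLACEHOLDER = re.compile(r"\[\[(BB_[A-Z_]+)\]\]")


def render(template: str, values: dict) -> str:
    """Replace each known placeholder with its value; keep unknown ones as-is."""
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )


# Hypothetical values for illustration only.
text = render(
    "Report for [[BB_PROJECT_NAME]] ([[BB_RUNSTATUS]]), sample [[BB_SAMPLE]]",
    {"BB_PROJECT_NAME": "demo-project", "BB_RUNSTATUS": "Succeeded"},
)
```

In this sketch, a placeholder without a supplied value is left untouched, so it stays visible in the output instead of disappearing silently.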
Compute Type | CPUs | Mem (GB) | Nextflow ( | CWL ( |
---|---|---|---|---|
standard-small | 2 | 8 | standard-small | standard, small |
standard-medium | 4 | 16 | standard-medium | standard, medium |
standard-large | 8 | 32 | standard-large | standard, large |
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge |
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge |
hicpu-small | 16 | 32 | hicpu-small | hicpu, small |
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium |
hicpu-large | 72 | 144 | hicpu-large | hicpu, large |
himem-small | 8 | 64 | himem-small | himem, small |
himem-medium | 16 | 128 | himem-medium | himem, medium |
himem-large | 48 | 384 | himem-large | himem, large |
himem-xlarge | 96 | 768 | himem-xlarge | himem, xlarge |
hiio-small | 2 | 16 | hiio-small | hiio, small |
hiio-medium | 4 | 32 | hiio-medium | hiio, medium |
fpga-medium | 16 | 244 | fpga-medium | fpga, medium |
fpga-large | 64 | 976 | fpga-large | fpga, large |
transfer-small ** | 4 | 10 | transfer-small | transfer, small |
transfer-medium ** | 8 | 15 | transfer-medium | transfer, medium |
transfer-large ** | 16 | 30 | transfer-large | transfer, large |
Field | Entry |
---|---|
User Reference | The unique analysis name. |
User tags | One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field. |
Entitlement Bundle | Select a subscription to charge the analysis to. |
Input Files | Select the input files to use in the analysis. (max. 50,000) |
Settings | Provide input settings. |
Let's create the Nextflow Scatter Gather pipeline with a JSON input form.
Pay close attention to uppercase and lowercase characters when creating pipelines.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create file and label the file split.nf. Copy and paste the following definition.
Next, select +Create file and name the file sort.nf. Copy and paste the following definition.
Select +Create file again and label the file merge.nf. Copy and paste the following definition.
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
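The effect of the two operators can be mimicked in plain Python. This is an analogy rather than Nextflow code: channels are modeled as ordinary iterables, and the function names simply mirror the operators.

```python
from typing import Any, Iterable, Iterator, List


def flatten(channel: Iterable[Any]) -> Iterator[Any]:
    """Emit every entry of nested lists or tuples as a separate item."""
    for item in channel:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def collect(channel: Iterable[Any]) -> List[Any]:
    """Gather all emitted items into one list, returned as a single result."""
    return list(channel)


# One emission holding several chunks becomes one emission per chunk...
chunks = list(flatten([["part_a.txt", "part_b.txt"], "part_c.txt"]))
# ...and the per-chunk results are gathered back into a single list.
gathered = collect(chunks)
```

In a scatter-gather layout, flattening lets a downstream step run once per chunk, while collecting hands all results to a final step in one go.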
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.
Click the Save button to save the changes.
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Tools — Pipeline components that are configured to process data input files. See Tool Repository.
Pipelines — One or more tools configured to process input data and generate output files. See Pipelines.
Analyses — Launched instance of a pipeline with selected input data. See Analyses.
ICA supports running pipelines defined using Nextflow. See this tutorial for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
Info | Details |
---|---|
You can select the Nextflow version while building a pipeline as follows:
interface | |
---|---|
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found here. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu and memory directives, so instead, you can dynamically set the pod directive, as mentioned here. e.g.
Additionally, it can also be specified in the configuration file. Example configuration file:
Inputs are specified via the XML input form or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.
Outputs for Nextflow pipelines are uploaded from the out directory in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.
For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir process due to insufficient disk space, resulting in incomplete output delivery.
Workarounds:
Use "symlink" instead of "copy" in the publishDir directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
Use the latest version of Nextflow supported (22.04.0) and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
If no Docker image is specified, Ubuntu will be used as default.
The following configuration settings will be ignored if provided as they are overridden by the system:
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
The input files are specified within a single DataInputs node. Each individual input is then specified in a separate DataInput node. A DataInput node contains the following attributes:
code: a unique ID. Required.
format: the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible (see the example below). Required.
type: whether the input is a FILE or a DIRECTORY. Multiple entries are not allowed. Required.
required: whether this input is required for the execution of the pipeline. Required.
multiValue: whether multiple files are allowed as input. Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
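The attribute rules above can be checked mechanically. The Python sketch below is a rough illustration using the standard library XML parser. The element and attribute names follow the description above, but the sample fragment is hypothetical: the namespace declarations and exact element casing of the real input form schema are not shown here and may differ.

```python
import xml.etree.ElementTree as ET

# Attributes described above as required on every DataInput node.
REQUIRED_ATTRS = ("code", "format", "type", "required", "multiValue")


def check_data_input(node: ET.Element) -> list:
    """Return a list of problems found on one DataInput node."""
    problems = [
        f"missing attribute: {name}"
        for name in REQUIRED_ATTRS
        if name not in node.attrib
    ]
    if node.get("type") not in (None, "FILE", "DIRECTORY"):
        problems.append("type must be FILE or DIRECTORY")
    return problems


# Hypothetical fragment; the second input is intentionally incomplete.
SAMPLE = """
<DataInputs>
  <DataInput code="in" format="TXT" type="FILE" required="true" multiValue="false">
    <label>Input</label>
    <description>Text file to process</description>
  </DataInput>
  <DataInput code="ref" type="FOLDER"/>
</DataInputs>
"""

root = ET.fromstring(SAMPLE)
results = {n.get("code"): check_data_input(n) for n in root.findall("DataInput")}
```

Here the first input passes, while the second is reported for its missing attributes and for using a type other than FILE or DIRECTORY.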
An example of a single file input which can be in a TXT, CSV, or FASTA format.
To use a folder as an input the following form is required:
For multiple files, set the attribute multiValue to true. The variable is then treated as a list ([]), so adapt your pipeline when changing from single value to multiValue.
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:
code: unique ID. This is the parameter name that is passed to the workflow.
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting.
classification: is this setting specified by the user?
In the code below a string setting with the identifier inp1 is specified.
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.
Option types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
Option types can also be used to specify a boolean, for example
For a string setting the following schema with an element stringType is to be used.
For a boolean setting, booleanType can be used.
One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for File input and the second for String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below are both a main.nf and an XML configuration of a generic pipeline with two optional inputs. They can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
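The precedence rule for the two optional inputs can be written down in a few lines. The sketch below expresses only the decision logic in Python, not the pipeline itself; the function name `resolve_input` and the wording of the error message are hypothetical.

```python
from typing import Optional


def resolve_input(file: Optional[str] = None, str_value: Optional[str] = None) -> str:
    """Pick the effective input: the file wins, then the string, else fail."""
    if file:
        return file
    if str_value:
        return str_value
    # Neither optional parameter was populated, so stop with a clear message.
    raise ValueError("Provide either the 'file' or the 'str' parameter.")
```

Performing this check inside the pipeline covers the gap left by the launch form, which does not verify that at least one of the two parameters is populated.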
Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create pipeline > Nextflow (or CWL) > JSON based.
Three files, located on the Inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when submitting a new pipeline execution request via the GUI or API.
Use + Create file to add additional files and Simulate to test your inputForms.
Script execution supports cross-field validation of the values, hiding fields, making them required, and so on, based on value changes.
The JSON schema that allows you to define the input parameters. See the inputForm.json page for syntax details.
Type | Usage |
---|---|
These attributes can be used to configure all parameter types.
The onSubmit.js JavaScript function receives an input object which holds information about the chosen values of the input form and the pipeline and pipeline execution request parameters. This JavaScript function is triggered not only when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
Receives an input object which contains information about the current state of the input form, the chosen values and the field value change that triggered the onRender call. It also contains pipeline information. Changed objects are present in the onRender return value object. Any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as changed field.
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
An Analysis is the execution of a pipeline.
You can start an analysis either from the dedicated analysis screen or from the actual pipeline.
Navigate to Projects > Your_Project > Flow > Analyses.
Select » New Analysis.
Select a single Pipeline.
Configure the analysis settings.
Select » Start Analysis.
Refresh to see the analysis status. See for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort analyses. Refresh to see the status update.
Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline to run.
Select » Start New Analysis.
Configure analysis settings.
Select » Start Analysis.
View the analysis status on the Analyses page. See for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort analyses on the Analyses page.
Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.
When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and not fill out the parameters as it cannot guarantee their validity for the new XML.
Some restrictions apply when trying to rerun an analysis.
To rerun one or more analyses with the same settings:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun analyses. The analyses will now be executed with the same parameters as their original run.
To rerun a single analysis with modified parameters:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun (at the top right).
Update the parameters you want to change.
Select Start Analysis. The analysis will now be executed with the updated parameters.
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.
During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Logs tab is used to view the logs in near real time as they're produced in the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or fewer, though you can also choose the grid layout for those by means of the tile/grid button on the top right of the analysis log tab.
There are system processes involved in the lifecycle for all analyses (i.e., downloading inputs, uploading outputs, etc.) and there are processes which are pipeline-specific, such as processes which execute the pipeline steps. The below table describes the system processes.
Additional log entries will show for the processes which execute the steps defined in the pipeline.
Each process shows as a distinct entry in the Logs view with a Queue Date, Start Date, and End Date.
The time between the Start Date and the End Date is used to calculate the duration. This duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
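As a minimal sketch of the duration calculation, assuming the timestamps are available as ISO 8601 strings (the format and the function name are assumptions for illustration, not part of ICA):

```python
from datetime import datetime, timedelta

def process_duration(start_date: str, end_date: str) -> timedelta:
    """Return the time elapsed between a process's Start Date and End Date."""
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    if end < start:
        raise ValueError("End Date precedes Start Date")
    return end - start

# Hypothetical timestamps for a single process entry in the Logs view.
duration = process_duration("2024-01-15T10:00:00", "2024-01-15T10:25:30")
print(duration)  # 0:25:30
```

Note that the Queue Date does not enter this calculation; only the interval between start and end is counted.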
Each log entry in the Logs view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.
In the analysis output folder, the ica_logs subfolder will contain the stdout and stderr files. If you delete these files, no log information will be available on the analysis details > logs tab.
Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
Currently, this feature is only available when launching analyses via API.
Currently, only FOLDER type output mappings are supported.
By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:
the source path on the local disk of the analysis execution environment, relative to the working directory
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project
the target path relative to the root of the project data to write the outputs
Use the example analysis output mapping below for guidance.
If the output directory already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis.
In this example, two analysis output mappings are specified. The analysis writes data during execution in the working directory at paths out/test and out/test2. The data contained in these folders is directed to the project with ID 4d350d0f-88d8-4640-886d-5b8a23de7d81, at paths /output-testing-01/ and /output-testing-02/ respectively, relative to the root of the project data.
The following demonstrates the construction of the request body to start an analysis with the output mappings described above:
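A minimal Python sketch of such a request body is given here for illustration. The key names (`analysisOutput`, `sourcePath`, `type`, `targetProjectId`, `targetPath`) and the helper function are assumptions; check them against the ICA API reference before use. The other fields needed to start an analysis (pipeline, inputs, user reference, and so on) are omitted.

```python
import json

def build_output_mapping(source_path, target_project_id, target_path, data_type="FOLDER"):
    """Build one output mapping entry (key names are assumed, not confirmed)."""
    if data_type != "FOLDER":
        # Only FOLDER type output mappings are currently supported.
        raise ValueError("only FOLDER type output mappings are supported")
    return {
        "sourcePath": source_path,             # relative to the working directory
        "type": data_type,
        "targetProjectId": target_project_id,  # launcher needs contributor access
        "targetPath": target_path,             # relative to the project data root
    }

PROJECT_ID = "4d350d0f-88d8-4640-886d-5b8a23de7d81"

request_body = {
    "analysisOutput": [
        build_output_mapping("out/test", PROJECT_ID, "/output-testing-01/"),
        build_output_mapping("out/test2", PROJECT_ID, "/output-testing-02/"),
    ]
}

print(json.dumps(request_body, indent=2))
```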
When the analysis completes, the outputs can be seen in the ICA UI, within the folders designated in the payload JSON during pipeline launch (output-testing-01 and output-testing-02).
You can add and remove tags from your analyses.
Navigate to Projects > Your_Project > Flow > Analyses.
Select the analyses whose tags you want to change.
Select Manage > Manage tags.
Edit the user tags, reference data tags (if applicable) and technical tags.
Select Save to confirm the changes.
If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link is <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>. Likewise, workflow sessions use the syntax <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA verifies the access rights of every user when they open the link.
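The link syntax can be assembled programmatically, for instance when generating reports that point back to analyses. The following is a small sketch; the function names, host URL and UUIDs are hypothetical placeholders.

```python
def analysis_link(host_url: str, project_uuid: str, analysis_uuid: str) -> str:
    """Build a shareable link to an analysis."""
    return f"{host_url.rstrip('/')}/ica/link/project/{project_uuid}/analysis/{analysis_uuid}"

def workflow_session_link(host_url: str, project_uuid: str, session_uuid: str) -> str:
    """Build a shareable link to a workflow session."""
    return f"{host_url.rstrip('/')}/ica/link/project/{project_uuid}/workflowSession/{session_uuid}"

# Placeholder values for illustration only.
print(analysis_link("https://ica.example.com", "1111-aaaa", "2222-bbbb"))
# https://ica.example.com/ica/link/project/1111-aaaa/analysis/2222-bbbb
```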
Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). You can have up to 50 concurrent analyses running per tenant. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions.
An analysis failure due to an external event, such as spot termination or node draining, results in exit code 55. Retry the analysis.
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the to a set of process names (or labels) so that they use the higher performance local scratch storage attached to an instance instead of the shared network disk.
When trying to test on the cloud, it's oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either using the or by creating your own scripts integrating with the REST API.
For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' enables five retries for each job, with increasing delays between each try.
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use which has an increasing backoff delay, allowing the system time to provide the necessary resources.
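The idea of retrying with an increasing delay can be illustrated in Python. This is a conceptual sketch only, not the Nextflow configuration itself: the function name, the use of `RuntimeError` to signal a failed job, and the doubling delay are assumptions chosen for illustration. The limit of five retries follows the text above.

```python
import time

def retry_with_backoff(task, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run task(); after each failure wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_retries:
                raise  # retries exhausted, propagate the failure
            sleep(base_delay * 2 ** attempt)

# Demonstration with a task that fails twice before succeeding.
# Delays are recorded instead of actually sleeping.
delays = []
attempts = {"count": 0}

def flaky_task():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("resources not yet available")
    return "done"

result = retry_with_backoff(flaky_task, sleep=delays.append)
print(result, delays)  # done [1.0, 2.0]
```

Because each wait is longer than the previous one, the system gets progressively more time to free up resources before the next attempt.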
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. When tasks run in parallel, their running time is counted only once. As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, taking 25, 28, and 30 minutes, respectively. Upon completion, this is followed by uploading the outputs for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (the longest of the three parallel tasks) + 1 = 86 minutes.
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
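The arithmetic of the example can be written out as a short Python sketch. The data structure and function name are hypothetical; the numbers come from the example above. Phases that run tasks in parallel contribute only their longest task, and the initial period before the download starts is not counted.

```python
TIMEOUT_MINUTES = 96 * 60  # 96-hour limit expressed in minutes

# (phase, task durations in minutes, counted toward the limit?)
phases = [
    ("request, queueing and initializing", [10], False),
    ("input download", [20], True),
    ("single task", [25], True),
    ("queue", [10], True),
    ("parallel tasks", [25, 28, 30], True),
    ("generating outputs", [1], True),
]

def counted_minutes(phases):
    """Sum the counted phases, taking the longest task of each parallel group."""
    return sum(max(durations) for _, durations, counted in phases if counted)

total = counted_minutes(phases)
print(total)                     # 86
print(total <= TIMEOUT_MINUTES)  # True
```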
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
Attribute | Purpose |
---|---|
Value | Meaning |
---|---|
Value | Meaning |
---|---|
Value | Meaning |
---|---|
Value | Meaning |
---|---|
Value | Meaning |
---|---|
Analyses | Rerun | Rerun with modified parameters |
---|
Status | Description | Final State |
---|
Process | Description |
---|
Timestamp | Description |
---|
Nextflow version
20.10.0 (default), 22.04.3
Executor
Kubernetes
GUI
Select the Nextflow version at Projects > Your_Project > Flow > Pipelines > Your_Pipeline > Details tab.
API
Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId".
When not set, a default Nextflow version will be used for the pipeline.
text
To display informational messages. No values are to be assigned to these fields.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
number
The value is of Number type in JavaScript and Double type in Java (corresponds to doubleType in XML).
integer
Corresponds to java Integer.
textbox
Corresponds to stringType in xml.
checkbox
A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).
select
A dropdown selection to select one from a list of choices.
radio
A radio button group to select one from a list of choices.
data
Data such as files.
fieldgroup
Can contain parameters or other groups. Allows repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. If you want to have the same elements multiple times in your form, combine them into a fieldgroup.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimum number of values that must be present. Default when not set is 0. Set to >=1 to make the field required.
maxValues
The maximum number of values that may be present. Default when not set is 1.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
value
The value of the parameter. Can be considered default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Applied on type="number" and type="integer". Value must correspond to the primitive type.
max
Applied on type="number" and type="integer". Value must correspond to the primitive type.
choices
A list of choices, each with a "value", "text" (the label), "selected" (only one true value is supported) and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on values of other fields.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. nameFilter, dataFormat and dataType are additional properties.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
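The interaction between minValues, maxValues and minMaxValuesMessage can be sketched as follows. This is a Python illustration only, since ICA evaluates these attributes itself; the function name, the sample field and the wording of the fallback message are hypothetical.

```python
from typing import Optional

def check_value_count(field: dict, values: list) -> Optional[str]:
    """Return an error message if the number of values violates minValues/maxValues."""
    min_values = field.get("minValues", 0)  # default when not set is 0
    max_values = field.get("maxValues", 1)  # default when not set is 1
    if min_values <= len(values) <= max_values:
        return None
    label = field.get("label", field["id"])  # the id is used when label is missing
    default_message = f"{label}: expected between {min_values} and {max_values} values"
    return field.get("minMaxValuesMessage", default_message)

# Hypothetical required field accepting one or two files.
field = {"id": "fastq_files", "type": "data", "minValues": 1, "maxValues": 2}
print(check_value_count(field, []))              # fastq_files: expected between 1 and 2 values
print(check_value_count(field, ["a.fastq.gz"]))  # None
```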
settings
The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
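Because single-value fields arrive as a bare value while multi-value fields arrive as an array, code that processes the settings map often normalizes both shapes first. The sketch below shows the idea in Python for illustration (the actual onSubmit and onRender scripts are JavaScript); the helper name, field ids and values are hypothetical.

```python
def as_list(value):
    """Normalize a setting value: None -> [], single value -> [value], list unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Hypothetical settings map: field id -> value(s).
settings = {
    "sample_name": "NA12878",               # single-value field, bare value
    "fastq_files": ["fil.001", "fil.002"],  # multi-value field, array
    "notes": None,                          # no value entered
}

normalized = {field_id: as_list(value) for field_id, value in settings.items()}
print(normalized["sample_name"])  # ['NA12878']
print(normalized["fastq_files"])  # ['fil.001', 'fil.002']
```

Values of fieldgroups, which can be nested arrays, are passed through unchanged by this helper.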
settings
The value of the setting fields. This allows modifying the values or applying defaults and such. Or taking info of the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be not modified.
validationErrors
A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSize fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
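The shape of such an error entry, including the 0-based index array used for fieldgroups, can be sketched as follows. Python is used for illustration only; the helper names and the field id are hypothetical, while the fieldId, index and message keys come from the description above.

```python
def fieldgroup_index(instance_number: int, field_number: int) -> list:
    """Convert 1-based positions (instance, field) into a 0-based index array."""
    return [instance_number - 1, field_number - 1]

def make_error(message, field_id=None, index=None) -> dict:
    """Build an error entry; omitted fieldId or index means a more general error."""
    error = {"message": message}
    if field_id is not None:
        error["fieldId"] = field_id
    if index is not None:
        error["index"] = index
    return error

# 3rd field of the 2nd instance of a fieldgroup.
error = make_error(
    "Value is not valid",
    field_id="family_members",
    index=fieldgroup_index(instance_number=2, field_number=3),
)
print(error["index"])  # [1, 2]
```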
context
"Initial"/"FieldChanged"/"Edited".
Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (Not yet supported in ICA) is used when a form is displayed later again, this is intended for draft runs or when editing the form during reruns.
changedFieldId
The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use the storageSize fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
Analyses | Rerun | Rerun with modified parameters |
---|---|---|
Analyses using external data | Allowed | - |
Analyses using mount paths on input data | Allowed | - |
Analyses using user-provided input json | Allowed | - |
Analyses using advanced output mappings | - | - |
Analyses with draft pipeline | Warn | Warn |
Analyses with XML configuration change | Warn | Warn |
Status | Description | Final State |
---|---|---|
Requested | The request to start the Analysis is being processed | No |
Queued | Analysis has been queued | No |
Initializing | Initializing environment and performing validations for Analysis | No |
Preparing Inputs | Downloading inputs for Analysis | No |
In Progress | Analysis execution is in progress | No |
Generating outputs | Transferring the Analysis results | No |
Aborting | Analysis has been requested to be aborted | No |
Aborted | Analysis has been aborted | Yes |
Failed | Analysis has finished with error | Yes |
Succeeded | Analysis has finished with success | Yes |
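When monitoring analyses from a script, the final-state column tells you when to stop polling. A minimal Python sketch follows, using the display names from the status table; the identifiers returned by the API may be spelled differently, and the function name is hypothetical.

```python
# Final-state flag per analysis status, as listed in the status table.
FINAL_STATE = {
    "Requested": False,
    "Queued": False,
    "Initializing": False,
    "Preparing Inputs": False,
    "In Progress": False,
    "Generating outputs": False,
    "Aborting": False,
    "Aborted": True,
    "Failed": True,
    "Succeeded": True,
}

def is_final(status: str) -> bool:
    """Return True when the analysis will not change state anymore."""
    return FINAL_STATE[status]

print(is_final("Aborting"))  # False
print(is_final("Aborted"))   # True
```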
Process | Description |
---|---|
Setup Environment | Validate analysis execution environment is prepared |
Run Monitor | Monitor resource usage for billing and reporting |
Prepare Input Data | Download and mount input data to the shared file system |
Pipeline Runner | Parent process to execute the pipeline definition |
Finalize Output Data | Upload Output Data |
Timestamp | Description |
---|---|
Queue Date | The time when the process is submitted to the process scheduler for execution |
Start Date | The time when the process has started execution |
End Date | The time when the process has stopped execution |
Analysis task | request | queued | initializing | input download | single task | queue | parallel tasks | generating outputs | completed |
---|---|---|---|---|---|---|---|---|---|
96 hour limit | 1m (not counted) | 7m (not counted) | 2m (not counted) | 20m | 25m | 10m | 30m | 1m | - |
Status in ICA | status requested | status queued | status initializing | status preparing inputs | status in progress | status in progress | status in progress | status generating outputs | status succeeded |