New users may reference the Illumina Connected Software Registration Guide for detailed guidance on setting up an account and registering a subscription.
The platform requires a provisioned tenant in the Illumina account management system with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access, including adding users, creating workgroups, and adding additional tenant administrators.
Each tenant is assigned a domain name used to log in to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain>.login.illumina.com, where <domain> is substituted with the domain name assigned to the tenant.
New user accounts can be created for a tenant by navigating to the domain login URL and following the links on the page to set up a new account with a valid email address. Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups with permission to use the ICA application. Registered users may also be made workgroup administrators by tenant administrators or existing workgroup administrators.
For more details on identity and access management, see the Account Management section.
To access the APIs using the command-line interface (CLI), an API Key may be provided as credentials when logging in. API Keys operate similarly to a username and password and should be kept secure and rotated on a regular basis (preferably yearly). When keys are compromised or no longer in use, they must be revoked. This is done through the domain login URL by navigating to the profile drop-down and selecting "Manage API Keys", then selecting the key and using the trash icon next to it.
For security reasons, it is best practice to not use accounts with administrator level access to generate API keys and instead create a specific CLI user with basic permission. This will minimize the possible impact of compromised keys.
For long-lived credentials to the API, an API Key can be generated from the account console and used with the API and command-line interface. Each user is limited to 10 API Keys. API Keys are managed through the product dashboard after logging in through the domain login URL by navigating to the profile drop down and selecting "Manage API Keys".
Click the button to generate a new API Key. Provide a name for the API Key. Then choose to either include all workgroups or select the workgroups to be included. Selected workgroups will be accessible with the API Key.
Click to generate the API Key. The key is then presented (hidden) with a button to reveal it for copying and a link to download it to a file. Once the window is closed, the key contents can no longer be retrieved through the domain login page, so store the key somewhere secure to be referenced when using the command-line interface or APIs.
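As an illustration, a first CLI session might look like the following sketch; the exact prompts can differ between CLI versions, so follow the CLI documentation for your release.

```bash
# Configure the CLI once; it prompts for the server URL and your API Key
# and stores them locally for subsequent commands.
icav2 config set

# Verify the credentials work by listing the projects you can access.
icav2 projects list
```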
The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the Illumina Connected Analytics portal.
On the left, you have the navigation bar which will auto-collapse on smaller screens. When collapsed, use the ≡ symbol to expand it.
The central part of the display is the item on which you are performing your actions.
At the top right, you have icons to refresh the screen, view status updates, and access the online help.
The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Find instructions for using the command-line interface including download links for your operating system in the CLI documentation.
The HTTP-based application programming interfaces (APIs) are listed in the API Reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.
When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token>, where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see JSON Web Token (JWT).
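For example, a cURL request carrying this header might look like the sketch below; the project list endpoint path follows the pattern used elsewhere in this documentation, and the token value is a placeholder you generate as described in the JSON Web Token (JWT) section.

```bash
# Placeholder token generated as described in the JSON Web Token (JWT) section.
JWT_TOKEN="<token>"

# The Authorization header carries the JWT as a Bearer credential.
curl -s -H "Authorization: Bearer ${JWT_TOKEN}" \
  "https://ica.illumina.com/ica/rest/api/projects"
```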
The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.
Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.
A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
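A minimal cURL sketch is shown below. It assumes the token creation endpoint accepts the API Key in an X-API-Key header; confirm the exact path, headers, and parameters in the API Reference.

```bash
# Assumption: the JWT creation endpoint accepts the API Key as an X-API-Key header.
API_KEY="<your API Key>"

curl -s -X POST \
  -H "X-API-Key: ${API_KEY}" \
  "https://ica.illumina.com/ica/rest/api/tokens"
# The JSON response contains the JWT, which is then passed as "Authorization: Bearer <token>".
```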
The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:
Event date and time
Category (error, warn or info)
Code
Description
Tenant
Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.
Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.
To enable logging into the platform using your organization's identity provider (IDP), a SAML configuration may be provided in your account settings.
Must be configured with a tenant administrator account for your Illumina enterprise domain
Access to your IDP system to configure the Illumina Service Provider
Your IDP configurations
Metadata XML
SAML Attributes for EmailId, firstName, LastName
Log in to the IAM Console through the domain login URL and navigate to the "IAM Console" application. The IAM console can also be accessed directly at https://<domain>.login.illumina.com/iam.
Navigate to the domain tab and choose the authentication menu item on the left pane. Change the Authentication Configuration to "SAML".
Upload your IDP Metadata XML file.
Register the Illumina Service Provider (SP) in your IDP system by downloading the illumina_sp.xml file.
Enter the relevant IDP/SAML attributes (Contact your organization's technical support team for these details).
Allow 15 minutes for the Illumina Service Provider to update with the provided information. To confirm SAML configuration changes, go to the domain login URL https://<domain>.login.illumina.com. This should now redirect to the configured IDP login page.
User and group roles and permissions are managed in the identity and access management (IAM) console, accessible through the product dashboard after logging in through the domain login URL. After logging in, select the "IAM Console" from the list of applications.
The entities in the IAM to which tenant users may be assigned are:

Entity | Description
---|---
Tenant Admin | Read/write access to all resources created by users in the tenant. Manage tenant and workgroup membership.
Workgroup Admin | Read/write access to all resources created by users in the workgroup. Manage workgroup membership.
Workgroup User | Read/write access to all resources created by users in the workgroup.
To add or promote a user to tenant administrator, navigate to the IAM console and select "Manage Domain Access". Provide your credentials again and select "User Management" and then "Administrators" from the left-hand menu. Enter the email address of the new tenant administrator and fill out the form.
Workgroups can be created by tenant administrators through the IAM console. To create a workgroup, click the button to create a new workgroup on the IAM console dashboard.
Provide a workgroup name, description, and administrator email. Optionally choose to enable collaborators outside of the domain to add users from other domains to the workgroup.
Users can be added to a workgroup by tenant administrators or the workgroup's administrators. A workgroup can contain an unlimited number of users.
Open the IAM Console application
Select a workgroup in the Dashboard
Select "Users" from the left pane and click the Invite button.
In the Invite new user dialog box, enter the email addresses for the users you want to add. Enter one address per line or as a comma-separated list. Invitations are blocked if the email domain is not included in the domain whitelist. Ensure the "Has Access" menu item is selected from the product access drop-down for Illumina Connected Analytics.
Has Access - The user has access to Illumina Connected Analytics through the workgroup
No Access - The user does not have access to Illumina Connected Analytics through the workgroup
❗ To allow users to perform instrument run setup and data streaming from BSSH, they must also be granted the "Has Access" role for the BaseSpace Sequence Hub product.
Select Grant access. The invited user(s) receives an email invitation and a dashboard notification.
A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.
Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical CWL pipelines by any project in the account.
Select System Settings > Tool Repository > + New tool.
Configure tool settings in the tool properties tabs. See Tool Properties.
Select Save.
The following sections describe the tool properties that can be configured in each tab. The corresponding field reference tables are collected at the end of this tool section.
Refer to the CWL specification for further explanation about many of the properties described below. Not all features described in the specification are supported.
Field | Entry
---|---
Tool Status | The release status of the tool. Can be one of "Draft", "Release Candidate", "Released", or "Deprecated".
The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.
The General Tool tab provides options to configure the basic command line.
The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.
Inline Javascript
The Tool contains a property with a JavaScript expression to resolve its value.
Initial workdir
The workdir can be any of the following types:
String or Expression — A string or JavaScript expression, e.g., $(inputs.InputFASTA)
File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}
Dirent — A script in the working directory. The Entry name field specifies the file name.
Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.
The Tool Arguments tab provides options to configure base command parameters that do not require user input.
Tool arguments may be one of two types:
String or Expression — A literal string or JavaScript expression, e.g., --format=bam.
Binding — An argument constructed from the binding of an input parameter.
The following table describes the argument input fields.
The Tool Inputs tab provides options to define the input files and directories for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
The Tool Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
The Tool Outputs tab provides options to define the parameters of output files.
The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
The Tool CWL tab displays the complete CWL code constructed from the values entered in the other tabs. The CWL code automatically updates when changes are made in the tool definition tabs, and any changes to the CWL code are reflected in the tool definition tabs.
❗️ Modifying data within the CWL editor can result in invalid code.
From the System Settings > Tool Repository page, select a tool.
Select Edit.
From the System Settings > Tool Repository page, select a tool.
Select the Information tab.
From the Status drop-down menu, select a status.
Select Save.
In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.
A simple example CWL Tool definition is provided below.
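The snippet below is a minimal, illustrative CommandLineTool definition. The identifier, Docker image, and input/output names are placeholders, and the CWL version should match what your ICA release supports.

```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.2            # placeholder; match the CWL version used by your ICA release
class: CommandLineTool
id: md5sum_tool             # placeholder identifier
baseCommand: [md5sum]
requirements:
  DockerRequirement:
    dockerPull: ubuntu:22.04   # placeholder; use a Docker image registered in ICA
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  checksum:
    type: stdout
stdout: checksum.md5
```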
When creating a new Tool, navigate to System Settings > Tool Repository > your_tool > Tool CWL tab to show the raw CWL definition. Here a CWL CommandLineTool definition may be pasted into the editor. After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.
General Tool - includes your base command and various optional configurations.
The base command is required for your tool to run, e.g. python /path/to/script.py, such that python and /path/to/script.py are added as separate lines.
Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.
Initial workdir requirement - Dirent Type
Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or defined using a Dirent. Defining a script via Dirent allows you to dynamically modify your script without updating your Docker image. To define your Dirent script, enter your script name under Entry name (e.g. runner.sh) and the script content under Entry. Then, point your base command to that custom script, e.g. bash runner.sh.
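In raw CWL terms, the Dirent approach corresponds to an InitialWorkDirRequirement entry similar to the hedged sketch below; the script name and contents are placeholders.

```yaml
# Illustrative excerpt of a tool definition using a Dirent-provided script.
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: runner.sh          # the "Entry name" field in the Tool editor
        entry: |                      # the "Entry" field: the script content
          #!/usr/bin/env bash
          set -euo pipefail
          echo "Processing $1"
baseCommand: [bash, runner.sh]
```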
❗ What's the difference between Settings and Arguments?
Settings are exposed at the pipeline level and can be modified at launch, while Arguments are intended to be immutable and hidden from users launching the pipeline.
How to reference your tool inputs and settings throughout the tool definition?
You can either reference your inputs using their position or ID.
Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)
All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
File/Directory inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information, please refer to the CWL specification.
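As an illustration, a fragment combining both referencing styles might look like the sketch below; InputFile and InputSetting are hypothetical IDs defined on the Tool Inputs and Tool Settings tabs.

```yaml
baseCommand: [bash, runner.sh]
inputs:
  InputFile:
    type: File
    inputBinding:
      position: 1        # available to the script as $1
  InputSetting:
    type: string
    inputBinding:
      position: 2        # available to the script as $2
arguments:
  # ID-based referencing via a JavaScript expression (requires the Inline Javascript requirement).
  - position: 3
    prefix: --path
    valueFrom: $(inputs.InputFile.path)
```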
Tool status values:

Status | Description
---|---
Draft | Fully editable draft.
Release Candidate | The tool is ready for release. Editing is locked, but the tool can be cloned to create a new version.
Released | The tool is released. Editing is locked, but the tool can be cloned to create a new version.
Deprecated | The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool: it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.
General Tool tab fields:

Field | Entry
---|---
ID | CWL identifier field
CWL version | The CWL version in use. This field cannot be changed.
Interpreter version | The interpreter version in use.
Base command | Components of the command. Each argument must be added in a separate line.
Standard in | The name of the file that captures Standard In (STDIN) stream information.
Standard out | The name of the file that captures Standard Out (STDOUT) stream information.
Standard error | The name of the file that captures Standard Error (STDERR) stream information.
Requirements | The requirements for triggering an error message.
Hints | The requirements for triggering a warning message.
Tool Arguments fields:

Field | Entry | Type
---|---|---
Value | The literal string to be added to the base command. | String or expression
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added. | Binding
Prefix | The string prefix. | Binding
Item separator | The separator that is used between array values. | Binding
Value from | The source string or JavaScript expression. | Binding
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument. | Binding
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually. | Binding
Tool Inputs fields:

Field | Entry
---|---
ID | The file ID.
Label | A short description of the input.
Description | A long description of the input.
Type | The input type, which can be either a file or a directory.
Input options | Checkboxes to add the following options. Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files | The required secondary files or directories.
Format | The input file format.
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix | The string prefix.
Item separator | The separator that is used between array values.
Value from | The source string or JavaScript expression.
Load contents | Reads up to the first 64 KiB of text from the file and populates the contents field.
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
Tool Settings fields:

Field | Entry
---|---
ID | The file ID.
Label | A short description of the input.
Description | A long description of the input.
Default Value | The default value to use if the tool setting is not available.
Type | The input type, which can be Boolean, Int, Long, Float, Double, or String.
Input options | Checkboxes to add the following options. Optional indicates the input is optional. Multi value indicates there can be more than one value for the input.
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix | The string prefix.
Item separator | The separator that is used between array values.
Value from | The source string or JavaScript expression.
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
Tool Outputs fields:

Field | Entry
---|---
ID | The file ID.
Label | A short description of the output.
Description | A long description of the output.
Default Value | The default value for the output.
Type | The output type, which can be either a file or a directory.
Output options | Checkboxes to add the following options. Optional indicates the output is optional. Multi value indicates there is more than one output file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files | The required secondary files or directories.
Format | The output file format.
Globs | The pattern for searching file names.
Load contents | Automatically loads file contents. The system extracts up to the first 64 KiB of text from the file and populates the contents field.
Output eval | Evaluate an expression to generate the output value.
Information tab fields:

Field | Entry
---|---
Name | The name of the tool.
Categories | One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.
Icon | The icon for the tool.
Description | Free-text description for information purposes.
Status | The release status of the tool.
Docker image | The registered Docker image for the tool.
Regions | The regions supported by the linked Docker image.
Tool version | The version of the tool specified by the end user. This can be any string.
Release version | The version number of the tool.
Family | A group of tools or tool versions.
Version comment | A description of changes in the updated version.
Links | External reference links.
You can verify the integrity of the data with the MD5 (Message Digest Algorithm 5) checksum. It is a widely used cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum.
For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our API endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint, specifying the data ID you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure the file's integrity.
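For example, with cURL and jq (the project ID, data ID, and token are placeholders):

```bash
# Retrieve the file metadata and search the response for the objectETag field
# without assuming its exact nesting in the JSON structure.
curl -s -H "Authorization: Bearer ${JWT_TOKEN}" \
  "https://ica.illumina.com/ica/rest/api/projects/<projectId>/data/<dataId>" \
  | jq -r '.. | .objectETag? // empty'

# Compare against a locally computed checksum of your copy of the file.
md5sum local_copy.fastq.gz
```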
For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.
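Since those files are not reproduced here, the following is a minimal illustrative sketch of such a pipeline; the input code, container, and output conventions are assumptions to adapt to your own setup and to the pipeline XML schema documented for Nextflow pipelines.

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Assumption: "fastqs" matches the dataInput code declared in the pipeline XML.
params.fastqs = null

process MD5SUM {
    // Placeholder image; any image providing md5sum will do.
    container 'ubuntu:22.04'
    // Assumption: files published to 'out' are collected as the analysis output.
    publishDir 'out', mode: 'copy'

    input:
    path fastq

    output:
    path "*.md5"

    script:
    """
    md5sum ${fastq} > ${fastq}.md5
    """
}

workflow {
    Channel.fromPath(params.fastqs) | MD5SUM
}
```

A corresponding input form XML might declare the FASTQ input along these lines (verify the exact schema against the Nextflow pipeline documentation):

```xml
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="md5_fastq" version="1.0">
  <pd:dataInputs>
    <!-- Assumption: the code attribute maps to params.fastqs in main.nf -->
    <pd:dataInput code="fastqs" format="FASTQ" type="FILE" required="true" multiValue="true">
      <pd:label>FASTQ files</pd:label>
      <pd:description>FASTQ files to checksum</pd:description>
    </pd:dataInput>
  </pd:dataInputs>
  <pd:steps/>
</pd:pipeline>
```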
You can use samples to group information related to a sample, including input files, output files, and analyses.
To add a new sample, do as follows.
Select Projects > your_project > Samples.
To add a new sample, select + New Sample, and then enter a unique name and description for the sample.
To include files related to the sample, select + Add data to sample.
Your sample is added to the Samples page. To view information on the sample, select the sample, and then select Open Details.
You can add additional files to a sample after creating the sample. Any files that are not currently included in a sample are listed on the Unlinked Files tab.
To add an unlinked file to a sample, do as follows.
Go to Projects > your_project > Samples > Unlinked files tab.
Select a file or files, and then select one of the following options:
Create sample from selection — Create a new sample that includes the selected files.
Link to existing sample — Select an existing sample in the project to link the file to.
Alternatively, you can add unlinked files from the sample details.
Go to Projects > your_project > Samples.
Double-click your sample to open the details.
The last section of the details is files, where you select + Add data to sample.
To remove files from samples,
Go to Projects > your_project > Samples > your_sample > Details.
Go to the files section and open the file details of the file you want to remove.
Select Remove data from sample.
Save your changes.
A Sample can be linked to a project from a separate project to make it available in a read-only capacity.
Navigate to the Samples view in the Project
Click the Link Sample button
Select the Sample(s) to link to the project
Click the Link Samples button
Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples must also be available in a complete state in order to be linked.
When looking at the main ICA navigation, you will see the following structure:
Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.
Audit/Event Logs are used for audit purposes and issue resolving.
System Settings contain general information such as the location of storage space, Docker images, and tool repositories.
Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.
Note that there is a combined limit of 30,000 projects and bundles per tenant.
To create a new project, click the Projects > + Create Project button.
On the project creation screen, add information to create a project. See Project Details page for information about each field.
Required fields include:
Name
1-255 characters
Must begin with a letter
Characters are limited to alphanumerics, hyphens, underscores, and spaces
Analysis Priority (Low/Medium(default)/High) This is balanced per tenant with high priority analyses started first and the system progressing to the next lower priority once all higher priority analyses are running. Balance your priorities so that lower priority projects do not remain waiting for resources indefinitely.
Project Owner Owner (and usually contact person) of the project. The project owner has the same rights as a project administrator, but can not be removed from a project without first assigning another project owner. This can be done by the current project owner, the tenant administrator or a project administrator of the current project. Reassignment is done at Projects > your_project > Project Settings > Details > Edit.
Project Location Select your project location. Options available are based on Entitlement(s) associated with purchased subscription.
Storage Bundle (auto-selected based on user selection of Project Location)
Click the Save button to finish creating the project. The project will be visible from the Projects view.
Refer to the Storage Configuration documentation for details on creating a storage configuration.
During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.
With a storage configuration set, a project will have a two-way sync with the external cloud storage provider: any data added directly to the external storage will be synced into the ICA project data, and any data added to the project will be synced into the external cloud storage.
Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.
Searching is a case-insensitive wildcard filter. Any project which contains the characters will be shown. Use * as a wildcard in searches. Be aware that operators without search words are blocked and will result in the error "Unexpected error occurred when searching for projects". You can use brackets and the AND, OR, and NOT operators, provided that you do not start the search with them (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax).
Filter by Workgroup : Projects in ICA can be accessible for different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right which appears when a workgroup is selected.
Hidden projects : You can hide projects (Projects > your_project > Details > Hide) which you no longer use. Hiding will delete the data in Base and Bench. You can still see hidden projects by selecting this option, and you can delete the data they contain at Projects > your_project > Data to save on storage costs. If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.
Favorites : By clicking on the star next to the project name in the tile view, you set a project as a favorite. You can have multiple favorites and use the Favorites checkbox to only show those favorites. This prevents having too many projects visible.
Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favourites. A single click will open the project.
List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. A double-click is required to open the project.
Missing Projects: If you are missing projects which have been created by other users, the workgroup filter might still be active. Clear the filter with the x to the right. You can verify the list of projects to which you have access with the CLI command icav2 projects list.
Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project much in the same way as manually created projects work in ICA. ICA considers these to be externally-managed projects, and users of that tenant have only read-only access to that data from ICA. This prevents inconsistencies when these applications access their own project data.
Projects are indicated as externally-managed in the projects overview screen by a project card with a light grey accent and a lock symbol followed by "managed by app".
For a better understanding of how all components of ICA work, try the end-to-end tutorial.
Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.
Each tenant has one root metadata model that is accessible to all projects in the tenant. This allows an organization to collect the same piece of information for every sample in every project in the tenant, such as an ID number. Within this root model, you can configure multiple metadata submodels, even at different levels.
Illumina recommends that you limit the number of fields or field groups you add to the root model. If there are any misconfigured items in the root model, they will carry over into all other metadata models in the tenant. Once a root model is published, the fields and groups that are defined within it cannot be deleted. You should first consider creating submodels before adding anything to the root model.

When configuring a project, you have the option to assign one published metadata model for all samples in the project. This metadata model can be the root model, a submodel of the root model, or a submodel of a submodel. It can be any published metadata model in the tenant. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models, are applied to the samples in the project.

❗️ Illumina recommends that you limit the number of fields or field groups you add to the root model. You should first consider creating submodels before adding anything to the root model.
The following terminology is used within this page:
Metadata fields = Metadata fields will be linked to a sample in the context of a project. They can be of various types and could contain single or multiple values.
Metadata groups = You can identify that a few fields belong together (for example, they all relate to quality metrics). That would be the moment to create a group so that the user knows these fields belong together.
Root model = Model that is linked to the tenant. Every metadata model that you link to a project will also contain the fields and groups specified in this model, as this is a parent model for all other models. This is a subcategory of a project metadata model.
Child/Sub model = Any metadata model that is not the root model. Child models will inherit all fields and groups from their parent models. This is a subcategory of a project metadata model.
Pipeline model = Model that is linked to a specific pipeline and not a project.
Metadata in the context of ICA will always give information about a sample. It can be provided by the user, the pipeline and via the API. There are 2 general categories of metadata models: Project Metadata Model and Pipeline Metadata Model. Both models are built from metadata fields and groups. The project metadata model is specific per tenant, while the pipeline metadata model is linked to a pipeline and can be shared across tenants. These models are defined by users.
Each sample can have multiple metadata models. Whenever you link a project metadata model to your project, you will see its groups and fields present on each sample. The root model from that tenant will also be present, as every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a sample, the groups and fields will also be present for each analysis that comes out of the pipeline execution.
The following field types are used within ICA:
Text: Free text
Keyword: Automatically complete value based on already used values
Numeric: Only numbers
Boolean: True or false, cannot be multiple value
Date: e.g. 23/02/2022
Date time: e.g. 23/02/2022 11:43:53, saved in UTC
Enumeration: select value out of drop-down list
The following properties can be selected for groups & fields:
Required: Pipeline can’t be started with this sample until the required group/field is filled in
Sensitive: Values of this group/field are only visible to project users of the own tenant. When a sample is shared across tenants, these fields won't be visible
Filled by pipeline: Fields that need to be filled by pipeline should be part of the same group. This group will automatically be multiple value and values will be available after pipeline execution. This property is only available for groups
Multiple value: This group/field can consist out of multiple (grouped) values
❗️ Fields cannot be both required and filled by pipeline
The project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample of a specific project, and it may include general mandatory company information.
The pipeline metadata model has metadata linked to a specific pipeline. Values are populated during the pipeline execution, and it requires an output file with the name 'metadata.response.json'.
❗️ Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled
Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. When a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first.
If a published metadata model is no longer needed, you can retire the model (except the root model).
First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, click on the three dots in the top right of the model window, and then select Retire Metadata Model.
To add metadata to your samples, you first need to assign a metadata model to your project.
Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
To manually add metadata to samples in your project, do as follows.
A precondition is that you have a metadata model assigned to your project.
Go to Projects > your_project > Samples > your_sample.
Double-click your sample to open the sample details.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline cannot start.
Select Save
To fill metadata by pipeline executions, a pipeline model must be created.
In the Illumina Connected Analytics main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Double-click on your pipeline to open the pipeline details.
Create/Edit your model under Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.
❗️ The field names cannot have "." in them, e.g. for the metric name Q30 bases (excl. dup & clipped bases), the "." after excl must be removed.
Populating metadata models of samples provides a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.
In the Illumina Connected Analytics main navigation, select Projects.
In your project menu select Schedule.
Select 'Add new', and then click on the Metadata Schedule option.
Type a name for your schedule, optionally add a description, and select whether you would like the metadata source to be the current project or the entire tenant. It is also possible to select whether ICA references should be anonymized and whether sensitive metadata fields should be included. As a reminder, values of sensitive metadata fields are not visible to users outside of the project.
Select Save.
Navigate to Tables under BASE menu in your project.
Two new table schemas should be added with your current metadata models.
Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own tenant as well as projects in other tenants.
For data-management reasons, there is a combined limit of 30,000 projects and bundles per tenant.
The following ICA assets can be included in bundles:
Base tables (read-only)
Base query templates
Bench docker images
Data
Pipelines
Reference data
Sample data
Tools and Tool images
Assets must meet the following requirements before being added to a bundle:
For Samples and Data, the project the asset belongs to must have data sharing enabled.
The region of the project containing the asset must match the region of the bundle.
You must have permission to access the project containing the asset.
Pipelines and tools need to be in released status.
Samples must be available in a complete state.
The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.
As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.
If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.
From the main navigation page, select Projects > your_project > Details.
Click the Edit button at the top of the Details page.
Click the + button, under LINKED BUNDLES.
Click on the desired bundle, then click the +Link Bundles button.
Click Save.
The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.
Bundles and projects have to be in the same region in order to be linked. Otherwise, the error The bundle is in a different region than the project so it's not eligible for linking will be displayed.
To create a new bundle and configure its settings, do as follows.
From the main navigation, select Bundles.
Select + Create Bundle.
Enter a unique name for the bundle.
From the Bundle Location drop-down list, select where the assets for this bundle should be stored.
[Optional] Configure the following settings.
Categories—Select an existing category or enter a new one.
Status—Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.
Draft—The bundle can be edited.
Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.
Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles can not be recovered from deprecated status.
Short Description—Enter a description for the bundle.
Metadata Model—Select a metadata model to apply to the bundle.
Enter a release version for the bundle and optionally enter a description for the version.
[Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).
Homepage
License
Links
Publications
[Optional] Select the Documentation tab and enter any information you would like to attach to the bundle.
Select Save.
There is no option to delete bundles; they must be deprecated instead.
To make changes to a bundle, do as follows.
From the main navigation, select Bundles.
Select a bundle.
Select Edit.
Modify the bundle information and documentation as needed.
Select Save.
When the changes are saved, they also become available in all projects that have this bundle linked.
To add assets to a bundle, do as follows.
Select a bundle.
On the left-hand side, select the asset type under Flow (such as pipeline or tool) you want to add to the bundle.
Depending on the asset type, select add or link to bundle.
Select the assets and confirm.
When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.
You cannot add the same asset twice to a bundle. Once added, the asset will no longer appear in the selection list.
Which batch jobs are visible as activity depends on the user role.
When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle.
From the main navigation, select Bundles.
Select a bundle.
Select Create new Version.
Make updates as needed and update the version number.
Select Save.
To manage bundle users and their permissions, do as follows.
From the main navigation, select Bundles > your_bundle > Team.
To invite a user to collaborate on the bundle, do as follows.
To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.
To add a user by their email address, select By email and enter their email address.
To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.
Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.
Repeat as needed to add more users.
Users are not officially added to the bundle until they accept the invitation.
To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.
To revoke bundle permissions from a user, select the trash icon for the user.
Select Save Changes.
From the main navigation, Select Bundles > your_bundle > Legal.
To add Terms of Use to a Bundle, do as follows:
Select New Version.
Use the WYSIWYG editor to define Terms of Use for the selected bundle.
Click Save.
[Optional] Require acceptance by clicking the checkbox, Acceptance required.
Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.
To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.
Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when it is part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.
To use your shared entitled bundle, add the bundle to your project via Project Linking. Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.
A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.
Refer to the following pages for instructions to setup supported external cloud storage providers:
The storage configuration requires credentials to connect to your storage. AWS uses the security credentials to authenticate and authorize your requests. On the System Settings > Storage > Credentials tab > Create storage credential, you can enter these credentials. Long-term access keys consist of a combination of the access key ID and secret access key as a set.
Fill out the following fields:
Type—The type of access credentials. This will usually be AWS user.
Name—Provide a name to easily identify your access key.
Access key ID—The access key you created.
Secret access key—Your related secret access key.
For more information, refer to the AWS security credentials documentation.
In the ICA main navigation, select System Settings > Storage > Configuration tab > New configuration.
Configure the following settings for the storage configuration.
Type—Use the default value, e.g., AWS_S3. Do not change.
Region—Select the region where the bucket is located.
Configuration name—You will use this name when creating volumes that reside in the bucket. The name length must be between 3 and 63 characters.
Description—Here you can provide a description for yourself or other users to identify this storage configuration.
Bucket name—Enter the name of your S3 bucket.
Key prefix [Optional]—You can provide a key prefix to allow only files inside the prefix to be accessible. The key prefix must end with "/".
If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 directory in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.
Using no key prefix results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects. Although possible, this configuration is not recommended.
Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.
Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.
Select Save.
ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used while creating a new project.
With the action Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.
The System Settings > Storage > Credentials > Share storage credential action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share storage credential action. Do take into account that once shared, you can not unshare the storage. Once your storage is used in a project, it can also no longer be deleted.
Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.
Every 4 hours, ICA will verify the storage configuration and credentials to ensure availability. When an error is detected, ICA will attempt to reconnect once every 15 minutes. After 200 consecutively failed connection attempts (50 hours), ICA will stop trying to connect.
When you update your credentials, the storage configuration is automatically validated. In addition, you can manually trigger revalidation when ICA has stopped trying to connect by selecting the storage and then clicking Validate on the System Settings > Storage > Configuration tab.
Refer to this page for the troubleshooting guide.
ICA supports the following storage classes. Please see the AWS documentation for more information on each:
If you are using Intelligent Tiering, which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the AWS S3 documentation.)
The length of the file name (minus prefixes and delimiters) is ideally limited to 32 characters.
Folders cannot be renamed after they have been created. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder. Please see Move Data section for more information.
See the list of supported Data Formats
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (ie, AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and ensure uploads do not include unintended data in order to avoid unintentional privacy breaches. More guidance can be found in the ICA Security and Compliance section.
See Data Integrity
To prevent cost issues, you cannot perform actions such as copying and moving data which would write data to the workspace when the project billing mode is set to tenant and the owning tenant of the folder is not the current user's tenant.
On the Projects > your_project > Data page, you can view file information and preview files.
To view file details, click on the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click View to preview the file.
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
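In a CWL tool definition, this pairing is typically declared on the input with the secondaryFiles field; a hedged sketch for a BAM input with its index:

```yaml
inputs:
  alignments:
    type: File
    # The .bai index is resolved from the same parent folder path as the BAM file.
    secondaryFiles:
      - .bai
```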
To hyperlink to data, use the following syntax:
Normal permission checks still apply with these links. If you try to follow a link to data to which you do not have access, you will be returned to the main project screen or login screen, depending on your permissions.
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your file or files are added to the Data page when upload completes.
Do not close the ICA tab in your browser while data uploads.
Uploads via the UI are limited to 5TB and no more than 100 concurrent files at a time, but for practical and performance reasons, it is recommended to use the CLI or Service connector when uploading large amounts of data.
For instructions on uploading/downloading data via CLI, see CLI Data Transfer.
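As a quick illustration, assuming the icav2 CLI is installed, authenticated, and a project context has been entered (paths are placeholders and exact flags may differ per CLI version):

```bash
# Upload a local file into the project data under /fastq/
icav2 projectdata upload ./sample1_R1.fastq.gz /fastq/

# Download a project data file to a local folder
icav2 projectdata download /fastq/sample1_R1.fastq.gz ./downloads/
```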
You can copy data from the same project to a different folder or from another project to which you have access.
In order to copy data, the following rights must be assigned to the person copying the data:
The following restrictions apply when copying data:
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy Data From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter the data, or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any metadata which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be one of the following:
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
Copy behavior differs between files and folders. The operation is designed for files, so it is best practice not to copy a folder when a folder with the same name already exists in the destination location.
Notes on copying data
Copying data incurs additional storage costs because it creates a copy of the data.
You can copy over the same data multiple times.
Copying data from your own S3 storage requires additional configuration. See Connect AWS S3 Bucket and SSE-KMS Encryption.
On the command-line interface, the command to copy data is icav2 projectdata copy.
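The exact flags depend on your CLI version, so treat the following as a sketch and check the built-in help for the authoritative options:

```bash
# List the supported options for the copy command
icav2 projectdata copy --help

# Hypothetical invocation: copy a data item into a destination folder
icav2 projectdata copy <dataId> <destinationFolderPath>
```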
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move Data From is used when you are in the destination location.
Move Data To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Move jobs will fail if any data being moved is in the "Partial" or "Archived" state.
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move Data From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Select Manage > Move Data To (Projects > your_project > Data > your_source_location > Manage > Move Data To).
Select your target project and location.
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see File/Folder Naming.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
Restrictions:
A total maximum of 1000 items can be moved in one operation. An item can be either a file or a folder. Folders with subfolders and subfiles still count as one item.
You cannot move files and folders to a destination where one or more files or folders with the same name already exist.
You cannot move data and folders to linked data.
You cannot move a folder to itself.
You cannot move data which is in the process of being moved.
You cannot move data across regions.
You cannot move data from externally-managed projects.
You cannot move linked data.
You cannot move externally managed data.
You can only move data when its status is Available.
To move data across projects, it must be owned by the user's tenant.
If you do not select a target folder for Move Data To, the root folder of the target project is used.
If you are only able to select your source project as the target data project, this may indicate that data sharing (Projects > your_project > Project Settings > Details > Data Sharing) is not enabled for your project or that you do not have upload rights in other projects.
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Download file.
Files for which ICA can display the contents can be viewed by clicking on the filename, followed by the View tab. Select the download action on the view tab to download the file. Note that larger files may take some time to load.
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Download files or folders using a service connector. This will display a list of all available connectors.
Select a connector, and then select Schedule for Download. If you do not find the connector you need or you do not have a connector, you can click the Don't have a connector yet?
option to create a new connector. You must then install this new connector and return to the file selection in step 1 to use it.
You can view the progress of the download or stop the download on the Activity page for the project.
The data records contained in a project can be exported in CSV, JSON, and Excel format.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected file, select the Selected rows as the Rows to export option. To export all files on the page, select Current page.
To export only the columns present for the file, select the Visible columns as the Columns to export option.
Select the export format.
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints (see the sketch after these steps):
GET the file's information.
Modify the dates of the file to be deleted/archived.
PUT the updated information back in ICA.
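A hedged curl sketch of this flow is shown below; the endpoint path and the date field you modify (for example, an assumed willBeArchivedAt attribute) should be confirmed against the ICA API reference:

```bash
# 1. GET the file's information (endpoint path shown for illustration only)
curl -s -H "X-API-Key: $ICA_API_KEY" \
  "$ICA_BASE_URL/api/projects/$PROJECT_ID/data/$DATA_ID" > data.json

# 2. Edit data.json locally and adjust the relevant archive/delete date field.

# 3. PUT the updated information back into ICA
curl -s -X PUT \
  -H "X-API-Key: $ICA_API_KEY" \
  -H "Content-Type: application/json" \
  -d @data.json \
  "$ICA_BASE_URL/api/projects/$PROJECT_ID/data/$DATA_ID"
```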
Linking a folder creates a dynamic read-only view of the source data. You can use this to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required.
You can recognize linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
Linking data is only possible from the root folder of your destination project. The action is disabled in project subfolders.
Linking a parent folder after linking a file or subfolder will unlink the file or subfolder and link the parent folder. So root\linked_subfolder will become root\linked_parentfolder\linked_subfolder.
Initial linking can take considerable time when there is a large amount of source data. However, once the initial link is made, updates to the source data will be instantaneous.
You can perform analysis on data from other projects by linking data from that project.
Select Projects > your_project > Data > Manage, and then select Link Data.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process runs in the background and the progress can be monitored on the Projects > your_project > Activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
To unlink the data, go to the root level of your project and select the linked folder, or, if you linked individual files separately, select those linked files (limited to 100 at a time), and then select Manage > Unlink Data. As with linking, the progress of unlinking can be monitored at Projects > your_project > Activity > Batch Jobs.
The Activity view shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.
The Data Transfers tab shows the status of data uploads and downloads.
The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., Copy table, export table, Select * from table, etc.)
The jobs are shown with their:
Creation time: When did the job start
Description: The query or the performed action with some extra information
Type: Which action was taken
Status: Failed or succeeded
Duration: How long the job took
Billed bytes: The number of bytes billed for the job
Failed jobs provide information on why the job failed. Details are accessed by double-clicking the failed job. Jobs in progress can be aborted here.
The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table, etc.) Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown and available for download as Excel or JSON. To get the data for the last year without limit on the number of records, use the export as file function. No activity data is retained for more than one year.
The activities are shown with:
Start Time: The moment the action was started
Query: The SQL expression.
Status: Failed or succeeded
Duration: How long the job took
User: The user that requested the action
Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K
The Bench Activity tab shows the actions taken on Bench Workspaces in the project.
The activities are shown with:
Workspace: Workspace where the activity took place
Date: Date and time of the activity
User: User who performed the activity
Action: Which activity was performed
The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.
Which batch jobs are visible depends on the user role.
Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications. When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.
The platform is hosted in regions listed below.
Region Name | Region Identifier |
---|---|
The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user-interface is hosted alongside the API to deliver an interactive visualization of the resources and enables additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options are specifiable for applications to fine tune efficiency.
The user documentation provides material for learning the basics of interacting with the platform including examples and tutorials. Start with the Get Started documentation to learn more.
Use the search bar on the top right to navigate through the help docs and find specific topics of interest.
If you have any questions, contact Illumina Technical Support by phone or email:
Illumina Technical Support | techsupport@illumina.com | 1-800-809-4566
For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.
To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.
To view a list of the products to which you have access, select the 9 dots symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown between brackets.
The More Tools category presents the following options:
My Illumina Dashboard to monitor instruments, streamline purchases and keep track of upcoming activities.
Link to the Support Center for additional information and help.
Link to the order management from where you can keep track of your current and past orders.
In the Release Notes section of the documentation, posts are made for new versions of deployments of the core platform components.
You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.
These instructions utilize the AWS CLI. Follow the AWS CLI documentation for instructions to download and install.
Key points for connecting AWS S3 buckets to ICA:
The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:
*Note: BSSH is not deployed currently on the South Korea instance, therefore there will be limited functionality in this region with regard to sequencer integration.
You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are found here.
Because of how Amazon S3 handles folders and does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.
When creating an empty folder in S3, it will not be visible in ICA.
When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.
When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.
Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
When configuring a new project in ICA to use a preconfigured S3 bucket, you can use the root folder if needed. However, this is not recommended as that S3 bucket is then no longer available for other ICA projects. Instead, please consider using subfolders in S3 for your projects.
❗️ For Bring Your Own Storage buckets, all unversioned, versioned, and suspended buckets are supported. If you connect buckets with object versioning, the data in ICA will be automatically synced with the data in the object store. For Bring Your Own Storage buckets with versioning enabled, when an object is deleted without specifying a particular version, a "Delete marker" is created in the object store to indicate that the object has been deleted. ICA will reflect the object state by deleting the record from the database. No further action on your side is needed to sync.
ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the Configuring cross-origin resource sharing (CORS) (expand the "Using the S3 console" section) documentation for instructions on enabling CORS via the AWS Management Console. Use the following configuration during the process:
In the cross-origin resource sharing (CORS) section, enter the following content.
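The configuration below is an illustrative sketch only; in particular, the allowed origin must match the ICA URL for your region, so confirm the exact content against the official instructions.

```json
[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
        "AllowedOrigins": ["https://ica.illumina.com"],
        "ExposeHeaders": ["ETag"]
    }
]
```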
ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.
Refer to the Creating policies on the JSON tab documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:
On Unversioned buckets, paste the JSON policy document below. Replace YOUR_BUCKET_NAME
with the actual name of your bucket. Note the example below provides access to all object prefixes in the bucket.
On Versioned OR Suspended buckets, paste the JSON policy document below. Replace YOUR_BUCKET_NAME
with the actual name of your bucket. Note the example below provides access to all object prefixes in the bucket.
(Optional) Set policy name to "illumina-ica-admin-policy"
To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json
containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document
) leads to the path where you saved the file:
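A command along these lines accomplishes that (shown as a sketch; adjust the path if you saved the file elsewhere):

```bash
aws iam create-policy \
  --policy-name illumina-ica-admin-policy \
  --policy-document file://illumina-ica-admin-policy.json
```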
An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.
Refer to the Creating IAM users (console) documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:
(optional) Set user name to "illumina_ica_admin"
Select the Programmatic access option for the type of access
Select Attach existing policies directly when setting the permissions, and choose the policy created in Create AWS IAM Policy
(Optional) Retrieve the Access Key ID and Secret Access Key by choosing to Download .csv
To create the IAM user and attach the policy via the AWS CLI, enter the following command (AWS IAM users are global resources and do not require a region to be specified). This command creates an IAM user illumina_ica_admin
, retrieves your AWS account number, and then attaches the policy to the user.
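A sketch of the equivalent commands (assuming the policy name used in the previous step):

```bash
# Create the IAM user
aws iam create-user --user-name illumina_ica_admin

# Retrieve the AWS account number and attach the policy to the user
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-user-policy \
  --user-name illumina_ica_admin \
  --policy-arn "arn:aws:iam::${ACCOUNT_ID}:policy/illumina-ica-admin-policy"
```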
If the Access Key information was retrieved during the IAM user creation, skip this step.
Refer to the Managing access keys (console) AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.
Use the below command to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey
is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.
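For example:

```bash
# The response contains the AccessKeyId and SecretAccessKey; store them securely
aws iam create-access-key --user-name illumina_ica_admin
```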
The AccessKeyId
and SecretAccessKey
values will be provided to ICA in the next step.
Connecting your S3 bucket to ICA does not require any additional bucket policies.
However, if a bucket policy is required for use cases beyond ICA, you need to ensure that the bucket policy supports the essential permissions needed by ICA without inadvertently restricting its functionality.
Here is one such example:
In this example, we have a restriction enabled on the bucket policy to disallow any kind of access to the bucket. However, there is an exception rule added for the IAM user that ICA is using to connect to the S3 bucket. The exception rule is allowing ICA to perform the above S3 action permissions necessary for ICA functionalities.
Additionally, the exception rule is applied to the STS federated user session principal associated with ICA. Since ICA leverages the AWS STS to provide temporary credentials that allow users to perform actions on the S3 bucket, it is crucial to include these STS federated user session principals in your policy's whitelist. Failing to do so could result in 403 Forbidden errors when users attempt to interact with the bucket's objects using the provided temporary credentials.
To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Storage > Credentials and click the +New button to create a new storage credential.
Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.
With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions to Create a Storage Configuration for details.
ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the bucket policy:
In the policy, replace YOUR_BUCKET_NAME with the actual name of your bucket. The ARN of the cross-account role you want to give permission to is specified in the Principal. Refer to the table below to determine which region-specific Role ARN should be used.
The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration.
This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 event notification only allows overlapping events with non-overlapping prefixes. Depending on the conflicts in the notifications, the error can be presented as any of the following:
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes
Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type
To fix the issue:
In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix
Delete the existing notification that overlaps with your Storage Configuration's key prefix
ICA will perform a series of steps in the background to re-verify the connection to your bucket.
This error can occur when recreating a recently deleted storage configuration. To fix the issue, you have to delete the bucket notifications:
In the Amazon S3 Console select the bucket for which you need to delete the notifications from the list.
Choose properties
Navigate to the Event Notifications section and choose the check box for the event notifications with name gds:objectcreated, gds:objectremoved and gds:objectrestore and click Delete.
Wait 15 minutes for the storage to become available in ICA
If you do not want to wait 15 minutes, you can delete the current storage configuration, delete the bucket notifications in the bucket and create a new storage configuration.
In order to create a Tool, a Docker image is required to run the application in a containerized environment. Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.
Navigate to System Settings > Docker Repository.
Click + New external image to add a new external image.
Add your full image URL in the Url field, e.g. docker.io/alpine:latest
or registry.hub.docker.com/library/alpine:latest
. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)
Note: Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.
(Optional) Complete the Description field.
Click Save.
The newly added image will appear in your Docker Repository list.
Verification of the URL is performed during execution of a pipeline which depends on the Docker image, not during configuration.
External images are accessed from the external source whenever required and not stored in ICA. Therefore, it is important not to move or delete the external source. There is no status displayed on external Docker repositories in the overview as ICA cannot guarantee their availability. The use of :stable instead of :latest is recommended.
In order to use private images in your tool, you must first upload them as a TAR file.
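A local image can typically be exported to a TAR file with docker save and optionally compressed before uploading (image name and tag are placeholders):

```bash
# Export the image to a TAR archive
docker save -o my-tool_1.0.tar my-tool:1.0

# Optionally compress the archive to reduce upload size
gzip my-tool_1.0.tar
```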
Navigate to Projects > your_project .
Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, using the CLI or a Connector. For more information please refer to the project Data.
Select your uploaded TAR file and click Manage > Change Format in the top menu.
Select DOCKER from the drop-down menu and Save.
Navigate to System Settings > Docker Repository (outside of your project).
Click on +New.
Click on the magnifying glass to find your uploaded TAR image file.
Select the appropriate region and if needed, filter on project from the drop-down menus to find your file.
Select that file.
Select the appropriate region, fill in the Docker Name and Version, and click Save.
The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.
Navigate to System Settings > Docker Repository.
Either
Select the required image(s) and go to Manage > Add to Regions.
OR double-click on a required image, check the box matching the region you want to add, and select update.
In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).
Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
This section describes how to connect an AWS S3 Bucket with SSE-KMS Encryption enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on this page.
Follow the AWS instructions for how to create S3 bucket with SSE-KMS key.
Note: S3-SSE-KMS must be in the same region as your ICA v2.0 project. See the ICA S3 bucket documentation for more information.
In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS)
. Then select Choose your AWS KMS key
.
If you do not have an existing customer managed key, click Create a KMS key
and follow these steps from AWS.
Once the bucket is set up, it is recommended to also create a folder that will be connected to ICA as a prefix. If you create a new folder in the bucket to be linked in the ICA storage configuration, encryption must be enabled for that folder in the AWS console.
Follow the general instructions for connecting an S3 bucket to ICA.
In the step "Create AWS IAM policy":
Add permission to use KMS key by adding kms:Decrypt
, kms:Encrypt
, and kms:GenerateDataKey
Add the ARN KMS key arn:aws:kms:xxx
on the first "Resource"
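Based on the two items above, the additional statement added to the policy would look roughly like the following (a sketch; replace the KMS key ARN with your own):

```json
{
    "Effect": "Allow",
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "arn:aws:kms:xxx"
}
```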
On Unversioned buckets, the permissions will match the following:
On Versioned OR Suspended buckets, the permissions will match the following:
At the end of the policy setting, there should be 3 permissions listed in the "Summary".
Follow the general instructions for how to create a storage configuration in ICA.
In step 3 of the process above, continue with the [Optional] Server Side Encryption
to enter the algorithm and key name for server-side encryption processes.
On "Algorithm", input aws:kms
On "Key Name", input the ARN KMS key: arn:aws:kms:xxx
"Key prefix" is optional, but recommended. "Key prefix" refers to the folder name in the bucket the user previously created above.
In addition to following the instructions to Enable Cross Account Copy, the KMS policy must include the following statement for AWS S3 Bucket with SSE-KMS Encyption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN
value):
To use a reference set from within a project, you have first to add it. From the project's page select Flow > Reference Data > Manage > +Add to project. Then select a reference set to add to your project. You can select the entire reference set, or click the arrow next to it to expand it. After expanding, scroll to the right, to see the individual reference files in the set. You can select individual reference files to add to your project, by checking the boxes next to them.
Note: Reference sets are only supported in Graphical CWL pipelines.
Navigate to Reference Data (outside of Project context).
Select the data set(s) you wish to add to another region and select Actions > Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by double-clicking on individual files in the Reference set and viewing Copy Details on the Data details tab.
Allow a few minutes for new copies to become available before use.
Note: You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects set in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.
To create a pipeline that uses reference data, use the CWL graphical mode (important restriction: reference data cannot currently be used in pipelines created in advanced mode). Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options an end user can choose from and a default selection. You can select more than one file, but only one at a time, so repeat the process to select multiple reference files. If you select only one reference file, that file will be the only one users can use with your pipeline. In the screenshot, reference data with two options is presented.
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings.
After clicking the magnifying glass icon the user can select from provided options.
ICA supports running pipelines defined using the Common Workflow Language (CWL).
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement
with a custom namespace.
Reference for available compute types and sizes.
The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the table).
For example, consider a ResourceRequirement such as the following sketch (the values are illustrative and assume the standard-large preset provides 8 vCPUs and 32 GB of memory):
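```yaml
requirements:
  ResourceRequirement:
    # Request at least 6 cores and ~20 GiB of RAM (ramMin is expressed in mebibytes)
    coresMin: 6
    ramMin: 20480
```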
This would result in a best fit of standard-large
ICA Compute Type request for the tool.
If the specified requirements cannot be met by any of the presets, the task will be rejected and will fail.
FPGA requirements cannot be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
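A sketch of such an overrides document, following the cwltool overrides format (the tool file name, variable, and value are placeholders):

```json
{
    "cwltool:overrides": {
        "my-tool.cwl": {
            "requirements": {
                "EnvVarRequirement": {
                    "envDef": {
                        "MESSAGE": "overridden-value"
                    }
                }
            }
        }
    }
}
```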
A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines.
Select CWL or Nextflow to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
Pipelines use the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it to the pipeline.
Individual Pipeline files are limited to 100 Megabytes. If you need to add more than this, split your content over multiple files.
You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.
The following sections describe the tool properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL, you can choose how to define the pipeline; Nextflow pipelines are always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The Information tab provides options for configuring basic information about the pipeline.
The following information becomes visible when viewing the pipeline.
In addition, the clone function will be shown (top-right). When cloning a pipeline, you become the owner of the cloned pipeline.
The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
This page is used to specify all relevant information about the pipeline parameters.
The Analysis Report tab provides options for configuring pipeline execution reports. The report is composed of widgets added to the tab.
The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.
[Optional] Import widgets from another pipeline.
Select Import from other pipeline.
Select the pipeline that contains the report you want to copy.
Select an import option: Replace current report or Append to current report.
Select Import.
From the Analysis Report tab, select Add widget, and then select a widget type.
Configure widget details.
Select Save.
The Common Workflow Language main script.
The Nextflow configuration settings.
The Nextflow project main script.
Multiple files can be added to make pipelines more modular and manageable.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard
(default - AWS on-demand) or economy
(AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small
.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using the shared storage available at /ces. This is the storage provided with the Small/Medium/Large+ compute types, and it is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB of memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
* The compute type "fpga-small" is no longer available. Use 'fpga-medium' instead. fpga-large offers little performance benefit at additional cost.
** The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.
Use the following instructions to start a new analysis for a single pipeline.
Select a project.
From the project menu, select Flow > Pipelines.
Select the pipeline to run.
Select Start a New Analysis.
Configure analysis settings. See Analysis Properties.
Select Start Analysis.
View the analysis status on the Analyses page.
Requested—The analysis is scheduled to begin.
In Progress—The analysis is in progress.
Succeeded—The analysis is complete.
Failed and Failed Final—The analysis has failed or was aborted.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
The following sections describe the analysis properties that can be configured in each tab.
The Analysis tab provides options for configuring basic information about the analysis.
You can view analysis results on the Analyses page or in the output_folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
On the Result tab, select an output file.
To preview the file, select the View tab.
Add or remove any user or technical tags, and then select Save.
To download, select Schedule for Download.
View additional analysis result information on the following tabs:
Details—View information on the pipeline configuration.
Logs—Download information on the pipeline process.
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Tools — Pipeline components that are configured to process data input files. See .
Pipelines — One or more tools configured to process input data and generate output files. See .
Analyses — Launched instance of a pipeline with selected input data. See .
ICA supports running pipelines defined using Nextflow. See for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
Info | Details |
---|
You can select the Nextflow version while building a pipeline as follows:
interface |
---|
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard
(default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy
(AWS spot instance) tiers.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation
to scheduler.illumina.com/presetSize
and the value
to the desired compute type. A list of available compute types can be found . The default compute type, when this directive is not specified, is standard-small
(2 CPUs and 8 GB of memory).
Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu
and memory
directives, so instead, you can dynamically set the pod
directive, as mentioned above. For example (a sketch; the parameter name and process body are illustrative):
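```groovy
process ALIGN_READS {
    // Choose the compute preset at runtime from a pipeline parameter,
    // falling back to a medium instance when it is not supplied
    pod annotation: 'scheduler.illumina.com/presetSize', value: params.align_preset ?: 'standard-medium'

    script:
    """
    echo "Running with preset: ${params.align_preset ?: 'standard-medium'}"
    """
}
```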
Additionally, it can also be specified in the Nextflow configuration file (nextflow.config). Example configuration file (a sketch with illustrative process selectors):
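```groovy
process {
    withName: 'ALIGN_READS' {
        pod = [annotation: 'scheduler.illumina.com/presetSize', value: 'standard-large']
    }
    withLabel: 'fpga' {
        pod = [annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium']
    }
}
```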
For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir
directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir
process due to insufficient disk space, resulting in incomplete output delivery.
Workarounds:
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
If no Docker image is specified, Ubuntu will be used as default.
The following configuration settings will be ignored if provided as they are overridden by the system:
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains following attributes:
code: a unique id. Required.
format: specifying the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible: example below. Required.
type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.
required: is this input required for the execution of a pipeline? Required.
multiValue: are multiple files as an input allowed? Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
An example of a single file input which can be in a TXT, CSV, or FASTA format.
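A sketch of such a definition is shown below; element casing, any namespace declarations, and the exact syntax for listing multiple formats should be taken from the input form schema, so treat the values here as illustrative:

```xml
<DataInputs>
    <DataInput code="in_file" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
        <label>Input file</label>
        <description>Single input file in TXT, CSV or FASTA format.</description>
    </DataInput>
</DataInputs>
```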
To use a folder as an input the following form is required:
For multiple files, set the multiValue attribute to true. The corresponding variable is then treated as a list ([]), so adapt your pipeline accordingly when changing an input from single value to multiValue.
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain following attributes:
code: unique id. This is the parameter name that is passed to the workflow
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues
should be set to 1.
maxValues: how many values (at most) should be specified for this setting
classification: is this setting specified by the user?
In the code below a string setting with the identifier inp1 is specified.
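An illustrative sketch (the step and tool identifiers are placeholders, and the exact node casing follows the input form schema):

```xml
<steps>
    <step code="general">
        <label>General</label>
        <tool code="general_settings">
            <label>Settings</label>
            <parameter code="inp1" minValues="1" maxValues="1" classification="USER">
                <label>inp1</label>
                <description>Example string setting passed to the workflow as params.inp1</description>
                <stringType/>
                <value>default-value</value>
            </parameter>
        </tool>
    </step>
</steps>
```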
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value
tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.
Option types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
Option types can also be used to specify a boolean, for example
For a string setting the following schema with an element stringType
is to be used.
For a boolean setting, booleanType
can be used.
One known limitation of the schema presented above is the inability to specify a parameter that can have multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for the File input and a second for the String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below you can find both a main.nf and the XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
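A minimal main.nf implementing this pattern might look as follows (the process and messages are illustrative; the matching XML would declare file as an optional DataInput and str as an optional string parameter):

```groovy
nextflow.enable.dsl = 2

params.file = null
params.str  = null

process SHOW_INPUT {
    input:
    val message

    output:
    stdout

    script:
    """
    echo "${message}"
    """
}

workflow {
    if (params.file) {
        // The file parameter takes precedence when both inputs are provided
        SHOW_INPUT(Channel.fromPath(params.file).map { f -> "Using file input: ${f.name}" })
    } else if (params.str) {
        SHOW_INPUT(Channel.of("Using string input: ${params.str}"))
    } else {
        error "Neither the 'file' nor the 'str' parameter was provided; please supply at least one."
    }
}
```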
Object Class | ICA Status |
---|---|
Variable | Location |
---|---|
Copy Data Rights | Source Project | Destination Project |
---|---|---|
Copy Data Restrictions | Source Project | Destination Project |
---|---|---|
Move Data Rights | Source Project | Destination Project |
---|---|---|
Move Data Restrictions | Source Project | Destination Project |
---|---|---|
ICA Project Region | AWS Region |
---|---|
Region | Role ARN |
---|---|
Error Type | Error Message | Description/Fix |
---|---|---|
ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the overrides feature.
Field | Entry |
---|
Field | Entry |
---|
Menu | Description |
---|
Widget | Settings |
---|
Placeholder | Description |
---|
See
Field | Entry |
---|
Inputs are specified via the XML-based or JSON-based input form. The specified code
in the XML will correspond to the field in the params
object that is available in the workflow. Refer to the for an example.
Outputs for Nextflow pipelines are uploaded from the out
directory in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.
Use "" instead of "copy" in the publishDir
directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
Use the latest version of Nextflow supported (22.04.0) and enable the "failOnError" publishDir
option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.
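For instance, a process could combine both workarounds as follows (a sketch; the out directory is the upload root described above, and failOnError availability depends on the Nextflow version):

```groovy
process SUMMARIZE {
    // Symlink results into the 'out' directory instead of copying them,
    // and fail loudly if publishing does not succeed
    publishDir 'out', mode: 'symlink', failOnError: true

    output:
    path 'summary.txt'

    script:
    """
    echo "analysis summary" > summary.txt
    """
}
```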
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see ). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.