When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.
The platform is hosted in the regions listed below.
Australia (AU)
Canada (CA)
Germany (EU)
India (IN)
Indonesia (ID)
Japan (JP)
Singapore (SG)
South Korea (KR)
United Kingdom (GB)
United Arab Emirates (AE)
United States (US)
The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user interface is hosted alongside the API to deliver interactive visualization of the resources and to enable additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options can be specified for applications to fine-tune efficiency.
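As a rough, non-authoritative sketch of what calling the API looks like, the request below lists projects using an API Key; the exact endpoint path and the X-API-Key header are assumptions here, so consult the API reference for the authoritative routes and authentication options.
# Hypothetical example: list projects through the REST API (verify path and header in the API reference)
curl -s -H "X-API-Key: $ICA_API_KEY" "https://ica.illumina.com/ica/rest/api/projects"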
The user documentation provides material for learning the basics of interacting with the platform including examples and tutorials. Start with the Get Started documentation to learn more.
Use the search bar on the top right to navigate through the help docs and find specific topics of interest.
If you have any questions, contact Illumina Technical Support by phone or email:
Illumina Technical Support | [email protected] | 1-800-809-4566
For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.
To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.
To view a list of the products to which you have access, select the 9-dot symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown in brackets.
The More Tools category presents the following options:
My Illumina Dashboard to monitor instruments, streamline purchases, and keep track of upcoming activities.
Link to the Support Center for additional information and help.
Link to order management, where you can keep track of your current and past orders.
In the Release Notes section of the documentation, posts are made for new versions and deployments of the core platform components.
Get Started gives an overview of how to access and configure ICA with Network settings providing access prerequisites.
To see what is new in the latest version, use the software release notes and the document revision history.
The home section provides an overview of the main ICA sections such as projects (the main work location), bundles (asset packages), logging, metadata (to capture additional information), Docker and tool images (containerised applications) and how to configure (your own) storage.
Projects are your primary work locations which contain your data and samples. Here you will create pipelines and use them for analyses. You configure who can access your project by means of the teams settings. The results can be processed with the help of Base, Bench or Cohorts. Projects can be considered as a binder for your work and information.
This section contains information on how to download, install, authenticate, configure, and use the command line interface as an alternative to the ICA GUI.
Information and tutorials on cloud analysis auto launch.
There is a set of step-by-step Tutorials.
In the Reference section, you can find more information on the API, pricing, security, and compliance.
For an overview of the available subscription tiers and functionality, please refer to this page on the Illumina website.
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Reference Data: Reference Data for graphical CWL flows.
Pipelines: One or more tools configured to process input data and generate output files.
Analyses: Launched instances of a pipeline with selected input data.
The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:
Event date and time
Category (error, warn or info)
Code
Description
Tenant
Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.
Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.
Every Base user has one Snowflake username: ICA_U_<id>
For each user and project/bundle combination, a role is created: ICA_UR_<id>_<name project/bundle>__<id>
This role receives the viewer or contributor role of the project/bundle, depending on their permissions in ICA.
Every project or bundle has a dedicated Snowflake database.
For each database, 2 roles are created:
<project/bundle name>_<id>_VIEWER
<project/bundle name>_<id>_CONTRIBUTOR
The viewer role receives:
REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.
Grants on the viewer roles of the bundles linked to the project.
The contributor role receives the following rights on current and future objects in the PUBLIC schema of the project's/bundle's database:
ownership
select, insert, update, delete, truncate and references on tables/views/materialized views
usage on sequences/functions/procedures/file formats
write, read and usage on stages
select on streams
monitor and operate on tasks
It also receives a grant on the viewer role of the project.
For each project (not bundle!), 2 warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.
<projectname>_<id>_QUERY
<projectname>_<id>_LOAD
The CLI supports outputs in table, JSON, and YAML formats. The format is set using the output-format configuration setting through a command line option, environment variable, or configuration file.
Dates are output as UTC times when using JSON/YAML output format and local times when using table format.
To set the output format, use the following setting:
--output-format <string>
json - Outputs in JSON format
yaml - Outputs in YAML format
table - Outputs in tabular format
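For example, assuming the CLI is already authenticated, the same listing command can be rendered in any of these formats; the environment variable shown below is simply derived from the ICAV2_ naming convention described in the configuration section.
# Default table output
icav2 projects list
# JSON output, convenient for scripting
icav2 projects list --output-format json
# Or set the format for the whole shell session instead (derived variable name)
export ICAV2_OUTPUT_FORMAT=yaml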
Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications.
Since there is a storage cost associated with the data in your projects, it is good practice to regularly check how much cost is being generated by your projects and evaluate which data can be removed from cloud storage. The instructions provided here will help you quickly determine which data is generating the highest storage costs.
To see how much storage cost is currently being generated for your tenant, you can look at the usage explorer at https://platform.illumina.com/usage/ or, from within ICA, navigate to the 9-dot symbol in the top right next to your name and choose the usage explorer from the menu.
From the usage explorer overview screen, you can see below the graphical representation which projects are incurring the highest storage costs.
When you have determined which projects are incurring the largest storage costs, you can find out which files within that project are taking up the most space. To find the largest files in your project,
Go to Projects > your_project > Data and switch to list view with the list view icon located above your files on the left.
Use the column icon at the top right to add the Size column to your view. You can drag the Size column to the desired position in your list view or use the move left and move right options which appear when you select the three vertical dots.
Select Sort descending to show the largest files first.
Once you have the list sorted like this, you can evaluate whether those large files are still needed, whether they can be sent to archive (Manage > Archive), or whether they can be deleted (Manage > Delete).
ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.
This video is an overview of Illumina Connected Analytics. It walks through a Multi-Omics Cancer workflow.
Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypic and clinical metadata, and molecular features including germline variants, somatic variants, and gene expression.
Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.
Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.
Out-of-the-box statistical analyses including genetic burden tests, GWAS/PheWAS.
Rich public data sets covering key disease areas to enrich private data analysis.
Easy-to-use visualizations for gene prioritization and genetic variation inspection.
ICA Cohorts contains a variety of freely available data sets covering different disease areas and sequencing technologies; a list of currently available data is provided in the documentation.
Please see the Cloud Analysis Auto-Launch documentation for all related content.
The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking):
Sync data between your local computer or server and the project's cloud-based data storage.
Link data between individual projects.



Download links for the CLI can be found at the Release History.
After the file is downloaded, place the CLI in a folder that is included in your $PATH environment variable list of paths, typically /usr/local/bin. Open the Terminal application, navigate to the folder where the downloaded CLI file is located (usually your Downloads folder), and run the following command to copy the CLI file to the appropriate folder. If you do not have write access to your /usr/local/bin folder, then you may use sudo (which requires a password) prior to the cp command. For example:
sudo cp icav2 /usr/local/bin
If you do not have sudo access on your system, contact your administrator for installation. Alternatively, you may place the file in an alternate location and update your $PATH to include this location (see the documentation for your shell to determine how to update this environment variable).
You will also need to make the file executable so that the CLI can run:
sudo chmod a+x /usr/local/bin/icav2
You will likely want to place the CLI in a folder that is included in your $PATH environment variable list of paths. On Windows, you typically want to save your applications in the C:\Program Files folder. If you do not have write access to that folder, open a CMD window in administrative mode (hold down the SHIFT key as you right-click on the CMD application and select "Run as administrator"). Type in the following commands (assuming you have saved icav2.exe in your current folder):
mkdir "C:\Program Files\Illumina"
copy icav2.exe "C:\Program Files\Illumina"
Then make sure that the C:\Program Files\Illumina folder is included in your %path% list of paths. Please do a web search for how to add a path to your %path% system environment variable for your particular version of Windows.
icav2 allows project data to be mounted on a local system. This feature is currently available on Linux and Mac systems only. Although not supported, users have successfully used Windows Subsystem for Linux (WSL) on Windows to run the icav2 projectdata mount command. Please refer to the WSL documentation for installing WSL.
icav2 (>=2.3.0) installed and authenticated in the local system.
For Mac, refer to macFuse.
For other operating systems, refer to OS specific documentation for FUSE driver installation.
A project created on ICA v2 with data in it. If you don't already have a project, please follow the instructions here to create a project.
Identify the project id by running the following command:
% icav2 projects list
ID                                    NAME    OWNER
422d5119-708b-4062-b91b-b398a3371eab  demo    b23f3ea6-9a84-3609-bf1d-19f1ea931fa3
Provide the project id under the "ID" column above to the mount command to mount the project data for the project.
% icav2 projectdata mount mnt --project-id 422d5119-708b-4062-b91b-b398a3371eab
Check the content of the mount.
% ls mnt
sampleX.final.count.tsv
icav2 utilizes the FUSE driver to mount project data, providing both read and write capabilities. However, there are some limitations on the write capabilities that are enforced by the underlying AWS S3 storage. For more information, please refer to this page.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
You can unmount the project data using the 'unmount' command.
% icav2 projectdata unmount
Project with identifier 422d5119-708b-4062-b91b-b398a3371eab was unmounted from mnt.
In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:
subject:
demographics such as age, sex, ancestry;
phenotypes and diseases;
biometrics such as body height, body mass index, etc.;
pathological classification, tumor stages, etc.;
family and patient medical history;
sample:
sample type such as FFPE,
tissue type,
sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.
You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.
During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.
A metadata sheet will need to contain at least these four columns per row:
Subject ID - identifier referring to individuals; use the column header "SubjectID".
Sample ID - identifier for a sample. Sample IDs need to match the corresponding column header in VCF/GVCFs; each subject can have multiple samples, these need to be specified in individual rows for the same SubjectID; use the column header "SampleID".
Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).
Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
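For illustration only, a minimal metadata sheet using just the four required columns could look like the rows below; the subject and sample identifiers are hypothetical, the values come from the allowed values listed above, and the actual file must use tab characters between columns.
SubjectID    SampleID     DM_Sex    TC
SUBJ-001     SAMPLE-001   Female    Whole genome sequencing
SUBJ-002     SAMPLE-002   Male      RNA-seq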
A description of all attributes and data types currently supported by ICA Cohorts can be found here: ICA_Cohorts_Supported_Attributes.xlsx
You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here: ICA_Cohorts_Example_Metadata.tsv
A list of concepts and diagnoses that cover all public data subjects to easily navigate the new concept code browser for diagnosis can be found here: PublicData_AllConditionsSummarized.xlsx
Once a cluster is started, the cluster manager can be accessed from the workspace node.
Every cluster member has a certain capacity which is determined by the selected Resource model for the cluster member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscription of a node which can result in Out-Of-Memory or unresponsiveness. You need to ensure these limits are not exceeded.
To ensure stability of the system, some headroom is deducted from the total node capacity.
These two values are used by the SGE auto scaler when running in dynamic mode. The SGE auto scaler will summarise all pending jobs and their requested resources to determine the scale up/down operation within the defined range.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler only executes one scale up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will wait indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log
Submitting a single job:
qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh
Submitting a job array:
qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
Listing all members of the cluster:
qhost
Listing all jobs in the cluster:
qstat -f
Showing the details of a job:
qstat -f -j <jobId>
Deleting a job:
qdel <jobId>
Showing the details of an executed job:
qacct -j <jobId>
SGE command line options and configuration details can be found here.
The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.
ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.
Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.
Circles in the Manhattan plot indicate binary traits, i.e., potential associations between genes and diseases. Triangles indicate quantitative phenotypes with a regression Beta different from zero, and point up or down to depict positive or negative correlation, respectively.
Hovering over a circle/triangle will display the following information:
gene symbol
variant group (see below)
P-value, both raw and FDR-corrected
number of carriers of variants of the given type
number of carriers of variants of any type
regression Beta
For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any one of these five classes, or select All to display all results together.
Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.
Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant, splice_acceptor_variant, start_lost, transcript_ablation, transcript_truncation, exon_loss_variant, gene_fusion, or bidirectional_gene_fusion.
missense_all: all missense variants regardless of their pathogenicity.
missense, PrimateAI optimized (missense_pAI_optimized): only pathogenic missense variants with primateAI score greater than a gene-specific threshold.
missenses and PTVs (missenses_and_ptvs_all): the union of all PTVs, SpliceAI > 0.2 variants and all missense variants regardless of their pathogenicity scores.
all synonymous variants (syn).
To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop down box, which defaults to Whole genome.
To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop down, which defaults to All.
Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. There are both read and write capabilities with some limitations on write capabilities that are enforced by the underlying AWS S3 storage.
As a user, you are allowed to do the following actions from Bench (when having the correct user permissions compared to the workspace permissions) or through the CLI:
Copy project data
Delete project data
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
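As a minimal sketch (file and folder names are hypothetical), copying between the mounted project data and the local workspace uses ordinary shell commands:
# Copy a project file into the local workspace
cp /data/project/inputs/sample1.vcf.gz /data/
# Copy a locally produced result back into the project as a new file
cp /data/results.tsv /data/project/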
The FUSE driver also allows you to delete data from your project. This differs from earlier Bench behavior, where you worked on a local copy while the original file was kept in your project.
Deleting project data through Bench workspace through the FUSE driver will permanently delete the data in the Project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users will be able to use the CLI without any further actions, Mac users will need to install the kernel extension from macFuse.
Mount and unmount of data needs to be done through the CLI. In Bench this happens automatically and is not needed anymore.
Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update files or save your notebook in the project folder will typically result in an error such as File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.
Some examples of other actions or commands that will not work because of the above mentioned limitations:
Save a jupyter notebook or R script on the /project location
Add/remove a file from an existing zip file
Redirect with append to an existing file e.g. echo "This will not work" >> myTextFile.txt
Rename a file due to the existing association between ICA and AWS
Move files or folders.
Using vi or another editor
A file can be written only sequentially. This is a restriction that comes from the library the FUSE driver uses to store data in AWS. That library supports only sequential writing, random writes are currently not supported. The FUSE driver will detect random writes and the write will fail with an IO error return code. Zip will not work since zip writes a table of contents at the end of the file. Please use gzip.
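For instance (paths are hypothetical), compress with gzip before writing to the mounted project location instead of building a zip archive there:
# zip is expected to fail on the mount because it rewrites its table of contents at the end of the file
# zip /data/project/archive.zip /data/results/*.txt
# Compress locally with gzip, then copy the result to the project as a single sequential write
gzip -k /data/results/run1_metrics.txt
cp /data/results/run1_metrics.txt.gz /data/project/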
Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has just been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.
To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.
ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a Docker image with access to the data and pipelines within a project. The workspace uses Amazon S3 storage and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when not in use.
Having access to Bench depends on the following conditions:
Bench needs to be included in your ICA subscription.
The project owner needs to enable Bench for their project.
Individual users of that project need to be given access to Bench.
After creating a project, go to the Projects > your_project > Bench > Workspaces page and click the Enable button. The entitlements you have determine the available resources for your Bench workspaces. If you have multiple entitlements, all the resources of your individual entitlements are taken into account. Once Bench is enabled, users with matching permissions have access to the Bench module in that project.
Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.
Tenant administrators and project owners are always able to access Bench and perform all actions.
The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.
No Access means you have no access to the Bench workspace for that project.
Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.
Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.
Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:
Upload / Download rights (Download rights are mandatory for technical reasons)
Project Level (No Access / Data Provider / Viewer / Contributor)
Flow (No Access / Viewer / Contributor)
Base (No Access / Viewer / Contributor)
ICA supports running pipelines defined using the Common Workflow Language (CWL).
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.
Refer to the compute types reference for available compute types and sizes.
The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the table).
For example, a ResourceRequirement with ramMin: 10240 and coresMin: 6 would result in a best fit of the standard-large ICA Compute Type request for the tool.
If the specified requirements can not be met by any of the presets, the task will be rejected and failed.
FPGA requirements can not be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the CWL overrides feature.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
You can verify the integrity of the data by comparing the hash, which is usually an MD5 (Message Digest Algorithm 5) checksum. This is a common cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum. AWS S3 calculates this checksum when data is uploaded and stores it in the ETag (Entity tag).
For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint specifying the data Id you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure file integrity.
This ETag does not change and can be used as a file integrity check even when that file is archived, unarchived and/or copied to another location. Changes to the metadata have no impact on the ETag
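A minimal sketch of this check, assuming the API Key is passed in an X-API-Key header and that jq is available; the placeholders must be replaced with a real project ID and data ID, and the exact location of objectETag in the response should be confirmed against the API reference.
# Retrieve the file metadata and extract the objectETag field wherever it appears in the response
curl -s -H "X-API-Key: $ICA_API_KEY" \
  "https://ica.illumina.com/ica/rest/api/projects/<projectId>/data/<dataId>" | jq -r '.. | .objectETag? // empty'
# Compute the local MD5 checksum and compare it with the ETag above
md5sum /path/to/local/copy.fastq.gz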
For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.
You can use samples to group information related to a sample, including input files, output files, and analyses. You can consider samples as creating a binder to collect related information. When you then link that sample to another project, you bring over the empty binder, but not the files contained in it. In that project, you can then add your own data to it.
You can search for samples (excluding their metadata) with the Search button at the top right.
To add a new sample, do as follows.
Select Projects > your_project > Samples.
To add a new sample, select + Create, and then enter a unique name and description for the sample.
To add files to the sample, see the instructions below.
Your sample is added to the Samples page. To view information on the sample, select the sample name in the overview.
You can add files to a sample after creating the sample. Any files that are not currently linked to a sample are listed on the Unlinked Files tab.
To add an unlinked file to a sample, do as follows.
Go to Projects > your_project > Samples > Unlinked files tab.
Select a file or files, and then select one of the following options:
Create sample — Create a new sample that includes the selected files.
Link to sample — Select an existing sample in the project to which you link the file.
Alternatively, you can add unlinked files from the sample details.
Go to Projects > your_project > Samples.
Select your sample to open the details.
Go to the Data tab and select link.
If the data is not in your project, you will need to add it in Projects > your_project > Data
Data can only be linked to a single sample, so once you have linked data to a sample, it will no longer appear in the list of data to choose from.
To remove files from samples,
Go to Projects > your_project > Samples > your_sample > Data
Select the files you want to remove.
Select Unlink.
A Sample can be linked to a project from a separate project to make it available in read-only capacity.
Navigate to the Samples view in the Project
Click the Link button
Select the Sample(s) to link to the project
Click the Link button
Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples also must be available in a complete state in order to be linked.
If you want to remove a sample, select it and use the delete option from the top navigation row. You will be presented a choice of how to handle the data in the sample.
Unlink all data without deleting it.
Delete input data and unlink other data.
Delete all data.
Running a Spark application in a Bench Spark Cluster
The JupyterLab environment is by default configured with 3 additional kernels
PySpark – Local
PySpark – Remote
PySpark – Remote - Dynamic
When one of the above kernels is selected, the spark context is automatically initialised and can be accessed using the sc object.
The PySpark - Local runtime environment launches the spark driver locally on the workspace node and all spark executors are created locally on the same node. It does not require a spark cluster to run and can be used for running smaller spark applications which don’t exceed the capacity of a single node.
The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.
The PySpark – Remote runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will not dynamically spin up executors, hence it will not trigger the cluster to auto scale when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.
The PySpark – Remote - Dynamic runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will increase/decrease the required executors, which will result in a cluster that auto scales when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.
Every cluster member has a certain capacity depending on the resource model selected for the member.
A Spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).
The spark.task.cpus setting defines the number of CPUs needed per task.
The spark.executor.cores and spark.executor.memory settings define the size of a single executor which handles the execution of tasks.
With spark.task.cpus set to 1, spark.executor.cores set to 4, and spark.executor.memory set to 4g (as in the example settings later in this document), an executor can handle 4 tasks concurrently which share a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB for 8 concurrent executors on a single cluster member).
The Spark UI can be accessed via the Cluster. The Web Access URL is displayed in the Workspace details page
This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.
See the Apache Spark website for more information.
Bench workspaces require setting a Docker image to use as the image for the workspace. Illumina Connected Analytics (ICA) provides a default Docker image with JupyterLab installed.
JupyterLab supports notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.
The JupyterLab Docker image contains a set of environment variables, which are listed in the examples later in this document.
Included in the default JupyterLab Docker image is a Python library with APIs to perform actions in ICA, such as adding data, launching pipelines, and operating on Base tables. The Python library is generated from the ICA API specification.
The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab Docker image.
See the tutorials for examples on using the ICA Python library.
The ICA CLI uses an Illumina API Key to authenticate. An Illumina API Key can be acquired through the product dashboard after logging into a domain.
Authenticate using icav2 config set command. The CLI will prompt for an x-api-key value. Input the API Key generated from the product dashboard here. See the example below (replace EXAMPLE_API_KEY with the actual API Key).
The CLI will save the API Key to the config file as an encrypted value.
If you want to overwrite existing environment values, use the command icav2 config set.
To remove an existing configuration/session file, use the command icav2 config reset.
Check the server and confirm you are authenticated using icav2 config get
If during these steps or in the future you need to reset the authentication, you can do so using the command: icav2 config reset
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Reference data properties are located at the main navigation level and consist of the following free text fields.
Types
Species
Reference Sets
Once these are configured, do as follows:
Go to your data in Projects > your_project > Data.
Select the data you want to use as reference data and choose Manage > Use as reference data.
Fill out the configuration screen.
You can see the result at the main navigation level > Reference Data (outside of projects) or in Projects > your_project > Flow > Reference Data.
To use a reference set from within a project, you have first to add it. Select Projects > your_project > Flow > Reference Data > Link. Then select a reference set to add to your project.
Navigate to Reference Data (Not from within a project, but outside of project context, so at the main navigation level).
Select the data set(s) you wish to add to another region and select Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by opening the Reference set and viewing Copy Details.
Allow a few minutes for new copies to become available before use.
To create a pipeline with reference data, use the CWL Graphical mode: Projects > your_project > Flow > Pipelines > +Create > CWL Graphical. Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end user to choose from and a default selection. You can select more than one file, but only one at a time (so repeat the process to select multiple reference files). If you only select one reference file, that file will be the only one users can use with your pipeline. The screenshot shows reference data with two options.
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings. After clicking the magnifying glass icon the user can select from provided options.
Specifying the ICA compute type for a CWL tool via the custom resources namespace:
requirements:
  ResourceRequirement:
    https://platform.illumina.com/rdf/ica/resources:type: fpga
    https://platform.illumina.com/rdf/ica/resources:size: small
    https://platform.illumina.com/rdf/ica/resources:tier: standard

Example ResourceRequirement using ramMin and coresMin (best fit: standard-large):
requirements:
  ResourceRequirement:
    ramMin: 10240
    coresMin: 6

Example CLI command using CWL overrides to change an environment variable requirement at load time:
icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
    "tool-fqTOfa.cwl": {
      "requirements": {
        "EnvVarRequirement": {
          "envDef": {
            "MESSAGE": "override_value"
          }
        }
      }
    }
  }
}' --type-input JSON --user-reference overrides-example

Example main.nf for the MD5 checksum pipeline:
nextflow.enable.dsl = 2

process md5sum {
    container "public.ecr.aws/lts/ubuntu:22.04"
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    publishDir "out", mode: 'symlink'

    input:
    file txt

    output:
    stdout emit: result
    path '*', emit: output

    script:
    txt_file_name = txt.getName()
    id = txt_file_name.takeWhile { it != '.' }
    """
    set -ex
    echo "File: $txt_file_name"
    echo "Sample: $id"
    md5sum ${txt} > ${id}_md5.txt
    """
}

workflow {
    txt_ch = Channel.fromPath(params.in)
    txt_ch.view()
    md5sum(txt_ch).result.view()
}

Corresponding pipeline definition XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTQ" type="FILE" required="true" multiValue="true">
            <pd:label>Input</pd:label>
            <pd:description>FASTQ files input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

Environment variables in the JupyterLab Docker image:
ICA_URL: https://ica.illumina.com/ica (ICA server URL)
ICA_PROJECT (Obsolete): ICA project ID
ICA_PROJECT_UUID: Current ICA project UUID
ICA_SNOWFLAKE_ACCOUNT: ICA Snowflake (Base) Account ID
ICA_SNOWFLAKE_DATABASE: ICA Snowflake (Base) Database ID
ICA_PROJECT_TENANT_NAME: Name of the owning tenant of the project where the workspace is created.
ICA_STARTING_USER_TENANT_NAME: Name of the tenant of the user which last started the workspace.
ICA_COHORTS_URL: URL of the Cohorts web application used to support the Cohorts view

Example icav2 config set session:
icav2 config set
Creating /Users/johngenome/.icav2/config.yaml
Initialize configuration settings [default]
server-url [ica.illumina.com]:
x-api-key : EXAMPLE_API_KEY
output-format (allowed values table,yaml,json defaults to table) :
colormode (allowed values none,dark,light defaults to none) :

Example icav2 config get output:
icav2 config get
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary HASHED_EXAMPLE_API_KEY

Spark task and executor settings referenced in the Bench Spark cluster section:
spark.task.cpus 1
spark.executor.cores 4
spark.executor.memory 4g




The Activity view shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.
The Data Transfers tab shows the status of data uploads and downloads. You can sort, search and filter on various criteria and export the information. Show ongoing transfers (top right) allows you to filter out the completed and failed transfers to focus on current activity.
Transfers with a yellow background indicate that service connector rules have been modified in ways that prevent planned files from being uploaded. Please verify your service connectors to resolve this.
The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., Copy table, export table, Select * from table, etc.)
The jobs are shown with their:
Creation time: When did the job start
Description: The query or the performed action with some extra information
Type: Which action was taken
Status: Failed or succeeded
Duration: How long the job took
Billed bytes: The number of bytes billed for the job
Failed jobs provide information on why the job failed. Details are accessed by double-clicking the failed job. Jobs in progress can be aborted here.
The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table, etc.) Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown and available for download as Excel or JSON. To get the data for the last year without limit on the number of records, use the export as file function. No activity data is retained for more than one year.
The activities are shown with:
Start Time: The moment the action was started
Query: The SQL expression.
Status: Failed or succeeded
Duration: How long the job took
User: The user that requested the action
Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K
The Bench Activity tab shows the actions taken on Bench Workspaces in the project.
The activities are shown with:
Workspace: Workspace where the activity took place
Date: Date and time of the activity
User: User who performed the activity
Action: Which activity was performed
The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.
Which batch jobs are visible depends on the user role:
Project Creator: All batch jobs
Project Collaborator (same tenant): All batch jobs
Project Collaborator (different tenant): Only batch jobs of their own tenant
This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.
Create a new Project to track your study:
Log in to ICA.
Navigate to Projects
Create a new project using the New Project button.
Give your project a name and click Save.
Navigate to the ICA Cohorts module by clicking Cohorts in the left navigation panel.
Click Create Cohort button.
Enter a name for your cohort, like Rare Disease + 1kGP, at the top, to the left of the pencil icon.
From the Public Data Sets list select:
DRAGEN-1kGP
All Rare genetic disease cohorts
Notice that a cohort can also be created based on Technology, Disease Type and Tissue.
Under Selected Conditions in right panel, click on Apply
A new page opens with your cohort in a top-level tab.
Expand Query Details to see the study makeup of your cohort.
A set of 4 Charts will be open by default. If they are not, click Show Charts.
Use the gear icon in the top-right of the Charts pane to change chart settings.
The bottom section is demarcated by 8 tabs (Subjects, Marker Frequency, Genes, GWAS, PheWAS, Correlation, Molecular Breakdown, CNV).
The Subjects tab displays a list of exportable Subject IDs and attributes.
Clicking on a Subject ID link pops up a Subject details page.
A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.
First let’s Hide charts for more visual space.
Click the Genes tab where you need to query a gene to see and interact with results.
Type SCN2A into the Gene search field and select it from autocomplete dropdown options.
The Gene Summary tab now lists information and links to public resources about SCN2A.
Click on the Variants tab to see an interactive Legend and analysis tracks.
The Needle Plot displays gnomAD Allele Frequency for variants in your cohort.
Note that some are in SCN2A conserved protein domains.
In Legend, switch the Plot by option to Sample Count in your cohort.
In Legend, uncheck all Variant Types except Stop gained. Now you should see 7 variants.
Hover over pin heads to see pop-up information about particular variants.
The Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" as cross-species sequence is highly conserved; you often see high conservancy at the functional domains. Points below the 25th percentile may be considered "likely benign".
The Pathogenic variants track displays markers from ClinVar color-coded by variant type. Hover over to see pop-ups with more information.
The Exons track shows mRNA exon boundaries with click and zoom functionality at the ends.
Below the Needle Plot and analysis tracks is a list of "Variants observed in the selected cohort"
Export Gene Variants table icon is above the legend on right side.
Now let's click on the Gene Expression tab to see a Bar chart of 50 normal tissues from GTEx in transcripts per million (TPM). SCN2A is highly expressed in certain Brain tissues, indicating specificity to where good markers for intellectual disability and autism could be expected.
As a final exercise in discovering good markers, click on the tab for Genetic Burden Test. The table here associates Phenotypes with Mutations Observed in each Study selected for our cohort, alongside Mutations Expected, to derive p-values. Given all considerations above, SCN2A is a good marker for intellectual disability (p < 1.465 x 10^-22) and autism (p < 5.290 x 10^-9).
Continue to check the other genes of interest in step 1.
You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons left-navigation tab of the Cohorts module.
Select Cohorts from the left-navigation panel.
Select 2 to 4 cohorts already created. If you have not created any cohorts, see the Create a Cohort documentation.
Click Compare Cohorts in the right-navigation panel.
Note you are now in the Comparisons left-navigation tab of the Cohorts module.
In the Charts Section, if the COHORTS item is not displayed, click the gear icon in the top right and add Cohorts as the first attribute and click Save.
The COHORTS item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.
For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.
You can share a comparison with other team members in the same ICA Project. Please refer to the section on "Sharing a Cohort" in "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which are analogous for sharing comparisons.
Select the Attributes tab
Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.
For example, use the drop-down arrow next to Vital status to view sub-categories and frequencies across selected cohorts.
Select the Genes tab
Search for a gene of interest using its HUGO/HGNC gene symbol
Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see Cohort analysis -> Genes in this online help for more details)
As additional filter options, you can view only those variants that occur in every cohort; that are unique to one cohort; that have been observed in at least two cohorts; or any variant.
Select the Survival Summary tab.
Attribute categories are listed and can be expanded using the down-arrows next to the category names.
Select the drop-down arrow for Therapeutic interventions.
In each subcategory there is a sum of the subject counts across select cohorts.
For each cohort, designated by a color, there is a Subject count and Median survival (years) column.
Type Malignancy in the Search Box and an auto-complete dropdown suggests three different attributes.
Select Synchronous malignancy and the results are automatically opened and highlighted in orange.
Click Survival Comparison tab.
A Kaplan-Meier Curve is rendered based on each cohort.
P-Value Displayed at the top of Survival Comparison indicates whether there is statistically significant variance between survival probabilities over time of any pair of cohorts (CI=0.95).
When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.
Select the Marker Frequency tab.
Select either Gene expression (default), Somatic mutation, or Copy number variation
For gene expression (up- versus down-regulated) and for copy number variation (gain versus loss), Cohorts will display a list of all genes with bidirectional barcharts
For somatic mutations, the barcharts are unidirectional and indicate the percentage of samples with a mutation in each gene per cohort.
Bars are color-coded by cohort, see the accompanying legend.
Each row shows P-value(s) resulting from pairwise comparison of all cohorts. In the case of comparing two cohorts, the numerical P-value will be displayed in the table. In the case of comparing three or more cohorts, the pairwise P-values are shown as a triangular heatmap, with details available as a tooltip.
Select the Correlation tab.
Similar to the single-cohort view (Cohort Analysis | Correlation), choose two clinical attributes and/or genes to compare.
Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.
The ICA CLI is a useful tool for uploading, downloading and viewing information about data stored within ICA projects. If not already authenticated, please see the Authentication section of the CLI help pages. Once the CLI has been authenticated with your account, use the command below to list all projects:
icav2 projects list
The first column of the output (table format, which is default) will show the ID. This is the project ID and will be used in the examples below.
To upload a file called Sample-1_S1_L001_R1_001.fastq.gz to the project, copy the project id and use the command syntax below:
icav2 projectdata upload Sample-1_S1_L001_R1_001.fastq.gz --project-id <project-id>
To verify the file has uploaded, run the following to get a list of all files stored within the specified project:
icav2 projectdata list --project-id <project-id>
This will show a file ID starting with fil. which can then be used to get more information about the file and its attributes:
icav2 projectdata get <file-id> --project-id <project-id>
It is necessary to use --project-id in the above example if you have not entered a specific project context. To enter a project context, use the command below.
icav2 projects enter <project-name or project-id>
This will infer the project id, so that it does not need to be entered into each command.
The ICA CLI can also be used to download files via command line. This can be especially helpful if the download destination is a remote server or HPC cluster that you are logged into from a local machine. To download into the current folder, run the following from the command line terminal:
icav2 projectdata download <file-id> ./
The above assumes you have entered into a project context. If this is not the case, either enter the project that contains the desired data, or be sure to supply the --project-id option in the command.
To fetch temporary AWS credentials for given project data, use the command icav2 projectdata temporarycredentials [path or data Id] [flags]. If the path is provided, the project id from the flag --project-id is used. If the --project-id flag is not present, then the project id is taken from the context. The returned AWS credentials for file or folder upload expire after 36 hours.
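For example (placeholder values shown), temporary credentials can be requested either by path together with --project-id, or by data ID from within an entered project context; the leading-slash path form is an assumption based on how project data paths are displayed elsewhere in this documentation.
# By path, with the project given explicitly
icav2 projectdata temporarycredentials /run1/fastq/ --project-id <project-id>
# By data ID, relying on the current project context
icav2 projectdata temporarycredentials <file-id>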
For information on options such as using the ICA API and AWS CLI to transfer data, visit the Data Transfer Options tutorial.
The ICA CLI accepts configuration settings from multiple places, such as environment variables, configuration file, or passed in as command line arguments. When configuration settings are retrieved, the following precedence is used to determine which setting to apply:
Command line options - Passed in with the command such as --access-token
Environment variables - Stored in system environment variables such as ICAV2_ACCESS_TOKEN
Default config file - Stored by default in the ~/.icav2/config.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.config on Windows
The following global flags are available in the CLI interface:
-t, --access-token string JWT used to call rest service
-h, --help help for icav2
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Environment variables provide another way to specify configuration settings. Variable names align with the command line options with the following modifications:
Upper cased
Prefix ICAV2_
All dashes replaced by underscore
For example, the corresponding environment variable name for the --access-token flag is ICAV2_ACCESS_TOKEN.
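For example, on macOS/Linux the access token and server URL could be supplied through environment variables (names derived from the rules above):
export ICAV2_ACCESS_TOKEN=<your-jwt-token>
export ICAV2_SERVER_URL=ica.illumina.com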
The environment variable ICAV2_ICA_NO_RETRY_RATE_LIMITING allows you to disable the retry mechanism. When it is set to 1, no retries are performed. For any other value, HTTP status code 429 will result in 4 retry attempts:
after 500 milliseconds
after 2 seconds
after 10 seconds
after 30 seconds
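For example, to disable the retry mechanism entirely:
export ICAV2_ICA_NO_RETRY_RATE_LIMITING=1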
Upon launching icav2 for the first time, the configuration yaml file is created and the default config settings are set. Enter an alternative server URL or press enter to leave it as the default. Then enter your API Key and press enter.
After installing the CLI, open a terminal window and enter the icav2 command. This will initialize a default configuration file in the home folder at .icav2/config.yaml.
To reset the configuration, use ./icav2 config reset
Resetting the configuration removes the configuration from the host device and cannot be undone. The configuration needs to be recreated.
Configuration settings are stored in the default configuration file:
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary SMWV6dEXAMPLE

The file ~/.icav2/.session.ica.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.session.ica on Windows will contain the access-token and project-id. These are output files and should not be edited as they are automatically updated.
This variable is used to set the API Key.
Command line options - Passed as --x-api-key <your_api_key> or -k <your_api_key>
Environment variables - Stored in system as ICAV2_X_API_KEY
Default config file - Use icav2 config set to update ~/.icav2/config.yaml (macOS/Linux) or C:\Users\USERNAME\.icav2\.config (Windows)
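For example, the same API Key can be supplied in any of the three ways above (placeholder values shown):
# 1. Command line option
icav2 projects list -k <your_api_key>
# 2. Environment variable
export ICAV2_X_API_KEY=<your_api_key>
# 3. Default config file (interactive prompt)
icav2 config set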
If you are a new user, please consult the Illumina Connected Software Registration Guide for detailed guidance on setting up an account and registering a subscription.
The platform requires a provisioned tenant in the Illumina account management (IAM) system with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access including add users, create workgroups, and add additional tenant administrators.
Each tenant is assigned a domain name used to login to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain_name>.login.illumina.com with <domain_name> replaced by the actual domain name.
New user accounts can be created
by the tenant administrator by logging in to their domain and navigating to Illumina Account Management under their profile at the top right
or by the user by accessing https://platform.login.illumina.com and selecting the option Don't have an account.
For more details on identity and access management, please see the Illumina Connected Software help.
Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups which bundle users with permission to use the ICA application. Registered users can be made workgroup administrators by tenant administrators or existing workgroup administrators.
If you want to use the command-line interface (CLI) or the Application Programming Interface (API), you can use an API Key as credentials when logging in. API Keys operate similarly to a user name and password combination and must be kept secure and rotated on a regular basis (preferably yearly).
When keys are compromised or no longer in use, they must be revoked. This is done through the domain login URL by navigating to the User menu item on the left and selecting "API Keys", followed by selecting the key and using the trash icon next to it.
API Keys are limited to 10 per user and are managed through the product dashboard after logging in through the domain login URL. See Managing API Keys for more information.
{% hint style="warning" %} For security reasons, do not use accounts with administrator level access to generate API keys. Create a specific CLI user with basic permissions instead. This will minimize the possible impact of compromised keys. {% endhint %}
{% hint style="warning" %} Once the API key generation window is closed, the key contents will not be accessible through the domain login page, so be sure to store it securely for future reference. {% endhint %}
The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the Illumina Connected Analytics portal.
On the left, you have the navigation bar (1) which will auto-collapse on smaller screens. To collapse it, use the double arrow symbol (2). When collapsed, use the >> symbol to expand it.
The central part (3) of the display shows the item on which you are performing your actions, with a breadcrumb menu (4) to return to the projects overview or a previous level. You can also use your browser's back button to return to the level from which you came.
At the top right, you have icons to refresh contents (5), Illumina product access (6), access to the online help (7), and user information (8).
The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Find instructions for using the command-line interface including download links for your operating system in the CLI documentation.
The HTTP-based application programming interfaces (APIs) are listed in the API Reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.
{% hint style="info" %} When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token> where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see JSON Web Token (JWT). {% endhint %}
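For illustration, a direct call with an HTTP client could look like the sketch below; the endpoint path is an example only, so check the API Reference for the exact routes:
curl -H "Authorization: Bearer <token>" \
     -H "accept: application/vnd.illumina.v3+json" \
     "https://ica.illumina.com/ica/rest/api/projects"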
The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.
Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.
A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.
Refer to the following pages for instructions to setup supported external cloud storage providers:
The storage configuration requires credentials to connect to your storage. AWS uses these security credentials to authenticate and authorize your requests. You can enter the credentials at System Settings > Credentials > Create > Storage Credential. Long-term access keys consist of an access key ID and a secret access key, used as a set.
Fill out the following fields:
Type—The type of access credentials. This will usually be AWS user.
Name—Provide a name to easily identify your access key.
Access key ID—The access key you created.
Secret access key—Your related secret access key.
You can share the credentials you own with other users of your tenant. To do so select your credentials at System Settings > Credentials and choose Share.
For more information, refer to the documentation.
In the ICA main navigation, select System Settings > Storage > Create.
Configure the following settings for the storage configuration.
Type—Use the default value, e.g., AWS_S3. Do not change.
Region—Select the region where the bucket is located.
Configuration name—You will use this name when creating volumes that reside in the bucket. The name must be between 3 and 63 characters long.
Description—Here you can provide a description for yourself or other users to identify this storage configuration.
Bucket name—Enter the name of your S3 bucket.
Key prefix—You can provide a key prefix so that only files inside the prefix are accessible. Although this setting is optional, using a key prefix is highly recommended, and it is mandatory when using . The key prefix must end with "/".
If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 folder in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.
Using no key prefix (not recommended) results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects.
Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.
Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.
Select Save.
ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used while .
With the action Manage > Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.
The System Settings > Credentials > select your credentials > Manage > Share action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share action. Take into account that once shared, the storage cannot be unshared. Once your shared storage is used in a project, it can also no longer be deleted.
Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.
In the ICA main navigation, select System Settings > Storage > select your storage > Manage > Delete. You can then create a new storage configuration to reuse the bucket name and key prefix.
Every 4 hours, ICA verifies the storage configuration and credentials to ensure availability. When an error is detected, ICA attempts to reconnect once every 15 minutes. After 200 consecutive failed connection attempts (50 hours), ICA stops trying to connect.
When you update your credentials, the storage configuration is automatically validated. In addition, when ICA has stopped trying to connect, you can manually trigger revalidation via System Settings > Storage > select your storage > Manage > Validate.
Refer to this for the troubleshooting guide.
ICA supports the following storage classes. Please see the for more information on each:
If you are using , which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
This section describes how to connect an AWS S3 Bucket with enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on .
Follow the for how to create S3 bucket with SSE-KMS key.
S3-SSE-KMS must be in the same region as your ICA v2.0 project. See the for more information.
In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS). Then select Choose your AWS KMS key.
Once the bucket is set, create a folder with encryption enabled in the bucket that will be linked in the ICA storage configuration. This folder will be connected to ICA as a . Although it is technically possible to use the root folder, this is not recommended as it will cause the S3 bucket to no longer be available for other projects.
Follow the for connecting an S3 bucket to ICA.
In the step :
Add permission to use KMS key by adding kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey
Add the ARN KMS key arn:aws:kms:xxx on the first "Resource"
On Unversioned buckets, the permissions will match the following:
On Versioned OR Suspended buckets, the permissions will match the following:
At the end of the policy setting, there should be 3 permissions listed in the "Summary".
Follow the for how to create a storage configuration in ICA.
On step 3 in the process above, continue with the [Optional] Server Side Encryption to enter the algorithm and key name for server-side encryption processes.
On "Algorithm", input aws:kms
On "Key Name", input the ARN KMS key: arn:aws:kms:xxx
Although "Key prefix" is optional, it is highly recommended to use this and not use the root folder of your S3 bucket. "Key prefix" refers to the folder name in the bucket which you created.
In addition to following the instructions to , the KMS policy must include the following statement for an AWS S3 Bucket with SSE-KMS Encryption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN value):
Bench has the ability to handle containers inside a running workspace. This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.
Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.
The Container Service is accessible from your Bench workspace environment by default.
The container service uses the workspace disk to store any container images you pulled in or created.
To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.
Use either docker or podman cli to interact with the Container Service. Both are interchangeable and support all the standardized operations commonly known.
To run a container, the first step is to either build a container from a source container or pull in a container from a registry.
A public image registry does not require any form of authentication to pull the container layers.
The following command line example shows how to pull in a commonly known image.
To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.
The following command line example shows how to instruct the Container Service to log in to the private registry.hub.docker.com registry.
Depending on the registry setup, you can publish Container Images with or without authentication. If authentication is required, follow the login procedure described above.
The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub.
The following example shows how to save a locally available Container Image as a compressed tar archive.
This lets you upload the archive into the private ICA Docker registry.
The following example shows how to list all locally available Container Images
Container Images require storage capacity on the Bench Workspace disk. The capacity is shown when listing the locally available container images. The container Images are persisted on disk and remain available whenever a workspace stops and restarts.
The following example shows how to clean up a locally available Container Image
A Container Image can be instantiated in a Container running inside a Bench Workspace.
By default the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.
When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.
The following command line example shows how to run an instance of a locally available Container Image as a normal user.
Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.
By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.
The following command line example shows how to run an instance of a locally available Container as root user and install system level packages
When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file.
Building a Container
To build a Container Image, you need to describe the instructions in a Dockerfile.
This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is:
The following command line example will build the actual Container Image
The platform GUI provides the Project Connector utility which allows data to be linked automatically between projects. This creates a one-way dynamic link for files and samples from source to destination, meaning that additions and deletions of data in the source project also affect the destination project. This differs from moving or copying, which create editable copies of the data. In the destination project, you can delete data which has been moved or copied and unlink data which has been linked.
Select the source project (project that will own the data to be linked) from the Projects page (Projects > your_source_project).
Select Project Settings > Details.
Select Edit
Under Data Sharing ensure the value is set to Yes
Select Save
Select the destination project (the project to which data from the source project will be linked) from the Projects page (Projects > your_destination_project).
From the projects menu, select Project Settings > Connectivity > Project Connector
Select + Create and complete the necessary fields.
Check the box next to Active to ensure the connector will be active.
Name (required) — Provide a unique name for the connector.
Type (required) — Select the data type that will be linked (either File or Sample)
Source Project — Select the source project whose data will be linked.
Filter Expression (optional) — Enter an expression to restrict which files will be linked via the connector (see below)
Tags (optional) — Add tags to restrict what data will be linked via the connector. Any data in the source project with matching tags will be linked to the destination project.
The examples below will restrict linking Files based on the Format field.
Only Files with Format of FASTQ will be linked:
[?($.details.format.code == 'FASTQ')]
Only Files with Format of VCF will be linked:
[?($.details.format.code == 'VCF')]
The examples below will restrict linked Files based on filenames.
Exact match to 'Sample-1_S1_L001_R1_001.fastq.gz':
[?($.details.name == 'Sample-1_S1_L001_R1_001.fastq.gz')]
Ends with '.fastq.gz':
[?($.details.name =~ /.*\.fastq.gz/)]
Starts with 'Sample-':
[?($.details.name =~ /Sample-.*/)]
Contains '_R1_':
[?($.details.name =~ /.*_R1_.*/)]
The examples below will restrict linking Samples based on User Tags and Sample name, respectively.
Only Samples with the User Tag 'WGS-Project-1'
[?('WGS-Project-1' in $.tags.userTags)]
Link a Sample with the name 'BSSH_Sample_1':
[?($.name == 'BSSH_Sample_1')]
You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas object, the data can be processed further. In this tutorial, we describe how you could create a Python script which retrieves the data and visualizes it using the Dash framework. The script contains the following parts:
Importing dependencies and variables.
Function to fetch the data from Base table.
Creating and running the Dash app.
This part of the code imports the dependencies which have to be installed on your machine (for example, with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from a file named config.py.
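For reference, a minimal version of that file could be created as follows (the values are placeholders; the variable names match the import described above):
cat > config.py <<'EOF'
API_KEY = "<your-ica-api-key>"
PROJECT_ID = "<your-ica-project-id>"
EOF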
We will be creating a function called fetch_data to obtain the data from the Base table. It can be broken into several logically separated parts:
Retrieving the token to access the Base table, together with other variables, using the API.
Establishing the connection using the token.
The SQL query itself. In this particular example, we extract values from two tables, Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g., Q30_BASES, the number of bases with quality of at least 30) and metadata in the column ica, which needs to be flattened to access the value Execution_reference. Both tables are joined on this Execution_reference value.
Fetching the data using the connection and the SQL query.
Here is the corresponding snippet:
Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE on the x axis and the column chosen from the dropdown on the y axis.
Now we can create a single Python script called dashboard.py by concatenating the snippets and running it. The dashboard will be accessible in the browser on your machine.
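For example, assuming the dependencies are installed and the snippets have been concatenated into dashboard.py:
pip install dash plotly requests pandas "snowflake-connector-python[pandas]"
python dashboard.py
# Dash serves the app locally; by default it is reachable at http://127.0.0.1:8050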
Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container: you can access the container which holds all the data, but you cannot access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.
The GUI considers non-indexed folders as a single object. You can access the contents of a non-indexed folder:
as Analysis input/output
in Bench
via the API
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. You can analyze, aggregate, and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing, and patient care. Clinically relevant data needs to be generated and extracted from routine clinical testing, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.
Base can be used by different types of users for different use cases:
Clinical and Academic Researchers:
Big data storage solution housing all aggregated sample test outcomes
Analyze information by way of a convenient query formalism
Look for signals in combined phenotypic and genotypic data
Analyze QC patterns over large cohorts of patients
Securely share (sub)sets of data with other scientists
Generate reports and analyze trends in a straightforward and simple manner.
Bioinformaticians:
Access, consult, audit, and query all relevant data and QC information for tests run
All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis
Metadata is captured via automatic pipeline version tracking. For each sample analyzed, information on the individual tools and/or reference files used during processing, the duration of the pipeline, the execution path of the different analytical steps, and, in case of failure, exit codes can be warehoused.
Product Developers and Service Providers:
Better understand the efficiency of kits and tests
Analyze usage, understand QC data trends, improve products
Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, as well as allow renderings of test result outcome trends and much more.
Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.
Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation. All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.
Detect Signals and Patterns: Extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information, for a single individual or a group of samples with a specific variant. Virtually any combination of stored sample and patient information allows for detecting signals and patterns with a single query on the big data set.
Profile/Cluster patients: Use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, you might want to run the next iteration of a clinical trial with only patients that respond. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects by the use of a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information to be used for further research.
Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited, in this way Base data can be shared with people in and outside of an organization in a compliant and controlled fashion.
Base is a module that can be found in a project. It is shown in the menu bar of the project.
Access to the Base module is controlled by the subscription chosen when registering the account (full and premium subscriptions give access to Base). This happens automatically after the first user logs into the system for that account, so from the moment the account is up and running, the Base module is ready to be enabled.
When a user has created a project, they can go to the Base pages and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.
Only the project owner can enable Illumina Connected Analytics Base. Make sure that your subscription for the domain includes Base.
Navigate to Projects > your_project > Base > Tables / Query / Schedule.
Select Enable
Access to the projects and all modules located within the project is provided via the Team page within the project.
The status and history of Base activities and jobs are shown on the page.
Nextflow natively supports the scatter-gather pattern. The initial example uses this pattern by splitting the FASTA file into chunks of records in the task splitSequences, and then processing these chunks in the task reverse.
In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
First, we present the individual processes. Select +Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitted channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened, so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
Finally, copy and paste the following XML configuration into the XML Configuration tab.
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by a red "*" sign and click the Start button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
In Projects > your_project > Flow > Analyses > your_analysis > Steps you can see that the input file is split into multiple chunks, then these chunks are sorted and merged.
DRAGEN can run in workspaces:
In either FPGA mode (hardware-accelerated) or software mode when using FPGA instances. This can be useful when comparing performance gains from hardware acceleration or to distribute concurrent processes between the FPGA and CPU.
In software mode when using non-FPGA instances.
The DRAGEN command line parameters that specify the location of the license file differ between the two modes.
FPGA mode uses LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"
Software mode uses LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"
DRAGEN software is provided in specific Bench images with names starting with Dragen. For example (available versions may vary):
Dragen 4.4.1 - Minimal provides DRAGEN 4.4.1 and SSH access
Dragen 4.4.6 provides DRAGEN 4.4.6, SSH and JupyterLab.
The instance type is selected during workspace creation (Projects > your_project > Bench > Workspaces). The amount of RAM available on the instance is critical. 256GiB RAM is a safe choice to run DRAGEN in production. All FPGA2 instances offer 256GiB or more of RAM.
When running in Software mode, use (348GiB RAM) or (144 GiB RAM) to ensure enough RAM is available for your runs.
Using an fpga2-medium .
Using a standard-xlarge .
Software mode is activated with the DRAGEN --sw-mode parameter.
The project details page contains the properties of the project, such as the location, owner, storage and linked bundles. This is also the place where you add assets in the form of linked .
The project details are configured during project creation and may be updated by the project owner, entities with the project Administrator role, and tenant administrators.
Click the Edit button at the top of the Details page.
Click the + button under LINKED BUNDLES.
Click on the desired bundle, then click the Link button.
Click Save.
If your linked bundle contained a pipeline, then it will appear in Projects > your_project > Flow > Pipelines.
A project's billing mode determines the strategy for how costs are charged to billable accounts.
For example, with billing mode set to Tenant, if tenant A has created a project resource and uses it in their project, then tenant A will pay for the resource data, compute costs and storage costs of any output they generate within the project. When they share the project with tenant B, then tenant B will pay the compute and storage for the data which they generate in that project. Put simply, in billing mode tenant, the person who generates data pays for the processing and storage of that data, regardless of who owns the actual project.
If the project billing mode is updated after the project has been created, the updated billing mode will only be applied to resources generated after the change.
If you are using your own S3 storage, then the billing mode impacts where collaborator data is stored.
Project billing will result in using your S3 storage for the data.
Tenant billing will result in collaborator data being stored in Illumina-managed storage instead of your own S3 storage.
Tenant billing, when your collaborators also have their own S3 storage and have it set as default, will result in their data being stored in their S3 storage.
Use the Create OAuth access token button to generate an OAuth access token which is valid for 12 hours after generation. This token can be used by Snowflake and Tableau to access the data in your Base databases and tables for this Project.
See for more information.
from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px
from config import API_KEY, PROJECT_ID
import requests
import snowflake.connector
import pandas as pd

def fetch_data():
    # Your data fetching and processing code here
    # retrieving the Base oauth token
    url = 'https://ica.illumina.com/ica/rest/api/projects/' + PROJECT_ID + '/base:connectionDetails'
    # set the API headers
    headers = {
        'X-API-Key': API_KEY,
        'accept': 'application/vnd.illumina.v3+json'
    }
    response = requests.post(url, headers=headers)
    ctx = snowflake.connector.connect(
        account=response.json()['dnsName'].split('.snowflakecomputing.com')[0],
        authenticator='oauth',
        token=response.json()['accessToken'],
        database=response.json()['databaseName'],
        role=response.json()['roleName'],
        warehouse=response.json()['warehouseName']
    )
    cur = ctx.cursor()
    sql = '''
    WITH flattened_Demo_Ingesting_Metrics AS (
        SELECT
            flattened.value::STRING AS execution_reference_Demo_Ingesting_Metrics,
            t1.SAMPLEID,
            t1.VARIANTS_TOTAL_PASS,
            t1.VARIANTS_SNPS_PASS,
            t1.Q30_BASES,
            t1.READS_WITH_MAPQ_3040_PCT
        FROM
            Demo_Ingesting_Metrics t1,
            LATERAL FLATTEN(input => t1.ica) AS flattened
        WHERE
            flattened.key = 'Execution_reference'
    )
    SELECT
        f.execution_reference_Demo_Ingesting_Metrics,
        f.SAMPLEID,
        f.VARIANTS_TOTAL_PASS,
        f.VARIANTS_SNPS_PASS,
        t2."EXECUTION_REFERENCE",
        t2.END_DATE,
        f.Q30_BASES,
        f.READS_WITH_MAPQ_3040_PCT
    FROM
        flattened_Demo_Ingesting_Metrics f
    JOIN
        BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL t2
    ON
        f.execution_reference_Demo_Ingesting_Metrics = t2."EXECUTION_REFERENCE";
    '''
    cur.execute(sql)
    data = cur.fetch_pandas_all()
    return data

df = fetch_data()
app = Dash(__name__)
#server = app.server
app.layout = html.Div([
    html.H1("My Dash Dashboard"),
    html.Div([
        html.Label("Select X-axis:"),
        dcc.Dropdown(
            id='x-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[5]  # default value
        ),
        html.Label("Select Y-axis:"),
        dcc.Dropdown(
            id='y-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[2]  # default value
        ),
    ]),
    dcc.Graph(id='scatterplot')
])

@callback(
    Output('scatterplot', 'figure'),
    Input('y-axis-dropdown', 'value')
)
def update_graph(value):
    return px.scatter(df, x='END_DATE', y=value, hover_name='SAMPLEID')

if __name__ == '__main__':
    app.run(debug=True)

S3 Standard — Available
S3 Intelligent-Tiering — Available
S3 Express One Zone — Available
S3 Standard-IA — Available
S3 One Zone-IA — Available
S3 Glacier Instant Retrieval — Available
S3 Glacier Flexible Retrieval — Archived
S3 Glacier Deep Archive — Archived
Reduced redundancy (not recommended) — Available
# Pull Container image from Dockerhub
/data $ docker pull alpine:latest

# Pull a Container Image from Dockerhub
/data $ docker login -u <username> registry.hub.docker.com
Password:
Login Succeeded!
/data $ docker pull registry.hub.docker.com/<privateContainerUri>:<tag>

# Push a Container Image to a Private registry in Dockerhub
/data $ docker pull alpine:latest
/data $ docker tag alpine:latest registry.hub.docker.com/<privateContainerUri>:<tag>
/data $ docker push registry.hub.docker.com/<privateContainerUri>:<tag>

# Save a Container Image as a compressed archive
/data $ docker pull alpine:latest
/data $ docker save alpine:latest | bzip2 > /data/alpine_latest.tar.bz2

# List all local available images
/data $ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED       SIZE
docker.io/library/alpine   latest   aded1e1a5b37   3 weeks ago   8.13 MB

# Remove a locally available image
/data $ docker rmi alpine:latest

# Run a Container as a normal user
/data $ docker run -it --rm alpine:latest
~ $ id
uid=1000(ica) gid=100(users) groups=100(users)

# Run a Container as root user
/data $ docker run -it --rm --userns keep-id:uid=0,gid=0 --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=0(root)
/ # apk add rsync
...
/ # rsync
rsync version 3.4.0 protocol version 32
...

# Run a Container as a non-mapped root user
/data $ docker run -it --rm --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=100(users),0(root)
/ # touch /data/myfile
/ #
# Exited the running Container back to the shell in the running Bench Workspace
/data $ ls -al /data/myfile
-rw-r--r-- 1 100000 100000 0 Mar 13 08:27 /data/myfile

FROM alpine:latest
RUN apk add rsync
COPY myfile /root/myfile

# Build a Container image locally
/data $ mkdir /tmp/buildContext
/data $ touch /tmp/buildContext/myFile
/data $ docker build -f /tmp/Dockerfile -t myimage:1.0 /tmp/buildContext
...
/data $ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED              SIZE
docker.io/library/alpine   latest   aded1e1a5b37   3 weeks ago          8.13 MB
localhost/myimage          1.0      06ef92e7544f   About a minute ago   12.1 MB

mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# License Parameters
LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 1.5min (10 sec if fpga already programmed)

mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering ERROR: unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# When using DRAGEN 4.4.6 and later, the line above should be extended with --min-memory 0 to skip the memory check.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false --min-memory 0"
# License Parameters
LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 2min

Name — Name of the project, unique within the tenant. Alphanumerics, underscores, dashes, and spaces are permitted.
Short Description — Short description of the project.
Project Owner — Owner of the project (has Administrator access to the project).
Storage Configuration — Storage configuration to use for data stored in the project.
User Tags — User tags on the project.
Technical Tags — Technical tags on the project.
Metadata Model — Metadata model assigned to the project.
Project Location — Project region where data is stored and pipelines are executed. Options are derived from the Entitlement(s) assigned to the user account, based on the purchased subscription.
Storage Bundle — Storage bundle assigned to the project. Derived from the selected Project Location based on the Entitlement in the purchased subscription.
Billing Mode — Billing mode assigned to the project.
Data sharing — Enables data and samples in the project to be linked to other projects.

Project — All incurred costs will be charged to the tenant of the project owner.
Tenant — Incurred costs will be charged to the tenant of the user owning the project resource (i.e., data, analysis). The only exceptions are Base tables and queries, as well as Bench compute and storage costs, which are always billed to the project owner.
process split {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
input:
path x
output:
path("split.*.tsv")
"""
split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
"""
}

process sort {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
input:
path x
output:
path '*.sorted.tsv'
"""
sort -gk1,1 $x > ${x.baseName}.sorted.tsv
"""
}

process merge {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
publishDir 'out', mode: 'symlink'
input:
path x
output:
path 'merged.tsv'
"""
cat $x > merged.tsv
"""
}

nextflow.enable.dsl=2
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
params.myinput = "test.test"
workflow {
input_ch = Channel.fromPath(params.myinput)
split(input_ch)
sort(split.out.flatten())
merge(sort.out.collect())
}

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="myinput" format="TSV" type="FILE" required="true" multiValue="false">
<pd:label>myinput</pd:label>
<pd:description></pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>
Creation — Yes — You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder, or with the /api/projects/{projectId}/data:createNonIndexedFolder endpoint.
Deletion — Yes — You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete, or with the /api/projects/{projectId}/data/{dataId}:delete endpoint.
Uploading Data — API / Bench / Analysis — Use non-indexed folders as normal folders for Analysis runs and Bench. Different methods are available with the API, such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl.
Downloading Data — Yes — Use non-indexed folders as normal folders for Analysis runs and Bench. Use temporary credentials to list and download data with the API.
Analysis Input/Output — Yes — Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.
Bench — Yes — Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.
Viewing — No — The folder is a single object; you cannot view the contents.
Linking — Yes — You cannot see non-indexed folder contents.
Copying — No — Prohibited to prevent storage issues.
Moving — No — Prohibited to prevent storage issues.
Managing tags — No — You cannot see non-indexed folder contents.
Managing format — No — You cannot see non-indexed folder contents.
Use as Reference Data — No — You cannot see non-indexed folder contents.
On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.
When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.
There are different types of schedules that can be set up:
Files
Metadata
Administrative data.
This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:
Name (required): The name of the scheduled job
Description: Extra information about the schedule
File name pattern (required): In this field, define part of or the full file name, or the tag, that the files you want to import contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, …, you can fill in _reads.txt in this field to have all files that contain _reads.txt imported into the table.
Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.
Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.
Write preference (required): Define data handling; whether it can overwrite the data
Data format (required): Select the data format of the files (CSV, TSV, JSON)
Delimiter (required): Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.
Active: The job will run automatically if checked
Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options
Encoding (required): Select the encoding of the file.
Null Marker: Specifies a string that represents a null value in a CSV/TSV file.
Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. For example, the following query will count how many times each of the pipelines was executed and sort the result accordingly:
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

To obtain a similar table for the failed runs, you can execute the following SQL query:
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE PIPELINE_STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

When adding or editing this schedule you can define the following parameters:
Name (required): the name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive meta data fields: in the meta data fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: the job will run automatically if ticked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.
When adding or editing this schedule the following parameters can be defined:
Name (required): The name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: The job will run automatically if checked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.
When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey",
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:kms:xxx",
"arn:aws:s3:::BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey",
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation",
"s3:ListBucketVersions",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:kms:xxx",
"arn:aws:s3:::BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}

{
"Sid": "AllowCrossAccountAccess",
"Effect": "Allow",
"Principal": {
"AWS": "ASSUME_ROLE_ARN"
},
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": "*"
}



(Comparison table of the move, copy, manual link, and project connector data operations.)
This tutorial aims to guide you through the process of creating CWL tools and pipelines from the very beginning. By following the steps and techniques presented here, you will gain the necessary knowledge and skills to develop your own pipelines or transition existing ones to ICA.
The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we present how to create your own Docker image for the popular tool FastQC.
Copy the contents displayed below to a text editor and save it as a Dockerfile. Make sure you use an editor which does not add formatting to the file.
FROM centos:7
WORKDIR /usr/local
# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
yum clean all && \
rm -rf /var/cache/yum
# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
unzip fastqc_v0.11.9.zip && \
chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip
# Adding FastQC to the PATH
ENV PATH $PATH:/usr/local/FastQC
# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []
## how to build the docker image
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:0 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:0

Open a terminal window, place this file in a dedicated folder and navigate to this folder location. Then use the following command:
docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .

Check the image has been successfully built:
docker images
Check that the container is functional:
docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1
Once inside the container, check that the fastqc command is responsive and prints the expected help message. Remember to exit the container.
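As a quicker, non-interactive check, you can also invoke fastqc directly. This is a small sketch that assumes the image tag from the build command above and relies on the empty ENTRYPOINT defined in the Dockerfile:
# Print the FastQC version without opening an interactive shell
docker run --rm fastqc-0.11.9:1 fastqc --version
# Print the full help text
docker run --rm fastqc-0.11.9:1 fastqc --help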
Save a tar of the previously built image locally:
docker save fastqc-0.11.9:1 -o fastqc-0.11.9:1.tar.gz
Upload your docker image .tar to an ICA project (browser upload, Connector, or CLI).
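If you use the CLI route, the upload can be scripted. The sketch below is an assumption-heavy example: it presumes the ICA command-line client (icav2) is installed and authenticated, that a project context has been entered, and that /docker-images/ is a folder you created in your project data; verify the exact subcommand and arguments with the CLI help before relying on it.
# Upload the saved image tar into the project data (check: icav2 projectdata upload --help)
icav2 projectdata upload fastqc-0.11.9:1.tar.gz /docker-images/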
In Projects > your_project > Data, select the uploaded .tar file, then click Manage > Change Format, select DOCKER and Save.
Now, outside of any Project, go to System Settings > Docker Repository and select Create > Image. Select your docker file, fill out a name and version, set the type to tool, and press Select.
While outside of any Project, go to System Settings > Tool Repository and select +Create. Fill in the mandatory fields (Name and Version) and look for a Docker image to link to the tool.
Tool creation in ICA adheres to the CWL standard.
You can create a tool either by pasting the tool definition in the code syntax field on the right, or by using the different tabs to manually define inputs, outputs, arguments, settings, and so on.
In this tutorial we will use the CWL tool syntax method. Paste the following content in the General tab.
#!/usr/bin/env cwl-runner
# (Re)generated by BlueBee Platform
$namespaces:
ilmn-tes: http://platform.illumina.com/rdf/iap/
cwlVersion: cwl:v1.0
class: CommandLineTool
label: FastQC
doc: FastQC aims to provide a simple way to do some quality control checks on raw
sequence data coming from high throughput sequencing pipelines.
inputs:
Fastq1:
type: File
inputBinding:
position: 1
Fastq2:
type:
- File
- 'null'
inputBinding:
position: 3
outputs:
HTML:
type:
type: array
items: File
outputBinding:
glob:
- '*.html'
Zip:
type:
type: array
items: File
outputBinding:
glob:
- '*.zip'
arguments:
- position: 4
prefix: -o
valueFrom: $(runtime.outdir)
- position: 1
prefix: -t
valueFrom: '2'
baseCommand:
- fastqc
Since the user needs to specify the output folder for the FastQC application (-o prefix), we are using the $(runtime.outdir) runtime parameter to point to the designated output folder.
Navigate to Projects > your_project > Flow > Pipelines > +Create > CWL Graphical.
Fill the mandatory fields and click on the Definition tab to open the Graphical Editor.
Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).
Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).
Press Save, you just created your first FastQC pipeline on ICA!
First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.
Navigate to Pipelines and select the pipeline you just created, then press Start analysis
Fill the mandatory fields and click on the + button to open the File Selection dialog box. Select one of the Fastq files available to you.
Press Start analysis on the top right, the platform is now orchestrating the pipeline execution.
Navigate to Projects > your_project > Flow > Analyses and observe that the pipeline execution is now listed and will first be in Status Requested. After a few minutes the Status should change to In Progress and then to Succeeded.
Once this Analysis succeeds click it to enter the Analysis details view. You will see the FastQC HTML output file listed on the Output files tab. Click on the file to open Data Details view. Since it is an HTML file Format there is a View tab that allows visualizing the HTML within the browser.
This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.
Click Create Cohort button.
Select the following studies to add to your cohort:
TCGA – BRCA – Breast Invasive Carcinoma
TCGA – Ovarian Serous Cystadenocarcinoma
Add a Cohort Name = TCGA Breast and Ovarian_1472
Click on Apply.
Expand Show query details to see the study makeup of your cohort.
Charts will be open by default. If not, click Show charts
Use the gear icon in the top-right to change viewable chart settings.
Tip:
Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.
The Subject tab with all Subjects list is displayed below Charts with a link to each Subject by ID and other high-level information, like Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for subject TCGA-E2-A14Y and view the data about this Subject.
Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.
Note: the Subject is a 35-year-old female with vital status and other phenotypes that feed into the Subject attribute selection criteria when creating or editing cohorts.
Click X to close the Subject details.
Click Hide charts to increase interactive landscape.
Click the Marker Frequency tab, then click the Somatic Mutation tab.
Review the gene list and mutation frequencies.
Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).
Do Subjects with PIK3CA mutations have changes in PIK3CA RNA Expression?
Click the Gene Expression tab, search for PIK3CA
PIK3CA RNA is down-regulated in 27% of the subjects relative to normal samples.
Switch from normal to disease Reference where the Subject’s denominator is the median of all disease samples in your cohort.
Note the count of matching vs. total subjects that have PIK3CA up-regulated RNA, which may indicate a distinctive sub-phenotype.
Click directly on PIK3CA gene link in the Gene Expression table.
You are brought to the Gene tab under the Gene Summary sub-tab that lists information and links to public resources about PIK3CA.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot.
There are 87 mutations distributed across the 1068 amino acid sequence, listed below the analysis tracks. These can be exported via the icon into a table.
We know that missense variants can severely disrupt translated protein activity. Deselect all Variant Types except for Missense from the Show Variant Type legend above the needle plot.
Many mutations are in the functional domains of the protein as seen by the colored boxes and labels on the x-axis of the Needle Plot.
Hover over the variant with the highest sample count in the yellow PI3Ka protein domain.
The pop-up shows variant details for the 64 Subjects observed with it: 63 in the Breast Cancer study and 1 in the Ovarian Cancer Study.
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in to the PI3Ka domain to better separate observations.
There are three different missense mutations at this locus changing the wildtype Glutamine at different frequencies to Lysine (64), Glycine (6), or Alanine (2).
The Pathogenic Variant Track shows 7 ClinVar entries for mutations stacked at this locus affecting amino acid 545. Pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry are shown by hovering over the purple triangles.
Note the Primate AI track and high Primate AI score.
Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered likely pathogenic as cross-species sequence is highly conserved; you often see high conservancy at the functional domains. Points below the 25th percentile may be considered "likely benign".
Click the Expression tab and notice that normal breast and normal ovarian tissue have relatively high PIK3CA RNA expression in GTEx RNAseq tissue data, although the gene is ubiquitously expressed.
You can access the databases and tables within the Base module using snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. In this tutorial we will describe how to obtain the token and use it for accessing the Base module. This tutorial does not cover how to install and configure snowSQL.
Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.
After clicking the button Create OAuth access token, the pop-up authenticator is displayed.
After clicking the button Generate snowSQL command the pop-up authenticator presents the snowSQL command.
Copy the snowSQL command and run it in the console to log in.
You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.
API Call:
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/<PROJECT ID>/base:connectionDetails' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: <YOUR KEY>'
Response:
{
"authenticator": "oauth",
"accessToken": "XXXXXXXXXX",
"dnsName": "use1sf01.us-east-1.snowflakecomputing.com",
"userPrincipalName": "xxxxx",
"databaseName": "xxxxx",
"schemaName": "xxx",
"warehouseName": "xxxxxx",
"roleName": "xxx"
}
Template snowSQL:
snowsql -a use1sf01.us-east-1 -u <userPrincipalName> --authenticator=oauth -r <roleName> -d <databaseName> -s PUBLIC -w <warehouseName> --token="<accessToken>"
Now you can perform a variety of tasks such as:
Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.
Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.
Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.
Overall, snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.
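To avoid copying tokens by hand, the API call and snowSQL template above can be combined into a small shell script. This is a minimal sketch: it assumes jq is installed and that the account identifier passed to -a is the dnsName with the snowflakecomputing.com suffix removed, as in the template.
#!/bin/bash
# Fetch Base connection details for a project and open a snowSQL session
PROJECT_ID="<PROJECT ID>"
API_KEY="<YOUR KEY>"
RESPONSE=$(curl -s -X POST \
  "https://ica.illumina.com/ica/rest/api/projects/${PROJECT_ID}/base:connectionDetails" \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H "X-API-Key: ${API_KEY}")
ACCOUNT=$(echo "${RESPONSE}" | jq -r '.dnsName' | sed 's/\.snowflakecomputing\.com$//')
snowsql -a "${ACCOUNT}" \
  -u "$(echo "${RESPONSE}" | jq -r '.userPrincipalName')" \
  --authenticator=oauth \
  -r "$(echo "${RESPONSE}" | jq -r '.roleName')" \
  -d "$(echo "${RESPONSE}" | jq -r '.databaseName')" \
  -s PUBLIC \
  -w "$(echo "${RESPONSE}" | jq -r '.warehouseName')" \
  --token="$(echo "${RESPONSE}" | jq -r '.accessToken')"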
Show all tables in the database:
>SHOW TABLES;
Create a new table:
create TABLE demo1(sample_name VARCHAR, count INT);
List records in a table:
SELECT * FROM demo1;
Load data from a file: To load data from a file, you can start by creating a staging area in the internal storage using the following command:
>CREATE STAGE myStage;
You can then upload the local file to the internal storage using the following command:
> PUT file:///path/to/data.tsv @myStage;
You can check if the file was uploaded properly using the LIST command:
> LIST @myStage;
Finally, load data by using the COPY INTO command. The command assumes data.tsv is a tab-delimited file. You can easily modify the following command to import a JSON file by setting TYPE=JSON.
> COPY INTO demo1(sample_name, count) FROM @mystage/data.tsv FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '\t');
Load data from a string: If you have data as a JSON string, you can import the data into the tables using the following commands.
> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name, count)
> SELECT
PARSE_JSON($myJSON_str):sample_name::STRING,
PARSE_JSON($myJSON_str):count::INT
Load data into specific columns: If you want to load only sample_name into the table, you can remove "count" from the column and value lists as below:
> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name)
SELECT
PARSE_JSON($myJSON_str):sample_name::STRING;
List the views of the database to which you are connected. As shared database and catalogue views are created within the project database, they will be listed. However, it does not show views which are granted via another database, role, or from bundles.
>SHOW VIEWS;
Show grants, both directly on the tables and views, and grants to roles which in turn have grants on tables and views.
>SHOW GRANTS;
Workspaces can have their own dedicated cluster, which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.
The cluster consists of 2 components
The manager node which orchestrates the workload across the members.
Between 0 and a maximum of 50 member nodes.
Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.
Dynamic - A dynamic cluster has a manager node and a dynamic number of members up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources, as only as many member nodes as needed to perform the work are used.
You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.
The following settings can be defined for a bench cluster:
Web access
Enable or disable web access to the cluster manager.
Dedicated Cluster Manager
Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling. For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.
Type
Choose between cluster members
Scaling interval
For static, set the number of cluster member nodes (maximum 50), for dynamic, choose the minimum and maximum (up to 50) amount of cluster member nodes.
Resource model
The type of machine on which the cluster member(s) will run. For every cluster member, one of these machines is used as a resource, so be aware of the possible cost impact when running many machines with a high individual cost.
Economy mode
Economy mode uses AWS spot instances. This halves many compute iCredit rates vs standard mode, but jobs may be interrupted. See the resource model documentation for a list of which resource models support economy pricing.
Include ephemeral storage
Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.
Storage size
How much storage space (1GB - 16 TB) should be reserved per node as dedicated scratch space, available at /scratch
Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.
Data in a bench workspace can be divided into three groups:
Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes ) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina connected analytics UI. This is persistent storage and data remains when a workspace is shut down.
Project data can be accessed from all workspace components at /data/project. Every component will have their own dedicated mount to the project. Depending on the project data permissions you will be able to access it in either Read-Only or Read-Write mode.
Scratch data is available on the cluster members at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster.
All mounts occur in /data/mounts/, see data access and workspace-ctl data.
Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node will have its own dedicated mount.
For fast data access, bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.
All mounts occur in path /data/mounts/
workspace-ctl data get-mounts
For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.
workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata
workspace-ctl data delete-mount --mount-path /data/mounts/mydata
ICA Cohorts comes front-loaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.
Data set | Samples | Disease area | Reference
1kGP-DRAGEN | 3202 WGS: 2504 original samples plus 698 relateds | Presumed healthy
DDD | 4293 (3664 affected), de novos only | Developmental disorders
EPI4K | 356, de novos only | Epilepsy
ASD Cohorts | 6786 (4266 affected), de novos only | Autism Spectrum disorder
De Ligt et al. | 100, de novos only | Intellectual disability
Homsy et al. | 1213, de novos only | Congenital heart disease (HP:0030680)
Lelieveld et al. | 820, de novos only | Intellectual disability
Rauch et al. | 51, de novos only | Intellectual disability
Rare Genomes Project | 315 WES (112 pedigrees) | Various | https://raregenomes.org/
TCGA | ca. 4200 WES, ca. 4000 RNAseq | 12 tumor types | https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
GEO | RNAseq | Auto-immune disorders, incl. asthma, arthritis, SLE, MS, Crohn's disease, Psoriasis, Sjögren's Syndrome | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Kidney diseases | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Central nervous system diseases | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Parkinson's disease | For GEO/GSE study identifiers, please refer to the in-product list of studies
In order to create a Tool or Bench image, a Docker image is required to run the application in a containerized environment. Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.
Use Docker images built for x86 architecture or multi-platform images that support x86. You can build Docker images that support both ARM and x86 architectures.
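For example, on an ARM host (such as Apple silicon) you can explicitly target x86 at build time, or produce a multi-platform image with buildx. This is a sketch reusing the FastQC Dockerfile from the tutorial above:
# Build an x86 (amd64) image even when running on an ARM host
docker build --platform linux/amd64 --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .
# Or build a multi-platform image supporting both x86 and ARM
# (multi-platform builds are typically pushed directly to a registry with --push)
docker buildx build --platform linux/amd64,linux/arm64 --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .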
Navigate to System Settings > Docker Repository.
Click Create > External image to add a new external image.
Add your full image URL in the Url field, e.g. docker.io/alpine:latest or registry.hub.docker.com/library/alpine:latest. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)
Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.
(Optional) Complete the Description field.
Click Save.
The newly added image will appear in your Docker Repository list. You can differentiate between internal and external images by looking at the Source column. If this column is not visible, you can add it with the columns icon.
In order to use private images in your tool, you must first upload them as a TAR file.
Navigate to Projects > your_project .
Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, using the CLI, or a Connector. For more information, please refer to the project Data documentation.
Select your uploaded TAR file and click in the top menu on Manage > Change Format .
Select DOCKER from the drop-down menu and Save.
Navigate to System Settings > Docker Repository (outside of your project).
Click on Create > Image.
Click on the magnifying glass to find your uploaded TAR image file.
Select the appropriate region and if needed, filter on project from the drop-down menus to find your file.
Select that file.
Select the appropriate region, fill in the Docker Name, Version, cluster compatibility (only available for bench images) and whether it is a tool or a bench image and click Save.
The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.
Navigate to System Settings > Docker Repository.
Either
Select the required image(s) and go to Manage > Add Region.
OR double-click on a required image, check the box matching the region you want to add, and select update.
In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).
To remove regions, go to Manage > Remove Region or unselect the regions from the Docker image detail view.
You can download your created Docker images at System Settings > Docker Images > your_Docker_image > Manage > Download.
In order to be able to download Docker images, the following requirements must be met:
The Docker image can not be from an entitled bundle.
Only self-created Docker images can be downloaded.
The Docker image must be an internal image and in status Available.
You can only select a single Docker image at a time for download.
You need a service connector with a download rule to download the Docker image.
Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
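As an illustration, piping docker save through gzip produces a compressed archive in one step (a sketch reusing the FastQC image from the tutorial above; after uploading, change the file format to DOCKER as described):
# Save and compress the image in a single step
docker save fastqc-0.11.9:1 | gzip > fastqc-0.11.9.tar.gz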
Let's create the pipeline with a JSON input form.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.
Click the Save button to save the changes.
In this tutorial, we will demonstrate how to create and launch a DRAGEN pipeline using the CWL language.
In ICA, CWL pipelines are built using tools developed in CWL. For this tutorial, we will use the "DRAGEN Demo Tool" included with DRAGEN Demo Bundle 3.9.5.
1.) Start by selecting a project at the Projects inventory.
2.) In the details page, select Edit.
3.) In the edit mode of the details page, click the + button in the LINKED BUNDLES section.
4.) In the Add Bundle to Project window: Select the DRAGEN demo tool bundle from the list. Once you have selected the bundle, the Link Bundles button becomes available. Select it to continue.
Tip: You can select multiple bundles using
Ctrl + Left mouse button or Shift + Left mouse button.
5.) In the project details page, the selected bundle will appear under the LINKED BUNDLES section. If you need to remove a bundle, click on the - button. Click Save to save the project with linked bundles.
1.) From the project details page, select Pipelines > CWL
2.) You will be given options to create pipelines using a graphical interface or code. For this tutorial, we will select Graphical.
3.) Once you have selected the Graphical option, you will see a page with multiple tabs. The first tab is the Information page where you enter pipeline information. You can find the details for the different fields of this tab in the help documentation. The following three fields are required for the INFORMATION page.
Code: Provide pipeline name here.
Description: Provide pipeline description here.
Storage size: Select the storage size from the drop-down menu.
4.) The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions.
5.) The Definition tab is used to define the pipeline. When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel (A) and a list of component menus (B). You can find details on each section of the component menu in the steps below.
6.) To build a pipeline, start by selecting Machine PROFILE from the component menu section on the right. All fields are required and are pre-filled with default values. Change them as needed.
The profile Name field will be updated based on the selected Resource. You can change it as needed.
Color assigns the selected color to the tool in the design view to easily identify the machine profile when more than one tool is used in the pipeline.
Tier lets you select the Standard or Economy tier for AWS instances. Standard uses on-demand EC2 instances and Economy uses spot EC2 instances, which cost less but may be interrupted. See the AWS documentation for the differences between the two instance types and the ICA pricing information for the price difference between the two tiers.
Resource lets you choose from various compute resources available. In this case, we are building a DRAGEN pipeline and we will need to select a resource with FPGA in it. Choose from FPGA resources (FPGA Medium/Large) based on your needs.
7.) Once you have selected the Machine Profile for the tool, find your tool from the Tool Repository at the bottom section of the component menu on the right. In this case, we are using the DRAGEN Demo Tool. Drag and drop the tool from the Tool Repository section to the visualization panel.
8.) The dropped tool will show the machine profile color, number of outputs and inputs, and warning to indicate missing parameters, mandatory values, and connections. Selecting the tool in the visualization panel activates the tool (DRAGEN Demo Tool) component menu. On the component menu section, you will find the details of the tool under Tool - DRAGEN Demo Tool. This section lists the inputs, outputs, additional parameters, and the machine profile required for the tool. In this case, the DRAGEN Demo Tool requires three inputs (FASTQ read 1, FASTQ read 2, and a Reference genome). The tool has two outputs (a VCF file and an output folder). The tool also has a mandatory parameter (Output File Prefix). Enter the value for the input parameter (Output File Prefix) in the text box.
9.) The top right corner of the visualization panel has icons to zoom in and out in the visualization panel followed by three icons: ref, in, and out. Based on the type of input/output needed, drag and drop the icons into the visualization area. In this case, we need three inputs (read 1, read 2, and Reference hash table.) and two outputs (VCF file and output folder). Start by dragging and dropping the first input (a). Connect the input to the tool by clicking on the blue dot at the bottom of the input icon and dragging it to the blue dot representing the first input on the tool (b). Select the input icon to activate the input component menu. The input section for the first input lets you enter the Name, Format, and other relevant information based on tool requirements. In this case, for the first input, enter the following information:
Name: FASTQ read 1
Format: FASTQ
Comments: any optional comments
10.) Repeat the step for other inputs. Note that the Reference hash table is treated as the input for the tool rather than Reference files. So, use the input icon instead of the reference icon.
11.) Repeat the process for two outputs by dragging and connecting them to the tool. Note that when connecting output to the tool, you will need to click on the blue dot at the bottom of the tool and drag it to the output.
12.) Select the tool and enter additional parameters. In this case, the tool requires Output File Prefix. Enter demo_ in the text box.
13.) Click on the Save button to save the pipeline. Once saved, you can run it from the Pipelines page under Flow from the left menus as any other pipeline.
In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.
This approach is applicable in situations where your main.nf file contains all your pipeline logic and illustrates what the liftover process would look like.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
Copy and paste the main.nf content shown below into the Nextflow files > main.nf tab. The following comparison highlights the differences between the original file and the version for deployment in ICA. The main difference is the explicit specification of containers and pods within processes. Additionally, some channels' specifications are modified, and a debugging message is added. When copying and pasting, be sure to remove the text highlighted in red (marked with -) and add the text highlighted in green (marked with +).
In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads folder. Navigate to the XML Configuration tab and paste the following:
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by red "*" sign and click on Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, configuration profiles can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the scratch setting to a set of process names (or labels) so that they use the higher performance local scratch storage attached to an instance instead of the shared network disk, as shown in the configuration snippet later in this section.
When trying to test on the cloud, it is oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either by using the ICA command-line interface or by creating your own scripts integrating with the REST API.
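As an illustration of such a script, the loop below polls an analysis until it reaches a terminal state. This is a hedged sketch rather than an official example: the analyses endpoint path, the status field name, and the status values are assumptions modeled on the API conventions shown elsewhere in this documentation, so verify them against the API reference before use.
#!/bin/bash
# Poll an analysis until it finishes (endpoint, field and status names are assumptions; check the API reference)
PROJECT_ID="<PROJECT ID>"
ANALYSIS_ID="<ANALYSIS ID>"
API_KEY="<YOUR KEY>"
while true; do
  STATUS=$(curl -s \
    "https://ica.illumina.com/ica/rest/api/projects/${PROJECT_ID}/analyses/${ANALYSIS_ID}" \
    -H 'accept: application/vnd.illumina.v3+json' \
    -H "X-API-Key: ${API_KEY}" | jq -r '.status')
  echo "$(date): analysis status is ${STATUS}"
  case "${STATUS}" in
    SUCCEEDED|FAILED|ABORTED) break ;;
  esac
  sleep 60
done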
For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' enables up to five attempts (four retries) for each job, with increasing delays between each try.
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow pipeline to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use a retry strategy with an increasing backoff delay, such as the one shown above, allowing the system time to provide the necessary resources.
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. In case parallel tasks are executed, running time is counted once. As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, each of them taking 25, 28, and 30 minutes, respectively. Upon completion, this is followed by uploading the outputs for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (as the longest task out of three) + 1 = 86 minutes:
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
process split {
cpus 1
memory '512 MB'
input:
path x
output:
path("split.*.tsv")
"""
split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
"""
}
process sort {
cpus 1
memory '512 MB'
input:
path x
output:
path '*.sorted.tsv'
"""
sort -gk1,1 $x > ${x.baseName}.sorted.tsv
"""
}
process merge {
cpus 1
memory '512 MB'
publishDir 'out', mode: 'move'
input:
path x
output:
path 'merged.tsv'
"""
cat $x > merged.tsv
"""
}
nextflow.enable.dsl=2
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
params.myinput = "test.test"
workflow {
input_ch = Channel.fromPath(params.myinput)
split(input_ch)
sort(split.out.flatten())
merge(sort.out.collect())
}
{
"fields": [
{
"id": "myinput",
"label": "myinput",
"type": "data",
"dataFilter": {
"dataType": "file",
"dataFormat": ["TSV"]
},
"maxValues": 1,
"minValues": 1
}
]
}
function onSubmit(input) {
var validationErrors = [];
return {
'settings': input.settings,
'validationErrors': validationErrors
};
}
function onRender(input) {
var validationErrors = [];
var validationWarnings = [];
if (input.currentAnalysisSettings === null) {
//null the first time, so it can be used in the remainder of the javascript
input.currentAnalysisSettings = input.analysisSettings;
}
switch(input.context) {
case 'Initial': {
renderInitial(input, validationErrors, validationWarnings);
break;
}
case 'FieldChanged': {
renderFieldChanged(input, validationErrors, validationWarnings);
break;
}
case 'Edited': {
renderEdited(input, validationErrors, validationWarnings);
break;
}
default:
return {};
}
return {
'analysisSettings': input.currentAnalysisSettings,
'settingValues': input.settingValues,
'validationErrors': validationErrors,
'validationWarnings': validationWarnings
};
}
function renderInitial(input, validationErrors, validationWarnings) {
}
function renderEdited(input, validationErrors, validationWarnings) {
}
function renderFieldChanged(input, validationErrors, validationWarnings) {
}
function findField(input, fieldId){
var fields = input.currentAnalysisSettings['fields'];
for (var i = 0; i < fields.length; i++){
if (fields[i].id === fieldId) {
return fields[i];
}
}
return null;
}
withName: 'process1|process2|process3' { scratch = '/scratch/' }
withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to shared network disk
process {
maxRetries = 4
errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
}
Analysis task | 96 hour limit | Status in ICA
request | 1m (not counted) | status requested
queued | 7m (not counted) | status queued
initializing | 2m (not counted) | status initializing
input download | 20m | status preparing inputs
single task | 25m | status in progress
queue | 10m | status in progress
parallel tasks | 30m | status in progress
generating outputs | 1m | status generating outputs
completed | - | status succeeded
trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'













#!/usr/bin/env nextflow
+nextflow.enable.dsl=2
/*
* The following pipeline parameters specify the reference genomes
* and read pairs and can be provided as command line options
*/
-params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
-params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
+println("All input parameters: ${params}")
workflow {
- read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
+ read_pairs_ch = channel.fromFilePairs("${params.reads}/*_{1,2}.fq")
- INDEX(params.transcriptome)
+ INDEX(Channel.fromPath(params.transcriptome))
FASTQC(read_pairs_ch)
QUANT(INDEX.out, read_pairs_ch)
}
process INDEX {
- tag "$transcriptome.simpleName"
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
input:
path transcriptome
output:
path 'index'
script:
"""
salmon index --threads $task.cpus -t $transcriptome -i index
"""
}
process FASTQC {
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
tag "FASTQC on $sample_id"
publishDir params.outdir
input:
tuple val(sample_id), path(reads)
output:
path "fastqc_${sample_id}_logs"
script:
- """
- fastqc.sh "$sample_id" "$reads"
- """
+ """
+ # we need to explicitly specify the output directory for fastqc tool
+ # we are creating one using sample_id variable
+ mkdir fastqc_${sample_id}_logs
+ fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
+ """
}
process QUANT {
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
tag "$pair_id"
publishDir params.outdir
input:
path index
tuple val(pair_id), path(reads)
output:
path pair_id
script:
"""
salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
"""
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="reads" format="UNKNOWN" type="DIRECTORY" required="true" multiValue="false">
<pd:label>Folder with FASTQ files</pd:label>
<pd:description></pd:description>
</pd:dataInput>
<pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>FASTA</pd:label>
<pd:description>FASTA file</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>






ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:
Project:
Include subjects that are part of any ICA Project that you own or that is shared with you.
Sample:
Sample type such as FFPE.
Tissue type.
Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.
Subject:
Subject inclusion by Identifier:
Input a list of Subject Identifiers (up to 100 entries) when defining a cohort.
The Subject Identifier filter is combined using AND logic with any other applied filters.
Within the list of subject identifiers, OR logic is applied (i.e., a subject matches if it is in the provided list).
Demographics such as age, sex, ancestry.
Biometrics such as body height, body mass index.
Family and patient medical history.
Disease:
Phenotypes and diseases from standardized ontologies.
Drug:
Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.
Molecular attributes:
Samples with a somatic mutation in one or multiple, specified genes.
Samples with a germline variant of a specific type in one or multiple, specified genes.
Samples over- or under-expressed in one or multiple, specified genes.
Samples with a copy number gain or loss involving one or multiple, specified genes.
ICA Cohorts currently uses six standard medical ontologies to 1) annotate each subject during ingestion and then to 2) search for subjects: HPO for phenotypes, MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search will find matches from all six; and you can limit the search to only the one(s) you prefer. When searching for subjects using names or codes from one of these ontologies, ICA Cohorts will automatically match your query against all the other ontologies, therefore returning subjects that have been ingested using a corresponding entry from another ontology.
In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:
Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.
Use the mouse to select the term or navigate to the term in the dropdown using the arrow buttons.
If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.
For diagnostic hierarchies, concept children count and descendant count for each disease name is displayed.
Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").
Leaf Nodes: No children count shown for leaf nodes.
Missing Counts: Children count is hidden if unavailable.
Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.
Select a checkbox to include the diagnostic term along with all of its children and descendants.
Expand the categories and select or deselect specific disease concepts.
Paste one or multiple diagnostic codes separated by a pipe (‘|’).
In the 'Drug' tab, you can search for subjects who have a specific medication record:
Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.
Paste one or multiple drug concept codes. ICA Cohorts currently uses RxNorm as a standard ontology during ingestion. If multiple concepts are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'
'Drug Type' is a static list of qualifiers that denote the specific administration of the drug. For example, where the drug was dispensed.
'Stop Reason' is a static list of attributes describing a reason why a drug was stopped if available in the data ingestion.
'Drug Route' is a static list of attributes that describe the physical route of administration of the drug. For example, Intravenous Route (IV).
In the ‘Measurements’ tab, you can search for vital signs and laboratory test data leveraging LOINC concept codes.
Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or down arrows to select the term.
Upon selecting a term, the term will be available for use in a query.
Terms can be added to your query criteria.
For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.
`Any value` will find any record where there is an entry for the measurement independent of an available value.
Click `Apply` to add your criteria to the query.
Click `Update Now` to update the running count of the Cohort.
Include/Exclude
As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.
Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.
When selected, the attribute will appear in the right-navigation panel.
You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.
Note: the semantics of 'Include' work in such a way that a subject needs to match only one or multiple of the 'included' attributes in any given category to be included in the cohort. (Category refers to disease, sex, body height, etc.) For example, if you specify multiple diseases as inclusion criteria, subjects will only need to be diagnosed with one of them. Using 'Exclude', you can exclude any subject who matches one or multiple exclusion criteria; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.
Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.
Note: Using exclusion criteria does not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects will be in your cohort even if they do not contain this data point.
Once you have selected Create Cohort, the above data are organized in tabs such as Project, Subject, Disease, and Molecular. Each tab then contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to jump directly to that tab and section, and select attributes and values that are relevant to describe your subjects and samples of interest. Assign a new name to the cohort you created, and click Apply to save the cohort.
After creating a Cohort, select the Duplicate icon.
A copy of the Cohort definition will be created and tagged with "_copy".
Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.
This action cannot be undone.
After creating a Cohort, users can set a Cohort bookmark as Shared. By sharing a Cohort, the Cohort will be available to be applied across the project by other users with access to the Project. Cohorts created in a Project are only accessible at the scope of the user who created them. Other users in the project cannot see the cohort created unless they use this sharing functionality.
Create a Cohort using the directions above.
To make the Cohort available to other users in your Project, click the Share icon.
The Share icon will be filled in black and the Shared Status will be turned from Private to Shared.
Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.
To unshare the Cohort, click the Share icon.
The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.
A Shared Cohort can be Archived.
Select a Shared Cohort with a black Shared Cohort icon.
Click the Archive Cohort icon.
You will be asked to confirm this selection.
Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.
The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.
When the Cohort definition is unarchived, it will be visible to all users in the Project.
You can link cohorts data sets to a bundle as follows:
Create or edit a bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select Link Data Set to Bundle.
Select the data set which you want to link and +Select.
After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.
If you can not find the cohorts data sets which you want to link, verify if
Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)
This project is set to Data Sharing (Projects > your_project > Project Settings > Details)
You can unlink cohorts data sets from bundles as follows:
Edit the desired bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select the cohorts data set which you want to unlink.
Select Unlink Data Set from Bundle.
After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.
ICA supports running pipelines defined using Nextflow. See this tutorial for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
Nextflow version
20.10.0 (deprecated ⚠️), 22.04.3 (supported ✅), 24.10.2 (default ⭐)
Executor
Kubernetes
The following table shows when each Nextflow version is:
default (⭐) This version will be proposed when creating a new Nextflow pipeline.
supported (✅) This version can be selected when you do not want the default Nextflow version.
deprecated (⚠️) This version can not be selected for new pipelines, but pipelines using this version will still work.
removed (❌). This version can not be selected when creating new pipelines and pipelines using this version will no longer work.
The switchover always happens in the January release of that year.
v20.10.0: ⚠️ / ❌ / ❌ / ❌
v22.04.3: ✅ / ⚠️ / ❌ / ❌
v24.10.2: ⭐ / ⭐ / ✅ / ✅
v25.10.x: ✅ / ⭐ / ✅
v26.10.x: ✅ / ⭐
v27.10.x: ✅
You can select the Nextflow version while building a pipeline as follows:
GUI
Select the Nextflow version at Projects > your_project > flow > pipelines > your_pipeline > Details tab.
API
Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId".
When not set, a default Nextflow version will be used for the pipeline.
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found here. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
Inputs are specified via the XML input form or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.
publishDir 'out', mode: 'symlink'
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
The following configuration settings will be ignored if provided as they are overridden by the system:
executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file
Setting a timeout of between 2 and 4 times the expected processing time with the time directive for processes or tasks will ensure that no stuck processes remain indefinitely. Stuck processes keep incurring costs for the occupied resources, so if the process can not complete within that timespan, it is safer and more economical to end the process and retry.
When you want to use a sample sheet with references to files as Nextflow input, add an extra input to the pipeline. This extra input lets the user select the samplesheet-mentioned files from their project. At run time, those files will get staged in the working directory, and when Nextflow parses the samplesheet and looks for those files without paths, it will find them there. You can not use file paths in a sample sheet without selecting the files in the input form because files are only passed as file/folder ids in the API payload when the analysis is launched.
You can include public data such as HTTP URLs because Nextflow is able to download those. Nextflow is also able to download publicly accessible S3 URLs (s3://...). You can not use Illumina's urn:ilmn:ica:region:... structure.
Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.
Every tenant has a root metadata model that is accessible to all projects of that tenant. This allows an organization to collect the same piece of information, such as an ID number, for every sample in every project. Within this root model, you can configure multiple metadata submodels, even at different levels. These submodels inherit all fields and groups from their parent models.
Illumina recommends that you limit the number of fields or field groups you add to the root model. Fields can have various types containing single or multiple values, and field groups contain fields that belong together, such as all fields related to quality metrics. Any misconfigured items in the root model will carry over into all other tenant metadata models. Once a root model is published, the fields and groups that are defined within it cannot be deleted; only more fields can be added.
Do not use dots (.) in the metadata model names, fieldgroup names or field names as this can cause issues with field data.
When configuring a project, you can assign a published metadata model for all samples in the project. This metadata model can be any published metadata model in your tenant such as the root model, or one of the lower level submodels. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
Metadata gives information about a sample and can be provided by the user, the pipeline and the API. There are 2 general categories of metadata models: Project Metadata models and Pipeline Metadata models . Both models contain metadata fields and groups.
The project metadata model is specific per tenant. A Project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample of a specific project, and it may include general mandatory company information.
The pipeline metadata model is linked to a pipeline, not to a project and can be shared across tenants. Values are populated during pipeline execution and it requires an output file with the name 'metadata.response.json'.
Each sample can have multiple metadata models. When you link a project metadata model to your project, you will see its groups and fields present on each sample. The root model from that tenant will be present as well, because every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a single sample, its groups and fields will also be present for each analysis resulting from that pipeline execution.
In the main navigation, go to System Settings > Metadata Models. Here you will see the root metadata model and any underlying sub-metadata models. To create a new submodel, select +Create at the bottom of the screen.
The new metadata model screen shows an overview of all the higher-level metadata models. Use the down arrow next to a model name to expand it for more information.
For your new metadata model, add a unique name and optional description. Once this is done, start adding the metadata fields with the +Add button. The field type will determine the parameters which you can configure.
To edit your metadata model later on, select it and choose Manage > Edit. Keep in mind that fields can be added, but not removed once the model is published.
Text: Free text.
Keyword: Automatically complete the value based on already used values.
Numeric: Only numbers.
Boolean: True or false; cannot be multiple value.
Date: e.g. 23/02/2022.
Date time: e.g. 23/02/2022 11:43:53, saved in UTC.
Enumeration: Select a value from a list. Enter the values in the options field which appears when you have selected the enumeration type.
Field Group: Groups fields. Once you have chosen this, the +Add group field button becomes available to add fields to this group.
The following properties can be selected for groups and fields:
Required: The pipeline cannot be started with this sample until the required group/field is filled in.
Sensitive: Values of this group/field are only visible to project users of the project's own tenant. When a sample is shared across tenants, these fields will not be visible.
Multi value: This group/field can consist of multiple (grouped) values.
Filled by pipeline: Fields that need to be filled by a pipeline should be part of the same group. This group will automatically be multiple value, and its values become available after pipeline execution. This property is only available for the Field Group type.
If you have fields that are filled by the pipeline, you can generate an example JSON structure indicating what the JSON in an analysis output file named metadata.response.json should look like in order to fill in the metadata fields of this model. Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON. Only fields in groups marked as Filled by pipeline are included.
Fields cannot be both required and filled by pipeline at the same time.
Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. Once a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first. To publish your model, select System Settings > Metadata Models > your_metadata_model > Manage > Publish.
If a published metadata model is no longer needed, you can retire the model (except the root model). Once a model is retired, it can be published again in case you would need to reactivate it.
First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, select System Settings > Metadata Models > your_metadata_model > Manage > Retire Metadata Model.
To add metadata to your samples, you first need to assign a metadata model to your project.
Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
If you have a metadata model assigned to your project, you can manually fill out the defined metadata of the samples in your project:
Go to Projects > your_project > Samples > your_sample.
Click your sample to open the sample details and choose Edit Sample.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline will not be able to start.
Select Save
To fill metadata by pipeline executions, a pipeline model must be created.
In the main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Click on your pipeline to open the pipeline details and choose Edit.
Create or edit your model under the Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.
Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON to see an example JSON for these fields.
Field names cannot contain a dot (.); e.g. for the metric name Q30 bases (excl. dup & clipped bases), the dot after excl must be removed.
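To illustrate the general shape of this output, the Python sketch below writes a metadata.response.json from within an analysis task. The group and field names (qc_metrics, total_reads, pct_q30) are hypothetical examples; use Generate example JSON on your own model for the authoritative structure.
import json

# Hypothetical group and field names; generate the example JSON from your own
# metadata model to see the exact structure ICA expects.
metadata = {
    "qc_metrics": [          # a field group marked "Filled by pipeline" (multiple value)
        {
            "total_reads": 123456789,
            "pct_q30": 92.4  # note: no dots in the field names themselves
        }
    ]
}

# The analysis must publish this file among its outputs so ICA can pick it up.
with open("metadata.response.json", "w") as fh:
    json.dump(metadata, fh, indent=2)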
Populating metadata models of samples allows having a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.
In ICA, select Projects > your_project >Base > Schedule.
Select +Create > From metadata.
Type a name for your schedule, optionally add a description, and set it to active. You can select whether sensitive metadata fields should be included, as values of sensitive metadata fields will not be visible to other users outside of the project.
Select Save.
Navigate to Base > Tables in your project.
Two new table schemas should be added with your current metadata models.
This tutorial shows you how to import an nf-core Nextflow pipeline into a Bench workspace, run it and monitor the execution, deploy it as an ICA Flow pipeline, and launch a validation analysis in Flow.
Start Bench workspace
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
If using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large as nf-core pipelines often need more than 4 cores to run.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type
Start the workspace, then (if applicable) start the cluster
mkdir demo
cd demo
pipeline-dev import-from-nextflow nf-core/demo
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subfolder.
/data/demo $ pipeline-dev import-from-nextflow nf-core/demo
Creating output folder nf-core/demo
Fetching project nf-core/demo
Fetching project info
project name: nf-core/demo
repository : https://github.com/nf-core/demo
local path : /data/.nextflow/assets/nf-core/demo
main script : main.nf
description : An nf-core demo pipeline
author : Christopher Hakkaart
Pipeline “nf-core/demo” successfully imported into nf-core/demo.
Suggested actions:
cd nf-core/demo
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev launch-validation-in-flow
All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.
cd nf-core/demo
pipeline-dev run-in-bench
When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps.
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks being pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
pipeline-dev deploy-as-flow-pipeline
After generating a few ICA-specific files (JSON input specs for the Flow launch UI and the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here). It then asks if you want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.
Choice: 3
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
pipeline-dev launch-validation-in-flow
This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.
When looking at the main ICA navigation, you will see the following structure:
Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.
Audit/Event Logs are used for audit purposes and issue resolution.
System Settings contain general information such as the location of storage space, docker images and tool repositories.
Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.
To create a new project, click the Projects > + Create button.
On the project creation screen, add information to create a project. See Project Details page for information about each field.
Required fields include:
Name
1-255 characters
Must begin with a letter
Characters are limited to alphanumerics, hyphens, underscores, and spaces
Project Owner: Owner (and usually contact person) of the project. The project owner has the same rights as a project administrator, but cannot be removed from a project without first assigning another project owner. This can be done by the current project owner, the tenant administrator, or a project administrator of the current project. Reassignment is done at Projects > your_project > Project Settings > Team > Edit.
Region: Select your project location. The available options are based on the entitlement(s) associated with your purchased subscription.
Analysis Priority (Low/Medium (default)/High): Priority is balanced per tenant, with high priority analyses started first and the system progressing to the next lower priority once all higher priority analyses are running. Balance your priorities so that lower priority projects do not remain waiting for resources indefinitely.
Billing Mode: Select whether the costs of this project are charged to the tenant of the project owner or to the tenant of the user who is using the project.
Data Sharing: Enable this if you want to allow the data from this project to be linked, moved or copied and used in other projects of your tenant. Disabling this is a convenient way to prevent your data from showing up in the list of available data to be linked, moved or copied in other projects. Even though this prevents copying and linking files and folders, it does not protect against someone downloading the files or copying the contents of your files from the viewer.
Storage Bundle: This is auto-selected and appears when you select the Project Region.
Click the Save button to finish creating the project. The project will be visible from the Projects view.
Refer to the Storage Configuration documentation for details on creating a storage configuration.
During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.
With a storage configuration set, a project will have a 2-way sync with the external cloud storage provider: any data added directly to the external storage will be sync'ed into the ICA project data, and any data added to the project will be sync'ed into the external cloud storage.
Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.
Searching is a case-insensitive wildcard filter: any project which contains the characters will be shown. Use * as a wildcard in searches. Be aware that operators without search words are blocked and will result in the error "Unexpected error occurred when searching for projects". You can use brackets and the AND, OR and NOT operators, provided that you do not start the search with an operator (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax).
Filter by Workgroup: Projects in ICA can be accessible to different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right, which appears when a workgroup is selected.
Hidden projects: You can hide projects (Projects > your_project > Details > Hide) which you no longer use. Hiding will delete data in Base and Bench and is thus irreversible.
You can still see hidden projects if you select this option and delete the data they contain at Projects > your_project > Data to save on storage costs.
If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.
Hiding projects is not possible for externally-managed projects.
Favorites: By clicking the star next to the project name in the tile view, you set a project as a favorite. You can have multiple favorites and use the Favorites checkbox to show only those favorites. This prevents having too many projects visible.
Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favourites. A single click will open the project.
List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. A double-click is required to open the project.
Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project in the same way as manually created projects in ICA. For ICA, these projects are considered externally-managed projects, and there are a number of restrictions on which actions are allowed on externally-managed projects from within ICA. For example, you cannot delete or move externally-managed data. This is to prevent inconsistencies when these applications want to access their own project data.
You can add bundles to externally managed projects, provided those bundles do not come with additional restrictions for the project.
You can start bench workspaces in externally-managed projects. The resulting data will be stored in the externally-managed project.
Project administrators and tenant administrators can disable data sharing on externally managed projects at Projects > externally_managed_project > Project Settings > Details to prevent data from being copied or extracted.
Projects are indicated as externally-managed in the projects overview screen by a project card with a light grey accent and a lock symbol followed by "managed by app".
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column, visible in the data list view of externally-managed projects at Projects > your_project > Data.
If you have an externally-managed project and want to move the data to another project, you need to:
Copy the data from the externally-managed project to the other (new) project.
From within your external application, delete the data which is stored in the externally-managed project in ICA.
Externally-managed projects protect their notification subscriptions to ensure no user can delete them. It is possible to add your own subscriptions to externally-managed projects, see notifications for more information.
For a better understanding of how all components of ICA work, try the end-to-end tutorial.
You can share links to your project and content within projects to people who have access to it. Sharing is done by copying the URL from your browser. This URL contains both the filters and the sort options which you have applied.
ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.
To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.
The Data Sets menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.
Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.
Choose a data type among
Germline variants
Somatic mutations
RNAseq
GWAS
Choose a new study name by selecting the radio button: Create new study and entering a Study Name.
To add new data to an existing Study, select the radio button: Select from list of studies and select an existing Study Name from the dropdown.
To add data to existing records or add new records, select Job Type, Append.
Append does not wipe out any data ingested previously and can be used to ingest the molecular data in an incremental manner.
To replace data, select Job Type, Replace. If you are ingesting data again, use the Replace job type.
Enter an optional Study description.
Select the metadata model (default: Cohorts; alternatively, select OMOP version 5.4 if your data is formatted that way.)
Select the genome build your molecular data is aligned to (default: GRCh38/hg38)
For RNAseq, specify whether you want to run differential expression (see below) or only upload raw TPM.
Click Next.
Navigate to VCFs located in the Project Data.
Select each single-sample VCF or multi-sample VCF to ingest. For GWAS, select CSV files produced by Regenie.
As an alternative to selecting individual files, you can opt to select a folder instead. Toggle the radio button on Step 2 from "Select files" to "Select folder".
This option is currently only available for germline variant ingestion: any combination of small variants, structural variation, and/or copy number variants.
ICA Cohorts will scan the selected folder and all sub-folders for any VCF files or JSON files and try to match them against the Sample ID column in the metadata TSV file (Step 3).
Files not matching sample IDs will be ignored; the allowed file extensions for VCF files after the sample ID are: *.vcf.gz, *.hard-filtered.vcf.gz, *.cnv.vcf.gz, and *.sv.vcf.gz.
Files not matching sample IDs will be ignored; the allowed file extensions for JSON files after the sample ID are: *.json, *.json.gz, *.json.bgz, and *.json.gzip.
Click Next.
Navigate to the metadata (phenotype) TSV file in the project Data.
Select the TSV file or files for ingestion (a sketch of the expected tab-delimited format follows these steps).
Click Finish.
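If you assemble the subject metadata sheet programmatically, the following Python sketch writes a minimal tab-delimited file. The column header spellings are assumptions; only the required fields themselves (sample identifier, sample display name, subject identifier, subject display name, sex) come from the import form reference further below, so confirm the exact header names expected by the Cohorts importer.
import csv

# Hypothetical header spellings for the required subject/sample fields.
columns = ["sample_id", "sample_name", "subject_id", "subject_name", "sex"]
rows = [
    {"sample_id": "S001", "sample_name": "Sample 1",
     "subject_id": "P001", "subject_name": "Subject 1", "sex": "F"},
]

with open("subject_metadata.tsv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)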
All VCF types, specifically from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingest small variants.
As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.
The maximum number of files that can be part of a single manual ingestion batch is 1000.
Alternatively, users can choose a single folder and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select the molecular data files matching the samples listed in the metadata sheet, which is provided in the next step of the import process.
Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.
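As an illustration of how folder contents are matched against sample IDs and the allowed filename extensions listed earlier, the Python sketch below groups files under known sample IDs; it is a simplified approximation, not the exact logic Cohorts applies.
from pathlib import Path

# Allowed suffixes after the sample ID, per the lists above.
VCF_SUFFIXES = (".vcf.gz", ".hard-filtered.vcf.gz", ".cnv.vcf.gz", ".sv.vcf.gz")
JSON_SUFFIXES = (".json", ".json.gz", ".json.bgz", ".json.gzip")

def match_files(folder, sample_ids):
    # Group ingestible files under each known sample ID; ignore everything else.
    matches = {sid: [] for sid in sample_ids}
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        for sid in sample_ids:
            if path.name.startswith(sid) and path.name[len(sid):] in VCF_SUFFIXES + JSON_SUFFIXES:
                matches[sid].append(path)
    return matches

print(match_files("/data/vcfs", {"S001", "S002"}))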
The sample identifiers used in the VCF columns need to match the sample identifiers used in the subject/sample metadata files; accordingly, if you are starting from JSON files containing variant- and gene-level annotations provided by Illumina Nirvana, the samples listed in the header need to match the metadata files.
ICA Cohorts supports VCF files formatted according to the VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build (a simple header check is sketched after this list):
##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)
##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38
##DRAGENCommandLine= ... --ht-reference
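The Python sketch below is a simplified illustration of such a header check; it only looks for the row prefixes listed above and is not the exact validation Cohorts performs.
import gzip

# Header row prefixes that can identify the genome build (see the list above).
BUILD_MARKERS = ("##reference=", "##contig=<ID=chr1,length=248956422>", "##DRAGENCommandLine=")

def header_identifies_build(vcf_path):
    # Scan only the meta-information header of a plain or bgzipped VCF.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached the #CHROM column header; stop scanning
            if line.startswith(BUILD_MARKERS):
                return True
    return False

print(header_identifies_build("sample.hard-filtered.vcf.gz"))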
ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.
As an alternative to VCFs, ICA Cohorts accepts the Illumina Nirvana JSON output for hg38/GRCh38-aligned data for small germline variants and somatic mutations, copy number variations, and other structural variants.
ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for genes and .quant.sf for transcript-level TPM (transcripts per million).
Please also see the online documentation for the DRAGEN RNA pipeline for more information on output file formats.
ICA Cohorts currently supports upload of SNV-level GWAS results produced by Regenie and saved as CSV files.
Note: If annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole-genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it has completed successfully or failed.
As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP Common Data Model (CDM) version 5.4. Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:
PERSON (mandatory),
CONCEPT (mandatory if any of the following is provided),
CONDITION_OCCURRENCE (optional),
DRUG_EXPOSURE (optional), and
PROCEDURE_OCCURRENCE (optional).
Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.
Note that Cohorts requires that all such files do not deviate from the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.
[1] VcfMapper: https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/main_vcfmapper.py
[2] crossMap: https://crossmap.sourceforge.net/
[3] liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver
[4] Chain files:
In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in the ICA GUI. More information about Nextflow on ICA can be found in the Nextflow pipeline documentation. For this example, we will implement the DRAGEN alignment and variant calling example for Paired-End FASTQ Inputs.
The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the project creation documentation. In this tutorial, we'll use a project called Getting Started.
After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 3.9.5. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects page, clicking on Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.
In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
Next, add the Nextflow pipeline definition by navigating to the Nextflow files > main files > main.nf. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.
To specify a compute type for a Nextflow process, use the pod directive within each process.
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.
Refer to the Nextflow pipeline documentation for details on ICA-specific attributes within the Nextflow definition.
Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found in the input form documentation.
This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.
Paste the following XML input form into the XML CONFIGURATION text editor.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
The dataInputs section specifies file inputs, which will be mounted when the pipeline executes. Parameters defined under the steps section refer to string and other input types.
Each of the dataInputs and parameters can be accessed in the Nextflow within the params object named according to the code defined in the XML (e.g. params.sample_id).
If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.
Go to the Projects > your_project > Flow > Pipelines page from the left navigation pane. Select the pipeline you just created and click Start Analysis.
Fill in the required fields, indicated by the asterisk (*), and click the Start Analysis button.
You can monitor the run from the Projects > your_project > Flow > Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results.
Project name: The ICA project for your cohort analysis (cannot be changed).
Study name: Create or select a study. Each study represents a subset of data within the project.
Description: Short description of the data set (optional).
Job type:
Append: Appends values to any existing values. If a field supports only a single value, the value is replaced.
Replace: Overwrites existing values with the values in the uploaded file.
Subject metadata files: Subject metadata file(s) in tab-delimited format. For Append and Replace job types, the following fields are required and cannot be changed: Sample identifier, Sample display name, Subject identifier, Subject display name, Sex.
nextflow.enable.dsl = 2
process DRAGEN {
// The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'
// ICA will upload everything in the "out" folder to cloud storage
publishDir 'out', mode: 'symlink'
input:
tuple path(read1), path(read2)
val sample_id
path ref_tar
output:
stdout emit: result
path '*', emit: output
script:
"""
set -ex
mkdir -p /scratch/reference
tar -C /scratch/reference -xf ${ref_tar}
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
/opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
--output-directory ./ \\
-1 ${read1} \\
-2 ${read2} \\
--intermediate-results-dir /scratch \\
--output-file-prefix ${sample_id} \\
--RGID ${sample_id} \\
--RGSM ${sample_id} \\
--ref-dir /scratch/reference \\
--enable-variant-caller true
"""
}
workflow {
DRAGEN(
Channel.of([file(params.read1), file(params.read2)]),
Channel.of(params.sample_id),
Channel.fromPath(params.ref_tar)
)
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 1</pd:label>
<pd:description>FASTQ Read 1</pd:description>
</pd:dataInput>
<pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 2</pd:label>
<pd:description>FASTQ Read 2</pd:description>
</pd:dataInput>
<pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
<pd:label>Reference</pd:label>
<pd:description>Reference TAR</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps>
<pd:step execution="MANDATORY" code="General">
<pd:label>General</pd:label>
<pd:description></pd:description>
<pd:tool code="generalparameters">
<pd:label>General Parameters</pd:label>
<pd:description></pd:description>
<pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
<pd:label>Sample ID</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value></pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
</pd:pipeline>
Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own tenant as well as projects in other tenants.
The following ICA assets can be included in bundles:
Data (link / unlink)
Samples (link / unlink)
Reference Data (add / delete)
Pipelines (link/unlink)
Tools and Tool images (link/unlink)
Base tables (read-only) (link/unlink)
The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.
Some bundles come with additional restrictions such as disabling bench access or internet access when running pipelines to protect the data contained in them. When you link these bundles, the restrictions will be enforced on your project. Unlinking the bundle will not remove the restrictions.
You can not link bundles which come with additional restrictions to externally managed projects.
As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.
If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.
From the main navigation page, select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the + button, under Linked bundles.
Click on the desired bundle, then click the +Link Bundles button.
Click Save.
The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.
To unlink a bundle from a project,
Select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the (-) button, next to the linked bundle you wish to remove.
Bundles and projects have to be in the same region in order to be linked. Otherwise, the error The bundle is in a different region than the project so it's not eligible for linking will be displayed.
To create a new bundle and configure its settings, do as follows.
From the main navigation, select Bundles.
Select + Create .
Enter a unique name for the bundle.
From the Region drop-down list, select where the assets for this bundle should be stored.
Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.
Draft—The bundle can be edited.
Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.
Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles can not be recovered from deprecated status.
[optional] Configure the following settings.
Categories—Select an existing category or enter a new one.
Short Description—Enter a description for the bundle.
Metadata Model—Select a metadata model to apply to the bundle.
Enter a release version for the bundle and optionally enter a description for the version.
[Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).
Homepage
License
Links
Publications
[Optional] Enter any information you would like to distribute with the bundle in the Documentation section.
Select Save.
There is no option to delete bundles; they must be deprecated instead.
To make changes to a bundle:
From the main navigation, select Bundles.
Select a bundle.
Select Edit.
Modify the bundle information and documentation as needed.
Select Save.
To add assets to a bundle:
Select a bundle.
On the left-hand side, select the type of asset (such as Flow > pipelines, Base > Tables or Bench > Docker Images) you want to add to the bundle.
Depending on the asset type, select add or link to bundle.
Select the assets and confirm.
Assets must meet the following requirements before they can be added to a bundle:
For Samples and Data, the project the asset belongs to must have data sharing enabled.
The region of the project containing the asset must match the region of the bundle.
You must have permission to access the project containing the asset.
Pipelines and tools need to be in released status.
Samples must be available in a complete state.
When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.
Which batch jobs are visible as activity depends on the user role.
When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle. All users who currently have access to a bundle will automatically have access to the new version as well.
From the main navigation, select Bundles.
Select a bundle.
Select + Create new Version.
Make updates as needed and update the version number.
Select Save.
When you create a new version of a bundle, it will replace the old version in your list. To see the old version, open your new bundle and look at Bundles > your_bundle > Details > Versioning. There you can open the previous version which is contained in your new version.
Assets such as data which were added in a previous version of your bundle will be marked in green, while new content will be black.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Legal.
To add Terms of Use to a Bundle, do as follows:
Select + Create New Version.
Use the WYSIWYG editor to define Terms of Use for the selected bundle.
Click Save.
[Optional] Require acceptance by clicking the checkbox next to Acceptance required.
Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.
To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.
If you want to collaborate with other people on creating a bundle and managing the assets in the bundle, you can add users to your bundle and set their permissions. You use this to create a bundle together, not to use the bundle in your projects.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Team.
To invite a user to collaborate on the bundle, do as follows.
To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.
To add a user by their email address, select By email and enter their email address.
To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.
Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.
Viewer: view content without editing rights.
Contributor: view bundle content and link/unlink assets.
Administrator: full edit rights of content and configuration.
Repeat as needed to add more users.
Users are not officially added to the bundle until they accept the invitation.
To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.
To revoke bundle permissions from a user, select the trash icon for the user.
Select Save Changes.
Once you have finalized your bundle and added all assets and legal requirements, you can share your bundle with other tenants to use it in their projects.
Your bundle must be in released status to prevent it from being updated while it is shared.
Go to Bundles > your_bundle > Edit > Details > Bundle status and set it to Released.
Save the change.
Once the bundle is released, you can share it. Invitations are sent to an individual email address, however access is granted and extended to all users and all workgroups inside that tenant.
Go to Bundles > your_bundle > Bundle Settings > Share.
Click Invite and enter the email address of the person you want to share the bundle with. They will receive an email from which they can accept or reject the invitation to use the bundle. The invitation will show the bundle name, description and owner. The link in the invite can only be used once.
You can follow up on the status of the invitation on the Bundles > your_bundle > Bundle Settings > Share page.
If they reject the bundle, the rejection date will be shown. To re-invite that person again later on, select their email address in the list and choose Remove. You can then create a new invitation. If you do not remove the old entry before sending a new invitation, they will be unable to accept and get an error message stating that the user and bundle combination must be unique. They can also not re-use an invitation once it has been accepted or declined.
If they accept the bundle, the acceptance date will be shown. They will in turn see the bundle under Bundles > Entitled bundles. To remove access, select their email address in the list and choose Remove.
Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when it is part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.
To use your shared entitled bundle, add the bundle to your project via Project Linking. Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.
Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.
Project-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists project-specific ICA pipeline analysis data)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (project-specific quality control metrics)
Tenant-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists ICA pipeline analysis data)
CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)
CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)
CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)
CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)
DRAGEN metrics will only have content when DRAGEN pipelines have been executed.
Analysis views will only have content when analyses have been executed.
Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.
When you use your own AWS S3 storage in a project, metrics can not be collected and thus the DRAGEN METRICS - related views can not be used.
Project members who have both Base contributor and project contributor or administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can remove catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but cannot add them again.
To add Catalogue data,
Go to Projects > your_project > Base > Tables.
Select Add table > Import from Catalogue.
A list of available views will be displayed. (Note that views which are already part of your project are not listed)
Select the table you want to add and choose +Select
Catalogue data will have View as type, the same as tables which are linked from other projects.
To delete Catalogue data,
go to Projects > your_project > Base > Tables.
Select the table you want to delete and choose Delete.
A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.
View: The name of the Catalogue table.
Description: An explanation of which data is contained in the view.
Category: The identification of the source system which provided the data.
Tenant/project: Appended to the view name as _tenant or _project. Determines whether the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.
In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.
In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.
Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It's important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the USER_NAME record in the SQL query below is "recycled" for each flattened row.
The following query extracts
USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.
PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.
Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.
Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow': it allows the user to find steps which failed.
SELECT
USER_NAME as user_name,
PIPELINE_ANALYSIS_DATA:reference as reference,
PIPELINE_ANALYSIS_DATA:price as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:completionTime::TIMESTAMP as completionTime,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as price,
f.value:pricePerSecond::FLOAT as pricePerSecond,
f.value:startTime::TIMESTAMP as startTime,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
f.value:status::STRING = 'FAILED'
AND f.value:stepId::STRING NOT LIKE '%Workflow%';
Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts:
Sample name from the file names.
Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.
The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%wgs_coverage_metrics.csv'
AND (
f.value:column_2::STRING = 'Aligned bases in genome'
OR f.value:column_2::STRING = 'Aligned bases'
)
ORDER BY
sample_name;
Lastly, you can combine these views (or rather, intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:
Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.
Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.
Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.
WITH flattened_dragen_scrna AS (
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
ANALYSIS_UUID,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%scRNA.metrics.csv'
AND (
f.value:column_2::STRING = 'Invalid barcode read'
OR f.value:column_2::STRING = 'Passing cells'
)
),
pipeline_table AS (
SELECT
PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
PIPELINE_ANALYSIS_DATA:status::STRING as status,
PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
FROM
ICA_PIPELINE_ANALYSES_VIEW_project
WHERE
PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
)
SELECT * FROM flattened_dragen_scrna JOIN pipeline_table
ON
flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;
You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet, you can retrieve the costs of individual steps for every analysis run in the past week.
SELECT
USER_NAME as user_name,
PROJECT_NAME as project,
SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
PIPELINE_ANALYSIS_DATA:status as status,
ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as priceStep,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
ORDER BY
priceStep DESC;
Data Catalogue views cannot be shared as part of a Bundle.
Data size is not shown for views because views are a subset of data.
By removing Base from a project, the Data Catalogue will also be removed from that project.
As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
This tutorial demonstrates how to use the ICA Python library packaged with the JupyterLab image for Bench Workspaces.
See the JupyterLab documentation for details about the JupyterLab docker image provided by Illumina.
The tutorial will show how authentication to the ICA API works and how to search, upload, download and delete data from a project into a Bench Workspace. The python code snippets are written for compatibility with a Jupyter Notebook.
Navigate to Bench > Workspaces and click Enable to enable workspaces. Select +New Workspace to create a new workspace. Fill in the required details and select JupyterLab for the Docker image. Click Save and Start to open the workspace. The following snippets of code can be pasted into the workspace you've created.
This snippet defines the required python modules for this tutorial:
# Wrapper modules
import icav2
from icav2.api import project_data_api
from icav2.model.problem import Problem
from icav2.model.project_data import ProjectData
# Helper modules
import random
import os
import requests
import string
import hashlib
import getpass
This snippet shows how to authenticate using the following methods:
ICA Username & Password
ICA API Token
# Authenticate using User credentials
username = input("ICA Username")
password = getpass.getpass("ICA Password")
tenant = input("ICA Tenant name")
url = os.environ['ICA_URL'] + '/rest/api/tokens'
r = requests.post(url, data={}, auth=(username,password),params={'tenant':tenant})
token = None
apiClient = None
if r.status_code == 200:
    token = r.content
    configuration = icav2.Configuration(
        host = os.environ['ICA_URL'] + '/rest',
        access_token = str(r.json()["token"])
    )
    apiClient = icav2.ApiClient(configuration, header_name="Content-Type", header_value="application/vnd.illumina.v3+json")
    print("Authenticated to %s" % str(os.environ['ICA_URL']))
else:
    print("Error authenticating to %s" % str(os.environ['ICA_URL']))
    print("Response: %s" % str(r.status_code))
## Authenticate using ICA API TOKEN
configuration = icav2.Configuration(
host = os.environ['ICA_URL'] + '/rest'
)
configuration.api_key['ApiKeyAuth'] = getpass.getpass()
apiClient = icav2.ApiClient(configuration, header_name="Content-Type", header_value="application/vnd.illumina.v3+json")
These snippets show how to manage data in a project. Operations shown are:
Create a Project Data API client instance
List all data in a project
Create a data element in a project
Upload a file to a data element in a project
Download a data element from a project
Search for matching data elements in a project
Delete matching data elements in a project
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Data API client instance
projectDataApiInstance = project_data_api.ProjectDataApi(apiClient)
# List all data in a project
pageOffset = 0
pageSize = 30
try:
    # Fetch the first page to learn the total number of records
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
    totalRecords = projectDataPagedList.total_item_count
    while pageOffset * pageSize < totalRecords:
        for projectData in projectDataPagedList.items:
            print("Path: " + projectData.data.details.path + " - Type: " + projectData.data.details.data_type)
        pageOffset = pageOffset + 1
        if pageOffset * pageSize < totalRecords:
            # Fetch the next page before continuing the loop
            projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Create a data element in a project
data = icav2.model.create_data.CreateData(name="test.txt",data_type = "FILE")
try:
projectData = projectDataApiInstance.create_data_in_project(projectId, create_data=data)
fileId = projectData.data.id
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->create_data_in_project: %s\n" % e)## Upload a local file to a data element in a project
# Create a local file in a Bench workspace
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
content = ''.join(random.choice(string.ascii_lowercase) for i in range(100))
f = open(filename, "a")
f.write(content)
f.close()
# Calculate MD5 hash (optional)
localFileHash = hashlib.md5(open(filename, 'rb').read()).hexdigest()
try:
    # Get Upload URL
    upload = projectDataApiInstance.create_upload_url_for_data(project_id = projectId, data_id = fileId)
    # Upload the dummy file content to the pre-signed URL
    with open(filename, 'rb') as f:
        r = requests.put(upload.url, data=f.read())
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_upload_url_for_data: %s\n" % e)
# Delete local dummy file
os.remove(filename)
## Download a data element from a project
try:
# Get Download URL
download = projectDataApiInstance.create_download_url_for_data(project_id=projectId, data_id=fileId)
# Download file
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
r = requests.get(download.url)
open(filename, 'wb').write(r.content)
# Verify md5 hash
remoteFileHash = hashlib.md5((open(filename, 'rb').read())).hexdigest()
if localFileHash != remoteFileHash:
print("Error: MD5 mismatch")
# Delete local dummy file
os.remove(filename)
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->create_download_url_for_data: %s\n" % e)# Search for matching data elements in a project
try:
projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
for projectData in projectDataPagedList.items:
print("Path: " + projectData.data.details.path + " - Name: "+projectData.data.id + " - Type: "+projectData.data.details.data_type)
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)# Delete matching data elements in a project
try:
projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
for projectData in projectDataPagedList.items:
print("Deleting file "+projectData.data.details.path)
projectDataApiInstance.delete_data(project_id = projectId, data_id = projectData.data.id)
except icav2.ApiException as e:
print("Exception %s\n" % e)These snippets show how to get a connection to a base database and run an example query. Operations shown are:
Create a Python connection to Snowflake
Create a table
Insert data into a table
Query the table
Delete the table
Snowflake Python API documentation can be found here
This snippet imports the required Python modules for this tutorial:
# API modules
import icav2
from icav2.api import project_base_api
from icav2.model.problem import Problem
from icav2.model.base_connection import BaseConnection
# Helper modules
import os
import requests
import getpass
import snowflake.connector
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Base API client instance
projectBaseApiInstance = project_base_api.ProjectBaseApi(apiClient)
# Get a Base Access Token
try:
baseConnection = projectBaseApiInstance.create_base_connection_details(project_id = projectId)
except icav2.ApiException as e:
print("Exception when calling ProjectBaseAPIApi->create_base_connection_details: %s\n" % e)
## Create a Python connection to Snowflake
ctx = snowflake.connector.connect(
account=os.environ["ICA_SNOWFLAKE_ACCOUNT"],
authenticator=baseConnection.authenticator,
token=baseConnection.access_token,
database=os.environ["ICA_SNOWFLAKE_DATABASE"],
role=baseConnection.role_name,
warehouse=baseConnection.warehouse_name
)
ctx.cursor().execute("USE "+os.environ["ICA_SNOWFLAKE_DATABASE"])## Create a Table
tableName = "test_table"
ctx.cursor().execute("CREATE OR REPLACE TABLE " + tableName + "(col1 integer, col2 string)")## Insert data into a table
ctx.cursor().execute(
"INSERT INTO " + tableName + "(col1, col2) VALUES " +
" (123, 'test string1'), " +
" (456, 'test string2')")## Query the table
cur = ctx.cursor()
try:
cur.execute("SELECT * FROM "+tableName)
for (col1, col2) in cur:
print('{0}, {1}'.format(col1, col2))
finally:
cur.close()
## Delete the table
ctx.cursor().execute("DROP TABLE " + tableName);

Queries can be used for data mining. On the Projects > your_project > Base > Query page:
New queries can be created and executed
Already executed queries can be found in the query history
Saved queries and query templates are listed under the saved queries tab.
All available tables and their details are listed on the New Query tab.
Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.
Do not use statements such as ALTER TABLE to modify your table structure, as the table will go out of sync with its definition and this will result in processing errors.
When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name.
Case-sensitive column names (such as those in the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.
The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'. As these subfields are case sensitive, the upper- and lowercase spelling must be respected.
If you want to query data from a table shared from another tenant (indicated in green), select the table to see the unique name. In the example below, the query will be select * from demo_alpha_8298.public.TestFiles
For more information on queries, please also see the Snowflake documentation: https://docs.snowflake.com/en/user-guide/
Some tables contain columns with an array of values instead of a single value.
Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:
You can use the name field and code field to do queries by running
Select * from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to show specific data like the email and bundle name from the array, this becomes
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to use data in the array as your selection criteria, the expression becomes
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:boolean = true.
If your criterion is a text value in the array, delimit the text with single quotes ('). For example:
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail = '[email protected]'.
You can also use the LIKE operator with the % wildcard if you do not know the exact content.
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail LIKE '%A@server%'
If the query is valid for execution, the result will be shown as a table underneath the input box. Only the first 200 characters of a string, record, or variant field are included in the query results grid. The complete value is available by clicking the "show details" link.
From within the result page of the query, it is possible to save the result in two ways:
Download: As Excel or JSON file to the computer.
Export: As a new table, as a view, or as a file to the project in CSV (comma, tab, pipe, or a custom delimiter) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located in the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.
Navigate to Projects > your_project > Base > Query.
Enter the query to execute using SQL.
Select Run Query.
Optionally, select Save Query to add the query to your saved queries list.
If the query takes more than 30 seconds without returning a result, a message is displayed to inform you that the query is still in progress; its status can be consulted on Projects > your_project > Activity > Base Jobs. Once the query has completed successfully, the results can be found on the Projects > your_project > Base > Query > Query History tab.
The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.
Navigate to Projects > your_project > Base > Query.
Select the Query History tab.
Select a query.
Perform one of the following actions:
Open Query—Open the query in the New Query tab. You can then select Run Query to execute the query again.
Save Query—Save the query to the saved queries list.
View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.
All queries saved within the project are listed under the Saved Queries tab together with the query templates.
The saved queries can be:
Opened: This will open the query in the “New query” tab.
Saved as template: The saved query becomes a query template.
Deleted: The query is removed from the list and cannot be opened again.
The query templates can be:
Opened: This will open the query again in the “New query” tab.
Deleted: The query is removed from the list and cannot be opened again.
It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:
Account: The query template will be available for everyone within the account
User: The query template will be available for the user who created it
If you have saved a query, you can run the query again by selecting it from the list of saved queries.
Navigate to Projects > your_project > Base > Query.
Select the Saved Queries tab.
Select a query.
Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.
Shared databases are displayed under the list of Tables as Shared Database for project <project name>.
For ICA Cohorts customers, shared databases are available in a project Base instance. For more information on specific Cohorts shared database tables that are viewable, see Cohorts Base.
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.
This tutorial provides an example for exercising the basic operations used with Base, including how to create a table, load the table with data, and query the table.
An ICA project with access to Base
If you don't already have a project, please follow the instructions in the Project documentation to create a project.
File to import
A tab delimited gene expression file (sampleX.final.count.tsv). Example format:
HES4-NM_021170-T00001 1392
ISG15-NM_005101-T00002 46
SLC2A5-NM_003039-T00003 14
H6PD-NM_004285-T00004 30
PIK3CD-NM_005026-T00005 200
MTOR-NM_004958-T00006 156
FBXO6-NM_018438-T00007 10
MTHFR-NM_005957-T00008 154
FHAD1-NM_052929-T00009 10
PADI2-NM_007365-T00010 12
Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.
Go to the Projects > your_project > Base > Tables and enable Base by clicking on the Enable button.
Select Add Table > New Table.
Create your table
To create your table from scratch, select Empty Table from the Create table from dropdown.
Name your table FeatureCounts
Uncheck the box next to Include reference, to exclude reference data from your table.
Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.
Copy the schema text below and paste it in into the text box to create your schema.
{
"Fields": [
{
"NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
"Name": "TranscriptID",
"Type": "STRING",
"Mode": "REQUIRED",
"Description": null,
"DataResolver": null,
"SubBluebaseFields": []
},
{
"NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
"Name": "ExpressionCount",
"Type": "INTEGER",
"Mode": "REQUIRED",
"Description": null,
"DataResolver": null,
"SubBluebaseFields": []
}
]
}
Click the Save button
Upload the sampleX.final.count.tsv file containing the final counts.
Select Data tab (1) from the left menu.
Click on the grey box (2) to choose the file to upload or drag and drop the sampleX.final.count.tsv into the grey box
Refresh the screen (3)
The uploaded file (4) will appear on the data page after successful upload.
Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules will check for new files every 24 hours.
In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.
Go to Projects > your_project > Base > Schedule and click the + Add New button.
Select the option to load the contents from files into a table.
Create your schedule.
Name your schedule LoadFeatureCounts
Choose Project as the source of data for your table.
To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.
Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.
Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.
The .final.count.tsv files that will be loaded into your table are tab-delimited (TSV) and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:
Data format: TSV
Delimiter: Tab
Header rows to skip: 0
Click the Save button
Highlight your schedule. Click the Run button to run your schedule now.
It will take a short time to prepare and load data into your table.
Check the status of your job on your Projects > your_project > Activity page.
Click the BASE JOBS tab to view the status of scheduled Base jobs.
Click BASE ACTIVITY to view Base activity.
Check the data in the table.
Go back to your Projects > your_project > Base > Tables page.
Double-click your table to view its details.
You will land on the SCHEMA DEFINITION page.
Click the PREVIEW tab to view the records that were loaded into your table.
Click the DATA tab, to view a list of the files whose data has been loaded into your table.
To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.
In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.
Go to your Projects > your_project > Base > Query page.
SELECT TranscriptID,ExpressionCount FROM FeatureCounts WHERE ExpressionCount > 100;
Paste the above query into the NEW QUERY text box
Click the Run Query button to run your query
View your query results.
Save your query for future use by clicking the Save Query button. You will be asked to "Name" the query before clicking on the "Create" button.
Find the table you want to export on the "Tables" page under BASE. Go to the table details page by double-clicking the table you want to export.
Click on the Export As File icon and complete the required fields
Name: Name of the exported file
Data Format: A table can be exported in CSV and JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.
CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.
JSON Format: Selecting JSON format exports the table in a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.
Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide "Maximum file size (in bytes)" for exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.
In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.
This tutorial references the Basic pipeline example in the Nextflow documentation.
The first step in creating a pipeline is to create a project. For instructions on creating a project, see the Projects page. In this tutorial, the project is named Getting Started.
After creating your project,
Open the project at Projects > your_project.
Navigate to the Flow > Pipelines view in the left navigation pane.
From the Pipelines view, click +Create > Nextflow > XML based to start creating the Nextflow pipeline.
In the Nextflow pipeline creation view, the Description field is used to add information about the pipeline. Add values for the required Code (unique pipeline name), description, and size fields.
Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the Basic pipeline example from the Nextflow documentation. Modifications to the pipeline definition from the nextflow documentation include:
Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.
Add the publishDir directive with value 'out' to the reverse process.
Modify the reverse process to write the output to a file test.txt instead of stdout.
The description of the pipeline from the linked Nextflow docs:
This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows receives these files and simply reverses their content using the rev command-line tool.
Resources: For each process, you can use the memory directive and cpus directive to set the Compute Type. ICA will then determine the best matching compute type based on those settings. For example, if you set memory '10240 GB' and cpus 6, ICA will determine that you need the standard-large ICA Compute Type.
Syntax example:
process iwantstandardsmallresources {
cpus 2
memory '8 GB'
...
}
Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we don't need to add any additional definition files. Paste the following definition into the text editor:
#!/usr/bin/env nextflow
params.in = "$HOME/sample.fa"
sequences = file(params.in)
SPLIT = (System.properties['os.name'] == 'macOS' ? 'gcsplit' : 'csplit')
process splitSequences {
container 'public.ecr.aws/lts/ubuntu:22.04'
input:
file 'input.fa' from sequences
output:
file 'seq_*' into records
"""
$SPLIT input.fa '%^>%' '/^>/' '{*}' -f seq_
"""
}
process reverse {
container 'public.ecr.aws/lts/ubuntu:22.04'
publishDir 'out'
input:
file x from records
output:
file 'test.txt'
"""
cat $x | rev > test.txt
"""
}
Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.
Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<pd:dataInputs>
<pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>in</pd:label>
<pd:description>fasta file input</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>
With the definition added and the input form defined, the pipeline is complete.
On the Documentation tab, you can add additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.
Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.
Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the UCSC Genome Browser. Download the chr1_GL383518v1_alt.fa.gz file and unzip to decompress the FASTA file.
To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".
Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to Projects > your_project > Flow > Analyses and click Start. Next, select your pipeline from the list.
In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.
Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.
With the required information set, click the button to Start Analysis.
After launching the pipeline, navigate to the Analyses view in the left navigation pane.
The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.
Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Do note that this may take considerable time if it is your first analysis because of the required resource management.
Click the analysis record to enter the analysis details view.
From the analysis details view, the logs produced by each process within the pipeline are accessible via the Steps tab.
Analysis outputs are written to an output folder in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}. (1)
Inside of the analysis output folder are the files output by the analysis processes written to the out folder. In this tutorial, the file test.txt (2) is written to by the reverse process. Navigating into the analysis output folder, clicking into the test.txt file details, and opening the VIEW tab (3) shows the output file contents.
The "Download" button (4) can be used to download the data to the local machine.
All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.
To create a new table, click Projects > your_project > Base > Tables > +Create. Tables can be created from scratch or from a previously saved template. Views on data from Illumina hardware and processes can also be selected as an option.
Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. The workaround is to switch to text mode, copy the schema of the table you want to modify, and paste it into a new empty table where the necessary changes can be made before saving.
Once created, do not try to modify your table column layout via the Query module as even though you can execute ALTER TABLE commands, the definitions and syntax of the table will go out of sync resulting in processing issues.
Be careful when naming tables that you want to use in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.
To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.
The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers or underscores. The description is optional.
Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the table which can contain references to the data on the platform:
data_reference: reference to the data element in the Illumina platform from which the record originates
data_name: original name of the data element in the Illumina platform from which the record originates
sample_reference: reference to the sample in the Illumina platform from which the record originates
sample_name: name of the sample in the Illumina platform from which the record originates
pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates
pipeline_name: name of the pipeline in the Illumina platform from which the record originates
execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates
account_reference: reference to the account in the Illumina platform from which the record originates
account_name: name of the account in the Illumina platform from which the record originates
In an empty table, you can create a schema by adding a field with the +Add button for each column of the table and defining it. At any time during the creation process, it is possible to switch to the edit definition mode and back. The definition mode shows the JSON code, whereas the original view shows the fields in a table.
Each field requires:
a unique name (*1) with optional description.
a type
String – collection of characters
Bytes – raw binary data
Integer – whole numbers
Float – fractional numbers (*2)
Numeric – any number (*3)
Boolean – only options are “true” or “false”
Timestamp - Stores number of (milli)seconds passed since the Unix epoch
Date - Stores date in the format YYYY-MM-DD
Time - Stores time in the format HH:MI:SS
Datetime - Stores date and time information in the format YYYY-MM-DD HH:MI:SS
Record – has a child field
Variant - can store a value of any other type, including OBJECT and ARRAY
a mode
Required - Mandatory field
Nullable - Field is allowed to have no value
Repeated - Multiple values are allowed in this field (will be recognized as array in Snowflake)
(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your schema name as this will lead to SQL compilation errors.
Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Manage (top right) > Save as template.
If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.
The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:
Available: Ready to be used, both with or without data
Pending: The system is still processing the table, there is probably a process running to fill the table with data
Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the Show deleted tables/views button
Additional Considerations
Tables created from empty data or from a template are available faster.
When copying a table with data, it can remain in a Pending state for a longer period of time.
Clicking on the page's refresh button will update the list.
For any available table, the following details are shown:
Table information: Name, description, status, number of records and data size.
Definition: An overview of the table schema, also available as text. Fields can be added to the schema but not deleted. Tip for deleting fields: copy the schema as text and paste it into a new empty table where the schema is still editable.
Preview: A preview of the first 50 rows of the table (when data is uploaded into the table). Select show details to see record details.
Source Data: the files that are currently uploaded into the table. You can see the Load Status of the files which can be Prepare Started, Prepare Succeeded or Prepare Failed and finally Load Succeeded or Load Failed.
From within the details of a table it is possible to perform the following actions from the Manage menu (top right) of the table:
Edit: Add fields to the table and change the table description.
Copy: Create a copy of this table in the same or a different project. In order to copy to another project, data sharing must be enabled in the details of the original project. The user also has to have access to both the original and the target project.
Export as file: Export this table as a CSV or JSON file. The exported file can be found in a project where the user has the access to download it.
Save as template: Save the schema or an edited form of it as a template.
Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It is also possible to load data into a table automatically via a pre-configured job, which can be set up on the Schedule page.
Delete: Delete the table.
To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > Manage (top right) > Add Data
The data selection screen will show options to select the structure as CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation) and the location of your source data. In the first step, you select the data format and the files containing the data.
Data format (required): Select the format of the data which you want to import.
Write preference: Define whether data can be written to the table only when the table is empty, whether the data should be appended to the table, or whether the table should be overwritten.
Delimiter: Which delimiter is used in the delimiter separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.
Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table.
Most of the advanced options are legacy functions and should not be used. The only exceptions are
Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective actions if the input did not match the table design and try to run the import again (with a new copy of the file, as each input file can only be used once).
If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.
To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This will list all the source data files, even those that failed to be imported. To prevent duplicate entries, these files cannot be used for another import.
Base Table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (i.e., String), or the Variant type.
If you have a nested JSON structure, you can import it into individual fields of your table.
For example, if your nested JSON structure contains subfields a, b and c with integer values and you want to import them into a table, you need to create a matching table. This can be done either on the Tables page or via the SQL command CREATE OR REPLACE TABLE json_data ( a INTEGER, b INTEGER, c INTEGER);
Format your JSON data to have single lines per structure.
Finally, create a schedule to import your data or perform a manual import.
The resulting table will look like this:
This tutorial shows you how to start a new pipeline from scratch
Start Bench workspace
For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.
We are going to wrap the "gzip" linux compression tool with inputs:
1 file
compression level: integer between 1 and 9
The Nextflow code should wrap the gzip command and publish the final output in the “out” folder.
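A minimal sketch of such a wrapper, assuming the hypothetical parameter names input_file and compression_level, could look like this:
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter names; adapt them to your own conventions
params.input_file = null
params.compression_level = 6

process GZIP {
    // Publish the compressed result to the 'out' folder
    publishDir 'out', mode: 'copy'

    input:
    path input_file
    val compression_level

    output:
    path "${input_file}.gz"

    """
    gzip -c -${compression_level} ${input_file} > ${input_file}.gz
    """
}

workflow {
    GZIP(file(params.input_file), params.compression_level)
}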
Save this file as nextflow-src/main.nf, and check that it works:
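For example, using the parameter names assumed in the sketch above:
nextflow run nextflow-src/main.nf --input_file <some_local_file> --compression_level 9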
We now need to:
Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools
In Nextflow, Docker images can be specified at the process level
Each process may use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified
at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)
For example, create nextflow-src/nextflow.config:
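As a minimal sketch, reusing the Ubuntu image referenced elsewhere in this documentation as the default container for every process:
// nextflow-src/nextflow.config
process.container = 'public.ecr.aws/lts/ubuntu:22.04'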
We can now run with nextflow's -with-docker option:
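For instance, with the same hypothetical parameters as before:
nextflow run nextflow-src/main.nf --input_file <some_local_file> --compression_level 9 -with-docker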
Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:
Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:
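A sketch of such a profile, assuming the hypothetical parameter names used above and a small test file shipped with the project:
profiles {
    test {
        params.input_file = "${projectDir}/test-data/test.txt"   // hypothetical test input
        params.compression_level = 6
    }
}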
With this profile defined, we can now run the same test as before with this command:
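For example:
nextflow run nextflow-src/main.nf -profile test -with-docker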
A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:
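A minimal version, added inside the same profiles block as the test profile above, could be:
    docker {
        // Run every process inside its container
        docker.enabled = true
    }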
We can now run the same test as before with this command:
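With both profiles in place, the invocation would be along the lines of:
nextflow run nextflow-src/main.nf -profile test,docker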
We also have enough structure in place to start using the pipeline-dev command:
In order to deploy our pipeline to ICA, we need to generate the user interface input form.
This is done by using nf-core's recommended nextflow_schema.json.
For our simple example, we generate a minimal one by hand (done by using one of the nf-core pipelines as example):
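A heavily trimmed sketch, using the hypothetical parameters from this tutorial (the exact structure expected by the tooling may differ), might be:
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "gzip wrapper pipeline parameters",
    "type": "object",
    "properties": {
        "input_file": {
            "type": "string",
            "format": "file-path",
            "description": "File to compress"
        },
        "compression_level": {
            "type": "integer",
            "minimum": 1,
            "maximum": 9,
            "default": 6,
            "description": "gzip compression level"
        }
    }
}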
In the next step, this gets converted to the ica-flow-config/inputForm.json file.
We just need to create a final file, which we had skipped until now: Our project description file, which can be created via the command pipeline-dev project-info --init:
We can now run:
After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).
It then asks if we want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:
Import to Bench
From public nf-core pipelines
From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow
Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require
Conda which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.
Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
git (automatically installed using conda)
jq, curl (which should be made available in the image)
Pipeline development tools work best when the following items are defined:
Nextflow profiles:
test profile, specifying inputs appropriate for a validation run
docker profile, instructing NextFlow to use Docker
nextflow_schema.json, as described in the tutorial above. This is useful for the launch UI generation. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.
ICA Flow adds one additional constraint. The output directory out is the only one automatically copied to the Project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
These are installed in /data/.software (which should be in your $PATH), the pipeline-dev script is the front-end to the other pipeline-dev-* tools.
Pipeline-dev fulfils a number of roles:
Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.
A pipeline-dev project relies on the following Folder structure, which is auto-generated when using the pipeline-dev import* tools.
If you start a project manually, you must follow the same folder structure.
Project base folder
nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
main.nf
nextflow.config
nextflow_schema.json
pipeline-dev.project-info: contains project name, description, etc.
nextflow-bench.config (automatically generated when needed): contains definitions for bench.
ica-flow-config: Directory of files used when deploying pipeline to Flow.
inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.
launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
The above-mentioned project structure must be generated manually. The nf-core CLI tools can assist in generating the nextflow_schema.json. A tutorial goes into more detail about this use case.
A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.
A tutorial goes into more detail about this use case.
A directory called imported-flow-analysis is created and the analysis+pipeline assets are downloaded.
A tutorial goes into more detail about this use case.
Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.
The script then launches nextflow. The full nextflow command line is printed and launched.
In case of errors, full logs are saved as .pipeline-dev.log
Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular during execution with the nextflow --profile docker.
In Nextflow, Docker images can be specified at the process level
This is done with the container "<image_name:version>" directive, which can be specified
in nextflow config files (preferred method when following the nf-core best practices)
or at the start of each process definition.
Each process can use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Resources such as the number of CPUs and memory can be specified per process; see the Nextflow documentation for details about the Nextflow-Docker syntax.
Bench can push/pull/create/modify Docker images, as described in the Bench documentation.
This command does the following:
Generate the JSON file describing the ICA Flow user interface.
If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.
Generate the JSON file containing the validation launch inputs.
If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from nextflow --profile test inputs.
If local files are used as validation inputs or as default input values:
copy them to /data/project/pipeline-dev-files/temp .
get their ICA file ids.
use these file ids in the launch specifications.
If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.
Identify which already-deployed pipelines have the same base name, with or without suffixes that could be some versioning (_v<number>, _<number>, _<date>) .
Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the --create/--update parameters when specified, for scripting without user interactions.
A new ICA Flow pipeline gets created (except in the case of a pipeline update).
The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow
nextflow-src folder is uploaded file by file as pipeline assets.
Output Example:
The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:
The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow --profile test”.
Output Example:
The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).
Please refer to the installation instructions for installing the ICA CLI. To authenticate, follow the steps on the authentication page.
In this tutorial, we will create an RNASeq pipeline in ICA. It includes four processes: index creation, quantification, FastQC, and MultiQC. We will also upload a Docker container to the ICA Docker repository for use within the pipeline.
The 'main.nf' file defines the pipeline that orchestrates various RNASeq analysis processes.
The script uses the following tools:
Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports
We need a Docker container containing these tools. You can refer to the help pages to learn how to build your own Docker image with the required tools. For the sake of this tutorial, we will use the publicly available nextflow/rnaseq-nf container.
With Docker installed on your computer, download the image required for this project using the following command.
docker pull nextflow/rnaseq-nf
Create a tarball of the image to upload to ICA.
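For example, using docker save:
docker save -o rnaseq-nf.tar nextflow/rnaseq-nf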
The following commands can be used to upload the tarball to your project.
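After entering the project context, the upload is typically done with the projectdata upload command (exact syntax may vary by CLI version), for example:
icav2 projectdata upload rnaseq-nf.tar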
Add the image to the ICA Docker repository
The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).
Change the format for the image tarball to DOCKER:
Navigate to Projects > your_project > Data.
Check the checkbox for the uploaded tarball.
Click on Manage > Change format.
In the new popup window, select "DOCKER" format and save.
To add this image to the ICA Docker repository, first click on Projects to go back to the home page.
From the ICA home page, click on the System Settings > Docker Repository > Create > Image.
This will open a new window that lets you select the region (US, EU, CA) in which your project resides and the Docker image from the bottom pane.
Edit the Name field to rename the image. For this tutorial, we will change the name to "rnaseq". Select the region, give it a version number and a description, and click "Save".
After creating a new docker image, you can click on the image to get the container URL (under Regions) for the nextflow configuration file.
Create a configuration file called "nextflow.config" in the same folder as the main.nf file above. Use the URL copied above to add the process.container line in the config file.
You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the compute types documentation for a list of available compute types.
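As an illustration, such a configuration might look like the following; the container URL is a placeholder for the value copied from the Docker repository, and the presetSize annotation is the assumed mechanism for selecting an ICA compute type:
process {
    // Container URL copied from the ICA Docker repository (placeholder)
    container = '<your rnaseq container URL>'
    // Request the standard-small compute type for every process
    pod = [ annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small' ]
}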
The parameters file defines the pipeline input parameters. Refer to the documentation for detailed information on creating correctly formatted parameters files.
An empty form looks as follows:
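Based on the XML configuration shown earlier in this documentation, an empty form is essentially:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
    </pd:dataInputs>
    <pd:steps>
    </pd:steps>
</pd:pipeline>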
The input files are specified within a single dataInputs node, with each input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the pipeline, including, but not limited to, strings, booleans, and integers.
For this tutorial, we do not have any settings parameters, but the pipeline requires multiple file inputs. The parameters.xml file looks as follows:
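As a sketch, assuming the pipeline takes paired FASTQ reads and a transcriptome FASTA as its file inputs (adjust the codes to match the parameter names used in main.nf):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="reads" format="FASTQ" type="FILE" required="true" multiValue="true">
            <pd:label>reads</pd:label>
            <pd:description>FASTQ read files</pd:description>
        </pd:dataInput>
        <pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>transcriptome</pd:label>
            <pd:description>Transcriptome FASTA file</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>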
Use the following commands to create the pipeline with the above contents in your project.
If not already in the project context, enter it by using the following command:
icav2 enter <PROJECT NAME or ID>
Create the pipeline using icav2 projectpipelines create nextflow. Example:
If you prefer to organize the processes in different folders/files, you can use the --other parameter to upload the different processes as additional files. Example:
You can refer to the documentation to explore options to automate this process.
Refer to the CLI help documentation for details on running the pipeline from the CLI.
Example command to run the pipeline from CLI:
You can get the pipeline id under "ID" column by running the following command:
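This is typically the pipeline list command, for example:
icav2 projectpipelines list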
You can get the file ids under "ID" column by running the following commands:
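Similarly, project data can be listed with, for example:
icav2 projectdata list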
Please refer to the command help (icav2 [command] --help) to determine available flags to filter the output of the above commands if necessary. You can also refer to the CLI reference for the available flags of the icav2 commands.
For more help on uploading data to ICA, please refer to the data upload documentation.
In bioinformatics and computational biology, the vast and growing amount of data necessitates methods and tools that can process and analyze data in parallel. This demand gave birth to the scatter-gather approach, an essential pattern in creating pipelines that offers efficient data handling and parallel processing capabilities. In this tutorial, we will demonstrate how to create a CWL pipeline utilizing the scatter-gather approach. For this purpose, we will use two widely known tools: fastp and multiqc. Given the functionalities of both fastp and multiqc, their combination in a scatter-gather pipeline is incredibly useful. Individual datasets can be scattered across resources for parallel preprocessing with fastp. Subsequently, the outputs from each of these parallel tasks can be gathered and fed into multiqc, generating a consolidated quality report. This method not only accelerates the preprocessing of large datasets but also offers an aggregated perspective on data quality, ensuring that subsequent analyses are built upon a robust foundation.
First, we create the two tools: fastp and multiqc. For this, we need the corresponding Docker images and CWL tool definitions. Please look up the tool import section of our help pages to learn more about how to import a tool into ICA. In a nutshell, once the CWL tool definition is pasted into the editor, the other tabs for editing the tool will be populated. To complete the tool, the user needs to select the corresponding Docker image and to provide a tool version (which can be any string).
For this demo, we will use the publicly available Docker images quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0 for fastp and docker.io/ewels/multiqc:v1.15 for multiqc. The documentation also describes how to import publicly available Docker images into ICA.
Furthermore, we will use the corresponding CWL tool definitions for fastp and multiqc.
Once the tools are created, we will create the pipeline itself using these two tools at Projects > your_project > Flow > Pipelines > CWL > Graphical:
On the Definition tab, go to the tool repository and drag and drop the two tools which you just created on the pipeline editor.
Connect the JSON output of fastp to multiqc input by hovering over the middle of the round, blue connector of the output until the icon changes to a hand and then drag the connection to the first input of multiqc. You can use the magnification symbols to make it easier to connect these tools.
Above the diagram, drag and drop two input FASTQ files and an output HTML file on to the pipeline editor and connect the blue markers to match the diagram below.
Relevant aspects of the pipeline:
Both inputs are multivalue (as can be seen on the screenshot)
Ensure that the step fastp has scattering configured: it scatters on both inputs using the scatter method 'dotproduct'. This means that as many instances of this step will be executed as there are pairs of FASTQ files. To indicate that this step is executed multiple times, the icons of both inputs have doubled borders.
Both input arrays (Read1 and Read2) must be matched. Automatic sorting of input arrays is currently not supported, so you have to take care of matching the input arrays yourself. There are two ways to achieve this (besides manual specification in the GUI):
invoke this pipeline in CLI using Bash functionality to sort the arrays
add a tool to the pipeline which takes an array of all FASTQ files, splits them by R1 and R2 suffixes, and sorts them.
We will describe the second way in more detail. The tool is based on the public Python Docker image docker.io/python:3.10 and has the following definition. In this tool, we provide the Python script spread_script.py via a Dirent entry.
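As an illustration, the core logic of such a spread_script.py could look like the following hypothetical sketch, assuming an _R1/_R2 naming convention:
#!/usr/bin/env python3
# Hypothetical sketch: split a mixed list of FASTQ paths into matched, sorted R1 and R2 arrays
import json
import sys

def spread(fastq_paths):
    r1 = sorted(p for p in fastq_paths if "_R1" in p)
    r2 = sorted(p for p in fastq_paths if "_R2" in p)
    if len(r1) != len(r2):
        raise ValueError("Unpaired FASTQ files: %d R1 vs %d R2" % (len(r1), len(r2)))
    return {"read1": r1, "read2": r2}

if __name__ == "__main__":
    # File paths are passed as command-line arguments; matched arrays are printed as JSON
    print(json.dumps(spread(sys.argv[1:]), indent=2))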
Now this tool can be added to the pipeline before the fastp step.
From the Cohorts menu in the left hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.
The query details can be accessed by clicking the triangle next to Show Query Details. The query details displays the selections used to create a cohort. The selections can be edited by clicking the pencil icon in the top right.
Charts will be open by default. If not, click Show Charts.
Use the gear icon in the top-right to change viewable chart settings.
There are four charts available to view summary counts of attributes within a cohort as histogram plots.
Click Hide Charts to hide the histograms.
Display time-stamped events and observations for a single subject on a timeline. The timeline view is only available for subjects that have time-series data.
The following attributes are displayed in the timeline view:
Diagnosed and Self-Reported Diseases:
Start and end dates
Progression vs. remission
Medication and Other Treatments:
Prescribed and self-medicated
Start date, end date, and dosage at every time point
The timeline utilizes age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, the user can now switch to Date to visualize data.
In the default view, the timeline shows the first five disease data and the first five drug/medication data in the plot. Users can choose different attributes or change the order of existing attributes by clicking on the “select attribute” button.
The x-axis shows the person’s age in years, with data points initially displayed between ages 0 to 100. Users can zoom in by selecting the desired range to visualize data points within the selected age range.
Each event is represented by a dot in the corresponding track. Events in the same track can be connected by lines to indicate the start and end period of an event.
By Default, the Subjects tab is displayed.
The Subjects tab with a list of all subjects matching your criteria is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for a specific subject by typing the Subject ID into the Search Subjects text box.
Get all details available on a subject by clicking the hyperlinked Subject ID in the Subject list.
To Exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, you can uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusion(s).
You can Export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see the bottom of this page.
Specific subjects can be removed from a Cohort.
Select the Subjects tab.
Subjects in the Cohort are checked by default.
To remove a specific subject from a Cohort, uncheck the checkbox next to subjects to remove from a Cohort.
Check box selections are maintained while browsing through the pages of the subject list.
Click Save Cohort to save the subjects you would like to exclude.
The specific subjects will no longer be counted in all analysis visualizations.
The specific excluded subjects will be saved for the Cohort.
To add the subjects back to the Cohort, re-check their checkboxes and click Save Cohort.
For each individual cohort, display a table of all observed SVs that overlap with a given gene.
Click the Marker Frequency tab, then click the Gene Expression tab.
Down-regulated genes are displayed in blue and Up-regulated genes are displayed in red.
The frequency in the cohort is displayed, along with the matching number/total, in the chart.
Genes can be searched by using the Search Genes text box.
You are brought to the Gene tab under the Gene Summary sub-tab.
Select a Gene by typing the gene name into the Search Genes text box.
A Gene Summary will be displayed that lists information and links to public resources about the selected gene.
A cytogenetic map will be displayed based on the selected gene, with a vertical orange bar representing the gene's location on the chromosome.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend, you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot. You can also filter the plot to only show variants above/below a certain cut-off for gnomAD frequency (in percent) or absolute sample count.
The Needle Plot allows filtering by PrimateAI Score.
Set a lower (>=) or upper (<=) threshold for the PrimateAI Score to filter variants.
Enter the threshold value in the text box located below the gnomadFreq/SampleCount input box.
If no threshold value is entered, no filter will be applied.
The filter affects both the plot and the table when the “Display only variants shown in the plot above” toggle is enabled.
Filter preferences persist across gene views for a seamless experience.
The following filters are always shown and can be set independently: %gnomAD Frequency, Sample Count, and PrimateAI Score. Changes made to these filters are immediately reflected in both the needle plot and the variant list below.
Click on a variant's needle pin to view details about the variant from public resources and counts of variants in the selected cohort by disease category. If you want to view all subjects that carry the given variant, click on the sample count link, which will take you to the list of subjects (see above).
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in on the gene domain to better separate observations.
Hovering over the purple triangles in the Pathogenic Variant track shows pop-up details with pathogenicity calls, phenotypes, the submitter, and a link to the ClinVar entry.
Below the needle plot is a full listing of the variants displayed in the needle plot visualization.
The Display only variants shown in the plot above toggle (enabled by default) syncs the table with the Needle Plot. When the toggle is on, the table displays only the variants shown in the Needle Plot, applying all active filters (e.g., variant type, somatic/germline, sample count). When the toggle is off, all reported variants are displayed in the table and table-based filters can be used.
Export to CSV: When the views are synchronized (toggle on), the filtered list of variants can be exported to a CSV file for further analysis.
The Phenotypes tab shows a stacked horizontal bar chart which displays the molecular breakdown (disease type vs. gene) and subject count for the selected gene.
Note on "Stop Lost" Consequence Variants:
The stop_lost consequence is mapped as Frameshift, Stop lost in the tooltip.
The Stop gained|lost value includes both stop gain and stop loss variants.
When the Stop gained filter is applied, Stop lost variants will not appear in the plot or table if the "Display only variants shown in the plot above" toggle is enabled.
The Gene Expression tab shows known gene expression data from tissue types in GTEx.
The Genetic Burden Test is available for de novo variants only.
For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.
Click the Correlation Tab.
In X-axis category, select Clinical.
In X-axis Attribute, select a clinical attribute.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select another clinical attribute.
You will be shown a bubble plot comparing the first clinical attribute on the x-axis to the second clinical attribute on the y-axis.
The size of each bubble corresponds to the number of subjects falling into those categories.
To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, select a gene.
In Y-axis category, select RNA expression.
In Y-Axis Attribute, type a gene and leave Reference Type set to NORMAL.
Click Continuous to see violin plots of compared variables.
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, type a gene name.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select a clinical attribute.
Click the Molecular Breakdown Tab.
In Enter a clinical attribute, select a clinical attribute.
In Enter a gene, select a gene by typing a gene name.
You are shown a stacked bar-chart by the clinical attribute selected values on the Y-axis.
For each attribute value the bar represents the % of Subjects with RNA Expression, Somatic Mutation, and Multiple Alterations.
Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.
If there is Copy Number Variant data in the cohort:
Click the CNV tab.
A graph shows the CNV sample percentage on the Y-axis and chromosomes on the X-axis.
Any value above zero is a copy number gain, and any value below zero is a copy number loss.
Click Chromosome: to select a specific chromosome position.
ICA allows for integrated analysis in a computational workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.
Confirm the VCF data for your analysis is in ICA Project Data.
From within your ICA Project, start a Bench Workspace (see the Bench Workspace documentation for more details).
Navigate back to ICA Cohorts.
Create a Cohort of subjects of interest using Create Cohort.
From the Subjects tab, click Export subjects... at the top-right of the subject list. The file can be downloaded to the browser or to ICA Project Data.
We suggest using export ...to Data Folder for immediate access to this data in Bench or other areas of ICA.
Create another cohort if needed for your research and repeat the last three steps.
Navigate to the Bench workspace created in the second step.
After the workspace has started up, click Access.
Find the /Project/ folder in the Workspace file navigation.
This folder will contain your cohort files created along with any pipeline output data needed for your workspace analysis.
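As a quick sanity check from the workspace terminal, you can list that folder and preview the exported cohort file. This is a minimal sketch that assumes the /Project/ folder mentioned above is reachable from the terminal and that the export was named cohort_export.tsv (a hypothetical name):

```bash
# List project data from inside the Bench workspace and preview the cohort export.
# The export file name is hypothetical; use the name you chose during export.
ls /Project/
head -n 5 /Project/cohort_export.tsv
```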
The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking). The connectors are helpful when you want to sync data between ICA and your local computer or link data between projects in ICA.
The ICA CLI upload/download proves beneficial when handling large files/folders, especially in situations where you're operating on a remote server by connecting from your local computer. You can use icav2 projects enter <project-name/id> to set the project context for the CLI to use for the commands when relevant. If the project context is not set, you can supply the additional parameter --project-id <project-id> to specify the project for the command.
Note: Because of how S3 manages storage, it doesn't have a concept of folders in the traditional sense. So, if you provide the "folder" ID of an empty "folder", you will not see anything downloaded.
Another option to upload data to ICA is via the API. This option is helpful when data needs to be transferred via automated scripts. You can use the following two endpoints to upload a file to ICA.
POST /api/projects/{projectId}/data - with the following body, which will create a partial file at the desired location and return a dataId for the file to be uploaded. {projectId} is the project ID for the destination project. You can find the projectId on your project's Details page (Project > Details > URN > urn:ilmn:ica:project:projectId#MyProject).
POST /api/projects/{projectId}/data/{dataId}:createUploadUrl - where dataId is the dataId from the response of the previous call. This call will generate the URL that you can use to upload the file.
Create data in the project by making the API call below. If you don't already have an API key, refer to the instructions on generating one.
In the example above, we're generating a partial file named 'tempFile.txt' within a project identified by the project ID '41d3643a-5fd2-4ae3-b7cf-b89b892228be', situated inside a folder with the folder ID 'fol.579eda846f1b4f6e2d1e08db91408069'. You can access project, file, or folder IDs either by logging into the ICA web interface or through the use of the ICA CLI.
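For example, a minimal CLI sketch for looking up those IDs (the project name is a placeholder):

```bash
# Set the project context, then list project data; the ID column shows the
# fol.* (folder) and fil.* (file) identifiers used in the API calls above.
icav2 projects enter my-project
icav2 projectdata list
```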
The response will look like this:
Retrieve the data/file ID from the response (for instance: fil.b13c782a67e24d364e0f08db9f537987) and employ the following format for the Post request - /api/projects/{projectId}/data/{dataId}:createUploadUrl:
The response will look like this:
Use the URL from the response to upload a file (tempFile.txt) as follows:
ICA allows you to directly upload/download data from ICA storage using the AWS CLI. It is especially helpful when uploading or downloading a large amount of data over an unstable internet connection. If the transfer gets interrupted midway, you can use the sync command to resume the transfer from the point it was stopped.
To connect to ICA storage, you must first download and install the AWS CLI on your local system. You will need temporary credentials for the AWS CLI to access ICA storage. You can generate temporary credentials through the ICA CLI, which can be used to authenticate the AWS CLI against ICA. The temporary credentials can be obtained using the icav2 projectdata temporarycredentials command.
If you are trying to upload data to the /cli-upload/ folder, you can get the temporary credentials to access the folder using icav2 projectdata temporarycredentials /cli-upload/. It will produce the following output with the accessKey, secretKey, and sessionToken that you will need to configure the AWS CLI to access this folder.
Copy awsTempCredentials.accessKey, awsTempCredentials.secretKey, and awsTempCredentials.sessionToken to build the credentials file ~/.aws/credentials. It should look something like this:
The temporary credentials expire in 36 hours. If the temporary credentials expire before the copy is complete, you can use the AWS sync command to resume from where it left off.
The following AWS commands demonstrate typical use. The remote path in the commands below is constructed from the output of the temporarycredentials command in this format: s3://<awsTempCredentials.bucket>/<awsTempCredentials.objectPrefix>
You can also write scripts to monitor the progress of your copy operation and regenerate and refresh the temporary credentials before they expire.
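A minimal sketch of that idea is shown below. It assumes the JSON output of the temporarycredentials command mirrors the awsTempCredentials.* keys shown above and that jq is installed; treat the field paths as assumptions and adjust them to the actual output on your system:

```bash
#!/usr/bin/env bash
# Sketch: refresh the temporary credentials and resume an interrupted upload.
# Field names below are assumed to match the awsTempCredentials.* keys shown above.
set -euo pipefail

creds=$(icav2 projectdata temporarycredentials /cli-upload/ -o json)
access_key=$(echo "$creds" | jq -r '.awsTempCredentials.accessKey')
secret_key=$(echo "$creds" | jq -r '.awsTempCredentials.secretKey')
session_token=$(echo "$creds" | jq -r '.awsTempCredentials.sessionToken')
bucket=$(echo "$creds" | jq -r '.awsTempCredentials.bucket')
prefix=$(echo "$creds" | jq -r '.awsTempCredentials.objectPrefix')

# Rewrite the [profile] section used in the examples above.
cat > ~/.aws/credentials <<EOF
[profile]
aws_access_key_id=${access_key}
aws_secret_access_key=${secret_key}
aws_session_token=${session_token}
EOF

# Resume the transfer; aws s3 sync only copies objects that are missing or changed.
aws s3 sync cli-upload "s3://${bucket}/${prefix}" --profile profile
```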
You may also use Rclone for data transfer if you prefer. The steps to generate temporary credentials are the same as above. You can run rclone config to set the keys and tokens to configure Rclone with the temporary credentials. You will need to select the advanced edit option when asked to enter the session key. After completing the config, your configuration file (~/.config/rclone/rclone.conf) should look like this:
{
"one": {
"a": "1",
"b": "1"
},
"three": {
"a": "3",
"b": "3",
"c": "3"
}
}

{"A":1,"B":1}
{"A":3,"B":3,"C":3}

1
1
1
2
3
3
3
nextflow.enable.dsl = 2
process INDEX {
input:
path transcriptome_file
output:
path 'salmon_index'
script:
"""
salmon index -t $transcriptome_file -i salmon_index
"""
}
process QUANTIFICATION {
publishDir 'out', mode: 'symlink'
input:
path salmon_index
tuple path(read1), path(read2)
val(quant)
output:
path "$quant"
script:
"""
salmon quant --libType=U -i $salmon_index -1 $read1 -2 $read2 -o $quant
"""
}
process FASTQC {
input:
tuple path(read1), path(read2)
output:
path "fastqc_logs"
script:
"""
mkdir fastqc_logs
fastqc -o fastqc_logs -f fastq -q ${read1} ${read2}
"""
}
process MULTIQC {
publishDir 'out', mode:'symlink'
input:
path '*'
output:
path 'multiqc_report.html'
script:
"""
multiqc .
"""
}
workflow {
index_ch = INDEX(Channel.fromPath(params.transcriptome_file))
quant_ch = QUANTIFICATION(index_ch, Channel.of([file(params.read1), file(params.read2)]),Channel.of("quant"))
fastqc_ch = FASTQC(Channel.of([file(params.read1), file(params.read2)]))
MULTIQC(quant_ch.mix(fastqc_ch).collect())
}

docker save nextflow/rnaseq-nf > cont_rnaseq.tar

# Enter the project context
icav2 projects enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /

process.container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'

process {
container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
pod = [
annotation: 'scheduler.illumina.com/presetSize',
value: 'standard-small'
]
}

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<dataInputs>
</dataInputs>
<steps>
</steps>
</pipeline>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 1</pd:label>
<pd:description>FASTQ Read 1</pd:description>
</pd:dataInput>
<pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 2</pd:label>
<pd:description>FASTQ Read 2</pd:description>
</pd:dataInput>
<pd:dataInput code="transcriptome_file" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>Transcript</pd:label>
<pd:description>Transcript FASTA</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --storage-size small --description 'cli nextflow pipeline'

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf --other fastqc.nf:filename=processes/fastqc.nf --other multiqc.nf:filename=processes/multiqc.nf --storage-size small --description 'cli nextflow pipeline'

icav2 projectpipelines start nextflow <pipeline_id> --input read1:<read1_file_id> --input read2:<read2_file_id> --input transcriptome_file:<transcriptome_file_id> --storage-size small --user-reference demo_run

icav2 projectpipelines list

icav2 projectdata list

> icav2 projects list #note the project-name/id.
> icav2 projects enter <project-name/id> # set the project context
> icav2 projectdata upload <localFileFolder> <remote-path> # upload localFileFolder to remote-path
#Example:
> icav2 projects enter demo
> icav2 projectdata upload localFolder /uploads/

> icav2 projectdata list # note the data-id
> icav2 projectdata download <data-id> # download the data.

{
"name": "string",
"folderId": "string",
"folderPath": "string",
"formatCode": "string",
"dataType": "FILE"
}

{
  "url": "string"
}

curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: XXXXXXXXXXXXXXXX' \
-H 'Content-Type: application/vnd.illumina.v3+json' \
-d '{
"name": "tempFile.txt",
"folderId": "fol.579eda846f1b4f6e2d1e08db91408069",
"dataType": "FILE"
}'

{
"data": {
"id": "fil.b13c782a67e24d364e0f08db9f537987",
"urn": "string",
"details": {
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"creatorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"owningProjectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"owningProjectName": "string",
"name": "string",
"path": "string",
"fileSizeInBytes": 0,
"status": "PARTIAL",
"tags": {
"technicalTags": [
"string"
],
"userTags": [
"string"
],
"connectorTags": [
"string"
],
"runInTags": [
"string"
],
"runOutTags": [
"string"
],
"referenceTags": [
"string"
]
},
"format": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"description": "string",
"mimeType": "string"
},
"dataType": "FILE",
"objectETag": "string",
"storedForTheFirstTimeAt": "2023-08-22T19:27:31.286Z",
"region": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"country": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"name": "string",
"region": "string"
},
"cityName": "string"
},
"willBeArchivedAt": "2023-08-22T19:27:31.286Z",
"willBeDeletedAt": "2023-08-22T19:27:31.286Z",
"sequencingRun": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"instrumentRunId": "string",
"name": "string"
}
}
},
"projectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data/fil.b13c782a67e24d364e0f08db9f537987:createUploadUrl' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: XXXXXXXXXX' \
-d ''

{
  "url": "string"
}

curl --upload-file tempFile.txt "url"

> icav2 projectdata temporarycredentials --help
This command fetches temporal AWS and Rclone credentials for a given project-data. If path is given, project id from the flag --project-id is used. If flag not present project is taken from the context
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string     api key used to call rest service

> icav2 projectdata temporarycredentials /cli-upload/
awsTempCredentials.accessKey XXXXXXXXXX
awsTempCredentials.bucket stratus-gds-use1
awsTempCredentials.objectPrefix XXXXXX/cli-upload/
awsTempCredentials.region us-east-1
awsTempCredentials.secretKey XXXXXXXX
awsTempCredentials.serverSideEncryptionAlgorithm AES256
awsTempCredentials.sessionToken XXXXXXXXXXXXXXXX

[profile]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE

#Copy single file to ICA
> aws s3 cp cp1 s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/
#Sync local folder to ICA
> aws s3 sync cli-upload s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload

[s3-config]
type = s3
provider = AWS
env_auth = false
access_key_id = XXXXXXXXXX
secret_access_key = XXXXXXX
region = us-east-1
acl = private
session_token = XXXXXXXX

#Copy single file to ICA
> rclone copy file.txt s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/
#Sync local folder to ICA
> rclone sync cli-upload s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/

mkdir demo_gzip
cd demo_gzip
echo test > test_input.txt

mkdir nextflow-src
# Create nextflow-src/main.nf using contents below
vi nextflow-src/main.nf

nextflow.enable.dsl=2
process COMPRESS {
input:
path input_file
val compression_level
output:
path "${input_file.simpleName}.gz" // .simpleName keeps just the filename
publishDir 'out', mode: 'symlink'
script:
"""
gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
"""
}
workflow {
input_path = file(params.input_file)
gzip_out = COMPRESS(input_path, params.compression_level)
}

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5

process.container = 'ubuntu:latest'

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker

process.container = 'ubuntu:latest'
profiles {
test {
params {
input_file = 'test_input.txt'
compression_level = 5
}
}
}

nextflow run nextflow-src/ -profile test -with-docker

process.container = 'ubuntu:latest'
profiles {
test {
params {
input_file = 'test_input.txt'
compression_level = 5
}
}
docker {
docker.enabled = true
}
}

nextflow run nextflow-src/ -profile test,docker

pipeline-dev run-in-bench

{
"$defs": {
"input_output_options": {
"title": "Input/output options",
"properties": {
"input_file": {
"description": "Input file to compress",
"help_text": "The file that will get compressed",
"type": "string",
"format": "file-path"
},
"compression_level": {
"type": "integer",
"description": "Compression level to use (1-9)",
"default": 5,
"minimum": 1,
"maximum": 9
}
}
}
}
}

$ pipeline-dev project-info --init
pipeline-dev.project-info not found. Let's create it with 2 questions:
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo

pipeline-dev deploy-as-flow-pipeline

pipeline-dev launch-validation-in-flow

/data/demo $ pipeline-dev launch-validation-in-flow
pipelineld: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782620



$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
$ pipeline-dev import-from-flow [--analysis-id=…]
$ pipeline-dev run-in-bench [--local|--sge]
$ pipeline-dev deploy-as-flow-pipeline [--create|--update]
$ pipeline-dev launch-validation-in-flow



#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
label: fastp
doc: Modified from https://github.com/nigyta/bact_genome/blob/master/cwl/tool/fastp/fastp.cwl
inputs:
fastq1:
type: File
inputBinding:
prefix: -i
fastq2:
type:
- File
- 'null'
inputBinding:
prefix: -I
threads:
type:
- int
- 'null'
default: 1
inputBinding:
prefix: --thread
qualified_phred_quality:
type:
- int
- 'null'
default: 20
inputBinding:
prefix: --qualified_quality_phred
unqualified_phred_quality:
type:
- int
- 'null'
default: 20
inputBinding:
prefix: --unqualified_percent_limit
min_length_required:
type:
- int
- 'null'
default: 50
inputBinding:
prefix: --length_required
force_polyg_tail_trimming:
type:
- boolean
- 'null'
inputBinding:
prefix: --trim_poly_g
disable_trim_poly_g:
type:
- boolean
- 'null'
default: true
inputBinding:
prefix: --disable_trim_poly_g
base_correction:
type:
- boolean
- 'null'
default: true
inputBinding:
prefix: --correction
outputs:
out_fastq1:
type: File
outputBinding:
glob:
- $(inputs.fastq1.nameroot).fastp.fastq
out_fastq2:
type:
- File
- 'null'
outputBinding:
glob:
- $(inputs.fastq2.nameroot).fastp.fastq
html_report:
type: File
outputBinding:
glob:
- fastp.html
json_report:
type: File
outputBinding:
glob:
- fastp.json
arguments:
- prefix: -o
valueFrom: $(inputs.fastq1.nameroot).fastp.fastq
- |
${
if (inputs.fastq2){
return '-O';
} else {
return '';
}
}
- |
${
if (inputs.fastq2){
return inputs.fastq2.nameroot + ".fastp.fastq";
} else {
return '';
}
}
baseCommand:
- fastp

#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
label: MultiQC
doc: MultiQC is a tool to create a single report with interactive plots for multiple
bioinformatics analyses across many samples.
inputs:
files:
type:
- type: array
items: File
- 'null'
doc: Files containing the result of quality analysis.
inputBinding:
position: 2
directories:
type:
- type: array
items: Directory
- 'null'
doc: Directories containing the result of quality analysis.
inputBinding:
position: 3
report_name:
type: string
doc: Name of output report, without path but with full file name (e.g. report.html).
default: multiqc_report.html
inputBinding:
position: 1
prefix: -n
outputs:
report:
type: File
outputBinding:
glob:
- '*.html'
baseCommand:
- multiqc

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
listing:
- entry: "import argparse\nimport os\nimport json\n\n# Create argument parser\n\
parser = argparse.ArgumentParser()\nparser.add_argument(\"-i\", \"--inputFiles\"\
, type=str, required=True, help=\"Input files\")\n\n# Parse the arguments\n\
args = parser.parse_args()\n\n# Split the inputFiles string into a list of file\
\ paths\ninput_files = args.inputFiles.split(',')\n\n# Sort the input files\
\ by the base filename\ninput_files = sorted(input_files, key=lambda x: os.path.basename(x))\n\
\n\n# Separate the files into left and right arrays, preserving the order\n\
left_files = [file for file in input_files if '_R1_' in os.path.basename(file)]\n\
right_files = [file for file in input_files if '_R2_' in os.path.basename(file)]\n\
\n# Print the left files for debugging\nprint(\"Left files:\", left_files)\n\
\n# Print the left files for debugging\nprint(\"Right files:\", right_files)\n\
\n# Ensure left and right files are matched\nassert len(left_files) == len(right_files),\
\ \"Mismatch in number of left and right files\"\n\n \n# Write the left files\
\ to a JSON file\nwith open('left_files.json', 'w') as outfile:\n left_files_objects\
\ = [{\"class\": \"File\", \"path\": file} for file in left_files]\n json.dump(left_files_objects,\
\ outfile)\n\n# Write the right files to a JSON file\nwith open('right_files.json',\
\ 'w') as outfile:\n right_files_objects = [{\"class\": \"File\", \"path\"\
: file} for file in right_files]\n json.dump(right_files_objects, outfile)\n\
\n"
entryname: spread_script.py
writable: false
label: spread_items
inputs:
inputFiles:
type:
type: array
items: File
inputBinding:
separate: false
prefix: -i
itemSeparator: ','
outputs:
leftFiles:
type:
type: array
items: File
outputBinding:
glob:
- left_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
rightFiles:
type:
type: array
items: File
outputBinding:
glob:
- right_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
baseCommand:
- python3
- spread_script.py


























This is an unofficial developer tool to help develop Nextflow pipelines that will run successfully on ICA. There are some syntax bugs that may get introduced in your Nextflow code. One suggestion is to run the steps as described below and then open these files in VisualStudio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch upon first glance.
This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.
Some examples of Nextflow pipelines that have been lifted over with this repo can be found here.
Some additional examples of ICA-ported Nextflow pipelines are here.
Some additional repos that can help with your ICA experience can be found below:
Monitor your analysis run in ICA and troubleshoot here
Wrap a WDL-based process in a CWL wrapper
Wrap a Nextflow-based process in a CWL wrapper
This naive wrapper will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subprocesses and module files), this script may be more appropriate. Any and all comments are welcome.
Parse configuration files and the Nextflow scripts (main.nf, processes, subprocesses, modules) of a pipeline and update the configuration of the pipeline with pod directives to tell ICA what compute instance to run
Strips out parameters that ICA utilizes for pipeline orchestration
Migrates manifest closure to conf/base.ica.config file
Ensures that docker is enabled
Adds workflow.onError to aid troubleshooting
Modifies the processes that reference scripts and tools in the bin/ folder of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Generates parameter XML file based on nextflow_schema.json, nextflow.config, conf/ `- Take a look at this to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability
Additional edits to ensure your pipeline runs more smoothly on ICA
Nextflow pipelines on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow pipelines
Nextflow processes will need to contain a reference to a container --- a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process.
A table of instance types and the associated CPU + Memory specs can be found here under a table named Compute Types
These scripts have been made to be compatible with nf-core pipelines, so you may find the concepts from the documentation here a better starting point.
The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3
This has:
nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA
You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:
{% code overflow="wrap" %}
```bash
docker run -itv `pwd`:`pwd` -e HOME=`pwd` -u $(id -u):$(id -g) keng404/nextflow-to-icav2-config:0.0.3 /bin/bash
```
{% endcode %}

where `pwd` is your `$HOME` folder.
## Prerequisites
### STEP 0 Github credentials
### STEP 1 \[OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest
If you have a specific pipeline from GitHub, you can skip the step below.
You'll first need to download the python module from nf-core via a `pip install nf-core` command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.
You can choose which pipelines to `git clone`, but as a convenience, the wrapper `nf-core.conversion_wrapper.R` will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from the ICA domain of your choosing.
### STEP 2: Obtain API key file
Next, you'll need an API key file for ICA that can be generated using the instructions [here](https://help.ica.illumina.com/account-management/am-iam#api-keys).
### STEP 3: Create a project in ICA
Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these [instructions](https://help.ica.illumina.com/home/h-projects#create-new-project) to create a project via the ICA GUI.
### STEP 4: Download and configure the ICA CLI (see STEP 2):
Install ICA CLI by following these [installation instructions](https://help.ica.illumina.com/command-line-interface/cli-installation).
A table of all CLI releases for mac, linux, and windows can be found [here](https://help.ica.illumina.com/command-line-interface/cli-releasehistory).
The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' (this will redirect you to https://illumina.ica.com/ica).
### Let's do some liftovers
{% code overflow="wrap" %}
```bash
Rscript nf-core.conversion_wrapper.R --input {PIPELINE_JSON_FILE} --staging_directory {DIRECTORY_WHERE_NF_CORE_PIPELINES_ARE_LOCATED} --run-scripts {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED} --intermediate-copy-template {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}/dummy_template.txt --create-pipeline-in-ica --api-key-file {API_KEY_FILE} --ica-project-name {ICA_PROJECT_NAME} --nf-core-mode
[OPTIONAL PARAMETER]
--git-repos {GIT_HUB_URL}
--pipeline-dirs {LOCAL_DIRECTORY_WITH_NEXTFLOW_PIPELINE}
```
{% endcode %}

GIT_HUB_URL can be specified to grab pipeline code from GitHub. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.
Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.
In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.
git clone nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}
The following describes what nf-core.conversion_wrapper.R does for each Nextflow pipeline.

Rscript create_xml/nf-core.json_to_params_xml.R --json {PATH_TO_SCHEMA_JSON}

A Nextflow schema JSON is generated by nf-core's python library nf-core.
nf-core can be installed via a pip install nf-core command.

nf-core schema build -d {PATH_NF-CORE_DIR}

The wrapper then updates nextflow.config and a base config file so that they are compatible with ICA:

Rscript ica_nextflow_config.test.R --config-file {DEFAULT_NF_CONFIG} [OPTIONAL: --base-config-files {BASE_CONFIG}] [--is-simple-config]

This script will update your configuration files so that they integrate better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.
Rscript develop_mode.downstream.R --config-file {DEFAULT_NF_CONFIG} --nf-script {MAIN_NF_SCRIPT} --other-workflow-scripts {OTHER_NF_SCRIPT1} --other-workflow-scripts {OTHER_NF_SCRIPT2} ... --other-workflow-scripts {OTHER_NF_SCRIPT_N}

This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copy the work folder back to ICA if an analysis fails). It also allows ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in the bin/ folder of your pipeline's $projectDir.

Rscript update_xml_based_on_additional_configs.R --config-file {DEFAULT_NF_CONFIG} --parameters-xml {PARAMETERS_XML}

You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.

Rscript testing_pipelines/test_nextflow_script.R --nextflow-script {MAIN_NF_SCRIPT} --docker-image nextflow/nextflow:22.04.3 --nextflow-config {DEFAULT_NF_CONFIG}

Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0 (with 20.10.0 to be deprecated soon).

nf-core.create_ica_pipeline.R

Rscript nf-core.create_ica_pipeline.R --nextflow-script {NF_SCRIPT} --workflow-language nextflow --parameters-xml {PARAMETERS_XML} --nf-core-mode --ica-project-name {NAME} --pipeline-name {NAME} --api-key-file {PATH_TO_API_KEY_FILE}

Add the flag --developer-mode to the command line above if you have custom groovy libraries or module files referenced in your pipeline. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameter project_dir and files under input_files. This will ensure that these files and directories are placed in the $workflow.launchDir when the pipeline is invoked.
As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:
Rscript create_cli_templates_from_xml.R --workflow-language {xml or nextflow} --parameters-xml {PATH_TO_PARAMETERS_XML}

There will be a corresponding JSON file (i.e. a file with a file extension *ICAv2_CLI_template.json) that saves these values, which you can modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.
Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.
If you have a previously successful analysis with your pipeline, you may find this approach more useful.
Where possible, these scripts search for config files that refer to a test (i.e. test.config, test_full.config, test*config) and create a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended (see the sketch below). By default, this parameter is set to false.
When set to true, these test config files are loaded in your main nextflow.config.
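As a sketch of how the toggle might be used (assuming the converted pipeline exposes ica_smoke_test as an ordinary Nextflow parameter, as described above):

```bash
# Run the converted pipeline locally with the smoke-test configs loaded.
nextflow run main.nf --ica_smoke_test true
```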
Prerequisite - Launch a CWL or Nextflow pipeline to completion using the ICA CLI with the intended set of parameters.
Configure and Authenticate ICA command line interface (CLI).
Obtain a list of your projects with their associated IDs:
icav2 projects list

ID NAME OWNER
a5690b16-a739-4bd7-a62a-dc4dc5c5de6c Project1 670fd8ea-2ddb-377d-bd8b-587e7781f2b5
ccb0667b-5949-489a-8902-692ef2f31827 Project2 f1aa8430-7058-4f6c-a726-b75ddf6252eb
No of items : 2

Use the ID of the project from the previous step to enter the project context:

icav2 projects enter a5690b16-a739-4bd7-a62a-dc4dc5c5de6c

Find the pipeline you want to start from the CLI by obtaining a list of pipelines associated with your project:

icav2 projectpipelines list

ID CODE DESCRIPTION
fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 DRAGEN Somatic 3.9.5 The DRAGEN Somatic tool identifies somatic variants
b4dc6b91-5283-41f6-8095-62a5320ed092 DRAGEN Somatic Enrichment 3-10-4 The DRAGEN Somatic Enrichment pipeline identifies somatic variants which can exist at low allele frequencies in the tumor sample.
No of items : 2

Find the ID associated with your pipeline of interest.
To find the input files parameter, you can use the input command on a previously launched project analysis.
Find the previous analyses launched along with their associated IDs:
icav2 projectanalyses list

ID REFERENCE CODE STATUS
3539d676-ae99-4e5f-b7e4-0835f207e425 kyle-test-somatic-2-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 SUCCEEDED
f11e248e-9944-4cde-9061-c41e70172f20 kyle-test-somatic-1-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 FAILED
No of items : 2

List the analyses inputs by using the ID found in the previous step:

icav2 projectanalyses input 3539d676-ae99-4e5f-b7e4-0835f207e425

CODE NAMES DATA ID
BED
CNV_B_Allele_VCF
CNV_Population_B_Allele_VCF
HLA_Allele_Frequency_File
HLA_BED
HLA_reference_file_(protein_FASTA)
Microsatellites_File
Normal_BAM_File
Normal_FASTQ_Files
Panel_of_Normals
Panel_of_Normals_TAR
Reference hg38_altaware_nohla-cnv-anchored.v8.tar fil.35e27101fdec404fb37d08d9adf63307
Systematic_Noise_BED
Tumor_BAM_File
Tumor_FASTQ_Files HCC1187C_S1_L001_R1_001.fastq.gz,HCC1187C_S1_L001_R2_001.fastq.gz fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307
This will return the input file codes, as well as the file names and data IDs of the associated data used to previously launch the pipeline.
You need to use the ICA API to access the configuration settings of a project analysis that ran successfully.
Generate JWT Token from API Key or Basic login credentials
Instructions on how to get an API Key: https://illumina.gitbook.io/ica/account-management/am-iam#api-keys
If your user has access to multiple domains, you will need to add a "?tenant=($domain)" query parameter to the request.
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/tokens' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: <YOUR_APIKEY>' \
-d ''

echo -ne '[email protected]:testpassword' | base64
<BASE64UN+PW>
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/tokens' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'Authorization: Basic <BASE64UN+PW>' \
-d ''

Response to this request will provide a JWT token {"token":($token)}. Use the value of the token in further requests.
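For example, a sketch of the multi-domain form of the token request (the domain name is a placeholder):

```bash
# Token request when your user has access to multiple domains; replace <my_domain>.
curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/tokens?tenant=<my_domain>' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'X-API-Key: <YOUR_APIKEY>' \
  -d ''
```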
Use the API endpoint /api/projects/{projectID}/analyses/{analysisId}/configurations to find the configuration listing all of the required and optional parameters:
curl -X 'GET' \
'https://ica.illumina.com/ica/rest/api/projects/e501a0d5-f5e7-458c-a590-586c79bb87e0/analyses/3539d676-ae99-4e5f-b7e4-0835f207e425/configurations' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'Authorization: Bearer <Token>' \
-H 'X-API-Key: <APIKEY>'

The response JSON to this API will have configuration items listed as:
{
"items": [{
"name": "DRAGEN_Somatic__enable_variant_caller",
"multiValue": false,
"values": [
"true"
]
}]
}

Structure of the final command: icav2 projectpipelines start cwl $(pipelineID) --user-reference, plus input options.
Input Options - For CLI, the entire input can be broken down as individual command line arguments
To launch the same analysis as in the GUI, use the same file IDs and parameters. If using new data, you can use the CLI command icav2 projectdata list to find new file IDs to launch a new instance of the pipeline.
Required information in Input - Input Data and Parameters
This option requires the use of --type-input STRUCTURED along with --input and --parameters.
The input parameter names such as Reference and Tumor_FASTQ_Files in the example below are from the pipeline definition where you can give the parameters a name. You can see which of these were used when the pipeline originally ran, in the Identify Input File Parameters section above. You can also look at the pipeline definitions for the input parameters, for example the code value of these XML inputs.
icav2 projectpipelines start cwl fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 \
--user-reference kyle-test-somatic-9 \
--storage-size small \
--type-input STRUCTURED \
--input Reference:fil.35e27101fdec404fb37d08d9adf63307 \
--input Tumor_FASTQ_Files:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters DRAGEN_Somatic__enable_variant_caller:true \
--parameters DRAGEN_Somatic__enable_hrd:false \
--parameters DRAGEN_Somatic__enable_sv:true \
--parameters DRAGEN_Somatic__output_file_prefix:tumor \
--parameters DRAGEN_Somatic__enable_map_align:true \
--parameters DRAGEN_Somatic__cnv_use_somatic_vc_baf:false \
--parameters DRAGEN_Somatic__enable_cnv:false \
--parameters DRAGEN_Somatic__output_format:BAM \
--parameters DRAGEN_Somatic__vc_emit_ref_confidence:BP_RESOLUTION \
--parameters DRAGEN_Somatic__enable_hla:false \
--parameters DRAGEN_Somatic__enable_map_align_output:true

analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 51abe34a-2506-4ab5-adef-22df621d95d5
ownerId 47793c21-75a6-3aa8-8147-81b354d0af4d
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code DRAGEN Somatic 3.9.5
pipeline.description The DRAGEN Somatic tool identifies somatic variants which can exist at low allele frequencies in the tumor sample. The pipeline can analyze tumor/normal pairs and tumor-only sequencing data. The normal sample, if present, is used to avoid calls at sites with germline variants or systematic sequencing artifacts. Unlike germline analysis, the somatic platform makes no ploidy assumptions about the tumor sample, allowing sensitive detection of low-frequency alleles.
pipeline.id fbd6f3c3-cb70-4b35-8f57-372dce2aaf98
pipeline.language CWL
pipeline.ownerId e9dd2ff5-c9ba-3293-857e-6546c5503d76
pipeline.tenantId 55cb0a54-efab-4584-85da-dc6a0197d4c4
pipeline.tenantName ilmn-dragen
pipeline.timeCreated 2021-11-23T22:55:49Z
pipeline.timeModified 2021-12-09T16:42:14Z
reference kyle-test-somatic-9-DRAGEN Somatic 3_9_5-bc56d4b1-f90e-4039-b3a4-b11d29263e4e
status REQUESTED
summary
tenantId b5b750a6-49d4-49de-9f18-75f4f6a81112
tenantName ilmn-cci
timeCreated 2022-03-16T22:48:31Z
timeModified 2022-03-16T22:48:31Z
userReference kyle-test-somatic-9

400 Bad Request : ICA_API_004 : com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.util.UUID` from String "8f57-372dce2aaf98": UUID has to be represented by standard 36-char representation
at [Source: (io.undertow.servlet.spec.ServletInputStreamImpl); line: 1, column: 983] (through reference chain: com.bluebee.rest.v3.publicapi.dto.analysis.SearchMatchingActivationCodesForCwlAnalysisDto["pipelineId"]) (ref. c9cd9090-4ddb-482a-91b5-8471bff0be58)

Check that the pipeline ID is correct based on icav2 projectpipelines list.

404 Not Found : ICA_GNRC_001 : Could not find data with ID [fil.35dec404fb37d08d9adf63307] (ref. 91b70c3c-378c-4de2-acc9-794bf18258ec)

Check that the file ID is correct based on icav2 projectdata list.

400 Bad Request : ICA_EXEC_007 : The specified variableName [DRAGEN] does not exist. Make sure to use an existing variableName (ref. ab296d4e-9060-412c-a4c9-562c63450022)

When using Nextflow to start runs, the input-type parameter is not used, but --project-id is required.
Structure of the final command: icav2 projectpipelines start nextflow $(pipelineID) --user-reference, plus input options.
icav2 projectpipelines start nextflow b4dc6b91-5283-41f6-8095-62a5320ed092 \
--user-reference "somatic-3-10-test5" \
--project-id e501a0d5-f5e7-458c-a590-586c79bb87e0 \
--storage-size Small \
--input ref_tar:fil.35e27101fdec404fb37d08d9adf63307 \
--input tumor_fastqs:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters enable_map_align:true \
--parameters enable_map_align_output:true \
--parameters output_format:BAM \
--parameters enable_variant_caller:true \
--parameters vc_emit_ref_confidence:BP_RESOLUTION \
--parameters enable_cnv:false \
--parameters enable_sv:true \
--parameters repeat_genotype_enable:true \
--parameters enable_hla:false \
--parameters enable_variant_annotation:false \
--parameters output_file_prefix:Tumor

The response status can be used to determine if the pipeline was submitted successfully.
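To follow the run after submission, a minimal sketch using the analysis ID returned in the response (the grep on the table output is a convenience assumption):

```bash
# Check the analysis status; the ID comes from the start response above.
icav2 projectanalyses get <analysis_id> | grep -w status
```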
In this tutorial, we will demonstrate how to create and launch a pipeline using the CWL language using the ICA command line interface (CLI).
Please refer to these instructions for installing ICA CLI.
In this project, we will create two simple tools and build a pipeline that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) will convert a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) will count the number of lines in an input FASTA file. The workflow.cwl will combine the two tools to convert an input FASTQ file to a FASTA file and count the number of lines in the resulting FASTA file.
The following are the two CWL tools and scripts we will use in the project. If you are new to CWL, please refer to the CWL user guide for a better understanding of the CWL code. You will also need cwltool installed to create these tools and processes. You can find installation instructions on the CWL GitHub page.
#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
inputs:
inputFastq:
type: File
inputBinding:
position: 1
stdout: test.fasta
outputs:
outputFasta:
type: File
streamable: true
outputBinding:
glob: test.fasta
arguments:
- 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}'
baseCommand:
- awk

#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
inputFasta:
type: File
inputBinding:
position: 1
stdout: lineCount.tsv
outputs:
outputCount:
type: File
streamable: true
outputBinding:
glob: lineCount.tsv

cwlVersion: v1.0
class: Workflow
inputs:
ipFQ: File
outputs:
count_out:
type: File
outputSource: count/outputCount
fqTOfaOut:
type: File
outputSource: convert/outputFasta
steps:
convert:
run: tool-fqTOfa.cwl
in:
inputFastq: ipFQ
out: [outputFasta]
count:
run: tool-countLines.cwl
in:
inputFasta: convert/outputFasta
out: [outputCount]

Note that we don't specify the Docker image used in either tool. In such a case, the default behavior is to use the public.ecr.aws/docker/library/bash:5 image. This image contains basic functionality (sufficient to execute the wc and awk commands).
If you want to use a different public image, you can specify it using the requirements tag in the CWL file. Assuming you want to use ubuntu:latest, you need to add:
requirements:
- class: DockerRequirement
dockerPull: ubuntu:latest

If you want to use a Docker image from the ICA Docker repository, you need the link to AWS ECR from the ICA GUI. Double-click the image name in the Docker repository and copy the URL to the clipboard. Add the URL to the dockerPull key.
requirements:
- class: DockerRequirement
dockerPull: 079623148045.dkr.ecr.eu-central-1.amazonaws.com/cp-prod/XXXXXXXXXX:latest

To add a custom or public docker image to the ICA repository, refer to the Docker Repository.
Before you can use ICA CLI, you need to authenticate using the Illumina API key. Follow these instructions to authenticate.
Either create a project or use an existing project to create a new pipeline. You can create a new project using the icav2 projects create command.
% icav2 projects create basic-cli-tutorial --region c39b1feb-3e94-4440-805e-45e0c76462bf

If you do not provide the --region flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the icav2 regions list command first.
You can select the project to work on by entering the project using the icav2 projects enter command. Thus, you won't need to specify the project as an argument.
% icav2 projects enter basic-cli-tutorial

You can also use the icav2 projects list command to determine the names and IDs of the projects you have access to.

% icav2 projects list

projectpipelines is the root command to perform actions on pipelines in a project. The create command creates a pipeline in the current project.
The parameter file specifies the input with additional parameter settings for each step in the pipeline. In this tutorial, input is a FASTQ file shown inside <dataInput> tag in the parameter file. There aren't any specific settings for the pipeline steps resulting in a parameter file below with an empty <steps> tag. Create a parameter file (parameters.xml) with the following content using a text editor.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="ipFQ" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>ipFQ</pd:label>
<pd:description></pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>

The following command creates a pipeline called "cli-tutorial" using the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml", with small storage size.

% icav2 projectpipelines create cwl cli-tutorial --workflow workflow.cwl --tool tool-fqTOfa.cwl --tool tool-countLines.cwl --parameter parameters.xml --storage-size small --description "cli tutorial pipeline"

Once the pipeline is created, you can view it using the list command.
% icav2 projectpipelines list
ID CODE DESCRIPTION
6779fa3b-e2bc-42cb-8396-32acee8b6338 cli-tutorial cli tutorial pipeline

Upload data to the project using the icav2 projectdata upload command. Refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file test.fastq containing the following reads.
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

The icav2 projectdata upload command lets you upload data to ICA.
% icav2 projectdata upload test.fastq /
oldFilename= test.fastq en newFilename= test.fastq
bucket= stratus-gds-use1 prefix= 0a488bb2-578b-404a-e09d-08d9e3343b2b/test.fastq
Using: 1 workers to upload 1 files
15:23:32: [0] Uploading /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq
15:23:33: [0] Uploaded /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq to /test.fastq in 794.511591ms
Finished uploading 1 files in 795.244677ms
The list command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.
% icav2 projectdata list
PATH NAME TYPE STATUS ID OWNER
/test.fastq test.fastq FILE AVAILABLE fil.c23246bd7692499724fe08da020b1014 4b197387-e692-4a78-9304-c7f73ad75e44

The icav2 projectpipelines start command initiates the pipeline run. The following command runs the pipeline. Write down the ID for exploring the analysis later.
If for some reason your create command fails and needs to rerun, you might get an error (ConstraintViolationException). If so, try your command with a different name.
% icav2 projectpipelines start cwl cli-tutorial --type-input STRUCTURED --input ipFQ:fil.c23246bd7692499724fe08da020b1014 --user-reference tut-test
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
status REQUESTED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T20:42:43Z
userReference tut-test
You can check the status of the run using the icav2 projectanalyses get command.
% icav2 projectanalyses get 461d3924-52a8-45ef-ab62-8b2a29621021
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
endDate 2022-03-10T21:00:33Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
startDate 2022-03-10T20:42:42Z
status SUCCEEDED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T21:00:33Z
userReference tut-test
Pipelines can also be run using the JSON input type. The following is an example. Note that JSON input works only with file-based CWL pipelines (built using code, not the graphical editor in ICA).
% icav2 projectpipelines start cwl cli-tutorial --data-id fil.c23246bd7692499724fe08da020b1014 --input-json '{
"ipFQ": {
"class": "File",
"path": "test.fastq"
}
}' --type-input JSON --user-reference tut-test-json
By default, the runtime.ram and runtime.cpu values are evaluated using the compute environment running the host CWL runner. CommandLineTool steps within a CWL pipeline run on different compute environments than the host CWL runner, so the runtime.ram and runtime.cpu values evaluated within a CommandLineTool will not match the runtime environment the tool actually runs in. These values can be overridden by specifying coresMin and ramMin in the ResourceRequirement of the CommandLineTool.
Projects can be shared by updating the project's Team. You can add team members in one of the following ways:
As an existing user within the current tenant
By adding their e-mail address
As an entire workgroup within the current tenant
Select the corresponding option under Projects > your_project > Project Settings > Team > + Add.
Email invites are sent out as soon as you click the save button on the add team member dialog.
Users can accept or reject invites. The status column shows a green checkmark for accept, an orange question mark for users that have not responded and a red x for users that rejected the invite.
The project owner has administrator-level project rights. To change the project owner, select the Edit project owner button at the top right and select the new project owner from the list. This can be done by the current project owner, the tenant administrator or a project administrator of the current project.
Every user added to the project team will need to have a role assigned for specific categories of functionality in ICA. These categories are:
Project (contains data and tools to execute analysis)
Flow (secondary analysis pipelines)
Base (genomics data aggregation and analysis)
Bench (interactive data analysis)
While the categories will determine most of what a user can do or see, explicit upload and download rights need to be granted for users. Select the checkbox next to Download allowed and Upload allowed when adding a team member.
Upload and download rights are independent of the assigned role. A user with only viewer rights will still be able to perform uploads and downloads if their upload and download rights are not disabled. Likewise, an administrator can only perform uploads and downloads if their upload and download rights are enabled.
The sections below describe the roles and their allowed actions.
Create a Connector
x
x
x
x
View project resources
x
x
x
Link/Unlink data to a project
x
x
Subscribe to notifications
x
x
View Activity
x
x
Create samples
x
x
Delete/archive data
x
x
Manage notification channels
x
Manage project team
x
View analyses results
x
x
Create analyses
x
Create pipelines and tools
x
Edit pipelines and tools
x
Add docker image
x
View table records
x
x
Click on links in table
x
x
Create queries
x
x
Run queries
x
x
Export query
x
x
Save query
x
x
Export tables
x
x
Create tables
x
Load files into a table
x
Execute a notebook
x
x
Start/Stop Workspace
x
x
Create/Delete/Modify workspaces
x
Install additional tools, packages, libraries, …
x
Build a new Bench docker image
x
Create a tool for pipeline-execution
x
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<dataInputs>
</dataInputs>
<steps>
</steps>
</pipeline>
The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains the following attributes:
code: a unique id. Required.
format: the format of the input, for example FASTA, TXT, JSON, or UNKNOWN. Multiple entries are possible (see the example below). Required.
type: whether the input is a FILE or a DIRECTORY. Multiple entries are not allowed. Required.
required: whether this input is required for the execution of the pipeline. Required.
multiValue: whether multiple files are allowed as input. Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
An example of a single file input which can be in a TXT, CSV, or FASTA format.
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
<pd:label>Input file</pd:label>
<pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
To use a folder as an input the following form is required:
<pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
<pd:label>fastq folder path</pd:label>
<pd:description>Providing Fastq folder</pd:description>
</pd:dataInput>
For multiple files, set the multiValue attribute to true. The variable is then treated as a list ([]), so adapt your pipeline accordingly when changing from single value to multiValue.
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
<pd:label>Tumor FASTQs</pd:label>
<pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
</pd:description>
</pd:dataInput>
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:
code: a unique id. This is the parameter name that is passed to the workflow.
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting.
classification: indicates whether this setting is specified by the user.
In the code below a string setting with the identifier inp1 is specified.
<pd:steps>
<pd:step execution="MANDATORY" code="General">
<pd:label>General</pd:label>
<pd:description>General parameters</pd:description>
<pd:tool code="generalparameters">
<pd:label>generalparameters</pd:label>
<pd:description></pd:description>
<pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
<pd:label>inp1</pd:label>
<pd:description>first</pd:description>
<pd:stringType/>
<pd:value></pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting, use the following schema with an integerType element. To define an allowed range, use the minimumValue and maximumValue attributes.
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
<pd:label>Seed Length</pd:label>
<pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
</pd:description>
<pd:integerType minimumValue="10" maximumValue="50"/>
<pd:value>21</pd:value>
</pd:parameter>
Option types can be used to present a drop-down list of options in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
<pd:label>Segmentation Algorithm</pd:label>
<pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
</pd:description>
<pd:optionsType>
<pd:option>CBS</pd:option>
<pd:option>SLM</pd:option>
<pd:option>HSLM</pd:option>
<pd:option>ASLM</pd:option>
</pd:optionsType>
<pd:value>false</pd:value>
</pd:parameter>
Option types can also be used to specify a boolean, for example:
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
<pd:label>Map/Align Output</pd:label>
<pd:description></pd:description>
<pd:optionsType>
<pd:option>BAM</pd:option>
<pd:option>CRAM</pd:option>
</pd:optionsType>
<pd:value>BAM</pd:value>
</pd:parameter>
For a string setting, use the following schema with a stringType element.
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
<pd:label>Output File Prefix</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value>tumor</pd:value>
</pd:parameter>
For a boolean setting, booleanType can be used.
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
<pd:label>quick_qc</pd:label>
<pd:description></pd:description>
<pd:booleanType/>
<pd:value></pd:value>
</pd:parameter>
One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for File input and one for String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below you can find both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
nextflow.enable.dsl = 2
// Define parameters with default values
params.file = false
params.str = false
// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
error "You must specify at least one input: --file or --str"
}
process printInputs {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
input:
file(input_file)
script:
"""
echo "File contents:"
cat $input_file
"""
}
process printInputs2 {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
input:
val(input_str)
script:
"""
echo "String input: $input_str"
"""
}
workflow {
if (params.file) {
file_ch = Channel.fromPath(params.file)
file_ch.view()
str_ch = Channel.empty()
printInputs(file_ch)
}
else {
file_ch = Channel.empty()
str_ch = Channel.of(params.str)
str_ch.view()
file_ch.view()
printInputs2(str_ch)
}
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
<pd:label>in</pd:label>
<pd:description>Generic file input</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps>
<pd:step execution="MANDATORY" code="general">
<pd:label>General Options</pd:label>
<pd:description locked="false"></pd:description>
<pd:tool code="general">
<pd:label locked="false"></pd:label>
<pd:description locked="false"></pd:description>
<pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
<pd:label>String</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value>string</pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
</pd:pipeline>
Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to the official Docker documentation.
The following steps are needed to get your bench image running in ICA.
You need to have Docker installed in order to build your images.
For your Docker bench image to work in ICA, it must run on the Linux x86 architecture, have the correct user id, and include the initialization script in the Docker file.
The following scripts must be part of your Docker bench image. Please refer to the provided examples for more details.
This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location from where it can be started by ICA when you request to start your workspace.
The user settings must be set up so that bench runs with UID 1000.
To do a clean shutdown, you can capture the SIGTERM signal, which is sent 30 seconds before the workspace is terminated.
Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.
Open the command prompt on your machine.
Navigate to the root folder of your Docker files.
Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . where mybenchimage is the name you want to give to your image and 0.0.1 is the version number you want your bench image to have. For more information on this command, see the official Docker documentation.
Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2. The resulting tar file will appear next to the root folder of your Docker files.
Open ICA and log in.
Go to Projects > your_project > Data.
For small Docker images, upload the Docker image file which you generated in the previous step. For large Docker images, use an upload method with better performance and reliability, such as the CLI or a connector, to import the Docker image.
Select the uploaded image file and perform Manage > Change Format.
From the format list, select DOCKER and save the change.
Go to System Settings > Docker Repository > Create > Image.
Select the uploaded docker image and fill out the other details.
Name: The name by which your docker image will be seen in the list
Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1
Description: Provide a description explaining what your docker images does or is suited for.
Type: The type of this image is Bench. The Tool type is reserved for tool images.
Cluster compatible: Indicates if this Docker image is suited for use in a Bench cluster.
Access: This setting must match the available access options of your Docker image. You can choose web access (HTTP), console access (SSH) or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support will result in access denied errors when trying to run the workspace.
Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.
Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your Docker image will be Partial during creation and Available once completed.
Navigate to Projects > your_project > Bench > Workspaces.
Create a new workspace with + Create Workspace or edit an existing workspace.
Fill in the bench workspace details.
Save your changes.
Select Start Workspace
Wait for the workspace to start; you can then access it either via the console or the GUI.
Once your bench image has been started, you can access it via console, web or both, depending on your configuration.
Web access (HTTP) is done from either Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
To execute workspace command-line interface commands, your workspace needs a way to run them, such as the inclusion of an SSH daemon, be it integrated into your web access image or into your console access. There is no need to download the workspace command-line interface; you can run it from within the workspace.
The bench image will be instantiated as a container which is forced to start as a user with UID 1000 and GID 100.
You cannot elevate your permissions in a running workspace.
Do not run containers as root as this is bad security practice.
Only the following folders are writeable:
/data
/tmp
All other folders are mounted as read-only.
For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.
Web: TCP/8888
Console: TCP/2222
For outbound access, a workspace can be started in two modes:
Public: Access to public IPs is allowed using the TCP protocol.
Restricted: Access is allowed only to a list of URLs.
At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.
The following files and folders will be provided to the workspace and made accessible for reading at runtime.
At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.
New versions of ICA software will be made available after a restart of your workspace.
When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh
This script is the main process in your running workspace and must not run to completion, as that will stop the workspace and trigger a restart.
This script can be used to invoke other scripts.
When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process has not stopped within that period.
If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, then a possible cause is missing the last . of the command.
When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.
# Init script invoked at start of a bench workspace
COPY --chmod=0755 --chown=root:root ${FILES_BASE}/ica_start.sh /usr/local/bin/ica_start.sh
# Bench workspaces need to run as user with uid 1000 and be part of group with gid 100
RUN adduser -H -D -s /bin/bash -h ${HOME} -u 1000 -G users ica
# Terminate function
function terminate() {
# Send SIGTERM to child processes
kill -SIGTERM $(jobs -p)
# Send SIGTERM to waitpid
echo "Stopping ..."
kill -SIGTERM ${WAITPID}
}
# Catch SIGTERM signal and execute terminate function.
# A workspace will be informed 30s before forcefully being shutdown.
trap terminate SIGTERM
# Hold init process until TERM signal is received
tail -f /dev/null &
WAITPID=$!
wait $WAITPID
ICA_WORKSPACE
The unique identifier related to the started workspace. This value is bound to a workspace and will never change.
32781195
ICA_CONSOLE_ENABLED
Whether Console access is enabled for this running workspace.
true, false
ICA_WEB_ENABLED
Whether Web access is enabled for this running workspace.
true, false
ICA_SERVICE_ACCOUNT_USER_API_KEY
An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace (an example of using it is sketched below).
ICA_BENCH_URL
The host part of the public URL which provides access to the running workspace.
use1-bench.platform.illumina.com
ICA_PROJECT_UUID
The unique identifier related to the ICA project in which the workspace was started.
ICA_URL
The ICA Endpoint URL.
HTTP_PROXY
HTTPS_PROXY
The proxy endpoint in case the workspace was started in restricted mode.
HOME
The home folder.
/data
/etc/workspace-auth
Contains the SSH RSA public/private keypair which is required to run the workspace SSHD.
/data
This folder contains all data specific to your workspace.
Data in this folder is not persisted in your project and will be removed at deletion of the workspace.
/data/project
This folder contains all your project data.
/data/.software
This folder contains ICA-related software.
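As a hedged illustration of how the runtime environment variables listed above can be used, the following Python sketch calls the ICA REST API from inside a running workspace. The composition of the request URL from ICA_URL and the exact endpoint path are assumptions; check the API reference for the definitive paths.
```python
import os
import requests

# Bench-specific environment variables provided at runtime (names taken from the list above)
api_key = os.environ['ICA_SERVICE_ACCOUNT_USER_API_KEY']
ica_url = os.environ['ICA_URL']            # the ICA endpoint URL
project_id = os.environ['ICA_PROJECT_UUID']

# Assumption: the REST API is reachable under <ICA_URL>/ica/rest; adjust if your endpoint differs
response = requests.get(
    f'{ica_url}/ica/rest/api/projects/{project_id}',
    headers={'X-API-Key': api_key},
)
print(response.json())
```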



This tutorial shows you how to
import an existing ICA Flow pipeline with a supporting validation analysis
monitor the execution
Iterative development: modify pipeline code and validate in Bench
Modify nextflow code
Modify Docker image contents (Dockerfile or Interactive method)
Make sure you have access in ICA Flow to:
the pipeline you want to work with
an analysis exercising this pipeline, preferably with a short execution time, to use as validation test
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (aka "Access limited to workspace owner "), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type.
Start the workspace, then (if applicable) also start the cluster
mkdir demo-flow-dev
cd demo-flow-dev
pipeline-dev import-from-flow
or
pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f
The starting point is the analysis id that is used as the pipeline validation test (the pipeline id is obtained from the analysis metadata).
If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.
Pipeline input form and associated javascript are downloaded into the ica-flow-config sub-folder.
Analysis input specs are downloaded to the ica-flow-config/launchPayload_inputFormValues.json file.
The analysis inputs are converted into a "test" profile for Nextflow, stored - among other items - in nextflow_bench.conf
Enter the number of the entry you want to use: 21
Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
Fetching pipeline files... main.nf
Fetching test inputs
New Json inputs detected
Resolving test input ids to /data/mounts/project paths
Fetching input form..
Pipeline "GWAS pipeline_1.
_2_1_20241215_130117" successfully imported.
pipeline name: GWAS pipeline_1_2_1_20241215_130117
analysis name: Test GWAS pipeline_1_2_1_20241215_130117
pipeline id : bb47d612-5906-4d5a-922e-541262c966df
analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
Suggested actions:
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev run-in-flow
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster; otherwise it runs on the main workspace instance:
cd imported-flow-analysis
pipeline-dev run-in-bench
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks that are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
/data/demo $ tail /data/logs/sge-scaler.log.*
2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning,
Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
Nextflow files (located in the nextflow-src folder) are easy to modify.
Depending on your environment (ssh access / docker image with JupyterLab or VNC, with and without Visual Studio code), various source code editors can be used.
code nextflow-src # Open in Visual Studio Code
code . # Open current dir in Visual Studio Code
vi nextflow-src/main.nf
After modifying the source code, you can run a validation iteration with the same command as before:
pipeline-dev run-in-bench
Modifying the Docker image is the next step.
Nextflow (and ICA) allow the Docker images to be specified at different places:
in config files such as nextflow-src/nextflow.config
in nextflow code files:
/data/demo-flow-dev $ head nextflow-src/main.nf
nextflow.enable.dsl = 2
process top_level_process {
container 'docker.io/ljanin/gwas-pipeline:1.2.1'
grep container may help locate the correct files:
Use case: Update some of the software (mimalloc) by compiling a new version
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
# Create directory for Dockerfile
mkdir dirForDockerfile
cd dirForDockerfile
# Create Dockerfile
cat <<EOF > Dockerfile
FROM ${IMAGE_BEFORE}
RUN mkdir /mimalloc-compile \
&& cd /mimalloc-compile \
&& git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
&& mkdir -p mimalloc/out/release \
&& cd mimalloc/out/release \
&& cmake ../.. \
&& make \
&& make install \
&& cd / \
&& rm -rf mimalloc-compile
EOF
# Build image
docker build -t ${IMAGE_AFTER} .
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
docker run -it --rm ${IMAGE_BEFORE} bash
# Make some modifications
vi /scripts/plot_manhattan.py
<Fix "manhatten.png" into "manhattAn.png">
<Enter :wq to save and quit vi>
<Start another terminal (try Ctrl+Shift+T if using wezterm)>
# Identify container id
CONTAINER_ID=c18670335247
# Save container changes into new image layer
docker commit ${CONTAINER_ID} ${IMAGE_AFTER}
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...
Update the nextflow code and/or configs to use the new image
sed --in-place "s|${IMAGE_BEFORE}|${IMAGE_AFTER}|" nextflow-src/main.nf
Validate your changes in Bench:
pipeline-dev run-in-bench
pipeline-dev deploy-as-flow-pipeline
After generating a few ICA-specific files (JSON input specs for the Flow launch UI + the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here).
It then asks if we want to update the latest version or create a new one.
Choice: 2
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
/data/demo $ pipeline-dev deploy-as-flow-pipeline
Generating ICA input specs...
Extracting nf-core test inputs...
Deploying project nf-core/demo
- Currently being developed as: dev-nf-core-demo
- Last version updated in ICA: dev-nf-core-demo_v3
- Next suggested version: dev-nf-core-demo_v4
How would you like to deploy?
1. Update dev-nf-core-demo (current version)
2. Create dev-nf-core-demo_v4
3. Enter new name
4. Update dev-nf-core-demo_v3 (latest version updated in ICA)
Sending docs/images/nf-core-demo-subway.svg
Sending docs/images/nf-core-demo_logo_dark.png
Sending docs/images/nf-core-demo_logo_light.png
Sending docs/images/nf-core-demo-subway.png
Sending docs/README.md
Sending docs/output.md
Pipeline successfully deployed
- Id : 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
- URL: https://stage.v2.stratus.illumina.com/ica/projects/1873043/pipelines/26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Suggested actions:
pipeline-dev run-in-flow
pipeline-dev launch-validation-in-flow
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
/data/demo $ pipeline-dev launch-validation-in-flow
pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test dev-nf-core-demo_v4
- Id: cadcee73-d975-435d-b321-5d60e9aec1ec
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec
The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, you can see the status by the color.
Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to be more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.
You can see the remaining time until the next event (Shutdown or restart) in the workspaces overview and on the workspace details.
Click Projects > Your_Project > Bench > Workspaces > + Create Workspace
Complete the following fields and save the changes.
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
The project role determines if someone is an administrator or contributor, while the dedicated workspace permissions indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Dataprovider - Viewer - Contributor)
Flow (No Access - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
Once a workspace is running, it can be manually stopped, or it will be automatically shut down after the amount of time configured in the Automatic Shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace when you no longer need it, to save costs.
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The Docker images provided by Illumina load JupyterLab by default. They also contain tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher (+ button above the folder structure).
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
Code: The Docker file commands must be provided in this section.
The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a tool from the Docker image.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Add a version number for the tool.
Click the Docker Build tab.
Here the image that accompanies the tool will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says “#Add your commands below.” write the code necessary for running this docker image.
Click the General tab. This tab and all following tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the Flow documentation.
Click the Save button in the upper, right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only command.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use the workspace-ctl data create-mount --mode read-write command to do so. You cannot have fast read-write access to indexed folders, as the indexing mechanism on those would deteriorate the performance.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab all actions performed in the workspace are shown. For example, the creation of the workspace, starting or stopping of the workspace, etc. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
An Analysis is the execution of a pipeline.
You can start an analysis from both the dedicated analysis screen or from the actual pipeline.
Navigate to Projects > Your_Project > Flow > Analyses.
Select Start.
Select a single Pipeline.
Configure the analysis settings.
Select Start Analysis.
Refresh to see the analysis status. See the analysis status overview below for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.
Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.
Select Start Analysis.
Configure the analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page. See the analysis status overview below for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.
When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and not fill out the parameters, as it cannot guarantee their validity for the new XML.
Some restrictions apply when trying to rerun an analysis.
To rerun one or more analyses with the same settings:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.
To rerun a single analysis with modified parameters:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun (at the top right).
Update the parameters you want to change.
Select Start Analysis. The analysis will now be executed with the updated parameters.
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.
During analysis start, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.
During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Steps tab is used to view the steps in near real time as they are produced by the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or less, though you can also choose the grid layout for those by means of the tile/grid button at the top right of the analysis log tab. The Steps tab also shows which resources were used as compute type in the different main analysis steps. (For child steps, these are displayed on the parent step.)
There are system processes involved in the lifecycle of all analyses (for example downloading inputs and uploading outputs) and there are processes which are pipeline-specific, such as the processes which execute the pipeline steps. The table below describes the system processes. You can choose to display or hide these system processes with the Show technical steps option.
Additional log entries will show for the processes which execute the steps defined in the pipeline.
Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.
The time between the Start Date and the End Date is used to calculate the duration. The time of the duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.
To see the price of an analysis in iCredits, look at Projects > your_project > Flow > Analyses > your_analysis > Details tab. The pricing section will show you the entitlement bundle, storage detail and price in iCredits once the analysis has succeeded, failed or been aborted.
By default, the stdout and stderr files are located in the ica_logs subfolder within the analysis output. This location can be changed by selecting a different folder in the current project at the start of the analysis. Do not use a folder which already contains log files, as these will be overwritten. To set the log file location, you can also use the CreateAnalysisLogs section of the Create Analysis API endpoint.
If you delete these files, no log information will be available on the analysis details > Steps tab.
You can access the log files from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab).
Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
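A minimal sketch of streaming these logs with Python, using the requests and websocket-client packages. The steps endpoint path and the response field names used below are assumptions; consult the API reference for the exact schema.
```python
import requests
import websocket  # pip install websocket-client

headers = {'X-API-Key': '<your_generated_API_key>'}
base = 'https://ica.illumina.com/ica/rest'
project_id = '<project_id>'
analysis_id = '<analysis_id>'

# Assumed endpoint returning the analysis step details, including websocket URLs per step
steps = requests.get(
    f'{base}/api/projects/{project_id}/analyses/{analysis_id}/steps',
    headers=headers,
).json()

for step in steps.get('items', []):
    # Assumed field holding the stdout websocket URL for a running step
    stdout_url = step.get('logs', {}).get('stdOutStream')
    if not stdout_url:
        continue
    ws = websocket.WebSocket()
    ws.connect(stdout_url)
    try:
        while True:
            print(ws.recv())
    except websocket.WebSocketConnectionClosedException:
        # The websocket URL is no longer available once the step completes
        pass
    finally:
        ws.close()
```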
Currently, only FOLDER type output mappings are supported.
By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:
the source path on the local disk of the analysis execution environment, relative to the working folder.
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project.
the target path relative to the root of the project data to write the outputs.
If the output folder already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis.
You can jump from the Analysis Details to the individual files and folders by opening the output files tab on the detail view (Projects > your_project > Flow > Analyses > your_analysis > Output files tab > your_output_file) and selecting Open in data.
The Output files section of the analyses will always show the generated outputs, even when they have since been deleted from storage. This is done so you can always see which files were generated during the analysis. In this case it will no longer be possible to navigate to the actual output files.
You can add and remove tags from your analyses.
Navigate to Projects > Your_Project > Flow > Analyses.
Select the analyses whose tags you want to change.
Select Manage > Manage tags.
Edit the user tags, reference data tags (if applicable) and technical tags.
Select Save to confirm the changes.
Both system tags and custom tags exist. User tags are custom tags which you set to help identify and process information, while technical tags are set by the system for processing. Both run-in and run-out tags are set on data to identify which analyses use the data. Connector tags determine data entry methods and reference data tags identify where data is used as reference data.
If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link is /ica/link/project/<project_id>/analysis/<analysis_id>. Likewise, workflow sessions use the syntax /ica/link/project/<project_id>/workflowSession/<workflow_session_id>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.
Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). Concurrency limits on analyses prevent resource hogging which could result in resource starvation for other tenants. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions. The theoretical limit is 20, but this can be less in practice, depending on a number of external factors.
When your analysis fails, open the analysis details view (Projects > your_project > Flow > Analyses > your_analysis) and select Display failed steps. This gives you the Steps view filtered on the steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step will be displayed.
Exit code 55 indicates analysis failure on economy instances due to an external event such as spot termination. You can retry the analysis.
Exit code 56 indicates analysis failure due to pod disruption and deletion by Kubernetes' Pod Garbage Collector (PodGC) because the node it was running on no longer exists. You can retry the analysis.
green: running
orange: restarting
grey: stopped
red: error
Name
must be a unique name
Automatic Restart Reminder
The time (in days/hours) prior to an automatic restart (every 30 days) at which an email reminder for this event is sent out to the workspace owner. For example: 1d 2h
Automatic Shutdown
The time (in days/hours) between start of the workspace and automatic shutdown. For example: 5d 12h. When this value is more than 30 days, which is the restart period, the workspace will be restarted after 30 days and the remaining time will be counted in the next cycle. So for 50 days, you will have a restart after 30 days and then 20 days remaining before the workspace shuts down.
Automatic Shutdown Reminder
The time (in days/hours) prior to the automatic shutdown at which an email reminder for this event is sent out to the workspace owner. For example: 1d 2h
Docker image
The list of docker images includes base images from ICA and images uploaded to the docker repository for that domain.
Storage size
Represents the size of the storage available on the workspace. A storage from 1GB to 16TB can be provided.
Resource model
Size of the machine on which the workspace will run and whether or not the machine should contain a Graphics Processing Unit (GPU). See Bench pricing for available sizes.
Description
A place to provide additional information about the workspace.
Access (available after selecting Docker image)
The options here are determined by the Docker image settings. The options you select will become available on the details tab of the Workspace when it is running. Web allows to interact with the workspace via a browser. Console provides a terminal to interact with the workspace.
Cluster
When your selected Docker image is cluster compatible, the cluster settings become available. If you do not enable the cluster settings, the cluster-compatible image will be run on a single node. When enabled, you can choose if you want to use a dedicated cluster manager and if it requires web access.
For the cluster members, you can choose to use a static amount of nodes or dynamic scaling, which resources (cpu, memory and storage) are required and if you want to run in economy mode (AWS spot instances). AWS can interrupt Spot Instances with a two-minute notification which is passed on to the workspace which in turn grants a 30 second graceful shutdown period, so it is best not to use spot Instances for workloads that cannot handle individual instance interruption. For more information on clusters, see Sun Grid Engine and Spark.
Internet Access
Type of access to the internet which should be provided for this workspace. Open: Internet access is allowed. Restricted: Creates a workspace with no internet access. Access to the ICA Project Data is still available in this mode. Whitelisted URLs: Specify URLs* and paths that are allowed in a restricted workspace. Separate URLS with a new line. Only domains and subdomains in the specified URL will be allowed.
Permissions
Your workspace will operate with these permissions. For security reasons, users will need to have permissions matching what you set here to run the workspace, regardless of their role.
Access limited to workspace owner. When this field is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Download/Upload allowed
Project/Flow/Base access
Contributor
-
-
X
when permissions match those of the workspace
Administrator
X
X
X
when permissions match those of the workspace

Analyses using external data
Allowed
-
Analyses using mount paths on input data
Allowed
-
Analyses using user-provided input json
Allowed
-
Analyses using advanced output mappings
-
-
Analyses with draft pipeline
Warn
Warn
Analyses with XML configuration change
Warn
Warn
Requested
The request to start the Analysis is being processed
No
Queued
Analysis has been queued
No
Initializing
Initializing environment and performing validations for Analysis
No
Preparing Inputs
Downloading inputs for Analysis
No
In Progress
Analysis execution is in progress
No
Generating outputs
Transferring the Analysis results
No
Aborting
Analysis has been requested to be aborted
No
Aborted
Analysis has been aborted
Yes
Failed
Analysis has finished with error
Yes
Succeeded
Analysis has finished with success
Yes
| Step | Description |
| --- | --- |
| Setup Environment | Validate analysis execution environment is prepared |
| Run Monitor | Monitor resource usage for billing and reporting |
| Prepare Input Data | Download and mount input data to the shared file system |
| Pipeline Runner | Parent process to execute the pipeline definition |
| Finalize Output Data | Upload output data |

| Timestamp | Description |
| --- | --- |
| Queue Date | The time when the process is submitted to the process scheduler for execution |
| Start Date | The time when the process has started execution |
| End Date | The time when the process has stopped execution |

```json
{
...
"analysisOutput":
[
{
"sourcePath": "out/test1",
"type": "FOLDER",
"targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
"targetPath": "/output-testing-01/"
},
{
"sourcePath": "out/test2",
"type": "FOLDER",
"targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
"targetPath": "/output-testing-02/"
}
]
}
```

| Outputs | Logs | Result |
| --- | --- | --- |
| Default | Default | Logs are a subfolder of the analysis output. |
| Mapped | Default | Logs are a subfolder of the analysis output. |
| Default | Mapped | Outputs and logs may be separated. |
| Mapped | Mapped | Outputs and logs may be separated. |

Any operation from the ICA graphical user interface can also be performed with the API.
The following are some basic examples of how to use the API. These examples use Python as the programming language. For other languages, please see their native documentation on API usage.
An installed copy of Python (https://www.python.org/)
The package installer for Python, pip (https://pip.pypa.io/)
The Python requests library, installed with pip install requests
One of the easiest authentication methods is by means of API keys. To generate an API key, refer to the API Keys section of the documentation. This key is then used in your Python code to authenticate the API calls. It is best practice to regularly update your API keys.
API keys are valid for a single user, so any information you request is for the user to which the key belongs. For this reason, it is best practice to create a dedicated API user so you can manage the access rights for the API by managing that user's rights.
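One minimal sketch of this practice is to keep the key out of your source code and read it from an environment variable (the variable name ICA_API_KEY used below is only an example):

```python
import os
import requests

# Read the key from an environment variable (ICA_API_KEY is an example name);
# set it in your shell first, for example: export ICA_API_KEY=<your_generated_API_key>
headers = {'X-API-Key': os.environ['ICA_API_KEY']}

# Simple call to verify that the key works.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
print("Response status code: ", response.status_code)
```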
There is a dedicated API reference page where you can enter your API key, try out the different API commands, and get an overview of the available parameters.
The examples on the API reference page use curl (Client URL), while Python uses the requests library. There are a number of online tools to automatically convert from curl to Python.
To get the curl command,
Look up the endpoint you want to use on the API reference page.
Select Try it out.
Enter the necessary parameters.
Select Execute.
Copy the resulting curl command.
Never paste your API authentication key into online tools when performing curl conversion as this poses a significant security risk.
In the most basic form, the curl command
curl my.curlcommand.com
becomes
You will see the following options in the curl commands on the API reference page.
-H means header.
-X specifies the request method (such as GET or POST); the method string is passed as is, without interpretation.
becomes
This is a straightforward request without parameters which can be used to verify your connection.
The API call is
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers={'X-API-Key': '<your_generated_API_key>'})
In this example, the API key is written directly into the API call, which means you must update every API call when the key changes. A better practice is to define the headers containing your API key once and reuse them, so the key is easier to maintain. The full code then becomes
The list of event codes was returned as a single line, which makes it difficult to read, so let's pretty-print the result.
Now that we are able to retrieve information with the API, we can use it for a more practical request like retrieving a list of projects. This API request can also take parameters.
First, we pass the request without parameters to retrieve all projects.
The easiest way to pass a parameter is by appending it to the API request. The following API request will list the projects with a filter on CAT as user tag.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT', headers=headers)
If you only want entries that have both the tags CAT and WOLF, you would append them like this:
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT&userTags=WOLF', headers=headers)
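As an alternative, the requests library can build the query string for you through its params argument; passing a list results in the same repeated userTags parameters as in the request above:

```python
# Equivalent to appending ?userTags=CAT&userTags=WOLF to the URL.
params = {'userTags': ['CAT', 'WOLF']}
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers, params=params)
print("Response status code: ", response.status_code)
```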
To copy data, you need to know:
Your generated API key.
The dataId of the files and folders which you want to copy (their syntax is fil.hexadecimal_identifier and fol.hexadecimal_identifier). You can select a file or folder in the GUI to see its Id (Projects > your_project > Data > your_file > Data details > Id), or you can use the /api/projects/{projectId}/data endpoint.
The destination project to which you want to copy the data.
The destination folder within the destination project to which you want to copy the data (fol.hexadecimal_identifier).
What to do when the destination files or folders already exist (OVERWRITE, SKIP or RENAME).
The full code will then be as follows:
Now that we have done individual API requests, we can combine them and use the output of one request as input for the next request. When you want to run a pipeline, you need a number of input parameters. In order to obtain these parameters, you need to make a number of API calls first and use the returned results as part of your request to run the pipeline. In the examples below, we will build up the requests one by one so you can run them individually first to see how they work. These examples only follow the happy path to keep them as simple as possible. If you program them for a full project, remember to add error handling. You can also use the GUI to get all the parameters or write them down after performing the individual API calls in this section. Then, you can build your final API call with those values fixed.
This block must be added at the beginning of your code
Previously, we requested a list of all projects; now we add a search parameter to look for a project called MyProject. (Replace MyProject with the name of the project you want to look for.)
Now that we have found our project by name, we need to get the unique project id, which we will use in the combined requests. To get the id, we add the following line to the end of the code above.
The syntax ['items'][0]['id'] means we look in the items list, take the first entry (index 0; we presume our filter was accurate enough to return only the correct result and that there are no duplicate project names), and read the id field. Similarly, you can build other expressions to get the data you want, such as ['items'][0]['urn'] to get the urn or ['items'][0]['tags']['userTags'] to get the list of user tags.
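Because these examples follow the happy path, a small guard before indexing avoids an IndexError when the search returns no matches; a minimal sketch:

```python
# The examples assume the search returned at least one project; check before indexing.
items = My_API_Data.get('items', [])
if not items:
    raise SystemExit("No project matched the search term; check the project name.")
print(items[0]['id'])
```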
Once we have the identifier we need, we add it to a variable which we will call Project_Identifier in our examples.
Once we have the identifier of our project, we can fill it out in the request to list the pipelines which are part of our project.
This will give us all the available pipelines for that project. As we will only want to run a single pipeline, we can search for our pipeline, which in this example will be the basic_pipeline. Unfortunately, this API call has no direct search parameter, so when we get the list of pipelines, we will look for the id and store that in a variable which we will call Pipeline_Identifier in our examples as follows:
Once we know the project identifier and the pipeline identifier, we can create an API request to retrieve the list of input parameters which are needed for the pipeline. We will consider a simple pipeline which only needs a file as input. If your pipeline has more input parameters, you will need to set those as well.
Here we will look for the id of the extra small storage size. This is done with the index 0 in My_API_Data['items'][0]['id'].
Now we will look for a file "testExample" which we want to use as input and store the file id.
Finally, we can run the analysis with parameters filled out.
{
"$id": "#ica-pipeline-input-form",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ICA Pipeline Input Forms",
"description": "Describes the syntax for defining input setting forms for ICA pipelines",
"type": "object",
"additionalProperties": false,
"properties": {
"fields": {
"description": "The list of setting fields",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
}
},
"required": [
"fields"
],
"definitions": {
"ica_pipeline_input_form_field": {
"$id": "#ica_pipeline_input_form_field",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
"type": "string",
"pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
},
"type": {
"type": "string",
"enum": [
"textbox",
"checkbox",
"radio",
"select",
"number",
"integer",
"data",
"section",
"text",
"fieldgroup"
]
},
"label": {
"type": "string"
},
"minValues": {
"description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
"type": "integer",
"minimum": 0
},
"maxValues": {
"description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
"type": "integer",
"exclusiveMinimum": 0
},
"minMaxValuesMessage": {
"description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
"type": "string"
},
"helpText": {
"type": "string"
},
"placeHolderText": {
"description": "An optional short hint (a word or short phrase) to aid the user when the field has no value.",
"type": "string"
},
"value": {
"description": "The value for the field. Can be an array for multi-value fields. For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. For 'integer' type values the value needs to be between -100000000000000000 and 100000000000000000"
},
"minLength": {
"type": "integer",
"minimum": 0
},
"maxLength": {
"type": "integer",
"exclusiveMinimum": 0
},
"min": {
"description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"max": {
"description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"choices": {
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field_choice"
}
},
"fields": {
"description": "The list of setting sub fields for type fieldgroup",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
},
"dataFilter": {
"description": "For defining the filtering when type is 'data'.",
"type": "object",
"additionalProperties": false,
"properties": {
"nameFilter": {
"description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
"type": "string"
},
"dataFormat": {
"description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
"type": "array",
"contains": {
"type": "string"
}
},
"dataType": {
"description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
"type": "string",
"enum": [
"file",
"directory"
]
}
}
},
"regex": {
"type": "string"
},
"regexErrorMessage": {
"type": "string"
},
"hidden": {
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"emptyValuesAllowed": {
"type": "boolean",
"description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
},
"updateRenderOnChange": {
"type": "boolean",
"description": "When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false."
},
"streamable": {
"type": "boolean",
"description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
}
},
"required": [
"id",
"type"
],
"allOf": [
{
"if": {
"description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"textbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"checkbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"radio"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"select"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
"properties": {
"type": {
"enum": [
"number",
"integer"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"regex",
"regexErrorMessage",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"data"
]
}
},
"required": [
"type"
]
},
"then": {
"required": [
"dataFilter"
],
"propertyNames": {
"not": {
"enum": [
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"max",
"min",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"section",
"text"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"disabled",
"fields",
"updateRenderOnChange",
"classification",
"value",
"minValues",
"maxValues",
"minMaxValuesMessage",
"dataFilter",
"choices",
"regex",
"placeHolderText",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' and 'emptyValuesAllowed' are not allowed",
"properties": {
"type": {
"enum": [
"fieldgroup"
]
}
},
"required": [
"type",
"fields"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min",
"emptyValuesAllowed"
]
}
}
}
}
]
},
"ica_pipeline_input_form_field_choice": {
"$id": "#ica_pipeline_input_form_field_choice",
"type": "object",
"additionalProperties": false,
"properties": {
"value": {
"description": "The value which will be set when selecting this choice. Must be unique over the choices within a field"
},
"text": {
"description": "The display text for this choice, similar to the label of a field.",
"type": "string"
},
"selected": {
"description": "Optional. When true, this choice value is picked as the default selected value; selected=true has precedence over an eventual set field 'value'. For clarity it is better not to use 'selected' but to use the field 'value', as is used to set default values for the other field types. Only a maximum of 1 choice may have selected set to true.",
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"parent": {
"description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
}
},
"required": [
"value",
"text"
]
}
}
}

import requests
response = requests.get('http://my.curlcommand.com')

curl -X 'GET' 'https://my.curlcommand.com' -H 'HeaderName: HeaderValue'

import requests
headers = {
'HeaderName': 'HeaderValue',
}
response = requests.get('https://my.curlcommand.com', headers=headers)

# The requests library will allow you to make HTTP requests.
import requests
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Display the data from the request.
print(response.json())

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

# The requests library will allow you to make HTTP requests.
import requests
# Fill out your generated API key.
headers = {
'accept': 'application/vnd.illumina.v3+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v3+json',
}
# Enter the files and folders, the destination folder, and the action to perform when the destination data already exists.
data = '{"items": [{"dataId": "fil.0123456789abcdef"}, {"dataId": "fil.735040537abcdef"}], "destinationFolderId": "fol.1234567890abcdef", "copyUserTags": true,"copyTechnicalTags": true,"copyInstrumentInfo": true,"actionOnExist": "SKIP"}'
# Replace <Project_Identifier> with the actual identifier of the destination project.
response = requests.post(
'https://ica.illumina.com/ica/rest/api/projects/<Project_Identifier>/dataCopyBatch',
headers=headers,
data=data,
)
# Display the response status code.
print("Response status code: ", response.status_code)

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}

# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

print(My_API_Data['items'][0]['id'])

# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']

response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)

# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)

# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)

# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)

# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)

Postheaders = {
'accept': 'application/vnd.illumina.v4+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v4+json',
}
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
response = requests.post(
'https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,
)

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Replace <your_generated_API_key> with your actual generated API key here.
Postheaders = {
'accept': 'application/vnd.illumina.v4+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v4+json',
}
# Find project
# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Project response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']
print("Project_Identifier: ",Project_Identifier)
# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)
# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline.
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)
# Get Storage Size
# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)
# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)
# Finally, we can run the analysis with parameters filled out.
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","tags":{"technicalTags":[],"userTags":[],"referenceTags":[]},"analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
print (data)
response = requests.post('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,)
print("Post Response status code: ", response.status_code)

Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.
Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.
Use + Create to add additional files and Simulate to test your inputForms.
Script execution supports cross-field validation of the values, hiding fields, making them required, and so on, based on value changes.
The JSON schema allows you to define the input parameters. See the inputForm.json page for syntax details.
textbox
Corresponds to stringType in xml.
checkbox
A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).
radio
A radio button group to select one from a list of choices. The values to choose from must be unique.
select
A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.
number
The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).
integer
Corresponds to java Integer.
data
Data such as files.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
text
To display informational messages. No values are to be assigned to these fields.
fieldgroup
Can contain parameters or other groups. Allows you to have repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. So if you want to have the same elements multiple times in your form, combine them into a fieldgroup. Does not support the emptyValuesAllowed attribute.
These attributes can be used to configure all parameter types.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimum number of values that must be present. Default when not set is 0. Set to >=1 to make the field required.
maxValues
The maximum number of values that may be present. Default when not set is 1.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
value
The value of the parameter. Can be considered default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Minimal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
max
Maximal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
choices
A list of choices, each with a "value", "text" (the label), "selected" (only one true is supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on the values of other fields. Parent and value must be unique; you cannot use the same value for both.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. Use nameFilter for matching the name of the file, dataFormat for file format and dataType for selecting between files and directories. Tip: To see the data formats, open the file details in ICA and look at the Format on the data details. You can expand the dropdown list to see the syntax.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
dropValueWhenDisabled
When this is present and true and the field has disabled being true, then the value will be omitted during the submit handling (on the onSubmit result).
Streamable inputs
Adding "streamable":true to an input field of type "data" makes it a streamable input.
The onSubmit.js javascript function receives an input object which holds information about the chosen values of the input form and the pipeline and pipeline execution request parameters. This javascript function is not only triggered when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
settings
The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json as saved in the pipeline, i.e. the original json, without any changes.
currentAnalysisSettings
The current input form JSON as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is 'Initial' or when the analysis is created through the CLI/API.
settings
The value of the setting fields. This allows modifying the values, applying defaults, or using information from the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.
validationErrors
A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI when viewing the analysis.
This is the object used for representing validation errors.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The zero-based index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
Receives an input object which contains information about the current state of the input form, the chosen values and the field value change that triggered the onrender call. It also contains pipeline information. Changed objects are present in the onRender return value object. Any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as changed field.
context
"Initial"/"FieldChanged"/"Edited".
Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or for editing the form during reruns.
changedFieldId
The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
validation errors and validation warnings can use 'storageSize' as fieldId to let an error appear on the storage size field. 'storageSize' is the value of the changedFieldId when the user alters the chosen storage size.
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The zero-based index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.
Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical CWL pipelines by any project in the account.
Select System Settings > Tool Repository > + Create.
Configure tool settings in the tool properties tabs. See Tool Properties.
Select Save.
The following sections describe the tool properties that can be configured in each tab.
Refer to the CWL CommandLineTool Specification for further explanation about many of the properties described below. Not all features described in the specification are supported.
Name
The name of the tool.
Description
Free text description for information purposes.
Icon
The icon for the tool.
Status
The release of the tool.
Docker image
The registered Docker image for the tool.
Categories
One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.
Tool version
The version of the tool specified by the end user. Could be any string.
Release version
The version number of the tool.
Version comment
A description of changes in the updated version.
Links
External reference links.
Documentation
The Documentation field provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.
The release status of the tool can be one of Draft, Release Candidate, Released, or Deprecated. The Building and Build Failed options are set by the application and not during configuration.
Draft
Fully editable draft.
Release Candidate
The tool is ready for release. Editing is locked but the tool can be cloned to create a new version.
Released
The tool is released. Tools in this state cannot be edited. Editing is locked but the tool can be cloned to create a new version.
Deprecated
The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool. That is, it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.
The General tab provides options to configure the basic command line.
ID
CWL identifier field
CWL version
The CWL version in use. This field cannot be changed.
Base command
Components of the command. Each argument must be added in a separate line.
Standard out
The name of the file that captures Standard Out (STDOUT) stream information.
Standard error
The name of the file that captures Standard Error (STDERR) stream information.
Requirements
The requirements for triggering an error message. (see below)
Hints
The requirements for triggering a warning message. (see below)
The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.
Inline Javascript
The Tool contains a property with a JavaScript expression to resolve its value.
Initial workdir
The workdir can be any of the following types:
String or Expression — A string or JavaScript expression, e.g., $(inputs.InputFASTA)
File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}
Dirent — A script in the working directory. The Entry name field specifies the file name.
Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.
The Arguments tab provides options to configure base command parameters that do not require user input.
Tool arguments may be one of two types:
String or Expression — A literal string or JavaScript expression, e.g., --format=bam.
Binding — An argument constructed from the binding of an input parameter.
The following table describes the argument input fields.
Value
The literal string to be added to the base command.
String or expression
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Binding
Prefix
The string prefix.
Binding
Item separator
The separator that is used between array values.
Binding
Value from
The source string or JavaScript expression.
Binding
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Binding
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
Binding
Example
Prefix
--output-filename
Value from
$(inputs.inputSAM.nameroot).bam
Input file
/tmp/storage/SRR45678_sorted.sam
Output file
SRR45678_sorted.bam
The Inputs tab provides options to define the input files and folders for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Input options
Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or directories.
Format
The input file format.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Load contents
The setting to load the contents of the input file: the system reads up to the first 64 KiB of text from the file and populates the contents field with it.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be Boolean, Int, Long, Float, Double or String.
Default Value
The default value to use if the tool setting is not available.
Input options
Optional indicates the input is optional. Multi value indicates there can be more than one value for the input.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Outputs tab provides options to define the parameters of output files.
The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Output options
Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or folders.
Format
The input file format.
Globs
The pattern for searching file names.
Load contents
Automatically loads the file contents: the system extracts up to the first 64 KiB of text from the file and populates the contents field with it.
Output eval
Evaluate an expression to generate the output value.
From the System Settings > Tool Repository page, select a tool.
Select Edit.
From the System Settings > Tool Repository page, select a tool.
Select the Information tab.
From the Status drop-down menu, select a status.
Select Save.
In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition on the right hand side of the screen when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.
Be careful when editing the raw tool definition as this can introduce errors.
A simple example CWL Tool definition is provided below.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: echo
inputs:
message:
type: string
default: testMessage
inputBinding:
position: 1
outputs:
echoout:
type: stdout
baseCommand:
- echo

After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.
General Tool - includes your base command and various optional configurations.
The base command is required for your tool to run, e.g. python /path/to/script.py such that python and /path/to/script.py are added on separate lines.
Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.
Initial workdir requirement - Dirent Type
Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or defined using a Dirent. Defining a script via Dirent allows you to dynamically modify your script without updating your Docker image. To define your Dirent script, enter your script name under Entry name (e.g. runner.sh) and the script content under Entry. Then, point your base command to that custom script, e.g. bash runner.sh.
How do you reference your tool inputs and settings throughout the tool definition?
You can either reference your inputs using their position or ID.
Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)
File/Folder inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information please refer to the File CWL documentation.
All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
Notifications (Projects > your_project > Project Settings > Notifications) are events to which you can subscribe. When they are triggered, they deliver a message to an external target system such as email, Amazon SQS or SNS, or an HTTP POST request. The following table describes the available system events to which you can subscribe:

| Event | Code | Description | Payload type |
| --- | --- | --- | --- |
| Analysis failure | ICA_EXEC_001 | Emitted when an analysis fails | Analysis |
| Analysis success | ICA_EXEC_002 | Emitted when an analysis succeeds | Analysis |
| Analysis aborted | ICA_EXEC_027 | Emitted when an analysis is aborted either by the system or the user | Analysis |
| Analysis status change | ICA_EXEC_028 | Emitted when a state transition on an analysis occurs | Analysis |
| Base Job failure | ICA_BASE_001 | Emitted when a Base job fails | BaseJob |
| Base Job success | ICA_BASE_002 | Emitted when a Base job succeeds | BaseJob |
| Data transfer success | ICA_DATA_002 | Emitted when a data transfer is marked as Succeeded | DataTransfer |
| Data transfer stalled | ICA_DATA_025 | Emitted when a data transfer hasn't progressed in the past 2 minutes | DataTransfer |
| Data <action> | ICA_DATA_100 | Subscribing to this serves as a wildcard for all project data status changes and covers those changes that have no separate code. This does not include DataTransfer events or changes that trigger no data status changes such as adding tags to data. | ProjectData |
| Data linked to project | ICA_DATA_104 | Emitted when a file is linked to a project | ProjectData |
| Data can not be created in non-indexed folder | ICA_DATA_105 | Emitted when attempting to create data in a non-indexed folder | ProjectData |
| Data deleted | ICA_DATA_106 | Emitted when data is deleted | ProjectData |
| Data created | ICA_DATA_107 | Emitted when data is created | ProjectData |
| Data uploaded | ICA_DATA_108 | Emitted when data is uploaded | ProjectData |
| Data updated | ICA_DATA_109 | Emitted when data is updated | ProjectData |
| Data archived | ICA_DATA_110 | Emitted when data is archived | ProjectData |
| Data unarchived | ICA_DATA_114 | Emitted when data is unarchived | ProjectData |
| Job status changed | ICA_JOB_001 | Emitted when a job changes status (INITIALIZED, WAITING_FOR_RESOURCES, RUNNING, STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED) | JobId |
| Sample completed | ICA_SMP_002 | Emitted when a sample is marked as completed | ProjectSample |
| Sample linked to a project | ICA_SMP_003 | Emitted when a sample is linked to a project | ProjectSample |
| Workflow session start | ICA_WFS_001 | Emitted when a workflow is started | WorkflowSession |
| Workflow session failure | ICA_WFS_002 | Emitted when a workflow fails | WorkflowSession |
| Workflow session success | ICA_WFS_003 | Emitted when a workflow succeeds | WorkflowSession |
| Workflow session aborted | ICA_WFS_004 | Emitted when a workflow is aborted | WorkflowSession |

When you subscribe to overlapping event codes, such as ICA_EXEC_002 (analysis success) and ICA_EXEC_028 (analysis status change), you will receive both notifications when an analysis succeeds.
Event notifications can be delivered to the following delivery targets:

| Channel | Target | Configuration |
| --- | --- | --- |
| Email | E-mail delivery | E-mail Address |
| Sqs | AWS SQS Queue | AWS SQS Queue URL |
| Sns | AWS SNS Topic | AWS SNS Topic ARN |
| Http | Webhook (POST request) | URL |

To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu and fill out the requested fields; the fields change depending on the selected delivery target.
Once created, you can disable, enable or delete the notification subscriptions at Projects > your_project > Project Settings > Notifications.
In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Principal":{
"AWS":"arn:aws:iam::<platform_aws_account>:root"
},
"Action":"<action>",
"Resource": "<arn>"
}
]
}

Substitute the variables in the example above according to the table below.
platform_aws_account
The platform AWS account ID: 079623148045
action
For SNS use SNS:Publish. For SQS, use SQS:SendMessage
arn
The Amazon Resource Name (ARN) of the target SNS topic or SQS queue
See examples for setting policies in Amazon SQS and Amazon SNS.
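As an alternative to the AWS console, the cross-account policy above can be attached with boto3. The sketch below is a minimal, illustrative example; the queue URL, queue ARN and topic ARN are placeholders, and the policy is the document shown above with the variables substituted.
import json
import boto3

# Cross-account policy from the example above, with the variables substituted.
def ica_policy(action: str, resource_arn: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::079623148045:root"},
            "Action": action,
            "Resource": resource_arn,
        }]
    })

# SQS delivery target: attach the policy to the queue (URL and ARN are placeholders).
sqs = boto3.client("sqs")
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/ica-events",
    Attributes={"Policy": ica_policy("SQS:SendMessage",
                                     "arn:aws:sqs:us-east-1:123456789012:ica-events")},
)

# SNS delivery target: attach the policy to the topic (ARN is a placeholder).
sns = boto3.client("sns")
sns.set_topic_attributes(
    TopicArn="arn:aws:sns:us-east-1:123456789012:ica-events",
    AttributeName="Policy",
    AttributeValue=ica_policy("SNS:Publish",
                              "arn:aws:sns:us-east-1:123456789012:ica-events"),
)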
To create a subscription to deliver events to an Amazon SNS topic, one can use either GUI or API endpoints.
To create a subscription via GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu, insert optional filter, select the channel type (SNS), and then insert the ARN from the target SNS topic and the AWS region.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
To create a subscription to deliver events to an Amazon SQS queue, you can use either GUI or API endpoints.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Next, select an event from the dropdown menu, choose SQS as the way to receive the notifications, enter your SQS URL, and if applicable for that event, choose a payload version. Not all payload versions are applicable for all events and targets, so the system will filter the options out for you. Finally, you can enter a filter expression to filter which events are relevant for you. Only those events matching the expression will be received.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
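As a minimal sketch of that API route, the Python snippet below creates a channel and then a project subscription with the requests library. The request body fields shown (type, sqsQueueUrl, eventCode, filter, channelId) are illustrative assumptions; consult the API reference for the exact schema, and replace the host, project ID, queue URL and API key with your own values.
import json
import os
import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"          # base URL (assumption)
PROJECT_ID = "your-project-id"                          # placeholder
headers = {
    "X-API-Key": os.environ["ICA_API_KEY"],
    "Content-Type": "application/vnd.illumina.v3+json",
    "accept": "application/vnd.illumina.v3+json",
}

# 1. Create the delivery channel (field names are illustrative).
channel = requests.post(
    f"{ICA_BASE}/api/notificationChannels",
    headers=headers,
    data=json.dumps({"type": "SQS",
                     "sqsQueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/ica-events"}),
).json()

# 2. Subscribe the project to an event using that channel.
subscription = requests.post(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/notificationSubscriptions",
    headers=headers,
    data=json.dumps({
        "eventCode": "ICA_EXEC_002",                    # analysis success
        "filter": "[?($.status == 'SUCCEEDED')]",       # optional JsonPath filter
        "channelId": channel["id"],                     # assumes the response returns an id
    }),
).json()
print(subscription)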
Messages delivered to AWS SQS contain the following event body attributes:
correlationId
GUID used to identify the event
timestamp
Date when the event was sent
eventCode
Event code of the event
description
Description of the event
payload
Event payload
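As an illustration only, the following boto3 sketch polls an SQS delivery target and reads the attributes listed above from the message body; the queue URL is a placeholder and the body is assumed to be the raw JSON event shown in the example below.
import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ica-events"  # placeholder

sqs = boto3.client("sqs")
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)

for message in resp.get("Messages", []):
    event = json.loads(message["Body"])
    # The event body carries the attributes described above.
    print(event["eventCode"], event["correlationId"], event["timestamp"])
    print(event["description"])
    payload = event["payload"]  # event-specific payload, e.g. project data details
    # Delete the message once it has been processed.
    sqs.delete_message(QueueUrL:=QUEUE_URL if False else QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]) if False else sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])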
The following example is a Data Updated event payload sent to an AWS SQS delivery target (condensed for readability):
{
"correlationId": "2471d3e2-f3b9-434c-ae83-c7c7d3dcb4e0",
"timestamp": "2022-10-06T07:51:09.128Z",
"eventCode": "ICA_DATA_100",
"description": "Data updates",
"payload": {
"id": "fil.8f6f9511d70e4036c60908daa70ea21c",
...
}
}
Notification subscriptions will trigger for all events matching the configured event type. A filter may be configured on a subscription to limit the matching strategy to only those event payloads which match the filter.
The filter expressions leverage the JsonPath library for describing the matching pattern to be applied to event payloads. The filter must be in the format [?(<expression>)].
The Analysis Success event delivers a JSON event payload matching the Analysis data model (as output from the API to retrieve a project analysis).
{
"id": "0c2ed19d-9452-4258-809b-0d676EXAMPLE",
"timeCreated": "2021-09-20T12:23:18Z",
"timeModified": "2021-09-20T12:43:02Z",
"ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
"tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
"reference": "210920-1-CopyToolDev-9d78096d-35f4-47c9-b9b6-e0cbcEXAMPLE",
"userReference": "210920-1",
"pipeline": {
"id": "20261676-59ac-4ea0-97bd-8a684EXAMPLE",
"timeCreated": "2021-08-25T01:49:41Z",
"timeModified": "2021-08-25T01:49:41Z",
"ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
"tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
"code": "CopyToolDev",
"description": "CopyToolDev",
"language": "CWL",
"pipelineTags": {
"technicalTags": ["Demo"]
}
},
"status": "SUCCEEDED",
"startDate": "2021-09-20T12:23:21Z",
"endDate": "2021-09-20T12:43:00Z",
"summary": "",
"finishedSteps": 0,
"totalSteps": 1,
"tags": {
"technicalTags": [],
"userTags": [],
"referenceTags": []
}
}
The below examples demonstrate various filters operating on the above event payload:
Filter on a pipeline, with a code that starts with ‘Copy’. You’ll need a regex expression for this:
[?($.pipeline.code =~ /Copy.*/)]
Filter on status (note that the Analysis success event is only emitted when the analysis is successful):
[?($.status == 'SUCCEEDED')]
Both payload versions V3 and V4 guarantee that the final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED) is emitted, but the intermediate states depend on the flow, so not every intermediate state is guaranteed:
V3 can have the states REQUESTED - IN_PROGRESS - SUCCEEDED
V4 can have the states REQUESTED - QUEUED - INITIALIZING - PREPARING_INPUTS - IN_PROGRESS - GENERATING_OUTPUTS - SUCCEEDED
Filter on pipeline, having a technical tag “Demo":
[?('Demo' in $.pipeline.pipelineTags.technicalTags)]
Combination of multiple expressions using &&. It's best practice to surround each individual expression with parentheses:
[?(($.pipeline.code =~ /Copy.*/) && $.status == 'SUCCEEDED')]
Examples for other events
Filtering ICA_DATA_104 on owning project name. The top level keys on which you can filter are under the payload key, so payload is not included in this filter expression.
[?($.details.owningProjectName == 'my_project_name')]
Custom events enable triggering notification subscriptions using event types beyond the system-defined event types. When creating a custom subscription, a custom event code may be specified to use within the project. Events may then be sent to the specified event code using a POST API with the request body specifying the event payload.
Custom events can be defined using the API. In order to create a custom event for your project please follow the steps below:
Create a new custom event POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
a. Your custom event code must be 1-20 characters long, e.g. 'ICA_CUSTOM_123'.
b. That event code will be used to reference that custom event type.
Create a new notification channel POST {ICA_URL}/ica/rest/api/notificationChannels
a. If there is already a notification channel created with the desired configuration within the same project, it is also possible to get the existing channel ID using the call GET {ICA_URL}/ica/rest/api/notificationChannels.
Create a notification subscription POST {ICA_URL}/ica/rest/api/projects/{projectId}/customNotificationSubscriptions.
a. Use the event code created in step 1.
b. Use the channel ID from step 2.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > Custom event.
Once the steps above have been completed successfully, the call from the first step POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents could be reused with the same event code to continue sending events through the same channel and subscription.
The following is a sample Python function used inside an ICA pipeline to post custom events for each failed metric (ICA_HOST, PROJECT_ID and ICA_API_KEY are assumed to be defined elsewhere, for example read from environment variables or pipeline settings):
import json
import requests

def post_custom_event(metric_name: str, metric_value: str, threshold: str, sample_name: str):
    api_url = f"{ICA_HOST}/api/projects/{PROJECT_ID}/customEvents"
    headers = {
        "Content-Type": "application/vnd.illumina.v3+json",
        "accept": "application/vnd.illumina.v3+json",
        "X-API-Key": f"{ICA_API_KEY}"
    }
    content = {"code": "ICA_CUSTOM_123", "content": {"metric_name": metric_name, "metric_value": metric_value, "threshold": threshold, "sample_name": sample_name}}
    json_data = json.dumps(content)
    response = requests.post(api_url, data=json_data, headers=headers)
    if response.status_code != 204:
        print(f"[EVENT-ERROR] Could not post metric failure event for the metric {metric_name} (sample {sample_name}).")


The following is a list of available Bench CLI commands and their options.
Please refer to the examples from the Illumina website for more details.
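For orientation, the hedged sketch below shows how a script or notebook running inside a Bench workspace (where workspace-ctl is assumed to be on the PATH) might wrap the CLI with Python's subprocess module to mount project data; the source and mount paths are placeholders, and the flags are documented in the reference below.
import subprocess

# Placeholder values; see the "data create-mount" flags in the reference below.
SOURCE = "/data/project/myData/hg38"   # or an ICA folder id such as fol.xxxxxxxx
MOUNT_PATH = "hg38data"                # created under /data/mounts/

result = subprocess.run(
    [
        "workspace-ctl", "data", "create-mount",
        "--source", SOURCE,
        "--mount-path", MOUNT_PATH,
        "--mode", "read-only",
        "--wait",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)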
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.cmd execute error: accepts 1 arg(s), received 0Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl compute [command] --help" for more information about a command.workspace-ctl compute get-cluster-details
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute get-logs
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute get-pools
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute scale-pool
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl data [command] --help" for more information about a command.workspace-ctl data create-mount
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree Display commands as a tree
--help-verbose Extended help topics and options
--mode string Enum:["read-only","read-write"]. Mount mode i.e. read-only, read-write
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl data delete-mount
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl data get-mounts
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.workspace-ctl help completion
To load completions:
Bash:
$ source <(yourprogram completion bash)
# To load completions for each session, execute once:
# Linux:
$ yourprogram completion bash > /etc/bash_completion.d/yourprogram
# macOS:
$ yourprogram completion bash > /usr/local/etc/bash_completion.d/yourprogram
Zsh:
# If shell completion is not already enabled in your environment,
# you will need to enable it. You can execute the following once:
$ echo "autoload -U compinit; compinit" >> ~/.zshrc
# To load completions for each session, execute once:
$ yourprogram completion zsh > "${fpath[1]}/_yourprogram"
# You will need to start a new shell for this setup to take effect.
fish:
$ yourprogram completion fish | source
# To load completions for each session, execute once:
$ yourprogram completion fish > ~/.config/fish/completions/yourprogram.fish
PowerShell:
PS> yourprogram completion powershell | Out-String | Invoke-Expression
# To load completions for every new session, run:
PS> yourprogram completion powershell > yourprogram.ps1
# and source this file from your PowerShell profile.
Usage:
workspace-ctl completion [bash|zsh|fish|powershell]
Flags:
-h, --help help for completion
workspace-ctl help compute
Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Use "workspace-ctl compute [command] --help" for more information about a command.workspace-ctl help compute get-cluster-details
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
workspace-ctl help compute get-logs
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
workspace-ctl help compute get-pools
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
workspace-ctl help compute scale-pool
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
workspace-ctl help data
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Use "workspace-ctl data [command] --help" for more information about a command.workspace-ctl help data create-mount
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree
--help-verbose
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
workspace-ctl help data delete-mount
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
workspace-ctl help data get-mounts
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
workspace-ctl help help
Help provides help for any command in the application.
Simply type workspace-ctl help [path to command] for full details.
Usage:
workspace-ctl help [command] [flags]
Flags:
-h, --help help for help
workspace-ctl help software
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Use "workspace-ctl software [command] --help" for more information about a command.workspace-ctl help software get-server-metadata
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
workspace-ctl help software get-software-settings
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
workspace-ctl help workspace
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Use "workspace-ctl workspace [command] --help" for more information about a command.workspace-ctl help workspace get-cluster-settings
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
workspace-ctl help workspace get-connection-details
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
workspace-ctl help workspace get-workspace-settings
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl software [command] --help" for more information about a command.workspace-ctl software get-server-metadata
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl software get-software-settings
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl workspace [command] --help" for more information about a command.workspace-ctl workspace get-cluster-settings
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl workspace get-connection-details
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl workspace get-workspace-settings
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.
When configuring a new project in ICA to use a preconfigured S3 bucket, create a folder on your S3 bucket in the AWS console. This folder will be connected to ICA as a prefix.
Failure to create a folder will result in the root folder of your S3 bucket being assigned which will block your S3 bucket from being used for other ICA projects with the error "Conflict while updating file/folder. Please try again later."
Because S3 handles folders as key prefixes and does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.
When creating an empty folder in S3, it will not be visible in ICA.
When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.
When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.
Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:
(*) BSSH is not currently deployed on the South Korea instance, resulting in limited functionality in this region with regard to sequencer integration.
You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are found .
ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the AWS documentation (expand the "Using the S3 console" section) for instructions on enabling CORS via the AWS Management Console.
In the cross-origin resource sharing (CORS) section, enter the following content.
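If you prefer to apply the CORS configuration programmatically instead of through the console, the following boto3 sketch sets the same rules shown in the CORS JSON later in this section; the bucket name is a placeholder.
import boto3

BUCKET = "YOUR_BUCKET_NAME"  # placeholder

cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["HEAD", "GET", "PUT", "POST", "DELETE"],
            "AllowedOrigins": ["https://ica.illumina.com"],
            "ExposeHeaders": ["ETag", "x-amz-meta-custom-header"],
        }
    ]
}

s3 = boto3.client("s3")
s3.put_bucket_cors(Bucket=BUCKET, CORSConfiguration=cors_configuration)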
ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.
Refer to the documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:
Paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
On Versioned or Suspended buckets, paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document) leads to the path where you saved the file:
An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.
Refer to the documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:
(optional) Set user name to "illumina_ica_admin"
Select the Programmatic access option for the type of access
Select Attach existing policies directly when setting the permissions, and choose the policy created in the previous section
(Optional) Retrieve the Access Key ID and Secret Access Key by choosing to Download .csv
To create the IAM user and attach the policy via the AWS CLI, enter the following command (AWS IAM users are global resources and do not require a region to be specified). This command creates an IAM user illumina_ica_admin, retrieves your AWS account number, and then attaches the policy to the user.
If the Access Key information was already retrieved during the IAM user creation step, skip this step.
Refer to the AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.
Use the command below to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.
The AccessKeyId and SecretAccessKey values will be provided to ICA in the next step.
Connecting your S3 bucket to ICA does not require any additional bucket policies.
By default, public access to the S3 bucket is allowed. For increased security, it is advised to block public access with the following command:
To block public access to S3 buckets on account level, you can use the AWS Console on the website.
To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Secret Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Credentials > Create > Storage Credential to create a new storage credential.
Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.
With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions for creating a storage configuration for details.
ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the S3 bucket policy:
Be sure to replace the following fields:
ASSUME_ROLE_ARN: Replace this field with the ARN of the cross account role you want to give permission to. Refer to the table below to determine which region-specific Role ARN should be used.
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
The ARN of the cross account role you want to give permission to is specified in the Principal. Refer to the table below to determine which region-specific Role ARN should be used.
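As a minimal, hedged sketch, the boto3 snippet below appends the cross-account statement (shown in full later in this section) to the existing bucket policy; the bucket name is a placeholder and the role ARN must be taken from the region table below.
import json
import boto3
from botocore.exceptions import ClientError

BUCKET = "YOUR_BUCKET_NAME"                                             # placeholder
ASSUME_ROLE_ARN = "arn:aws:iam::079623148045:role/ica_use1_crossacct"   # US example; see the table below

cross_account_statement = {
    "Sid": "AllowCrossAccountAccess",
    "Effect": "Allow",
    "Principal": {"AWS": ASSUME_ROLE_ARN},
    "Action": ["s3:PutObject", "s3:DeleteObject", "s3:ListMultipartUploadParts",
               "s3:AbortMultipartUpload", "s3:GetObject"],
    "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
}

s3 = boto3.client("s3")

# put_bucket_policy replaces the whole policy document, so merge with any existing policy.
try:
    policy = json.loads(s3.get_bucket_policy(Bucket=BUCKET)["Policy"])
except ClientError as err:
    if err.response["Error"]["Code"] != "NoSuchBucketPolicy":
        raise
    policy = {"Version": "2012-10-17", "Statement": []}

policy["Statement"].append(cross_account_statement)
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))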
The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration.
This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 only allows overlapping events with non-overlapping prefixes. Depending on the conflicting notifications, the error can be presented in any of the following ways:
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes
Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type
Solution:
In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix
Delete the existing notification that overlaps with your Storage Configuration's key prefix
ICA will perform a series of steps in the background to re-verify the connection to your bucket.
This error can occur when recreating a recently deleted storage configuration. To fix the issue, you have to delete the bucket notifications:
In the Amazon S3 Console, select the bucket for which you need to delete the notifications from the list.
Choose Properties.
Navigate to the Event Notifications section, select the check boxes for the event notifications named gds:objectcreated, gds:objectremoved and gds:objectrestore, and click Delete.
Wait 15 minutes for the storage to become available in ICA
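The same cleanup can be scripted with boto3, as in the hedged sketch below, which keeps any notifications you created yourself and removes only those whose name starts with gds:; the bucket name is a placeholder.
import boto3

BUCKET = "YOUR_BUCKET_NAME"  # placeholder

s3 = boto3.client("s3")
config = s3.get_bucket_notification_configuration(Bucket=BUCKET)

# Keep only notifications that were not created by ICA (ICA uses names starting with "gds:").
cleaned = {}
for key in ("TopicConfigurations", "QueueConfigurations", "LambdaFunctionConfigurations"):
    kept = [c for c in config.get(key, []) if not c.get("Id", "").startswith("gds:")]
    if kept:
        cleaned[key] = kept
if "EventBridgeConfiguration" in config:
    cleaned["EventBridgeConfiguration"] = config["EventBridgeConfiguration"]

s3.put_bucket_notification_configuration(Bucket=BUCKET, NotificationConfiguration=cleaned)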
Australia
ap-southeast-2
Canada
ca-central-1
Germany
eu-central-1
India
ap-south-1
Indonesia
ap-southeast-3
Israel
il-central-1
Japan
ap-northeast-1
Singapore
ap-southeast-1
South Korea*
ap-northeast-2
UK
eu-west-2
United Arab Emirates
me-central-1
United States
us-east-1
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"HEAD",
"GET",
"PUT",
"POST",
"DELETE"
],
"AllowedOrigins": [
"https://ica.illumina.com"
],
"ExposeHeaders": [
"ETag",
"x-amz-meta-custom-header"
]
}
]
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation",
"s3:ListBucketVersions",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}
aws iam create-policy --policy-name illumina-ica-admin-policy --policy-document file://illumina-ica-admin-policy.json
aws iam create-user --user-name illumina_ica_admin
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-user-policy --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/illumina-ica-admin-policy --user-name illumina_ica_admin
aws iam create-access-key --user-name illumina_ica_admin
"AccessKey": {
"UserName": "illumina_ica_admin",
"AccessKeyId": "<access key id>",
"Status": "Active",
"SecretAccessKey": "<secret access key>",
"CreateDate": "2020-10-22 09:42:24+00:00"
}
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Principal": {
"AWS": "*"
},
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
"Condition": {
"ArnNotLike": {
"aws:PrincipalArn": [
"arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_IAM_USER",
"arn:aws:sts::YOUR_ACCOUNT_ID:federated-user/*"
]
}
}
}
]
}
aws s3api put-public-access-block --bucket ${BUCKET_NAME} --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCrossAccountAccess",
"Effect": "Allow",
"Principal": {
"AWS": "ASSUME_ROLE_ARN"
},
"Action": [
"s3:PutObject",
"s3:DeleteObject",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME",
"arn:aws:s3:::YOUR_BUCKET_NAME/*"
]
}
]
}
Australia (AU)
arn:aws:iam::079623148045:role/ica_aps2_crossacct
Canada (CA)
arn:aws:iam::079623148045:role/ica_cac1_crossacct
Germany (EU)
arn:aws:iam::079623148045:role/ica_euc1_crossacct
India (IN)
arn:aws:iam::079623148045:role/ica_aps3_crossacct
Indonesia (ID)
arn:aws:iam::079623148045:role/ica_aps4_crossacct
Israel (IL)
arn:aws:iam::079623148045:role/ica_ilc1_crossacct
Japan (JP)
arn:aws:iam::079623148045:role/ica_apn1_crossacct
Singapore (SG)
arn:aws:iam::079623148045:role/ica_aps1_crossacct
South Korea (KR)
arn:aws:iam::079623148045:role/ica_apn2_crossacct
UK (GB)
arn:aws:iam::079623148045:role/ica_euw2_crossacct
United Arab Emirates (AE)
arn:aws:iam::079623148045:role/ica_mec1_crossacct
United States (US)
arn:aws:iam::079623148045:role/ica_use1_crossacct
Access Forbidden
Access forbidden: {message}
Mostly occurs because of a lack of permissions. Fix: review the IAM policy, bucket policy, and ACLs for the required permissions.
Unsupported principal
Unsupported principal: The policy type ${policy_type} does not support the Principal element. Remove the Principal element.
This can indicate that the S3 bucket policy settings have been added to the IAM policy by mistake.
Conflict
System topic is not in a valid state
Conflict
Found conflicting storage container notifications with overlapping prefixes
Conflict
Found conflicting storage container notifications for {prefix}{eventTypeMsg}
Conflict
Found conflicting storage container notifications with overlapping prefixes{prefixMsg}{eventTypeMsg}
Customer Container Notification Exists
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid Access Key ID
Failed to update bucket policy: The AWS Access Key Id you provided does not exist in our records.
Check the status of the AWS Access Key ID in the console. If not active, activate it. If missing, create it.
Invalid Parameter
Missing credentials for storage container
Check the storage credential. AccessKeyId and/or SecretAccessKey is not set.
Invalid Parameter
Missing bucket name for storage container
Bucket name has not been set for the storage configuration.
Invalid Parameter
The storage container name has invalid characters
Storage container name can only contain lowercase letters, numbers, hyphens, and periods.
Invalid Parameter
Storage Container '{storageContainer}' does not exist
Update the storage configuration container to a valid S3 bucket.
Invalid Parameter
Invalid parameters for volume configuration: {message}
Invalid Storage Container Location
Storage container must be located in the {region} region
Update storage configuration region to match storage container region.
Invalid Storage Container Location
Storage container must be located in one of the following regions: {regions}
Update storage configuration region to match storage container region.
Missing Configuration
Missing queue name for storage container notification
Missing Configuration
Missing system topic name for storage container notification
Missing Configuration
Missing lambda ARN for storage container notification
Missing Configuration
Missing subscription name for storage container notification
Missing Storage Account Settings
The storage account '{storageAccountName}' needs HNS (Hierarchical Namespace) enabled.
Missing Storage Container Settings
Missing settings for storage container
ICA provides a Service Connector, which is a small program that runs on your local machine to sync data between the platform's cloud-hosted data store and your local computer or server. The Service Connector securely uploads data or downloads results using TLS 1.2. In order to do this, the Connector makes 2 connections:
A control connection, which the Connector uses to get configuration information from the platform, and to update the platform about its activities
A connection towards the storage node, used to transfer the actual data between your local or network storage and your cloud-based ICA storage.
This Connector runs in the background, and configuration is done in the Illumina Connected Analytics (ICA) platform, where you can add upload and download rules to meet the requirements of the current project and any new projects you may create.
The Service Connector looks at any new files and checks their size. As long as the file size is changing, it knows data is still being added to the file and it is not ready for transfer. Only when the file size is stable and does not change anymore will it consider the file to be complete and initiate transfer. Despite this, it is still best practice to not connect the Service Connector to active folders which are used as streaming output for other processes as this can result in incomplete files being transferred when the active processes have extended compute periods in which the file size remains unchanged.
The Service Connector handles integrity checking during file transfer, which requires calculating hashes on the data. In addition, transmission speed depends on the available data transfer bandwidth and connection stability. For these reasons, uploading large amounts of data can take considerable time. This can in turn result in temporarily seeing empty folders at the destination location, since these are created at the beginning of the transfer process.
Select Projects > your_project > Project Settings > Connectivity > Service Connectors.
Select + Create.
Fill out the fields in the New Connector configuration page.
Name - Enter the name of the connector.
Status - This is automatically updated with the actual status, you do not need to enter anything here.
Debug Information Accessible by Illumina (optional) - Illumina support can request connector debugging information to help diagnose issues. For security reasons, support can only collect this data if the option Debug Information Accessible by Illumina is active. You can choose to either proactively enable this when encountering issues to speed up diagnosis or to only activate it when support requests access. You can at any time revoke access again by deselecting the option.
Description (optional) - Enter any additional information you want to show for this connector.
Mode (required) - Specify if the connector can upload data, download data, both or neither.
Operating system (required) - Select your server or computer operating system.
Add any upload or download rules. See Connector Rules below.
Select Save and download the connector (top right). An initialization key will be displayed in the platform now. Copy this value as it will be needed during installation.
Launch the installer after the download completes and follow the on-screen prompts to complete the installation, including entering the initialization key copied in the previous step. Do not install the connector in the upload folder as this will result in the connector attempting to upload itself and the associated log files.
Run the downloaded .exe file. During the installation, the installer will ask for the initialization key. Fill out the initialization key you see in the platform.
The installer will create an Illumina Service Connector, register it as a Windows service, and start the service. This means that if you wait about 60 seconds and then refresh the screen in the platform using the refresh button in the top right corner of the page, the connector should display as connected.
You can only install 1 connector on Windows. If for some reason, you need to install a new one, first uninstall the old one. You only need to do this when there is a problem with your existing connector. Upgrading a connector is also possible. To do this, you don’t need to uninstall the old one first.
Double click the downloaded .dmg file. Double click Illumina Service Connector in the window that opens to start the installer. Run through the installer, and fill out the initialization key when asked for it.
To start the connector once installed or after a reboot, open the app. You can find the app on the location where you installed it. The connector icon will appear in your dock when the app is running.
In the platform on the Connectivity page, you can check whether your local connector has been connected with the platform. This can take 60 seconds after you started your connector locally, and you may need to refresh the Connectivity page using the refresh button in the top right corner of the page to see the latest status of your connector.
The connector app needs to be closed to shut down your computer. You can do this from within your dock.
Installations require Java 11 or later. You can check this with 'java --version' from a command line terminal. With Java installed, you can run the installer from the command line using the command bash illumina_unix_develop.sh.
Depending on whether you have an X server running or not, it will display a UI or follow a command line installation procedure. You can force a command line installation by adding a -c flag: bash illumina_unix_develop.sh -c.
The connector can be started by running ./illuminaserviceconnector start from the folder in which the connector was installed.
In the upload and download rules, you define different properties when setting up a connector. A connector can be used by multiple projects and a connector can have multiple upload and download rules. Configuration can be changed anytime. Changes to the configuration will be applied approximately 60 seconds after changes are made in ICA if the connector is already connected. If the connector is not already started when configuration changes are made in ICA, it will take about 60 seconds after the connector is started for the configuration changes to be propagated to the connector. The following are the different properties you can configure when setting up a connector. After adding a rule and installing the connector, you can use the Active checkbox to disable rules.
Below is an example of a new connector setup with an Upload Rule to find all files ending with .tar or .tar.gz located within the local folder C:\Users\username\data\docker-images.
An upload rule tells the connector which folder on your local disk it needs to watch for new files to upload. The connector contacts the platform every minute to pick up changes to upload rules. To configure upload rules for different projects, first switch into the desired project and select Connectivity. Choose the connector from the list and select Click to add a new upload rule and define the rule. The project field will be automatically filled with the project you are currently within.
Name
Name of the upload rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Local folder
The folder path on the local machine where files to be uploaded are stored.
File pattern
Files with filenames that match the string/pattern will be uploaded.
Location
The location the data will be uploaded to.
Project
The project the data will be uploaded to.
Description
Additional information about the upload rule.
Assign Format
Select which data format tag the uploaded files will receive. This is used for various things like filtering.
Data owner
The owner of the data after upload.
When you schedule downloads in the platform, you can choose which connector needs to download the files. That connector needs some way to know how and where it needs to download your files. That’s what a download rule is for. The connector contacts the platform every minute to pick up changes to download rules. The following are the different download rule settings.
Name
Name of the download rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Order of execution
If using multiple download rules, set the order the rules are performed.
Target Local folder
The folder path on the local machine where the files will be downloaded to.
Description
Additional information about the download rule.
Format
The format the files must comply with in order to be scheduled for download.
Project
The projects the rule applies to.
You can see the service connector status by the color indicator.
green
Connected/Active
orange
Pending installation
grey
Installed/Inactive
red
-
When you set up your connector for the first time, and your sample files are located on a shared drive, it’s best to create a folder on your local disk, put one of the sample files in there, and do the connector setup with that folder. When this works, try to configure the shared drive.
Transfer to and from a shared drive may be quite slow. That means it can take up to 30 minutes after you configured a shared drive before uploads start. This is due to the integrity check the connector does for each file before it starts uploading. The connector can upload from or download to a shared drive, but there are a few conditions:
The drive needs to be mounted locally. X:\illuminaupload or /Volumes/shareddrive/illuminaupload will work, \\shareddrive\illuminaupload or smb://shareddrive/illuminaupload will not.
The user running the connector must have access to the shared drive without a password being requested.
The user who runs the Illumina Service Connector process on the Linux machine needs to have read, write and execute permissions on the installation folder.
Illumina might release new versions of the Service Connector, with improvements and/or bug fixes. You can easily download a new version of the Connector with the Download button on the Connectivity screen in the platform. After you downloaded the new installer, run it and choose the option ‘Yes, update the existing installation’.
To uninstall the connector, perform one of the following:
Windows and Linux: Run the uninstaller located in the folder where the connector was installed.
Mac: Move the Illumina Service Connector to your Trash folder.
The Connector has a log file containing technical information about what’s happening. When something doesn’t work, it often contains clues to why it doesn’t work. Interpreting this log file is not always easy, but it can help the support team to give a fast answer on what is wrong, so it is suggested to attach it to your email when you have upload or download problems. You can find this log file at the following location:
\<Installation Folder>\logs\BSC.out
Default: C:\Program Files (x86)\illumina\logs\BSC.out
/<Installation Directory>/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
Default: /Applications/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
/<Installation Directory>/logs/BSC.out
Default: /usr/local/illumina
Windows
Service connector doesn't connect
First, try restarting your computer. If that doesn’t help, open the Services application (By clicking the Windows icon, and then typing services). In there, there should be a service called Illumina Service Connector. • If it doesn’t have status Running, try starting it (right mouse click -> start) • If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector. • If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
OS X
Service connector doesn't connect
Check whether the Connector is running. If it is, there should be an Illumina icon in your Dock. • If it doesn’t, log out and log back in. An Illumina service connector icon should appear in your dock. • If it still doesn’t, try starting the Connector manually from the Launchpad menu. • If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector. • If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
Linux
Service connector doesn't connect
Check whether the connector process is running with:
ps aux
Linux
Can’t define java version for connector
The connector makes use of Java version 8 or 11. If you run the installer and get the following error "Please define INSTALL4J_JAVA_HOME to point to a suitable JVM.": • When downloading the correct Java version from Oracle, note that there are two variables in the script that can be defined (INSTALL4J_JAVA_HOME_OVERRIDE & INSTALL4J_JAVA_PREFIX), but not INSTALL4J_JAVA_HOME, which is printed in the error message above. Export INSTALL4J_JAVA_HOME_OVERRIDE in your environment before running the installation script. • Note that the Java home should not point to the java executable, but to the JRE folder. For example: export INSTALL4J_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-1.8.0-openjdk-amd64 sh illumina_unix_1_13_2_0_35.sh
Linux
Corrupted installation script
If you get the following error message “gzip: sfx_archive.tar.gz: not in gzip format. I am sorry, but the installer file seems to be corrupted. If you downloaded that file please try it again. If you transfer that file with ftp please make sure that you are using binary mode.” : • This indicates the installation script file is corrupted. Editing the shell script will cause it to be corrupt. Please re-download the installation script from ICA.
Linux
Unsupported version error in log file
If the log file gives the following error "Unsupported major.minor version 52.0", an unsupported version of java is present. The connector makes use of java version 8 or 11.
Linux
Manage the connector via the CLI
• Connector installation issues:
It may be necessary to first make the connector installation script executable with:
chmod +x illumina_unix_develop.sh
Once it has been made executable, run the installation script with:
bash illumina_unix_develop.sh
It may be necessary to run with sudo depending on user permissions on the system:
sudo bash illumina_unix_develop.sh
If installing on a headless system, use the -c flag to do everything from the command line:
bash illumina_unix_develop.sh -c
• Start the connector with logging directly to the terminal (stdout), in case the log file is not present (likely due to the absence of Java version 8 or 11). From within the installation directory run:
./illuminaserviceconnector run
• Check status of connector. From within the install location run:
./illuminaserviceconnector status
• Stop the connector with:
./illuminaserviceconnector stop
• Restart the connector with:
./illuminaserviceconnector restart
Connector gets connected, but uploads won’t start
Create a new empty folder on your local disk, put a small file in there, and configure this folder as the upload folder. • If it works, and your sample files are on a shared drive, have a look at the shared drives section above. • If it works, and your sample files are on your local disk, there are a few possibilities: a) There is an error in how the upload folder name is configured in the platform. b) For big files, or on slow disks, the connector needs quite some time to start the transfer because it needs to calculate a hash to make sure there are no transfer errors. Wait up to 30 minutes, without changing anything in your Connector configuration. • If this doesn’t work, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
Upload from shared drive does not work
Follow the guidelines in the shared drives section above. Inspect the connector BSC.out log file for any error messages regarding the folder not being found. • If there is such a message, there are two options: a) An issue with the folder name, such as special characters and spaces. As a best practice, use only alphanumeric characters, underscores, dashes and periods. b) A permissions issue. In this case, ensure the user running the connector has read and write access, without a password being requested, to the network share. • If there are no messages indicating the folder cannot be found, it may be necessary to wait for some time until the integrity checks have been done. This check can take quite long on slow disks and slow networks.
Data Transfers are slow
Many factors can affect the speed: • Distance from the upload location to the storage location • Quality of the internet connection • Hardlines are preferred over WiFi • Restrictions on uploads and downloads imposed by the company or the provider. These factors can change every time the customer switches location (e.g. working from home).
The upload or download progress % goes down instead of up.
This is normal behavior. Instead of one continuous transmission, data is split into blocks so that whenever transmission issues occur, not all data has to be retried. This does result in dropping back to a lower % of transmission completed when retrying.
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the .)
See the list of supported
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (i.e., AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and that uploads do not include unintended data, to avoid unintentional privacy breaches. More guidance can be found in the .
See
On the Projects > your_project > Data page, you can view file information and preview files.
To view file details, click the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click view to preview the file.
When you share the data view by sharing the link from your browser, filters and sorting are retained, so the recipient will see the same data in the same order.
To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When clicking on an ongoing action itself, the data job details of the most recent created data job are shown.
For folders, the list of ongoing actions is displayed on top left of the folder details. When clicking the list, the data job details are shown of the most recent created data job of all actions.
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
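As a simple illustration of this layout (the paths and helper below are hypothetical, not part of ICA), a tool receiving a BAM input can expect its index to be mounted next to it and resolve it by name:

from pathlib import Path

def find_bam_index(bam_path):
    # Return the companion index expected next to a mounted BAM file, if present.
    bam = Path(bam_path)
    for candidate in (bam.parent / (bam.name + ".bai"), bam.with_suffix(".bai")):
        if candidate.exists():
            return candidate
    return None

# Example: /data/sample1.bam would resolve its index to /data/sample1.bam.bai
print(find_bam_index("/data/sample1.bam"))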
To hyperlink to data, use the following syntax:
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your files are added to the Data page with status partial during upload and become available when upload completes.
For instructions on uploading/downloading data via CLI, see .
You can copy data from the same project to a different folder or from another project to which you have access.
In order to copy data, the following rights must be assigned to the person copying the data:
The following restrictions apply when copying data:
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter out the data or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any meta data which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be one of the following:
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
There is a difference in copy type behavior between copying files and folders. The behavior is designed for files and it is best practice to not copy folders if there already is a folder with the same name in the destination location.
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move From is used when you are in the destination location.
Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Move jobs will fail if any data being moved is in the Partial or Archived state.
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Select Projects > your_project > Data > your_source_location > Manage > Move To.
Select your target project and location.
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see .
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Select Browser Download > Download.
You can also download files from their details screen. Click on the file name and select Download at the bottom of the screen. Depending on the size of your file, it may take some time to load the file contents.
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Schedule download (for files or folders). This will display a list of all available connectors.
Select a connector and optionally, enter your email address if you want to be notified of download completion, and then select Download.
You can view the progress of the download or stop the download on the Activity page for the project.
The data records contained in a project can be exported in CSV, JSON, or Excel format.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected files, choose Selected rows as the Rows to export option. To export all files on the page, choose Current page.
To export only the columns currently displayed, choose Visible columns as the Columns to export option.
Select the export format.
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints:
Retrieve (GET) the file's information.
Modify the dates of the file to be deleted/archived.
Send (PUT) the updated information back to ICA.
Data linking creates a dynamic read-only view to the source data. You can use data linking to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required. You can recognise linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
Select Projects > your_project > Data > Manage, and then select Link.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
To unlink the data, go to the root level of your project and select the linked folder or if you have linked individual files separately, then you can select those linked files (limited to 100 at a time) and select Manage > Unlink. As during linking a folder, when unlinking, the progress can be monitored at Projects > your_project > activity > Batch Jobs.
To link to a data folder: https://<ServerURL>/ica/link/project/<ProjectID>/data/<FolderID>
To link to an analysis: https://<ServerURL>/ica/link/project/<ProjectID>/analysis/<AnalysisID>
ServerURL: see the browser address bar
ProjectID: at YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
FolderID: at YourProject > Data > folder > folder details > ID
AnalysisID: at YourProject > Flow > Analyses > YourAnalysis > ID
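A minimal Python sketch for assembling these links, assuming you have already looked up the IDs as described above (the server URL and IDs are placeholders):

SERVER_URL = "ica.illumina.com"   # see the browser address bar
PROJECT_ID = "your-project-id"    # placeholder
FOLDER_ID = "your-folder-id"      # placeholder
ANALYSIS_ID = "your-analysis-id"  # placeholder

data_link = f"https://{SERVER_URL}/ica/link/project/{PROJECT_ID}/data/{FOLDER_ID}"
analysis_link = f"https://{SERVER_URL}/ica/link/project/{PROJECT_ID}/analysis/{ANALYSIS_ID}"

print(data_link)
print(analysis_link)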
Rights required to copy data:
Within a project: Contributor rights and Upload and Download rights (the source and destination are the same project).
Between different projects: Download rights and Viewer rights in the source project; Upload rights and Contributor rights in the destination project.
Restrictions when copying data:
Within a project: the source data must not be linked, partial, or archived; the destination must not contain linked data.
Between different projects: data sharing must be enabled, the source data must not be partial or archived, the destination must not contain linked data, and source and destination must be in the same region.
Rights required to move data:
Within a project: Contributor rights.
Between different projects: Download rights and Contributor rights in the source project; Upload rights and Viewer rights in the destination project.
Restrictions when moving data:
Within a project: the source data must not be linked, partial, or archived; the destination must not contain linked data.
Between different projects: data sharing must be enabled, the data must be owned by the user's tenant, the source data must not be linked, partial, or archived, no externally managed projects may be involved, the destination must not contain linked data, and source and destination must be in the same region.
import requests
import json
from config import PROJECT_ID, DATA_ID, API_KEY

url_get = "https://ica.illumina.com/ica/rest/api/projects/" + PROJECT_ID + "/data/" + DATA_ID

# set the API get headers
headers = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json'
}

# set the API put headers
headers_put = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json',
    'Content-Type': 'application/vnd.illumina.v3+json'
}

# Helper function to insert willBeArchivedAt after the field named 'region'
def insert_after_region(details_dict, timestamp):
    new_dict = {}
    for k, v in details_dict.items():
        new_dict[k] = v
        if k == 'region':
            new_dict['willBeArchivedAt'] = timestamp
        if 'willBeArchivedAt' in details_dict:
            new_dict['willBeArchivedAt'] = timestamp
    return new_dict

# 1. Make the GET request
response = requests.get(url_get, headers=headers)
response_data = response.json()

# 2. Modify the JSON data
timestamp = "2024-01-26T12:00:04Z"  # Replace with the provided timestamp
response_data['data']['details'] = insert_after_region(response_data['data']['details'], timestamp)

# 3. Make the PUT request
put_response = requests.put(url_get, data=json.dumps(response_data), headers=headers_put)
print(put_response.status_code)
Filtering
To add filters, select the funnel/filter symbol at the top right, next to the search field.
Filters are reset when you exit the current screen.
Sorting
To sort data, select the three vertical dots in the column header on which you want to sort and chose ascending or descending.
Sorting is retained when you exit the current screen.
Displaying Columns
To change which columns are displayed, select the three columns symbol and select which columns should be shown.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.
The displayed columns are retained when you exit the current screen.


Replace
Overwrites the existing data. Folders copy their contents into an existing folder with existing files: files with the same name are replaced, new files are added, and the remaining files in the target folder remain unchanged.
Don't copy
The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.
Keep both
Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.
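The decision logic of these three options can be summarised in a short sketch. This is only an illustration of the rules described above, not ICA code; the function names and the numbering format are hypothetical:

def append_number(file_name, n=1):
    # Append a number to the copied file's name, e.g. report.txt -> report(1).txt
    stem, dot, ext = file_name.rpartition(".")
    return f"{stem}({n}).{ext}" if dot else f"{file_name}({n})"

def resolve_copy(file_name, exists_in_destination, mode):
    # Return the action taken for one file under the selected copy option.
    if not exists_in_destination:
        return f"copy {file_name}"                    # new files are always added
    if mode == "replace":
        return f"overwrite {file_name}"               # existing file is replaced
    if mode == "dont_copy":
        return f"keep original {file_name}"           # existing file is left untouched
    if mode == "keep_both":
        return f"copy as {append_number(file_name)}"  # both versions are kept
    raise ValueError(f"unknown mode: {mode}")

print(resolve_copy("report.txt", True, "keep_both"))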






A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. This is not a copy but the actual pipeline, so any changes to the pipeline are automatically propagated to and from any project which has this pipeline linked.
You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your bundle or activation code.
If you unlink a pipeline, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines > +Create.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
Pipelines use the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it.
For pipeline authors sharing and distributing their pipelines, the draft, released, deprecated, and archived statuses provide a structured framework for managing pipeline availability, user communication, and transition planning. To change the pipeline status, select it at Projects > your_project > Pipelines > your_pipeline > change status.
The following sections describe the properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The details tab provides options for configuring basic information about the pipeline.
Code
The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.
Nextflow Version
User selectable Nextflow version available only for Nextflow pipelines
Categories
One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.
Description
A short description of the pipeline.
Proprietary
Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.
Status
The status of the pipeline.
Storage size
User-selectable storage size for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.
Family
A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.
Version comment
A description of changes in the updated version.
Links
External reference links. (max 100 chars as name and 2048 chars as link)
The following information becomes visible when viewing the pipeline details.
ID
Unique Identifier of the pipeline.
URN
Identification of the pipeline in Uniform Resource Name
The clone action is shown in the pipeline details at the top right. Cloning a pipeline allows you to make modifications without impacting the original pipeline. When you clone a pipeline, you become the owner of the clone and must give it a unique name, because duplicate names are not allowed across all projects of the tenant. You may still see the same pipeline name twice when a pipeline linked from another tenant was cloned under that name in your tenant; the name remains unique per tenant, but both are visible in your tenant.
When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.
The Documentation tab is where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
Machine profiles
The machine profiles available to use with Tools in the pipeline.
Shared settings
Settings used by more than one tool in the pipeline.
Reference files
Descriptions of reference files used in the pipeline.
Input files
Descriptions of input files used in the pipeline.
Output files
Descriptions of output files used in the pipeline.
Tool
Details about the tool selected in the visualization panel.
Tool repository
A list of tools available to be used in the pipeline.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
Safari is not supported as a browser for graphical editing.
This page is used to specify all relevant information about the pipeline parameters.
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB of memory from the base values shown in the table; for example, on a standard-medium node (4 CPUs, 16 GiB) roughly 3 CPUs and 14 GiB remain available to your process. Consumption will vary based on the activity of the pod.
Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small | 2 | 8 | standard-small | standard, small
standard-medium | 4 | 16 | standard-medium | standard, medium
standard-large | 8 | 32 | standard-large | standard, large
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge
standard-3xlarge | 64 | 256 | standard-3xlarge | standard, 3xlarge
hicpu-small | 16 | 32 | hicpu-small | hicpu, small
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium
hicpu-large | 72 | 144 | hicpu-large | hicpu, large
himem-small | 8 | 64 | himem-small | himem, small
himem-medium | 16 | 128 | himem-medium | himem, medium
himem-large | 48 | 384 | himem-large | himem, large
himem-xlarge (2) | 92 | 700 | himem-xlarge | himem, xlarge
hiio-small | 2 | 16 | hiio-small | hiio, small
hiio-medium | 4 | 32 | hiio-medium | hiio, medium
fpga2-medium (1) | 24 | 256 | fpga2-medium | fpga2, medium
fpga2-large (1) | 48 | 512 | fpga2-large | fpga2, large
fpga-medium (3) | 16 | 244 | fpga-medium | fpga, medium
fpga-large (3) | 64 | 976 | fpga-large | fpga, large
transfer-small (4) | 4 | 10 | transfer-small | transfer, small
transfer-medium (4) | 8 | 15 | transfer-medium | transfer, medium
transfer-large (4) | 16 | 30 | transfer-large | transfer, large
(3) FPGA1 instances will be decommissioned by Nov 1st 2025. Please migrate to F2 for improved capacity and performance with up to 40% reduced turnaround time for analysis.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
The Nextflow project main script.
The Nextflow configuration settings.
The Common Workflow Language main script.
Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.
See Metadata Models
On this tab, you can define patterns for detecting report files in the analysis output. When you open an analysis result window for this pipeline, an additional tab displays these report files. The goal is to provide a pipeline-specific, user-friendly representation of the analysis result.
To add a report, select the + symbol on the left side. Give the report a unique name and a regular expression matching the report file, and optionally select the format of the report. This must be the source format of the report data generated during the analysis.
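For example, a report pattern is simply a regular expression evaluated against the analysis output file names. The sketch below uses a hypothetical pattern and file names to show how such a pattern would select report files:

import re

# Hypothetical pattern a pipeline author might configure for an HTML report
report_pattern = re.compile(r".*_summary\.html$")

output_files = [
    "results/sample1_summary.html",
    "results/sample1.bam",
    "logs/run.log",
]

reports = [f for f in output_files if report_pattern.match(f)]
print(reports)  # ['results/sample1_summary.html']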
Use the following instructions to start a new analysis for a single pipeline.
Select Projects > your_project > Flow > Pipelines.
Select the pipeline you want to run, or open its pipeline details.
Select Start Analysis.
Configure analysis settings. (see below)
Select Start Analysis.
View the analysis status on the Analyses page.
Requested—The analysis is scheduled to begin.
In Progress—The analysis is in progress.
Succeeded—The analysis is complete.
Failed —The analysis has failed.
Aborted — The analysis was aborted before completing.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
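To monitor an analysis programmatically instead of watching the Analyses page, you can poll the project analyses endpoint in the same style as the archiving example earlier in this documentation. The endpoint path and the exact status values returned by the API are assumptions here; consult the API reference for the authoritative details:

import time
import requests
from config import PROJECT_ID, ANALYSIS_ID, API_KEY  # ANALYSIS_ID is assumed to be defined in config

# Assumed endpoint, following the URL pattern used elsewhere in this documentation
url = f"https://ica.illumina.com/ica/rest/api/projects/{PROJECT_ID}/analyses/{ANALYSIS_ID}"
headers = {
    "X-API-Key": API_KEY,
    "accept": "application/vnd.illumina.v3+json",
}

# Poll until the analysis reaches a terminal state
while True:
    status = requests.get(url, headers=headers).json().get("status")
    print("analysis status:", status)
    if status in ("SUCCEEDED", "FAILED", "ABORTED"):  # assumed API spellings of the terminal states
        break
    time.sleep(60)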
The Start Analysis screen provides the configuration options for the analysis.
User Reference
The unique analysis name.
Pipeline
This is not editable, but provides a link to the pipeline in case you want to look up its details.
User tags (optional)
One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Notification (optional)
Enter your email address if you want to be notified when the analysis completes.
Output Folder1
Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project.
When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax.
Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
Logs Folder
Select a folder in which the logs of the analysis should be located. When no logs folder is selected, the logs will be stored as subfolder in the output folder. When a logs folder is selected which is different from the output folder, the outputs and logs folders are separated.
Files that already exist in the logs folder will be overwritten with new versions.
When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax.
Note: Choose a folder that is empty and not in use for other analyses, as files will be overwritten.
Note: Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
Pricing
Select a subscription to which the analysis will be charged.
Input
Select the input files to use in the analysis. (max. 50,000)
Settings (optional)
Provide input settings.
Resources
Select the storage size for your analysis. The available storage sizes depend on your selected Pricing subscription. See for more information.
1 When using the API, you can redirect analysis outputs to be outside of the current project.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
You can view analysis results on the Analyses page or in the output folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
From the output files tab, expand the list if needed and select an output file.
If you want to add or remove any user or technical tags, you can do so from the data details view.
If you want to download the file, select Download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
Details - View information on the pipeline configuration.
Report - Shows the reports defined on the pipeline report tab.
Output files - View the output of the Analysis.
Steps - stderr and stdout information.
Nextflow timeline - Nextflow process execution timeline.
Nextflow execution - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.
Draft
Use the draft status while developing or testing a pipeline version internally.
Only share draft pipelines with collaborators who are actively involved in development.
Released
The released status signals that a pipeline is stable and ready for general use.
Share your pipeline when it is ready for broad use. Ensure users have access to current documentation and know where to find support or updates. Releasing a pipeline is only possible if all tools of that pipeline are in released status.
Deprecated
Deprecation is used when a pipeline version is scheduled for retirement or replacement. Deprecated pipelines can not be linked to bundles, but will not be unlinked from existing bundles. Users who already have access will still be able to start analyses. You can add a message (max 256 chars) when deprecating pipelines.
Deprecate in advance of archiving a pipeline, making sure the new pipeline is available in the same bundle as the deprecated pipeline. This will allow the pipeline author to link the new or alternative pipeline in the deprecation message field.
Archived
Archiving a pipeline version removes it from active use; users can no longer launch analyses. Archived pipelines can not be linked to bundles, but are not automatically unlinked from bundles or projects. You can add a message (max 256 chars) when archiving pipelines.
Warn users in advance: deprecate the pipeline before archiving to allow existing users time to transition. Use the archive message to point users to the new or alternative pipeline.
CWL Graphical
Details
Documentation
Definition
Analysis Report
Metadata Model
Report
CWL Code
Details
Documentation
Inputform files (JSON) or XML Configuration (XML)
CWL Files
Metadata Model
Report
Nextflow Code
Details
Documentation
Inputform Files (JSON) or XML Configuration (XML)
Nextflow files
Metadata Model
Report
1 DRAGEN pipelines running on fpga2 compute type will incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with additional discounts as shown below.
80 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 80 to 160 gigabase per sample - 20% discount - 0.08 iCredits per gigabase
> 160 to 240 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 240 to 320 gigabase per sample - 40% discount - 0.06 iCredits per gigabase
> 320 and more gigabase per sample - 50% discount - 0.05 iCredits per gigabase
If your DRAGEN job fails, only the compute cost is charged, no DRAGEN license cost will be charged.
DRAGEN Iterative gVCF Genotyper (iGG) will incur a license cost of 0.6216 iCredits per gigabase. For example, a 3.3-gigabase human reference sample will result in approximately 2 iCredits per sample. The associated compute costs will be based on the compute instance chosen.
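As a worked illustration of the tiers above (assuming the listed rate applies to the sample's entire volume once it falls into a bracket; actual billing is determined by Illumina and may differ), the license cost can be computed as follows:

def dragen_license_icredits(gigabases):
    # Illustrative only: pick the per-gigabase rate from the discount tiers listed above.
    if gigabases <= 80:
        rate = 0.10
    elif gigabases <= 160:
        rate = 0.08
    elif gigabases <= 240:
        rate = 0.07
    elif gigabases <= 320:
        rate = 0.06
    else:
        rate = 0.05
    return gigabases * rate

print(dragen_license_icredits(100))  # 100 gigabase at 0.08 -> 8.0 iCredits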
Find the links to CLI builds in the Releases section below.
In the Releases section below, select the matching operating system in the link column for the version you want to install. This will download the installer for that operating system.
To determine which CLI version you are currently using, navigate to your currently installed CLI and use the CLI command icav2 version. For help on this command, use icav2 version -h.
Checksums are provided alongside each downloadable CLI binary to verify file integrity. The checksums are generated using the SHA256 algorithm. To use the checksums:
Download the CLI binary for your OS
Download the corresponding checksum using the links in the table
Calculate the SHA256 checksum of the downloaded CLI binary
Compare the calculated SHA256 checksum with the downloaded checksum. If the checksums match, the integrity of the file is confirmed.
There are a variety of open source tools for calculating the SHA256 checksum. See the below tables for examples.
For CLI v2.3.0 and later:
Windows
CertUtil -hashfile ica-windows-amd64.zip SHA256
Linux
sha256sum ica-linux-amd64.zip
Mac
shasum -a 256 ica-darwin-amd64.zip
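If you prefer not to depend on platform-specific tools, the same verification can be done in Python (the binary and checksum file names below are examples; substitute the files you downloaded):

import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    # Compute the SHA256 digest of a file, reading it in chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

calculated = sha256_of("ica-linux-amd64.zip")
expected = open("ica-linux-amd64.zip.sha256").read().split()[0]  # example checksum file name
print("integrity confirmed" if calculated == expected else "checksum mismatch")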
2.40
2.39.0
2.38.0
No changes, use 2.37.0
-
2.37.0
2.36.0
2.35.0
2.34.0
2.33.0
2.32.2
2.31.0
2.30.0
2.29.0
2.28.0
2.27.0
2.26.0
2.25.0
2.24.0
2.23.0
2.22.0
2.21.0
2.19.0
2.18.0
2.17.0
2.16.0
2.15.0
2.12.0
2.10.0
2.9.0
2.8.0
2.4.0
2.3.0
2.2.0
2.1.0
2.0.0
Note: To access release history of CLI versions prior to v2.0.0, please see the ICA v1 documentation here.
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
After ingesting data into your project, phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
Post ingestion, data will be represented in Base.
Select BASE from the ICA left-navigation and click Query.
Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project <your project name>.
Cohorts tables will be displayed.
To preview the tables and fields, click each view listed.
Clicking any of these views then selecting PREVIEW on the right-hand side will show you a preview of the data in the tables.
SAMPLE_BARCODE
STRING
Sample Identifier
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
AGE
NUMERIC
Age in years
SEX
STRING
Sex field to drive annotation
POPULATION
STRING
Population Designation for 1000 Genomes Project
SUPERPOPULATION
STRING
Superpopulation Designation from 1000 Genomes Project
RACE
STRING
Race according to NIH standard
CONDITION_ONTOLOGIES
VARIANT
Diagnosis Ontology Source
CONDITION_IDS
VARIANT
Diagnosis Concept Ids
CONDITIONS
VARIANT
Diagnosis Names
HARMONIZED_CONDITIONS
VARIANT
Diagnosis High-level concept to drive UI
LIBRARYTYPE
STRING
Sequencing technology
ANALYTE
STRING
Substance sequenced
TISSUE
STRING
Tissue source
TUMOR_OR_NORMAL
STRING
Tumor designation for somatic
GENOMEBUILD
STRING
Genome Build to drive annotations - hg38 only
SAMPLE_BARCODE_VCF
STRING
Sample ID from VCF
AFFECTED_STATUS
NUMERIC
Affected, Unaffected, or Unknown for Family Based Analysis
FAMILY_RELATIONSHIP
STRING
Relationship designation for Family Based Analysis
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
DATATYPE
ARRAY
The categorization of molecular data
TECHNOLOGY
ARRAY
The sequencing method
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of last update of record
This table is an entity-attribute value table of supplied sample data matching Cohorts accepted attributes.
SAMPLE_ BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
NAME
STRING
Study name
CREATEDATE
DATE
Date and time of study creation
LASTUPDATEDATE
DATE
Date and time of record update
SUBJECTID
STRING
Original identifier for the subject record
AGE
FLOAT
Age entered on subject record if applicable
SEX
STRING
-
ETHNICITY
STRING
-
STUDY
STRING
Study subject belongs to
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of record update
This table is an entity-attribute value table of supplied subject data matching Cohorts accepted attributes.
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for disease term
OCCURRENCES
STRING
List of occurrence related data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for drug term
OCCURRENCES
STRING
List of occurrence related data of drug exposure
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for measurement term
OCCURRENCES
STRING
List of occurrences and values related to lab or measurement data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for procedure term
OCCURRENCES
STRING
List of occurrences and values related procedure data
This table will be available for all projects with ingested molecular data
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
CHROMOSOMEID
NUMERIC
Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt
DBSNP
STRING
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
NIRVANA_VID
STRING
Broad Institute VID: "1-12345678-A-C"
VARIANT_TYPE
STRING
Description of Variant Type (e.g. SNV, Deletion, Insertion)
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
DENOVO
BOOLEAN
true / false
GENOTYPE
STRING
"G|T"
READ_DEPTH
NUMERIC
Sequencing read depth
ALLELE_COUNT
NUMERIC
Counts of each alternate allele for each site across all samples
ALLELE_DEPTH
STRING
Unfiltered count of reads that support a given allele for an individual sample
FILTERS
STRING
Filter field from VCF. If all filters pass, field is PASS
ZYGOSITY
NUMERIC
0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
GID
NUMERIC
NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
STRING
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSC
STRING
The HGVS coding sequence name
HGVSP
STRING
The HGVS protein sequence name
This table will only be available for data sets with ingested Somatic molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode, used in VCF column
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
DBSNP
NUMERIC
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
MUTATION_TYPE
NUMERIC
Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
GENOTYPE
STRING
"G|T"
REF_ALLELE
STRING
Reference allele
ALLELE1
STRING
First allele call in the tumor sample
ALLELE2
STRING
Second allele call in the tumor sample
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
BOOLEAN
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSP
STRING
HGVS nomenclature for AA change: p.Pro72Ala
This table will only be available for data sets with ingested CNV molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
GENE_ID
STRING
NCBI or Ensembl gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
START_POS
NUMERIC
First affected position on the chromosome
STOP_POS
NUMERIC
Last affected position on the chromosome
VARIANT_TYPE
NUMERIC
1 = copy number gain, -1 = copy number loss
COPY_NUMBER
NUMERIC
Observed copy number
COPY_NUMBER_CHANGE
NUMERIC
Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline
SEGMENT_VALUE
NUMERIC
Average FC for the identified chromosomal segment
PROBE_COUNT
NUMERIC
Probes confirming the CNV (arrays only)
REFERENCE
NUMERIC
Baseline taken from normal samples (1) or averaged disease tissue (2)
GENE_HGNC
STRING
HUGO/HGNC gene symbol
This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
BEGIN
NUMERIC
First affected position on the chromosome
END
NUMERIC
Last affected position on the chromosome
BAND
STRING
Chromosomal band
QUALIITY
NUMERIC
Quality from the original VCF
FILTERS
ARRAY
Filters from the original VCF
VARIANT_TYPE
STRING
Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2")
VARIANT_TYPE_ID
NUMERIC
21=insertion, 22=deletion, 23=indel, 24=tandem_duplication, 25=translocation_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2")
CIPOS
ARRAY
Confidence interval around first position
CIEND
ARRAY
Confidence interval around last position
SVLENGTH
NUMERIC
Overall size of the structural variant
BONDCHR
STRING
For translocations, the other affected chromosome
BONDCID
NUMERIC
For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25
BONDPOS
STRING
For translocations, positions on the other affected chromosome
BONDORDER
NUMERIC
3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' end of the other chromosome fragment
GENOTYPE
STRING
Called genotype from the VCF
GENOTYPE_QUALITY
NUMERIC
Genotype call quality
READCOUNTSSPLIT
ARRAY
Read counts
READCOUNTSPAIRED
ARRAY
Read counts, paired end
REGULATORYREGIONID
STRING
Ensembl ID for the affected regulatory region
REGULATORYREGIONTYPE
STRING
Type of the regulatory region
CONSEQUENCE
ARRAY
Variant consequence according to SequenceOntology
TRANSCRIPTID
STRING
Ensembl or RefSeq transcript identifier
TRANSCRIPTBIOTYPE
STRING
Biotype of the transcript
INTRONS
STRING
Count of impacted introns out of the total number of introns, specified as "M/N"
GENEID
STRING
Ensembl or RefSeq gene identifier
GENEHGNC
STRING
HUGO/HGNC gene symbol
ISCANONICAL
BOOLEAN
Is the transcript ID the canonical one according to Ensembl?
PROTEINID
STRING
RefSeq or Ensembl protein ID
SOURCEID
NUMERICAL
Gene model: 1=Ensembl, 2=RefSeq
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for gene quantification results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
LABEL
STRING
Group label specified during import: Case or Control, Tumor or Normal, etc.
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
TPM
NUMERICAL
Transcripts per million
LENGTH
NUMERICAL
The length of the gene in base pairs.
EFFECTIVE_LENGTH
NUMERICAL
The length as accessible to RNA-seq, accounting for insert-size and edge effects.
NUM_READS
NUMERICAL
The estimated number of reads from the gene. The values are not normalized.
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for differential gene expression results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
CASE_LABEL
STRING
Study designation
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
BASEMEAN
NUMERICAL
FC
NUMERICAL
Fold-change
LFC
NUMERICAL
Log of the fold-change
LFCSE
NUMERICAL
Standard error for log fold-change
PVALUE
NUMERICAL
P-value
CONTROL_SAMPLECOUNT
NUMERICAL
Number of samples used as control
CONTROL_LABEL
NUMERICAL
Label used for controls
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
The build number, together with the libraries used and their licenses, is provided in the accompanying readme file.
Command line interface for the Illumina Connected Analytics, a genomics platform-as-a-service
Usage:
icav2 [command]
Available Commands:
analysisstorages Analysis storages commands
completion Generate the autocompletion script for the specified shell
config Config actions
dataformats Data format commands
help Help about any command
jobs Job commands
metadatamodels Metadata model commands
pipelines Pipeline commands
projectanalyses Project analyses commands
projectdata Project Data commands
projectpipelines Project pipeline commands
projects Project commands
projectsamples Project samples commands
regions Region commands
storagebundles Storage bundle commands
storageconfigurations Storage configurations commands
tokens Tokens commands
version The version of this application
Flags:
-t, --access-token string JWT used to call rest service
-h, --help help for icav2
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-v, --version version for icav2
-k, --x-api-key string api key used to call rest service
Use "icav2 [command] --help" for more information about a command.This is the root command for actions that act on analysis storages
Usage:
icav2 analysisstorages [command]
Available Commands:
list list of storage id's
Flags:
-h, --help help for analysisstorages
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 analysisstorages [command] --help" for more information about a command.This command lists all the analysis storage id's
Usage:
icav2 analysisstorages list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command generates custom completion functions for the icav2 tool. These functions facilitate the generation of context-aware suggestions based on the user's input and specific directives provided by the icav2 tool. For example, for the ZSH shell the completion function _icav2() is generated. It can provide suggestions for available commands, flags, and arguments depending on the context, making it easier for the user to interact with the tool without having to constantly refer to documentation.
To enable this custom completion function, you would typically include it in your Zsh configuration (e.g., in .zshrc or a separate completion script) and then use the compdef command to associate the function with the icav2 command:
compdef _icav2 icav2
This way, when the user types icav2 followed by a space and presses the TAB key, Zsh will call the _icav2 function to provide context-aware suggestions based on the user's input and the icav2 tool's directives.
Generate the autocompletion script for icav2 for the specified shell.
See each sub-command's help for details on how to use the generated script.
Usage:
icav2 completion [command]
Available Commands:
bash Generate the autocompletion script for bash
fish Generate the autocompletion script for fish
powershell Generate the autocompletion script for powershell
zsh Generate the autocompletion script for zsh
Flags:
-h, --help help for completion
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 completion [command] --help" for more information about a command.Generate the autocompletion script for the bash shell.
This script depends on the 'bash-completion' package.
If it is not installed already, you can install it via your OS's package manager.
To load completions in your current shell session:
source <(icav2 completion bash)
To load completions for every new session, execute once:
#### Linux:
icav2 completion bash > /etc/bash_completion.d/icav2
#### macOS:
icav2 completion bash > $(brew --prefix)/etc/bash_completion.d/icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion bash
Flags:
-h, --help help for bash
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the fish shell.
To load completions in your current shell session:
icav2 completion fish | source
To load completions for every new session, execute once:
icav2 completion fish > ~/.config/fish/completions/icav2.fish
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion fish [flags]
Flags:
-h, --help help for fish
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for powershell.
To load completions in your current shell session:
icav2 completion powershell | Out-String | Invoke-Expression
To load completions for every new session, add the output of the above command
to your powershell profile.
Usage:
icav2 completion powershell [flags]
Flags:
-h, --help help for powershell
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the zsh shell.
If shell completion is not already enabled in your environment you will need
to enable it. You can execute the following once:
echo "autoload -U compinit; compinit" >> ~/.zshrc
To load completions in your current shell session:
source <(icav2 completion zsh)
To load completions for every new session, execute once:
#### Linux:
icav2 completion zsh > "${fpath[1]}/_icav2"
#### macOS:
icav2 completion zsh > $(brew --prefix)/share/zsh/site-functions/_icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion zsh [flags]
Flags:
-h, --help help for zsh
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Config command provides functions for CLI configuration management.
Usage:
icav2 config [command]
Available Commands:
get Get configuration information
reset Remove the configuration information
set Set configuration information
Flags:
-h, --help help for config
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 config [command] --help" for more information about a command.Get configuration information.
Usage:
icav2 config get [flags]
Flags:
-h, --help help for get
Remove configuration information.
Usage:
icav2 config reset [flags]
Flags:
-h, --help help for reset
Set configuration information. The following information is asked for when starting the command:
- server-url : used to form the url for the rest api's.
- x-api-key : api key used to fetch the JWT used to authenticate to the API server.
- colormode : set depending on your background color of your terminal. Input's and errors are colored. Default is 'none', meaning that no colors will be used in the output.
- table-format : Output layout, defaults to a table, other allowed values are json and yaml
Usage:
icav2 config set [flags]
Flags:
-h, --help help for set
This is the root command for actions that act on Data formats
Usage:
icav2 dataformats [command]
Available Commands:
list List data formats
Flags:
-h, --help help for dataformats
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 dataformats [command] --help" for more information about a command.This command lists the data formats you can use inside of a project
Usage:
icav2 dataformats list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Help provides help for any command in the application.
Simply type icav2 help [path to command] for full details.
Usage:
icav2 help [command] [flags]
Flags:
-h, --help help for help
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on jobs
Usage:
icav2 jobs [command]
Available Commands:
get Get details of a job
Flags:
-h, --help help for jobs
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 jobs [command] --help" for more information about a command.This command fetches the details of a job using the argument as an id (uuid).
Usage:
icav2 jobs get [job id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on metadata models
Usage:
icav2 metadatamodels [command]
Available Commands:
list list of metadata models
Flags:
-h, --help help for metadatamodels
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 metadatamodels [command] --help" for more information about a command.This command lists all the metadata models
Usage:
icav2 metadatamodels list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on pipelines
Usage:
icav2 pipelines [command]
Available Commands:
get Get details of a pipeline
list List pipelines
Flags:
-h, --help help for pipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 pipelines [command] --help" for more information about a command.This command fetches the details of a pipeline without a project context
Usage:
icav2 pipelines get [pipeline id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines without the context of a project
Usage:
icav2 pipelines list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project analyses
Usage:
icav2 projectanalyses [command]
Available Commands:
get Get the details of an analysis
input Retrieve input of analyses commands
list List of analyses for a project
output Retrieve output of analyses commands
update Update tags of analyses
Flags:
-h, --help help for projectanalyses
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectanalyses [command] --help" for more information about a command.This command returns all the details of a analysis.
Usage:
icav2 projectanalyses get [analysis id] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve input of analyses commands
Usage:
icav2 projectanalyses input [analysisId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the analyses for a given project. Sorting can be done on
- reference
- userReference
- pipeline
- status
- startDate
- endDate
- summary
Usage:
icav2 projectanalyses list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
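Example (using a placeholder project id; the sort expression is assumed to follow the same 'field direction' form shown for other list commands): list analyses sorted by most recent start date
icav2 projectanalyses list --project-id <project_id> --sort-by "startDate desc" --max-items 50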
Retrieve output of analyses commands
Usage:
icav2 projectanalyses output [analysisId] [flags]
Flags:
-h, --help help for output
--project-id string project ID to set current project context
--raw-output Add this flag if output should be in raw format. Applies only for Cwl pipelines ! This flag needs no value, adding it sets the value to true.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Updates the user and technical tags of an analysis
Usage:
icav2 projectanalyses update [analysisId] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add to analysis. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add to analysis. Add flag multiple times for multiple values.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove from analysis. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove from analysis. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project data
Usage:
icav2 projectdata [command]
Available Commands:
archive archive data
copy Copy data to a project
create Create data id for a project
delete delete data
download Download a file/folder
downloadurl get download url
folderuploadsession Get details of a folder upload
get Get details of a data
link Link data to a project
list List data
mount Mount project data
move Move data to a project
temporarycredentials fetch temporal credentials for data
unarchive unarchive data
unlink Unlink data to a project
unmount Unmount project data
update Updates the details of a data
upload Upload a file/folder
Flags:
-h, --help help for projectdata
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectdata [command] --help" for more information about a command.This command archives data for a given project
Usage:
icav2 projectdata archive [path or data Id] [flags]
Flags:
-h, --help help for archive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
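Example (placeholder path and project id): archive a file in the current project by path
icav2 projectdata archive /SOURCE/sample1.bam --project-id <project_id>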
This command copies data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata copy [data id] or [path] [flags]
Flags:
--action-on-exist string what to do when a file or folder with the same name already exists: OVERWRITE|SKIP|RENAME (default "SKIP")
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--copy-instrument-info copy instrument info from source data to destination data
--copy-technical-tags copy technical tags from source data to destination data
--copy-user-tags copy user tags from source data to destination data
--destination-folder string folder id or path to where you want to copy the data, default root of project
-h, --help help for copy
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be copied, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
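Example (placeholder ids and paths): copy a file from another project into the /COPIES/ folder of the current project, skipping files that already exist
icav2 projectdata copy /SOURCE/sample1.fastq.gz --source-project-id <source_project_id> --destination-folder /COPIES/ --project-id <project_id> --action-on-exist SKIP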
This command creates data in a project. It takes the name of the file/folder as an argument.
Usage:
icav2 projectdata create [name] [flags]
Flags:
--data-type string (*) Data type : FILE or FOLDER
--folder-id string Id of the folder
--folder-path string Folder path under which the new project data will be created.
--format string Only allowed for file, sets the format of the file.
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
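Example (placeholder folder name and project id): create a new folder in the root of the current project
icav2 projectdata create MyFolder --data-type FOLDER --project-id <project_id>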
This command deletes data for a given project
Usage:
icav2 projectdata delete [path or dataId] [flags]
Flags:
-h, --help help for delete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Download a file/folder. The source path can be a data id or a path. The source path for downloading a folder should end with '*'. For files: the target defines either the local folder into which the download will occur, or a path with a new name for the file. If the file already exists locally, it is overwritten. For folders: if the folder does not exist locally, it will be created automatically. Overwriting an existing folder will need to be acknowledged.
Usage:
icav2 projectdata download [source data id or path] [target path] [flags]
Flags:
--exclude string Regex filter for file names to exclude from download.
--exclude-source-path Indicates that on folder download, the CLI will not create the parent folders of the downloaded folder in ICA on your local machine.
-h, --help help for download
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example 1
Using this command, all the files starting with VariantCaller- will be downloaded (prerequisite: the jq tool is installed on the machine):
icav2 projectdata list --data-type FILE --file-name VariantCaller- --match-mode FUZZY -o json | jq -r '.items[].id' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done;
Example 2
Here is an example of how to download all BAM files from a project (we are using some jq features to exclude '.bam.bai' and '.bam.md5sum' files):
icav2 projectdata list --file-name .bam --match-mode FUZZY -o json | jq -r '.items[] | select(.details.format.code == "BAM") | [.id] | @tsv' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done
Tip: If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
This command returns the data download url for a given project
Usage:
icav2 projectdata downloadurl [path or data Id] [flags]
Flags:
-h, --help help for downloadurl
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a folder upload
Usage:
icav2 projectdata folderuploadsession [project id] [data id] [folder upload session id] [flags]
Flags:
-h, --help help for folderuploadsession
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of the specified data
Usage:
icav2 projectdata get [data id] or [path] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links data to a project. Use the data id, or the path plus the source project flag, to identify the data.
Usage:
icav2 projectdata link [data id] or [path] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be linked
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
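Example (placeholder ids and path): link a file from another project into the current project, first by data id and then by path
icav2 projectdata link <file_id> --project-id <project_id>
icav2 projectdata link /SOURCE/sample1.bam --source-project-id <source_project_id> --project-id <project_id>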
It is best practice to always surround your path with quotes if you want to use the * wildcard. Otherwise, you may run into situations where the command results in "accepts at most 1 arg(s), received x" as it returns folders with the same name, but different numbers of subfolders.
For more information on how to use pagination, please refer to Cursor- versus Offset-based Pagination
If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
This command lists the data for a given project. Page-offset can only be used in combination with sort-by. Sorting can be done on
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectdata list [path] [flags]
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--eligible-link Add this flag if output should contain only the data that is eligible for linking on the current project. This flag needs no value, adding it sets the value to true.
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for list
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example to list files in the folder SOURCE
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/
Example to list only subfolders in the folder SOURCE
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/ --data-type FOLDER
This command mounts the project data as a file system directory for a given project
Usage:
icav2 projectdata mount [mount directory path] [flags]
Flags:
--allow-other Allow other users to access this project
-h, --help help for mount
--list List currently mounted projects
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
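Example (placeholder local directory and project id): mount the current project data under a local directory, then list the mounted projects
icav2 projectdata mount /mnt/ica-project --project-id <project_id>
icav2 projectdata mount --list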
This command moves data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata move [data id] or [path] [flags]
Flags:
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--destination-folder string folder id or path to where you want to move the data, default root of project
-h, --help help for move
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be moved, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
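Example (placeholder ids and paths): move a folder from another project into the /ARCHIVE/ folder of the current project
icav2 projectdata move /RUNS/run1/ --source-project-id <source_project_id> --destination-folder /ARCHIVE/ --project-id <project_id>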
This command fetches temporary AWS and Rclone credentials for given project data. If a path is given, the project id from the --project-id flag is used. If the flag is not present, the project is taken from the context.
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unarchives data for a given project
Usage:
icav2 projectdata unarchive [path or dataId] [flags]
Flags:
-h, --help help for unarchive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This unlinks data from a project. Use path or id to identify the data.
Usage:
icav2 projectdata unlink [data id] or [path] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unmounts previously mounted project data
Usage:
icav2 projectdata unmount [flags]
Flags:
--directory-path string Set path to unmount
-h, --help help for unmount
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates some details of the data. Only user/tech tags, the format, and the will-be-archived-at/will-be-deleted-at dates can be updated.
Usage:
icav2 projectdata update [data id] or [path] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
--format-code string Format to assign to the data. Only available for files.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
--will-be-archived-at string Time when data will be archived. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
--will-be-deleted-at string Time when data will be deleted. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
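Example (placeholder file id and date): add a user tag to a file and schedule it for deletion
icav2 projectdata update <file_id> --add-user-tag validated --will-be-deleted-at 2026-12-31 --project-id <project_id>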
Upload a file/folder. For files: if the target path does not already exist, it will be created automatically. For folders: overwrite will need to be acknowledged. The "icapath" argument is optional.
Usage:
icav2 projectdata upload [local path] [icapath] [flags]
Flags:
--existing-sample Link to existing sample
-h, --help help for upload
--new-sample Create and link to new sample
--num-workers int number of workers to parallelize. Default calculated based on CPUs available.
--project-id string project ID to set current project context
--sample-description string Set Sample Description for new sample
--sample-id string Set Sample id of existing sample
--sample-name string Set Sample name for new sample or from existing sample
--sample-technical-tag stringArray Set Sample Technical tag for new sample
--sample-user-tag stringArray Set Sample User tag for new sample
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example for uploading multiple files
In this example, all the fastq.gz files from source will be uploaded to target using the xargs utility.
find $source -name '*.fastq.gz' | xargs -n 1 -P 10 -I {} icav2 projectdata upload {} /$target/
Example for uploading multiple files using a CSV file
In this example we upload multiple bam files, each specified with its corresponding path in the file bam_files.csv. The files will be renamed. We are using screen in detached mode (this creates a new session without attaching to it):
while IFS=, read -r current_bam_file_name bam_path new_bam_file_name
do
screen -d -m icav2 projectdata upload ${bam_path}/${current_bam_file_name} /bam_files/${new_bam_file_name} --project-id $projectID
done <./bam_files.csv 2>./log.txt
This is the root command for actions that act on project pipelines
Usage:
icav2 projectpipelines [command]
Available Commands:
create Create a pipeline
input Retrieve input parameters of pipeline
link Link pipeline to a project
list List of pipelines for a project
start Start a pipeline
unlink Unlink pipeline from a project
Flags:
-h, --help help for projectpipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines [command] --help" for more information about a command.This command creates a pipeline in the current project
Usage:
icav2 projectpipelines create [command]
Available Commands:
cwl Create a cwl pipeline
cwljson Create a cwl Json pipeline
nextflow Create a nextflow pipeline
nextflowjson Create a nextflow Json pipeline
Flags:
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines create [command] --help" for more information about a command.icav2 projectpipelines create cwl
This command creates a CWL pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwl [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwl
--html-doc string Html documentation for the cwl pipeline
--links string links in json format
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
icav2 projectpipelines create cwljson
This command creates a CWL Json pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwljson [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwljson
--html-doc string Html documentation for the cwl pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
icav2 projectpipelines create nextflow
This command creates a Nextflow pipeline in the current project
Usage:
icav2 projectpipelines create nextflow [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflow
--html-doc string Html documentation for the nextflow pipeline
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
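Example (placeholder file names and pipeline code; the storage size name is assumed to be one of the names returned by 'icav2 analysisstorages list', e.g. Small): create a Nextflow pipeline from a local main.nf and parameter XML file
icav2 projectpipelines create nextflow my-nextflow-pipeline --main main.nf --parameter parameters.xml --description "Demo pipeline" --storage-size Small --project-id <project_id>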
icav2 projectpipelines create nextflowjson
This command creates a Nextflow Json pipeline in the current project
Usage:
icav2 projectpipelines create nextflowjson [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflowjson
--html-doc string Html documentation for the nextflow pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve input parameters of pipeline
Usage:
icav2 projectpipelines input [pipelineId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links a pipeline to a project. Use code or id to identify the pipeline. If the code is not found, the argument is used as an id.
Usage:
icav2 projectpipelines link [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the pipeline needs to be linked, mandatory when using pipeline code
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines for a given project
Usage:
icav2 projectpipelines list [flags]
Flags:
-h, --help help for list
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command starts a pipeline in the current project
Usage:
icav2 projectpipelines start [command]
Available Commands:
cwl Start a CWL pipeline
cwljson Start a CWL Json pipeline
nextflow Start a Nextflow pipeline
nextflowjson Start a Nextflow Json pipeline
Flags:
-h, --help help for start
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines start [command] --help" for more information about a command.icav2 projectpipelines start cwl
This command starts a CWL pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start cwl [pipeline id] or [code] [flags]
Flags:
--data-id stringArray Enter data id's as follows : dataId{optional-mount-path} . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces.
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for cwl
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
--input-json string Analysis input JSON string. JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--type-input string (*) Input type STRUCTURED or JSON
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
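Example (placeholder pipeline id, file id and storage size; 'reads' and 'threads' stand in for parameter codes defined by the pipeline's parameter XML): start a CWL pipeline with structured input, one input file and a single-value parameter
icav2 projectpipelines start cwl <pipeline_id> --type-input STRUCTURED --input reads:<file_id> --parameters threads:4 --user-reference my-cwl-run-01 --storage-size Small --project-id <project_id>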
icav2 projectpipelines start cwljson
This command starts a CWL Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start cwljson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for cwljson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id
matches
"fields": [
{
"id": "fieldA",
"values": [
"valueA"
]
},
{
"id": "multivalueFieldB",
"values": [
"valueB1",
"valueB2"
]
},
{
"id": "DataFieldC",
"values": [
"file.id"
]
}
]
The following example with --field and --field-data
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
matches
"fields": [
{
"id": "asection",
"values": [
"SECTION1"
]
},
{
"id": "atext",
"values": [
"this is atext text"
]
},
{
"id": "ttt",
"values": [
"tb1"
]
},
{
"id": "notallowedrole",
"values": [
"f"
]
},
{
"id": "notallowedcondition",
"values": [
"this is a not allowed text box"
]
},
{
"id": "maxagesum",
"values": [
"20"
]
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
}
],
"id": "txts1"
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
"mountPath": "/dir1/dir2"
},
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
"mountPath": "/dir3/dir4"
}
],
"id": "txts2"
}
],
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group2.0.roleForGroup2:f
"groups": [
{
"id": "group1",
"values": [
{
"values": [
{
"id": "age",
"values": [
"80"
]
},
{
"id": "role",
"values": [
"f"
]
},
{
"id": "conditions",
"values": [
"cancer",
"covid"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
},
{
"values": [
{
"id": "age",
"values": [
"20"
]
},
{
"id": "role",
"values": [
"m"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
}
]
},
{
"id": "group2",
"values": [
{
"values": [
{
"id": "roleForGroup2",
"values": [
"f"
]
}
]
}
]
}
]
icav2 projectpipelines start nextflow
This command starts a Nextflow pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start nextflow [pipeline id] or [code] [flags]
Flags:
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for nextflow
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
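Example (placeholder pipeline id, file id and storage size; 'fastqs' stands in for a parameter code defined by the pipeline's parameter XML): start a Nextflow pipeline with one input file
icav2 projectpipelines start nextflow <pipeline_id> --input fastqs:<file_id> --user-reference my-nextflow-run-01 --storage-size Small --project-id <project_id>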
icav2 projectpipelines start nextflowjson
This command starts a Nextflow Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start nextflowjson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for nextflowjson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id
matches
"fields": [
{
"id": "fieldA",
"values": [
"valueA"
]
},
{
"id": "multivalueFieldB",
"values": [
"valueB1",
"valueB2"
]
},
{
"id": "DataFieldC",
"values": [
"file.id"
]
}
]
The following example with --field and --field-data
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
matches
"fields": [
{
"id": "asection",
"values": [
"SECTION1"
]
},
{
"id": "atext",
"values": [
"this is atext text"
]
},
{
"id": "ttt",
"values": [
"tb1"
]
},
{
"id": "notallowedrole",
"values": [
"f"
]
},
{
"id": "notallowedcondition",
"values": [
"this is a not allowed text box"
]
},
{
"id": "maxagesum",
"values": [
"20"
]
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
}
],
"id": "txts1"
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
"mountPath": "/dir1/dir2"
},
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
"mountPath": "/dir3/dir4"
}
],
"id": "txts2"
}
],
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group2.0.roleForGroup2:f
"groups": [
{
"id": "group1",
"values": [
{
"values": [
{
"id": "age",
"values": [
"80"
]
},
{
"id": "role",
"values": [
"f"
]
},
{
"id": "conditions",
"values": [
"cancer",
"covid"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
},
{
"values": [
{
"id": "age",
"values": [
"20"
]
},
{
"id": "role",
"values": [
"m"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
}
]
},
{
"id": "group2",
"values": [
{
"values": [
{
"id": "roleForGroup2",
"values": [
"f"
]
}
]
}
]
}
]
This unlinks a pipeline from a project. Use code or id to identify the pipeline. If the code is not found, the argument is used as an id.
Usage:
icav2 projectpipelines unlink [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on projects
Usage:
icav2 projects [command]
Available Commands:
create Create a project
enter Enter project context
exit Exit project context
get Get details of a project
list List projects
Flags:
-h, --help help for projects
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projects [command] --help" for more information about a command.This command creates a project.
Usage:
icav2 projects create [projectname] [flags]
Flags:
--billing-mode string Billing mode, defaults to PROJECT (default "PROJECT")
--data-sharing Indicates whether the data and samples created in this project can be linked to other Projects. This flag needs no value, adding it sets the value to true.
-h, --help help for create
--info string Info about the project
--metadata-model string Id of the metadata model.
--owner string Owner of the project. Default is the current user
--region string Region of the project. When not specified: takes a default when there is only 1 region, else a choice will be given.
--short-descr string Short description of the project
--storage-bundle string Id of the storage bundle. When not specified: takes a default when there is only 1 bundle, else a choice will be given.
--storage-config string An optional storage configuration id to have self managed storage.
--storage-config-sub-folder string Required when specifying a storageConfigurationId. The subfolder determines the object prefix of your self managed storage.
--technical-tag stringArray Technical tags for this project. Add flag multiple times for multiple values.
--user-tag stringArray User tags for this project. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
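Example (placeholder names; region and storage bundle are left to their defaults, which apply when only one choice exists): create a project with a short description and a user tag
icav2 projects create MyProject --short-descr "Demo project" --user-tag demo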
This command sets the project context for future commands
Usage:
icav2 projects enter [projectname] or [project id] [flags]
Flags:
-h, --help help for enter
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command switches the user back to their personal context
Usage:
icav2 projects exit [flags]
Flags:
-h, --help help for exit
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of the current project. If no project id is given, the one from the config file is used.
Usage:
icav2 projects get [project id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the projects for the current user. Page-offset can only be used in combination with sort-by. Sorting can be done on
- name
- shortDescription
- information
Usage:
icav2 projects list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project samples
Usage:
icav2 projectsamples [command]
Available Commands:
complete Set sample to complete
create Create a sample for a project
delete Delete a sample for a project
get Get details of a sample
link Link data to a sample for a project
list List of samples for a project
listdata List data from given sample
unlink Unlink data from a sample for a project
update Update a sample for a project
Flags:
-h, --help help for projectsamples
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectsamples [command] --help" for more information about a command.The sample status will be set to 'Available' and a sample completed event will be triggered as well.
Usage:
icav2 projectsamples complete [sampleId] [flags]
Flags:
-h, --help help for complete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command creates a sample for a project. It takes the name of the sample as argument.
Usage:
icav2 projectsamples create [name] [flags]
Flags:
--description string Description
-h, --help help for create
--project-id string project ID to set current project context
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
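Example (placeholder names and project id): create a sample with a description and a user tag in the current project
icav2 projectsamples create Sample_01 --description "Tumor sample" --user-tag cohort-A --project-id <project_id>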
This command deletes a sample from a project. The flags determine how the sample is deleted; only one flag can be used.
Usage:
icav2 projectsamples delete [sampleId] [flags]
Flags:
--deep Delete the entire sample: sample and linked files will be deleted from your project.
-h, --help help for delete
--mark Mark a sample as deleted.
--unlink Unlinking the sample: sample is deleted and files are unlinked and available again for linking to another sample.
--with-input Delete the sample as well as its input data: sample is deleted from your project, the input files and pipeline output folders are still present in the project but will not be available for linking to a new sample.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a sample using the argument as a name; if nothing is found, the argument is used as an id (uuid).
Usage:
icav2 projectsamples get [sample id] or [name] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command adds data to a project sample. The argument is the id of the project sample.
Usage:
icav2 projectsamples link [sampleId] [flags]
Flags:
--data-id stringArray (*) Data id of the data that needs to be linked to the project sample. Add flag multiple times for multiple values.
-h, --help help for link
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the samples for a given project.
Usage:
icav2 projectsamples list [flags]
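Example (illustrative only; <projectId> and the tag value are placeholders):
  icav2 projectsamples list --project-id <projectId> --user-tag demo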
Flags:
-h, --help help for list
--include-deleted Include the deleted samples in the list. Default set to false.
--project-id string project ID to set current project context
--technical-tag stringArray Technical tags to filter on. Add flag multiple times for multiple values.
--user-tag stringArray User tags to filter on. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the data for a given sample. It supports only offset-based pagination, and default sorting is done on timeCreated. Sorting can be done on:
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectsamples listdata [sampleId] [path] [flags]
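Example (illustrative only; <sampleId>, <projectId>, and <fileName> are placeholders):
  icav2 projectsamples listdata <sampleId> --project-id <projectId> --data-type FILE --file-name <fileName> --match-mode FUZZY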
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for listdata
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is the path of the parent folder. All children are selected for listing, not the folder itself. This flag needs no value; adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items (default "timeCreated Desc")
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command removes data from a project sample. The argument is the id of the project sample.
Usage:
icav2 projectsamples unlink [sampleId] [flags]
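Example (illustrative only; <sampleId>, <projectId>, and <dataId> are placeholders):
  icav2 projectsamples unlink <sampleId> --project-id <projectId> --data-id <dataId>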
Flags:
--data-id stringArray (*) Data id of the data that will be removed from the project sample. Add flag multiple times for multiple values.
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates a sample for a project. The name, description, user tags, and technical tags can be updated.
Usage:
icav2 projectsamples update [sampleId] [flags]
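Example (illustrative only; <sampleId>, <projectId>, and the name and tag values are placeholders):
  icav2 projectsamples update <sampleId> --project-id <projectId> --name RenamedSample --add-user-tag validated --remove-user-tag draft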
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
-h, --help help for update
--name string Name
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on regions.
Usage:
icav2 regions [command]
Available Commands:
list list of regions
Flags:
-h, --help help for regions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 regions [command] --help" for more information about a command.This command lists all the regions
Usage:
icav2 regions list [flags]
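Example (illustrative only; shown with the global output-format flag, assuming json is an accepted value alongside the default table):
  icav2 regions list -o json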
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage bundles.
Usage:
icav2 storagebundles [command]
Available Commands:
list list of storage bundles
Flags:
-h, --help help for storagebundles
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storagebundles [command] --help" for more information about a command.This command lists all the storage bundles id's
Usage:
icav2 storagebundles list [flags]
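Example (illustrative only; <apiKey> is a placeholder passed via the documented global -k flag):
  icav2 storagebundles list -k <apiKey>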
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage configurations.
Usage:
icav2 storageconfigurations [command]
Available Commands:
list list of storage configurations
Flags:
-h, --help help for storageconfigurations
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storageconfigurations [command] --help" for more information about a command.This command lists all the storage configurations
Usage:
icav2 storageconfigurations list [flags]
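Example (illustrative only):
  icav2 storageconfigurations list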
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on tokens.
Usage:
icav2 tokens [command]
Available Commands:
create Create a JWT token
refresh Refresh a JWT token from basic authentication
Flags:
-h, --help help for tokens
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 tokens [command] --help" for more information about a command.This command creates a JWT token from the API key.
Usage:
icav2 tokens create [flags]
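Example (illustrative only; <apiKey> is a placeholder for an API key supplied through the global -k flag):
  icav2 tokens create -k <apiKey>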
Flags:
-h, --help help for create
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command refreshes a JWT token from basic authentication with grant type JWT-bearer; the token to refresh is set with the -t flag.
Usage:
icav2 tokens refresh [flags]
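Example (illustrative only; <existingJwt> is a placeholder for the token passed through the global -t flag):
  icav2 tokens refresh -t <existingJwt>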
Flags:
-h, --help help for refresh
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command displays the version of this application.
Usage:
icav2 version [flags]
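Example (illustrative only):
  icav2 version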
Flags:
-h, --help help for version
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service