Illumina Connected Analytics

About the Platform

Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications. When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.

The platform is hosted in regions listed below.

Region Name             Region Identifier
Australia               AU
Canada                  CA
Germany                 EU
India                   IN
Indonesia               ID
Japan                   JP
Singapore               SG
South Korea             KR
United Kingdom          GB
United Arab Emirates    AE
United States           US

The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user interface is hosted alongside the API to deliver an interactive visualization of the resources and to enable additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options can be specified for applications to fine-tune efficiency.

Our systems are synchronized using a Cloud Time Sync Service to ensure accurate timekeeping and consistent log timestamps.

Getting Started

Getting Help

Use the search bar on the top right to navigate through the help docs and find specific topics of interest.

If you have any questions, contact Illumina Technical Support by phone or email:

Illumina Technical Support | techsupport@illumina.com | 1-800-809-4566

For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.

To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.

Other Illumina Products

To view a list of the products to which you have access, select the 9-dot symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown in brackets.

The More Tools category presents the following options:

  • My Illumina Dashboard to monitor instruments, streamline purchases and keep track of upcoming activities.

  • Link to the Support Center for additional information and help.

  • Link to order management, where you can keep track of your current and past orders.

Release Notes

The user documentation provides material for learning the basics of interacting with the platform, including examples and tutorials. Start with the Get Started documentation to learn more.

In the Release Notes section of the documentation, posts are made for new versions of deployments of the core platform components.

Event Log

The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:

  • Event date and time

  • Category (error, warn or info)

  • Code

  • Description

  • Tenant

Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.

Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.


Software Registration

Tenant Setup

The platform requires a provisioned tenant in the Illumina account management system with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access including add users, create workgroups, and add additional tenant administrators.

Each tenant is assigned a domain name used to login to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain>.login.illumina.com, where <domain> is substituted with the domain name assigned to the tenant.

New user accounts can be created for a tenant by navigating to the domain login URL and following the links on the page to set up a new account with a valid email address. Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups with permission to use the ICA application. Registered users may also be made workgroup administrators by tenant administrators or existing workgroup administrators.

API Keys

For security reasons, it is best practice not to use accounts with administrator-level access to generate API keys; instead, create a specific CLI user with basic permissions. This will minimize the possible impact of compromised keys.

Generate an API Key

Click the button to generate a new API Key and provide a name for it. Then choose to either include all workgroups or select the workgroups to be included. Selected workgroups will be accessible with the API Key.

Click to generate the API Key. The API Key is then presented (hidden) with a button to reveal the key so it can be copied, and a link to download it to a file to be stored securely for future reference. Once the window is closed, the key contents will no longer be accessible through the domain login page, so be sure to store the key securely if needed.

After generating an API key, save the key somewhere secure to be referenced when using the command-line interface or APIs.

Access via Web UI

  • On the left, you have the navigation bar (1) which will auto-collapse on smaller screens. To collapse it, use the double arrow symbol (2). When collapsed, use the >> symbol to expand it.

  • The central part (3) of the display is the item on which you are performing your actions and the breadcrumb menu (4) to return to the projects overview or a previous level. You can also use your browser's back button to return to the level from which you came.

  • At the top right, you have icons to refresh contents (5), Illumina product access (6), access to the online help (7), and user information (8).

Access via the CLI

Access via the API

Object Identifiers

The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.

JSON Web Token (JWT)

Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.

A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
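As a minimal sketch (assuming an API Key generated as described above; verify the exact endpoint path against the API Reference for your region), a JWT can be requested with cURL:

# Exchange an API Key for a JWT; the response body contains the token.
curl -s -X POST "https://ica.illumina.com/ica/rest/api/tokens" \
  -H "X-API-Key: $ICA_API_KEY"

The returned token is then supplied in the Authorization header (Bearer <token>) of subsequent API calls.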


Bundles

Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own tenant as well as projects in other tenants.

There is a combined limit of 30,000 projects and bundles per tenant.

The following ICA assets can be included in bundles: Data, Samples, Reference Data, Pipelines, Tools, Tool images, Base tables, Base query templates, and Bench docker images.

Some bundles come with additional restrictions such as disabling bench access or internet access when running pipelines to protect the data contained in them. When you link these bundles, the restrictions will be enforced on your project. Unlinking the bundle will not remove the restrictions.

As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.

If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.

Linking an Existing Bundle to a Project

  1. From the main navigation page, select Projects > your_project > Project Settings > Details.

  2. Click the Edit button at the top of the Details page.

  3. Click the + button, under Linked bundles.

  4. Click on the desired bundle, then click the +Link Bundles button.

  5. Click Save.

The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.

To unlink a bundle from a project,

  1. Select Projects > your_project > Project Settings > Details.

  2. Click the Edit button at the top of the Details page.

  3. Click the (-) button, next to the linked bundle you wish to remove.

Bundles and projects have to be in the same region in order to be linked. Otherwise, the error The bundle is in a different region than the project so it's not eligible for linking will be displayed.

You can only link bundles to a project if that project belongs to a tenant who has access to the bundle. You do not carry your access to a bundle over if you are invited to projects of other tenants.

When linking a bundle which includes Base to a project that does not have Base enabled, there are two possibilities:

  • Base is not allowed due to entitlements: The bundle will be linked and you will be given access to the data, pipelines, samples, and so on, but you will not see the Base tables in your project.

  • Base is allowed, but not yet enabled for the project: The bundle will be linked and you will be given access to the data, pipelines, samples, and so on, but you will not see the Base tables in your project and Base remains disabled until you enable it.

You cannot unlink bundles which were linked by external applications.

Create a New Bundle

To create a new bundle and configure its settings, do as follows.

  1. From the main navigation, select Bundles.

  2. Select + Create .

  3. Enter a unique name for the bundle.

  4. From the Region drop-down list, select where the assets for this bundle should be stored.

  5. Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.

    • Draft—The bundle can be edited.

    • Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.

    • Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles cannot be recovered from deprecated status.

  6. [optional] Configure the following settings.

    • Categories—Select an existing category or enter a new one.

    • Short Description—Enter a description for the bundle.

    • Metadata Model—Select a metadata model to apply to the bundle.

  7. Enter a release version for the bundle and optionally enter a description for the version.

  8. [Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).

    • Homepage

    • License

    • Links

    • Publications

  9. [Optional] Enter any information you would like to distribute with the bundle in the Documentation section.

  10. Select Save.

There is no option to delete bundles; they must be deprecated instead.

To cancel creating a bundle, select Bundles from the navigation at the top of the screen to return to your bundles overview.

Edit an Existing Bundle

To make changes to a bundle:

  1. From the main navigation, select Bundles.

  2. Select a bundle.

  3. Select Edit.

  4. Modify the bundle information and documentation as needed.

  5. Select Save.

When the changes are saved, they also become available in all projects that have this bundle linked.

Adding Assets to a Bundle

To add assets to a bundle:

  1. Select a bundle.

  2. On the left-hand side, select the type of asset (such as Flow > pipelines, Base > Tables or Bench > Docker Images) you want to add to the bundle.

  3. Depending on the asset type, select add or link to bundle.

  4. Select the assets and confirm.

Assets must meet the following requirements before they can be added to a bundle:

  • For Samples and Data, the project the asset belongs to must have data sharing enabled.

  • The region of the project containing the asset must match the region of the bundle.

  • You must have permission to access the project containing the asset.

  • Pipelines and tools need to be in released status.

  • Samples must be available in a complete state.

When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.

You cannot add the same asset twice to a bundle. Once added, the asset will no longer appear in the selection list.

You need to be in list view in order to unlink items from a bundle. Select the item and choose the unlink action at the top of the screen.

Which batch jobs are visible as activity depends on the user role.

Create a New Bundle Version

When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle. All users who currently have access to a bundle will automatically have access to the new version as well.

  1. From the main navigation, select Bundles.

  2. Select a bundle.

  3. Select + Create new Version.

  4. Make updates as needed and update the version number.

  5. Select Save.

When you create a new version of a bundle, it will replace the old version in your list. To see the old version, open your new bundle and look at Bundles > your_bundle > Details > Versioning. There you can open the previous version which is contained in your new version.

Assets such as data which were added in a previous version of your bundle will be marked in green, while new content will be black.

Add Terms of Use to a Bundle

  1. From the main navigation, Select Bundles > your_bundle > Bundle Settings > Legal.

  2. To add Terms of Use to a Bundle, do as follows:

    • Select + Create New Version.

    • Use the WYSIWYG editor to define Terms of Use for the selected bundle.

    • Click Save.

    • [Optional] Require acceptance by clicking the checkbox next to Acceptance required. Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.

  3. To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.

Collaborating on a Bundle

If you want to collaborate with other people on creating a bundle and managing the assets in the bundle, you can add users to your bundle and set their permissions. You use this to create a bundle together, not to use the bundle in your projects.

  1. From the main navigation, select Bundles > your_bundle > Bundle Settings > Team.

  2. To invite a user to collaborate on the bundle, do as follows.

    • To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.

    • To add a user by their email address, select By email and enter their email address.

    • To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.

    • Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.

      • Viewer: view content without editing rights.

      • Contributor: view bundle content and link/unlink assets.

      • Administrator: full edit rights of content and configuration.

    • Repeat as needed to add more users.

    Users are not officially added to the bundle until they accept the invitation.

  3. To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.

  4. To revoke bundle permissions from a user, select the trash icon for the user.

  5. Select Save Changes.

Sharing a Bundle

Once you have finalized your bundle and added all assets and legal requirements, you can share your bundle with other tenants to use it in their projects.

Your bundle must be in released status to prevent it from being updated while it is shared.

  1. Go to Bundles > your_bundle > Edit > Details > Bundle status and set it to Released.

  2. Save the change.

Once the bundle is released, you can share it. Invitations are sent to an individual email address, however access is granted and extended to all users and all workgroups inside that tenant.

  1. Go to Bundles > your_bundle > Bundle Settings > Share.

  2. Click Invite and enter the email address of the person you want to share the bundle with. They will receive an email from which they can accept or reject the invitation to use the bundle. The invitation will show the bundle name, description and owner. The link in the invite can only be used once.

Do not create duplicate entries. You can only use one user/tenant combination per bundle.

You can follow up on the status of the invitation on the Bundles > your_bundle > Bundle Settings > Share page.

  • If they reject the bundle, the rejection date will be shown. To re-invite that person later, select their email address in the list and choose Remove. You can then create a new invitation. If you do not remove the old entry before sending a new invitation, they will be unable to accept and will get an error message stating that the user and bundle combination must be unique. They also cannot reuse an invitation once it has been accepted or declined.

  • If they accept the bundle, the acceptance date will be shown. They will in turn see the bundle under Bundles > Entitled bundles. To remove access, select their email address in the list and choose Remove.

Entitled Bundles

Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when they are part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.

To use your shared entitled bundle, add the bundle to your project via Project Linking. Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.

Docker Repository

In order to create a Tool or Bench image, a Docker image is required to run the application in a containerized environment. Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.

Importing a Public External Image (Tools)

  1. Navigate to System Settings > Docker Repository.

  2. Click Create > External image to add a new external image.

  3. Add your full image URL in the Url field, e.g. docker.io/alpine:latest or registry.hub.docker.com/library/alpine:latest. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)

Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.

  4. (Optional) Complete the Description field.

  5. Click Save.

  6. The newly added image will appear in your Docker Repository list.

Verification of the URL is performed during execution of a pipeline which depends on the Docker image, not during configuration.

External images are accessed from the external source whenever required and not stored in ICA. Therefore, it is important not to move or delete the external source. There is no status displayed on external Docker repositories in the overview as ICA cannot guarantee their availability. The use of :stable instead of :latest is recommended.

Importing a Private Image (Tools + Bench Images)

In order to use private images in your tool, you must first upload them as a TAR file.

  1. Navigate to Projects > your_project.

  2. Select your uploaded TAR file and click Manage > Change Format in the top menu.

  3. Navigate to System Settings > Docker Repository (outside of your project).

  4. Click on Create > Image.

  5. Click on the magnifying glass to find your uploaded TAR image file.

  6. Select the appropriate region and if needed, filter on project from the drop-down menus to find your file.

  7. Select that file.

  8. The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.

Copying Docker Images to other Regions

  1. Navigate to System Settings > Docker Repository.

  2. Either

    • Select the required image(s) and go to Manage > Add Region.

    • OR double-click on a required image, check the box matching the region you want to add, and select update.

  3. In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).

To remove regions, go to Manage > Remove Region or unselect the regions from the Docker image detail view.

Downloading Docker Images

You can download your created Docker images at System Settings > Docker Images > your_Docker_image > Manage > Download.

In order to be able to download Docker images, the following requirements must be met:

  • The Docker image cannot be from an entitled bundle.

  • Only self-created Docker images can be downloaded.

  • The Docker image must be an internal image and in status Available.

  • You can only select a single Docker image at a time for download.

File Size Considerations

Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
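As an illustration (the image name and tag below are placeholders), an image can be exported and compressed with the standard Docker CLI before uploading:

# Export a local image to a TAR archive and compress it for upload to ICA.
docker pull docker.io/library/alpine:3.19
docker save docker.io/library/alpine:3.19 | gzip > alpine_3.19.tar.gz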

Metadata Models

Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.

Each tenant has one root metadata model that is accessible to all projects in the tenant. This allows an organization to collect the same piece of information for every sample in every project in the tenant, such as an ID number. Within this root model, you can configure multiple metadata submodels, even at different levels.

Illumina recommends that you limit the number of fields or field groups you add to the root model. If there are any misconfigured items in the root model, they will carry over into all other metadata models in the tenant. Once a root model is published, the fields and groups that are defined within it cannot be deleted. You should first consider creating submodels before adding anything to the root model.

When configuring a project, you have the option to assign one published metadata model for all samples in the project. This metadata model can be the root model, a submodel of the root model, or a submodel of a submodel. It can be any published metadata model in the tenant. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models, are applied to the samples in the project.

❗️ Illumina recommends that you limit the amount of fields or field groups you add to the root model. You should first consider creating submodels before adding anything to the root model.

Metadata concepts

The following terminology is used within this page:

  • Metadata fields = Metadata fields are linked to a sample in the context of a project. They can be of various types and can contain single or multiple values.

  • Metadata groups = You can indicate that a few fields belong together (for example, they all relate to quality metrics). That is the moment to create a group so that the user knows these fields belong together.

  • Root model = Model that is linked to the tenant. Every metadata model that you link to a project will also contain the fields and groups specified in this model, as it is a parent model for all other models. This is a subcategory of a project metadata model.

  • Child/Sub model = Any metadata model that is not the root model. Child models inherit all fields and groups from their parent models. This is a subcategory of a project metadata model.

  • Pipeline model = Model that is linked to a specific pipeline and not a project.

Metadata in the context of ICA will always give information about a sample. It can be provided by the user, the pipeline and via the API. There are 2 general categories of metadata models: Project Metadata Model and Pipeline Metadata Model. Both models are built from metadata fields and groups. The project metadata model is specific per tenant, while the pipeline metadata model is linked to a pipeline and can be shared across tenants. These models are defined by users.

Each sample can have multiple metadata models. Whenever you link a project metadata model to your project, you will see its groups and fields present on each sample. The root model from that tenant will also be present, as every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline is executed with a sample and the pipeline contains a metadata model, its groups and fields will also be present for each analysis that comes out of the pipeline execution.

Groups & fields

The following field types are used within ICA:

  • Text: Free text

  • Keyword: Automatically complete value based on already used values

  • Numeric: Only numbers

  • Boolean: True or false, cannot be multiple value

  • Date: e.g. 23/02/2022

  • Date time: e.g. 23/02/2022 11:43:53, saved in UTC

  • Enumeration: select value out of drop-down list

The following properties can be selected for groups & fields:

  • Required: Pipeline can’t be started with this sample until the required group/field is filled in

  • Sensitive: Values of this group/field are only visible to project users of the own tenant. When a sample is shared across tenants, these fields won't be visible

  • Filled by pipeline: Fields that need to be filled by pipeline should be part of the same group. This group will automatically be multiple value and values will be available after pipeline execution. This property is only available for groups

  • Multiple value: This group/field can consist of multiple (grouped) values

❗️ Fields cannot be both required and filled by pipeline

Project vs. Pipeline Metadata Models

The project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample of a specific project, and it may include general mandatory company information.

The pipeline metadata model has metadata linked to a specific pipeline. Values are populated during the pipeline execution, and it requires an output file with the name 'metadata.response.json'.

❗️ Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled

Metadata Actions

Publish a Metadata Model

Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. When a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first.

Retire a Metadata Model

If a published metadata model is no longer needed, you can retire the model (except the root model).

  1. First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.

  2. When you are certain you want to retire a model and all submodels are retired, click on the three dots in the top right of the model window, and then select Retire Metadata Model.

Assign a Metadata Model to a Project

To add metadata to your samples, you first need to assign a metadata model to your project.

  1. Go to Projects > your_project > Project Settings > Details.

  2. Select Edit.

  3. From the Metadata Model drop-down list, select the metadata model you want to use for the project.

  4. Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.

Add Metadata to Samples Manually

To manually add metadata to samples in your project, do as follows.

  1. Precondition is that you have a metadata model assigned to your project

  2. Go to Projects > your_project > Samples > your_sample.

  3. Double-click your sample to open the sample details.

  4. Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline cannot start.

  5. Select Save

Populating a Pipeline Metadata Model

To fill metadata by pipeline executions, a pipeline model must be created.

  1. In the Illumina Connected Analytics main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.

  2. Double-click on your pipeline to open the pipeline details.

  3. Create/Edit your model under Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.

In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.

❗️ The field names cannot have . in them, e.g. for the metric name Q30 bases (excl. dup & clipped bases) the . after excl must be removed.

Pushing Metadata Metrics to Base

Populating metadata models of samples allows having a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.

  1. In the Illumina Connected Analytics main navigation, select Projects.

  2. In your project menu select Schedule.

  3. Select 'Add new', and then click on the Metadata Schedule option.

  4. Type a name for your schedule, optionally add a description, and select whether the metadata source should be the current project or the entire tenant. It is also possible to select whether ICA references should be anonymized and whether sensitive metadata fields should be included. As a reminder, values of sensitive metadata fields are not visible to users outside of the project.

  5. Select Save.

  6. Navigate to Tables under the Base menu in your project.

  7. Two new table schemas should be added with your current metadata models.

Data Integrity

The ETag does not change and can be used as a file integrity check even when the file is archived, unarchived, and/or copied to another location. Changes to the metadata have no impact on the ETag.

For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.

Connect AWS S3 Bucket

You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.

When configuring a new project in ICA to use a preconfigured S3 bucket, create a folder on your S3 bucket in the AWS console (or via the AWS CLI; see the example after the list below). This folder will be connected to ICA as a prefix.

Failure to create a folder will result in the root folder of your S3 bucket being assigned, which will block your S3 bucket from being used for other ICA projects with the error "Conflict while updating file/folder. Please try again later."

  • When creating an empty folder in S3, it will not be visible in ICA.

  • When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.

  • When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.

  • Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
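As an alternative to the console, the folder (key prefix) can be created with the AWS CLI by putting an empty object whose key ends in a slash; the bucket and folder names below are placeholders:

# Create the folder (prefix) that will be connected to the ICA project.
aws s3api put-object --bucket YOUR_BUCKET_NAME --key YOUR_FOLDER_NAME/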

Prerequisites

The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:

(*) BSSH is not currently deployed on the South Korea instance, resulting in limited functionality in this region with regard to sequencer integration.

You can use unversioned, versioned, and suspended buckets as your own S3 storage.

If you connect buckets with object versioning, the data in ICA will be automatically synced with the data in the object store. When an object is deleted without specifying a particular version, a Delete marker is created in the object store to indicate that the object has been deleted. ICA will reflect the object state by deleting the record from the database. No further action on your side is needed to sync.
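To confirm which of these states applies to your bucket (the bucket name is a placeholder), you can query it with the AWS CLI:

# Returns {"Status": "Enabled"} or {"Status": "Suspended"}; empty output means the bucket is unversioned.
aws s3api get-bucket-versioning --bucket YOUR_BUCKET_NAME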

Configuration

1 - Configure Bucket CORS Permission

In the cross-origin resource sharing (CORS) section, enter the following content.

2 - Create Data Access Permission - AWS IAM Policy

ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.

Permissions

Paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.

Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.

On versioned or suspended buckets, paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.

Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.

(Optional) Set policy name to "illumina-ica-admin-policy"

To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document) leads to the path where you saved the file:

3 - Create AWS IAM User

An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.

  • (optional) Set user name to "illumina_ica_admin"

  • Select the Programmatic access option for the type of access

  • (Optional) Retrieve the Access Key ID and Secret Access Key by choosing to Download .csv

To create the IAM user and attach the policy via the AWS CLI, enter the following command (AWS IAM users are global resources and do not require a region to be specified). This command creates an IAM user illumina_ica_admin, retrieves your AWS account number, and then attaches the policy to the user.

4 - Create AWS Access Key

If the Access Key information was retrieved during the IAM user creation, skip this step.

Use the command below to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.

The AccessKeyId and SecretAccessKey values will be provided to ICA in the next step.

5 - S3 Bucket Policy

Connecting your S3 bucket to ICA does not require any additional bucket policies.

What if you need a bucket policy for use cases beyond ICA?

The bucket policy must then support the essential permissions needed by ICA without inadvertently restricting its functionality.

Be sure to replace the following fields:

  • YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.

  • YOUR_ACCOUNT_ID: Replace this field with your account ID number.

  • YOUR_IAM_USER: Replace this field with the name of your IAM user created for ICA.

In this example, restriction is enabled on the bucket policy to prevent any kind of access to the bucket. However, there is an exception rule added for the IAM user that ICA is using to connect to the S3 bucket. The exception rule is allowing ICA to perform the above S3 action permissions necessary for ICA functionalities.

Additionally, the exception rule is applied to the STS federated user session principal associated with ICA. Since ICA leverages the AWS STS to provide temporary credentials that allow users to perform actions on the S3 bucket, it is crucial to include these STS federated user session principals in your policy's whitelist. Failing to do so could result in 403 Forbidden errors when users attempt to interact with the bucket's objects using the provided temporary credentials.

6 - Create ICA Storage Credential

To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Secret Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Credentials and click the Create button to create a new storage credential.

Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.

7 - Enabling Cross Account Access for Copy and Move Operations

ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the bucket policy:

Be sure to replace the following fields:

  • ASSUME_ROLE_ARN: Replace this field with the ARN of the cross account role you want to give permission to. Refer to the table below to determine which region-specific Role ARN should be used.

  • YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.

The ARN of the cross account role you want to give permission to is specified in the Principal. Refer to the table below to determine which region-specific Role ARN should be used.


Troubleshooting

Common Issues

The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration:

Conflicting bucket notifications

  • Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification

  • Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes

  • Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type

Solution:

  1. In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix

  2. Delete the existing notification that overlaps with your Storage Configuration's key prefix

  3. ICA will perform a series of steps in the background to re-verify the connection to your bucket.

GetTemporaryUploadCredentialsAsync failure

This error can occur when recreating a recently deleted storage configuration. To fix the issue, you have to delete the bucket notifications:

  1. Choose Properties.

  2. Navigate to the Event Notifications section, select the check boxes for the event notifications named gds:objectcreated, gds:objectremoved, and gds:objectrestore, and click Delete.

  3. Wait 15 minutes for the storage to become available in ICA

If you do not want to wait 15 minutes, you can delete the current storage configuration, delete the bucket notifications in the bucket and create a new storage configuration.

New users may reference the Illumina Connected Software Registration Guide for detailed guidance on setting up an account and registering a subscription.

For more details on identity and access management, please see the Illumina Connected Software help site.

To access the APIs using the command-line interface (CLI), an API Key may be provided as credentials when logging in. API Keys operate similarly to a username and password and should be kept secure and rotated on a regular basis (preferably yearly). When keys are compromised or no longer in use, they must be revoked. This is done through the domain login URL by navigating to the profile drop-down and selecting "Manage API Keys", then selecting the key and using the trash icon next to it.

For long-lived credentials to the API, an API Key can be generated from the account console and used with the API and command-line interface. Each user is limited to 10 API Keys. API Keys are managed through the product dashboard after logging in through the domain login URL, by navigating to the profile drop-down and selecting "Manage API Keys".

The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the Illumina Connected Analytics portal.

The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Find instructions for using the command-line interface, including download links for your operating system, in the CLI documentation.
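As a rough sketch (assuming the icav2 binary from the CLI documentation; command names may differ between CLI versions), initial setup and a first call look like this:

# Store the server URL and API Key in the CLI configuration (interactive prompts).
icav2 config set
# Verify connectivity by listing the projects you have access to.
icav2 projects list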

The HTTP-based application programming interfaces (APIs) are listed in the API Reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.

When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token>, where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see the JSON Web Token (JWT) section above.
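For example, with a JWT stored in an environment variable, a call to a list endpoint (path taken from the API Reference; treat it as an assumption to verify there) looks like:

# Pass the JWT in the Authorization header on every API call.
curl -s "https://ica.illumina.com/ica/rest/api/projects" \
  -H "Authorization: Bearer $ICA_JWT"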


The Home and Project documentation pages match navigation within ICA. We also offer supporting documentation for popular topics like the command-line interface, sequencer integration, and tutorials.

For more content on topics like release notes, security and compliance, the API, and other resources, view the Reference section.

For an overview of the available subscription tiers and functionality, please refer to the Illumina website.


The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.

You can not link bundles which come with additional restrictions to .

Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, by using the CLI, or by using a Connector. For more information, please refer to the project Data documentation.

Select DOCKER from the drop-down menu and Save.

Select the appropriate region, fill in the Docker Name and Version, indicate whether it is a Tool or a Bench image, and click Save.

You need a service connector with a download rule to download the Docker image.

You can verify the integrity of the data by comparing the hash, which is usually (with some exceptions) an MD5 (Message Digest Algorithm 5) checksum. This is a common cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum. AWS S3 calculates this checksum when data is uploaded and stores it in the ETag (Entity tag).

For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our API endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint, specifying the data ID you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure file integrity.
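A minimal sketch of that check from the command line (project and data IDs are placeholders; the jq filter simply searches the JSON response for the objectETag field wherever it is nested):

# Retrieve the stored ETag for a file and compare it with a locally computed MD5 checksum.
curl -s "https://ica.illumina.com/ica/rest/api/projects/$PROJECT_ID/data/$DATA_ID" \
  -H "Authorization: Bearer $ICA_JWT" | jq -r '.. | .objectETag? // empty'
md5sum local_copy_of_the_file.fastq.gz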

These instructions utilize the AWS CLI. Follow the AWS CLI documentation for instructions to download and install.

Because of how Amazon S3 handles folders and does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.


You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are documented separately.

ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the Configuring cross-origin resource sharing (CORS) documentation (expand the "Using the S3 console" section) for instructions on enabling CORS via the AWS Management Console.

Refer to the Creating policies on the JSON tab documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:

Refer to the Creating IAM users (console) documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:

Select Attach existing policies directly when setting the permissions, and choose the policy created in step 2 (Create Data Access Permission - AWS IAM Policy).

Refer to the Managing access keys (console) AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.

With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions to Connect AWS S3 Bucket to ICA Project for details.


This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 event notification only allows overlapping events with non-overlapping prefixes. Depending on the conflicts on the notifications, the error can be presented as any of the messages listed under Conflicting bucket notifications above.

In the Amazon S3 Console, select the bucket for which you need to delete the notifications from the list.

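// main.nf for the MD5 checksum pipeline described in the Data Integrity section above.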
nextflow.enable.dsl = 2


process md5sum {
    
    container "public.ecr.aws/lts/ubuntu:22.04"
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    
    input:
        file txt

    output:
        stdout emit: result
        path '*', emit: output

    publishDir "out", mode: 'symlink'

    script:
        txt_file_name = txt.getName()
        id = txt_file_name.takeWhile { it != '.'}

        """
        set -ex
        echo "File: $txt_file_name"
        echo "Sample: $id"
        md5sum ${txt} > ${id}_md5.txt
        """
    }

workflow {
    txt_ch = Channel.fromPath(params.in)
    txt_ch.view()
    md5sum(txt_ch).result.view()
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
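<!-- Corresponding pipeline XML for the MD5 checksum pipeline: declares the FASTQ file input. -->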
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTQ" type="FILE" required="true" multiValue="true">
            <pd:label>Input</pd:label>
            <pd:description>FASTQ files input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

ICA Project Region      AWS Region
Australia               ap-southeast-2
Canada                  ca-central-1
Germany                 eu-central-1
India                   ap-south-1
Indonesia               ap-southeast-3
Israel                  il-central-1
Japan                   ap-northeast-1
Singapore               ap-southeast-1
South Korea*            ap-northeast-2
UK                      eu-west-2
United Arab Emirates    me-central-1
United States           us-east-1

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "HEAD",
            "GET",
            "PUT",
            "POST",
            "DELETE"
        ],
        "AllowedOrigins": [
            "https://ica.illumina.com"
        ],
        "ExposeHeaders": [
            "ETag",
            "x-amz-meta-custom-header"
        ]
    }
]
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketNotification",
                "s3:ListBucket",
                "s3:GetBucketNotification",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_NAME"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:RestoreObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sts:GetFederationToken"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketNotification",
                "s3:ListBucket",
                "s3:GetBucketNotification",
                "s3:GetBucketLocation",
                "s3:ListBucketVersions",
                "s3:GetBucketVersioning"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_NAME"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:RestoreObject",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sts:GetFederationToken"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
aws iam create-policy --policy-name illumina-ica-admin-policy --policy-document file://illumina-ica-admin-policy.json
aws iam create-user --user-name illumina_ica_admin
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-user-policy --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/illumina-ica-admin-policy --user-name illumina_ica_admin
aws iam create-access-key --user-name illumina_ica_admin

    "AccessKey": {
        "UserName": "illumina_ica_admin",
        "AccessKeyId": "<access key id>",
        "Status": "Active",
        "SecretAccessKey": "<secret access key>",
        "CreateDate": "2020-10-22 09:42:24+00:00"
    }
{
     "Version": "2012-10-17",
     "Statement": [
         {
             "Effect": "Deny",
             "Principal": {
                 "AWS": "*"
             },
             "Action": [
                 "s3:PutObject",
                 "s3:GetObject",
                 "s3:RestoreObject",
                 "s3:DeleteObject",
                 "s3:DeleteObjectVersion",
                 "s3:GetObjectVersion"
             ],
             "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
             "Condition": {
                 "ArnNotLike": {
                     "aws:PrincipalArn": [
                         "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_IAM_USER",
                         "arn:aws:sts::YOUR_ACCOUNT_ID:federated-user/*"
                     ]
                 }
             }
         }
     ]
 }
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Allow cross account access",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "ASSUME_ROLE_ARN"
                },
                "Action": [
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::YOUR_BUCKET_NAME",
                    "arn:aws:s3:::YOUR_BUCKET_NAME/*"
                ]
            }
        ]
    }

Region                       Role ARN
Australia (AU)               arn:aws:iam::079623148045:role/ica_aps2_crossacct
Canada (CA)                  arn:aws:iam::079623148045:role/ica_cac1_crossacct
Germany (EU)                 arn:aws:iam::079623148045:role/ica_euc1_crossacct
India (IN)                   arn:aws:iam::079623148045:role/ica_aps3_crossacct
Indonesia (ID)               arn:aws:iam::079623148045:role/ica_aps4_crossacct
Israel (IL)                  arn:aws:iam::079623148045:role/ica_ilc1_crossacct
Japan (JP)                   arn:aws:iam::079623148045:role/ica_apn1_crossacct
Singapore (SG)               arn:aws:iam::079623148045:role/ica_aps1_crossacct
South Korea (KR)             arn:aws:iam::079623148045:role/ica_apn2_crossacct
UK (GB)                      arn:aws:iam::079623148045:role/ica_euw2_crossacct
United Arab Emirates (AE)    arn:aws:iam::079623148045:role/ica_mec1_crossacct
United States (US)           arn:aws:iam::079623148045:role/ica_use1_crossacct

Samples

You can use samples to group information related to a sample, including input files, output files, and analyses.

You can search for samples (excluding their metadata) with the Search button at the top right.

Add New Sample

To add a new sample, do as follows.

  1. Select Projects > your_project > Samples.

  2. To add a new sample, select + Create, and then enter a unique name and description for the sample.

  3. To include files related to the sample, select + Add data to sample.

Your sample is added to the Samples page. To view information on the sample, select the sample, and then select Open Details.

Add Files to Samples

You can add additional files to a sample after creating the sample. Any files that are not currently included in a sample are listed on the Unlinked Files tab.

To add an unlinked file to a sample, do as follows.

  1. Go to Projects > your_project > Samples > Unlinked files tab.

  2. Select a file or files, and then select one of the following options:

    • Create sample — Create a new sample that includes the selected files.

    • Link to sample — Select an existing sample in the project to link the file to.

Alternatively, you can add unlinked files from the sample details.

  1. Go to Projects > your_project > Samples.

  2. Select your sample to open the details.

  3. The last section of the details is Files. There, select + Add data to sample.

  4. If the data is not in your project, select Choose a file, which will upload the data to your project. This does not automatically add it to your sample; you still have to select the newly uploaded data and then select + Add data to sample.

Data can only be linked to a single sample, so once you have linked data to a sample, it will no longer appear in the list of data to choose from.

Removing Files from Samples

To remove files from samples,

  1. Go to Projects > your_project > Samples > your_sample > Details.

  2. Go to the files section and open the file details of the file you want to remove.

  3. Select Remove data from sample.

  4. Save your changes.

Link Samples to Project

A sample from a separate project can be linked to your project to make it available in a read-only capacity.

  1. Navigate to the Samples view in the Project

  2. Click the Link button

  3. Select the Sample(s) to link to the project

  4. Click the Link button

Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples also must be available in a complete state in order to be linked.

Delete Samples

If you want to remove a sample, select it and use the delete option from the top navigation row. You will be presented with a choice of how to handle the data in the sample.

  • Unlink all data without deleting it.

  • Delete input data and unlink other data.

  • Delete all data.

Connect AWS S3 Bucket to ICA Project

Access Forbidden

Access forbidden: {message}

This usually occurs because of missing permissions. Fix: review the IAM policy, bucket policy, and ACLs for the required permissions.

Conflict

System topic is not in a valid state

Conflict

Found conflicting storage container notifications with overlapping prefixes

Conflict

Found conflicting storage container notifications for {prefix}{eventTypeMsg}

Conflict

Found conflicting storage container notifications with overlapping prefixes{prefixMsg}{eventTypeMsg}

Customer Container Notification Exists

Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification

Invalid Access Key ID

Failed to update bucket policy: The AWS Access Key Id you provided does not exist in our records.

Check the status of the AWS Access Key ID in the console. If not active, activate it. If missing, create it.

Invalid Parameter

Missing credentials for storage container

Invalid Parameter

Missing bucket name for storage container

Invalid Parameter

The storage container name has invalid characters

Invalid Parameter

Storage Container '{storageContainer}' does not exist

Invalid Parameter

Invalid parameters for volume configuration: {message}

Invalid Storage Container Location

Storage container must be located in the {region} region

Invalid Storage Container Location

Storage container must be located in one of the following regions: {regions}

Missing Configuration

Missing queue name for storage container notification

Missing Configuration

Missing system topic name for storage container notification

Missing Configuration

Missing lambda ARN for storage container notification

Missing Configuration

Missing subscription name for storage container notification

Missing Storage Account Settings

The storage account '{storageAccountName}' needs HNS (Hierarchical Namespace) enabled.

Missing Storage Container Settings

Missing settings for storage container

SSE-KMS Encryption

Create an S3 bucket with SSE-KMS

In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS). Then select Choose your AWS KMS key.

Connect the S3-SSE-KMS to ICA

  • Add permission to use the KMS key by adding kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey

  • Add the KMS key ARN arn:aws:kms:xxx to the first "Resource" element

  • For unversioned buckets, the permissions will match the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt",
                    "kms:Encrypt",
                    "kms:GenerateDataKey",
                    "s3:PutBucketNotification",
                    "s3:ListBucket",
                    "s3:GetBucketNotification",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:kms:xxx",
                    "arn:aws:s3:::BUCKET_NAME"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:RestoreObject",
                    "s3:DeleteObject"
                ],
                "Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "sts:GetFederationToken"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }
  • For versioned or suspended buckets, the permissions will match the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt",
                    "kms:Encrypt",
                    "kms:GenerateDataKey",
                    "s3:PutBucketNotification",
                    "s3:ListBucket",
                    "s3:GetBucketNotification",
                    "s3:GetBucketLocation",
                    "s3:ListBucketVersions",
                    "s3:GetBucketVersioning"
                ],
                "Resource": [
                    "arn:aws:kms:xxx",
                    "arn:aws:s3:::BUCKET_NAME"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:RestoreObject",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:GetObjectVersion"
                ],
                "Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "sts:GetFederationToken"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }

At the end of the policy setup, there should be 3 permissions listed in the "Summary".

Create the S3-SSE-KMS configuration in ICA

In step 3 of the process above, continue with the [Optional] Server Side Encryption section to enter the algorithm and key name for server-side encryption.

  • For "Algorithm", enter aws:kms

  • For "Key Name", enter the KMS key ARN: arn:aws:kms:xxx

Although "Key prefix" is optional, it is highly recommended to set it rather than using the root folder of your S3 bucket. "Key prefix" refers to the name of the folder you created in the bucket.

Additional setup for Cross Account Copy for S3 buckets with SSE-KMS encryption

To allow cross-account copy, the ICA cross-account role (ASSUME_ROLE_ARN) also needs to be able to use the KMS key, for example with a key policy statement like the following:

    {
        "Sid": "Allow cross account access",
        "Effect": "Allow",
        "Principal": {
            "AWS": "ASSUME_ROLE_ARN"
        },
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey"
        ],
        "Resource": "*"
    }
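
As a sketch (not part of the official setup steps), the statement above could be appended to an existing key policy with boto3; the key ID and role ARN below are placeholders.

import json
import boto3

# Hypothetical placeholders: your KMS key ID and the ICA cross-account role ARN
# for your region.
KEY_ID = "YOUR_KMS_KEY_ID"
ASSUME_ROLE_ARN = "arn:aws:iam::079623148045:role/ica_use1_crossacct"

kms = boto3.client("kms")

# Fetch the current key policy and append the cross-account statement shown above.
policy = json.loads(kms.get_key_policy(KeyId=KEY_ID, PolicyName="default")["Policy"])
policy["Statement"].append({
    "Sid": "Allow cross account access",
    "Effect": "Allow",
    "Principal": {"AWS": ASSUME_ROLE_ARN},
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey",
    ],
    "Resource": "*",
})
kms.put_key_policy(KeyId=KEY_ID, PolicyName="default", Policy=json.dumps(policy))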

Data

The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.

Recommended Practices

File/Folder Naming

The length of the file name (minus prefixes and delimiters) is ideally limited to 32 characters.

Characters generally considered "safe"
  • Alphanumeric characters

    • 0-9

    • a-z

    • A-Z

  • Special characters

    • Exclamation point !

    • Hyphen -

    • Underscore _

    • Period .

    • Asterisk *

    • Single quote '

    • Open parenthesis (

    • Closed parenthesis )

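As a quick illustration (a sketch only; ICA does not necessarily enforce this exact rule), the recommended character set and length guidance above can be checked with a few lines of Python:

import re

# Pattern built from the "safe" character list above.
SAFE_NAME = re.compile(r"^[0-9A-Za-z!\-_.*'()]+$")

def is_recommended_name(name: str, max_length: int = 32) -> bool:
    """Return True if the name uses only 'safe' characters and respects the
    recommended length (excluding prefixes and delimiters)."""
    return bool(SAFE_NAME.match(name)) and len(name) <= max_length

print(is_recommended_name("sample_001.fastq"))  # True
print(is_recommended_name("sample 001.fastq"))  # False: space is not in the safe list
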
Troubleshooting

If AWS returns the error "Unable to generate credentials from the objectstore as the requested path is too long." when requesting temporary credentials, the path should be shortened.

You can truncate the sample name and user reference, or use advanced output mapping in the API, which avoids generating the long folder names and writes the output to the location defined by targetPath. For example:

"analysisOutput": [
    {
        "sourcePath": "out",
        "type": "FOLDER",
        "targetProjectId": "enter_your_target_project_id",
        "targetPath": "/enter_your_target_folder/"
    }
]


Data Management

To prevent cost issues, you can not perform actions which would write data to the workspace (such as copying and moving data) when the project billing mode is set to tenant and the owning tenant of the folder is not the current user's tenant.


Viewing Data

On the Projects > your_project > Data page, you can view file information and preview files.

To view file details, click the filename.

  • Run input tags identify the last 100 pipelines which used this file as input.

  • Connector tags indicate if the file was added via browser upload or connector.

To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click View to preview the file.

When you share the data view by sharing the link from your browser, filters and sorting are retained in the link, so the recipient will see the same data in the same order.

To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When you click an ongoing action itself, the data job details of the most recently created data job are shown.

For folders, the list of ongoing actions is displayed at the top left of the folder details. When you click the list, the data job details of the most recently created data job across all actions are shown.

Secondary Data

When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).


Hyperlinking to Data

To hyperlink to data, use the following syntax:

https://<ServerURL>/ica/link/project/<ProjectID>/data/<FolderID>
https://<ServerURL>/ica/link/project/<ProjectID>/analysis/<AnalysisID>
  • ServerURL: see the browser address bar
  • ProjectID: at YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
  • FolderID: at YourProject > Data > folder > folder details > ID
  • AnalysisID: at YourProject > Flow > Analyses > YourAnalysis > ID

Normal permission checks still apply with these links. If you try to follow a link to data to which you do not have access, you will be returned to the main project screen or login screen, depending on your permissions.
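
For example, a link can be assembled from these values with simple string formatting; the values below are hypothetical placeholders:

# Hypothetical placeholder values; substitute your own server URL and IDs.
server_url = "ica.illumina.com"   # see the browser address bar
project_id = "your-project-id"    # from YourProject > Details > URN
folder_id = "your-folder-id"      # from the folder details

data_link = f"https://{server_url}/ica/link/project/{project_id}/data/{folder_id}"
analysis_link = f"https://{server_url}/ica/link/project/{project_id}/analysis/your-analysis-id"
print(data_link)
print(analysis_link)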


Uploading Data

Uploading data to the platform makes it available for consumption by analysis workflows and tools.

UI Upload

To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either

  • Drag a file from your system into the Choose a file or drag it here box.

  • Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.

Your files are added to the Data page with status partial during upload and become available when upload completes.

Do not close the ICA tab in your browser while data uploads.

Upload Data via CLI


Copying Data

You can copy data from the same project to a different folder or from another project to which you have access.

In order to copy data, the following rights must be assigned to the person copying the data:

Copy Data Rights

Within a project:
  • Source Project: Contributor rights, Upload and Download rights
  • Destination Project: Contributor rights, Upload and Download rights

Between different projects:
  • Source Project: Download rights, Viewer rights
  • Destination Project: Upload rights, Contributor rights

The following restrictions apply when copying data:

Copy Data Restrictions

Within a project:
  • Source Project: No linked data, No partial data, No archived data
  • Destination Project: No linked data

Between different projects:
  • Source Project: Data sharing enabled, No partial data, No archived data, Within the same region
  • Destination Project: No linked data, Within the same region

Data in the "Partial" or "Archived" state will be skipped during a copy job.

To use data copy:

  1. Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.

  2. Optionally, use the filters (Type, Name, Status, Format or additional filters) or the search box to narrow down the data.

  3. Select the data (individual files or folders with data) you want to copy.

  4. Select any metadata which you want to keep with the copied data (user tags, technical system tags or instrument information).

  5. Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).

  6. Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.

The outcome can be one of the following:

  • INITIALIZED

  • WAITING_FOR_RESOURCES

  • RUNNING

  • STOPPED - When choosing to stop the batch job.

  • SUCCEEDED - All files and folders are copied.

  • PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.

  • FAILED - None of the files and folders could be copied.

To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.

Copy behavior differs between files and folders. The behavior is designed for files, so it is best practice not to copy folders when a folder with the same name already exists in the destination location.

Notes on copying data

  • Copying data comes with an additional storage cost as it will create a copy of the data.

  • You can copy over the same data multiple times.

  • On the command-line interface, the command to copy data is icav2 projectdata copy.


Move Data

You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.

  • Move From is used when you are in the destination location.

  • Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.

When you move data from one location to another, do not change the source data while the Move job is in progress, as this will result in the job being aborted. Expand the "Troubleshooting" section below for information on how to fix this if it occurs.

Troubleshooting
  1. If the source or destination of data being moved is modified, the Move job will detect the changes and abort.

  2. Modifying data at either the source or destination during a Move process can result in incomplete data transfer. Users can still manually move any remaining data afterward.

Because a move deletes the data in the source location, a number of rights and restrictions apply to moving data.

Move Data Rights

Within a project:
  • Source Project: Contributor rights
  • Destination Project: Contributor rights

Between different projects:
  • Source Project: Download rights, Contributor rights
  • Destination Project: Upload rights, Viewer rights

Move Data Restrictions

Within a project:
  • Source Project: No linked data, No partial data, No archived data
  • Destination Project: No linked data

Between different projects:
  • Source Project: Data sharing enabled, Data owned by user's tenant, No linked data, No partial data, No archived data, No externally managed projects, Within the same region
  • Destination Project: No linked data, Within the same region

Move jobs will fail if any data being moved is in the "Partial" or "Archived" state.

Move Data From

Move Data From is used when you are in the destination location.

  1. Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.

  2. Select the files and folders which you want to move.

  3. Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Move Data To

Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.

  1. Navigate to Projects > your_project > Data > your_source_location.

  2. Select the files and folders which you want to move.

  3. Select Projects > your_project > Data > your_source_location > Manage > Move To.

  4. Select your target project and location.

  5. Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.

Move Status

  • INITIALIZED

  • WAITING_FOR_RESOURCES

  • RUNNING

  • STOPPED - When choosing to stop the batch job.

  • SUCCEEDED - All files and folders are moved.

  • PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.

  • FAILED - None of the files and folders could be moved.

To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.

Restrictions:

  • A total maximum of 1000 items can be moved in one operation. An item can be either a file or a folder. Folders with subfolders and subfiles still count as one item.

  • You can not move files and folders to a destination where one or more files or folders with the same name already exist.

  • You can not move data and folders to linked data.

  • You can not move a folder to itself.

  • You can not move data which is in the process of being moved.

  • You can not move data across regions.

  • You can not move data from externally-managed projects or move externally managed data.

  • You can not move linked data.

  • You can only move data when it has status available.

  • To move data across projects, it must be owned by the user's tenant.

  • If you do not select a target folder for Move Data To, the root folder of the target project is used.

If you are only able to select your source project as the target data project, this may indicate that data sharing (Projects > your_project > Project Settings > Details > Data Sharing) is not enabled for your project or that you do not have upload rights in other projects.


Download Data

Single files can be downloaded directly from within the UI.

  • Select the checkbox next to the file which you want to download, followed by Download > Select Browser Download > Download.

  • You can also download files from their details screen. Click on the file name and select Download at the bottom of the screen. Depending on the size of your file, it may take some time to load the file contents.

Schedule for Download

You can trigger an asynchronous download via a service connector using the Schedule for Download button with one or more files selected.

  1. Select a file or files to download.

  2. Select Download > Schedule download (for files or folders). This will display a list of all available connectors.

  3. Select a connector and optionally, enter your email address if you want to be notified of download completion, and then select Download.

If you do not have a connector, you can click the Don't have a connector yet? option to create a new connector. You must then install this new connector and return to the file selection in step 1 to use it.

You can view the progress of the download or stop the download on the Activity page for the project.


Export Project Data Information

The data records contained in a project can be exported in CSV, JSON, and Excel format.

  1. Select one or more files to export.

  2. Select Export.

  3. Select the following export options:

    • To export only the selected files, select Selected rows as the Rows to export option. To export all files on the page, select Current page.

    • To export only the columns currently displayed, select Visible columns as the Columns to export option.

  4. Select the export format.


Archiving and Deleting files

To manually archive or delete files, do as follows:

  1. Select the checkbox next to the file or files to delete or archive.

  2. Select Manage, and then select one of the following options:

    • Archive — Move the file or files to long-term storage (event code ICA_DATA_110).

    • Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).

    • Delete — Remove the file completely (event code ICA_DATA_106).

When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.

To archive or delete files programmatically, you can use ICA's API endpoints:

  1. Modify the dates of the file to be deleted/archived.

Python Example

The Python snippet below exemplifies the approach: it sets (or updates, if already set) the time at which a specific file will be archived:

import requests
import json

from config import PROJECT_ID, DATA_ID, API_KEY

url_get="https://ica.illumina.com/ica/rest/api/projects/" + PROJECT_ID + "/data/" + DATA_ID

# set the API get headers
headers = {
            'X-API-Key': API_KEY,
            'accept': 'application/vnd.illumina.v3+json'
            }

# set the API put headers
headers_put = {
            'X-API-Key': API_KEY,
            'accept': 'application/vnd.illumina.v3+json',
            'Content-Type': 'application/vnd.illumina.v3+json'
            }

# Helper function to insert willBeArchivedAt after field named 'region'
def insert_after_region(details_dict, timestamp):
    new_dict = {}
    for k, v in details_dict.items():
        new_dict[k] = v
        if k == 'region':
            new_dict['willBeArchivedAt'] = timestamp
    if 'willBeArchivedAt' in details_dict:
        new_dict['willBeArchivedAt'] = timestamp
    return new_dict

# 1. Make the GET request
response = requests.get(url_get, headers=headers)
response_data = response.json()

# 2. Modify the JSON data
timestamp = "2024-01-26T12:00:04Z"  # Replace with the desired archive timestamp (UTC, ISO 8601)
response_data['data']['details'] = insert_after_region(response_data['data']['details'], timestamp)

# 3. Make the PUT request
put_response = requests.put(url_get, data=json.dumps(response_data), headers=headers_put)
print(put_response.status_code)

To delete a file at a specific time point, the key 'willBeDeletedAt' should be added or changed in the same way using the API. When run in a terminal, a successful request finishes with status code 200. In the ICA UI, you can check the details of the file to see the updated values for 'Time To Be Archived' (willBeArchivedAt) or 'Time To Be Deleted' (willBeDeletedAt).
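
As a minimal sketch (reusing url_get, headers, and headers_put from the snippet above, and assuming the same response layout), scheduling deletion instead of archiving could look like this:

import json
import requests

# Hypothetical deletion time (UTC). If field ordering matters in your setup,
# adapt the insert_after_region helper above instead of assigning the key directly.
deletion_timestamp = "2024-02-26T12:00:04Z"

# 1. Fetch the current file record
response = requests.get(url_get, headers=headers)
response_data = response.json()

# 2. Add or update the deletion timestamp in the file details
response_data['data']['details']['willBeDeletedAt'] = deletion_timestamp

# 3. Write the updated record back
put_response = requests.put(url_get, data=json.dumps(response_data), headers=headers_put)
print(put_response.status_code)  # 200 indicates success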


Link Project Data

Linking a folder creates a dynamic read-only view of the source data. You can use this to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required.

You can recognise linked data by the green color and see the owning project as part of the details.

Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.

Linking data is only possible from the root folder of your destination project. The action is disabled in project subfolders.

Linking a parent folder after linking a file or subfolder will unlink the file or subfolder and link the parent folder. So root\linked_subfolder will become root\linked_parentfolder\linked_subfolder.

Migrating snapshot-linked data (linked before ICA release v.2.29)

Before ICA version v.2.29, when data was linked, a snapshot was created of the file and folder structure. These links created a read-only view of the data as it was at the time of linking, but did not propagate changes to the file and folder structure. If you want to use the advantages of the new way of linking with dynamic updates, unlink the data and relink it. Since snapshot linking has been deprecated, all new data linking done in ICA v.2.29 or later has dynamic content updates.

Initial linking can take considerable time when there is a large amount of source data. However, once the initial link is made, updates to the source data will be instantaneous.

You can perform analysis on data from other projects by linking data from that project.

  1. Select Projects > your_project > Data > Manage, and then select Link.

  2. To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.

  3. Select the checkbox next to the file or files to add.

  4. Select Select Data.

Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.

Display Owning Project

If you have selected multiple owning projects, you can add the owning project column to see which project owns the data.

  1. At the top of the screen, next to the filter icon, select the three-column symbol.

  2. The Add/remove columns tab will appear.

  3. Choose Owning Project (or Linked Projects)

Linking Folders

If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > Activity > Batch Jobs screen.

To see more details, double-click the batch job.

To see how many individual files are already linked, double-click the item.


Unlinking Project Data

To unlink data, go to the root level of your project and select the linked folder, or, if you have linked individual files separately, select those linked files (limited to 100 at a time), and then select Manage > Unlink. As with linking a folder, the progress of unlinking can be monitored at Projects > your_project > Activity > Batch Jobs.


Non-indexed Folders

The GUI considers non-indexed folders as a single object. You can access the contents of a non-indexed folder

  • as Analysis input/output

  • in Bench

  • via the API

  • Creation: Yes.

  • Deletion: Yes. You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete, or with the /api/projects/{projectId}/data/{dataId}:delete endpoint.

  • Uploading Data: via API, Bench and Analysis. Use non-indexed folders as normal folders for Analysis runs and Bench. Different methods are available with the API, such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl.

  • Downloading Data: Yes. Use non-indexed folders as normal folders for Analysis runs and Bench. Use temporary credentials to list and download data with the API.

  • Analysis Input/Output: Yes. Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.

  • Bench: Yes. Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.

  • Viewing: No. The folder is a single object; you can not view the contents.

  • Linking: Yes. You cannot see non-indexed folder contents.

  • Copying: No. Prohibited to prevent storage issues.

  • Moving: No. Prohibited to prevent storage issues.

  • Managing tags: No. You cannot see non-indexed folder contents.

  • Managing format: No. You cannot see non-indexed folder contents.

  • Use as Reference Data: No. You cannot see non-indexed folder contents.

Activity

The Activity view shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.

Data Transfers

The Data Transfers tab shows the status of data uploads and downloads. You can sort, search and filter on various criteria and export the information. Show ongoing transfers (top right) allows you to filter out the completed and failed transfers to focus on current activity.

Base Jobs

The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., Copy table, export table, Select * from table, etc.)

The jobs are shown with their:

  • Creation time: When did the job start

  • Description: The query or the performed action with some extra information

  • Type: Which action was taken

  • Status: Failed or succeeded

  • Duration: How long the job took

  • Billed bytes: The used bytes that need to be paid for

Failed jobs provide information on why the job failed. Details are accessed by double-clicking the failed job. Jobs in progress can be aborted here.

Base Activity

The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table). Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown and available for download as Excel or JSON. To get the data for the last year without a limit on the number of records, use the export as file function. No activity data is retained for more than one year.

The activities are shown with:

  • Start Time: The moment the action was started

  • Query: The SQL expression.

  • Status: Failed or succeeded

  • Duration: How long the job took

  • User: The user that requested the action

  • Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K

Bench Activity

The Bench Activity tab shows the actions taken on Bench Workspaces in the project.

The activities are shown with:

  • Workspace: Workspace where the activity took place

  • Date: Date and time of the activity

  • User: User who performed the activity

  • Action: Which activity was performed

Batch Jobs

The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.

Which batch jobs are visible depends on the user role.

  • Project Creator: All batch jobs

  • Project Collaborator (same tenant): All batch jobs

  • Project Collaborator (different tenant): Only batch jobs of own tenant

Reference Data

To use a reference set from within a project, you first have to add it. From the project's page, select Flow > Reference Data > Manage > +Add to project. Then select a reference set to add to your project. You can select the entire reference set, or click the arrow next to it to expand it. After expanding, scroll to the right to see the individual reference files in the set. You can select individual reference files to add to your project by checking the boxes next to them.

Note: Reference sets are only supported in Graphical CWL pipelines.

Copying Reference Data to other Regions

  1. Navigate to Reference Data (outside of Project context).

  2. Select the data set(s) you wish to add to another region and select Actions > Copy to another project.

  3. Select a project located in the region where you want to add your reference data.

  4. You can check in which region(s) Reference data is present by double-clicking on individual files in the Reference set and viewing Copy Details on the Data details tab.

  5. Allow a few minutes for new copies to become available before use.

Note: You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.

Creating a Pipeline with Reference Data

To create a pipeline with reference data, use CWL graphical mode (important restriction: as of now you cannot use reference data for pipelines created in advanced mode). Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end user to choose from and a default selection. You can select more than one file, but only one at a time (so repeat the process to select multiple reference files). If you only select one reference file, that file will be the only one users can use with your pipeline.

If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings.

After clicking the magnifying glass icon, the user can select from the provided options.

Flow

Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.

You can configure the following components in Illumina Connected Analytics Flow:

Pipelines

A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.

Linking Existing Pipelines

Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. This is not a copy but the actual pipeline, so any changes to the pipeline are automatically propagated to and from any project which has this pipeline linked.

Activation codes are tokens which allow you to run your analyses and are used for accounting and allocating the appropriate resources. ICA will automatically determine the best matching activation code, but this can be overwritten if needed.

If you unlink a pipeline, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.

There is no way to permanently delete a pipeline.


Create a Pipeline

Pipelines are created and stored within projects.

  1. Navigate to Projects > your_project > Flow > Pipelines > +Create.

  2. Select Nextflow (XML / JSON), CWL Graphical or CWL code (XML / JSON) to create a new Pipeline.

  3. Configure pipeline settings in the pipeline property tabs.

  4. When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.

  5. Select Save.

Pipelines use the tool definitions as they were when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. In order to update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it to the pipeline.

Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.

Pipeline Status

You can edit pipelines while they are in Draft or Release Candidate status. Once released, pipelines can no longer be edited.


Pipeline Properties

The following sections describe the properties that can be configured in each tab of the pipeline editor.

Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL, you can choose how to define the pipeline; Nextflow is always defined in code mode.

Any additional source files related to your pipeline will be displayed here in alphabetical order.

See the following pages for language-specific details for defining pipelines:


Details

The details tab provides options for configuring basic information about the pipeline.

The following information becomes visible when viewing the pipeline details.

The clone action is shown at the top right of the pipeline details. Cloning a pipeline allows you to make modifications without impacting the original pipeline. When you clone a pipeline, you become the owner of the cloned pipeline and must give it a unique name, because duplicate names are not allowed within all projects of the tenant. You may still see the same pipeline name twice when a pipeline linked from another tenant is cloned under that name in your tenant; the name remains unique per tenant, but both will be visible in your tenant.

When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.

Documentation

The Documentation tab is the place where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.

Definition (Graphical)

When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.

In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.

XML Configuration / JSON Inputform Files (Code)

This page is used to specify all relevant information about the pipeline parameters.

When you have multiple reports matching your regular expression, a maximum of 200 reports per report pattern will be shown.

Compute Resources

Compute Nodes

For each process defined by the workflow, ICA will launch a compute node to execute the process.

  • For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.

  • When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.

  • When no type is specified, the default type of compute node is standard-small.

By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.

For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.

Scratch space

If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch.

  • For Nextflow pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB' will reserve 1 TiB.

  • For CWL, adding - class: ResourceRequirement tmpdirMin: 5000 to your requirements section will reserve 5000 MiB of scratch space.

Avoid the following as it does not align with ICAv2 scratch space configuration.

  • Container overlay tmp path: /tmp

  • Legacy paths: /ephemeral

  • Environment Variables ($TMPDIR, $TEMP and $TMP)

  • Bash Command mktemp

  • CWL runtime.tmpdir

Compute Types

Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.

(2) The compute type himem-xlarge has low availability.

(3) The compute type fpga-large is only available in the US (use1) region. This compute type is not recommended as it suffers from low availability and offers little performance benefit over fpga-medium at significant additional cost.

(4) The transfer size selected is based on the selected storage size for compute type and used during upload and download system tasks.

Analysis Report (Graphical)

The pipeline analysis report appears in the pipeline execution results. The report is configured from widgets added to the Analysis Report tab in the pipeline editor.

  1. [Optional] Import widgets from another pipeline.

    1. Select Import from other pipeline.

    2. Select the pipeline that contains the report you want to copy.

    3. Select an import option: Replace current report or Append to current report.

    4. Select Import.

  2. From the Analysis Report tab, select Add widget, and then select a widget type.

  3. Configure widget details.

  4. Select Save.

Free Text Placeholders

The following placeholders can be used to insert project data.

Nextflow/CWL Files (Code)

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:

  • DIFF (.diff)

  • GROOVY (.groovy .nf)

  • JAVASCRIPT (.js .javascript)

  • JSON (.json)

  • SH (.sh)

  • SQL (.sql)

  • TXT (.txt)

  • XML (.xml)

  • YAML (.yaml .cwl)

If the file type is not recognized, it will default to text display. This can result in the application interpreting binary files as text when trying to display the contents.

Main.nf (Nextflow code)

The Nextflow project main script.

Nextflow.config (Nextflow code)

The Nextflow configuration settings.

Workflow.cwl (CWL code)

The Common Workflow Language main script.

Adding Files

Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.

Metadata Model

Report

Here patterns for detecting report files in the analysis output can be defined. On opening an analysis result window of this pipeline, an additional tab will display these report files. The goal is to provide a pipeline-specific user-friendly representation of the analysis result.

To add a report, select the + symbol on the left side. Provide your report with a unique name, a regular expression matching the report, and optionally the format of the report. This must be the source format of the report data generated during the analysis.

When you have multiple reports matching your regular expression, a maximum of 20 reports per report pattern will be shown.


Start a New Analysis

Use the following instructions to start a new analysis for a single pipeline.

  1. Select Projects > your_project > Flow > Pipelines.

  2. Select the pipeline or pipeline details of the pipeline you want to run.

  3. Select Start Analysis.

  4. Select Start Analysis.

  5. View the analysis status on the Analyses page.

    • Requested—The analysis is scheduled to begin.

    • In Progress—The analysis is in progress.

    • Succeeded—The analysis is complete.

    • Failed —The analysis has failed.

    • Aborted — The analysis was aborted before completing.

  6. To end an analysis, select Abort.

  7. To perform a completed analysis again, select Re-run.

Analysis Tab

The Analysis tab provides options for configuring basic information about the analysis.

Aborting Analyses

You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).

View Analysis Results

You can view analysis results on the Analyses page or in the output_folder on the Data page.

  1. Select a project, and then select the Flow > Analyses page.

  2. Select an analysis.

  3. On the Details tab, select the square symbol to the right of the output files.

  4. From the output files view, expand the list and select an output file.

    1. If you want to add or remove any user or technical tags, you can do so from the data details view.

    2. If you want to download the file, select Schedule download.

  5. To preview the file, select the View tab.

  6. Return to Flow > Analyses > your_analysis.

  7. View additional analysis result information on the following tabs:

    • Details - View information on the pipeline configuration.

    • Steps - stderr and stdout information

    • Nextflow Timeline - Nextflow process execution timeline.

    • Nextflow Execution - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.

CWL

Compute Type

To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.

For example, take the following ResourceRequirements:

This would result in a best fit of standard-large ICA Compute Type request for the tool.

  • If the specified requirements can not be met by any of the presets, the task will be rejected and failed.

  • FPGA requirements can not be set by means of CWL ResourceRequirements.

  • The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.

Considerations

If no Docker image is specified, Ubuntu will be used as the default. Both : and / can be used as separators.

CWL Overrides

In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.

XML Input Form

Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.

The input form XML must adhere to the input form schema.

Empty Form

During the creation of a Nextflow pipeline the user is given an empty form to fill out.

Files

The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains the following attributes:

  • code: a unique id. Required.

  • format: the format of the input (FASTA, TXT, JSON, UNKNOWN, etc.). Multiple entries are possible (see the example below). Required.

  • type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.

  • required: is this input required for the execution of a pipeline? Required.

  • multiValue: are multiple files as an input allowed? Required.

  • dataFilter: TBD. Optional.

Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.

Single file input

An example of a single file input which can be in a TXT, CSV, or FASTA format.

Folder as an input

To use a folder as an input the following form is required:

Multiple files as an input

For multiple files, set the attribute multiValue to true. The variable is then treated as a list ([]), so adapt your pipeline when changing from single value to multiValue.

Settings

Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:

  • code: unique id. This is the parameter name that is passed to the workflow

  • minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.

  • maxValues: how many values (at most) should be specified for this setting

  • classification: is this setting specified by the user?

In the code below a string setting with the identifier inp1 is specified.

Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.

Integers

For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.

Options

Options types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.

Option types can also be used to specify a boolean, for example

Strings

For a string setting the following schema with an element stringType is to be used.

Booleans

For a boolean setting, booleanType can be used.

Limitations

One known limitation of the schema presented above is the inability to specify a parameter that can have multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for File input and one for String input. At the moment, the ICA UI doesn't validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.

Below you can find both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.

Tool Repository

A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.

Create a Tool

Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical CWL pipelines by any project in the account.

  1. Select System Settings > Tool Repository > + Create.

  2. Configure tool settings in the tool properties tabs. See Tool Properties.

  3. Select Save.

Tool Properties

The following sections describe the tool properties that can be configured in each tab.

Details Tab

Tool Status

The release status of the tool can be one of "Draft", "Release Candidate", "Released", or "Deprecated". The Building and Build Failed options are set by the application and not during configuration.

General Tab

The General tab provides options to configure the basic command line.

The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.

  • Inline Javascript

The Tool contains a property with a JavaScript expression to resolve its value.

  • Initial workdir

    • The workdir can be any of the following types:

      • String or Expression — A string or JavaScript expression, e.g., $(inputs.InputFASTA)

      • File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}

      • Dirent — A script in the working directory. The Entry name field specifies the file name.

  • Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.

Arguments Tab

The Arguments tab provides options to configure base command parameters that do not require user input.

Tool arguments may be one of two types:

  • String or Expression — A literal string or JavaScript expression, e.g., --format=bam.

  • Binding — An argument constructed from the binding of an input parameter.

The following table describes the argument input fields.

Example

Inputs Tab

The Inputs tab provides options to define the input files and folders for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.

Settings Tab

The Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.

Outputs Tab

The Outputs tab provides options to define the parameters of output files.

The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.

Edit a Tool

  1. From the System Settings > Tool Repository page, select a tool.

  2. Select Edit.

Update Tool Status

  1. From the System Settings > Tool Repository page, select a tool.

  2. Select the Information tab.

  3. From the Status drop-down menu, select a status.

  4. Select Save.

Creating definitions without the wizard

In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition on the right hand side of the screen when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.

Be careful when editing the raw tool definition as this can introduce errors.

A simple example CWL Tool definition is provided below.

After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.

Creating Your First Tool - Tips and Tricks

  • General Tool - includes your base command and various optional configurations.

    • The base command is required for your tool to run, e.g. python /path/to/script.py, where python and /path/to/script.py are added on separate lines.

    • Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.

    • Initial workdir requirement - Dirent Type

      • Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or defined using a Dirent. Defining a script via a Dirent allows you to modify your script without updating your Docker image. To define your Dirent script, enter your script name under Entry name (e.g. runner.sh) and the script content under Entry. Then point your base command to that custom script, e.g. bash runner.sh (see the sketch after this list).

The difference between Settings and Arguments: Settings are exposed at the pipeline level and can be modified at launch, while Arguments are intended to be immutable and hidden from users launching the pipeline.

  • How to reference your tool inputs and settings throughout the tool definition?

    • You can either reference your inputs using their position or ID.

      • Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)

      • All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
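
To tie these tips together, the following is a minimal sketch of a tool definition that runs a Dirent script and references one input by position and one setting by ID. All names used here (runner.sh, InputFile, InputSetting) are placeholders for illustration only.

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [bash, runner.sh]
requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: runner.sh
        entry: |
          #!/bin/bash
          # $1 is the file input bound by position; the setting below is resolved by its ID
          echo "Processing $1 with setting $(inputs.InputSetting)"
inputs:
  InputFile:
    type: File
    inputBinding:
      position: 1
  InputSetting:
    type: string
    default: exampleValue
outputs:
  toolout:
    type: stdout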

Nextflow

In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.

System Information

(*) Pipelines will still run when 20.10.0 is deprecated, but you will no longer be able to choose it when creating new pipelines.

Nextflow Version

You can select the Nextflow version while building a pipeline as follows:

Compute Node

For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
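
For Nextflow pipelines, a common way to request the economy tier is to add the lifecycle key as an additional pod annotation next to the presetSize annotation shown later in this section. The snippet below is a sketch under that assumption; the process name is a placeholder, so verify the behaviour against your own pipeline.

process exampleEconomyProcess {
    // placeholder process; both lines are pod annotations interpreted by the ICA scheduler
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    pod annotation: 'scheduler.illumina.com/lifecycle', value: 'economy'

    """
    echo "running on an economy (spot) compute node"
    """
}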

Compute Type

Inputs

Outputs

Nextflow version 20.10.10 (Deprecated)

For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir process due to insufficient disk space, resulting in incomplete output delivery.

Solutions:

Nextflow Configuration

Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.

If no Docker image is specified, Ubuntu will be used as default.
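
To avoid falling back to the default image, you can pin a container explicitly, either globally in nextflow.config or per process. A minimal sketch, using the public Ubuntu image that appears in other examples in this documentation (substitute your own image):

// nextflow.config: default container for every process
process.container = 'public.ecr.aws/lts/ubuntu:22.04'

// or per process in the pipeline code
process sayHello {
    container 'public.ecr.aws/lts/ubuntu:22.04'

    """
    echo hello
    """
}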

The following configuration settings will be ignored if provided as they are overridden by the system:

Projects

Introduction

When looking at the main ICA navigation, you will see the following structure:

  • Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.

  • Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.

  • Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.

  • Audit/Event Logs are used for audit purposes and issue resolution.

  • System Settings contain general information such as the location of storage space, Docker images and tool repositories.

Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.

There is a combined limit of 30,000 projects and bundles per tenant.

Create new Project

To create a new project, click the Projects > + Create button.

Required fields include:

  • Name

    • 1-255 characters

    • Must begin with a letter

    • Characters are limited to alphanumerics, hyphens, underscores, and spaces

  • Project Owner: The owner (and usually the contact person) of the project. The project owner has the same rights as a project administrator, but cannot be removed from a project without first assigning another project owner. This can be done by the current project owner, the tenant administrator or a project administrator of the current project. Reassignment is done at Projects > your_project > Project Settings > Team > Edit.

  • Region: Select your project location. The options available are based on the Entitlement(s) associated with your purchased subscription.

  • Analysis Priority (Low/Medium (default)/High): Priority is balanced per tenant, with high-priority analyses started first and the system progressing to the next lower priority once all higher-priority analyses are running. Balance your priorities so that lower-priority projects do not remain waiting for resources indefinitely.

  • Billing Mode: Select whether the costs of this project are charged to the tenant of the project owner or to the tenant of the user who is using the project.

  • Storage Bundle: Auto-selected based on the selected project location (Region).

Click the Save button to finish creating the project. The project will be visible from the Projects view.

Create with Storage Configuration

During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.

With a storage configuration set, a project will have a 2-way sync with the external cloud storage provider: any data added directly to the external storage will be sync'ed into the ICA project data, and any data added to the project will be sync'ed into the external cloud storage.

Managing Projects

Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.

  1. Searching is a case-insensitive wildcard filter: any project whose name contains the entered characters will be shown. Use * as a wildcard in searches. You can use brackets and the AND, OR and NOT operators, provided that you do not start the search with an operator (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax). Be aware that operators without search terms are blocked and will result in the error "Unexpected error occurred when searching for projects".

  2. Filter by Workgroup : Projects in ICA can be accessible for different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right which appears when a workgroup is selected.

  3. Hidden projects : You can hide projects (Projects > your_project > Details > Hide) which you no longer use. Hiding deletes the data in Base and Bench and is therefore irreversible.

    • You can still see hidden projects if you select this option and delete the data they contain at Projects > your_project > Data to save on storage costs.

    • If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.

  4. Favorites : By clicking on the star next to the project name in the tile view, you set a project as favourite. You can have multiple favourites and use the Favourites checkbox to only show those favourites. This prevents having too many projects visible.

  5. Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favourites. A single click will open the project.

  6. List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. A double-click is required to open the project.

Items which are shown in list view have an Export option at the bottom of the screen. You can choose to export the entire page or only the selected rows in CSV, JSON or Excel format.

If you are missing Projects

Externally-managed projects

Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project much in the same way as manually created projects in ICA. These projects are considered externally-managed projects, and a number of restrictions apply to which actions are allowed on them from within ICA. For example, you cannot delete or move externally-managed data. This is to prevent inconsistencies when these applications want to access their own project data.

When you create a folder with a name which already exists as an externally-managed folder, your project will contain that folder twice: once ICA-managed and once externally-managed, as S3 does not require unique folder names.

You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column, visible in the data list view of externally-managed projects at Projects > your_project > Data.

Projects are indicated as externally-managed in the projects overview screen by a project card with a light grey accent and a lock symbol followed by "managed by app".

Tutorial

Sharing

Storage

A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.

Refer to the following pages for instructions on setting up supported external cloud storage providers:

Credentials

The storage configuration requires credentials to connect to your storage. AWS uses the security credentials to authenticate and authorize your requests. You can enter these credentials at System Settings > Credentials > Create. Long-term access keys consist of an access key ID and a secret access key used as a set.

Fill out the following fields:

  • Type—The type of access credentials. This will usually be AWS user.

  • Name—Provide a name to easily identify your access key.

  • Access key ID—The access key you created.

  • Secret access key—Your related secret access key.

You can share the credentials you own with other users of your tenant. To do so select your credentials at System Settings > Credentials and choose Share.

Create a Storage Configuration

  1. In the ICA main navigation, select System Settings > Storage > Create.

  2. Configure the following settings for the storage configuration.

    • Type—Use the default value, eg, AWS_S3. Do not change.

    • Region—Select the region where the bucket is located.

    • Configuration name—You will use this name when creating volumes that reside in the bucket. The name must be between 3 and 63 characters long.

    • Description—Here you can provide a description for yourself or other users to identify this storage configuration.

    • Bucket name—Enter the name of your S3 bucket.

    • Key prefix [Optional]—You can provide a key prefix to allow only files inside the prefix to be accessible. The key prefix must end with "/".

    • If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 folder in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.

    • Using no key prefix results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects. Although possible, this configuration is not recommended.

    • Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.

    • Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.

  3. Select Save.

With the action Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.

The System Settings > Credentials > Share action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share action. Take into account that once shared, the storage cannot be unshared. Once your storage is used in a project, it can also no longer be deleted.

Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.

Storage Configuration Verification

Every 4 hours, ICA will verify the storage configuration and credentials to ensure availability. When an error is detected, ICA will attempt to reconnect once every 15 minutes. After 200 consecutively failed connection attempts (50 hours), ICA will stop trying to connect.

When you update your credentials, the storage configuration is automatically validated. In addition, you can manually trigger revalidation when ICA has stopped trying to connect by selecting the storage and then clicking Validate on the System Settings > Storage > Manage page.

Supported Storage Classes

See

See

See

See

This section describes how to connect an AWS S3 Bucket with SSE-KMS Encryption enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on the Connect AWS S3 Bucket page.

Follow the AWS instructions for how to create an S3 bucket with an SSE-KMS key.

S3-SSE-KMS must be in the same region as your ICA v2.0 project. See the ICA S3 bucket documentation for more information.

If you do not have an existing customer managed key, click Create a KMS key and follow these steps from AWS.

Once the bucket is set, create a folder with encryption enabled in the bucket that will be linked in the ICA storage configuration. This folder will be connected to ICA as a key prefix. Although it is technically possible to use the root folder, this is not recommended as it will cause the S3 bucket to no longer be available for other projects.

Follow the general instructions for connecting an S3 bucket to ICA.

In the step "Create AWS IAM policy":

Follow the general instructions for how to create a storage configuration in ICA.

In addition to following the instructions to Enable Cross Account Copy, the KMS policy must include the following statement for an AWS S3 Bucket with SSE-KMS Encryption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN value):
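
The authoritative statement is provided in the linked instructions. As a representative sketch only, a key policy statement that lets the ICA role use the key typically has the following shape, with ASSUME_ROLE_ARN taken from the Role ARN table on the linked page (the Sid and the exact action list shown here are assumptions; follow the linked instructions for the definitive version):

{
  "Sid": "AllowICARoleToUseTheKey",
  "Effect": "Allow",
  "Principal": {
    "AWS": "ASSUME_ROLE_ARN"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}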

ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the AWS S3 documentation.)

Folders cannot be renamed after they have been created. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder. Please see the Move Data section for more information.

See the list of supported Data Formats.

Data privacy should be carefully considered when adding data in ICA, either through storage configurations (ie, AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and ensure uploads do not include unintended data in order to avoid unintentional privacy breaches. More guidance can be found in the ICA Security and Compliance section.

See Data Integrity.

Uploads via the UI are limited to 5TB and no more than 100 concurrent files at a time, but for practical and performance reasons, it is recommended to use the CLI or a Service Connector when uploading large amounts of data.

For instructions on uploading/downloading data via CLI, see CLI Data Transfer.

Copying data from your own S3 storage requires additional configuration. See Connect AWS S3 Bucket and SSE-KMS Encryption.

This partial move may cause data at the destination to become unsynchronized between the object store (S3) and ICA. To resolve this, users can create a folder session on the parent folder of the destination directory by following the steps in the API: Create Folder Session and then Complete Folder Session. Ensure the Move job is already aborted before submitting the folder session create and complete requests. Wait for the session status to complete.

Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see File/Folder Naming.

GET the file's information.

PUT the updated information back in ICA.

Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container: you can access the container which contains all the data, but you cannot access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.

You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder, or with the /api/projects/{projectId}/data:createNonIndexedFolder API endpoint.
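
As a sketch, the endpoint can be called with curl as shown below. The host, token and the folderName body field are illustrative assumptions; check the API reference for the exact base URL and request schema.

curl -X POST \
  "https://<ica-host>/ica/rest/api/projects/<projectId>/data:createNonIndexedFolder" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{ "folderName": "scratch" }'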

Transfers with a yellow background indicate that rules have been modified in ways that prevent planned files from being uploaded. Please verify your service connectors to resolve this.

Reference Data — Reference Data for Graphical CWL flows. See Reference Data.

Pipelines — One or more tools configured to process input data and generate output files. See Pipelines.

Analyses — Launched instance of a pipeline with selected input data. See Analyses.

You can link a pipeline if it is not already linked to your project and it is from your tenant or available in one of your bundles or activation codes.



Report - Shows the reports defined on the tab.

ICA supports running pipelines defined using the Common Workflow Language (CWL).

Refer to the Compute Types table for available compute types and sizes.

The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the Compute Types table).

ICA supports overriding workflow requirements at load time using the Command-Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the CWL overrides feature.

Refer to the CWL specification for further explanation about many of the properties described below. Not all features described in the specification are supported.


File/Folder inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information please refer to the CWL specification.
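
For example, a sketch of two arguments entries in the raw definition that pass the staged file path and derive an output name from the file input's fields (the input ID InputFile and the option names are placeholders):

arguments:
  - prefix: --input
    valueFrom: $(inputs.InputFile.path)
  - prefix: --output-filename
    valueFrom: $(inputs.InputFile.nameroot).out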

ICA supports running pipelines defined using Nextflow. See the tutorials for an example.


To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found in the Compute Types table. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).

Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpu and memory directives, so instead, you can dynamically set the pod directive, as mentioned above. For example:

Additionally, it can also be specified in the nextflow.config file. Example configuration file:

Inputs are specified via the XML configuration or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the XML configuration examples for an example.

Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.

Use "" instead of "copy" in the publishDir directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.

Use Nextflow 22.04.0 or later and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.
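
A minimal sketch combining both recommendations in a single process, assuming the failOnError publishDir option named above; the process name and output file are placeholders:

process collectResults {
    // symlink into the out folder and fail loudly if publishing goes wrong
    publishDir 'out', mode: 'symlink', failOnError: true

    output:
    path 'results.txt'

    """
    echo "done" > results.txt
    """
}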

During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command line or via a configuration file (see the Nextflow documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a Nextflow configuration file to be used when launching the pipeline.

On the project creation screen, add information to create a project. See the Create new Project section for information about each field.

Refer to the Storage documentation for details on creating a storage configuration.

Hiding projects is not possible for externally managed projects.

If you are missing projects, especially those created by other users, the workgroup filter might still be active. Clear the filter with the x on the right. You can verify the list of projects to which you have access with the icav2 projects list command.

You can, however, add and link data to externally managed projects. Separation of data is ensured by only allowing additional files at the root level or in dedicated subfolders which you can create in your projects. Data which you have added can be moved and deleted again.

You can add to externally managed projects, provided those bundles do not come with additional restrictions for the project.

You can start workspaces in externally-managed projects. The resulting data will be stored in the externally-managed project.

Tertiary modules such as Base are not supported for externally-managed projects.

Externally-managed projects protect their notification subscriptions to ensure no user can delete them. It is possible to add your own subscriptions to externally-managed projects; see the notifications documentation for more information.

For a better understanding of how all components of ICA work, try the tutorials.

You can share links to your project and content within projects with people who have access to it. Sharing is done by copying the URL from your browser. This URL contains both the filters and the sort options which you have applied.

For more information, refer to the documentation.

ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used while creating a project.

Refer to the troubleshooting guide for more information.

ICA supports the following storage classes. Please see the AWS S3 documentation for more information on each:

Object Class | ICA Status
S3 Standard | Available
S3 Intelligent-Tiering | Available
S3 Express One Zone | Available
S3 Standard-IA | Available
S3 One Zone-IA | Available
S3 Glacier Instant Retrieval | Available
S3 Glacier Flexible Retrieval | Archived
S3 Glacier Deep Archive | Archived
Reduced redundancy (not recommended) | Available

If you are using S3 Intelligent-Tiering, which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
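
As a sketch, a bucket lifecycle configuration implementing those transitions could look like the following (for use with, for example, aws s3api put-bucket-lifecycle-configuration). The rule ID and empty prefix filter are placeholders, and Archive / Deep Archive are assumed to map to the GLACIER and DEEP_ARCHIVE storage classes:

{
  "Rules": [
    {
      "ID": "archive-then-deep-archive",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}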

Filtering

To add filters, select the funnel/filter symbol at the top right, next to the search field.

Filters are reset when you exit the current screen.

Sorting

To sort data, select the three vertical dots in the column header on which you want to sort and choose ascending or descending.

Sorting is retained when you exit the current screen.

Displaying Columns

To change which columns are displayed, select the three columns symbol and select which columns should be shown.

You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.

The displayed columns are retained when you exit the current screen.

Replace

Overwrites the existing data. Folders copy their data into an existing folder with existing files: existing files are replaced when a file with the same name is copied, new files are added, and the remaining files in the target folder remain unchanged.

Don't copy

The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.

Keep both

Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.

Status | Description
Draft | Fully editable draft.
Release Candidate | The pipeline is ready for release. Editing is locked but the pipeline can be cloned (top right in the details view) to create a new version.
Released | The pipeline is released. To release a pipeline, all tools of that pipeline must also be in released status. Editing a released pipeline is not possible, but the pipeline can be cloned (top right in the details view) to create a new editable version.

CWL Graphical

  • Details

  • Documentation

  • Definition

  • Analysis Report

  • Metadata Model

  • Report

CWL Code

  • Details

  • Documentation

  • Inputform files (JSON) or XML Configuration (XML)

  • CWL Files

  • Metadata Model

  • Report

Nextflow Code

  • Details

  • Documentation

  • Inputform Files (JSON) or XML Configuration (XML)

  • Nextflow files

  • Metadata Model

  • Report

ID | Unique identifier of the pipeline.
URN | Identification of the pipeline in Uniform Resource Name format.

Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small | 2 | 8 | standard-small | standard, small
standard-medium | 4 | 16 | standard-medium | standard, medium
standard-large | 8 | 32 | standard-large | standard, large
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge
standard-3xlarge | 64 | 256 | standard-3xlarge | standard, 3xlarge
hicpu-small | 16 | 32 | hicpu-small | hicpu, small
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium
hicpu-large | 72 | 144 | hicpu-large | hicpu, large
himem-small | 8 | 64 | himem-small | himem, small
himem-medium | 16 | 128 | himem-medium | himem, medium
himem-large | 48 | 384 | himem-large | himem, large
himem-xlarge2 | 92 | 700 | himem-xlarge | himem, xlarge
hiio-small | 2 | 16 | hiio-small | hiio, small
hiio-medium | 4 | 32 | hiio-medium | hiio, medium
fpga2-medium1 | 24 | 256 | fpga2-medium | fpga2, medium
fpga-medium | 16 | 244 | fpga-medium | fpga, medium
fpga-large3 | 64 | 976 | fpga-large | fpga, large
transfer-small4 | 4 | 10 | transfer-small | transfer, small
transfer-medium4 | 8 | 15 | transfer-medium | transfer, medium
transfer-large4 | 16 | 30 | transfer-large | transfer, large

Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small | 2 | 8 | standard-small | standard, small
standard-medium | 4 | 16 | standard-medium | standard, medium
standard-large | 8 | 32 | standard-large | standard, large
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge
hicpu-small | 16 | 32 | hicpu-small | hicpu, small
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium
hicpu-large | 72 | 144 | hicpu-large | hicpu, large
himem-small | 8 | 64 | himem-small | himem, small
himem-medium | 16 | 128 | himem-medium | himem, medium
himem-large | 48 | 384 | himem-large | himem, large
himem-xlarge2 | 92 | 700 | himem-xlarge | himem, xlarge
hiio-small | 2 | 16 | hiio-small | hiio, small
hiio-medium | 4 | 32 | hiio-medium | hiio, medium
fpga2-medium1 | 24 | 256 | fpga2-medium | fpga2, medium
fpga-medium | 16 | 244 | fpga-medium | fpga, medium
fpga-large3 | 64 | 976 | fpga-large | fpga, large
transfer-small4 | 4 | 10 | transfer-small | transfer, small
transfer-medium4 | 8 | 15 | transfer-medium | transfer, medium
transfer-large4 | 16 | 30 | transfer-large | transfer, large

Placeholder | Description
[[BB_PROJECT_NAME]] | The project name.
[[BB_PROJECT_OWNER]] | The project owner.
[[BB_PROJECT_DESCRIPTION]] | The project short description.
[[BB_PROJECT_INFORMATION]] | The project information.
[[BB_PROJECT_LOCATION]] | The project location.
[[BB_PROJECT_BILLING_MODE]] | The project billing mode.
[[BB_PROJECT_DATA_SHARING]] | The project data sharing settings.
[[BB_REFERENCE]] | The analysis reference.
[[BB_USERREFERENCE]] | The user analysis reference.
[[BB_PIPELINE]] | The name of the pipeline.
[[BB_USER_OPTIONS]] | The analysis user options.
[[BB_TECH_OPTIONS]] | The analysis technical options. Technical options include the TECH suffix and are not visible to end users.
[[BB_ALL_OPTIONS]] | All analysis options. Technical options include the TECH suffix and are not visible to end users.
[[BB_SAMPLE]] | The sample.
[[BB_REQUEST_DATE]] | The analysis request date.
[[BB_START_DATE]] | The analysis start date.
[[BB_DURATION]] | The analysis duration.
[[BB_REQUESTOR]] | The user requesting analysis execution.
[[BB_RUNSTATUS]] | The status of the analysis.
[[BB_ENTITLEMENTDETAIL]] | The used entitlement detail.
[[BB_METADATA:path]] | The value or list of values of a metadata field or multi-value fields.

Field | Entry
User Reference | The unique analysis name.
User tags | One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Pricing | Select a subscription to which the analysis will be charged.
Notification | Enter your email address if you want to be notified when the analysis completes.
Output Folder | Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen).
Input | Select the input files to use in the analysis (max. 50,000).
Settings | Provide input settings.

requirements:
    ResourceRequirement:
        https://platform.illumina.com/rdf/ica/resources:type: fpga
        https://platform.illumina.com/rdf/ica/resources:size: small 
        https://platform.illumina.com/rdf/ica/resources:tier: standard
requirements:
    ResourceRequirement:
      ramMin: 10240
      coresMin: 6
icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
  "tool-fqTOfa.cwl": {
    "requirements": {
      "EnvVarRequirement": {
        "envDef": {
          "MESSAGE": "override_value"
          }
        }                                       
       }
      }
    }
}' --type-input JSON --user-reference overrides-example
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <dataInputs>
    </dataInputs>
    <steps>
    </steps>
</pipeline>
        <pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>Input file</pd:label>
            <pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
        </pd:dataInput>
    <pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
         <pd:label>fastq folder path</pd:label>
        <pd:description>Providing Fastq folder</pd:description>
    </pd:dataInput>
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
    <pd:label>Tumor FASTQs</pd:label>
    <pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
    </pd:description>
</pd:dataInput>
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description>General parameters</pd:description>
            <pd:tool code="generalparameters">
                <pd:label>generalparameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
                    <pd:label>inp1</pd:label>
                    <pd:description>first</pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
    <pd:label>Seed Length</pd:label>
    <pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
    </pd:description>
    <pd:integerType minimumValue="10" maximumValue="50"/>
    <pd:value>21</pd:value>
</pd:parameter>
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
    <pd:label>Segmentation Algorithm</pd:label>
    <pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
    </pd:description>
    <pd:optionsType>
        <pd:option>CBS</pd:option>
        <pd:option>SLM</pd:option>
        <pd:option>HSLM</pd:option>
        <pd:option>ASLM</pd:option>
    </pd:optionsType>
    <pd:value>false</pd:value>
</pd:parameter>
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
    <pd:label>Map/Align Output</pd:label>
    <pd:description></pd:description>
    <pd:optionsType>
        <pd:option>BAM</pd:option>
        <pd:option>CRAM</pd:option>
    </pd:optionsType>
    <pd:value>BAM</pd:value>
</pd:parameter>
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
    <pd:label>Output File Prefix</pd:label>
    <pd:description></pd:description>
    <pd:stringType/>
    <pd:value>tumor</pd:value>
</pd:parameter>
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
    <pd:label>quick_qc</pd:label>
    <pd:description></pd:description>
    <pd:booleanType/>
    <pd:value></pd:value>
</pd:parameter>
nextflow.enable.dsl = 2

// Define parameters with default values
params.file = false
params.str = false

// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
    error "You must specify at least one input: --file or --str"
}

process printInputs {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    file(input_file)

    script:
    """
    echo "File contents:"
    cat $input_file
    """
}

process printInputs2 {

    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'

    input:
    val(input_str)

    script:
    """
    echo "String input: $input_str"
    """
}

workflow {
    if (params.file) {
        file_ch = Channel.fromPath(params.file)
        file_ch.view()
        str_ch = Channel.empty()
        printInputs(file_ch)
    }
    else {
        file_ch = Channel.empty()
        str_ch = Channel.of(params.str)
        str_ch.view()
        file_ch.view()
        printInputs2(str_ch)
    } 
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>Generic file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="general">
            <pd:label>General Options</pd:label>
            <pd:description locked="false"></pd:description>
            <pd:tool code="general">
                <pd:label locked="false"></pd:label>
                <pd:description locked="false"></pd:description>
                <pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
                    <pd:label>String</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value>string</pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

Status | Description
Draft | Fully editable draft.
Release Candidate | The tool is ready for release. Editing is locked but the tool can be cloned to create a new version.
Released | The tool is released. Editing is locked but the tool can be cloned to create a new version.
Deprecated | The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool. That is, it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.

Field | Entry
ID | CWL identifier field.
CWL version | The CWL version in use. This field cannot be changed.
Base command | Components of the command. Each argument must be added on a separate line.
Standard out | The name of the file that captures Standard Out (STDOUT) stream information.
Standard error | The name of the file that captures Standard Error (STDERR) stream information.
Requirements | The requirements for triggering an error message (see below).
Hints | The requirements for triggering a warning message (see below).

Field | Entry | Type
Value | The literal string to be added to the base command. | String or expression
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added. | Binding
Prefix | The string prefix. | Binding
Item separator | The separator that is used between array values. | Binding
Value from | The source string or JavaScript expression. | Binding
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument. | Binding
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually. | Binding

Field | Value
Prefix | --output-filename
Value from | $(inputs.inputSAM.nameroot).bam
Input file | /tmp/storage/SRR45678_sorted.sam
Output file | SRR45678_sorted.bam

Field | Entry
ID | The file ID.
Label | A short description of the input.
Description | A long description of the input.
Type | The input type, which can be either a file or a directory.
Input options | Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files | The required secondary files or directories.
Format | The input file format.
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix | The string prefix.
Item separator | The separator that is used between array values.
Value from | The source string or JavaScript expression.
Load contents | Populates the contents field with up to the first 64 KiB of text from the file.
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.

Field | Entry
ID | The setting ID.
Label | A short description of the input.
Description | A long description of the input.
Type | The input type, which can be Boolean, Int, Long, Float, Double or String.
Default Value | The default value to use if the tool setting is not available.
Input options | Optional indicates the input is optional. Multi value indicates there can be more than one value for the input.
Position | The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix | The string prefix.
Item separator | The separator that is used between array values.
Value from | The source string or JavaScript expression.
Separate | The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote | The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.

Field | Entry
ID | The file ID.
Label | A short description of the output.
Description | A long description of the output.
Type | The output type, which can be either a file or a directory.
Output options | Optional indicates the output is optional. Multi value indicates there is more than one output file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files | The required secondary files or folders.
Format | The output file format.
Globs | The pattern for searching file names.
Load contents | Populates the contents field with up to the first 64 KiB of text from the file.
Output eval | Evaluate an expression to generate the output value.

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: echo
inputs:
  message:
    type: string
    default: testMessage
    inputBinding:
      position: 1
outputs:
  echoout:
    type: stdout
baseCommand:
- echo

Nextflow version | 20.10.0 (deprecated *), 22.04.3, 24.10.2 (Experimental)
Executor | Kubernetes

GUI | Select the Nextflow version at Projects > your_project > flow > pipelines > your_pipeline > Details tab.
API | Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId". When not set, a default Nextflow version will be used for the pipeline.

pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'
process foo {
    // Assuming that params.compute_size is set to a valid size such as 'standard-small', 'standard-medium', etc.
    pod annotation: 'scheduler.illumina.com/presetSize', value: "${params.compute_size}"
}
// Set the default pod
pod = [
    annotation: 'scheduler.illumina.com/presetSize',
    value     : 'standard-small'
]

withName: 'big_memory_process' {
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'himem-large'
    ]
}

// Use an FPGA instance for dragen processes
withLabel: 'dragen' {
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'fpga-medium'
    ]
}
publishDir 'out', mode: 'symlink'
executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file



InputForm.json Syntax

{
  "$id": "#ica-pipeline-input-form",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ICA Pipeline Input Forms",
  "description": "Describes the syntax for defining input setting forms for ICA pipelines",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "fields": {
      "description": "The list of setting fields",
      "type": "array",
      "items": {
        "$ref": "#/definitions/ica_pipeline_input_form_field"
      }
    }
  },
  "required": [
    "fields"
  ],
  "definitions": {
    "ica_pipeline_input_form_field": {
      "$id": "#ica_pipeline_input_form_field",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "id": {
          "description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
          "type": "string",
          "pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
        },
        "type": {
          "type": "string",
          "enum": [
            "textbox",
            "checkbox",
            "radio",
            "select",
            "number",
            "integer",
            "data",
            "section",
            "text",
            "fieldgroup"
          ]
        },
        "label": {
          "type": "string"
        },
        "minValues": {
          "description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
          "type": "integer",
          "minimum": 0
        },
        "maxValues": {
          "description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "minMaxValuesMessage": {
          "description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
          "type": "string"
        },
        "helpText": {
          "type": "string"
        },
        "placeHolderText": {
          "description": "An optional short hint (a word or short phrase) to aid the user when the field has no value."
          "type": "string"
        },
        "value": {
         "description": "The value for the field. Can be an array for multi-value fields. 
          For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. 
          For 'integer' type values the value needs to between -100000000000000000 and 100000000000000000"
         },
        "minLength": {
          "type": "integer",
          "minimum": 0
        },
        "maxLength": {
          "type": "integer",
          "exclusiveMinimum": 0
        },
        "min": {
          "description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
          "type": "number"
        },
        "max": {
          "description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
          "type": "number"
        },
        "choices": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field_choice"
          }
        },
        "fields": {
          "description": "The list of setting sub fields for type fieldgroup",
          "type": "array",
          "items": {
            "$ref": "#/definitions/ica_pipeline_input_form_field"
          }
        },
        "dataFilter": {
          "description": "For defining the filtering when type is 'data'.",
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "nameFilter": {
              "description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
              "type": "string"
            },
            "dataFormat": {
              "description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
              "type": "array",
              "contains": {
                "type": "string"
              }
            },
            "dataType": {
              "description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
              "type": "string",
              "enum": [
                "file",
                "directory"
              ]
            }
          }
        },
        "regex": {
          "type": "string"
        },
        "regexErrorMessage": {
          "type": "string"
        },
        "hidden": {
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "emptyValuesAllowed": {
          "type": "boolean",
          "description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
        },
        "updateRenderOnChange": {
          "type": "boolean",
          "description": "When true, the onRender javascript function is triggered ech time the user changes the value of this field. Default is false."
        }
        "streamable": {
          "type": "boolean",
          "description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
        },
      },
      "required": [
        "id",
        "type"
      ],
      "allOf": [
        {
          "if": {
            "description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "textbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "checkbox"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "radio"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "select"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "number",
                  "integer"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "fields",
                  "choices",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "data"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "required": [
              "dataFilter"
            ],
            "propertyNames": {
              "not": {
                "enum": [
                  "fields",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "max",
                  "min",
                  "maxLength",
                  "minLength"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "section",
                  "text"
                ]
              }
            },
            "required": [
              "type"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "disabled",
                  "fields",
                  "updateRenderOnChange",
                  "classification",
                  "value",
                  "minValues",
                  "maxValues",
                  "minMaxValuesMessage",
                  "dataFilter",
                  "choices",
                  "regex",
                  "placeHolderText",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        },
        {
          "if": {
            "description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
            "properties": {
              "type": {
                "enum": [
                  "fieldgroup"
                ]
              }
            },
            "required": [
              "type",
              "fields"
            ]
          },
          "then": {
            "propertyNames": {
              "not": {
                "enum": [
                  "dataFilter",
                  "choices",
                  "placeHolderText",
                  "regex",
                  "regexErrorMessage",
                  "maxLength",
                  "minLength",
                  "max",
                  "min"
                ]
              }
            }
          }
        }
      ]
    },
    "ica_pipeline_input_form_field_choice": {
      "$id": "#ica_pipeline_input_form_field_choice",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "value": {
        "description": "The value which will be set when selecting this choice. Must be unique over the choices within a field",
        },
        "text": {
          "description": "The display text for this choice, similar as the label of a field. ",
          "type": "string"
        },
        "selected": {
          "description": "Optional. When true, this choice value is picked as default selected value. 
          As in selected=true has precedence over an eventual set field 'value'. 
          For clarity it's better however not to use 'selected' but use field 'value' as is used to set default values for the other field types. 
          Only maximum 1 choice may have selected true.",
          "type": "boolean"
        },
        "disabled": {
          "type": "boolean"
        },
        "parent": {
          "description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
        }
      },
      "required": [
        "value",
        "text"
      ]
    }
  }
}

Snowflake

User

Every Base user has one Snowflake username: ICA_U_<id>

User/Project-Bundle

For each user/project-bundle combination, a role is created: ICA_UR_<id>_<project/bundle name>__<id>

This role receives the viewer or contributor role of the project/bundle, depending on their permissions in ICA.

Roles

Every project or bundle has a dedicated Snowflake database.

For each database, 2 roles are created:

  • <project/bundle name>_<id>_VIEWER

  • <project/bundle name>_<id>_CONTRIBUTOR

Project viewer role

This role receives

  • REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.

  • Grants on the viewer roles of the bundles linked to the project.

Project contributor role

This role receives the following rights on current and future objects in the PUBLIC schema of the project/bundle database:

  • ownership

  • select, insert, update, delete, truncate and references on tables/views/materialized views

  • usage on sequences/functions/procedures/file formats

  • write, read and usage on stages

  • select on streams

  • monitor and operate on tasks

It also receives grant on the viewer role of the project.

Warehouses

For each project (not bundle!), 2 warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.

  • <projectname>_<id>_QUERY

  • <projectname>_<id>_LOAD

Using Load instead of Query warehouse

When you generate an OAuth token, ICA always uses the QUERY warehouse by default (see the -w parameter in the example below):

snowsql -a iap.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_274853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_QUERY --token=<token>

If you wish to use the LOAD warehouse in a session, you have 2 options :

  1. Change the warehouse name in the connect string: snowsql -a iapdev.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_277853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_LOAD --token=<token>

  2. Execute the following statement after logging in: use warehouse ATESTBASE2_264891_LOAD

To determine which warehouse you are using, execute: select current_warehouse();
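
For example, after connecting with snowsql, the session can be switched to the LOAD warehouse and verified with the statements below (a minimal sketch using the warehouse name from the example above; substitute your own warehouse name):

```sql
-- Switch the current session to the LOAD warehouse (example name from above)
USE WAREHOUSE ATESTBASE2_264891_LOAD;

-- Verify which warehouse the session is currently using
SELECT CURRENT_WAREHOUSE();
```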

Code

The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.

Nextflow Version

User-selectable Nextflow version. Only available for Nextflow pipelines.

Categories

One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.

Description

A short description of the pipeline.

Proprietary

Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.

Status

Storage size

Family

A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.

Version comment

A description of changes in the updated version.

Links

External reference links (maximum 100 characters for the name and 2048 characters for the link).

Machine profiles

Shared settings

Settings for pipelines used in more than one tool.

Reference files

Descriptions of reference files used in the pipeline.

Input files

Descriptions of input files used in the pipeline.

Output files

Descriptions of output files used in the pipeline.

Tool

Details about the tool selected in the visualization panel.

Tool repository

A list of tools available to be used in the pipeline.

Title

Add and format title text.

Analysis details

Add heading text and select the analysis metadata details to display.

Free text

Inline viewer

Add options to view the content of an analysis output file.

Analysis comments

Add comments that can be edited after an analysis has been performed.

Input details

Add heading text and select the input details to display. The widget includes an option to group details by input name.

Project details

Add heading text and select the project details to display.

Page break

Add a page break widget where page breaks should appear between report sections.

Name

The name of the tool.

Description

Free text description for information purposes.

Icon

The icon for the tool.

Status

Docker image

The registered Docker image for the tool.

Categories

One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.

Tool version

The version of the tool specified by the end user. Could be any string.

Release version

The version number of the tool.

Version comment

A description of changes in the updated version.

Links

External reference links.

Documentation

The Documentation field provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.

Tips and Tricks

Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.

  • To reduce load on the shared network disk, you can run selected processes on local scratch storage and copy their inputs instead of symlinking them, for example:

    withName: 'process1|process2|process3' { scratch = '/scratch/' }
    withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to shared network disk
  • For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement a retry strategy. Adding the following snippet to 'nextflow.config' allows each job to be retried up to four times (five attempts in total), with an increasing delay between attempts.

    process {
        maxRetries = 4
        errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
    }

    Note: Adding the retry script where it is not needed might introduce additional delays.

  • When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.

  • To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. When parallel tasks are executed, their running time is only counted once (the longest task). As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, taking 25, 28, and 30 minutes respectively. Upon completion, the outputs are uploaded for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (the longest task of the three) + 1 = 86 minutes:

| Analysis task | request | queued | initializing | input download | single task | queue | parallel tasks | generating outputs | completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 96 hour limit | 1m (not counted) | 7m (not counted) | 2m (not counted) | 20m | 25m | 10m | 30m | 1m | - |
| Status in ICA | status requested | status queued | status initializing | status preparing inputs | status in progress | status in progress | status in progress | status generating outputs | status succeeded |

If there are no available resources or your project priority is low, the time before download commences will be substantially longer.

  • By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.

trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'

Base

Introduction to Base

Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Users are able to analyze, aggregate and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing and patient care. For this, all clinically relevant data generated from routine clinical testing needs to be extracted, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.

Base User Personas and Use Cases

Base can be used by different user personas supporting different use cases:

  • Clinical and Academic Researchers:

    • Big data storage solution housing all aggregated sample test outcomes

    • Analyze information by way of a convenient query formalism

    • Look for signals in combined phenotypic and genotypic data

    • Analyze QC patterns over large cohorts of patients

    • Securely share (sub)sets of data with other scientists

    • Generate reports and analyze trends in a straightforward and simple manner.

  • Bioinformaticians:

    • Access, consult, audit, and query all relevant data and QC information for tests run

    • All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis

    • Metadata is captured via automatic pipeline version tracking, including information on individual tools and/or reference files used during processing for each sample analyzed, information on the duration of the pipeline, the execution path of the different analytical steps, or in case of failure, exit codes can be warehoused.

  • Product Developers and Service Providers:

    • Better understand the efficiency of kits and tests

    • Analyze usage, understand QC data trends, improve products

    • Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, as well as allow renderings of test result outcome trends and much more.

Base Action Possibilities

  • Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.

  • Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation. All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.

  • Detect Signals and Patterns: extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information for a single individual or a group of samples with a specific variant. Virtually any possible combination of stored sample and patient information allow for detecting signals and patterns by a simple single query on the big data set.

  • Profile/Cluster patients: Use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, a next iteration of a clinical trial might be run with only patients that responded. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects by the use of a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information to be used for further research.

  • Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited; in this way, Base data can be shared with people inside and outside of an organization in a compliant and controlled fashion.

Access

Base is a module that can be found in a project. It is shown in the menu bar of the project.

To access Base:

  • On the domain level, Base needs to be included in the subscription

  • On the project level, the project owner needs to enable Base

  • On the user level, the project administrator needs to enable workgroups to access the Base pages

Permission to Enable Base

Access to activate the Base module is controlled by the subscription chosen when registering the account (full and premium subscriptions give access to Base). This happens automatically after the first user logs into the system for that account, so from the moment the account is up and running, the Base module is ready to be enabled.

Enable Base

When a user has created a project, they can go to the Base pages and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.

Only the project owner can enable Illumina Connected Analytics Base. Make sure that your subscription for the domain includes Base.

  1. Navigate to Projects > your_project > Base > Tables / Query / Schedule.

  2. Select Enable

Access Base pages

Access to the projects and all modules located within the project is provided via the Team page within the project.

Activity

JSON Scatter Gather Pipeline

Pay close attention to uppercase and lowercase characters when creating pipelines.

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.

Nextflow files

split.nf

First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.

process split {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path("split.*.tsv")
    
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
}

sort.nf

Next, select +Create and name the file sort.nf. Copy and paste the following definition.

process sort {
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path '*.sorted.tsv'
    
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}

merge.nf

Select +Create again and label the file merge.nf. Copy and paste the following definition.

process merge {
  cpus 1
  memory '512 MB'
 
  publishDir 'out', mode: 'move'
 
  input:
  path x
 
  output:
  path 'merged.tsv'
 
  """
  cat $x > merged.tsv
  """
}

main.nf

Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.

nextflow.enable.dsl=2
 
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
 
params.myinput = "test.test"
 
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}

Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened, so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.

Inputform files

On the Inputform files tab, edit the inputForm.json to allow selection of a file.

inputForm.json

{
  "fields": [
    {
      "id": "myinput",
      "label": "myinput",
      "type": "data",
      "dataFilter": {
        "dataType": "file",
        "dataFormat": ["TSV"]
      },
      "maxValues": 1,
      "minValues": 1
    }
  ]
}

Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.

The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.

onSubmit.js

function onSubmit(input) {
    var validationErrors = [];

    return {
        'settings': input.settings,
        'validationErrors': validationErrors
    };
}

onRender.js

function onRender(input) {

    var validationErrors = [];
    var validationWarnings = [];

    if (input.currentAnalysisSettings === null) {
        //null first time, to use it in the remainder of the javascript
        input.currentAnalysisSettings = input.analysisSettings;
    }

    switch(input.context) {
        case 'Initial': {
            renderInitial(input, validationErrors, validationWarnings);
            break;
        }
        case 'FieldChanged': {
            renderFieldChanged(input, validationErrors, validationWarnings);
            break;
        }
        case 'Edited': {
            renderEdited(input, validationErrors, validationWarnings);
            break;
        }
        default:
            return {};
    }

    return {
        'analysisSettings': input.currentAnalysisSettings,
        'settingValues': input.settingValues,
        'validationErrors': validationErrors,
        'validationWarnings': validationWarnings
    };
}

function renderInitial(input, validationErrors, validationWarnings) {
}

function renderEdited(input, validationErrors, validationWarnings) {
}

function renderFieldChanged(input, validationErrors, validationWarnings) {
}

function findField(input, fieldId){
    var fields = input.currentAnalysisSettings['fields'];
    for (var i = 0; i < fields.length; i++){
        if (fields[i].id === fieldId) {
            return fields[i];
        }
    }
    return null;
}

Click the Save button to save the changes.

Analyses

An Analysis is the execution of a pipeline.

Starting Analyses

You can start an analysis either from the dedicated analysis screen or from the actual pipeline.

From Analyses

  1. Navigate to Projects > Your_Project > Flow > Analyses.

  2. Select Start.

  3. Select a single Pipeline.

  4. Configure the analysis settings.

  5. Select Start Analysis.

  6. If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.

From Pipelines or Pipeline details

  1. Navigate to Projects > <Your_Project> > Flow > Pipelines

  2. Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.

  3. Select Start Analysis.

  4. Configure analysis settings.

  5. Select Start Analysis.

  6. If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.

Aborting Analyses

You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).

Rerunning Analyses

Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.

When rerunning an analysis, the user reference will be the original user reference (up to 231 characters), followed by _rerun_yyyy-MM-dd_HHmmss.

When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and will not fill out the parameters, as it cannot guarantee their validity for the new XML.

Some restrictions apply when trying to rerun an analysis.

| Analyses | Rerun | Rerun with modified parameters |
| --- | --- | --- |
| Analyses using external data | Allowed | - |
| Analyses using mount paths on input data | Allowed | - |
| Analyses using user-provided input json | Allowed | - |
| Analyses using advanced output mappings | - | - |
| Analyses with draft pipeline | Warn | Warn |
| Analyses with XML configuration change | Warn | Warn |

To rerun one or more analyses with the same settings:

  1. Navigate to Projects > Your_Project > Flow > Analyses.

  2. In the overview screen, select one or more analyses.

  3. Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.

To rerun a single analysis with modified parameters:

  1. Navigate to Projects > Your_Project > Flow > Analyses.

  2. In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.

  3. Select Rerun. (at the top right)

  4. Update the parameters you want to change.

  5. Select Start Analysis. The analysis will now be executed with the updated parameters.

Lifecycle

| Status | Description | Final State |
| --- | --- | --- |
| Requested | The request to start the Analysis is being processed | No |
| Queued | Analysis has been queued | No |
| Initializing | Initializing environment and performing validations for Analysis | No |
| Preparing Inputs | Downloading inputs for Analysis | No |
| In Progress | Analysis execution is in progress | No |
| Generating outputs | Transferring the Analysis results | No |
| Aborting | Analysis has been requested to be aborted | No |
| Aborted | Analysis has been aborted | Yes |
| Failed | Analysis has finished with error | Yes |
| Succeeded | Analysis has finished with success | Yes |

When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.

During analysis start, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.

Analysis steps logs

During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Steps tab is used to view the steps in near real time as they're produced by the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or less, though you can also use the grid layout for those by means of the tile/grid button at the top right of the analysis log tab.

There are system processes involved in the lifecycle of all analyses (e.g., downloading inputs, uploading outputs) and there are processes which are pipeline-specific, such as the processes which execute the pipeline steps. The table below describes the system processes. You can choose to display or hide these system processes with the Show technical steps option.

| Process | Description |
| --- | --- |
| Setup Environment | Validate analysis execution environment is prepared |
| Run Monitor | Monitor resource usage for billing and reporting |
| Prepare Input Data | Download and mount input data to the shared file system |
| Pipeline Runner | Parent process to execute the pipeline definition |
| Finalize Output Data | Upload output data |

Additional log entries will show for the processes which execute the steps defined in the pipeline.

Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.

| Timestamp | Description |
| --- | --- |
| Queue Date | The time when the process is submitted to the process scheduler for execution |
| Start Date | The time when the process has started execution |
| End Date | The time when the process has stopped execution |

The time between the Start Date and the End Date is used to calculate the duration. The time of the duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.

Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.

Analysis Cost

To see the price of an analysis in iCredits, look at Projects > your_project > Flow > Analyses > your_analysis > Details tab. The pricing section will show you the entitlement bundle, storage detail and price in iCredits once the analysis has succeeded, failed or been aborted.

Log Files

In the analysis output folder, the ica_logs subfolder will contain the stdout and stderr files.

If you delete these files, no log information will be available on the analysis details > Steps tab.

Log Streaming

Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.

Analysis Output Mappings

Currently, this feature is only available when launching analyses via the API.

Currently, only FOLDER type output mappings are supported.

By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:

  • the source path on the local disk of the analysis execution environment, relative to the working folder.

  • the data type, either FILE or FOLDER

  • the target project ID to direct outputs to; analysis launcher must have contributor access to the project.

  • the target path relative to the root of the project data to write the outputs.

If the output folder already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis.

Example

In this example, 2 analysis output mappings are specified. The analysis writes data during execution in the working directory at paths out/test1 and out/test2. The data contained in these folders is directed to the project with ID 4d350d0f-88d8-4640-886d-5b8a23de7d81 at paths /output-testing-01/ and /output-testing-02/ respectively, relative to the root of the project data.

The following demonstrates the construction of the request body to start an analysis with the output mappings described above:

```json
{
...
    "analysisOutput":
    [
        {
            "sourcePath": "out/test1",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-01/"
        },
        {
            "sourcePath": "out/test2",
            "type": "FOLDER",
            "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
            "targetPath": "/output-testing-02/"
        }
    ]
}
```

When the analysis completes, the outputs can be seen in the ICA UI, within the folders designated in the payload JSON during pipeline launch (output-testing-01 and output-testing-02).

You can jump from the Analysis Details output section to the individual files and folders by opening the detail view (projects > your_project > Flow > Analyses > your_analysis > Details tab > Output files section > your_output_file) and selecting open in data.

The Output files section of the analyses will always show the generated outputs, even when they have since been deleted from storage. This is done so you can always see which files were generated during the analysis. In this case it will no longer be possible to navigate to the actual output files.

Tags

You can add and remove tags from your analyses.

  1. Navigate to Projects > Your_Project > Flow > Analyses.

  2. Select the analyses whose tags you want to change.

  3. Select Manage > Manage tags.

  4. Edit the user tags, reference data tags (if applicable) and technical tags.

  5. Select Save to confirm the changes.

Both system tags and custom tags exist. User tags are custom tags which you set to help identify and process information, while technical tags are set by the system for processing. Both run-in and run-out tags are set on data to identify which analyses use the data. Connector tags determine data entry methods and reference data tags identify where data is used as reference data.

Hyperlinking

If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link will be <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>. Likewise, workflow sessions will use the syntax <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.

Restrictions

Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). You can have up to 50 concurrent analyses running per tenant. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions.

Troubleshooting

When your analysis fails, open the analysis details view (Projects > your_project > Flow > Analyses > your_analysis) and select display failed steps. This will give you the steps view filtered on the steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step will be displayed.

  • Exit code 55 indicates analysis failure due to an external event such as spot termination or node draining. Retry the analysis.

Data Catalogue

Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.

Available views

Project-level views

  • ICA_PIPELINE_ANALYSES_VIEW (Lists project-specific ICA pipeline analysis data)

  • ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (project-specific quality control metrics)

Tenant-level views

  • ICA_PIPELINE_ANALYSES_VIEW (Lists ICA pipeline analysis data)

  • CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)

  • CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)

  • CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)

  • CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)

  • ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)

Preconditions for view content

  • DRAGEN metrics will only have content when DRAGEN pipelines have been executed.

  • Analysis views will only have content when analyses have been executed.

Who can add or remove Catalogue data (views) to a project?

Members of a project who have both base contributor and project contributor or administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can remove the catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but cannot add them again.

Adding Catalogue data (views) to your project

To add Catalogue data,

  1. Go to Projects > your_project > Base > Tables.

  2. Select Add table > Import from Catalogue.

  3. A list of available views will be displayed. (Note that views which are already part of your project are not listed)

  4. Select the table you want to add and choose +Select

Catalogue data will have View as type, the same as tables which are linked from other projects.

Removing Catalogue data (views) from your project

To delete Catalogue data,

  1. go to Projects > your_project > Base > Tables.

  2. Select the table you want to delete and choose Delete.

  3. A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.

Catalogue table details (Catalogue Table Selection Screen)

  • View: The name of the Catalogue table.

  • Description: An explanation of which data is contained in the view.

  • Category: The identification of the source system which provided the data.

  • Tenant/project: Appended to the view name as _tenant or _project. Determines whether the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.

Catalogue table details (Table Schema Definition)

Querying views

In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.

The following query extracts

  • USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.

  • PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.

  • Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.

Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow': it allows the user to find steps which failed.
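
A minimal sketch of such a query is shown below. The element names inside the steps array (stepId, status, bpeResourceLifeCycle, bpeResourcePresetSize) and the reference and price fields follow the description above, but the exact names and types may differ in your view, so adjust them to the actual JSON structure.

```sql
-- Sketch: list failed, non-workflow steps together with analysis-level details
SELECT
    USER_NAME,
    PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference,
    PIPELINE_ANALYSIS_DATA:price             AS analysis_price,
    step.value:stepId::string                AS step_id,
    step.value:status::string                AS step_status,
    step.value:bpeResourceLifeCycle::string  AS resource_life_cycle,
    step.value:bpeResourcePresetSize::string AS resource_preset_size
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
     LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) step
WHERE step.value:status::string = 'FAILED'
  AND NOT CONTAINS(step.value:stepId::string, 'Workflow');
```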

Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts

  • Sample name from the file names.

  • Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.

The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:
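
A sketch of such a query is shown below. It assumes each element of the ANALYSIS_DATA array exposes the metric name and value in fields called name and value; adapt these to the actual JSON structure of the rows in your view.

```sql
-- Sketch: extract two coverage metrics per sample from the wgs_coverage_metrics files
SELECT
    REPLACE(FILE_NAME, '.wgs_coverage_metrics.csv', '') AS sample_name,
    metric.value:name::string                           AS metric_name,   -- assumed field name
    metric.value:value                                  AS metric_value   -- assumed field name
FROM DRAGEN_METRICS_VIEW_project,
     LATERAL FLATTEN(input => ANALYSIS_DATA) metric
WHERE FILE_NAME LIKE '%.wgs_coverage_metrics.csv'
  AND metric.value:name::string IN ('Aligned bases', 'Aligned bases in genome')
ORDER BY sample_name;
```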

Lastly, you can combine these views (or rather intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:

  • Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.

  • Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.

  • Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.
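
A sketch along these lines is shown below. It assumes the metric elements expose name and value fields, that the scRNA pipeline is identified by a pipelineId field inside PIPELINE_ANALYSIS_DATA, and that the two intermediate results can be joined on the analysis UUID; all of these names are assumptions and must be adapted to your data.

```sql
-- Sketch: join flattened scRNA metrics with pipeline analysis metadata
WITH flattened_dragen_scrna AS (
    SELECT
        ANALYSIS_UUID,
        metric.value:name::string AS metric_name,   -- assumed field name
        metric.value:value        AS metric_value   -- assumed field name
    FROM DRAGEN_METRICS_VIEW_project,
         LATERAL FLATTEN(input => ANALYSIS_DATA) metric
    WHERE FILE_NAME LIKE '%scRNA.metrics.csv'
      AND metric.value:name::string IN ('Invalid barcode read', 'Passing cells')
),
pipeline_table AS (
    SELECT
        PIPELINE_ANALYSIS_DATA:id::string        AS analysis_uuid,        -- assumed field name
        PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference,
        USER_NAME
    FROM ICA_PIPELINE_ANALYSES_VIEW_project
    WHERE PIPELINE_ANALYSIS_DATA:pipelineId::string = '<your scRNA pipeline ID>'  -- assumed field name
)
SELECT p.analysis_reference, p.USER_NAME, f.metric_name, f.metric_value
FROM flattened_dragen_scrna f
JOIN pipeline_table p
  ON f.ANALYSIS_UUID = p.analysis_uuid;
```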

An example how to obtain the costs incurred by the individual steps of an analysis

You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet, you can retrieve the costs of individual steps for every analysis run in the past week.
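
A minimal sketch of such a snippet is shown below. The price field on each step and the startDate field on the analysis are assumptions; adjust them to the fields actually present in PIPELINE_ANALYSIS_DATA.

```sql
-- Sketch: per-step costs for analyses run in the past week
SELECT
    PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference,
    step.value:stepId::string                AS step_id,
    step.value:price                         AS step_price   -- assumed field name
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
     LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) step
WHERE PIPELINE_ANALYSIS_DATA:startDate::timestamp >= DATEADD(day, -7, CURRENT_TIMESTAMP())  -- assumed field name
ORDER BY analysis_reference, step_id;
```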

Limitations

  • Data Catalogue views cannot be shared as part of a Bundle.

  • Data size is not shown for views because views are a subset of data.

  • By removing Base from a project, the Data Catalogue will also be removed from that project.

Best Practices

As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
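
As an illustration, a filtering query such as the following could be run on a tenant-level view, after which the result can be exported as a new table from the query result screen (the view and filter column are examples; adapt them to your Catalogue view):

```sql
-- Example filter on a tenant-level Catalogue view before exporting the result as a new table
SELECT *
FROM ICA_PIPELINE_ANALYSES_VIEW_tenant
WHERE PROJECT_NAME = 'MyProject';
```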

Schedule

On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.

When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.

Configure a schedule

There are different types of schedules that can be set up:

  • Files

  • Metadata

  • Administrative data.

Files

This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:

  • Name (required): The name of the scheduled job

  • Description: Extra information about the schedule

  • File name pattern (required): Define in this field a part or the full name of the file name or of the tag that the files you want to upload contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, … you can fill in _reads.txt in this field to have all files that contain _reads.txt imported to the table.

  • Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.

  • Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.

  • Write preference (required): Define data handling; whether it can overwrite the data

  • Data format (required): Select the data format of the files (CSV, TSV, JSON)

  • Delimiter (required): to indicate which delimiter is used in the delimiter separated file. If the delimiter is not present in list, it can be indicated as custom.

  • Active: The job will run automatically if checked

  • Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.

  • Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.

  • References: Choose which references must be added to the table

  • Advanced Options

    • Encoding (required): Select the encoding of the file.

    • Null Marker: Specifies a string that represents a null value in a CSV/TSV file.

    • Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.

    • Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.

      • If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.

      • If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.

Metadata

This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. So, for example, the following query will count how many times each of the pipelines was executed and sort it accordingly:
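
A minimal sketch of such a query is shown below, assuming the provenance table exposes the pipeline name in a PIPELINE_NAME column (check the actual column names in your BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL table):

```sql
-- Count executions per pipeline and sort by the most frequently executed
SELECT PIPELINE_NAME, COUNT(*) AS execution_count
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY execution_count DESC;
```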

To obtain a similar table for the failed runs, you can execute the following SQL query:
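
A corresponding sketch for failed runs, assuming the table also exposes a STATUS column and that failed analyses are recorded with the status value 'Failed':

```sql
-- Count failed executions per pipeline
SELECT PIPELINE_NAME, COUNT(*) AS failed_count
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE STATUS = 'Failed'   -- assumed status value
GROUP BY PIPELINE_NAME
ORDER BY failed_count DESC;
```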

When adding or editing this schedule you can define the following parameters:

  • Name (required): the name of this scheduled job

  • Description: Extra information about the schedule

  • Anonymize references: when selected, the references will not be added

  • Include sensitive meta data fields: in the meta data fields configuration, fields can be set to sensitive. When checked, those fields will also be added.

  • Active: the job will run automatically if ticked

  • Source (Tenant Administrators Only):

    • Project (default): All administrative data from this project will be added

    • Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.

Administrative data

This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.

When adding or editing this schedule the following parameters can be defined:

  • Name (required): The name of this scheduled job

  • Description: Extra information about the schedule

  • Anonymize references: When checked, any platform references will not be added

  • Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.

  • Active: The job will run automatically if checked

  • Source (Tenant Administrators Only):

    • Project (default): All administrative data from this project will be added

    • Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.

Delete schedule

Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.

Run schedule

When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables.

Query

Queries can be used for data mining. On the Projects > your_project > Base > Query page:

  • New queries can be created and executed

  • Already executed queries can be found in the query history

  • Saved queries and query templates are listed under the saved queries tab.

New Query

Available tables

All available tables and their details are listed on the New Query tab.

Metadata tables are created by syncing with the Base module. This synchronization is configured on the Details page within the project.

Input

Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.

Best practices and notes

Do not use queries such as ALTER TABLE to modify your table structure as it will go out of sync with the table definition and will result in processing errors.

  • When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name.

  • Case sensitive column names (such as the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.

  • The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'. As these are case-sensitive, the upper and lower casing must be respected.

  • For more information on queries, please also see the snowflake documentation: https://docs.snowflake.com/en/user-guide/

Querying data within columns.

Some tables contain columns with an array of values instead of a single value.

Querying data within an array

As of ICA version 2.27, there is a change in the use of capitals for ICA array fields. In previous versions, the data name within the array would start with a capital letter. As of 2.27, lowercase is used. For example ICA:Data_reference has become ICA:data_reference.

You can use the GET_IGNORE_CASE option to adapt existing queries when you have both data in the old syntax and new data in the lowercase syntax. The syntax is GET_IGNORE_CASE(Table_Name.Column_Name,'Array_field')

For example:

select ICA:Data_reference as MY_DATA_REFERENCE from TestTable becomes:

select GET_IGNORE_CASE(TESTTABLE.ICA,'Data_reference') as MY_DATA_REFERENCE from TestTable

You can also modify the data to have consistent capital usage by executing the query update YOUR_TABLE_NAME set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name') and repeating this process for all field names (Data_name, Data_reference, Execution_reference, Pipeline_name, Pipeline_reference, Sample_name, Sample_reference, Tenant_name and Tenant_reference).

Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:
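
A sketch of how such an array column can be queried is shown below; the NAME and CODE column names are placeholders for the first two fields of the example table.

```sql
-- Return one row per element of the ArrayField array, next to the other columns
SELECT
    t.NAME,                     -- placeholder column name
    t.CODE,                     -- placeholder column name
    elem.value AS array_element
FROM YOUR_TABLE_NAME t,
     LATERAL FLATTEN(input => t.ArrayField) elem;
```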

Query results

If the query is valid for execution, the result will be shown as a table underneath the input box. From within the result page of the query, it is possible to save the result in two ways:

  • Download: As Excel or JSON file to the computer.

  • Export: To a new table, view, or file within the project.

Run a new query

  1. Navigate to Projects > your_project > Base > Query.

  2. Enter the query to execute using SQL.

  3. Select »Run Query.

  4. Optionally, select Save Query to add the query to your saved queries list.

If the query takes more than 30 seconds without returning a result, a message will be displayed to inform you the query is still in progress and the status can be consulted on Projects > your_project > Activity > Base Jobs. Once this Query is successfully completed, the results can be found in Projects > your_project > Base > Query > Query History tab.

Query history

The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.

  1. Navigate to Projects > your_project > Base > Query.

  2. Select the Query History tab.

  3. Select a query.

  4. Perform one of the following actions:

    • Open Query—Open the query in the New Query tab. You can then select Run Query to execute the query again.

    • Save Query—Save the query to the saved queries list.

    • View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.

Saved Queries

All queries saved within the project are listed under the Saved Queries tab together with the query templates.

The saved queries can be:

  • Opened: This will open the query in the “New query” tab.

  • Saved as template: The saved query becomes a query template.

  • Deleted: The query is removed from the list and cannot be opened again.

The query templates can be:

  • Opened: This will open the query again in the “New query” tab.

  • Deleted: The query is removed from the list and cannot be opened again.

It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:

  • Account: The query template will be available for everyone within the account

  • User: The query template will be available for the user who created it

Run a saved Query

If you have saved a query, you can run the query again by selecting it from the list of saved queries.

  1. Navigate to Projects > your_project > Base > Query.

  2. Select the Saved Queries tab.

  3. Select a query.

  4. Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.

Shared database for project

Shared databases are displayed under the list of Tables as Shared Database for project <project name>.

Tables

All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.

Create a new Table

To create a new table, click Add table > New table on the Tables page. Tables can be created from scratch or from a template that was previously saved.

If you make a mistake in the order of columns when creating your table, then as long as you have not saved your table, you can switch to Edit as text to change the column order. The text editor can swap or move columns whereas the built-in editor can only delete columns or add columns to the end of the sequence. When editing in text mode, it is best practice to copy the content of the text editor to a notepad before you make changes because a corrupted syntax will result in the text being wiped or reverted when switching between text and non-text mode.

Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. The workaround is switching to text mode, copying the schema of the table you want to modify, and pasting it into a new empty table where the necessary changes can be made before saving.

Once created, do not try to modify your table column layout via the Query module as even though you can execute ALTER TABLE commands, the definitions and syntax of the table will go out of sync resulting in processing issues.

Empty Table

To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.

Table information

The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers, or underscores. The description is optional.

References

Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the schema (see next paragraph) which can contain references to the data on the platform:

  • data_reference: reference to the data element in the Illumina platform from which the record originates

  • data_name: original name of the data element in the Illumina platform from which the record originates

  • sample_reference: reference to the sample in the Illumina platform from which the record originates

  • sample_name: name of the sample in the Illumina platform from which the record originates

  • pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates

  • pipeline_name: name of the pipeline in the Illumina platform from which the record originates

  • execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates

  • account_reference: reference to the account in the Illumina platform from which the record originates

  • account_name: name of the account in the Illumina platform from which the record originates

Schema

In an empty table, you can create a schema by adding a field for each column of the table and defining it. The + Add field button is located to the right of the schema. At any time during the creation process, it is possible to switch to the edit as text mode and back. The text mode shows the JSON code, whereas the original view shows the fields in a table.

Each field requires:

  • a name – this has to be unique (*1)

  • a type

    • String – collection of characters

    • Bytes – raw binary data

    • Integer – whole numbers

    • Float – fractional numbers (*2)

    • Numeric – any number (*3)

    • Boolean – only options are “true” or “false”

    • Timestamp - Stores number of (milli)seconds passed since the Unix epoch

    • Date - Stores date in the format YYYY-MM-DD

    • Time - Stores time in the format HH:MI:SS

    • Datetime - Stores date and time information in the format YYYY-MM-DD HH:MI:SS

    • Record – has a child field

    • Variant - can store a value of any other type, including OBJECT and ARRAY

  • a mode

    • Required - Mandatory field

    • Nullable - Field is allowed to have no value

    • Repeated - Multiple values are allowed in this field (will be recognized as array in Snowflake)

(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your schema name as this will lead to SQL compilation errors.

(*2) Float values will be exported differently depending on the output format. For example, JSON will use scientific notation, so verify that your downstream processing supports this.

(*3) Defining the precision when creating tables with SQL is not supported as this will result in rounding issues.

From template

Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Save as template.

If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.

Table information

Table status

The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:

  • Available: Ready to be used, both with or without data

  • Pending: The system is still processing the table; there is probably a process running to fill the table with data

  • Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the “Show deleted tables” button

Additional Considerations

  • Tables created empty or from a template become available the fastest.

  • When copying a table with data, it can remain in the Pending status for a longer period of time.

  • Clicking on the page's refresh button will update the list.

Table details

For any available table, the following details are shown:

  • Table information: Name, description, number of records and data size

  • Schema definition: An overview of the table schema, also available in text. Fields can be added to the schema but not deleted. For deleting fields: copy the schema as text and paste in a new empty table where the schema is still editable.

  • Preview: A preview of the first 50 rows of the table (when data is uploaded into the table)

  • Source Data: the files that are currently uploaded into the table. You can see the Load Status of the files which can be Prepare Started, Prepare Succeeded or Prepare Failed and finally Load Succeeded or Load Failed. You can change the order and visible columns by hovering over the column headers and clicking on the cog symbol.

Table actions

From within the details of a table it is possible to perform the following actions related to the table:

  • Copy: Create a copy of this table in the same or a different project. To copy to another project, data sharing must be enabled in the details of the original project. The user also needs access to both the original and the target project.

  • Export as file: Export this table as a CSV or JSON file. The exported file can be found in a project where the user has the access to download it.

  • Save as template: Save the schema or an edited form of it as a template.

  • Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It’s also possible to load data into a table manually or automatically via a pre-configured job. This can be done on the Schedule page.

  • Delete: Delete the table.

Manually importing data to your Table

To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > +Add Data

Data selection

The data selection screen will show options to define the structure and location of your source data:

  • Write preference: Define if data can be written to the table only when the table is empty, if the data should be appended to the table, or if the table should be overwritten.

  • Data format (required): Select the format of the data which you want to import: CSV (comma-separated), TSV (tab-separated), or JSON (JavaScript Object Notation).

  • Delimiter: The delimiter used in the delimiter-separated file. If the required delimiter is not comma, tab, or pipe, select custom and define the custom delimiter.

  • Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.

  • Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.

  • References: Choose which references must be added to the table.

Most of the advanced options are legacy functions and should not be used. The only exceptions are

  • Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).

  • Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors.

    • If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.

    • If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.

At the bottom of the select data screen, you can select the data you manually want to upload. You can select local files, drop files via the browser or choose files from your project.

Data import progress

To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective actions if the input did not match the table design and try to run the import again (with a new copy of the file, as each input file can only be used once).

If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.

List of table data sources

To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This lists all the source data files, even those that failed to be imported. To prevent duplicate entries, these files cannot be used for another import.

How to load array data in Base

Base table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (e.g., String), or the Variant type.
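
As a minimal sketch (the table and column names are hypothetical), a column ingested with Repeated mode, or a Variant column holding a JSON array, can later be expanded with Snowflake's LATERAL FLATTEN so that each array element becomes its own row:

SELECT
    sample_id,
    f.value::STRING as tag
FROM
    my_table,
    LATERAL FLATTEN(input => tags) f;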

JSON-Based input forms

Introduction

Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).

To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.

Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.

  • inputForm.json contains the actual input form which is rendered when starting the pipeline run.

  • onRender.js is triggered when a value is changed.

  • onSubmit.js is triggered when starting a pipeline via the GUI or API.

Use + Create to add additional files and Simulate to test your inputForms.

Script execution supports cross-field validation of values, hiding fields, making them required, and so on, based on value changes.


inputForm.json

Parameter types

Parameter Attributes

These attributes can be used to configure all parameter types.

Tree structure example

"choices" can be used for a single list or for a tree-structured list. See below for an example for how to set up a tree structure.

Experimental Features


onSubmit.js

The onSubmit.js JavaScript function receives an input object which holds information about the chosen values of the input form, the pipeline, and the pipeline execution request parameters. This JavaScript function is triggered not only when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
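
As an illustration only (the field id is hypothetical, and the entry point should follow the onSubmit.js template of your pipeline), a typical use is cross-field validation that returns AnalysisError entries:

// Sketch: validate a hypothetical 'compression_level' setting.
function onSubmit(input) {
    var errors = [];
    // Single-value fields are exposed as the value itself, not as an array.
    var level = input.settings['compression_level'];
    if (level != null && (level < 1 || level > 9)) {
        errors.push({ fieldId: 'compression_level', message: 'Compression level must be between 1 and 9.' });
    }
    // Settings omitted from the response are considered unmodified.
    return { validationErrors: errors };
}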

Input parameters

Return values (taken from the response object)

AnalysisError

This is the object used for representing validation errors.


onRender.js

Receives an input object which contains information about the current state of the input form, the chosen values, and the field value change that triggered the onRender call. It also contains pipeline information. Changed objects are present in the onRender return value object; any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as the changed field.
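
The sketch below (field id hypothetical, assuming that field has updateRenderOnChange set to true; adapt the entry point to the onRender.js template of your pipeline) shows how such a script could react to a field change:

// Sketch: warn when a hypothetical 'sample_sheet' field is cleared.
function onRender(input) {
    var response = {};
    if (input.context === 'FieldChanged' && input.changedFieldId === 'sample_sheet') {
        if (!input.settingValues['sample_sheet']) {
            response.validationWarnings = [
                { fieldId: 'sample_sheet', message: 'No sample sheet selected; defaults will be used.' }
            ];
        }
    }
    // Objects not present in the return value are considered unmodified.
    return response;
}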

Input Parameters

Return values (taken from the response object)

RenderMessage

This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.

The of the pipeline.

User selectable for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.

available to use with Tools in the pipeline.

Add formatted free text. The widget includes options for placeholder variables that display the corresponding project values.

The release of the tool.

Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, configuration files can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the scratch directive to a set of process names (or labels) so that they use the higher-performance local scratch storage attached to an instance instead of the shared network disk.
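
For illustration (the label name is hypothetical), such a fragment in nextflow.config could look like this:

// Route processes labelled 'high_io' to node-local scratch storage
// instead of the shared network work directory.
process {
    withLabel: 'high_io' {
        scratch = true
    }
}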

When trying to test on the cloud, it is often beneficial to create scripts to automate the deployment and launching/monitoring process. This can be performed either using the ICA CLI or by creating your own scripts integrating with the REST API.

When hardening a Nextflow pipeline to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use a dynamic retry with backoff, which has an increasing backoff delay, allowing the system time to provide the necessary resources.

The status and history of Base activities and jobs are shown on the Activity page.

Let's create the Nextflow Scatter Gather pipeline with a JSON input form.

Refresh to see the analysis status. See the analysis lifecycle for more information on statuses.

View the analysis status on the Analyses page. See the analysis lifecycle for more information on statuses.

Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.

In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.

Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It is important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the record USER_ID in the SQL query below is "recycled".

If you want to query data from a table shared from another tenant (indicated in green), select the table to see the unique name. In the example below, the query will be select * from demo_alpha_8298.public.TestFiles


Export: As a new table, as a view, or as a file to the project in CSV (tab, pipe, or a custom delimiter is also allowed) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located in the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.

For ICA Cohorts customers, shared databases are available in a project Base instance. For more information on specific Cohorts shared database tables that are viewable, see Cohorts Base.

Be careful when naming tables that you want to use in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.

The JSON schema allowing you to define the input parameters. See the inputForm.json page for syntax details.

SELECT
    USER_NAME as user_name,
    PIPELINE_ANALYSIS_DATA:reference as reference,
    PIPELINE_ANALYSIS_DATA:price as price,
    PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
    f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
    f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
    f.value:bpeResourceType::STRING as bpeResourceType,
    f.value:completionTime::TIMESTAMP as completionTime,
    f.value:durationInSeconds::INT as durationInSeconds,
    f.value:price::FLOAT as price,
    f.value:pricePerSecond::FLOAT as pricePerSecond,
    f.value:startTime::TIMESTAMP as startTime,
    f.value:status::STRING as status,
    f.value:stepId::STRING as stepId
FROM
    ICA_PIPELINE_ANALYSES_VIEW_project,
    LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
    f.value:status::STRING = 'FAILED'
    AND f.value:stepId::STRING NOT LIKE '%Workflow%';
SELECT DISTINCT
    SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
    f.value:column_2::STRING as metric,
    f.value:column_3::FLOAT as value
FROM
    DRAGEN_METRICS_VIEW_project,
    LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
    FILE_NAME LIKE '%wgs_coverage_metrics.csv'
    AND (
        f.value:column_2::STRING = 'Aligned bases in genome'
        OR f.value:column_2::STRING = 'Aligned bases'
    )
ORDER BY
    sample_name;
WITH flattened_dragen_scrna AS (   
SELECT DISTINCT
    SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
    ANALYSIS_UUID, 
    f.value:column_2::STRING as metric,
    f.value:column_3::FLOAT as value
FROM
    DRAGEN_METRICS_VIEW_project,
    LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
    FILE_NAME LIKE '%scRNA.metrics.csv'
    AND (
        f.value:column_2::STRING = 'Invalid barcode read'
        OR f.value:column_2::STRING = 'Passing cells'
    )
),
pipeline_table AS (
SELECT
    PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
    PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
    PIPELINE_ANALYSIS_DATA:status::STRING as status,
    PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
    PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
FROM
    ICA_PIPELINE_ANALYSES_VIEW_project
WHERE
    PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
    )
SELECT * FROM flattened_dragen_scrna JOIN pipeline_table 
ON
     flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;
SELECT
    USER_NAME as user_name,
    PROJECT_NAME as project,
    SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
    PIPELINE_ANALYSIS_DATA:status as status,
    ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
    PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
    PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
    f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
    f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
    f.value:bpeResourceType::STRING as bpeResourceType,
    f.value:durationInSeconds::INT as durationInSeconds,
    f.value:price::FLOAT as priceStep,
    f.value:status::STRING as status,
    f.value:stepId::STRING as stepId
FROM
    ICA_PIPELINE_ANALYSES_VIEW_project,
    LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
   PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
ORDER BY
   priceStep DESC;
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE PIPELINE_STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

The queries below assume a table named YOUR_TABLE_NAME with three columns, NameField, CodeField and ArrayField (holding a JSON object), containing two rows:

  • NameField "Name A", CodeField "Code A", ArrayField { "userEmail": "email_A@server.com", "bundleName": null, "boolean": false }

  • NameField "Name B", CodeField "Code B", ArrayField { "userEmail": "email_B@server.com", "bundleName": "thisbundle", "boolean": true }

You can use the name field and code field in queries by running

Select * from YOUR_TABLE_NAME where NameField = 'Name A'

If you want to show specific data such as the email and bundle name from the array, this becomes

Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where NameField = 'Name A'

If you want to use data in the array as your selection criteria, the expression becomes

Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:boolean = true

If your criterion is text in the array, use single quotes (') to delimit the text. For example:

Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail = 'email_A@server.com'

You can also use the LIKE operator with the % wildcard if you do not know the exact content.

Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail LIKE '%A@server%'

textbox

Corresponds to stringType in xml.

checkbox

A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).

radio

A radio button group to select one from a list of choices. The values to choose from must be unique.

select

A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.

number

The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).

integer

Corresponds to java Integer.

data

Data such as files.

section

For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.

text

To display informational messages. No values are to be assigned to these fields.

fieldgroup

Can contain parameters or other groups. Allows to have repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. So if you want to have the same elements multiple times in your form, combine them into a fieldgroup.

label

The display label for this parameter. Optional but recommended, id will be used if missing.

minValues

The minimal amount of values that needs to be present. Default when not set is 0. Set to >=1 to make the field required.

maxValues

The maximal amount of values that need to be present. Default when not set is 1.

minMaxValuesMessage

The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.

helpText

A helper text about the parameter. Will be displayed in smaller font with the parameter.

placeHolderText

An optional short hint (a word or short phrase) to aid the user when the field has no value.

value

The value of the parameter. Can be considered default value.

minLength

Only applied on type="textbox". Value is a positive integer.

maxLength

Only applied on type="textbox". Value is a positive integer.

min

Minimal allowed value for 'integer' and 'number' type.

  • for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.

  • for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.

max

Maximal allowed value for 'integer' and 'number' type.

  • for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.

  • for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.

choices

A list of choices, each with a "value", "text" (the label), "selected" (only one true supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on values of other fields. Parent and value must be unique; you cannot use the same value for both.

fields

The list of sub fields for type fieldgroup.

dataFilter

For defining the filtering when type is 'data'. nameFilter, dataFormat and dataType are additional properties.

regex

The regex pattern the value must adhere to. Only applied on type="textbox".

regexErrorMessage

The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.

hidden

Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.

disabled

Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.

emptyValuesAllowed

When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.

updateRenderOnChange

When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.

dropValueWhenDisabled

When this is present and true and the field has disabled set to true, the value will be omitted during the submit handling (from the onSubmit result).
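
As a brief illustration of these attributes (the field id and values are hypothetical), a required textbox with pattern validation could be defined as:

{
  "fields": [
    {
      "id": "sampleId",
      "type": "textbox",
      "label": "Sample ID",
      "minValues": 1,
      "regex": "^[A-Za-z0-9_-]+$",
      "regexErrorMessage": "Use only letters, numbers, dashes or underscores.",
      "helpText": "Identifier used to name the output files.",
      "placeHolderText": "e.g. NA12878"
    }
  ]
}

The tree-structured select example referenced in the Tree structure example section above is shown next: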

{
  "fields": [
    {
      "id": "myTreeList",
      "type": "select",
      "label": "Selection Tree Example",
      "choices": [
        {
          "text": "trunk",
          "value": "treetrunk"
        },
        {
          "text": "branch",
          "value": "treebranch",
          "parent":"treetrunk"
        },
        {
          "text": "leaf",
          "value": "treeleaf",
          "parent":"treebranch"
        },
        {
          "text": "bird",
          "value": "happybird",
          "parent":"treebranch"
        },
        {
          "text": "cat",
          "value": "cat",
          "parent": "treetrunk",
          "disabled": true
        }
      ],
      "minValues": 1,
      "maxValues": 3,
      "helpText": "This is a tree example"
    }
  ]
}

Streamable inputs

Adding "streamable":true to an input field of type "data" makes it a streamable input.

settings

The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

settingValues

To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.

pipeline

Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

analysis

Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.

storageSize

The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

storageSizeOptions

The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.

analysisSettings

The input form json as saved in the pipeline. That is, the original JSON, without any later changes.

currentAnalysisSettings

The current input form JSON as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is 'Initial' or when the analysis is created through the CLI/API.

settings

The value of the setting fields. This allows modifying the values, applying defaults, or deriving values from the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.

validationErrors

A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

analysisSettings

The input form json with potential applied changes. The discovered changes will be applied in the UI when viewing the analysis.

fieldId / FieldId

The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use storageSize as the fieldId.

index / Index

The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.

message / Message

The error/warning message to display.

context

"Initial"/"FieldChanged"/"Edited".

  • Initial is the value when first displaying the form when a user opens the start run screen.

  • The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.

  • Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or when editing the form during reruns.

changedFieldId

The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.

analysisSettings

The input form json as saved in the pipeline. This is the original json, without changes.

currentAnalysisSettings

The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.

settingValues

The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays.

pipeline

Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.

analysis

Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.

storageSize

The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.

storageSizeOptions

The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.

analysisSettings

The input form json with potential applied changes. The discovered changes will be applied in the UI.

settingValues

The current, potentially altered map of all setting values. These will be updated in the UI.

validationErrors

A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.

validationWarnings

A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.

storageSize

The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.

Validation errors and validation warnings can use 'storageSize' as fieldId to let an error appear on the storage size field. 'storageSize' is the value of the changedFieldId when the user alters the chosen storage size.

fieldId / FieldId

The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use storageSize as the fieldId.

index / Index

The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.

message / Message

The error/warning message to display.

Clarity LIMS documentation
Tables
FLATTEN
table
Cohorts Base
bundles
inputForm.json

Create a Cohort

ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:

  • Project:

    • Include subjects that are part of any ICA Project that you own or that is shared with you.


  • Subject:

    • Subject inclusion by Identifier:

      • Input a list of Subject Identifiers (up to 100 entries) when defining a cohort.

      • The Subject Identifier filter is combined using AND logic with any other applied filters.

      • Within the list of subject identifiers, OR logic is applied (i.e., a subject matches if it is in the provided list).

    • Demographics such as age, sex, ancestry.

    • Biometrics such as body height, body mass index.

    • Family and patient medical history.

  • Sample:

    • Sample type such as FFPE.

    • Tissue type.

    • Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.

  • Disease:

    • Phenotypes and diseases from standardized ontologies.

  • Drug:

    • Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.

  • Molecular attributes:

    • Samples with a somatic mutation in one or multiple, specified genes.

    • Samples with a germline variant of a specific type in one or multiple, specified genes.

    • Samples over- or under-expressed in one or multiple, specified genes.

    • Samples with a copy number gain or loss involving one or multiple, specified genes.

Disease search

ICA Cohorts currently uses six standard medical ontologies to 1) annotate each subject during ingestion and then to 2) search for subjects: HPO for phenotypes, MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search will find matches from all six; and you can limit the search to only the one(s) you prefer. When searching for subjects using names or codes from one of these ontologies, ICA Cohorts will automatically match your query against all the other ontologies, therefore returning subjects that have been ingested using a corresponding entry from another ontology.

In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:

  • Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.

    • Use the mouse to select the term or navigate to the term in the dropdown using the arrow buttons.

    • If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.

    • For diagnostic hierarchies, concept children count and descendant count for each disease name is displayed.

      • Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").

      • Leaf Nodes: No children count shown for leaf nodes.

      • Missing Counts: Children count is hidden if unavailable.

      • Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.

    • Select a checkbox to include the diagnostic term along with all of its children and descendants.

    • Expand the categories and select or deselect specific disease concepts.

  • Paste one or multiple diagnostic codes separated by a pipe (‘|’).

Drug Search

In the 'Drug' tab, you can search for subjects who have a specific medication record:

  • Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.

  • Paste one or multiple drug concept codes. ICA Cohorts currently uses RxNorm as a standard ontology during ingestion. If multiple concepts are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'

  • 'Drug Type' is a static list of qualifiers that denote the specific administration of the drug. For example, where the drug was dispensed.

  • 'Stop Reason' is a static list of attributes describing a reason why a drug was stopped if available in the data ingestion.

  • 'Drug Route' is a static list of attributes that describe the physical route of administration of the drug. For example, Intravenous Route (IV).

Measurement Search

In the ‘Measurements’ tab, you can search for vital signs and laboratory test data leveraging LOINC concept codes.

  • Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or down arrows to select the term.

  • Upon selecting a term, the term will be available for use in a query.

  • Terms can be added to your query criteria.

  • For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.

  • `Any value` will find any record where there is an entry for the measurement independent of an available value.

  • Click `Apply` to add your criteria to the query.

  • Click `Update Now` to update the running count of the Cohort.

Include/Exclude

  • As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.

    • Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.

    • When selected, the attribute will appear in the right-navigation panel.

    • You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.

    • Note: the semantics of 'Include' work in such a way that a subject needs to match only one or multiple of the 'included' attributes in any given category to be included in the cohort. (Category refers to disease, sex, body height, etc.) For example, if you specify multiple diseases as inclusion criteria, subjects will only need to be diagnosed with one of them. Using 'Exclude', you can exclude any subject who matches one or multiple exclusion criteria; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.

    • Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.

    • Note: Using exclusion criteria does not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects will be in your cohort even if they do not contain this data point.

Once you have selected Create Cohort, the above data are organized in tabs such as Project, Subject, Disease, and Molecular. Each tab then contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to directly jump to that tab and section, and select attributes and values that are relevant to describe your subjects and samples of interest. Assign a new name to the cohort you created, and click Apply to save the cohort.

Duplicate a Cohort Definition

  • After creating a Cohort, select the Duplicate icon.

  • A copy of the Cohort definition will be created and tagged with "_copy".

Delete a Cohort Definition

  • Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.

  • This action cannot be undone.

Sharing a Cohort within an ICA Project

After creating a Cohort, users can set a Cohort bookmark as Shared. By sharing a Cohort, it becomes available to be applied across the project by other users with access to the Project. Cohorts created in a Project are only accessible to the user who created them; other users in the project cannot see the cohort unless this sharing functionality is used.

Share Cohort Definition

  • Create a Cohort using the directions above.

  • To make the Cohort available to other users in your Project, click the Share icon.

  • The Share icon will be filled in black and the Shared Status will be turned from Private to Shared.

  • Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.

Unshare a Cohort Definition

  • To unshare the Cohort, click the Share icon.

  • The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.

Archive a Cohort Definition

  • A Shared Cohort can be Archived.

  • Select a Shared Cohort with a black Shared Cohort icon.

  • Click the Archive Cohort icon.

  • You will be asked to confirm this selection.

  • Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.

  • The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.

  • When the Cohort definition is unarchived, it will be visible to all users in the Project.

Sharing a Cohort as Bundle

You can link cohorts data sets to a bundle as follows:

  • Create or edit a bundle at Bundles from the main navigation.

  • Navigate to Bundles > your_bundle > Cohorts > Data Sets.

  • Select Link Data Set to Bundle.

  • Select the data set which you want to link and +Select.

  • After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.

If you cannot find the cohorts data sets which you want to link, verify that:

  • Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)

  • This project is set to Data Sharing (Projects > your_project > Project Settings > Details)

Stop sharing a Cohort as Bundle

You can unlink cohorts data sets from bundles as follows:

  • Edit the desired bundle at Bundles from the main navigation.

  • Navigate to Bundles > your_bundle > Cohorts > Data Sets.

  • Select the cohorts data set which you want to unlink.

  • Select Unlink Data Set from Bundle.

  • After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.

FUSE Driver

Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. There are both read and write capabilities with some limitations on write capabilities that are enforced by the underlying AWS S3 storage.

As a user, you are allowed to perform the following actions from Bench (provided your user permissions meet the required workspace permissions) or through the CLI:

  • Copy project data

  • Delete project data

  • Mount project data (CLI only)

  • Unmount project data (CLI only)

When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.

WARNING: This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.

Copy project data

The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.

Delete project data

The FUSE driver also allows you to delete data from your project. This differs from earlier Bench usage, where you took a local copy and the original file remained in your project.

WARNING: Deleting project data through Bench workspace through the FUSE driver will permanently delete the data in the Project. This action cannot be undone.

CLI

Using the FUSE driver through the CLI is not supported for Windows users. Linux users can use the CLI without any further actions; Mac users will need to install the kernel extension from macFUSE.

macOS uses hidden metadata files beginning with ._ , which are copied over and exposed during CLI copy to your project data. These can be safely deleted from your project.

Mounting and unmounting of data needs to be done through the CLI. In Bench this happens automatically, so no manual action is needed.

WARNING Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.

Restrictions

❗️ Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.

Trying to update files or saving your notebook in the project folder will typically result in File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.

Some examples of other actions or commands that will not work because of the above mentioned limitations:

  • Save a jupyter notebook or R script on the /project location

  • Add/remove a file from an existing zip file

  • Redirect with append to an existing file e.g. echo "This will not work" >> myTextFile.txt

  • Rename a file due to the existing association between ICA and AWS

  • Move files or folders.

  • Using vi or another editor

A file can be written only sequentially. This is a restriction that comes from the library the FUSE driver uses to store data in AWS. That library supports only sequential writing, random writes are currently not supported. The FUSE driver will detect random writes and the write will fail with an IO error return code. Zip will not work since zip writes a table of contents at the end of the file. Please use gzip.
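
For example (paths are illustrative), a single sequential write into the mounted project location works, while modifying an existing file does not:

# Works: gzip streams its output sequentially into a new file
gzip -c results.txt > /data/project/results.txt.gz

# Fails: appending modifies an existing file, which the mount does not allow
echo "extra line" >> /data/project/notes.txt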

Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that is written may temporarily appear as a zero-length file, and a file that is deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.

Jupyter notebooks

To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.

Old Bench workspaces

This functionality won't work for old workspaces unless you enable the permissions for that old workspace.

Bench

ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a docker image with access to the data and pipelines within a project. This workspace runs on the Amazon S3 system and comes with associated processing and provisioning costs. It is therefore best practice to not keep your Bench instances running indefinitely, but stopping them when not in use.

Access

Having access to Bench depends on the following conditions:

  • Bench needs to be included in your ICA subscription.

  • The project owner needs to enable Bench for their project.

  • Individual users of that project need to be given access to Bench.

Enabling Bench for your project

After creating a project, go to Projects > your_project > Bench > Workspaces page and click the Enable button. If you do not see this option, then either your tenant subscription does not include Bench or you belong to a tenant different from the one where the project was created. Users from other tenants cannot enable the Bench module, but can create workspaces. Once enabled, every user who has the correct permissions has access to the Bench module in that project.

Setting user level access.

Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.

  • Tenant administrators and project owners are always able to access Bench and perform all actions.

  • The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.

    • No Access means you have no access to the Bench workspace for that project.

    • Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.

    • Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.

  • Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:

    • Upload / Download rights (Download rights are mandatory for technical reasons)

    • Project Level (No Access / Data Provider / Viewer / Contributor)

    • Flow (No Access / Viewer / Contributor)

    • Base (No Access / Viewer / Contributor)

Creating a Pipeline from Scratch

Introduction

This tutorial shows you how to start a new pipeline from scratch


Preparation

Start Bench workspace

  • For this tutorial, any instance size will work, even the smallest standard-small.

  • Select the single-user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.

  • A small amount of disk space (10GB) will be enough.

We are going to wrap the "gzip" linux compression tool with inputs:

  • 1 file

  • compression level: integer between 1 and 9

We intentionally do not include sanity checks, to keep this scenario simple.

Creation of test file:

mkdir demo_gzip
cd demo_gzip
echo test > test_input.txt

Wrapping in Nextflow

Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the “out” folder:

mkdir nextflow-src
# Create nextflow-src/main.nf using contents below
vi nextflow-src/main.nf

nextflow-src/main.nf

nextflow.enable.dsl=2
 
process COMPRESS {
  // publishDir is a process directive, so it is declared before the input/output blocks
  publishDir 'out', mode: 'symlink'

  input:
    path input_file
    val compression_level

  output:
    path "${input_file.simpleName}.gz" // .simpleName keeps just the filename
 
  script:
    """
    gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
    """
}
 
workflow {
    input_path = file(params.input_file)
    gzip_out = COMPRESS(input_path, params.compression_level)
}

Save this file as nextflow-src/main.nf, and check that it works:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5

Result


Wrap the Pipeline in Bench

We now need to:

  • Use Docker

  • Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools

Using Docker:

In NextFlow, Docker images can be specified at the process level

  • Each process may use a different docker image

  • It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.

Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified

  • at the start of each process definition

  • or in nextflow config files (preferred when following nf-core guidelines)

For example, create nextflow-src/nextflow.config:

process.container = 'ubuntu:latest'

We can now run with nextflow's -with-docker option:

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker

Create NextFlow “test” profile

Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
}

With this profile defined, we can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test -with-docker

Create NextFlow “docker” profile

A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:

nextflow-src/nextflow.config

process.container = 'ubuntu:latest'
 
profiles {
  test {
    params {
      input_file = 'test_input.txt'
      compression_level = 5
    }
  }
 
  docker {
    docker.enabled = true
  }
}

We can now run the same test as before with this command:

nextflow run nextflow-src/ -profile test,docker

We also have enough structure in place to start using the pipeline-dev command:

pipeline-dev run-in-bench

In order to deploy our pipeline to ICA, we need to generate the user interface input form.

This is done by using nf-core's recommended nextflow_schema.json.

For our simple example, we generate a minimal one by hand (done by using one of the nf-core pipelines as example):

nextflow-src/nextflow_schema.json

{
    "$defs": {
        "input_output_options": {
            "title": "Input/output options",
            "properties": {
                "input_file": {
                    "description": "Input file to compress",
                    "help_text": "The file that will get compressed",
                    "type": "string",
                    "format": "file-path"
                },
                "compression_level": {
                    "type": "integer",
                    "description": "Compression level to use (1-9)",
                    "default": 5,
                    "minimum": 1,
                    "maximum": 9
               }
            }
        }
    }
}

In the next step, this gets converted to the ica-flow-config/inputForm.json file.

Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.

We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.


Deploy as a Flow Pipeline

We just need to create a final file, which we had skipped until now: Our project description file, which can be created via the command pipeline-dev project-info --init:

pipeline-dev.project_info

$ pipeline-dev project-info --init
 
pipeline-dev.project-info not found. Let's create it with 2 questions:
 
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo

We can now run:

pipeline-dev deploy-as-flow-pipeline

After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).

It then asks if we want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.


Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Result

/data/demo $ pipeline-dev launch-validation-in-flow

pipelineId: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782b620

Bring Your Own Bench Image

The following steps are needed to get your bench image running in ICA.

Requirements

You need to have Docker installed in order to build your images.

For your Docker bench image to work in ICA, it must run on the Linux x86 architecture and have the correct user id and initialization script in the Dockerfile.

Bench-console provides an example to build a minimal image compatible with ICA Bench to run an SSH Daemon.

Bench-web provides an example to build a minimal image compatible with ICA Bench to run a Web Daemon.

Bench-rstudio provides an example to build a minimal image compatible with ICA Bench to run RStudio Open Source.

These examples come with information on the available parameters.

Scripts

Init Script (Dockerfile)

This Dockerfile instruction copies the ica_start.sh file, which handles the initialization and termination of your workspace, to the location from which ICA starts it when you request to start your workspace.

# Init script invoked at start of a bench workspace
COPY --chmod=0755 --chown=root:root ${FILES_BASE}/ica_start.sh /usr/local/bin/ica_start.sh

User (Dockerfile)

The user settings must be set up so that Bench runs with UID 1000 (and group GID 100).

# Bench workspaces need to run as user with uid 1000 and be part of group with gid 100
RUN adduser -H -D -s /bin/bash -h ${HOME} -u 1000 -G users ica

Shutdown Script (ica_start.sh)

To do a clean shutdown, you can catch the SIGTERM signal, which is sent 30 seconds before the workspace is terminated.

# Terminate function
function terminate() {
        # Send SIGTERM to child processes
        kill -SIGTERM $(jobs -p)

        # Send SIGTERM to waitpid
        echo "Stopping ..."
        kill -SIGTERM ${WAITPID}
}

# Catch SIGTERM signal and execute terminate function.
# A workspace will be informed 30s before forcefully being shutdown.
trap terminate SIGTERM

# Hold init process until TERM signal is received
tail -f /dev/null &
WAITPID=$!
wait $WAITPID

Building a Bench Image

Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.

  1. Open the command prompt on your machine.

  2. Navigate to the root folder of your Docker files.

  3. Build the image, for example with the command docker build -f Dockerfile -t mybenchimage:0.0.1 . (note the trailing . which supplies the build context).

  4. Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2. The resulting tar file will appear next to the root folder of your Docker files.

If you want to build on a mac with Apple Silicon, then the build command is docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .

Upload Your Docker Image to ICA

  1. Open ICA and log in.

  2. Go to Projects > your_project > Data and upload the Docker tar file you created.

  3. Select the uploaded image file and perform Manage > Change Format.

  4. From the format list, select DOCKER and save the change.

  5. Go to System Settings > Docker Repository > Create > Image.

  6. Select the uploaded docker image and fill out the other details.

    • Name: The name by which your docker image will be seen in the list

    • Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1

    • Description: Provide a description explaining what your docker image does or is suited for.

    • Type: The type of this image is Bench. The Tool type is reserved for tool images.

    • Cluster compatible: [For Future Use, not currently supported] Indicates whether this docker image is suited for cluster computing.

    • Access: This setting must match the available access options of your Docker image. You can choose web access (HTTP), console access (SSH) or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support will result in access denied errors when trying to run the workspace.

    • Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.

  7. Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your docker image will be partial during creation and available once completed.

Start Your Bench Image

  1. Navigate to Projects > your_project > Bench > Workspaces.

  2. Create a new workspace with + Create Workspace or edit an existing workspace.

  3. Save your changes.

  4. Select Start Workspace.

  5. Wait for the workspace to start. You can then access it either via the console or the GUI.

Access Bench Image

Once your bench image has been started, you can access it via console, web or both, depending on your configuration.

  • Web access (HTTP) is done from either the Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.

  • Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.

Command-line Interface

Restrictions

Root User

  • The bench image will be instantiated as a container which is forced to start as a user with UID 1000 and GID 100.

  • You cannot elevate your permissions in a running workspace.

Do not run containers as root as this is bad security practice.

Read-only Root Filesystem

Only the following folders are writeable:

  • /data

  • /tmp

All other folders are mounted as read-only.

Network Access

For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.

  • Web: TCP/8888

  • Console: TCP/2222

For outbound access, a workspace can be started in two modes:

  • Public: Access to public IPs is allowed using the TCP protocol.

  • Restricted: Access is allowed only to a list of URLs.

Context

Environment Variables

At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.

  • ICA_WORKSPACE: The unique identifier of the started workspace. This value is bound to a workspace and will never change. Example value: 32781195

  • ICA_CONSOLE_ENABLED: Whether console access is enabled for this running workspace. Example values: true, false

  • ICA_WEB_ENABLED: Whether web access is enabled for this running workspace. Example values: true, false

  • ICA_SERVICE_ACCOUNT_USER_API_KEY: An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace.

  • ICA_BENCH_URL: The host part of the public URL which provides access to the running workspace. Example value: use1-bench.platform.illumina.com

  • ICA_PROJECT_UUID: The unique identifier of the ICA project in which the workspace was started.

  • ICA_URL: The ICA Endpoint URL.

  • HTTP_PROXY, HTTPS_PROXY: The proxy endpoint in case the workspace was started in restricted mode.

  • HOME: The home folder. Example value: /data
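A quick way to inspect these variables is from a terminal inside the running workspace (the values shown will differ per workspace):

env | grep '^ICA_'
echo "Project: ${ICA_PROJECT_UUID}"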

Configuration Files

The following files and folders are provided to the workspace and made accessible for reading at runtime.

  • /etc/workspace-auth: Contains the SSH RSA public/private keypair required to run the workspace SSHD.

Software Files

At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.

New versions of ICA software will be made available after a restart of your workspace.

Important Folders

  • /data: This folder contains all data specific to your workspace. Data in this folder is not persisted in your project and will be removed at deletion of the workspace.

  • /data/project: This folder contains all your project data.

  • /data/.software: This folder contains ICA-related software.

Bench Lifecycle

Workspace Lifecycle

When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh

This script needs to be available and executable otherwise your workspace will not boot.

This script can be used to invoke other scripts.

Troubleshooting

Build Argument

If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, a possible cause is that the final . of the command is missing.
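For example, with an illustrative image name, the first command fails with that error while the second succeeds because the build context . is supplied:

docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1      # fails: missing build context
docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .    # builds correctly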

Server Connection Error

When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.

JupyterLab

The JupyterLab docker image contains the following environment variables:

  • ICA_URL set to the ICA server URL https://ica.illumina.com/ica

  • ICA_PROJECT (OBSOLETE) set to the current ICA project ID

  • ICA_PROJECT_UUID set to the current ICA project UUID

  • ICA_SNOWFLAKE_ACCOUNT set to the ICA Snowflake (Base) Account ID

  • ICA_SNOWFLAKE_DATABASE set to the ICA Snowflake (Base) Database ID

  • ICA_PROJECT_TENANT_NAME set to the tenant name of the owning tenant of the project where the workspace is created

  • ICA_STARTING_USER_TENANT_NAME set to the tenant name of the tenant of the user which last started the workspace

  • ICA_COHORTS_URL set to the URL of the Cohorts web application used to support the Cohorts view

To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data.
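For example, from a workspace terminal (the results folder name is illustrative):

mv /data/results /data/project/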

ICA Python Library

The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab docker image.

Updating an Existing Flow Pipeline

Introduction

This tutorial shows you how to import an existing ICA Flow pipeline into a Bench workspace, modify and validate it, and deploy it back to ICA Flow as a new pipeline version.

Preparation

Make sure you have access in ICA Flow to:

  • the pipeline you want to work with

  • an analysis exercising this pipeline, preferably with a short execution time, to use as validation test

Start Bench Workspace

For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

  • When using a cluster, choose standard-small or standard-medium for the workspace master node

  • Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.

  • Select the "single user workspace" permissions (aka "Access limited to workspace owner "), which allows us to deploy pipelines

  • Specify at least 100GB of disk space

  • Optional: After choosing the image, enable a cluster with at least one standard-large instance type.

  • Start the workspace, then (if applicable) also start the cluster

Import Existing Pipeline and Analysis to Bench

mkdir demo-flow-dev
cd demo-flow-dev
 
pipeline-dev import-from-flow
 or
pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f

The starting point is the analysis id that is used as pipeline validation test (the pipeline id is obtained from the analysis metadata).

If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.

  • If conda and/or nextflow are not installed, pipeline-dev will offer to install them.

  • A folder called imported-flow-analysis is created.

  • Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.

  • Pipeline input form and associated javascript are downloaded into the ica-flow-config sub-folder.

  • Analysis input specs are downloaded to the ica-flow-config/launchPayload_inputFormValues.json file.

  • The analysis inputs are converted into a "test" profile for Nextflow, stored, among other items, in nextflow_bench.conf (see the example listing after this list).
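After the import completes, the generated layout listed above can be inspected from the workspace shell; the listing below is illustrative and abbreviated:

cd imported-flow-analysis
ls
# Typical contents (abbreviated): ica-flow-config  nextflow-src  nextflow_bench.conf ...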

Results

Enter the number of the entry you want to use: 21
Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
Fetching pipeline files... main.nf
Fetching test inputs
New Json inputs detected
Resolving test input ids to /data/mounts/project paths
Fetching input form..
Pipeline "GWAS pipeline_1.
_2_1_20241215_130117" successfully imported.
pipeline name: GWAS pipeline_1_2_1_20241215_130117 
analysis name: Test GWAS pipeline_1_2_1_20241215_130117 
pipeline id : bb47d612-5906-4d5a-922e-541262c966df
analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
Suggested actions:
pipeline-dev run-in-bench 
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev run-in-flow

Run Validation Test in Bench

The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:

cd imported-flow-analysis
pipeline-dev run-in-bench

The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copied and adjusted if you need additional options.

Monitoring

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

  • qstat to see the tasks being pending or running

  • tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

/data/demo $ tail /data/logs/sge-scaler.log.*
2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning,
Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2

Data Locations

  • The output of the pipeline is in the outdir folder

  • Nextflow work files are under the work folder

  • Log files are .nextflow.log* and output.log
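For example, to follow progress or inspect results from a second terminal (file and folder names as listed above):

tail -f output.log
ls outdir/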

Modify Pipeline

Nextflow files (located in the nextflow-src folder) are easy to modify. Depending on your environment (ssh access / docker image with JupyterLab or VNC, with and without Visual Studio code), various source code editors can be used.

code nextflow-src # Open in Visual Studio Code
code .            # Open current dir in Visual Studio Code
vi nextflow-src/main.nf

After modifying the source code, you can run a validation iteration with the same command as before:

pipeline-dev run-in-bench

Identify Docker Image

Modifying the Docker image is the next step.

Nextflow (and ICA) allow the Docker images to be specified at different places:

  • in config files such as nextflow-src/nextflow.config

  • in nextflow code files:

/data/demo-flow-dev $ head nextflow-src/main.nf
nextflow.enable.dsl = 2
process top_level_process {
container 'docker.io/ljanin/gwas-pipeline:1.2.1'

grep container may help locate the correct files:
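For example, a recursive, line-numbered search across the pipeline sources (folder name as used in this tutorial):

grep -rn container nextflow-src/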

Docker Image Update: Dockerfile Method

Use case: Update some of the software (mimalloc) by compiling a new version

IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
 
# Create directory for Dockerfile
mkdir dirForDockerfile
cd dirForDockerfile

# Create Dockerfile
cat <<EOF > Dockerfile
FROM ${IMAGE_BEFORE}
RUN mkdir /mimalloc-compile \
 && cd /mimalloc-compile \
 && git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
 && mkdir -p mimalloc/out/release \
 && cd mimalloc/out/release \
 && cmake ../.. \
 && make \
 && make install \
 && cd / \
 && rm -rf mimalloc-compile
EOF

# Build image
docker build -t ${IMAGE_AFTER} .

With the appropriate permissions, you can then "docker login" and "docker push" the new image.

Docker Image Update: Interactive Method

IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
docker run -it --rm ${IMAGE_BEFORE} bash
 
# Make some modifications
vi /scripts/plot_manhattan.py
<Fix "manhatten.png" into "manhattAn.png">
<Enter :wq to save and quit vi>
<Start another terminal (try Ctrl+Shift+T if using wezterm)>
# Identify container id
# Save container changes into new image layer
CONTAINER_ID=c18670335247
docker commit ${CONTAINER_ID} ${IMAGE_AFTER}

With the appropriate permissions, you can then "docker login" and "docker push" the new image.

Fun fact: VScode with the "Dev Containers" extension lets you edit the files inside your running container:

Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...
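One option is to clean these locations inside the container before committing it (a sketch; adjust the paths to whatever the extension actually created in your case):

# Inside the running container, before docker commit:
rm -rf "$HOME/.vscode-server"
rm -rf /tmp/*        # removes the extension's temp files (and anything else left in /tmp)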

Update the nextflow code and/or configs to use the new image

sed --in-place "s/${IMAGE_BEFORE}/${IMAGE_AFTER}/" nextflow-src/main.nf

Validate your changes in Bench:

pipeline-dev run-in-bench

Deploy as Flow Pipeline

pipeline-dev deploy-as-flow-pipeline

After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here).

It then asks if we want to update the latest version or create a new one.

Choice: 2
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Result

/data/demo $ pipeline-dev deploy-as-flow-pipeline

Generating ICA input specs...
Extracting nf-core test inputs...
Deploying project nf-core/demo
- Currently being developed as: dev-nf-core-demo
- Last version updated in ICA:  dev-nf-core-demo_v3
- Next suggested version:       dev-nf-core-demo_v4

How would you like to deploy?
1. Update dev-nf-core-demo (current version)
2. Create dev-nf-core-demo_v4
3. Enter new name
4. Update dev-nf-core-demo_v3 (latest version updated in ICA)
Sending docs/images/nf-core-demo-subway.svg
Sending docs/images/nf-core-demo_logo_dark.png
Sending docs/images/nf-core-demo_logo_light.png
Sending docs/images/nf-core-demo-subway.png
Sending docs/README.md
Sending docs/output.md

Pipeline successfully deployed
- Id : 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
- URL: https://stage.v2.stratus.illumina.com/ica/projects/1873043/pipelines/26bc5aa5-0218-4e79-8a63-ee92954c6cd9

Suggested actions:
  pipeline-dev run-in-flow

Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.

Result

/data/demo $ pipeline-dev launch-validation-in-flow

pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Getting Analysis Storage Id
Launching as ICA Flow Analysis...

ICA Analysis created:
- Name: Test dev-nf-core-demo_v4
- Id:   cadcee73-d975-435d-b321-5d60e9aec1ec
- Url:   https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec

Workspaces

The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, RStudio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R packages, tools, libraries, IGV browsers, widgets, etc. can be installed.

You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.

For each workspace, the status is indicated by a colour: red is stopped, orange is starting, and green is running.

Create new Workspace

If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.

  1. Click Projects > Your_Project > Bench > Workspaces > + Create Workspace

  2. Complete the following fields:

    • Name: (required) must be a unique name.

    • Docker image: (required) The list of docker images includes base images from ICA and images uploaded to the docker repository for that domain.

    • Storage size (GB): (required) Represents the size of the storage available on the workspace. Storage from 10GB to 64TB can be provided.

    • Description: A place to provide additional information about the workspace.

      • Web allows you to interact with the workspace via a browser.

      • Console provides a terminal to interact with the workspace.

    • Internet Access: (required) The type of internet access that should be provided for this workspace

      • Open: Internet access is allowed

      • Restricted: Creates a workspace with no internet access. Access to the ICA Project Data is still available in this mode.

        • Whitelisted URLs: Specify URLs and paths that are allowed in a restricted workspace. Separate URLs with a new line. Only domains and subdomains in the specified URL will be allowed.

        • URLs must comply with the following:

          • URLs can be between 1 and 263 characters including dot (.).

          • URLs can begin with a leading dot (.).

          • Domain and Sub-domains:

            • Can include alphanumeric characters (Letters A-Z and digits 0-9). Case insensitive.

            • Can contain hyphens (-) and underscores (_), but not as a first or last character.

            • Length between 1 and 63 characters.

          • Dot (.) must be placed after a domain or sub-domain.

          • Note that if you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.

          • Regex for URL : [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]\{2,256}\.[a-z]\{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

        • Accepted Example URLs:

        example.com www.example.com https://www.example.com subdomain.example.com subdomain.example.com/folder subdomain.example.com/folder/subfolder sub-domain.example.com sub_domain.example.com example.co.uk subdomain.example.co.uk sub-domain.example.co.uk

        • Example data science specific whitelist compatible with restricted Bench workspaces. Note that there are two required URLs to allow Python pip installs:

        pypi.org files.pythonhosted.org repo.anaconda.com conda.anaconda.org github.com cran.r-project.org bioconductor.org www.npmjs.com mvnrepository.com

      • Access limited to workspace owner. When this field is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.

      • Download/Upload allowed

      • Project/Flow/Base access

  3. Click “Save”

The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.

Workspace permissions

  • Access limited to workspace owner. When this field is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.

  • Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.

  • Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.

                create/edit   delete   start/stop   access contents
Contributor     -             -        X            when permissions match those of the workspace
Administrator   X             X        X            when permissions match those of the workspace

For security reasons, the Tenant administrator and Project owner can always access the workspace.

If, as a Bench contributor, one of your permissions is not high enough, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".

The permissions that a Bench workspace can receive are the following:

  • Upload rights

  • Download rights (required)

  • Project (No Access - Dataprovider - Viewer - Contributor)

  • Flow (No Access - Viewer - Contributor)

  • Base (No Access - Viewer - Contributor)

Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.

If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be configured to disallow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces: it limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.

Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.

Delete workspace (Bench Administrators Only)

To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.

The workspace will no longer be accessible, nor will it be shown in the list of workspaces. Its content will be deleted, so if there is any information that should be kept, either include it in a docker image which you can use to start from next time, or export it using the API.

Use workspace

The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.

As long as the workspace is running, the resources provided for this workspace will be charged.

Start workspace

To start the workspace, follow the next steps:

  1. Go to Projects > your_project > Bench > Workspaces > your_workspace > Details

  2. Click on Start Workspace button

  3. On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.

  4. Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.

You can refresh the workspace status by selecting the round refresh symbol at the top right.

If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.

Stop workspace

When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right. If you choose to keep it running, the workspace will be stopped if it is not accessed for more than 7 days to avoid unnecessary costs.

Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.

Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.

The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.

You can see who is using a workspace in the workspace list view.

Workspace actions

Access tab

Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.

The docker images provided by Illumina will load JupyterLab by default. It also contains Tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher, + button above the folder structure.

Docker Builds tab (Bench Administrators only)

To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (the FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker image).

NOTE: The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that the image is no longer running correctly. The image does not have access to any underlying parts of the platform so will not be able to harm the platform, but inoperable Bench images will have to be deleted or corrected.

In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.

  • Name: By default, this is the same name as the original image and it is recommended to change the name.

  • Version: Required field which can be any value.

  • Description: The description for your docker image (for example, indicating which apps it contains).

  • Code: The Docker file commands must be provided in this section.

The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.

Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.

Tools (Bench Administrators Only)

From within the workspace it is possible to create a docker image and a tool from it at the same time.

  1. Click the Manage > Create CWL Tool button in the top right corner of the workspace.

  2. Give the tool a name.

  3. Replace the description of the tool to describe what it does.

  4. Add a version number for the tool.

  5. Click the Image tab.

    • Here the image that accompanies the tool will be created.

    • Change the name for the image.

    • Change the version.

    • Replace the description to describe what the image does.

    • Below the line where it says “#Add your commands below.” write the code necessary for running this docker image.

  6. Click the Save button in the upper, right-hand corner to start the build process.

The building can take a while. When it has completed, the tool will be available in the Tool Repository.

Workspace Data

To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.

Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
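For example, from a workspace terminal (the exact contents depend on the ICA software release):

ls -la /data/.software/      # lists the icav2 CLI and readme provided by ICA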

Activity tab

The last tab of the workspace is the activity tab. On this tab all actions performed in the workspace are shown. For example, the creation of the workspace, starting or stopping of the workspace, etc. The activities are shown with their date, the user that performed the action and a description of the action. This page can be used to check how long the workspace has run.

In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the actions performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.

Containers in Bench

Bench has the ability to handle containers inside a running workspace. This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.

Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.

Setup

The Container Service is accessible from your Bench workspace environment by default.

The container service uses the workspace disk to store any container images you pulled in or created.

To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.

Container Management

Use either docker or podman cli to interact with the Container Service. Both are interchangeable and support all the standardized operations commonly known.

Pulling a Container Image

To run a container, the first step is to either build a container image from a Dockerfile or pull one in from a registry.

Public Registry

A public image registry does not require any form of authentication to pull the container layers.

The following command line example shows how to pull in a commonly known image.

The Container Service pulls images from Dockerhub by default if no registry hostname is defined in the container image URI.

# Pull Container image from Dockerhub 
/data $ docker pull alpine:latest  

Private Registry

To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.

The following command line example shows how to instruct the Container Service to log in to the private registry registry.hub.docker.com.

# Log in to a private registry and pull a Container Image 
/data $ docker login -u <username> registry.hub.docker.com 
Password:  
Login Succeeded! 
/data $ docker pull registry.hub.docker.com/<privateContainerUri>:<tag> 

Depending on your authorisations in the private registry you will be able to pull and push images. These authorisations are managed outside of the scope of ICA.

Pushing a Container Image

The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub.

# Push a Container Image to a Private registry in Dockerhub 
/data $ docker pull alpine:latest 
/data $ docker tag alpine:latest registry.hub.docker.com/<privateContainerUri>:<tag> 
/data $ docker push registry.hub.docker.com/<privateContainerUri>:<tag> 

Saving a Container Image as an Archive

The following example shows how to save a locally available Container Image as a compressed tar archive.

# Save a Container Image as a compressed archive 
/data $ docker pull alpine:latest 
/data $ docker save alpine:latest | bzip2 > /data/alpine_latest.tar.bz2 
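To bring such an archive back into the local image store later (for example after a workspace restart), the standard load command can be used; docker/podman load accepts compressed archives:

/data $ docker load -i /data/alpine_latest.tar.bz2 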

Listing Locally Available Container Images

The following example shows how to list all locally available Container Images

# List all local available images 
/data $ docker images 
REPOSITORY                TAG         IMAGE ID      CREATED      SIZE 
docker.io/library/alpine  latest      aded1e1a5b37  3 weeks ago  8.13 MB 

Deleting a Container Image

Container Images require storage capacity on the Bench Workspace disk. The capacity is shown when listing the locally available container images. The container Images are persisted on disk and remain available whenever a workspace stops and restarts.

The following example shows how to clean up a locally available Container Image

# Remove a locally available image 
/data $ docker rmi alpine:latest 

When a Container Image has multiple tags, all the tags need to be removed individually to free up disk capacity.
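For example, when the same image ID carries two tags (names are illustrative), remove both before the layers are freed:

/data $ docker rmi myimage:1.0 
/data $ docker rmi myimage:latest 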

Running a Container

A Container Image can be instantiated in a Container running inside a Bench Workspace.

By default the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.

When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.

Run a Container as a normal user

The following command line example shows how to run an instance of a locally available Container Image as a normal user.

# Run a Container as a normal user 
/data $ docker run -it --rm alpine:latest 
~ $ id 
uid=1000(ica) gid=100(users) groups=100(users)  

Run a Container as root user

Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.

By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.

The following command line example shows how to run an instance of a locally available Container as root user and install system level packages

# Run a Container as root user 
/data $ docker run -it --rm --userns keep-id:uid=0,gid=0 --user 0:0 alpine:latest 
/ # id 
uid=0(root) gid=0(root) groups=0(root) 
/ # apk add rsync 
... 
/ # rsync  
rsync  version 3.4.0  protocol version 32 
... 

When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file.

# Run a Container as a non-mapped root user 
/data $ docker run -it --rm --user 0:0 alpine:latest 
/ # id 
uid=0(root) gid=0(root) groups=100(users),0(root) 
/ # touch /data/myfile 
/ #  
# Exited the running Container back to the shell in the running Bench Workspace 
/data $ ls -al /data/myfile  
-rw-r--r-- 1 100000 100000 0 Mar 13 08:27 /data/myfile 

Building a Container

To build a Container Image, you need to describe the instructions in a Dockerfile.

This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is:

FROM alpine:latest 
RUN apk add rsync 
COPY myfile /root/myfile 

The following command line example will build the actual Container Image

# Build a Container image locally 
/data $ mkdir /tmp/buildContext 
/data $ touch /tmp/buildContext/myfile 
/data $ docker build -f /tmp/Dockerfile -t myimage:1.0 /tmp/buildContext 
... 
/data $ docker images 
REPOSITORY                TAG         IMAGE ID      CREATED             SIZE 
docker.io/library/alpine  latest      aded1e1a5b37  3 weeks ago         8.13 MB 
localhost/myimage         1.0         06ef92e7544f  About a minute ago  12.1 MB 

When defining the build context location, keep in mind that using the HOME folder (/data) will index all files available in /data, which can be a lot and will slow down the process of building. Hence the reason to use a minimal build context whenever possible.

Bench Command Line Interface

Command Index

The following is a list of available bench CLI commands and their options.

workspace-ctl

Usage:
  workspace-ctl [flags]
  workspace-ctl [command]

Available Commands:
  completion  Generate completion script
  compute     
  data        
  help        Help about any command
  software    
  workspace   

Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
  -h, --help               help for workspace-ctl
      --help-tree          
      --help-verbose       
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl [command] --help" for more information about a command.

workspace-ctl completion

cmd execute error:  accepts 1 arg(s), received 0

workspace-ctl compute

Usage:
  workspace-ctl compute [flags]
  workspace-ctl compute [command]

Available Commands:
  get-cluster-details 
  get-logs            
  get-pools           
  scale-pool          

Flags:
  -h, --help           help for compute
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl compute [command] --help" for more information about a command.

workspace-ctl compute get-cluster-details

Usage:
  workspace-ctl compute get-cluster-details [flags]

Flags:
  -h, --help           help for get-cluster-details
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl compute get-logs

Usage:
  workspace-ctl compute get-logs [flags]

Flags:
  -h, --help           help for get-logs
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl compute get-pools

Usage:
  workspace-ctl compute get-pools [flags]

Flags:
      --cluster-id string   Required. Cluster ID
  -h, --help                help for get-pools
      --help-tree           
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl compute scale-pool

Usage:
  workspace-ctl compute scale-pool [flags]

Flags:
      --cluster-id string       Required. Cluster ID
  -h, --help                    help for scale-pool
      --help-tree               
      --help-verbose            
      --pool-id string          Required. Pool ID
      --pool-member-count int   Required. New pool size

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")
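For example, the following illustrative sequence looks up the cluster and its pools, then resizes a pool to two members (the IDs are placeholders to be taken from the command output):

workspace-ctl compute get-cluster-details
workspace-ctl compute get-pools --cluster-id <cluster-id>
workspace-ctl compute scale-pool --cluster-id <cluster-id> --pool-id <pool-id> --pool-member-count 2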

workspace-ctl data

Usage:
  workspace-ctl data [flags]
  workspace-ctl data [command]

Available Commands:
  create-mount Create a data mount under /data/mounts. Return newly created mount.
  delete-mount Delete a data mount
  get-mounts   Returns the list of data mounts

Flags:
  -h, --help           help for data
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl data [command] --help" for more information about a command.

workspace-ctl data create-mount

Create a data mount under /data/mounts. Return newly created mount.

Usage:
  workspace-ctl data create-mount [flags]

Aliases:
  create-mount, mount

Flags:
  -h, --help                help for create-mount
      --help-tree           Display commands as a tree
      --help-verbose        Extended help topics and options
      --mode string         Enum:["read-only","read-write"]. Mount mode i.e. read-only, read-write
      --mount-path string   Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
      --source string       Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
      --wait                Wait for new mount to be available on all nodes before sending response
      --wait-timeout int    Max number of seconds for wait option. Absolute max: 300 (default 300)

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")
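For example, an illustrative invocation that mounts a project folder read-only under /data/mounts and waits until it is available (the source path follows the flag description above):

workspace-ctl data create-mount --source /data/project/myData/hg38 --mount-path hg38data --mode read-only --wait
workspace-ctl data get-mounts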

workspace-ctl data delete-mount

Delete a data mount

Usage:
  workspace-ctl data delete-mount [flags]

Aliases:
  delete-mount, unmount

Flags:
  -h, --help                help for delete-mount
      --help-tree           
      --help-verbose        
      --id string           Id of mount to remove
      --mount-path string   Path of mount to remove

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl data get-mounts

Returns the list of data mounts

Usage:
  workspace-ctl data get-mounts [flags]

Aliases:
  get-mounts, list-mounts

Flags:
  -h, --help           help for get-mounts
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl help

Usage:
  workspace-ctl [flags]
  workspace-ctl [command]

Available Commands:
  completion  Generate completion script
  compute     
  data        
  help        Help about any command
  software    
  workspace   

Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
  -h, --help               help for workspace-ctl
      --help-tree          
      --help-verbose       
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl [command] --help" for more information about a command.

workspace-ctl help completion

To load completions:

Bash:

  $ source <(yourprogram completion bash)

  # To load completions for each session, execute once:
  # Linux:
  $ yourprogram completion bash > /etc/bash_completion.d/yourprogram
  # macOS:
  $ yourprogram completion bash > /usr/local/etc/bash_completion.d/yourprogram

Zsh:

  # If shell completion is not already enabled in your environment,
  # you will need to enable it.  You can execute the following once:

  $ echo "autoload -U compinit; compinit" >> ~/.zshrc

  # To load completions for each session, execute once:
  $ yourprogram completion zsh > "${fpath[1]}/_yourprogram"

  # You will need to start a new shell for this setup to take effect.

fish:

  $ yourprogram completion fish | source

  # To load completions for each session, execute once:
  $ yourprogram completion fish > ~/.config/fish/completions/yourprogram.fish

PowerShell:

  PS> yourprogram completion powershell | Out-String | Invoke-Expression

  # To load completions for every new session, run:
  PS> yourprogram completion powershell > yourprogram.ps1
  # and source this file from your PowerShell profile.

Usage:
  workspace-ctl completion [bash|zsh|fish|powershell]

Flags:
  -h, --help   help for completion

workspace-ctl help compute

Usage:
  workspace-ctl compute [flags]
  workspace-ctl compute [command]

Available Commands:
  get-cluster-details 
  get-logs            
  get-pools           
  scale-pool          

Flags:
  -h, --help           help for compute
      --help-tree      
      --help-verbose

Use "workspace-ctl compute [command] --help" for more information about a command.

workspace-ctl help compute get-cluster-details

Usage:
  workspace-ctl compute get-cluster-details [flags]

Flags:
  -h, --help           help for get-cluster-details
      --help-tree      
      --help-verbose

workspace-ctl help compute get-logs

Usage:
  workspace-ctl compute get-logs [flags]

Flags:
  -h, --help           help for get-logs
      --help-tree      
      --help-verbose

workspace-ctl help compute get-pools

Usage:
  workspace-ctl compute get-pools [flags]

Flags:
      --cluster-id string   Required. Cluster ID
  -h, --help                help for get-pools
      --help-tree           
      --help-verbose

workspace-ctl help compute scale-pool

Usage:
  workspace-ctl compute scale-pool [flags]

Flags:
      --cluster-id string       Required. Cluster ID
  -h, --help                    help for scale-pool
      --help-tree               
      --help-verbose            
      --pool-id string          Required. Pool ID
      --pool-member-count int   Required. New pool size

workspace-ctl help data

Usage:
  workspace-ctl data [flags]
  workspace-ctl data [command]

Available Commands:
  create-mount Create a data mount under /data/mounts. Return newly created mount.
  delete-mount Delete a data mount
  get-mounts   Returns the list of data mounts

Flags:
  -h, --help           help for data
      --help-tree      
      --help-verbose

Use "workspace-ctl data [command] --help" for more information about a command.

workspace-ctl help data create-mount

Create a data mount under /data/mounts. Return newly created mount.

Usage:
  workspace-ctl data create-mount [flags]

Aliases:
  create-mount, mount

Flags:
  -h, --help                help for create-mount
      --help-tree           
      --help-verbose        
      --mount-path string   Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
      --source string       Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
      --wait                Wait for new mount to be available on all nodes before sending response
      --wait-timeout int    Max number of seconds for wait option. Absolute max: 300 (default 300)

workspace-ctl help data delete-mount

Delete a data mount

Usage:
  workspace-ctl data delete-mount [flags]

Aliases:
  delete-mount, unmount

Flags:
  -h, --help                help for delete-mount
      --help-tree           
      --help-verbose        
      --id string           Id of mount to remove
      --mount-path string   Path of mount to remove

workspace-ctl help data get-mounts

Returns the list of data mounts

Usage:
  workspace-ctl data get-mounts [flags]

Aliases:
  get-mounts, list-mounts

Flags:
  -h, --help           help for get-mounts
      --help-tree      
      --help-verbose

workspace-ctl help help

Help provides help for any command in the application.
Simply type workspace-ctl help [path to command] for full details.

Usage:
  workspace-ctl help [command] [flags]

Flags:
  -h, --help   help for help

workspace-ctl help software

Usage:
  workspace-ctl software [flags]
  workspace-ctl software [command]

Available Commands:
  get-server-metadata   
  get-software-settings 

Flags:
  -h, --help           help for software
      --help-tree      
      --help-verbose

Use "workspace-ctl software [command] --help" for more information about a command.

workspace-ctl help software get-server-metadata

Usage:
  workspace-ctl software get-server-metadata [flags]

Flags:
  -h, --help           help for get-server-metadata
      --help-tree      
      --help-verbose

workspace-ctl help software get-software-settings

Usage:
  workspace-ctl software get-software-settings [flags]

Flags:
  -h, --help           help for get-software-settings
      --help-tree      
      --help-verbose

workspace-ctl help workspace

Usage:
  workspace-ctl workspace [flags]
  workspace-ctl workspace [command]

Available Commands:
  get-cluster-settings   
  get-connection-details 
  get-workspace-settings 

Flags:
  -h, --help           help for workspace
      --help-tree      
      --help-verbose

Use "workspace-ctl workspace [command] --help" for more information about a command.

workspace-ctl help workspace get-cluster-settings

Usage:
  workspace-ctl workspace get-cluster-settings [flags]

Flags:
  -h, --help           help for get-cluster-settings
      --help-tree      
      --help-verbose

workspace-ctl help workspace get-connection-details

Usage:
  workspace-ctl workspace get-connection-details [flags]

Flags:
  -h, --help           help for get-connection-details
      --help-tree      
      --help-verbose

workspace-ctl help workspace get-workspace-settings

Usage:
  workspace-ctl workspace get-workspace-settings [flags]

Flags:
  -h, --help           help for get-workspace-settings
      --help-tree      
      --help-verbose

workspace-ctl software

Usage:
  workspace-ctl software [flags]
  workspace-ctl software [command]

Available Commands:
  get-server-metadata   
  get-software-settings 

Flags:
  -h, --help           help for software
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl software [command] --help" for more information about a command.

workspace-ctl software get-server-metadata

Usage:
  workspace-ctl software get-server-metadata [flags]

Flags:
  -h, --help           help for get-server-metadata
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl software get-software-settings

Usage:
  workspace-ctl software get-software-settings [flags]

Flags:
  -h, --help           help for get-software-settings
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl workspace

Usage:
  workspace-ctl workspace [flags]
  workspace-ctl workspace [command]

Available Commands:
  get-cluster-settings   
  get-connection-details 
  get-workspace-settings 

Flags:
  -h, --help           help for workspace
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Use "workspace-ctl workspace [command] --help" for more information about a command.

workspace-ctl workspace get-cluster-settings

Usage:
  workspace-ctl workspace get-cluster-settings [flags]

Flags:
  -h, --help           help for get-cluster-settings
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl workspace get-connection-details

Usage:
  workspace-ctl workspace get-connection-details [flags]

Flags:
  -h, --help           help for get-connection-details
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

workspace-ctl workspace get-workspace-settings

Usage:
  workspace-ctl workspace get-workspace-settings [flags]

Flags:
  -h, --help           help for get-workspace-settings
      --help-tree      
      --help-verbose

Global Flags:
      --X-API-Key string   
      --base-path string   For example: / (default "/")
      --config string      config file path
      --debug              output debug logs
      --dry-run            do not send the request to server
      --hostname string    hostname of the service (default "api:8080")
      --print-curl         print curl equivalent do not send the request to server
      --scheme string      Choose from: [http] (default "http")

Prepare Metadata Sheets

In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:

  • subject:

    • demographics such as age, sex, ancestry;

    • phenotypes and diseases;

    • biometrics such as body height, body mass index, etc.;

    • pathological classification, tumor stages, etc.;

    • family and patient medical history;

  • sample:

    • sample type such as FFPE,

    • tissue type,

    • sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.

A metadata sheet will need to contain at least these four columns per row:

  • Subject ID - identifier referring to individuals; use the column header "SubjectID".

  • Sample ID - identifier for a sample. Sample IDs need to match the corresponding column header in VCF/GVCFs. Each subject can have multiple samples, which must be specified in individual rows for the same SubjectID; use the column header "SampleID".

  • Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).

  • Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
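
A minimal example sheet (columns separated by tabs; the subject and sample identifiers below are illustrative) could look like this:

SubjectID	SampleID	DM_Sex	TC
SUBJ001	SAMPLE001	Female (XX)	Whole genome sequencing
SUBJ001	SAMPLE002	Female (XX)	RNA-seq
SUBJ002	SAMPLE003	Male (XY)	Whole exome sequencing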

Cohorts

Introduction to Cohorts

ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.

Overview Video

Features At-a-glance

  • Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypical and clinical metadata, molecular features including germline, somatic, gene expression.

  • Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.

  • Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.

  • Out-of-the-box statistical analyses including genetic burden tests, GWAS/PheWAS.

  • Rich public data sets covering key disease areas to enrich private data analysis.

  • Easy-to-use visualizations for gene prioritization and genetic variation inspection.

Functionality

Walk-throughs

Public Data Sets

Pipeline Development in Bench (Experimental)

Introduction

The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:

  • Import to Bench

    • From public nf-core pipelines

    • From existing ICA Flow Nextflow pipelines

  • Run in Bench

  • Modify and re-run in Bench, providing fast development iterations

  • Deploy to Flow

  • Launch validation in Flow

Prerequisites

  • Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.

  • The pipeline development tools require

    • Conda which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.

    • Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)

    • git (automatically installed using conda)

    • jq, curl (which should be made available in the image)

Nextflow Requirements / Best Practices

Pipeline development tools work best when the following items are defined:

  • Nextflow profiles:

    • test profile, specifying inputs appropriate for a validation run

    • docker profile, instructing Nextflow to use Docker

ICA Flow adds one additional constraint: the output directory out is the only one automatically copied to the Project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
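
For example, a local test run that follows this convention could look like the following (illustrative; the pipeline-dev tools assemble the actual command for you):

$ nextflow run nextflow-src -profile docker,test --outdir=out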

Pipeline Development Tools

These are installed in /data/.software (which should be in your $PATH); the pipeline-dev script is the front end to the other pipeline-dev-* tools.

The pipeline-dev script fulfills a number of roles:

  • Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.

  • Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.

  • Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.

  • Launches the appropriate sub-tool.

  • Prints out errors with backtrace, to help report issues.


Usage

1) Starting a new Project

A pipeline-dev project relies on the following Folder structure, which is auto-generated when using the pipeline-dev import* tools.

If you start a project manually, you must follow the same folder structure.

  • Project base folder

    • nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.

      • main.nf

      • nextflow.config

      • nextflow_schema.json

    • pipeline-dev.project-info: contains project name, description, etc.

    • nextflow-bench.config (automatically generated when needed): contains definitions for bench.

    • ica-flow-config: Directory of files used when deploying pipeline to Flow.

      • inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.

      • onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.

      • launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.

Pipeline Sources
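
A project is typically initialized with one of the import commands:

$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
$ pipeline-dev import-from-flow [--analysis-id=…]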

When importing from a public Nextflow/nf-core pipeline, a directory with the same name as the pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.

When importing from an existing ICA Flow pipeline, a directory called imported-flow-analysis is created and the analysis and pipeline assets are downloaded.

Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.


2) Running in Bench
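
To run the imported pipeline on the workspace, use:

$ pipeline-dev run-in-bench [--local|--sge]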

Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.

The script then assembles the full Nextflow command line, prints it, and launches it.

In case of errors, full logs are saved as .pipeline-dev.log.

Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.

Output Example

Container (Docker) images

Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular when Nextflow is executed with --profile docker.

In Nextflow, Docker images can be specified at the process level:

  • This is done with the container "<image_name:version>" directive, which can be specified

    • in nextflow config files (preferred method when following the nf-core best practices)

    • or at the start of each process definition.

  • Each process can use a different docker image

  • It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.


3) Deploying to ICA Flow
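
Deployment is performed with:

$ pipeline-dev deploy-as-flow-pipeline [--create|--update]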

This command does the following:

  1. Generate the JSON file describing the ICA Flow user interface.

    • If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.

  2. Generate the JSON file containing the validation launch inputs.

    • If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from nextflow --profile test inputs.

    • If local files are used as validation inputs or as default input values:

      • copy them to /data/project/pipeline-dev-files/temp .

      • get their ICA file ids.

      • use these file ids in the launch specifications.

    • If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.

  3. Identify the pipeline name to use for this new pipeline deployment:

    • If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.

    • Identify which already-deployed pipelines have the same base name, with or without suffixes that could be some versioning (_v<number>, _<number>, _<date>) .

    • Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the --create/--update parameters for scripting without user interaction.

  4. A new ICA Flow pipeline is created (except in the case of a pipeline update).

    • The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow.

  5. The nextflow-src folder is uploaded file by file as pipeline assets.

Output Example:

The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.

Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:


4) Launching Validation in Flow
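
The validation analysis is launched with:

$ pipeline-dev launch-validation-in-flow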

The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow --profile test”.

Output Example:

The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.


Tutorials

Import New Samples

Import New Samples

ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.

To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.

The Data Set menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.

Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.

  1. Choose a data type among

    • Germline variants

    • Somatic mutations

    • RNAseq

    • GWAS

  2. Choose a new study name by selecting the radio button: Create new study and entering a Study Name.

  3. To add new data to an existing Study, select the radio button: Select from list of studies and select an existing Study Name from the dropdown.

  4. To add data to existing records or add new records, select Job Type, Append.

  5. Append does not wipe out any data ingested previously and can be used to ingest the molecular data in an incremental manner.

  6. To replace data, select Job Type, Replace. If you are ingesting data again, use the Replace job type.

  7. Enter an optional Study description.

  8. Select the metadata model (default: Cohorts; alternatively, select OMOP version 5.4 if your data is formatted that way.)

  9. Select the genome build your molecular data is aligned to (default: GRCh38/hg38)

  10. For RNAseq, specify whether you want to run differential expression (see below) or only upload raw TPM.

  11. Click Next.

  12. Navigate to VCFs located in the Project Data.

  13. Select each single-sample VCF or multi-sample VCF to ingest. For GWAS, select CSV files produced by Regenie.

    • As an alternative to selecting individual files, you can also opt to select a folder instead. Toggle the radio button on Step 2 from "Select files" to "Select folder".

    • This option is currently only available for germline variant ingestion: any combination of small variants, structural variation, and/or copy number variants.

    • ICA Cohorts will scan the selected folder and all sub-folders for any VCF files or JSON files and try to match them against the Sample ID column in the metadata TSV file (Step 3).

    • Files not matching sample IDs will be ignored; allowed file extensions for VCF files after the sample ID are: *.vcf.gz, *.hard-filtered.vcf.gz, *.cnv.vcf.gz, and *.sv.vcf.gz .

    • Files not matching sample IDs will be ignored; allowed file extensions for JSON files after the sample ID are: *.json, *.json.gz, *.json.bgz, and *.json.gzip.

  14. Click Next.

  15. Navigate to the metadata (phenotype) data tsv in the project Data.

  16. Select the TSV file or files for ingestion.

  17. Click Finish.

Search Spinner behavior in the Import Jobs table

  • Search a term and press Enter.

  • The search spinner will appear while the results are loading.

  • Once the results are displayed in the table, the spinner disappears.

All VCF types, specifically from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingest small variants.

As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.

The maximum number of files in a single manual ingestion batch is 1000.

Alternatively, users can choose a single folder and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select the molecular data files matching the samples listed in the metadata sheet, which is provided in the next step of the import process.

Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.

The sample identifiers used in the VCF columns need to match the sample identifiers used in subject/sample metadata files; accordingly, if you are starting from JSON files containing variant- and gene-level annotations provided by ILMN Nirvana, the samples listed in the header need to match the metadata files.

Variant file formats

ICA Cohorts supports VCF files formatted according to VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build:

  • ##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)

  • ##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38

  • ##DRAGENCommandLine= ... --ht-reference

ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.

RNAseq file format

ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for genes, and .quant.sf for transcript-level TPM (transcripts per million).

GWAS file format

ICA Cohorts currently supports upload of SNV-level GWAS results produced by Regenie and saved as CSV files.

Metadata and File Types

Note: If annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole-genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it completes successfully or fails.

As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP common data model 5.4. Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:

  • PERSON (mandatory),

  • CONCEPT (mandatory if any of the following is provided),

  • CONDITION_OCCURRENCE (optional),

  • DRUG_EXPOSURE (optional), and

  • PROCEDURE_OCCURRENCE (optional).

Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.

Note that Cohorts requires all such files to conform to the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.

References

[1] VcfMapper: https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/main_vcfmapper.py

[2] crossMap: https://crossmap.sourceforge.net/

[3] liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver

[4] Chain files: ftp://ftp.ensembl.org/pub/assembly_mapping/homo_sapiens/


Private Registry

The import job configuration includes the following fields:

  • Project name: The ICA project for your cohort analysis (cannot be changed).

  • Study name: Create or select a study. Each study represents a subset of data within the project.

  • Description: Short description of the data set (optional).

  • Job type: Append appends values to any existing values (if a field supports only a single value, the value is replaced); Replace overwrites existing values with the values in the uploaded file.

  • Subject metadata files: Subject metadata file(s) in tab-delimited format. For Append and Replace job types, the following fields are required and cannot be changed: Sample identifier, Sample display name, Subject identifier, Subject display name, Sex.

Precomputed GWAS and PheWAS

The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.

ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.

Visualize Results from Precomputed Genome-Wide Association Studies (GWAS)

Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.

Circles in the Manhattan plot indicate binary traits, i.e., potential associations between genes and diseases. Triangles indicate quantitative phenotypes with a regression Beta different from zero, pointing up or down to depict positive or negative correlation, respectively.

Hovering over a circle/triangle will display the following information:

  • gene symbol

  • variant group (see below)

  • P-value, both raw and FDR-corrected

  • number of carriers of variants of the given type

  • number of carriers of variants of any type

  • regression Beta

For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any of these classes, or select All to display all results together.

  • Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.

  • Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant, splice_acceptor_variant, start_lost, transcript_ablation, transcript_truncation, exon_loss_variant, gene_fusion, or bidirectional_gene_fusion.

  • missense_all: all missense variants regardless of their pathogenicity.

  • missense, PrimateAI optimized (missense_pAI_optimized): only pathogenic missense variants with primateAI score greater than a gene-specific threshold.

  • missenses and PTVs (missenses_and_ptvs_all): the union of all PTVs, SpliceAI > 0.2 variants and all missense variants regardless of their pathogenicity scores.

  • all synonymous variants (syn).

To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop down box, which defaults to Whole genome.

Visualize Results from Precomputed Phenome-Wide Association Studies (PheWAS)

To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop down, which defaults to All.


A future release of ICA Cohorts will allow you to run your own customized GWAS analysis inside ICA Bench and then upload variant- or gene-level results for visualization in the ICA Cohorts graphical user interface.


Rare Genetic Disorders Walk-through

Cohorts Walk-through: Rare Genetic Disorders

This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.

Login and Create a new ICA Project

Create a new Project to track your study:

  1. Log in to ICA at https://ica.illumina.com/ica

  2. Navigate to Projects

  3. Create a new project using the New Project button.

  4. Give your project a name and click Save.

  5. Navigate to the ICA Cohorts module by clicking COHORTS in the left navigation panel then choose Cohorts.

Create and Review a Rare Disease Cohort

  1. Navigate to the ICA Cohorts module by clicking Cohorts in the left navigation panel.

  2. Click Create Cohort button.

  3. Enter a name for your cohort, such as Rare Disease + 1kGP, at the top, to the left of the pencil icon.

  4. From the Public Data Sets list select:

    1. DRAGEN-1kGP

    2. All Rare genetic disease cohorts

  5. Notice that a cohort can also be created based on Technology, Disease Type and Tissue.

  6. Under Selected Conditions in right panel, click on Apply

  7. A new page opens with your cohort in a top-level tab.

  8. Expand Query Details to see the study makeup of your cohort.

  9. A set of 4 Charts will be open by default. If they are not, click Show Charts.

    1. Use the gear icon in the top-right of the Charts pane to change chart settings.

  10. The bottom section is demarcated by 8 tabs (Subjects, Marker Frequency, Genes, GWAS, PheWAS, Correlation, Molecular Breakdown, CNV).

  11. The Subjects tab displays a list of exportable Subject IDs and attributes.

    1. Clicking on a Subject ID link pops up a Subject details page.

Analyze Your Rare Disease Cohort Data

  1. A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.

  2. First let’s Hide charts for more visual space.

  3. Click the Genes tab where you need to query a gene to see and interact with results.

  4. Type SCN2A into the Gene search field and select it from autocomplete dropdown options.

  5. The Gene Summary tab now lists information and links to public resources about SCN2A.

  6. Click on the Variants tab to see an interactive Legend and analysis tracks.

    1. The Needle Plot displays gnomAD Allele Frequency for variants in your cohort.

      1. Note that some are in SCN2A conserved protein domains.

    2. In Legend, switch the Plot by option to Sample Count in your cohort.

    3. In Legend, uncheck all Variant Types except Stop gained. Now you should see 7 variants.

    4. Hover over pin heads to see pop-up information about particular variants.

  7. The Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" as cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".

  8. The Pathogenic variants track displays markers from ClinVar color-coded by variant type. Hover over to see pop-ups with more information.

  9. The Exons track shows mRNA exon boundaries with click and zoom functionality at the ends.

  10. Below the Needle Plot and analysis tracks is a list of "Variants observed in the selected cohort".

    1. The Export Gene Variants table icon is above the legend on the right side.

  11. Now let's click on the Gene Expression tab to see a Bar chart of 50 normal tissues from GTEx in transcripts per million (TPM). SCN2A is highly expressed in certain Brain tissues, indicating specificity to where good markers for intellectual disability and autism could be expected.

  12. As a final exercise in discovering good markers, click on the tab for Genetic Burden Test. The table here associates Phenotypes with Mutations Observed in each Study selected for our cohort, alongside Mutations Expected to derive p-values. Given all considerations above, SCN2A is a good marker for intellectual disability (p < 1.465 x 10^-22) and autism (p < 5.290 x 10^-9).

  13. Continue to check the other genes of interest in step 1.


Oncology Walk-through

This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.

Create a Cancer Cohort and View Subject Details

  1. Click Create Cohort button.

  2. Select the following studies to add to your cohort:

    1. TCGA – BRCA – Breast Invasive Carcinoma

    2. TCGA – Ovarian Serous Cystadenocarcinoma

  3. Add a Cohort Name = TCGA Breast and Ovarian_1472

  4. Click on Apply.

  5. Expand Show query details to see the study makeup of your cohort.

  6. Charts will be open by default. If not, click Show charts

  7. Use the gear icon in the top-right to change viewable chart settings.

    Tip: Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.

  8. The Subjects tab, listing all subjects, is displayed below the Charts, with a link to each Subject by ID and other high-level information, like Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.

  9. Search for subject TCGA-E2-A14Y and view the data about this Subject.

  10. Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.

    Note: the Subject is a 35 year old Female with vital status and other phenotypes that feed up into the Subject attribute selection criteria when creating or editing cohorts.

  11. Click X to close the Subject details.

  12. Click Hide charts to increase interactive landscape.

Data Analysis, Multi-Omic Biomarker Discovery, and Interpretation

  1. Click the Marker Frequency tab, then click the Somatic Mutation tab.

  2. Review the gene list and mutation frequencies.

  3. Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).

    1. Do Subjects with PIK3CA mutations have changes in PIK3CA RNA Expression?

  4. Click the Gene Expression tab, search for PIK3CA

    1. PIK3CA RNA is down-regulated in 27% of the subjects relative to normal samples.

      1. Switch from normal to disease Reference where the Subject’s denominator is the median of all disease samples in your cohort.

      2. Note the count of matching vs. total subjects that have PIK3CA up-regulated RNA, which may indicate a distinctive sub-phenotype.

  5. Click directly on PIK3CA gene link in the Gene Expression table.

  6. You are brought to the Gene tab under the Gene Summary sub-tab that lists information and links to public resources about PIK3CA.

  7. Click the Variants tab and Show legend and filters if it does not open by default.

  8. Below the interactive legend you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.

  9. The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot.

    1. There are 87 mutations distributed across the 1068 amino acid sequence, listed below the analysis tracks. These can be exported via the icon into a table.

  10. We know that missense variants can severely disrupt translated protein activity. Deselect all Variant Types except for Missense from the Show Variant Type legend above the needle plot.

    1. Many mutations are in the functional domains of the protein as seen by the colored boxes and labels on the x-axis of the Needle Plot.

  11. Hover over the variant with the highest sample count in the yellow PI3Ka protein domain.

    1. The pop-up shows variant details for the 64 Subjects observed with it: 63 in the Breast Cancer study and 1 in the Ovarian Cancer Study.

  12. Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in to the PI3Ka domain to better separate observations.

  13. There are three different missense mutations at this locus, changing the wildtype glutamic acid at different frequencies to Lysine (64), Glycine (6), or Alanine (2).

  14. The Pathogenic Variant Track shows 7 ClinVar entries for mutations stacked at this locus affecting amino acid 545. Pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry are seen by hovering over the purple triangles.

  15. Note the Primate AI track and high Primate AI score.

    1. The Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" as cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".

  16. Click the Expression tab and notice that normal Breast and normal Ovarian tissue have relatively high PIK3CA RNA Expression in GTEx RNAseq tissue data, although the gene is ubiquitously expressed.

Cohort Analysis

Cohort Analysis

From the Cohorts menu in the left hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.

Query Details

The query details can be accessed by clicking the triangle next to Show Query Details. The query details displays the selections used to create a cohort. The selections can be edited by clicking the pencil icon in the top right.

Charts

  1. Charts will be open by default. If not, click Show Charts.

  2. Use the gear icon in the top-right to change viewable chart settings.

  3. There are four charts available to view summary counts of attributes within a cohort as histogram plots.

  4. Click Hide Charts to hide the histograms.

Single Subject Timeline View:

  1. Display time-stamped events and observations for a single subject on a timeline. The timeline view is visible only for subjects that have time-series data.

  2. The following attributes are displayed in the timeline view:

    • Diagnosed and Self-Reported Diseases:

      • Start and end dates

      • Progression vs. remission

    • Medication and Other Treatments:

      • Prescribed and self-medicated

      • Start date, end date, and dosage at every time point

  3. The timeline uses age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, the user can switch to Date to visualize the data.

  4. In the default view, the timeline shows the first five disease entries and the first five drug/medication entries in the plot. Users can choose different attributes or change the order of existing attributes by clicking on the “select attribute” button.

  5. The x-axis shows the person’s age in years, with data points initially displayed between ages 0 and 100. Users can zoom in by selecting the desired range to visualize data points within the selected age range.

  6. Each event is represented by a dot in the corresponding track. Events in the same track can be connected by lines to indicate the start and end period of an event.

Measurement Section: A summary of measurements (without values) is displayed under the section titled "Measurements and Laboratory Values Available." Users can click a link to access the Timeline View for detailed results.

Drug Section: The "Drug Name" section lists drug names without repeating the header "Drug Name" for each entry.

Subjects

  1. By Default, the Subjects tab is displayed.

  2. The Subjects tab with a list of all subjects matching your criteria is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.

  3. Search for a specific subject by typing the Subject ID into the Search Subjects text box.

  4. Get all details available on a subject by clicking the hyperlinked Subject ID in the Subject list.

To exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, you can uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusion(s).

You can Export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see Subject Export for Analysis in ICA Bench at the bottom of this page.

Remove a Subject

  1. Specific subjects can be removed from a Cohort.

  2. Select the Subjects tab.

  3. Subjects in the Cohort are checked by default.

  4. To remove a specific subject from a Cohort, uncheck the checkbox next to subjects to remove from a Cohort.

  5. Check box selections are maintained while browsing through the pages of the subject list.

  6. Click Save Cohort to save the subjects you would like to exclude.

  7. The specific subjects will no longer be counted in all analysis visualizations.

  8. The specific excluded subjects will be saved for the Cohort.

  9. To add the subjects back to the Cohort, select the checkboxes again and click Save Cohort.

Structural variant aggregation: Marker Frequency analysis

For each individual cohort, Cohorts displays a table of all observed SVs that overlap with a given gene.

Marker Frequency

  1. Click the Marker Frequency tab, then click the Gene Expression tab.

  2. Down-regulated genes are displayed in blue and Up-regulated genes are displayed in red.

  3. A frequency in the Cohort is displayed and the Matching number/Total is also displayed in the chart.

  4. Genes can be searched by using the Search Genes text box.

Genes

  1. You are brought to the Gene tab under the Gene Summary sub-tab.

  2. Select a Gene by typing the gene name into the Search Genes text box.

  3. A Gene Summary will be displayed that lists information and links to public resources about the selected gene.

  4. A cytogenetic map will be displayed based on the selected gene, and a vertical orange bar represents the gene location on the chromosome.

  5. Click the Variants tab and Show legend and filters if it does not open by default.

  6. Below the interactive legend, you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.

  7. The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot. You can also filter the plot to only show variants above/below a certain cut-off for gnomAD frequency (in percent) or absolute sample count.

  8. The Needle Plot allows filtering by PrimateAI Score.

    • Set a lower (>=) or upper (<=) threshold for the PrimateAI Score to filter variants.

    • Enter the threshold value in the text box located below the gnomadFreq/SampleCount input box.

    • If no threshold value is entered, no filter will be applied.

    • The filter affects both the plot and the table when the “Display only variants shown in the plot above” toggle is enabled.

    • Filter preferences persist across gene views for a seamless experience.

  9. The following filters are always shown and can be independently set: % gnomAD Frequency, Sample Count, and PrimateAI Score. Changes made to these filters are immediately reflected in both the needle plot and the variant list below.

  10. Click on a variant's needle pin to view details about the variant from public resources and counts of variants in the selected cohort by disease category. If you want to view all subjects that carry the given variant, click on the sample count link, which will take you to the list of subjects (see above).

  11. Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in on the gene domain to better separate observations.

  12. The Pathogenic Variant Track shows pop up details with pathogenicity calls, phenotypes, submitter and a link to the ClinVar entry is seen by hovering over the purple triangles.

  13. Below the needle plot is a full listing of the variants displayed in the needle plot visualization.

    • The Display only variants shown in the plot above toggle (enabled by default) syncs the table with the Needle Plot. When the toggle is on, the table will display only the variants shown in the Needle Plot, applying all active filters (e.g., variant type, somatic/germline, sample count). When the toggle is off, all reported variants are displayed in the table and table-based filters can be used.

    • Export to CSV: When the views are synchronized (toggle on), the filtered list of variants can be exported to a CSV file for further analysis.

    • The Phenotypes tab shows a stacked horizontal bar chart which displays the molecular breakdown (disease type vs. gene) and subject count for the selected gene.

    Note on "Stop Lost" Consequence Variants:

    • The stop_lost consequence is mapped as Frameshift, Stop lost in the tooltip.

    • The Stop gained|lost value includes both stop gain and stop loss variants.

    • When the Stop gained filter is applied, Stop lost variants will not appear in the plot or table if the "Display only variants shown in the plot above" toggle is enabled.

  14. The Gene Expression tab shows known gene expression data from tissue types in GTEx.

  15. The Genetic Burden Test is available for de novo variants only.

Correlation

For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.

Clinical vs. Clinical Attribute Comparison – Bubble Plot

  1. Click the Correlation Tab.

  2. In X-axis category, select Clinical.

  3. In X-axis Attribute, select a clinical attribute.

  4. In Y-axis category, select Clinical.

  5. In Y-Axis Attribute, select another clinical attribute.

  6. You will be shown a bubble plot comparing the first clinical attribute on the x-axis to the second attribute on the y-axis.

  7. The size of each bubble corresponds to the number of subjects falling into those categories.

Molecular vs. Molecular Attribute Comparison – Bubble Plot

To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:

Note this comparison is for a Cancer case.

  1. Click the Correlation Tab.

  2. In X-axis category, select Somatic.

  3. In X-axis Attribute, select a gene.

  4. In Y-axis category, select RNA expression.

  5. In Y-Axis Attribute, type a gene and leave Reference Type, NORMAL.

  6. Click Continuous to see violin plots of compared variables.

Clinical vs. Molecular Attribute Comparison – Bubble Plot

Note this comparison is for a Cancer case.

  1. Click the Correlation Tab.

  2. In X-axis category, select Somatic.

  3. In X-axis Attribute, type a gene name.

  4. In Y-axis category, select Clinical.

  5. In Y-Axis Attribute, select a clinical attribute.

Molecular Breakdown

  1. Click the Molecular Breakdown Tab.

  2. In Enter a clinical Attribute, select a clinical attribute.

  3. In Enter a gene, select a gene by typing a gene name.

  4. You are shown a stacked bar chart with the selected clinical attribute values on the Y-axis.

  5. For each attribute value the bar represents the % of Subjects with RNA Expression, Somatic Mutation, and Multiple Alterations.

Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.

CNV

If there is Copy Number Variant data in the cohort:

  1. Click the CNV tab.

  2. A graph will show the CNV Sample Percentage on the Y-axis and Chromosomes on the X-axis.

  3. Any value above Zero is a copy number gain, and any value below Zero is a copy number loss.

  4. Click Chromosome: to select a specific chromosome position.

Subject Export for Analysis in ICA Bench

ICA allows for integrated analysis in a computational workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.

  1. Confirm the VCF data for your analysis is in ICA Project Data.

  2. Navigate back to ICA Cohorts.

  3. From the Subjects Tab click the Export subjects... from the top-right of the subject list. The file can be downloaded to the Browser or ICA Project Data.

  4. We suggest using export ...to Data Folder for immediate access to this data in Bench or other areas of ICA.

  5. Create another cohort if needed for your Research and complete the last 3 steps.

  6. Navigate to the Bench workspace created in the second step.

  7. After the workspace has started up, click Access.

  8. Find the /Project/ folder in the Workspace file navigation.

  9. This folder will contain your cohort files created along with any pipeline output data needed for your workspace analysis.

Compare Cohorts

You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons left-navigation tab of the Cohorts module.

Create a comparison view

  1. Select Cohorts from the left-navigation panel.

  2. Select 2 to 4 cohorts already created. If you have not created any cohorts, see the Create a Cohort documentation.

  3. Click Compare Cohorts in the right-navigation panel.

  4. Note you are now in the Comparisons left-navigation tab of the Cohorts module.

  5. In the Charts Section, if the COHORTS item is not displayed, click the gear icon in the top right and add Cohorts as the first attribute and click Save.

  6. The COHORTS item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.

  7. For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.

You can share a comparison with other team members in the same ICA Project. Please refer to the "Sharing a Cohort" section of "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which are analogous for comparisons.

Attribute Comparison

  1. Select the Attributes tab

  2. Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.

  3. For example, use the drop-down arrow next to Vital status to view sub-categories and frequencies across selected cohorts.

Variants Comparison

  1. Select the Genes tab

  2. Search for a gene of interest using its HUGO/HGNC gene symbol

  3. As additional filter options, you can view only those variants that occur in every cohort, that are unique to one cohort, or that have been observed in at least two cohorts; or view any variant.

Survival Summary

  1. Select the Survival Summary tab.

  2. Attribute categories are listed and can be expanded using the down-arrows next to the category names.

  3. Select the drop-down arrow for Therapeutic interventions.

  4. In each subcategory there is a sum of the subject counts across the selected cohorts.

  5. For each cohort, designated by a color, there is a Subject count and Median survival (years) column.

  6. Type Malignancy in the Search Box and an auto-complete dropdown suggests three different attributes.

  7. Select Synchronous malignancy and the results are automatically opened and highlighted in orange.

Survival Comparison

  1. Click Survival Comparison tab.

  2. A Kaplan-Meier Curve is rendered based on each cohort.

  3. P-Value Displayed at the top of Survival Comparison indicates whether there is statistically significant variance between survival probabilities over time of any pair of cohorts (CI=0.95).

When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.

Marker Frequency Comparison

  1. Select the Marker Frequency tab.

  2. Select either Gene expression (default), Somatic mutation, or Copy number variation

  3. For gene expression (up- versus down-regulated) and for copy number variation (gain versus loss), Cohorts will display a list of all genes with bidirectional bar charts.

  4. For somatic mutations, the bar charts are unidirectional and indicate the percentage of samples with a mutation in each gene per cohort.

  5. Bars are color-coded by cohort, see the accompanying legend.

  6. Each row shows P-value(s) resulting from pairwise comparison of all cohorts. In the case of comparing two cohorts, the numerical P-value will be displayed in the table. In the case of comparing three or more cohorts, the pairwise P-values are shown as a triangular heatmap, with details available as a tooltip.

Correlation Comparison

  1. Select the Correlation tab.

  2. Similar to the single-cohort view (Cohort Analysis | Correlation), choose two clinical attributes and/or genes to compare.

  3. Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.

Public Data Sets

ICA Cohorts comes preloaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.

Details

The project details are configured during project creation and may be updated by the project owner, entities with the project Administrator role, and tenant administrators.

Adding Linked Bundle Assets

  1. Click the Edit button at the top of the Details page.

  2. Click the + button, under LINKED BUNDLES.

  3. Click on the desired bundle, then click the Link button.

  4. Click Save.

If your linked bundle contains a pipeline, it will appear in Projects > your_project > Flow > Pipelines.

Details List

Detail
Description

Name

Name of the project unique within the tenant. Alphanumerics, underscores, dashes, and spaces are permitted.

Short Description

Short description of the project

Project Owner

Owner of the project (has Administrator access to the project)

Storage Configuration

Storage configuration to use for data stored in the project

User Tags

User tags on the project

Technical Tags

Technical tags on the project

Metadata Model

Metadata model assigned to the project

Project Location

Project region where data is stored and pipelines are executed. Options are derived from the Entitlement(s) assigned to the user account, based on the purchased subscription

Storage Bundle

Storage bundle assigned to the project. Derived from the selected Project Location based on the Entitlement in the purchased subscription

Billing Mode

Billing mode assigned to the project

Data sharing

Enables data and samples in the project to be linked to other projects

Billing Mode

A project's billing mode determines the strategy for how costs are charged to billable accounts.

Billing Mode
Description

Project

All incurred costs will be charged to the tenant of the project owner

Tenant

Incurred costs will be charged to the tenant of the user owning the project resource (i.e., data, analysis). The only exceptions are base tables and queries, as well as Bench compute and storage costs, which are always billed to the project owner.

For example, with billing mode set to Tenant, if tenant A has created a project resource and uses it in their project, then tenant A will pay for the resource data, compute costs and storage costs of any output they generate within the project. When they share the project with tenant B, then tenant B will pay the compute and storage for the data which they generate in that project. Put simply, in billing mode tenant, the person who generates data pays for the processing and storage of that data, regardless of who owns the actual project.

If the project billing mode is updated after the project has been created, the updated billing mode will only be applied to resources generated after the change.

If you are using your own S3 storage, then the billing mode impacts where collaborator data is stored.

  • Project billing will result in using your S3 storage for the data.

  • Tenant billing will result in collaborator data being stored in Illumina-managed storage instead of your own S3 storage.

  • Tenant billing, when your collaborators also have their own S3 storage and have it set as default, will result in their data being stored in their S3 storage.

Authentication Token

Use the Create OAuth access token button to generate an OAuth access token which is valid for 12 hours after generation. This token can be used by Snowflake and Tableau to access the data in your Base databases and tables for this Project.
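As a minimal sketch of how such a token could be used from Python with the Snowflake connector (the account, database, role, and warehouse values are placeholders; they can be obtained, for example, from the base:connectionDetails endpoint shown in the Base tutorial later in this document):

import snowflake.connector

# Placeholders: fill in the connection details for your project's Base instance.
ctx = snowflake.connector.connect(
    account="<snowflake-account>",
    authenticator="oauth",
    token="<oauth-access-token>",  # token created with the Create OAuth access token button
    database="<database-name>",
    role="<role-name>",
    warehouse="<warehouse-name>"
)
print(ctx.cursor().execute("SELECT CURRENT_DATABASE()").fetchone())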

Team

Projects may be shared by modifying the project's Team. Team members can be added using one of the following entities:

  • User within the current tenant

  • E-mail address

  • Workgroup within the current tenant

Select the corresponding option under + Add.

Email invites are sent out as soon as you click the save button on the add team member dialog.

Project Owner

To change the project owner, select the Edit project owner button at the top right and select the new project owner from the list. This can be done by the current project owner, the tenant administrator or a project administrator of the current project.

Roles

Each entity added to the project team will have an assigned role with regards to specific categories of functionality in the application. These categories are:

  • Project

  • Flow

  • Base

  • Bench

Upload and Download rights

While the categories will determine most of what a user can do or see, explicit upload and download rights need to be granted for users. This is done by selecting the appropriate upload and download icons.

Upload and download rights are independent of the assigned role. A user with only viewer rights will still be able to perform uploads and downloads if their upload and download rights are not disabled. Likewise, an administrator can only perform uploads and downloads if their upload and download rights are enabled.

Upload allowed
Download allowed

No upload allowed

No Download allowed

The sections below describe the roles for each category and the allowed actions.

Project

No Access
Data Provider
Viewer
Contributor
Administrator

Create a Connector

x

x

x

x

View project resources

x

x

x

Link/Unlink data to a project

x

x

Subscribe to notifications

x

x

View Activity

x

x

Create samples

x

x

Delete/archive data

x

Manage notification channels

x

Manage project team

x

Flow

No Access
Viewer
Contributor

View analyses results

x

x

Create analyses

x

Create pipelines and tools

x

Edit pipelines and tools

x

Add docker image

x

Base

No Access
Viewer
Contributor

View table records

x

x

Click on links in table

x

x

Create queries

x

x

Run queries

x

x

Export query

x

x

Save query

x

x

Export tables

x

x

Create tables

x

Load files into a table

x

Bench

No Access
Contributor
Administrator

Execute a notebook

x

x

Start/Stop Workspace

x

x

Create/Delete/Modify workspaces

x

Install additional tools, packages, libraries, …

x

Build a new Bench docker image

x

Create a tool for pipeline-execution

x

If a user qualifies for multiple entities added to the project team (i.e., added as an individual user and also a member of an added workgroup), the highest level of access provided by the combination of those roles is granted.

Cohorts Data in ICA Base

ICA Cohorts Base Tables

  1. Post ingestion, data will be represented in Base.

  2. Select BASE from the ICA left-navigation and click Query.

  3. Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project <your project name>.

  4. Cohorts tables will be displayed.

  5. To preview the table and fields, click each view listed.

  6. Clicking any of these views and then selecting PREVIEW on the right-hand side will show you a preview of the data in the tables.

If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.

The PHENOTYPE table contains a harmonized set of attributes collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. Sample information, if applicable, is also stored in this table and drives the annotation process when molecular data is included in the ingestion.

Phenotype Data

Field Name
Type
Description

SAMPLE_BARCODE

STRING

Sample Identifier

SUBJECTID

STRING

Identifier for Subject entity

STUDY

STRING

Study designation

AGE

NUMERIC

Age in years

SEX

STRING

Sex field to drive annotation

POPULATION

STRING

Population Designation for 1000 Genomes Project

SUPERPOPULATION

STRING

Superpopulation Designation from 1000 Genomes Project

RACE

STRING

Race according to NIH standard

CONDITION_ONTOLOGIES

VARIANT

Diagnosis Ontology Source

CONDITION_IDS

VARIANT

Diagnosis Concept Ids

CONDITIONS

VARIANT

Diagnosis Names

HARMONIZED_CONDITIONS

VARIANT

Diagnosis High-level concept to drive UI

LIBRARYTYPE

STRING

Sequencing technology

ANALYTE

STRING

Substance sequenced

TISSUE

STRING

Tissue source

TUMOR_OR_NORMAL

STRING

Tumor designation for somatic

GENOMEBUILD

STRING

Genome Build to drive annotations - hg38 only

SAMPLE_BARCODE_VCF

STRING

Sample ID from VCF

AFFECTED_STATUS

NUMERIC

Affected, Unaffected, or Unknown for Family Based Analysis

FAMILY_RELATIONSHIP

STRING

Relationship designation for Family Based Analysis

Sample Information

Field Name
Type
Description

SAMPLE_BARCODE

STRING

Original sample barcode used in VCF column

SUBJECTID

STRING

Original identifier for the subject record

DATATYPE

ARRAY

The categorization of molecular data

TECHNOLOGY

ARRAY

The sequencing method

CREATEDATE

DATE

Date and time of record creation

LASTUPDATEDATE

DATE

Date and time of last update of record

Sample Attribute

This table is an entity-attribute-value table of supplied sample data matching Cohorts-accepted attributes.

Field Name
Type
Description

SAMPLE_BARCODE

STRING

Original sample barcode used in VCF column

SUBJECTID

STRING

Original identifier for the subject record

ATTRIBUTE_NAME

STRING

Cohorts meta-data driven field name

ATTRIBUTE_VALUE

VARIANT

List of values entered for the field

Study Information

Field Name
Type
Description

NAME

STRING

Study name

CREATEDATE

DATE

Date and time of study creation

LASTUPDATEDATE

DATE

Date and time of record update

Subject

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

AGE

FLOAT

Age entered on subject record if applicable

SEX

STRING

-

ETHNICITY

STRING

-

STUDY

STRING

Study subject belongs to

CREATEDATE

DATE

Date and time of record creation

LASTUPDATEDATE

DATE

Date and time of record update

Subject Attribute

This table is an entity-attribute-value table of supplied subject data matching Cohorts-accepted attributes.

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

ATTRIBUTE_NAME

STRING

Cohorts meta-data driven field name

ATTRIBUTE_VALUE

VARIANT

List of values entered for the field

Disease

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

TERM

STRING

Code for disease term

OCCURRENCES

STRING

List of occurrence related data

Drug Exposure

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

TERM

STRING

Code for drug term

OCCURRENCES

STRING

List of occurrence related data of drug exposure

Measurement

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

TERM

STRING

Code for measurement term

OCCURRENCES

STRING

List of occurrences and values related to lab or measurement data

Procedure

Field
Type
Description

SUBJECTID

STRING

Original identifier for the subject record

TERM

STRING

Code for procedure term

OCCURRENCES

STRING

List of occurrences and values related to procedure data

Annotated Variants

This table will be available for all projects with ingested molecular data

Field Name

Type

Description

SAMPLE_BARCODE

STRING

Original sample barcode used in VCF column

STUDY

STRING

Study designation

GENOMEBUILD

STRING

Only hg38 is supported

CHROMOSOME

STRING

Chromosome without 'chr' prefix

CHROMOSOMEID

NUMERIC

Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt

DBSNP

STRING

dbSNP Identifiers

VARIANT_KEY

STRING

Variant ID in the form "1:12345678:12345678:C"

NIRVANA_VID

STRING

Broad Institute VID: "1-12345678-A-C"

VARIANT_TYPE

STRING

Description of Variant Type (e.g. SNV, Deletion, Insertion)

VARIANT_CALL

NUMERIC

1=germline, 2=somatic

DENOVO

BOOLEAN

true / false

GENOTYPE

STRING

"G|T"

READ_DEPTH

NUMERIC

Sequencing read depth

ALLELE_COUNT

NUMERIC

Counts of each alternate allele for each site across all samples

ALLELE_DEPTH

STRING

Unfiltered count of reads that support a given allele for an individual sample

FILTERS

STRING

Filter field from VCF. If all filters pass, field is PASS

ZYGOSITY

NUMERIC

0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt

GENEMODEL

NUMERIC

1=Ensembl, 2=RefSeq

GENE_HGNC

STRING

HUGO/HGNC gene symbol

GENE_ID

STRING

Ensembl gene ID ("ENSG00001234")

GID

NUMERIC

NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID

TRANSCRIPT_ID

STRING

Ensembl ENST or RefSeq NM_

CANONICAL

STRING

Transcript designated 'canonical' by source

CONSEQUENCE

STRING

missense, stop gained, intronic, etc.

HGVSC

STRING

The HGVS coding sequence name

HGVSP

STRING

The HGVS protein sequence name

Annotated Somatic Mutations

This table will only be available for data sets with ingested Somatic molecular data.

Field Name

Type

Description

SAMPLE_BARCODE

STRING

Original sample barcode, used in VCF column

SUBJECTID

STRING

Identifier for Subject entity

STUDY

STRING

Study designation

GENOMEBUILD

STRING

Only hg38 is supported

CHROMOSOME

STRING

Chromosome without 'chr' prefix

DBSNP

NUMERIC

dbSNP Identifiers

VARIANT_KEY

STRING

Variant ID in the form "1:12345678:12345678:C"

MUTATION_TYPE

NUMERIC

Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant

VARIANT_CALL

NUMERIC

1=germline, 2=somatic

GENOTYPE

STRING

"G|T"

REF_ALLELE

STRING

Reference allele

ALLELE1

STRING

First allele call in the tumor sample

ALLELE2

STRING

Second allele call in the tumor sample

GENEMODEL

NUMERIC

1=Ensembl, 2=RefSeq

GENE_HGNC

STRING

HUGO/HGNC gene symbol

GENE_ID

STRING

Ensembl gene ID ("ENSG00001234")

TRANSCRIPT_ID

STRING

Ensembl ENST or RefSeq NM_

CANONICAL

BOOLEAN

Transcript designated 'canonical' by source

CONSEQUENCE

STRING

missense, stop gained, intronic, etc.

HGVSP

STRING

HGVS nomenclature for AA change: p.Pro72Ala

Annotated Copy Number Variants

This table will only be available for data sets with ingested CNV molecular data.

Field Name

Type

Description

SAMPLE_BARCODE

STRING

Sample barcode used in the original VCF

GENOMEBUILD

STRING

Genome build, always 'hg38'

NIRVANA_VID

STRING

Variant ID of the form 'chr-pos-ref-alt'

CHRID

STRING

Chromosome without 'chr' prefix

CID

NUMERIC

Numerical representation of the chromosome, X=23, Y=24, Mt=25

GENE_ID

STRING

NCBI or Ensembl gene identifier

GID

NUMERIC

Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix

START_POS

NUMERIC

First affected position on the chromosome

STOP_POS

NUMERIC

Last affected position on the chromosome

VARIANT_TYPE

NUMERIC

1 = copy number gain, -1 = copy number loss

COPY_NUMBER

NUMERIC

Observed copy number

COPY_NUMBER_CHANGE

NUMERIC

Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline

SEGMENT_VALUE

NUMERIC

Average FC for the identified chromosomal segment

PROBE_COUNT

NUMERIC

Probes confirming the CNV (arrays only)

REFERENCE

NUMERIC

Baseline taken from normal samples (1) or averaged disease tissue (2)

GENE_HGNC

STRING

HUGO/HGNC gene symbol

Annotated Structural Variants

This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.

Field Name

Type

Description

SAMPLE_BARCODE

STRING

Sample barcode used in the original VCF

GENOMEBUILD

STRING

Genome build, always 'hg38'

NIRVANA_VID

STRING

Variant ID of the form 'chr-pos-ref-alt'

CHRID

STRING

Chromosome without 'chr' prefix

CID

NUMERIC

Numerical representation of the chromosome, X=23, Y=24, Mt=25

BEGIN

NUMERIC

First affected position on the chromosome

END

NUMERIC

Last affected position on the chromosome

BAND

STRING

Chromosomal band

QUALIITY

NUMERIC

Quality from the original VCF

FILTERS

ARRAY

Filters from the original VCF

VARIANT_TYPE

STRING

Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2")

VARIANT_TYPE_ID

NUMERIC

21=insertion, 22=deletion, 23=indel, 24=tandem_duplication, 25=translocation_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2")

CIPOS

ARRAY

Confidence interval around first position

CIEND

ARRAY

Confidence interval around last position

SVLENGTH

NUMERIC

Overall size of the structural variant

BONDCHR

STRING

For translocations, the other affected chromosome

BONDCID

NUMERIC

For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25

BONDPOS

STRING

For translocations, positions on the other affected chromosome

BONDORDER

NUMERIC

3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' end of the other chromosome fragment

GENOTYPE

STRING

Called genotype from the VCF

GENOTYPE_QUALITY

NUMERIC

Genotype call quality

READCOUNTSSPLIT

ARRAY

Read counts

READCOUNTSPAIRED

ARRAY

Read counts, paired end

REGULATORYREGIONID

STRING

Ensembl ID for the affected regulatory region

REGULATORYREGIONTYPE

STRING

Type of the regulatory region

CONSEQUENCE

ARRAY

Variant consequence according to SequenceOntology

TRANSCRIPTID

STRING

Ensembl or RefSeq transcript identifier

TRANSCRIPTBIOTYPE

STRING

Biotype of the transcript

INTRONS

STRING

Count of impacted introns out of the total number of introns, specified as "M/N"

GENEID

STRING

Ensembl or RefSeq gene identifier

GENEHGNC

STRING

HUGO/HGNC gene symbol

ISCANONICAL

BOOLEAN

Is the transcript ID the canonical one according to Ensembl?

PROTEINID

STRING

RefSeq or Ensembl protein ID

SOURCEID

NUMERICAL

Gene model: 1=Ensembl, 2=RefSeq

Raw RNAseq data tables for genes and transcripts

These tables will only be available for data sets with ingested RNAseq molecular data.

Table for gene quantification results:

Field Name

Type

Description

GENOMEBUILD

STRING

Genome build, always 'hg38'

STUDY_NAME

STRING

Study designation

SAMPLE_BARCODE

STRING

Sample barcode used in the original VCF

LABEL

STRING

Group label specified during import: Case or Control, Tumor or Normal, etc.

GENE_ID

STRING

Ensembl or RefSeq gene identifier

GID

NUMERIC

Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix

GENE_HGNC

STRING

HUGO/HGNC gene symbol

SOURCE

STRING

Gene model: 1=Ensembl, 2=RefSeq

TPM

NUMERICAL

Transcripts per million

LENGTH

NUMERICAL

The length of the gene in base pairs.

EFFECTIVE_LENGTH

NUMERICAL

The length as accessible to RNA-seq, accounting for insert-size and edge effects.

NUM_READS

NUMERICAL

The estimated number of reads from the gene. The values are not normalized.

The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.

Differential expression tables for genes and transcripts

These tables will only be available for data sets with ingested RNAseq molecular data.

Table for differential gene expression results:

Field Name

Type

Description

GENOMEBUILD

STRING

Genome build, always 'hg38'

STUDY_NAME

STRING

Study designation

SAMPLE_BARCODE

STRING

Sample barcode used in the original VCF

CASE_LABEL

STRING

Study designation

GENE_ID

STRING

Ensembl or RefSeq gene identifier

GID

NUMERIC

Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix

GENE_HGNC

STRING

HUGO/HGNC gene symbol

SOURCE

STRING

Gene model: 1=Ensembl, 2=RefSeq

BASEMEAN

NUMERICAL

FC

NUMERICAL

Fold-change

LFC

NUMERICAL

Log of the fold-change

LFCSE

NUMERICAL

Standard error for log fold-change

PVALUE

NUMERICAL

P-value

CONTROL_SAMPLECOUNT

NUMERICAL

Number of samples used as control

CONTROL_LABEL

NUMERICAL

Label used for controls

The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.

Config Settings

The ICA CLI accepts configuration settings from multiple places, such as environment variables, configuration file, or passed in as command line arguments. When configuration settings are retrieved, the following precedence is used to determine which setting to apply:

  1. Command line options - Passed in with the command such as --access-token

  2. Environment variables - Stored in system environment variables such as ICAV2_ACCESS_TOKEN

  3. Default config file - Stored by default in the ~/.icav2/config.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.config on Windows

Command Line Options

The following global flags are available in the CLI interface:

Environment Variables

Environment variables provide another way to specify configuration settings. Variable names align with the command line options with the following modifications:

  • Upper cased

  • Prefix ICAV2_

  • All dashes replaced by underscore

For example, the corresponding environment variable name for the --access-token flag is ICAV2_ACCESS_TOKEN.

Disable Retry Mechanism

The environment variable ICAV2_ICA_NO_RETRY_RATE_LIMITING allows you to disable the retry mechanism. When it is set to 1, no retries are performed. For any other value, HTTP code 429 will result in 4 retry attempts:

  • after 500 milliseconds

  • after 2 seconds

  • after 10 seconds

  • after 30 seconds
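For example, to disable retries in a bash-style shell before invoking the CLI (syntax differs for other shells):

export ICAV2_ICA_NO_RETRY_RATE_LIMITING=1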

Config File

Upon launching icav2 for the first time, the configuration YAML file is created and the default config settings are set. Enter an alternative server URL or press Enter to keep the default, then enter your API Key and press Enter.

After installing the CLI, open a terminal window and enter the icav2 command. This will initialize a default configuration file in the home folder at .icav2/config.yaml.

To reset the configuration, use ./icav2 config reset

Resetting the configuration removes the configuration from the host device and cannot be undone. The configuration needs to be recreated.

Configuration settings are stored in the default configuration file:

The file ~/.icav2/.session.ica.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.session.ica on Windows will contain the access-token and project-id. These are output files and should not be edited, as they are automatically updated.

Examples

ICAV2_X_API_KEY

  1. Command line options - Passed as --x-api-key <your_api_key> or -k <your_api_key>

  2. Environment variables - Stored in system as ICAV2_X_API_KEY

  3. Default config file - Use icav2 config set to update ~/.icav2/config.yaml(macOS/Linux) or C:\Users\USERNAME\.icav2\.config (Windows)
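For example, because -k / --x-api-key is a global flag, the API key can also be supplied directly on an individual command:

icav2 projects list -k <your_api_key>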

Data Transfer

icav2 projects list

The first column of the output (table format, which is the default) will show the ID. This is the project ID and will be used in the examples below.

Upload Data

To upload a file called Sample-1_S1_L001_R1_001.fastq.gz to the project, copy the project id and use the command syntax below:

icav2 projectdata upload Sample-1_S1_L001_R1_001.fastq.gz --project-id <project-id>

To verify the file has uploaded, run the following to get a list of all files stored within the specified project:

icav2 projectdata list --project-id <project-id>

This will show a file ID starting with fil. which can then be used to get more information about the file and its attributes:

icav2 projectdata get <file-id> --project-id <project-id>

It is necessary to use --project-id in the above examples if you have not entered a specific project context. To enter a project context, use the command below.

icav2 projects enter <project-name or project-id>

This sets the project context so that the project ID does not need to be supplied with each command.

Note: filenames beginning with / are not allowed, so be careful when entering full path names, as those will result in the file being stored on S3 but not being visible in ICA. Likewise, folders containing a / in their individual folder name and folders named '.' are not supported.

Download Data

The ICA CLI can also be used to download files via command line. This can be especially helpful if the download destination is a remote server or HPC cluster that you are logged into from a local machine. To download into the current folder, run the following from the command line terminal:

icav2 projectdata download <file-id> ./

The above assumes you have entered into a project context. If this is not the case, either enter the project that contains the desired data, or be sure to supply the --project-id option in the command.

Temporary Credentials

To fetch temporary AWS credentials for given project data, use the command icav2 projectdata temporarycredentials [path or data Id] [flags]. If the path is provided, the project id from the flag --project-id is used. If the --project-id flag is not present, then the project id is taken from the context. The returned AWS credentials for file or folder upload expire after 36 hours.
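For example, assuming you have entered a project context, the following requests temporary credentials for a hypothetical folder path:

icav2 projectdata temporarycredentials /my-folder/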

Data Transfer Options

Installation

Both the CLI and the service connector require x86 architecture. For ARM-based architecture on Mac or Windows, you need to keep x86 emulation enabled. Linux does not support x86 emulation.

Mac/Linux Instructions

After the file is downloaded, place the CLI in a folder that is included in your $PATH environment variable list of paths, typically /usr/local/bin. Open the Terminal application, navigate to the folder where the downloaded CLI file is located (usually your Downloads folder), and run the following command to copy the CLI file to the appropriate folder. If you do not have write access to your /usr/local/bin folder, then you may use sudo prior to the cp command. For example:

If you do not have sudo access on your system, contact your administrator for installation. Alternately, you may place the file in an alternate location and update your $PATH to include this location (see the documentation for your shell to determine how to update this environment variable).

You will also need to make the file executable so that the CLI can run:

Windows Instructions

You will likely want to place the CLI in a folder that is included in your $PATH environment variable list of paths. In Windows, you typically want to save your applications in the C:\Program Files folder. If you do not have write access to that folder, then open a CMD window in administrative mode (hold down the SHIFT key as you right-click on the CMD application and select "Run as administrator"). Type in the following commands (assuming you have saved ica.exe in your current folder):

Then make sure that the C:\Program Files\Illumina folder is included in your %path% list of paths. Please do a web search for how to add a path to your %path% system environment variable for your particular version of Windows.

Authentication

Authenticate using icav2 config set command. The CLI will prompt for an x-api-key value. Input the API Key generated from the product dashboard here. See the example below (replace EXAMPLE_API_KEY with the actual API Key).

The CLI will save the API Key to the config file as an encrypted value.

If you want to overwrite existing environment values, use the command icav2 config set. To remove an existing configuration/session file, use the command icav2 config reset.

Check the server and confirm you are authenticated using icav2 config get

If during these steps or in the future you need to reset the authentication, you can do so using the command: icav2 config reset

Notifications

Notifications (Projects > your_project > Project Settings > Notifications) are events to which you can subscribe. When they are triggered, they deliver a message to an external target system such as e-mail, Amazon SQS or SNS, or HTTP POST requests. The following table describes the available system events to which you can subscribe:

When you subscribe to overlapping event codes such as ICA_EXEC_002 (analysis success) and ICA_EXEC_028 (analysis status change), you will receive both notifications when an analysis succeeds.

When integrating with external systems, it is advised not to rely solely on ICA notifications, but to also add a polling system to check the status of long-running tasks, for example, verifying the status of long-running (>24h) analyses at a 12-hour interval, as sketched below.
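The following is a minimal polling sketch in Python. It assumes an API key with access to the project and uses the project analyses endpoint that the Analysis event payloads are based on; the host prefix, endpoint path, and response fields shown here follow the conventions of the other examples in this document and should be verified against the API reference:

import time
import requests

ICA_HOST = "https://ica.illumina.com/ica/rest"  # same host convention as the other API examples
API_KEY = "<your_api_key>"                      # placeholder
PROJECT_ID = "<project-id>"                     # placeholder
ANALYSIS_ID = "<analysis-id>"                   # placeholder

def poll_analysis_status(interval_hours: float = 12) -> str:
    # Poll the analysis until it reaches a final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED).
    url = f"{ICA_HOST}/api/projects/{PROJECT_ID}/analyses/{ANALYSIS_ID}"
    headers = {"X-API-Key": API_KEY, "accept": "application/vnd.illumina.v3+json"}
    while True:
        status = requests.get(url, headers=headers).json().get("status")
        if status in ("SUCCEEDED", "FAILED", "FAILED_FINAL", "ABORTED"):
            return status
        time.sleep(interval_hours * 3600)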

Delivery Targets

Event notifications can be delivered to the following delivery targets:

Administration

To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu and fill out the requested fields; the fields change depending on the selected delivery target.

Once created, you can disable, enable or delete the notification subscriptions at Projects > your_project > Project Settings > Notifications.

Amazon Resource Policy Settings

In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.

Substitute the variables in the example above according to the table below.

Amazon SNS Topic

To create a subscription to deliver events to an Amazon SNS topic, one can use either GUI or API endpoints.

To create a subscription via GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu, insert optional filter, select the channel type (SNS), and then insert the ARN from the target SNS topic and the AWS region.

To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.

Amazon SQS Queue

To create a subscription to deliver events to an Amazon SQS queue, you can use either GUI or API endpoints.

To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Next, select an event from the dropdown menu, choose SQS as the way to receive the notifications, enter your SQS URL, and if applicable for that event, choose a payload version. Not all payload versions are applicable for all events and targets, so the system will filter the options out for you. Finally, you can enter a filter expression to filter which events are relevant for you. Only those events matching the expression will be received.

To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.

Messages delivered to AWS SQS contain the following event body attributes:

The following example is a Data Updated event payload sent to an AWS SQS delivery target (condensed for readability):

Filtering

Notification subscriptions will trigger for all events matching the configured event type. A filter may be configured on a subscription to limit the matching strategy to only those event payloads which match the filter.

Examples

The below examples demonstrate various filters operating on the above event payload:

  • Filter on a pipeline, with a code that starts with ‘Copy’. You’ll need a regex expression for this:

    [?($.pipeline.code =~ /Copy.*/)]

  • Filter on status (note that the Analysis success event is only emitted when the analysis is successful):

    [?($.status == 'SUCCEEDED')]

    Both payload versions V3 and V4 guarantee the presence of the final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED), but the intermediate states depend on the flow (so not every intermediate state is guaranteed):

    • V3 can have REQUESTED - IN_PROGRESS - SUCCEEDED

    • V4 can have the status REQUESTED - QUEUED - INITIALIZING - PREPARING_INPUTS - IN_PROGRESS - GENERATING_OUTPUTS - SUCCEEDED

  • Filter on pipeline, having a technical tag “Demo":

    [?('Demo' in $.pipeline.pipelineTags.technicalTags)]

  • Combination of multiple expressions using &&. It's best practice to surround each individual expression with parentheses:

    [?(($.pipeline.code =~ /Copy.*/) && $.status == 'SUCCEEDED')]

Examples for other events

  • Filtering ICA_DATA_104 on owning project name. The top level keys on which you can filter are under the payload key, so payload is not included in this filter expression.

    [?($.details.owningProjectName == 'my_project_name')]

Custom Events

Custom events enable triggering notification subscriptions using event types beyond the system-defined event types. When creating a custom subscription, a custom event code may be specified to use within the project. Events may then be sent to the specified event code using a POST API with the request body specifying the event payload.

Custom events can be defined using the API. In order to create a custom event for your project please follow the steps below:

  1. Create a new custom event: POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
    a. Your custom event code must be 1-20 characters long, e.g. 'ICA_CUSTOM_123'.
    b. That event code will be used to reference that custom event type.

  2. Create a new notification channel: POST {ICA_URL}/ica/rest/api/notificationChannels
    a. If there is already a notification channel created with the desired configuration within the same project, it is also possible to get the existing channel ID using the call GET {ICA_URL}/ica/rest/api/notificationChannels.

  3. Create a notification subscription: POST {ICA_URL}/ica/rest/api/projects/{projectId}/customNotificationSubscriptions
    a. Use the event code created in step 1.
    b. Use the channel ID from step 2.

To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > Custom event.

Once the steps above have been completed successfully, the call from the first step (POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents) can be reused with the same event code to continue sending events through the same channel and subscription.

Following is a sample Python function used inside an ICA pipeline to post custom events for each failed metric:

Project Connector

Prepare Source Project

  1. Select the source project (project that will own the data to be linked) from the Projects page (Projects > your_source_project).

  2. Select Project Settings > Details.

  3. Select Edit

  4. Under Data Sharing ensure the value is set to Yes

  5. Select Save

Creating a New Project Connector

  1. Select the destination project (the project to which data from the source project will be linked) from the Projects page (Projects > your_destination_project).

  2. From the projects menu, select Project Settings > Connectivity > Project Connector

  3. Select + Create and complete the necessary fields.

    • Check the box next to Active to ensure the connector will be active.

    • Name (required) — Provide a unique name for the connector.

    • Type (required) — Select the data type that will be linked (either File or Sample)

    • Source Project - Select the source project whose data will be linked.

    • Tags (optional) — Add tags to restrict what data will be linked via the connector. Any data in the source project with matching tags will be linked to the destination project.

Filter Expression Examples

The examples below will link Files based on the Format field.

  • Only Files with Format of FASTQ will be linked:

    [?($.details.format.code == 'FASTQ')]

  • Only Files with Format of VCF will be linked:

    [?($.details.format.code == 'VCF')]

The examples below will restrict linked Files based on filenames.

  • Exact match to 'Sample-1_S1_L001_R1_001.fastq.gz':

    [?($.details.name == 'Sample-1_S1_L001_R1_001.fastq.gz')]

  • Ends with '.fastq.gz':

    [?($.details.name =~ /.*\.fastq.gz/)]

  • Starts with 'Sample-':

    [?($.details.name =~ /Sample-.*/)]

  • Contains '_R1_':

    [?($.details.name =~ /.*_R1_.*/)]

The examples below will link Samples based on User Tags and Sample name, respectively.

  • Only Samples with the User Tag 'WGS-Project-1'

    [?('WGS-Project-1' in $.tags.userTags)]

  • Link a Sample with the name 'BSSH_Sample_1':

    [?($.name == 'BSSH_Sample_1')]

From within your ICA Project, Start a Bench Workspace -- See for more details.

Create a Cohort of subjects of interest using .

Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see in this online help for more details)

Data set
Samples
Diseases/Phenotypes
Reference

1kGP-DRAGEN

3202 WGS: 2504 original samples plus 698 relateds

Presumed healthy

DDD

4293 (3664 affected), de novos only

Developmental disorders

EPI4K

356, de novos only

Epilepsy

ASD Cohorts

6786 (4266 affected), de novos only

Autism Spectrum disorder

; ; ; ; ;

De Ligt et al.

100, de novos only

Intellectual disability

Homsy et al.

1213, de novos only

Congenital heart disease (HP:0030680)

Lelieveld et al.

820, de novos only

Intellectual disability

Rauch et al.

51, de novos only

Intellectual disability

Rare Genomes Project

315 WES (112 pedigrees)

Various

https://raregenomes.org/

TCGA

ca. 4200 WES, ca. 4000 RNAseq

12 tumor types

https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga

GEO

RNAseq

Auto-immune disorders, incl. asthma, arthritis, SLE, MS, Crohn's disease, Psoriasis, Sjögren's Syndrome

For GEO/GSE study identifiers, please refer to the in-product list of studies

RNAseq

Kidney diseases

For GEO/GSE study identifiers, please refer to the in-product list of studies

RNAseq

Central nervous system diseases

For GEO/GSE study identifiers, please refer to the in-product list of studies

RNAseq

Parkinson's disease

For GEO/GSE study identifiers, please refer to the in-product list of studies

The project details page contains the properties of the project, such as the location, owner, storage and linked bundles. This is also the place where you add assets in the form of linked .

See for more information.

ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See for more information on enabling this feature in your ICA Project.

After ingesting data into your project, Phenotypic and Molecular data are available to view in Base. See for instructions on importing data sets into Cohorts.

This variable is used to set the .

The ICA CLI is a useful tool for uploading, downloading and viewing information about data stored within ICA projects. If not already authenticated, please see the section of the CLI help pages. Once the CLI has been authenticated with your account, use the command below to list all projects:

For information on options such as using the ICA API and AWS CLI to transfer data, visit the .

Download links for the CLI can be found at the .

The ICA CLI uses an Illumina API Key to authenticate. An Illumina API Key can be acquired through the product dashboard after logging into a domain. See for instructions to create an Illumina API Key.

Description
Code
Details
Payload
Delivery Target
Description
Value
Variable
Description

See examples for setting policies in and

Attribute
Description

The filter expressions leverage the library for describing the matching pattern to be applied to event payloads. The filter must be in the format [?(<expression>)].

The Analysis Success event delivers a JSON event payload matching the Analysis data model (as output from the API to ).

The platform GUI provides the Project Connector utility which allows data to be linked automatically between projects. This creates a one-way dynamic link for files and samples from source to destination, meaning that additions and deletions of data in the source project also affect the destination project. This differs from or which create editable copies of the data. In the destination project, you can delete data which has been moved or copied and unlink data which has been linked.

one-way
files
folders
erases source data
propagate source edits
editable on destination

Filter Expression (optional) — Enter an expression to restrict which files will be linked via the connector (see below)

Bench
Create a Cohort
Cohort analysis -> Genes
bundles
SnowSQL
Base
Cohorts Import
-t, --access-token string    JWT used to call rest service
-h, --help                   help for icav2
-o, --output-format string   output format (default "table")
-s, --server-url string      server url to direct commands
-k, --x-api-key string       api key used to call rest service
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary SMWV6dEXAMPLE
sudo cp icav2 /usr/local/bin
sudo chmod a+x /usr/local/bin/icav2
 mkdir "C:\Program Files\Illumina"
 copy icav2.exe "C:\Program Files\Illumina"
icav2 config set
Creating /Users/johngenome/.icav2/config.yaml
Initialize configuration settings [default]
server-url [ica.illumina.com]: 
x-api-key : EXAMPLE_API_KEY
output-format (allowed values table,yaml,json defaults to table) : 
colormode (allowed values none,dark,light defaults to none) :
icav2 config get
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary HASHED_EXAMPLE_API_KEY

Analysis failure

ICA_EXEC_001

Emitted when an analysis fails

Analysis

Analysis success

ICA_EXEC_002

Emitted when an analysis succeeds

Analysis

Analysis aborted

ICA_EXEC_027

Emitted when an analysis is aborted either by the system or the user

Analysis

Analysis status change

ICA_EXEC_028

Emitted when an state transition on an analysis occurs

Analysis

Base Job failure

ICA_BASE_001

Emitted when a Base job fails

BaseJob

Base Job success

ICA_BASE_002

Emitted when a Base job succeeds

BaseJob

Data transfer success

ICA_DATA_002

Emitted when a data transfer is marked as Succeeded

DataTransfer

Data transfer stalled

ICA_DATA_025

Emitted when data transfer hasn't progressed in the past 2 minutes

DataTransfer

Data <action>

ICA_DATA_100

Subscribing to this serves as a wildcard for all project data status changes and covers those changes that have no separate code. This does not include DataTransfer events or changes that trigger no data status changes such as adding tags to data.

ProjectData

Data linked to project

ICA_DATA_104

Emitted when a file is linked to a project

ProjectData

Data can not be created in non-indexed folder

ICA_DATA_105

Emitted when attempting to create data in a non-indexed folder

ProjectData

Data deleted

ICA_DATA_106

Emitted when data is deleted

ProjectData

Data created

ICA_DATA_107

Emitted when data is created

ProjectData

Data uploaded

ICA_DATA_108

Emitted when data is uploaded

ProjectData

Data updated

ICA_DATA_109

Emitted when data is updated

ProjectData

Data archived

ICA_DATA_110

Emitted when data is archived

ProjectData

Data unarchived

ICA_DATA_114

Emitted when data is unarchived

ProjectData

Job status changed

ICA_JOB_001

Emitted when a job changes status (INITIALIZED, WAITING_FOR_RESOURCES, RUNNING, STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED)

JobId

Sample completed

ICA_SMP_002

Emitted when a sample is marked as completed

ProjectSample

Sample linked to a project

ICA_SMP_003

Emitted when a sample is linked to a project

ProjectSample

Workflow session start

ICA_WFS_001

Emitted when workflow is started

WorkflowSession

Workflow session failure

ICA_WFS_002

Emitted when workflow fails

WorkflowSession

Workflow session success

ICA_WFS_003

Emitted when workflow succeeds

WorkflowSession

Workflow session aborted

ICA_WFS_004

Emitted when workflow is aborted

WorkflowSession

Mail

E-mail delivery

E-mail Address

Sqs

AWS SQS Queue

AWS SQS Queue URL

Sns

AWS SNS Topic

AWS SNS Topic ARN

Http

Webhook (POST request)

URL

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam::<platform_aws_account>:root"
         },
         "Action":"<action>",
         "Resource": "<arn>"
      }
   ]
}

platform_aws_account

The platform AWS account ID: 079623148045

action

For SNS use SNS:Publish. For SQS, use SQS:SendMessage

arn

The Amazon Resource Name (ARN) of the target SNS topic or SQS queue

correlationId

GUID used to identify the event

timestamp

Date when the event was sent

eventCode

Event code of the event

description

Description of the event

payload

Event payload

{
    "correlationId": "2471d3e2-f3b9-434c-ae83-c7c7d3dcb4e0",
    "timestamp": "2022-10-06T07:51:09.128Z",
    "eventCode": "ICA_DATA_100",
    "description": "Data updates",
    "payload": {
        "id": "fil.8f6f9511d70e4036c60908daa70ea21c",
        ...
    }
}
{
  "id": "0c2ed19d-9452-4258-809b-0d676EXAMPLE",
  "timeCreated": "2021-09-20T12:23:18Z",
  "timeModified": "2021-09-20T12:43:02Z",
  "ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
  "tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
  "reference": "210920-1-CopyToolDev-9d78096d-35f4-47c9-b9b6-e0cbcEXAMPLE",
  "userReference": "210920-1",
  "pipeline": {
    "id": "20261676-59ac-4ea0-97bd-8a684EXAMPLE",
    "timeCreated": "2021-08-25T01:49:41Z",
    "timeModified": "2021-08-25T01:49:41Z",
    "ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
    "tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
    "code": "CopyToolDev",
    "description": "CopyToolDev",
    "language": "CWL",
    "pipelineTags": {
      "technicalTags": ["Demo"]
    }
  },
  "status": "SUCCEEDED",
  "startDate": "2021-09-20T12:23:21Z",
  "endDate": "2021-09-20T12:43:00Z",
  "summary": "",
  "finishedSteps": 0,
  "totalSteps": 1,
  "tags": {
    "technicalTags": [],
    "userTags": [],
    "referenceTags": []
  }
}
import json
import requests

# ICA_HOST, PROJECT_ID and ICA_API_KEY are assumed to be defined elsewhere in the pipeline script.
def post_custom_event(metric_name: str, metric_value: str, threshold: str, sample_name: str):
    api_url = f"{ICA_HOST}/api/projects/{PROJECT_ID}/customEvents"
    headers = {
        "Content-Type": "application/vnd.illumina.v3+json",
        "accept": "application/vnd.illumina.v3+json",
        "X-API-Key": f"{ICA_API_KEY}"
    }
    content = {
        "code": "ICA_CUSTOM_123",  # must match the custom event code created in step 1
        "content": {"metric_name": metric_name, "metric_value": metric_value,
                    "threshold": threshold, "sample_name": sample_name}
    }
    json_data = json.dumps(content)
    response = requests.post(api_url, data=json_data, headers=headers)

    if response.status_code != 204:
        print(f"[EVENT-ERROR] Could not post metric failure event for the metric {metric_name} (sample {sample_name}).")

move

x

x

x

x

x

copy

x

x

x

x

manual link

x

x

x

project connector

x

x

x

Output Format

The CLI supports outputs in table, JSON, and YAML formats. The format is set using the output-format configuration setting through a command line option, environment variable, or configuration file.

Dates are output as UTC times when using JSON/YAML output format and local times when using table format.

To set the output format, use the following setting:

--output-format <string>

  • json - Outputs in JSON format

  • yaml - Outputs in YAML format

  • table - Outputs in tabular format
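For example, to list projects as JSON for use in scripts:

icav2 projects list --output-format json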

Multi-Omic Cancer Workflow
Project Connector Setup

Base: Access Tables via Python

You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas object, the data can be processed further. In this tutorial, we describe how to create a Python script which retrieves the data and visualizes it using the Dash framework. The script contains the following parts:

  • Importing dependencies and variables.

  • Function to fetch the data from Base table.

  • Creating and running the Dash app.

Importing dependencies and variables

This part of the code imports the dependencies, which have to be installed on your machine (possibly with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from a file named config.py.

from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px

from config import API_KEY, PROJECT_ID
import requests
import snowflake.connector
import pandas as pd
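The import from config assumes a small config.py file next to the script, for example:

API_KEY = "<your_api_key>"
PROJECT_ID = "<your_project_id>"

The third-party packages can be installed with pip; the package that provides snowflake.connector is snowflake-connector-python, and its pandas extra pulls in what fetch_pandas_all() (used below) needs:

pip install dash plotly requests pandas "snowflake-connector-python[pandas]"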

Function to fetch the data from Base table

We will be creating a function called fetch_data to obtain the data from a Base table. It can be broken into several logically separated parts:

  • Retrieving the token to access the Base table together with other variables using API.

  • Establishing the connection using the token.

  • The SQL query itself. In this particular example, we are extracting values from two tables, Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g. Q30_BASES, the number of bases with quality of at least 30) and metadata in the column ica, which needs to be flattened to access the value Execution_reference. Both tables are joined on this Execution_reference value.

  • Fetching the data using the connection and the SQL query.

Here is the corresponding snippet:

def fetch_data():
    # Your data fetching and processing code here
    # retrieving the Base oauth token
    url = 'https://ica.illumina.com/ica/rest/api/projects/' + PROJECT_ID +  '/base:connectionDetails'

    # set the API headers
    headers = {
                'X-API-Key': API_KEY,
                'accept': 'application/vnd.illumina.v3+json'
                }

    response = requests.post(url, headers=headers)
    ctx = snowflake.connector.connect(
        account=response.json()['dnsName'].split('.snowflakecomputing.com')[0],
        authenticator='oauth',
        token=response.json()['accessToken'], 
        database=response.json()['databaseName'],
        role=response.json()['roleName'],
        warehouse=response.json()['warehouseName']
    )
    cur = ctx.cursor()
    sql = '''
    WITH flattened_Demo_Ingesting_Metrics AS (
        SELECT 
            flattened.value::STRING AS execution_reference_Demo_Ingesting_Metrics,
            t1.SAMPLEID,
            t1.VARIANTS_TOTAL_PASS,
            t1.VARIANTS_SNPS_PASS,
            t1.Q30_BASES,
            t1.READS_WITH_MAPQ_3040_PCT
        FROM 
            Demo_Ingesting_Metrics t1,
            LATERAL FLATTEN(input => t1.ica) AS flattened
        WHERE 
            flattened.key = 'Execution_reference'
    ) SELECT 
        f.execution_reference_Demo_Ingesting_Metrics,
        f.SAMPLEID,
        f.VARIANTS_TOTAL_PASS,
        f.VARIANTS_SNPS_PASS,
        t2."EXECUTION_REFERENCE",
        t2.END_DATE,
        f.Q30_BASES,
        f.READS_WITH_MAPQ_3040_PCT
    FROM 
        flattened_Demo_Ingesting_Metrics f
    JOIN 
        BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL t2
    ON 
        f.execution_reference_Demo_Ingesting_Metrics = t2."EXECUTION_REFERENCE";
    '''

    cur.execute(sql)
    data = cur.fetch_pandas_all()
    return data

df = fetch_data()

Creating and running the Dash app

Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE on the x-axis and the user's choice from the dropdown on the y-axis.


app = Dash(__name__)
#server = app.server


app.layout = html.Div([
    html.H1("My Dash Dashboard"),
    
    html.Div([
        html.Label("Select X-axis:"),
        dcc.Dropdown(
            id='x-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[5]  # default value
        ),
        html.Label("Select Y-axis:"),
        dcc.Dropdown(
            id='y-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[2]  # default value
        ),
    ]),
    
    dcc.Graph(id='scatterplot')
])


@callback(
    Output('scatterplot', 'figure'),
    Input('y-axis-dropdown', 'value')
)
def update_graph(value):
    # The x-axis is fixed to END_DATE in this example; only the y-axis dropdown drives the plot.
    return px.scatter(df, x='END_DATE', y=value, hover_name='SAMPLEID')

if __name__ == '__main__':
    app.run(debug=True)

Now we can create a single Python script called dashboard.py by concatenating the snippets above and running it. The dashboard will then be accessible in a browser on your machine.
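For example, assuming the script was saved as dashboard.py:

python dashboard.py

By default, the Dash development server listens on http://127.0.0.1:8050, which you can open in your browser.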

DRAGEN reanalysis of the 1000 Genomes Dataset
McRae et al., Nature 19:1194-1196
Epi4K Consortium, Nature 501:217-221
Iossifov et al. Neuron 74:285-299
Iossifov et al. Nature 498:216-221
O'Roak et al. Nature 485:246-250
Sanders et al. Nature 485:237-241
Sanders et al. Neuron 87:1215-1233
De Rubeis et al. Nature 515:209-215
De Ligt et al., N Engl J Med 367:1921-1929
Homsy et al., Science 350:1262-1266
Lelieveld et al., Nature Neuroscience19:1194-1196
Rauch et al., Lancet 380:1674-1682
API Key
Authentication
Data Transfer Options tutorial
Release History
Amazon SQS
Amazon SNS
JsonPath
retrieve a project analysis
copying
moving
API Keys
Filter Expression Examples
CLI command

DRAGEN pipelines running on the fpga2 compute type will incur a DRAGEN License cost of 0.10 iCredits per gigabase of data processed, with additional discounts as shown below.

  • 80 gigabases or less per sample - no discount - 0.10 iCredits per gigabase

  • > 80 to 160 gigabases per sample - 20% discount - 0.08 iCredits per gigabase

  • > 160 to 240 gigabases per sample - 30% discount - 0.07 iCredits per gigabase

  • > 240 to 320 gigabases per sample - 40% discount - 0.06 iCredits per gigabase

  • more than 320 gigabases per sample - 50% discount - 0.05 iCredits per gigabase

If your DRAGEN job fails, no DRAGEN license cost will be charged.


Software Release Notes

This page contains the release notes for the current release. See the subpages for historic release notes.

2025 June 2 - ICA v2.36.0

New Features and Enhancements:

  • General

    • Data providers are now redirected to the project details page instead of the project data view, improving navigation.

    • Improved the UI for creating or modifying tools, enhancing usability.

    • There will be a change in the next release where users are no longer able to edit connectors of other users through the API. This will be made consistent with the UI.

  • Data Management

    • .yml extension is now recognized as YAML format.

  • Flow

    • The limit of analysis reports has been increased from 20 to 200 per analysis report pattern.

    • Added “dropValueWhenDisabled” as an optional field attribute; if set to true and the field is disabled, its values are excluded from the submitted result.

    • The endpoint GET /api/projects/{projectId}/analyses/{analysisId}/reports now returns report configs with optional titles and matching data. These configs can be used in all POST /api/projects/{projectId}/pipelines:create<pipelinetype>Pipeline endpoints.

    • Added support for searching pipelines by ID in the pipeline grid.

    • The price of aborted analyses is now visible in the UI.

    • New endpoint: POST /api/projects/{projectId}/pipelines/{pipelineId}/generalAttributes to update general attributes of a project pipeline.

    • Users can now create a new output folder directly from the pipeline output selection screen.

    • Added “Standard 3Xlarge” as a new compute type for Flow & Bench.

  • Cohorts

    • The Cohorts view page now has collapsible "show charts" and "query details" items, shown as icons in the right corner, creating more real estate for the cohort subject list.

Fixed Issues

  • General

    • Added option to return from editing project and bundle details.

    • Updated the CLI so that the storage-config-sub-folder is not required when creating a project with your own storage using the flag --storage-config.

    • Improved information displayed on screen when a public domain non-basespace user tries to access ICA.

    • Improved searching on size of Docker images.

    • Updated tool repository screen to new standard:

      • The selection window for docker images while creating a new tool has been updated to be in line with the new layout.

      • Removed the unnecessary "Technical" option from the tool inputs tab when editing tools.

    • When linking a bundle which includes base to a project that does not have base enabled, the following scenarios apply:

      • Base is not allowed due to entitlements: The bundle will be linked, and you will be given access to the data, pipelines, samples, ... but you will not see the base tables in your project.

      • Base is allowed, but not yet enabled for the project: The bundle will be linked and you will be given access to the data, pipelines, samples, ... but you will not see the base tables in your project and base remains disabled until you enable it.

    • API: /api/projects/{projectId} returns a new attribute projectOwner to identify the current project owner in addition to the existing ownerid of the project creator.

    • Owners of notifications are now able to delete those notification subscriptions. Previously, only the project administrator could do this.

    • Improved error message when trying to share storage credentials from another user.

  • Data Management

    • Performance improvement when creating a data copy in batch in the GUI

    • Improved connector resilience when downloading data elements with billing mode project.

    • There will be a change in one of the upcoming releases where users are no longer able to edit connectors of other users through the API. This will be made consistent with the UI.

    • Improved navigation in the edit connector screen.

  • Flow

    • The currently selected report file is now highlighted on the reports tab.

    • Fixed an issue which caused the show failed steps view to fail when opening it in an analysis.

    • Corrected the API documentation for /api/projects/{projectId}/analyses/{analysisId}/outputs.

    • Fixed an issue which caused files and folders to not appear correctly when using the grid's search field in the dialog for selecting input files.

  • Bench

    • Fixed an issue which caused an "object references an unsaved transient instance" error during preview of new tools in Bench workspaces.

  • Cohorts

    • Fixed an issue where RNASeq data ingestions were failing.

    • Fixed an issue where UI elements were scrambled upon opening the subject timeline view.

    • Fixed an issue where the PRKN gene did not properly load pertinent information on the variant needle plot.

    • Fixed an issue where the phenotype tabs plot was missing when loading the molecular breakdown plot on the molecular breakdown tab first.

Cloud Analysis Auto-launch

Nextflow Pipeline

In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.

Create the pipeline

After creating the project, select the project from the Projects view to enter the project. Within the project, navigate to the Flow > Pipelines view in the left navigation pane. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating the Nextflow pipeline.

In the Nextflow pipeline creation view, the Information tab is used to add information about the pipeline. Add values for the required Code (unique pipeline name) and Description fields.

  • Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.

  • Add the publishDir directive with value 'out' to the reverse process.

  • Modify the reverse process to write the output to a file test.txt instead of stdout.

The description of the pipeline from the linked Nextflow docs:

This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.

The process that follows receives these files and simply reverses their content using the rev command line tool.

Syntax example:

process iwantstandardsmallresources {
    cpus 2
    memory '8 GB'
    ...
}

Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we won't need to add any additional definition files. Paste the following definition into the text editor:

#!/usr/bin/env nextflow

params.in = "$HOME/sample.fa"

sequences = file(params.in)
SPLIT = (System.properties['os.name'] == 'Mac OS X' ? 'gcsplit' : 'csplit')

process splitSequences {

    container 'public.ecr.aws/lts/ubuntu:22.04'

    input:
    file 'input.fa' from sequences

    output:
    file 'seq_*' into records

    """
    $SPLIT input.fa '%^>%' '/^>/' '{*}' -f seq_
    """

}

process reverse {
    
    container 'public.ecr.aws/lts/ubuntu:22.04'
    publishDir 'out'

    input:
    file x from records
    
    output:
    file 'test.txt'

    """
    cat $x | rev > test.txt
    """
}

Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.

Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>in</pd:label>
            <pd:description>fasta file input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

With the definition added and the input form defined, the pipeline is complete.

On the Documentation tab, you can fill out additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.

Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.

Launch the pipeline

To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".

Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to the Analyses view and click the button to Start Analysis. Next, select your pipeline from the list. Alternatively you can start your pipeline from Projects > your_project > Flow > Pipelines > Start new analysis.

In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.

  • Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.

  • Set the Entitlement Bundle (there will typically only be a single option).

  • In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)

  • Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.

With the required information set, click the button to Start Analysis.

Monitor Analysis

After launching the pipeline, navigate to the Analyses view in the left navigation pane.

The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.

Click the analysis record to enter the analysis details view.

Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Note that this may take considerable time for your first analysis because the required compute resources must be provisioned (in our example, the analysis took 28 minutes).

From the analysis details view, the logs produced by each process within the nextflow pipeline are accessible via the Logs tab.

View Results

Analysis outputs are written to an output folder in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}.

Inside the analysis output folder are the files that the analysis processes wrote to the 'out' folder. In this tutorial, the file test.txt is written by the reverse process. Navigating into the analysis output folder, clicking into the test.txt file details, and opening the VIEW tab shows the output file contents.

The "Download" button can be used to download the data to the local machine.

Nextflow DRAGEN Pipeline

Prerequisites

After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 3.9.5. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects page, clicking on Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.

In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values. For the customized DRAGEN pipeline, Nextflow Version must be changed to 22.04.3.

Next, add the Nextflow pipeline definition by navigating to the Nextflow files > MAIN.NF tab. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.

nextflow.enable.dsl = 2

process DRAGEN {

    // The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
    container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga-medium'
    pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'

    // ICA will upload everything in the "out" folder to cloud storage 
    publishDir 'out', mode: 'symlink'

    input:
        tuple path(read1), path(read2)
        val sample_id
        path ref_tar

    output:
        stdout emit: result
        path '*', emit: output

    script:
        """
        set -ex
        mkdir -p /scratch/reference
        tar -C /scratch/reference -xf ${ref_tar}
        
        /opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
        /opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
            --output-directory ./ \\
            -1 ${read1} \\
            -2 ${read2} \\
            --intermediate-results-dir /scratch \\
            --output-file-prefix ${sample_id} \\
            --RGID ${sample_id} \\
            --RGSM ${sample_id} \\
            --ref-dir /scratch/reference \\
            --enable-variant-caller true
        """
}

workflow {
    DRAGEN(
        Channel.of([file(params.read1), file(params.read2)]),
        Channel.of(params.sample_id),
        Channel.fromPath(params.ref_tar)
    )
}

This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.

Paste the following XML input form into the XML CONFIGURATION text editor.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 1</pd:label>
            <pd:description>FASTQ Read 1</pd:description>
        </pd:dataInput>
        <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>FASTQ Read 2</pd:label>
            <pd:description>FASTQ Read 2</pd:description>
        </pd:dataInput>
        <pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
            <pd:label>Reference</pd:label>
            <pd:description>Reference TAR</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps>
        <pd:step execution="MANDATORY" code="General">
            <pd:label>General</pd:label>
            <pd:description></pd:description>
            <pd:tool code="generalparameters">
                <pd:label>General Parameters</pd:label>
                <pd:description></pd:description>
                <pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
                    <pd:label>Sample ID</pd:label>
                    <pd:description></pd:description>
                    <pd:stringType/>
                    <pd:value></pd:value>
                </pd:parameter>
            </pd:tool>
        </pd:step>
    </pd:steps>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

The dataInputs section specifies file inputs, which will be mounted when the workflow executes. Parameters defined under the steps section refer to string and other input types.

Each of the dataInputs and parameters can be accessed in the Nextflow definition through the workflow's params object, named according to the code defined in the XML (e.g. params.sample_id).

Running the pipeline

If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.

Go to the pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by a red "*" sign and click the Start Analysis button.

You can monitor the run from the analysis page.

Once the Status changes to Succeeded, you can click on the run to access the results page.

Useful Links

Nextflow: Pipeline Lift

Nextflow Pipeline Liftover

nextflow-to-icav2-config

This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.

Some additional repos that can help with your ICA experience can be found below:

Local testing your Nextflow pipeline after using these scripts

What do these scripts do?

These scripts:

  1. Parse configuration files and the Nextflow scripts (main.nf, workflows, subworkflows, modules) of a pipeline and update the configuration of the pipeline with pod directives to tell ICA what compute instance to run each process on

  • Strip out parameters that ICA utilizes for workflow orchestration

  • Migrate the manifest closure to the conf/base.ica.config file

  • Ensure that docker is enabled

  2. Add workflow.onError handlers (main.nf, workflows, subworkflows, modules) to aid troubleshooting

  3. Modify the processes that reference scripts and tools in the bin/ folder of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process

  4. Make additional edits to ensure your pipeline runs more smoothly on ICA

ICA concepts to better understand ICA liftover of Nextflow pipelines

  • Nextflow workflows on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow workflows

  • Nextflow processes will need to contain a reference to a container --- a Docker image that will run that specific process

  • Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process on (a minimal process skeleton illustrating the container and pod directives is shown below).
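
A minimal sketch of a process carrying both directives (the image, instance size, and process logic below are placeholders, not requirements):

process example_tool {
    container 'public.ecr.aws/lts/ubuntu:22.04'                                  // Docker image that runs this process
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small' // instance type ICA should use for this process

    input:
    path x

    output:
    path 'line_count.txt'

    """
    wc -l ${x} > line_count.txt
    """
}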

Using these scripts

The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3

This has:

  • nf-core installed

  • All Rscripts in this repo with relevant R libraries installed

  • The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA

You'll likely need to run the image with a docker command like the following to be able to run git commands within the container:

docker run -itv `pwd`:`pwd` -e HOME=`pwd` -u $(id -u):$(id -g) keng404/nextflow-to-icav2-config:0.0.3 /bin/bash

where pwd is your $HOME folder.

Prerequisites

STEP 0 Github credentials

STEP 1 [OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest

If you have a specific pipeline from Github, you can skip the step below.

You'll first need to download the python module from nf-core via a pip install nf-core command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.

You can choose which pipelines to git clone, but as a convenience, the wrapper nf-core.conversion_wrapper.R will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from your ICA domain.

STEP 2: Obtain API key file

STEP 3: Create a project in ICA

STEP 4: Download and configure the ICA CLI (see STEP 2):

The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' (this will redirect you to https://illumina.ica.com/ica).

Let's do some liftovers

Rscript nf-core.conversion_wrapper.R --input {PIPELINE_JSON_FILE} --staging_directory {DIRECTORY_WHERE_NF_CORE_PIPELINES_ARE_LOCATED} --run-scripts {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}  --intermediate-copy-template {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}/dummy_template.txt --create-pipeline-in-ica --api-key-file {API_KEY_FILE} --ica-project-name {ICA_PROJECT_NAME} --nf-core-mode 

[OPTIONAL PARAMETER]
--git-repos {GIT_HUB_URL}
--pipeline-dirs {LOCAL_DIRECTORY_WITH_NEXTFLOW_PIPELINE}

GIT_HUB_URL can be specified to grab pipeline code from github. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.

Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.

In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.

  1. git clone nf-core pipelines of interest

  2. Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}

A detailed step-by-step breakdown of what nf-core.conversion_wrapper.R does for each Nextflow pipeline

Step 1: Generate a parameters XML file from the pipeline's Nextflow schema JSON (nextflow_schema.json).

Rscript create_xml/nf-core.json_to_params_xml.R --json {PATH_TO_SCHEMA_JSON}

  • A Nextflow schema JSON is generated by nf-core's python library nf-core

  • nf-core can be installed via a pip install nf-core command

nf-core schema build -d {PATH_NF-CORE_DIR}

Step 2: Create a nextflow.config and a base config file so that the pipeline is compatible with ICA.

Rscript ica_nextflow_config.test.R --config-file {DEFAULT_NF_CONFIG} [OPTIONAL: --base-config-files  {BASE_CONFIG}] [--is-simple-config]

This script will update your configuration files so that they integrate better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.

Step 3: Add helper-debug code and other modifications to your Nextflow pipeline

Rscript develop_mode.downstream.R  --config-file {DEFAULT_NF_CONFIG} --nf-script {MAIN_NF_SCRIPT} --other-workflow-scripts {OTHER_NF_SCRIPT1 } --other-workflow-scripts {OTHER_NF_SCRIPT2} ...  --other-workflow-scripts {OTHER_NF_SCRIPT_N}

This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copying the work folder back to ICA if an analysis fails). It also allows ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in the bin/ folder of your pipeline's $projectDir.
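
A minimal sketch of the kind of workflow.onError handler these scripts add (the exact code injected by the scripts may differ):

workflow.onError {
    // print basic failure information to the log to aid troubleshooting
    println "Pipeline stopped with error: ${workflow.errorMessage}"
}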

Step 4: Update XML to add parameter options --- if your pipeline uses/could use iGenomes

Rscript update_xml_based_on_additional_configs.R --config-file {DEFAULT_NF_CONFIG} --parameters-xml {PARAMETERS_XML}

You may have to edit your {PARAMETERS_XML} file afterwards if these edits are unnecessary for your pipeline.

Step 5: Sanity check your pipeline code to see if it is valid prior to uploading it into ICA

Rscript testing_pipelines/test_nextflow_script.R --nextflow-script {MAIN_NF_SCRIPT} --docker-image nextflow/nextflow:22.04.3 --nextflow-config {DEFAULT_NF_CONFIG}

Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0 (with 20.10.0 to be deprecated soon)

Step 6: Create a pipeline in ICA by using the following helper script nf-core.create_ica_pipeline.R

Rscript nf-core.create_ica_pipeline.R --nextflow-script {NF_SCRIPT} --workflow-language nextflow --parameters-xml {PARAMETERS_XML} --nf-core-mode --ica-project-name {NAME} --pipeline-name {NAME} --api-key-file {PATH_TO_API_KEY_FILE}

Developer mode --- if you plan to develop or modify a pipeline in ICA

Add the flag --developer-mode to the command line above if you have custom groovy libraries or modules files referenced in your workflow. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameters project_dir and files under input_files. This will ensure that these files and directories will be placed in the $workflow.launchDir when the pipeline is invoked.

How to run a pipeline in ICA via CLI

As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:

Rscript create_cli_templates_from_xml.R --workflow-language {xml or nextflow} --parameters-xml {PATH_TO_PARAMETERS_XML}

There will be a corresponding JSON file (i.e. a file with the file extension *ICAv2_CLI_template.json) that saves these values, which you can modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.

Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.

If you have a previously successful analysis with your pipeline, you may find this approach more useful.

Creating your own tests/pipeline runs via the CLI

Where possible, these scripts search for config files that refer to a test (i.e. test.config, test_full.config, test*config) and create a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended. By default, this parameter is set to false.

When set to true, these test config files are loaded in your main nextflow.config.
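
A minimal sketch of how such a toggle could look in the main nextflow.config (the exact configuration written by the scripts may differ):

params.ica_smoke_test = false

if (params.ica_smoke_test) {
    // pull in the pipeline's test profile for a smoke-test run
    includeConfig 'conf/test.config'
}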

Additional todos

Nextflow: Pipeline Lift: RNASeq

How to lift a simple Nextflow pipeline?

This approach is applicable when your main.nf file contains all of your pipeline logic, and it illustrates what the liftover process looks like.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.

How to modify the main.nf file

Lines prefixed with - are removed from the original pipeline definition and lines prefixed with + are added:

#!/usr/bin/env nextflow

+nextflow.enable.dsl=2
 
/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
-params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
-params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"

+println("All input parameters: ${params}")
 
workflow {
-    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
+    read_pairs_ch = channel.fromFilePairs("${params.reads}/*_{1,2}.fq")
 
-    INDEX(params.transcriptome)
+    INDEX(Channel.fromPath(params.transcriptome))
     FASTQC(read_pairs_ch)
     QUANT(INDEX.out, read_pairs_ch)
}
 
process INDEX {
-    tag "$transcriptome.simpleName"
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
 
    input:
    path transcriptome
 
    output:
    path 'index'
 
    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
 
process FASTQC {
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'

    tag "FASTQC on $sample_id"
    publishDir params.outdir
 
    input:
    tuple val(sample_id), path(reads)
 
    output:
    path "fastqc_${sample_id}_logs"
 
    script:
-    """
-    fastqc.sh "$sample_id" "$reads"
-    """
+    """
+    # we need to explicitly specify the output directory for fastqc tool
+    # we are creating one using sample_id variable
+    mkdir fastqc_${sample_id}_logs
+    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
+    """
}
 
process QUANT {
+    container 'quay.io/nextflow/rnaseq-nf:v1.1'
+    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'

    tag "$pair_id"
    publishDir params.outdir
 
    input:
    path index
    tuple val(pair_id), path(reads)
 
    output:
    path pair_id
 
    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}

The XML configuration

In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads folder. Navigate to the XML Configuration tab and paste the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="reads" format="UNKNOWN" type="DIRECTORY" required="true" multiValue="false">
            <pd:label>Folder with FASTQ files</pd:label>
            <pd:description></pd:description>
        </pd:dataInput>
        <pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>FASTA</pd:label>
            <pd:description>FASTA file</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

Running the pipeline

Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by a red "*" sign and click the Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.

Nextflow: Scatter-gather Method

In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.

Creating the pipeline

Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create pipeline > Nextflow > XML based button to start creating a Nextflow pipeline.

In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.

First, we present the individual processes. Select +Nextflow files > + Create file and label the file split.nf. Copy and paste the following definition.

process split {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path("split.*.tsv")
    
    """
    split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
    """
}

Next, select +Create file and name the file sort.nf. Copy and paste the following definition.

process sort {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'
    
    input:
    path x
    
    output:
    path '*.sorted.tsv'
    
    """
    sort -gk1,1 $x > ${x.baseName}.sorted.tsv
    """
}

Select +Create file again and label the file merge.nf. Copy and paste the following definition.

process merge {
    container 'public.ecr.aws/lts/ubuntu:22.04'
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    cpus 1
    memory '512 MB'

    publishDir 'out', mode: 'symlink'
    
    input:
    path x
    
    output:
    path 'merged.tsv'
    
    """
    cat $x > merged.tsv
    """
}

Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.

nextflow.enable.dsl=2
 
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
 
 
params.myinput = "test.test"
 
workflow {
    input_ch = Channel.fromPath(params.myinput)
    split(input_ch)
    sort(split.out.flatten())
    merge(sort.out.collect())
}

Here, the operators flatten and collect are used to transform the emitted channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened, so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
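
As a standalone illustration (separate from the tutorial pipeline), the effect of the two operators can be seen in this small example:

nextflow.enable.dsl=2

workflow {
    Channel.of(['a.tsv', 'b.tsv'], ['c.tsv'])
        .flatten()   // emits a.tsv, b.tsv and c.tsv as separate items
        .collect()   // gathers all items into a single list emission
        .view()      // prints [a.tsv, b.tsv, c.tsv]
}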

Finally, copy and paste the following XML configuration into the XML Configuration tab.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="myinput" format="TSV" type="FILE" required="true" multiValue="false">
            <pd:label>myinput</pd:label>
            <pd:description></pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

Click the Generate button (at the bottom of the text editor) to preview the launch form fields.

Click the Save button to save the changes.

Running the pipeline

Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.

Fill in the required fields indicated by a red "*" sign and click the Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.

Select Projects > your_project > Flow > Analyses, and open the Logs tab. From the log files, it is clear that in the first step, the input file is split into multiple chunks, then these chunks are sorted and merged.

CWL Graphical Pipeline

This tutorial aims to guide you through the process of creating CWL tools and pipelines from the very beginning. By following the steps and techniques presented here, you will gain the necessary knowledge and skills to develop your own pipelines or transition existing ones to ICA.

Build and push to ICA your own Docker image

The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we present how to create your own Docker image for the popular FastQC tool.

Copy the contents displayed below to a text editor and save it as a Dockerfile. Make sure you use an editor which does not add formatting to the file.

FROM centos:7
WORKDIR /usr/local

# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
    yum clean all && \
    rm -rf /var/cache/yum

# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
    unzip fastqc_v0.11.9.zip && \
    chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip

# Adding FastQC to the PATH
ENV PATH $PATH:/usr/local/FastQC

# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []

## how to build the docker image
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:0 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:0

Open a terminal window, place this file in a dedicated folder and navigate to this folder location. Then use the following command:

docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .

Check the image has been successfully built:

docker images

Check that the container is functional:

docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1

Once inside the container check that the fastqc command is responsive and prints the expected help message. Remember to exit the container.

Save a tar of the previously built image locally:

docker save fastqc-0.11.9:1 -o fastqc-0.11.9:1.tar.gz

Upload your docker image .tar to an ICA project (browser upload, Connector, or CLI). Important: In the Data tab, select the uploaded .tar file, then click “Manage --> Change Format”, select 'DOCKER' and Save.

Now step outside of the project and go to Docker Repository, select New and click on the Search icon. You can filter on project names and locations, select your docker file (use the checkbox on the left) and press Select.

Create a CWL tool

There are two ways you can create a CWL tool on top of a docker image in the ICA UI:

  1. Navigate to the Tool CWL tab and use the text editor to create the tool definition in CWL syntax.

  2. Use the other tabs to independently define inputs, outputs, arguments, settings, etc.

In this tutorial we will use the first option: paste the following content into the Tool CWL tab.

#!/usr/bin/env cwl-runner

# (Re)generated by BlueBee Platform

$namespaces:
  ilmn-tes: http://platform.illumina.com/rdf/iap/
cwlVersion: cwl:v1.0
class: CommandLineTool
label: FastQC
doc: FastQC aims to provide a simple way to do some quality control checks on raw
  sequence data coming from high throughput sequencing pipelines.
inputs:
  Fastq1:
    type: File
    inputBinding:
      position: 1
  Fastq2:
    type:
    - File
    - 'null'
    inputBinding:
      position: 3
outputs:
  HTML:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.html'
  Zip:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - '*.zip'
arguments:
- position: 4
  prefix: -o
  valueFrom: $(runtime.outdir)
- position: 1
  prefix: -t
  valueFrom: '2'
baseCommand:
- fastqc

Since the user needs to specify the output folder for the FastQC application (-o prefix), we are using the $(runtime.outdir) runtime parameter to point to the designated output folder.

Create the pipeline

While inside a Project, navigate to Pipelines and click on cwl and then Graphical.

Fill the mandatory fields (Code = pipeline name and free text Description) and click on the Definition tab to open the Graphical Editor.

Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).

Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).

Press Save; you have just created your first FastQC pipeline on ICA!

Run a pipeline

First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.

Navigate to Pipelines and select the pipeline you just created, then press Start New Run.

Fill the mandatory field (User Reference = pipeline execution name) and click on the Select button to open the File Selection dialog box. Select any of the Fastq files available to you (use the checkbox on the left and press Select on the lower right).

Press Start Run on the top right, the platform is now orchestrating the workflow execution.

View Results

Navigate to Runs and observe that the pipeline execution is now listed and will first appear to be in “Requested” Status. After a few minutes the Status should change to “In Progress” and then to “Succeeded”.

Once this Run has succeeded, click on the row (a single click is enough) to enter the Result view. You should see the FastQC HTML output file listed on the right. Click on the file to open the Data Details view. Since it is an HTML file format, there is a View tab that allows visualizing the HTML within the browser.

Command Index

The build number, together with the used libraries and licenses, is provided in the accompanying readme file.

icav2

Command line interface for the Illumina Connected Analytics, a genomics platform-as-a-service

Usage:
  icav2 [command]

Available Commands:
  analysisstorages      Analysis storages commands
  completion            Generate the autocompletion script for the specified shell
  config                Config actions
  dataformats           Data format commands
  help                  Help about any command
  jobs                  Job commands
  metadatamodels        Metadata model commands
  pipelines             Pipeline commands
  projectanalyses       Project analyses commands
  projectdata           Project Data commands
  projectpipelines      Project pipeline commands
  projects              Project commands
  projectsamples        Project samples commands
  regions               Region commands
  storagebundles        Storage bundle commands
  storageconfigurations Storage configurations commands
  tokens                Tokens commands
  version               The version of this application

Flags:
  -t, --access-token string    JWT used to call rest service
  -h, --help                   help for icav2
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -v, --version                version for icav2
  -k, --x-api-key string       api key used to call rest service

Use "icav2 [command] --help" for more information about a command.

icav2 analysisstorages

This is the root command for actions that act on analysis storages

Usage:
  icav2 analysisstorages [command]

Available Commands:
  list        list of storage id's

Flags:
  -h, --help   help for analysisstorages

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 analysisstorages [command] --help" for more information about a command.

icav2 analysisstorages list

This command lists all the analysis storage id's

Usage:
  icav2 analysisstorages list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 completion

This command generates custom completion functions for the icav2 tool. These functions facilitate the generation of context-aware suggestions based on the user's input and specific directives provided by the icav2 tool. For example, for the ZSH shell the completion function _icav2() is generated. It can provide suggestions for available commands, flags, and arguments depending on the context, making it easier for the user to interact with the tool without having to constantly refer to the documentation.

To enable this custom completion function, you would typically include it in your Zsh configuration (e.g., in .zshrc or a separate completion script) and then use the compdef command to associate the function with the icav2 command:

compdef _icav2 icav2

This way, when the user types icav2 followed by a space and presses the TAB key, Zsh will call the _icav2 function to provide context-aware suggestions based on the user's input and the icav2 tool's directives.

Generate the autocompletion script for icav2 for the specified shell.
See each sub-command's help for details on how to use the generated script.

Usage:
  icav2 completion [command]

Available Commands:
  bash        Generate the autocompletion script for bash
  fish        Generate the autocompletion script for fish
  powershell  Generate the autocompletion script for powershell
  zsh         Generate the autocompletion script for zsh

Flags:
  -h, --help   help for completion

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 completion [command] --help" for more information about a command.

icav2 completion bash

Generate the autocompletion script for the bash shell.

This script depends on the 'bash-completion' package.
If it is not installed already, you can install it via your OS's package manager.

To load completions in your current shell session:

	source <(icav2 completion bash)

To load completions for every new session, execute once:

#### Linux:

	icav2 completion bash > /etc/bash_completion.d/icav2

#### macOS:

	icav2 completion bash > $(brew --prefix)/etc/bash_completion.d/icav2

You will need to start a new shell for this setup to take effect.

Usage:
  icav2 completion bash

Flags:
  -h, --help              help for bash
      --no-descriptions   disable completion descriptions

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 completion fish

Generate the autocompletion script for the fish shell.

To load completions in your current shell session:

	icav2 completion fish | source

To load completions for every new session, execute once:

	icav2 completion fish > ~/.config/fish/completions/icav2.fish

You will need to start a new shell for this setup to take effect.

Usage:
  icav2 completion fish [flags]

Flags:
  -h, --help              help for fish
      --no-descriptions   disable completion descriptions

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 completion powershell

Generate the autocompletion script for powershell.

To load completions in your current shell session:

	icav2 completion powershell | Out-String | Invoke-Expression

To load completions for every new session, add the output of the above command
to your powershell profile.

Usage:
  icav2 completion powershell [flags]

Flags:
  -h, --help              help for powershell
      --no-descriptions   disable completion descriptions

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 completion zsh

Generate the autocompletion script for the zsh shell.

If shell completion is not already enabled in your environment you will need
to enable it.  You can execute the following once:

	echo "autoload -U compinit; compinit" >> ~/.zshrc

To load completions in your current shell session:

	source <(icav2 completion zsh)

To load completions for every new session, execute once:

#### Linux:

	icav2 completion zsh > "${fpath[1]}/_icav2"

#### macOS:

	icav2 completion zsh > $(brew --prefix)/share/zsh/site-functions/_icav2

You will need to start a new shell for this setup to take effect.

Usage:
  icav2 completion zsh [flags]

Flags:
  -h, --help              help for zsh
      --no-descriptions   disable completion descriptions

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 config

Config command provides functions for CLI configuration management.

Usage:
  icav2 config [command]

Available Commands:
  get         Get configuration information
  reset       Remove the configuration information
  set         Set configuration information

Flags:
  -h, --help   help for config

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 config [command] --help" for more information about a command.

icav2 config get

Get configuration information.

Usage:
  icav2 config get [flags]

Flags:
  -h, --help   help for get

icav2 config reset

Remove configuration information.

Usage:
  icav2 config reset [flags]

Flags:
  -h, --help   help for reset

icav2 config set

Set configuration information. Following information is asked when starting the command : 

 - server-url : used to form the url for the rest api's. 
 - x-api-key : api key used to fetch the JWT used to authenticate to the API server. 
 - colormode : set depending on your background color of your terminal. Input's and errors are colored. Default is 'none', meaning that no colors will be used in the output.
 - table-format : Output layout, defaults to a table, other allowed values are json and yaml

Usage:
  icav2 config set [flags]

Flags:
  -h, --help   help for set

icav2 dataformats

This is the root command for actions that act on Data formats

Usage:
  icav2 dataformats [command]

Available Commands:
  list        List data formats

Flags:
  -h, --help   help for dataformats

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 dataformats [command] --help" for more information about a command.

icav2 dataformats list

This command lists the data formats you can use inside of a project

Usage:
  icav2 dataformats list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 help

Help provides help for any command in the application.
Simply type icav2 help [path to command] for full details.

Usage:
  icav2 help [command] [flags]

Flags:
  -h, --help   help for help

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 jobs

This is the root command for actions that act on jobs

Usage:
  icav2 jobs [command]

Available Commands:
  get         Get details of a job

Flags:
  -h, --help   help for jobs

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 jobs [command] --help" for more information about a command.

icav2 jobs get

This command fetches the details of a job using the argument as an id (uuid).

Usage:
  icav2 jobs get [job id] [flags]

Flags:
  -h, --help   help for get

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 metadatamodels

This is the root command for actions that act on metadata models

Usage:
  icav2 metadatamodels [command]

Available Commands:
  list        list of metadata models

Flags:
  -h, --help   help for metadatamodels

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 metadatamodels [command] --help" for more information about a command.

icav2 metadatamodels list

This command lists all the metadata models

Usage:
  icav2 metadatamodels list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 pipelines

This is the root command for actions that act on pipelines

Usage:
  icav2 pipelines [command]

Available Commands:
  get         Get details of a pipeline
  list        List pipelines

Flags:
  -h, --help   help for pipelines

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 pipelines [command] --help" for more information about a command.

icav2 pipelines get

This command fetches the details of a pipeline without a project context

Usage:
  icav2 pipelines get [pipeline id] [flags]

Flags:
  -h, --help   help for get

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 pipelines list

This command lists the pipelines without the context of a project

Usage:
  icav2 pipelines list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectanalyses

This is the root command for actions that act on projects analysis

Usage:
  icav2 projectanalyses [command]

Available Commands:
  get         Get the details of an analysis 
  input       Retrieve input of analyses commands
  list        List of analyses for a project 
  output      Retrieve output of analyses commands
  update      Update tags of analyses

Flags:
  -h, --help   help for projectanalyses

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectanalyses [command] --help" for more information about a command.

icav2 projectanalyses get

This command returns all the details of a analysis.

Usage:
  icav2 projectanalyses get [analysis id] [flags]

Flags:
  -h, --help                help for get
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectanalyses input

Retrieve input of analyses commands

Usage:
  icav2 projectanalyses input [analysisId] [flags]

Flags:
  -h, --help                help for input
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectanalyses list

This command lists the analyses for a given project. Sorting can be done on 
- reference
- userReference
- pipeline
- status
- startDate
- endDate
- summary

Usage:
  icav2 projectanalyses list [flags]

Flags:
  -h, --help                help for list
      --max-items int       maximum number of items to return, the limit and default is 1000
      --page-offset int     Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
      --page-size int32     Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
      --project-id string   project ID to set current project context
      --sort-by string      specifies the order to list items

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
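
Example of listing analyses sorted on startDate with JSON output. This is an illustrative sketch; <project_id> is a placeholder for your own project id and the sort field is one of the values listed above.

icav2 projectanalyses list --project-id <project_id> --sort-by startDate -o json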

icav2 projectanalyses output

Retrieve output of analyses commands

Usage:
  icav2 projectanalyses output [analysisId] [flags]

Flags:
  -h, --help                help for output
      --project-id string   project ID to set current project context
      --raw-output          Add this flag if output should be in raw format. Applies only for Cwl pipelines ! This flag needs no value, adding it sets the value to true.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectanalyses update

Updates the user and technical tags of an analysis

Usage:
  icav2 projectanalyses update [analysisId] [flags]

Flags:
      --add-tech-tag stringArray      Tech tag to add to analysis. Add flag multiple times for multiple values.
      --add-user-tag stringArray      User tag to add to analysis. Add flag multiple times for multiple values.
  -h, --help                          help for update
      --project-id string             project ID to set current project context
      --remove-tech-tag stringArray   Tech tag to remove from analysis. Add flag multiple times for multiple values.
      --remove-user-tag stringArray   User tag to remove from analysis. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
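
Example of adding a user tag and removing a technical tag in a single call. This is an illustrative sketch; <analysis_id>, <project_id> and the tag values are placeholders.

icav2 projectanalyses update <analysis_id> --add-user-tag batch-42 --remove-tech-tag obsolete --project-id <project_id>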

icav2 projectdata

This is the root command for actions that act on project data

Usage:
  icav2 projectdata [command]

Available Commands:
  archive              archive data 
  copy                 Copy data to a project
  create               Create data id for a project
  delete               delete data 
  download             Download a file/folder
  downloadurl          get download url 
  folderuploadsession  Get details of a folder upload
  get                  Get details of a data
  link                 Link data to a project
  list                 List data 
  mount                Mount project data 
  move                 Move data to a project
  temporarycredentials fetch temporary credentials for data
  unarchive            unarchive data 
  unlink               Unlink data to a project
  unmount              Unmount project data 
  update               Updates the details of a data
  upload               Upload a file/folder

Flags:
  -h, --help   help for projectdata

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectdata [command] --help" for more information about a command.

icav2 projectdata archive

This command archives data for a given project

Usage:
  icav2 projectdata archive [path or data Id] [flags]

Flags:
  -h, --help                help for archive
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata copy

This command copies data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.

Usage:
  icav2 projectdata copy [data id] or [path] [flags]

Flags:
      --action-on-exist string      what to do when a file or folder with the same name already exists: OVERWRITE|SKIP|RENAME (default "SKIP")
      --background                  starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
      --copy-instrument-info        copy instrument info from source data to destination data
      --copy-technical-tags         copy technical tags from source data to destination data
      --copy-user-tags              copy user tags from source data to destination data
      --destination-folder string   folder id or path to where you want to copy the data, default root of project
  -h, --help                        help for copy
      --polling-interval int        polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
      --project-id string           project ID to set current project context
      --source-project-id string    project ID from where the data needs to be copied, mandatory when using source path notation

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
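
Example of copying a file by path from another project into a folder of the current project. This is an illustrative sketch; the paths and project ids are placeholders.

icav2 projectdata copy /SOURCE/sample1.fastq.gz --source-project-id <source_project_id> --destination-folder /DESTINATION/ --project-id <project_id>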

icav2 projectdata create

This command creates a data element (file or folder) in a project. It takes the name of the file/folder as an argument

Usage:
  icav2 projectdata create [name] [flags]

Flags:
      --data-type string     (*) Data type : FILE or FOLDER
      --folder-id string     Id of the folder
      --folder-path string   Folder path under which the new project data will be created.
      --format string        Only allowed for file, sets the format of the file.
  -h, --help                 help for create
      --project-id string    project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
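
Example of creating a folder named results under an existing folder path. This is an illustrative sketch; the folder path and project id are placeholders.

icav2 projectdata create results --data-type FOLDER --folder-path /SOURCE/ --project-id <project_id>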

icav2 projectdata delete

This command deletes data for a given project

Usage:
  icav2 projectdata delete [path or dataId] [flags]

Flags:
  -h, --help                help for delete
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata download

Download a file/folder. The source can be a data id or a path; the source path for a folder download should end with '*'. For files: the target is either the local folder into which the download will occur, or a path with a new name for the file. If the file already exists locally, it is overwritten. For folders: if the folder does not exist locally, it will be created automatically. Overwriting an existing folder will need to be acknowledged.

Usage:
  icav2 projectdata download [source data id or path] [target path] [flags]

Flags:
      --exclude string        Regex filter for file names to exclude from download.
      --exclude-source-path   Indicates that on folder download, the CLI will not create the parent folders of the downloaded folder in ICA on your local machine.
  -h, --help                  help for download
      --project-id string     project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Example 1

icav2 projectdata list --data-type FILE --file-name VariantCaller- --match-mode FUZZY -o json | jq -r '.items[].id' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done;

Example 2

Here is an example of how to download all BAM files from a project (we use some jq filtering to exclude '.bam.bai' and '.bam.md5sum' files):

icav2 projectdata list --file-name .bam --match-mode FUZZY -o json | jq -r '.items[] | select(.details.format.code == "BAM") | [.id] | @tsv' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done

Tip: If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
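
Example of downloading a complete folder. This is an illustrative sketch with placeholder paths; note the trailing '*' on the source path and the quotes that prevent the shell from expanding the wildcard.

icav2 projectdata download "/RESULTS/run1/*" ./run1/ --project-id <project_id>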

icav2 projectdata downloadurl

This command returns the data download url for a given project

Usage:
  icav2 projectdata downloadurl [path or data Id] [flags]

Flags:
  -h, --help                help for downloadurl
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata folderuploadsession

This command fetches the details of a folder upload session

Usage:
  icav2 projectdata folderuploadsession [project id] [data id] [folder upload session id] [flags]

Flags:
  -h, --help                help for folderuploadsession
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata get

This command fetches the details of a data element (file or folder)

Usage:
  icav2 projectdata get [data id] or [path] [flags]

Flags:
  -h, --help                help for get
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata link

This links data to a project. Use the data id, or the path together with the --source-project-id flag, to identify the data.

Usage:
  icav2 projectdata link [data id] or [path] [flags]

Flags:
  -h, --help                       help for link
      --project-id string          project ID to set current project context
      --source-project-id string   project ID from where the data needs to be linked

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
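
Example of linking a file by path from another project into the current project. This is an illustrative sketch; the path and project ids are placeholders.

icav2 projectdata link /SHARED/reference.fasta --source-project-id <source_project_id> --project-id <project_id>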

icav2 projectdata list

It is best practice to always surround your path with quotes if you want to use the * wildcard. Otherwise, you may run into situations where the command fails with "accepts at most 1 arg(s), received x" because the wildcard is expanded to folders with the same name but different numbers of subfolders.

If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.

This command lists the data for a given project. Page-offset can only be used in combination with sort-by. Sorting can be done on 
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt

Usage:
  icav2 projectdata list [path] [flags]

Flags:
      --data-type string        Data type. Available values : FILE or FOLDER
      --eligible-link           Add this flag if output should contain only the data that is eligible for linking on the current project. This flag needs no value, adding it sets the value to true.
      --file-name stringArray   The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
  -h, --help                    help for list
      --match-mode string       Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
      --max-items int           maximum number of items to return, the limit and default is 1000
      --page-offset int         Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
      --page-size int32         Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
      --parent-folder           Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
      --project-id string       project ID to set current project context
      --sort-by string          specifies the order to list items
      --status stringArray      Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Example to list files in the folder SOURCE

icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/

Example to list only subfolders in the folder SOURCE

icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/ --data-type FOLDER

icav2 projectdata mount

This command mounts the project data as a file system directory for a given project

Usage:
  icav2 projectdata mount [mount directory path] [flags]

Flags:
      --allow-other         Allow other users to access this project
  -h, --help                help for mount
      --list                List currently mounted projects
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
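
Example of mounting a project into a local directory. This is an illustrative sketch; the mount directory and project id are placeholders.

icav2 projectdata mount /mnt/ica-project --project-id <project_id>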

icav2 projectdata move

This command moves data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.

Usage:
  icav2 projectdata move [data id] or [path] [flags]

Flags:
      --background                  starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
      --destination-folder string   folder id or path to where you want to move the data, default root of project
  -h, --help                        help for move
      --polling-interval int        polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
      --project-id string           project ID to set current project context
      --source-project-id string    project ID from where the data needs to be moved, mandatory when using source path notation

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata temporarycredentials

This command fetches temporary AWS and Rclone credentials for a given project data element. If a path is given, the project id from the --project-id flag is used; if the flag is not present, the project is taken from the current context

Usage:
  icav2 projectdata temporarycredentials [path or data Id] [flags]

Flags:
  -h, --help                help for temporarycredentials
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
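
Example of fetching temporary credentials for a folder as JSON. This is an illustrative sketch; the path and project id are placeholders.

icav2 projectdata temporarycredentials /SOURCE/ --project-id <project_id> -o json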

icav2 projectdata unarchive

This command unarchives data for a given project

Usage:
  icav2 projectdata unarchive [path or dataId] [flags]

Flags:
  -h, --help                help for unarchive
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata unlink

This unlinks data from a project. Use the path or id to identify the data.

Usage:
  icav2 projectdata unlink [data id] or [path] [flags]

Flags:
  -h, --help                help for unlink
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata unmount

This command unmounts previously mounted project data

Usage:
  icav2 projectdata unmount [flags]

Flags:
      --directory-path string   Set path to unmount
  -h, --help                    help for unmount
      --project-id string       project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectdata update

This command updates some details of a data element. Only user/tech tags, the format, and the will-be-archived/will-be-deleted dates can be updated.

Usage:
  icav2 projectdata update [data id] or [path] [flags]

Flags:
      --add-tech-tag stringArray      Tech tag to add. Add flag multiple times for multiple values.
      --add-user-tag stringArray      User tag to add. Add flag multiple times for multiple values.
      --format-code string            Format to assign to the data. Only available for files.
  -h, --help                          help for update
      --project-id string             project ID to set current project context
      --remove-tech-tag stringArray   Tech tag to remove. Add flag multiple times for multiple values.
      --remove-user-tag stringArray   User tag to remove. Add flag multiple times for multiple values.
      --will-be-archived-at string    Time when data will be archived. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
      --will-be-deleted-at string     Time when data will be deleted. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
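
Example of tagging a file and scheduling it for deletion. This is an illustrative sketch; the path, tag, date and project id are placeholders.

icav2 projectdata update /SOURCE/sample1.bam --add-user-tag reviewed --will-be-deleted-at 2026-01-01 --project-id <project_id>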

icav2 projectdata upload

Upload a file/folder. For files: if the target path does not already exist, it will be created automatically. For folders: overwriting will need to be acknowledged. The "icapath" argument is optional.

Usage:
  icav2 projectdata upload [local path] [icapath] [flags]

Flags:
      --existing-sample                    Link to existing sample
  -h, --help                               help for upload
      --new-sample                         Create and link to new sample
      --num-workers int                    number of workers to parallelize.  Default calculated based on CPUs available.
      --project-id string                  project ID to set current project context
      --sample-description string          Set Sample Description for new sample
      --sample-id string                   Set Sample id of existing sample
      --sample-name string                 Set Sample name for new sample or from existing sample
      --sample-technical-tag stringArray   Set Sample Technical tag for new sample
      --sample-user-tag stringArray        Set Sample User tag for new sample

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Example for uploading multiple files

In this example, all the fastq.gz files from $source will be uploaded to /$target/ using the xargs utility.

find $source -name '*.fastq.gz' | xargs -n 1 -P 10 -I {} icav2 projectdata upload {} /$target/

Example for uploading multiple files using a CSV file

In this example, we upload multiple BAM files whose current names, paths, and new names are listed in the file bam_files.csv; the files are renamed on upload. We use screen in detached mode (this creates a new session without attaching to it):

while IFS=, read -r current_bam_file_name bam_path new_bam_file_name
do
  screen -d -m icav2 projectdata upload ${bam_path}/${current_bam_file_name} /bam_files/${new_bam_file_name} --project-id $projectID
done <./bam_files.csv 2>./log.txt
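
Example of uploading a single file and linking it to a newly created sample. This is an illustrative sketch; the file name, ICA path, sample name and project id are placeholders.

icav2 projectdata upload sample1.fastq.gz /incoming/ --new-sample --sample-name Sample1 --project-id <project_id>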

icav2 projectpipelines

This is the root command for actions that act on project pipelines

Usage:
  icav2 projectpipelines [command]

Available Commands:
  create      Create a pipeline
  input       Retrieve input parameters of pipeline
  link        Link pipeline to a project
  list        List of pipelines for a project 
  start       Start a pipeline
  unlink      Unlink pipeline from a project

Flags:
  -h, --help   help for projectpipelines

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectpipelines [command] --help" for more information about a command.

icav2 projectpipelines create

This command creates a  pipeline in the current project

Usage:
  icav2 projectpipelines create [command]

Available Commands:
  cwl          Create a cwl pipeline
  cwljson      Create a cwl Json pipeline
  nextflow     Create a nextflow pipeline
  nextflowjson Create a nextflow Json pipeline

Flags:
  -h, --help                help for create
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectpipelines create [command] --help" for more information about a command.

icav2 projectpipelines create cwl

This command creates a CWL pipeline in the current project using the argument as code for the pipeline

Usage:
  icav2 projectpipelines create cwl [code] [flags]

Flags:
      --category stringArray   Category of the cwl pipeline. Add flag multiple times for multiple values.
      --comment string         Version comments
      --description string     (*) Description of pipeline
  -h, --help                   help for cwl
      --html-doc string        Html documentation for the cwl pipeline
      --links string           links in json format
      --parameter string       (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --project-id string      project ID to set current project context
      --proprietary            Add the flag if this pipeline is proprietary
      --storage-size string    (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
      --tool stringArray       Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --workflow string        (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
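
Example of creating a CWL pipeline from local files. This is an illustrative sketch; the pipeline code, file names and description are placeholders, and the storage size name ("Small" here) is assumed to be one of the values returned by 'icav2 analysisstorages list'.

icav2 projectpipelines create cwl my-cwl-pipeline --workflow workflow.cwl --parameter parameters.xml --description "Demo CWL pipeline" --storage-size Small --project-id <project_id>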

icav2 projectpipelines create cwljson

This command creates a CWL Json pipeline in the current project using the argument as code for the pipeline

Usage:
  icav2 projectpipelines create cwljson [code] [flags]

Flags:
      --category stringArray         Category of the cwl pipeline. Add flag multiple times for multiple values.
      --comment string               Version comments
      --description string           (*) Description of pipeline
  -h, --help                         help for cwljson
      --html-doc string              Html documentation for the cwl pipeline
      --inputForm string             (*) Path to the input form file.
      --links string                 links in json format
      --onRender string              Path to the on render file.
      --onSubmit string              Path to the on submit file.
      --otherInputForm stringArray   Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --project-id string            project ID to set current project context
      --proprietary                  Add the flag if this pipeline is proprietary
      --storage-size string          (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
      --tool stringArray             Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --workflow string              (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectpipelines create nextflow

This command creates a Nextflow pipeline in the current project

Usage:
  icav2 projectpipelines create nextflow [code] [flags]

Flags:
      --category stringArray      Category of the nextflow pipeline. Add flag multiple times for multiple values.
      --comment string            Version comments
      --config string             Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --description string        (*) Description of pipeline
  -h, --help                      help for nextflow
      --html-doc string           Html documentation for the nextflow pipeline
      --links string              links in json format
      --main string               (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --nextflow-version string   Version of nextflow language to use.
      --other stringArray         Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --parameter string          (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --project-id string         project ID to set current project context
      --proprietary               Add the flag if this pipeline is proprietary
      --storage-size string       (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
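
Example of creating a Nextflow pipeline from local files. This is an illustrative sketch; the pipeline code, file names and description are placeholders, and the storage size name ("Small" here) is assumed to be one of the values returned by 'icav2 analysisstorages list'.

icav2 projectpipelines create nextflow my-nf-pipeline --main main.nf --parameter parameters.xml --description "Demo Nextflow pipeline" --storage-size Small --project-id <project_id>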

icav2 projectpipelines create nextflowjson

This command creates a Nextflow Json pipeline in the current project

Usage:
  icav2 projectpipelines create nextflowjson [code] [flags]

Flags:
      --category stringArray         Category of the nextflow pipeline. Add flag multiple times for multiple values.
      --comment string               Version comments
      --config string                Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --description string           (*) Description of pipeline
  -h, --help                         help for nextflowjson
      --html-doc string              Html documentation for the nextflow pipeline
      --inputForm string             (*) Path to the input form file.
      --links string                 links in json format
      --main string                  (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --nextflow-version string      Version of nextflow language to use.
      --onRender string              Path to the on render file.
      --onSubmit string              Path to the on submit file.
      --other stringArray            Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --otherInputForm stringArray   Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
      --project-id string            project ID to set current project context
      --proprietary                  Add the flag if this pipeline is proprietary
      --storage-size string          (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectpipelines input

Retrieve input parameters of pipeline

Usage:
  icav2 projectpipelines input [pipelineId] [flags]

Flags:
  -h, --help                help for input
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectpipelines link

This links a pipeline to a project. Use the code or id to identify the pipeline. If the code is not found, the argument is used as an id.

Usage:
  icav2 projectpipelines link [pipeline code] or [pipeline id] [flags]

Flags:
  -h, --help                       help for link
      --project-id string          project ID to set current project context
      --source-project-id string   project ID from where the pipeline needs to be linked, mandatory when using pipeline code

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectpipelines list

This command lists the pipelines for a given project

Usage:
  icav2 projectpipelines list [flags]

Flags:
  -h, --help                help for list
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectpipelines start

This command starts a  pipeline in the current project

Usage:
  icav2 projectpipelines start [command]

Available Commands:
  cwl          Start a CWL pipeline
  cwljson      Start a CWL Json pipeline
  nextflow     Start a Nextflow pipeline
  nextflowjson Start a Nextflow Json pipeline

Flags:
  -h, --help                help for start
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectpipelines start [command] --help" for more information about a command.

icav2 projectpipelines start cwl

This command starts a CWL pipeline for a given pipeline id, or for a pipeline code from the current project.

Usage:
  icav2 projectpipelines start cwl [pipeline id] or [code] [flags]

Flags:
      --data-id stringArray           Enter data id's as follows : dataId{optional-mount-path} . Add flag multiple times for multiple values.  Mount path is optional and can be absolute and relative and can not contain curly braces.
      --data-parameters stringArray   Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
  -h, --help                          help for cwl
      --idempotency-key string        Add a maximum 255 character idempotency key to prevent duplicate requests. The  response is retained for 7 days so the key must be unique during that timeframe.
      --input stringArray             Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
      --input-json string             Analysis input JSON string. JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
      --output-parent-folder string   The id of the folder in which the output folder should be created.
      --parameters stringArray        Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
      --project-id string             project ID to set current project context
      --reference-tag stringArray     Reference tag. Add flag multiple times for multiple values.
      --storage-size string           (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'
      --technical-tag stringArray     Technical tag. Add flag multiple times for multiple values.
      --type-input string             (*) Input type STRUCTURED or JSON
      --user-reference string         (*) User reference
      --user-tag stringArray          User tag. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
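
Example of starting a CWL pipeline with structured input. This is an illustrative sketch; the pipeline id, project id and file id are placeholders, the parameter codes 'reads' and 'threads' depend on the pipeline's parameter XML, and the storage size name is assumed to be one of the values returned by 'icav2 analysisstorages list'.

icav2 projectpipelines start cwl <pipeline_id> --user-reference demo-run-001 --type-input STRUCTURED --storage-size Small --input reads:fil.1234abcd --parameters threads:4 --project-id <project_id>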

icav2 projectpipelines start cwljson

This command starts a CWL Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).

Usage:
  icav2 projectpipelines start cwljson [pipeline id] or [code] [flags]

Flags:
      --field stringArray             Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
      --field-data stringArray        Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
      --group stringArray             Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
      --group-data stringArray        Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
  -h, --help                          help for cwljson
      --idempotency-key string        Add a maximum 255 character idempotency key to prevent duplicate requests. The  response is retained for 7 days so the key must be unique during that timeframe.
      --output-parent-folder string   The id of the folder in which the output folder should be created.
      --project-id string             project ID to set current project context
      --reference-tag stringArray     Reference tag. Add flag multiple times for multiple values.
      --storage-size string           (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
      --technical-tag stringArray     Technical tag. Add flag multiple times for multiple values.
      --user-reference string         (*) User reference
      --user-tag stringArray          User tag. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Field definition

A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.

For example

--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id

matches

    "fields": [
      {
        "id": "fieldA",
        "values": [
          "valueA"
        ]
      },
      {
        "id": "multivalueFieldB",
        "values": [
          "valueB1",
          "valueB2"
        ]
      },
      {
        "id": "DataFieldC",
        "values": [
          "file.id"
        ]
      }
    ]

The following example with --field and --field-data

--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}

matches

    "fields": [
    {
      "id": "asection",
      "values": [
        "SECTION1"
      ]
    },
    {
      "id": "atext",
      "values": [
        "this is atext text"
      ]
    },
    {
      "id": "ttt",
      "values": [
        "tb1"
      ]
    },
    {
      "id": "notallowedrole",
      "values": [
        "f"
      ]
    },
    {
      "id": "notallowedcondition",
      "values": [
        "this is a not allowed text box"
      ]
    },
    {
      "id": "maxagesum",
      "values": [
        "20"
      ]
    },
    {
      "dataValues": [
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
        }
      ],
      "id": "txts1"
    },
    {
      "dataValues": [
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
          "mountPath": "/dir1/dir2"
        },
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
          "mountPath": "/dir3/dir4"
        }
      ],
      "id": "txts2"
    }
  ],

Group definition

A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.

--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7

--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7

--group group2.0.roleForGroup2:f

matches

"groups": [
    {
      "id": "group1",
      "values": [
        {
          "values": [
            {
              "id": "age",
              "values": [
                "80"
              ]
            },
            {
              "id": "role",
              "values": [
                "f"
              ]
            },
            {
              "id": "conditions",
              "values": [
                "cancer",
                "covid"
              ]
            },
            {
              "dataValues": [
                {
                  "dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
                }
              ],
              "id": "info"
            }
          ]
        },
        {
          "values": [
            {
              "id": "age",
              "values": [
                "20"
              ]
            },
            {
              "id": "role",
              "values": [
                "m"
              ]
            },
            {
              "dataValues": [
                {
                  "dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
                }
              ],
              "id": "info"
            }
          ]
        }
      ]
    },
    {
      "id": "group2",
      "values": [
        {
          "values": [
            {
              "id": "roleForGroup2",
              "values": [
                "f"
              ]
            }
          ]
        }
      ]
    }
  ]
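
Putting the field and group flags together, an example of a complete start command. This is an illustrative sketch; the pipeline id, project id and file id are placeholders, the field ids 'samplename' and 'reads' depend on your pipeline's input form, and the storage size name is assumed to be one of the values returned by 'icav2 analysisstorages list'.

icav2 projectpipelines start cwljson <pipeline_id> --user-reference demo-run-002 --storage-size Small --field samplename:S1 --field-data reads:fil.1234abcd --project-id <project_id>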

icav2 projectpipelines start nextflow

This command starts a Nextflow pipeline for a given pipeline id, or for a pipeline code from the current project.

Usage:
  icav2 projectpipelines start nextflow [pipeline id] or [code] [flags]

Flags:
      --data-parameters stringArray   Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
  -h, --help                          help for nextflow
      --idempotency-key string        Add a maximum 255 character idempotency key to prevent duplicate requests. The  response is retained for 7 days so the key must be unique during that timeframe.
      --input stringArray             Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
      --output-parent-folder string   The id of the folder in which the output folder should be created.
      --parameters stringArray        Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
      --project-id string             project ID to set current project context
      --reference-tag stringArray     Reference tag. Add flag multiple times for multiple values.
      --storage-size string           (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
      --technical-tag stringArray     Technical tag. Add flag multiple times for multiple values.
      --user-reference string         (*) User reference
      --user-tag stringArray          User tag. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
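
Example of starting a Nextflow pipeline. This is an illustrative sketch; the pipeline id, project id and file id are placeholders, the parameter codes 'reads' and 'genome' depend on the pipeline's parameter XML, and the storage size name is assumed to be one of the values returned by 'icav2 analysisstorages list'.

icav2 projectpipelines start nextflow <pipeline_id> --user-reference demo-run-003 --storage-size Small --input reads:fil.1234abcd --parameters genome:hg38 --project-id <project_id>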

icav2 projectpipelines start nextflowjson

This command starts a Nextflow Json pipeline for a given pipeline id, or for a pipeline code from the current project.  See ICA CLI documentation for more information (https://help.ica.illumina.com/).

Usage:
  icav2 projectpipelines start nextflowjson [pipeline id] or [code] [flags]

Flags:
      --field stringArray             Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
      --field-data stringArray        Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
      --group stringArray             Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
      --group-data stringArray        Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
  -h, --help                          help for nextflowjson
      --idempotency-key string        Add a maximum 255 character idempotency key to prevent duplicate requests. The  response is retained for 7 days so the key must be unique during that timeframe.
      --output-parent-folder string   The id of the folder in which the output folder should be created.
      --project-id string             project ID to set current project context
      --reference-tag stringArray     Reference tag. Add flag multiple times for multiple values.
      --storage-size string          (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
      --technical-tag stringArray     Technical tag. Add flag multiple times for multiple values.
      --user-reference string         (*) User reference
      --user-tag stringArray          User tag. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Field definition

A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.

For example

--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id

matches

    "fields": [
      {
        "id": "fieldA",
        "values": [
          "valueA"
        ]
      },
      {
        "id": "multivalueFieldB",
        "values": [
          "valueB1",
          "valueB2"
        ]
      },
      {
        "id": "DataFieldC",
        "values": [
          "file.id"
        ]
      }
    ]

The following example with --field and --field-data

--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}

matches

    "fields": [
    {
      "id": "asection",
      "values": [
        "SECTION1"
      ]
    },
    {
      "id": "atext",
      "values": [
        "this is atext text"
      ]
    },
    {
      "id": "ttt",
      "values": [
        "tb1"
      ]
    },
    {
      "id": "notallowedrole",
      "values": [
        "f"
      ]
    },
    {
      "id": "notallowedcondition",
      "values": [
        "this is a not allowed text box"
      ]
    },
    {
      "id": "maxagesum",
      "values": [
        "20"
      ]
    },
    {
      "dataValues": [
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
        }
      ],
      "id": "txts1"
    },
    {
      "dataValues": [
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
          "mountPath": "/dir1/dir2"
        },
        {
          "dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
          "mountPath": "/dir3/dir4"
        }
      ],
      "id": "txts2"
    }
  ],

Group definition

A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.

--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7

--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7

--group group2.0.roleForGroup2:f

matches

"groups": [
    {
      "id": "group1",
      "values": [
        {
          "values": [
            {
              "id": "age",
              "values": [
                "80"
              ]
            },
            {
              "id": "role",
              "values": [
                "f"
              ]
            },
            {
              "id": "conditions",
              "values": [
                "cancer",
                "covid"
              ]
            },
            {
              "dataValues": [
                {
                  "dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
                }
              ],
              "id": "info"
            }
          ]
        },
        {
          "values": [
            {
              "id": "age",
              "values": [
                "20"
              ]
            },
            {
              "id": "role",
              "values": [
                "m"
              ]
            },
            {
              "dataValues": [
                {
                  "dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
                }
              ],
              "id": "info"
            }
          ]
        }
      ]
    },
    {
      "id": "group2",
      "values": [
        {
          "values": [
            {
              "id": "roleForGroup2",
              "values": [
                "f"
              ]
            }
          ]
        }
      ]
    }
  ]

icav2 projectpipelines unlink

This unlinks a pipeline from a project. Use the code or id to identify the pipeline. If the code is not found, the argument is used as an id.

Usage:
  icav2 projectpipelines unlink [pipeline code] or [pipeline id] [flags]

Flags:
  -h, --help                help for unlink
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projects

This is the root command for actions that act on projects

Usage:
  icav2 projects [command]

Available Commands:
  create      Create a project
  enter       Enter project context
  exit        Exit project context
  get         Get details of a project
  list        List projects

Flags:
  -h, --help   help for projects

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projects [command] --help" for more information about a command.

icav2 projects create

This command creates a project.

Usage:
  icav2 projects create [projectname] [flags]

Flags:
      --billing-mode string                Billing mode , defaults to PROJECT (default "PROJECT")
      --data-sharing                       Indicates whether the data and samples created in this project can be linked to other Projects. This flag needs no value, adding it sets the value to true.
  -h, --help                               help for create
      --info string                        Info about the project
      --metadata-model string              Id of the metadata model. 
      --owner string                       Owner of the project. Default is the current user
      --region string                      Region of the project. When not specified : takes a default when there is only 1 region, else a choice will be given.
      --short-descr string                 Short description of the project
      --storage-bundle string              Id of the storage bundle. When not specified : takes a default when there is only 1 bundle, else a choice will be given. 
      --storage-config string              An optional storage configuration id to have self managed storage.
      --storage-config-sub-folder string   Required when specifying a storageConfigurationId. The subfolder determines the object prefix of your self managed storage.
      --technical-tag stringArray          Technical tags for this project. Add flag multiple times for multiple values.
      --user-tag stringArray               User tags for this project. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
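
Example of creating a project with a short description and a user tag, relying on the defaults for billing mode, region and storage bundle. This is an illustrative sketch; the project name, description and tag are placeholders.

icav2 projects create my-project --short-descr "Demo project" --user-tag demo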

icav2 projects enter

This command sets the project context for future commands

Usage:
  icav2 projects enter [projectname] or [project id] [flags]

Flags:
  -h, --help   help for enter

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projects exit

This command switches the user back to their personal context

Usage:
  icav2 projects exit [flags]

Flags:
  -h, --help                help for exit
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projects get

This command fetches the details of the current project. If no project id is given, the project id from the config file is used.

Usage:
  icav2 projects get [project id] [flags]

Flags:
  -h, --help   help for get

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projects list

This command lists the projects for the current user. Page-offset can only be used in combination with sort-by. Sorting can be done on 
- name
- shortDescription
- information

Usage:
  icav2 projects list [flags]

Flags:
  -h, --help              help for list
      --max-items int     maximum number of items to return, the limit and default is 1000
      --page-offset int   Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
      --page-size int32   Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
      --sort-by string    specifies the order to list items

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
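
For example, to list projects sorted by name and return the first page of 50 results:

icav2 projects list --sort-by name --page-size 50 --page-offset 0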

icav2 projectsamples

This is the root command for actions that act on project samples

Usage:
  icav2 projectsamples [command]

Available Commands:
  complete    Set sample to complete
  create      Create a sample for a project
  delete      Delete a sample for a project
  get         Get details of a sample
  link        Link data to a sample for a project
  list        List of samples for a project 
  listdata    List data from given sample
  unlink      Unlink data from a sample for a project
  update      Update a sample for a project

Flags:
  -h, --help   help for projectsamples

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 projectsamples [command] --help" for more information about a command.

icav2 projectsamples complete

The sample status will be set to 'Available' and a sample completed event will be triggered as well.

Usage:
  icav2 projectsamples complete [sampleId] [flags]

Flags:
  -h, --help                help for complete
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectsamples create

This command creates a sample for a project. It takes the name of the sample as argument.

Usage:
  icav2 projectsamples create [name] [flags]

Flags:
      --description string          Description 
  -h, --help                        help for create
      --project-id string           project ID to set current project context
      --technical-tag stringArray   Technical tag. Add flag multiple times for multiple values.
      --user-tag stringArray        User tag. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
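
Example (the sample name, description and tag are placeholder values; --project-id can be omitted when a project context has been entered):

icav2 projectsamples create mySample --description "Tutorial sample" --user-tag demo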

icav2 projectsamples delete

This command deletes a sample from a project. The flags indicate how the sample is deleted; only one of these flags can be used at a time.

Usage:
  icav2 projectsamples delete [sampleId] [flags]

Flags:
      --deep         Delete the entire sample: sample and linked files will be deleted from your project.
  -h, --help         help for delete
      --mark         Mark a sample as deleted.
      --unlink       Unlinking the sample: sample is deleted and files are unlinked and available again for linking to another sample.
      --with-input   Delete the sample as well as its input data: sample is deleted from your project, the input files and pipeline output folders are still present in the project but will not be available for linking to a new sample.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
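
For example, the following sketch deletes a sample by unlinking it, so the files remain available for linking to another sample (the sample id is a placeholder; only one deletion flag may be used per call):

icav2 projectsamples delete <sampleId> --unlink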

icav2 projectsamples get

This command fetches the details of a sample using the argument as a name; if nothing is found, the argument is used as an id (UUID).

Usage:
  icav2 projectsamples get [sample id] or [name] [flags]

Flags:
  -h, --help                help for get
      --project-id string   project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectsamples link

This command adds data to a project sample. The argument is the id of the project sample.

Usage:
  icav2 projectsamples link [sampleId] [flags]

Flags:
      --data-id stringArray   (*) Data id of the data that needs to be linked to the project sample. Add flag multiple times for multiple values.
  -h, --help                  help for link
      --project-id string     project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
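
Example linking two files to a sample (the ids are placeholders):

icav2 projectsamples link <sampleId> --data-id <data id 1> --data-id <data id 2>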

icav2 projectsamples list

This command lists the samples for a given project

Usage:
  icav2 projectsamples list [flags]

Flags:
  -h, --help                        help for list
      --include-deleted             Include the deleted samples in the list. Default set to false.
      --project-id string           project ID to set current project context
      --technical-tag stringArray   Technical tags to filter on. Add flag multiple times for multiple values.
      --user-tag stringArray        User tags to filter on. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectsamples listdata

This command lists the data for a given sample. It only supports offset-based pagination, and default sorting is done on timeCreated. Sorting can be done on:
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt

Usage:
  icav2 projectsamples listdata [sampleId] [path] [flags]

Flags:
      --data-type string        Data type. Available values : FILE or FOLDER
      --file-name stringArray   The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
  -h, --help                    help for listdata
      --match-mode string       Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
      --max-items int           maximum number of items to return, the limit and default is 1000
      --page-offset int         Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
      --page-size int32         Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
      --parent-folder           Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
      --project-id string       project ID to set current project context
      --sort-by string          specifies the order to list items (default "timeCreated Desc")
      --status stringArray      Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
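
For example, to list only available files for a sample, sorted by name (the sample id and path are placeholders):

icav2 projectsamples listdata <sampleId> <path> --data-type FILE --status AVAILABLE --sort-by name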

icav2 projectsamples unlink

This command removes data from a project sample. The argument is the id of the project sample.

Usage:
  icav2 projectsamples unlink [sampleId] [flags]

Flags:
      --data-id stringArray   (*) Data id of the data that will be removed from the project sample. Add flag multiple times for multiple values.
  -h, --help                  help for unlink
      --project-id string     project ID to set current project context

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 projectsamples update

This command updates a sample for a project. Name, description, user tags, and technical tags can be updated.

Usage:
  icav2 projectsamples update [sampleId] [flags]

Flags:
      --add-tech-tag stringArray      Tech tag to add. Add flag multiple times for multiple values.
      --add-user-tag stringArray      User tag to add. Add flag multiple times for multiple values.
  -h, --help                          help for update
      --name string                   Name 
      --project-id string             project ID to set current project context
      --remove-tech-tag stringArray   Tech tag to remove. Add flag multiple times for multiple values.
      --remove-user-tag stringArray   User tag to remove. Add flag multiple times for multiple values.

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
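
Example renaming a sample and adjusting its user tags (the values are placeholders):

icav2 projectsamples update <sampleId> --name <new name> --add-user-tag <new tag> --remove-user-tag <old tag>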

icav2 regions

This is the root command for actions that act on regions

Usage:
  icav2 regions [command]

Available Commands:
  list        list of regions

Flags:
  -h, --help   help for regions

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 regions [command] --help" for more information about a command.

icav2 regions list

This command lists all the regions

Usage:
  icav2 regions list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 storagebundles

This is the root command for actions that act on storage bundles

Usage:
  icav2 storagebundles [command]

Available Commands:
  list        list of storage bundles

Flags:
  -h, --help   help for storagebundles

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 storagebundles [command] --help" for more information about a command.

icav2 storagebundles list

This command lists all the storage bundle IDs.

Usage:
  icav2 storagebundles list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 storageconfigurations

This is the root command for actions that act on storage configurations

Usage:
  icav2 storageconfigurations [command]

Available Commands:
  list        list of storage configurations

Flags:
  -h, --help   help for storageconfigurations

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 storageconfigurations [command] --help" for more information about a command.

icav2 storageconfigurations list

This command lists all the storage configurations

Usage:
  icav2 storageconfigurations list [flags]

Flags:
  -h, --help   help for list

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

icav2 tokens

This is the root command for actions that act on tokens

Usage:
  icav2 tokens [command]

Available Commands:
  create      Create a JWT token
  refresh     Refresh a JWT token from basic authentication

Flags:
  -h, --help   help for tokens

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

Use "icav2 tokens [command] --help" for more information about a command.

icav2 tokens create

This command creates a JWT token from the API key.

Usage:
  icav2 tokens create [flags]

Flags:
  -h, --help   help for create

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
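
For example, if an API key is not already configured, it can be passed with the global -k flag (the key value is a placeholder):

icav2 tokens create -k <your API key>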

icav2 tokens refresh

This command refreshes a JWT token from basic authentication with grant type JWT-bearer that is set with the -t flag.

Usage:
  icav2 tokens refresh [flags]

Flags:
  -h, --help   help for refresh

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service
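
For example, to refresh an existing JWT (the token value is a placeholder):

icav2 tokens refresh -t <existing JWT>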

icav2 version

The version of this application

Usage:
  icav2 version [flags]

Flags:
  -h, --help   help for version

Global Flags:
  -t, --access-token string    JWT used to call rest service
  -o, --output-format string   output format (default "table")
  -s, --server-url string      server url to direct commands
  -k, --x-api-key string       api key used to call rest service

CWL DRAGEN Pipeline

In this tutorial, we will demonstrate how to create and launch a DRAGEN pipeline using the CWL language.

In ICA, CWL pipelines are built using tools developed in CWL. For this tutorial, we will use the "DRAGEN Demo Tool" included with DRAGEN Demo Bundle 3.9.5.

Linking bundle to Project

1.) Start by selecting a project at the Projects inventory.

2.) In the details page, select Edit.

3.) In the edit mode of the details page, click the + button in the LINKED BUNDLES section.

4.) In the Add Bundle to Project window: Select the DRAGEN demo tool bundle from the list. Once you have selected the bundle, the Link Bundles button becomes available. Select it to continue.

Tip: You can select multiple bundles using Ctrl + Left mouse button or Shift + Left mouse button.

5.) In the project details page, the selected bundle will appear under the LINKED BUNDLES section. If you need to remove a bundle, click on the - button. Click Save to save the project with linked bundles.

Create Pipeline

1.) From the project details page, select Pipelines > CWL

2.) You will be given options to create pipelines using a graphical interface or code. For this tutorial, we will select Graphical.

  • Code: Provide pipeline name here.

  • Description: Provide pipeline description here.

  • Storage size: Select the storage size from the drop-down menu.

4.) The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions.

6.) To build a pipeline, start by selecting Machine PROFILE from the component menu section on the right. All fields are required and are pre-filled with default values. Change them as needed.

  • The profile Name field will be updated based on the selected Resource. You can change it as needed.

  • Color assigns the selected color to the tool in the design view to easily identify the machine profile when more than one tool is used in the pipeline.

  • Resource lets you choose from various compute resources available. In this case, we are building a DRAGEN pipeline and we will need to select a resource with FPGA in it. Choose from FPGA resources (FPGA Medium/Large) based on your needs.

7.) Once you have selected the Machine Profile for the tool, find your tool from the Tool Repository at the bottom section of the component menu on the right. In this case, we are using the DRAGEN Demo Tool. Drag and drop the tool from the Tool Repository section to the visualization panel.

8.) The dropped tool will show the machine profile color, number of outputs and inputs, and warning to indicate missing parameters, mandatory values, and connections. Selecting the tool in the visualization panel activates the tool (DRAGEN Demo Tool) component menu. On the component menu section, you will find the details of the tool under Tool - DRAGEN Demo Tool. This section lists the inputs, outputs, additional parameters, and the machine profile required for the tool. In this case, the DRAGEN Demo Tool requires three inputs (FASTQ read 1, FASTQ read 2, and a Reference genome). The tool has two outputs (a VCF file and an output folder). The tool also has a mandatory parameter (Output File Prefix). Enter the value for the input parameter (Output File Prefix) in the text box.

9.) The top right corner of the visualization panel has icons to zoom in and out in the visualization panel followed by three icons: ref, in, and out. Based on the type of input/output needed, drag and drop the icons into the visualization area. In this case, we need three inputs (read 1, read 2, and Reference hash table.) and two outputs (VCF file and output folder). Start by dragging and dropping the first input (a). Connect the input to the tool by clicking on the blue dot at the bottom of the input icon and dragging it to the blue dot representing the first input on the tool (b). Select the input icon to activate the input component menu. The input section for the first input lets you enter the Name, Format, and other relevant information based on tool requirements. In this case, for the first input, enter the following information:

  • Name: FASTQ read 1

  • Format: FASTQ

  • Comments: any optional comments

10.) Repeat the step for other inputs. Note that the Reference hash table is treated as the input for the tool rather than Reference files. So, use the input icon instead of the reference icon.

11.) Repeat the process for two outputs by dragging and connecting them to the tool. Note that when connecting output to the tool, you will need to click on the blue dot at the bottom of the tool and drag it to the output.

12.) Select the tool and enter additional parameters. In this case, the tool requires Output File Prefix. Enter demo_ in the text box.

13.) Click on the Save button to save the pipeline. Once saved, you can run it from the Pipelines page under Flow from the left menus as any other pipeline.

CWL: Scatter-gather Method

Creating the tools

For this pipeline, we will use the following CWL tool definitions:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
label: fastp
doc: Modified from https://github.com/nigyta/bact_genome/blob/master/cwl/tool/fastp/fastp.cwl
inputs:
  fastq1:
    type: File
    inputBinding:
      prefix: -i
  fastq2:
    type:
    - File
    - 'null'
    inputBinding:
      prefix: -I
  threads:
    type:
    - int
    - 'null'
    default: 1
    inputBinding:
      prefix: --thread
  qualified_phred_quality:
    type:
    - int
    - 'null'
    default: 20
    inputBinding:
      prefix: --qualified_quality_phred
  unqualified_phred_quality:
    type:
    - int
    - 'null'
    default: 20
    inputBinding:
      prefix: --unqualified_percent_limit
  min_length_required:
    type:
    - int
    - 'null'
    default: 50
    inputBinding:
      prefix: --length_required
  force_polyg_tail_trimming:
    type:
    - boolean
    - 'null'
    inputBinding:
      prefix: --trim_poly_g
  disable_trim_poly_g:
    type:
    - boolean
    - 'null'
    default: true
    inputBinding:
      prefix: --disable_trim_poly_g
  base_correction:
    type:
    - boolean
    - 'null'
    default: true
    inputBinding:
      prefix: --correction
outputs:
  out_fastq1:
    type: File
    outputBinding:
      glob:
      - $(inputs.fastq1.nameroot).fastp.fastq
  out_fastq2:
    type:
    - File
    - 'null'
    outputBinding:
      glob:
      - $(inputs.fastq2.nameroot).fastp.fastq
  html_report:
    type: File
    outputBinding:
      glob:
      - fastp.html
  json_report:
    type: File
    outputBinding:
      glob:
      - fastp.json
arguments:
- prefix: -o
  valueFrom: $(inputs.fastq1.nameroot).fastp.fastq
- |
  ${
    if (inputs.fastq2){
      return '-O';
    } else {
      return '';
    }
  }
- |
  ${
    if (inputs.fastq2){
      return inputs.fastq2.nameroot + ".fastp.fastq";
    } else {
      return '';
    }
  }
baseCommand:
- fastp

and

#!/usr/bin/env cwl-runner

cwlVersion: cwl:v1.0
class: CommandLineTool
label: MultiQC
doc: MultiQC is a tool to create a single report with interactive plots for multiple
  bioinformatics analyses across many samples.
inputs:
  files:
    type:
    - type: array
      items: File
    - 'null'
    doc: Files containing the result of quality analysis.
    inputBinding:
      position: 2
  directories:
    type:
    - type: array
      items: Directory
    - 'null'
    doc: Directories containing the result of quality analysis.
    inputBinding:
      position: 3
  report_name:
    type: string
    doc: Name of output report, without path but with full file name (e.g. report.html).
    default: multiqc_report.html
    inputBinding:
      position: 1
      prefix: -n
outputs:
  report:
    type: File
    outputBinding:
      glob:
      - '*.html'
baseCommand:
- multiqc

Pipeline

Once the tools are created, we will create the pipeline itself using these two tools at Projects > your_project > Flow > Pipelines > CWL > Graphical:

  • On the Definition tab, go to the tool repository and drag and drop the two tools which you just created on the pipeline editor.

  • Connect the JSON output of fastp to multiqc input by hovering over the middle of the round, blue connector of the output until the icon changes to a hand and then drag the connection to the first input of multiqc. You can use the magnification symbols to make it easier to connect these tools.

  • Above the diagram, drag and drop two input FASTQ files and an output HTML file on to the pipeline editor and connect the blue markers to match the diagram below.

Relevant aspects of the pipeline:

  • Both inputs are multivalue (as can be seen on the screenshot)

  • Ensure that the step fastp has scattering configured: it scatters on both inputs using the scatter method 'dotproduct'. This means that as many instances of this step will be executed as there are pairs of FASTQ files. To indicate that this step is executed multiple times, the icons of both inputs have doubled borders.

Important remark

Both input arrays (Read1 and Read2) must be matched. Automatic sorting of input arrays is currently not supported, so you have to take care of matching the input arrays yourself. There are two ways to achieve this (besides the manual specification in the GUI):

  • invoke this pipeline from the CLI and use Bash functionality to sort the arrays (see the sketch after the tool definition below)

  • add a tool to the pipeline which takes an array of all FASTQ files, splits it into R1 and R2 arrays based on the file name suffixes, and sorts them:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
  listing:
  - entry: "import argparse\nimport os\nimport json\n\n# Create argument parser\n\
      parser = argparse.ArgumentParser()\nparser.add_argument(\"-i\", \"--inputFiles\"\
      , type=str, required=True, help=\"Input files\")\n\n# Parse the arguments\n\
      args = parser.parse_args()\n\n# Split the inputFiles string into a list of file\
      \ paths\ninput_files = args.inputFiles.split(',')\n\n# Sort the input files\
      \ by the base filename\ninput_files = sorted(input_files, key=lambda x: os.path.basename(x))\n\
      \n\n# Separate the files into left and right arrays, preserving the order\n\
      left_files = [file for file in input_files if '_R1_' in os.path.basename(file)]\n\
      right_files = [file for file in input_files if '_R2_' in os.path.basename(file)]\n\
      \n# Print the left files for debugging\nprint(\"Left files:\", left_files)\n\
      \n# Print the left files for debugging\nprint(\"Right files:\", right_files)\n\
      \n# Ensure left and right files are matched\nassert len(left_files) == len(right_files),\
      \ \"Mismatch in number of left and right files\"\n\n    \n# Write the left files\
      \ to a JSON file\nwith open('left_files.json', 'w') as outfile:\n    left_files_objects\
      \ = [{\"class\": \"File\", \"path\": file} for file in left_files]\n    json.dump(left_files_objects,\
      \ outfile)\n\n# Write the right files to a JSON file\nwith open('right_files.json',\
      \ 'w') as outfile:\n    right_files_objects = [{\"class\": \"File\", \"path\"\
      : file} for file in right_files]\n    json.dump(right_files_objects, outfile)\n\
      \n"
    entryname: spread_script.py
    writable: false
label: spread_items
inputs:
  inputFiles:
    type:
      type: array
      items: File
    inputBinding:
      separate: false
      prefix: -i
      itemSeparator: ','
outputs:
  leftFiles:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - left_files.json
      loadContents: true
      outputEval: $(JSON.parse(self[0].contents))
  rightFiles:
    type:
      type: array
      items: File
    outputBinding:
      glob:
      - right_files.json
      loadContents: true
      outputEval: $(JSON.parse(self[0].contents))
baseCommand:
- python3
- spread_script.py

Now this tool can be added to the pipeline before the fastp step.
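
For the first option (sorting the arrays in the shell before invoking the pipeline from the CLI), a minimal bash sketch is shown below. The file naming pattern (*_R1_* / *_R2_*) is an assumption and should be adapted to your own FASTQ names:

# Collect the R1 and R2 FASTQ files in sorted order so that both arrays stay matched pair-wise
R1_FILES=$(ls *_R1_*.fastq | sort)
R2_FILES=$(ls *_R2_*.fastq | sort)
echo "R1 order: $R1_FILES"
echo "R2 order: $R2_FILES"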

Base: SnowSQL

You can access the databases and tables within the Base module using the snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. In this tutorial, we describe how to obtain the token and use it to access the Base module. This tutorial does not cover how to install and configure snowSQL.

Obtaining OAuth token and URL

Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.

After clicking the button Create OAuth access token, the pop-up authenticator is displayed.

After clicking the button Generate snowSQL command, the pop-up authenticator presents the snowSQL command.

Copy the snowSQL command and run it in the console to log in.

You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.

Example:

API Call:

curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/projects/<PROJECT ID>/base:connectionDetails' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'X-API-Key: <YOUR KEY>' 

Response

{
  "authenticator": "oauth",
  "accessToken": "XXXXXXXXXX",
  "dnsName": "use1sf01.us-east-1.snowflakecomputing.com",
  "userPrincipalName": "xxxxx",
  "databaseName": "xxxxx",
  "schemaName": "xxx",
  "warehouseName": "xxxxxx",
  "roleName": "xxx"
}

Template snowSQL:

snowsql -a use1sf01.us-east-1 -u <userPrincipalName> --authenticator=oauth -r <roleName> -d <databaseName> -s PUBLIC -w <warehouseName> --token="<accessToken>"

Now you can perform a variety of tasks such as:

  1. Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.

  2. Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.

  3. Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.

Overall, the snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage the data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.

Example Queries:

Show all tables in the database:

>SHOW TABLES;

Create a new table:

CREATE TABLE demo1(sample_name VARCHAR, count INT);

List records in a table:

SELECT * FROM demo1;

Load data from a file: To load data from a file, start by creating a staging area in the internal storage using the following command:

>CREATE STAGE myStage;

You can then upload the local file to the internal storage using the following command:

> PUT file:///path/to/data.tsv @myStage;

You can check if the file was uploaded properly using LIST command:

> LIST @myStage;

Finally, load the data using the COPY INTO command. The command assumes data.tsv is a tab-delimited file. You can easily modify the following command to import a JSON file by setting TYPE = 'JSON'.

> COPY INTO demo1(sample_name, count) FROM @mystage/data.tsv FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '\t');

Load data from a string: If you have data as a JSON string, you can import it into the table using the following commands.

> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name, count)
> SELECT
    PARSE_JSON($myJSON_str):sample_name::STRING,
    PARSE_JSON($myJSON_str):count::INT;

Load data into specific columns: If you only want to load sample_name into the table, you can remove "count" from the column list and the value list as below:

> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name)
  SELECT
    PARSE_JSON($myJSON_str):sample_name::STRING;

List the views of the database to which you are connected. Because shared database and catalogue views are created within the project database, they will be listed. However, this does not show views which are granted via another database or role, or from bundles.

>SHOW VIEWS;

Show grants, both directly on the tables and views and grants to roles which in turn have grants on tables and views.

>SHOW GRANTS;

Base Basics

This tutorial provides an example for exercising the basic operations used with Base, including how to create a table, load the table with data, and query the table.

Prerequisites

  • An ICA project with access to Base

  • File to import

    • HES4-NM_021170-T00001  1392
      ISG15-NM_005101-T00002	46
      SLC2A5-NM_003039-T00003	14
      H6PD-NM_004285-T00004	30
      PIK3CD-NM_005026-T00005	200
      MTOR-NM_004958-T00006	156
      FBXO6-NM_018438-T00007	10
      MTHFR-NM_005957-T00008	154
      FHAD1-NM_052929-T00009	10
      PADI2-NM_007365-T00010	12

Create table

Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.

  1. Go to the Projects > your_project > Base > Tables and enable Base by clicking on the Enable button.

  2. Select Add Table > New Table.

  3. Create your table

    1. To create your table from scratch, select Empty Table from the Create table from dropdown.

    2. Name your table FeatureCounts

    3. Uncheck the box next to Include reference, to exclude reference data from your table.

    4. Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.

    5. Copy the schema text below and paste it in into the text box to create your schema.

    {
      "Fields": [
        {
          "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
          "Name": "TranscriptID",
          "Type": "STRING",
          "Mode": "REQUIRED",
          "Description": null,
          "DataResolver": null,
          "SubBluebaseFields": []
        },
        {
          "NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
          "Name": "ExpressionCount",
          "Type": "INTEGER",
          "Mode": "REQUIRED",
          "Description": null,
          "DataResolver": null,
          "SubBluebaseFields": []
        }
      ]
    }
  4. Click the Save button

Upload data to load into your table

  1. Upload the sampleX.final.count.tsv file containing the final counts (a CLI alternative is shown after these steps).

    1. Select Data tab (1) from the left menu.

    2. Click on the grey box (2) to choose the file to upload or drag and drop the sampleX.final.count.tsv into the grey box

    3. Refresh the screen (3)

    4. The uploaded file (4) will appear on the data page after successful upload.
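
Alternatively, if you have the icav2 CLI installed and configured for this project, you can upload the file from the command line (this assumes sampleX.final.count.tsv is in your current working directory and the project context has been entered):

icav2 projectdata upload sampleX.final.count.tsv /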

Create a schedule to load data into your table

Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules will check for new files every 24 hours.

In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.

  1. Go to Projects > your_project > Base > Schedule and click the + Add New button.

  2. Select the option to load the contents from files into a table.

  3. Create your schedule.

    1. Name your schedule LoadFeatureCounts

    2. Choose Project as the source of data for your table.

    3. To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.

    4. Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.

    5. Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.

    6. The .final.count.tsv files that will be loaded into your table are tab-delimited (TSV) and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:

      • Data format: TSV

      • Delimiter: Tab

      • Header rows to skip: 0

    7. Click the Save button

  4. Highlight your schedule. Click the Run button to run your schedule now.

  • It will take a short time to prepare and load data into your table.

    1. Check the status of your job on your Projects > your_project > Activity page.

    2. Click the BASE JOBS tab to view the status of scheduled Base jobs.

    3. Click BASE ACTIVITY to view Base activity.

  5. Check the data in the table.

    1. Go back to your Projects > your_project > Base > Tables page.

    2. Double-click your table to view its details.

    3. You will land on the SCHEMA DEFINITION page.

    4. Click the PREVIEW tab to view the records that were loaded into your table.

    5. Click the DATA tab, to view a list of the files whose data has been loaded into your table.

Query a table

To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.

In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.

  1. Go to your Projects > your_project > Base > Query page.

SELECT TranscriptID,ExpressionCount FROM FeatureCounts WHERE ExpressionCount > 100;
  2. Paste the above query into the NEW QUERY text box

  3. Click the Run Query button to run your query

  4. View your query results.

  5. Save your query for future use by clicking the Save Query button. You will be asked to "Name" the query before clicking on the "Create" button.

Export table data

Find the table you want to export on the "Tables" page under BASE. Go to the table details page by double-clicking the table you want to export.

Click on the Export As File icon and complete the required fields

  1. Name: Name of the exported file

  2. Data Format: A table can be exported in CSV and JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.

    • CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.

    • JSON Format: Selecting JSON format exports the table in a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.

  3. Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide a "Maximum file size (in bytes)" for the exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.

CWL CLI Workflow

In this tutorial, we will demonstrate how to create and launch a pipeline using the CWL language using the ICA command line interface (CLI).

Installation

Tutorial project

In this project, we will create two simple tools and build a workflow that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) will convert a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) will count the number of lines in an input FASTA file. The workflow (workflow.cwl) will combine the two tools to convert an input FASTQ file to a FASTA file and count the number of lines in the resulting FASTA file.

tool-fqTOfa.cwl

#!/usr/bin/env cwltool

cwlVersion: v1.0
class: CommandLineTool
inputs:
  inputFastq:
    type: File
    inputBinding:
        position: 1
stdout: test.fasta
outputs:
  outputFasta:
    type: File
    streamable: true
    outputBinding:
        glob: test.fasta

arguments:
- 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}'
baseCommand:
- awk

tool-countLines.cwl

#!/usr/bin/env cwltool

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  inputFasta:
    type: File
    inputBinding:
        position: 1
stdout: lineCount.tsv
outputs:
  outputCount:
    type: File
    streamable: true
    outputBinding:
        glob: lineCount.tsv

workflow.cwl

cwlVersion: v1.0
class: Workflow
inputs:
  ipFQ: File

outputs:
  count_out:
    type: File
    outputSource: count/outputCount
  fqTOfaOut:
    type: File
    outputSource: convert/outputFasta
   
steps:
  convert:
    run: tool-fqTOfa.cwl
    in:
      inputFastq: ipFQ
    out: [outputFasta]

  count:
    run: tool-countLines.cwl
    in:
      inputFasta: convert/outputFasta
    out: [outputCount]

[!IMPORTANT] Please note that we don't specify the Docker image used in both tools. In such a case, the default behaviour is to use the public.ecr.aws/docker/library/bash:5 image. This image contains basic functionality (sufficient to execute the wc and awk commands).

In case you want to use a different public image, you can specify it using the requirements tag in the CWL file. Assuming you want to use ubuntu:latest, you need to add:

requirements:
  - class: DockerRequirement
    dockerPull: ubuntu:latest

In case you want to use a Docker image from the ICA Docker repository, you would need the link to AWS ECR from ICA GUI. Double-click on the image name in the Docker repository and copy the URL to the clipboard. Add the URL to dockerPull key.

requirements:
  - class: DockerRequirement
    dockerPull: 079623148045.dkr.ecr.eu-central-1.amazonaws.com/cp-prod/XXXXXXXXXX:latest

Authentication

Enter/Create a Project

You can create a project or use an existing project for creating a new pipeline. You can create a new project using the "icav2 projects create" command.

% icav2 projects create basic-cli-tutorial --region c39b1feb-3e94-4440-805e-45e0c76462bf

If you do not provide the "--region" flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the "icav2 regions list" command first.

You can select the project to work on by entering the project using the "icav2 projects enter" command. Thus, you won't need to specify the project as an argument.

% icav2 projects enter basic-cli-tutorial

You can also use the "icav2 projects list" command to determine the names and ids of the projects you have access to.

% icav2 projects list

Create a pipeline on ICA

"projectpipelines" is the root command to perform actions on pipelines in a project. "create" command creates a pipeline in the current project.

The parameter file specifies the input for the workflow with additional parameter settings for each step in the workflow. In this tutorial, the input is a FASTQ file, specified inside the <dataInput> tag in the parameter file. There aren't any specific settings for the workflow steps, resulting in a parameter file with an empty <steps> tag. Create a parameter file (parameters.xml) with the following content using a text editor.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
    <pd:dataInputs>
        <pd:dataInput code="ipFQ" format="FASTQ" type="FILE" required="true" multiValue="false">
            <pd:label>ipFQ</pd:label>
            <pd:description></pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

The following command creates a pipeline called "cli-tutorial" using the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml", with small storage size.

% icav2 projectpipelines create cwl cli-tutorial --workflow workflow.cwl --tool tool-fqTOfa.cwl --tool tool-countLines.cwl --parameter parameters.xml --storage-size small --description "cli tutorial pipeline"

Once the pipeline is created, you can view it using the "list" command.

% icav2 projectpipelines list
ID                                   	CODE                      	DESCRIPTION                                      
6779fa3b-e2bc-42cb-8396-32acee8b6338	cli-tutorial             	cli tutorial pipeline 

Running the pipeline

To test the pipeline, create a small FASTQ file named test.fastq with the following content:

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

The "icav2 projectdata upload" command lets you upload data to ica.

% icav2 projectdata upload test.fastq /
oldFilename= test.fastq en newFilename= test.fastq
bucket= stratus-gds-use1  prefix= 0a488bb2-578b-404a-e09d-08d9e3343b2b/test.fastq
Using: 1 workers to upload 1 files
15:23:32: [0]  Uploading /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq
15:23:33: [0]  Uploaded /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq to /test.fastq in 794.511591ms
Finished uploading 1 files in 795.244677ms

The "list" command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.

% icav2 projectdata list                
PATH          NAME        TYPE  STATUS    ID                                    OWNER                                 
/test.fastq  test.fastq FILE  AVAILABLE fil.c23246bd7692499724fe08da020b1014  4b197387-e692-4a78-9304-c7f73ad75e44

The "icav2 projectpipelines start" command initiates the pipeline run. The following command runs the pipeline. Note the id for exploring the analysis later.

Note: If for some reason your "create" command fails and needs to rerun, you might get an error (ConstraintViolationException). If so, try your command with a different name.

% icav2 projectpipelines start cwl cli-tutorial --type-input STRUCTURED --input ipFQ:fil.c23246bd7692499724fe08da020b1014 --user-reference tut-test
analysisStorage.description           1.2 TB
analysisStorage.id                    6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name                  Small
analysisStorage.ownerId               8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId              f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName            ica-cp-admin
analysisStorage.timeCreated           2021-11-05T10:28:20Z
analysisStorage.timeModified          2021-11-05T10:28:20Z
id                                    461d3924-52a8-45ef-ab62-8b2a29621021
ownerId                               7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description  1.2 TB
pipeline.analysisStorage.id           6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name         Small
pipeline.analysisStorage.ownerId      8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId     f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName   ica-cp-admin
pipeline.analysisStorage.timeCreated  2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code                         cli-tutorial
pipeline.description                  Test, prepared parameters file from working GUI
pipeline.id                           6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language                     CWL
pipeline.ownerId                      7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId                     d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName                   icav2-entprod
pipeline.timeCreated                  2022-03-10T13:13:05Z
pipeline.timeModified                 2022-03-10T13:13:05Z
reference                             tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
status                                REQUESTED
summary                               
tenantId                              d0696494-6a7b-4c81-804d-87bda2d47279
tenantName                            icav2-entprod
timeCreated                           2022-03-10T20:42:42Z
timeModified                          2022-03-10T20:42:43Z
userReference                         tut-test

You can check the status of the run using the "icav2 projectanalyses get" command.

%   icav2 projectanalyses get 461d3924-52a8-45ef-ab62-8b2a29621021
analysisStorage.description           1.2 TB
analysisStorage.id                    6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name                  Small
analysisStorage.ownerId               8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId              f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName            ica-cp-admin
analysisStorage.timeCreated           2021-11-05T10:28:20Z
analysisStorage.timeModified          2021-11-05T10:28:20Z
endDate                               2022-03-10T21:00:33Z
id                                    461d3924-52a8-45ef-ab62-8b2a29621021
ownerId                               7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description  1.2 TB
pipeline.analysisStorage.id           6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name         Small
pipeline.analysisStorage.ownerId      8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId     f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName   ica-cp-admin
pipeline.analysisStorage.timeCreated  2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code                         cli-tutorial
pipeline.description                  Test, prepared parameters file from working GUI
pipeline.id                           6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language                     CWL
pipeline.ownerId                      7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId                     d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName                   icav2-entprod
pipeline.timeCreated                  2022-03-10T13:13:05Z
pipeline.timeModified                 2022-03-10T13:13:05Z
reference                             tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
startDate                             2022-03-10T20:42:42Z
status                                SUCCEEDED
summary                               
tenantId                              d0696494-6a7b-4c81-804d-87bda2d47279
tenantName                            icav2-entprod
timeCreated                           2022-03-10T20:42:42Z
timeModified                          2022-03-10T21:00:33Z
userReference                         tut-test

The pipelines can be run using JSON input type as well. The following is an example of running pipelines using JSON input type. Note that JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).

 % icav2 projectpipelines start cwl cli-tutorial --data-id fil.c23246bd7692499724fe08da020b1014 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  }
}' --type-input JSON --user-reference tut-test-json

Notes

runtime.ram and runtime.cpu

runtime.ram and runtime.cpu values are by default evaluated using the compute environment running the host CWL runner. CommandLineTool steps within a CWL Workflow run on different compute environments than the host CWL runner, so the values of runtime.ram and runtime.cpu evaluated within the CommandLineTool will not match the runtime environment the tool is actually running in. The valuation of runtime.ram and runtime.cpu can be overridden by specifying cpuMin and ramMin in the ResourceRequirements for the CommandLineTool.

Nextflow CLI Workflow

In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).

Installation

Tutorial project

The 'main.nf' file defines the workflow that orchestrates various RNASeq analysis processes.

main.nf

nextflow.enable.dsl = 2

process INDEX {
   input:
       path transcriptome_file

   output:
       path 'salmon_index'

   script:
       """
       salmon index -t $transcriptome_file -i salmon_index
       """
}

process QUANTIFICATION {
   publishDir 'out', mode: 'symlink'

   input:
       path salmon_index
       tuple path(read1), path(read2)
       val(quant)

   output:
       path "$quant"

   script:
       """
       salmon quant --libType=U -i $salmon_index -1 $read1 -2 $read2 -o $quant
       """
}

process FASTQC {

   input:
       tuple path(read1), path(read2)

   output:
       path "fastqc_logs"

   script:
       """
       mkdir fastqc_logs
       fastqc -o fastqc_logs -f fastq -q ${read1} ${read2}
       """
}

process MULTIQC {
   publishDir 'out', mode:'symlink'

   input:
       path '*'

   output:
       path 'multiqc_report.html'

   script:
       """
       multiqc .
       """
}

workflow {
   index_ch = INDEX(Channel.fromPath(params.transcriptome_file))
   quant_ch = QUANTIFICATION(index_ch, Channel.of([file(params.read1), file(params.read2)]),Channel.of("quant"))
   fastqc_ch = FASTQC(Channel.of([file(params.read1), file(params.read2)]))
   MULTIQC(quant_ch.mix(fastqc_ch).collect())
}

The script uses the following tools:

  1. Salmon: Software tool for quantification of transcript abundance from RNA-seq data.

  2. FastQC: QC tool for sequencing data

  3. MultiQC: Tool to aggregate and summarize QC reports

Docker image upload

Pull the container image used by this workflow:

docker pull nextflow/rnaseq-nf

Create a tarball of the image to upload to ICA.

docker save nextflow/rnaseq-nf > cont_rnaseq.tar

The following commands can be used to upload the tarball to your project.

# Enter the project context
icav2 projects enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /

Add the image to the ICA Docker repository

The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).

Change the format for the image tarball to DOCKER:

  1. Navigate to Projects > <your_project> Data

  2. Check the checkbox for the uploaded tarball

  3. Click on "Manage" dropdown

  4. Click on "Change format" In the new popup window, select "DOCKER" format and hit save.

To add this image to the ICA Docker repository, first click on "All Projects" to go back to the home page.

  1. From the ICA home page, click on the "Docker Repository" page under "System Settings"

  2. Click the "+ New" button to open the "New Docker Image" window.

  3. In the new window, click on the "Select a file with DOCKER format"

This will open a new window that lets you select the above tarball.

  1. Select the region (US, EU, CA) your project is in.

  2. Select your project. You can start typing the name in the textbox to filter it.

  3. The bottom pane will show the "Data" section of the selected project. If you have the docker image in subfolders, browse the folders to locate the file. Once found, click on the checkbox corresponding to the image and press "Select".

You will be taken back to the "New Docker image" window. The "Data" and "Name" fields will have been populated based on the imported image. You can edit the "Name" field to rename it. For this tutorial, we will change the name to "rnaseq". Select the region, and give it a version number, and description. Click on "Save".

If you have images hosted in other repositories, you can add them as external images by clicking the "+ New external image" button and completing the form as shown in the example below.

After creating a new docker image, you can double click on the image to get the container URL for the nextflow configuration file.

Nextflow configuration file

Create a configuration file called "nextflow.config" in the same folder as the main.nf file above. Use the URL copied above to add the process.container line in the config file.

process.container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
process {
    container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value: 'standard-small'
    ]  
}

Parameters file

An empty form looks as follows:

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
   <dataInputs>
   </dataInputs>
   <steps>
   </steps>
</pipeline>

The input files are specified within a single dataInputs node, with each individual input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers.

For this tutorial, there are no settings parameters, but the pipeline requires multiple file inputs. The parameters.xml file looks as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
   <pd:dataInputs>
       <pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 1</pd:label>
           <pd:description>FASTQ Read 1</pd:description>
       </pd:dataInput>
       <pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
           <pd:label>FASTQ Read 2</pd:label>
           <pd:description>FASTQ Read 2</pd:description>
       </pd:dataInput>
       <pd:dataInput code="transcriptome_file" format="FASTA" type="FILE" required="true" multiValue="false">
           <pd:label>Transcript</pd:label>
            <pd:description>Transcript FASTA</pd:description>
       </pd:dataInput>
   </pd:dataInputs>
   <pd:steps/>
</pd:pipeline>

Use the following commands to create the pipeline with the above workflow in your project.

If not already in the project context, enter it by using the following command:

icav2 enter <PROJECT NAME or ID>

Create the pipeline using icav2 projectpipelines create nextflow. Example:

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --storage-size small --description 'cli nextflow pipeline'

If you prefer to organize the processes in different folders/files, you can use the --other parameter to upload the different processes as additional files. Example:

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf --other fastqc.nf:filename=processes/fastqc.nf --other multiqc.nf:filename=processes/multiqc.nf --storage-size small --description 'cli nextflow pipeline'

Example command to run the pipeline from CLI:

icav2 projectpipelines start nextflow <pipeline_id> --input read1:<read1_file_id> --input read2:<read2_file_id> --input transcriptome_file:<transcriptome_file_id> --storage-size small --user-reference demo_run

You can get the pipeline id under the "ID" column by running the following command:

icav2 projectpipelines list

You can get the file ids under the "ID" column by running the following commands:

icav2 projectdata list

Additional Resources:

nf-core Pipelines

Introduction

This tutorial shows you how to:

  • Import any nf-core pipeline from their public repository.

  • Run the pipeline in Bench and monitor the execution.

  • Deploy the pipeline as an ICA Flow pipeline.

  • Launch a Flow validation test from Bench.

Preparation

  • Start Bench workspace

    • For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:

      • If using a cluster, choose standard-small or standard-medium for the workspace master node

      • Otherwise, choose at least standard-large as nf-core pipelines often need more than 4 cores to run.

    • Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines

    • Specify at least 100GB of disk space

  • Optional: After choosing the image, enable a cluster with at least one standard-large instance type

  • Start the workspace, then (if applicable) start the cluster

Import nf-core Pipeline to Bench

mkdir demo
cd demo
pipeline-dev import-from-nextflow nf-core/demo

If conda and/or nextflow are not installed, pipeline-dev will offer to install them.

The Nextflow files are pulled into the nextflow-src subfolder.

A larger example that still runs quickly is nf-core/sarek

Result

/data/demo $ pipeline-dev import-from-nextflow nf-core/demo

Creating output folder nf-core/demo
Fetching project nf-core/demo

Fetching project info
project name: nf-core/demo
repository  : https://github.com/nf-core/demo
local path  : /data/.nextflow/assets/nf-core/demo
main script : main.nf
description : An nf-core demo pipeline
author      : Christopher Hakkaart

Pipeline “nf-core/demo” successfully imported into nf-core/demo.

Suggested actions:
  cd nf-core/demo
  pipeline-dev run-in-bench
  [ Iterative dev: Make code changes + re-validate with previous command ]
  pipeline-dev deploy-as-flow-pipeline
  pipeline-dev launch-validation-in-flow

Run Validation Test in Bench

All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.

The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.

cd nf-core/demo
pipeline-dev run-in-bench

The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copied and adjusted if you need additional options.

Result

Monitoring

When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps

When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:

  • qstat to see which tasks are pending or running

  • tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)

Data Locations

  • The output of the pipeline is in the outdir folder

  • Nextflow work files are under the work folder

  • Log files are .nextflow.log* and output.log

Deploy as Flow Pipeline

pipeline-dev deploy-as-flow-pipeline

After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here). It then asks if you want to update the latest version or create a new one.

Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.

Choice: 3
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config

At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.

Run Validation Test in Flow

pipeline-dev launch-validation-in-flow

This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.

Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.

Hints

Using older versions of Nextflow

Some older nf-core pipelines still use DSL1, which only works up to Nextflow 22.

An easy solution is to create a conda environment for nextflow 22:

conda create -n nextflow22
 
# If, like me, you never ran "conda init", do it now:
conda init
bash -l # To load the conda's bashrc changes
 
conda activate nextflow22
conda install -y -c bioconda -c conda-forge nextflow=22
 
# Check
nextflow -version
 
# Then use the pipeline-dev tools as in the demo

Data Transfer Options

ICA Connector

The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking). The connectors are helpful when you want to sync data between ICA and your local computer or link data between projects in ICA.

ICA CLI

The ICA CLI upload/download is useful for handling large files and folders, especially when you are working on a remote server that you connect to from your local computer. You can use icav2 projects enter <project-name/id> to set the project context for the CLI to use for subsequent commands. If the project context is not set, you can supply the additional parameter --project-id <project-id> to specify the project for the command.

Upload Data

Download Data

Note: Because of how S3 manages storage, it doesn't have a concept of folders in the traditional sense. So, if you provide the "folder" ID of an empty "folder", you will not see anything downloaded.

ICA API

Example

In the example above, we're creating a partial file named 'tempFile.txt' within a project identified by the project ID '41d3643a-5fd2-4ae3-b7cf-b89b892228be', inside a folder with the folder ID 'fol.579eda846f1b4f6e2d1e08db91408069'. You can find project, file, or folder IDs either by logging into the ICA web interface or by using the ICA CLI.

The response will look like this:

Retrieve the data/file ID from the response (for instance: fil.b13c782a67e24d364e0f08db9f537987) and use the following format for the POST request - /api/projects/{projectId}/data/{dataId}:createUploadUrl:

The response will look like this:

Use the URL from the response to upload a file (tempFile.txt) as follows:

AWS CLI

Generate temporary credentials

Example cli to generate temporary credentials:

If you are trying to upload data to the /cli-upload/ folder, you can get the temporary credentials to access the folder using icav2 projectdata temporarycredentials /cli-upload/. It will produce the following output with the accessKey, secretKey, and sessionToken that you will need to configure the AWS CLI to access this folder.

Copy the awsTempCredentials.accessKey, awsTempCredentials.secretKey, and awsTempCredentials.sessionToken values to build the credentials file ~/.aws/credentials. It should look something like the following.

Example format for credentials file:

The temporary credentials expire in 36 hours. If the temporary credentials expire before the copy is complete, you can use the AWS sync command to resume from where it left off.

The following AWS commands demonstrate typical usage. The remote path in the commands below is constructed from the output of the temporarycredentials command in this format: s3://<awsTempCredentials.bucket>/<awsTempCredentials.objectPrefix>

Example AWS commands

You can also write scripts to monitor the progress of your copy operation and regenerate and refresh the temporary credentials before they expire.
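Below is a minimal sketch of such a refresh script in Python. It assumes icav2 is on the PATH, that the -o json output format is available, and that the JSON output mirrors the awsTempCredentials fields shown above; adjust the parsing to the actual output of your CLI version.

import configparser
import json
import pathlib
import subprocess

def refresh_credentials(remote_path="/cli-upload/", profile="ica-temp"):
    # Ask icav2 for fresh temporary credentials (assumes "-o json" output is supported)
    out = subprocess.run(
        ["icav2", "projectdata", "temporarycredentials", remote_path, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    creds = json.loads(out).get("awsTempCredentials", {})  # field name is an assumption

    # Rewrite the matching profile in ~/.aws/credentials
    cred_file = pathlib.Path.home() / ".aws" / "credentials"
    cred_file.parent.mkdir(exist_ok=True)
    cfg = configparser.ConfigParser()
    cfg.read(cred_file)
    cfg[profile] = {
        "aws_access_key_id": creds.get("accessKey", ""),
        "aws_secret_access_key": creds.get("secretKey", ""),
        "aws_session_token": creds.get("sessionToken", ""),
    }
    with open(cred_file, "w") as fh:
        cfg.write(fh)

if __name__ == "__main__":
    # Run this periodically (for example from cron) well before the 36-hour expiry.
    refresh_credentials()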

rclone

You may also use Rclone for data transfer if you prefer. The steps to generate temporary credentials are the same as above. You can run rclone config to set the keys and tokens to configure rclone with the temporary credentials. You will need to select the advanced edit option to enter the session token. After completing the config, your configuration file (~/.config/rclone/rclone.conf) should look like this:

Example Rclone commands

Pipeline Chaining on AWS

There are several ways to connect pipelines in ICAv2. One of them is to use the Simple Notification Service (SNS) and a Lambda function deployed on AWS. Once the initial pipeline is completed, SNS triggers the Lambda function. The Lambda function extracts information from the event parameter and makes an API call to start the subsequent pipeline.

SNS

In the cross-account policy, arn is the Amazon Resource Name (ARN) of the target SNS topic. Once the SNS topic is created in AWS, you can create a new ICA subscription in Projects > your_project > Project Settings > Notifications > New ICA Subscription. The following screenshot displays the settings of a subscription for Analysis success of a pipeline with a name starting with Hello.

ICA API endpoints

Starting a Nextflow pipeline using the API

To start a Nextflow pipeline using the API, use the endpoint /api/projects/{projectId}/analysis:nextflow. Provide the projectId and the request body in JSON format containing userReference, pipelineId, analysisInput, etc. Two parameters, activationCodeDetailId and analysisStorageId, have to be retrieved using the API endpoint /api/activationCodes:findBestMatchingForNextflow from the Entitlement Detail section in Swagger. For example:

Output of the API call:

In this particular case, the activationCodeDetailId is "6375eb43-e865-4d7c-a9e2-2c153c998a5c" and analysisStorageId is "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0" (for resource type "Small").

Once you have all these parameters, you can start the pipeline using API.
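For illustration, here is a minimal Python sketch of this API call. The endpoint, headers, and the activationCodeDetailId/analysisStorageId values come from the steps above; the analysisInput layout and the placeholder IDs are assumptions, so verify the exact request schema in Swagger.

import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "XXXXXXXXXXXXXXXX"                                 # your API key
project_id = "41d3643a-5fd2-4ae3-b7cf-b89b892228be"          # example project id from above

body = {
    "userReference": "demo_api_run",
    "pipelineId": "<pipeline_id>",                           # placeholder
    "activationCodeDetailId": "6375eb43-e865-4d7c-a9e2-2c153c998a5c",
    "analysisStorageId": "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0",
    # The analysisInput layout below is an assumption; check Swagger for the exact schema.
    "analysisInput": {
        "inputs": [
            {"parameterCode": "read1", "dataIds": ["<file_id>"]},
        ]
    },
}

resp = requests.post(
    f"{ICA_BASE}/api/projects/{project_id}/analysis:nextflow",
    headers={
        "X-API-Key": API_KEY,
        "accept": "application/vnd.illumina.v3+json",
        "Content-Type": "application/vnd.illumina.v3+json",
    },
    json=body,
)
resp.raise_for_status()
print(resp.status_code, resp.json())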

Setup of Lambda function

Next, create a new Lambda function in the AWS Management Console. Choose Author from scratch and select Python 3.7 (which includes the requests library) as the runtime. In the Function code section, write the code for the Lambda function that parses the SNS event and executes the ICA API calls. Add the SNS topic created above as a trigger.

Example

Here is an example of Python code that checks whether a file named 'test.txt' exists in the output of the successful pipeline. If the file exists, a new API call is made to invoke the second pipeline with this 'test.txt' as an input.
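The example code itself is not reproduced in this export, so the following is a hedged sketch of such a handler. The SNS envelope (Records[0].Sns.Message) is standard AWS; the keys inside the ICA event payload, the data-listing query parameters, and the second pipeline's input code are assumptions to verify against your own subscription payload and the API reference.

import json
import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"
API_KEY = "XXXXXXXX"
HEADERS = {"X-API-Key": API_KEY, "accept": "application/vnd.illumina.v3+json"}

def lambda_handler(event, context):
    # The SNS trigger wraps the ICA event in Records[0].Sns.Message
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    project_id = message["projectId"]          # assumption: key name in the ICA payload

    # Look for test.txt among the project data (the filename filter is an assumption)
    listing = requests.get(
        f"{ICA_BASE}/api/projects/{project_id}/data",
        headers=HEADERS,
        params={"filename": "test.txt"},
    ).json()
    items = listing.get("items", [])           # assumption: list is returned under "items"
    if not items:
        return {"started": False}

    file_id = items[0]["data"]["id"]
    body = {
        "userReference": "chained_run",
        "pipelineId": "<second_pipeline_id>",                  # placeholders
        "activationCodeDetailId": "<activation_code_detail_id>",
        "analysisStorageId": "<analysis_storage_id>",
        "analysisInput": {"inputs": [{"parameterCode": "in_file", "dataIds": [file_id]}]},
    }
    resp = requests.post(
        f"{ICA_BASE}/api/projects/{project_id}/analysis:nextflow",
        headers={**HEADERS, "Content-Type": "application/vnd.illumina.v3+json"},
        json=body,
    )
    return {"started": True, "status": resp.status_code}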

Bench ICA Python Library

This tutorial demonstrates how to use the ICA Python library packaged with the JupyterLab image for Bench Workspaces.

The tutorial will show how authentication to the ICA API works and how to search, upload, download and delete data from a project into a Bench Workspace. The python code snippets are written for compatibility with a Jupyter Notebook.

Python modules

Navigate to Bench > Workspaces and click Enable to enable workspaces. Select +New Workspace to create a new workspace. Fill in the required details and select JupyterLab for the Docker image. Click Save and Start to open the workspace. The following snippets of code can be pasted into the workspace you've created.

This snippet defines the required python modules for this tutorial:

Authentication

This snippet shows how to authenticate using the following methods:

  • ICA Username & Password

  • ICA API Token
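The packaged notebook snippet is not reproduced in this export. As a stand-in, here is a minimal sketch that talks to the REST API directly with requests; the /api/tokens path and the Bearer usage of the returned JWT are assumptions based on the API section later in this guide.

import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"

# Option 1: exchange username/password (Basic auth) for a JWT
resp = requests.post(
    f"{ICA_BASE}/api/tokens",
    auth=("my.user@example.com", "my-password"),
    headers={"accept": "application/vnd.illumina.v3+json"},
    # params={"tenant": "my-domain"},  # needed if your user has access to multiple domains
)
jwt = resp.json()["token"]
auth_headers = {"Authorization": f"Bearer {jwt}"}

# Option 2: use an API key directly on every request
auth_headers = {"X-API-Key": "XXXXXXXX"}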

Data Operations

These snippets show how to manage data in a project. Operations shown are:

  • Create a Project Data API client instance

  • List all data in a project

  • Create a data element in a project

  • Upload a file to a data element in a project

  • Download a data element from a project

  • Search for matching data elements in a project

  • Delete matching data elements in a project
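The per-operation notebook snippets below are likewise not reproduced in this export. As a stand-in, the following hedged sketch performs several of the listed operations against the REST API directly: the create and upload calls mirror the curl examples elsewhere in this guide, while the :createDownloadUrl and :delete actions and the response field names are assumptions.

import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"
HEADERS = {
    "X-API-Key": "XXXXXXXX",
    "accept": "application/vnd.illumina.v3+json",
    "Content-Type": "application/vnd.illumina.v3+json",
}
project_id = "<project_id>"   # placeholder

# List data in the project
listing = requests.get(f"{ICA_BASE}/api/projects/{project_id}/data", headers=HEADERS).json()

# Create a (partial) file entry, then upload content to it via a pre-signed URL
created = requests.post(
    f"{ICA_BASE}/api/projects/{project_id}/data",
    headers=HEADERS,
    json={"name": "tempFile.txt", "dataType": "FILE"},
).json()
data_id = created["data"]["id"]

upload_url = requests.post(
    f"{ICA_BASE}/api/projects/{project_id}/data/{data_id}:createUploadUrl",
    headers=HEADERS,
).json()["url"]
with open("tempFile.txt", "rb") as fh:
    requests.put(upload_url, data=fh)

# Download: request a pre-signed URL (endpoint name is an assumption), then fetch it
download_url = requests.post(
    f"{ICA_BASE}/api/projects/{project_id}/data/{data_id}:createDownloadUrl",
    headers=HEADERS,
).json()["url"]
with open("tempFile_copy.txt", "wb") as fh:
    fh.write(requests.get(download_url).content)

# Delete the data element (assumption: a ":delete" action exists for project data)
requests.post(f"{ICA_BASE}/api/projects/{project_id}/data/{data_id}:delete", headers=HEADERS)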

List Data

Create Data

Upload Data

Download Data

Search for Data

Delete Data

Base Operations

These snippets show how to get a connection to a base database and run an example query. Operations shown are:

  • Create a python jdbc connection

  • Create a table

  • Insert data into a table

  • Query the table

  • Delete the table

This snippet defines the required python modules for this tutorial:
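The notebook snippets for this section are not reproduced in this export. As a stand-in, the following hedged sketch covers the listed operations in one block, under stated assumptions: Base is backed by Snowflake, the "Get Base Access Credentials" step yields an account identifier, an OAuth-style access token, and database/warehouse names, and the snowflake-connector-python package (rather than a JDBC bridge) is available in the workspace. Adjust the connection fields to match the credentials you actually receive.

import snowflake.connector  # assumption: snowflake-connector-python is installed in the workspace

# Placeholder values: fill these in from the Base access credentials step.
conn = snowflake.connector.connect(
    account="<base_account>",
    authenticator="oauth",
    token="<base_access_token>",
    database="<base_database>",
    warehouse="<base_warehouse>",
)
cur = conn.cursor()

# Create a table, insert a record, query it, then drop it.
cur.execute("CREATE TABLE IF NOT EXISTS demo_counts (gene STRING, read_count INTEGER)")
cur.execute("INSERT INTO demo_counts VALUES ('BRCA1', 42)")
cur.execute("SELECT gene, read_count FROM demo_counts")
for gene, read_count in cur.fetchall():
    print(gene, read_count)
cur.execute("DROP TABLE demo_counts")

cur.close()
conn.close()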

Get Base Access Credentials

Create a Table

Add Table Record

Query Table

Delete Table

Launch Pipelines on CLI

Launch a Prepackaged DRAGEN Pipeline

Prerequisite - Launch a CWL or Nextflow pipeline to completion using the ICA GUI with the intended set of parameters.

Configure CLI and Identify Pipeline ID

Configure and Authenticate ICA command line interface (CLI).

Obtain a list of your projects with their associated IDs:

Use the ID of the project from the previous step to enter the project context:

Find the pipeline you want to start from the CLI by obtaining a list of pipelines associated with your project:

Find the ID associated with your pipeline of interest.

Identify Input File Parameters

To find the input file parameters, you can inspect a previously launched analysis with the projectanalyses input command.

Find the previous analyses launched along with their associated IDs:

List the analyses inputs by using the ID found in the previous step:

This will return the Input File Codes, as well as the file names and data IDs of the data used to previously launch the pipeline.

Identify Configuration Settings

Currently, this step for CWL requires the use of the ICA API to access the configuration settings of a project analysis that ran successfully. It is optional for Nextflow, since the XML configuration file can be accessed in the ICA GUI.

Nextflow XML file parameters

Click the previous GUI run, and select the pipeline that was run. On the pipeline page, select the XML Configuration Tab to view the configuration settings.

In the "steps" section of the XML file, you will find various steps labeled with

and subsequent labels of parameters with a similar structure

The code value should be used to generate the corresponding command line parameters, e.g.

--parameters enable_map_align:true

API based Configuration Settings (CWL or Nextflow)

  • Generate JWT Token from API Key or Basic login credentials

  • Instructions on how to get an API Key https://illumina.gitbook.io/ica/account-management/am-iam#api-keys

  • If your user has access to multiple domains, you will need to add a "?tenant=($domain)" to the request

  • Response to this request will provide a JWT token {"token":($token)}, use the value of the token in further requests

  • Use the API endpoint /api/projects/{projectID}/analyses/{analysisId}/configurations to retrieve the configuration listing all of the required and optional parameters

The response JSON from this API call lists the configuration items with their names and values.
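A minimal Python sketch of retrieving these configuration items is shown below; the endpoint path is the one given above, while the Bearer authorization header and the "items" field name are assumptions.

import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"
headers = {
    "Authorization": "Bearer <jwt_token>",   # JWT obtained in the previous step
    "accept": "application/vnd.illumina.v3+json",
}

resp = requests.get(
    f"{ICA_BASE}/api/projects/<projectId>/analyses/<analysisId>/configurations",
    headers=headers,
)
resp.raise_for_status()
for item in resp.json().get("items", []):    # field name "items" is an assumption
    print(item)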

Create Launch Command CWL

Structure of the final command

icav2 projectpipelines start cwl $(pipelineID) --user-reference $(userReference), plus input options

  • Input Options - For CLI, the entire input can be broken down as individual command line arguments

To launch the same analysis as in the GUI, use the same file IDs and parameters. If using new data, you can use the CLI command icav2 projectdata list to find new file IDs to launch a new instance of the pipeline. Required information in the input: Input Data and Parameters.

Command Line Arguments

  • This option requires the use of --input-type STRUCTURED along with --input and --parameters

Responses:

Successful Response

Unsuccessful Responses

Pipeline ID not formatted correctly

  • Check that the pipeline ID is correct based on icav2 projectpipelines list

File ID not found

  • Check that the file ID is correct based on icav2 projectdata list

Parameter not found

Create Launch Command Nextflow

When using nextflow to start runs, the input-type parameter is not used, but the --project-id flag is required.

Structure of the final command: icav2 projectpipelines start nextflow $(pipelineID) --user-reference $(userReference), plus input options

  • Response status can be used to determine if the pipeline was submitted successfully

  • status options: REQUESTED, SUCCEEDED, FAILED, ABORTED

Mount projectdata using CLI


Prerequisites

    • For other operating systems, refer to the OS-specific documentation for FUSE driver installation.

Mount projectdata

Identify the project id by running the icav2 projects list command:

Provide the project id shown under the "ID" column to the icav2 projectdata mount command to mount the project data for that project.

Check the content of the mount.

WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.

Unmount project data

You can unmount the project data using the 'unmount' command.

Releases

Find the links to CLI builds in the Releases section below.

Downloading the Installer

Version Check

To determine which CLI version you are currently using, navigate to your currently installed CLI and use the CLI command icav2 version. For help on this command, use icav2 version -h.

Integrity Check

Checksums are provided alongside each downloadable CLI binary to verify file integrity. The checksums are generated using the SHA256 algorithm. To use the checksums:

  1. Download the CLI binary for your OS

  2. Download the corresponding checksum using the links in the table

  3. Calculate the SHA256 checksum of the downloaded CLI binary

  4. Compare the calculated SHA256 checksum with the downloaded checksum. If the checksums match, the integrity is confirmed.

There are a variety of open source tools for calculating the SHA256 checksum. See the tables below for examples.
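As a small cross-platform alternative to the OS-specific tools listed in those tables, the SHA256 checksum can also be calculated with Python's standard library, as sketched below (the file names are placeholders supplied on the command line).

import hashlib
import sys

def sha256sum(path, chunk_size=1 << 20):
    # Stream the file in chunks so large binaries do not need to fit in memory
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

binary, checksum_file = sys.argv[1], sys.argv[2]
expected = open(checksum_file).read().split()[0]
calculated = sha256sum(binary)
print("OK" if calculated == expected else f"MISMATCH: {calculated} != {expected}")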

For CLI v2.3.0 and later:

For CLI v2.2.0

Releases

End-to-End User Flow: DRAGEN Analysis

This tutorial demonstrates major functions of the ICA platform, beginning with setting up a project with instrument run data to prepare to use pipelines, and concluding with viewing pipeline outputs in preparation for eventually ingesting outputs into available modules.

In the following example, we start from an existing ICA project with demultiplexed instrument run data (fastq), use the DRAGEN Germline pipeline, and view output data.

Set-up

Linking an Entitled Bundle

Steps:

  • Go to your project's Details page

  • Click the Edit button

  • Click the + button, under LINKED BUNDLES

  • Click on the DRAGEN Demo Bundle, then click the Link Bundles button

    • There may be multiple versions of DRAGEN Demo Bundles. This tutorial details steps for DRAGEN Demo Bundle 3.9.5. Steps for other versions after 3.9.5 should be similar.

  • Click the Save button

DRAGEN Demo Bundle assets should now be available in your project's Data and Pipelines pages.

Pipelines

After setting up the project in ICA and linking a bundle, we can run various pipelines.

DRAGEN Germline Published Pipeline

This example demonstrates how to run the DRAGEN Germline Published Pipeline (version 3.9.5) in your ICA project using the demo data from the linked DRAGEN Demo Bundle.

The required pipeline input assets for this tutorial include:

  • Under Data page

    • Illumina DRAGEN Germline Demo Data folder

    • Illumina DRAGEN Enrichment Demo Data folder

    • Illumina References folder

  • Under Pipelines page

    • DRAGEN Germline

Launching the DRAGEN Germline Published Pipeline with Demo Data

From the Pipelines page, select DRAGEN Germline 3.9.5, and then click Start Analysis. Initial set-up details require a User Reference (pipeline run name meaningful to the user) and an Entitlement Bundle from the drop-down menu under Pricing.

The DRAGEN Germline pipeline uses the following inputs, which are added in the Input Files section:

  • FASTQ files

    • Select the FASTQ files in the Illumina DRAGEN Enrichment Demo Data folder and select Add.

  • Reference:

    • Select a reference genome from the Illumina References folder (do not select a methyl-converted reference genome for this tutorial)

      • e.g., hg38_altaware_nohla-cnv-anchored.v8.tar (suggested if enabling CNV analysis)

The DRAGEN Germline Settings to be selected are:

  • Enable germline small variant calling: Set to true

  • Enable SV (structural variant) calling: Set to true

    • If true, Enable map align output must also be set to true

  • Enable repeat genotyping: Set to true

  • Enable map align: Set to true

    • When using FASTQ files as input, as in this example, set this to true as default.

    • When using BAM files as input, set to true to realign reads in input BAMs; set to false to keep alignments in input BAM files.

  • Enable CNV calling: Set to true

    • Enabling Copy Number Variant calling requires one of the following:

      • Enable CNV self normalization is set to true

      • A panel of normals (PON) is provided in the Input Files

  • Output format: Set to CRAM

    • Other available options for alignments output are BAM and SAM format.

  • Enable CNV self-normalization: Set to true

    • Required if Enable CNV calling is set to true and no panel of normals (PON) is provided in the Input Files.

  • Enable duplicate marking: Set to true

  • Emit Ref Confidence: Set to GVCF to enable banded gVCF generation for this example

    • To enable base pair resolution in the gVCF, set to BP_RESOLUTION

  • Additional DRAGEN args: Leave Empty

    • Users can provide additional DRAGEN arguments here (see the DRAGEN user guide for examples), but we will leave this blank for this example run.

  • Sample sex: Leave blank

    • Users may specify the sex of the sample here if known, but we will omit this setting for this example run.

  • Enable HLA: Set to true to enable HLA typing

  • Enable map align output: Set to true

    • The format for alignment output was selected previously in the "Output format setting" above

  • Resources

    • Use the default resources settings:

      • Storage size: Set to small

      • FPGA Medium Tier: Set to Standard

      • FPGA Medium Resources: Set to FPGA Medium

Once all parameters have been set, click Start analysis.

Monitoring analysis run status

Click on the run to view more information about it. The various tabs under a given run provide additional context regarding the status of the completed run.

If you encounter a failed run, you can find more information in the Projects > your_project > Flow > Analyses > your_analysis > Details tab and on the execution report tab.

Analysis run logs can be found on the Steps tab. Use the sliders next to Stderr and Stdout for more details. Check the box next to "Show technical steps" to view additional log files.

Viewing DRAGEN analysis output

DRAGEN analysis output folders are found on the project's Data page, along with all other data loaded into the project (such as assets from a linked entitled bundle). Analysis output is grouped into folders, so users can click through the folder structure to explore outputs.

Related Resources

  • DRAGEN Support Site: https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform.html

  • ICA Pricing: https://help.ica.illumina.com/reference/r-pricing

Service Connector

ICA provides a Service Connector, which is a small program that runs on your local machine to sync data between the platform's cloud-hosted data store and your local computer or server. The Service Connector securely uploads data or downloads results using TLS 1.2. In order to do this, the Connector makes 2 connections:

  • A control connection, which the Connector uses to get configuration information from the platform, and to update the platform about its activities

  • A connection towards the storage node, used to transfer the actual data between your local or network storage and your cloud-based ICA storage.

This Connector runs in the background, and configuration is done in the Illumina Connected Analytics (ICA) platform, where you can add upload and download rules to meet the requirements of the current project and any new projects you may create.

The Service Connector looks at any new files and checks their size. As long as the file size is changing, it knows data is still being added to the file and it is not ready for transfer. Only when the file size is stable and does not change anymore will it consider the file to be complete and initiate transfer. Despite this, it is still best practice to not connect the Service Connector to active folders which are used as streaming output for other processes as this can result in incomplete files being transferred when the active processes have extended compute periods in which the file size remains unchanged.

The service connector handles integrity checking during file transfer, which requires the calculation of hashes on the data. In addition, transmission speed depends on the available data transfer bandwidth and connection stability. For these reasons, uploading large amounts of data can take considerable time. This can in turn result in temporarily seeing empty folders at the destination location, since these are created at the beginning of the transfer process.

Both the CLI and the service connector require x86 architecture. For ARM-based architecture on Mac or Windows, you need to keep x86 emulation enabled. Linux does not support x86 emulation.

Creating a New Connector

  1. Select Projects > your_project > Project Settings > Connectivity > Service Connectors.

  2. Select + Create.

  3. Fill out the fields in the New Connector configuration page.

    • Name - Enter the name of the connector.

    • Status - This is automatically updated with the actual status, you do not need to enter anything here.

    • Debug Information Accessible by Illumina (optional) - Illumina support can request connector debugging information to help diagnose issues. For security reasons, support can only collect this data if the option Debug Information Accessible by Illumina is active. You can choose to either proactively enable this when encountering issues to speed up diagnosis or to only activate it when support requests access. You can at any time revoke access again by deselecting the option.

    • Description (optional) - Enter any additional information you want to show for this connector.

    • Mode (required) - Specify if the connector can upload data, download data, both or neither.

    • Operating system (required) - Select your server or computer operating system.

  4. Select Save and download the connector (top right). An initialization key will then be displayed in the platform. Copy this value, as it will be needed during installation.

  5. Launch the installer after the download completes and follow the on-screen prompts to complete the installation, including entering the initialization key copied in the previous step. Do not install the connector in the upload folder as this will result in the connector attempting to upload itself and the associated log files.

  • Run the downloaded .exe file. During the installation, the installer will ask for the initialization key. Fill out the initialization key you see in the platform.

  • The installer will create an Illumina Service Connector, register it as a Windows service, and start the service. If you wait for about 60 seconds and then refresh the screen in the platform using the refresh button in the top right corner of the page, the connector should display as connected.

  • You can only install one connector on Windows. If, for some reason, you need to install a new one, first uninstall the old one; this is only needed when there is a problem with your existing connector. Upgrading a connector is also possible and does not require uninstalling the old one first.

  • Double click the downloaded .dmg file. Double click Illumina Service Connector in the window that opens to start the installer. Run through the installer, and fill out the initialization key when asked for it.

  • To start the connector once installed or after a reboot, open the app. You can find the app on the location where you installed it. The connector icon will appear in your dock when the app is running.

  • In the platform on the Connectivity page, you can check whether your local connector has been connected with the platform. This can take 60 seconds after you started your connector locally, and you may need to refresh the Connectivity page using the refresh button in the top right corner of the page to see the latest status of your connector.

  • The connector app needs to be closed to shut down your computer. You can do this from within your dock.

  • Installations require Java 11 or later. You can check this with ‘java -version’ from a command line terminal. With Java installed, you can run the installer from the command line using the command bash illumina_unix_develop.sh.

  • Depending on whether you have an X server running or not, it will display a UI or follow a command line installation procedure. You can force a command line installation by adding a -c flag: bash illumina_unix_develop.sh -c.

  • The connector can be started by running ./illuminaserviceconnector start from the folder in which the connector was installed.

Connector Rules

In the upload and download rules, you define different properties when setting up a connector. A connector can be used by multiple projects and a connector can have multiple upload and download rules. Configuration can be changed anytime. Changes to the configuration will be applied approximately 60 seconds after changes are made in ICA if the connector is already connected. If the connector is not already started when configuration changes are made in ICA, it will take about 60 seconds after the connector is started for the configuration changes to be propagated to the connector. The following are the different properties you can configure when setting up a connector. After adding a rule and installing the connector, you can use the Active checkbox to disable rules.

Below is an example of a new connector setup with an Upload Rule to find all files ending with .tar or .tar.gz located within the local folder C:\Users\username\data\docker-images.

Upload Rules

An upload rule tells the connector which folder on your local disk it needs to watch for new files to upload. The connector contacts the platform every minute to pick up changes to upload rules. To configure upload rules for different projects, first switch into the desired project and select Connectivity. Choose the connector from the list and select Click to add a new upload rule and define the rule. The project field will be automatically filled with the project you are currently within.

Download Rules

When you schedule downloads in the platform, you can choose which connector needs to download the files. That connector needs some way to know how and where it needs to download your files. That’s what a download rule is for. The connector contacts the platform every minute to pick up changes to download rules. The following are the different download rule settings.

Shared Drives

When you set up your connector for the first time, and your sample files are located on a shared drive, it’s best to create a folder on your local disk, put one of the sample files in there, and do the connector setup with that folder. When this works, try to configure the shared drive.

Transfer to and from a shared drive may be quite slow. That means it can take up to 30 minutes after you configured a shared drive before uploads start. This is due to the integrity check the connector does for each file before it starts uploading. The connector can upload from or download to a shared drive, but there are a few conditions:

  • The drive needs to be mounted locally. X:\illuminaupload or /Volumes/shareddrive/illuminaupload will work, \\shareddrive\illuminaupload or smb://shareddrive/illuminaupload will not.

  • The user running the connector must have access to the shared drive without a password being requested.

  • The user who runs the Illumina Service Connector process on the Linux machine needs to have read, write and execute permissions on the installation folder.

Update connector to newer version

Illumina might release new versions of the Service Connector with improvements and/or bug fixes. You can easily download a new version of the Connector with the Download button on the Connectivity screen in the platform. After you download the new installer, run it and choose the option ‘Yes, update the existing installation’.

Uninstall a connector

To uninstall the connector, perform one of the following:

  • Windows and Linux: Run the uninstaller located in the folder where the connector was installed.

  • Mac: Move the Illumina Service Connector to your Trash folder.

Log files

The Connector has a log file containing technical information about what’s happening. When something doesn’t work, it often contains clues to why it doesn’t work. Interpreting this log file is not always easy, but it can help the support team to give a fast answer on what is wrong, so it is suggested to attach it to your email when you have upload or download problems. You can find this log file at the following location:

  • Windows: <Installation Folder>\logs\BSC.out (default: C:\Program Files (x86)\illumina\logs\BSC.out)

  • Mac: <Installation Directory>/Illumina Service Connector.app/Contents/java/app/logs/BSC.out (default: /Applications/Illumina Service Connector.app/Contents/java/app/logs/BSC.out)

  • Linux: <Installation Directory>/logs/BSC.out (default: /usr/local/illumina)

Common issues

Please see the Illumina Connected Software site for all content related to Cloud Analysis Auto-Launch:

  • Cloud Analysis Auto-Launch Guidance

  • Sequencer Auto-launch Analyses Compatibility

  • Sample Sheet v2 Guidance

  • NovaSeqX: BCL Convert Auto-launch Analysis in Cloud Guided Example

  • NovaSeq 6000: BCL Convert Auto-launch Analysis in Cloud Guided Example

This tutorial references the Basic pipeline example in the Nextflow documentation.

The first step in creating a pipeline is to create a project. For instructions on creating a project, see the Projects page. In this tutorial, we'll use a project called "Getting Started".

Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the Basic pipeline example from the Nextflow documentation. Modifications to the pipeline definition from the Nextflow documentation include:

Resources: For each process, you can use the memory and cpus directives to set the required compute resources. ICA will then determine the best matching compute type based on those settings. Suppose you set memory '10240 GB' and cpus 6, then ICA will determine you need the standard-large ICA Compute Type.

Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the UCSC Genome Browser. Download the chr1_GL383518v1_alt.fa.gz file and unzip to decompress the FASTA file.

In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in ICA GUI. More information about Nextflow on ICA can be found . For this example, we will implement the alignment and variant calling example from this for Paired-End FASTQ Inputs.

As of DRAGEN version 4.3.13, cloning DRAGEN pipelines is no longer possible because of proprietary code.

The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the Projects page. In this tutorial, we'll use a project called Getting Started.

To specify a compute type for a Nextflow process, use the pod directive within each process.

Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.

Refer to the ICA help page for details on ICA-specific attributes within the Nextflow definition.

Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found on the Input Form page.

This is an unofficial developer tool to help develop Nextflow pipelines that will run successfully on ICA. Some syntax bugs may get introduced in your Nextflow code. One suggestion is to run the steps as described below and then open these files in Visual Studio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch at first glance.

Some examples of Nextflow pipelines that have been lifted over with this repo can be found .

Some additional examples of ICA-ported Nextflow pipelines are .

Relaunch pipeline analysis and

Monitor your analysis run in ICA and troubleshoot

Wrap a WDL-based workflow in a

Wrap a Nextflow-based workflow in a

This will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subworkflow and module files), this may be more appropriate. Any and all comments are welcome.

Generates the parameter XML file based on nextflow_schema.json, nextflow.config, and conf/. Take a look at the generated file to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability.

A table of instance types and the associated CPU + Memory specs can be found under a table named Compute Types

These scripts have been made to be compatible with workflows, so you may find the concepts from the documentation here a better starting point.

Next, you'll need an API key file for ICA that can be generated using the instructions .

Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these to create a project via the ICA GUI.

Install ICA CLI by following these .

A table of all CLI releases for mac, linux, and windows can be found .

Step 1: Generate an XML file from the nf-core pipeline (your pipeline has a nextflow_schema.json)

Relaunch pipeline analysis and .

In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.

Copy and paste the into the Nextflow files > main.nf tab. The following comparison highlights the differences between the original file and the version for deployment in ICA. The main difference is the explicit specification of containers and pods within processes. Additionally, some channels' specification are modified, and a debugging message is added. When copying and pasting, be sure to remove the text highlighted in red (marked with -) and add the text highlighted in green (marked with +).

Nextflow offers support for the scatter-gather pattern natively. The initial example uses this pattern by splitting the FASTA file into chunks of records in the task splitSequences, then processing these chunks in the task reverse.


While outside of any Project, go to Tool Repository and select New Tool. Fill in the mandatory fields (Name and Version) and click on the Search icon to look for a Docker image to link to the tool. You must double-click on the image row to confirm the selection. Tool creation in ICA adheres to the CWL standard.

Using this command, all the files starting with VariantCaller- will be downloaded (prerequisite: the jq tool is installed on the machine):

For more information on how to use pagination, please refer to Cursor- versus Offset-based Pagination.

3.) Once you have selected the Graphical option, you will see a page with multiple tabs. The first tab is the Information page where you enter pipeline information. You can find the details for different fields in the tab in the . The following three fields are required for the INFORMATION page.

5.) The Definition tab is used to define the pipeline. When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel (A) and a list of component menus (B). You can find details on each section in the component menu

Tier lets you select Standard or Economy tier for AWS instances. Standard is on-demand ec2 instance and Economy is spot ec2 instance. You can find the difference between the two AWS instances . You can find the price difference between the two Tiers .

In bioinformatics and computational biology, the vast and growing amount of data necessitates methods and tools that can process and analyze data in parallel. This demand gave birth to the scatter-gather approach, an essential pattern in creating pipelines that offers efficient data handling and parallel processing capabilities. In this tutorial, we will demonstrate how to create a CWL pipeline utilizing the scatter-gather approach. To this purpose, we will use two widely known tools: and . Given the functionalities of both fastp and multiqc, their combination in a scatter-gather pipeline is incredibly useful. Individual datasets can be scattered across resources for parallel preprocessing with fastp. Subsequently, the outputs from each of these parallel tasks can be gathered and fed into multiqc, generating a consolidated quality report. This workflow not only accelerates the preprocessing of large datasets but also offers an aggregated perspective on data quality, ensuring that subsequent analyses are built upon a robust foundation.

First, we create the two tools: fastp and multiqc. For this, we need the corresponding Docker images and CWL tool definitions. Please look up this part of our help sites to learn more about how to import a tool into ICA. In a nutshell, once the CWL tool definition is pasted into the editor, the other tabs for editing the tool are populated. To complete the tool, the user needs to select the corresponding Docker image and provide a tool version (which can be any string).

For this demo, we will use the publicly available Docker images: quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0 for fastp and docker.io/ewels/multiqc:v1.15 for multiqc. In this one can find how to import publicly available Docker images into ICA.

We will describe the second way in more detail. The tool will be based on the public Python Docker image docker.io/python:3.10 and will have the following definition. In this tool, we provide the Python script spread_script.py via a Dirent.

Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.

If you don't already have a project, please follow the instructions in the to create a project.

A tab delimited gene expression file (sampleX.final.count.tsv). Example format:

Please refer to for installing ICA CLI.

Following are the two CWL tools and workflow scripts we will use in the project. If you are new to CWL, please refer to the cwl for a better understanding of CWL codes. You will also need the cwltool installed to create these tools and workflows. You can find installation instructions on the CWL page.

To add a custom or public docker image to the ICA repository, please refer to the Docker Repository page.

Before you can use ICA CLI, you will need to authenticate using the Illumina API key. Please follow to authenticate.

Upload data to the project using the "icav2 projectdata upload" command. Please refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file test.fastq containing the following reads.

Please refer to for installing ICA CLI. To authenticate, please follow the steps in the page.

In this tutorial, we will create a simple RNA-Seq workflow in ICA. The workflow includes four processes: index creation, quantification, FastQC, and MultiQC. We will also upload a Docker container to the ICA Docker repository for use within the workflow.

We need a Docker container consisting of these tools. You can refer to the "Build and push to ICA your own Docker image" section in the help page to build your own docker image with the required tools. For the sake of this tutorial, we will use the container from the original tutorial.

With in your computer, download the image required for this project using the following command.

You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the Compute Types page for a list of available compute types.

The parameters file defines the workflow input parameters. Refer to the for detailed information for creating correctly formatted parameters files.

You can refer to page to explore options to automate this process.

Refer to Launch Pipelines on CLI for details on running the pipeline from the CLI.

Please refer to command help (icav2 [command] --help) to determine available flags to filter output of above commands if necessary. You can also refer to page for available flags for the icav2 commands.

For more help on uploading data to ICA, please refer to the page.


Another option to upload data to ICA is via the ICA API. This option is helpful where data needs to be transferred via automated scripts. You can use the following two endpoints to upload a file to ICA.

Post - with the following body, which will create a partial file at the desired location and return a dataId for the file to be uploaded. {projectId} is the project id for the destination project. You can find the projectId in your project's Details page (Project > Details > URN > urn:ilmn:ica:project:projectId#MyProject).

Post - where dataId is the dataId from the response of the previous call. This call will generate the URL that you can use to upload the file.

Create data in the project by making the API call below. If you don't already have the API-Key, refer to the instructions on the for guidance on generating one.

ICA allows you to directly upload/download data from ICA using the AWS CLI. It is especially helpful when dealing with an unstable internet connection to upload or download a large amount of data. If the transfer gets interrupted midway, you can use the sync command to resume the transfer from the point where it stopped.

To connect to ICA storage, you must first download and install the AWS CLI on your local system. You will need temporary credentials for the AWS CLI to access ICA storage. You can generate temporary credentials through the ICA CLI, which can be used to authenticate the AWS CLI against ICA. The temporary credentials can be obtained using the icav2 projectdata temporarycredentials command.

Notifications are used to subscribe to events in the platform and trigger the delivery of a message to an external delivery target. You can read more . Important: In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.

On this page there is a list of all available API endpoints for ICA. To use it, you need the API-Key from the Illumina ICA portal.

See the for details about the JupyterLab docker image provided by Illumina.

Snowflake Python API documentation can be found

The input parameter names such as Reference and Tumor_FASTQ_Files in the example below are from the pipeline definition where you can give the parameters a name. You can see which of these were used when the pipeline originally ran, in the section above. You can also look at the pipeline definitions for the input parameters, for example the code value of .

icav2 allows project data to be mounted on a local system. This feature is currently available on Linux and Mac systems only. Although not supported, users have successfully used Windows Subsystem for Linux (WSL) on Windows to use icav2 projectdata mount command. Please refer to the for installing WSL.

icav2 (>=2.3.0) and a FUSE driver installed on the local system.

For MAC refer to .

A project created on ICA v2 with data in it. If you don't already have a project, please follow the instructions to create a project.

icav2 utilizes the FUSE driver to mount project data, providing both read and write capabilities. However, there are some limitations on the write capabilities that are enforced by the underlying AWS S3 storage. For more information, please refer to .

In the table below, select the matching operating system in the link column for the version you want to install. This will download the installer for that operating system.


Note: To access release history of CLI versions prior to v2.0.0, please see the ICA v1 documentation .

This tutorial assumes you already have an existing project in ICA. To create a new project, please see instructions in the page.

Additionally, you will need the DRAGEN Demo Bundle linked to your existing ICA project. The DRAGEN Demo Bundle is an entitled bundle provided by Illumina with all standard ICA subscriptions and includes DRAGEN pipelines, references, and demo data.

For general steps on creating and linking bundles to your project, see the page. This tutorial explores the DRAGEN Germline Published Pipeline, so we will need to link the DRAGEN Demo Bundle to our existing project.

You can monitor the status of analysis pipeline runs from the Flow > Analysis page in your project. See for more details.

Click the refresh button in the upper right corner of the ICA environment page to update the status.

Add any upload or download rules. See below.

> icav2 projects list #note the project-name/id.
> icav2 projects enter <project-name/id> # set the project context
> icav2 projectdata upload <localFileFolder> <remote-path> # upload localFileFolder to remote-path

#Example:
> icav2 projects enter demo
> icav2 projectdata upload localFolder /uploads/
> icav2 projectdata list # note the data-id
> icav2 projectdata download <data-id> # download the data.
 {
 "name": "string",
 "folderId": "string",
 "folderPath": "string",
 "formatCode": "string",
 "dataType": "FILE"
 }
 {
 "url": "string"
 }
 curl -X 'POST' \
 'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data' \
 -H 'accept: application/vnd.illumina.v3+json' \
 -H 'X-API-Key: XXXXXXXXXXXXXXXX' \
 -H 'Content-Type: application/vnd.illumina.v3+json' \
 -d '{
 "name": "tempFile.txt",
 "folderId": "fol.579eda846f1b4f6e2d1e08db91408069",
 "dataType": "FILE"
 }'
{
"data": {
    "id": "fil.b13c782a67e24d364e0f08db9f537987",
    "urn": "string",
    "details": {
    "timeCreated": "2023-08-22T19:27:31.286Z",
    "timeModified": "2023-08-22T19:27:31.286Z",
    "creatorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "tenantName": "string",
    "owningProjectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "owningProjectName": "string",
    "name": "string",
    "path": "string",
    "fileSizeInBytes": 0,
    "status": "PARTIAL",
    "tags": {
        "technicalTags": [
        "string"
        ],
        "userTags": [
        "string"
        ],
        "connectorTags": [
        "string"
        ],
        "runInTags": [
        "string"
        ],
        "runOutTags": [
        "string"
        ],
        "referenceTags": [
        "string"
        ]
    },
    "format": {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "timeCreated": "2023-08-22T19:27:31.286Z",
        "timeModified": "2023-08-22T19:27:31.286Z",
        "ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantName": "string",
        "code": "string",
        "description": "string",
        "mimeType": "string"
    },
    "dataType": "FILE",
    "objectETag": "string",
    "storedForTheFirstTimeAt": "2023-08-22T19:27:31.286Z",
    "region": {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "timeCreated": "2023-08-22T19:27:31.286Z",
        "timeModified": "2023-08-22T19:27:31.286Z",
        "ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantName": "string",
        "code": "string",
        "country": {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "timeCreated": "2023-08-22T19:27:31.286Z",
        "timeModified": "2023-08-22T19:27:31.286Z",
        "ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "tenantName": "string",
        "code": "string",
        "name": "string",
        "region": "string"
        },
        "cityName": "string"
    },
    "willBeArchivedAt": "2023-08-22T19:27:31.286Z",
    "willBeDeletedAt": "2023-08-22T19:27:31.286Z",
    "sequencingRun": {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "instrumentRunId": "string",
        "name": "string"
    }
    }
},
"projectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data/fil.b13c782a67e24d364e0f08db9f537987:createUploadUrl' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: XXXXXXXXXX' \
-d ''
{
"url": "string"
}
curl --upload-file tempFile.txt "url"
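The same three-step flow (create the file record, request an upload URL, upload the content) can also be scripted. The sketch below mirrors the curl calls above using Python's requests library; the project ID, folder ID, and API key are the placeholder values from those examples.

import requests

BASE = "https://ica.illumina.com/ica/rest/api"
PROJECT_ID = "41d3643a-5fd2-4ae3-b7cf-b89b892228be"   # example project ID
API_KEY = "XXXXXXXXXXXXXXXX"                          # your API key
HEADERS = {
    "accept": "application/vnd.illumina.v3+json",
    "X-API-Key": API_KEY,
    "Content-Type": "application/vnd.illumina.v3+json",
}

# 1. Create the (empty) file record in the project
created = requests.post(
    f"{BASE}/projects/{PROJECT_ID}/data",
    headers=HEADERS,
    json={"name": "tempFile.txt",
          "folderId": "fol.579eda846f1b4f6e2d1e08db91408069",
          "dataType": "FILE"},
)
file_id = created.json()["data"]["id"]

# 2. Request a pre-signed upload URL for the new file record
upload_url = requests.post(
    f"{BASE}/projects/{PROJECT_ID}/data/{file_id}:createUploadUrl",
    headers=HEADERS,
).json()["url"]

# 3. Upload the local file content to the pre-signed URL
with open("tempFile.txt", "rb") as fh:
    requests.put(upload_url, data=fh)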
> icav2 projectdata temporarycredentials --help
This command fetches  temporal AWS and Rclone credentials for a given project-data. If path is given, project id from the flag --project-id is used. If flag not present project is taken from the context

Usage:
    icav2 projectdata temporarycredentials [path or data Id] [flags]

Flags:
-h, --help                help for temporarycredentials
    --project-id string   project ID to set current project context

Global Flags:
-t, --access-token string    JWT used to call rest service
-o, --output-format string   output format (default "table")
-s, --server-url string      server url to direct commands
-k, --x-api-key string       api key used to call rest service
> icav2 projectdata temporarycredentials /cli-upload/
awsTempCredentials.accessKey                     XXXXXXXXXX
awsTempCredentials.bucket                        stratus-gds-use1
awsTempCredentials.objectPrefix                  XXXXXX/cli-upload/
awsTempCredentials.region                        us-east-1
awsTempCredentials.secretKey                     XXXXXXXX
awsTempCredentials.serverSideEncryptionAlgorithm AES256
awsTempCredentials.sessionToken                  XXXXXXXXXXXXXXXX
[profile]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE
#Copy single file to ICA
> aws s3 cp cp1 s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/

#Sync local folder to ICA
> aws s3 sync cli-upload s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload
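The same temporary credentials can be used from other S3-compatible clients as well. As an illustration only (boto3 is not part of the ICA tooling), a minimal Python sketch using the placeholder values returned by icav2 projectdata temporarycredentials:

import boto3

# Build an S3 client from the temporary credentials
s3 = boto3.client(
    "s3",
    region_name="us-east-1",                 # awsTempCredentials.region
    aws_access_key_id="XXXXXXXXXX",          # awsTempCredentials.accessKey
    aws_secret_access_key="XXXXXXXX",        # awsTempCredentials.secretKey
    aws_session_token="XXXXXXXXXXXXXXXX",    # awsTempCredentials.sessionToken
)

# Upload a single file under the object prefix returned with the credentials
s3.upload_file(
    "file.txt",
    "stratus-gds-use1",
    "53395234-6b20-4fb1-3587-08db9144d245/cli-upload/file.txt",
    ExtraArgs={"ServerSideEncryption": "AES256"},
)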
[s3-config]
type = s3
provider = AWS
env_auth = false
access_key_id = XXXXXXXXXX
secret_access_key = XXXXXXX
region = us-east-1
acl = private
session_token = XXXXXXXX
#Copy single file to ICA
> rclone copy file.txt s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/
#Sync local folder to ICA
> rclone sync cli-upload s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/ 
        {
        "Version":"2012-10-17",
        "Statement":[
            {
                "Effect":"Allow",
                "Principal":{
                    "AWS":"arn:aws:iam::079623148045:root"
                },
                "Action":"SNS:Publish",
                "Resource": "*arn*"
            }
        ]
        }
        {
            "id": "6375eb43-e865-4d7c-a9e2-2c153c998a5c",
            "allowedSlots": -1,
            "usedSlots": 0,
            "movedSlots": 0,
            "originalSlots": -1,
            "pipelineBundle": {
                "id": "b4f2840c-4f79-44db-9e1c-5e7339a1b507",
                "name": "ICA_Ent-DE_Pipeline_Entitlement",
                "maxNumberOfAllowedSlots": -1,
                "activePipelines": [],
                "canceledPipelines": [],
                "retiredPipelines": [],
                "regions": [
                    {
                    }
                ],
                "analysisStorages": [
                    {},
                    {},
                    {
                        "id": "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0",
                        "timeCreated": "2021-11-05T10:28:20Z",
                        "timeModified": "2021-11-05T10:28:20Z",
                        "ownerId": "8ec463f6-1acb-341b-b321-043c39d8716a",
                        "tenantId": "f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3",
                        "tenantName": "ica-cp-admin",
                        "name": "Small",
                        "description": "1.2 TB"
                    }
                ]
            },
            "usages": []
        }
        # Modules used by the Lambda handler
        import json
        import random
        import requests
        import string

        def lambda_handler(event, context):
            
            message = json.loads(event['Records'][0]['Sns']['Message'])
            
            project_id = message['projectId']
            analysis_id = message['payload']['id']
            
            url = 'https://ica.illumina.com/ica/rest/api/projects/' + \
            project_id + '/analyses/' + analysis_id + '/outputs'
            headers = {
                'X-API-Key': '${API-KEY}',
                'accept': 'application/vnd.illumina.v3+json'
            }

            response = requests.get(url, headers=headers)
            json_data_slice = response.json()['items'][0]['data'][0]['children']
            
            for json_obj in json_data_slice:
                if json_obj.get('name') == 'test.txt':
                    file_id = json_obj['dataId']                   

            # some variables
            
            activation_code_detail_id = '6375eb43-e865-4d7c-a9e2-2c153c998a5c'
            analysis_storage_id = "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0"
            pipeline_id = "fd540bf8-67f1-4506-99e9-c89cc9a98fdd"
            user_reference = 'CallLambda' + ''.join(random.choices(string.ascii_uppercase + \
            string.digits, k=6))
            
            payload = {
            "userReference": user_reference,
            "pipelineId": pipeline_id,
            "tags": {
                "technicalTags": [],
                "userTags": []
            },
            "activationCodeDetailId": activation_code_detail_id,
            "analysisStorageId": analysis_storage_id,
            "analysisInput": {
                "inputs": [
                {
                    "parameterCode": "file",
                    "dataIds": [
                    file_id
                    ]
                }
                ]
            }
            }
                 
            url_pipeline2 = ('https://ica.illumina.com/ica/rest/api/projects/'
                             + project_id + '/analysis:nextflow')
            headers_pipeline2 = {
                'X-API-Key': '${API-KEY}',
                'accept': 'application/vnd.illumina.v3+json',
                'Content-Type': 'application/vnd.illumina.v3+json'
            }
            
            response_start = requests.post(url_pipeline2, data=json.dumps(payload), headers=headers_pipeline2)

            # Check the response status code
            if response_start.status_code == 201:
                # POST request successful
                response_data = response_start.json()
                return {
                    'statusCode': 201,
                    'body': response_data
                }
            else:
                # POST request failed
                return {
                    'statusCode': response_start.status_code,
                    'body': 'Error: Failed'
                }
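To exercise the handler locally, a test event only needs the fields the function reads: Records[0].Sns.Message, a JSON string containing projectId and payload.id. A minimal sketch with placeholder IDs:

import json

# Minimal SNS-style test event matching what lambda_handler parses above
test_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps({
                    "projectId": "<project-id>",
                    "payload": {"id": "<analysis-id>"}
                })
            }
        }
    ]
}

# lambda_handler(test_event, None)  # uncomment to invoke locally (requires the requests package)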
# Wrapper modules
import icav2
from icav2.api import project_data_api
from icav2.model.problem import Problem
from icav2.model.project_data import ProjectData

# Helper modules
import random
import os
import requests
import string
import hashlib
import getpass
# Authenticate using User credentials
username = input("ICA Username")
password = getpass.getpass("ICA Password")
tenant = input("ICA Tenant name")
url = os.environ['ICA_URL'] + '/rest/api/tokens'
r = requests.post(url, data={}, auth=(username,password),params={'tenant':tenant})
token = None
apiClient = None
if r.status_code == 200:
    token = r.content
    configuration = icav2.Configuration(
        host = os.environ['ICA_URL'] + '/rest',
        access_token = str(r.json()["token"])
        )
    apiClient = icav2.ApiClient(configuration, header_name="Content-Type",header_value="application/vnd.illumina.v3+json")
    print("Authenticated to %s" % str(os.environ['ICA_URL']))
else:
    print("Error authenticating to %s" % str(os.environ['ICA_URL']))
    print("Response: %s" % str(r.status_code))

## Authenticate using ICA API TOKEN
configuration = icav2.Configuration(
    host = os.environ['ICA_URL'] + '/rest'
)
configuration.api_key['ApiKeyAuth'] = getpass.getpass()
apiClient = icav2.ApiClient(configuration, header_name="Content-Type",header_value="application/vnd.illumina.v3+json")
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Data API client instance
projectDataApiInstance = project_data_api.ProjectDataApi(apiClient)
# List all data in a project
pageOffset = 0
pageSize = 30
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
    totalRecords = projectDataPagedList.total_item_count
    while pageOffset*pageSize < totalRecords:
        for projectData in projectDataPagedList.items:
            print("Path: "+projectData.data.details.path + " - Type: "+projectData.data.details.data_type)
        pageOffset = pageOffset + 1
        # Fetch the next page so each iteration prints new records instead of repeating the first page
        if pageOffset*pageSize < totalRecords:
            projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Create data element in a project
data = icav2.model.create_data.CreateData(name="test.txt",data_type = "FILE")

try:
    projectData = projectDataApiInstance.create_data_in_project(projectId, create_data=data)
    fileId = projectData.data.id
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_data_in_project: %s\n" % e)
## Upload a local file to a data element in a project
# Create a local file in a Bench workspace
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
content = ''.join(random.choice(string.ascii_lowercase) for i in range(100))
f = open(filename, "a")
f.write(content)
f.close()

# Calculate MD5 hash (optional)
localFileHash = md5Hash = hashlib.md5((open(filename, 'rb').read())).hexdigest()

try:
    # Get Upload URL
    upload = projectDataApiInstance.create_upload_url_for_data(project_id = projectId, data_id = fileId)
    # Upload dummy file
    files = {'file': open(filename, 'r')}
    data = open(filename, 'r').read()
    r = requests.put(upload.url , data=data)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_upload_url_for_data: %s\n" % e)

# Delete local dummy file
os.remove(filename)
## Download a data element from a project
try:
    # Get Download URL 
    download = projectDataApiInstance.create_download_url_for_data(project_id=projectId, data_id=fileId)

    # Download file
    filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
    r = requests.get(download.url)
    open(filename, 'wb').write(r.content)

    # Verify md5 hash
    remoteFileHash = hashlib.md5((open(filename, 'rb').read())).hexdigest()
    if localFileHash != remoteFileHash:
        print("Error: MD5 mismatch")

    # Delete local dummy file
    os.remove(filename)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_download_url_for_data: %s\n" % e)
# Search for matching data elements in a project
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
    for projectData in projectDataPagedList.items:
        print("Path: " + projectData.data.details.path + " - Name: "+projectData.data.id + " - Type: "+projectData.data.details.data_type)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Delete matching data elements in a project
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
    for projectData in projectDataPagedList.items:
        print("Deleting file "+projectData.data.details.path)  
        projectDataApiInstance.delete_data(project_id = projectId, data_id = projectData.data.id)
except icav2.ApiException as e:
    print("Exception %s\n" % e)
# API modules
import icav2
from icav2.api import project_base_api
from icav2.model.problem import Problem
from icav2.model.base_connection import BaseConnection

# Helper modules
import os
import requests
import getpass
import snowflake.connector
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Base API client instance
projectBaseApiInstance = project_base_api.ProjectBaseApi(apiClient)
# Get a Base Access Token
try:
    baseConnection = projectBaseApiInstance.create_base_connection_details(project_id = projectId)
except icav2.ApiException as e:
    print("Exception when calling ProjectBaseAPIApi->create_base_connection_details: %s\n" % e)
## Create a python jdbc connection
ctx = snowflake.connector.connect(
    account=os.environ["ICA_SNOWFLAKE_ACCOUNT"],
    authenticator=baseConnection.authenticator,
    token=baseConnection.access_token, 
    database=os.environ["ICA_SNOWFLAKE_DATABASE"],
    role=baseConnection.role_name,
    warehouse=baseConnection.warehouse_name
)
ctx.cursor().execute("USE "+os.environ["ICA_SNOWFLAKE_DATABASE"])
## Create a Table
tableName = "test_table"
ctx.cursor().execute("CREATE OR REPLACE TABLE " + tableName + "(col1 integer, col2 string)")
## Insert data into a table
ctx.cursor().execute(
        "INSERT INTO " + tableName + "(col1, col2) VALUES " + 
        "    (123, 'test string1'), " + 
        "    (456, 'test string2')")
## Query the table
cur = ctx.cursor()
try:
    cur.execute("SELECT * FROM "+tableName)
    for (col1, col2) in cur:
        print('{0}, {1}'.format(col1, col2))
finally:
    cur.close()
# Delete the table
ctx.cursor().execute("DROP TABLE " + tableName);
icav2 projects list
ID                                      NAME         OWNER
a5690b16-a739-4bd7-a62a-dc4dc5c5de6c    Project1     670fd8ea-2ddb-377d-bd8b-587e7781f2b5
ccb0667b-5949-489a-8902-692ef2f31827    Project2     f1aa8430-7058-4f6c-a726-b75ddf6252eb
No of items :  2
icav2 projects enter a5690b16-a739-4bd7-a62a-dc4dc5c5de6c
icav2 projectpipelines list
ID                                      CODE					DESCRIPTION      
fbd6f3c3-cb70-4b35-8f57-372dce2aaf98    DRAGEN Somatic 3.9.5			The DRAGEN Somatic tool identifies somatic variants
b4dc6b91-5283-41f6-8095-62a5320ed092    DRAGEN Somatic Enrichment 3-10-4	The DRAGEN Somatic Enrichment pipeline identifies somatic variants which can exist at low allele frequencies in the tumor sample.
No of items :  2
icav2 projectanalyses list
ID                                      REFERENCE                                       CODE                    STATUS
3539d676-ae99-4e5f-b7e4-0835f207e425    kyle-test-somatic-2-DRAGEN Somatic 3_9_5        DRAGEN Somatic 3.9.5    SUCCEEDED
f11e248e-9944-4cde-9061-c41e70172f20    kyle-test-somatic-1-DRAGEN Somatic 3_9_5        DRAGEN Somatic 3.9.5    FAILED
No of items :  2
icav2 projectanalyses input 3539d676-ae99-4e5f-b7e4-0835f207e425
CODE                                    NAMES                                                                   DATA ID
BED
CNV_B_Allele_VCF
CNV_Population_B_Allele_VCF
HLA_Allele_Frequency_File
HLA_BED
HLA_reference_file_(protein_FASTA)
Microsatellites_File
Normal_BAM_File
Normal_FASTQ_Files
Panel_of_Normals
Panel_of_Normals_TAR
Reference                               hg38_altaware_nohla-cnv-anchored.v8.tar                                 fil.35e27101fdec404fb37d08d9adf63307
Systematic_Noise_BED
Tumor_BAM_File
Tumor_FASTQ_Files                       HCC1187C_S1_L001_R1_001.fastq.gz,HCC1187C_S1_L001_R2_001.fastq.gz       fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307
<pd:tool code="map_align">
    <pd:parameter code="enable_map_align" minValues="0" maxValues="1" classification="USER">
        <pd:label>Enable Map/Align</pd:label>
        <pd:description></pd:description>
        <pd:optionsType>
            <pd:option>true</pd:option>
            <pd:option>false</pd:option>
        </pd:optionsType>
        <pd:value></pd:value>
    </pd:parameter>
</pd:tool>
curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/tokens' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'X-API-Key: <APIKEY>' \
  -d ''
echo -ne 'testemail@testdomain.com:testpassword' | base64
    <BASE64UN+PW>

curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/tokens' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'Authorization: Basic <BASE64UN+PW>' \
  -d ''
curl -X 'GET' \
  'https://ica.illumina.com/ica/rest/api/projects/e501a0d5-f5e7-458c-a590-586c79bb87e0/analyses/3539d676-ae99-4e5f-b7e4-0835f207e425/configurations' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'Authorization: Bearer <Token>' \
  -H 'X-API-Key: <APIKEY>'
{
	"items": [{
		"name": "DRAGEN_Somatic__enable_variant_caller",
		"multiValue": false,
		"values": [
			"true"
		]
	}]
}
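The two calls above can be combined in a short script: obtain a JWT with an API key, then list the analysis configurations using the Bearer token. A minimal sketch using the example project and analysis IDs with a placeholder API key:

import requests

BASE = "https://ica.illumina.com/ica/rest/api"
API_KEY = "<APIKEY>"
PROJECT_ID = "e501a0d5-f5e7-458c-a590-586c79bb87e0"
ANALYSIS_ID = "3539d676-ae99-4e5f-b7e4-0835f207e425"

# 1. Exchange the API key for a JWT
token = requests.post(
    f"{BASE}/tokens",
    headers={"accept": "application/vnd.illumina.v3+json", "X-API-Key": API_KEY},
).json()["token"]

# 2. Retrieve the analysis configurations with the Bearer token
configs = requests.get(
    f"{BASE}/projects/{PROJECT_ID}/analyses/{ANALYSIS_ID}/configurations",
    headers={"accept": "application/vnd.illumina.v3+json",
             "Authorization": "Bearer " + token},
).json()

for item in configs.get("items", []):
    print(item["name"], item["values"])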
icav2 projectpipelines start cwl fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 \
--user-reference kyle-test-somatic-9 \
--storage-size small \
--type-input STRUCTURED \
--input Reference:fil.35e27101fdec404fb37d08d9adf63307 \
--input Tumor_FASTQ_Files:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters DRAGEN_Somatic__enable_variant_caller:true \
--parameters DRAGEN_Somatic__enable_hrd:false \
--parameters DRAGEN_Somatic__enable_sv:true \
--parameters DRAGEN_Somatic__output_file_prefix:tumor \
--parameters DRAGEN_Somatic__enable_map_align:true \
--parameters DRAGEN_Somatic__cnv_use_somatic_vc_baf:false \
--parameters DRAGEN_Somatic__enable_cnv:false \
--parameters DRAGEN_Somatic__output_format:BAM \
--parameters DRAGEN_Somatic__vc_emit_ref_confidence:BP_RESOLUTION \
--parameters DRAGEN_Somatic__enable_hla:false \
--parameters DRAGEN_Somatic__enable_map_align_output:true
analysisStorage.description           1.2 TB
analysisStorage.id                    6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name                  Small
analysisStorage.ownerId               8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId              f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName            ica-cp-admin
analysisStorage.timeCreated           2021-11-05T10:28:20Z
analysisStorage.timeModified          2021-11-05T10:28:20Z
id                                    51abe34a-2506-4ab5-adef-22df621d95d5
ownerId                               47793c21-75a6-3aa8-8147-81b354d0af4d
pipeline.analysisStorage.description  1.2 TB
pipeline.analysisStorage.id           6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name         Small
pipeline.analysisStorage.ownerId      8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId     f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName   ica-cp-admin
pipeline.analysisStorage.timeCreated  2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code                         DRAGEN Somatic 3.9.5
pipeline.description                  The DRAGEN Somatic tool identifies somatic variants which can exist at low allele frequencies in the tumor sample. The pipeline can analyze tumor/normal pairs and tumor-only sequencing data. The normal sample, if present, is used to avoid calls at sites with germline variants or systematic sequencing artifacts. Unlike germline analysis, the somatic platform makes no ploidy assumptions about the tumor sample, allowing sensitive detection of low-frequency alleles.
pipeline.id                           fbd6f3c3-cb70-4b35-8f57-372dce2aaf98
pipeline.language                     CWL
pipeline.ownerId                      e9dd2ff5-c9ba-3293-857e-6546c5503d76
pipeline.tenantId                     55cb0a54-efab-4584-85da-dc6a0197d4c4
pipeline.tenantName                   ilmn-dragen
pipeline.timeCreated                  2021-11-23T22:55:49Z
pipeline.timeModified                 2021-12-09T16:42:14Z
reference                             kyle-test-somatic-9-DRAGEN Somatic 3_9_5-bc56d4b1-f90e-4039-b3a4-b11d29263e4e
status                                REQUESTED
summary
tenantId                              b5b750a6-49d4-49de-9f18-75f4f6a81112
tenantName                            ilmn-cci
timeCreated                           2022-03-16T22:48:31Z
timeModified                          2022-03-16T22:48:31Z
userReference                         kyle-test-somatic-9
400 Bad Request : ICA_API_004 : com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.util.UUID` from String "8f57-372dce2aaf98": UUID has to be represented by standard 36-char representation
 at [Source: (io.undertow.servlet.spec.ServletInputStreamImpl); line: 1, column: 983] (through reference chain: com.bluebee.rest.v3.publicapi.dto.analysis.SearchMatchingActivationCodesForCwlAnalysisDto["pipelineId"]) (ref. c9cd9090-4ddb-482a-91b5-8471bff0be58)
404 Not Found : ICA_GNRC_001 : Could not find data with ID [fil.35dec404fb37d08d9adf63307] (ref. 91b70c3c-378c-4de2-acc9-794bf18258ec)
400 Bad Request : ICA_EXEC_007 : The specified variableName [DRAGEN] does not exist. Make sure to use an existing variableName (ref. ab296d4e-9060-412c-a4c9-562c63450022)
icav2 projectpipelines start nextflow b4dc6b91-5283-41f6-8095-62a5320ed092 \
--user-reference "somatic-3-10-test5" \
--project-id e501a0d5-f5e7-458c-a590-586c79bb87e0 \
--storage-size Small \
--input ref_tar:fil.35e27101fdec404fb37d08d9adf63307 \
--input tumor_fastqs:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters enable_map_align:true \
--parameters enable_map_align_output:true \
--parameters output_format:BAM \
--parameters enable_variant_caller:true \
--parameters vc_emit_ref_confidence:BP_RESOLUTION \
--parameters enable_cnv:false \
--parameters enable_sv:true \
--parameters repeat_genotype_enable:true \
--parameters enable_hla:false \
--parameters enable_variant_annotation:false \
--parameters output_file_prefix:Tumor
analysisStorage.description           1.2 TB
analysisStorage.id                    6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name                  Small
analysisStorage.ownerId               8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId              f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName            ica-cp-admin
analysisStorage.timeCreated           2021-11-05T10:28:20Z
analysisStorage.timeModified          2021-11-05T10:28:20Z
id                                    9b8f9e84-2e7f-4adb-92e5-738b032c2328
ownerId                               47793c21-75a6-3aa8-8147-81b354d0af4d
pipeline.analysisStorage.description  1.2 TB
pipeline.analysisStorage.id           6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name         Small
pipeline.analysisStorage.ownerId      8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId     f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName   ica-cp-admin
pipeline.analysisStorage.timeCreated  2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code                         DRAGEN Somatic Enrichment 3-10-4
pipeline.description                  The DRAGEN Somatic Enrichment pipeline identifies somatic variants which can exist at low allele frequencies in the tumor sample.
pipeline.id                           b4dc6b91-5283-41f6-8095-62a5320ed092
pipeline.language                     NEXTFLOW
pipeline.ownerId                      e9dd2ff5-c9ba-3293-857e-6546c5503d76
pipeline.tenantId                     55cb0a54-efab-4584-85da-dc6a0197d4c4
pipeline.tenantName                   ilmn-dragen
pipeline.timeCreated                  2022-03-07T18:09:38Z
pipeline.timeModified                 2022-03-08T17:47:20Z
reference                             somatic-3-10-test5-DRAGEN Somatic Enrichment 3-10-4-3df131a2-2187-489a-b9f8-140e3ec5efb0
status                                REQUESTED
tenantId                              b5b750a6-49d4-49de-9f18-75f4f6a81112
tenantName                            ilmn-cci
timeCreated                           2022-07-13T22:44:47Z
timeModified                          2022-07-13T22:44:47Z
userReference                         somatic-3-10-test5
% icav2 projects list
ID                                   	NAME                                            	OWNER  
422d5119-708b-4062-b91b-b398a3371eab	demo                                           	b23f3ea6-9a84-3609-bf1d-19f1ea931fa3
% icav2 projectdata mount mnt --project-id 422d5119-708b-4062-b91b-b398a3371eab
% ls mnt
sampleX.final.count.tsv
% icav2 projectdata unmount
Project with identifier 422d5119-708b-4062-b91b-b398a3371eab was unmounted from mnt.

| Platform | Command |
| -------- | ------- |
| Windows | CertUtil -hashfile ica-windows-amd64.zip SHA256 |
| Linux | sha256sum ica-linux-amd64.zip |
| Mac | shasum -a 256 ica-darwin-amd64.zip |

| Platform | Command |
| -------- | ------- |
| Windows | CertUtil -hashfile icav2.exe SHA256 |
| Linux | sha256sum icav2 |
| Mac | shasum -a 256 icav2 |
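As an alternative to the platform-specific commands above, the same SHA-256 checksum can be computed with Python's standard hashlib module (a minimal sketch; the filename is an example, substitute the file you downloaded):

import hashlib

# Compute the SHA-256 checksum of a downloaded file (example filename)
with open("ica-linux-amd64.zip", "rb") as fh:
    print(hashlib.sha256(fh.read()).hexdigest())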

| Field | Description |
| ----- | ----------- |
| Name | Name of the upload rule. |
| Active | Set to true to have this rule be active. This allows you to deactivate rules without deleting them. |
| Local folder | The folder path on the local machine where files to be uploaded are stored. |
| File pattern | Files with filenames that match the string/pattern will be uploaded. |
| Location | The location the data will be uploaded to. |
| Project | The project the data will be uploaded to. |
| Description | Additional information about the upload rule. |
| Assign Format | Select which data format tag the uploaded files will receive. This is used for various things like filtering. |
| Data owner | The owner of the data after upload. |

| Field | Description |
| ----- | ----------- |
| Name | Name of the download rule. |
| Active | Set to true to have this rule be active. This allows you to deactivate rules without deleting them. |
| Order of execution | If using multiple download rules, set the order in which the rules are performed. |
| Target Local folder | The folder path on the local machine where the files will be downloaded to. |
| Description | Additional information about the download rule. |
| Format | The format the files must comply with in order to be scheduled for download. |
| Project | The projects the rule applies to. |

Windows

Service connector doesn't connect

First, try restarting your computer. If that doesn't help, open the Services application (click the Windows icon and type services). There should be a service called Illumina Service Connector.
• If it does not have status Running, try starting it (right mouse click -> Start).
• If it has status Running and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy and your connector still does not connect, contact Illumina Technical Support and include your connector BSC.out log files.

OS X

Service connector doesn't connect

Check whether the Connector is running. If it is, there should be an Illumina icon in your Dock.
• If there is no icon, log out and log back in. An Illumina Service Connector icon should appear in your Dock.
• If it still does not appear, try starting the Connector manually from the Launchpad menu.
• If the Connector is running and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy and your connector still does not connect, contact Illumina Technical Support and include your connector BSC.out log files.

Linux

Service connector doesn't connect

Check whether the connector process is running with: ps aux

Linux

Can’t define java version for connector

The connector makes use of Java version 8 or 11. If you run the installer and get the error "Please define INSTALL4J_JAVA_HOME to point to a suitable JVM.":
• When downloading the correct Java version from Oracle, there are two variables in the script that can be defined (INSTALL4J_JAVA_HOME_OVERRIDE & INSTALL4J_JAVA_PREFIX), but not INSTALL4J_JAVA_HOME, which is printed in the above error message. Instead, export the variable to your environment before running the installation script.
• Note that Java home should not point to the java executable, but to the jre folder. For example: export INSTALL4J_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-1.8.0-openjdk-amd64 sh illumina_unix_1_13_2_0_35.sh

Linux

Corrupted installation script

If you get the following error message “gzip: sfx_archive.tar.gz: not in gzip format. I am sorry, but the installer file seems to be corrupted. If you downloaded that file please try it again. If you transfer that file with ftp please make sure that you are using binary mode.” : • This indicates the installation script file is corrupted. Editing the shell script will cause it to be corrupt. Please re-download the installation script from ICA.

Linux

Unsupported version error in log file

If the log file gives the following error "Unsupported major.minor version 52.0", an unsupported version of java is present. The connector makes use of java version 8 or 11.

Linux

Manage the connector via the CLI

• Connector installation issues: It may be necessary to first make the connector installation script executable with: chmod +x illumina_unix_develop.sh Once it has been made executable, run the installation script with: bash illumina_unix_develop.sh It may be necessary to run with sudo depending on user permissions on the system: sudo bash illumina_unix_develop.sh If installing on a headless system, use the -c flag to do everything from the command line: bash illumina_unix_develop.sh -c
• Start the connector with logging directly to the terminal (stdout), in case the log file is not present (likely due to the absence of Java version 8 or 11). From within the installation directory, run: ./illuminaserviceconnector run
• Check the status of the connector. From within the install location, run: ./illuminaserviceconnector status
• Stop the connector with: ./illuminaserviceconnector stop
• Restart the connector with: ./illuminaserviceconnector restart

2023

2023 December 05 - ICA v2.21.0

Features and Enhancements

  • Flow

    • Analysis logs (task stdout/stderr files) are now written to a folder named ‘ica_logs’ within the analysis output folder

    • Default scratch disk size attached to analysis steps reduced from 2TB to 0B to improve cost and performance of analyses. Pipelines created before ICA v2.21.0 will not be impacted

  • Notifications

    • Notifications can now be updated and deleted in externally managed Projects

  • API

    • Clarified on the Swagger page which sorting options apply to which paging strategy (cursor-based versus offset-based). Changed the default sorting behavior so that:

      • When no paging strategy is specified and no sort is requested, then cursor-based paging is default

      • When no paging strategy is specified and sort is requested, then offset-based paging is default

  • Cohorts

    • Procedure Search Box: Users can now access additional UI functionalities for Procedures

      • Users can now access Procedure codes from OMOP

      • Improved handling of drug codes across all reports, excluding Survival comparison

    • Ingestion

      • Users now have enhanced job warning log and API status improvements

      • Users now require download permissions to facilitate the data ingestion process

    • Fetch Molecular Files: Improved import – Users can now input a folder path and select sample files individually

    • Variant Type Summary: Users can now access a new variants tab that summarizes Variant type statistics per gene

    • Added sorting and filtering capabilities to report tables, such as variants observed in genes

    • Users can now view sample barcodes, replacing internal auto-increment sample IDs in the Structural Variants table within the Genes tab

    • “Search subjects” functionality improved with flexible filtering logic that now supports partial matches against a single input string

Fixed Issues

  • Data Management

    • Fixed an issue with data copy via the CLI where the file was being copied to a subfolder of the intended location instead of the specified folder

    • Resolved an issue where browser upload hangs intermittently when creating data

    • Fixed an issue where the delete popup does not always disappear when deleting data

    • Fixed an issue where GetFolder API call returns 404 error if the Create and Get operations are performed 100ms apart

    • Fixed an issue where file copy would fail if the file was located at the root level of User’s S3 storage bucket

    • Fixed an issue causing data linked from externally managed projects to be incorrectly excluded from the list project data API response

    • Fixed an issue where User cannot use data URNs to identify the destination folder when interacting with copy data API endpoints

    • Bundles: Fixed an issue where clicking the back button before saving a new bundle leads to inconsistencies

  • Flow

    • Fixed an issue where pipeline documentation is not scrollable when launching pipeline

    • Fixed an issue with logfiles of a task not being available for streaming while the task is still running

    • Fixed an issue where using the 're-run' button from the analysis page reverts the storage size selection to default

    • Fixed an inconsistency where the following two endpoints would show different analysis statuses:

      • GET /api/projects/{projectId}/analyses

      • GET /api/projects/{projectId}/analyses/{analysisId}

    • Improved performance issues with UI loading data records when selecting inputs for analysis

    • Fixed a caching issue which resulted in delays when running pipelines

    • Fixed an issue where back button for analysis or pipeline details does not always direct Users back to analysis or pipelines view, respectively

    • Fixed an issue where system performance is degraded when large batches (e.g., 1,000) of data are added as input to Analyses via the graphical UI. It is recommended to start Analyses with large numbers of input files via API

  • Base

    • Fixed an issue where enabling Base from a Base view other than Base Tables returned a warning message

    • Fixed an issue where access to a bundle's tables was not restored when Base was re-enabled in a project. When a bundle with tables is added to a project without Base, Base is automatically enabled so users can see the bundle's tables; access to those tables is revoked when Base is removed, and was previously not granted again once Base was re-enabled

    • Fixed an issue where a Base job to load data into a table never finished because the file was deleted after the job started and before it finished. Now the job will end up in a Failed state

  • Cohorts

    • Fixed an issue where data points filtered out of the needle plot reappear when zooming in on an exon while a filter is in place

    • Fixed an issue where users from a different tenant who accept a project share may encounter a failure at the final step of the data ingestion process

    • Fixed an issue where users can encounter intermittent errors when browsing and typing for a gene

    • Fixed an issue where the UI hangs on large genes and returns a 502 error

2023 November 9 - ICA v2.20.1

Fixed Issues

  • Data Management

    • Fixed an issue where multiple folder copy jobs with the same destination may get stuck In Progress

    • Fixed an intermittent issue where tags on the target folder for a batch data update call are not set, but are set for all child data

  • Flow

    • Fixed an issue causing intermittent pipeline failures due to an infrastructure error

2023 October 31 - ICA v2.20.0

Features and Enhancements

  • General

    • Navigation: If multiple regions are enabled for the same tenant, the region will be indicated in the waffle menu

    • Logging: Data transfers of BaseSpace Sequence Hub projects with data stored in ICA will be traced in ICA logs

  • Cohorts

    • Disease Search Box: Added support for specifying subjects by age of onset of disease(s)

    • Drug Search Box: Added a new query builder box for Drugs

      • Ingestion: Support for Drug, drug route, etc. attached to subjects

      • Cohorts building: Users can build cohorts by specifying drugs, drug route, etc.

    • Ingestion

      • Combine different variant types during ingestion (small variants, cnv, sv)

      • Cohorts supports Illumina Pisces variant caller for hg19 VCFs

Fixed Issues

  • General

    • Fixed an issue where the graphical UI hangs with a spinning wheel when saving or executing a command

    • Fixed an issue where rich text editor for Documentation tab on Pipelines, Tools, Projects and Bundles does not populate with correct styles in edit mode

  • Data Management

    • Fixed an issue where multiple clicks on create data in Project API endpoint resulted in multiple requests

    • Fixed an issue where the secondary data selection screen could not be resized

    • A spinning wheel icon with ‘copying’ status is displayed at the folder level in the target Project when a folder is being copied. This applies to the actual folder itself and not for folders higher up in the hierarchy

    • Fixed an issue where API to retrieve a project data update batch is failing with 500 error when either the Technical or the User tags are updated during the batch update request

    • Fixed an issue where linking jobs fail to complete if other linking jobs are running

    • Improved performance for data transfer to support BaseSpace Sequence Hub Run transfers

    • Fixed an issue causing some folder copy jobs to remain in "Partially Succeeded" status despite being completed successfully

    • Bundles: Fixed an issue where the URL and Region where a Docker image is available is not displayed for a Docker image Tool shared via an entitled Bundle

    • Fixed an issue where the folder copy job was getting stuck copying large amounts of big files

    • Fixed an issue where the folder counts were not matching expected counts after Data linking

    • Fixed an issue where delete data popup would occasionally not disappear after deleting data.

    • Fixed an issue with data copy where referencing data from another region would not result in immediate failure

    • Fixed issue where uploading a folder using the CLI was not working

    • Fixed an issue where a Docker image shared via an entitled Bundle can be added to another region

  • Workflows

    • Fixed an issue where workflow does not fail if BCL Convert fails for a BCL Convert-only run

  • Flow

    • Improved performance when batches of data up to 1000 are added as input to an Analysis

    • Nextflow engine will return exit code 55 if the pipeline runner task is preempted

    • Fixed an issue where log files cannot be opened for any steps in an analysis while the analysis is in progress

    • Fixed an issue with concurrent updates on analysis

    • Fixed an issue where unknown data inputs in the XML of an analysis are not being ignored

    • The warning, close, and machine profile icons for Tools can now be seen in the graphical CWL pipeline editor

    • Fixed an issue where user cannot expand analysis output folder if user permissions change after starting analysis. Now, if a user has the correct permissions to start an analysis, that analysis should be able to finish correctly no matter the permissions at the time it succeeds

  • Base

    • Fixed an issue switching back from template to Empty Table did not clear the fields

    • Data linked from an externally managed project can be added to Base Tables

    • Fixed an issue in the graphical UI where schema definition does not scroll correctly when many columns are defined

2023 October 3 - ICA v2.19.0

Features and Enhancements

  • Data Management/API

    • Added a new endpoint available to change project owner

      • POST /api/projects/{projectId}:changeOwner with body { "newOwnerId": "..." }

    • Added a new endpoint to copy data from one project to another:

      • /api/projects/{projectId}/projectDataCopyBatch

  • Data Management/CLI

    • Added the ability to copy files and folders between projects in the UI and CLI. This includes support for copying data from projects with ICA-managed storage (default) to projects with S3-configured storage.

  • Flow/API

    • When starting an analysis via the API, you can specify the input files based on HTTP(s). When your analysis is done, you will see the URL corresponding to the inputs in the UI, but you will not be able to start an analysis from the UI using this URL

    • Added two new endpoints for workflow sessions:

      • Get /api/projects/{projectId}/workflowSessions

      • Get /api/projects/{projectId}/workflowSessions/{workflowSessionId}/inputs

    • Added a new endpoint to retrieve configurations from a workflow session

  • Flow/CLI

    • Duplicate analyses submitted via the CLI will be avoided

  • Flow

    • Removed the ability to start analyses from data and sample views in the UI where a single input is selected to start analyses in bulk

    • Flow/Autolaunch: ICA Workflow Sessions and Orchestrated Analyses (launched by the workflow session) now save outputs in an organized folder structure: /ilmn-analysis/<name_used_to_create_sequencer_run_output_folder>

  • Base

    • The Base module has a new feature called ‘Data Catalogue’. This allows you to add usage data from your tenant/project if that data is available for you.

      • Data Catalogue views will be available and can be used in Base to query on

      • You will be able to preview and query Data Catalogue views through Base Tables and Query screens

      • The Data Catalogue will always be up to date with the available views for your tenant/project

      • Data Catalogue views cannot be shared through a Bundle

      • Data Catalogue views will also be available to team members that were added after the view was added

      • Data Catalogue views can be removed from the Base tables and corresponding project

      • By removing Base from a project, the Data Catalogue will also be removed from that project

  • Cohorts: Disease Search box

    • Cohorts now includes a disease search box to search for disease concepts. This replaces the disease concept tree explorer

    • Disease search box located under a Disease tab in main Query builder

    • Search box allows for a copy/paste action of codes to be processed as separate query elements. Currently, the feature is limited to a complete valid list

    • Each disease entered into the search box is displayed as a separate query item and can be set to include or exclude.

    • Diseases in search box can be used with boolean logic in cohort creation

    • Search box allows for an auto-complete of diagnosis concepts and identifiers

    • The disease filter is included in the cohort query summary on cohort page

Fixed Issues

  • Data Management

    • Data copy between ICA-managed projects and S3 storage configured projects is supported

    • Fixed an issue where storage configurations matching ICA-managed buckets would cause volume records to get associated with the wrong storage configuration in the system

  • API

    • The endpoint GET/api/projects/{ProjectID}/samples/{SampleID} correctly returns both the project's own samples and linked samples

    • Improved handling of bulk update via API when concurrent deletion of file has occurred

  • CLI

    • Fixed an issue where projectdata update tags would not update the tags

    • Fixed an issue to support adding the server-url as a parameter instead of having the config set

  • Flow

    • Fixed an issue where a failure to send a notification resulted in a failed workflow

    • Fixed an issue where one workflow session may override another when both are executed at the same time

  • Base

    • Fixed an issue where query download in JSON format returns an error

    • Added a message in the UI when a query takes longer than 30 seconds to inform the user that the query is ongoing and can be monitored in the Activity view

    • Added a section describing the Data Catalogue functionality

  • Bench

    • Fixed an issue where resizing the workspace to current size would prevent users from resizing for the next 6 hours

  • Cohorts

    • Fixed an issue where Gene Expression table does not display with TCGA data or for tenants with a hyphen (e.g., ‘genome-group’)

    • Fixed an issue where user had no way to delete a cohort comparison from a deleted cohort

    • Fixed an issue in the UI where multi-cohort needle plot tracks are overlapping

    • Fixed an issue causing failures during the annotation step with ‘CNV’ data type when selecting ‘GB=hg19’ and ‘CNV data’ for liftover; also observed with ‘SM data’ and ‘hg38’ without liftover (in the APS1 and CAC1 regions), due to a ‘404 Not Found’ error.

2023 September 14 - ICA v2.18.4

Fixed Issue

  • Fixed an issue uploading folders via the CLI

2023 September 8 - ICA v2.18.3

Fixed Issue

  • Fixed an issue causing CWL pipelines using Docker images that do not contain bash shell executable to fail.

2023 September 7 - ICA v2.18.2

Fixed Issue

  • Fixed an issue leading to intermittent system instability.

2023 September 6 - ICA v2.18.1

Fixed Issue

  • Cohorts

    • Issue fixed where GTEx plot is not available for tenants with a hyphen (e.g. ilmn-demo).

2023 August 31 - ICA v2.18.0

Features and Enhancements

  • General

    • Versioning: The ICA version can now be found under your user when you select "About"

    • Versioning/API: It is possible to retrieve system information about ICA, such as the current version through GET/api/systeminfo

    • Logging: When an action is initiated by another application, such as BaseSpace Sequence Hub, it will be traced as well in the ICA logs

  • Data Management

    • New API endpoints are available for:

      • Creation of a data update in bulk: POST/api/projects/{projectId}/dataUpdateBatch

        • A list of data updates for a certain project: GET/api/projects/{projectId}/dataUpdateBatch/{batchId}

        • A list of items from the batch update: GET/api/projects/{projectId}/dataUpdateBatch/{batchId}/items

        • A specific item from the batch update: GET/api/projects/{projectId}/dataUpdateBatch/{batchId}/items/{itemId}

          Note: Batch updates include tags, format, date to be archived and date to be deleted

  • Data Management/API

    • The sequencing run information can be retrieved through its Id by using the API endpoint GET/api/sequencingRuns/{sequencingRunId}

  • Flow:

    • Auto launch now supports BCL Convert v3.10.9 pipeline and both TruSight Oncology 500 v2 pipelines (from FASTQs)

    • Removed "fpga-small" from available compute types. Pipelines using "fpga-small" will use the "fpga-medium"-equivalent compute specifications instead

    • Analyses launched/tracked by BaseSpace Sequence Hub contain relevant BaseSpace information in analysis details view

  • Flow/API

    • getPipelineParameters API returns parameter type in response

    • Added endpoints to retrieve and update a project pipeline definition

    • New API endpoint available to request the analyses in which a sample is being used

    • When leaving activationCodeDetailId empty when starting an analysis, the best match activation code will be used

  • Flow/API/CLI

    • Include "mountPaths" field in response for API and CLI command to retrieve analysis inputs

  • API

    • Two new API endpoints added to accept Terms and Conditions on a bundle:

      • GET /api/bundles/{bundleId}/termsOfUse/userAcceptance/currentUser returns the time at which you, the current user, accepted the Terms & Conditions.

      • POST /api/bundles/{bundleId}/termOfUse:accept

    • Add temporary credentials duration to API documentation

  • Notifications

    • The list of events to which you can subscribe now contains a new ICA notification for analysis updates

  • Bench

    • A new Bench permission is being introduced: Administrator. This permission allows users to manage existing workspaces and create new workspaces

    • The Bench Administrator role allows you to create new Bench workspaces with any permissions even if you as a Bench administrator do not have these permissions. In that case, you can create and modify the workspace, but you cannot enter that workspace. Modifying is only possible when the workspace is stopped

    • As a Bench Contributor, you are no longer allowed to delete a Bench Workspace; this requires the Bench Administrator role.

  • Cohorts

    • Users can now ingest raw DRAGEN bulk RNAseq results for genes and transcripts (TPM), with the option to precompute differential expression during ingestion

    • Added support for running multiple DEseq2 analyses in the ingestion workflow through bulk processing based on sample size and specific requirements

    • In multiple needle plot view, individual needle plots can now be collapsed and expanded

    • Pop-outs for needle plot variants now contain additional links to external resources, such as UCSC

    • For a given cohort, display a distribution of raw expression values (TPM per gene) for selected attributes

    • Cohorts maintains the session between core ICA and the Cohorts iFrame to prevent unwanted timeouts

    • Cohorts displays structural variants that include or overlap with a gene of interest

Fixed Issues

  • General

    • Collaboration: Fixed an issue where a user is presented with a blank screen when responding to a project invitation

  • Data Management/API

    • Improved error handling for API endpoint: DELETE /api/bundles/{bundleId}/samples/{sampleId}

    • Fixed an issue where the API endpoint GET /api/samples erroneously returned a 500

    • API endpoint GET/api/projects/{projectId}/analyses now returns the correct list when filtering on UserTags whereas it previously returned too many

    • Improved retry mechanism for API endpoint to create folderuploadsession

  • Data Management/CLI

    • When an upload of a folder/file is done through the CLI, it returns the information and ID of the folder/file

  • Data Management

    • CreatorId is now present on all data, including subfolders

    • Improved external linking to data inside ICA using deep linking

    • Improved error handling when creating folders with invalid characters.

    • Fixed an inconsistency for URN formats on output files from Analyses. This fix will apply only for analyses that are completed starting from ICAv2.18.0

    • Improved resilience in situations of concurrent linking and unlinking of files and folders from projects

    • It is only possible to delete a storage configuration if all projects that are using this storage configuration have been hidden and are not active projects anymore

    • Improved accuracy of the displayed project data size. Prior cost calculations were accurate, but the project data size visualization included technical background data

    • Fixed an issue where there is a discrepancy in number of configurations between Storage->Configurations and Configurations-> Genomics.Byob.Storage Configuration view

  • Flow/API

    • Improved error handling when invalid project-id is used in API endpoint GET /api/projects/{projectId}/pipelines

    • Fixed an issue where, when an Analysis completed with the error "incomplete folder session", the outputs of the Analysis were not always completely listed in the data listing APIs

    • Updated ICA Swagger Project > createProject to correctly state that the analysis priority must be in uppercase

  • Flow

    • When a spot instance is configured, but revoked by AWS, the pipeline will fail and exit code 55 is returned

    • Fix to return meaningful error message when instrument run ID is missing from Run Completion event during an auto launched analysis

    • Improved parallel processing of the same analysis multiple times

  • Base

    • Improved error handling when creating queries which use two or more fields with the same name. The error message now reads "Query contains duplicate column names. Please use column alias in the query"

    • Fixed an issue where queries on tables with many entries fail with NullPointerException

  • Bench

    • Clarified that changes to Bench workspace size only take effect after a restart

  • Cohorts

    • Fixed issue where counts of subjects are hidden behind attribute names

    • Fixed issue where the state of checked files is not retained when selecting molecular files that are in multiple nested folders

    • Fixed issue where projects that contain files from linked bundles cause a time out, resulting in users not being able to select files for ingestion

    • Fixed an issue where the 'Import Jobs' page loaded within the Data Sets frame, depending on where the import was initiated

    • Fixed an issue in the Correlation plot where x-axis counts were hidden under attribute names

    • Fixed an issue where users were previously incorrectly signed out of their active sessions

2023 August 3 - ICA v2.17.1

Fixed Issues

  • Fixed an issue causing analyses requesting FPGA compute resources to experience long wait times (>24h) or not be scheduled

2023 June 27 - ICA v2.17.0

Features and Enhancements

  • Data Management

    • Performance improvements for data link and unlink operations – Larger and more complex folders can now be linked in the graphical UI, and progress can be monitored with a new visual indication under Activity > Batch Jobs

  • Notifications

    • Notifications are now available for batch job changes

  • Flow

    • Increased the allowed Docker image size from 10GB to >20GB

    • CWL: Added support for JavaScript expressions in “ResourceRequirements” fields (i.e., type, size, tier, etc.) in CWL Pipeline definitions

  • Flow/API

    • Added support for using Pipeline APIs to query Pipelines included in Entitled Bundles (i.e., to retrieve input parameters)

    • Added support for providing S3 URLs as Pipeline data inputs when launching via the API (using storage credentials)

    • Added support for specifying multi-value input parameters in a Pipeline launch command

  • Bench

    • Project and Tenant Administrators are now allowed to stop running Workspaces

  • Cohorts

    • Enhanced ingestion workflow to ingest RNAseq raw data from DRAGEN output into backend Snowflake database

    • Added support for running multiple DEseq2 analyses in the ingestion workflow through bulk processing based on sample size and specific requirements

    • Multi-Cohort Marker Frequency - Added Multi-Cohort Marker Frequency tab allowing users to compare expression data across up to four Cohorts at the gene level

    • Multi-Cohort Marker Frequency includes a pairwise p-value heat map

    • Multi-Cohort Marker Frequency - Includes frequencies for Somatic and Copy Number Variants

    • Tab added for a multi-cohort marker frequency analysis in cohort comparisons

    • Multi-Cohort Needle Plot - Added new tab in the Comparison view with vertically aligned needle plots per cohort for a specified gene, allowing collapsible and expandable individual needle plots

    • Additional filter logic added to multi-cohort needle plot

    • Improved DRAGEN data type determination during ingestion allowing for multiple variant type ingestion

    • Enhanced list of observed variants with grouped phenotypes and individual counts, including a column for total sample count; tooltips/pop-outs provide extended information

    • Updates to needle plot link outs

    • Improved the Comparison feature by optimizing API calls to handle subjects with multiple attributes, ensuring successful loading of the page and enabling API invocation only when the user selects or expands a section

    • Removed unused columns (genotype, mrna_feature_id, allele1, allele2, ref_allele, start_pos, stop_pos, snp_id) from annotated_somatic_mutations table in backend database

    • Refactored shared functionality for picking consequence type to reduce code duplication in PheWAS-plot and GWAS-plot components

    • Invalid comparisons on the Comparisons page are now grayed out and disabled. This improvement prevents the selection of invalid options

    • Automatic retry of import jobs when there are failures accessing data from ICA API

Fixed Issues

  • General

    • Navigation: Removed breadcrumb indication in the graphical UI

  • Data Management

    • The content of hidden Projects can now be displayed

    • Fixed the TimeModified timestamp on files

    • Bundles: Resolved issues when linking a large number of files within a folder to a Bundle

  • Flow

    • Single values are now passed as a list when starting an Analysis

    • Pipelines will succeed if the input and output formats specified on the pipeline level match at the Tool level

    • Fixed an issue causing Analysis failures due to intermittent AWS S3 network errors when downloading input data

    • CWL: Improved performance on output processing after a CWL Pipeline Analysis completes

    • Flow/UI: Mount path details for Analysis input files are now visible

    • Flow/UI: Improved usability when starting an Analysis by filtering entitlement options based on inputs selected and available entitlements

  • Flow/API

    • List of Analyses can now be retrieved via the API based on filters for UserReference and UserTags

  • Base

    • Fixed an issue where the Scheduler continues to retry uploading files which cannot be loaded

  • Bench

    • Resolved an issue when attempting to access Workspaces with multi-factor authentication (MFA) enabled at the Tenant-level

  • API

    • Improved error messaging for POST /api/projects/{projectId}/data/{dataId}:scheduleDownload

  • Cohorts

    • Fixed issue where Correlation bubble plot not showing for any projects intermittently

    • Fixed issue where importing Germline/hg19 test file did not load variants for a specific gene in the Needle plot due to missing entries in the Snowflake table

    • Fixed a bug causing an HTTP 400 error while loading the Cohort for the second time due to the UI passing "undefined" as variantGroup, which failed to convert to the VariantGroup Enum type

    • Fixed issue where scale (y-axis) of needle plot is changed even if value of sample count gnomAD frequency is not accepted

    • Fixed an issue where no data was generated in the Base Tables after a successful import job in Canada - Central Region (CAC1)

    • Fixed issue where long chart axis labels overlap with tick marks on graph

2023 May 31 - ICA v2.16.0

Features and Enhancements

  • General

    • Navigation: Updated URLs for Correlation Engine and Emedgene in the waffle menu

    • Authentication: Using POST /api/tokens:refresh for refreshing the JWT is not possible if it has been created using an API-key.

    • Authentication: Improved error handling when there is an issue reaching the authentication server

    • Authentication: Improved usability of "Create OAuth access token" screen

  • Data Management

    • You can now select 'CYTOBAND' as format after file upload

    • Added support for selecting the root folder (of the S3 bucket) for Projects with user-managed storage

    • Added support for creating an AWS Storage Configuration with an S3 bucket with Versioning enabled

  • Auto-launch

    • Added technical tags for upstream BaseSpace Run information to auto-launched analyses

    • Added support for multiple versions of BCL Convert for auto-launched analyses

  • Flow

    • Added support for '/' as separator in CWL ResourceRequirements when specifying Compute Type

  • Flow/API

    • The API to retrieve analysis steps now includes exit code for completed steps

  • Bench

    • Bench Workspaces (Open or Restricted) always allow access to Project Data from within the Workspace

    • Restricted Bench Workspaces have limited internet access through whitelisted URLs that are checked before entry

    • Bench Workspaces can be created as Open or Restricted. Restricted Workspaces do not have access to the internet except for user-entered whitelisted URLs

Fixed Issues

  • Data Management

    • Upload of file names including spaces is now consistent between connector and browser upload. We still advise against using spaces in file names in general

    • Fixed search functionality in Activity > Data Transfers screen

    • Improved performance on opening samples

    • Fixed an issue where reference data in download tab initiates an unexpected download

    • Fixed intermittent issue where the Storage configuration within a Project can go into Error status and can block users from creating records such as folders and files

    • Service Connector: Improved error message for DELETE/api/connectors/{connectorId}/downloadRules/{downloadRuleId}

  • Data Management/API

    • Improved error handling for API endpoints: DELETE /api/projects/{projectId}/bundles/{bundleId} and POST /api/projects/{projectId}/bundles/{bundleId}

    • Improved error handling for POST/api/projects/{projectId}/base:ConnectionDetails

  • Bundles

    • Fixed an issue where the Table view in Bundles is not available when linking to a new Bundle version

    • Fixed an issue where linking/unlinking a Bundle with Base Tables could result in errors

  • Bundles/API

    • Improved error handling for DELETE/api/bundles/{bundleId}/tools/{toolId} and POST/api/bundles/{bundleId}/tools/{toolId}

    • Improved error message for POST/api/bundles/{bundleId}/samples/{sampleId}

  • Notifications/API

    • Custom subscriptions with empty filter expressions will not fail when retrieving them via the API

    • Improved error handling for POST/api/projects/{projectId}/notificationSubscriptions

    • Improved notification for Pipeline success events

  • Flow

    • When the input for a pipeline is too large, ICA will fail the Analysis and will not retry

    • Fixed issue where analysis list does not search-filter by ID correctly

    • Improved error handling when issues occur with provisioning resources

    • When retry succeeds in a Nextflow pipeline, exit code is now '0' instead of '143'

  • Flow/API

    • Fixed an issue causing API error when attempting to launch an Analysis with 50,000 input files

    • Improved pipeline error code for GET/api/projects/{projectId}/pipelines/{pipelineId} when already unlinked pipeline Id is used for API call

    • Fixed an issue where Analyses could not be retrieved via API when the Pipeline contained reference data and originated from a different tenant

    • Fixed filtering analyses on analysisId. Filtering happens via exact match, so part of the Id won't work

  • Bench/CLI

    • Fixed issue where the latest CLI version was not available in Bench workspace images

  • Cohorts

    • Fixed an issue where CNV data converted from hg19 to hg38 do not show up in Base table views

    • Fixed an issue accounting for multiple methods of referring to the alternate allele in a deletion from Nirvana data

    • Fixed intermittent issue where GWAS ingestions were not working after Base was enabled in a project.

2023 May 2 - ICA v2.15.1

Fixed Issue

  • Fixed an issue causing incorrect empty storage configuration dropdown during Project creation when using the “I want to manage my own storage” option for users with access to a single region

2023 April 25 - ICA v2.15.0

Features and Enhancements

  • General

    • General availability of sequencer integration for Illumina sequencing systems and analysis auto launch

    • General usability improvements in the graphical interface, including improved navigation structure and ability to switch between applications via the waffle menu in the header

    • Storage Bundle field will be auto-filled based on the Project location that is being chosen if multiple regions are available

    • Event Log entries will be paged in the UI and will contain a maximum of 1,000 entries. Exports are limited to the maximum number of entries displayed on the page.

    • Read-only temporary credentials will be returned when you are not allowed to modify the contents of a file

    • The ICA UI will only allow selection of storage bundles belonging to ICA during Project creation, and the API will only return storage bundles for ICA

  • Notifications

    • Creating Project notifications for BaseSpace externally managed projects is now supported

  • Flow

    • Allow attached storage for Pipeline steps to be set to 0 to disable provisioning attached storage and improve performance

  • Cohorts

    • GRCh37/hg19-aligned molecular data will get converted to GRCh38/hg38 coordinates to facilitate cross-project analyses and incorporating publicly available data sets.

  • API

    • Project list API now contains a parameter to filter on (a) specific workgroup(s)

    • Two new API endpoints are added to retrieve regular parameters from a pipeline within or outside of a Project context

Fixed Issues

  • General

    • Optimized price calculations resulting in less overhead and logging

    • Improved error handling:

      • during Project creation

      • of own storage Project creation failures.

      • to indicate connection issue with credential

      • for graphical CWL draft Pipelines being updated during an Analysis

    • Improved error messaging in cases where the AWS path contains (a) special character(s)

    • Fixed an issue causing errors when navigating via deep link to the Analysis Details view

  • Data Management

    • Fixed an issue causing data records to remain incorrectly in Unarchiving status when an unarchive operation is requested in the US and Germany regions

  • API

    • Fixed an issue where GET /api/projects/{projectId}/data returned unlinked data for a sample it was previously linked to

    • Fixed error for getSampleCreationBatch when using status filter

  • CLI

    • Unarchive of folders is supported when archive or unarchive actions are not in progress for the folder

    • Improved error message to indicate connection issue with credentials

  • Flow

    • Fixed an issue causing incorrect naming of Analysis tasks generated from CWL Expression Tools

    • Fixed an issue when cloning Pipelines linked from Entitled Bundles to preserve the original Tenant as the Owning Tenant of the cloned Pipeline instead of the cloning user’s Tenant

    • Fixed an issue causing outputs from CWL Pipelines to not show in the Analysis Details despite being uploaded to the Project Data Analysis output folder when an output folder is empty

    • When a Contributor starts an Analysis, but is removed afterwards, the Analysis still runs as expected

    • Fixed an issue where Analyses fail when Nextflow is run a second time

    • Fixed an issue causing API error when attempting to launch an Analysis with up to 50,000 input files

    • Fixed an issue causing degraded performance in APIs to retrieve Analysis steps in Pipelines with many steps

    • Fixed an issue causing Analysis failure during output upload with error “use of closed network connection”

    • Fixed an issue causing the disk capacity alert log to not show when an Analysis fails due to disk capacity and added an error message

    • Fixed an issue preventing cross-tenant users from being able to open a shared CWL pipeline

  • Base

    • Improved target Table selection for schedulers to be limited to your own Tables

  • Bench

    • Fixed an issue causing Workspaces to hang in the Starting or Stopping statuses

  • Cohorts

    • Now handles large VCFs/gVCFs correctly by splitting them into smaller files for subsequent annotation by Nirvana

2023 March 28 - ICA v2.14.0

Features and Enhancements

  • General

    • Added a limit to Event Log and Audit UI screens to show 10,000 records

  • API

    • Parent output folder can be specified in URN format when launching a Workflow session via the API

  • Flow

    • Reduced Analysis delays when system is experiencing heavy load

    • Improved formatting of Pipeline error text shown in Analysis Details view

    • Users can now start Analyses from the Analysis Overview screen

    • Superfluous “Namespace check-0” step was removed to reduce Analysis failures

    • Number of input files for an Analysis is limited to 50,000

    • Auto launched Workflow sessions will fail if duplicate sample IDs are detected under Analysis Settings in the Sample Sheet

  • Base

    • Activity screen now contains the size of the query

  • Cohorts

    • Detect and Lift Genome Build: Cohorts documentation provides set-up instructions to convert hg19/GRCh37 VCFs to GRCh38 before import into Cohorts.

    • Attribute Queries: Improved the user experience choosing a range of values for numerical attributes when defining a cohort

    • Export Cohort to ICA Project Data: Improved the user experience exporting list of subjects that match cohort definition criteria to their ICA project for further analysis

    • Ingest Structural Variants into database

      • The Cohorts ingestion pipeline supports structural variant VCFs and will deposit all such variants into an ICA Base table if Base is enabled for the given project

      • Structural variants can be ingested and viewed in base tables

    • Needle Plot Enhancements

      • Users can input a numerical value in the Needle Plot legend to display variants with a specific gnomAD frequency percentage or sample count

      • The needle plot combines variants that are observed among subjects in the current project as well as shared and public projects into a single needle, using an additional shape to indicate these occurrences

      • Needle Plot legend color changes for variant severity: pathogenic color coding now matches the color coding in the visualization, proteins and variants are differentiated by hue, and other color coding changes were made.

      • Needle plot tool tips that display additional information on variants and mutations are now larger and modal

      • The needle plot now allows filtering by gnomAD allele frequency and sample count in the selected cohort. Variants include links to view a list of all subjects carrying that variant and export that list.

    • Remove Samples Individually from Cohorts

      • Exclude individual subjects from a cohort and save the refined list

      • The subjects view allows users to exclude individual subjects from subsequent analyses and plots and save these changes. Subject exclusions are reset when editing a cohort

    • Subject Selection in Analysis Visualization: Users can follow the link for subject counts in the needle plot to view a list of subjects carrying the selected variant or mutation.

    • UI/UX: Start and End time points are available as a date or age with a condition attribute in the subject data summary screen.

Fixed Issues

  • General

    • Improved resilience against misconfiguration of the team page when there is an issue with Workgroup availability

    • Removed ‘IGV (beta)’ button from ‘View’ drop down when selecting Project Data in UI

  • Data Management

    • Improved handling of multi-file upload when system is experiencing heavy loads

    • Fixed an issue to allow upload of zero-byte files via the UI

    • Fixed issue where other Bundles would not be visible after editing and saving your Bundle

  • API:

    • Improved error handling for API endpoint: POST /api/projects/{projectId}/analysisCreationBatch

    • Improved performance of API endpoint: getbestmatchingfornextflow

  • Flow

    • Fixed an issue causing Analysis output mapping to incorrectly use source path as target path

    • Fixed an issue where the UI may display incorrect or invalid parameters for DRAGEN workflows which do not accurately show the true parameters passed. Settings can be confirmed by looking at the DRAGEN analysis log files.

  • Base

    • “Allow jagged rows” setting in the Scheduler has been replaced with “Ignore unknown values” to handle files containing records with more fields than there are Table columns

    • Improved Base Activity view loading time

    • Fixed an error message when using the API to load data into a Base Table that has been deleted

  • Bench

    • Fixed an issue resulting in incorrect Bench compute pricing calculations

    • Fixed an issue preventing building Docker images from Workspaces in UK, Australia, and India regions

    • Fixed an issue where /tmp path is not writeable in a Workspace

  • Cohorts

    • Fixed issue where the bubble plot sometimes failed to display results even though the corresponding scatter plot showed data correctly.

    • The order of messages and warnings for ingestion jobs was not consistent between the UI and an error report sent out via e-mail.

    • The UI now displays any open cohort view tabs using shortened (“…”) names where appropriate

    • Issue fixed where ingestions with multiple errors caused the ingestion queue to halt.

    • The needle plot sometimes showed only one source for a given variant as opposed to all projects in which the variant had been observed.

    • Issue fixed with unhandled genotype index format in annotation file to base database table conversion

    • Status updates via e-mail sometimes contained individual error messages or warnings without a text.

    • Fixed issue where items show in needle plot with incorrect numbering on the y-axis.

    • Fixed performance issue with subject count.

    • Fixed issue where widget bar-chart counts were intermittently cut off above four digits.

    • Fixed slowness when switching between tabs in query builder

2023 March 23 - ICA v2.13.2

Fixed Issue

  • Fixed issue with BaseSpace Free Trial and Professional users storing data in ICA

2023 March 9 - ICA v2.13.1

Fixed Issue

  • Fixed an issue resulting in analysis failures caused by a Kubernetes 404 timeout error

2023 February 28 - ICA v2.13.0

Features and Enhancements

  • General

    • Each tenant supports a maximum of 30,000 Projects

    • .MAF files are now recognized as .TSV files instead of UNKNOWN

    • Added VCF.IDX as a recognized file format

    • General scalability optimizations and performance improvements

  • API

    • POST /api/projects/{projectId}/data:createDownloadUrls now supports a list of paths (in addition to a list of IDs); see the sketch below
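
A minimal sketch of calling this endpoint from Python with the requests library. The base URL, the authentication header name, and the request body field name ("paths") are assumptions for illustration only and should be checked against the API reference for your region.

    # Sketch: request download URLs for project data by path.
    # Header and body field names below are assumptions, not confirmed by these notes.
    import requests

    ICA_BASE = "https://ica.illumina.com/ica/rest"   # assumed base URL
    PROJECT_ID = "<project-id>"
    API_KEY = "<api-key>"                            # assumed X-API-Key authentication

    resp = requests.post(
        f"{ICA_BASE}/api/projects/{PROJECT_ID}/data:createDownloadUrls",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"paths": ["/run1/sample1.fastq.gz"]},  # a list of data IDs is also accepted
    )
    resp.raise_for_status()
    print(resp.json())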

Fixed Issues

  • General

    • Fixed an issue preventing the ‘Owning Project’ column from being used outside of a Project

    • Fixed an issue allowing the region of a Project to be changed. Changing the region of a resource is not supported

    • Strengthened data separation and improved resilience against cross-Project metadata contamination

  • Bundles

    • After creating a new Bundle the user will be taken to the Bundle Overview page

  • Data Management

    • Fixed an issue which prevented changing the format of a file back to UNKNOWN

    • Fixed an issue causing inaccurate upload progress to be displayed for UI uploads. The Service Connector or CLI are recommended for large file uploads.

    • Fixed an issue showing an incorrect status for data linking batch jobs when data is deleted during the linking job

    • Service Connector: Fixed an issue allowing download of a Service Connector when no operating system is set

    • Service Connector: Cleaned up information available on Service Connectors by removing empty address information fields

  • API

    • Fixed date formatting for GET /api/eventLog (yyyy-MM-dd’T’HH:mm:ss.SSS’Z’)

    • Fixed an issue where the GET users API was not case sensitive on email address

    • Fixed an issue causing the metadata model to be returned twice in POST /api/projects/{projectId}/samples:search

    • Fixed the listProjects API 500 response when using the pageoffset query parameter

    • The searchProjectSamples API returns Sample metadata for Samples shared via a Bundle

    • Fixed an issue causing createProjectDataDownloadUrls API 400 and 502 errors when server is under load

  • Flow

    • Fixed analysis failures caused by kubernetes 404 timeout error

    • Fixed an issue where Workflows would prematurely report completion of an Analysis

    • Improved Pipeline retry logic to reduce startup delays

    • Fixed an issue where Nextflow pipelines were created with empty files (Nextflow config is allowed to be empty)

    • Removed the 1,000 input file limitation when starting an Analysis

    • Improved the performance of status update messages for pipelines with many parallel steps

    • Fixed an issue with overlapping fields on the Analysis Details screen

    • Deactivated the Abort button for Succeeded analyses

  • Base

    • Fixed an issue where Pipeline metadata was not captured in the metadata Table generated by the metadata schedule

    • Error logging and notification enhancements

  • Bench

    • Fixed an issue where Workspaces could be started twice

    • Fixed an issue where the system checkpoint folder was incorrectly created in Project data when opening a file in a Workspace

2023 February 13 - ICA v2.12.1

Features and Enhancements

  • Analysis system infrastructure updates

2023 January 31 - ICA v2.12.0

Features and Enhancements

  • Added ability to refresh Batch Jobs updates without needing to leave the Details screen.

  • Projects will receive a job queuing priority which can be adjusted by an Administrator.

  • The text "Only showing the first 100 projects. Use the search criteria to find your projects or switch to Table view." when performing queries is now displayed both on the top and bottom of the page for more clarity.

  • API: Added a new endpoint to retrieve download URLs for data: POST/api/projects/{projectId}/data:createDownloadUrls

  • API: Added support for paging of the Project Data/getProjectDataChildren endpoint to handle large amounts of data.

  • API: Added a new endpoint to deprecate a bundle (POST /api/bundles/{bundleId}:deprecate)

  • API: If the API client provides the request header "Accept-Encoding: gzip", then the API applies GZIP compression to the JSON response. This makes the response significantly smaller, which improves the download time of the response and results in faster end-to-end API calls (see the sketch after this list). In case of compression, the API also provides the header "Content-Encoding: gzip" in the response, indicating that compression was effectively applied.

  • Flow: Optimized Analysis storage billing, resulting in reduced pipeline charges.

  • Flow: Internal details of a (non-graphical) pipeline marked ‘Proprietary’ will not be shared with users from a different tenant.

  • Flow: A new grid layout is used to display Logs for Analyses with more than 50 steps. The classic view is retained for analyses with 50 steps or less, though you can choose to also use the grid layout by means of a grid button on the top right on the Analysis Log tab.

  • CLI: Command to launch a CWL and Nextflow Pipeline now contains the mount path as a parameter.

  • CLI: Version command now contains the build number.

  • CLI: Added support for providing the nextflow.config file when creating a Nextflow pipeline via CLI.

  • API: HTML documentation for a Pipeline can now be returned with the following requests:

    • GET /api/pipelines/{pipelineId}/documentation/HTML

    • GET /api/projects/{projectId}/pipelines/{pipelineId}/documentation/HTML

  • API: Added a new endpoint for creating and starting multiple analyses in batch: POST /api/projects/{projectId}/analysisCreationBatch

  • Flow: Linking to individual Analyses and Workflow sessions is now supported by /ica/link/project//analysis/ and /ica/link/project//workflowSession/

  • Cohorts: Users can now export subject lists to the ICA Project Data as a file.

  • Cohorts: Users can query their ingested data through ICA Base. For users who already have ingested private data into ICA Cohorts, another ingestion will need to happen prior to seeing available database shares. Customers can contact support to have previously ingested data sets available in Base.

  • Cohorts: Correlation bubble plot counts now link to a subject/sample list.
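
The gzip behavior described above can be illustrated with a short Python sketch using the requests library, which decompresses gzip-encoded bodies transparently; the base URL and the API-key header name are assumptions for illustration.

    # Sketch: ask the API to gzip-compress its JSON response.
    # requests decompresses transparently; Content-Encoding shows whether
    # the server actually applied compression.
    import requests

    ICA_BASE = "https://ica.illumina.com/ica/rest"   # assumed base URL
    API_KEY = "<api-key>"                            # assumed X-API-Key authentication

    resp = requests.get(
        f"{ICA_BASE}/api/pipelines",
        headers={"X-API-Key": API_KEY, "Accept-Encoding": "gzip"},
    )
    print(resp.headers.get("Content-Encoding"))      # "gzip" when compression was applied
    print(len(resp.content), "bytes after transparent decompression")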

Fixed Issues

  • Tooltip in the Project Team page provides information about the status of an invite

  • ‘Resend invite’ button in the Project Team page will become available only when the invite has expired, instead of from the moment the invite is sent out

  • Folders, subfolders and files all contain information about which user created the data

  • Files and folders with UTF-8 characters are not supported. Please consult the documentation on how to recover in case you have already used them.

  • Improved performance for creating or hiding a Project in a tenant with many Projects

  • Service Connector: Updated information in the Service Connector screen to reflect the name change from "Type of Files" to the more accurate "Assign Format"

  • Service Connector: Folders within a Bundle can be downloaded via the Service Connector

  • Service Connector: Upload rules can only be modified in the Project where they apply

  • Service Connector: A message describes when a file is skipped during upload because it already exists in the Project

  • Service Connector: Fixed an issue where opening the Connectivity tab occasionally results in a null pointer error

  • Service Connector: Fixed an issue causing excessive logging when downloading files with long file paths

  • Service Connector: Fixed an issue where the Service Connector log may contain spurious errors which do not impact data transfers

  • Existing storage configurations are displayed and accessible via API and UI

  • Newly added storage configurations no longer remain in ‘Initializing’ state

  • Fixed error when creating a storage configuration with more than 63 characters

  • Clicking on a Data folder in flat mode will now open the details of the folder

  • Only Tools in Released state can be added to a Bundle

  • Fixed issue preventing new Bundle versions from being created from Restricted Bundles

  • Deprecated Bundles are displayed upon request in card and table view

  • Bundles view limited to 100 Bundles

  • API: Fixed the API spec for ProjectDataTransfer.getDataTransfers

  • API: Fixed an issue with the projectData getChildren endpoint which returned incorrect object and pagination

  • API: Fixed an issue where multiple clicks on Create sample batch API endpoint resulted in multiple requests

  • API: POST /api/projects/{projectId}/data/{dataId}:scheduleDownload can now also perform folder downloads

  • API: Improved information on the Swagger page for GET /api/pipelines, GET/api/projects/{projectId}/pipelines, and GET/api/projects/{projectId}/pipelines/{pipelineId}

  • API: Fixed an issue where, when a user provides the same input multiple times to a multi-value input on an analysis run, that input was only passed to the pipeline once instead of multiple times: POST /api/projects/{projectId}/analysis:nextflow

  • CLI: Copying files in the CLI from a local folder on macOS to your Project can result in both the desired file and the metadata file (beginning with ‘._’) being uploaded. The metadata file can safely be deleted from the Project

  • CLI: Hardened protection against accidental file overwriting

  • CLI: Improved handling for FUSE when connection to ICA is lost

  • CLI: icav2 projectdata mount --list shows updated list of mounted Projects

  • CLI: Paging improvements made for project list, projectanalyses list, and projectsdata list

  • CLI: When there is no config or session file the user will not be asked to create one for icav2 config reset and icav2 config get

  • CLI: Fixed an issue where Bundle data could not be seen through FUSE in Bench

  • CLI: Fixed an error message when missing config file upon entering the Project context

  • CLI: The unmount is possible without a path and will work via the stored Project ID or with a folder path resulting in an unmount of that path

  • CLI: Fixed an error when creating a Pipeline using URN for Project identifier

  • CLI: Attempting to delete a file from an externally-managed project returns an error indicating this is not allowed

  • CLI: Fix to delete session file when config file is not detected

  • CLI: Paging option added to projectsamples list data

  • CLI: Fixed “Error finding children for data” error in CLI when downloading a folder

  • CLI: projectdata list now returns the correct page-size results

  • Flow: Fixed handling of special characters in CWL pipeline file names

  • Flow: Fixed an issue where task names exceeding 25 characters cause analysis failure in CWL pipelines

  • Flow: Fixed an issue which prevented requests for economy tier compute

  • Flow: Fixed an issue limiting CWL workflow concurrency to two running tasks

  • Flow: Fixed an issue where analysis file inputs specified in the input.json with ‘location’ set to an external URL caused CWL pipelines to fail

  • Flow: Fixed an issue resulting in out of sync Pipeline statuses

  • Flow: Improved Nextflow engine resiliency, including occurrences where Nextflow pipelines fail with ‘pod 404 not found’ error

  • Flow: Fixed an issue with intermittent system runtime failures incorrectly causing analysis failures

  • Flow: Fixed an issue where links to Analysis Details returned errors

  • Flow: Enabled scrolling for Pipeline documentation

  • Flow: Improved performance for handling analyses with large numbers of inputs

  • Flow: Improved handling of hanging Analyses

  • Flow: Improved error messages for failed Pipelines

  • Flow: Added documentation on how to use XML configuration files for CWL Pipelines

  • Flow: Duplicate values for multi-value parameters are no longer automatically removed

  • Flow: Correct exit code 0 is shown for successful Pipeline steps

  • Base: Fixed an issue so that only users with correct permissions are allowed to retrieve a list of Base tables

  • Base: Fixed an issue with metadata scheduler resulting in a null pointer

  • Base: An empty Table description will not return an error when requesting to list all Tables in a Project

  • Base: Jobs failed with an error containing 'has locked table' are not shown on the Base Job activity list. They can be displayed by selecting the 'Show transient failures' checkbox at Projects > Activity > Base Jobs.

  • Base: Users can see Schedulers and their results for the entire tenant if created by a tenant administrator in their project, but not create, edit or run them

  • Base: Fixed an issue preventing data format change in a schedule

  • Base: Fixed an issue preventing exporting data to Excel format

  • Bench: Improved handling to prevent multiple users in a single running Workspace

  • Bench: Fixed an issue causing Workspaces to be stuck in "Starting" state

  • Bench: Fixed an issue where usage was not showing up in the usage CSV-based report

  • Bench: Fixed an issue where Bundle data could not be seen via the Fuse driver

  • Bench: Users can now consistently exit Workspaces with a single click on the ‘Back’ button.

  • Bench: After leaving a Workspace by clicking on the ‘Back’ button, the Workspace will remain in a ‘Running’ state and become available for a new user to access

  • Bench: Workspaces in a ‘Stuck’ state can be manually changed to ‘Error’ state, allowing users to restart or delete them

  • Cohorts: Fixed issue where file system cleanup was not occurring after delete.

  • Cohorts: Fixed sign in and authentication issues in APN1 region.

  • Cohorts: Fixed issue with the gene filter when editing a cohort, removing the edited filter, and cancelling: the filter was preserved and should not have been.

  • Cohorts: Fixed issue where users see an application tile in the Illumina application dashboard selection screen called "Cohort Analysis Module".

  • Cohorts: Correlation: Fixed issue where the Data type selection was only partially shown when loading the search result

  • Cohorts: Fixed issue where users would see an application tile on the Connected Platform home page screen called “Cohort Analysis Module” if the Cohorts module was added to the domain. Users should not enter ICA Cohorts through this page; they should enter through ICA.


Connector gets connected, but uploads won’t start

Upload from shared drive does not work

Data Transfers are slow

Many factors can affect the speed:
  • Distance from upload location to storage location
  • Quality of the internet connection (hardlines are preferred over WiFi)
  • Restrictions for up- and download by the company or the provider
These factors can change every time the customer switches location (e.g., working from home).

The upload or download progress % goes down instead of up.

This is normal behavior. Instead of one continuous transmission, data is split into blocks so that whenever transmission issues occur, not all data has to be retried. This does result in dropping back to a lower % of transmission completed when retrying.

2025

2025 May 5 - ICA v2.35.0

New Features and Enhancements:

  • General

    • Introduced navigation changes to ICA:

      • The navigation menu is now collapsible.

      • The Product overview selection menu moved from top left to top right.

      • The back button has been replaced by a breadcrumb menu.

      • The Projects overview back button is now part of the breadcrumb menu.

    • Sorting and filtering parameters are now part of the URL, meaning they are retained when sharing a link, or when entering the details of an object in the grid and going back to the grid screen.

    • Moved the Export button to the left side of the footer of the grid as a small tertiary button for more consistency and ease of use.

    • Improved error message when a duplicate request is sent to create credentials via the API.

    • Improved link sharing so that the user is brought to the correct page after logging in.

    • When creating a project with only one storage bundle available, that storage bundle is now automatically selected.

  • Data Management

    • Made the contributor role for bundles more consistent by removing edit rights for details and terms of use from the role.

    • Bundle sharing is now limited to users with the bundle administrator role.

    • It is now possible to download one or more files from the data grid and details, even when the format has no viewer associated.

  • Flow

    • Pipelines now have an option to define patterns for detecting report files in the analysis output. On opening the analysis result window of this pipeline, an additional tab will display these report files.

    • New API endpoint POST /api/projects/{projectId}/analysis:nextflowWithCustomInput for starting a Nextflow pipeline which uses both YAML and JSON as custom input.

    • For advanced output mapping, the timelines and execution reports are now available for Nextflow pipelines.

    • Performance improvement in the response times for analyses with a high number of inputs.

    • Improved aborting of running analyses by faster process cancellation.

    • Analysis prices are rounded to two decimals for easier reading. More decimals are retained for calculations and invoicing.

  • Cohorts

    • Handling of the previously unrecognized consequence term protein_altering_variant during somatic mutation JSON ingestion to support TSO data inputs.

    • Users can now input a comma-separated list of up to 100 subject identifiers to filter cohorts directly by subject ID. This new filter is optional and integrates with existing filters using AND logic across filters and OR logic within the list. Subject counts update dynamically as IDs are added or removed.

    • Improved Needle Plot and Comparison Needle Plot filtering by allowing key variant filters (% gnomAD frequency, Sample Count, primateAI score) to be applied simultaneously and updating the layout for better usability and alignment.

    • Cohorts phenotypic data is available in Base tables after input of user data as data sets in Cohorts. Tables will also accompany a data set when shared through a bundle.

    • Replaced Angular Material tab components with ILMN standard tab components in the Cohorts UI.

    • Added missing consequence values for somatic variants from TSO.

    • The Cohorts reference database for PrimateAI now includes previously missing genes.

Fixed Issues

  • General

    • Users entering and leaving workspaces are now more accurately tracked in the workspace history.

    • Fixed an issue which caused problems when printing out milliseconds from the shell.

    • Fixed an issue which prevented storage credentials from being shared without external approval.

    • During Docker image creation, 'Name' and 'Version' are now mandatory to fill in.

    • Fixed an issue in the API so it now returns the error in the appropriate DTO format when an incorrect or invalid dockerDataProjectId is provided in the request body.

  • Data Management

    • Fixed an issue which caused users who already had permission to a project or bundle to still appear in the list of users without permissions when trying to add more users.

    • Improved user verification when modifying the project owner.

    • Fixed an issue where the old project owner was not granted upload and download permissions when a new project owner was chosen for their project.

    • Fixed an issue which resulted in error 400/500 when encountering data connection problems. Retries have been implemented, and if the operation still fails it will now show a 503 instead of a 400 or 500.

    • Improved error handling during storage configuration creation.

    • Fixed an issue where cloned pipelines would appear to override externally linked pipelines with the same name. Both are now present in the list of pipelines.

    • Fixed an issue which caused the ica_logs folder to not always be present.

    • Fixed an issue where sample metadata would not immediately be shown.

  • Flow

    • The project connector would incorrectly label samples from BSSH as being linked from another app, thus preventing their removal. This has been fixed.

    • Improved handling of database issues when analysis is stuck in progress.

  • Cohorts

    • Fixed issue where Firefox users could not export results.

    • Fixed issue where the [DELETE] /v1/study/{studyId} API returned a 500 error when deleting a valid and existing study.

    • Fixed issue where the spinner would remain indefinitely in the import wizard after a backend 500 error, instead of displaying an error message.

    • Fixed issue where using the dropdown for project information on a single subject was not working in Firefox.

    • Fixed an issue where modifying variant and phenotype category filters after searching in the PheWAS table caused the filters to stop applying to the table.

2025 March 26 - ICA v2.34.0

New Features and Enhancements:

  • General

    • A new Experimental Nextflow version has been made available (v24.10.2). Users will no longer be able to create new pipelines with Nextflow v20.10.0. In early 2026 ICA will no longer support Nextflow v20.10.0

    • Added an API endpoint to retrieve analysis usage details, exposing the analysis price. The UI now differentiates between errors and ongoing price calculations, displaying 'Price is being calculated' for pending requests instead of a generic error message

    • Made the project owner field read-only in the project details view and added a button in the Teams view to edit the project owner via a separate dialog

    • Autolaunch and BCLConvert now support dots and spaces in project names

  • Data Management

    • Users are now able to create non-indexed folders. These are special folders which cannot be accessed from the UI with some specific actions blocked (such as moving or copying those folders)

    • Enhanced visibility for data transfers by clearly marking those that do not match any download rule as 'ignored' in the UI. This helps users quickly identify transfers that won't start, preventing confusion and improving troubleshooting.

    • In bundles it is now possible to open the details for docker and tool images by clicking on the name in the overview

    • User managed storage configurations now allow for the copying of tags when copying/moving/archiving files and folders

  • Bench

    • For fast read/write access in Bench, you can now link non-indexed folders with the CLI command workspace-ctl data create-mount --mode read-write

    • Bench can now be started in a single-user mode allowing only one user to work in the workspace. All assets generated in bench (e.g. pipelines) are owned by the Bench user instead of a service account

    • UI Changes made to Workspace configuration and splash pages

Fixed Issues

  • General

    • Fixed an issue where labels for success, failure, and other item counts were missing in the Batch job details panel

    • Improved error message in the API when creating a new project with user managed storage

    • Fixed an issue where the Save button remained enabled when clicking on the Documentation tab in the Tool Repository

    • Fixed duplicate project detection to handle the new 400 error response format, ensuring consistency with other unique constraint violations

    • Changes made to advanced scheduler options that are not applicable any longer

    • Removed erroneous Link/Unlink buttons in Tool and Docker Images of shared bundles

    • Fixed issue where a project with a large number of analyses loaded slowly

  • Data Management

    • There will be a change in one of the upcoming releases where users are no longer able to edit connectors of other users through the API. This will be made consistent with the UI.

    • Fixed an issue where the sample list did not automatically refresh after deleting samples using the 'Delete input data and unlink other data' or 'Delete all data' options

    • Made display color of Bundle-related data more consistent

    • Removed on-click behavior of an added Cohorts dataset in a bundle that caused a yellow-bar warning

    • Fixed an issue preventing project creation with user managed storage when specifying a bucket name and prefix in the Storage Config without a subfolder

    • Fixed an issue where managing tags on data could result in a TransactionalException error, causing long load times and failed saves

    • When a project data download CLI command returned an error for a file, the command returned status 0, while it should have returned 1. This has now been fixed

    • Brought API in line with UI for detection of duplicate folder path already existing outside of your project

    • Fixed local version detection affecting automatic service connector upgrades

  • Flow

    • Improved error messaging when developing pipeline JSON based input forms

    • Fixed an issue where in some cases Nextflow logs are too big but are still copied into the notification, which causes the notification to fail. The log behind the 'Show more' button is now truncated to a size which is accepted by SQS

    • Updated API behavior for JSON-based CWL and Nextflow pipelines to prevent unintended rounding of 'number' fields with values greater than 15 digits. Added a warning to advise users to pass such values as strings to maintain precision (see the sketch after this list)

    • Each analysis step attempt is now recorded as a separate entry, ensuring accurate billing and providing end users access to stdout/stderr logs for every retry

    • Fixed an issue when retries or duplicate step names caused improper entity_id identification

    • Clicking 'Open in Data' from analysis details now correctly redirects users to the file's parent folder in the project data view instead of the root

    • Refreshing the pipeline/workflow detail view now correctly updates the UI to reflect the latest version, ensuring any changes are displayed

  • Base

    • Fixed issue where filtering columns on a number in Base activity produced an error

  • Bench

    • Improved protection against concurrent status changes when stopping workspaces

    • Added refresh button to Bench workspaces

    • Improved error handling when special characters are added to the storage size of bench workspaces

    • Made the behavior when running and stopping workspaces more consistent

    • Fixed an issue where the UI did not refresh automatically during long workspace initialization times, causing the workspace status to remain outdated until manually refreshed
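
A short illustration of why the rounding warning above exists: JSON consumers that map numbers to IEEE-754 doubles can only represent integers exactly up to 2^53, so values with more than about 15 significant digits get rounded, while a string keeps every digit. The field name in the sketch is illustrative only.

    # Why 'number' fields longer than ~15 digits are risky in JSON payloads.
    value = 12345678901234567          # 17 digits
    as_double = float(value)           # what a double-based JSON parser would store
    print(int(as_double))              # 12345678901234568 -- last digit rounded

    # Passing the value as a string preserves it exactly.
    import json
    print(json.dumps({"my_number_field": str(value)}))   # field name is illustrative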

2025 March 13 - ICA v2.33.2

Fixed Issues

  • Cohorts

    • Fixed an issue where users could not search the hierarchical disease concepts because of incorrect URL in the UI configuration.

2025 February 26 - ICA v2.33.0

New Features and Enhancements:

  • General

    • The current tab (e.g. analysis details, analysis steps, pipeline details, pipeline XML config, ...) is now saved in the URL, making the back button bring the user back to the tab they were in

    • Users are now able to go from the analysis data details to their location in the project data view

    • Toggling the availability of the "Acceptance list" tab in the legal view by a tenant admin used to be possible in the "Restrictions of monitoring" tab when editing the bundle. It has been moved to the legal tab

  • Data Management

    • New data formats available:

      • TRANSCRIPT: *.quant.sf, *.quant.sf.gz

      • GENE: *.quant.genes.sf, *.quant.genes.sf.gz

      • JSON.gz are now recognized as JSON format

    • New endpoint to create files: POST /api/projects/{projectId}/data:createFile

    • The endpoint POST /api/projects/{projectId}/data has been deprecated

    • The endpoint GET /api/projects/{projectId}/data/{dataId}/children now has more filters for more granular filtering

    • Users are now able to filter based on the owning project ID for the endpoint GET /api/projects/{projectId}/data

    • The links section in Bundle details and pipeline details now has proper URL validation and both fields are now required when adding links. In the case of editing an older links section of a bundle/pipeline, the user won't be able to save until the section is corrected

  • Flow

    • The cost of a single analysis is now exposed on its details page

    • Users can now abort analysis while being in the analysis detail view

    • '.command.*' files from Nextflow WorkDir are now copied into ICA logs

  • Base

    • Expanded the lifespan of Base OAuth token to 12h

  • Bench

    • Removed display of the current user using a bench workspace

  • Experimental Features

    • Streamable inputs for JSON-based input forms. Adding "streamable":true to an input field of type "data" makes it a streamable input.
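
A minimal sketch of such a field, built here as a Python dict and serialized to JSON; only "type": "data" and "streamable": true come from the note above, while the surrounding keys ("id", "label", "fields") are illustrative assumptions and may differ from the actual input form schema.

    # Sketch of a streamable data input in a JSON-based input form.
    import json

    input_field = {
        "id": "tumor_bam",       # assumed key, for illustration
        "label": "Tumor BAM",    # assumed key, for illustration
        "type": "data",
        "streamable": True,      # marks this data input as streamable
    }
    print(json.dumps({"fields": [input_field]}, indent=2))  # "fields" wrapper is assumed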

Fixed Issues:

  • General

    • Fixed an issue which would overgenerate event calls when an analysis would run into diskfull alert

    • Improved API error handling so that being unable to reach ICA storage will now result in error code 500 instead of error code 400

    • Added a full name field for users in various grids (Bench activity, Bundle share, ...) to replace the separate first and last name fields

    • In the event log, the event ICA_EXEC_028 is now shown as INFO; it was previously displayed as ERROR, which was not correct

  • Data Management

    • Fixed an issue which would result in a null-pointer error when trying to open the details of a folder which was in the process of being deleted

    • Fixed an issue with bundle invites, now making it clear that you can only re-invite someone to that bundle if the previously rejected invites are removed

    • Various improvements to hardening data linking

    • Fixed an issue where the folder copy job would throw an Access Denied error when copying a file with '_' in the path

    • Fixed an issue that would produce a false error notification when changing the format from the data details window

    • Fixed an issue where an out of order event for Folder Deleting and Deleted would occur in rare scenarios

    • Fixed an issue regarding a 'path too long' error for folder copy/move operations with a managed bucket source and destination

  • Flow

    • Improved API file handling to better handle post-processing when downloading the results from a successful analysis, which could previously result in a failed analysis being reported as the result

    • Fixed an issue which resulted in a null-pointer error when starting an XML based CWL pipeline with an input.json

    • Fixed an issue which caused user references with slashes to prevent errors in failed runs from being displayed

    • Fixed an issue where the value 0 was not accepted in pipeline's inputForm.json for fields of type number

    • Fixed an issue where users could not retrieve pipeline_runner.0 logs file while a pipeline is running

    • List fields in filter options are now saved if closing and reopening the filter panel

    • Fixed an issue where the start time of an analysis's step would be intermittently reported wrongly

    • Fixed an issue where retrieving outputs of an analysis through the API was not consistent between analyses with and without advanced output mapping

    • Improvements to the handling of large file uploads to prevent token expiry from blocking uploads

  • Base

    • Fixed an issue where a shared database would not be visible in project Base; this was fixed in the newer Snowflake version 9.3

  • Bench

    • Removed the rollback failed operations function on docker images as it had little to no benefit for end-users and frequently caused confusion

    • Fixed issue where users without proper permissions could create a workspace

  • Cohorts

    • Fixed issues where users doing large scale inputs of data received timeouts from the ICA API for file retrieval

    • Fixed issue with large OMOP data sets causing out of memory issues on input

    • Fixed issue where the 'Search Attributes' box in the 'Create Cohort' was not scrolling after typing a partial string.

    • Fixed issue with line-up of the exon values under exon track.

    • Fixed issue where subject attribute search box overlapped with other items when web browser zoom used.

    • Fixed issue where single subject view displayed concept codes and now shows concept names for diseases, drugs, procedures, and measurements.

2025 February 13 - ICA v2.32.2

Fixed Issues

  • Flow

    • Added retries for analysis process infrastructure provisioning to mitigate intermittent (~1%) CWL analysis failures. This impacts analysis steps failing with error "OCI runtime create failed" in logs.

2025 January 29 - ICA v2.32.0

Features and Enhancements

  • General

    • The End User License Agreement has been updated

    • New API endpoints for Docker Images management (see the sketch after this list):

      • GET /api/dockerImages

      • GET /api/dockerImages/{imageId}

      • POST /api/dockerImages:createExternal

      • POST /api/dockerImages:createInternal

      • POST /api/dockerImages/{imageId}:addRegions

      • POST /api/dockerImages/{imageId}:removeRegions

    • Split up the CWL endpoint (POST /api/projects/{projectId}/analysis:cwl) into two:

      • CWL analysis with a JSON input (POST /api/projects/{projectId}/analysis:cwlWithJsonInput)

      • CWL analysis with a structured input (POST /api/projects/{projectId}/analysis:cwlWithStructuredInput)

  • Data Management

    • Next to using the Teams page to invite other tenants to use your assets, a dedicated bundle-sharing feature is now available. This allows you to share assets while also shielding sensitive information from other users, such as who has access to these assets

    • Improved visibility of ongoing data move and copy actions in the UI

    • Users can now add/remove bundles in an externally managed project. It will not be possible to link a restricted Bundle to a project containing read-only, externally managed data

  • Flow

    • JSON based input form now has a built-in check to make sure a tree does not have any cyclical dependencies

    • Added commands for creation and start of CWL JSON pipelines in the CLI

    • Users can now input external data into JSON based input forms from the API

  • Bench

    • Bench workspaces can be used in externally managed projects

  • Cohorts

    • Users can now filter needles by customizable PrimateAI Score thresholds, affecting both plot and table variants, with persistence across gene views

    • The 'Single Subject View' now displays a summary of measurements (without values), with a link to the 'Timeline View' for detailed results under the section 'Measurements and Laboratory Values Available'
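
A hedged sketch of listing Docker images through the new GET /api/dockerImages endpoint (listed earlier in this section) with Python requests; the base URL and the API-key header name are assumptions, and the response schema is not shown because it depends on the API version.

    # Sketch: list Docker images via the new endpoint.
    import requests

    ICA_BASE = "https://ica.illumina.com/ica/rest"   # assumed base URL
    API_KEY = "<api-key>"                            # assumed X-API-Key authentication

    resp = requests.get(f"{ICA_BASE}/api/dockerImages",
                        headers={"X-API-Key": API_KEY})
    resp.raise_for_status()
    print(resp.json())   # inspect the returned structure for image IDs and regions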

Fixed Issues

  • General

    • Fixed an issue which caused authentication failures when using a direct link

    • Actions which are not allowed on externally-managed projects are now greyed-out instead of presenting an error when attempting to use them

    • Improved handling of regions for Docker images so that at least one region must remain. Previously, removing all regions would result in deleting the Docker image

    • Improved filtering out Docker images which are not relevant to the current user

    • Tertiary modules are no longer visible in externally-managed projects as they had no functional purpose there

    • Fixed an issue where adding public domain users to multiple collaborative workgroups would result in inconsistent instrument integration results

    • Added verification on the filter expressions of notification subscriptions

    • Fixed an issue where generating a cURL command with empty field values on the Swagger page resulted in invalid commands

    • Added information in the API Swagger page that the GET /api/projects/{projectId}/data endpoint cannot retrieve the list of files from a linked folder. To get this list, use parentfolderid instead of parentfolderpath

    • For consistency reasons, UUID has been renamed to ID in the GUI

    • The bundle administrator will now see all data present in the bundle, including all versions with older versions in a different color

  • Data Management

    • Removed deprecated cloud connector from Activity/Data transfers option

    • Removed the erroneous 'Import' option from the direction filter which was present in Activity/Data transfers

    • Fixed an issue where entering multiple Download rules for a Service connector would result in not setting the correct unique sequence numbers

    • Improved the error message when erroneously trying to link data from an externally-managed subject to a sample. This is not allowed because data can only be linked to a single sample

    • Fixed an issue where filtering on file formats was not correctly applied when selecting files and folders for downloads with the service connector

    • Improved the download connector to fix Chrome compatibility issues

    • Fixed an issue where it was possible to update linked files if you had access to both the original file and the linked file

    • Fixed an issue where samples from externally-managed projects were not correctly linked to analyses

  • Flow

    • Fixed a JSON validation error when attempting to have more than one default value for a field configured as single value which would result in index out of bounds error

    • Fixed an issue where numerical values with a scientific exponent would not be correctly accepted

    • Improved the API error validation for usage of duplicate group id fields

    • Improved error handling when starting analysis via API with an incorrect DATA-ID in the request body

    • Improved handling of incorrect field types for JSON-based input forms

    • Improved error handling when trying to use externally-managed data as reference data

    • Removed the superfluous "save as" button from the create pipelines screen

    • Fixed an issue where refreshing the analysis page would result in an error when more than 1 log file was opened

    • Upon clicking "start run" to launch a pipeline, ICA now redirects to the "Runs" view

    • Fixed an issue where the minimum and maximum numbers of high values were incorrectly rounded for JSON input forms

    • Fixed an issue where the user could pass a value with the "values" parameter instead of "dataValues" for the data field type

    • Fixed an issue which caused the "dataValues" parameter to be valid for the textbox field type instead of "values"

    • Improved timeout handling for autolaunch workflow

    • Fixed auto-launched TSO500 pipelines using the StartsFromFastq=false setting to direct analysis outputs to the ilmn-analyses folder within the ICA project data

    • Added JSON validation to ensure only a single radio button can be selected at the same time as part of a radio-type list

    • Removed the Simulate button from the View mode pipeline detail screen

    • The proprietary option can now be set via the CLI on create pipeline commands

    • Added a validation to prevent pipeline input forms from having the same value for 'value' and 'parent'

  • Bench

    • Fixed an issue which caused bench workspaces to have incorrect last modified timestamps that were over 2000 years ago. They now will use the correct last updated timestamp

    • Adding or removing regions to bench images is now possible

    • Improvements to how workspaces handle deletion

  • Cohorts

    • Fixed issue where the error message for invalid disease IDs did not disappear after selecting the correct ontology, and filter chips were incorrectly created as 'UNDEFINED'

    • Fixed issue where the search functionality in the ingestion file picker was not working correctly in production, causing a long delay and no files to display after entering a filename or folder name

    • Fixed issue where the Clinvar significance track was not resetting properly, causing the resized track and pointer to not return to the original position, with triangle data points displaying empty whitespace

    • Fixed issue where the 'PARTIAL' status for HPO filter chips was incorrectly removed when multiple chips were selected

    • Fixed issue where the pagination on the Variant List Needle Plot incorrectly displayed 741 items across 75 pages, causing a discrepancy with the actual number of displayed variants

    • The 'Search Attributes' box in the 'Create Cohort' page now properly scrolls and filters results when typing substrings, displaying matching results as the user types

    • Fixed issue where the search spinner continued loading after the search results were displayed in the Import Jobs table

    • Fixed issue where the 'stop_lost' consequence in Needleplot is corrected to 'Frameshift, Stop lost,' and the legend updated to 'Stop gained|lost.' The 'Stop gained' filter now excludes 'Stop lost' variants when the 'Display only variants shown in the plot above' toggle is on

    • Fixed issue where intermittent 500 error codes occurred on USE1 Prod when running Needleplot/VariantList queries with the full AGD/DAC dataset (e.g., LAMA1 gene query)

2024

2024 December 13 - ICA v2.31.2

Fixed Issues

  • When creating a new cohort, the disease filter’s tree hierarchy was not showing up, meaning it was not possible to add disease filter into the cohort definition. This has been resolved.

2024 December 12 - ICA v2.31.1

Fixed Issues

  • Flow

    • Fixed an issue which caused service degradation where analysis steps were not properly updated until analyses were finished, and log files only became available after completion.

2024 December 4 - ICA v2.31.0

Features and Enhancements

  • General

    • General usability improvements for the project overview screen

    • The timing for when jobs are deleted has been updated so that:

      • SUCCEEDED remains 7 days

      • FAILED and PARTIALLY_SUCCEEDED are increased to 90 days

  • Data Management

    • Data can now be uploaded into the BaseSpace-managed project

  • Flow

    • Analyses can now be started from the pipeline details screen

    • The analysis details now contain two additional tabs displaying timeline and execution reports for Nextflow analyses to aid in troubleshooting errors

    • Introduced a start command for starting a Nextflow pipeline with a JSON-based input form

    • Added new API endpoints to create a new CWL pipeline and start an analysis from a CWL pipeline with JSON-based input forms:

      • POST/api/projects/{projectId}/pipelines:createCwlJsonPipeline

      • POST/api/projects/{projectId}/analysis:cwlJson

    • Pipelines with JSON-based input forms can now pre-determine and validate storage sizes

    • Added support for tree structures in dropdown boxes on JSON-based input forms to simplify searching for specific values

    • Introduced a new filtering option on the analyses grid to enable filtering for values which differ from, or do not equal (!=), a given value (such as exit codes in the pipeline steps in the analysis details screen)

    • The analysis output folder format will now be user reference-analysis id

  • Cohorts

    • The side panel now displays the Boolean logic used for a query with ‘AND’, ‘OR’ notations

    • The needle plot visualization now drives the content of the variant list table below it. By default, the list displays variants in the visualization and can be toggled to display all variants with subsequent filtering

    • For diagnostic hierarchies, the concept children count and descendant count for each disease name are displayed

    • The measurement/lab value can be removed when creating query criteria

Fixed Issues

  • General

    • Notification channels are not created at the tenant level and are not visible to members of external tenants working on the same project

  • Data Management

    • Fixed an issue where move jobs fail when the destination is set to the user’s S3 bucket where the root of the bucket mapped to ICA as storage configuration and volume

    • Fixed a data synchronization issue when restoring an already restored object from a project configured with S3 storage

  • Flow

    • Corrected the status of deleted Docker images from incorrect ‘Available’ to ‘Deleted’

    • The reference for an analysis has changed to userReference-UUID, where the UUID matches the ID from the analysis. (The previous format was userReference-pipelineCode-UUID.)

    • Pipeline files are limited to a file size of 20 Megabytes

  • Bench

    • Fixed an issue which caused ‘ICA_PROJECT_UUID not found in workspaceSettings.EnvironmentVariables’ when creating a new Workspace

  • Cohorts

    • Fixed an issue where the system displays ALL/partial filter chips when the top level tree node is selected in a hierarchical search

    • Fixed an issue where the system displays 400 bad request error despite valid input of metadata files during import jobs

    • Fixed an issue where the system displays inconsistent hierarchical disease filter results

    • Fixed an issue where the system changes the layout when displaying the p-value column

    • Fixed an issue where the system disables the next button when there is no study available in the dropdown menu

    • Fixed an issue where studies could not be selected when a project has one study to ingest data into

2024 October 31 - ICA v2.30.1

Fixed Issues

  • Mitigated an issue causing intermittent system authentication request failures. Impact includes analysis failures with "createFolderSessionForbidden" error

2024 October 30 - ICA v2.30.0

Features and Enhancements

  • General

    • The projectdata upload CLI command now provides the credentials to access the data

  • Data Management

    • Introduced a limit of 100,000 entries on the number of data elements that can be put in POST /api/projects/{projectId}/dataUpdateBatch

  • Flow

    • Users can now access json-based pipeline input forms for both Nextflow and CWL pipelines. API access is not yet available for CWL pipelines

    • Added GPU compute types (gpu-small, gpu-medium) for use in workflows

    • Users can now sort analyses by request date instead of start date, which was not always available

    • The analysis details page has been upgraded with the following features:

      • The progress bar which could be found on the analyses overview page will now also appear in the details page

      • A maximum of 5 rows of output are shown for each output parameter, but the output can be displayed in a large popup to have a better overview

      • Orchestrated analyses are shown in a separate tab

  • Cohorts

    • Users can now use the Measurement concept API to create cohorts based on lab measurement data and harmonize their values to perform downstream analysis

    • Users can now access the Hierarchical concept search API to view the phenotype ontologies

Fixed Issues

  • General

    • The mail option is now automatically filtered out for those events that do not support it

    • Fixed an issue where there was no email sent after rerunning a workflow session

    • Fixed an issue which caused authentication failures when using a direct link

    • Made file and folder selection more consistent

    • Fixed an issue with the CLI where using the “projectsamples get” command to retrieve a sample shared via an entitled bundle in another tenant failed

    • Fixed filtering so you can only see subscriptions and channels from your own tenant

    • Improved GUI handling for smaller display sizes

    • Fixed the workflow session user reference and output folder naming to use BaseSpace Experiment Name when available

  • Data Management

    • The unlink action is now greyed out if the selection contains data that is not linked

    • Fixed an issue where, when deleting folders, the parent folder was deleted first, giving the impression that the parent folder was deleted but not its subfolders and files

    • Fixed an issue where the connector downloads only downloaded the main folder, not the folder contents

    • For consistency, it is no longer possible to link to folders or files from within subfolders. Previously, you could link, but the files and folders were always linked to the top level instead of the subfolder from which the linking was done

    • Updated error handling for dataUpdateBatch API endpoint

    • Moving small files (<8 MB) no longer triggers a "moving" event, only a completion event. Out-of-order events caused issues, and moving small files happens fast enough that the intermediate "moving" status is not needed, only the completion of the move

    • Improved error handling when encountering issues during cancellation of data copy/move

    • Improved error message when trying to unlink data from a project via the API when this data is native to that project and not linked

    • Fixed issue where analysis can proceed to download input data when any of the inputs are in status other than AVAILABLE, including records within folder data inputs

  • Flow

    • Redesigned UI component to prevent issues with Analysis summary display

    • Fixed an issue where the field content was not set to empty when the field input forms have changed between the original analysis and a rerun

    • Replaced retry exhaustion message, "External Service failed to process after max retries 503 Unique Reference ID: 1234" with a more useful message to end users that advises them to contact Illumina support: "Attempt to launch ICA workflow session failed after max retries. Please contact Illumina Tech Support for assistance. Unique Reference ID: 1234". This does not replace more specific error messages that provide corrective advice to the user, such as "projectId cannot be null"

    • For efficiency reasons, pipeline files are limited to a file size of 100 Megabytes

  • Bench

    • Fixed an issue which caused .bash_profile to no longer source .bashrc

    • Fixed the status of deleted docker images which previously were displayed as available

    • After creating a tool, the Information tab and Create Tool are no longer accessible to prevent erroneous selection

  • Cohorts

    • Fixed layout issue where buttons were moved up when the user selected the option

    • Fixed issue where the user was not able to view the PheWas plot when multiple cohorts are open and the same gene is searched

    • Fixed issue where the user was not able to view the GWAS plot when multiple cohorts are open and the user switched back and forth between cohorts

    • Fixed issue where users were not able to see the cytogenetic map in the gene summary page for a gene associated with the chromosome

2024 September 27 - ICA v2.29.1

Fixed Issues

  • General

    • Fixed an issue where various Data Transfer Ownership API calls were failing with a 'countryView' constraint violation error

2024 September 25 - ICA v2.29.0

Features and Enhancements

  • General

    • Dynamically linked folders and files now have their own icon type, which is a folder/file symbol with a link symbol consisting of three connected circles

  • Data Management

    • With the move from static to dynamic data linking, unlinking data is now only possible from the project top level to prevent inconsistencies

    • The user can now manually create a dynamic link to a folder

    • The icav2 project data mount command now supports the “--allow-other” option to allow other users access to the data

    • The user can now set a time to be archived or deleted for files and folders with the “timeToBeArchived” and “timeToBeDeleted” parameter on the “POST/api/projects/{projectId}/dataUpdateBatch” command

    • Added 4 new API endpoints which combine the create and retrieve upload information

  • Flow

    • The default Nextflow version is now 22.04.03, from 20.10.0

    • The user can now specify the Nextflow version when deploying a pipeline via the CLI with the “--nextflow-version” flag

  • Bench

    • The user now has the option to choose either a tool or bench as a docker image when adding new docker images

    • It is now possible to open contents of a Bench workspace in a new tab from the Bench details tab > access section

Fixed Issues

  • General

    • Improved handling of API calls with an invalid or expired JWT or API token

  • Data Management

    • Renamed the "New storage credential" button to "Create storage credential"

    • Removed the "Edit storage credential" button. The user can now edit the column directly in the open dialog when clicking on the name

    • Performance improvements to scheduled data download

    • Fixed an issue where data records were shown more than once when updating the tags

    • The data details were erroneously labeled with "size in bytes" while the size was in a variable unit

    • Fixed an issue where trying to download files could result in the error "Href must not be null" when the file was not available

    • Fixed an issue where existing data catalog views would return an empty screen caused by a mismatch in role naming

  • Flow

    • Fixed an issue that caused opening a pipeline in the read-only view to incorrectly detect there were unsaved changes to the pipeline

    • Fixed an issue where having different pipeline bundles with resource models of the same name would result in duplicate listings of these resources

    • Improved error handling when encountering output folder creation failure, which previously could result in analysis being stuck in REQUESTED status

      • By default Nextflow will no longer generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file:

        trace.enabled = true

        trace.file = '.ica/user/trace-report.txt'

        trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'

    • Fixed the issue where users not allowed to run or rerun workflows could start them from the API or BaseSpace SequenceHub. Now, users that cannot start workflows cannot rerun them.

  • Cohorts

    • Fixed an issue so that users can select the lower needles despite the overlap of multiple needles at the same location in the needle plot

    • Fixed issue where the user would not be able to view the cytogenetic map in the gene summary page for a gene associated with the chromosome

    • Fixed issue where the user would not be able to view the PheWas plot when multiple cohorts are open and the same gene is searched

2024 September 5 - ICA v2.28.1

Features and Enhancements

  • Data Management

    • Improved performance of data linking jobs

Fixed Issues

  • General

    • Fixed an issue causing slow API responses and 500 errors

2024 August 28 - ICA v2.28.0

Features and Enhancements

  • General

    • The CLI readme file will now additionally contain the CLI build number

  • Data Management

    • Fixed an issue where there was a discrepancy between the Run Input tags shown to the user and what was stored on the data

    • Added a 25,000-item limit to the v3 endpoint for batch data linking. Using the v4 endpoint, which does not have this limitation, is recommended

  • Flow

    • Analyses and workflow sessions can now be resubmitted, and parameters can be updated upon resubmission

    • Changed the default image used for CWL pipeline processes with undefined image from docker.io/bash:5 to public.ecr.aws/docker/library/bash:5

    • Updated the choice of default nextflow docker image which is used when no docker image is defined. It is now public.ecr.aws/lts/ubuntu:22.04_stable

    • The analysis logs in the analysis details page can be refreshed

    • The user is now able to write a pipeline which executes files located in the /data/bin folder of the runtime environment

    • Pipeline files are now shown in a tree structure for easier overview

  • Cohorts

    • The updated GWAS UK Biobank database gives users access to more phenotype information

    • Users can now incrementally ingest their molecular data for germline, CNV, structural variants, and somatic mutation data

Fixed Issues

  • General

    • Added an "All" option to the workgroup selection box in the projects view to reset the filter, which previously required you to delete all characters from the filter

    • Fixed an issue where updating two base permissions at the same time would sometimes not execute correctly

    • Fixed an issue where creating grid filters could result in a nullpointer error

    • Fixed an issue where the 'Copy to Clipboard' button no longer worked

    • After searching for a folder in the search box and going into that folder, the search box is now cleared

    • Improved the project permissions API to correctly handle empty values

    • Previously, when attempting to save and send a message from the Websolutions section without a unique subject, the system would report an error and still send the message. Now the non-unique message subject error is reported and no message is sent

    • Fixed an issue where linking samples in the sample screen would result in receiving the same "sample(s) linked" message twice

    • Improved error handling for CLI FUSE driver

    • Hardened log streaming for ongoing runs to better handle network issues which previously would result in missing log streaming

    • Added retries for "connection reset by peer" network-related errors during analysis upload tasks

    • Fixed an issue where inviting a user to collaborate on a project containing base would result in the error "entity not managed" if that user did not have base enabled in any project or if base was not enabled in the project tenant

  • Data Management

    • Fixed an issue where data could be moved to a restricted location called /analyses/ and no longer be visible after the move. Please contact Illumina Support with your data move job information to recover your data if you have encountered this issue

    • Fixed an issue where sorting on data format did not work correctly

    • Copying empty folders no longer results in a partially copied status

    • ICA now performs an automatic refresh after unlinking or deleting a sample

    • Improved handling of file path collisions when handling linked projects during data copy / move

    • Fixed an issue where, even though uploading a file in a linked folder is not permitted, this would erroneously present a success message without copying the file

    • Analysis events which are too large for SQS (256 KB) are now truncated to the first 1000 characters when sent via SQS

    • Improved error handling when trying to upload files which no longer exist

    • Fixed system degradation under load by introducing a rate limit of 25 spawned tasks per minute for a given analysis

    • The createUploadUrl endpoint can now be used to upload a file directly from the region where it is located. The user can create both new files and overwrite files in status "partial"

    • Improved the project data list command with wildcard support. For example:

      • / or /* will return the contents of the root

      • /folder/ will return the folder

      • /folder/* will return the contents of the folder

    • To optimize performance, a limit has been set to prevent concurrent uploading of more than 100 files

    • Fixed an issue where folder syncing functionality would sometimes result in “Unhandled exception in scheduling worker”

  • Flow

    • Fixed an issue where writing a pipeline which executes files in the /data/bin folder wasn't functioning properly with larger storage classes

    • Nextflow pipelines no longer require pipeline inputs when starting them via the CLI

    • Improved error handling when using an unsupported data format in the XML input forms during pipeline creation

    • Fixed the issue where it was not possible to add links in the detail page for pipelines and bundles

    • Sorting is no longer supported on duration and average duration columns for analysis grids

    • In situations where the user would previously get the error "zero choices with weight >= 1" after the first attempt, additional retries will execute to prevent this from occurring

  • Cohorts

    • Fixed an issue resulting in a blank error when a cohort with hundreds of diagnostic concepts was created

2024 July 10 - ICA v2.27.1

Features and Enhancements

  • Flow

    • Improved analysis queue times during periods of limited compute resource availability

2024 June 25 - ICA v2.27.0

Features and Enhancements

  • General

    • New notification to the user when a copy job finishes running

    • Updated the "GET analysis storage" API endpoint to account for the billing mode of the project. If the billing mode of the project is set to tenant, then the analysis storage of the user's tenant will be returned. If the billing mode of the project is set to project, then the analysis storage of the project's owner tenant will be returned

    • A ReadMe file containing used libraries and licenses is now available for ICA CLI

  • Data Management

    • New DataFormats YAML (.yaml, .cwl), JAVASCRIPT (.js, .javascript), GROOVY (.groovy, .nf), DIFF (.diff), SH (.sh), SQL (.sql) to determine the used syntax highlighting when displaying/editing new pipeline files in the UI

    • ICAv2 CLI supports moving data both between and within projects

    • Added an alert to notify users when data sharing or data move is disabled for the project

    • A new version of the Event Log endpoint has been developed to support paging, retrieval of previous events, and resolution of inconsistencies in date formats. This new endpoint introduces the EventLogListV4 data transfer object

    • The user is now able to select a single file and download it from the browser directly. This does not apply for folders and multiple files selected at once

    • User can subscribe to notifications when data is unarchived

    • The BaseSpace Sequencing Run Experiment name will now be added to the technical tags when a workflow session is launched

  • Flow

    • Fastqs with the .ora extension are now supported when staging these for secondary analysis, either as a list of fastqs or as fastq_list_s3.csv files

    • Previously, clicking a pipeline on the pipeline overview screen started a new analysis. Now, clicking the pipeline name opens the pipeline in edit mode. To select a pipeline for starting an analysis, check its checkbox instead

Fixed Issues

  • General

    • Removed the refresh button from the workspace detail view as it was superfluous

    • Fixed an issue where searching for certain characters in the search field of the Projects or Data overviews screen would result in an indecipherable error

    • Improved security handling around tenant admin-level users in the context of data move

  • Data Management

    • Fixed a bug so that folders copied from another previously copied folder no longer result in corrupted files

    • Fixed an issue where creating a new bundle would result in an error if a project with the same name already exists

    • Data move between projects from different tenants is now supported

    • Fixed an issue where not selecting files before using the copy or move commands would result in EmptyDataId errors

    • CLI: Improved notifications when files cannot be downloaded correctly

    • Fixed an issue where scheduled downloads of linked data would fail without warning

    • Corrected an issue where the tenant billing mode would be erroneously set to Illumina after a data copy

    • Fixed an issue where BatchCopy on linked data did not work

  • Flow

    • Resolved an issue to ensure that when a user creates a pipeline using a docker image shared from an entitled bundle, their analyses utilizing that pipeline can pull the docker image without errors

    • Removed superfluous options from the analysis status filter:

      • Awaiting input

      • Pending request

      • Awaiting previous unit

    • Fixed an issue where writing a pipeline which executes files in the /data/bin folder wasn't functioning properly with larger storage classes

    • Fixed an issue where many-step analyses were getting stuck in "In Progress" status

    • Fixed an issue where the wrapper scripts when running a CommandLineTool in CWL would return a warning

    • Fixed the issue which caused the "Save as" option not to work when saving pipelines

  • Base

    • Fixed an issue where the ICA reference fields in the schema definition had the wrong casing. As a result of this update you might end up with 2 different versions of the reference data (one with keys written with an uppercase letter at the start, another one with keys written entirely in lowercase letters). To fix this:

      • Update your queries and use the Snowflake function GET_IGNORE_CASE (ex: select GET_IGNORE_CASE( to_object(ica) , 'data_name' ) from testtableref)

      • Update the 'old' field names to the new ones (ex: update testtableref_orig set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name'))

    • Fixed an issue where using an expression to filter the "Base Job Success" event was not working

2024 June 6 - ICA v2.26.1

Fixed Issues

  • Flow

    • Resolved an issue to ensure that when a user creates a pipeline using a docker image shared from an entitled bundle, their analyses utilizing that pipeline can pull the docker image without errors.

2024 June 5 - ICA v2.26.0

Features and Enhancements

  • General

    • The left side navigation bar will collapse by default for screens smaller than 800 pixels. The user can expand it by hovering over it

    • The browser URL may be copied to share analyses, pipelines, samples, tools, workspaces and data in various contexts (project, bundle)

  • Data Management

    • Users are now able to move data within and across projects:

      • The user can:

        • Move available data

        • Move up to 1000 files and/or folders in 1 move operation

        • Retain links to entities (sample, sequencing run, etc.) and other meta-data (tags, app-info) when moving

        • Move data within a project if the user is a contributor

        • Move data across projects if (1) in the source project the user has download rights, has at least contributor rights, and data sharing is enabled, and (2) the user has upload rights and at least viewer rights in the target project

        • Move data across projects with different types of storage configurations (user-defined or default ICA-managed storage)

        • Select and move data to the folder they are currently in through the graphical UI

        • Select and move data in a destination project and/or folder through the API

      • The user cannot:

        • Move linked data. Only the source data can be moved

        • Move data to linked data. Can only move data to the source data location

        • Move data to a folder that is in the process of being moved

        • Move data which is in the first level of the destination folder

        • Move data to a destination folder which would create a naming conflict such as a file name duplicate

        • Move data across regions

    • New Event Log entries are provided when a user links (ICA_BASE_100) or unlinks (ICA_BASE_101) a Cohorts data set to a bundle

    • Added support for the following data formats: ora, adat, zarr, tiff and wsi

  • Flow

    • New compute types (Transfer Small, Transfer Medium, Transfer Large) are supported and can be used in upload and download tasks to significantly reduce overall analysis runtimes (and overall cost)

    • API: All the endpoints containing pipeline information now contain the status from the pipeline(s) as well

  • Bench

    • External Docker images will no longer display a status as they consistently showed 'Available,' even when the URL is not functional

  • Cohorts

    • Performance improvements to needle plot by refactoring its API endpoint to return only sample IDs

    • Users now click a cancel button that returns them to the landing page

    • Users can now perform time series analysis for a single patient view

    • Refresh of PrimateAI data now drives data in variant tables

    • Users can now access the structural variant tab in the Marker frequency section

Fixed Issues

  • General

    • Fixed an issue where, when a user is added to or removed from a workgroup, they could be stuck on an infinite redirect loop when logging in

    • Fixed syncing discrepancy issues about deleted files in user-managed storage projects with Lifecycle rules & Versioning

  • Data Access & Management

    • Sorting API responses for the endpoint GET /api/jobs is possible on the following criteria: timeCreated, timeStarted and timeFinished

    • Improved the error message when trying to link a bundle which is in a different region than the project

    • More documentation has been added to the GET /eventLog regarding the order of rows to fetch

    • Fixed an issue where the API call - POST api/projects/{projectId}/permissions would return an error when DATA_PROVIDER was set for roleProject

    • Fixed an issue stemming from attempts to copy files from the same source to the same destination, which incorrectly updated file statuses to Partial

    • CLI: Fixed an issue where the environment variable ICAV2_X_API_KEY did not work

  • Flow

    • The analysis is no longer started from the API if error 400 ('Content-Type' and 'Accept' do not match) occurs

  • Base

    • Fixed an issue where the Base schedule would not run automatically in some cases when files are present in the schedule

  • Bench

    • Improved error handling when trying to create a tool with insufficient permissions

    • Fixed an issue where the user is unable to download docker-image with adhoc-subscription

    • The "version":"string" field is now included in the API response GET /api/referenceSets. If no version is specified, the field is set to "Not Specified"

    • Fixed an issue where, under some conditions, fetching a job by id would throw an error if the job was in pending status

2024 April 24 - ICA v2.25.0

Features and Enhancements

  • Data Management

    • The GUI now has a limit of 100 characters for the name and 2048 characters for the URL for links in pipelines and bundles

    • Added a link to create a new connector if needed when scheduling a data download

    • Improved the data view with additional filtering in the side panel

  • Flow

    • New CLI environment variable ICA_NO_RETRY_RATE_LIMITING allows users to disable the retry mechanism. When it is set to "1", no retries are performed. For any other value, http code 429 will result in 4 retry attempts after 0.5, 2, 10, and 30 seconds

    • Code-based pipelines will alphabetically order additional files next to the main.nf or workflow.cwl file

    • When the Compute Type is unspecified, it will be determined automatically based on CPU and Memory values using a "best fit" strategy to meet the minimum specified requirements

  • Bench

    • Paths can be whitelisted to allowed URLs on restricted settings

Fixed Issues

  • General

    • Fixed an issue where the online help button did not work when clicked

  • Data Access & Management

    • Improved automatic resource cleanup when hiding a project

    • Fixed an issue with the service connector where leading blanks in the path of an upload/download rule would result in errors. It is no longer possible to define rules with leading or trailing blanks

    • Fixed an issue where a folder copy job fails if the source folder doesn't have metadata set

    • Linking data to sample has been made consistent between API and GUI

    • Improved resource handling when uploading large amounts of files via the GUI

    • Fixed an issue where the API endpoint to retrieve input parameters for a project pipeline linked to a bundle would fail when the user is not entitled on the bundle

    • Fixed an issue where deleting and adding a bundle to a project in one action does not work

  • Flow

    • The event sending protocol was rewritten to limit prematurely exhausting event retries and potentially leaving workflows stuck when experiencing high server loads or outages

    • Fixed an issue where specifying the minimum number of CPUs using coresMin in a CWL workflow would always result in the allocation of a standard-small instance, regardless of the coresMin value specified

    • Fixed an issue in the API endpoint to create a Nextflow analysis where tags were incorrectly marked as mandatory inputs in the request body

    • Fixed an issue with intermittent failures following completion of a workflow session

  • Base

    • Improved syntax highlighting in Base queries by making the different colors more distinguishable

  • Bench

    • Fixed an issue where the Bench workspace disk size cannot be adjusted when the workspace is stopped. Now, the adjusted size is reflected when the workspace is resumed

    • Fixed an issue where regions were not populating correctly for Docker images

    • Fixed an issue where API keys do not get cleaned up after failed workspace starts, leading to unusable workspaces once the API key limit is reached

2024 April 15 - ICA v2.24.2

Features and Enhancements

  • Cohorts

    • Users can now query variant lists with a large number of associated phenotypes

    • Users can now perform multiple concurrent data import jobs

Fixed Issues

  • Cohorts

    • Fixed an issue with displaying shared views when refreshing a Bundle’s shared database in Base

2024 April 4 - ICA v2.24.1

Fixed Issues

  • Fixed an issue where autolaunch is broken for any users utilizing run and samplesheet inputs stored in BSSH and operating in a personal context, rather than a workgroup.

2024 March 27 - ICA v2.24.0

Features and Enhancements

  • Data Management

    • Data (files and folders) may be copied from one folder to another within the same Project

    • The empty ‘URN’ field in the Project details at Project creation is now removed

    • The ‘Linked Bundles’ area in the Project details at Project creation is now removed as you are only allowed to link Bundles after Project creation

    • The card or grid view selected will become the default view when navigating back to the Projects or Bundles views

    • Added new API endpoints to retrieve and accept the Terms & Conditions of an entitled bundle:

      • /api/entitledbundles/{entitledBundleId}/termsOfUse

      • /api/entitledbundles/{entitledBundleId}/termsOfUse/userAcceptance/currentUser

      • /api/entitledbundles/{entitledBundleId}/termsOfUse:accept

  • Flow

    • Added a new API endpoint to retrieve orchestrated analyses of a workflow session

      • GET /api/projects/{ProjectID}/workflowSessions/{WorkflowSessionID}/analyses

    • Code-based pipelines will alphabetically order additional files next to the main.nf or workflow.cwl file

  • Bench

    • New JupyterLab - 1.0.19 image published for Bench using the Ubuntu 22.04 base image

    • Resources have been expanded to include more options for compute families when configuring a workspace. See ICA help documentation for more details

  • Cohorts

    • Sample count for an individual cohort may be viewed in the variants table

    • Filter the variants list table through the filter setting in the needle plot

    • Execute concurrent jobs from a single tenant

    • Improved the display of error and warning messages for import jobs

    • Structural variant tab may be accessed from the Marker frequency section

Fixed Issues

  • Data Access & Management

    • Bundles now reflect the correct status when they are released instead of the draft status

    • Double clicking a file opens the data details popup only once instead of multiple times

    • Improved performance to prevent timeouts during list generation which resulted in Error 500

    • The counter is now accurately updated when selecting refresh in the Projects view

    • Fixed an issue where, when running two or more file copy jobs at the same time to copy files from the same project to the same destination folder, one job would succeed and one would fail

    • Fixed an issue resulting in an error in a sample when linking nested files with the same name

    • Added a new column to the Source Data tab of the Table view which indicates the upload status of the source data

    • Removed the unused ‘storage-bundle’ field from the Data details window

    • Fixed an issue where the Project menu does not update when navigating into a Project in Chrome browsers

    • (CLI) Fixed an issue where deleting a file/folder via path would result in an error on Windows CLI

  • Base

    • Improved schedule handling to prevent an issue where some files were not correctly picked up by the scheduler in exceptional circumstances

    • Fixed an issue where an incorrect owning tenant is set on a schedule when running it before saving

    • The number of returned results which is displayed on the scheduler when trying to load files now reflects the total number of files instead of the maximum number of files which could be displayed per page

    • Fixed an issue where Null Pointer Exception is observed when deleting Base within a Project

  • Bench

    • Fixed an issue where users were unable to delete their own Bench image(s) from the docker repository

  • Cohorts

    • Fixed an issue where the value in the tumor_or_normal field, in the phenotype table in database, would not set properly for germline and somatic mutation data

    • Fixed an issue where large genes with subjects containing large sets of diagnostic concepts caused a 503 error

2024 March 7 - ICA v2.23.1

Fixed Issues

  • Fixed an issue where automated analysis after sequencing run in non-US regions may fail for certain analysis configurations

2024 February 28 - ICA v2.23.0

Features and Enhancements

  • Data Management

    • The --exclude-source-path flag has been added to the ‘project data download’ command so that subfolders can be downloaded to the target path without including the parent path

    • The system automatically re-validates storage credentials updated in the graphical UI

    • Added a new API endpoint to validate storage configurations after credentials are changed: /api/storageConfigurations/{storageConfigurationId}:validate

  • Notifications

    • Added support for multi-version notification event payloads corresponding to versioned API response models

  • Flow

    • (API) Improved the analysis-dto by adding a new POST search endpoint as a replacement for the search analysis GET endpoint. The GET endpoint will keep working but we advise using the new POST endpoint.

    • Improved analysis statuses to reflect the actual status more accurately

    • Parallelized analysis input data downloads and output data uploads to reduce overall analysis time

    • No scratch size is allocated if tmpdirMin is not specified

  • Cohorts

    • Performance improvements of the ingestion pipeline

    • Performance improvements to subject list retrieval

    • Increased the character limit of ingestion log messages to the user

Fixed Issues

  • Data Access & Management

    • Fixed an issue where the target user cannot see analysis outputs after a successful transfer of analysis ownership in BaseSpace Sequence Hub

    • Updated the API Swagger documentation to include paging information for: /projects/{projectId}/samples/{sampleId}/data

    • Fixed an issue resulting in errors when creating a new bundle version

    • Fixed an issue where the GET API call with the ‘Sort’ parameter returns an error when multiple values are separated by commas followed by a space

    • Fixed an issue where adding the --eligible-link flag to the ‘projectdata list’ API endpoint caused other flags to not work correctly

    • Added cursor-based pagination for the ‘projectdata list’ API endpoint

    • Fixed an issue with the entitled bundles cards view where the region is cut off when the Status is not present

    • Fixed an issue where bundle filtering on categories did not work as expected

    • Fixed an issue where file copy across tenants did not work as expected

    • Added a cross-account permission check so that file copy jobs fail when the cross-account set up is missing instead of being retried indefinitely

    • Fixed an issue where ‘Get Projects’ API endpoint returns an error when too many projects are in the tenant

    • Fixed an issue where the UpdateProject API call (PUT /api/projects/{projectId}) returns an error when technical tags are removed from the request

    • Fixed an issue where users need to confirm they want to cancel an action multiple times when clicking the back button in the graphical UI

    • Fixed an issue where clicking into a new version of a bundle from the details view does not open the new version, and instead directs to the bundle card view

  • Flow

    • Fixed an issue where the analysis logs are returned in the analysis screen “outputs” section and included in the getAnalysisOutputs API response. The log output is no longer considered as part of the analysis outputs

    • Analysis history screen has been removed

    • Fixed an issue resulting in inability to retrieve pipeline files via the API when the pipeline is shared cross-tenant

    • Fixed an issue where the API endpoint to retrieve files for a project pipeline would not return all files for pipelines created via CLI or API

    • Fixed an issue where the API does not check the proprietary flag of a pipeline before retrieving or downloading the pipeline files

  • Base

    • The ‘Download’ button is available to download Base activity data locally (and replaces the non-functional ‘Export’ button for restricted bundles)

    • Fixed an issue resulting in missing ICA reference fields in table records if the file was loaded into the table with no metadata

    • Improved consistency of the references included in the scheduler

  • Bench

    • Users are now logged out from a terminal window opened in a workspace after a period of inactivity

    • Fixed an issue where permissions could not be enabled after a workspace has been created

    • Fixed an issue where a Contributor could not start/stop a workspace

  • Cohorts

    • Fixed an issue where large genes with subjects with large sets of diagnostic concepts cause a 503 error

    • Fixed an issue where the value in tumor_or_normal field in the phenotype table in the database is not set properly for germline and somatic mutation data

    • Resolved a discrepancy between the number of samples reported when hovering over the needle plot and the variant list

2024 January 31 - ICA v2.22.0

Features and Enhancements

  • Data Management

    • Users are now able to revalidate storage configurations in an Error state

    • Improved existing endpoints and added new endpoints to link and unlink data to a bundle or a project in batch:

      • POST /api/projects/{projectId}/dataUnlinkingBatch

      • GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}

      • GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}/items

      • GET /api/projects/{projectId}/dataUnlinkingBatch/{batchId}/items/{itemId}

  • Flow

    • Analyses started via the API can now leverage data stored in BaseSpace Sequence Hub as input

    • ICA now supports auto-launching analysis pipelines upon sequencing run completion with run data stored in BaseSpace Sequence Hub (instead of ICA)

    • Updated the API for creating pipelines to include "proprietary" setting, which hides pipeline scripts and details from users who do not belong to the tenant which owns the pipeline and prevents pipeline cloning.

  • Cohorts

    • Added support for partial matches against a single input string to the “Search subjects” flexible filtering logic

    • Users can now view an overview page for a gene when they search for it or click on a gene in the marker frequency charts

    • ICA Cohorts includes access to both pathogenic and benign variants, which are plotted in the “Pathogenic variants” track underneath the needle plot

    • Ingestion: UI notifications and/or errors will be displayed in the event of partially completed ingestions

    • Users can share cohort comparisons with any other users with access to the same project

Fixed Issues

  • General

    • Improved the project card view in the UI

    • Fixed an issue with user administration where changing the permissions of multiple users at the same time would result in users receiving Invalid OAuth access token messages

  • Data Access & Management

    • Improved the error message when downloading project data if the storage configuration is not ready for use

    • Fixed an issue causing Folder Copy jobs to time out and restart, resulting in delays in copy operations

    • Fixed an issue where only the Docker image of the first restricted bundle that was added could be selected

    • Improved the performance of folder linking with "api/projects/{ProjectID}/dataLinkingBatch"

    • The URL for links in the "POST /api/bundles" endpoint can be up to 2048 characters long

    • Improved the error response when using offset-based paging on API responses which contain too much data and require cursor-based paging

    • Fixed an issue resulting in failures downloading data from CLI using a path

    • The correct error message is displayed if the user does not have a valid subscription when creating a new project

    • Fixed an issue where changing ownership of a project does not change previous owner access for Base tables

  • Flow

    • Input parameters of pipelines are now displayed in the "label (code)" format unless there is no label available or the label equals the code, in which case only the code is shown

    • Fixed an issue where multiple folders were created upon starting new analyses

    • Fixed an issue preventing analyses from using inputs with BaseSpace v1pre3 APIs

    • Fixed an issue causing analyses with a specified output path to incorrectly return an error stating that the data does not exist

    • The following endpoint "/api/projects/{projectId}/workflowSessions/{workflowSessionId}/inputs" now supports using external data as input

    • Any value other than "economy" or "standard" for submitted analysis jobs will default to "standard"

    • The parameter to pass an activationcode is now optional for start-analysis API endpoints

  • Base

    • Improved the display of errors in the activity jobs screen if a Meta Data schedule fails

    • If an error occurs when processing metadata, a failed job entry will be added in the Base Activity screen

    • Fixed an issue where records ingested via schedules from the same file could be duplicated

    • Fixed an issue where exporting the view shared via bundle would show an error 'Could not find data with ID (fol. ....)'

    • Resolved a NullPointerException error when clicking on Format and Status filters in the details screen of a Schedule in the Results tab

    • Fixed an issue where a schedule download would fail when performed by a different user than the initial user

  • Bench

    • Fixed an issue when trying to query a Base table with a high limit within a workspace

    • Fixed an apt-get error when building images due to an outdated repository

    • Fixed an issue where a stopped workspace would display "Workspace paused" instead of "Workspace stopped"

    • Fixed an issue where large files (e.g., 150GB+) could not be downloaded to a fuse-driver location from a Workspace, and set the new limit to 500GB

  • Cohorts

    • Fixed an issue where split Nirvana JSON files are not recognized during ingestion

    • Fixed an issue causing the UI to hang on large genes and return a 502 error

    • Fixed an issue where OMOP files are not correctly converted to CAM data model, preventing OMOP data ingestions

    • Fixed an issue where large OMOP drug ingestions led to memory issues and prevented further drug data ingestion

    • Fixed an issue where users from a different tenant accessing a shared project could not ingest data

API Beginner Guide

API Basics

Any operation from the ICA graphical user interface can also be performed with the API.

The following are some basic examples of how to use the API. These examples use Python as the programming language. For other languages, please see their native documentation on API usage.

Prerequisites

  • An installed copy of Python. (https://www.python.org/)

  • The package installer for python (pip) (https://pip.pypa.io/)

  • The Python requests library installed (pip install requests)

Authenticating

API keys are valid for a single user, so any information you request is for the user to whom the key belongs. For this reason, it is best practice to create a dedicated API user so you can manage the access rights for the API by managing that user's rights.
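
To keep the key itself out of your scripts, one option is to store it in an environment variable and read it at runtime. The sketch below assumes an environment variable named ICA_API_KEY; that name is only an example and is not defined by ICA.

# A minimal sketch: read the API key from an environment variable instead of
# hardcoding it. ICA_API_KEY is an example name, not an ICA-defined variable.
import os
import requests

headers = {
    'X-API-Key': os.environ['ICA_API_KEY'],
}

# Reuse the same headers dictionary for every request.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
print("Response status code: ", response.status_code)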

API Reference

Converting curl to Python

To get the curl command,

  1. Look up the endpoint you want to use on the API reference page.

  2. Select Try it out.

  3. Enter the necessary parameters.

  4. Select Execute.

  5. Copy the resulting curl command.

Never paste your API authentication key into online tools when performing curl conversion as this poses a significant security risk.

In the most basic form, the curl command

curl my.curlcommand.com

becomes

import requests
response = requests.get('http://my.curlcommand.com')

-H sets a request header.

-X sets the request method (for example GET or POST); curl passes the method string as-is without further interpretation.

curl -X 'GET' 'https://my.curlcommand.com' -H 'HeaderName: HeaderValue'

becomes

import requests
headers = {
    'HeaderName': 'HeaderValue',
}
response = requests.get('https://my.curlcommand.com', headers=headers)
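
For POST requests, the curl -d (or --data) flag carries the request body, which maps to the data argument of requests.post. The sketch below uses placeholder URL, header, and payload values for illustration only.

import requests

# Equivalent of:
# curl -X 'POST' 'https://my.curlcommand.com' -H 'Content-Type: application/json' -d '{"key": "value"}'
headers = {
    'Content-Type': 'application/json',
}
data = '{"key": "value"}'

response = requests.post('https://my.curlcommand.com', headers=headers, data=data)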

Simple API Examples

Request a list of event codes

This is a straightforward request without parameters which can be used to verify your connection.

The API call is

response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers={'X-API-Key': '<your_generated_API_key>'})

In this example, the API key is written inline in the API call, which means you must update every API call when the key changes. A better practice is to define the headers with the API key once in a variable so it is easier to maintain. The full code then becomes

# The requests library will allow you to make HTTP requests.
import requests

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)

# Display the data from the request.
print(response.json())

Pretty-printing the result

The list of event codes was returned as a single line, which makes it difficult to read, so let's pretty-print the result.

# The requests library will allow you to make HTTP requests.
import requests

# JSON will allow us to format and interpret the output.
import json

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

Retrieving a list of projects

Now that we are able to retrieve information with the API, we can use it for a more practical request like retrieving a list of projects. This API request can also take parameters.

Retrieve all projects

First, we pass the request without parameters to retrieve all projects.

# The requests library will allow you to make HTTP requests.
import requests

# JSON will allow us to format and interpret the output.
import json

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

Single parameter

The easiest way to pass a parameter is by appending it to the API request. The following API request will list the projects with a filter on CAT as user tag.

response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT', headers=headers)

Multiple parameters

If you only want entries that have both the tags CAT and WOLF, you would append them like this:

response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT&userTags=WOLF', headers=headers)
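
As an alternative to appending parameters to the URL by hand, the requests library can build the query string for you from a params dictionary; a list value repeats the parameter. A minimal sketch using the same userTags filter:

import requests

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

# requests encodes this as ?userTags=CAT&userTags=WOLF
params = {'userTags': ['CAT', 'WOLF']}

response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers, params=params)
print("Response status code: ", response.status_code)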

Copying Data

To copy data, you need to know:

  • Your generated API key.

  • The dataId of the files and folders which you want to copy (their syntax is fil.hexadecimal_identifier and fol.hexadecimal_identifier). You can select a file or folder in the GUI to see its Id (Projects > your_project > Data > your_file > Data details > Id), or you can use the /api/projects/{projectId}/data endpoint.

  • The destination project to which you want to copy the data.

  • The destination folder within the destination project to which you want to copy the data (fol.hexadecimal_identifier).

  • What to do when the destination files or folders already exist (OVERWRITE, SKIP or RENAME).

The full code will then be as follows:

# The requests library will allow you to make HTTP requests.
import requests

# Fill out your generated API key.
headers = {
    'accept': 'application/vnd.illumina.v3+json',
    'X-API-Key': '<your_generated_API_key>',
    'Content-Type': 'application/vnd.illumina.v3+json',
}

# Enter the files and folders, the destination folder, and the action to perform when the destination data already exists.
data = '{"items": [{"dataId": "fil.0123456789abcdef"}, {"dataId": "fil.735040537abcdef"}], "destinationFolderId": "fol.1234567890abcdef", "copyUserTags": true,"copyTechnicalTags": true,"copyInstrumentInfo": true,"actionOnExist": "SKIP"}'

# Replace <Project_Identifier> with the actual identifier of the destination project.
response = requests.post(
    'https://ica.illumina.com/ica/rest/api/projects/<Project_Identifier>/dataCopyBatch',
    headers=headers,
    data=data,
)

# Display the response status code.
print("Response status code: ", response.status_code) 

Combined API Example - Running a Pipeline

Now that we have made individual API requests, we can combine them and use the output of one request as input for the next. To run a pipeline, you need a number of input parameters. To obtain these parameters, you first make a number of API calls and use the returned results in your request to run the pipeline. In the examples below, we build up the requests one by one so you can run them individually and see how they work. These examples only follow the happy path to keep them as simple as possible; if you adapt them for a full project, remember to add error handling. Alternatively, you can use the GUI to look up all the parameters, or note them down after performing the individual API calls in this section, and then build your final API call with those values fixed.

Initialization

This block must be added at the beginning of your code.

# The requests library will allow you to make HTTP requests.
import requests

# JSON will allow us to format and interpret the output.
import json

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

Look for a project in the list of Projects

Previously, we requested a list of all projects; now we add a search parameter to look for a project called MyProject. (Replace MyProject with the name of the project you want to look for.)

# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

Now that we have found our project by name, we need to get the unique project id, which we will use in the combined requests. To get the id, we add the following line to the end of the code above.

print(My_API_Data['items'][0]['id'])

The syntax ['items'][0]['id'] works as follows: 'items' selects the items list, 0 takes the first entry (we presume our filter was accurate enough to return only the correct result and that there are no duplicate project names), and 'id' takes the value of the id field. Similarly, you can build other expressions to get the data you want to see, such as ['items'][0]['urn'] to get the urn or ['items'][0]['tags']['userTags'] to get the list of user tags.
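
Because the search can return zero matches or more than one, a full script would normally guard the index before using it; a minimal sketch of such a check:

# Fail early if no project matched the search term; warn if several did.
if not My_API_Data['items']:
    raise SystemExit("No project found matching the search term.")
if len(My_API_Data['items']) > 1:
    print("Warning: multiple projects matched the search; using the first one.")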

Once we have the identifier we need, we add it to a variable which we will call Project_Identifier in our examples.

# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']

Retrieve the Pipelines of your Project

Once we have the identifier of our project, we can use it in the request to list the pipelines that are part of our project.

response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)

This will give us all the available pipelines for that project. As we only want to run a single pipeline, we search for it by name; in this example it is called basic_pipeline. Unfortunately, this API call has no direct search parameter, so we loop over the returned list of pipelines, look for the matching code, and store its id in a variable which we will call Pipeline_Identifier in our examples, as follows:

# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)

# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None

# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)

Find which parameters the Pipeline needs

Once we know the project identifier and the pipeline identifier, we can create an API request to retrieve the list of input parameters which are needed for the pipeline. We will consider a simple pipeline which only needs a file as input. If your pipeline has more input parameters, you will need to set those as well.

# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)

Find the Storage Size to use for the analysis

Here we will look up the id of the extra small storage size. The 0 in My_API_Data['items'][0]['id'] selects the first entry of the returned list, which in this example is the extra small size.

# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)

Find the files to use as input for your pipeline

Now we will look for a file named "testExample" to use as input, and store its file id.

# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data
My_API_Data = response.json()

# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)

Start the Pipeline

Finally, we can run the analysis with parameters filled out.

Postheaders = {
    'accept': 'application/vnd.illumina.v4+json',
    'X-API-Key': '<your_generated_API_key>',
    'Content-Type': 'application/vnd.illumina.v4+json',
}

data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'

response = requests.post(
    'https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,
)
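
Building the request body by string concatenation is error-prone; an equivalent sketch builds the same JSON as a Python dict and serializes it with json.dumps:

# Same request body, built as a dict and serialized with json.dumps to avoid quoting mistakes.
payload = {
    "userReference": "api_example",
    "pipelineId": Pipeline_Identifier,
    "analysisStorageId": Storage_Size,
    "analysisInput": {
        "inputs": [
            {"parameterCode": Parameters, "dataIds": [InputFile]}
        ]
    }
}

response = requests.post(
    'https://ica.illumina.com/ica/rest/api/projects/' + Project_Identifier + '/analysis:nextflow',
    headers=Postheaders,
    data=json.dumps(payload),
)

# Display the response status code.
print("Post Response status code: ", response.status_code)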

Complete code example

# The requests library will allow you to make HTTP requests.
import requests

# JSON will allow us to format and interpret the output.
import json

# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}

# Replace <your_generated_API_key> with your actual generated API key here.
Postheaders = {
    'accept': 'application/vnd.illumina.v4+json',
    'X-API-Key': '<your_generated_API_key>',
    'Content-Type': 'application/vnd.illumina.v4+json',
}

# Find project
# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find Project response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']
print("Project_Identifier: ",Project_Identifier)

# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)

# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None

# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)

# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline.
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)

# Get Storage Size
# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)

# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)

# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)

# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()

# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)

# Finally, we can run the analysis with parameters filled out.
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","tags":{"technicalTags":[],"userTags":[],"referenceTags":[]},"analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
print(data)
response = requests.post('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,)
print("Post Response status code: ", response.status_code)

Create a new empty folder on your local disk, put a small file in it, and configure this folder as the upload folder.

  • If this works and your sample files are on a shared drive, have a look at the Shared Drives section.

  • If this works and your sample files are on your local disk, there are a few possibilities: a) There is an error in how the upload folder name is configured in the platform. b) For big files, or on slow disks, the connector needs considerable time to start the transfer because it first calculates a hash to make sure there are no transfer errors. Wait up to 30 minutes without changing anything in your Connector configuration.

  • If this does not work, you might be behind a corporate proxy. Proxy configuration is currently not supported for the connector.

Follow the guidelines in the Shared Drives section. Inspect the connector BSC.log file for any error messages indicating that the folder cannot be found.

  • If there is such a message, there are two options: a) An issue with the folder name, such as special characters or spaces. As a best practice, use only alphanumeric characters, underscores, dashes, and periods. b) A permissions issue. In this case, ensure the user running the connector has read and write access to the network share without being prompted for a password.

  • If there are no messages indicating the folder cannot be found, it may be necessary to wait until the integrity checks have completed. This check can take a long time on slow disks and slow networks.

A new version of the Service Connector has been released with this version of ICA. It is highly recommended to update your connector to the latest version by downloading and installing it from Projects > your_project > Project Settings > Connectivity > Service connectors > Download installer.

System notifications, both regional and global, which could previously be found on https://status.illumina.com/, are now shown in the ICA UI when an important ICA message needs to be communicated.

POST /api/projects/{projectId}/data:createFileWithTemporaryCredentials - Creates a file in this project and retrieves temporary credentials for it.

POST /api/projects/{projectId}/data:createFileWithUploadUrl - Creates a file in this project and retrieves an upload URL for it.

POST /api/projects/{projectId}/data:createFolderWithTemporaryCredentials - Creates a folder in this project and retrieves temporary credentials for it.

POST /api/projects/{projectId}/data:createFolderWithUploadSession - Creates a folder in this project and creates a trackable folder upload session.

Users can now access the system via https://ica.illumina.com or https://ica.illumina.com/ica.

One of the easiest authentication methods is by means of API keys. To generate an API key, refer to the Get Started section. This key is then used in your Python code to authenticate the API calls. It is best practice to rotate your API keys regularly.

There is a dedicated API Reference page where you can enter your API key, try out the different API commands, and get an overview of the available parameters.

The examples on the API Reference page use curl (Client URL), while the examples here use the Python requests library; the options in the curl commands map onto the corresponding arguments of the requests calls. There are a number of online tools to automatically convert curl commands to Python.
