Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications.
When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.
The platform is hosted in the regions listed below.

| Region Name | Region Identifier |
| --- | --- |
| Australia | AU |
The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user interface is hosted alongside the API to deliver interactive visualization of the resources and enable additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options can be specified for applications to fine-tune efficiency.
Our systems are synchronized using a Cloud Time Sync Service to ensure accurate timekeeping and consistent log timestamps.
Getting Started
The user documentation provides material for learning the basics of interacting with the platform including examples and tutorials. Start with the documentation to learn more.
Getting Help
Use the search bar on the top right to navigate through the help docs and find specific topics of interest.
If you have any questions, contact Illumina Technical Support by phone or email:
For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.
To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.
Other Illumina Products
To view a list of the products to which you have access, select the 9 dots symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown in brackets.
The More Tools category presents the following options:
My Illumina Dashboard, to monitor instruments, streamline purchases, and keep track of upcoming activities.
Link to the Support Center for additional information and help.
Link to order management, where you can keep track of your current and past orders.
Release Notes
In this section of the documentation, posts are made for new versions of deployments of the core platform components.
Get Started gives an overview of how to access and configure ICA with Network settings providing access prerequisites.
The home section provides an overview of the main ICA sections such as projects (the main work location), bundles (asset packages), metadata (to capture additional information), and Docker images (containerised applications), and how to configure your own storage.
Project
Projects are your primary work locations, which contain your data and pipelines. Here you will create pipelines and use them for analyses. You configure who can access your project by means of the team settings. The results can be processed with the help of Base, Bench, or Cohorts. Projects can be considered as a binder for your work and information.
Command-Line Interface
This section contains information on how to install, configure, and use the command-line interface as an alternative to the ICA GUI.
Sequencer Integration
Information and tutorials on sequencer integration.
Tutorials
There is a set of step-by-step tutorials.
Reference
In the Reference section, you can find additional reference information.
For an overview of the available subscription tiers and functionality, please refer to the Illumina website.
The platform requires a provisioned tenant with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access, including adding users, creating workgroups, and adding additional tenant administrators.
Each tenant is assigned a domain name used to log in to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain_name>.login.illumina.com, with <domain_name> replaced by the actual domain name.
New user accounts can be created either by the tenant administrator, by logging in to their domain and navigating to Illumina Account Management under their profile at the top right, or by the user, by accessing https://platform.login.illumina.com and selecting the option Don't have an account.
For more details on identity and access management, please see the help documentation.
If you have intrusion detection systems active on your infrastructure, be aware that activities performed by ICA on your behalf (such as accessing your storage) might trigger suspicious activity alerts. Please review the alerts and rules with your vendor to set up appropriate policies on your detection system.
Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups, which bundle users with permission to use the ICA application. Registered users can be made workgroup administrators by tenant administrators or existing workgroup administrators.
API Keys
If you want to use the command-line interface (CLI) or the application programming interface (API), you can use an API Key as credentials when logging in. API Keys operate similarly to a username and password combination and must be kept secure and rotated on a regular basis (preferably yearly).
When keys are compromised or no longer in use, they must be revoked. This is done by navigating to the User menu item on the left and selecting API Keys, then selecting the key and using the trash icon next to it.
Generate an API Key
API Keys are limited to 10 per user and are managed through the product dashboard after logging in. See the help documentation for more information.
{% hint style="warning" %} For security reasons, do not use accounts with administrator level access to generate API keys. Create a specific CLI user with basic permissions instead. This will minimize the possible impact of compromised keys. {% endhint %}
{% hint style="warning" %} Once the API key generation window is closed, the key contents will not be accessible through the domain login page, so be sure to store it securely for future reference. {% endhint %}
Access via Web UI
The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the login URL for your domain.
On the left, you have the navigation bar (1), which auto-collapses on smaller screens. To collapse it, use the double arrow symbol (2). When collapsed, use the >> symbol to expand it.
The central part (3) of the display shows the item on which you are performing your actions, and the breadcrumb menu (4) lets you return to the projects overview or a previous level. You can also use your browser's back button to return to the level from which you came.
Access via the CLI
The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Instructions for using the command-line interface, including download links for your operating system, can be found in the Installation section.
Access via the API
The HTTP-based application programming interfaces (APIs) are listed in the API reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.
{% hint style="info" %} When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token>, where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see the JSON Web Token (JWT) section. {% endhint %}
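For example, a REST call with the JWT supplied in the Authorization header might look as follows (the endpoint shown is only an illustration):

```bash
# Illustrative call: list projects, authenticating with a JWT in the Authorization header
curl -H "Authorization: Bearer <token>" \
  "https://ica.illumina.com/ica/rest/api/projects"
```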
Object Identifiers
The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.
JSON Web Token (JWT)
Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.
A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
Flow
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Reference Data — Reference Data for graphical CWL flows. See Reference Data.
Pipelines — One or more tools configured to process input data and generate output files. See Pipelines.
Cloud Analysis Auto-launch
Please see the Cloud Analysis Auto-launch documentation for all content related to Cloud Analysis Auto-launch.
Analyses — Launched instance of a pipeline with selected input data. See Analyses.
The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:
Event date and time
Category (error, warn or info)
Code
Description
Tenant
Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.
Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.
Connectivity
The platform provides Connectors to facilitate automation for operations on data (ie, upload, download, linking).
Service connectors sync data between your local computer or server and the project's cloud-based data storage.
The CLI supports outputs in table, JSON, and YAML formats. The format is set using the output-format configuration setting through a command line option, environment variable, or configuration file.
Dates are output as UTC times when using JSON/YAML output format and local times when using table format.
To set the output format, use the following setting:
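For example, any of the following approaches could be used (the project list command is just an illustration):

```bash
# As a command line option
icav2 projects list --output-format json

# As an environment variable (upper case, ICAV2_ prefix, dashes replaced by underscores)
export ICAV2_OUTPUT_FORMAT=json

# Or set "output-format: json" in ~/.icav2/config.yaml
```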
Once a cluster is started, the cluster manager can be accessed from the workspace node.
Job resources
Snowflake
User
Every Base user has 1 snowflake username: ICA_U_<id>
User/Project-Bundle
Precomputed GWAS and PheWAS
The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.
ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.
Visualize Results from Precomputed Genome-Wide Association Studies (GWAS)
Cohorts
Introduction to Cohorts
ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.
For each user/project-bundle combination a role is created: ICA_UR_<id>_<name project/bundle>__<id>
This role receives the viewer or contributor role of the project/bundle, depending on their permissions in ICA.
Roles
Every project or bundle has a dedicated Snowflake database.
For each database, 2 roles are created:
<project/bundle name>_<id>_VIEWER
<project/bundle name>_<id>_CONTRIBUTOR
Project viewer role
This role receives
REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.
Grants on the viewer roles of the bundles linked to the project.
Project contributor role
This role receives the following rights on current and future objects in the project's/bundle's database in the PUBLIC schema:
ownership
select, insert, update, delete, truncate and references on tables/views/materialized views
usage on sequences/functions/procedures/file formats
write, read and usage on stages
select on streams
monitor and operate on tasks
It also receives a grant on the viewer role of the project.
Warehouses
For each project (not bundle!) 2 warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.
<projectname>_<id>_QUERY
<projectname>_<id>_LOAD
Using Load instead of Query warehouse
When you generate an OAuth token, ICA always uses the QUERY warehouse by default (see the -w parameter in the example below):

`snowsql -a iap.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_274853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_QUERY --token=<token>`

If you wish to use the LOAD warehouse in a session, you have two options:

Change the warehouse name in the connect string: `snowsql -a iapdev.us-east-1 -u ICA_U_277853 --authenticator=oauth -r ICA_UR_277853_603465_264891 -d atestbase2_264891 -s PUBLIC -w ATESTBASE2_264891_LOAD --token=<token>`

Execute the following statement after logging in: `use warehouse ATESTBASE2_264891_LOAD`

To determine which warehouse you are using, execute: `select current_warehouse();`
Synchronizing Tables
If you have created tables directly in Snowflake with the OAuth token, you can synchronize them so that they appear in ICA by means of the Projects > your_project > Base > Tables > Sync button.
Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.
Circles in the Manhattan plot indicate binary traits, potential associations between genes and diseases. Triangles indicate quantitative phenotypes with regression Beta different from zero, and point up or down to depict positive or negative correlation, respectively.
Hovering over a circle/triangle will display the following information:
gene symbol
variant group (see below)
P-value, both raw and FDR-corrected
number of carriers of variants of the given type
number of carriers of variants of any type
regression Beta
For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any one of these five classes, or select All to display all results together.
Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.
Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant, splice_acceptor_variant, start_lost, transcript_ablation, transcript_truncation, exon_loss_variant, gene_fusion, or bidirectional_gene_fusion.
missense_all: all missense variants regardless of their pathogenicity.
missense, PrimateAI optimized (missense_pAI_optimized): only pathogenic missense variants with primateAI score greater than a gene-specific threshold.
missenses and PTVs (missenses_and_ptvs_all): the union of all PTVs, SpliceAI > 0.2 variants and all missense variants regardless of their pathogenicity scores.
all synonymous variants (syn).
To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop down box, which defaults to Whole genome.
Visualize Results from Precomputed Phenome-Wide Association Studies (PheWAS)
To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop down, which defaults to All.
Every cluster member has a certain capacity which is determined by the selected Resource model for the cluster member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscription of a node which can result in Out-Of-Memory or unresponsiveness. You need to ensure these limits are not exceeded.
To ensure stability of the system, some headroom is deducted from the total node capacity.
Scaling
These two values are used by the SGE auto scaler when running in dynamic mode. The SGE auto scaler will summarise all pending jobs and their requested resources to determine the scale up/down operation within the defined range.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler only executes one scale up/down operation at a time and waits for the system to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will wait indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log
Submitting jobs
Submitting a single job
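A minimal sketch using standard SGE syntax, requesting the complex values described above (script name and resource values are illustrative):

```bash
# Submit a single job requesting 2 cores and 4G of memory via the requestable complex values
qsub -l static_cores=2,static_mem=4G my_job.sh
```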
Submitting a job array
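A minimal sketch for an array job, again using standard SGE syntax with illustrative values (note the concurrency caution below):

```bash
# Submit an array of 10 tasks; each task requests 1 core and 2G of memory
qsub -t 1-10 -l static_cores=1,static_mem=2G my_array_job.sh
```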
Do not limit the job concurrency amount as this will result in unused cluster members.
Monitoring members
Listing all members of the cluster
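For example, using the standard SGE host listing command:

```bash
# List all execution hosts (cluster members) with their load and memory usage
qhost
```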
Managing running/pending jobs
Listing all jobs in the cluster.
Showing the details of a job.
Deleting a job.
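The following sketch uses the standard SGE commands for these three operations (the job ID is illustrative):

```bash
# List all running and pending jobs in the cluster
qstat

# Show the details of a specific job
qstat -j 42

# Delete a specific job
qdel 42
```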
Managing executed jobs
Showing the details of an executed job.
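For example, using the standard SGE accounting command (the job ID is illustrative):

```bash
# Show accounting details for a finished job
qacct -j 42
```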
SGE Reference documentation
SGE command line options and configuration details can be found here.
This video is an overview of Illumina Connected Analytics. It walks through a Multi-Omics Cancer workflow that can be found here: Oncology Walkthrough.
Features At-a-glance
Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypical and clinical metadata, molecular features including germline, somatic, gene expression.
Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.
Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.
Out-of-the-box statistical analyses including genetic burden tests, GWAS/PheWAS.
Rich public data sets covering key disease areas to enrich private data analysis.
Easy-to-use visualizations for gene prioritization and genetic variation inspection.
ICA Cohorts contains a variety of freely available data sets covering different disease areas and sequencing technologies. For a list of currently available data, see here.
JupyterLab
Bench workspaces require setting a Docker image to use as the image for the workspace. Illumina Connected Analytics (ICA) provides a default Docker image with JupyterLab installed.
JupyterLab supports Jupyter Notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.
The JupyterLab Docker image contains the following environment variables:
| Variable | Set to |
| --- | --- |
| ICA_URL | https://ica.illumina.com/ica (ICA server URL) |
| ICA_PROJECT (Obsolete) | ICA project ID |
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data.
ICA Python Library
Included in the default JupyterLab Docker image is a Python library with APIs to perform actions in ICA, such as adding data, launching pipelines, and operating on Base tables. The Python library is generated from the ICA API specification.
The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab Docker image.
See the tutorials for examples of using the ICA Python library.
Storage Cost Management
Since there is a storage cost associated with the data in your projects, it is good practice to regularly check how much cost is being generated by your projects and evaluate which data can be removed from cloud storage. The instructions provided here will help you quickly determine which data is generating the highest storage costs.
Monitoring Storage Cost
To see how much storage cost is currently being generated for your tenant, look at the usage explorer at https://platform.illumina.com/usage/ or, from within ICA, navigate to the 9-dot symbol in the top right next to your name and choose the usage explorer from the menu.
From the usage explorer overview screen, you can see below the graphical representation which projects are incurring the highest storage costs.
Project Files in ICA
When you have determined which projects are incurring the largest storage costs, you can find out which files within that project are taking up the most space. To find the largest files in your project,
Go to Projects > your_project > Data and switch to list view with the list view icon above your files.
Use the column icon at the top right to add the Size column to your view. You can drag the Size column to the desired position in your list view or use the move left and move right options which appear when you select the three vertical dots.
Select Sort descending to show the largest files first.
Once you have the list sorted like this, you can evaluate whether those large files are still needed, whether they can be archived (Manage > Archive), or whether they can be deleted (Manage > Delete).
Bench
ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a Docker image with access to the data and pipelines within a project. This workspace runs on the Amazon S3 system and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when not in use.
Access
Having access to Bench depends on the following conditions:
Bench needs to be included in your ICA subscription.
The project owner needs to enable Bench for their project.
Individual users of that project need to be given access to Bench.
Enabling Bench for your project
After creating a project, go to the Projects > your_project > Bench > Workspaces page and click the Enable button. The entitlements you have determine the available resources for your Bench workspaces. If you have multiple entitlements, all the resources of your individual entitlements are taken into account. Once Bench is enabled, users with matching permissions have access to the Bench module in that project.
If you do not see the Enable button for Bench, then either your tenant subscription does not include Bench or the tenant to which you belong is not the one where the project was created. Users from other tenants can create workspaces in Bench once Bench is enabled, but they cannot enable the Bench module itself.
Setting user level access.
Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.
Tenant administrators and project owners are always able to access Bench and perform all actions.
The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.
No Access means you have no access to the Bench workspace for that project.
To specify a compute type for a CWL CommandLineTool, either define the RAM and number of cores, or use the resource type and size. The ICA compute type will automatically be determined based on the CWL ResourceRequirement coresMin/coresMax (CPU) and ramMin/ramMax (memory) values using a "best fit" strategy to meet the minimum specified requirements (see the Compute Types table for what the resources are mapped to).
For example, take the following ResourceRequirements:
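A minimal sketch of such a ResourceRequirement is shown below; the values are assumptions chosen to fall within the standard-large preset and should be checked against the Compute Types table:

```yaml
requirements:
  ResourceRequirement:
    coresMin: 8        # minimum CPU cores requested (illustrative)
    ramMin: 30000      # minimum RAM in MiB (illustrative)
```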
This will result in a best fit of standard-large ICA Compute Type request for the task.
If the specified requirements can not be met by any of the presets, the task will be rejected and failed.
See the example below for using the ResourceRequirement in the CWL workflow with the predefined ICA resource type and size.
FPGA requirements can not be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
Standard vs Economy
For each compute type, you can choose between the Standard (default) and Economy tiers. You can set economy mode with the "tier" parameter.
Considerations
If no Docker image is specified, Ubuntu will be used as the default. Both : and / can be used as separators.
CWL Overrides
ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the overrides feature.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
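A minimal sketch of such an input JSON, following the cwltool overrides convention, is shown below; the tool path, input name, and environment variable are hypothetical:

```json
{
  "input_file": {
    "class": "File",
    "path": "sample.txt"
  },
  "cwltool:overrides": {
    "tool.cwl": {
      "requirements": {
        "EnvVarRequirement": {
          "envDef": {
            "MESSAGE": "value-set-at-load-time"
          }
        }
      }
    }
  }
}
```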
Samples
You can use samples to group information related to a sample, including input files, output files, and analyses. You can consider samples as creating a binder to collect related information. When you then link that sample to another project, you bring over the empty binder, but not the files contained in it. In that project, you can then add your own data to it.
You can search for samples (excluding their metadata) with the Search button at the top right.
Add New Sample
To add a new sample, do as follows.
Select Projects > your_project > Samples.
To add a new sample, select + Create, and then enter a unique name and description for the sample.
To add files to the sample, see Add Files to Samples below.
Your sample is added to the Samples page. To view information on the sample, select the sample name in the overview.
Add Files to Samples
You can add files to a sample after creating the sample. Any files that are not currently linked to a sample are listed on the Unlinked Files tab.
To add an unlinked file to a sample, do as follows.
Go to Projects > your_project > Samples > Unlinked files tab.
Select a file or files, and then select one of the following options:
Create sample — Create a new sample that includes the selected files.
Alternatively, you can add unlinked files from the sample details.
Go to Projects > your_project > Samples > your_sample.
Select your sample to open the details.
Go to the Data tab and select link.
Data can only be linked to a single sample, so once you have linked data to a sample, it will no longer appear in the list of data to choose from.
Unlink Files from Samples
To remove files from samples,
Go to Projects > your_project > Samples > your_sample > Data
Select the files you want to remove.
Select Unlink.
If your selection contains both files that can be unlinked and files that cannot be unlinked (for example, those linked via a linked bundle), this will be indicated in the confirmation dialog and only the unlinkable files will be unlinked.
Link Samples to Project
A Sample can be linked to a project from a separate project to make it available in read-only capacity.
Navigate to the Samples view in the Project
Click the Link button
Select the Sample(s) to link to the project
Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples also must be available in a complete state in order to be linked.
Delete Samples
If you want to remove a sample, select it and use the delete option from the top navigation row. You will be presented a choice of how to handle the data in the sample.
Unlink all data without deleting it.
Delete input data and unlink other data.
Delete all data.
Data Integrity
You can verify the integrity of the data by comparing the hash which is usually (with some exceptions) an MD5 (Message Digest Algorithm 5) checksum. This is a common cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum. AWS S3 calculates this checksum when data is uploaded and stores it in the ETag (Entity tag).
For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our API endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint specifying the data Id you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure file integrity.
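For example, the ETag can be retrieved and compared against a locally computed MD5 as sketched below (the token, IDs, and filename are placeholders):

```bash
# Retrieve the objectETag for a file smaller than 16 MB
curl -s -H "Authorization: Bearer <token>" \
  "https://ica.illumina.com/ica/rest/api/projects/<projectId>/data/<dataId>" \
  | grep -o '"objectETag"[^,}]*'

# Compute the MD5 checksum of the local copy and compare
md5sum Sample-1_S1_L001_R1_001.fastq.gz
```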
This ETag does not change and can be used as a file integrity check even when that file is archived, unarchived and/or copied to another location. Changes to the metadata have no impact on the ETag.
For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.
Installation
Download links for the CLI can be found at the Release History.
Both the CLI and the service connector require x86 architecture.
For ARM-based architecture on Mac or Windows, you need to keep x86 emulation enabled.
Linux does not support x86 emulation.
After the file is downloaded, place the CLI in a folder that is included in your $PATH environment variable list of paths, typically /usr/local/bin. Open the Terminal application, navigate to the folder where the downloaded CLI file is located (usually your Downloads folder), and run the following command to copy the CLI file to the appropriate folder. If you do not have write access to your /usr/local/bin folder, then you may use sudo (which requires a password) prior to the cp command. For example:
If you do not have sudo access on your system, contact your administrator for installation. Alternately, you may place the file in an alternate location and update your $PATH to include this location (see the documentation for your shell to determine how to update this environment variable).
You will also need to make the file executable so that the CLI can run:
You will likely want to place the CLI in a folder that is included in your $PATH environment variable list of paths. In Windows, you typically want to save your applications in the C:\Program Files\Illumina folder.
Mount projectdata using CLI
icav2 allows project data to be mounted on a local system. This feature is currently available on Linux and Mac systems only. Although not officially supported, users have successfully used Windows Subsystem for Linux (WSL) on Windows to run the icav2 projectdata mount command. Please refer to the WSL documentation for installing WSL.
Prerequisites
icav2 (>=2.3.0) and a FUSE driver installed on the local system.
For Mac, refer to macFUSE.
Mount projectdata
Identify the project id by running the following command:
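For example:

```bash
icav2 projects list
```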
Provide the project id under "ID" column above to the mount command to mount the project data for the project.
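For example, mounting into a local directory named mnt (the directory name is arbitrary):

```bash
icav2 projectdata mount mnt --project-id <project-id>
```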
Check the content of the mount.
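For example:

```bash
ls mnt
```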
icav2 utilizes the FUSE driver to mount project data, providing both read and write capabilities. However, there are some limitations on the write capabilities that are enforced by the underlying AWS S3 storage. For more information, please refer to the FUSE Driver section.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
Unmount project data
You can unmount the project data using the 'unmount' command.
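For example:

```bash
icav2 projectdata unmount
```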
Authentication
The ICA CLI uses an Illumina API Key to authenticate. An Illumina API Key can be acquired through the product dashboard after logging into a domain. See API Keys for instructions to create an Illumina API Key.
Authenticate using icav2 config set command. The CLI will prompt for an x-api-key value. Input the API Key generated from the product dashboard here. See the example below (replace EXAMPLE_API_KEY with the actual API Key).
```
icav2 config set
Creating /Users/johngenome/.icav2/config.yaml
Initialize configuration settings [default]
server-url [ica.illumina.com]:
x-api-key : EXAMPLE_API_KEY
output-format (allowed values table,yaml,json defaults to table) :
colormode (allowed values none,dark,light defaults to none) :
```
The CLI will save the API Key to the config file as an encrypted value.
If you want to overwrite existing environment values, use the command icav2 config set.
To remove an existing configuration/session file, use the command icav2 config reset.
Check the server and confirm you are authenticated using icav2 config get
If during these steps or in the future you need to reset the authentication, you can do so using the command: icav2 config reset
Storage
A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.
Refer to the following pages for instructions to setup supported external cloud storage providers:
Oncology Walk-through
This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.
Create a Cancer Cohort and View Subject Details
Click Create Cohort button.
Data Transfer
The ICA CLI can be used for uploading, downloading and viewing information about data stored within ICA projects. If not already authenticated, please see the section. Once the CLI has been authenticated with your account, use the command below to list all projects:
`icav2 projects list`
The first column of the output (in default table format) will show the ID. This is the project-id and will be used in the examples below.
Details
The project details page contains the properties of the project, such as the location, owner, storage, and linked bundles. This is also the place where you add assets in the form of linked bundles.
The project details are configured during project creation and may be updated by the project owner, entities with the project Administrator role, and tenant administrators.
Adding Linked Bundle Assets
Spark on ICA Bench
Running a Spark application in a Bench Spark Cluster
Running a pyspark application
The JupyterLab environment is by default configured with 3 additional kernels
PySpark –
Non-Indexed Folders
Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container: you can access the container, which contains all the data, but you cannot access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.
You can see the size of a non-indexed folder as part of the data details screen (projects > your_project > data > your_non-indexed_data > data details tab) and in the data view (projects > your_project > data).
There can be a noticeable delay before the size of a non-indexed folder is updated after changes because of how the data is handled.
FUSE Driver
Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. There are both read and write capabilities with some limitations on write capabilities that are enforced by the underlying AWS S3 storage.
As a user, you are allowed to do the following actions from Bench (when having the correct user permissions compared to the workspace permissions) or through the CLI:
Copy project data
Delete project data
Reference Data
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Creating Reference Data
Reference data properties are located at the main navigation level and consist of the following free text fields.
Config Settings
The ICA CLI accepts configuration settings from multiple places, such as environment variables, configuration file, or passed in as command line arguments. When configuration settings are retrieved, the following precedence is used to determine which setting to apply:
Command line options - Passed in with the command such as --access-token
Environment variables - Stored in system environment variables such as ICAV2_ACCESS_TOKEN
Activity
The Activity view (Projects > your_project > Activity) shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.
Data Transfers
The Data Transfers tab shows the status of data uploads and downloads. You can sort, search and filter on various criteria and export the information. Show ongoing transfers (top right) allows you to filter out the completed and failed transfers to focus on current activity.
Transfers with a yellow background indicate that rules have been modified in ways that prevent planned files from being uploaded. Please verify your service connectors to resolve this.
Prepare Metadata Sheets
In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:
subject:
demographics such as age, sex, ancestry;
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
Copy project data
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa.
There is a file size limit of 500 GB per file for the FUSE driver.
Delete project data
The FUSE driver also allows you to delete data from your project. This is different from earlier Bench usage, where you took a local copy and still kept the original file in your project.
Deleting project data through the Bench workspace via the FUSE driver will permanently delete the data in the project. This action cannot be undone.
CLI
Using the FUSE driver through the CLI is not supported for Windows users. Linux users will be able to use the CLI without any further actions, Mac users will need to install the kernel extension from macFuse.
MacOS uses hidden metadata files beginning with ._, which are copied over and exposed during CLI copy to your project data. These can be safely deleted from your project.
Mounting and unmounting of data needs to be done through the CLI. In Bench, this happens automatically, so it is not needed there.
Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
Restrictions
Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update files or saving your notebook in the project folder will typically result in an error such as "File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error".
Some examples of other actions or commands that will not work because of the above mentioned limitations:
Save a jupyter notebook or R script on the /project location
Add/remove a file from an existing zip file
Redirect with append to an existing file e.g. echo "This will not work" >> myTextFile.txt
Rename a file due to the existing association between ICA and AWS
Move files or folders.
Using vi or another editor
A file can be written only sequentially. This restriction comes from the library the FUSE driver uses to store data in AWS, which supports only sequential writing; random writes are currently not supported. The FUSE driver will detect random writes and the write will fail with an IO error return code. Zip will not work, since zip writes a table of contents at the end of the file. Please use gzip instead.
Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.
Jupyter notebooks
To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
Old Bench workspaces
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.
phenotypes and diseases;
biometrics such as body height, body mass index, etc.;
pathological classification, tumor stages, etc.;
family and patient medical history;
sample:
sample type such as FFPE,
tissue type,
sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.
You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.
During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.
A metadata sheet will need to contain at least these four columns per row:
Subject ID - identifier referring to individuals; use the column header "SubjectID".
Sample ID - identifier for a sample. Sample IDs need to match the corresponding column header in VCF/GVCFs; each subject can have multiple samples, these need to be specified in individual rows for the same SubjectID; use the column header "SampleID".
Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).
Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here: ICA_Cohorts_Example_Metadata.tsv
A list of concepts and diagnoses that cover all public data subjects to easily navigate the new concept code browser for diagnosis can be found here: PublicData_AllConditionsSummarized.xlsx
Additional environment variables in the JupyterLab Docker image:

| Variable | Set to |
| --- | --- |
| ICA_PROJECT_UUID | Current ICA project UUID |
| ICA_SNOWFLAKE_ACCOUNT | ICA Snowflake (Base) Account ID |
| ICA_SNOWFLAKE_DATABASE | ICA Snowflake (Base) Database ID |
| ICA_PROJECT_TENANT_NAME | Name of the owning tenant of the project where the workspace is created. |
| ICA_STARTING_USER_TENANT_NAME | Name of the tenant of the user which last started the workspace. |
| ICA_COHORTS_URL | URL of the Cohorts web application used to support the Cohorts view |
In this example, we will upload a file called Sample-1_S1_L001_R1_001.fastq.gz to the project. Copy your project-id obtained above and use the following command:
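A minimal sketch of the upload command (replace <project-id> with the ID obtained above):

```bash
icav2 projectdata upload Sample-1_S1_L001_R1_001.fastq.gz --project-id <project-id>
```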
To check if the file has uploaded, run the following command to get a list of all files stored within the specified project:
`icav2 projectdata list --project-id <project-id>`
This will show a file ID starting with fil. which can be used to get more information about the file and attributes.
`icav2 projectdata get <file-id> --project-id <project-id>`
We have to use --project-id in the examples above because we have not entered into a specific project context. To enter a project context use the following command.
`icav2 projects enter <project-name or project-id>`
This will infer the project-id, so that it does not need to be entered for each command.
Uploading Multiple Files
You can only upload individual files with the projectdata upload command, wildcards are not supported.
If you want to upload multiple files in a folder, based on a common name, you can use the following method from the folder where the files are located: `ls VAL-0* | xargs -I {} icav2 projectdata upload "{}" /my_upload/my_files/ --project-id <project-id>`
`ls VAL-0*` lists all files in the current directory whose names start with VAL-0, for example VAL-001.txt, VAL-002.bin, ...
`|` The pipe symbol takes the output of the ls command and passes it as input to the next command.
`xargs -I {}` takes the list of files and executes the next command, replacing the curly braces with each individual file.
`icav2 projectdata upload "{}" /my_upload/my_files/` This command is executed for each file and uploads that file to the /my_upload/my_files/ folder.
`--project-id <project-id>` The ID of the project to which you want to upload the files (only needed if you have not entered a project context).
Filenames beginning with / are not allowed, so be careful when entering full path names, as those will result in the file being stored on S3 but not being visible in ICA. Likewise, folders containing a / in their individual folder name and folders named '.' are not supported.
Download Data
The ICA CLI can also be used to download files. This can be especially useful if the download destination is a remote server or HPC cluster into which you are logged in. To download data into the current folder, run the following command:
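A minimal sketch, assuming the fil.-prefixed file ID returned by projectdata list (IDs are placeholders):

```bash
# Download the file into the current folder
icav2 projectdata download <file-id> . --project-id <project-id>
```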
To fetch temporary AWS credentials for given project data, use the command `icav2 projectdata temporarycredentials [path or data Id] [flags]`. If the path is provided, the project ID from the flag --project-id is used. If the --project-id flag is not present, then the project ID is taken from the context. The returned AWS credentials for file or folder upload expire after 36 hours.
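For example, to fetch credentials for the folder used earlier in this section:

```bash
icav2 projectdata temporarycredentials /my_upload/my_files/ --project-id <project-id>
```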
Data Transfer Options
For information on options such as using the ICA API and AWS CLI to transfer data, visit the Data Transfer Options tutorial.
The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., copy table, export table, select * from table, etc.). If a job is still running, you can abort it from this screen.
The jobs are shown with their:
Creation time: When did the job start
Description: The query or the performed action with some extra information
Type: Which action was taken
Status: Failed or succeeded
Duration: How long the job took
Billed bytes: The used bytes that need to be paid for
Failed jobs provide information on why the job failed. Double-click the job to see more details. If a job is retried, the original failed job will still remain visible here.
Base Activity
The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table, etc.) Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown.
You can use the Export function at the bottom of the screen to download the current page or selected rows in Excel, JSV or JSON format.
To get the data for the last year without limit on the number of records, use the export to project file function at the top of the screen. This will export the data in JSON format as either a single file or split over multiple files according to your desired file size. No activity data is retained for more than one year.
The activities are shown with:
Start Time: The moment the action was started
Query: The SQL expression.
Status: Failed or succeeded
Duration: How long the job took
User: The user that requested the action
Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K
Bench Activity
The Bench Activity tab shows the actions taken on Bench Workspaces in the project.
The activities are shown with:
Workspace: Workspace where the activity took place
Date: Date and time of the activity
User: User who performed the activity
Action: Which activity was performed
Batch Jobs
The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.
Which batch jobs are visible depends on the user role.
Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.
Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.
Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:
Upload / Download rights (Download rights are mandatory for technical reasons)
Project Level (No Access / Data Provider / Viewer / Contributor)
Default config file - Stored by default in the ~/.icav2/config.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.config on Windows
Command Line Options
The following global flags are available in the CLI interface:
Environment Variables
Environment variables provide another way to specify configuration settings. Variable names align with the command line options with the following modifications:
Upper cased
Prefix ICAV2_
All dashes replaced by underscore
For example, the corresponding environment variable name for the --access-token flag is ICAV2_ACCESS_TOKEN.
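For example, the access token could be supplied through the environment instead of the --access-token option:

```bash
export ICAV2_ACCESS_TOKEN=<your-jwt>
```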
Disable Retry Mechanism
The environment variable ICAV2_ICA_NO_RETRY_RATE_LIMITING allows you to disable the retry mechanism. When it is set to 1, no retries are performed. For any other value, HTTP status code 429 will result in 4 retry attempts:
after 500 milliseconds
after 2 seconds
after 10 seconds
after 30 seconds
Config File
Upon launching icav2 for the first time, the configuration yaml file is created and the default config settings are set. Enter an alternative server URL or press enter to leave it as the default. Then enter your API Key and press enter.
After installing the CLI, open a terminal window and enter the icav2 command. This will initialize a default configuration file in the home folder at .icav2/config.yaml.
To reset the configuration, use ./icav2 config reset
Resetting the configuration removes the configuration from the host device and cannot be undone. The configuration needs to be recreated.
Configuration settings are stored in the default configuration file:
The file ~/.icav2/.session.ica.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.session.ica on Windows will contain the access-token and project-id. These are output files and should not be edited, as they are automatically updated.
```yaml
requirements:
  ResourceRequirement:
    https://platform.illumina.com/rdf/ica/resources:type: fpga2
    https://platform.illumina.com/rdf/ica/resources:size: medium
    https://platform.illumina.com/rdf/ica/resources:tier: standard
```

```yaml
requirements:
  ResourceRequirement:
    https://platform.illumina.com/rdf/ica/resources:type: himem
    https://platform.illumina.com/rdf/ica/resources:size: small
    https://platform.illumina.com/rdf/ica/resources:tier: economy
```
```
% icav2 projects list
ID                                      NAME    OWNER
422d5119-708b-4062-b91b-b398a3371eab    demo    b23f3ea6-9a84-3609-bf1d-19f1ea931fa3
% icav2 projectdata mount mnt --project-id 422d5119-708b-4062-b91b-b398a3371eab
% ls mnt
sampleX.final.count.tsv
% icav2 projectdata unmount
Project with identifier 422d5119-708b-4062-b91b-b398a3371eab was unmounted from mnt.
```
```
-t, --access-token string    JWT used to call rest service
-h, --help                   help for icav2
-o, --output-format string   output format (default "table")
-s, --server-url string      server url to direct commands
-k, --x-api-key string       api key used to call rest service
```
If you do not have write access to that folder, then open a CMD window in administrative mode (hold down the SHIFT key as you right-click the CMD application and select "Run as administrator"). Type in the following commands (assuming you have saved icav2.exe in your current folder):
Then make sure that the C:\Program Files\Illumina folder is included in your %path% list of paths. Please do a web search for how to add a path to your %path% system environment variable for your particular version of Windows.
Linux/macOS:

```
sudo cp icav2 /usr/local/bin
sudo chmod a+x /usr/local/bin/icav2
```

Windows:

```
mkdir "C:\Program Files\Illumina"
copy icav2.exe "C:\Program Files\Illumina"
```
Credentials
The storage configuration requires credentials to connect to your storage. AWS uses the security credentials to authenticate and authorize your requests. On the System Settings > Credentials > Create > Storage Credential page, you can enter these credentials. Long-term access keys consist of an access key ID and a secret access key as a set.
Fill out the following fields:
Type—The type of access credentials. This will usually be AWS user.
Name—Provide a name to easily identify your access key.
Access key ID—The access key you created.
Secret access key—Your related secret access key.
You can share the credentials you own with other users of your tenant. To do so select your credentials at System Settings > Credentials and choose Share.
In the ICA main navigation, select System Settings > Storage > Create.
Configure the following settings for the storage configuration.
Type—Use the default value, eg, AWS_S3. Do not change.
Region—Select the region where the bucket is located.
Configuration name—You will use this name when creating volumes that reside in the bucket. The name must be between 3 and 63 characters long.
Description—Here you can provide a description for yourself or other users to identify this storage configuration.
Bucket name—Enter the name of your S3 bucket.
Key prefix—You can provide a key prefix to allow only files inside the prefix to be accessible. Although this setting is optional, it is highly recommended (and in some configurations mandatory) to use a key prefix. The key prefix must end with "/".
If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 folder in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.
Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.
Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.
Select Save.
ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used while creating a new project.
With the action Manage > Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.
The System Settings > Credentials > select your credentials > Manage > Share action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share action. Take into account that once shared, you cannot unshare the storage. Once your shared storage is used in a project, it can also no longer be deleted.
Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.
Deleting Storage Configurations
In the ICA main navigation, select System Settings > Storage > select your storage > Manage > Delete. You can then create a new storage configuration to reuse the bucket name and key prefix.
Hiding a project will also unlock your storage configuration so that it can be reused for another project. Data stored by the hidden project will remain in your S3 storage, so you may need to perform manual cleanup before reusing the storage.
Storage Configuration Verification
Every 4 hours, ICA will verify the storage configuration and credentials to ensure availability. When an error is detected, ICA will attempt to reconnect once every 15 minutes. After 200 consecutive failed connection attempts (50 hours), ICA will stop trying to connect.
When you update your credentials, the storage configuration is automatically validated. In addition, when ICA has stopped trying to connect, you can manually trigger revalidation by selecting System Settings > Storage > select your storage > Manage > Validate.
ICA supports the following storage classes. Please see the AWS documentation for more information on each:
Object Class | ICA Status
S3 Standard | Available
S3 Intelligent-Tiering | Available
S3 Express One Zone | Available
S3 Standard-IA | Available
S3 One Zone-IA | Available
S3 Glacier Instant Retrieval | Available
If you are using Intelligent Tiering, which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
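As an illustration of the lifecycle approach mentioned above, the sketch below applies 90-day and 180-day transitions to a user-managed bucket with the AWS CLI; the bucket name and prefix are placeholders, and the rule should be adapted to your own retention policy.
# Write a lifecycle policy with the 90/180-day transitions described above.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "ica-archive-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "folder-1/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
EOF
# Apply the lifecycle configuration to the bucket (placeholder bucket name).
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json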
Select the following studies to add to your cohort:
TCGA – BRCA – Breast Invasive Carcinoma
TCGA – Ovarian Serous Cystadenocarcinoma
Add a Cohort Name = TCGA Breast and Ovarian_1472
Click on Apply.
Expand Show query details to see the study makeup of your cohort.
Charts will be open by default. If not, click Show charts
Use the gear icon in the top-right to change viewable chart settings.
Tip: Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.
The Subjects tab, listing all Subjects, is displayed below the Charts, with a link to each Subject by ID and other high-level information, such as the Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for subject TCGA-E2-A14Y and view the data about this Subject.
Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.
Note: the Subject is a 35-year-old Female with vital status and other phenotypes that feed up into the Subject attribute selection criteria when creating or editing cohorts.
Click X to close the Subject details.
Click Hide charts to increase interactive landscape.
Data Analysis, Multi-Omic Biomarker Discovery, and Interpretation
Click the Marker Frequency tab, then click the Somatic Mutation tab.
Review the gene list and mutation frequencies.
Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).
Do Subjects with PIK3CA mutations have changes in PIK3CA RNA Expression?
Click the Gene Expression tab, search for PIK3CA
PIK3CA RNA is down-regulated in 27% of the subjects relative to normal samples.
Switch from normal
Click directly on PIK3CA gene link in the Gene Expression table.
You are brought to the Gene tab under the Gene Summary sub-tab that lists information and links to public resources about PIK3CA.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot.
There are 87 mutations distributed across the 1068 amino acid sequence, listed below the analysis tracks. These can be exported via the icon into a table.
We know that missense variants can severely disrupt translated protein activity. Deselect all Variant Types except for Missense from the Show Variant Type legend above the needle plot.
Many mutations are in the functional domains of the protein as seen by the colored boxes and labels on the x-axis of the Needle Plot.
Hover over the variant with the highest sample count in the yellow PI3Ka protein domain.
The pop-up shows variant details for the 64 Subjects observed with it: 63 in the Breast Cancer study and 1 in the Ovarian Cancer Study.
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in to the PI3Ka domain to better separate observations.
There are three different missense mutations at this locus changing the wildtype Glutamic acid at different frequencies to Lysine (64), Glycine (6), or Alanine (2).
The Pathogenic Variant track shows 7 ClinVar entries for mutations stacked at this locus affecting amino acid 545. Hovering over the purple triangles shows pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry.
Note the Primate AI track and high Primate AI score.
The Primate AI track displays scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" because cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".
Click the Expression tab and notice that normal Breast and normal Ovarian tissue have relatively high PIK3CA RNA expression in GTEx RNA-seq tissue data, although the gene is ubiquitously expressed.
Click the Edit button at the top of the Details page.
Click the + button, under LINKED BUNDLES.
Click on the desired bundle, then click the Link button.
Click Save.
If your linked bundle contained a pipeline, then it will appear in Projects > your_project > Flow > Pipelines.
Details List
Detail | Description
Name | Name of the project unique within the tenant. Alphanumerics, underscores, dashes, and spaces are permitted.
Short Description | Short description of the project
Project Owner | Owner of the project (has Administrator access to the project)
Storage Configuration | Storage configuration to use for data stored in the project
User Tags | User tags on the project
Technical Tags | Technical tags on the project
Billing Mode
A project's billing mode determines the strategy for how costs are charged to billable accounts.
Billing Mode | Description
Project | All incurred costs will be charged to the tenant of the project owner
Tenant | Incurred costs will be charged to the tenant of the user owning the project resource (ie, data, analysis). The only exceptions are base tables and queries, as well as bench compute and storage costs, which are always billed to the project owner.
For example, with billing mode set to Tenant, if tenant A has created a project resource and uses it in their project, then tenant A will pay for the resource data, compute costs and storage costs of any output they generate within the project. When they share the project with tenant B, then tenant B will pay the compute and storage for the data which they generate in that project. Put simply, in billing mode tenant, the person who generates data pays for the processing and storage of that data, regardless of who owns the actual project.
Bench workspaces always use project billing even when tenant billing is selected on their project.
If the project billing mode is updated after the project has been created, the updated billing mode will only be applied to resources generated after the change.
If you are using your own S3 storage, then the billing mode impacts where collaborator data is stored.
Project billing will result in using your S3 storage for the data.
Tenant billing will result in collaborator data being stored in Illumina-managed storage instead of your own S3 storage.
Tenant billing, when your collaborators also have their own S3 storage and have it set as default, will result in their data being stored in their S3 storage.
Authentication Token
Use the Create OAuth access token button to generate an OAuth access token which is valid for 12 hours after generation. This token can be used by Snowflake and Tableau to access the data in your Base databases and tables for this Project.
When one of the above kernels is selected, the spark context is automatically initialised and can be accessed using the sc object.
PySpark - Local
The PySpark - Local runtime environment launches the spark driver locally on the workspace node and all spark executors are created locally on the same node. It does not require a spark cluster to run and can be used for running smaller spark applications which don’t exceed the capacity of a single node.
The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
PySpark - Remote
The PySpark – Remote runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will not dynamically spin up executors, hence it will not trigger the cluster to auto scale when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
PySpark – Remote - Dynamic
The PySpark – Remote - Dynamic runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will increase or decrease the required executors, which results in a cluster that auto-scales when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.
Making changes to the configuration requires a restart of the Jupyter kernel.
Job resources
Every cluster member has a certain capacity depending on the selection of the Resource model for the member.
A spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).
The following setting defines the number of CPUs needed per task.
The following settings define the size of a single executor, which handles the execution of tasks.
The above example allows an executor to handle 4 tasks concurrently and share a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB allow 8 concurrent executors on a single cluster member).
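As a quick check of the executor sizing discussed above, you can inspect the relevant lines of the configuration file for the kernel you are using; the path below is the PySpark - Remote one mentioned earlier, and the values shown as comments are those used in the example.
# Show the task and executor sizing currently configured for the remote kernel.
grep -E 'spark.task.cpus|spark.executor.cores|spark.executor.memory' /data/.spark/remote/conf/spark-defaults.conf
# spark.task.cpus 1
# spark.executor.cores 4
# spark.executor.memory 4g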
Spark User Interface
The Spark UI can be accessed via the Cluster. The Web Access URL is displayed on the Workspace details page.
This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.
The GUI considers non-indexed folders as a single object. You can access the contents of a non-indexed folder:
as Analysis input/output
in Bench
via the API
Action | Allowed | Details
Creation | Yes | You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder, or with the /api/projects/{projectId}/data:createNonIndexedFolder endpoint.
Deletion | Yes | You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete, or with the /api/projects/{projectId}/data/{dataId}:delete endpoint.
Uploading Data | API, Bench, Analysis | Use non-indexed folders as normal folders for Analysis runs and bench. Different methods are available with the API, such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl.
Downloading Data | Yes | Use non-indexed folders as normal folders for Analysis runs and bench. Use temporary credentials to list and download data with the API.
Types
Species
Reference Sets
Once these are configured,
Go to your data in Projects > your_project > Data.
Select the data you want to use as reference data and Manage > Use as reference data.
Fill out the configuration screen
You can see the result at the main navigation level > Reference Data (outside of projects) or in Projects > your_project > Flow > Reference Data.
Linking Reference Data to your Project
To use a reference set from within a project, you have first to add it. Select Projects > your_project > Flow > Reference Data > Link. Then select a reference set to add to your project.
Reference sets are only supported in Graphical CWL pipelines.
Copying Reference Data to other Regions
Navigate to Reference Data (Not from within a project, but outside of project context, so at the main navigation level).
Select the data set(s) you wish to add to another region and select Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by opening the Reference set and viewing Copy Details.
Allow a few minutes for new copies to become available before use.
You only need one copy of each reference data set per region. Adding Reference Data sets to additional projects set in the same region does not result in extra copies, but creates links instead. This is done from inside the project at Projects > <your_project> > Flow > Reference Data > Manage > Add to project.
Creating a Pipeline with Reference Data
To create a pipeline with reference data, use the CWL Graphical mode (Projects > your_project > Flow > Pipelines > +Create > CWL Graphical). Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end-user to choose from and a default selection. You can select more than one file, but only one at a time (so repeat the process to select multiple reference files). If you only select one reference file, that file will be the only one users can use with your pipeline. In the screenshot, reference data with two options is presented.
Safari is not supported for graphical CWL data.
Two options for a reference file
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings. After clicking the magnifying glass icon the user can select from provided options.
Pay close attention to uppercase and lowercase characters when creating pipelines.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
Nextflow files
split.nf
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
sort.nf
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
merge.nf
Select +Create again and label the file merge.nf. Copy and paste the following definition.
main.nf
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel so that every item of type Collection or Array is flattened and each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
Inputform files
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
inputForm.json
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.
onSubmit.js
onRender.js
Click the Save button to save the changes.
Compare Cohorts
You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons left-navigation tab of the Cohorts module.
Create a comparison view
Select Cohorts from the left-navigation panel.
Select 2 to 4 cohorts already created. If you have not created any cohorts, see the Create a Cohort documentation.
Click Compare Cohorts in the right-navigation panel.
Note you are now in the Comparisons left-navigation tab of the Cohorts module.
In the Charts Section, if the COHORTS item is not displayed, click the gear icon in the top right and add Cohorts as the first attribute and click Save.
The COHORTS item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.
For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.
You can share a comparison with other team members in the same ICA Project. Please refer to the "Sharing a Cohort" section in "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which work analogously for comparisons.
Attribute Comparison
Select the Attributes tab
Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.
For example, use the drop-down arrow next to Vital status to view sub-categories and frequencies across selected cohorts.
Variants Comparison
Select the Genes tab
Search for a gene of interest using its HUGO/HGNC gene symbol
Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see in this online help for more details)
Survival Summary
Select the Survival Summary tab.
Attribute categories are listed and can be expanded using the down-arrows next to the category names.
Select the drop-down arrow for Therapeutic interventions.
Survival Comparison
Click Survival Comparison tab.
A Kaplan-Meier Curve is rendered based on each cohort.
The P-Value displayed at the top of Survival Comparison indicates whether there is statistically significant variance between the survival probabilities over time of any pair of cohorts (CI=0.95).
When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.
Marker Frequency Comparison
Select the Marker Frequency tab.
Select either Gene expression (default), Somatic mutation, or Copy number variation
Correlation Comparison
Select the Correlation tab.
Similar to the single-cohort view (Cohort Analysis | Correlation), choose two clinical attributes and/or genes to compare.
Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.
Docker Repository
In order to create a Tool or Bench image, a Docker image is required to run the application in a containerized environment.
Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.
Use Docker images built for x86 architecture or multi-platform images that support x86. You can build Docker images that support both ARM and x86 architectures.
Importing a Public External Image (Tools)
Navigate to System Settings > Docker Repository.
Click Create > External image to add a new external image.
Add your full image URL in the Url field, e.g. docker.io/alpine:latest or registry.hub.docker.com/library/alpine:latest. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)
Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.
(Optional) Complete the Description field.
Click Save.
The newly added image will appear in your Docker Repository list. You can differentiate between internal and external images by looking at the Source column. If this column is not visible, you can add it with the columns icon.
Verification of the URL is performed during execution of a pipeline which depends on the Docker image, not during configuration.
External images are accessed from the external source whenever required and not stored in ICA. Therefore, it is important not to move or delete the external source. There is no status displayed on external Docker repositories in the overview as ICA cannot guarantee their availability.
The use of :stable instead of :latest is recommended.
Importing a Private Image (Tools + Bench Images)
In order to use private images in your tool, you must first upload them as a TAR file.
Navigate to Projects > your_project.
Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, using the CLI or a Connector. For more information please refer to project .
Select your uploaded TAR file and click Manage > Change Format in the top menu.
Copying Docker Images to other Regions
Navigate to System Settings > Docker Repository.
Either
Select the required image(s) and go to Manage > Add Region.
To remove regions, go to Manage > Remove Region or unselect the regions from the Docker image detail view.
Downloading Docker Images
You can download your created Docker images at System Settings > Docker Images > your_Docker_image > Manage > Download.
In order to be able to download Docker images, the following requirements must be met:
The Docker image cannot be from an entitled bundle.
Only self-created Docker images can be downloaded.
The Docker image must be an internal image and in status Available.
File Size Considerations
Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
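A minimal sketch of packaging a local image for upload, assuming a locally built image called my-tool:1.0 (the name is only an example):
# Export the image to a TAR archive and compress it before uploading it to your project data.
docker save my-tool:1.0 -o my-tool_1.0.tar
gzip my-tool_1.0.tar   # produces my-tool_1.0.tar.gz
After uploading the archive, use Manage > Change Format as described above so ICA recognizes it as a Docker image.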
In either FPGA mode (hardware-accelerated) or software mode when using FPGA instances. This can be useful when comparing performance gains from hardware acceleration or to distribute concurrent processes between the FPGA and CPU.
In software mode when using non-FPGA instances.
To run DRAGEN in software mode, you need to use the DRAGEN --sw-mode parameter.
The DRAGEN command line parameters to specify the location of the licence file are different.
DRAGEN software is provided in specific Bench images with names starting with Dragen. For example (available versions may vary):
Dragen 4.4.1 - Minimal provides DRAGEN 4.4.1 and SSH access
Dragen 4.4.6 provides DRAGEN 4.4.6, SSH and JupyterLab.
Prerequisites
Memory
The instance type is selected during workspace creation (Projects > your_project > Bench > Workspaces). The amount of RAM available on the instance is critical. 256GiB RAM is a safe choice to run DRAGEN in production. All FPGA2 instances offer 256GiB or more of RAM.
When running in Software mode, use (348GiB RAM) or (144 GiB RAM) to ensure enough RAM is available for your runs.
During pipeline development, when typically using small amounts of data, you can try to scale down instance types to save costs. You can start at hicpu-large and progressively use smaller instances, though you will need at least standard-xlarge.
If DRAGEN runs out of available memory, the system is rebooted, losing your currently running commands and interface.
DRAGEN version 4.4.6 and later verify if the system has at least 128GB of memory available. If not enough memory is available, you will encounter an error stating that the Available memory is less than the minimum system memory required 128GB.
This can be overridden with the command line parameter dragen --min-memory 0
FPGA-mode
Using an fpga2-medium .
Example
Software-mode
Using a standard-xlarge .
Software mode is activated with the DRAGEN --sw-mode parameter.
Example
Rare Genetic Disorders Walk-through
Cohorts Walk-through: Rare Genetic Disorders
This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.
Login and Create a new ICA Project
Create a new Project to track your study:
Log in to ICA
Navigate to Projects
Create a new project using the New Project button.
Create and Review a Rare Disease Cohort
Navigate to the ICA Cohorts module by clicking Cohorts in the left navigation panel.
Click Create Cohort button.
Enter a name for your cohort, like Rare Disease + 1kGP at top, left of pencil icon.
Analyze Your Rare Disease Cohort Data
A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.
First let’s Hide charts for more visual space.
Click the Genes tab where you need to query a gene to see and interact with results.
Project Connector
The platform GUI provides the Project Connector utility which allows data to be linked automatically between projects. This creates a one-way dynamic link for files and samples from source to destination, meaning that additions and deletions of data in the source project also affect the destination project. This differs from copying or moving which create editable copies of the data. In the destination project, you can delete data which has been moved or copied and unlink data which has been linked.
one-way | files | folders | erases source data | propagate source edits | editable on destination
Prepare Source Project
Select the source project (project that will own the data to be linked) from the Projects page (Projects > your_source_project).
Select Project Settings > Details.
Select Edit
Creating a New Project Connector
Select the destination project (the project to which data from the source project will be linked) from the Projects page (Projects > your_destination_project).
From the projects menu, select Project Settings > Connectivity > Project Connector
Select + Create and complete the necessary fields.
Filter Expression Examples
The examples below will restrict linking Files based on the Format field.
Only Files with Format of FASTQ will be linked:
[?($.details.format.code == 'FASTQ')]
Only Files with Format of VCF will be linked:
[?($.details.format.code == 'VCF')]
The examples below will restrict linked Files based on filenames.
Exact match to 'Sample-1_S1_L001_R1_001.fastq.gz':
The examples below will restrict linking Samples based on User Tags and Sample name, respectively.
Only Samples with the User Tag 'WGS-Project-1' will be linked:
[?('WGS-Project-1' in $.tags.userTags)]
Link a Sample with the name 'BSSH_Sample_1':
[?($.name == 'BSSH_Sample_1')]
Base: Access Tables via Python
You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas object, the data can be processed further. In this tutorial, we will describe how you could create a Python script which will retrieve the data and visualize it using the Dash framework. The script will contain the following parts:
Importing dependencies and variables.
Function to fetch the data from Base table.
Creating and running the Dash app.
Importing dependencies and variables
This part of the code imports the dependencies, which have to be installed on your machine (possibly with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from a file named config.py.
Function to fetch the data from Base table
We will be creating a function called fetch_data to obtain the data from Base table. It can be broken into several logically separated parts:
Retrieving the token to access the Base table together with other variables using API.
Establishing the connection using the token.
The SQL query itself. In this particular example, we are extracting values from two tables, Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g. the number of bases with quality of at least 30, Q30_BASES) and metadata in the column ica, which needs to be flattened to access the value.
Here is the corresponding snippet:
Creating and running the Dash app
Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE as the x-axis and the user's selection from the dropdown as the y-axis.
Now we can create a single Python script called dashboard.py by concatenating the snippets and running it. The dashboard will be accessible in the browser on your machine.
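Running the concatenated script might look like the following; the package names are assumed from the imports shown in this tutorial, and the port is the Dash default.
# Install the dependencies used in the imports, then start the dashboard.
pip install dash plotly pandas requests "snowflake-connector-python[pandas]"
python dashboard.py
# By default the dashboard is served at http://127.0.0.1:8050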
Base: SnowSQL
You can access the databases and tables within the Base module using the snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. In this tutorial we will describe how to obtain the token and use it for accessing the Base module. This tutorial does not cover how to install and configure snowSQL.
Obtaining OAuth token and URL
Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.
After clicking the button Create OAuth access token, the pop-up authenticator is displayed.
After clicking the button Generate snowSQL command, the pop-up authenticator presents the snowSQL command.
Copy the snowSQL command and run it in the console to log in.
You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.
Example:
API Call:
Response
Template snowSQL:
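As a sketch, based on the endpoint used in the Python tutorial elsewhere in this documentation (the exact snowSQL flag usage is an assumption), the API call and the resulting login could look like this:
# Request the Base connection details, including the OAuth access token.
curl -s -X POST \
  -H "X-API-Key: <YOUR KEY>" \
  -H "accept: application/vnd.illumina.v3+json" \
  "https://ica.illumina.com/ica/rest/api/projects/<PROJECT ID>/base:connectionDetails"
# The response contains accessToken, dnsName, databaseName, roleName and warehouseName;
# the account is the part of dnsName before ".snowflakecomputing.com".
snowsql -a <account> -d <databaseName> -r <roleName> -w <warehouseName> \
  --authenticator oauth --token "<accessToken>"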
Now you can perform a variety of tasks such as:
Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.
Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.
Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.
Overall, the snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage the data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.
Example Queries:
Show all tables in the database:
Create a new table:
List records in a table:
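Minimal sketches of the three operations above; the table and column names are placeholders only (the sample_name and count columns are reused in the loading examples that follow).
# Show tables, create a placeholder table, and list its records.
snowsql -q "SHOW TABLES;"
snowsql -q "CREATE TABLE sample_counts (sample_name STRING, count INTEGER);"
snowsql -q "SELECT * FROM sample_counts LIMIT 10;"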
Load data from a file: To load data from a file, you can start by creating a staging area in the internal storage using the following command:
You can then upload the local file to the internal storage using the following command:
You can check if the file was uploaded properly using LIST command:
Finally, load data using the COPY INTO command. The command assumes that data.tsv is a tab-delimited file. You can easily modify the command to import a JSON file by setting TYPE=JSON.
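A possible sequence for the staging workflow described above, with placeholder stage, file, and table names:
# Create a stage, upload the local file, verify it, and load it into the table.
snowsql -q "CREATE OR REPLACE STAGE my_stage;"
snowsql -q "PUT file:///path/to/data.tsv @my_stage;"
snowsql -q "LIST @my_stage;"
# Load the staged tab-delimited file; switch TYPE to JSON for JSON input.
snowsql -q "COPY INTO sample_counts FROM @my_stage FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '\t');"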
Load data from a string: If you have data as a JSON string, you can import it into the tables using the following commands.
Load data into specific columns: If you want to load only sample_name into the table, you can remove "count" from the column list and the value list as below:
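A sketch of both loading steps, again with placeholder table, column, and value names; the JSON string is illustrative only.
# Load values parsed from a JSON string into both columns.
snowsql -q "INSERT INTO sample_counts (sample_name, count) SELECT j:sample_name::STRING, j:count::INTEGER FROM (SELECT PARSE_JSON('{\"sample_name\": \"sampleX\", \"count\": 42}') AS j);"
# Load only sample_name by removing count from the column list and the selected values.
snowsql -q "INSERT INTO sample_counts (sample_name) SELECT j:sample_name::STRING FROM (SELECT PARSE_JSON('{\"sample_name\": \"sampleX\"}') AS j);"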
List the views of the database to which you are connected. As shared database and catalogue views are created within the project database, they will be listed. However, this does not show views which are granted via another database or role, or from bundles.
Show grants, both grants directly on the tables and views and grants to roles which in turn have grants on tables and views.
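For example (object names are placeholders):
# List views in the current database and inspect grants.
snowsql -q "SHOW VIEWS;"
snowsql -q "SHOW GRANTS ON VIEW <view_name>;"
snowsql -q "SHOW GRANTS TO ROLE <role_name>;"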
Nextflow: Scatter-gather Method
Nextflow supports scatter-gather patterns natively through Channels. The initial example uses this pattern by splitting the FASTA file into chunks to channel records in the task splitSequences, then by processing these chunks in the task reverse.
In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.
Creating the pipeline
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel so that every item of type Collection or Array is flattened and each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
Finally, copy and paste the following XML configuration into the XML Configuration tab.
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Running the pipeline
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by a red "*" sign and click the Start button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
In Projects > your_project > Flow > Analyses > your_analysis > Steps, you can see that the input file is split into multiple chunks, then these chunks are sorted and merged.
Base
Introduction to Base
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. You can analyze, aggregate and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing and patient care. Clinically relevant data needs to be generated and extracted from routine clinical testing, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.
Use Cases
Base can be used for different use cases:
Clinical and Academic Researchers:
Big data storage solution housing all aggregated sample test outcomes
Analyze information by way of a convenient query formalism
Base Action Possibilities
Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.
Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation. All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.
Access
The Base module can be found at Projects > your_project > Base.
In order to use Base, you need to meet the following requirements:
Subscription
On the domain level, Base needs to be included in the subscription (full and premium subscriptions give access to Base).
Enabling Base
Once a project is created, the project owner must navigate to Projects > your_project > Base and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.
Enabling User Access
Access to the projects and Base is configured on the Projects > your_project > Project settings > Team page. Here you can add or edit a user or workgroup and give them .
Activity
The status and history of Base activities and jobs are shown on the Activity page.
Tips and Tricks
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the to a set of process names (or labels) so that they use the higher performance local scratch storage attached to an instance instead of the shared network disk,
When trying to test on the cloud, it's oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either using the
Schedule
On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.
When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.
Configure a schedule
Public Data Sets
ICA Cohorts comes front-loaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.
Data set | Samples | Diseases/Phenotypes | Reference
CWL Graphical Pipeline
This tutorial aims to guide you through the process of creating CWL tools and pipelines from the very beginning. By following the steps and techniques presented here, you will gain the necessary knowledge and skills to develop your own pipelines or transition existing ones to ICA.
Build and push to ICA your own Docker image
The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we present how to create your own Docker image for the popular tool (FASTQC).
Copy the contents displayed below to a text editor and save it as a Dockerfile. Make sure you use an editor which does not add formatting to the file.
Nextflow: Pipeline Lift: RNASeq
How to lift a simple Nextflow pipeline?
In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.
This approach is applicable in situations where your main.nf file contains all your pipeline logic and illustrates what the liftover process would look like.
Bench Clusters
Managing a Bench cluster
Introduction
Workspaces can have their own dedicated cluster which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.
spark.task.cpus 1
spark.executor.cores 4
spark.executor.memory 4g
Give your project a name and click Save.
Navigate to the ICA Cohorts module by clicking COHORTS in the left navigation panel then choose Cohorts.
From the Public Data Sets list select:
DRAGEN-1kGP
All Rare genetic disease cohorts
Notice that a cohort can also be created based on Technology, Disease Type and Tissue.
Under Selected Conditions in right panel, click on Apply
A new page opens with your cohort in a top-level tab.
Expand Query Details to see the study makeup of your cohort.
A set of 4 Charts will be open by default. If they are not, click Show Charts.
Use the gear icon in the top-right of the Charts pane to change chart settings.
The bottom section is demarcated by 8 tabs (Subjects, Marker Frequency, Genes, GWAS, PheWAS, Correlation, Molecular Breakdown, CNV).
The Subjects tab displays a list of exportable Subject IDs and attributes.
Clicking on a Subject ID link pops up a Subject details page.
Type SCN2A into the Gene search field and select it from autocomplete dropdown options.
The Gene Summary tab now lists information and links to public resources about SCN2A.
Click on the Variants tab to see an interactive Legend and analysis tracks.
The Needle Plot displays gnomAD Allele Frequency for variants in your cohort.
Note that some are in SCN2A conserved protein domains.
In Legend, switch the Plot by option to Sample Count in your cohort.
In Legend, uncheck all Variant Types except Stop gained. Now you should see 7 variants.
Hover over pin heads to see pop-up information about particular variants.
The Primate AI track displays scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" because cross-species sequence is highly conserved; you often see high conservation at the functional domains. Points below the 25th percentile may be considered "likely benign".
The Pathogenic variants track displays markers from ClinVar color-coded by variant type. Hover over to see pop-ups with more information.
The Exons track shows mRNA exon boundaries with click and zoom functionality at the ends.
Below the Needle Plot and analysis tracks is a list of "Variants observed in the selected cohort"
Export Gene Variants table icon is above the legend on right side.
Now let's click on the Gene Expression tab to see a bar chart of 50 normal tissues from GTEx in transcripts per million (TPM). SCN2A is highly expressed in certain Brain tissues, indicating specificity to the tissues where good markers for intellectual disability and autism could be expected.
As a final exercise in discovering good markers, click on the tab for Genetic Burden Test. The table here associates Phenotypes with Mutations Observed in each Study selected for our cohort, alongside Mutations Expected, to derive p-values. Given all considerations above, SCN2A is a good marker for intellectual disability (p < 1.465 x 10^-22) and autism (p < 5.290 x 10^-9).
Continue to check the other genes of interest in step 1.
Metadata Model | Metadata model assigned to the project
Project Location | Project region where data is stored and pipelines are executed. Options are derived from the Entitlement(s) assigned to user account, based on the purchased subscription
Storage Bundle | Storage bundle assigned to the project. Derived from the selected Project Location based on the Entitlement in the purchased subscription
Billing Mode | Billing mode assigned to the project
Data sharing | Enables data and samples in the project to be linked to other projects
Using no key prefix (not recommended) results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects.
As additional filter options, you can view only those variants that occur in every cohort; that are unique to one cohort; that have been observed in at least two cohorts; or any variant.
In each subcategory there is a sum of the subject counts across the selected cohorts.
For each cohort, designated by a color, there is a Subject count and Median survival (years) column.
Type Malignancy in the Search Box and an auto-complete dropdown suggests three different attributes.
Select Synchronous malignancy and the results are automatically opened and highlighted in orange.
For gene expression (up- versus down-regulated) and for copy number variation (gain versus loss), Cohorts will display a list of all genes with bidirectional barcharts
For somatic mutations, the barcharts are unidirectional and indicate the percentage of samples with a mutation in each gene per cohort.
Bars are color-coded by cohort, see the accompanying legend.
Each row shows P-value(s) resulting from pairwise comparison of all cohorts. In the case of comparing two cohorts, the numerical P-value will be displayed in the table. In the case of comparing three or more cohorts, the pairwise P-values are shown as a triangular heatmap, with details available as a tooltip.
Look for signals in combined phenotypic and genotypic data
Analyze QC patterns over large cohorts of patients
Securely share (sub)sets of data with other scientists
Generate reports and analyze trends in a straightforward and simple manner.
Bioinformaticians:
Access, consult, audit, and query all relevant data and QC information for tests run
All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis
Metadata captured via automatic pipeline version tracking can be warehoused, including information on the individual tools and/or reference files used during processing for each sample analyzed, the duration of the pipeline, the execution path of the different analytical steps, or, in case of failure, exit codes.
Product Developers and Service Providers:
Better understand the efficiency of kits and tests
Analyze usage, understand QC data trends, improve products
Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, as well as allow renderings of test result outcome trends and much more.
Detect Signals and Patterns: extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information for a single individual or a group of samples with a specific variant. Virtually any possible combination of stored sample and patient information allows for detecting signals and patterns by a simple single query on the big data set.
Profile/Cluster patients: use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, they might want to run a next agile iteration of clinical trials with only patients that respond. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects by the use of a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information to be used for further research.
Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited, in this way Base data can be shared with people in and outside of an organization in a compliant and controlled fashion.
Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.
Bench | Yes | Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.
Viewing | No | The folder is a single object; you cannot view the contents.
function onSubmit(input) {
var validationErrors = [];
return {
'settings': input.settings,
'validationErrors': validationErrors
};
}
function onRender(input) {
var validationErrors = [];
var validationWarnings = [];
if (input.currentAnalysisSettings === null) {
//null first time, to use it in the remainder of the javascript
input.currentAnalysisSettings = input.analysisSettings;
}
switch(input.context) {
case 'Initial': {
renderInitial(input, validationErrors, validationWarnings);
break;
}
case 'FieldChanged': {
renderFieldChanged(input, validationErrors, validationWarnings);
break;
}
case 'Edited': {
renderEdited(input, validationErrors, validationWarnings);
break;
}
default:
return {};
}
return {
'analysisSettings': input.currentAnalysisSettings,
'settingValues': input.settingValues,
'validationErrors': validationErrors,
'validationWarnings': validationWarnings
};
}
function renderInitial(input, validationErrors, validationWarnings) {
}
function renderEdited(input, validationErrors, validationWarnings) {
}
function renderFieldChanged(input, validationErrors, validationWarnings) {
}
function findField(input, fieldId){
var fields = input.currentAnalysisSettings['fields'];
for (var i = 0; i < fields.length; i++){
if (fields[i].id === fieldId) {
return fields[i];
}
}
return null;
}
mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# License Parameters
LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 1.5min (10 sec if fpga already programmed)
mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering ERROR: unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# When using DRAGEN 4.4.6 and later, the line above should be extended with --min-memory 0 to skip the memory check.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false --min-memory 0"
# License Parameters
LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 2min
from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px
from config import API_KEY, PROJECT_ID
import requests
import snowflake.connector
import pandas as pd
def fetch_data():
# Your data fetching and processing code here
# retrieving the Base oauth token
url = 'https://ica.illumina.com/ica/rest/api/projects/' + PROJECT_ID + '/base:connectionDetails'
# set the API headers
headers = {
'X-API-Key': API_KEY,
'accept': 'application/vnd.illumina.v3+json'
}
response = requests.post(url, headers=headers)
ctx = snowflake.connector.connect(
account=response.json()['dnsName'].split('.snowflakecomputing.com')[0],
authenticator='oauth',
token=response.json()['accessToken'],
database=response.json()['databaseName'],
role=response.json()['roleName'],
warehouse=response.json()['warehouseName']
)
cur = ctx.cursor()
sql = '''
WITH flattened_Demo_Ingesting_Metrics AS (
SELECT
flattened.value::STRING AS execution_reference_Demo_Ingesting_Metrics,
t1.SAMPLEID,
t1.VARIANTS_TOTAL_PASS,
t1.VARIANTS_SNPS_PASS,
t1.Q30_BASES,
t1.READS_WITH_MAPQ_3040_PCT
FROM
Demo_Ingesting_Metrics t1,
LATERAL FLATTEN(input => t1.ica) AS flattened
WHERE
flattened.key = 'Execution_reference'
) SELECT
f.execution_reference_Demo_Ingesting_Metrics,
f.SAMPLEID,
f.VARIANTS_TOTAL_PASS,
f.VARIANTS_SNPS_PASS,
t2."EXECUTION_REFERENCE",
t2.END_DATE,
f.Q30_BASES,
f.READS_WITH_MAPQ_3040_PCT
FROM
flattened_Demo_Ingesting_Metrics f
JOIN
BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL t2
ON
f.execution_reference_Demo_Ingesting_Metrics = t2."EXECUTION_REFERENCE";
'''
cur.execute(sql)
data = cur.fetch_pandas_all()
return data
df = fetch_data()
app = Dash(__name__)
#server = app.server
app.layout = html.Div([
html.H1("My Dash Dashboard"),
html.Div([
html.Label("Select X-axis:"),
dcc.Dropdown(
id='x-axis-dropdown',
options=[{'label': col, 'value': col} for col in df.columns],
value=df.columns[5] # default value
),
html.Label("Select Y-axis:"),
dcc.Dropdown(
id='y-axis-dropdown',
options=[{'label': col, 'value': col} for col in df.columns],
value=df.columns[2] # default value
),
]),
dcc.Graph(id='scatterplot')
])
@callback(
Output('scatterplot', 'figure'),
Input('y-axis-dropdown', 'value')
)
def update_graph(value):
return px.scatter(df, x='END_DATE', y=value, hover_name='SAMPLEID')
if __name__ == '__main__':
app.run(debug=True)
Reference where the Subject’s denominator is the median of all disease samples in your cohort.
The count of matching vs. total subjects that have PIK3CA up-regulated RNA which may indicate a distinctive sub-phenotype.
Select DOCKER from the drop-down menu and Save.
Navigate to System Settings > Docker Repository (outside of your project).
Click on Create > Image.
Click on the Docker image field. Select the TAR file from the desired region.
Provide a name and version for your Docker image
Select the appropriate region, determine the type (tool or bench image), the cluster compatibility (only available for bench images), access method and click Save.
The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.
OR open the image details, check the box matching the region you want to add, and select update.
In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).
You can only select a single Docker image at a time for download.
You need a service connector with a download rule to download the Docker image.
Check the box next to Active to ensure the connector will be active.
Name (required) — Provide a unique name for the connector.
Type (required) — Select the data type that will be linked (either File or Sample).
Source Project — Select the source project whose data will be linked to the destination project.
Filter Expression (optional) — Enter an expression to restrict which files will be linked via the connector (see Filter Expression Examples below).
Tags (optional) — Add tags to restrict what data will be linked via the connector. Any data in the source project with matching tags will be linked to the destination project.
Starts with 'Sample-':
[?($.details.name =~ /Sample-.*/)]
Contains '_R1_':
[?($.details.name =~ /.*_R1_.*/)]
move
x
x
x
x
x
copy
x
x
x
x
manual link
x
x
x
project connector
x
x
x
or by creating your own scripts integrating with the REST API.
For scenarios in which instances are terminated prematurely without warning (for example, while using spot instances), you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' retries each job up to four times (five attempts in total), with increasing delays between tries.
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow pipeline to handle resource shortages (for example, exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use a dynamic retry with backoff, which has an increasing backoff delay and allows the system time to provide the necessary resources.
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
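For example, a minimal sketch of the corresponding nextflow.config entry (adjust the container URI to the image your pipeline actually needs):
process {
    container = 'public.ecr.aws/lts/ubuntu:22.04'
}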
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time starts counting as soon as the input data download begins, which takes place during the ICA 'Requested' step of the analysis, before the analysis goes to 'In Progress'. When tasks run in parallel, only the longest-running task is counted. As an example, assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing, and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, taking 25, 28, and 30 minutes, respectively. Upon completion, the outputs are uploaded for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (the longest of the three parallel tasks) + 1 = 86 minutes:
For each stage of this example analysis, the duration counted toward the 96 hour limit and the corresponding status in ICA are:
request: 1m (not counted), status requested
queued: 7m (not counted), status queued
initializing: 2m (not counted), status initializing
input download: 20m, status preparing inputs
single task: 25m, status in progress
queue: 10m, status in progress
parallel tasks: 30m, status in progress
generating outputs: 1m, status generating outputs
completed: -, status succeeded
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
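A minimal sketch of such a section, assuming the default trace settings and file location are acceptable:
trace {
    enabled = true // generate the Nextflow trace report for each run
}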
There are different types of schedules that can be set up:
Files
Metadata
Administrative data.
Files
This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:
Name (required): The name of the scheduled job
Description: Extra information about the schedule
File name pattern (required): Enter part of or the full file name (or tag) of the files you want to load. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, … you can fill in _reads.txt in this field to have all files that contain _reads.txt imported into the table.
Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.
Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.
Write preference (required): Defines how data is handled, that is, whether existing data can be overwritten.
Data format (required): Select the data format of the files (CSV, TSV, JSON)
Delimiter (required): Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.
Active: The job will run automatically if checked
Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options
Encoding (required): Select the encoding of the file.
Null Marker: Specifies a string that represents a null value in a CSV/TSV file.
Metadata
This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. For example, the following query counts how many times each pipeline was executed and sorts the result accordingly:
To obtain a similar table for the failed runs, you can execute the following SQL query:
When adding or editing this schedule you can define the following parameters:
Name (required): the name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: the job will run automatically if ticked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Administrative data
This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.
When adding or editing this schedule the following parameters can be defined:
Name (required): The name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: The job will run automatically if checked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Delete schedule
Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.
Run schedule
When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data into the correct tables. This way, the schedule can be run manually. The result of the job can be seen in the tables. The load status is empty while the data is being processed and is set to failed or succeeded once loading completes.
Open a terminal window, place this file in a dedicated folder, and navigate to this folder location. Then use the following command (a combined sketch of the commands for these steps is shown after the list):
Check the image has been successfully built:
Check that the container is functional:
Once inside the container check that the fastqc command is responsive and prints the expected help message. Remember to exit the container.
Save a tar of the previously built image locally:
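A combined sketch of the commands for the build, check, test, and save steps above, assuming the Dockerfile is named fastqc-0.11.9.Dockerfile and the image is tagged fastqc-0.11.9:0 (matching the comments in the Dockerfile shown later on this page):
# Build the image from the Dockerfile in the current folder
docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:0 .
# Check the image has been successfully built
docker images fastqc-0.11.9
# Check that the container is functional (run fastqc --help inside, then exit)
docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:0
# Save a tar of the previously built image locally
docker save fastqc-0.11.9:0 -o fastqc-0.11.9.tar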
Upload your docker image .tar to an ICA project (browser upload, Connector, or CLI).
In Projects > your_project > Data, select the uploaded .tar file, then click Manage > Change Format, select DOCKER, and Save.
Now go outside of the project to System Settings > Docker Repository and select Create > Image. Select your Docker TAR file, fill out a name and version, set the type to tool, and press Select.
Create a CWL tool
While outside of any Project go to System Settings > Tool Repository and Select +Create. Fill the mandatory fields (Name and Version) and look for a Docker image to link to the tool.
You can create a tool by either pasting the tool definition in the code syntax field on the right or you can use the different tabs to manually define inputs, outputs, arguments, settings, etc …
In this tutorial we will use the CWL tool syntax method. Paste the following content in the General tab.
Other tabs, except for the Details tab, can also be used.
Since the user needs to specify the output folder for FASTQC application (-o prefix), we are using the $(runtime.outdir) runtime parameter to point to the designated output folder.
Fill the mandatory fields and click on the Definition tab to open the Graphical Editor.
Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).
Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).
Press Save; you have just created your first FastQC pipeline on ICA!
FastQC
Run a pipeline
First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.
Navigate to Pipelines and select the pipeline you just created, then press Start analysis
Fill the mandatory fields and click on the + button to open the File Selection dialog box. Select one of the Fastq files available to you.
Press Start analysis on the top right, the platform is now orchestrating the pipeline execution.
View Results
Navigate to Projects > your_project > Flow > Analyses and observe that the pipeline execution is now listed and will first be in Status Requested. After a few minutes the Status should change to In Progress and then to Succeeded.
Once this Analysis succeeds click it to enter the Analysis details view. You will see the FastQC HTML output file listed on the Output files tab. Click on the file to open Data Details view. Since it is an HTML file Format there is a View tab that allows visualizing the HTML within the browser.
Creating the pipeline
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size default to preassigned values.
How to modify the main.nf file
Copy and paste the RNASeq Nextflow pipeline into the Nextflow files > main.nf tab. The following comparison highlights the differences between the original file and the version for deployment in ICA. The main difference is the explicit specification of containers and pods within processes. Additionally, some channel specifications are modified, and a debugging message is added. When copying and pasting, be sure to remove the text highlighted in red (marked with -) and add the text highlighted in green (marked with +).
The XML configuration
In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads folder. Navigate to the XML Configuration tab and paste the following:
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Running the pipeline
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by a red "*" sign and click the Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
A manager node, which orchestrates the workload across the members.
Between 0 and a maximum of 50 member nodes.
Clusters can run in two modes.
Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.
Dynamic - A dynamic cluster has a manager node and a dynamic number of workers up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources, as only as many worker nodes as needed to perform the work are used.
Configuration
You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.
The following settings can be defined for a bench cluster:
Field
Description
Web access
Enable or disable web access to the cluster manager.
Dedicated Cluster Manager
Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling.
For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.
Type
Choose between a static or dynamic cluster.
Scaling interval
For static, set the number of cluster member nodes (maximum 50); for dynamic, choose the minimum and maximum (up to 50) number of cluster member nodes.
Resource model
The type of compute node on which the cluster member(s) will run. For every cluster member, one of these machines is used as a resource, so be aware of the possible cost impact when running many machines with a high individual cost.
Economy mode
Economy mode uses AWS spot instances. This halves many compute iCredit rates versus standard mode, but workloads may be interrupted. See the resource model documentation for a list of which resource models support economy pricing.
Operations
Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.
Managing Data in a Bench cluster
Data in a bench workspace can be divided into three groups:
Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina Connected Analytics UI. This is persistent storage and data remains when a workspace is shut down.
Project data can be accessed from all workspace components at /data/project. Every component will have its own dedicated mount to the project. Depending on the project data permissions you will be able to access it in either Read-Only or Read-Write mode.
Scratch data is available on the cluster members at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster.
Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node has its own dedicated mount.
For fast data access, bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.
All mounts occur in path /data/mounts/
Show mounts
Creating a mount
For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.
Using workspace-ctl data create-mount without the --mode option has the same effect, because --mode read-only is applied by default.
Removing a mount
Query
Queries can be used for data mining. On the Projects > your_project > Base > Query page:
New queries can be created and executed
Already executed queries can be found in the query history
Saved queries and query templates are listed under the saved queries tab.
New Query
Available tables
All available tables are listed on the Run tab.
Metadata tables are created by syncing with the Base module. This synchronization is configured on the Details page within the project.
Input
Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.
Best practices and notes
Do not use queries such as ALTER TABLE to modify your table structure as it will go out of sync with the table definition and will result in processing errors.
When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name.
Case sensitive column names (such as the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.
The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'
Querying data within columns.
Some tables contain columns with an array of values instead of a single value.
Querying data within an array
As of ICA version 2.27, there is a change in the use of capitals for ICA array fields. In previous versions, the data name within the array would start with a capital letter. As of 2.27, lowercase is used. For example ICA:Data_reference has become ICA:data_reference.
You can use the GET_IGNORE_CASE option to adapt existing queries when you have both data in the old syntax and new data in the lowercase syntax. The syntax is GET_IGNORE_CASE(Table_Name.Column_Name,'Array_field')
For example:
Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:
NameField
CodeField
ArrayField
Examples
Query results
If the query is valid for execution, the result will be shown as a table underneath the input box. Only the first 200 characters of a string, record, or variant field are included in the query results grid. The complete value is available by clicking the "show details" link.
From within the result page of the query, it is possible to save the result in several ways:
Export to > New table saves the query result as a new table with contents.
Export to > New view saves the query results as a new view.
Export to > Project file: As a new table, as a view, or as a file to the project in CSV (Tab, Pipe or a custom delimiter is also allowed) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located in the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.
Run a new query
Navigate to Projects > your_project > Base > Query.
Enter the query to execute using SQL.
Select Run.
If the query takes more than 30 seconds without returning a result, a message will be displayed to inform you the query is still in progress and the status can be consulted on Projects > your_project > Activity > Base Jobs. Once this Query is successfully completed, the results can be found in Projects > your_project > Base > Query > Query History tab.
Query history
The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.
Navigate to Projects > your_project > Base > Query.
Select the History tab.
Select a query.
Saved Queries
All queries saved within the project are listed under the Saved tab together with the query templates.
The saved queries can be:
Use — Open the query for editing and running in the Run tab. You can then select Run to execute the query again.
Saved as template — The saved query becomes a query template.
Deleted — The query is removed from the list and cannot be opened again.
The query templates can be:
Opened: This will open the query again in the “New query” tab.
Deleted: The query is removed from the list and cannot be opened again.
It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:
Account: The query template will be available for everyone within the account
User: The query template will be available for the user who created it
Run a saved Query
If you have saved a query, you can run the query again by selecting it from the list of saved queries.
Navigate to Projects > your_project > Base > Query.
Select the Saved Queries tab.
Select a query.
Shared database for project
Shared databases are displayed under the list of Tables as Shared Database for project <project name>.
For ICA Cohorts customers, shared databases are available in a project Base instance. For more information on specific Cohorts shared database tables that are viewable, see the ICA Cohorts documentation.
Import New Samples
ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.
To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.
The Data Set menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.
Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.
Choose a data type among
Germline variants
Somatic mutations
RNAseq
GWAS
Search Spinner behavior in the import jobs table
Search a term and press Enter.
The search spinner will appear while the results are loading.
All VCF types, specifically from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingest small variants.
As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.
The maximum number of files that can be part of a single manual ingestion batch is 1000.
Alternatively, users can choose a single folder and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select molecular data files matching the samples listed in the metadata sheet, which is the next step in the import process.
Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.
The sample identifiers used in the VCF columns need to match the sample identifiers used in subject/sample metadata files; accordingly, if you are starting from JSON files containing variant- and gene-level annotations provided by ILMN Nirvana, the samples listed in the header need to match the metadata files.
Variant file formats
ICA Cohorts supports VCF files formatted according to VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build:
##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)
##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38
##DRAGENCommandLine= ... --ht-reference
ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.
As an alternative to VCFs, ICA Cohorts accepts the JSON output of Illumina Nirvana for hg38/GRCh38-aligned data for small germline variants and somatic mutations, copy number variations, and other structural variants.
RNAseq file format
ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for genes, and .quant.sf for transcript-level TPM (transcripts per million).
Please also see the online documentation for the DRAGEN RNA pipeline for more information on output file formats.
GWAS file format
ICA Cohorts currently supports upload of SNV-level GWAS results produced by Regenie and saved as CSV files.
Metadata and File Types
Note: If annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole-genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it completes successfully or fails.
As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP Common Data Model (CDM). Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:
PERSON (mandatory),
CONCEPT (mandatory if any of the following is provided),
CONDITION_OCCURRENCE (optional),
DRUG_EXPOSURE (optional), and
PROCEDURE_OCCURRENCE (optional).
Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.
Note that Cohorts requires that all such files do not deviate from the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.
Bench has the ability to handle containers inside a running workspace.
This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.
Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.
Setup
The Container Service is accessible from your Bench workspace environment by default.
The container service uses the workspace disk to store any container images you pulled in or created.
To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.
Container Management
Use either docker or podman cli to interact with the Container Service. Both are interchangeable and support all the standardized operations commonly known.
Pulling a Container Image
To run a container, the first step is to either build a container from a source container or pull in a container from a registry.
Public Registry
A public image registry does not require any form of authentication to pull the container layers.
The following command line example shows how to pull in a commonly known image.
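For instance, a minimal sketch using the alpine image, in line with the other examples on this page:
# Pull a commonly known image from a public registry
/data $ docker pull alpine:latest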
If no registry hostname is defined in the container image URI, the Container Service pulls images from Dockerhub by default.
Private Registry
To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.
The following command line example shows how to instruct the Container Service to log in to the private registry.hub.docker.com registry.
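A minimal sketch (you will be prompted for the credentials, which are managed outside of ICA):
# Log in to a private registry
/data $ docker login registry.hub.docker.com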
Depending on your authorisations in the private registry you will be able to pull and push images. These authorisations are managed outside of the scope of ICA.
Pushing a Container Image
Depending on the Registry setup you can publish Container Images with or without authentication.
If authentication is required, follow the login procedure described in the Private Registry section above.
The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub.
Saving a Container Image as an Archive
The following example shows how to save a locally available Container Image as a compressed tar archive.
This lets you upload the archive into the Private ICA Docker Registry.
Listing Locally Available Container Images
The following example shows how to list all locally available Container Images
Deleting a Container Image
Container Images require storage capacity on the Bench Workspace disk. The capacity is shown when listing the locally available container images. The container Images are persisted on disk and remain available whenever a workspace stops and restarts.
The following example shows how to clean up a locally available Container Image
When a Container Image has multiple tags, all the tags need to be removed individually to free up disk capacity.
Running a Container
A Container Image can be instantiated in a Container running inside a Bench Workspace.
By default, the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.
When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.
Run a Container as a normal user
The following command line example shows how to run an instance of a locally available Container Image as a normal user.
Run a Container as root user
Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.
By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.
The following command line example shows how to run an instance of a locally available Container as root user and install system level packages
When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file.
Building a Container
To build a Container Image, you need to describe the instructions in a Dockerfile.
This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is:
The following command line example will build the actual Container Image
When defining the build context location, keep in mind that using the HOME folder (/data) will index all files available in /data, which can be a lot and will slow down the process of building. Hence the reason to use a minimal build context whenever possible.
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
If using a cluster, choose standard-small or standard-medium for the workspace master node
Import nf-core Pipeline to Bench
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subfolder.
A larger example that still runs quickly is nf-core/sarek
Result
Run Validation Test in Bench
All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.
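Based on the suggested actions printed by pipeline-dev import-from-nextflow (shown later on this page), this is expected to be:
pipeline-dev run-in-bench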
The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copy-pasted and adjusted if you need additional options.
Result
Monitoring
When a pipeline is running locally (i.e., not on a Bench cluster), you can monitor the task execution from another terminal with docker ps.
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks being pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
Data Locations
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
Deploy as Flow Pipeline
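Based on the suggested actions printed by the import step, deployment is expected to be started with:
pipeline-dev deploy-as-flow-pipeline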
After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here). It then asks if you want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
Run Validation Test in Flow
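Based on the suggested actions printed by the import step, this is expected to be:
pipeline-dev launch-validation-in-flow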
This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.
Hints
Using older versions of Nextflow
Some older nf-core flows still use DSL1, which only works up to Nextflow 22.
An easy solution is to create a conda environment for nextflow 22:
SSE-KMS Encryption
This section describes how to connect an AWS S3 Bucket with SSE-KMS Encryption enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on this page.
Create an S3 bucket with SSE-KMS
Follow the AWS instructions for how to create an S3 bucket with an SSE-KMS key.
The S3 bucket with SSE-KMS must be in the same region as your ICA v2.0 project. See the list of supported regions for more information.
In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS). Then select Choose your AWS KMS key.
If you do not have an existing customer managed key, click Create a KMS key and follow the instructions from AWS.
Once the bucket is set up, create a folder with encryption enabled in the bucket; this folder will be linked in the ICA storage configuration. Although it is technically possible to use the root folder, this is not recommended as it will cause the S3 bucket to no longer be available for other projects.
Connect the S3-SSE-KMS to ICA
Follow the instructions for connecting an S3 bucket to ICA.
In the policy configuration step:
Add permission to use the KMS key by adding kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey (see the sketch after this list).
Add the ARN KMS key arn:aws:kms:xxx on the first "Resource"
At the end of the policy setting, there should be 3 permissions listed in the "Summary".
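As an illustration only, the added permissions could look like the following sketch of a policy statement (the key ARN is the same placeholder as in the steps above):
{
    "Effect": "Allow",
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "arn:aws:kms:xxx"
}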
Create the S3-SSE-KMS configuration in ICA
Follow the instructions for how to create a storage configuration in ICA.
In step 3 of the process above, continue with the [Optional] Server Side Encryption section to enter the algorithm and key name for server-side encryption processes.
On "Algorithm", input aws:kms
On "Key Name", input the ARN KMS key: arn:aws:kms:xxx
Although "Key prefix" is optional, it is highly recommended to use this and not use the root folder of your S3 bucket. "Key prefix" refers to the folder name in the bucket which you created.
Cross-Account Copy Setup for S3 buckets with SSE-KMS encryption
KMS Policy
In addition to following the instructions to , the KMS policy must include the following statement for AWS S3 Bucket with SSE-KMS Encryption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN value):
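As an illustration only (the Sid and exact action list are assumptions; use the statement from the official instructions), such a key policy statement could look like:
{
    "Sid": "AllowIcaCrossAccountUseOfTheKey",
    "Effect": "Allow",
    "Principal": { "AWS": "ASSUME_ROLE_ARN" },
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "*"
}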
CWL DRAGEN Pipeline
In this tutorial, we will demonstrate how to create and launch a DRAGEN pipeline using the CWL language.
In ICA, CWL pipelines are built using tools developed in CWL. For this tutorial, we will use the "DRAGEN Demo Tool" included with DRAGEN Demo Bundle 3.9.5.
Linking bundle to Project
1.) Start by selecting a project at the Projects inventory.
2.) In the details page, select Edit.
3.) In the edit mode of the details page, click the + button in the LINKED BUNDLES section.
4.) In the Add Bundle to Project window:
Select the DRAGEN demo tool bundle from the list. Once you have selected the bundle, the Link Bundles button becomes available. Select it to continue.
Tip: You can select multiple bundles using Ctrl + Left mouse button or Shift + Left mouse button.
5.) In the project details page, the selected bundle will appear under the LINKED BUNDLES section. If you need to remove a bundle, click on the - button. Click Save to save the project with linked bundles.
Create Pipeline
1.) From the project details page, select Pipelines > CWL
2.) You will be given options to create pipelines using a graphical interface or code. For this tutorial, we will select Graphical.
3.) Once you have selected the Graphical option, you will see a page with multiple tabs. The first tab is the Information page where you enter pipeline information. You can find the details for the different fields of this tab in the help documentation. The following three fields are required for the INFORMATION page.
Code: Provide pipeline name here.
Description: Provide pipeline description here.
Storage size: Select the storage size from the drop-down menu.
4.) The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions.
5.) The Definition tab is used to define the pipeline. When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel (A) and a list of component menus (B). You can find details on each section in the component menu
6.) To build a pipeline, start by selecting Machine PROFILE from the component menu section on the right. All fields are required and are pre-filled with default values. Change them as needed.
The profile Name field will be updated based on the selected Resource. You can change it as needed.
Color assigns the selected color to the tool in the design view to easily identify the machine profile when more than one tool is used in the pipeline.
Tier lets you select the Standard or Economy tier for AWS instances. Standard uses on-demand EC2 instances and Economy uses spot EC2 instances. Refer to the AWS documentation for the difference between the two instance types, and to the ICA pricing information for the price difference between the two Tiers.
7.) Once you have selected the Machine Profile for the tool, find your tool from the Tool Repository at the bottom section of the component menu on the right. In this case, we are using the DRAGEN Demo Tool. Drag and drop the tool from the Tool Repository section to the visualization panel.
8.) The dropped tool will show the machine profile color, number of outputs and inputs, and warning to indicate missing parameters, mandatory values, and connections. Selecting the tool in the visualization panel activates the tool (DRAGEN Demo Tool) component menu. On the component menu section, you will find the details of the tool under Tool - DRAGEN Demo Tool. This section lists the inputs, outputs, additional parameters, and the machine profile required for the tool. In this case, the DRAGEN Demo Tool requires three inputs (FASTQ read 1, FASTQ read 2, and a Reference genome). The tool has two outputs (a VCF file and an output folder). The tool also has a mandatory parameter (Output File Prefix). Enter the value for the input parameter (Output File Prefix) in the text box.
9.) The top right corner of the visualization panel has icons to zoom in and out in the visualization panel followed by three icons: ref, in, and out. Based on the type of input/output needed, drag and drop the icons into the visualization area. In this case, we need three inputs (read 1, read 2, and Reference hash table.) and two outputs (VCF file and output folder). Start by dragging and dropping the first input (a). Connect the input to the tool by clicking on the blue dot at the bottom of the input icon and dragging it to the blue dot representing the first input on the tool (b). Select the input icon to activate the input component menu. The input section for the first input lets you enter the Name, Format, and other relevant information based on tool requirements. In this case, for the first input, enter the following information:
Name: FASTQ read 1
Format: FASTQ
Comments: any optional comments
10.) Repeat the step for other inputs. Note that the Reference hash table is treated as the input for the tool rather than Reference files. So, use the input icon instead of the reference icon.
11.) Repeat the process for two outputs by dragging and connecting them to the tool. Note that when connecting output to the tool, you will need to click on the blue dot at the bottom of the tool and drag it to the output.
12.) Select the tool and enter additional parameters. In this case, the tool requires Output File Prefix. Enter demo_ in the text box.
13.) Click on the Save button to save the pipeline. Once saved, you can run it from the Pipelines page under Flow from the left menus as any other pipeline.
Metadata Models
Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.
Every tenant has a root metadata model that is accessible to all projects of that tenant. This allows an organization to collect the same piece of information, such as an ID number, for every sample in every project. Within this root model, you can configure multiple metadata submodels, even at different levels. These submodels inherit all fields and groups from their parent models.
Illumina recommends that you limit the number of fields or field groups you add to the root model. Fields can have various types containing single or multiple values, and field groups contain fields that belong together, such as all fields related to quality metrics. Any misconfigured items in the root model will carry over into all other tenant metadata models. Once a root model is published, the fields and groups that are defined within it cannot be deleted; only more fields can be added.
Data Catalogue
Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.
Available views
Project-level views
Pipeline Chaining on AWS
There are several ways to connect pipelines in ICAv2. One of them is to use the Simple Notification Service (SNS) and a Lambda function deployed on AWS. Once the initial pipeline is completed, SNS triggers the Lambda function. The Lambda function extracts information from the event parameter to create an API call that starts the subsequent pipeline.
SNS
Notifications are used to subscribe to events in the platform and trigger the delivery of a message to an external delivery target. You can read more in the Notifications documentation.
Important: In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
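A minimal sketch of such a Lambda handler in Python is shown below. The endpoint, request body, and credentials are placeholders; take the exact API call for starting your follow-up pipeline from the ICA API reference.
import json
import urllib.request

# Placeholders: replace with the endpoint and payload documented in the ICA API reference
ICA_START_URL = 'https://ica.illumina.com/ica/rest/api/...'
API_KEY = '<your-api-key>'

def lambda_handler(event, context):
    # SNS delivers the ICA event as a JSON string inside the SNS message body
    message = json.loads(event['Records'][0]['Sns']['Message'])
    # Build the request body for the next pipeline from the fields of the received event
    body = json.dumps({'sourceEvent': message}).encode()
    request = urllib.request.Request(
        ICA_START_URL,
        data=body,
        headers={
            'X-API-Key': API_KEY,
            'Content-Type': 'application/vnd.illumina.v3+json',
        },
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        # Return the HTTP status so the invocation result is visible in CloudWatch
        return {'statusCode': response.status}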
process {
    maxRetries = 4
    errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
}
withName: 'process1|process2|process3' { scratch = '/scratch/' }
withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to shared network disk
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE PIPELINE_STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;
FROM centos:7
WORKDIR /usr/local
# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
yum clean all && \
rm -rf /var/cache/yum
# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
unzip fastqc_v0.11.9.zip && \
chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip
# Adding FastQC to the PATH
ENV PATH $PATH:/usr/local/FastQC
# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []
## how to build the docker image
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:0 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:0
workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata
workspace-ctl data delete-mount --mount-path /data/mounts/mydata
Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
Choose a new study name by selecting the radio button: Create new study and entering a Study Name.
To add new data to an existing Study, select the radio button: Select from list of studies and select an existing Study Name from the dropdown.
To add data to existing records or add new records, select Job Type, Append.
Append does not wipe out any data ingested previously and can be used to ingest the molecular data in an incremental manner.
To replace data, select Job Type, Replace. If you are ingesting data again, use the Replace job type.
Enter an optional Study description.
Select the metadata model (default: Cohorts; alternatively, select OMOP version 5.4 if your data is formatted that way.)
Select the genome build your molecular data is aligned to (default: GRCh38/hg38)
For RNAseq, specify whether you want to run differential expression (see below) or only upload raw TPM.
Click Next.
Navigate to VCFs located in the Project Data.
Select each single-sample VCF or multi-sample VCF to ingest. For GWAS, select CSV files produced by Regenie.
As an alternative to selecting individual files, you can also opt to select a folder instead. Toggle the radio button on Step 2 from "Select files" to "Select folder".
This option is currently only available for germline variant ingestion: any combination of small variants, structural variation, and/or copy number variants.
ICA Cohorts will scan the selected folder and all sub-folders for any VCF files or JSON files and try to match them against the Sample ID column in the metadata TSV file (Step 3).
Files not matching sample IDs will be ignored; allowed file extensions for VCF files after the sample ID are: *.vcf.gz, *.hard-filtered.vcf.gz, *.cnv.vcf.gz, and *.sv.vcf.gz .
Files not matching sample IDs will be ignored; allowed file extensions for JSON files after the sample ID are: *.json, *.json.gz, *.json.bgz, and *.json.gzip.
Click Next.
Navigate to the metadata (phenotype) data tsv in the project Data.
Select the TSV file or files for ingestion.
Click Finish.
Once the results are displayed in the table, the spinner will disappear immediately
Field
Description
Project name
The ICA project for your cohort analysis (cannot be changed.)
Study name
Create or select a study. Each study represents a subset of data within the project.
Description
Short description of the data set (optional).
Job type
Append: Appends values to any existing values. If a field supports only a single value, the value is replaced.
Replace: Overwrites existing values with the values in the uploaded file.
Subject metadata files
Subject metadata file(s) in tab-delimited format.
For Append and Replace job types, the following fields are required and cannot be changed:
- Sample identifier
- Sample display name
- Subject identifier
- Subject display name
- Sex
Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.
Storage size
How much storage space (1GB - 16 TB) should be reserved per node as dedicated scratch space, available at /scratch
Resource lets you choose from various compute resources available. In this case, we are building a DRAGEN pipeline and we will need to select a resource with FPGA in it. Choose from FPGA resources (FPGA Medium/Large) based on your needs.
# Push a Container Image to a Private registry in Dockerhub
/data $ docker pull alpine:latest
/data $ docker tag alpine:latest registry.hub.docker.com/<privateContainerUri>:<tag>
/data $ docker push registry.hub.docker.com/<privateContainerUri>:<tag>
# Save a Container Image as a compressed archive
/data $ docker pull alpine:latest
/data $ docker save alpine:latest | bzip2 > /data/alpine_latest.tar.bz2
# List all local available images
/data $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/alpine latest aded1e1a5b37 3 weeks ago 8.13 MB
# Remove a locally available image
/data $ docker rmi alpine:latest
# Run a Container as a normal user
/data $ docker run -it --rm alpine:latest
~ $ id
uid=1000(ica) gid=100(users) groups=100(users)
# Run a Container as root user
/data $ docker run -it --rm --userns keep-id:uid=0,gid=0 --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=0(root)
/ # apk add rsync
...
/ # rsync
rsync version 3.4.0 protocol version 32
...
# Run a Container as a non-mapped root user
/data $ docker run -it --rm --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=100(users),0(root)
/ # touch /data/myfile
/ #
# Exited the running Container back to the shell in the running Bench Workspace
/data $ ls -al /data/myfile
-rw-r--r-- 1 100000 100000 0 Mar 13 08:27 /data/myfile
FROM alpine:latest
RUN apk add rsync
COPY myfile /root/myfile
# Build a Container image locally
/data $ mkdir /tmp/buildContext
/data $ touch /tmp/buildContext/myFile
/data $ docker build -f /tmp/Dockerfile -t myimage:1.0 /tmp/buildContext
...
/data $ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/alpine latest aded1e1a5b37 3 weeks ago 8.13 MB
localhost/myimage 1.0 06ef92e7544f About a minute ago 12.1 MB
mkdir demo
cd demo
pipeline-dev import-from-nextflow nf-core/demo
/data/demo $ pipeline-dev import-from-nextflow nf-core/demo
Creating output folder nf-core/demo
Fetching project nf-core/demo
Fetching project info
project name: nf-core/demo
repository : https://github.com/nf-core/demo
local path : /data/.nextflow/assets/nf-core/demo
main script : main.nf
description : An nf-core demo pipeline
author : Christopher Hakkaart
Pipeline “nf-core/demo” successfully imported into nf-core/demo.
Suggested actions:
cd nf-core/demo
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev launch-validation-in-flow
conda create -n nextflow22
# If, like me, you never ran "conda init", do it now:
conda init
bash -l # To load the conda's bashrc changes
conda activate nextflow22
conda install -y nextflow=22
# Check
nextflow -version
# Then use the pipeline-dev tools as in the demo
As these are case sensitive, the upper and lowercasing must be respected.
If you want to query data from a table shared from another tenant (indicated in green), select the table (Projects > your_project > Base > Tables > your_table) to see the unique name. In the example below, the query will be select * from demo_alpha_8298.public.TestFiles
For more information on queries, please also see the snowflake documentation: https://docs.snowflake.com/en/user-guide/
select ICA:Data_reference as MY_DATA_REFERENCE from TestTable becomes:
select GET_IGNORE_CASE(TESTTABLE.ICA,'Data_reference') as MY_DATA_REFERENCE from TestTable
You can also modify the data to have consistent capital usage by executing the query update YOUR_TABLE_NAME set ica = object_delete(object_insert(ica, 'data_name', ica:Data_name), 'Data_name') and repeating this process for all field names (Data_name, Data_reference, Execution_reference, Pipeline_name, Pipeline_reference, Sample_name, Sample_reference, Tenant_name and Tenant_reference).
Optionally, select Save to add the query to your saved queries list.
Perform one of the following actions:
Use—Open the query for editing and running in the Run tab. You can then select Run to execute the query again.
Save —Save the query to the saved queries list.
View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.
Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.
Illumina recommends that you limit the number of fields or field groups you add to the root model, as this model cannot be deprecated and anything you add to the root model cannot be removed. You should always consider creating submodels before adding anything to the root model.
Do not use dots (.) in the metadata model names, fieldgroup names or field names as this can cause issues with field data.
When configuring a project, you can assign a published metadata model for all samples in the project. This metadata model can be any published metadata model in your tenant such as the root model, or one of the lower level submodels. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
Metadata gives information about a sample and can be provided by the user, the pipeline and the API. There are 2 general categories of metadata models: Project Metadata models and Pipeline Metadata models. Both models contain metadata fields and groups.
The project metadata model is defined per tenant and its metadata is linked to a specific project. Values are known upfront, general information is required for each sample of the project, and it may include mandatory company-wide information.
The pipeline metadata model is linked to a pipeline, not to a project, and can be shared across tenants. Values are populated during pipeline execution, which requires an output file named 'metadata.response.json'.
Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
Each sample can have multiple metadata models. When you link a project metadata model to your project, its groups and fields are present on each sample. The root model from that tenant will also be present, as every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a single sample, its groups and fields will also be present for each analysis resulting from that pipeline execution.
Creating a Metadata Model
In the main navigation, go to System Settings > Metadata Models. Here you will see the root metadata model and any underlying sub-metadata models. To create a new submodel, select +Create at the bottom of the screen.
The new metadata model screen will show an overview of all the higher-level metadata models. Use the down arrow next to the model name to expand these for more information.
For your new metadata model, add a unique name and optional description. Once this is done, start adding the metadata fields with the +Add button. The field type will determine the parameters which you can configure.
To edit your metadata model later on, select it and choose Manage > Edit. Keep in mind that fields can be added, but not removed once the model is published.
Field Types & Properties
field types
Text
Free text
Keyword
Automatically complete value based on already used values
Numeric
Only numbers
Boolean
True or false, cannot be multiple value
Date
e.g. 23/02/2022
Date time
e.g. 23/02/2022 11:43:53, saved in UTC
The following properties can be selected for groups & fields:
Property
Required
A pipeline cannot be started with this sample until the required group/field is filled in.
Sensitive
Values of this group/field are only visible to project users of the owning tenant. When a sample is shared across tenants, these fields will not be visible.
Multi value
This group/field can consist of multiple (grouped) values
Filled by pipeline
Fields that need to be filled by pipeline should be part of the same group. This group will automatically be multiple value and values will be available after pipeline execution. This property is only available for the Field Group type.
If you have fields that are filled by the pipeline you can create an example JSON structure indicating what the json in an analysis output file with name metadata.response.json should look like to fill in the metadata fields of this model. Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON. Only fields in groups marked as Filled by pipeline are included.
Fields cannot be both required and filled by pipeline at the same time.
To help retrieve the field values via API calls, you can use System Settings > Metadata Models > your_metadata_model > Manage > Show Field Paths.
Metadata Actions
Publish a Metadata Model
Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. Once a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first. To publish your model, select System Settings > Metadata Models > your_metadata_model > Manage > Publish.
Retire a Metadata Model
If a published metadata model is no longer needed, you can retire the model (except the root model). Once a model is retired, it can be published again in case you would need to reactivate it.
First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, select System Settings > Metadata Models > your_metadata_model > Manage > Retire Metadata Model.
Assign a Metadata Model to a Project
To add metadata to your samples, you first need to assign a metadata model to your project.
Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
Add Metadata to Samples Manually
If you have a metadata model assigned to your project, you can manually fill out the defined metadata of the samples in your project:
Go to Projects > your_project > Samples > your_sample.
Click your sample to open the sample details and choose Edit Sample.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline will not be able to start.
Select Save
Populating a Pipeline Metadata Model
To fill metadata by pipeline executions, a pipeline model must be created.
In the main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Click on your pipeline to open the pipeline details and choose Edit.
Create/Edit your model under Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.
Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON to see an example JSON for these fields.
The field names cannot have . in them, e.g. for the metric name Q30 bases (excl. dup & clipped bases) the . after excl must be removed.
Pushing Metadata Metrics to Base
Populating metadata models of samples allows having a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.
In ICA, select Projects > your_project >Base > Schedule.
Select +Create > From metadata.
Type a name for your schedule, optionally add a description, and set it to active. You can select whether sensitive metadata fields should be included; values of sensitive metadata fields will not be visible to users outside of the project.
Select Save.
Navigate to Base > Tables in your project.
Two new table schemas should be added with your current metadata models.
CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)
CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)
CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)
CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)
Preconditions for view content
DRAGEN metrics will only have content when DRAGEN pipelines have been executed.
Analysis views will only have content when analyses have been executed.
Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.
When you use your own storage configuration in a project, metrics cannot be collected, and thus the DRAGEN metrics-related views cannot be used.
Who can add or remove Catalogue data (views) to a project?
Members of a project who have both Base contributor and project contributor or administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can only remove Catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove Catalogue views, but cannot add them again.
Adding Catalogue data (views) to your project
To add Catalogue data,
Go to Projects > your_project > Base > Tables.
Select Add table > Import from Catalogue.
A list of available views will be displayed. (Note that views which are already part of your project are not listed)
Select the table you want to add and choose +Select
Catalogue data will have View as type, the same as tables which are linked from other projects.
Removing Catalogue data (views) from your project
To delete Catalogue data,
go to Projects > your_project > Base > Tables.
Select the table you want to delete and choose Delete.
A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.
Description: An explanation of which data is contained in the view.
Category: The identification of the source system which provided the data.
Tenant/project. Appended to the view name as _tenant or _project. Determines if the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.
Catalogue table details (Table Schema Definition)
In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.
Querying views
In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.
Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It's important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the record USER_ID in the SQL query below is "recycled".
The following query extracts
USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.
PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.
Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.
Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow': it allows the user to find steps which failed.
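A sketch of such a query is shown below. It assumes the project view is named ICA_PIPELINE_ANALYSES_VIEW_project and that the JSON field names (stepId, status, bpeResourceLifeCycle, bpeResourcePresetSize) match your data; verify them against your own view before use.

SELECT
  USER_NAME AS user_name,
  PIPELINE_ANALYSIS_DATA:reference AS reference,
  PIPELINE_ANALYSIS_DATA:price AS price,
  step.value:stepId::string AS step_id,
  step.value:status::string AS step_status,
  step.value:bpeResourceLifeCycle::string AS resource_lifecycle,
  step.value:bpeResourcePresetSize::string AS resource_preset_size
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
  LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) AS step
WHERE step.value:status::string = 'FAILED'
  AND NOT CONTAINS(step.value:stepId::string, 'Workflow');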
Now let's have a look at DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc for DRAGEN WGS Germline pipeline. Each of these files is represented by a row in DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query we will use FLATTEN command. The following query extracts
Sample name from the file names.
Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.
The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:
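A minimal sketch of such a query is shown below. It assumes that each element of the ANALYSIS_DATA array exposes the metric name and value under the keys name and value; these keys are an assumption, so check the actual structure of your view first.

SELECT
  REPLACE(FILE_NAME, '.wgs_coverage_metrics.csv', '') AS sample_name,
  metric.value:name::string  AS metric_name,
  metric.value:value::string AS metric_value
FROM DRAGEN_METRICS_VIEW_project,
  LATERAL FLATTEN(input => ANALYSIS_DATA) AS metric
WHERE FILE_NAME LIKE '%.wgs_coverage_metrics.csv'
  AND metric.value:name::string IN ('Aligned bases', 'Aligned bases in genome')
ORDER BY sample_name;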
Lastly, you can combine these views (or rather intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:
Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.
Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.
Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.
An example how to obtain the costs incurred by the individual steps of an analysis
You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet, you can retrieve the costs of individual steps for every analysis run in the past week.
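A sketch of such a query is given below. The per-step price field and the startDate timestamp field are assumptions about the JSON layout of PIPELINE_ANALYSIS_DATA and should be verified against your own data.

SELECT
  PIPELINE_ANALYSIS_DATA:reference::string AS analysis_reference,
  step.value:stepId::string AS step_id,
  step.value:price AS step_price
FROM ICA_PIPELINE_ANALYSES_VIEW_project,
  LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) AS step
WHERE PIPELINE_ANALYSIS_DATA:startDate::timestamp >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY analysis_reference, step_id;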
Limitations
Data Catalogue views cannot be shared as part of a Bundle.
Data size is not shown for views because views are a subset of data.
By removing Base from a project, the Data Catalogue will also be removed from that project.
Best Practices
As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
with arn being the Amazon Resource Name (ARN) of the target SNS topic. Once the SNS is created in AWS, you can create a New ICA Subscription in Projects > your_project > Project Settings > Notifications > New ICA Subscription. The following screenshot displays the settings of a subscription for Analysis success of a pipeline with the name starting with Hello.
ICA API endpoints
The API reference site lists all available API endpoints for ICA. To use them, obtain an API key from the Illumina ICA portal.
Starting a Nextflow pipeline using the API
To start a Nextflow pipeline using the API, use the endpoint /api/projects/{projectId}/analysis:nextflow. Provide the projectID and the reference body in JSON format containing userReference, pipelineId, analysisInput etc. Two parameters activationCodeDetailId and analysisStorageId have to be retrieved using the API endpoint api/activationCodes:findBestMatchingForNextflow from Entitlement Detail section in Swagger. For example:
Output of the API call:
In this particular case, the activationCodeDetailId is "6375eb43-e865-4d7c-a9e2-2c153c998a5c" and analysisStorageId is "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0" (for resource type "Small").
Once you have all these parameters, you can start the pipeline using API.
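As a sketch, a curl call could look like the following. The base URL, content-type header, and body field names are illustrative and may differ per region and API version; take the exact body schema from the API reference (Swagger). The activationCodeDetailId and analysisStorageId values are the ones retrieved in the example above.

curl -X POST "https://ica.illumina.com/ica/rest/api/projects/<projectId>/analysis:nextflow" \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/vnd.illumina.v3+json" \
  -d '{
        "userReference": "my-analysis-run",
        "pipelineId": "<pipelineId>",
        "activationCodeDetailId": "6375eb43-e865-4d7c-a9e2-2c153c998a5c",
        "analysisStorageId": "6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0",
        "analysisInput": { "inputs": [ { "parameterCode": "in", "dataIds": [ "<fileId>" ] } ] }
      }'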
Setup of Lambda function
Next, create a new Lambda function in the AWS Management Console. Choose Author from scratch and select Python 3.7 (includes the requests library) as the runtime. In the Function code section, write the code for the Lambda function that will use different Python modules and execute API calls to the existing online application. Add the SNS topic created above as a trigger.
Example
Here is an example of a Python code to check if there is file named 'test.txt' in the output of the successful pipeline. If the file exists, a new API call will be made to invoke the second pipeline with this 'test.txt' as an input.
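A minimal sketch of such a Lambda handler is shown below. The ICA endpoint paths, header names, event fields, and payload fields are illustrative assumptions; consult the API reference for the exact calls and replace the placeholder ids with your own.

import json
import urllib.request

ICA_BASE = "https://ica.illumina.com/ica/rest"  # base URL may differ per region; illustrative
API_KEY = "<your-api-key>"  # in practice, read this from an environment variable or Secrets Manager

def ica_get(path):
    req = urllib.request.Request(ICA_BASE + path, headers={"X-API-Key": API_KEY})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def ica_post(path, body):
    req = urllib.request.Request(
        ICA_BASE + path,
        data=json.dumps(body).encode(),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/vnd.illumina.v3+json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def lambda_handler(event, context):
    # SNS delivers the ICA event as a JSON string in the Message field
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    # Field names below are assumptions about the ICA event payload; adapt them to the actual message
    project_id = message["projectId"]
    analysis_id = message["payload"]["id"]

    # List the analysis outputs and look for test.txt (endpoint path and response schema are assumptions)
    outputs = ica_get(f"/api/projects/{project_id}/analyses/{analysis_id}/outputs")
    items = outputs.get("items", [])
    test_file = next((i for i in items if i.get("name") == "test.txt"), None)
    if test_file is None:
        return {"secondPipelineStarted": False}

    # Start the second pipeline with test.txt as input (body fields are illustrative)
    body = {
        "userReference": "triggered-by-lambda",
        "pipelineId": "<second-pipeline-id>",
        "activationCodeDetailId": "<activation-code-detail-id>",
        "analysisStorageId": "<analysis-storage-id>",
        "analysisInput": {"inputs": [{"parameterCode": "in", "dataIds": [test_file.get("id", "<file-id>")]}]},
    }
    ica_post(f"/api/projects/{project_id}/analysis:nextflow", body)
    return {"secondPipelineStarted": True}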
The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:
Import to Bench
From public nf-core pipelines
From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow
Prerequisites
Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require
Conda, which is automatically installed by "pipeline-dev" if needed (using conda-miniconda.installer.ica-userspace.sh).
NextFlow Requirements / Best Practices
Pipeline development tools work best when the following items are defined:
Nextflow profiles:
test profile, specifying inputs appropriate for a validation run
docker profile, instructing NextFlow to use Docker
ICA Flow adds one additional constraint: the output directory out is the only one automatically copied to the Project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
Pipeline Development Tools
New Bench pipeline development tools only become active after a workspace reboot.
These are installed in /data/.software (which should be in your $PATH), the pipeline-dev script is the front-end to the other pipeline-dev-* tools.
Pipeline-dev fulfils a number of roles:
Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>
Usage
1) Starting a new Project
A pipeline-dev project relies on the following Folder structure, which is auto-generated when using the pipeline-dev import* tools.
If you start a project manually, you must follow the same folder structure.
Project base folder
nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
main.nf
Pipeline Sources
The above-mentioned project structure must be generated manually. The nf-core CLI tools can assist in generating the nextflow_schema.json. The tutorial goes into more detail about this use case.
A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.
The tutorial goes into more detail about this use case.
A directory called imported-flow-analysis is created.
2) Running in Bench
Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.
The script then launches nextflow. The full nextflow command line is printed and launched.
In case of errors, full logs are saved as .pipeline-dev.log
Currently, not all corner cases are covered by command line options. Please start from the nextflow command printed by the tool and extend it based on your specific needs.
Output Example
Container (Docker) images
Nextflow can run processes with or without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular during execution with the Nextflow docker profile (-profile docker).
In NextFlow, Docker images can be specified at the process level
This is done with the container "<image_name:version>" directive, which can be specified
in nextflow config files (preferred method when following the nf-core best practices)
or at the start of each process definition.
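As a sketch, either form could look like the following; the image names are examples only.

// nextflow.config (preferred, nf-core style)
process {
    container = 'quay.io/biocontainers/samtools:1.19--h50ea8bc_0'
}

// or inside a process definition
process FOO {
    container 'ubuntu:22.04'

    script:
    """
    echo hello
    """
}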
Resources such as the number of CPUs and memory can also be specified; see the Nextflow documentation for details about the Nextflow-Docker syntax.
Bench can push/pull/create/modify Docker images, as described in the Bench documentation.
3) Deploying to ICA Flow
This command does the following:
Generate the JSON file describing the ICA Flow user interface.
If ica-flow-config/inputForm.json doesn't exist: generate it from nextflow-src/nextflow_schema.json.
Output Example:
The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:
4) Launching Validation in Flow
The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as "nextflow -profile test".
Output Example:
The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
Tutorials
Create a Cohort
ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:
Project:
Include subjects that are part of any ICA Project that you own or that is shared with you.
Sample:
Sample type such as FFPE.
Tissue type.
Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.
Subject:
Subject inclusion by Identifier:
Input a list of Subject Identifiers (up to 100 entries) when defining a cohort.
Sample:
Sample type such as FFPE.
Tissue type.
Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.
Disease:
Phenotypes and diseases from standardized ontologies.
Drug:
Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.
Molecular attributes:
Samples with a somatic mutation in one or multiple, specified genes.
Samples with a germline variant of a specific type in one or multiple, specified genes.
Disease search
ICA Cohorts currently uses six standard medical ontologies to 1) annotate each subject during ingestion and then to 2) search for subjects: HPO for phenotypes, MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search will find matches from all six; and you can limit the search to only the one(s) you prefer. When searching for subjects using names or codes from one of these ontologies, ICA Cohorts will automatically match your query against all the other ontologies, therefore returning subjects that have been ingested using a corresponding entry from another ontology.
In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:
Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.
Use the mouse to select the term or navigate to the term in the dropdown using the arrow buttons.
If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.
Drug Search
In the 'Drug' tab, you can search for subjects who have a specific medication record:
Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.
Paste one or multiple drug concept codes. ICA Cohorts currently uses RxNorm as the standard ontology during ingestion. If multiple concepts are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'
'Drug Type' is a static list of qualifiers that denote the specific administration of the drug. For example, where the drug was dispensed.
Measurement Search
In the ‘Measurements’ tab, you can search for vital signs and laboratory test data leveraging LOINC concept codes.
Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or down arrows to select the term.
Upon selecting a term, the term will be available for use in a query.
Terms can be added to your query criteria.
Include/Exclude
As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.
Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.
When selected, the attribute will appear in the right-navigation panel.
Once you have selected Create Cohort, the above data is organized in tabs such as Project, Subject, Disease, and Molecular. Each tab then contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to jump directly to that tab and section, and select attributes and values that are relevant to describe your subjects and samples of interest. Assign a new name to the cohort you created, and click Apply to save the cohort.
Duplicate a Cohort Definition
After creating a Cohort, select the Duplicate icon.
A copy of the Cohort definition will be created and tagged with "_copy".
Delete a Cohort Definition
Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.
This action cannot be undone.
Sharing a Cohort within an ICA Project
After creating a Cohort, users can set a Cohort bookmark as Shared. By sharing a Cohort, the Cohort will be available to be applied across the project by other users with access to the Project. Cohorts created in a Project are only accessible at the scope of the creating user. Other users in the project cannot see the cohort created unless they use this sharing functionality.
Share Cohort Definition
Create a Cohort using the directions above.
To make the Cohort available to other users in your Project, click the Share icon.
The Share icon will be filled in black and the Shared Status will change from Private to Shared.
Unshare a Cohort Definition
To unshare the Cohort, click the Share icon.
The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.
Archive a Cohort Definition
A Shared Cohort can be Archived.
Select a Shared Cohort with a black Shared Cohort icon.
Click the Archive Cohort icon.
Sharing a Cohort as Bundle
You can link cohorts data sets to a bundle as follows:
Create or edit a bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select Link Data Set to Bundle.
If you cannot find the cohorts data sets which you want to link, verify that:
Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)
This project is set to Data Sharing (Projects > your_project > Project Settings > Details)
Stop sharing a Cohort as Bundle
You can unlink cohorts data sets from bundles as follows:
Edit the desired bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select the cohorts data set which you want to unlink.
Nextflow
ICA supports running pipelines defined using Nextflow. See this tutorial for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
System Information
Version 20.10 on Illumina Connected Analytics will be obsoleted on April 22nd, 2026. After this date, all existing pipelines using Nextflow v20.10 will no longer run.
The following overview shows which Nextflow versions are:
default (⭐) This version will be proposed when creating a new Nextflow pipeline.
supported (✅) This version can be selected when you do not want the default Nextflow version.
deprecated (⚠️) This version can not be selected for new pipelines, but pipelines using this version will still work.
The switchover happens in the January release of that year.
Nextflow Version
You can select the Nextflow version while building a pipeline as follows:
Compute Type
To specify a compute type for a Nextflow process, you can either define the cpus and memory (recommended) or use the compute type sizes (required for specific hardware such as FPGA).
Do not mix these definition methods within the same process; use either one or the other method.
CPU and Memory
Specify the task resources using Nextflow directives in the workflow script (.nf) or the configuration file (nextflow.config): cpus defines the number of CPU cores allocated to the process, and memory defines the amount of RAM that will be allocated.
Process file example
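A minimal sketch (the process name and values are illustrative):

process ALIGN_READS {
    cpus 4
    memory '16 GB'

    script:
    """
    echo "running with ${task.cpus} cpus and ${task.memory}"
    """
}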
Configuration file example
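A matching nextflow.config sketch (the selector and values are illustrative):

process {
    withName: 'ALIGN_READS' {
        cpus   = 4
        memory = '16 GB'
    }
}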
ICA will convert the required resources to the correct predefined size. This enables porting public Nextflow pipelines without configuration changes.
Predefined Sizes
To use the predefined sizes, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
For example, if you want to use a specific compute type, you need to add a line like the one below.
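The following line uses standard-large as an example value; replace it with the compute type you need.

pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-large'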
Often, there is a need to select the compute size for a process dynamically based on user input and other factors. The Kubernetes executor used on ICA does not use the cpus and memory directives, so instead, you can dynamically set the pod directive, as mentioned above. For example:
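A sketch of a dynamic pod directive; the parameter name and size choices are illustrative.

process CALL_VARIANTS {
    pod annotation: 'scheduler.illumina.com/presetSize', value: params.large_run ? 'standard-large' : 'standard-small'

    script:
    """
    echo "variant calling placeholder"
    """
}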
It can also be specified in the nextflow.config file. See the example configuration below:
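An equivalent nextflow.config sketch (the process name is illustrative):

process {
    withName: 'CALL_VARIANTS' {
        pod = [annotation: 'scheduler.illumina.com/presetSize', value: 'standard-large']
    }
}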
Standard vs Economy
Concept
For each compute type, you can choose between the following lifecycle options:
scheduler.illumina.com/lifecycle: standard - (Default)
scheduler.illumina.com/lifecycle: economy
You can switch to economy in the process itself with the pod directive or in the nextflow.config file.
Process example
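A sketch (the process body is illustrative):

process FOO {
    pod annotation: 'scheduler.illumina.com/lifecycle', value: 'economy'

    script:
    """
    echo "running on the economy tier"
    """
}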
nextflow.config example
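A sketch that applies the economy lifecycle to all processes:

process {
    pod = [annotation: 'scheduler.illumina.com/lifecycle', value: 'economy']
}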
Inputs
Inputs are specified via the input form (XML- or JSON-based). The code specified in the form corresponds to the field in the params object that is available in the workflow. Refer to the tutorials for an example.
Outputs
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy, or move data to the correct folder. Symlinking is faster and does not increase storage cost, as it creates a file pointer instead of copying or moving data. Data will be uploaded to the ICA project after the pipeline execution completes.
Nextflow version 20.10.10 (Deprecated)
Version 20.10 will be obsoleted on April 22nd, 2026. After this date, all existing pipelines using Nextflow v20.10 will no longer be able to run.
For Nextflow version 20.10.10 on ICA, using the "copy" method in the publishDir directive for uploading output files that consume large amounts of storage may cause workflow runs to complete with missing files. The underlying issue is that file uploads may silently fail (without any error messages) during the publishDir process due to insufficient disk space, resulting in incomplete output delivery.
Solutions:
Nextflow Configuration
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command line or via a configuration file (see the Nextflow documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a Nextflow configuration file to be used when launching the pipeline.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
If no Docker image is specified, Ubuntu will be used as default.
The following configuration settings will be ignored if provided as they are overridden by the system:
Best Practices
Process Time
Setting a timeout of between 2 and 4 times the expected processing time with the time directive for processes or tasks ensures that no stuck processes remain running indefinitely. Stuck processes keep incurring costs for the occupied resources, so if a process cannot complete within that timespan, it is safer and more economical to end the process and retry.
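For example, for a process expected to finish in roughly two hours, a sketch could be:

process ALIGN_READS {
    time '8h'   // 2-4x the expected ~2 h runtime

    script:
    """
    echo "alignment placeholder"
    """
}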
Sample Sheet File Ingestion
When you want to use a sample sheet with references to files as Nextflow input, add an extra input to the pipeline. This extra input lets the user select the samplesheet-mentioned files from their project. At run time, those files will get staged in the working directory, and when Nextflow parses the samplesheet and looks for those files without paths, it will find them there. You can not use file paths in a sample sheet without selecting the files in the input form because files are only passed as file/folder ids in the API payload when the analysis is launched.
You can include public data such as http URLs because Nextflow is able to download those. Nextflow is also able to download publicly accessible S3 URLs (s3://...). You cannot use Illumina's urn:ilmn:ica:region:... structure.
Creating a Pipeline from Scratch
Introduction
This tutorial shows you how to start a new pipeline from scratch.
For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.
We are going to wrap the "gzip" linux compression tool with inputs:
1 file
compression level: integer between 1 and 9
We intentionally do not include sanity checks, to keep this scenario simple.
Creation of test file:
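For example (the folder and file names are arbitrary):

mkdir -p /data/demo-gzip && cd /data/demo-gzip
seq 1 100000 > test_input.txt   # small text file that compresses well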
Wrapping in Nextflow
Here is an example of Nextflow code that wraps the gzip command and publishes the final output in the "out" folder:
nextflow-src/main.nf
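A minimal sketch of such a main.nf is shown below; the parameter names inFile and compressionLevel are assumptions that are used consistently in the other sketches of this tutorial.

nextflow.enable.dsl = 2

params.inFile           = null
params.compressionLevel = 6

process GZIP {
    publishDir 'out', mode: 'symlink'

    input:
    path inFile

    output:
    path "${inFile}.gz"

    script:
    """
    gzip -${params.compressionLevel} -c ${inFile} > ${inFile}.gz
    """
}

workflow {
    GZIP( Channel.fromPath(params.inFile) )
}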
Save this file as nextflow-src/main.nf, and check that it works:
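Assuming the parameter names from the sketch above, a local check could be:

nextflow run nextflow-src --inFile test_input.txt --compressionLevel 9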
Result
Wrap the Pipeline in Bench
We now need to:
Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools
Using Docker:
In NextFlow, Docker images can be specified at the process level
Each process may use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified
at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)
For example, create nextflow-src/nextflow.config:
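A minimal sketch; the image is only an example, and any image that provides gzip will do.

// nextflow-src/nextflow.config
process.container = 'ubuntu:22.04'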
We can now run with nextflow's -with-docker option:
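For example, with the same illustrative parameters as above:

nextflow run nextflow-src --inFile test_input.txt --compressionLevel 9 -with-docker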
Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:
Create NextFlow “test” profile
Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:
nextflow-src/nextflow.config
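A sketch of such a profile, reusing the illustrative parameter names from above (adjust the file path to your test data):

profiles {
    test {
        params {
            inFile           = 'test_input.txt'   // resolved relative to the launch directory
            compressionLevel = 9
        }
    }
}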
With this profile defined, we can now run the same test as before with this command:
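For example:

nextflow run nextflow-src -profile test -with-docker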
Create NextFlow “docker” profile
A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:
nextflow-src/nextflow.config
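A sketch of the docker profile; add it alongside the test profile inside the same profiles block.

profiles {
    docker {
        docker.enabled = true
    }
}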
We can now run the same test as before with this command:
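For example:

nextflow run nextflow-src -profile test,docker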
We also have enough structure in place to start using the pipeline-dev command:
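For example:

pipeline-dev run-in-bench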
In order to deploy our pipeline to ICA, we need to generate the user interface input form.
This is done by using nf-core's recommended nextflow_schema.json.
For our simple example, we generate a minimal one by hand (done by using one of the nf-core pipelines as example):
nextflow-src/nextflow_schema.json
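A minimal hand-written sketch, using the illustrative parameter names from this tutorial (a real nf-core schema typically nests parameters in definition groups):

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "gzip-demo pipeline parameters",
    "type": "object",
    "properties": {
        "inFile": {
            "type": "string",
            "format": "file-path",
            "description": "File to compress"
        },
        "compressionLevel": {
            "type": "integer",
            "minimum": 1,
            "maximum": 9,
            "default": 6,
            "description": "gzip compression level"
        }
    },
    "required": ["inFile"]
}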
In the next step, this gets converted to the ica-flow-config/inputForm.json file.
Note: For large pipelines, as described in the nf-core documentation:
Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core pipelines schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.
We recommend looking into "nf-core pipelines schema build -d nextflow-src/", which comes with a web builder to add descriptions etc.
Deploy as a Flow Pipeline
We just need to create a final file, which we had skipped until now: Our project description file, which can be created via the command pipeline-dev project-info --init:
pipeline-dev.project_info
We can now run:
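pipeline-dev deploy-as-flow-pipeline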
After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).
It then asks if we want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
Run Validation Test in Flow
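Run the validation launch command listed in the suggested actions earlier:

pipeline-dev launch-validation-in-flow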
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
Result
Projects
Introduction
When looking at the main ICA navigation, you will see the following structure:
Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.
Audit/Event Logs are used for audit purposes and issue resolving.
System Settings contain general information such as the location of storage space, Docker images and tool repositories.
Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.
There is a combined limit of 30,000 projects and bundles per tenant.
Create new Project
To create a new project, click the Projects > + Create button.
On the project creation screen, add information to create a project. See the reference documentation for information about each field.
Required fields include:
Name
1-255 characters
Must begin with a letter
Click the Save button to finish creating the project. The project will be visible from the Projects view.
You may see projects with an additional Information field, this is for backwards compatibility as the field has been superseded by the Short description field.
Create with Storage Configuration
Refer to the documentation for details on creating a storage configuration.
During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.
With a storage configuration set, a project will have a 2-way sync with the external cloud storage provider: any data added directly to the external storage will be synchronized into the ICA project data, and any data added to the project will be synchronized into the external cloud storage.
If there is an issue with your storage configuration, it will be indicated on the project tile.
Managing Projects
Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.
Searching is a case-insensitive wildcard filter. Any project which contains the characters will be shown. Use * as a wildcard in searches. Be aware that operators without search words are blocked and will result in "Unexpected error occurred when searching for projects". You can use brackets, AND, OR and NOT operators, provided that you do not start the search with them (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax).
Filter by Workgroup: Projects in ICA can be accessible to different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right which appears when a workgroup is selected.
Favorites : By clicking on the star next to the project name in the tile view, you set a project as favorite. You can have multiple favorites and use the Favorites checkbox to only show those favorites. This prevents having too many projects visible.
Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favorites. A single click will open the project.
List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. Click on the project name to open it.
Items which are shown in list view have an Export option at the bottom of the screen. You can choose to export the entire page or only the selected rows in CSV, JSON or Excel format.
In tile view, your project tiles will show the project name, location, tenant, size, number of analyses and your role.
In list view, the star indicates your favourites while warnings, errors and information icons are displayed next to the project name. Hover over those icons to see the details.
If you are missing Projects
If you are missing projects, especially those created by other users, the workgroup filter might still be active. Clear the filter with the x to the right. You can verify the list of projects to which you have access with the icav2 projects list CLI command.
Externally-managed projects
Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project in the same way as manually created projects in ICA. For ICA, these projects are considered externally-managed projects, and there are a number of restrictions on which actions are allowed on externally-managed projects from within ICA. For example, you cannot delete or move externally-managed data. This is to prevent inconsistencies when these applications want to access their own project data.
You can add data to externally-managed projects. Separation of data is ensured by only allowing additional files at the root level or in dedicated subfolders which you can create in your projects. Data which you have added can be moved and deleted again.
You can add bundles to externally-managed projects, provided those bundles do not come with additional restrictions for the project.
Tertiary modules are not supported for externally-managed projects.
Projects are indicated as externally-managed in the projects overview screen by a project card with a managed by app <app name> label.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column, visible in the data list view of externally-managed projects at Projects > your_project > Data.
Data Transfer
If you have an externally-managed project and want to move the data to another project, you need to:
Copy the data from the externally-managed project to the other (new) project.
From within your external application, delete the data which is stored in the externally-managed project in ICA.
Notes
When you create a folder with a name which already exists as externally-managed folder, your project will have that folder twice. Once ICA-managed and once externally-managed.
Externally-managed projects protect their notification subscriptions to ensure no user can delete them. It is possible to add your own subscriptions to externally-managed projects; see the Notifications documentation for more information.
Tutorial
For a better understanding of how all components of ICA work, try the tutorials.
Sharing
You can share links to your project and content within projects with people who have access to it. Sharing is done by copying the URL from your browser. This URL contains both the filters and the sort options which you have applied.
Nextflow Pipeline
In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.
This tutorial references the Basic pipeline example in the Nextflow documentation.
Create the pipeline
The first step in creating a pipeline is to create a project. For instructions on creating a project, see the Projects page. In this tutorial, the project is named Getting Started.
After creating your project,
Open the project at Projects > your_project.
Navigate to the Flow > Pipelines view in the left navigation pane.
From the Pipelines view, click +Create > Nextflow > XML based to start creating the Nextflow pipeline.
In the Nextflow pipeline creation view, the Description field is used to add information about the pipeline. Add values for the required Code (unique pipeline name), description and size fields.
Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the example from the Nextflow documentation. Modifications to the pipeline definition from the nextflow documentation include:
Add the container directive to each process with the desired ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default. To use the latest image, you can use container 'public.ecr.aws/lts/ubuntu:latest'
Add the publishDir directive with value 'out' to the reverse process.
The description of the pipeline from the linked Nextflow docs:
This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows, receives these files and it simply reverses their content by using the rev command line tool.
Resources: For each process, you can use the cpus and memory directives to set the required resources. ICA will then determine the best matching compute type based on those settings. Suppose you set memory '10240 MB' and cpus 6; ICA will then determine that you need the standard-large ICA compute type.
Syntax example:
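For example (values match the sizing example above; the script body is a placeholder):

process reverse {
    cpus 6
    memory '10240 MB'

    script:
    """
    echo "placeholder"
    """
}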
Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we don't need to add any additional definition files. Paste the following definition into the text editor:
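A sketch of the definition is shown below, based on the basic Nextflow example with the modifications listed above. Treat it as illustrative: the parameter name in (params.in) is an assumption and must match the code declared in the XML input form.

nextflow.enable.dsl = 2

params.in = null

process splitSequences {
    container 'public.ecr.aws/lts/ubuntu:22.04_stable'

    input:
    path 'input.fa'

    output:
    path 'seq_*'

    script:
    """
    awk '/^>/{f="seq_"++d} {print > f}' < input.fa
    """
}

process reverse {
    container 'public.ecr.aws/lts/ubuntu:22.04_stable'
    publishDir 'out', mode: 'symlink'

    input:
    path x

    output:
    path 'test.txt'

    script:
    """
    cat $x | rev > test.txt
    """
}

workflow {
    input_ch = Channel.fromPath(params.in)
    splitSequences(input_ch)
    reverse(splitSequences.out)
}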
Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.
Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.
With the definition added and the input form defined, the pipeline is complete.
On the Documentation tab, you can add additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.
Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.
Launch the pipeline
Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a publicly available FASTA file. Download the file and unzip it to decompress the FASTA file.
To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".
Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to Projects > your_project > Flow > Analyses click on Start. Next, select your pipeline from the list.
Alternatively you can start your pipeline from Projects > your_project > Flow > Pipelines > your_pipeline > Start analysis.
In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.
Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
With the required information set, click the button to Start Analysis.
Monitor Analysis
After launching the pipeline, navigate to the Analyses view in the left navigation pane.
The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.
Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Do note that this may take considerable time if it is your first analysis because of the required resource management.
Click the analysis record to enter the analysis details view.
From the analysis details view, the logs produced by each process within the pipeline are accessible via the Steps tab.
View Results
Analysis outputs are written to an output folder in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}. (1)
Inside of the analysis output folder are the files output by the analysis processes written to the out folder. In this tutorial, the file test.txt (2) is written to by the reverse process. Navigating into the analysis output folder, clicking into the test.txt file details, and opening the VIEW tab (3) shows the output file contents.
The "Download" button (4) can be used to download the data to the local machine.
Base Basics
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.
This tutorial provides an example for exercising the basic operations used with Base, including how to create a table, load the table with data, and query the table.
Prerequisites
An ICA project with access to Base
If you don't already have a project, please follow the instructions in the to create a project.
File to import
A tab-delimited gene expression file. Example format:
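The identifiers and counts below are hypothetical and only illustrate the two-column, tab-delimited layout:

ENSG00000000003	1023
ENSG00000000005	0
ENSG00000000419	567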
Create table
Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.
Go to the Projects > your_project > Base > Tables and enable Base by clicking on the Enable button.
Select Add Table > New Table.
Create your table
To create your table from scratch, select Empty Table from the Create table from dropdown.
Upload data to load into your table
Upload the sampleX.final.count.tsv file containing the final counts.
Select Data tab (1) from the left menu.
Click on the grey box (2) to choose the file to upload or drag and drop the sampleX.final.count.tsv into the grey box
Create a schedule to load data into your table
Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules will check for new files every 24 hours.
In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.
Go to Projects > your_project > Base > Schedule and click the + Add New button.
Select the option to load the contents from files into a table.
Create your schedule.
Name your schedule LoadFeatureCounts
Choose Project as the source of data for your table.
Highlight your schedule. Click the Run button to run your schedule now.
It will take a short time to prepare and load data into your table.
Check the status of your job on your Projects > your_project > Activity page.
Click the BASE JOBS tab to view the status of scheduled Base jobs.
Check the data in the table.
Go back to your Projects > your_project > Base > Tables page.
Double-click your table to view its details.
Query a table
To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.
In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.
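A sketch of such a query; the table and column names are placeholders, so replace them with the names used in your project.

SELECT COUNT(*)
FROM YOUR_TABLE_NAME            -- replace with the table created above
WHERE TRANSCRIPT_COUNT > 100;   -- replace with your count column name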
Go to your Projects > your_project > Base > Query page.
Paste the above query into the NEW QUERY text box
Click the Run Query button to run your query
View your query results.
Export table data
Find the table you want to export on the "Tables" page under BASE. Go to the table details page by double-clicking the table you want to export.
Click on the Export As File icon and complete the required fields
Name: Name of the exported file
Data Format: A table can be exported in CSV and JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.
CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.
Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide "Maximum file size (in bytes)" for exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.
Nextflow DRAGEN Pipeline
In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in ICA GUI. More information about Nextflow on ICA can be found here. For this example, we will implement the alignment and variant calling example from this DRAGEN support page for Paired-End FASTQ Inputs.
Prerequisites
The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the Projects page. In this tutorial, we'll use a project called Getting Started.
After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 4.0.3. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects overview page, clicking on System Settings > Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.
Creating the pipeline
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.
In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
Next, add the Nextflow pipeline definition by navigating to the Nextflow files > main files > main.nf. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.
To specify a compute type for a Nextflow process, use the pod directive within each process.
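For example, a process could request a specific instance type like this (the process name is illustrative and fpga2-medium is just one of the available preset sizes):

process align_reads {
    // DRAGEN steps typically run on an FPGA preset such as fpga2-medium; adjust to your workload
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'

    script:
    """
    echo "running on an fpga2-medium instance"
    """
}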
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.
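As a minimal sketch (the process name and output file are illustrative), a process can publish its results to the out folder like this:

process collect_results {
    // Only files published to the out folder are uploaded back to the ICA project
    publishDir 'out', mode: 'copy'

    output:
    path 'results.txt'

    script:
    """
    echo "pipeline results" > results.txt
    """
}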
Refer to the ICA Nextflow pipeline documentation for details on ICA-specific attributes within the Nextflow definition.
Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found on the input form documentation page.
This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.
Paste the following XML input form into the XML CONFIGURATION text editor.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
The dataInputs section specifies file inputs, which will be mounted when the pipeline executes. Parameters defined under the steps section refer to string and other input types.
Each of the dataInputs and parameters can be accessed in the Nextflow definition within the params object, named according to the code defined in the XML (e.g. params.sample_id).
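For illustration, assuming the XML defines dataInputs with codes read1 and read2 alongside the sample_id setting, the values could be referenced in main.nf along these lines:

workflow {
    // Parameter names follow the codes defined in the XML input form
    println "Analyzing sample ${params.sample_id}"

    // File inputs are mounted by ICA and their paths are exposed via params
    reads_ch = Channel.fromPath([params.read1, params.read2])
    reads_ch.view()
}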
Running the pipeline
If you have no test data available, you need to link the DRAGEN Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.
Go to the Projects > your_project > Flow > Pipelines page from the left navigation pane. Select the pipeline you just created and click Start Analysis.
Fill in the required fields indicated by the asterisk (*) and click the Start Analysis button.
Results
You can monitor the run from the Projects > your_project > Flow > Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results.
Useful Links
Bundles
Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own as well as projects in other tenants.
There is a combined limit of 30,000 projects and bundles per tenant.
The following ICA assets can be included in bundles:
Team
Projects can be shared by updating the project's Team. You can add team members as:
An existing user within the current tenant
By adding their email address
Nextflow: Pipeline Lift
Nextflow Pipeline Liftover
What do these scripts do?
Nextflow CLI
In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).
SELECT
USER_NAME as user_name,
PIPELINE_ANALYSIS_DATA:reference as reference,
PIPELINE_ANALYSIS_DATA:price as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:completionTime::TIMESTAMP as completionTime,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as price,
f.value:pricePerSecond::FLOAT as pricePerSecond,
f.value:startTime::TIMESTAMP as startTime,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
f.value:status::STRING = 'FAILED'
AND f.value:stepId::STRING NOT LIKE '%Workflow%';
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%wgs_coverage_metrics.csv'
AND (
f.value:column_2::STRING = 'Aligned bases in genome'
OR f.value:column_2::STRING = 'Aligned bases'
)
ORDER BY
sample_name;
WITH flattened_dragen_scrna AS (
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
ANALYSIS_UUID,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%scRNA.metrics.csv'
AND (
f.value:column_2::STRING = 'Invalid barcode read'
OR f.value:column_2::STRING = 'Passing cells'
)
),
pipeline_table AS (
SELECT
PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
PIPELINE_ANALYSIS_DATA:status::STRING as status,
PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
FROM
ICA_PIPELINE_ANALYSES_VIEW_project
WHERE
PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
)
SELECT * FROM flattened_dragen_scrna JOIN pipeline_table
ON
flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;
SELECT
USER_NAME as user_name,
PROJECT_NAME as project,
SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
PIPELINE_ANALYSIS_DATA:status as status,
ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as priceStep,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
ORDER BY
priceStep DESC;
The Subject Identifier filter is combined using AND logic with any other applied filters.
Within the list of subject identifiers, OR logic is applied (i.e., a subject matches if it is in the provided list).
Demographics such as age, sex, ancestry.
Biometrics such as body height, body mass index.
Family and patient medical history.
Samples over- or under-expressed in one or multiple specified genes.
Samples with a copy number gain or loss involving one or multiple specified genes.
For diagnostic hierarchies, the concept children count and descendant count for each disease name are displayed.
Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").
Leaf Nodes: No children count shown for leaf nodes.
Missing Counts: Children count is hidden if unavailable.
Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.
Select a checkbox to include the diagnostic term along with all of its children and descendants.
Expand the categories and select or deselect specific disease concepts.
Paste one or multiple diagnostic codes separated by a pipe (‘|’).
'Stop Reason' is a static list of attributes describing a reason why a drug was stopped if available in the data ingestion.
'Drug Route' is a static list of attributes that describe the physical route of administration of the drug. For example, Intravenous Route (IV).
For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.
`Any value` will find any record where there is an entry for the measurement independent of an available value.
Click `Apply` to add your criteria to the query.
Click `Update Now` to update the running count of the Cohort.
Include/Exclude
You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.
Note: the semantics of 'Include' work in such a way that a subject needs to match only one or multiple of the 'included' attributes in any given category to be included in the cohort. (Category refers to disease, sex, body height, etc.) For example, if you specify multiple diseases as inclusion criteria, subjects will only need to be diagnosed with one of them. Using 'Exclude', you can exclude any subject who matches one or multiple exclusion criteria; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.
Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.
Note: Using exclusion criteria does not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects will be in your cohort even if they do not contain this data point.
Shared
Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.
You will be asked to confirm this selection.
Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.
The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.
When the Cohort definition is unarchived, it will be visible to all users in the Project.
Select the data set which you want to link and +Select.
After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.
Select Unlink Data Set from Bundle.
After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.
Characters are limited to alphanumerics, hyphens, underscores, and spaces
Project Owner: Owner (and usually contact person) of the project. The project owner has the same rights as a project administrator, but cannot be removed from a project without first assigning another project owner. This can be done by the current project owner, the tenant administrator or a project administrator of the current project. Reassignment is done at Projects > your_project > Project Settings > Team > Edit.
Region: Select your project location. Options available are based on the Entitlement(s) associated with your purchased subscription.
Analysis Priority (Low/Medium (default)/High): This is balanced per tenant, with high priority analyses started first and the system progressing to the next lower priority once all higher priority analyses are running. Balance your priorities so that lower priority projects do not remain waiting for resources indefinitely.
Billing Mode: Select whether the costs of this project are to be charged to the tenant of the project owner or the tenant of the user who is using the project.
Data Sharing: Enable this if you want to allow the data from this project to be linked, moved or copied and used in other projects of your tenant. Disabling this is a convenient way to prevent your data from showing up in the list of available data to be linked, moved or copied in other projects. Even though this prevents copying and linking files and folders, it does not protect against someone downloading the files or copying the contents of your files from the viewer.
Storage Bundle: This is auto-selected and appears when you select the Project Region.
Hidden projects: You can hide projects (Projects > your_project > Project Settings > Details > Hide) which you no longer use. Hiding will delete data in Base and Bench and is thus irreversible.
You can still see hidden projects by selecting the option to show hidden projects, and you can delete the data they contain at Projects > your_project > Data to save on storage costs.
If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.
workspaces in externally-managed projects. The resulting data will be stored in the externally-managed project.
Project administrators and tenant administrators can disable data sharing on externally managed projects at Projects > externally_managed_project > Project Settings > Details to prevent data from being copied or extracted.
Connect an AWS S3 Bucket with SSE-KMS Encryption Enabled
is present in the image.
Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
git (automatically installed using conda)
jq, curl (which should be made available in the image)
nextflow_schema.json, as described here. This is useful for the launch UI generation. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.
nextflow.config
nextflow_schema.json
pipeline-dev.project-info: contains project name, description, etc.
nextflow-bench.config (automatically generated when needed): contains definitions for bench.
ica-flow-config: Directory of files used when deploying pipeline to Flow.
inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.
launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
is created and the analysis+pipeline assets are downloaded.
Currently only pipelines with publicly available Docker images are supported. Pipelines with ICA-stored images are not yet supported.
Each process can use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
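A minimal sketch of pinning an explicit public image within a process (the image shown is only an example):

process say_hello {
    // Pin an explicit, publicly available image rather than relying on a default image
    container 'ubuntu:22.04'

    script:
    """
    echo "hello from a pinned container"
    """
}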
Generate the JSON file containing the validation launch inputs.
If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from nextflow --profile test inputs.
If local files are used as validation inputs or as default input values:
copy them to /data/project/pipeline-dev-files/temp.
get their ICA file ids.
use these file ids in the launch specifications.
If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.
Identify which already-deployed pipelines have the same base name, with or without suffixes that could indicate versioning (_v<number>, _<number>, _<date>).
Ask the user whether they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice. Alternatively, use the --create/--update parameters for scripting without user interaction.
A new ICA Flow pipeline is created (except in the case of a pipeline update).
The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow.
The nextflow-src folder is uploaded file by file as pipeline assets.
removed (❌). This version cannot be selected when creating new pipelines, and pipelines using this version will no longer work.
Use "symlink" instead of "copy" in the publishDir directive. Symlinking creates a link to the original file rather than copying it, which doesn’t consume additional disk space. This can prevent the issue of silent file upload failures due to disk space limitations.
Use Nextflow 22.04 or later and enable the "failOnError" publishDir option. This option ensures that the workflow will fail and provide an error message if there's an issue with publishing files, rather than completing silently without all expected outputs.
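A sketch combining both recommendations (the process name and output file are illustrative):

process export_outputs {
    // symlink avoids extra disk usage; failOnError makes publish failures fatal instead of silent
    publishDir 'out', mode: 'symlink', failOnError: true

    output:
    path 'summary.txt'

    script:
    """
    echo "run summary" > summary.txt
    """
}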
Guaranteed capacity with full control of starting, stopping, and terminating.
Not guaranteed; depends on unused AWS capacity. Instances can be terminated and reclaimed by AWS with 2 minutes' notice when the capacity is needed for other processes.
Best for
Ideal for critical workloads and urgent scaling needs.
Best for cost optimization and non-critical workloads as interruptions can occur any time.
Uncheck the box next to Include reference, to exclude reference data from your table.
Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.
Copy the schema text below and paste it in into the text box to create your schema.
Click the Save button
Refresh the screen.
The uploaded file will appear on the data page after successful upload.
To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.
Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.
Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.
The .final.count.tsv files that will be loaded into your table are tab-delimited (TSV) and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:
Data format: TSV
Delimiter: Tab
Header rows to skip: 0
Click the Save button
Click BASE ACTIVITY to view Base activity.
You will land on the SCHEMA DEFINITION page.
Click the PREVIEW tab to view the records that were loaded into your table.
Click the DATA tab, to view a list of the files whose data has been loaded into your table.
Save your query for future use by clicking the Save Query button. You will be asked to "Name" the query before clicking on the "Create" button.
JSON Format: Selecting JSON format exports the table in a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.
In this tutorial, we will create a simple RNA-Seq pipeline in ICA which includes four processes:
index creation
quantification
FastQC
MultiQC
We will also upload a Docker container to the ICA Docker repository for use within the pipeline.
main.nf
The 'main.nf' file defines the pipeline that orchestrates various RNASeq analysis processes.
The script uses the following tools:
Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports
We need a Docker container containing these tools. For the sake of this tutorial, we will use the container from the original tutorial. You can refer to the "Build and push to ICA your own Docker image" section to build your own docker image with the required tools.
Docker image upload
With Docker installed on your computer, download the image required for this project using the following command.
docker pull nextflow/rnaseq-nf
Create a tarball of the image to upload to ICA.
The following commands can be used to upload the tarball to your project.
Add the image to the ICA Docker repository
The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).
Change the format for the image tarball to DOCKER:
Navigate to Projects > your_project > Data.
Check the checkbox for the uploaded tarball.
Click on Manage > Change format.
In the new popup window, select "DOCKER" format and save.
To add this image to the ICA Docker repository, first click on Projects to go back to the home page.
From the ICA home page, click System Settings > Docker Repository > Create > Image.
This will open a new window that lets you select the region (US, EU, CA) in which your project resides and the docker image from the bottom pane.
Edit the Name field to rename it. For this tutorial, we will change the name to "rnaseq". Select the region, give it a version number and a description, and click "Save".
If you have images hosted in other repositories, you can add them as external images by using System Settings > Docker Repository > Create > External Image.
After creating a new docker image, you can click on the image to get the container URL (under Regions) for the nextflow configuration file.
Nextflow configuration file
Create a configuration file called "nextflow.config" in the same folder as the main.nf file above. Use the URL copied above to add the process.container line in the config file.
You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the Compute Types page for a list of available compute types.
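For instance, a nextflow.config along these lines would apply the container and a standard-small compute type to all processes; the container URL below is a placeholder for the URL copied from your Docker repository entry:

process {
    // Placeholder: replace with the container URL copied from your ICA Docker repository entry
    container = 'example-registry.example.com/rnaseq:1.0'
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'standard-small'
    ]
}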
Parameters file
The parameters file defines the pipeline input parameters. Refer to the JSON or XML input form documentation for detailed information on creating correctly formatted parameters files.
An empty form looks as follows:
The input files are specified within a single dataInputs node, with each individual input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the pipeline, including, but not limited to, strings, booleans, and integers.
For this tutorial, there are no settings parameters, but multiple file inputs are required. The parameters.xml file looks as follows:
Use the following commands to create the pipeline with the above contents in your project.
If not already in the project context, enter it by using the following command:
icav2 enter <PROJECT NAME or ID>
Create the pipeline using icav2 projectpipelines create nextflow. Example:
If you prefer to organize the processes in different folders/files, you can use the --other parameter to upload the different processes as additional files. Example:
You can get the pipeline id under the "ID" column by running the following command:
You can get the file ids under the "ID" column by running the following commands:
Please refer to the command help (icav2 [command] --help) to determine the available flags to filter the output of the above commands if necessary. You can also refer to the Command Index page for available flags for the icav2 commands.
For more help on uploading data to ICA, please refer to the Data Transfer options page.
$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
process foo {
// Assuming that params.compute_size is set to a valid size such as 'standard-small', 'standard-medium', etc.
pod annotation: 'scheduler.illumina.com/presetSize', value: "${params.compute_size}"
}
// nextflow.config: pod settings and the withName/withLabel selectors must be placed inside the process scope
process {
    // Set the default pod
    pod = [
        annotation: 'scheduler.illumina.com/presetSize',
        value     : 'standard-small'
    ]
    withName: 'big_memory_process' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'himem-large'
        ]
    }
    // Use an FPGA2 instance for dragen processes
    withLabel: 'dragen' {
        pod = [
            annotation: 'scheduler.illumina.com/presetSize',
            value     : 'fpga2-medium'
        ]
    }
}
process foo {
pod annotation: 'scheduler.illumina.com/lifecycle', value: "economy"
}
{
"$defs": {
"input_output_options": {
"title": "Input/output options",
"properties": {
"input_file": {
"description": "Input file to compress",
"help_text": "The file that will get compressed",
"type": "string",
"format": "file-path"
},
"compression_level": {
"type": "integer",
"description": "Compression level to use (1-9)",
"default": 5,
"minimum": 1,
"maximum": 9
}
}
}
}
}
$ pipeline-dev project-info --init
pipeline-dev.project-info not found. Let's create it with 2 questions:
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo
pipeline-dev deploy-as-flow-pipeline
pipeline-dev launch-validation-in-flow
/data/demo $ pipeline-dev launch-validation-in-flow
pipelineId: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782620
process iwantstandardsmallresources {
cpus 2
memory '8 GB'
...
# Enter the project context
icav2 enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /
The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.
Some bundles come with additional restrictions such as disabling bench access or internet access when running pipelines to protect the data contained in them. When you link these bundles, the restrictions will be enforced on your project. Unlinking the bundle will not remove the restrictions.
As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.
If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.
Linking an Existing Bundle to a Project
From the main navigation page, select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the + button, under Linked bundles.
Click on the desired bundle, then click the +Link Bundles button.
Click Save.
The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.
Click the Edit button at the top of the Details page.
Click the (-) button, next to the linked bundle you wish to remove.
Bundles and projects have to be in the same region in order to be linked. Otherwise, the error The bundle is in a different region than the project so it's not eligible for linking will be displayed.
The owning tenant of a project must have access to a bundle if you want to link that bundle to the project. You do not carry your access to a bundle over if you are invited to projects of other tenants.
When linking a bundle which includes Base to a project that does not have Base enabled, there are two possibilities:
Base is not allowed due to entitlements: The bundle will be linked and you will be given access to the data, pipelines, samples, etc., but you will not see the Base tables in your project.
Base is allowed, but not yet enabled for the project: The bundle will be linked and you will be given access to the data, pipelines, samples, etc., but you will not see the Base tables in your project and Base remains disabled until you enable it.
You cannot unlink bundles which were linked by external applications.
Create a New Bundle
To create a new bundle and configure its settings, do as follows.
From the main navigation, select Bundles.
Select + Create.
Enter a unique name for the bundle.
From the Region drop-down list, select where the assets for this bundle should be stored.
Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.
Draft—The bundle can be edited.
Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.
[Optional] Configure the following settings.
Categories—Select an existing category or enter a new one.
Short Description—Enter a description for the bundle.
Enter a release version for the bundle and optionally enter a description for the version.
[Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).
Homepage
License
[Optional] Enter any information you would like to distribute with the bundle in the Documentation section.
Select Save.
There is no option to delete bundles; they must be deprecated instead.
To cancel creating a bundle, select Bundles from the navigation at the top of the screen to return to your bundles overview.
Edit an Existing Bundle
To make changes to a bundle:
From the main navigation, select Bundles.
Select a bundle.
Select Edit.
Modify the bundle information and documentation as needed.
Select Save.
When the changes are saved, they also become available in all projects that have this bundle linked.
Adding Assets to a Bundle
To add assets to a bundle:
Select a bundle.
On the left-hand side, select the type of asset (such as Flow > pipelines, Base > Tables or Bench > Docker Images) you want to add to the bundle.
Select link to add assets to the bundle.
Select the assets and confirm with the link button.
Assets must meet the following requirements before they can be added to a bundle:
For Samples and Data, the project the asset belongs to must have data sharing enabled.
The region of the project containing the asset must match the region of the bundle.
You must have permission to access the project containing the asset.
Pipelines and tools need to be in released status.
must be available in a complete state.
When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.
You can not add the same asset twice to a bundle. Once added, the asset will no longer appear in the asset selection list.
You need to be in list view in order to unlink items from a bundle. Select the item and choose the unlink action at the top of the screen.
Which batch jobs are visible as activity depends on the user role.
Create a New Bundle Version
When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle. All users who currently have access to a bundle will automatically have access to the new version as well.
From the main navigation, select Bundles.
Select a bundle.
Select the + Create new Version button.
Make updates as needed and update the version number.
Select Save.
When you create a new version of a bundle, it will replace the old version in your list. To see the old version, open your new bundle and look at Bundles > your_bundle > Details > Versioning. There you can open the previous version which is contained in your new version.
Assets such as data which were added in a previous version of your bundle will be marked in green, while new content will be black.
Add Terms of Use to a Bundle
To add Terms of Use to a Bundle, do as follows:
From the main navigation, select Bundles > your_bundle > Bundle Settings > Legal.
Select + Create New Version.
Use the editor to define Terms of Use for the selected bundle.
Click Save.
[Optional] Require acceptance by clicking the checkbox next to Acceptance required.
Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.
To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.
Collaborating on a Bundle
If you want to collaborate with other people on creating a bundle and managing the assets in the bundle, you can add users to your bundle and set their permissions. You use this to create a bundle together, not to use the bundle in your projects.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Team.
To invite a user to collaborate on the bundle, do as follows.
To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.
To add a user by their email address, select By email and enter their email address.
To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.
Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.
Viewer: view content without editing rights.
Contributor: view bundle content and link/unlink assets.
Repeat as needed to add more users.
Users are not officially added to the bundle until they accept the invitation.
To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.
To revoke bundle permissions from a user, select the trash icon for the user.
Select Save Changes.
Sharing a Bundle
Once you have finalized your bundle and added all assets and legal requirements, you can share your bundle with other tenants to use it in their projects.
Your bundle must be in released status to prevent it from being updated while it is shared.
Go to Bundles > your_bundle > Edit > Details > Bundle status and set it to Released.
Save the change.
Once the bundle is released, you can share it. Invitations are sent to an individual email address, however access is granted and extended to all users and all workgroups inside that tenant.
Go to Bundles > your_bundle > Bundle Settings > Share.
Click Invite and enter the email address of the person you want to share the bundle with. They will receive an email from which they can accept or reject the invitation to use the bundle. The invitation will show the bundle name, description and owner. The link in the invite can only be used once.
Do not create duplicate entries. You can only use one user/tenant combination per bundle.
You can follow up on the status of the invitation on the Bundles > your_bundle > Bundle Settings > Share page.
If they reject the bundle, the rejection date will be shown.
To re-invite that person later on, select their email address in the list and choose Remove. You can then create a new invitation. If you do not remove the old entry before sending a new invitation, they will be unable to accept it and will get an error message stating that the user and bundle combination must be unique. They also cannot re-use an invitation once it has been accepted or declined.
If they accept the bundle, the acceptance date will be shown. They will in turn see the bundle under Bundles > Entitled bundles.
To remove access, select their email address in the list and choose Remove.
Entitled Bundles
Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when it is part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.
To use your shared entitled bundle, add the bundle to your project via Project Linking (Projects > your_project > Data > Manage > Link). Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.
Select the corresponding option under Projects > your_project > Project Settings > Team > + Add.
Email invites are sent out as soon as you click the save button on the add team member dialog.
Users can accept or reject invites. The status column shows a green checkmark for accept, an orange question mark for users that have not responded and a red x for users that rejected the invite.
Project Owner
The project owner has administrator-level project rights. To change the project owner, select the Edit project owner button at the top right and select the new project owner from the list. This can be done by the current project owner, the tenant administrator or a project administrator of the current project.
Roles
Every user added to the project team will need to have a role assigned for specific categories of functionality in ICA. These categories are:
Project (contains data and tools to execute analysis)
If a user has been added both as member of a workgroup and as individual user, then the individual rights supersede the group rights. This way, you can add all users in a workgroup and change access for individual users when needed, regardless of their workgroup rights.
Upload and Download rights
While the categories will determine most of what a user can do or see, explicit upload and download rights need to be granted for users. Select the checkbox next to Download allowed and Upload allowed when adding a team member.
Upload and download rights are independent of the assigned role. A user with only viewer rights will still be able to perform uploads and downloads if their upload and download rights are not disabled. Likewise, an administrator can only perform uploads and downloads if their upload and download rights are enabled.
Project Access
The sections below describe the roles and their allowed actions.
No Access
Data Provider
Viewer
Contributor
Administrator
Create a Connector
x
x
x
x
View project resources
Flow Access
No Access
Viewer
Contributor
View analyses results
x
x
Create analyses
x
Create pipelines and tools
Base Access
No Access
Viewer
Contributor
View table records
x
x
Click on links in table
x
x
Create queries
x
Bench Access
No Access
Contributor
Administrator
Execute a notebook
x
x
Start/Stop Workspace
x
x
Create/Delete/Modify workspaces
Parses configuration files and the Nextflow scripts (main.nf, processes, subprocesses, modules) of a pipeline and updates the pipeline configuration with pod directives to tell ICA what compute instance to run each process on
Strips out parameters that ICA utilizes for pipeline orchestration
Migrates manifest closure to conf/base.ica.config file
Ensures that docker is enabled
Adds workflow.onError to aid troubleshooting (see the sketch after this list)
Modifies the processes that reference scripts and tools in the bin/ folder of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Generates a parameter XML file based on nextflow_schema.json, nextflow.config, and conf/. Take a look at this to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability
Makes additional edits to ensure your pipeline runs more smoothly on ICA
These scripts are provided to help run Nextflow pipelines in ICA, but they are not an official Illumina product.
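As a rough illustration of the workflow.onError handling mentioned in the list above (the exact code inserted by the liftover scripts may differ), such a hook in main.nf can surface failure details in the analysis log:

workflow.onError {
    // Illustrative only: print the failure reason so it shows up in the ICA analysis log
    println "Pipeline stopped with error: ${workflow.errorMessage}"
}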
nextflow-to-icav2-config
Some examples of Nextflow pipelines that have been lifted over with this repo can be found here.
Some additional examples of ICA-ported Nextflow pipelines are here.
Some additional repos that can help with your ICA experience can be found below:
Local testing your Nextflow pipeline after using these scripts
This naive wrapper will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subprocesses and module files), this script may be more appropriate. Any and all comments are welcome.
ICA concepts to better understand ICA liftover of Nextflow pipelines
Nextflow pipelines on ICA are orchestrated by kubernetes and require a JSON or XML parameters file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow pipelines
Nextflow processes need to contain a reference to a container: a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process.
A table of instance types and the associated CPU and memory specs can be found in the Compute Types table
These scripts have been made to be compatible with nf-core pipelines, so you may find the concepts from the documentation here a better starting point.
Using these scripts
The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3
This has:
nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA
You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:
where pwd is your $HOME folder.
As this is an unofficial developer tool to help develop Nextflow pipelines to run on ICA, you may encounter some syntax bugs that can get introduced in your Nextflow code. To help resolve these, please run the steps as described below and then open these files in VisualStudio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch upon first glance.
Prerequisites
STEP 0 Github credentials
STEP 1 [OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest
If you have a specific pipeline from GitHub, you can skip the step below.
You'll first need to download the python module from nf-core via a pip install nf-core command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.
You can choose which pipelines to git clone, but as a convenience, the wrapper nf-core.conversion_wrapper.R will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from the ICA domain of your choosing.
STEP 2: Obtain API key file
Next, you'll need an API key file for ICA that can be generated using the instructions here.
STEP 3: Create a project in ICA
Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these instructions to create a project via the ICA GUI.
STEP 4: Download and configure the ICA CLI (see STEP 2):
A table of all CLI releases for mac, linux, and windows can be found here.
The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' ( This will redirect you to https://illumina.ica.com/ica).
Let's do some liftovers
GIT_HUB_URL can be specified to grab pipeline code from github. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.
Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.
In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.
git clone nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}
Detailed step-by-step breakdown of what nf-core.conversion_wrapper.R does for each Nextflow pipeline
Step 1: Generate an XML file from an nf-core pipeline (your pipeline has a nextflow_schema.json)
A Nextflow schema JSON is generated by nf-core's python library nf-core
nf-core can be installed via a pip install nf-core command
Step 2: Create a nextflow.config and a base config file so that it is compatible with ICA.
This script will update your configuration files so that it integrates better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.
Step 3: Add helper-debug code and other modifications to your Nextflow pipeline
This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copy work folder back to ICA if an analysis fails). It also allows for ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in your bin/ folder of your pipeline $projectDir.
Step 4: Update the XML to add parameter options, if your pipeline uses or could use iGenomes
You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.
Step 5: Sanity check your pipeline code to see if it is valid prior to uploading it into ICA
Step 6: Create a pipeline in ICA by using the following helper script nf-core.create_ica_pipeline.R
Developer mode: use this if you plan to develop or modify a pipeline in ICA
Add the flag --developer-mode to the command line above if you have custom groovy libraries or modules files referenced in your pipeline. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameters project_dir and files under input_files. This will ensure that these files and directories will be placed in the $workflow.launchDir when the pipeline is invoked.
How to run a pipeline in ICA via CLI
As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:
There will be a corresponding JSON file (i.e. a file with a file extension *ICAv2_CLI_template.json) that saves these values that one could modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.
Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.
If you have a previously successful analysis with your pipeline, you may find this approach more useful.
Where possible, these scripts search for config files that refer to a test (i.e. test.config,test_full.config,test*config) and create a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended. By default, this parameter is set to false.
When set to true, these test config files are loaded in your main nextflow.config.
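A minimal sketch of how such a toggle could be wired into nextflow.config (the code generated by these scripts may differ, and the config file path is an assumption):

params.ica_smoke_test = false

if (params.ica_smoke_test) {
    // Load the pipeline's bundled test profile configuration when the smoke test is enabled
    includeConfig 'conf/test.config'
}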
Enumeration: select a value from a list. Enter the values in the Options field, which appears when you have selected the enumeration type.
Field Group
Groups fields. Once you have chosen this, the +Add group field becomes available to add fields to this group.
Bring Your Own Bench Image
Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to https://docs.docker.com/reference/dockerfile/
The following steps are needed to get your bench image running in ICA.
Bring Your Own Bench Image Steps
Requirements
You need to have Docker installed in order to build your images.
For your Docker bench image to work in ICA, it must run on Linux x86 architecture and have the correct user ID and initialization script in the Dockerfile.
For easy reference, the following preconfigured Bench image examples are available, which you can copy to your local machine and edit to suit your needs.
Bench-console provides an example to build a minimal image compatible with ICA Bench to run a SSH Daemon.
Bench-web provides an example to build a minimal image compatible with ICA Bench to run a Web Daemon.
Bench-rstudio provides an example to build a minimal image compatible with ICA Bench to run RStudio Open Source.
Scripts
The following scripts must be part of your Docker bench image. Please refer to the examples listed above for more details.
Init Script (Dockerfile)
This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location from where it can be started by ICA when you request to start your workspace.
User (Dockerfile)
The user settings must be set up so that Bench runs with UID 1000.
Shutdown Script (ica_start.sh)
To do a clean shutdown, you can capture the SIGTERM signal, which is sent 30 seconds before the workspace is terminated.
Building a Bench Image
Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.
Open the command prompt on your machine.
Navigate to the root folder of your Docker files.
Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . with mybenchimage being the name you want to give to your image and 0.0.1 replaced with the version number which you want your bench image to have. For more information on this command, see the Docker documentation.
If you want to build on a mac with Apple Silicon, then the build command is docker buildx build --platform linux/amd64 -f Dockerfile -t mybenchimage:0.0.1 .
Upload Your Docker Image to ICA
Open ICA and log in.
Go to Projects > your_project > Data.
For small Docker images, upload the Docker image file which you generated in the previous step. For large Docker images, use one of the dedicated data transfer options for better performance and reliability when importing the Docker image.
Start Your Bench Image
Navigate to Projects > your_project > Bench > Workspaces.
Create a new workspace with + Create Workspace or edit an existing workspace.
Fill in the bench workspace details as required.
Access Bench Image
Once your bench image has been started, you can access it via console, web or both, depending on your configuration.
Web access (HTTP) is done from either the Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
The password needed for SSH access is any one of your personal
Command-line Interface
To execute command-line interface commands, your workspace needs a way to run them, such as the inclusion of an SSH daemon, be it integrated into your web access image or into your console access. There is no need to download the workspace command-line interface; you can run it from within the workspace.
Restrictions
Root User
The bench image will be instantiated as a container which will be forcibly started as a user with UID 1000 and GID 100.
You cannot elevate your permissions in a running workspace.
Do not run containers as root as this is bad security practice.
Read-only Root Filesystem
Only the following folders are writeable:
/data
/tmp
All other folders are mounted as read-only.
Network Access
For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.
Web: TCP/8888
Console: TCP/2222
For outbound access, a workspace can be started in two modes:
Public: Access to public IPs is allowed using the TCP protocol.
Restricted: Access is allowed only to a list of URLs.
Context
Environment Variables
At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.
Configuration Files
Following files and folders will be provided to the workspace and made accessible for reading at runtime.
Software Files
At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.
New versions of ICA software will be made available after a restart of your workspace.
Important Folders
Bench Lifecycle
Workspace Lifecycle
When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh
This script needs to be available and executable otherwise your workspace will not boot.
This script is the main process in your running workspace and cannot run to completion, as that would stop the workspace and instantiate a restart.
This script can be used to invoke other scripts.
When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process hasn't stopped within the given period.
Troubleshooting
Build Argument
If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, then a possible cause is missing the last . of the command.
Server Connection Error
When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.
Tables
All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.
Create a new Table
To create a new table, click Projects > your_project > Base > Tables > +Create. Tables can be created from scratch or from a template that was previously saved. Views on data from Illumina hardware and processes are selected with the option Import from catalogue.
If you make a mistake in the order of columns when creating your table, then as long as you have not saved your table, you can switch to Edit definition to change the column order. The text editor can swap or move columns whereas the built-in editor can only delete columns or add columns to the end of the sequence. When editing in text mode, it is best practice to copy the content of the text editor to a notepad before you make changes because a corrupted syntax will result in the text being wiped or reverted when switching between text and non-text mode.
Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. The workaround is switching to text mode, copying the schema of the table to which you want to make modifications, and pasting it into a new empty table where the necessary changes can be made before saving.
Once created, do not try to modify your table column layout via the Query module as even though you can execute ALTER TABLE commands, the definitions and syntax of the table will go out of sync resulting in processing issues.
Be careful when naming tables when you want to use them in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.
Empty Table
To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.
Table information
The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers or underscores. The description is optional.
References
Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the table which can contain references to the data on the platform:
data_reference: reference to the data element in the Illumina platform from which the record originates
data_name: original name of the data element in the Illumina platform from which the record originates
sample_reference: reference to the sample in the Illumina platform from which the record originates
Schema
In an empty table, you can create a schema by adding a field with the +Add button for each column of the table and defining it. At any time during the creation process, it is possible to switch to the edit definition mode and back. The definition mode shows the JSON code, whereas the original view shows the fields in a table.
Each field requires:
a unique name (*1) with optional description.
a type
String – collection of characters
Bytes – raw binary data
Integer – whole numbers
Float – fractional numbers (*2)
Numeric – any number (*3)
Boolean – only options are “true” or “false”
Timestamp - Stores number of (milli)seconds passed since the Unix epoch
Date - Stores date in the format YYYY-MM-DD
Time - Stores time in the format HH:MI:SS
Datetime - Stores date and time information in the format YYYY-MM-DD HH:MI:SS
Record – has a child field
Variant - can store a value of any other type, including OBJECT and ARRAY
a mode
Required - Mandatory field
Nullable - Field is allowed to have no value
Repeated - Multiple values are allowed in this field (will be recognized as array in Snowflake)
(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your schema name as this will lead to SQL compilation errors.
(*2) Float values will be exported differently depending on the output format. For example JSON will use scientific notation so verify that your consecutive processing methods support this.
(*3) Defining the precision when creating tables with SQL is not supported as this will result in rounding issues.
From template
Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Manage (top right) > Save as template.
If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.
Table information
Table status
The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:
Available: Ready to be used, both with or without data
Pending: The system is still processing the table, there is probably a process running to fill the table with data
Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the Show deleted tables/views button
Additional Considerations
Tables created from empty data or from a template are available faster.
When copying a table with data, it can remain in a Pending state for longer periods of time.
Clicking on the page's refresh button will update the list.
Table details
For any available table, the following details are shown:
Table information: Name, description, status, number of records and data size.
The data size of tables with the same layout and content may vary slightly, depending on when and how the data was written by Snowflake.
Definition: An overview of the table schema, also available in text. Fields can be added to the schema but not deleted. Tip for deleting fields: copy the schema as text and paste in a new empty table where the schema is still editable.
Preview: A preview of the first 50 rows of the table (when data is uploaded into the table). Select Show details to see record details.
Source Data: the files that are currently uploaded into the table. You can see the Load Status of the files, which can be Prepare Started, Prepare Succeeded or Prepare Failed, and finally Load Succeeded or Load Failed.
Table actions
From within the details of a table it is possible to perform the following actions from the Manage menu (top right) of the table:
Edit: Add fields to the table and change the table description.
Copy: Create a copy of this table in the same or a different project. In order to copy to another project, data sharing must be enabled in the details of the original project. The user also has to have access to both the original and target projects.
Export as file: Export this table as a CSV or JSON file. The exported file can be found in the project, where the user has access to download it.
Save as template: Save the schema or an edited form of it as a template.
Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It’s also possible to load data into a table manually or automatically via a pre-configured job. This can be done on the Schedule page.
Delete: Delete the table.
Manually importing data to your Table
To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > Manage (top right) > Add Data
Data selection
The data selection screen will show options to select the structure as CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation) and the location of your source data. In the first step, you select the data format and the files containing the data.
Data format (required): Select the format of the data which you want to import.
Write preference: Define if data can be written to the table only when the table is empty, if the data should be appended to the table or if the table should be overwritten.
Delimiter: Which delimiter is used in the delimiter-separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.
Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table.
Most of the advanced options are legacy functions and should not be used. The only exceptions are
Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
Data import progress
To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective actions if the input did not match the table design and run the import again (with a new copy of the file, as each input file can only be used once).
If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.
List of table data sources
To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This will list all the source data files, including those that failed to be imported. These files cannot be used again for import, to prevent duplicate entries. The load status remains empty while the data is being processed and is set to Load Succeeded or Load Failed after loading completes.
How to load array data in Base
Base table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (e.g., String), or the Variant type, as illustrated in the sketch below.
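As a minimal, hypothetical sketch (the table and column names are made up, and ctx is assumed to be a snowflake.connector connection like the one created in the Base Operations snippets later in this document), a Variant column can hold an array that is unpacked with FLATTEN:

# Sketch only: assumes an existing snowflake.connector connection `ctx`
# (see the Base Operations snippets in the Bench ICA Python Library tutorial).
cur = ctx.cursor()

# Hypothetical table with a Variant column holding an array of strings
cur.execute("CREATE OR REPLACE TABLE array_demo (sample_id STRING, tags VARIANT)")

# PARSE_JSON converts a JSON text literal into a Variant value
cur.execute("INSERT INTO array_demo SELECT 'S1', PARSE_JSON('[\"tumor\", \"FFPE\"]')")

# LATERAL FLATTEN expands the array into one row per element
cur.execute("SELECT sample_id, f.value::string FROM array_demo, LATERAL FLATTEN(input => tags) f")
for (sample_id, tag) in cur.fetchall():
    print(sample_id, tag)

cur.close()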
Parsing nested JSON data
If you have a nested JSON structure, you can import it into individual fields of your table.
For example, if your JSON nested structure looks like the above and you want it imported into a table with a, b and c having integers as values, you need to create a matching table. This can be done either in the table editor or via the SQL command CREATE OR REPLACE TABLE json_data ( a INTEGER, b INTEGER, c INTEGER);
Format your JSON data to have single lines per structure.
Finally, create a schedule to import your data or perform a manual import.
The resulting table will contain one row per JSON structure, with columns a, b and c.
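As an illustration (the nested structure below is a hypothetical example, not the original sample), the following Python sketch flattens nested JSON records into single-line objects with keys a, b and c, ready for import:

# Hypothetical input: a list of nested records such as {"values": {"a": 1, "b": 2, "c": 3}}
import json

nested_records = [
    {"values": {"a": 1, "b": 2, "c": 3}},
    {"values": {"a": 4, "b": 5, "c": 6}},
]

# Write one flat JSON object per line (NDJSON), matching the json_data table columns
with open("json_data.ndjson", "w") as out:
    for record in nested_records:
        flat = {"a": record["values"]["a"],
                "b": record["values"]["b"],
                "c": record["values"]["c"]}
        out.write(json.dumps(flat) + "\n")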
Cohort Analysis
From the Cohorts menu in the left hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.
Query Details
The query details can be accessed by clicking the triangle next to Show Query Details. The query details display the selections used to create a cohort. The selections can be edited by clicking the pencil icon in the top right.
Charts
Charts will be open by default. If not, click Show Charts.
Use the gear icon in the top-right to change viewable chart settings.
There are four charts available to view summary counts of attributes within a cohort as histogram plots.
Single Subject Timeline View:
Display time-stamped events and observations for a single subject on a timeline. The timeline view is only visible for subjects that have time-series data.
The following attributes are displayed in the timeline view:
• Diagnosed and Self-Reported Diseases: start and end dates; progression vs. remission
• Medication and Other Treatments: prescribed and self-medicated; start date, end date, and dosage at every time point
The timeline utilizes age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, the user can now switch to Date to visualize data.
Measurement Section: A summary of measurements (without values) is displayed under the section titled "Measurements and Laboratory Values Available." Users can click a link to access the Timeline View for detailed results.
Drug Section: The "Drug Name" section lists drug names without repeating the header "Drug Name" for each entry.
Subjects
By default, the Subjects tab is displayed.
The Subjects tab with a list of all subjects matching your criteria is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.
To exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusions.
You can Export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see Subject Export for Analysis in ICA Bench at the bottom of this page.
Remove a Subject
Specific subjects can be removed from a Cohort.
Select the Subjects tab.
Subjects in the Cohort are checked by default.
Structural variant aggregation: Marker Frequency analysis
For each individual cohort, display a table of all observed SVs that overlap with a given gene.
Marker Frequency
Click the Marker Frequency tab, then click the Gene Expression tab.
Down-regulated genes are displayed in blue and Up-regulated genes are displayed in red.
A frequency in the Cohort is displayed and the Matching number/Total is also displayed in the chart.
Genes
You are brought to the Gene tab under the Gene Summary sub-tab.
Select a Gene by typing the gene name into the Search Genes text box.
A Gene Summary will be displayed that lists information and links to public resources about the selected gene.
Correlation
For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.
Clinical vs. Clinical Attribute Comparison – Bubble Plot
Click the Correlation Tab.
In X-axis category, select Clinical.
In X-axis Attribute, select a clinical attribute.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select another clinical attribute.
You will be shown a bubble plot comparing the first clinical attribute on the x-axis to the second attribute on the y-axis.
The size of the bubbles corresponds to the number of subjects falling into those categories.
Molecular vs. Molecular Attribute Comparison – Bubble Plot
To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, select a gene.
In Y-axis category, select RNA expression.
In Y-Axis Attribute, type a gene and leave Reference Type as NORMAL.
Click Continuous to see violin plots of the compared variables.
Clinical vs. Molecular Attribute Comparison – Bubble Plot
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, type a gene name.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select a clinical attribute.
You are shown a stacked bar chart by the selected clinical attribute values on the Y-axis.
For each attribute value, the bar represents the % of Subjects with RNA Expression, Somatic Mutation, and Multiple Alterations.
Molecular Breakdown
Click the Molecular Breakdown Tab.
In Enter a clinical Attribute, select a clinical attribute.
In Enter a gene, select a gene by typing a gene name.
Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.
CNV
If there is Copy Number Variant data in the cohort:
Click the CNV tab.
A graph will show the CNV Sample Percentage on the Y-axis and Chromosomes on the X-axis.
Any value above Zero is a copy number gain, and any value below Zero is a copy number loss.
Subject Export for Analysis in ICA Bench
ICA allows for integrated analysis in a computational workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.
Confirm the VCF data for your analysis is in ICA Project Data.
From within your ICA Project, start a Bench Workspace (see the Bench documentation for more details).
Navigate back to ICA Cohorts.
Bench ICA Python Library
This tutorial demonstrates how to use the ICA Python library packaged with the JupyterLab image for Bench Workspaces.
See the JupyterLab documentation for details about the JupyterLab docker image provided by Illumina.
The tutorial will show how authentication to the ICA API works and how to search, upload, download and delete data from a project into a Bench Workspace. The python code snippets are written for compatibility with a Jupyter Notebook.
Python modules
Navigate to Bench > Workspaces and click Enable to enable workspaces. Select +New Workspace to create a new workspace. Fill in the required details and select JupyterLab for the Docker image. Click Save and Start to open the workspace. The following snippets of code can be pasted into the workspace you've created.
This snippet defines the required python modules for this tutorial:
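As a minimal sketch, the module list below is inferred from the data operation code shown later in this section:

# A sketch of the modules used by the snippets below (inferred from the later code)
# API modules
import icav2
from icav2.api import project_data_api
from icav2.model.create_data import CreateData

# Helper modules
import os
import random
import string
import hashlib
import requests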
Authentication
This snippet shows how to authenticate using the following methods:
ICA Username & Password
ICA API Token
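The following is a minimal, hedged sketch of how an API client could be configured with the generated icav2 package. The server URL, the security-scheme key ('ApiKeyAuth'), and the username/password exchange via the /api/tokens endpoint are assumptions and may differ for your SDK or API version:

import getpass
import requests
import icav2

ICA_BASE_URL = "https://ica.illumina.com/ica/rest"  # assumed endpoint

configuration = icav2.Configuration(host=ICA_BASE_URL)

# Option 1: ICA API key (the scheme name 'ApiKeyAuth' is an assumption)
configuration.api_key['ApiKeyAuth'] = getpass.getpass("ICA API key: ")

# Option 2: ICA username & password, exchanged for a JWT
# (endpoint and response field are assumptions based on the public ICA REST API)
# username = input("ICA username: ")
# password = getpass.getpass("ICA password: ")
# response = requests.post(ICA_BASE_URL + "/api/tokens", auth=(username, password))
# configuration.access_token = response.json()["token"]

apiClient = icav2.ApiClient(configuration)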
Data Operations
These snippets show how to manage data in a project. Operations shown are:
Create a Project Data API client instance
List all data in a project
Create a data element in a project
List Data
Create Data
Upload Data
Download Data
Search for Data
Delete Data
Base Operations
These snippets show how to get a connection to a base database and run an example query. Operations shown are:
Create a python jdbc connection
Create a table
Insert data into a table
Query the table
Snowflake Python API documentation can be found on the Snowflake website.
This snippet defines the required python modules for this tutorial:
Get Base Access Credentials
Create a Table
Add Table Record
Query Table
Delete Table
Workspaces
The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project. Each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and from which they can be exported to a Project for permanent storage. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, you can see the status by the color.
CWL: Scatter-gather Method
In bioinformatics and computational biology, the vast and growing amount of data necessitates methods and tools that can process and analyze data in parallel. This demand gave birth to the scatter-gather approach, an essential pattern in creating pipelines that offers efficient data handling and parallel processing capabilities. In this tutorial, we will demonstrate how to create a CWL pipeline utilizing the scatter-gather approach. To this purpose, we will use two widely known tools: fastp and multiqc. Given the functionalities of both fastp and multiqc, their combination in a scatter-gather pipeline is incredibly useful. Individual datasets can be scattered across resources for parallel preprocessing with fastp. Subsequently, the outputs from each of these parallel tasks can be gathered and fed into multiqc, generating a consolidated quality report. This method not only accelerates the preprocessing of large datasets but also offers an aggregated perspective on data quality, ensuring that subsequent analyses are built upon a robust foundation.
Creating the tools
XML Input Form
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
Empty Form
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
CWL CLI Pipeline Execution
In this tutorial, we will demonstrate how to create and launch a pipeline using the CWL language using the ICA command line interface (CLI).
Installation
Please refer to the ICA CLI installation instructions for installing the ICA CLI.
Data Transfer Options
ICA Connector
The platform provides Connectors to facilitate automation for operations on data (ie, upload, download, linking). The connectors are helpful when you want to sync data between ICA and your local computer or link data between projects in ICA.
Rscript create_cli_templates_from_xml.R --workflow-language {xml or nextflow} --parameters-xml {PATH_TO_PARAMETERS_XML}
Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles can not be recovered from deprecated status.
Metadata Model—Select a metadata model to apply to the bundle.
Links
Publications
Administrator: full edit rights of content and configuration.
The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains following attributes:
code: a unique id. Required.
format: specifying the format of the input: FASTA, TXT, JSON, UNKNOWN, etc. Multiple entries are possible: example below. Required.
type: is it a FILE or a DIRECTORY? Multiple entries are not allowed. Required.
required: is this input required for the execution of a pipeline? Required.
multiValue: are multiple files as an input allowed? Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
Single file input
An example of a single file input which can be in a TXT, CSV, or FASTA format.
Folder as an input
To use a folder as an input the following form is required:
Multiple files as an input
For multiple files, set the attribute multiValue to true. This will make it so the variable is considered to be of type list [], so adapt your pipeline when changing from single value to multiValue.
Settings
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to, strings, booleans, integers, etc. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain following attributes:
code: unique id. This is the parameter name that is passed to the workflow
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting
classification: is this setting specified by the user?
In the code below a string setting with the identifier inp1 is specified.
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
Integers
For an integer setting the following schema with an element integerType is to be used. To define an allowed range use the attributes minimumValue and maximumValue.
Options
Options types can be used to designate options from a drop-down list in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
Option types can also be used to specify a boolean, for example
Strings
For a string setting the following schema with an element stringType is to be used.
Booleans
For a boolean setting, booleanType can be used.
Limitations
One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement would be to define two optional parameters: one for File input and one for String input. At the moment, the ICA UI doesn't validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below you can find both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
Tutorial project
In this project, we will create two simple tools and build a pipeline that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) will convert a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) will count the number of lines in an input FASTA file. The workflow.cwl will combine the two tools to convert an input FASTQ file to a FASTA file and count the number of lines in the resulting FASTA file.
Following are the two CWL tools and scripts we will use in the project. If you are new to CWL, please refer to the cwl user guide for a better understanding of CWL codes. You will also need the cwltool installed to create these tools and processes. You can find installation instructions on the CWL github page.
tool-fqTOfa.cwl
tool-countLines.cwl
workflow.cwl
Note that we do not specify the Docker image used in either tool. In that case, the default behaviour is to use the public.ecr.aws/docker/library/bash:5 image. This image contains basic functionality (sufficient to execute wc and awk commands).
If you want to use a different public image, you can specify it using the requirements tag in the CWL file. For example, to use 'ubuntu:latest', add a DockerRequirement with a dockerPull entry pointing to that image.
If you want to use a Docker image from the ICA Docker repository, you need the link to AWS ECR from ICA GUI. Double-click on the image name in the Docker repository and copy the URL to the clipboard. Add the URL to dockerPull key.
To add a custom or public docker image to the ICA repository, refer to the Docker Repository.
Authentication
Before you can use ICA CLI, you need to authenticate using the Illumina API key. Follow these instructions to authenticate.
Enter/Create a Project
Either create a project or use an existing project to create a new pipeline. You can create a new project using the icav2 projects create command.
If you do not provide the --region flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the icav2 regions list command first.
You can select the project to work on by entering the project using the icav2 projects enter command. Thus, you won't need to specify the project as an argument.
You can also use the icav2 projects list command to determine the names and ids of the project you have access to.
Create a pipeline on ICA
projectpipelines is the root command to perform actions on pipelines in a project. The create command creates a pipeline in the current project.
The parameter file specifies the input with additional parameter settings for each step in the pipeline. In this tutorial, input is a FASTQ file shown inside <dataInput> tag in the parameter file. There aren't any specific settings for the pipeline steps resulting in a parameter file below with an empty <steps> tag. Create a parameter file (parameters.xml) with the following content using a text editor.
The following command creates a pipeline called "cli-tutorial" using the workflow.cwl, tools "tool-fqTOfa.cwl" and "tool-countLines.cwl" and parameter file "parameter.xml" with small storage size.
Once the pipeline is created, you can view it using the list command.
Running the pipeline
Upload data to the project using the icav2 projectdata upload command. Refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file test.fastq containing the following reads.
The "icav2 projectdata upload" command lets you upload data to ica.
The list command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.
The icav2 projectpipelines start command initiates the pipeline run. The following command runs the pipeline. Write down the id for exploring the analysis later.
If for some reason your create command fails and needs to rerun, you might get an error (ConstraintViolationException). If so, try your command with a different name.
You can check the status of the run using the icav2 projectanalyses get command.
The pipelines can be run using JSON input type as well. The following is an example of running pipelines using JSON input type. Note that JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
Notes
runtime.ram and runtime.cpu
runtime.ram and runtime.cpu values are by default evaluated using the compute environment running the host CWL runner. CommandLineTool steps within a CWL pipeline run on different compute environments than the host CWL runner, so the values of runtime.ram and runtime.cpu within the CommandLineTool will not match the runtime environment the tool is running in. The evaluation of runtime.ram and runtime.cpu can be overridden by specifying cpuMin and ramMin in the ResourceRequirement for the CommandLineTool.
The ICA CLI upload/download proves beneficial when handling large files/folders, especially in situations where you're operating on a remote server by connecting from your local computer. You can use icav2 projects enter <project-name/id> to set the project context for the CLI to use for the commands when relevant. If the project context is not set, you can supply the additional parameter --project-id <project-id> to specify the project for the command.
Upload Data
Download Data
Note: Because of how S3 manages storage, it doesn't have a concept of folders in the traditional sense. So, if you provide the "folder" ID of an empty "folder", you will not see anything downloaded.
ICA API
Another option to upload data to ICA is via ICA API. This option is helpful where data needs to be transferred via automated scripts. You can use the following two endpoints to upload a file to ICA.
Post - /api/projects/{projectId}/data with the following body, which will create a partial file at the desired location and return a dataId for the file to be uploaded. {projectId} is the project id of the destination project. You can find the projectId in your project's details page (Project > Details > URN > urn:ilmn:ica:project:projectId#MyProject).
Create data in the project by making the API call below. If you don't already have the API-Key, refer to the instructions on the support page for guidance on generating one.
In the example above, we're generating a partial file named 'tempFile.txt' within a project identified by the project ID '41d3643a-5fd2-4ae3-b7cf-b89b892228be', situated inside a folder with the folder ID 'fol.579eda846f1b4f6e2d1e08db91408069'. You can access project, file, or folder IDs either by logging into the ICA web interface or through the use of the ICA CLI.
The response will look like this:
Retrieve the data/file ID from the response (for instance: fil.b13c782a67e24d364e0f08db9f537987) and employ the following format for the Post request - /api/projects/{projectId}/data/{dataId}:createUploadUrl:
The response will look like this:
Use the URL from the response to upload a file (tempFile.txt) as follows:
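As a minimal Python sketch of the same three steps using the requests library (the X-API-Key header name, the JSON field names such as folderId, and the response shapes are assumptions for illustration; the IDs are the example IDs used above):

import requests

ICA_BASE_URL = "https://ica.illumina.com/ica/rest"            # assumed endpoint
API_KEY = "<your API key>"                                    # see the support page for generating one
PROJECT_ID = "41d3643a-5fd2-4ae3-b7cf-b89b892228be"           # example project ID from above
FOLDER_ID = "fol.579eda846f1b4f6e2d1e08db91408069"            # example folder ID from above
headers = {"X-API-Key": API_KEY}                              # header name is an assumption

# 1. Create a partial file entry (body field names are assumptions)
body = {"name": "tempFile.txt", "dataType": "FILE", "folderId": FOLDER_ID}
create_response = requests.post(f"{ICA_BASE_URL}/api/projects/{PROJECT_ID}/data",
                                json=body, headers=headers).json()
# Response shape assumed to follow the ProjectData object used in the SDK snippets (data.id)
data_id = create_response["data"]["id"]

# 2. Request a pre-signed upload URL for that data ID
upload_url = requests.post(f"{ICA_BASE_URL}/api/projects/{PROJECT_ID}/data/{data_id}:createUploadUrl",
                           headers=headers).json()["url"]

# 3. PUT the file content to the pre-signed URL
with open("tempFile.txt", "rb") as f:
    requests.put(upload_url, data=f.read())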
AWS CLI
ICA allows you to directly upload/download data using the AWS CLI. This is especially helpful when dealing with an unstable internet connection while uploading or downloading large amounts of data. If the transfer gets interrupted midway, you can use the sync command to resume the transfer from the point where it stopped.
To connect to ICA storage, you must first download and install AWS CLI on your local system. You will need temporary credentials to configure AWS CLI to access ICA storage. You can generate temporary credentials through the ICA CLI, which can be used to authenticate AWS CLI against ICA. The temporary credentials can be obtained using this ICA API endpoint
Generate temporary credentials
Example cli to generate temporary credentials:
If you are trying to upload data to /cli-upload/ folder, you can get the temporary credentials to access the folder using icav2 projectdata temporarycredentials /cli-upload/. It will produce following output with accessKey, secretKey and sessionToken that you will need to configure AWS CLI to access this folder.
Copy the awsTempCredentials.accessKey, awsTempCredentials.secretKey and awsTempCredentials.sessionToken to build the credentials file: ~/.aws/credentials. It should look something like
Example format for credentials file:
The temporary credentials expire in 36 hours. If the temporary credentials expire before the copy is complete, you can use AWS sync command to resume from where it left off.
Following are a few AWS commands to demonstrate the use. The remote path in the commands below are constructed off of the output of temporarycredentials command in this format: s3://<awsTempCredentials.bucket>/<awsTempCredentials.objectPrefix>
Example AWS commands
You can also write scripts to monitor the progress of your copy operation and regenerate and refresh the temporary credentials before they expire.
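As an alternative illustration (boto3 is not part of the original examples and is used here only as a Python stand-in for the AWS CLI), the same temporary credentials can be used programmatically; the placeholder values come from the temporarycredentials output described above:

# Sketch: uses the values returned by `icav2 projectdata temporarycredentials`
# (awsTempCredentials.accessKey / secretKey / sessionToken / bucket / objectPrefix).
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<awsTempCredentials.accessKey>",
    aws_secret_access_key="<awsTempCredentials.secretKey>",
    aws_session_token="<awsTempCredentials.sessionToken>",
)

bucket = "<awsTempCredentials.bucket>"
prefix = "<awsTempCredentials.objectPrefix>"  # typically ends with a trailing slash

# Upload a local file into the folder covered by the temporary credentials
s3.upload_file("file.txt", bucket, prefix + "file.txt")

# List what is currently stored under that prefix
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    print(obj["Key"], obj["Size"])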
Rclone
You can also use rclone for data transfer. Generate temporary credentials between your source and destination projects with the steps described above. You can run rclone config to set keys and tokens to configure rclone with the temporary credentials. You will need to select the advanced edit option when asked to enter the session key.
After completing the configuration, your config file (~/.config/rclone/rclone.conf) should look like this:
Example rclone commands
The option --s3-no-check-bucket skips the verification of the Amazon S3 Bucket properties and permissions.
As rclone performs pre-checks on the files to be transferred, it can take some time before the actual transfer starts.
Using rclone in Bench
In Bench, Read-only mounts offer improved performance over using the data/project/ folder when transferring data because of higher concurrency and more efficient file handling. You create read-only mounts with workspace-ctl data create mount
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Data API client instance
projectDataApiInstance = project_data_api.ProjectDataApi(apiClient)
# List all data in a project
pageOffset = 0
pageSize = 30
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
    totalRecords = projectDataPagedList.total_item_count
    while pageOffset*pageSize < totalRecords:
        for projectData in projectDataPagedList.items:
            print("Path: "+projectData.data.details.path + " - Type: "+projectData.data.details.data_type)
        pageOffset = pageOffset + 1
        if pageOffset*pageSize < totalRecords:
            # Fetch the next page; page_offset is passed as a record offset (assumed to be pageOffset*pageSize)
            projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset*pageSize))
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Create data element in a project
data = icav2.model.create_data.CreateData(name="test.txt",data_type = "FILE")
try:
    projectData = projectDataApiInstance.create_data_in_project(projectId, create_data=data)
    fileId = projectData.data.id
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_data_in_project: %s\n" % e)
## Upload a local file to a data element in a project
# Create a local file in a Bench workspace
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
content = ''.join(random.choice(string.ascii_lowercase) for i in range(100))
f = open(filename, "a")
f.write(content)
f.close()
# Calculate MD5 hash (optional)
localFileHash = hashlib.md5(open(filename, 'rb').read()).hexdigest()
try:
    # Get Upload URL
    upload = projectDataApiInstance.create_upload_url_for_data(project_id = projectId, data_id = fileId)
    # Upload the dummy file content to the pre-signed URL
    data = open(filename, 'rb').read()
    r = requests.put(upload.url, data=data)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_upload_url_for_data: %s\n" % e)
# Delete local dummy file
os.remove(filename)
## Download a data element from a project
try:
    # Get Download URL
    download = projectDataApiInstance.create_download_url_for_data(project_id=projectId, data_id=fileId)
    # Download file
    filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
    r = requests.get(download.url)
    open(filename, 'wb').write(r.content)
    # Verify md5 hash
    remoteFileHash = hashlib.md5((open(filename, 'rb').read())).hexdigest()
    if localFileHash != remoteFileHash:
        print("Error: MD5 mismatch")
    # Delete local dummy file
    os.remove(filename)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_download_url_for_data: %s\n" % e)
# Search for matching data elements in a project
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
    for projectData in projectDataPagedList.items:
        print("Path: " + projectData.data.details.path + " - Name: "+projectData.data.id + " - Type: "+projectData.data.details.data_type)
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Delete matching data elements in a project
try:
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
    for projectData in projectDataPagedList.items:
        print("Deleting file "+projectData.data.details.path)
        projectDataApiInstance.delete_data(project_id = projectId, data_id = projectData.data.id)
except icav2.ApiException as e:
    print("Exception %s\n" % e)
# API modules
import icav2
from icav2.api import project_base_api
from icav2.model.problem import Problem
from icav2.model.base_connection import BaseConnection
# Helper modules
import os
import requests
import getpass
import snowflake.connector
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Base API client instance
projectBaseApiInstance = project_base_api.ProjectBaseApi(apiClient)
# Get a Base Access Token
try:
    baseConnection = projectBaseApiInstance.create_base_connection_details(project_id = projectId)
except icav2.ApiException as e:
    print("Exception when calling ProjectBaseAPIApi->create_base_connection_details: %s\n" % e)
## Create a python jdbc connection
ctx = snowflake.connector.connect(
account=os.environ["ICA_SNOWFLAKE_ACCOUNT"],
authenticator=baseConnection.authenticator,
token=baseConnection.access_token,
database=os.environ["ICA_SNOWFLAKE_DATABASE"],
role=baseConnection.role_name,
warehouse=baseConnection.warehouse_name
)
ctx.cursor().execute("USE "+os.environ["ICA_SNOWFLAKE_DATABASE"])
## Create a Table
tableName = "test_table"
ctx.cursor().execute("CREATE OR REPLACE TABLE " + tableName + "(col1 integer, col2 string)")
## Insert data into a table
ctx.cursor().execute(
"INSERT INTO " + tableName + "(col1, col2) VALUES " +
" (123, 'test string1'), " +
" (456, 'test string2')")
## Query the table
cur = ctx.cursor()
try:
    cur.execute("SELECT * FROM "+tableName)
    for (col1, col2) in cur:
        print('{0}, {1}'.format(col1, col2))
finally:
    cur.close()
# Delete the table
ctx.cursor().execute("DROP TABLE " + tableName);
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
<pd:label>Input file</pd:label>
<pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
<pd:label>Tumor FASTQs</pd:label>
<pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
</pd:description>
</pd:dataInput>
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
<pd:label>Seed Length</pd:label>
<pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
</pd:description>
<pd:integerType minimumValue="10" maximumValue="50"/>
<pd:value>21</pd:value>
</pd:parameter>
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
<pd:label>Segmentation Algorithm</pd:label>
<pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
</pd:description>
<pd:optionsType>
<pd:option>CBS</pd:option>
<pd:option>SLM</pd:option>
<pd:option>HSLM</pd:option>
<pd:option>ASLM</pd:option>
</pd:optionsType>
<pd:value>false</pd:value>
</pd:parameter>
% icav2 projectdata upload test.fastq /
oldFilename= test.fastq en newFilename= test.fastq
bucket= stratus-gds-use1 prefix= 0a488bb2-578b-404a-e09d-08d9e3343b2b/test.fastq
Using: 1 workers to upload 1 files
15:23:32: [0] Uploading /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq
15:23:33: [0] Uploaded /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq to /test.fastq in 794.511591ms
Finished uploading 1 files in 795.244677ms
% icav2 projectdata list
PATH NAME TYPE STATUS ID OWNER
/test.fastq test.fastq FILE AVAILABLE fil.c23246bd7692499724fe08da020b1014 4b197387-e692-4a78-9304-c7f73ad75e44
> icav2 projectdata temporarycredentials --help
This command fetches temporal AWS and Rclone credentials for a given project-data. If path is given, project id from the flag --project-id is used. If flag not present project is taken from the context
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
#Copy single file to ICA
> rclone copy file.txt s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/ --s3-no-check-bucket
#Sync local folder to ICA
> rclone sync cli-upload s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/ --s3-no-check-bucket
These examples come with information on the available parameters.
Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2. The resulting tar file will appear next to the root folder of your Docker files.
Select the uploaded image file and perform Manage > Change Format.
From the format list, select DOCKER and save the change.
Go to System Settings > Docker Repository > Create > Image.
Select the uploaded docker image and fill out the other details.
Name: The name by which your docker image will be seen in the list
Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1
Description: Provide a description explaining what your docker images does or is suited for.
Type: The type of this image is Bench. The Tool type is reserved for tool images.
Cluster compatible: Indicates if this Docker image is suited for use in cluster workspaces.
Access: This setting must match the available access options of your Docker image. You can choose web access (HTTP), console access (SSH) or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support will result in access denied errors when trying to run the workspace.
Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.
Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your docker image will be partial during creation and available once completed.
Save your changes.
Select Start Workspace
Wait for the workspace to be started and you can access it either via console or the GUI.
ICA_BENCH_URL: The host part of the public URL which provides access to the running workspace. Example: use1-bench.platform.illumina.com
ICA_PROJECT_UUID: The unique identifier related to the ICA project in which the workspace was started.
ICA_URL: The ICA Endpoint URL.
HTTP_PROXY / HTTPS_PROXY: The proxy endpoint in case the workspace was started in restricted mode.
HOME: The home folder. Example: /data
ICA_WORKSPACE: The unique identifier related to the started workspace. This value is bound to a workspace and will never change. Example: 32781195
ICA_CONSOLE_ENABLED: Whether Console access is enabled for this running workspace. Values: true, false
ICA_WEB_ENABLED: Whether Web access is enabled for this running workspace. Values: true, false
ICA_SERVICE_ACCOUNT_USER_API_KEY: An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace.
/etc/workspace-auth: Contains the SSH rsa public/private keypair which is required to be used to run the workspace SSHD.
/data: This folder contains all data specific to your workspace. Data in this folder is not persisted in your project and will be removed at deletion of the workspace.
In the default view, the timeline shows the first five disease entries and the first five drug/medication entries in the plot. Users can choose different attributes or change the order of existing attributes by clicking on the “select attribute” button.
The x-axis shows the person’s age in years, with data points initially displayed between ages 0 to 100. Users can zoom in by selecting the desired range to visualize data points within the selected age range.
Each event is represented by a dot in the corresponding track. Events in the same track can be connected by lines to indicate the start and end period of an event.
Search for a specific subject by typing the Subject ID into the Search Subjects text box.
Get all details available on a subject by clicking the hyperlinked Subject ID in the Subject list.
To remove a specific subject from a Cohort, uncheck the checkbox next to subjects to remove from a Cohort.
Check box selections are maintained while browsing through the pages of the subject list.
Click Save Cohort to save the subjects you would like to exclude.
The specific subjects will no longer be counted in all analysis visualizations.
The specific excluded subjects will be saved for the Cohort.
To add the subjects back to the Cohort, check the checkboxes again and click Save Cohort.
Genes can be searched by using the Search Genes text box.
A cytogenetic map will be displayed based on the selected gene, and a vertical orange bar represents the gene location in the chromosome.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend, you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot. You can also filter the plot to only show variants above/below a certain cut-off for gnomAD frequency (in percent) or absolute sample count.
The Needle Plot allows filtering by PrimateAI Score.
Set a lower (>=) or upper (<=) threshold for the PrimateAI Score to filter variants.
Enter the threshold value in the text box located below the gnomadFreq/SampleCount input box.
If no threshold value is entered, no filter will be applied.
The filter affects both the plot and the table when the “Display only variants shown in the plot above” toggle is enabled.
Filter preferences persist across gene views for a seamless experience.
The following filters are always shown and can be independently set: %gnomAD Frequency, Sample Count, and PrimateAI Score. Changes made to these filters are immediately reflected in both the needle plot and the variant list below.
Click on a variant's needle pin to view details about the variant from public resources and counts of variants in the selected cohort by disease category. If you want to view all subjects that carry the given variant, click on the sample count link, which will take you to the list of subjects (see above).
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in on the gene domain to better separate observations.
The Pathogenic Variant Track shows pop-up details with pathogenicity calls, phenotypes, submitter and a link to the ClinVar entry when hovering over the purple triangles.
Below the needle plot is a full listing of variants displayed in the needle plot visualization
The "Display only variants shown in the plot above" toggle (enabled by default) syncs the table with the Needle Plot. When the toggle is on, the table displays only the variants shown in the Needle Plot, applying all active filters (e.g., variant type, somatic/germline, sample count). When the toggle is off, all reported variants are displayed in the table and table-based filters can be used.
Export to CSV: When the views are synchronized (toggle on), the filtered list of variants can be exported to a CSV file for further analysis.
The Phenotypes tab shows a stacked horizontal bar chart which displays the molecular breakdown (disease type vs. gene) and subject count for the selected gene.
Note on "Stop Lost" Consequence Variants:
The stop_lost consequence is mapped as Frameshift, Stop lost in the tooltip.
The Stop gained|lost value includes both stop gain and stop loss variants.
The Gene Expression tab shows known gene expression data from tissue types in GTEx.
The Genetic Burden Test is only available for de novo variants.
Click Chromosome: to select a specific chromosome position.
Create a Cohort of subjects of interest using Create a Cohort.
From the Subjects Tab click the Export subjects... from the top-right of the subject list. The file can be downloaded to the Browser or ICA Project Data.
We suggest using export ...to Data Folder for immediate access to this data in Bench or other areas of ICA.
Create another cohort if needed for your Research and complete the last 3 steps.
Navigate to the Bench workspace created in the second step.
After the workspace has started up, click Access.
Find the /Project/ folder in the Workspace file navigation.
This folder will contain your cohort files created along with any pipeline output data needed for your workspace analysis.
Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to be more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.
You can see the remaining time until the next event (Shutdown or restart) in the workspaces overview and on the workspace details.
Create Workspace
If this is the first time you are using a workspace in a Project, click Enable to create new Bench Workspaces. In order to use Bench, you first need to have a workspace. This workspace determines which docker image will be used with which node and storage size.
Complete the following fields and save the changes.
(*1) URLs must comply with the following rules:
URLs can be between 1 and 263 characters including dot (.).
URLs can begin with a leading dot (.).
Domain and Sub-domains:
Can include alphanumeric characters (Letters A-Z and digits 0-9). Case insensitive.
Can contain hyphens (-) and underscores (_), but not as a first or last character.
Dot (.) must be placed after a domain or sub-domain.
If you use a trailing slash like in the path ftp.example.net/folder/ then you will not be able to access the path ftp.example.net/folder without the trailing slash included.
Regex for URL : [(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=-]\{2,256}\.[a-z]\{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)
(*2) When you grant workspace access to multiple users, you need to provide an API key to access the workspace. Authenticate using the icav2 config set command. The CLI will prompt for an x-api-key value; enter the API Key generated from the product dashboard. See here for more information.
Example URLs
The following are example URLs which will be considered valid.
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
Workspace permissions
When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Administrator vs Contributor
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
Setting Workspace Permissions
The teams setting determines if someone is an administrator or contributor, while the dedicated permissions you set on the workspace level indicate what the workspace itself can and cannot do within your project. For this reason, the users need to meet or exceed the required permissions to enter this workspace and use it.
For security reasons, the Tenant administrator and Project owner can always access the workspace.
If one of your permissions is not high enough as a Bench contributor, you will see the following message: "You are not allowed to use this workspace as your user permissions are not sufficient compared to the permissions of this workspace".
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Dataprovider - Viewer - Contributor)
Flow (No Access - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
Workspaces which were created before this functionality existed can be upgraded by enabling these workspace permissions. If the workspaces are not upgraded, they will continue working as before.
Delete workspace (Bench Administrators Only)
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will no longer be accessible, nor will it be shown in the list of workspaces. Its content will be deleted, so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
Use workspace
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
As long as the workspace is running, the resources provided for this workspace will be charged.
Start workspace
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
Once a workspace is running, it can be manually stopped or it will be automatically shut down after the amount of time configured in the Automatic Shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace run when you no longer need it to save costs.
You can edit running workspaces to update the shutdown timer, shutdown reminder and auto restart reminder.
If you want to open a running workspace in a new tab, then select the link at Projects > your_project > Bench > Workspaces > Details tab > Access. You can also copy the link with the copy symbol in front of the link.
Stop workspace
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted.
Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Workspace Tabs
Access tab
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The Docker images provided by Illumina load JupyterLab by default and contain tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher (+ button above the folder structure).
Docker Builds tab (Bench Administrators only)
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the Dockerfile method, where an existing image is the starting point for the new Docker image (the FROM directive), and any new or updated packages are additive (they are added as new layers on top of the existing image).
The Dockerfile commands are all run as ROOT, so it is possible to delete or interfere with an image in such a way that it no longer runs correctly. The image does not have access to any underlying parts of the platform, so it cannot harm the platform, but inoperable Bench images will have to be deleted or corrected.
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
Code: The Docker file commands must be provided in this section.
The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
Tools (Bench Administrators Only)
From within the workspace it is possible to create a tool from the Docker image.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Add a version number for the tool.
Click the Docker Build tab.
Here the image that accompanies the tool will be created.
Change the name for the image.
Click the General tab. This tab and the following tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the corresponding section in the Flow documentation.
Click the Save button in the upper, right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
Workspace Data
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under Projects > your_project > Data. Although this storage is slow, it offers read and write access, and the content is accessible from within ICA.
For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use the CLI command workspace-ctl data create-mount --mode read-write to do so. You cannot have fast read/write access to indexed folders because the indexing mechanism would deteriorate their performance.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
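As a quick orientation, the shell sketch below combines the locations described above. The folder names under /data/project are placeholders, and any additional arguments accepted by workspace-ctl beyond the documented --mode option are version dependent, so check its help output before use.
# Make results visible in the project Data view (slow, read/write, indexed)
cp -r ~/results /data/project/results
# Create a fast read-only mount of project data (command as given above;
# extra arguments, if any, depend on the workspace-ctl version)
workspace-ctl data create-mount --mode read-only
# The icav2 CLI shipped with every workspace
ls /data/.software/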
File Mapping
Activity tab
The last tab of the workspace is the Activity tab. On this tab, all actions performed in the workspace are shown, for example the creation of the workspace and the starting or stopping of the workspace. The activities are shown with their date, the user that performed the action, and a description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when a workspace has been deleted. The Activity tab in the workspace only shows the actions performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action was performed is listed as well.
First, we create the two tools: fastp and multiqc. For this, we need the corresponding Docker images and CWL tool definitions. Refer to this part of our help sites to learn more about how to import a tool into ICA. In a nutshell, once the CWL tool definition is pasted into the editor, the other tabs for editing the tool are populated. To complete the tool, the user needs to select the corresponding Docker image and provide a tool version (which can be any string).
For this demo, we will use the publicly available Docker images: quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0 for fastp and docker.io/ewels/multiqc:v1.15 for multiqc. In this tutorial one can find how to import publicly available Docker images into ICA.
Furthermore, we will use the following CWL tool definitions:
and
Pipeline
Once the tools are created, we will create the pipeline itself using these two tools at Projects > your_project > Flow > Pipelines > CWL > Graphical:
On the Definition tab, go to the tool repository and drag and drop the two tools which you just created on the pipeline editor.
Connect the JSON output of fastp to multiqc input by hovering over the middle of the round, blue connector of the output until the icon changes to a hand and then drag the connection to the first input of multiqc. You can use the magnification symbols to make it easier to connect these tools.
Above the diagram, drag and drop two input FASTQ files and an output HTML file on to the pipeline editor and connect the blue markers to match the diagram below.
fastp_multiqc
Relevant aspects of the pipeline:
Both inputs are multivalue (as can be seen on the screenshot)
Ensure that the step fastp has scattering configured: it scatters on both inputs using the scatter method 'dotproduct'. This means that as many instances of this step will be executed as there are pairs of FASTQ files. To indicate that this step is executed multiple times, the icons of both inputs have doubled borders.
Important remark
Both input arrays (Read1 and Read2) must be matched. Automatic sorting of input arrays is currently not supported, so you have to take care of matching the input arrays yourself. Besides manual specification in the GUI, this can be done in one of two ways:
Invoke this pipeline from the CLI and use Bash functionality to sort the arrays (see the sketch after this list).
Add a tool to the pipeline which takes an array of all FASTQ files, splits them by the R1 and R2 suffixes, and sorts them.
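For the first option, a minimal Bash sketch (file names and variable names are illustrative) that splits a list of FASTQ files into matched, sorted R1 and R2 arrays before passing them to the CLI could look like this:
# Collect all FASTQ file names, sorted so that R1 and R2 stay in the same order
ALL_FASTQS=$(ls *.fastq.gz | sort)
# Split into R1 and R2 arrays based on the _R1_ / _R2_ naming convention
READ1=($(echo "$ALL_FASTQS" | grep '_R1_'))
READ2=($(echo "$ALL_FASTQS" | grep '_R2_'))
# Both arrays must end up with the same length to form matched pairs
[ "${#READ1[@]}" -eq "${#READ2[@]}" ] || echo "Mismatch between R1 and R2 files" >&2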
We will describe the second way in more detail. The tool is based on the public Python Docker image docker.io/python:3.10 and has the following definition. In this tool, we provide the Python script spread_script.py via the CWL Dirent feature.
Now this tool can be added to the pipeline before the fastp step.
an analysis exercising this pipeline, preferably with a short execution time, to use as validation test
Start Bench Workspace
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
Import Existing Pipeline and Analysis to Bench
The starting point is the analysis id that is used as pipeline validation test (the pipeline id is obtained from the analysis metadata).
If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.
Results
Run Validation Test in Bench
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance:
The pipeline-dev tool uses "nextflow run ..." to run the pipeline. The full nextflow command is printed on stdout and can be copied and adjusted if you need additional options.
Monitoring
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks that are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
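For example, from a second terminal in the workspace (the log file suffix is the workspace reboot time, so a wildcard is used here):
# Show pending and running cluster tasks
qstat
# Follow the scaler log to see whether nodes are being added or removed
tail -f /data/logs/sge-scaler.log.*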
Data Locations
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
Modify Pipeline
Nextflow files (located in the nextflow-src folder) are easy to modify.
Depending on your environment (ssh access / docker image with JupyterLab or VNC, with and without Visual Studio code), various source code editors can be used.
After modifying the source code, you can run a validation iteration with the same command as before:
Identify Docker Image
Modifying the Docker image is the next step.
Nextflow (and ICA) allow the Docker images to be specified in different places:
in config files such as nextflow-src/nextflow.config
in nextflow code files:
grep container may help locate the correct files:
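For example, a recursive search over the imported sources (a generic grep invocation, not an ICA-specific command):
# Find every container directive in the Nextflow sources and configs
grep -rn "container" nextflow-src/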
Docker Image Update: Dockerfile Method
Use case: Update some of the software (mimalloc) by compiling a new version
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Docker Image Update: Interactive Method
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Fun fact: VScode with the "Dev Containers" extension lets you edit the files inside your running container:
Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...
Update the nextflow code and/or configs to use the new image
Validate your changes in Bench:
Deploy as Flow Pipeline
After generating a few ICA-specific files (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that's what is checked here).
It then asks if we want to update the latest version or create a new one.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
Result
Run Validation Test in Flow
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
Result
Launch DRAGEN Pipelines on CLI
Prerequisite: configure and authenticate the ICA command line interface (CLI).
Configure CLI and Identify Pipeline ID
Obtain a list of your projects with their associated IDs:
Use the ID of the project from the previous step to enter the project context:
Find the pipeline you want to start from the CLI by obtaining a list of pipelines associated with your project:
Find the ID associated with your pipeline of interest.
Identify Input File Parameters
To find the input file parameters, you can inspect a previously launched analysis with the projectanalyses input command.
Find the previous analyses launched along with their associated IDs:
List the analyses inputs by using the ID found in the previous step:
This will return the Input File Codes, as well as the file names and data IDs of the data used to previously launch the pipeline.
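Putting these two steps together (the analysis ID shown is taken from the example output further down; the input subcommand follows the description above, so verify it with icav2 projectanalyses --help if it differs in your CLI version):
# List previous analyses in the current project and note the analysis ID
icav2 projectanalyses list
# Show the input codes, file names and data IDs used by that analysis
icav2 projectanalyses input 3539d676-ae99-4e5f-b7e4-0835f207e425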
Identify Configuration Settings
You need to use the ICA API to access the configuration settings of a project analysis that ran successfully.
API-based Configuration Settings
Generate JWT Token from API Key or Basic login credentials
See https://illumina.gitbook.io/ica/account-management/am-iam#api-keys for how to get an API Key.
If your user has access to multiple domains, you will need to add a "?tenant=($domain)" query parameter to the request.
The response to this request will provide a JWT token {"token":($token)}; use the value of the token in further requests.
Use the API endpoint /api/projects/{projectID}/analyses/{analysisId}/configurations to retrieve the configuration listing all required and optional parameters.
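A hedged curl sketch of this call; the server host is a placeholder, any API base path prefix used by your deployment must be added, and the JWT obtained above is assumed to be exported as ICA_TOKEN:
# Retrieve the configuration items of a successfully completed analysis
curl -s -H "Authorization: Bearer ${ICA_TOKEN}" \
  "https://<server>/api/projects/<projectID>/analyses/<analysisId>/configurations"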
The response JSON to this API will have configuration items listed as
Nextflow XML file parameters via GUI (Prior to DRAGEN 4.3)
Click the previous GUI run, and select the pipeline that was run. On the pipeline page, select the XML Configuration Tab to view the configuration settings.
In the "steps" section of the XML file, you will find various steps labeled with
and subsequent labels of parameters with a similar structure
The code should be used to generate the later command line parameters e.g.
--parameters enable_map_align:true
Create Launch Command
CWL
Structure of the final command
icav2 projectpipelines start cwl $(pipelineID) --user-reference, plus input options
Input Options - For CLI, the entire input can be broken down as individual command line arguments
To launch the same analysis as was run from the GUI, use the same file IDs and parameters. If using new data, you can use the CLI command icav2 projectdata list to find new file IDs to launch a new instance of the pipeline.
Required information in Input - Input Data and Parameters
Command Line Arguments
This option requires the use of --type input STRUCTURED along with --input and --parameters
The input parameter names such as Reference and Tumor_FASTQ_Files in the example below are from the pipeline definition where you can give the parameters a name. You can see which of these were used when the pipeline originally ran, in the section above. You can also look at the pipeline definitions for the input parameters, for example the code value of .
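As an illustration only, the pieces above can be combined as follows. The pipeline ID, user reference and input values are placeholders, and the code:value syntax for --input mirrors the --parameters example above rather than being confirmed here, so verify the exact syntax with icav2 projectpipelines start cwl --help before use.
icav2 projectpipelines start cwl <pipelineID> \
  --user-reference my-somatic-run \
  --type input STRUCTURED \
  --input Reference:<referenceFileID> \
  --input Tumor_FASTQ_Files:<fastqFileID> \
  --parameters enable_map_align:true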
Successful Response
Unsuccessful: Pipeline ID not formatted correctly
Check that the pipeline ID is correct based on icav2 projectpipelines list
Unsuccessful: File ID not found
Check that the file ID is correct based on icav2 projectdata list
Unsuccessful: Parameter not found
Nextflow
When using Nextflow to start runs, the input-type parameter is not used, but the --project-id is required.
Structure of the final command: icav2 projectpipelines start nextflow $(pipelineID) --user-reference, plus input options.
The Response can be used to determine if the pipeline was submitted successfully.
# Init script invoked at start of a bench workspace
COPY --chmod=0755 --chown=root:root ${FILES_BASE}/ica_start.sh /usr/local/bin/ica_start.sh
# Bench workspaces need to run as user with uid 1000 and be part of group with gid 100
RUN adduser -H -D -s /bin/bash -h ${HOME} -u 1000 -G users ica
# Terminate function
function terminate() {
# Send SIGTERM to child processes
kill -SIGTERM $(jobs -p)
# Send SIGTERM to waitpid
echo "Stopping ..."
kill -SIGTERM ${WAITPID}
}
# Catch SIGTERM signal and execute terminate function.
# A workspace will be informed 30s before forcefully being shutdown.
trap terminate SIGTERM
# Hold init process until TERM signal is received
tail -f /dev/null &
WAITPID=$!
wait $WAITPID
#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
label: MultiQC
doc: MultiQC is a tool to create a single report with interactive plots for multiple
bioinformatics analyses across many samples.
inputs:
files:
type:
- type: array
items: File
- 'null'
doc: Files containing the result of quality analysis.
inputBinding:
position: 2
directories:
type:
- type: array
items: Directory
- 'null'
doc: Directories containing the result of quality analysis.
inputBinding:
position: 3
report_name:
type: string
doc: Name of output report, without path but with full file name (e.g. report.html).
default: multiqc_report.html
inputBinding:
position: 1
prefix: -n
outputs:
report:
type: File
outputBinding:
glob:
- '*.html'
baseCommand:
- multiqc
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
listing:
- entry: "import argparse\nimport os\nimport json\n\n# Create argument parser\n\
parser = argparse.ArgumentParser()\nparser.add_argument(\"-i\", \"--inputFiles\"\
, type=str, required=True, help=\"Input files\")\n\n# Parse the arguments\n\
args = parser.parse_args()\n\n# Split the inputFiles string into a list of file\
\ paths\ninput_files = args.inputFiles.split(',')\n\n# Sort the input files\
\ by the base filename\ninput_files = sorted(input_files, key=lambda x: os.path.basename(x))\n\
\n\n# Separate the files into left and right arrays, preserving the order\n\
left_files = [file for file in input_files if '_R1_' in os.path.basename(file)]\n\
right_files = [file for file in input_files if '_R2_' in os.path.basename(file)]\n\
\n# Print the left files for debugging\nprint(\"Left files:\", left_files)\n\
\n# Print the left files for debugging\nprint(\"Right files:\", right_files)\n\
\n# Ensure left and right files are matched\nassert len(left_files) == len(right_files),\
\ \"Mismatch in number of left and right files\"\n\n \n# Write the left files\
\ to a JSON file\nwith open('left_files.json', 'w') as outfile:\n left_files_objects\
\ = [{\"class\": \"File\", \"path\": file} for file in left_files]\n json.dump(left_files_objects,\
\ outfile)\n\n# Write the right files to a JSON file\nwith open('right_files.json',\
\ 'w') as outfile:\n right_files_objects = [{\"class\": \"File\", \"path\"\
: file} for file in right_files]\n json.dump(right_files_objects, outfile)\n\
\n"
entryname: spread_script.py
writable: false
label: spread_items
inputs:
inputFiles:
type:
type: array
items: File
inputBinding:
separate: false
prefix: -i
itemSeparator: ','
outputs:
leftFiles:
type:
type: array
items: File
outputBinding:
glob:
- left_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
rightFiles:
type:
type: array
items: File
outputBinding:
glob:
- right_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
baseCommand:
- python3
- spread_script.py
icav2 projects list
ID NAME OWNER
a5690b16-a739-4bd7-a62a-dc4dc5c5de6c Project1 670fd8ea-2ddb-377d-bd8b-587e7781f2b5
ccb0667b-5949-489a-8902-692ef2f31827 Project2 f1aa8430-7058-4f6c-a726-b75ddf6252eb
No of items : 2
icav2 projects enter a5690b16-a739-4bd7-a62a-dc4dc5c5de6c
mkdir demo-flow-dev
cd demo-flow-dev
pipeline-dev import-from-flow
or
pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f
Enter the number of the entry you want to use: 21
Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
Fetching pipeline files... main.nf
Fetching test inputs
New Json inputs detected
Resolving test input ids to /data/mounts/project paths
Fetching input form..
Pipeline "GWAS pipeline_1_2_1_20241215_130117" successfully imported.
pipeline name: GWAS pipeline_1_2_1_20241215_130117
analysis name: Test GWAS pipeline_1_2_1_20241215_130117
pipeline id : bb47d612-5906-4d5a-922e-541262c966df
analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
Suggested actions:
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev run-in-flow
cd imported-flow-analysis
pipeline-dev run-in-bench
/data/demo $ tail /data/logs/sge-scaler.log.*
2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning,
Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2
code nextflow-src # Open in Visual Studio Code
code . # Open current dir in Visual Studio Code
vi nextflow-src/main.nf
pipeline-dev run-in-bench
/data/demo-flow-dev $ head nextflow-src/main.nf
nextflow.enable.dsl = 2
process top_level_process {
container 'docker.io/ljanin/gwas-pipeline:1.2.1'
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
# Create directory for Dockerfile
mkdir dirForDockerfile
cd dirForDockerfile
# Create Dockerfile
cat <<EOF > Dockerfile
FROM ${IMAGE_BEFORE}
RUN mkdir /mimalloc-compile \
&& cd /mimalloc-compile \
&& git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
&& mkdir -p mimalloc/out/release \
&& cd mimalloc/out/release \
&& cmake ../.. \
&& make \
&& make install \
&& cd / \
&& rm -rf mimalloc-compile
EOF
# Build image
docker build -t ${IMAGE_AFTER} .
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
docker run -it --rm ${IMAGE_BEFORE} bash
# Make some modifications
vi /scripts/plot_manhattan.py
<Fix "manhatten.png" into "manhattAn.png">
<Enter :wq to save and quit vi>
<Start another terminal (try Ctrl+Shift+T if using wezterm)>
# Identify the container id (e.g. with "docker ps")
CONTAINER_ID=c18670335247
# Save the container changes into a new image layer
docker commit ${CONTAINER_ID} ${IMAGE_AFTER}
sed --in-place "s/${IMAGE_BEFORE}/${IMAGE_AFTER}/" nextflow-src/main.nf
/data/demo $ pipeline-dev deploy-as-flow-pipeline
Generating ICA input specs...
Extracting nf-core test inputs...
Deploying project nf-core/demo
- Currently being developed as: dev-nf-core-demo
- Last version updated in ICA: dev-nf-core-demo_v3
- Next suggested version: dev-nf-core-demo_v4
How would you like to deploy?
1. Update dev-nf-core-demo (current version)
2. Create dev-nf-core-demo_v4
3. Enter new name
4. Update dev-nf-core-demo_v3 (latest version updated in ICA)
/data/demo $ pipeline-dev launch-validation-in-flow
pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test dev-nf-core-demo_v4
- Id: cadcee73-d975-435d-b321-5d60e9aec1ec
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec
icav2 projectpipelines list
ID CODE DESCRIPTION
fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 DRAGEN Somatic 3.9.5 The DRAGEN Somatic tool identifies somatic variants
b4dc6b91-5283-41f6-8095-62a5320ed092 DRAGEN Somatic Enrichment 3-10-4 The DRAGEN Somatic Enrichment pipeline identifies somatic variants which can exist at low allele frequencies in the tumor sample.
No of items : 2
icav2 projectanalyses list
ID REFERENCE CODE STATUS
3539d676-ae99-4e5f-b7e4-0835f207e425 kyle-test-somatic-2-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 SUCCEEDED
f11e248e-9944-4cde-9061-c41e70172f20 kyle-test-somatic-1-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 FAILED
No of items : 2
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 51abe34a-2506-4ab5-adef-22df621d95d5
ownerId 47793c21-75a6-3aa8-8147-81b354d0af4d
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code DRAGEN Somatic 3.9.5
pipeline.description The DRAGEN Somatic tool identifies somatic variants which can exist at low allele frequencies in the tumor sample. The pipeline can analyze tumor/normal pairs and tumor-only sequencing data. The normal sample, if present, is used to avoid calls at sites with germline variants or systematic sequencing artifacts. Unlike germline analysis, the somatic platform makes no ploidy assumptions about the tumor sample, allowing sensitive detection of low-frequency alleles.
pipeline.id fbd6f3c3-cb70-4b35-8f57-372dce2aaf98
pipeline.language CWL
pipeline.ownerId e9dd2ff5-c9ba-3293-857e-6546c5503d76
pipeline.tenantId 55cb0a54-efab-4584-85da-dc6a0197d4c4
pipeline.tenantName ilmn-dragen
pipeline.timeCreated 2021-11-23T22:55:49Z
pipeline.timeModified 2021-12-09T16:42:14Z
reference kyle-test-somatic-9-DRAGEN Somatic 3_9_5-bc56d4b1-f90e-4039-b3a4-b11d29263e4e
status REQUESTED
summary
tenantId b5b750a6-49d4-49de-9f18-75f4f6a81112
tenantName ilmn-cci
timeCreated 2022-03-16T22:48:31Z
timeModified 2022-03-16T22:48:31Z
userReference kyle-test-somatic-9
400 Bad Request : ICA_API_004 : com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.util.UUID` from String "8f57-372dce2aaf98": UUID has to be represented by standard 36-char representation
at [Source: (io.undertow.servlet.spec.ServletInputStreamImpl); line: 1, column: 983] (through reference chain: com.bluebee.rest.v3.publicapi.dto.analysis.SearchMatchingActivationCodesForCwlAnalysisDto["pipelineId"]) (ref. c9cd9090-4ddb-482a-91b5-8471bff0be58)
404 Not Found : ICA_GNRC_001 : Could not find data with ID [fil.35dec404fb37d08d9adf63307] (ref. 91b70c3c-378c-4de2-acc9-794bf18258ec)
400 Bad Request : ICA_EXEC_007 : The specified variableName [DRAGEN] does not exist. Make sure to use an existing variableName (ref. ab296d4e-9060-412c-a4c9-562c63450022)
You can start an analysis either from the dedicated analysis screen or from the pipeline itself.
From Analyses
Navigate to Projects > Your_Project > Flow > Analyses.
Select Start.
Select a single Pipeline.
From Pipelines or Pipeline details
Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.
Select Start Analysis.
Aborting Analyses
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
Rerunning Analyses
Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.
When rerunning an analysis, the user reference will be the original user reference (up to 231 characters), followed by _rerun_yyyy-MM-dd_HHmmss.
When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and will not fill out the parameters, as it cannot guarantee their validity for the new XML. You will need to provide the input data and settings again to rerun the analysis.
Some restrictions apply when trying to rerun an analysis.
Analyses
Rerun
Rerun with modified parameters
To rerun one or more analyses with the same settings:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.
To rerun a single analysis with modified parameters:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun (at the top right).
Lifecycle
Status
Description
Final State
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.
During analysis start, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.
When the underlying storage provider runs out of storage resources, the Status field of the Analysis details will indicate this. There is no need to abort or rerun the analysis.
Analysis steps logs
During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Steps tab is used to view the steps in near real time as they are produced by the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or less, though you can also use the grid layout for those by means of the tile/grid button at the top right of the analysis log tab. The Steps tab also shows which resources were used as compute type in the different main analysis steps (for child steps, these are displayed on the parent step).
There are system processes involved in the lifecycle of all analyses (i.e. downloading inputs, uploading outputs, etc.) and there are processes which are pipeline-specific, such as the processes which execute the pipeline steps. The table below describes the system processes. You can choose to display or hide these system processes with the Show technical steps option.
Process
Description
Additional log entries will show for the processes which execute the steps defined in the pipeline.
Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.
Timestamp
Description
The time between the Start Date and the End Date is used to calculate the duration, and this duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.
Analysis Cost
To see the price of an analysis in iCredits, look at Projects > your_project > Flow > Analyses > your_analysis > Details tab. The pricing section will show you the entitlement bundle, storage detail and price in iCredits once the analysis has succeeded, failed or been aborted.
Log Files
By default, the stdout and stderr files are located in the ica_logs subfolder within the analysis. This location can be changed by selecting a different folder in the current project at the start of the analysis. Do not use a folder which already contains log files, as these will be overwritten.
To set the log file location, you can also use the CreateAnalysisLogs section of the Create Analysis .
If you delete these files, no log information will be available on the analysis details > Steps tab.
You can access the log files from the analysis details (projects > your_project > flow > analysis > your_analysis > details tab)
Log Streaming
Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
Analysis Output Mappings
Currently, only FOLDER type output mappings are supported
By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:
the source path on the local disk of the analysis execution environment, relative to the working folder.
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project.
If the output folder already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis
Example
In this example, 2 analysis output mappings are specified. The analysis writes data during execution in the working directory at paths out/test and out/test2. The data contained in these folders are directed to project with ID 4d350d0f-88d8-4640-886d-5b8a23de7d81 and at paths /output-testing-01/ and /output-testing-02/ respectively, relative to the root of the project data.
The following demonstrates the construction of the request body to start an analysis with the output mappings described above:
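The block below is only a hedged illustration of how these two mappings could appear in the request body; the field names (analysisOutput, sourcePath, type, targetProjectId, targetPath) are assumptions derived from the bullet list above, so check them against the API reference before use.
# Write an illustrative analysisOutput fragment for the two mappings above
cat <<'EOF' > output-mappings.json
"analysisOutput": [
  {
    "sourcePath": "out/test",
    "type": "FOLDER",
    "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
    "targetPath": "/output-testing-01/"
  },
  {
    "sourcePath": "out/test2",
    "type": "FOLDER",
    "targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
    "targetPath": "/output-testing-02/"
  }
]
EOF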
You can jump from the Analysis Details to the individual files and folders by opening the output files tab on the detail view (Projects > your_project > Flow > Analyses > your_analysis > Output files tab > your_output_file) and selecting Open in data.
The Output files section of the analyses will always show the generated outputs, even when they have since been deleted from storage. This is done so you can always see which files were generated during the analysis.
In this case it will no longer be possible to navigate to the actual output files.
analysis output
logs output
Notes
Tags
You can add and remove tags from your analyses.
Navigate to Projects > Your_Project > Flow > Analyses.
Select the analyses whose tags you want to change.
Select Manage > Manage tags.
Both system tags and custom tags exist. User tags are custom tags which you set to help identify and process information, while technical tags are set by the system for processing. Both run-in and run-out tags are set on data to identify which analyses use the data. Connector tags determine data entry methods, and reference data tags identify where data is used as reference data.
Hyperlinking
If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link will be <hostURL>/ica/link/project/<projectUUID>/analysis/<analysisUUID>. Likewise, workflow sessions will use the syntax <hostURL>/ica/link/project/<projectUUID>/workflowSession/<workflowsessionUUID>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.
Restrictions
Input for an analysis is limited to a total of 50,000 files (including multiple copies of the same file). Concurrency limits on analyses prevent resource hogging, which could result in resource starvation for other tenants. Additional analyses are queued and scheduled when currently running analyses complete and free up positions. The theoretical limit is 20 concurrent analyses, but this can be less in practice, depending on a number of external factors.
Troubleshooting
When your analysis fails, open the analysis details view (Projects > your_project > Flow > Analyses > your_analysis) and select display failed steps. This gives you the steps view filtered on those steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step will be displayed.
For pipeline developers: add automatic retrying to the individual steps that fail with error 55 / 56 (provided these steps are idempotent) See for retries.
Exit code 55 indicates analysis failure on economy instances due to an external event such as spot termination. You can retry the analysis.
Exit code 56 indicates analysis failure due to pod disruption and deletion by Kubernetes' Pod Garbage Collector (PodGC) because the node it was running on no longer exists. You can retry the analysis.
Service Connector
ICA provides a Service Connector, which is a small program that runs on your local machine to sync data between the platform's cloud-hosted data store and your local computer or server. The Service Connector securely uploads data or downloads results using TLS 1.2. In order to do this, the Connector makes 2 connections:
A control connection, which the Connector uses to get configuration information from the platform, and to update the platform about its activities
A connection towards the storage node, used to transfer the actual data between your local or network storage and your cloud-based ICA storage.
This Connector runs in the background, and configuration is done in the Illumina Connected Analytics (ICA) platform, where you can add upload and download rules to meet the requirements of the current project and any new projects you may create.
The Service Connector looks at any new files and checks their size. As long as the file size is changing, it knows data is still being written to the file, so the file is not ready for transfer. Only when the file size is stable and no longer changes will it consider the file complete and initiate the transfer. Despite this, it is best practice not to point the Service Connector at active folders which are used as streaming output for other processes, as this can result in incomplete files being transferred when the active processes have extended compute periods during which the file size remains unchanged.
The Service Connector handles integrity checking during file transfer, which requires the calculation of hashes on the data. In addition, transmission speed depends on the available data transfer bandwidth and connection stability. For these reasons, uploading large amounts of data can take considerable time. This can in turn result in temporarily seeing empty folders at the destination location, since these are created at the beginning of the transfer process.
Both the CLI and the service connector require x86 architecture. For ARM-based architecture on Mac or Windows, you need to keep x86 emulation enabled. Linux does not support x86 emulation.
Fill out the fields in the New Connector configuration page.
Run the downloaded .exe file. During the installation, the installer will ask for the initialization key. Fill out the initialization key you see in the platform.
The installer will create an Illumina Service Connector, register it as a Windows service, and start the service. If you wait for about 60 seconds and then refresh the screen in the platform using the refresh button in the top right corner of the page, the connector should display as connected.
You can only install one connector on Windows. If for some reason you need to install a new one, first uninstall the old one; you only need to do this when there is a problem with your existing connector. Upgrading a connector is also possible, and for an upgrade you do not need to uninstall the old one first.
Connector Rules
In the upload and download rules, you define different properties when setting up a connector. A connector can be used by multiple projects and a connector can have multiple upload and download rules. Configuration can be changed anytime. Changes to the configuration will be applied approximately 60 seconds after changes are made in ICA if the connector is already connected. If the connector is not already started when configuration changes are made in ICA, it will take about 60 seconds after the connector is started for the configuration changes to be propagated to the connector. The following are the different properties you can configure when setting up a connector. After adding a rule and installing the connector, you can use the Active checkbox to disable rules.
Below is an example of a new connector setup with an Upload Rule to find all files ending with .tar or .tar.gz located within the local folder C:\Users\username\data\docker-images.
Upload Rules
An upload rule tells the connector which folder on your local disk it needs to watch for new files to upload. The connector contacts the platform every minute to pick up changes to upload rules. To configure upload rules for different projects, first switch into the desired project and select Connectivity. Choose the connector from the list and select Click to add a new upload rule and define the rule. The project field will be automatically filled with the project you are currently within.
Field
Description
Download Rules
When you schedule downloads in the platform, you can choose which connector needs to download the files. That connector needs some way to know how and where it needs to download your files. That’s what a download rule is for. The connector contacts the platform every minute to pick up changes to download rules. The following are the different download rule settings.
Field
Description
Connector Status
You can see the service connector status by the color indicator.
Color
Status
Shared Drives
When you set up your connector for the first time, and your sample files are located on a shared drive, it’s best to create a folder on your local disk, put one of the sample files in there, and do the connector setup with that folder. When this works, try to configure the shared drive.
Transfer to and from a shared drive may be quite slow. That means it can take up to 30 minutes after you configured a shared drive before uploads start. This is due to the integrity check the connector does for each file before it starts uploading. The connector can upload from or download to a shared drive, but there are a few conditions:
The drive needs to be mounted locally. X:\illuminaupload or /Volumes/shareddrive/illuminaupload will work, \\shareddrive\illuminaupload or smb://shareddrive/illuminaupload will not.
The user running the connector must have access to the shared drive without a password being requested.
Update connector to newer version
Illumina might release new versions of the Service Connector, with improvements and/or bug fixes. You can download a new version of the Connector with the Download button on the Connectivity screen in the platform. After you have downloaded the new installer, run it and choose the option ‘Yes, update the existing installation’.
Uninstall a connector
To uninstall the connector, perform one of the following:
Windows and Linux: Run the uninstaller located in the folder where the connector was installed.
Mac: Move the Illumina Service Connector to your Trash folder.
Log files
The Connector has a log file containing technical information about what is happening. When something does not work, the log often contains clues as to why. Interpreting this log file is not always easy, but it can help the support team give a fast answer on what is wrong, so it is suggested to attach it to your email when you have upload or download problems. You can find this log file at the following location:
macOS: /<Installation Directory>/Illumina Service Connector.app/Contents/java/app/logs/BSC.out (default: /Applications/Illumina Service Connector.app/Contents/java/app/logs/BSC.out)
Windows and Linux: /<Installation Directory>/logs/BSC.out
Common issues
Operating system
Issue
Solution
General issues
Solution
Tool Repository
A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.
Create a Tool
Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical Common Workflow Language (CWL) pipelines by any project in the account.
JSON Schema
In the InputForm.json, use the syntax for the individual components you want to include, as listed below. This is a listing of all the currently available components; it is not to be used as-is, but adapted to the inputs you need in your input form. For more information on JSON schema syntax, please see the .
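As a minimal, hedged illustration (the field ids and labels are made up), an InputForm.json with one free-text setting and one required file input could be written as follows; it conforms to the schema shown later in this section:
# Write a minimal InputForm.json (illustrative field ids and labels)
cat <<'EOF' > InputForm.json
{
  "fields": [
    {
      "id": "sample_label",
      "type": "textbox",
      "label": "Sample label",
      "minValues": 1
    },
    {
      "id": "input_fastq",
      "type": "data",
      "label": "Input FASTQ",
      "minValues": 1,
      "dataFilter": {
        "dataType": "file",
        "nameFilter": "fastq"
      }
    }
  ]
}
EOF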
Refresh to see the analysis status. See lifecycle for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.
View the analysis status on the Analyses page. See lifecycle for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.
Analyses with draft pipeline - Rerun: Warn; Rerun with modified parameters: Warn
Analyses with XML configuration change - Rerun: Warn; Rerun with modified parameters: Warn
Update the parameters you want to change.
Select Start Analysis. The analysis will now be executed with the updated parameters.
In Progress - Analysis execution is in progress (not a final state)
Generating outputs - Transferring the analysis results (not a final state)
Aborting - Analysis has been requested to be aborted (not a final state)
Aborted - Analysis has been aborted (final state)
Failed - Analysis has finished with error (final state)
Succeeded - Analysis has finished with success (final state)
the target path relative to the root of the project data to write the outputs.
When the analysis completes the outputs can be seen in the ICA UI, within the folders designated in the payload JSON during pipeline launch (output-testing-01 and output-testing-02).
Edit the user tags, reference data tags (if applicable) and technical tags.
Select Save to confirm the changes.
Analyses using external data - Rerun: Allowed; Rerun with modified parameters: -
Analyses using mount paths on input data - Rerun: Allowed; Rerun with modified parameters: -
Analyses using user-provided input json - Rerun: Allowed; Rerun with modified parameters: -
Analyses using advanced output mappings - Rerun: -
Requested - The request to start the analysis is being processed (not a final state)
Queued - Analysis has been queued (not a final state)
Initializing - Initializing environment and performing validations for the analysis (not a final state)
Preparing Inputs - Downloading inputs for the analysis
Setup Environment - Validate that the analysis execution environment is prepared
Run Monitor - Monitor resource usage for billing and reporting
Prepare Input Data - Download and mount input data to the shared file system
Pipeline Runner - Parent process to execute the pipeline definition
Finalize Output Data - Upload output data
Queue Date - The time when the process is submitted to the process scheduler for execution
{
"$id": "#ica-pipeline-input-form",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ICA Pipeline Input Forms",
"description": "Describes the syntax for defining input setting forms for ICA pipelines",
"type": "object",
"additionalProperties": false,
"properties": {
"fields": {
"description": "The list of setting fields",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
}
},
"required": [
"fields"
],
"definitions": {
"ica_pipeline_input_form_field": {
"$id": "#ica_pipeline_input_form_field",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
"type": "string",
"pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
},
"type": {
"type": "string",
"enum": [
"textbox",
"checkbox",
"radio",
"select",
"number",
"integer",
"data",
"section",
"text",
"fieldgroup"
]
},
"label": {
"type": "string"
},
"minValues": {
"description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
"type": "integer",
"minimum": 0
},
"maxValues": {
"description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
"type": "integer",
"exclusiveMinimum": 0
},
"minMaxValuesMessage": {
"description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
"type": "string"
},
"helpText": {
"type": "string"
},
"placeHolderText": {
"description": "An optional short hint (a word or short phrase) to aid the user when the field has no value.",
"type": "string"
},
"value": {
"description": "The value for the field. Can be an array for multi-value fields. For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. For 'integer' type values the value needs to be between -100000000000000000 and 100000000000000000"
},
"minLength": {
"type": "integer",
"minimum": 0
},
"maxLength": {
"type": "integer",
"exclusiveMinimum": 0
},
"min": {
"description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"max": {
"description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"choices": {
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field_choice"
}
},
"fields": {
"description": "The list of setting sub fields for type fieldgroup",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
},
"dataFilter": {
"description": "For defining the filtering when type is 'data'.",
"type": "object",
"additionalProperties": false,
"properties": {
"nameFilter": {
"description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
"type": "string"
},
"dataFormat": {
"description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
"type": "array",
"contains": {
"type": "string"
}
},
"dataType": {
"description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
"type": "string",
"enum": [
"file",
"directory"
]
}
}
},
"regex": {
"type": "string"
},
"regexErrorMessage": {
"type": "string"
},
"hidden": {
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"emptyValuesAllowed": {
"type": "boolean",
"description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
},
"updateRenderOnChange": {
"type": "boolean",
"description": "When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false."
},
"streamable": {
"type": "boolean",
"description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
}
},
"required": [
"id",
"type"
],
"allOf": [
{
"if": {
"description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"textbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"checkbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"radio"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"select"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
"properties": {
"type": {
"enum": [
"number",
"integer"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"regex",
"regexErrorMessage",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"data"
]
}
},
"required": [
"type"
]
},
"then": {
"required": [
"dataFilter"
],
"propertyNames": {
"not": {
"enum": [
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"max",
"min",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"section",
"text"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"disabled",
"fields",
"updateRenderOnChange",
"classification",
"value",
"minValues",
"maxValues",
"minMaxValuesMessage",
"dataFilter",
"choices",
"regex",
"placeHolderText",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' and 'emptyValuesAllowed' are not allowed",
"properties": {
"type": {
"enum": [
"fieldgroup"
]
}
},
"required": [
"type",
"fields"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min",
"emptyValuesAllowed"
]
}
}
}
}
]
},
"ica_pipeline_input_form_field_choice": {
"$id": "#ica_pipeline_input_form_field_choice",
"type": "object",
"additionalProperties": false,
"properties": {
"value": {
"description": "The value which will be set when selecting this choice. Must be unique over the choices within a field"
},
"text": {
"description": "The display text for this choice, similar as the label of a field. ",
"type": "string"
},
"selected": {
"description": "Optional. When true, this choice value is picked as default selected value. As in selected=true has precedence over an eventual set field 'value'. For clarity it's better however not to use 'selected' but use field 'value' as is used to set default values for the other field types. Only maximum 1 choice may have selected true.",
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"parent": {
"description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
}
},
"required": [
"value",
"text"
]
}
}
}
Name - Enter the name of the connector.
Status - This is automatically updated with the actual status; you do not need to enter anything here.
Debug Information Accessible by Illumina (optional) - Illumina support can request connector debugging information to help diagnose issues. For security reasons, support can only collect this data if the option Debug Information Accessible by Illumina is active. You can choose to either proactively enable this when encountering issues to speed up diagnosis or to only activate it when support requests access. You can at any time revoke access again by deselecting the option.
Description (optional) - Enter any additional information you want to show for this connector.
Mode (required) - Specify if the connector can upload data, download data, both or neither.
Operating system (required) - Select your server or computer operating system.
Add any upload or download rules. See Connector Rules below.
Select Save and download the connector (top right). An initialization key will then be displayed in the platform. Copy this value, as it will be needed during installation.
Launch the installer after the download completes and follow the on-screen prompts to complete the installation, including entering the initialization key copied in the previous step. Do not install the connector in the upload folder as this will result in the connector attempting to upload itself and the associated log files.
Double click the downloaded .dmg file. Double click Illumina Service Connector in the window that opens to start the installer. Run through the installer, and fill out the initialization key when asked for it.
To start the connector once installed or after a reboot, open the app. You can find the app on the location where you installed it. The connector icon will appear in your dock when the app is running.
In the platform, on the Connectivity page, you can check whether your local connector has been connected with the platform. This can take up to 60 seconds after you start your connector locally, and you may need to refresh the Connectivity page using the refresh button in the top right corner of the page to see the latest status of your connector.
The connector app needs to be closed to shut down your computer. You can do this from within your dock.
Installations require Java 11 or later. You can check this with java -version from a command line terminal. With Java installed, you can run the installer from the command line using the command bash illumina_unix_develop.sh.
Depending on whether you have an X server running or not, it will either display a UI or follow a command line installation procedure. You can force a command line installation by adding a -c flag: bash illumina_unix_develop.sh -c.
The connector can be started by running ./illuminaserviceconnector start from the folder in which the connector was installed.
Description
Additional information about the upload rule.
Assign Format
Select which data format tag the uploaded files will receive. This is used for various things like filtering.
Data owner
The owner of the data after upload.
Project
The projects the rule applies to.
The user who runs the Illumina Service Connector process on the Linux machine needs to have read, write and execute permissions on the installation folder.
Default: /usr/local/illumina
Linux
Corrupted installation script
If you get the following error message “gzip: sfx_archive.tar.gz: not in gzip format. I am sorry, but the installer file seems to be corrupted. If you downloaded that file please try it again. If you transfer that file with ftp please make sure that you are using binary mode.” :
• This indicates the installation script file is corrupted. Editing the shell script will cause it to be corrupt. Please re-download the installation script from ICA.
Linux
Unsupported version error in log file
If the log file gives the following error "Unsupported major.minor version 52.0", an unsupported version of java is present. The connector makes use of java version 8 or 11.
Linux
Manage the connector via the CLI
• Connector installation issues:
It may be necessary to first make the connector installation script executable with:
chmod +x illumina_unix_develop.sh
Once it has been made executable, run the installation script with:
bash illumina_unix_develop.sh
It may be necessary to run with sudo depending on user permissions on the system:
sudo bash illumina_unix_develop.sh
If installing on a headless system, use the -c flag to do everything from the command line:
bash illumina_unix_develop.sh -c
• Start the connector with logging directly to the terminal (stdout), in case the log file is not present (likely due to the absence of Java version 8 or 11). From within the installation directory run:
./illuminaserviceconnector run
• Check status of connector. From within the install location run:
./illuminaserviceconnector status
• Stop the connector with:
./illuminaserviceconnector stop
• Restart the connector with:
Name
Name of the upload rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Local folder
The folder path on the local machine where files to be uploaded are stored.
File pattern
Files with filenames that match the string/pattern will be uploaded.
Location
The location the data will be uploaded to.
Project
The project the data will be uploaded to.
Name
Name of the download rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Order of execution
If using multiple download rules, set the order the rules are performed.
Target Local folder
The folder path on the local machine where the files will be downloaded to.
Description
Additional information about the download rule.
Format
The format the files must comply to in order to be scheduled as downloaded.
green
Connected/Active
orange
Pending installation
grey
Installed/Inactive
red
-
Windows
Service connector doesn't connect
First, try restarting your computer. If that doesn’t help, open the Services application (by clicking the Windows icon and then typing services). There should be a service called Illumina Service Connector.
• If it doesn’t have status Running, try starting it (right mouse click -> start)
• If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
OS X
Service connector doesn't connect
Check whether the Connector is running. If it is, there should be an Illumina icon in your Dock.
• If it doesn’t, log out and log back in. An Illumina service connector icon should appear in your dock.
• If it still doesn’t, try starting the Connector manually from the Launchpad menu.
• If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
• If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
Linux
Service connector doesn't connect
Check whether the connector process is running with:
ps aux
Linux
Can’t define java version for connector
Connector gets connected, but uploads won’t start
Create a new empty folder on your local disk, put a small file in there, and configure this folder as upload folder.
• If it works, and your sample files are on a shared drive, have a look at the Shared Drives section.
• If it works, and your sample files are on your local disk, there are a few possibilities:
a) There is an error in how the upload folder name is configured in the platform.
b) For big files, or on slow disks, the connector needs quite some time to start the transfer because it needs to calculate a hash to make sure there are no transfer errors. Wait up to 30 minutes, without changing anything to your Connector configuration.
• If this doesn’t work, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
Upload from shared drive does not work
Follow the guidelines in Shared Drives section.
Inspect the connector BSC.log file for any error messages regarding the folder not being found.
• If there is such a message, there are two options:
a) An issue with the folder name, such as special characters and spaces. As a best practice, use only alphanumeric characters, underscores, dashes and periods.
b) A permissions issue. In this case, ensure the user running the connector has read & write access, without a password being requested, to the network share.
• If there are no messages indicating the folder cannot be found, it may be necessary to wait for some time until the integrity checks have been done. This check can take quite long on slow disks and slow networks.
Data Transfers are slow
Many factors can affect the speed:
• Distance from upload location to storage location
• Quality of the internet connection
• Wired connections (hard lines) are preferred over WiFi
• Restrictions for up- and download by the company or the provider.
These factors can change every time the customer switches location (e.g. working from home).
The upload or download progress % goes down instead of up.
This is normal behavior. Instead of one continuous transmission, data is split into blocks so that whenever transmission issues occur, not all data has to be retried. This does result in dropping back to a lower % of transmission completed when retrying.
Connector setup
The connector makes use of java version 8 or 11. If you run the installer and get the following error “Please define INSTALL4J_JAVA_HOME to point to a suitable JVM.”:
• When downloading the correct Java version from Oracle, there are two variables that can be defined for the script (INSTALL4J_JAVA_HOME_OVERRIDE & INSTALL4J_JAVA_PREFIX), but not INSTALL4J_JAVA_HOME, which is the variable printed in the above error message. Instead, export the variable to your environment before running the installation script.
• Note that Java home should not point to the java executable, but to the jre folder. For example:
export INSTALL4J_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-1.8.0-openjdk-amd64
sh illumina_unix_1_13_2_0_35.sh
Select System Settings > Tool Repository > + Create.
Configure tool settings in the tool properties tabs. See Tool Properties.
Select Save.
The following sections describe the tool properties that can be configured in each tab.
Refer to the CWL CommandLineTool Specification for further explanation about many of the properties described below. Not all features described in the specification are supported.
Details Tab
Field
Entry
Name
The name of the tool.
Description
Free text description for information purposes.
Icon
The icon for the tool.
Status
The release of the tool.
Docker image
The registered Docker image for the tool.
Categories
One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.
Tool Status
The tool release status can be set to Draft, Release Candidate, Released or Deprecated.
The Building and Build Failed options are set by the application and not during configuration.
Status
Description
Draft
Fully editable draft.
Release Candidate
The tool is ready for release. Editing is locked but the tool can be cloned to create a new version.
Released
The tool is released. Tools in this state cannot be edited. Editing is locked but the tool can be cloned to create a new version.
Deprecated
The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool. That is, it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.
General Tab
The General tab provides options to configure the basic command line.
Field
Entry
ID
CWL identifier field
CWL version
The CWL version in use. This field cannot be changed.
Base command
Components of the command. Each argument must be added in a separate line.
Standard out
The name of the file where the Standard Out (STDOUT) stream information will be stored.
Standard error
The name of the file where the Standard Error (STDERR) stream information will be stored.
Requirements
The requirements for triggering an error message. (see below)
The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.
Inline Javascript
The Tool contains a property with a JavaScript expression to resolve its value.
Initial workdir
The workdir can be any of the following types:
String or Expression — A string or JavaScript expression, e.g. $(inputs.InputFASTA)
File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}
Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.
Arguments Tab
The Arguments tab provides options to configure base command parameters that do not require user input.
Tool arguments may be one of two types:
String or Expression — A literal string or JavaScript expression, e.g. --format=bam.
Binding — An argument constructed from the binding of an input parameter.
The following table describes the argument input fields.
Field
Entry
Type
Value
The literal string to be added to the base command.
String or expression
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Binding
Prefix
The string prefix.
Binding
Item separator
The separator that is used between array values.
Binding
Example
Field
Value
Prefix
--output-filename
Value from
$(inputs.inputSAM.nameroot).bam
Input file
/tmp/storage/SRR45678_sorted.sam
Output file
SRR45678_sorted.bam
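With the example values above, the argument rendered on the final command line would be (a sketch, assuming the default behavior where prefix and value are added as separate arguments):
--output-filename SRR45678_sorted.bam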
Inputs Tab
The Inputs tab provides options to define the input files and folders for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
Field
Entry
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Input options
Optional indicates the input is optional.
Multi value indicates there is more than one input file or directory.
Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or directories.
Settings Tab
The Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
Field
Entry
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be Boolean, Int, Long, Float, Double or String.
Default Value
The default value to use if the tool setting is not available.
Input options
Optional indicates the input is optional.
Multi value indicates there can be more than one value for the input.
Outputs Tab
The Outputs tab provides options to define the parameters of output files.
The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
Field
Entry
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Output options
Optional indicates the input is optional.
Multi value indicates there is more than one input file or directory.
Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or folders.
Edit a Tool
As long as your tool is still in draft mode, you can edit it. Once released, you need to clone it to have an editable copy.
From the System Settings > Tool Repository page, select a tool.
Select Edit.
Update Tool Status
From the System Settings > Tool Repository page, select a tool.
Select the Information tab.
From the Status drop-down menu, select a status.
Select Save.
Creating definitions without the wizard
In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition on the right hand side of the screen when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.
Be careful when editing the raw tool definition as this can introduce errors.
A simple example CWL Tool definition is provided below.
After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.
Creating Your First Tool - Tips and Tricks
General Tool - includes your base command and various optional configurations.
The base command is required for your tool to run, e.g. python /path/to/script.py, where python and /path/to/script.py are added as separate lines.
Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.
Initial workdir requirement - Dirent Type
Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or using a Dirent. Defining a script via Dirent allows you to dynamically modify your script without updating your Docker image. To define your Dirent script, define your script name under Entry name (e.g. runner.sh) and the script content under Entry. Then, point your base command to that custom script, e.g. bash runner.sh.
The difference between Settings and Arguments: Settings are exposed at the pipeline level with the ability to get modified at launch, while Arguments are intended to be immutable and hidden from users launching the pipeline.
How to reference your tool inputs and settings throughout the tool definition?
You can either reference your inputs using their position or ID.
Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)
File/Folder inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information please refer to the .
All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
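For example, a single Value from expression could combine a file input and a setting to illustrate the referencing syntax (a sketch using hypothetical IDs InputFile and ThreadCount):
$(inputs.InputFile.path) --threads $(inputs.ThreadCount)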
JSON-Based input forms
Introduction
Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.
Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.
Use + Create to add additional files and Simulate to test your inputForms.
Scripting execution supports cross-field validation of the values, hiding fields, making them required, and similar behavior based on value changes.
inputForm.json
The JSON schema allowing you to define the input parameters. See the page for syntax details.
The inputForm.json file has a size limit of 10 MB and a maximum of 200 fields.
Parameter types
Type
Usage
Parameter Attributes
These attributes can be used to configure all parameter types.
Attribute
Purpose
Tree structure example
"choices" can be used for a single list or for a tree-structured list. See below for an example for how to set up a tree structure.
Experimental Features
Feature
onSubmit.js
The onSubmit.js javascript function receives an input object which holds information about the chosen values of the input form, the pipeline, and the pipeline execution request parameters. This javascript function is not only triggered when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
Input parameters
Value
Meaning
Return values (taken from the response object)
Value
Meaning
AnalysisError
This is the object used for representing validation errors.
Value
Meaning
onRender.js
Receives an input object which contains information about the current state of the input form, the chosen values, and the field value change that triggered the onRender call. It also contains pipeline information. Changed objects are present in the onRender return value object. Any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as the changed field.
Input Parameters
Return values (taken from the response object)
Value
Meaning
RenderMessage
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
Value
Meaning
API Beginner Guide
API Basics
Any operation from the ICA graphical user interface can also be performed with the API.
The following are some basic examples of how to use the API. These examples are based on using Python as the programming language. For other languages, please see their native documentation on API usage.
Prerequisites
An installed copy of Python. (https://www.python.org/)
The package installer for python (pip) (https://pip.pypa.io/)
Having the python requests library installed (pip install requests)
Authenticating
One of the easiest authentication methods is by means of API keys. To generate an API key, refer to the section. This key is then used in your Python code to authenticate the API calls. It is best practice to regularly update your API keys.
API keys are valid for a single user, so any information you request is for the user to whom the key belongs. For this reason, it is best practice to create a dedicated API user so you can manage the access rights for the API by managing that user's rights.
API Reference
There is a dedicated where you can enter your API key and try out the different API commands and get an overview of the available parameters.
Converting curl to Python
The examples on the page use curl (Client URL) while Python uses Python requests. There are a number of online tools to automatically convert from curl to python.
To get the curl command,
Look up the endpoint you want to use on the API reference page.
Select Try it out.
Enter the necessary parameters.
Never paste your API authentication key into online tools when performing curl conversion as this poses a significant security risk.
In the most basic form, the curl command
curl my.curlcommand.com
becomes
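(a minimal sketch using the Python requests library)
import requests
response = requests.get('http://my.curlcommand.com')
print(response.status_code)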
You will see the following options in the curl commands on the API reference page.
-H means header.
-X specifies the request method (such as GET or POST); the method string is passed as-is, without interpretation.
A curl command using these options becomes a requests call with the method and headers made explicit, for example:
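The curl command below is an illustrative sketch (the X-API-Key header and eventCodes endpoint are taken from the examples further down):
curl -X GET -H 'X-API-Key: <your_generated_API_key>' 'https://ica.illumina.com/ica/rest/api/eventCodes'
becomes
import requests
headers = {
    'X-API-Key': '<your_generated_API_key>',
}
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
print(response.status_code)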
Simple API Examples
Request a list of event codes
This is a straightforward request without parameters which can be used to verify your connection.
In this example, the API key is part of your API call, which means you must update all API calls when the key changes. A better practice is to put this API key in the headers so it is easier to maintain. The full code then becomes
Pretty-printing the result
The list of application codes was returned as a single line, which makes it difficult to read, so let's pretty-print the result.
Retrieving a list of projects
Now that we are able to retrieve information with the API, we can use it for a more practical request like retrieving a list of projects. This API request can also take parameters.
Retrieve all projects
First, we pass the request without parameters to retrieve all projects.
Single parameter
The easiest way to pass a parameter is by appending it to the API request. The following API request will list the projects with a filter on CAT as user tag.
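A minimal sketch (the userTags query parameter name is an assumption; confirm the exact parameter on the API reference page):
# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
    'X-API-Key': '<your_generated_API_key>',
}
# Filter the project list on the user tag CAT by appending a query parameter (assumed to be named userTags).
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT', headers=headers)
print("Response status code: ", response.status_code)
print(json.dumps(response.json(), indent=3, sort_keys=True))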
The dataId of the files and folders which you want to copy (their syntax is fil.hexadecimal_identifier and fol.hexadecimal_identifier). You can select a file or folder in the GUI to see its Id (Projects > your_project > Data > your_file > Data details > Id) or you can use the /api/projects/{projectId}/data endpoint.
The destination project to which you want to copy the data.
The full code will then be as follows:
Combined API Example - Running a Pipeline
Now that we have done individual API requests, we can combine them and use the output of one request as input for the next request. When you want to run a pipeline, you need a number of input parameters. In order to obtain these parameters, you need to make a number of API calls first and use the returned results as part of your request to run the pipeline. In the examples below, we will build up the requests one by one so you can run them individually first to see how they work. These examples only follow the happy path to keep them as simple as possible. If you program them for a full project, remember to add error handling. You can also use the GUI to get all the parameters or write them down after performing the individual API calls in this section. Then, you can build your final API call with those values fixed.
Initialization
This block must be added at the beginning of your code
Look for a project in the list of Projects
Previously, we already requested a list of all projects, now we add a search parameter to look for a project called MyProject. (Replace MyProject with the name of the project you want to look for).
Now that we have found our project by name, we need to get the unique project id, which we will use in the combined requests. To get the id, we add the following line to the end of the code above.
Syntax ['items'][0]['id'] means we look for the items list, 0 means we take the first entry (as we presume our filter was accurate enough to only return the correct result and we don't have duplicate project names) and id means we take the data from the id field. Similarly, you can build other expressions to give you the data you want to see, such as ['items'][0]['urn'] to get the urn or ['items'][0]['tags']['userTags'] to get the list of user tags.
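For example, reusing the response from the project search above (a small sketch of the expressions just described):
# Print the URN and the user tags of the first matching project.
print(My_API_Data['items'][0]['urn'])
print(My_API_Data['items'][0]['tags']['userTags'])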
Once we have the identifier we need, we add it to a variable which we will call Project_Identifier in our examples.
Retrieve the Pipelines of your Project
Once we have the identifier of our project, we can fill it out in the request to list the pipelines which are part of our project.
This will give us all the available pipelines for that project. As we will only want to run a single pipeline, we can search for our pipeline, which in this example will be the basic_pipeline. Unfortunately, this API call has no direct search parameter, so when we get the list of pipelines, we will look for the id and store that in a variable which we will call Pipeline_Identifier in our examples as follows:
Find which parameters the Pipeline needs.
Once we know the project identifier and the pipeline identifier, we can create an API request to retrieve the list of input parameters which are needed for the pipeline. We will consider a simple pipeline which only needs a file as input. If your pipeline has more input parameters, you will need to set those as well.
Find the Storage Size to use for the analysis.
Here we will look for the id of the extra small storage size. This is done with the 0 in the My_API_Data['items'][0]['id']
Find the files to use as input for your pipeline.
Now we will look for a file "testExample" which we want to use as input and store the file id.
Start the Pipeline.
Finally, we can run the analysis with parameters filled out.
Dirent — A script in the working directory. The Entry name field specifies the file name.
Tool version
The version of the tool specified by the end user. Could be any string.
Release version
The version number of the tool.
Version comment
A description of changes in the updated version.
Links
External reference links.
Documentation
The Documentation field provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.
Hints
The requirements for triggering a warning message. (see below)
Value from
The source string or JavaScript expression.
Binding
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Binding
Shell quote
The setting to quote the Value from field on the command line. True indicates the value is quoted so it is not interpreted by the shell. False indicates the value is passed to the shell without quoting.
Binding
Format
The input file format.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Load contents
Automatically loads the file contents: the system extracts up to the first 64 KiB of text from the file and populates the contents field.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value is quoted so it is not interpreted by the shell. False indicates the value is passed to the shell without quoting.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value is quoted so it is not interpreted by the shell. False indicates the value is passed to the shell without quoting.
Format
The input file format.
Globs
The pattern for searching file names.
Load contents
Automatically loads the file contents: the system extracts up to the first 64 KiB of text from the file and populates the contents field.
Output eval
Evaluate an expression to generate the output value.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
text
To display informational messages. No values are to be assigned to these fields.
fieldgroup
Can contain parameters or other groups. Allows you to have repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. So if you want to have the same elements multiple times in your form, combine them into a fieldgroup.
Does not support the emptyValuesAllowed attribute.
value
The value of the parameter. Can be considered default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Minimal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
max
Maximal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
choices
A list of choices, each with a "value", "text" (the display label), "selected" (only one true supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on values of other fields. Parent and value must be unique; you cannot use the same value for both.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. Use nameFilter for matching the name of the file, dataFormat for file format and dataType for selecting between files and directories. Tip: To see the data formats, open the file details in ICA and look at the Format on the data details. You can expand the dropdown list to see the syntax.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message shown when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this, as the default message will show the regex, which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is 1 or not set and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
dropValueWhenDisabled
When this is present and true and the field has disabled set to true, the value will be omitted during submit handling (in the onSubmit result).
analysisSettings
The input form JSON as saved in the pipeline, i.e. the original JSON without any changes.
currentAnalysisSettings
The current input form JSON as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is 'Initial' or when the analysis is created through the CLI/API.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
textbox
Corresponds to stringType in xml.
checkbox
A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).
radio
A radio button group to select one from a list of choices. The values to choose from must be unique.
select
A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.
number
The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).
integer
Corresponds to java Integer.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimal amount of values that needs to be present. Default when not set is 0. Set to >=1 to make the field required.
maxValues
The maximal amount of values that need to be present. Default when not set is 1.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
Streamable inputs
Adding "streamable":true to an input field of type "data" makes it a streamable input.
settings
The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
settings
The value of the setting fields. This allows modifying the values, applying defaults, or taking info from the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.
validationErrors
A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI when viewing the analysis.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
context
"Initial"/"FieldChanged"/"Edited".
Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or when editing the form during reruns.
changedFieldId
The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
validation errors and validation warnings can use 'storageSize' as fieldId to let an error appear on the storage size field. 'storageSize' is the value of the changedFieldId when the user alters the chosen storage size.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The 0-starting index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
# The requests library will allow you to make HTTP requests.
import requests
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Display the data from the request.
print(response.json())
# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))
# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))
# The requests library will allow you to make HTTP requests.
import requests
# Fill out your generated API key.
headers = {
'accept': 'application/vnd.illumina.v3+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v3+json',
}
# Enter the files and folders, the destination folder, and the action to perform when the destination data already exists.
data = '{"items": [{"dataId": "fil.0123456789abcdef"}, {"dataId": "fil.735040537abcdef"}], "destinationFolderId": "fol.1234567890abcdef", "copyUserTags": true,"copyTechnicalTags": true,"copyInstrumentInfo": true,"actionOnExist": "SKIP"}'
# Replace <Project_Identifier> with the actual identifier of the destination project.
response = requests.post(
'https://ica.illumina.com/ica/rest/api/projects/<Project_Identifier>/dataCopyBatch',
headers=headers,
data=data,
)
# Display the response status code.
print("Response status code: ", response.status_code)
# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))
print(My_API_Data['items'][0]['id'])
# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']
# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)
# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)
# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)
# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)
# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Replace <your_generated_API_key> with your actual generated API key here.
Postheaders = {
'accept': 'application/vnd.illumina.v4+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v4+json',
}
# Find project
# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Project response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']
print("Project_Identifier: ",Project_Identifier)
# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)
# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline.
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)
# Get Storage Size
# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)
# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)
# Finally, we can run the analysis with parameters filled out.
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","tags":{"technicalTags":[],"userTags":[],"referenceTags":[]},"analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
print (data)
response = requests.post('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,)
print("Post Response status code: ", response.status_code)
./illuminaserviceconnector restart
Notifications
Notifications (Projects > your_project > Project Settings > Notifications) are events to which you can subscribe. When they are triggered, they deliver a message to an external target system such as email, Amazon SQS or SNS, or an HTTP POST request. The following table describes available system events to which you can subscribe:
Description
Code
Details
Payload
Analysis failure
ICA_EXEC_001
Emitted when an analysis fails
Analysis
Analysis success
When you subscribe to overlapping event codes such as ICA_EXEC_002 (analysis success) and ICA_EXEC_028 (analysis status change) you will get both notifications when analysis success occurs.
When integrating with external systems, it is advised not to rely solely on ICA notifications, but to also add a polling system to check the status of long-running tasks, for example verifying the status of long-running (>24h) analyses at a 12 hour interval.
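A minimal polling sketch (the GET /api/projects/{projectId}/analyses/{analysisId} endpoint, its status field, and the placeholder identifiers are assumptions to be confirmed on the API reference page):
import time
import requests
# Replace <your_generated_API_key>, <Project_Identifier> and <Analysis_Identifier> with your own values.
headers = {'X-API-Key': '<your_generated_API_key>'}
project_id = '<Project_Identifier>'
analysis_id = '<Analysis_Identifier>'
# Final states as listed for the analysis payload versions below.
final_states = {'SUCCEEDED', 'FAILED', 'FAILED_FINAL', 'ABORTED'}
while True:
    response = requests.get(
        'https://ica.illumina.com/ica/rest/api/projects/' + project_id + '/analyses/' + analysis_id,
        headers=headers,
    )
    status = response.json().get('status')
    print('Current analysis status:', status)
    if status in final_states:
        break
    # Poll at a 12 hour interval, as suggested above.
    time.sleep(12 * 60 * 60)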
Delivery Targets
Event notifications can be delivered to the following delivery targets:
Delivery Target
Description
Value
Subscribing to Notifications
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu and fill out the requested fields; the fields will change depending on the selected delivery target.
Once created, you can disable, enable or delete the notification subscriptions at Projects > your_project > Project Settings > Notifications.
Subscriptions can only be deleted if there are no failed or pending notifications, so if the delete button is not available, look at the failed notification details of the subscription (Projects > your_project > Project Settings > Notifications > your_notification > Delivery failed tab). From there, either reprocess or delete the failed notification so that you can delete the notification subscription.
Amazon Resource Policy Settings
In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
Substitute the variables in the example above according to the table below.
Variable
Description
See examples for setting policies in and .
Amazon SNS Topic
To create a subscription to deliver events to an Amazon SNS topic, you can use either the GUI or API endpoints.
GUI
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu, insert optional filter, select the channel type (SNS), and then insert the ARN from the target SNS topic and the AWS region.
API
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
Amazon SQS Queue
To create a subscription to deliver events to an Amazon SQS queue, you can use either GUI or API endpoints.
GUI
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event.
Select an event from the dropdown menu
Choose SQS as the way to receive the notifications and enter your SQS URL.
Depending on the event, you can choose a payload version. Not all payload versions are applicable for all events and targets, so the system will filter the options out for you.
API
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
Messages delivered to AWS SQS contain the following event body attributes:
Attribute
Description
The following example is a Data Updated event payload sent to an AWS SQS delivery target (condensed for readability):
Filtering
Notification subscriptions will trigger for all events matching the configured event type. A filter may be configured on a subscription to limit the matching strategy to only those event payloads which match the filter.
The filter expressions leverage the library for describing the matching pattern to be applied to event payloads. The filter must be in the format [?(<expression>)].
Examples
The Analysis Success event delivers a JSON event payload matching the Analysis data model (as output from the API to ).
The below examples demonstrate various filters operating on the above event payload:
Filter on a pipeline, with a code that starts with ‘Copy’. You’ll need a regex expression for this:
[?($.pipeline.code =~ /Copy.*/)]
Filter on status (note that the Analysis success event is only emitted when the analysis is successful):
[?($.status == 'SUCCEEDED')]
Examples for other events
Filtering ICA_DATA_104 on owning project name. The top level keys on which you can filter are under the payload key, so payload is not included in this filter expression.
Custom Events
Custom events let you trigger notification subscriptions using events that are not part of the system-defined event types. When creating a custom subscription, a custom event code can be specified for use within the project. Events can then be sent to the specified event code using a POST API call with the request body specifying the event payload.
API
Custom events can be defined using the API. To create a custom event for your project, follow the steps below:
Create a new custom event POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
a. Your custom event code must be 1-20 characters long, e.g. 'ICA_CUSTOM_123'.
b. This event code will be used to reference that custom event type.
Create a new notification channel POST {ICA_URL}/ica/rest/api/notificationChannels
a. If there already is a notification channel with the desired configuration within the same project, you can get the existing channel ID using the call GET {ICA_URL}/ica/rest/api/notificationChannels.
Create a notification subscription POST {ICA_URL}/ica/rest/api/projects/{projectId}/customNotificationSubscriptions.
a. Use the event code created in step 1.
b. Use the channel ID from step 2.
GUI
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > Custom event.
Once the steps above have been completed successfully, the call from the first step POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents could be reused with the same event code to continue sending events through the same channel and subscription.
Below is a sample Python function used inside an ICA pipeline to post custom events for each failed metric:
Finally, you can enter a filter expression so that only the events that are relevant for you are received. Only events matching the expression will be delivered.
Both payload version V3 and V4 guarantee the presence of the final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED), but not every intermediate state is guaranteed; the intermediate states depend on the flow:
V3 can have status REQUESTED - IN_PROGRESS - SUCCEEDED
V4 can have status REQUESTED - QUEUED - INITIALIZING - PREPARING_INPUTS - IN_PROGRESS - GENERATING_OUTPUTS - SUCCEEDED
Filter on a pipeline having the technical tag "Demo":
[?('Demo' in $.pipeline.pipelineTags.technicalTags)]
Combination of multiple expressions using &&. It's best practice to surround each individual expression with parentheses, for example: [?(($.pipeline.code =~ /Copy.*/) && ($.status == 'SUCCEEDED'))]
Event | Code | Description | Payload
Analysis success | ICA_EXEC_002 | Emitted when an analysis succeeds | Analysis
Analysis aborted | ICA_EXEC_027 | Emitted when an analysis is aborted either by the system or the user | Analysis
Analysis status change | ICA_EXEC_028 | Emitted when a state transition on an analysis occurs | Analysis
Base Job failure | ICA_BASE_001 | Emitted when a Base job fails | BaseJob
Base Job success | ICA_BASE_002 | Emitted when a Base job succeeds | BaseJob
Data transfer success | ICA_DATA_002 | Emitted when a data transfer is marked as Succeeded | DataTransfer
Data transfer stalled | ICA_DATA_025 | Emitted when a data transfer hasn't progressed in the past 2 minutes | DataTransfer
Data <action> | ICA_DATA_100 | Subscribing to this serves as a wildcard for all project data status changes and covers those changes that have no separate code. This does not include DataTransfer events or changes that trigger no data status changes, such as adding tags to data. | ProjectData
Data linked to project | ICA_DATA_104 | Emitted when a file is linked to a project | ProjectData
Data can not be created in non-indexed folder | ICA_DATA_105 | Emitted when attempting to create data in a non-indexed folder | ProjectData
Data deleted | ICA_DATA_106 | Emitted when data is deleted | ProjectData
Data created | ICA_DATA_107 | Emitted when data is created | ProjectData
Data uploaded | ICA_DATA_108 | Emitted when data is uploaded | ProjectData
Data updated | ICA_DATA_109 | Emitted when data is updated | ProjectData
Data archived | ICA_DATA_110 | Emitted when data is archived | ProjectData
Data unarchived | ICA_DATA_114 | Emitted when data is unarchived | ProjectData
Job status changed | ICA_JOB_001 | Emitted when a job changes status (INITIALIZED, WAITING_FOR_RESOURCES, RUNNING, STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED) | JobId
Sample completed | ICA_SMP_002 | Emitted when a sample is marked as completed | ProjectSample
Sample linked to a project | ICA_SMP_003 | Emitted when a sample is linked to a project | ProjectSample
Workflow session start | ICA_WFS_001 | Emitted when a workflow is started | WorkflowSession
Workflow session failure | ICA_WFS_002 | Emitted when a workflow fails | WorkflowSession
Workflow session success | ICA_WFS_003 | Emitted when a workflow succeeds | WorkflowSession
Workflow session aborted | ICA_WFS_004 | Emitted when a workflow is aborted | WorkflowSession
import json
import requests

# ICA_HOST, PROJECT_ID and ICA_API_KEY are expected to be defined elsewhere in the pipeline.
def post_custom_event(metric_name: str, metric_value: str, threshold: str, sample_name: str):
    api_url = f"{ICA_HOST}/api/projects/{PROJECT_ID}/customEvents"
    headers = {
        "Content-Type": "application/vnd.illumina.v3+json",
        "accept": "application/vnd.illumina.v3+json",
        "X-API-Key": ICA_API_KEY,
    }
    # The event code must match the custom event code created for the project.
    content = {
        "code": "ICA_CUSTOM_123",
        "content": {"metric_name": metric_name, "metric_value": metric_value,
                    "threshold": threshold, "sample_name": sample_name},
    }
    response = requests.post(api_url, data=json.dumps(content), headers=headers)
    if response.status_code != 204:
        print(f"[EVENT-ERROR] Could not post metric failure event for the metric {metric_name} (sample {sample_name}).")
Bench Command Line Interface
Command Index
The following is a list of available Bench CLI commands and their options.
Please refer to the examples from the Illumina website for more details.
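As a small illustration of how these commands can be scripted from within a workspace, the sketch below scales a compute pool by invoking workspace-ctl from Python. The flags used are the ones documented under scale-pool below; the cluster and pool IDs are hypothetical placeholders.

# Sketch: scale a Bench compute pool by shelling out to workspace-ctl.
import subprocess

def scale_pool(cluster_id: str, pool_id: str, member_count: int) -> None:
    subprocess.run(
        [
            "workspace-ctl", "compute", "scale-pool",
            "--cluster-id", cluster_id,
            "--pool-id", pool_id,
            "--pool-member-count", str(member_count),
        ],
        check=True,
    )

scale_pool("my-cluster-id", "my-pool-id", 3)  # placeholder IDs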
workspace-ctl
workspace-ctl completion
workspace-ctl compute
workspace-ctl compute get-cluster-details
workspace-ctl compute get-logs
workspace-ctl compute get-pools
workspace-ctl compute scale-pool
workspace-ctl data
workspace-ctl data create-mount
workspace-ctl data delete-mount
workspace-ctl data get-mounts
workspace-ctl help
workspace-ctl help completion
workspace-ctl help compute
workspace-ctl help compute get-cluster-details
workspace-ctl help compute get-logs
workspace-ctl help compute get-pools
workspace-ctl help compute scale-pool
workspace-ctl help data
workspace-ctl help data create-mount
workspace-ctl help data delete-mount
workspace-ctl help data get-mounts
workspace-ctl help help
workspace-ctl help software
workspace-ctl help software get-server-metadata
workspace-ctl help software get-software-settings
workspace-ctl help workspace
workspace-ctl help workspace get-cluster-settings
workspace-ctl help workspace get-connection-details
workspace-ctl help workspace get-workspace-settings
workspace-ctl software
workspace-ctl software get-server-metadata
workspace-ctl software get-software-settings
workspace-ctl workspace
workspace-ctl workspace get-cluster-settings
workspace-ctl workspace get-connection-details
workspace-ctl workspace get-workspace-settings
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.
cmd execute error: accepts 1 arg(s), received 0
Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl compute [command] --help" for more information about a command.
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl data [command] --help" for more information about a command.
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree Display commands as a tree
--help-verbose Extended help topics and options
--mode string Enum:["read-only","read-write"]. Mount mode i.e. read-only, read-write
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.
To load completions:
Bash:
$ source <(yourprogram completion bash)
# To load completions for each session, execute once:
# Linux:
$ yourprogram completion bash > /etc/bash_completion.d/yourprogram
# macOS:
$ yourprogram completion bash > /usr/local/etc/bash_completion.d/yourprogram
Zsh:
# If shell completion is not already enabled in your environment,
# you will need to enable it. You can execute the following once:
$ echo "autoload -U compinit; compinit" >> ~/.zshrc
# To load completions for each session, execute once:
$ yourprogram completion zsh > "${fpath[1]}/_yourprogram"
# You will need to start a new shell for this setup to take effect.
fish:
$ yourprogram completion fish | source
# To load completions for each session, execute once:
$ yourprogram completion fish > ~/.config/fish/completions/yourprogram.fish
PowerShell:
PS> yourprogram completion powershell | Out-String | Invoke-Expression
# To load completions for every new session, run:
PS> yourprogram completion powershell > yourprogram.ps1
# and source this file from your PowerShell profile.
Usage:
workspace-ctl completion [bash|zsh|fish|powershell]
Flags:
-h, --help help for completion
Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Use "workspace-ctl compute [command] --help" for more information about a command.
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Use "workspace-ctl data [command] --help" for more information about a command.
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree
--help-verbose
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
Help provides help for any command in the application.
Simply type workspace-ctl help [path to command] for full details.
Usage:
workspace-ctl help [command] [flags]
Flags:
-h, --help help for help
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Use "workspace-ctl software [command] --help" for more information about a command.
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Use "workspace-ctl workspace [command] --help" for more information about a command.
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl software [command] --help" for more information about a command.
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl workspace [command] --help" for more information about a command.
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Connect AWS S3 Bucket
You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.
These instructions utilize the AWS CLI. Follow the AWS CLI documentation for instructions to download and install.
When configuring a new project in ICA to use a preconfigured S3 bucket, create a folder on your S3 bucket in the AWS console. This folder will be connected to ICA as a prefix.
Failure to create a folder will result in the root folder of your S3 bucket being assigned which will block your S3 bucket from being used for other ICA projects with the error "Conflict while updating file/folder. Please try again later."
Because of how Amazon S3 represents folders and because it does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.
When creating an empty folder in S3, it will not be visible in ICA.
Prerequisites
The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:
ICA Project Region | AWS Region
Australia | ap-southeast-2
Canada | ca-central-1
Germany | eu-central-1
India | ap-south-1
Indonesia | ap-southeast-3
Israel | il-central-1
Japan | ap-northeast-1
Singapore | ap-southeast-1
South Korea* | ap-northeast-2
UK | eu-west-2
United Arab Emirates | me-central-1
United States | us-east-1
(*) BSSH is not currently deployed on the South Korea instance, resulting in limited functionality in this region with regard to sequencer integration.
You can use unversioned, versioned, and suspended buckets as your own S3 storage.
If you connect buckets with object versioning, the data in ICA is automatically synced with the data in the object store. When an object is deleted without specifying a particular version, a delete marker is created on the object store to indicate that the object has been deleted. ICA reflects the object state by deleting the record from the database. No further action on your side is needed to sync.
You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are provided separately.
Configuration
1 - Configure Bucket CORS Permission
ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the AWS CORS documentation (expand the "Using the S3 console" section) for instructions on enabling CORS via the AWS Management Console.
In the cross-origin resource sharing (CORS) section, enter the CORS configuration required for ICA.
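The exact CORS configuration is not reproduced here. As an illustrative sketch only, an S3 CORS rule has the shape below; the allowed origin is a placeholder and must be replaced with the ICA web application origin(s) for your region, which you should confirm before applying.

# Illustrative S3 CORS rule only; AllowedOrigins is a placeholder.
import boto3

cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
            "AllowedOrigins": ["https://example.illumina.com"],  # placeholder origin
            "ExposeHeaders": ["ETag"],
        }
    ]
}
boto3.client("s3").put_bucket_cors(
    Bucket="YOUR_BUCKET_NAME",
    CORSConfiguration=cors_configuration,
)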
2 - Create Data Access Permission - AWS IAM Policy
ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.
Permissions
Refer to the documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:
Paste the JSON policy document below. Note that the example provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
On versioned or suspended buckets, paste the JSON policy document below. Note that the example provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
(Optional) Set policy name to "illumina-ica-admin-policy"
To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document) leads to the path where you saved the file:
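If you prefer scripting to the raw AWS CLI, the same step can be performed with boto3. This is a sketch, not the documented command, and assumes the policy JSON has been saved to illumina-ica-admin-policy.json as described above.

# Sketch: create the IAM policy with boto3 instead of the AWS CLI.
import boto3

iam = boto3.client("iam")
with open("illumina-ica-admin-policy.json") as f:
    policy_document = f.read()

response = iam.create_policy(
    PolicyName="illumina-ica-admin-policy",
    PolicyDocument=policy_document,
)
policy_arn = response["Policy"]["Arn"]  # needed when attaching the policy to the IAM user
print(policy_arn)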
3 - Create AWS IAM User
An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.
Refer to the documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:
(optional) Set user name to "illumina_ica_admin"
Select the Programmatic access option for the type of access
Select Attach existing policies directly when setting the permissions, and choose the policy created in the previous step.
(Optional) Retrieve the Access Key ID and Secret Access Key by choosing Download .csv.
To create the IAM user and attach the policy via the AWS CLI, enter the following command (AWS IAM users are global resources and do not require a region to be specified). This command creates an IAM user illumina_ica_admin, retrieves your AWS account number, and then attaches the policy to the user.
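As an alternative sketch using boto3 rather than the AWS CLI command (which is not reproduced above), the user can be created and the policy attached as follows; the account ID is looked up via STS to build the policy ARN from the previous step.

# Sketch: create the IAM user and attach the policy with boto3.
import boto3

iam = boto3.client("iam")
account_id = boto3.client("sts").get_caller_identity()["Account"]

iam.create_user(UserName="illumina_ica_admin")
iam.attach_user_policy(
    UserName="illumina_ica_admin",
    PolicyArn=f"arn:aws:iam::{account_id}:policy/illumina-ica-admin-policy",
)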
4 - Create AWS Access Key
If the Access Key information was already retrieved when creating the IAM user (for example by downloading the .csv file), skip this step.
Refer to the AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.
Use the command below to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.
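The equivalent step with boto3 is sketched below (the documented AWS CLI command is not reproduced here); remember that the SecretAccessKey is only returned once.

# Sketch: create an access key for the IAM user with boto3.
import boto3

iam = boto3.client("iam")
key = iam.create_access_key(UserName="illumina_ica_admin")["AccessKey"]
access_key_id = key["AccessKeyId"]
secret_access_key = key["SecretAccessKey"]  # store securely; it cannot be retrieved later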
The AccessKeyId and SecretAccessKey values will be provided to ICA in the next step.
5 - S3 Bucket Policy
Connecting your S3 bucket to ICA does not require any additional bucket policies.
What if you need a bucket policy for use cases beyond ICA?
The bucket policy must then support the essential permissions needed by ICA without inadvertently restricting its functionality.
Be sure to replace the following fields:
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
YOUR_ACCOUNT_ID: Replace this field with your account ID number.
YOUR_IAM_USER: Replace this field with the name of your IAM user created for ICA.
In this example, restriction is enabled on the bucket policy to prevent any kind of access to the bucket. However, an exception rule is added for the IAM user that ICA is using to connect to the S3 bucket. The exception rule allows ICA to perform the S3 action permissions necessary for ICA functionality.
Additionally, the exception rule is applied to the STS federated user session principal associated with ICA. Since ICA leverages AWS STS to provide temporary credentials that allow users to perform actions on the S3 bucket, it is crucial to include these STS federated user session principals in your policy's whitelist. Failing to do so could result in 403 Forbidden errors when users attempt to interact with the bucket's objects using the provided temporary credentials.
6 - Block Public Access to S3 bucket (optional)
By default, public access to the S3 bucket is allowed. For increased security, it is advised to block public access with the following command:
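The command itself is not reproduced above. As a sketch, the same setting can be applied with boto3 (the AWS CLI equivalent is aws s3api put-public-access-block).

# Sketch: block all public access on the bucket used for ICA.
import boto3

boto3.client("s3").put_public_access_block(
    Bucket="YOUR_BUCKET_NAME",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)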
To block public access to S3 buckets at the account level, you can use the AWS Console.
7 - Create ICA Storage Credential
To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Credentials > Create > Storage Credential to create a new storage credential.
Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.
With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions for creating a storage configuration for details.
The key prefix is mandatory in your storage credentials if you created a folder on your S3 bucket as recommended above.
8 - Enabling Cross Account Access for Copy and Move Operations
ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the S3 bucket policy:
Be sure to replace the following fields:
ASSUME_ROLE_ARN: Replace this field with the ARN of the cross-account role you want to give permission to. This ARN is specified in the Principal element of the policy. Refer to the table below to determine which region-specific Role ARN should be used.
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
Region | Role ARN
Australia (AU) | arn:aws:iam::079623148045:role/ica_aps2_crossacct
Canada (CA) | arn:aws:iam::079623148045:role/ica_cac1_crossacct
Germany (EU) | arn:aws:iam::079623148045:role/ica_euc1_crossacct
India (IN) | arn:aws:iam::079623148045:role/ica_aps3_crossacct
Indonesia (ID) | arn:aws:iam::079623148045:role/ica_aps4_crossacct
Israel (IL) | arn:aws:iam::079623148045:role/ica_ilc1_crossacct
Japan (JP) | arn:aws:iam::079623148045:role/ica_apn1_crossacct
Singapore (SG) | arn:aws:iam::079623148045:role/ica_aps1_crossacct
South Korea (KR) | arn:aws:iam::079623148045:role/ica_apn2_crossacct
UK (GB) | arn:aws:iam::079623148045:role/ica_euw2_crossacct
United Arab Emirates (AE) | arn:aws:iam::079623148045:role/ica_mec1_crossacct
United States (US) | arn:aws:iam::079623148045:role/ica_use1_crossacct
Troubleshooting
Common Issues
The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration
Error Type | Error Message | Description/Fix
Conflict | Found conflicting storage container notifications for {prefix}{eventTypeMsg} | See Conflicting bucket notifications below.
Conflict | Found conflicting storage container notifications with overlapping prefixes{prefixMsg}{eventTypeMsg} | See Conflicting bucket notifications below.
Customer Container Notification Exists | Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification | See Conflicting bucket notifications below.
Invalid Access Key ID | Failed to update bucket policy: The AWS Access Key Id you provided does not exist in our records. | Check the status of the AWS Access Key ID in the console. If not active, activate it. If missing, create it.
Invalid Parameter | Missing credentials for storage container | Check the storage credential. AccessKeyId and/or SecretAccessKey is not set.
Invalid Parameter | Missing bucket name for storage container | Bucket name has not been set for the storage configuration.
Invalid Parameter | The storage container name has invalid characters | Storage container name can only contain lowercase letters, numbers, hyphens, and periods.
Invalid Parameter | Storage Container '{storageContainer}' does not exist | Update storage configuration container to a valid S3 bucket.
Invalid Parameter | Invalid parameters for volume configuration: {message} |
Invalid Storage Container Location | Storage container must be located in the {region} region | Update storage configuration region to match storage container region.
Invalid Storage Container Location | Storage container must be located in one of the following regions: {regions} | Update storage configuration region to match storage container region.
Missing Configuration | Missing queue name for storage container notification |
Missing Configuration | Missing system topic name for storage container notification |
Missing Configuration | Missing lambda ARN for storage container notification |
Missing Configuration | Missing subscription name for storage container notification |
Missing Storage Account Settings | The storage account '{storageAccountName}' needs HNS (Hierarchical Namespace) enabled. |
Missing Storage Container Settings | Missing settings for storage container | Wait 15 minutes for the storage to become available in ICA.
Access Forbidden | Access forbidden: {message} | Usually caused by a lack of permissions. Review the IAM policy, bucket policy, and ACLs for the required permissions.
Unsupported principal | Unsupported principal: The policy type ${policy_type} does not support the Principal element. Remove the Principal element. |
Conflicting bucket notifications
This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 only allows overlapping event types when their prefixes do not overlap. Depending on the conflicting notifications, the error can be presented as any of the following:
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes
Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type
Solution:
In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix
Delete the existing notification that overlaps with your Storage Configuration's key prefix
ICA will perform a series of steps in the background to re-verify the connection to your bucket.
GetTemporaryUploadCredentialsAsync failure
This error can occur when recreating a recently deleted storage configuration.
To fix the issue, you have to delete the bucket notifications:
In the Amazon S3 Console, select the bucket for which you need to delete the notifications.
Choose Properties.
Navigate to the Event Notifications section, select the check boxes for the event notifications named gds:objectcreated, gds:objectremoved and gds:objectrestore, and click Delete.
If you do not want to wait 15 minutes, you can revalidate the current storage configuration for an immediate update at System Settings > Storage > Manage > Validate.
When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.
When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.
Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Linking Existing Pipelines
Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. This is not a copy but the actual pipeline, so any changes to the pipeline are automatically propagated to and from any project which has this pipeline linked.
You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your bundle or activation code.
Activation codes are tokens which allow you to run your analyses and are used for accounting and allocating the appropriate resources. ICA will automatically determine the best matching activation code, but this can be overwritten if needed.
If you unlink a pipeline, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.
Select Nextflow (XML / JSON), CWL Graphical or CWL code (XML / JSON) to create a new Pipeline.
Configure pipeline settings in the pipeline property tabs.
Pipelines use the latest tool definition when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. In order to update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it back to the pipeline.
Individual Pipeline files are limited to 20 Megabytes. If you need to add more than this, split your content over multiple files.
Pipeline Statuses
For pipeline authors sharing and distributing their pipelines, the draft, released, deprecated, and archived statuses provide a structured framework for managing pipeline availability, user communication, and transition planning. To change the pipeline status, select it at Projects > your_project > Pipelines > your_pipeline > change status.
You can edit pipelines while they are in Draft status. Once they move away from draft, pipelines can no longer be edited. Pipelines can be cloned (top right in the details view) to create a new editable version.
Status: Draft
Purpose: Use the draft status while developing or testing a pipeline version internally.
Best Practice: Only share draft pipelines with collaborators who are actively involved in development.
Status: Released
Purpose: The released status signals that a pipeline is stable and ready for general use.
Best Practice: Share your pipeline when it is ready for broad use. Ensure users have access to current documentation and know where to find support or updates. Releasing a pipeline is only possible if all tools of that pipeline are in released status.
Status: Deprecated
Purpose: Deprecation is used when a pipeline version is scheduled for retirement or replacement. Deprecated pipelines cannot be linked to bundles, but will not be unlinked from existing bundles. Users who already have access will still be able to start analyses. You can add a message (max 256 chars) when deprecating pipelines.
Best Practice: Deprecate in advance of archiving a pipeline, making sure the new pipeline is available in the same bundle as the deprecated pipeline. This allows the pipeline author to link the new or alternative pipeline in the deprecation message field.
Status: Archived
Purpose: Archiving a pipeline version removes it from active use; users can no longer launch analyses. Archived pipelines cannot be linked to bundles, but are not automatically unlinked from bundles or projects. You can add a message (max 256 chars) when archiving pipelines.
Best Practice: Warn users in advance: deprecate the pipeline before archiving to allow existing users time to transition. Use the archive message to point users to the new or alternative pipeline.
Pipeline Properties
The following sections describe the properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
Details
The details tab provides options for configuring basic information about the pipeline.
Field
Entry
The following information becomes visible when viewing the pipeline details.
Field
Entry
The clone action is shown at the top-right of the pipeline details. Cloning a pipeline allows you to create modifications without impacting the original pipeline; when cloning a pipeline, you become the owner of the cloned pipeline. You must give the clone a unique name, because duplicate names are not allowed within all projects of the tenant: the name must be unique per tenant. You may still see the same pipeline name twice when a pipeline linked from another tenant is cloned under that same name in your tenant; the name is then still unique per tenant, but both appear in your tenant.
When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.
Documentation
The Documentation tab is the place where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
Definition (Graphical)
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
Menu | Description
Machine profiles | Compute types available to use with Tools in the pipeline.
Shared settings | Settings for pipelines used in more than one tool.
Reference files | Descriptions of reference files used in the pipeline.
Input files | Descriptions of input files used in the pipeline.
Output files | Descriptions of output files used in the pipeline.
Tool | Details about the tool selected in the visualization panel.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
Safari is not supported as browser for graphical editing.
When creating a graphical CWL pipeline, do not use spaces in the input field names, use underscores instead. The API performs normalization of input names when running the analysis to prevent issues with special characters (such as accented letters) by replacing them with their more common (unaccented) counterpart. Part of this normalization includes replacing spaces in names with underscores. This normalization is applied to file input name, reference file input name, step id and step parameters.
You will encounter the error ICA_API_004 "No value found for required input parameter" when trying to run an API analysis on a graphical pipeline that has been designed with spaces in input parameters.
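As an approximation of the normalization described above (not the platform's exact implementation), the sketch below shows how accented characters can be replaced with unaccented counterparts and spaces with underscores.

# Approximation of the input-name normalization described above; the platform's
# actual implementation may differ in edge cases.
import unicodedata

def normalize_input_name(name: str) -> str:
    # Decompose accented characters and drop the combining marks (e.g. é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace spaces with underscores.
    return ascii_only.replace(" ", "_")

print(normalize_input_name("échantillon name"))  # -> "echantillon_name"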
XML Configuration / JSON Inputform Files (Code)
This page is used to specify all relevant information about the pipeline parameters.
There is a limit of 200 reports per report pattern which will be shown when you have multiple reports matching your regular expression.
Compute Resources
Compute Nodes
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
You can see which resources were used in the different analysis steps at Projects > your_project > Flow > Analyses > your_analysis > Steps tab. (For child steps, these are displayed on the parent step)
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Scratch space notes
If you do require scratch space via a Nextflow pod annotation or a CWL resource requirement, the path is /scratch.
For Nextflow, the pod annotation 'volumes.illumina.com/scratchSize' with value '1TiB' reserves 1 TiB.
For CWL, adding - class: ResourceRequirement with tmpdirMin: 5000 to your requirements section reserves 5000 MiB.
Avoid the following, as they do not align with ICAv2 scratch space configuration:
Container overlay tmp path: /tmp
Legacy paths: /ephemeral
Environment variables ($TMPDIR, $TEMP and $TMP)
Bash command mktemp
CWL runtime.tmpdir
Compute Types
Daemon sets and system processes consume approximately 1 CPU and 2 GB Memory from the base values shown in the table. Consumption will vary based on the activity of the pod.
Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small | 2 | 8 | standard-small | standard, small
standard-medium | 4 | 16 | standard-medium | standard, medium
standard-large | 8 | 32 | standard-large | standard, large
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge
standard-3xlarge | 64 | 256 | standard-3xlarge | standard, 3xlarge
hicpu-small | 16 | 32 | hicpu-small | hicpu, small
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium
hicpu-large | 72 | 144 | hicpu-large | hicpu, large
himem-small | 8 | 64 | himem-small | himem, small
himem-medium | 16 | 128 | himem-medium | himem, medium
himem-large | 48 | 384 | himem-large | himem, large
himem-xlarge (2) | 92 | 700 | himem-xlarge | himem, xlarge
hiio-small | 2 | 16 | hiio-small | hiio, small
hiio-medium | 4 | 32 | hiio-medium | hiio, medium
fpga2-medium (1) | 24 | 256 | fpga2-medium | fpga2, medium
fpga2-large (1) | 48 | 512 | fpga2-large | fpga2, large
gpu-small | 8 | 61 | gpu-small | gpu, small
gpu-medium | 32 | 244 | gpu-medium | gpu, medium
transfer-small (3) | 4 | 10 | transfer-small | transfer, small
transfer-medium (3) | 8 | 15 | transfer-medium | transfer, medium
transfer-large (3) | 16 | 30 | transfer-large | transfer, large
(1) DRAGEN pipelines running on the fpga2 compute type will incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with volume discounts as shown below.
80 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 80 to 160 gigabase per sample - 20% discount - 0.08 iCredits per gigabase
> 160 to 240 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 240 to 320 gigabase per sample - 40% discount - 0.06 iCredits per gigabase
> 320 and more gigabase per sample - 50% discount - 0.05 iCredits per gigabase
The DRAGEN_Map_Align pipeline running on fpga2 has the standard DRAGEN license cost of 0.10 iCredits per gigabase processed, but replaces the standard volume discounts with the discounts shown below.
10 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 10 to 25 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 25 to 60 gigabase per sample - 70% discount - 0.03 iCredits per gigabase
> 60 and more gigabase per sample - 85% discount - 0.015 iCredits per gigabase
DRAGEN Iterative gVCF Genotyper (iGG) will incur a license cost of 0.6216 iCredits per gigabase. For example, a sample of 3.3 gigabase human reference will result in 2 iCredits per sample. The associated compute costs will be based on the compute instance chosen.
The ORA (Original Read Archive) compression pipeline is part of the DRAGEN platform. It performs lossless genomic data compression to reduce the size of FASTQ and FASTQ.GZ files (up to 4-6x smaller) while preserving data integrity with internal checksum verification. The ORA compression pipeline has a license cost of 0.017 iCredits per input gigabase; decompression does not have an associated license cost.
(2) The compute type himem-xlarge has low availability.
FPGA1 instances were decommissioned on Nov 1st 2025. Please migrate to F2 for improved capacity and performance with up to 40% reduced turnaround time for analysis.
(3) The transfer compute types are used during upload and download system tasks; the size used is selected based on the storage size chosen for the compute type.
Nextflow/CWL Files (Code)
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
If the file type is not recognized, it will default to text display. This can result in the application interpreting binary files as text when trying to display the contents.
Main.nf (Nextflow code)
The Nextflow project main script.
Nextflow.config (Nextflow code)
The Nextflow configuration settings.
Workflow.cwl (CWL code)
The Common Workflow Language main script.
Adding Files
Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.
Metadata Model
See
Report
Here patterns for detecting report files in the analysis output can be defined. On opening an analysis result window of this pipeline, an additional tab will display these report files. The goal is to provide a pipeline-specific user-friendly representation of the analysis result.
To add a report select the + symbol on the left side. Provide your report with a unique name, a regular expression matching the report and optionally, select the format of the report. This must be the source format of the report data generated during the analysis.
There is a limit of 20 reports per report pattern which will be shown when you have multiple reports matching your regular expression.
Start a New Analysis
Use the following instructions to start a new analysis for a single pipeline.
Select the pipeline or pipeline details of the pipeline you want to run.
Select Start Analysis.
Analysis Settings
The Start Analysis screen provides the configuration options for the analysis.
Field | Entry
User Reference | The unique analysis name.
Pipeline | This is not editable, but provides a link to the pipeline in case you want to look up its details.
User tags (optional) | One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Notification (optional) | Enter your email address if you want to be notified when the analysis completes.
Output Folder (1) | Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax. Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
Logs Folder | Select a folder in which the logs of the analysis should be located. When no logs folder is selected, the logs will be stored as a subfolder in the output folder. When a logs folder is selected which is different from the output folder, the outputs and logs folders are separated. Files that already exist in the logs folder will be overwritten with new versions. When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax. Note: Choose a folder that is empty and not in use for other analyses, as files will be overwritten. Note: Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
(1) When using the API, you can set the output folder to be outside of the current project.
Aborting Analyses
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
View Analysis Results
You can view analysis results on the Analyses page or in the output folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
From the output files tab, expand the list if needed and select an output file.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
User selectable for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.
Family
A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.
Version comment
A description of changes in the updated version.
Links
External reference links. (max 100 chars as name and 2048 chars as link)
Tool repository
A list of tools available to be used in the pipeline.
Identification of the pipeline in Uniform Resource Name
Find the links to CLI builds in the Releases section below.
Downloading the Installer
In the Releases section below, select the matching operating system in the link column for the version you want to install. This will download the installer for that operating system.
Version Check
To determine which CLI version you are currently using, navigate to your currently installed CLI and use the CLI command icav2 version. For help on this command, use icav2 version -h.
Integrity Check
Checksums are provided alongside each downloadable CLI binary to verify file integrity. The checksums are generated using the SHA256 algorithm. To use the checksums:
Download the CLI binary for your OS
Download the corresponding checksum using the links in the table
Calculate the SHA256 checksum of the downloaded CLI binary
Diff the calculated SHA256 checksum with the downloaded checksum. If the checksums match, the integrity is confirmed.
There are a variety of open source tools for calculating the SHA256 checksum. See the below tables for examples.
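For example, the checksum can also be computed with Python's standard hashlib module; the file name below is a placeholder for the binary you downloaded.

# Compute the SHA256 checksum of a downloaded CLI binary.
import hashlib

def sha256sum(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256sum("icav2"))  # placeholder file name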
For CLI v2.3.0 and later:
OS
Command
For CLI v2.2.0
OS
Command
Releases
Version
Link
Checksum
To access release history of CLI versions prior to v2.0.0, please see the ICA v1 documentation.
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
ICA Cohorts Base Tables
After ingesting data into your project, Phenotypic and Molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
Post ingestion, data will be represented in Base.
Select BASE from the ICA left-navigation and click Query.
Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project <your project name>.
Cohorts tables will be displayed.
To preview the tables and their fields, click each view listed.
Clicking any of these views then selecting PREVIEW on the right-hand side will show you a preview of the data in the tables.
If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.
The PHENOTYPE table includes a harmonized set of fields collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. Sample information is also displayed in this table, if applicable, and drives the annotation process when molecular data is included in the ingestion.
Phenotype Data
Field Name
Type
Description
Sample Information
Field Name
Type
Description
Sample Attribute
This table is an entity-attribute value table of supplied sample data matching Cohorts accepted attributes.
Field Name
Type
Description
Study Information
Field Name
Type
Description
Subject
Field
Type
Description
Subject Attribute
This table is an entity-attribute value table of supplied subject data matching Cohorts accepted attributes.
Field
Type
Description
Disease
Field
Type
Description
Drug Exposure
Field
Type
Description
Measurement
Field
Type
Description
Procedure
Field
Type
Description
Annotated Variants
This table will be available for all projects with ingested molecular data
Annotated Somatic Mutations
This table will only be available for data sets with ingested Somatic molecular data.
Annotated Copy Number Variants
This table will only be available for data sets with ingested CNV molecular data.
Annotated Structural Variants
This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.
Raw RNAseq data tables for genes and transcripts
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for gene quantification results:
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
Differential expression tables for genes and transcripts
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for differential gene expression results:
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
Data
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
See also which are a special form of data storage optimised for fast processing.
SEX
STRING
Sex field to drive annotation
POPULATION
STRING
Population Designation for 1000 Genomes Project
SUPERPOPULATION
STRING
Superpopulation Designation from 1000 Genomes Project
RACE
STRING
Race according to NIH standard
CONDITION_ONTOLOGIES
VARIANT
Diagnosis Ontology Source
CONDITION_IDS
VARIANT
Diagnosis Concept Ids
CONDITIONS
VARIANT
Diagnosis Names
HARMONIZED_CONDITIONS
VARIANT
Diagnosis High-level concept to drive UI
LIBRARYTYPE
STRING
Sequencing technology
ANALYTE
STRING
Substance sequenced
TISSUE
STRING
Tissue source
TUMOR_OR_NORMAL
STRING
Tumor designation for somatic
GENOMEBUILD
STRING
Genome Build to drive annotations - hg38 only
SAMPLE_BARCODE_VCF
STRING
Sample ID from VCF
AFFECTED_STATUS
NUMERIC
Affected, Unaffected, or Unknown for Family Based Analysis
FAMILY_RELATIONSHIP
STRING
Relationship designation for Family Based Analysis
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of last update of record
STUDY
STRING
Study subject belongs to
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of record update
NUMERIC
Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt
DBSNP
STRING
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
NIRVANA_VID
STRING
Broad Institute VID: "1-12345678-A-C"
VARIANT_TYPE
STRING
Description of Variant Type (e.g. SNV, Deletion, Insertion)
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
DENOVO
BOOLEAN
true / false
GENOTYPE
STRING
"G|T"
READ_DEPTH
NUMERIC
Sequencing read depth
ALLELE_COUNT
NUMERIC
Counts of each alternate allele for each site across all samples
ALLELE_DEPTH
STRING
Unfiltered count of reads that support a given allele for an individual sample
FILTERS
STRING
Filter field from VCF. If all filters pass, field is PASS
ZYGOSITY
NUMERIC
0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
GID
NUMERIC
NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
STRING
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSC
STRING
The HGVS coding sequence name
HGVSP
STRING
The HGVS protein sequence name
STRING
Chromosome without 'chr' prefix
DBSNP
NUMERIC
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
MUTATION_TYPE
NUMERIC
Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
GENOTYPE
STRING
"G|T"
REF_ALLELE
STRING
Reference allele
ALLELE1
STRING
First allele call in the tumor sample
ALLELE2
STRING
Second allele call in the tumor sample
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
BOOLEAN
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSP
STRING
HGVS nomenclature for AA change: p.Pro72Ala
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
GENE_ID
STRING
NCBI or Ensembl gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
START_POS
NUMERIC
First affected position on the chromosome
STOP_POS
NUMERIC
Last affected position on the chromosome
VARIANT_TYPE
NUMERIC
1 = copy number gain, -1 = copy number loss
COPY_NUMBER
NUMERIC
Observed copy number
COPY_NUMBER_CHANGE
NUMERIC
Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline
SEGMENT_VALUE
NUMERIC
Average FC for the identified chromosomal segment
PROBE_COUNT
NUMERIC
Probes confirming the CNV (arrays only)
REFERENCE
NUMERIC
Baseline taken from normal samples (1) or averaged disease tissue (2)
GENE_HGNC
STRING
HUGO/HGNC gene symbol
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
BEGIN
NUMERIC
First affected position on the chromosome
END
NUMERIC
Last affected position on the chromosome
BAND
STRING
Chromosomal band
QUALITY
NUMERIC
Quality from the original VCF
FILTERS
ARRAY
Filters from the original VCF
VARIANT_TYPE
STRING
Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2")
For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25
BONDPOS
STRING
For translocations, positions on the other affected chromosome
BONDORDER
NUMERIC
3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' end of the other chromosome fragment
GENOTYPE
STRING
Called genotype from the VCF
GENOTYPE_QUALITY
NUMERIC
Genotype call quality
READCOUNTSSPLIT
ARRAY
Read counts
READCOUNTSPAIRED
ARRAY
Read counts, paired end
REGULATORYREGIONID
STRING
Ensembl ID for the affected regulatory region
REGULATORYREGIONTYPE
STRING
Type of the regulatory region
CONSEQUENCE
ARRAY
Variant consequence according to SequenceOntology
TRANSCRIPTID
STRING
Ensembl or RefSeq transcript identifier
TRANSCRIPTBIOTYPE
STRING
Biotype of the transcript
INTRONS
STRING
Count of impacted introns out of the total number of introns, specified as "M/N"
GENEID
STRING
Ensembl or RefSeq gene identifier
GENEHGNC
STRING
HUGO/HGNC gene symbol
ISCANONICAL
BOOLEAN
Is the transcript ID the canonical one according to Ensembl?
PROTEINID
STRING
RefSeq or Ensembl protein ID
SOURCEID
NUMERICAL
Gene model: 1=Ensembl, 2=RefSeq
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
TPM
NUMERICAL
Transcripts per million
LENGTH
NUMERICAL
The length of the gene in base pairs.
EFFECTIVE_LENGTH
NUMERICAL
The length as accessible to RNA-seq, accounting for insert-size and edge effects.
NUM_READS
NUMERICAL
The estimated number of reads from the gene. The values are not normalized.
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
BASEMEAN
NUMERICAL
FC
NUMERICAL
Fold-change
LFC
NUMERICAL
Log of the fold-change
LFCSE
NUMERICAL
Standard error for log fold-change
PVALUE
NUMERICAL
P-value
CONTROL_SAMPLECOUNT
NUMERICAL
Number of samples used as control
CONTROL_LABEL
NUMERICAL
Label used for controls
SAMPLE_BARCODE
STRING
Sample Identifier
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
AGE
NUMERIC
Age in years
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
DATATYPE
ARRAY
The categorization of molecular data
TECHNOLOGY
ARRAY
The sequencing method
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
NAME
STRING
Study name
CREATEDATE
DATE
Date and time of study creation
LASTUPDATEDATE
DATE
Date and time of record update
SUBJECTID
STRING
Original identifier for the subject record
AGE
FLOAT
Age entered on subject record if applicable
SEX
STRING
-
ETHNICITY
STRING
-
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for disease term
OCCURRENCES
STRING
List of occurrence related data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for drug term
OCCURRENCES
STRING
List of occurrence related data of drug exposure
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for measurement term
OCCURRENCES
STRING
List of occurrences and values related to lab or measurement data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for procedure term
OCCURRENCES
STRING
List of occurrences and values related procedure data
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode, used in VCF column
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
LABEL
STRING
Group label specified during import: Case or Control, Tumor or Normal, etc.
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
CASE_LABEL
STRING
Study designation
CHROMOSOMEID
CHROMOSOME
CID
CID
GENE_ID
GENE_ID
Recommended Practices
File/Folder Naming
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the AWS S3 documentation.)
Characters generally considered "safe"
Alphanumeric characters
0-9
a-z
A-Z
Special characters
Exclamation point !
Hyphen -
Folders and files cannot be renamed after they have been created. To rename a folder, you will need to create a new folder with the desired name, move the contents from the original folder into the new one, and then delete the original folder. Please see Move Data section for more information.
Troubleshooting
If you get the error "Unable to generate credentials from the objectstore as the requested path is too long." from AWS when requesting temporary credentials, the path should be shortened.
You can truncate the sample name and user reference, or use advanced output mapping in the API, which avoids generating the long folder structure and creates the output in the location defined by targetPath.
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (i.e., AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and that uploads do not include unintended data, to avoid unintentional privacy breaches. More guidance can be found in the ICA Security and Compliance section.
To prevent cost issues, you cannot perform actions that would write data to the workspace (such as copying and moving data) when the project billing mode is set to tenant and the owning tenant of the folder is not the current user's tenant.
Viewing Data
On the Projects > your_project > Data page, you can view file information and preview files.
Files
To view file details, click the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click the view tab to preview the file.
When you share the data view by sharing the link from your browser, filters and sorting are retained in the link, so the recipient will see the same data and order.
If your data is the result of an analysis, you can find the analysis which created it at Projects > your_project > Data > your_data > view > Data details tab > Source analysis. Clicking the link here will open the analysis.
To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created.
You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When clicking on an ongoing action itself, the data job details of the most recent created data job are shown.
Folders
If you open a folder by clicking it, you can see the folder details link at the top right. This will open the details screen where you can consult the folder size and number of files in that folder, the owning project, ongoing actions and folder id. You can also download the folder and all contents here with the download button.
Secondary Data
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
Hyperlinking to Data
To hyperlink to data, use the following syntax:
Variable
Location
ServerURL
see browser address bar
projectID
At YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
FolderID
At YourProject > Data > folder > folder details > ID
AnalysisID
At YourProject > Flow > Analyses > YourAnalysis > ID
Normal permission checks still apply with these links. If you try to follow a link to data to which you do not have access, you will be returned to the main project screen or login screen, depending on your permissions.
Uploading Data
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
UI Upload
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your files are added to the Data page with status partial during upload and become available when upload completes.
Do not close the ICA tab in your browser while data uploads.
Uploads via the UI are limited to 5TB and no more than 100 concurrent files at a time, but for practical and performance reasons, it is recommended to use the CLI or Service connector when uploading large amounts of data.
Upload Data via CLI
For instructions on uploading/downloading data via CLI, see CLI Data Transfer.
Copying Data
You can copy data from your project to a different folder within the same project or you can copy data from another project to your current project, provided you have the necessary access rights.
You can copy data from a subfolder to a higher-level folder to move data up one or more levels (folder/destination/source). You cannot copy data from the source folder onto itself or onto a subfolder of the source folder, as this would result in a loop.
The person copying the data must have the following rights:
Copy Data Rights
Source Project
Destination Project
Within a project
Contributor rights
Upload and Download rights
Contributor rights
Upload and Download rights
Between different projects
Download rights
Viewer rights
Upload rights
Contributor rights
The following restrictions apply when copying data:
Copy Data Restrictions
Source Project
Destination Project
Within a project
No linked data
No partial data
No archived data
No Linked data
Between different projects
Data sharing enabled
No partial data
No archived data
No linked data
Within the same region
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter out the data or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any meta data which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be:
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
There is a difference in copy behavior between files and folders. The behavior is designed for files, and it is best practice not to copy folders if a folder with the same name already exists in the destination location.
Notes on copying data
Copying data comes with an additional storage cost as it will create a copy of the data.
You can copy over the same data multiple times.
Copying data from your own S3 storage requires additional configuration. See and .
On the command-line interface, the command to copy data is icav2 projectdata copy.
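For example, a minimal sketch of a cross-project copy; the IDs and paths are placeholders, and icav2 projectdata copy -h lists all available flags:
# Copy a file from another project into /copied-data/ in the destination project
icav2 projectdata copy /source-folder/sample1.bam \
  --source-project-id <source-project-id> \
  --project-id <destination-project-id> \
  --destination-folder /copied-data/ \
  --action-on-exist SKIP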
Move Data
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move From is used when you are in the destination location.
Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
Troubleshooting
If the source or destination of data being moved is modified, the Move job will detect the changes and abort.
Modifying data at either the source or destination during a Move process can result in incomplete data transfer. Users can still manually move any remaining data afterward.
This partial move may cause data at the destination to become unsynchronized between the object store (S3) and ICA. To resolve this, users can create a folder session on the parent folder of the destination directory by following the steps in the API: and then . Ensure the Move job is already aborted before submitting the folder session create and complete requests. Wait for the session to complete.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Move Data Rights
Source Project
Destination Project
Within a project
Contributor rights
Contributor rights
Between different projects
Download rights
Contributor rights
Upload rights
Viewer rights
Move Data Restrictions
Source Project
Destination Project
Within a project
No linked data
No partial data
No archived data
No Linked data
Between different projects
Data sharing enabled
Data owned by user's tenant
No linked data
No linked data
Within same region
Move jobs will fail if any data being moved is in the Partial or Archived state.
Move Data From
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Go to Projects > your_project > Data > your_source_location > Manage > Move To.
Select your target project and location.
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see .
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Status
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
Restrictions:
A maximum of 1000 items can be moved in one operation. An item can be either a file or a folder; folders with subfolders and files still count as one item.
You cannot move files and folders to a destination where one or more files or folders with the same name already exist.
You cannot move data and folders to linked data.
You cannot move a folder to itself.
You cannot move data which is in the process of being moved.
You cannot move data across regions.
You cannot move data from externally-managed projects.
You cannot move linked data.
You cannot move externally managed data.
You can only move data when it has the status Available.
To move data across projects, it must be owned by the user's tenant.
If you do not select a target folder for Move Data To, the root folder of the target project is used.
If you are only able to select your source project as the target data project, this may indicate that data sharing (Projects > your_project > Project Settings > Details > Data Sharing) is not enabled for your project or that you do not have upload rights in other projects.
Download Data
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Select Browser Download > Download.
You can also download files from their details screen. Click on the file name and select Download at the bottom of the screen. Depending on the size of your file, it may take some time to load the file contents.
Schedule for Download
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Schedule download (for files or folders). This will display a list of all available connectors.
Select a connector and optionally, enter your email address if you want to be notified of download completion, and then select Download.
If you do not have a connector, you can click the Don't have a connector yet? option to create a new connector. You must then install this new connector and return to the file selection in step 1 to use it.
You can view the progress of the download or stop the download on the Activity page for the project.
Export Project Data Information
The data records contained in a project can be exported in CSV, JSON, and Excel format.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected file, select the Selected rows as the Rows to export option. To export all files on the page, select Current page.
To export only the columns present for the file, select the Visible columns as the Columns to export option.
Select the export format.
Archiving and Deleting files
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints:
The Python snippet below exemplifies the approach: it sets (or updates, if already set) the time at which a specific file will be archived:
To delete a file at a specific timepoint, the key 'willBeDeletedAt' should be added or changed using the API call. If running in the terminal, a successful run will finish with the message '200'. In the ICA UI, you can check the details of the file to see the updated values for 'Time To Be Archived' (willBeArchivedAt) or 'Time To Be Deleted' (willBeDeletedAt).
Linking Project Data
Data linking creates a dynamic read-only view of the source data. You can use data linking to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required. You can recognise linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
Linking data is only possible from the root folder of your destination project. The action is disabled in project subfolders.
Linking a parent folder after linking a file or subfolder will unlink the file or subfolder and link the parent folder. So root\linked_subfolder will become root\linked_parentfolder\linked_subfolder.
Migrating snapshot-linked data (linked before ICA release v2.29)
Before ICA version v.2.29, when data was linked, a snapshot was created of the file and folder structure. These links created a read-only view of the data as it was at the time of linking, but did not propagate changes to the file and folder structure. If you want to use the advantages of the new way of linking with dynamic updates, unlink the data and relink it. Since snapshot linking has been deprecated, all new data linking done in ICA v.2.29 or later has dynamic content updates.
Initial linking can take considerable time when there is a large amount of source data. However, once the initial link is made, updates to the source data will be instantaneous. You can monitor the progress at Projects > your_project > activity > Batch Jobs.
Linking data from another project
Select Projects > your_project > Data > Manage, and then select Link.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
Display Owning Project
If you have selected multiple owning projects, you can add the owning project column to see which project owns the data.
At the top of the screen, next to the filter icon, select the three columns symbol.
The Add/remove columns tab will appear.
Choose Owning Project (or Linked Projects)
Linking Folders
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
Unlinking Project Data
To unlink the data, go to the root level of your project and select the linked folder, or if you have linked individual files separately, select those linked files (limited to 100 at a time) and select Manage > Unlink. As with linking a folder, the progress of unlinking can be monitored at Projects > your_project > Activity > Batch Jobs.
import requests
import json
from config import PROJECT_ID, DATA_ID, API_KEY

url_get = "https://ica.illumina.com/ica/rest/api/projects/" + PROJECT_ID + "/data/" + DATA_ID

# set the API get headers
headers = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json'
}

# set the API put headers
headers_put = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json',
    'Content-Type': 'application/vnd.illumina.v3+json'
}

# Helper function to insert willBeArchivedAt after field named 'region'
def insert_after_region(details_dict, timestamp):
    new_dict = {}
    for k, v in details_dict.items():
        new_dict[k] = v
        if k == 'region':
            new_dict['willBeArchivedAt'] = timestamp
    if 'willBeArchivedAt' in details_dict:
        new_dict['willBeArchivedAt'] = timestamp
    return new_dict

# 1. Make the GET request
response = requests.get(url_get, headers=headers)
response_data = response.json()

# 2. Modify the JSON data
timestamp = "2024-01-26T12:00:04Z"  # Replace with the provided timestamp
response_data['data']['details'] = insert_after_region(response_data['data']['details'], timestamp)

# 3. Make the PUT request
put_response = requests.put(url_get, data=json.dumps(response_data), headers=headers_put)
print(put_response.status_code)
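A comparable shell sketch for scheduled deletion follows the same GET-modify-PUT pattern, setting 'willBeDeletedAt' instead; the timestamp is a placeholder and jq is assumed to be installed:
# 1. GET the data record and set details.willBeDeletedAt
curl -s -H "X-API-Key: $API_KEY" \
  -H "accept: application/vnd.illumina.v3+json" \
  "https://ica.illumina.com/ica/rest/api/projects/$PROJECT_ID/data/$DATA_ID" \
  | jq '.data.details.willBeDeletedAt = "2024-01-26T12:00:04Z"' > updated.json
# 2. PUT the modified record back; a 200 status code indicates success
curl -s -o /dev/null -w "%{http_code}\n" -X PUT \
  -H "X-API-Key: $API_KEY" \
  -H "accept: application/vnd.illumina.v3+json" \
  -H "Content-Type: application/vnd.illumina.v3+json" \
  --data @updated.json \
  "https://ica.illumina.com/ica/rest/api/projects/$PROJECT_ID/data/$DATA_ID"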
To add filters, select the funnel/filter symbol at the top right, next to the search field.
Filters are reset when you exit the current screen.
Sorting
To sort data, select the three vertical dots in the column header on which you want to sort and choose ascending or descending.
Sorting is retained when you exit the current screen.
Displaying Columns
To change which columns are displayed, select the three columns symbol and select which columns should be shown.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.
The displayed columns are retained when you exit the current screen.
Replace
Overwrites the existing data. Folders will copy their data into an existing folder with existing files: existing files will be replaced when a file with the same name is copied, and new files will be added. The remaining files in the target folder will remain unchanged.
Don't copy
The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.
Keep both
Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.
The build number, together with the libraries and licenses used, is provided in the accompanying readme file.
icav2
icav2 analysisstorages
icav2 analysisstorages list
icav2 completion
This command generates custom completion functions for the icav2 tool. These functions facilitate the generation of context-aware suggestions based on the user's input and specific directives provided by the icav2 tool. For example, for the Zsh shell, the completion function _icav2() is generated. It can provide suggestions for available commands, flags, and arguments depending on the context, making it easier for the user to interact with the tool without having to constantly refer to documentation.
To enable this custom completion function, you would typically include it in your Zsh configuration (e.g., in .zshrc or a separate completion script) and then use the compdef command to associate the function with the icav2 command:
This way, when the user types icav2 followed by a space and presses the TAB key, Zsh will call the _icav2 function to provide context-aware suggestions based on the user's input and the icav2 tool's directives.
icav2 completion bash
icav2 completion fish
icav2 completion powershell
icav2 completion zsh
icav2 config
icav2 config get
icav2 config reset
icav2 config set
icav2 dataformats
icav2 dataformats list
icav2 help
icav2 jobs
icav2 jobs get
icav2 metadatamodels
icav2 metadatamodels list
icav2 pipelines
icav2 pipelines get
icav2 pipelines list
icav2 projectanalyses
icav2 projectanalyses get
icav2 projectanalyses input
icav2 projectanalyses list
icav2 projectanalyses output
icav2 projectanalyses update
icav2 projectdata
icav2 projectdata archive
icav2 projectdata copy
icav2 projectdata create
icav2 projectdata delete
icav2 projectdata download
Example 1
Using this command all the files starting with VariantCaller- will be downloaded (prerequisite: a tool is installed on the machine):
Example 2
Here is an example of how to download all BAM files from a project (we are using some jq features to filter out '.bam.bai' and '.bam.md5sum' files):
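A hedged sketch of that approach; the JSON field paths used in the jq filter (.items[].details.name and .items[].id) are assumptions, so inspect the output of icav2 projectdata list -o json in your own project first, and the project context is assumed to be set with icav2 projects enter:
# List project data as JSON, keep only names ending in ".bam" (this excludes .bam.bai
# and .bam.md5sum), and download each matching file by its id
icav2 projectdata list -o json \
  | jq -r '.items[] | select(.details.name | endswith(".bam")) | .id' \
  | while read -r file_id; do
      icav2 projectdata download "$file_id"
    done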
Tip: If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
icav2 projectdata downloadurl
icav2 projectdata folderuploadsession
icav2 projectdata get
icav2 projectdata link
icav2 projectdata list
It is best practice to always surround your path with quotes if you want to use the * wildcard. Otherwise, you may run into situations where the command results in "accepts at most 1 arg(s), received x" as it returns folders with the same name, but different amounts of subfolders.
For more information on how to use pagination, please refer to
If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
Example to list files in the folder SOURCE
Example to list only subfolders in the folder SOURCE
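A hedged sketch of both listings; the quoted wildcard follows the tip above, while the --data-type flag name is an assumption, so check icav2 projectdata list -h for the exact flags in your CLI version:
# List everything directly under the folder SOURCE (quotes stop the shell from expanding *)
icav2 projectdata list "/SOURCE/*"
# Restrict the listing to subfolders only (flag name assumed)
icav2 projectdata list "/SOURCE/*" --data-type FOLDER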
icav2 projectdata mount
icav2 projectdata move
icav2 projectdata temporarycredentials
icav2 projectdata unarchive
icav2 projectdata unlink
icav2 projectdata unmount
icav2 projectdata update
icav2 projectdata upload
Example for uploading multiple files
In this example, all the fastq.gz files from source will be uploaded to target using the xargs utility.
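A hedged sketch of that approach; the local and remote paths are placeholders, and the second positional argument to icav2 projectdata upload is assumed to be the remote destination folder:
# Upload every fastq.gz file under ./source to the project folder /target/
ls ./source/*.fastq.gz | xargs -I {} icav2 projectdata upload {} /target/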
Example for uploading multiple files using a CSV file
In this example we upload multiple BAM files, specified with their corresponding paths in the file bam_files.csv. The files will be renamed. We are using screen in detached mode (this creates a new session without attaching to it):
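A hedged sketch under the assumptions that bam_files.csv contains two columns (local path and new file name), that the upload destination is /target/, and that passing a full remote file path renames the file on upload; adjust these to your own layout:
# Run the renaming upload loop in a detached screen session named ica_upload
screen -dmS ica_upload bash -c '
  while IFS=, read -r local_path new_name; do
    icav2 projectdata upload "$local_path" "/target/$new_name"
  done < bam_files.csv
'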
icav2 projectpipelines
icav2 projectpipelines create
icav2 projectpipelines create cwl
icav2 projectpipelines create cwljson
icav2 projectpipelines create nextflow
icav2 projectpipelines create nextflowjson
icav2 projectpipelines input
icav2 projectpipelines link
icav2 projectpipelines list
icav2 projectpipelines start
icav2 projectpipelines start cwl
icav2 projectpipelines start cwljson
Field definition
A field can only have values (--field) and a data field can only have data values (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
matches
The following example with --field and --field-data
matches
Group definition
A group will only have values (--group) and a data group can only have data values (--group-data). Add the flags multiple times for multiple groups and fields in the group.
icav2 projectpipelines start nextflow
icav2 projectpipelines start nextflowjson
Field definition
A field can only have values (--field) and a data field can only have data values (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
matches
The following example with --field and --field-data
matches
Group definition
A group will only have values (--group) and a data group can only have data values (--group-data). Add the flags multiple times for multiple groups and fields in the group.
icav2 projectpipelines unlink
icav2 projects
icav2 projects create
icav2 projects enter
icav2 projects exit
icav2 projects get
icav2 projects list
icav2 projectsamples
icav2 projectsamples complete
icav2 projectsamples create
icav2 projectsamples delete
icav2 projectsamples get
icav2 projectsamples link
icav2 projectsamples list
icav2 projectsamples listdata
icav2 projectsamples unlink
icav2 projectsamples update
icav2 regions
icav2 regions list
icav2 storagebundles
icav2 storagebundles list
icav2 storageconfigurations
icav2 storageconfigurations list
icav2 tokens
icav2 tokens create
icav2 tokens refresh
icav2 version
Command line interface for the Illumina Connected Analytics, a genomics platform-as-a-service
Usage:
icav2 [command]
Available Commands:
analysisstorages Analysis storages commands
completion Generate the autocompletion script for the specified shell
config Config actions
dataformats Data format commands
help Help about any command
jobs Job commands
metadatamodels Metadata model commands
pipelines Pipeline commands
projectanalyses Project analyses commands
projectdata Project Data commands
projectpipelines Project pipeline commands
projects Project commands
projectsamples Project samples commands
regions Region commands
storagebundles Storage bundle commands
storageconfigurations Storage configurations commands
tokens Tokens commands
version The version of this application
Flags:
-t, --access-token string JWT used to call rest service
-h, --help help for icav2
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-v, --version version for icav2
-k, --x-api-key string api key used to call rest service
Use "icav2 [command] --help" for more information about a command.
This is the root command for actions that act on analysis storages
Usage:
icav2 analysisstorages [command]
Available Commands:
list list of storage id's
Flags:
-h, --help help for analysisstorages
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 analysisstorages [command] --help" for more information about a command.
This command lists all the analysis storage id's
Usage:
icav2 analysisstorages list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
compdef _icav2 icav2
Generate the autocompletion script for icav2 for the specified shell.
See each sub-command's help for details on how to use the generated script.
Usage:
icav2 completion [command]
Available Commands:
bash Generate the autocompletion script for bash
fish Generate the autocompletion script for fish
powershell Generate the autocompletion script for powershell
zsh Generate the autocompletion script for zsh
Flags:
-h, --help help for completion
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 completion [command] --help" for more information about a command.
Generate the autocompletion script for the bash shell.
This script depends on the 'bash-completion' package.
If it is not installed already, you can install it via your OS's package manager.
To load completions in your current shell session:
source <(icav2 completion bash)
To load completions for every new session, execute once:
#### Linux:
icav2 completion bash > /etc/bash_completion.d/icav2
#### macOS:
icav2 completion bash > $(brew --prefix)/etc/bash_completion.d/icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion bash
Flags:
-h, --help help for bash
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the fish shell.
To load completions in your current shell session:
icav2 completion fish | source
To load completions for every new session, execute once:
icav2 completion fish > ~/.config/fish/completions/icav2.fish
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion fish [flags]
Flags:
-h, --help help for fish
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for powershell.
To load completions in your current shell session:
icav2 completion powershell | Out-String | Invoke-Expression
To load completions for every new session, add the output of the above command
to your powershell profile.
Usage:
icav2 completion powershell [flags]
Flags:
-h, --help help for powershell
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the zsh shell.
If shell completion is not already enabled in your environment you will need
to enable it. You can execute the following once:
echo "autoload -U compinit; compinit" >> ~/.zshrc
To load completions in your current shell session:
source <(icav2 completion zsh)
To load completions for every new session, execute once:
#### Linux:
icav2 completion zsh > "${fpath[1]}/_icav2"
#### macOS:
icav2 completion zsh > $(brew --prefix)/share/zsh/site-functions/_icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion zsh [flags]
Flags:
-h, --help help for zsh
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Config command provides functions for CLI configuration management.
Usage:
icav2 config [command]
Available Commands:
get Get configuration information
reset Remove the configuration information
set Set configuration information
Flags:
-h, --help help for config
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 config [command] --help" for more information about a command.
Get configuration information.
Usage:
icav2 config get [flags]
Flags:
-h, --help help for get
Remove configuration information.
Usage:
icav2 config reset [flags]
Flags:
-h, --help help for reset
Set configuration information. Following information is asked when starting the command :
- server-url : used to form the url for the rest api's.
- x-api-key : api key used to fetch the JWT used to authenticate to the API server.
- colormode : set depending on your background color of your terminal. Input's and errors are colored. Default is 'none', meaning that no colors will be used in the output.
- table-format : Output layout, defaults to a table, other allowed values are json and yaml
Usage:
icav2 config set [flags]
Flags:
-h, --help help for set
This is the root command for actions that act on Data formats
Usage:
icav2 dataformats [command]
Available Commands:
list List data formats
Flags:
-h, --help help for dataformats
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 dataformats [command] --help" for more information about a command.
This command lists the data formats you can use inside of a project
Usage:
icav2 dataformats list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Help provides help for any command in the application.
Simply type icav2 help [path to command] for full details.
Usage:
icav2 help [command] [flags]
Flags:
-h, --help help for help
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on jobs
Usage:
icav2 jobs [command]
Available Commands:
get Get details of a job
Flags:
-h, --help help for jobs
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 jobs [command] --help" for more information about a command.
This command fetches the details of a job using the argument as an id (uuid).
Usage:
icav2 jobs get [job id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on metadata models
Usage:
icav2 metadatamodels [command]
Available Commands:
list list of metadata models
Flags:
-h, --help help for metadatamodels
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 metadatamodels [command] --help" for more information about a command.
This command lists all the metadata models
Usage:
icav2 metadatamodels list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on pipelines
Usage:
icav2 pipelines [command]
Available Commands:
get Get details of a pipeline
list List pipelines
Flags:
-h, --help help for pipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 pipelines [command] --help" for more information about a command.
This command fetches the details of a pipeline without a project context
Usage:
icav2 pipelines get [pipeline id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines without the context of a project
Usage:
icav2 pipelines list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on projects analysis
Usage:
icav2 projectanalyses [command]
Available Commands:
get Get the details of an analysis
input Retrieve input of analyses commands
list List of analyses for a project
output Retrieve output of analyses commands
update Update tags of analyses
Flags:
-h, --help help for projectanalyses
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectanalyses [command] --help" for more information about a command.
This command returns all the details of a analysis.
Usage:
icav2 projectanalyses get [analysis id] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve input of analyses commands
Usage:
icav2 projectanalyses input [analysisId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the analyses for a given project. Sorting can be done on
- reference
- userReference
- pipeline
- status
- startDate
- endDate
- summary
Usage:
icav2 projectanalyses list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve output of analyses commands
Usage:
icav2 projectanalyses output [analysisId] [flags]
Flags:
-h, --help help for output
--project-id string project ID to set current project context
--raw-output Add this flag if output should be in raw format. Applies only for Cwl pipelines ! This flag needs no value, adding it sets the value to true.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Updates the user and technical tags of an analysis
Usage:
icav2 projectanalyses update [analysisId] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add to analysis. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add to analysis. Add flag multiple times for multiple values.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove from analysis. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove from analysis. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
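For example, the following hypothetical call adds a user tag and removes a technical tag in a single update; the analysis id and tag names are placeholders:
icav2 projectanalyses update <analysis-id> --add-user-tag reviewed --remove-tech-tag rerun-needed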
This is the root command for actions that act on project data
Usage:
icav2 projectdata [command]
Available Commands:
archive Archive data
copy Copy data to a project
create Create data in a project
delete Delete data
download Download a file/folder
downloadurl Get download URL
folderuploadsession Get details of a folder upload
get Get details of a data element
link Link data to a project
list List data
mount Mount project data
move Move data to a project
temporarycredentials Fetch temporary credentials for data
unarchive Unarchive data
unlink Unlink data from a project
update Update the details of a data element
upload Upload a file/folder
Flags:
-h, --help help for projectdata
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectdata [command] --help" for more information about a command.
This command archives data for a given project
Usage:
icav2 projectdata archive [path or data Id] [flags]
Flags:
-h, --help help for archive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command copies data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata copy [data id] or [path] [flags]
Flags:
--action-on-exist string what to do when a file or folder with the same name already exists: OVERWRITE|SKIP|RENAME (default "SKIP")
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--copy-instrument-info copy instrument info from source data to destination data
--copy-technical-tags copy technical tags from source data to destination data
--copy-user-tags copy user tags from source data to destination data
--destination-folder string folder id or path to where you want to copy the data, default root of project
-h, --help help for copy
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be copied, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
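A typical invocation might look as follows; the source path, project ids and destination folder are placeholders. The data is copied into /incoming/ of the current project, files that already exist are skipped (the default --action-on-exist SKIP), and user tags are carried over:
icav2 projectdata copy /run1/sample1.fastq.gz --source-project-id <source-project-id> --destination-folder /incoming/ --copy-user-tags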
This command creates data in a project. It takes the name of the file/folder as an argument
Usage:
icav2 projectdata create [name] [flags]
Flags:
--data-type string (*) Data type : FILE or FOLDER
--folder-id string Id of the folder
--folder-path string Folder path under which the new project data will be created.
--format string Only allowed for file, sets the format of the file.
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
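For instance, the following hypothetical command creates an empty folder named results under the (placeholder) path /analysis1/ of the current project:
icav2 projectdata create results --data-type FOLDER --folder-path /analysis1/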
This command deletes data for a given project
Usage:
icav2 projectdata delete [path or dataId] [flags]
Flags:
-h, --help help for delete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Download a file/folder. The source can be a data id or a path; the source path for a folder download should end with '*'. For files: the target defines either the local folder into which the download will occur, or a path with a new name for the file. If the file already exists locally, it is overwritten. For folders: if the folder does not exist locally, it will be created automatically. Overwriting an existing folder will need to be acknowledged.
Usage:
icav2 projectdata download [source data id or path] [target path] [flags]
Flags:
--exclude string Regex filter for file names to exclude from download.
--exclude-source-path Indicates that on folder download, the CLI will not create the parent folders of the downloaded folder in ICA on your local machine.
-h, --help help for download
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
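The two examples below (which assume the jq utility is installed) combine projectdata list and projectdata download to fetch multiple files at once: the first downloads every file whose name fuzzily matches VariantCaller-, the second downloads every file in BAM format whose name contains .bam.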
icav2 projectdata list --data-type FILE --file-name VariantCaller- --match-mode FUZZY -o json | jq -r '.items[].id' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done;
icav2 projectdata list --file-name .bam --match-mode FUZZY -o json | jq -r '.items[] | select(.details.format.code == "BAM") | [.id] | @tsv' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done
This command returns the data download url for a given project
Usage:
icav2 projectdata downloadurl [path or data Id] [flags]
Flags:
-h, --help help for downloadurl
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a folder upload session
Usage:
icav2 projectdata folderuploadsession [project id] [data id] [folder upload session id] [flags]
Flags:
-h, --help help for folderuploadsession
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a data element
Usage:
icav2 projectdata get [data id] or [path] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links data to a project. Use the data id, or the path together with the --source-project-id flag, to identify the data.
Usage:
icav2 projectdata link [data id] or [path] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be linked
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the data for a given project. Page-offset can only be used in combination with sort-by. Sorting can be done on
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectdata list [path] [flags]
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--eligible-link Add this flag if output should contain only the data that is eligible for linking on the current project. This flag needs no value, adding it sets the value to true.
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for list
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
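The examples below list the children of the (placeholder) folder /SOURCE/ in a given project; the second restricts the output to folders only.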
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/ --data-type FOLDER
This command mounts the project data as a file system directory for a given project
Usage:
icav2 projectdata mount [mount directory path] [flags]
Flags:
--allow-other Allow other users to access this project
-h, --help help for mount
--list List currently mounted projects
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
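As a sketch, the first command below mounts the current project's data into a local directory (the mount path and project id are placeholders); the second uses the --list flag to show which projects are currently mounted:
icav2 projectdata mount /mnt/ica-project --project-id <project-id>
icav2 projectdata mount --list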
This command moves data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata move [data id] or [path] [flags]
Flags:
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--destination-folder string folder id or path to where you want to move the data, default root of project
-h, --help help for move
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be moved, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches temporary AWS and Rclone credentials for given project data. If a path is given, the project id from the --project-id flag is used. If the flag is not present, the project is taken from the context
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unarchives data for a given project
Usage:
icav2 projectdata unarchive [path or dataId] [flags]
Flags:
-h, --help help for unarchive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This unlinks data from a project. Use the path or id to identify the data.
Usage:
icav2 projectdata unlink [data id] or [path] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unmounts previously mounted project data
Usage:
icav2 projectdata unmount [flags]
Flags:
--directory-path string Set path to unmount
-h, --help help for unmount
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates some details of a data element. Only user/tech tags, the format, and the will-be-archived/will-be-deleted dates can be updated.
Usage:
icav2 projectdata update [data id] or [path] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
--format-code string Format to assign to the data. Only available for files.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
--will-be-archived-at string Time when data will be archived. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
--will-be-deleted-at string Time when data will be deleted. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
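For example, the following hypothetical command assigns a format, schedules deletion and adds a user tag to a file; the path, format code, date and tag are placeholders and the format code is assumed to exist in your project:
icav2 projectdata update /run1/sample1.vcf --format-code VCF --will-be-deleted-at 2026-12-31 --add-user-tag qc-passed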
Upload a file/folder. For files: if the target path does not already exist, it will be created automatically. For folders: overwriting will need to be acknowledged. The argument "icapath" is optional.
Usage:
icav2 projectdata upload [local path] [icapath] [flags]
Flags:
--existing-sample Link to existing sample
-h, --help help for upload
--new-sample Create and link to new sample
--num-workers int number of workers to parallelize. Default calculated based on CPUs available.
--project-id string project ID to set current project context
--sample-description string Set Sample Description for new sample
--sample-id string Set Sample id of existing sample
--sample-name string Set Sample name for new sample or from existing sample
--sample-technical-tag stringArray Set Sample Technical tag for new sample
--sample-user-tag stringArray Set Sample User tag for new sample
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
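Two hedged examples: the first uploads a single file into the (placeholder) ICA path /incoming/, the second uploads a local folder and creates and links a new sample to it; the local paths, ICA paths and sample name are placeholders:
icav2 projectdata upload ./sample1.fastq.gz /incoming/ --project-id <project-id>
icav2 projectdata upload ./run1/ /incoming/run1/ --new-sample --sample-name SAMPLE-001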
This is the root command for actions that act on project pipelines
Usage:
icav2 projectpipelines [command]
Available Commands:
create Create a pipeline
input Retrieve input parameters of pipeline
link Link pipeline to a project
list List of pipelines for a project
start Start a pipeline
unlink Unlink pipeline from a project
Flags:
-h, --help help for projectpipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines [command] --help" for more information about a command.
This command creates a pipeline in the current project
Usage:
icav2 projectpipelines create [command]
Available Commands:
cwl Create a CWL pipeline
cwljson Create a CWL Json pipeline
nextflow Create a Nextflow pipeline
nextflowjson Create a Nextflow Json pipeline
Flags:
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines create [command] --help" for more information about a command.
This command creates a CWL pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwl [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwl
--html-doc string Html documentation for the cwl pipeline
--links string links in json format
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command creates a CWL Json pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwljson [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwljson
--html-doc string Html documentation for the cwl pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command creates a Nextflow pipeline in the current project
Usage:
icav2 projectpipelines create nextflow [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflow
--html-doc string HTML documentation for the Nextflow pipeline
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
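As an illustration, the command below creates a Nextflow pipeline from local files; the pipeline code, file names, description and storage size name are placeholders (the available storage size names can be fetched with 'icav2 analysisstorages list'):
icav2 projectpipelines create nextflow my-nf-pipeline --description "Demo pipeline" --main main.nf --parameter parameters.xml --storage-size Small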
This command creates a Nextflow Json pipeline in the current project
Usage:
icav2 projectpipelines create nextflowjson [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflowjson
--html-doc string HTML documentation for the Nextflow pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command retrieves the input parameters of a pipeline.
Usage:
icav2 projectpipelines input [pipelineId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links a pipeline to a project. Use the code or id to identify the pipeline. If the code is not found, the argument is used as the id.
Usage:
icav2 projectpipelines link [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the pipeline needs to be linked, mandatory when using pipeline code
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines for a given project
Usage:
icav2 projectpipelines list [flags]
Flags:
-h, --help help for list
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command starts a pipeline in the current project
Usage:
icav2 projectpipelines start [command]
Available Commands:
cwl Start a CWL pipeline
cwljson Start a CWL Json pipeline
nextflow Start a Nextflow pipeline
nextflowjson Start a Nextflow Json pipeline
Flags:
-h, --help help for start
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines start [command] --help" for more information about a command.
This command starts a CWL pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start cwl [pipeline id] or [code] [flags]
Flags:
--data-id stringArray Enter data ids as follows: dataId{optional-mount-path}. Add flag multiple times for multiple values. The mount path is optional, can be absolute or relative, and cannot contain curly braces.
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for cwl
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows: parametercode:dataId,dataId{optional-mount-path},dataId,... Add flag multiple times for multiple values. The mount path is optional, can be absolute or relative, and cannot contain curly braces or commas.
--input-json string Analysis input JSON string. JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--type-input string (*) Input type STRUCTURED or JSON
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
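A minimal structured-input start might look as follows; the pipeline id, user reference, parameter code 'reads', data ids and storage size name are placeholders, with the parameter code assumed to match an input defined in the pipeline's parameter XML:
icav2 projectpipelines start cwl <pipeline-id> --user-reference demo-run-001 --type-input STRUCTURED --input reads:fil.aaa,fil.bbb --storage-size Small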
This command starts a CWL Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start cwljson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for cwljson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
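The lines below show example values for the --field and --field-data flags when starting a JSON-based (input form) pipeline; the field codes, values and file ids are placeholders taken from a hypothetical input form.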
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
This command starts a Nextflow pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start nextflow [pipeline id] or [code] [flags]
Flags:
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for nextflow
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows: parametercode:dataId,dataId{optional-mount-path},dataId,... Add flag multiple times for multiple values. The mount path is optional, can be absolute or relative, and cannot contain curly braces or commas.
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command starts a Nextflow Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start nextflowjson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for nextflowjson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
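As with the CWL Json variant, the lines below illustrate --field and --field-data values for a Nextflow Json pipeline start; the field codes, values and file ids are placeholders from a hypothetical input form.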
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
This unlinks a pipeline from a project. Use the code or id to identify the pipeline. If the code is not found, the argument is used as the id.
Usage:
icav2 projectpipelines unlink [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on projects
Usage:
icav2 projects [command]
Available Commands:
create Create a project
enter Enter project context
exit Exit project context
get Get details of a project
list List projects
Flags:
-h, --help help for projects
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projects [command] --help" for more information about a command.
This command creates a project.
Usage:
icav2 projects create [projectname] [flags]
Flags:
--billing-mode string Billing mode, defaults to PROJECT (default "PROJECT")
--data-sharing Indicates whether the data and samples created in this project can be linked to other Projects. This flag needs no value, adding it sets the value to true.
-h, --help help for create
--info string Info about the project
--metadata-model string Id of the metadata model.
--owner string Owner of the project. Default is the current user
--region string Region of the project. When not specified, a default is used when there is only one region; otherwise a choice will be given.
--short-descr string Short description of the project
--storage-bundle string Id of the storage bundle. When not specified, a default is used when there is only one bundle; otherwise a choice will be given.
--storage-config string An optional storage configuration id to have self managed storage.
--storage-config-sub-folder string Required when specifying a storageConfigurationId. The subfolder determines the object prefix of your self managed storage.
--technical-tag stringArray Technical tags for this project. Add flag multiple times for multiple values.
--user-tag stringArray User tags for this project. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
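For example, the following hypothetical command creates a project with data sharing enabled; the project name, description, region id and storage bundle id are placeholders (valid ids can be looked up with 'icav2 regions list' and 'icav2 storagebundles list'):
icav2 projects create my-project --short-descr "Demo project" --region <region-id> --storage-bundle <storage-bundle-id> --data-sharing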
This command sets the project context for future commands
Usage:
icav2 projects enter [projectname] or [project id] [flags]
Flags:
-h, --help help for enter
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command switches the user back to their personal context
Usage:
icav2 projects exit [flags]
Flags:
-h, --help help for exit
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of the current project. If no project id is given, the project id from the config file is used.
Usage:
icav2 projects get [project id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the projects for the current user. Page-offset can only be used in combination with sort-by. Sorting can be done on
- name
- shortDescription
- information
Usage:
icav2 projects list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project samples
Usage:
icav2 projectsamples [command]
Available Commands:
complete Set sample to complete
create Create a sample for a project
delete Delete a sample for a project
get Get details of a sample
link Link data to a sample for a project
list List of samples for a project
listdata List data from given sample
unlink Unlink data from a sample for a project
update Update a sample for a project
Flags:
-h, --help help for projectsamples
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectsamples [command] --help" for more information about a command.
The sample status will be set to 'Available' and a sample completed event will be triggered as well.
Usage:
icav2 projectsamples complete [sampleId] [flags]
Flags:
-h, --help help for complete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command creates a sample for a project. It takes the name of the sample as argument.
Usage:
icav2 projectsamples create [name] [flags]
Flags:
--description string Description
-h, --help help for create
--project-id string project ID to set current project context
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
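For instance, the following hypothetical command creates a sample with a description and tags; the sample name, description and tag values are placeholders:
icav2 projectsamples create SAMPLE-001 --description "Tumor sample" --user-tag cohort-A --technical-tag run-42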
This command deletes a sample from a project. The flags indicate how the sample is deleted; only one flag can be used.
Usage:
icav2 projectsamples delete [sampleId] [flags]
Flags:
--deep Delete the entire sample: sample and linked files will be deleted from your project.
-h, --help help for delete
--mark Mark a sample as deleted.
--unlink Unlinking the sample: sample is deleted and files are unlinked and available again for linking to another sample.
--with-input Delete the sample as well as its input data: sample is deleted from your project, the input files and pipeline output folders are still present in the project but will not be available for linking to a new sample.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a sample using the argument as a name; if nothing is found, the argument is used as an id (UUID).
Usage:
icav2 projectsamples get [sample id] or [name] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command adds data to a project sample. The argument is the id of the project sample
Usage:
icav2 projectsamples link [sampleId] [flags]
Flags:
--data-id stringArray (*) Data id of the data that needs to be linked to the project sample. Add flag multiple times for multiple values.
-h, --help help for link
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
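For example, the command below links two files to a sample; the sample id and data ids are placeholders:
icav2 projectsamples link <sample-id> --data-id fil.aaa --data-id fil.bbb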
This command lists the samples for a given project
Usage:
icav2 projectsamples list [flags]
Flags:
-h, --help help for list
--include-deleted Include the deleted samples in the list. Default set to false.
--project-id string project ID to set current project context
--technical-tag stringArray Technical tags to filter on. Add flag multiple times for multiple values.
--user-tag stringArray User tags to filter on. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the data for a given sample. It only supports offset-based pagination, and default sorting is done on timeCreated. Sorting can be done on
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectsamples listdata [sampleId] [path] [flags]
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for listdata
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items (default "timeCreated Desc")
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command removes data from a project sample. The argument is the id of the project sample
Usage:
icav2 projectsamples unlink [sampleId] [flags]
Flags:
--data-id stringArray (*) Data id of the data that will be removed from the project sample. Add flag multiple times for multiple values.
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates a sample for a project. Name, description, user and technical tags can be updated
Usage:
icav2 projectsamples update [sampleId] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
-h, --help help for update
--name string Name
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on regions
Usage:
icav2 regions [command]
Available Commands:
list list of regions
Flags:
-h, --help help for regions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 regions [command] --help" for more information about a command.
This command lists all the regions
Usage:
icav2 regions list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage bundles
Usage:
icav2 storagebundles [command]
Available Commands:
list list of storage bundles
Flags:
-h, --help help for storagebundles
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storagebundles [command] --help" for more information about a command.
This command lists all the storage bundle ids
Usage:
icav2 storagebundles list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage configurations
Usage:
icav2 storageconfigurations [command]
Available Commands:
list list of storage configurations
Flags:
-h, --help help for storageconfigurations
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storageconfigurations [command] --help" for more information about a command.
This command lists all the storage configurations
Usage:
icav2 storageconfigurations list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on tokens
Usage:
icav2 tokens [command]
Available Commands:
create Create a JWT token
refresh Refresh a JWT token from basic authentication
Flags:
-h, --help help for tokens
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 tokens [command] --help" for more information about a command.
This command creates a JWT token from the API key.
Usage:
icav2 tokens create [flags]
Flags:
-h, --help help for create
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
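For example, a token might be created by passing the API key and server URL through the global flags; both values are placeholders:
icav2 tokens create -k <api-key> -s <server-url>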
This command refreshes a JWT token from basic authentication with grant type JWT-bearer; the token is set with the -t flag.
Usage:
icav2 tokens refresh [flags]
Flags:
-h, --help help for refresh
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
The version of this application
Usage:
icav2 version [flags]
Flags:
-h, --help help for version
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service