When using the applications provided on the platform for diagnostic purposes, it is the responsibility of the user to determine regulatory requirements and to validate for intended use, as appropriate.
The platform is hosted in the regions listed below.
Australia (AU)
Canada (CA)
Germany (EU)
India (IN)
Indonesia (ID)
Japan (JP)
Singapore (SG)
South Korea (KR)
United Kingdom (GB)
United Arab Emirates (AE)
United States (US)
The platform hosts a suite of RESTful HTTP-based application programming interfaces (APIs) to perform operations on data and analysis resources. A web application user interface is hosted alongside the API to deliver interactive visualization of the resources and to enable additional functionality beyond automated analysis and data transfer. Storage and compute costs are presented via usage information in the account console, and a variety of compute resource options can be specified for applications to fine-tune efficiency.
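As a rough, non-authoritative sketch of what calling the API looks like, the request below lists projects using an API Key; the exact endpoint path and the X-API-Key header are assumptions here, so consult the API reference for the authoritative routes and authentication options.
# Hypothetical example: list projects through the REST API (verify path and header in the API reference)
curl -s -H "X-API-Key: $ICA_API_KEY" "https://ica.illumina.com/ica/rest/api/projects"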
The user documentation provides material for learning the basics of interacting with the platform including examples and tutorials. Start with the Get Started documentation to learn more.
Use the search bar on the top right to navigate through the help docs and find specific topics of interest.
If you have any questions, contact Illumina Technical Support by phone or email:
Illumina Technical Support | [email protected] | 1-800-809-4566
For customers outside the United States, Illumina regional Technical Support contact information can be found at www.illumina.com/company/contact-us.html.
To see the current ICA version you are logged in to, click your username found on the top right of the screen and then select About.
To view a list of the products to which you have access, select the 9-dot symbol at the top right of ICA. This will list your products. If you have multiple regional applications for the same product, the region of each is shown in brackets.
The More Tools category presents the following options:
My Illumina Dashboard to monitor instruments, streamline purchases, and keep track of upcoming activities.
Link to the Support Center for additional information and help.
Link to order management, where you can keep track of your current and past orders.
In the Release Notes section of the documentation, posts are made for new versions and deployments of the core platform components.
Get Started gives an overview of how to access and configure ICA with Network settings providing access prerequisites.
To see what is new in the latest version, use the software release notes and the document revision history.
The home section provides an overview of the main ICA sections such as projects (the main work location), bundles (asset packages), logging, metadata (to capture additional information), Docker and tool images (containerised applications) and how to configure (your own) storage.
Projects are your primary work locations which contain your data and samples. Here you will create pipelines and use them for analyses. You configure who can access your project by means of the teams settings. The results can be processed with the help of Base, Bench or Cohorts. Projects can be considered as a binder for your work and information.
This section contains information on how to download, install, authenticate, configure, and use the command line interface as an alternative to the ICA GUI.
Information and tutorials on cloud analysis auto launch.
There is a set of step-by-step Tutorials.
In the Reference section, you can find more information on the API, pricing, security, and compliance.
For an overview of the available subscription tiers and functionality, please refer to this page on the Illumina website.
Flow provides tooling for building and running secondary analysis pipelines. The platform supports analysis workflows constructed using Common Workflow Language (CWL) and Nextflow. Each step of an analysis pipeline executes a containerized application using inputs passed into the pipeline or output from previous steps.
You can configure the following components in Illumina Connected Analytics Flow:
Reference Data: Reference Data for graphical CWL flows.
Pipelines: One or more tools configured to process input data and generate output files.
Analyses: Launched instances of a pipeline with selected input data.
The event log shows an overview of system events with options to search and filter. For every entry, it lists the following:
Event date and time
Category (error, warn or info)
Code
Description
Tenant
Up to 200,000 results will be returned. If your desired records are outside the range of the returned records, please refine the filters or use the search function at the top right.
Export is restricted to the number of entries shown per page. You can use the selector at the bottom to set this to up to 1000 entries per page.
Every Base user has one Snowflake username: ICA_U_<id>
For each user and project/bundle combination, a role is created: ICA_UR_<id>_<name project/bundle>__<id>
This role receives the viewer or contributor role of the project/bundle, depending on their permissions in ICA.
Every project or bundle has a dedicated Snowflake database.
For each database, 2 roles are created:
<project/bundle name>_<id>_VIEWER
<project/bundle name>_<id>_CONTRIBUTOR
The viewer role receives:
REFERENCE and SELECT rights on the tables/views within the project's PUBLIC schema.
Grants on the viewer roles of the bundles linked to the project.
The contributor role receives the following rights on current and future objects in the PUBLIC schema of the project's/bundle's database:
ownership
select, insert, update, delete, truncate and references on tables/views/materialized views
usage on sequences/functions/procedures/file formats
write, read and usage on stages
select on streams
monitor and operate on tasks
It also receives a grant on the viewer role of the project.
For each project (not bundle!), 2 warehouses are created, whose size can be changed in ICA at Projects > your_project > Project Settings > Details.
<projectname>_<id>_QUERY
<projectname>_<id>_LOAD
The CLI supports outputs in table, JSON, and YAML formats. The format is set using the output-format configuration setting through a command line option, environment variable, or configuration file.
Dates are output as UTC times when using JSON/YAML output format and local times when using table format.
To set the output format, use the following setting:
--output-format <string>
json - Outputs in JSON format
yaml - Outputs in YAML format
table - Outputs in tabular format
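For example, assuming the CLI is already authenticated, the same listing command can be rendered in any of these formats; the environment variable shown below is simply derived from the ICAV2_ naming convention described in the configuration section.
# Default table output
icav2 projects list
# JSON output, convenient for scripting
icav2 projects list --output-format json
# Or set the format for the whole shell session instead (derived variable name)
export ICAV2_OUTPUT_FORMAT=yaml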
Illumina® Connected Analytics is a cloud-based software platform intended to be used to manage, analyze, and interpret large volumes of multi-omics data in a secure, scalable, and flexible environment. The versatility of the system allows the platform to be used for a broad range of applications.
Since there is a storage cost associated with the data in your projects, it is good practice to regularly check how much cost is being generated by your projects and evaluate which data can be removed from cloud storage. The instructions provided here will help you quickly determine which data is generating the highest storage costs.
To see how much storage cost is currently being generated for your tenant, you can look at the usage explorer at https://platform.illumina.com/usage/ or, from within ICA, navigate to the 9-dot symbol in the top right next to your name and choose the usage explorer from the menu.
From the usage explorer overview screen, you can see below the graphical representation which projects are incurring the highest storage costs.
When you have determined which projects are incurring the largest storage costs, you can find out which files within that project are taking up the most space. To find the largest files in your project,
Go to Projects > your_project > Data and switch to list view with the list view icon located above your files on the left.
Use the column icon at the top right to add the Size column to your view. You can drag the Size column to the desired position in your list view or use the move left and move right options which appear when you select the three vertical dots.
Select Sort descending to show the largest files first.
Once you have the list sorted like this, you can evaluate whether those large files are still needed, whether they can be sent to archive (Manage > Archive), or whether they can be deleted (Manage > Delete).
ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.
This video is an overview of Illumina Connected Analytics. It walks through a Multi-Omics Cancer workflow.
Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypic and clinical metadata, and molecular features including germline variants, somatic variants, and gene expression.
Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.
Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.
Out-of-the-box statistical analyses including genetic burden tests, GWAS/PheWAS.
Rich public data sets covering key disease areas to enrich private data analysis.
Easy-to-use visualizations for gene prioritization and genetic variation inspection.
ICA Cohorts contains a variety of freely available data sets covering different disease areas and sequencing technologies; a list of currently available data is provided in the documentation.
Please see the Cloud Analysis Auto-Launch documentation for all related content.
The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking):
Sync data between your local computer or server and the project's cloud-based data storage.
Link data between individual projects.



Download links for the CLI can be found at the Release History.
After the file is downloaded, place the CLI in a folder that is included in your $PATH environment variable list of paths, typically /usr/local/bin. Open the Terminal application, navigate to the folder where the downloaded CLI file is located (usually your Downloads folder), and run the following command to copy the CLI file to the appropriate folder. If you do not have write access to your /usr/local/bin folder, then you may use sudo (which requires a password) prior to the cp command. For example:
sudo cp icav2 /usr/local/bin
If you do not have sudo access on your system, contact your administrator for installation. Alternatively, you may place the file in an alternate location and update your $PATH to include this location (see the documentation for your shell to determine how to update this environment variable).
You will also need to make the file executable so that the CLI can run:
sudo chmod a+x /usr/local/bin/icav2
You will likely want to place the CLI in a folder that is included in your $PATH environment variable list of paths. On Windows, you typically want to save your applications in the C:\Program Files folder. If you do not have write access to that folder, open a CMD window in administrative mode (hold down the SHIFT key as you right-click on the CMD application and select "Run as administrator"). Type in the following commands (assuming you have saved icav2.exe in your current folder):
mkdir "C:\Program Files\Illumina"
copy icav2.exe "C:\Program Files\Illumina"
Then make sure that the C:\Program Files\Illumina folder is included in your %path% list of paths. Please do a web search for how to add a path to your %path% system environment variable for your particular version of Windows.
icav2 allows project data to be mounted on a local system. This feature is currently available on Linux and Mac systems only. Although not supported, users have successfully used Windows Subsystem for Linux (WSL) on Windows to run the icav2 projectdata mount command. Please refer to the WSL documentation for installing WSL.
icav2 (>=2.3.0) installed and authenticated in the local system.
For Mac, refer to macFuse.
For other operating systems, refer to OS specific documentation for FUSE driver installation.
A project created on ICA v2 with data in it. If you don't already have a project, please follow the instructions here to create a project.
Identify the project id by running the following command:
% icav2 projects list
ID                                    NAME    OWNER
422d5119-708b-4062-b91b-b398a3371eab  demo    b23f3ea6-9a84-3609-bf1d-19f1ea931fa3
Provide the project id under the "ID" column above to the mount command to mount the project data for the project.
% icav2 projectdata mount mnt --project-id 422d5119-708b-4062-b91b-b398a3371eab
Check the content of the mount.
% ls mnt
sampleX.final.count.tsv
icav2 utilizes the FUSE driver to mount project data, providing both read and write capabilities. However, there are some limitations on the write capabilities that are enforced by the underlying AWS S3 storage. For more information, please refer to this page.
WARNING: Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
You can unmount the project data using the 'unmount' command.
% icav2 projectdata unmount
Project with identifier 422d5119-708b-4062-b91b-b398a3371eab was unmounted from mnt.
In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:
subject:
demographics such as age, sex, ancestry;
phenotypes and diseases;
biometrics such as body height, body mass index, etc.;
pathological classification, tumor stages, etc.;
family and patient medical history;
sample:
sample type such as FFPE,
tissue type,
sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.
You can use these attributes while creating a cohort to define the cases and/or controls that you want to include.
During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.
A metadata sheet will need to contain at least these four columns per row:
Subject ID - identifier referring to individuals; use the column header "SubjectID".
Sample ID - identifier for a sample. Sample IDs need to match the corresponding column header in VCF/GVCFs; each subject can have multiple samples, these need to be specified in individual rows for the same SubjectID; use the column header "SampleID".
Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).
Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
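For illustration only, a minimal metadata sheet using just the four required columns could look like the rows below; the subject and sample identifiers are hypothetical, the values come from the allowed values listed above, and the actual file must use tab characters between columns.
SubjectID    SampleID     DM_Sex    TC
SUBJ-001     SAMPLE-001   Female    Whole genome sequencing
SUBJ-002     SAMPLE-002   Male      RNA-seq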
A description of all attributes and data types currently supported by ICA Cohorts can be found here: ICA_Cohorts_Supported_Attributes.xlsx
You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here: ICA_Cohorts_Example_Metadata.tsv
A list of concepts and diagnoses that cover all public data subjects to easily navigate the new concept code browser for diagnosis can be found here: PublicData_AllConditionsSummarized.xlsx
Once a cluster is started, the cluster manager can be accessed from the workspace node.
Every cluster member has a certain capacity which is determined by the selected Resource model for the cluster member.
The following complex values have been added to the SGE cluster environment and are requestable.
static_cores (default: 1)
static_mem (default: 2G)
These values are used to avoid oversubscription of a node which can result in Out-Of-Memory or unresponsiveness. You need to ensure these limits are not exceeded.
To ensure stability of the system, some headroom is deducted from the total node capacity.
These two values are used by the SGE auto scaler when running in dynamic mode. The SGE auto scaler will summarise all pending jobs and their requested resources to determine the scale up/down operation within the defined range.
Cluster members will remain in the cluster for at least 300 seconds. The auto scaler only executes one scale up/down operation at a time and waits for the cluster to stabilise before taking on a new operation.
Job requests that require more resources than the capacity of the selected resource model will be ignored by the auto scaler and will wait indefinitely.
The operation of the auto scaler can be monitored in the log file /data/logs/sge-scaler.log
Submitting a single job:
qsub -l static_mem=1G -l static_cores=1 /data/myscript.sh
Submitting a job array:
qsub -l static_mem=1G -l static_cores=1 -t 1-100 /data/myscript.sh
Listing all members of the cluster:
qhost
Listing all jobs in the cluster:
qstat -f
Showing the details of a job:
qstat -f -j <jobId>
Deleting a job:
qdel <jobId>
Showing the details of an executed job:
qacct -j <jobId>
SGE command line options and configuration details can be found here.
The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.
ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.
Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.
Circles in the Manhattan plot indicate binary traits, i.e., potential associations between genes and diseases. Triangles indicate quantitative phenotypes with a regression Beta different from zero, and point up or down to depict positive or negative correlation, respectively.
Hovering over a circle/triangle will display the following information:
gene symbol
variant group (see below)
P-value, both raw and FDR-corrected
number of carriers of variants of the given type
number of carriers of variants of any type
regression Beta
For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any one of these five classes, or select All to display all results together.
Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.
Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant, splice_acceptor_variant, start_lost, transcript_ablation, transcript_truncation, exon_loss_variant, gene_fusion, or bidirectional_gene_fusion.
missense_all: all missense variants regardless of their pathogenicity.
missense, PrimateAI optimized (missense_pAI_optimized): only pathogenic missense variants with primateAI score greater than a gene-specific threshold.
missenses and PTVs (missenses_and_ptvs_all): the union of all PTVs, SpliceAI > 0.2 variants and all missense variants regardless of their pathogenicity scores.
all synonymous variants (syn).
To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop down box, which defaults to Whole genome.
To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop down, which defaults to All.
Bench Workspaces use a FUSE driver to mount project data directly into a workspace file system. There are both read and write capabilities with some limitations on write capabilities that are enforced by the underlying AWS S3 storage.
As a user, you are allowed to do the following actions from Bench (when having the correct user permissions compared to the workspace permissions) or through the CLI:
Copy project data
Delete project data
Mount project data (CLI only)
Unmount project data (CLI only)
When you have a running workspace, you will find a file system in Bench under the project folder along with the basic and advanced tutorials. When opening that folder, you will see all the data that resides in your project.
This is a fully mounted version of the project data. Changes in the workspace to project data cannot be undone.
The FUSE driver allows the user to easily copy data from /data/project to the local workspace and vice versa. There is a file size limit of 500 GB per file for the FUSE driver.
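As a minimal sketch (file and folder names are hypothetical), copying between the mounted project data and the local workspace uses ordinary shell commands:
# Copy a project file into the local workspace
cp /data/project/inputs/sample1.vcf.gz /data/
# Copy a locally produced result back into the project as a new file
cp /data/results.tsv /data/project/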
The FUSE driver also allows you to delete data from your project. This differs from earlier Bench behavior, where you worked on a local copy while the original file was kept in your project.
Deleting project data through Bench workspace through the FUSE driver will permanently delete the data in the Project. This action cannot be undone.
Using the FUSE driver through the CLI is not supported for Windows users. Linux users will be able to use the CLI without any further actions, Mac users will need to install the kernel extension from macFuse.
Mount and unmount of data needs to be done through the CLI. In Bench this happens automatically and is not needed anymore.
Do NOT use the cp -f command to copy or move data to a mounted location. This will result in data loss, as data in the destination location will be deleted.
Once a file is written, it cannot be changed! You will not be able to update it in the project location because of the restrictions mentioned above.
Trying to update files or save your notebook in the project folder will typically result in an error such as File Save Error for fusedrivererror.ipynb Invalid response: 500 Internal Server Error.
Some examples of other actions or commands that will not work because of the above mentioned limitations:
Save a jupyter notebook or R script on the /project location
Add/remove a file from an existing zip file
Redirect with append to an existing file e.g. echo "This will not work" >> myTextFile.txt
Rename a file due to the existing association between ICA and AWS
Move files or folders.
Using vi or another editor
A file can be written only sequentially. This is a restriction that comes from the library the FUSE driver uses to store data in AWS. That library supports only sequential writing, random writes are currently not supported. The FUSE driver will detect random writes and the write will fail with an IO error return code. Zip will not work since zip writes a table of contents at the end of the file. Please use gzip.
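For instance (paths are hypothetical), compress with gzip before writing to the mounted project location instead of building a zip archive there:
# zip is expected to fail on the mount because it rewrites its table of contents at the end of the file
# zip /data/project/archive.zip /data/results/*.txt
# Compress locally with gzip, then copy the result to the project as a single sequential write
gzip -k /data/results/run1_metrics.txt
cp /data/results/run1_metrics.txt.gz /data/project/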
Listing data (ls -l) reads data from the platform. The actual data comes from AWS, and there can be a short delay between the writing of the data and the listing being up to date. As a result, a file that has just been written may temporarily appear as a zero-length file, and a file that has just been deleted may still appear in the file list. This is a tradeoff: the FUSE driver caches some information for a limited time, and during that time the information may seem wrong. Note that besides the FUSE driver, the library used by the FUSE driver to implement the raw FUSE protocol and the OS kernel itself may also do caching.
To use a specific file in a jupyter notebook, you will need to use '/data/project/filename'.
This functionality won't work for old workspaces unless you enable the permissions for that old workspace.
ICA provides a tool called Bench for interactive data analysis. This is a sandboxed workspace which runs a Docker image with access to the data and pipelines within a project. The workspace uses Amazon S3 storage and comes with associated processing and provisioning costs. It is therefore best practice not to keep your Bench instances running indefinitely, but to stop them when not in use.
Having access to Bench depends on the following conditions:
Bench needs to be included in your ICA subscription.
The project owner needs to enable Bench for their project.
Individual users of that project need to be given access to Bench.
After creating a project, go to the Projects > your_project > Bench > Workspaces page and click the Enable button. The entitlements you have determine the available resources for your Bench workspaces. If you have multiple entitlements, all the resources of your individual entitlements are taken into account. Once Bench is enabled, users with matching permissions have access to the Bench module in that project.
Once Bench has been enabled for your project, the combination of roles and teams settings determines if a user can access Bench.
Tenant administrators and project owners are always able to access Bench and perform all actions.
The teams settings page at Projects > your_project > Project Settings > Team determines the role for the user/workgroup.
No Access means you have no access to the Bench workspace for that project.
Contributor gives you the right to start and stop the Bench workspace and to access the workspace contents, but not to create or edit the workspace.
Administrator gives you the right to create, edit, delete, start and stop the Bench workspace, and to access the actual workspace contents. In addition, the administrator can also build new derived Bench images and tools.
Finally, a verification is done of your user rights against the required workspace permissions. You will only have access when your user rights meet or exceed the required workspace permissions. The possible required Workspace permissions include:
Upload / Download rights (Download rights are mandatory for technical reasons)
Project Level (No Access / Data Provider / Viewer / Contributor)
Flow (No Access / Viewer / Contributor)
Base (No Access / Viewer / Contributor)
ICA supports running pipelines defined using the Common Workflow Language (CWL).
To specify a compute type for a CWL CommandLineTool, use the ResourceRequirement with a custom namespace.
Refer to the compute types reference for available compute types and sizes.
The ICA Compute Type will be determined automatically based on coresMin/coresMax (CPU) and ramMin/ramMax (Memory) values using a "best fit" strategy to meet the minimum specified requirements (refer to the table).
For example, a ResourceRequirement with ramMin: 10240 and coresMin: 6 would result in a best fit of the standard-large ICA Compute Type request for the tool.
If the specified requirements can not be met by any of the presets, the task will be rejected and failed.
FPGA requirements can not be set by means of CWL ResourceRequirements.
The Machine Profile Resource in the graphical editor will override whatever is set for requirements in the ResourceRequirement.
If no Docker image is specified, Ubuntu will be used as default. Both : and / can be used as separator.
ICA supports overriding workflow requirements at load time using the Command Line Interface (CLI) with JSON input. Please refer to the CWL documentation for more details on the CWL overrides feature.
In ICA you can provide the "override" recipes as a part of the input JSON. The following example uses CWL overrides to change the environment variable requirement at load time.
You can verify the integrity of the data by comparing the hash, which is usually an MD5 (Message Digest Algorithm 5) checksum. This is a common cryptographic hash function that generates a fixed-size, 128-bit hash value from any input data. This hash value is unique to the content of the data, meaning even a slight change in the data will result in a significantly different MD5 checksum. AWS S3 calculates this checksum when data is uploaded and stores it in the ETag (Entity tag).
For files smaller than 16 MB, you can directly retrieve the MD5 checksum using our endpoints. Make an API GET call to the https://ica.illumina.com/ica/rest/api/projects/{projectId}/data/{dataId} endpoint specifying the data Id you want to check and the corresponding project ID. The response you receive will be in JSON format, containing various file metadata. Within the JSON response, look for the objectETag field. This value is the MD5 checksum for the file you have queried. You can compare this checksum with the one you compute locally to ensure file integrity.
This ETag does not change and can be used as a file integrity check even when that file is archived, unarchived and/or copied to another location. Changes to the metadata have no impact on the ETag
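A minimal sketch of this check, assuming the API Key is passed in an X-API-Key header and that jq is available; the placeholders must be replaced with a real project ID and data ID, and the exact location of objectETag in the response should be confirmed against the API reference.
# Retrieve the file metadata and extract the objectETag field wherever it appears in the response
curl -s -H "X-API-Key: $ICA_API_KEY" \
  "https://ica.illumina.com/ica/rest/api/projects/<projectId>/data/<dataId>" | jq -r '.. | .objectETag? // empty'
# Compute the local MD5 checksum and compare it with the ETag above
md5sum /path/to/local/copy.fastq.gz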
For larger files, the process is different due to computation limitations. In these cases, we recommend using a dedicated pipeline on our platform to explicitly calculate the MD5 checksum. Below you can find both a main.nf file and the corresponding XML for a possible Nextflow pipeline to calculate the MD5 checksum for FASTQ files.
You can use samples to group information related to a sample, including input files, output files, and analyses. You can consider samples as creating a binder to collect related information. When you then link that sample to another project, you bring over the empty binder, but not the files contained in it. In that project, you can then add your own data to it.
You can search for samples (excluding their metadata) with the Search button at the top right.
To add a new sample, do as follows.
Select Projects > your_project > Samples.
To add a new sample, select + Create, and then enter a unique name and description for the sample.
To add files to the sample, see the instructions below.
Your sample is added to the Samples page. To view information on the sample, select the sample name in the overview.
You can add files to a sample after creating the sample. Any files that are not currently linked to a sample are listed on the Unlinked Files tab.
To add an unlinked file to a sample, do as follows.
Go to Projects > your_project > Samples > Unlinked files tab.
Select a file or files, and then select one of the following options:
Create sample — Create a new sample that includes the selected files.
Link to sample — Select an existing sample in the project to which you link the file.
Alternatively, you can add unlinked files from the sample details.
Go to Projects > your_project > Samples.
Select your sample to open the details.
Go to the Data tab and select link.
If the data is not in your project, you will need to add it in Projects > your_project > Data
Data can only be linked to a single sample, so once you have linked data to a sample, it will no longer appear in the list of data to choose from.
To remove files from samples,
Go to Projects > your_project > Samples > your_sample > Data
Select the files you want to remove.
Select Unlink.
A Sample can be linked to a project from a separate project to make it available in read-only capacity.
Navigate to the Samples view in the Project
Click the Link button
Select the Sample(s) to link to the project
Click the Link button
Data linked to Samples is not automatically linked to the project. The data must be linked separately from the Data view. Samples also must be available in a complete state in order to be linked.
If you want to remove a sample, select it and use the delete option from the top navigation row. You will be presented a choice of how to handle the data in the sample.
Unlink all data without deleting it.
Delete input data and unlink other data.
Delete all data.
Running a Spark application in a Bench Spark Cluster
The JupyterLab environment is by default configured with 3 additional kernels
PySpark – Local
PySpark – Remote
PySpark – Remote - Dynamic
When one of the above kernels is selected, the spark context is automatically initialised and can be accessed using the sc object.
The PySpark - Local runtime environment launches the spark driver locally on the workspace node and all spark executors are created locally on the same node. It does not require a spark cluster to run and can be used for running smaller spark applications which don’t exceed the capacity of a single node.
The spark configuration can be found at /data/.spark/local/conf/spark-defaults.conf.
The PySpark – Remote runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will not dynamically spin up executors, hence it will not trigger the cluster to auto scale when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf/spark-defaults.conf.
The PySpark – Remote - Dynamic runtime environment launches the spark driver locally on the workspace node and interacts with the Manager for scheduling tasks onto executors created across the Bench Cluster.
This configuration will increase/decrease the required executors, which will result in a cluster that auto scales when using a Dynamic Bench cluster.
The spark configuration can be found at /data/.spark/remote/conf-dynamic/spark-defaults.conf.
Every cluster member has a certain capacity depending on the resource model selected for the member.
A Spark application consists of one or more jobs. Each job consists of one or more stages. Each stage consists of one or more tasks. Tasks are handled by executors, and executors run on a worker (cluster member).
The spark.task.cpus setting defines the number of CPUs needed per task.
The spark.executor.cores and spark.executor.memory settings define the size of a single executor which handles the execution of tasks.
With spark.task.cpus set to 1, spark.executor.cores set to 4, and spark.executor.memory set to 4g (as in the example settings later in this document), an executor can handle 4 tasks concurrently which share a total capacity of 4 GB of memory. Depending on the resource model chosen (e.g. standard-2xlarge), a single cluster member (worker node) is able to run multiple executors concurrently (e.g. 32 cores and 128 GB for 8 concurrent executors on a single cluster member).
The Spark UI can be accessed via the Cluster. The Web Access URL is displayed in the Workspace details page
This Spark UI will register all applications submitted when using one of the Remote Jupyter kernels. It will provide an overview of the registered workers (Cluster members) and the applications running in the Spark cluster.
See the Apache Spark website for more information.
Bench workspaces require setting a Docker image to use as the image for the workspace. Illumina Connected Analytics (ICA) provides a default Docker image with JupyterLab installed.
JupyterLab supports notebook documents (.ipynb). Notebook documents consist of a sequence of cells which may contain executable code, markdown, headers, and raw text.
The JupyterLab Docker image contains a set of environment variables, which are listed in the examples later in this document.
Included in the default JupyterLab Docker image is a Python library with APIs to perform actions in ICA, such as adding data, launching pipelines, and operating on Base tables. The Python library is generated from the ICA API specification.
The ICA Python library API documentation can be found in folder /etc/ica/data/ica_v2_api_docs within the JupyterLab Docker image.
See the tutorials for examples on using the ICA Python library.
The ICA CLI uses an Illumina API Key to authenticate. An Illumina API Key can be acquired through the product dashboard after logging into a domain.
Authenticate using icav2 config set command. The CLI will prompt for an x-api-key value. Input the API Key generated from the product dashboard here. See the example below (replace EXAMPLE_API_KEY with the actual API Key).
The CLI will save the API Key to the config file as an encrypted value.
If you want to overwrite existing environment values, use the command icav2 config set.
To remove an existing configuration/session file, use the command icav2 config reset.
Check the server and confirm you are authenticated using icav2 config get
If during these steps or in the future you need to reset the authentication, you can do so using the command: icav2 config reset
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Reference data properties are located at the main navigation level and consist of the following free text fields.
Types
Species
Reference Sets
Once these are configured, do as follows:
Go to your data in Projects > your_project > Data.
Select the data you want to use as reference data and choose Manage > Use as reference data.
Fill out the configuration screen.
You can see the result at the main navigation level > Reference Data (outside of projects) or in Projects > your_project > Flow > Reference Data.
To use a reference set from within a project, you have first to add it. Select Projects > your_project > Flow > Reference Data > Link. Then select a reference set to add to your project.
Navigate to Reference Data (Not from within a project, but outside of project context, so at the main navigation level).
Select the data set(s) you wish to add to another region and select Copy to another project.
Select a project located in the region where you want to add your reference data.
You can check in which region(s) Reference data is present by opening the Reference set and viewing Copy Details.
Allow a few minutes for new copies to become available before use.
To create a pipeline with reference data, use the CWL Graphical mode: Projects > your_project > Flow > Pipelines > +Create > CWL Graphical. Use the reference data icon instead of the regular input icon. On the right-hand side, use the Reference files submenu to specify the name, the format, and the filters. You can specify the options for an end user to choose from and a default selection. You can select more than one file, but only one at a time (so repeat the process to select multiple reference files). If you only select one reference file, that file will be the only one users can use with your pipeline. The screenshot shows reference data with two options.
If your pipeline was built to give users the option of choosing among multiple input reference files, they will see the option to select among the reference files you configured, under Settings. After clicking the magnifying glass icon the user can select from provided options.
Specifying the ICA compute type for a CWL tool via the custom resources namespace:
requirements:
  ResourceRequirement:
    https://platform.illumina.com/rdf/ica/resources:type: fpga
    https://platform.illumina.com/rdf/ica/resources:size: small
    https://platform.illumina.com/rdf/ica/resources:tier: standard

Example ResourceRequirement using ramMin and coresMin (best fit: standard-large):
requirements:
  ResourceRequirement:
    ramMin: 10240
    coresMin: 6

Example CLI command using CWL overrides to change an environment variable requirement at load time:
icav2 projectpipelines start cwl cli-tutorial --data-id fil.a725a68301ee4e6ad28908da12510c25 --input-json '{
  "ipFQ": {
    "class": "File",
    "path": "test.fastq"
  },
  "cwltool:overrides": {
    "tool-fqTOfa.cwl": {
      "requirements": {
        "EnvVarRequirement": {
          "envDef": {
            "MESSAGE": "override_value"
          }
        }
      }
    }
  }
}' --type-input JSON --user-reference overrides-example

Example main.nf for the MD5 checksum pipeline:
nextflow.enable.dsl = 2

process md5sum {
    container "public.ecr.aws/lts/ubuntu:22.04"
    pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
    publishDir "out", mode: 'symlink'

    input:
    file txt

    output:
    stdout emit: result
    path '*', emit: output

    script:
    txt_file_name = txt.getName()
    id = txt_file_name.takeWhile { it != '.' }
    """
    set -ex
    echo "File: $txt_file_name"
    echo "Sample: $id"
    md5sum ${txt} > ${id}_md5.txt
    """
}

workflow {
    txt_ch = Channel.fromPath(params.in)
    txt_ch.view()
    md5sum(txt_ch).result.view()
}

Corresponding pipeline definition XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="in" format="FASTQ" type="FILE" required="true" multiValue="true">
            <pd:label>Input</pd:label>
            <pd:description>FASTQ files input</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>

Environment variables in the JupyterLab Docker image:
ICA_URL: https://ica.illumina.com/ica (ICA server URL)
ICA_PROJECT (Obsolete): ICA project ID
ICA_PROJECT_UUID: Current ICA project UUID
ICA_SNOWFLAKE_ACCOUNT: ICA Snowflake (Base) Account ID
ICA_SNOWFLAKE_DATABASE: ICA Snowflake (Base) Database ID
ICA_PROJECT_TENANT_NAME: Name of the owning tenant of the project where the workspace is created.
ICA_STARTING_USER_TENANT_NAME: Name of the tenant of the user which last started the workspace.
ICA_COHORTS_URL: URL of the Cohorts web application used to support the Cohorts view

Example icav2 config set session:
icav2 config set
Creating /Users/johngenome/.icav2/config.yaml
Initialize configuration settings [default]
server-url [ica.illumina.com]:
x-api-key : EXAMPLE_API_KEY
output-format (allowed values table,yaml,json defaults to table) :
colormode (allowed values none,dark,light defaults to none) :

Example icav2 config get output:
icav2 config get
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary HASHED_EXAMPLE_API_KEY

Spark task and executor settings referenced in the Bench Spark cluster section:
spark.task.cpus 1
spark.executor.cores 4
spark.executor.memory 4g




The Activity view shows the status and history of long-running activities including Data Transfers, Base Jobs, Base Activity, Bench Activity and Batch Jobs.
The Data Transfers tab shows the status of data uploads and downloads. You can sort, search and filter on various criteria and export the information. Show ongoing transfers (top right) allows you to filter out the completed and failed transfers to focus on current activity.
Transfers with a yellow background indicate that service connector rules have been modified in ways that prevent planned files from being uploaded. Please verify your service connectors to resolve this.
The Base Jobs tab gives an overview of all the actions related to a table or a query that have run or are running (e.g., Copy table, export table, Select * from table, etc.)
The jobs are shown with their:
Creation time: When did the job start
Description: The query or the performed action with some extra information
Type: Which action was taken
Status: Failed or succeeded
Duration: How long the job took
Billed bytes: The number of bytes billed for the job
Failed jobs provide information on why the job failed. Details are accessed by double-clicking the failed job. Jobs in progress can be aborted here.
The Base Activity tab gives an overview of previous results (e.g., Executed query, Succeeded Exporting table, Created table, etc.) Collecting this information can take considerable time. For performance reasons, only the activity of the last month (rolling window) with a limit of 1000 records is shown and available for download as Excel or JSON. To get the data for the last year without limit on the number of records, use the export as file function. No activity data is retained for more than one year.
The activities are shown with:
Start Time: The moment the action was started
Query: The SQL expression.
Status: Failed or succeeded
Duration: How long the job took
User: The user that requested the action
Size: For SELECT queries, the size of the query results is shown. Queries resulting in less than 100Kb of data will be shown with a size of <100K
The Bench Activity tab shows the actions taken on Bench Workspaces in the project.
The activities are shown with:
Workspace: Workspace where the activity took place
Date: Date and time of the activity
User: User who performed the activity
Action: Which activity was performed
The Batch Jobs tab allows users to monitor progress of Batch Jobs in the project. It lists Data Downloads, Sample Creation (double-click entries for details) and Data Linking (double-click entries for details). The (ongoing) Batch Job details are updated each time they are (re)opened, or when the refresh button is selected at the bottom of the details screen. Batch jobs which have a final state such as Failed or Succeeded are removed from the activity list after 7 days.
Which batch jobs are visible depends on the user role:
Project Creator: All batch jobs
Project Collaborator (same tenant): All batch jobs
Project Collaborator (different tenant): Only batch jobs of their own tenant
This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.
Create a new Project to track your study:
Log in to ICA.
Navigate to Projects
Create a new project using the New Project button.
Give your project a name and click Save.
Navigate to the ICA Cohorts module by clicking Cohorts in the left navigation panel.
Click Create Cohort button.
Enter a name for your cohort, like Rare Disease + 1kGP, at the top, to the left of the pencil icon.
From the Public Data Sets list select:
DRAGEN-1kGP
All Rare genetic disease cohorts
Notice that a cohort can also be created based on Technology, Disease Type and Tissue.
Under Selected Conditions in right panel, click on Apply
A new page opens with your cohort in a top-level tab.
Expand Query Details to see the study makeup of your cohort.
A set of 4 Charts will be open by default. If they are not, click Show Charts.
Use the gear icon in the top-right of the Charts pane to change chart settings.
The bottom section is demarcated by 8 tabs (Subjects, Marker Frequency, Genes, GWAS, PheWAS, Correlation, Molecular Breakdown, CNV).
The Subjects tab displays a list of exportable Subject IDs and attributes.
Clicking on a Subject ID link pops up a Subject details page.
A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.
First let’s Hide charts for more visual space.
Click the Genes tab where you need to query a gene to see and interact with results.
Type SCN2A into the Gene search field and select it from autocomplete dropdown options.
The Gene Summary tab now lists information and links to public resources about SCN2A.
Click on the Variants tab to see an interactive Legend and analysis tracks.
The Needle Plot displays gnomAD Allele Frequency for variants in your cohort.
Note that some are in SCN2A conserved protein domains.
In Legend, switch the Plot by option to Sample Count in your cohort.
In Legend, uncheck all Variant Types except Stop gained. Now you should see 7 variants.
Hover over pin heads to see pop-up information about particular variants.
The Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered "likely pathogenic" as cross-species sequence is highly conserved; you often see high conservancy at the functional domains. Points below the 25th percentile may be considered "likely benign".
The Pathogenic variants track displays markers from ClinVar color-coded by variant type. Hover over to see pop-ups with more information.
The Exons track shows mRNA exon boundaries with click and zoom functionality at the ends.
Below the Needle Plot and analysis tracks is a list of "Variants observed in the selected cohort"
Export Gene Variants table icon is above the legend on right side.
Now let's click on the Gene Expression tab to see a Bar chart of 50 normal tissues from GTEx in transcripts per million (TPM). SCN2A is highly expressed in certain Brain tissues, indicating specificity to where good markers for intellectual disability and autism could be expected.
As a final exercise in discovering good markers, click on the tab for Genetic Burden Test. The table here associates Phenotypes with Mutations Observed in each Study selected for our cohort, alongside Mutations Expected, to derive p-values. Given all considerations above, SCN2A is a good marker for intellectual disability (p < 1.465 x 10^-22) and autism (p < 5.290 x 10^-9).
Continue to check the other genes of interest in step 1.
You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons left-navigation tab of the Cohorts module.
Select Cohorts from the left-navigation panel.
Select 2 to 4 cohorts already created. If you have not created any cohorts, see the Create a Cohort documentation.
Click Compare Cohorts in the right-navigation panel.
Note you are now in the Comparisons left-navigation tab of the Cohorts module.
In the Charts Section, if the COHORTS item is not displayed, click the gear icon in the top right and add Cohorts as the first attribute and click Save.
The COHORTS item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.
For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.
You can share a comparison with other team members in the same ICA Project. Please refer to the section on "Sharing a Cohort" in "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which are analogous for sharing comparisons.
Select the Attributes tab
Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.
For example, use the drop-down arrow next to Vital status to view sub-categories and frequencies across selected cohorts.
Select the Genes tab
Search for a gene of interest using its HUGO/HGNC gene symbol
Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see Cohort analysis -> Genes in this online help for more details)
As additional filter options, you can view only those variants that occur in every cohort; that are unique to one cohort; that have been observed in at least two cohorts; or any variant.
Select the Survival Summary tab.
Attribute categories are listed and can be expanded using the down-arrows next to the category names.
Select the drop-down arrow for Therapeutic interventions.
In each subcategory there is a sum of the subject counts across select cohorts.
For each cohort, designated by a color, there is a Subject count and Median survival (years) column.
Type Malignancy in the Search Box and an auto-complete dropdown suggests three different attributes.
Select Synchronous malignancy and the results are automatically opened and highlighted in orange.
Click Survival Comparison tab.
A Kaplan-Meier Curve is rendered based on each cohort.
P-Value Displayed at the top of Survival Comparison indicates whether there is statistically significant variance between survival probabilities over time of any pair of cohorts (CI=0.95).
When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.
Select the Marker Frequency tab.
Select either Gene expression (default), Somatic mutation, or Copy number variation
For gene expression (up- versus down-regulated) and for copy number variation (gain versus loss), Cohorts will display a list of all genes with bidirectional barcharts
For somatic mutations, the barcharts are unidirectional and indicate the percentage of samples with a mutation in each gene per cohort.
Bars are color-coded by cohort, see the accompanying legend.
Each row shows P-value(s) resulting from pairwise comparison of all cohorts. In the case of comparing two cohorts, the numerical P-value will be displayed in the table. In the case of comparing three or more cohorts, the pairwise P-values are shown as a triangular heatmap, with details available as a tooltip.
Select the Correlation tab.
Similar to the single-cohort view (Cohort Analysis | Correlation), choose two clinical attributes and/or genes to compare.
Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.
The ICA CLI is a useful tool for uploading, downloading and viewing information about data stored within ICA projects. If not already authenticated, please see the Authentication section of the CLI help pages. Once the CLI has been authenticated with your account, use the command below to list all projects:
icav2 projects list
The first column of the output (table format, which is default) will show the ID. This is the project ID and will be used in the examples below.
To upload a file called Sample-1_S1_L001_R1_001.fastq.gz to the project, copy the project id and use the command syntax below:
icav2 projectdata upload Sample-1_S1_L001_R1_001.fastq.gz --project-id <project-id>
To verify the file has uploaded, run the following to get a list of all files stored within the specified project:
icav2 projectdata list --project-id <project-id>
This will show a file ID starting with fil. which can then be used to get more information about the file and its attributes:
icav2 projectdata get <file-id> --project-id <project-id>
It is necessary to use --project-id in the above example if you have not entered a specific project context. To enter a project context, use the command below.
icav2 projects enter <project-name or project-id>
This will infer the project id, so that it does not need to be entered into each command.
The ICA CLI can also be used to download files via command line. This can be especially helpful if the download destination is a remote server or HPC cluster that you are logged into from a local machine. To download into the current folder, run the following from the command line terminal:
icav2 projectdata download <file-id> ./
The above assumes you have entered into a project context. If this is not the case, either enter the project that contains the desired data, or be sure to supply the --project-id option in the command.
To fetch temporary AWS credentials for given project data, use the command icav2 projectdata temporarycredentials [path or data Id] [flags]. If the path is provided, the project id from the flag --project-id is used. If the --project-id flag is not present, then the project id is taken from the context. The returned AWS credentials for file or folder upload expire after 36 hours.
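For example (placeholder values shown), temporary credentials can be requested either by path together with --project-id, or by data ID from within an entered project context; the leading-slash path form is an assumption based on how project data paths are displayed elsewhere in this documentation.
# By path, with the project given explicitly
icav2 projectdata temporarycredentials /run1/fastq/ --project-id <project-id>
# By data ID, relying on the current project context
icav2 projectdata temporarycredentials <file-id>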
For information on options such as using the ICA API and AWS CLI to transfer data, visit the Data Transfer Options tutorial.
The ICA CLI accepts configuration settings from multiple places, such as environment variables, configuration file, or passed in as command line arguments. When configuration settings are retrieved, the following precedence is used to determine which setting to apply:
Command line options - Passed in with the command such as --access-token
Environment variables - Stored in system environment variables such as ICAV2_ACCESS_TOKEN
Default config file - Stored by default in the ~/.icav2/config.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.config on Windows
The following global flags are available in the CLI interface:
-t, --access-token string JWT used to call rest service
-h, --help help for icav2
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Environment variables provide another way to specify configuration settings. Variable names align with the command line options with the following modifications:
Upper cased
Prefix ICAV2_
All dashes replaced by underscore
For example, the corresponding environment variable name for the --access-token flag is ICAV2_ACCESS_TOKEN.
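For example, on macOS/Linux the access token and server URL could be supplied through environment variables (names derived from the rules above):
export ICAV2_ACCESS_TOKEN=<your-jwt-token>
export ICAV2_SERVER_URL=ica.illumina.com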
The environment variable ICAV2_ICA_NO_RETRY_RATE_LIMITING allows you to disable the retry mechanism. When it is set to 1, no retries are performed. For any other value, HTTP status code 429 will result in 4 retry attempts:
after 500 milliseconds
after 2 seconds
after 10 seconds
after 30 seconds
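For example, to disable the retry mechanism entirely:
export ICAV2_ICA_NO_RETRY_RATE_LIMITING=1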
Upon launching icav2 for the first time, the configuration yaml file is created and the default config settings are set. Enter an alternative server URL or press enter to leave it as the default. Then enter your API Key and press enter.
After installing the CLI, open a terminal window and enter the icav2 command. This will initialize a default configuration file in the home folder at .icav2/config.yaml.
To reset the configuration, use ./icav2 config reset
Resetting the configuration removes the configuration from the host device and cannot be undone. The configuration needs to be recreated.
Configuration settings are stored in the default configuration file:
access-token: ""
colormode: none
output-format: table
server-url: ica.illumina.com
x-api-key: !!binary SMWV6dEXAMPLE

The file ~/.icav2/.session.ica.yaml on macOS/Linux and C:\Users\USERNAME\.icav2\.session.ica on Windows will contain the access-token and project-id. These are output files and should not be edited as they are automatically updated.
This variable is used to set the API Key.
Command line options - Passed as --x-api-key <your_api_key> or -k <your_api_key>
Environment variables - Stored in system as ICAV2_X_API_KEY
Default config file - Use icav2 config set to update ~/.icav2/config.yaml (macOS/Linux) or C:\Users\USERNAME\.icav2\.config (Windows)
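For example, the same API Key can be supplied in any of the three ways above (placeholder values shown):
# 1. Command line option
icav2 projects list -k <your_api_key>
# 2. Environment variable
export ICAV2_X_API_KEY=<your_api_key>
# 3. Default config file (interactive prompt)
icav2 config set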
If you are a new user, please consult the Illumina Connected Software Registration Guide for detailed guidance on setting up an account and registering a subscription.
The platform requires a provisioned tenant in the Illumina account management (IAM) system with access to the Illumina Connected Analytics (ICA) application. Once a tenant has been provisioned, a tenant administrator will be assigned. The tenant administrator has permission to manage account access including add users, create workgroups, and add additional tenant administrators.
Each tenant is assigned a domain name used to login to the platform. The domain name is used in the login URL to navigate to the appropriate login page in a web browser. The login URL is https://<domain_name>.login.illumina.com with <domain_name> replaced by the actual domain name.
New user accounts can be created
by the tenant administrator by logging in to their domain and navigating to Illumina Account Management under their profile at the top right
or by the user by accessing https://platform.login.illumina.com and selecting the option Don't have an account.
For more details on identity and access management, please see the Illumina Connected Software help.
Once the account has been added to the domain, the tenant administrator may assign registered users to workgroups which bundle users with permission to use the ICA application. Registered users can be made workgroup administrators by tenant administrators or existing workgroup administrators.
If you want to use the command-line interface (CLI) or the Application Programming Interface (API), you can use an API Key as credentials when logging in. API Keys operate similarly to a user name and password combination and must be kept secure and rotated on a regular basis (preferably yearly).
When keys are compromised or no longer in use, they must be revoked. This is done through the domain login URL by navigating to the User menu item on the left and selecting "API Keys", followed by selecting the key and using the trash icon next to it.
API Keys are limited to 10 per user and are managed through the product dashboard after logging in through the domain login URL. See Managing API Keys for more information.
{% hint style="warning" %} For security reasons, do not use accounts with administrator level access to generate API keys. Create a specific CLI user with basic permissions instead. This will minimize the possible impact of compromised keys. {% endhint %}
{% hint style="warning" %} Once the API key generation window is closed, the key contents will not be accessible through the domain login page, so be sure to store it securely for future reference. {% endhint %}
The web application provides a visual user interface (UI) for navigating resources in the platform, managing projects, and extended features beyond the API. To access the web application, navigate to the Illumina Connected Analytics portal.
On the left, you have the navigation bar (1) which will auto-collapse on smaller screens. To collapse it, use the double arrow symbol (2). When collapsed, use the >> symbol to expand it.
The central part (3) of the display shows the item on which you are performing your actions, with a breadcrumb menu (4) to return to the projects overview or a previous level. You can also use your browser's back button to return to the level from which you came.
At the top right, you have icons to refresh contents (5), Illumina product access (6), access to the online help (7), and user information (8).
The command-line interface offers a developer-oriented experience for interacting with the APIs to manage resources and launch analysis workflows. Find instructions for using the command-line interface including download links for your operating system in the CLI documentation.
The HTTP-based application programming interfaces (APIs) are listed in the API Reference section of the documentation. The reference documentation provides the ability to call APIs from the browser page and shows detailed information about the API schemas. HTTP client tooling such as Postman or cURL can be used to make direct calls to the API outside of the browser.
{% hint style="info" %} When accessing the API using the API Reference page or through REST client tools, the Authorization header must be provided with the value set to Bearer <token> where <token> is replaced with a valid JSON Web Token (JWT). For generating a JWT, see JSON Web Token (JWT). {% endhint %}
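For illustration, a direct call with an HTTP client could look like the sketch below; the endpoint path is an example only, so check the API Reference for the exact routes:
curl -H "Authorization: Bearer <token>" \
     -H "accept: application/vnd.illumina.v3+json" \
     "https://ica.illumina.com/ica/rest/api/projects"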
The object data models for resources that are created in the platform include a unique id field for identifying the resource. These fixed machine-readable IDs are used for accessing and modifying the resource through the API or CLI, even if the resource name changes.
Accessing the platform APIs requires authorizing calls using JSON Web Tokens (JWT). A JWT is a standardized trusted claim containing authentication context. This is a primary security mechanism to protect against unauthorized cross-account data access.
A JWT is generated by providing user credentials (API Key or username/password) to the token creation endpoint. Token creation can be performed using the API directly or the CLI.
A storage configuration provides ICA with information to connect to an external cloud storage provider, such as AWS S3. The storage configuration validates that the information provided is correct, and then continuously monitors the integration.
Refer to the following pages for instructions to setup supported external cloud storage providers:
The storage configuration requires credentials to connect to your storage. AWS uses these security credentials to authenticate and authorize your requests. You can enter the credentials at System Settings > Credentials > Create > Storage Credential. Long-term access keys consist of an access key ID and a secret access key, used as a set.
Fill out the following fields:
Type—The type of access credentials. This will usually be AWS user.
Name—Provide a name to easily identify your access key.
Access key ID—The access key you created.
Secret access key—Your related secret access key.
You can share the credentials you own with other users of your tenant. To do so select your credentials at System Settings > Credentials and choose Share.
For more information, refer to the documentation.
In the ICA main navigation, select System Settings > Storage > Create.
Configure the following settings for the storage configuration.
Type—Use the default value, e.g., AWS_S3. Do not change.
Region—Select the region where the bucket is located.
Configuration name—You will use this name when creating volumes that reside in the bucket. The name must be between 3 and 63 characters long.
Description—Here you can provide a description for yourself or other users to identify this storage configuration.
Bucket name—Enter the name of your S3 bucket.
Key prefix—You can provide a key prefix so that only files inside the prefix are accessible. Although this setting is optional, using a key prefix is highly recommended, and it is mandatory when using . The key prefix must end with "/".
If a key prefix is specified, your projects will only have access to that folder and subfolders. For example, using the key prefix folder-1/ ensures that only the data from the folder-1 folder in your S3 bucket is synced with your ICA project. Using prefixes and distinct folders for each ICA project is the recommended configuration as it allows you to use the same S3 bucket for different projects.
Using no key prefix (not recommended) results in syncing all data in your S3 bucket (starting from root level) with your ICA project. Your project will have access to your entire S3 bucket, which prevents that S3 bucket from being used for other ICA projects.
Secret—Select the credentials to associate with this storage configuration. These were created on the Credentials tab.
Server Side Encryption [Optional]—If needed, you can enter the algorithm and key name for server-side encryption processes.
Select Save.
ICA performs a series of steps in the background to verify the connection to your bucket. This can take several minutes. You may need to manually refresh the list to verify that the bucket was successfully configured. Once the storage configuration setup is complete, the configuration can be used while .
With the action Manage > Set as default for region, you select which storage will be used as default storage in a region for new projects of your tenant. Only one storage can be default at a time for a region, so selecting a new storage as default will unselect the previous default. If you do not want to have a default, you can select the default storage and the action will become Unset as default for region.
The System Settings > Credentials > select your credentials > Manage > Share action is used to make the storage available to everyone in your tenant. By default, storage is private per user so that you have complete control over the contents. Once you decide you want to share the storage, simply select it and use the Share action. Take into account that once shared, the storage cannot be unshared. Once your shared storage is used in a project, it can also no longer be deleted.
Filenames beginning with / are not allowed, so be careful when entering full path names. Otherwise the file will end up on S3 but not be visible in ICA. If this happens, access your S3 storage directly and copy the data to where it was intended. If you are using an Illumina-managed S3 storage, submit a support request to delete the erroneous data.
In the ICA main navigation, select System Settings > Storage > select your storage > Manage > Delete. You can then create a new storage configuration to reuse the bucket name and key prefix.
Every 4 hours, ICA verifies the storage configuration and credentials to ensure availability. When an error is detected, ICA attempts to reconnect once every 15 minutes. After 200 consecutive failed connection attempts (50 hours), ICA stops trying to connect.
When you update your credentials, the storage configuration is automatically validated. In addition, when ICA has stopped trying to connect, you can manually trigger revalidation via System Settings > Storage > select your storage > Manage > Validate.
Refer to this for the troubleshooting guide.
ICA supports the following storage classes. Please see the for more information on each:
If you are using , which allows S3 to automatically move files into different cost-effective storage tiers, please do NOT include the Archive and Deep Archive Access tiers, as these are not supported by ICA yet. Instead, you can use lifecycle rules to automatically move files to Archive after 90 days and Deep Archive after 180 days. Lifecycle rules are supported for user-managed buckets.
This section describes how to connect an AWS S3 Bucket with enabled. General instructions for configuring your AWS account to allow ICA to connect to an S3 bucket are found on .
Follow the for how to create S3 bucket with SSE-KMS key.
S3-SSE-KMS must be in the same region as your ICA v2.0 project. See the for more information.
In the "Default encryption" section, enable Server-side encryption and choose AWS Key Management Service key (SSE-KMS). Then select Choose your AWS KMS key.
Once the bucket is set, create a folder with encryption enabled in the bucket that will be linked in the ICA storage configuration. This folder will be connected to ICA as a . Although it is technically possible to use the root folder, this is not recommended as it will cause the S3 bucket to no longer be available for other projects.
Follow the for connecting an S3 bucket to ICA.
In the step :
Add permission to use KMS key by adding kms:Decrypt, kms:Encrypt, and kms:GenerateDataKey
Add the ARN KMS key arn:aws:kms:xxx on the first "Resource"
On Unversioned buckets, the permissions will match the following:
On Versioned OR Suspended buckets, the permissions will match the following:
At the end of the policy setting, there should be 3 permissions listed in the "Summary".
Follow the for how to create a storage configuration in ICA.
On step 3 in the process above, continue with the [Optional] Server Side Encryption to enter the algorithm and key name for server-side encryption processes.
On "Algorithm", input aws:kms
On "Key Name", input the ARN KMS key: arn:aws:kms:xxx
Although "Key prefix" is optional, it is highly recommended to use this and not use the root folder of your S3 bucket. "Key prefix" refers to the folder name in the bucket which you created.
In addition to following the instructions to , the KMS policy must include the following statement for an AWS S3 Bucket with SSE-KMS Encryption (refer to the Role ARN table from the linked page for the ASSUME_ROLE_ARN value):
Bench has the ability to handle containers inside a running workspace. This allows you to install and package software more easily as a container image and provides capabilities to pull and run containers inside a workspace.
Bench offers a container runtime as a service in your running workspace. This allows you to do standardized container operations such as pulling in images from public and private registries, build containers at runtime from a Dockerfile, run containers and eventually publish your container to a registry of choice to be used in different ICA products such as ICA Flow.
The Container Service is accessible from your Bench workspace environment by default.
The container service uses the workspace disk to store any container images you pulled in or created.
To interact with the Container Service, a container remote client CLI is exposed automatically in the /data/.local/bin folder. The Bench workspace environment is preconfigured to automatically detect where the Container Service is made available using environment variables. These environment variables are automatically injected into your environment and are not determined by the Bench Workspace Image.
Use either docker or podman cli to interact with the Container Service. Both are interchangeable and support all the standardized operations commonly known.
To run a container, the first step is to either build a container from a source container or pull in a container from a registry.
A public image registry does not require any form of authentication to pull the container layers.
The following command line example shows how to pull in a commonly known image.
To pull images from a private registry, the Container Service needs to authenticate to the Private Registry.
The following command line example shows how to instruct the Container Service to log in to the private registry.hub.docker.com registry.
Depending on the registry setup, you can publish Container Images with or without authentication. If authentication is required, follow the login procedure described above.
The following command line example shows how to publish a locally available Container Image to a private registry in Dockerhub.
The following example shows how to save a locally available Container Image as a compressed tar archive.
This lets you upload the archive into the private ICA Docker registry.
The following example shows how to list all locally available Container Images
Container Images require storage capacity on the Bench Workspace disk. The capacity is shown when listing the locally available container images. The container Images are persisted on disk and remain available whenever a workspace stops and restarts.
The following example shows how to clean up a locally available Container Image
A Container Image can be instantiated in a Container running inside a Bench Workspace.
By default the workspace disk (/data) will be made available inside the running Container. This lets you access data from the workspace environment.
When running a Container, the default user defined in the Container Image manifest will be used and mapped to the uid and the gid of the user in the running Bench Workspace (uid:1000, gid: 100). This will ensure files created inside the running container on the workspace disk will have the same file ownership permissions.
The following command line example shows how to run an instance of a locally available Container Image as a normal user.
Running a Container as root user maps the uid and gid inside the running Container to the running non-root user in the Bench Workspace. This lets you act as user with uid 0 and gid 0 inside the context of the container.
By enabling this functionality, you can install system level packages inside the context of the Container. This can be leveraged to run tools that require additional system level packages at runtime.
The following command line example shows how to run an instance of a locally available Container as root user and install system level packages
When no specific mapping is defined using the --userns flag, the user in the running Container will be mapped to an undefined uid and gid based on an offset of id 100000. Files created on your workspace disk from the running Container will also use this uid and gid to define the ownership of the file.
Building a Container
To build a Container Image, you need to describe the instructions in a Dockerfile.
This next example builds a local Container Image and tags it as myimage:1.0. The Dockerfile used in this example is:
The following command line example will build the actual Container Image
The platform GUI provides the Project Connector utility which allows data to be linked automatically between projects. This creates a one-way dynamic link for files and samples from source to destination, meaning that additions and deletions of data in the source project also affect the destination project. This differs from moving or copying, which create editable copies of the data. In the destination project, you can delete data which has been moved or copied and unlink data which has been linked.
Select the source project (project that will own the data to be linked) from the Projects page (Projects > your_source_project).
Select Project Settings > Details.
Select Edit
Under Data Sharing ensure the value is set to Yes
Select Save
Select the destination project (the project to which data from the source project will be linked) from the Projects page (Projects > your_destination_project).
From the projects menu, select Project Settings > Connectivity > Project Connector
Select + Create and complete the necessary fields.
Check the box next to Active to ensure the connector will be active.
Name (required) — Provide a unique name for the connector.
Type (required) — Select the data type that will be linked (either File or Sample)
Source Project — Select the source project whose data will be linked.
Filter Expression (optional) — Enter an expression to restrict which files will be linked via the connector (see below)
Tags (optional) — Add tags to restrict what data will be linked via the connector. Any data in the source project with matching tags will be linked to the destination project.
The examples below will restrict linking Files based on the Format field.
Only Files with Format of FASTQ will be linked:
[?($.details.format.code == 'FASTQ')]
Only Files with Format of VCF will be linked:
[?($.details.format.code == 'VCF')]
The examples below will restrict linked Files based on filenames.
Exact match to 'Sample-1_S1_L001_R1_001.fastq.gz':
[?($.details.name == 'Sample-1_S1_L001_R1_001.fastq.gz')]
Ends with '.fastq.gz':
[?($.details.name =~ /.*\.fastq.gz/)]
Starts with 'Sample-':
[?($.details.name =~ /Sample-.*/)]
Contains '_R1_':
[?($.details.name =~ /.*_R1_.*/)]
The examples below will restrict linking Samples based on User Tags and Sample name, respectively.
Only Samples with the User Tag 'WGS-Project-1'
[?('WGS-Project-1' in $.tags.userTags)]
Link a Sample with the name 'BSSH_Sample_1':
[?($.name == 'BSSH_Sample_1')]
You can access the databases and tables within the Base module using Python from your local machine. Once retrieved as, for example, a pandas object, the data can be processed further. In this tutorial, we describe how you could create a Python script which retrieves the data and visualizes it using the Dash framework. The script contains the following parts:
Importing dependencies and variables.
Function to fetch the data from Base table.
Creating and running the Dash app.
This part of the code imports the dependencies which have to be installed on your machine (for example, with pip). Furthermore, it imports the variables API_KEY and PROJECT_ID from a file named config.py.
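For reference, a minimal version of that file could be created as follows (the values are placeholders; the variable names match the import described above):
cat > config.py <<'EOF'
API_KEY = "<your-ica-api-key>"
PROJECT_ID = "<your-ica-project-id>"
EOF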
We will be creating a function called fetch_data to obtain the data from the Base table. It can be broken into several logically separated parts:
Retrieving the token to access the Base table, together with other variables, using the API.
Establishing the connection using the token.
The SQL query itself. In this particular example, we extract values from two tables, Demo_Ingesting_Metrics and BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. The table Demo_Ingesting_Metrics contains various metrics from DRAGEN analyses (e.g., Q30_BASES, the number of bases with quality of at least 30) and metadata in the column ica, which needs to be flattened to access the value Execution_reference. Both tables are joined on this Execution_reference value.
Fetching the data using the connection and the SQL query.
Here is the corresponding snippet:
Once the data is fetched, it is visualized in an app. In this particular example, a scatter plot is presented with END_DATE on the x axis and the column chosen from the dropdown on the y axis.
Now we can create a single Python script called dashboard.py by concatenating the snippets and running it. The dashboard will be accessible in the browser on your machine.
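For example, assuming the dependencies are installed and the snippets have been concatenated into dashboard.py:
pip install dash plotly requests pandas "snowflake-connector-python[pandas]"
python dashboard.py
# Dash serves the app locally; by default it is reachable at http://127.0.0.1:8050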
Non-indexed folders are designed for optimal performance in situations where no file actions are needed. They serve as fast storage in situations like temporary analysis file storage where you don't need access or searches via the GUI to individual files or subfolders within the folder. Think of a non-indexed folder as a data container: you can access the container which holds all the data, but you cannot access the individual data files within the container from the GUI. As non-indexed folders contain data, they count towards your total project storage.
The GUI considers non-indexed folders as a single object. You can access the contents of a non-indexed folder:
as Analysis input/output
in Bench
via the API
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. You can analyze, aggregate, and query data for new insights that can inform and improve diagnostic assay development, clinical trials, patient testing, and patient care. Clinically relevant data needs to be generated and extracted from routine clinical testing, and clinical questions need to be asked across all data and information sources. As a large data store, Base provides a secure and compliant environment to accumulate data, allowing for efficient exploration of the aggregated data. This data consists of test results, patient data, metadata, reference data, consent and QC data.
Base can be used by different types of users for different use cases:
Clinical and Academic Researchers:
Big data storage solution housing all aggregated sample test outcomes
Analyze information by way of a convenient query formalism
Look for signals in combined phenotypic and genotypic data
Analyze QC patterns over large cohorts of patients
Securely share (sub)sets of data with other scientists
Generate reports and analyze trends in a straightforward and simple manner.
Bioinformaticians:
Access, consult, audit, and query all relevant data and QC information for tests run
All accumulated data and accessible pipelines can be used to investigate and improve bioinformatics for clinical analysis
Metadata is captured via automatic pipeline version tracking. For each sample analyzed, information on the individual tools and/or reference files used during processing, the duration of the pipeline, the execution path of the different analytical steps, and, in case of failure, exit codes can be warehoused.
Product Developers and Service Providers:
Better understand the efficiency of kits and tests
Analyze usage, understand QC data trends, improve products
Store and aggregate business intelligence data such as lab identification, consumption patterns and frequency, as well as allow renderings of test result outcome trends and much more.
Data Warehouse Creation: Build a relational database for your Project in which desired data sets can be selected and aggregated. Typical data sets include pipeline output metrics and other suitable data files generated by the ICA platform which can be complemented by additional public (or privately built) databases.
Report and Export: Once created, a data warehouse can be mined using standard database query instructions. All Base data is stored in a structured and easily accessible way. An interface allows for the selection of specific datasets and conditional reporting. All queries can be stored, shared, and re-used in the future. This type of standard functionality supports most expected basic mining operations, such as variant frequency aggregation. All result sets can be downloaded or exported in various standard data formats for integration in other reporting or analytical applications.
Detect Signals and Patterns: Extensive and detailed selection of subsets of patients or samples adhering to any imaginable set of conditions is possible. Users can, for example, group and list subjects based on a combination of (several) specific genetic variants in combination with patient characteristics such as therapeutic (outcome) information. The built-in integration with public datasets allows users to retrieve all relevant publications, or clinically significant information, for a single individual or a group of samples with a specific variant. Virtually any combination of stored sample and patient information allows for detecting signals and patterns with a single query on the big data set.
Profile/Cluster patients: Use and re-analyze patient cohort information based on specific sample or individual characteristics. For instance, you might want to run the next iteration of a clinical trial with only patients that respond. Through integrated and structured consent information allowing for time-boxed use, combined with the capability to group subjects by the use of a simple query, patients can be stratified and combined to export all relevant individuals with their genotypic and phenotypic information to be used for further research.
Share your data: Data sharing is subject to strict ethical and regulatory requirements. Base provides built-in functionality to securely share (sub)sets of your aggregated data with third parties. All data access can be monitored and audited, in this way Base data can be shared with people in and outside of an organization in a compliant and controlled fashion.
Base is a module that can be found in a project. It is shown in the menu bar of the project.
Access to the Base module is controlled by the subscription chosen when registering the account (full and premium subscriptions give access to Base). This happens automatically after the first user logs into the system for that account, so from the moment the account is up and running, the Base module is ready to be enabled.
When a user has created a project, they can go to the Base pages and click the Enable button. From that moment on, every user who has the proper permissions has access to the Base module in that project.
Only the project owner can enable Illumina Connected Analytics Base. Make sure that your subscription for the domain includes Base.
Navigate to Projects > your_project > Base > Tables / Query / Schedule.
Select Enable
Access to the projects and all modules located within the project is provided via the Team page within the project.
The status and history of Base activities and jobs are shown on the page.
Nextflow natively supports the scatter-gather pattern. The initial example uses this pattern by splitting the FASTA file into chunks of records in the task splitSequences, and then processing these chunks in the task reverse.
In this tutorial, we will create a pipeline which will split a TSV file into chunks, sort them, and merge them together.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
First, we present the individual processes. Select +Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Add the corresponding main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitted channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened, so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
Finally, copy and paste the following XML configuration into the XML Configuration tab.
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by a red "*" sign and click the Start button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
In Projects > your_project > Flow > Analyses > your_analysis > Steps you can see that the input file is split into multiple chunks, then these chunks are sorted and merged.
DRAGEN can run in workspaces:
In either FPGA mode (hardware-accelerated) or software mode when using FPGA instances. This can be useful when comparing performance gains from hardware acceleration or to distribute concurrent processes between the FPGA and CPU.
In software mode when using non-FPGA instances.
The DRAGEN command line parameters that specify the location of the license file differ between the two modes.
FPGA mode uses LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"
Software mode uses LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"
DRAGEN software is provided in specific Bench images with names starting with Dragen. For example (available versions may vary):
Dragen 4.4.1 - Minimal provides DRAGEN 4.4.1 and SSH access
Dragen 4.4.6 provides DRAGEN 4.4.6, SSH and JupyterLab.
The instance type is selected during workspace creation (Projects > your_project > Bench > Workspaces). The amount of RAM available on the instance is critical. 256GiB RAM is a safe choice to run DRAGEN in production. All FPGA2 instances offer 256GiB or more of RAM.
When running in Software mode, use (348GiB RAM) or (144 GiB RAM) to ensure enough RAM is available for your runs.
Using an fpga2-medium .
Using a standard-xlarge .
Software mode is activated with the DRAGEN --sw-mode parameter.
The project details page contains the properties of the project, such as the location, owner, storage and linked bundles. This is also the place where you add assets in the form of linked .
The project details are configured during project creation and may be updated by the project owner, entities with the project Administrator role, and tenant administrators.
Click the Edit button at the top of the Details page.
Click the + button under LINKED BUNDLES.
Click on the desired bundle, then click the Link button.
Click Save.
If your linked bundle contained a pipeline, then it will appear in Projects > your_project > Flow > Pipelines.
A project's billing mode determines the strategy for how costs are charged to billable accounts.
For example, with billing mode set to Tenant, if tenant A has created a project resource and uses it in their project, then tenant A will pay for the resource data, compute costs and storage costs of any output they generate within the project. When they share the project with tenant B, then tenant B will pay the compute and storage for the data which they generate in that project. Put simply, in billing mode tenant, the person who generates data pays for the processing and storage of that data, regardless of who owns the actual project.
If the project billing mode is updated after the project has been created, the updated billing mode will only be applied to resources generated after the change.
If you are using your own S3 storage, then the billing mode impacts where collaborator data is stored.
Project billing will result in using your S3 storage for the data.
Tenant billing will result in collaborator data being stored in Illumina-managed storage instead of your own S3 storage.
Tenant billing, when your collaborators also have their own S3 storage and have it set as default, will result in their data being stored in their S3 storage.
Use the Create OAuth access token button to generate an OAuth access token which is valid for 12 hours after generation. This token can be used by Snowflake and Tableau to access the data in your Base databases and tables for this Project.
See for more information.
from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px
from config import API_KEY, PROJECT_ID
import requests
import snowflake.connector
import pandas as pd

def fetch_data():
    # Your data fetching and processing code here
    # retrieving the Base oauth token
    url = 'https://ica.illumina.com/ica/rest/api/projects/' + PROJECT_ID + '/base:connectionDetails'
    # set the API headers
    headers = {
        'X-API-Key': API_KEY,
        'accept': 'application/vnd.illumina.v3+json'
    }
    response = requests.post(url, headers=headers)
    ctx = snowflake.connector.connect(
        account=response.json()['dnsName'].split('.snowflakecomputing.com')[0],
        authenticator='oauth',
        token=response.json()['accessToken'],
        database=response.json()['databaseName'],
        role=response.json()['roleName'],
        warehouse=response.json()['warehouseName']
    )
    cur = ctx.cursor()
    sql = '''
    WITH flattened_Demo_Ingesting_Metrics AS (
        SELECT
            flattened.value::STRING AS execution_reference_Demo_Ingesting_Metrics,
            t1.SAMPLEID,
            t1.VARIANTS_TOTAL_PASS,
            t1.VARIANTS_SNPS_PASS,
            t1.Q30_BASES,
            t1.READS_WITH_MAPQ_3040_PCT
        FROM
            Demo_Ingesting_Metrics t1,
            LATERAL FLATTEN(input => t1.ica) AS flattened
        WHERE
            flattened.key = 'Execution_reference'
    )
    SELECT
        f.execution_reference_Demo_Ingesting_Metrics,
        f.SAMPLEID,
        f.VARIANTS_TOTAL_PASS,
        f.VARIANTS_SNPS_PASS,
        t2."EXECUTION_REFERENCE",
        t2.END_DATE,
        f.Q30_BASES,
        f.READS_WITH_MAPQ_3040_PCT
    FROM
        flattened_Demo_Ingesting_Metrics f
    JOIN
        BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL t2
    ON
        f.execution_reference_Demo_Ingesting_Metrics = t2."EXECUTION_REFERENCE";
    '''
    cur.execute(sql)
    data = cur.fetch_pandas_all()
    return data

df = fetch_data()
app = Dash(__name__)
#server = app.server
app.layout = html.Div([
    html.H1("My Dash Dashboard"),
    html.Div([
        html.Label("Select X-axis:"),
        dcc.Dropdown(
            id='x-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[5]  # default value
        ),
        html.Label("Select Y-axis:"),
        dcc.Dropdown(
            id='y-axis-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns],
            value=df.columns[2]  # default value
        ),
    ]),
    dcc.Graph(id='scatterplot')
])

@callback(
    Output('scatterplot', 'figure'),
    Input('y-axis-dropdown', 'value')
)
def update_graph(value):
    return px.scatter(df, x='END_DATE', y=value, hover_name='SAMPLEID')

if __name__ == '__main__':
    app.run(debug=True)

S3 Standard — Available
S3 Intelligent-Tiering — Available
S3 Express One Zone — Available
S3 Standard-IA — Available
S3 One Zone-IA — Available
S3 Glacier Instant Retrieval — Available
S3 Glacier Flexible Retrieval — Archived
S3 Glacier Deep Archive — Archived
Reduced redundancy (not recommended) — Available
# Pull Container image from Dockerhub
/data $ docker pull alpine:latest

# Pull a Container Image from Dockerhub
/data $ docker login -u <username> registry.hub.docker.com
Password:
Login Succeeded!
/data $ docker pull registry.hub.docker.com/<privateContainerUri>:<tag>

# Push a Container Image to a Private registry in Dockerhub
/data $ docker pull alpine:latest
/data $ docker tag alpine:latest registry.hub.docker.com/<privateContainerUri>:<tag>
/data $ docker push registry.hub.docker.com/<privateContainerUri>:<tag>

# Save a Container Image as a compressed archive
/data $ docker pull alpine:latest
/data $ docker save alpine:latest | bzip2 > /data/alpine_latest.tar.bz2

# List all local available images
/data $ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED       SIZE
docker.io/library/alpine   latest   aded1e1a5b37   3 weeks ago   8.13 MB

# Remove a locally available image
/data $ docker rmi alpine:latest

# Run a Container as a normal user
/data $ docker run -it --rm alpine:latest
~ $ id
uid=1000(ica) gid=100(users) groups=100(users)

# Run a Container as root user
/data $ docker run -it --rm --userns keep-id:uid=0,gid=0 --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=0(root)
/ # apk add rsync
...
/ # rsync
rsync version 3.4.0 protocol version 32
...

# Run a Container as a non-mapped root user
/data $ docker run -it --rm --user 0:0 alpine:latest
/ # id
uid=0(root) gid=0(root) groups=100(users),0(root)
/ # touch /data/myfile
/ #
# Exited the running Container back to the shell in the running Bench Workspace
/data $ ls -al /data/myfile
-rw-r--r-- 1 100000 100000 0 Mar 13 08:27 /data/myfile

FROM alpine:latest
RUN apk add rsync
COPY myfile /root/myfile

# Build a Container image locally
/data $ mkdir /tmp/buildContext
/data $ touch /tmp/buildContext/myFile
/data $ docker build -f /tmp/Dockerfile -t myimage:1.0 /tmp/buildContext
...
/data $ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED              SIZE
docker.io/library/alpine   latest   aded1e1a5b37   3 weeks ago          8.13 MB
localhost/myimage          1.0      06ef92e7544f   About a minute ago   12.1 MB

mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# License Parameters
LICENSE_PARAMS="--lic-instance-id-location /opt/dragen-licence/instance-identity.protected --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 1.5min (10 sec if fpga already programmed)

mkdir /data/demo
cd /data/demo
# download ref
wget --progress=dot:giga https://s3.amazonaws.com/stratus-documentation-us-east-1-public/dragen/reference/Homo_sapiens/hg38.fa -O hg38.fa
# => 0.5min
# Build ht-ref
mkdir ref
dragen --build-hash-table true --ht-reference hg38.fa --output-directory ref
# => 6.5min
# run DRAGEN mapper
FASTQ=/opt/edico/self_test/reads/midsize_chrM.fastq.gz
# Next line is needed to resolve "run the requested pipeline with a pangenome reference, but a linear reference was provided" in DRAGEN (4.4.1 and others). Comment out when encountering ERROR: unrecognised option '--validate-pangenome-reference=false'.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false"
# When using DRAGEN 4.4.6 and later, the line above should be extended with --min-memory 0 to skip the memory check.
DRAGEN_VERSION_SPECIFIC_PARAMS="--validate-pangenome-reference=false --min-memory 0"
# License Parameters
LICENSE_PARAMS="--sw-mode --lic-credentials /opt/dragen-licence/instance-identity.protected/dragen-creds-sw-mode.lic"
mkdir out
dragen -r ref --output-directory out --output-file-prefix out -1 $FASTQ --enable-variant-caller false --RGID x --RGSM y ${LICENSE_PARAMS} ${DRAGEN_VERSION_SPECIFIC_PARAMS}
# => 2min

Name — Name of the project, unique within the tenant. Alphanumerics, underscores, dashes, and spaces are permitted.
Short Description — Short description of the project.
Project Owner — Owner of the project (has Administrator access to the project).
Storage Configuration — Storage configuration to use for data stored in the project.
User Tags — User tags on the project.
Technical Tags — Technical tags on the project.
Metadata Model — Metadata model assigned to the project.
Project Location — Project region where data is stored and pipelines are executed. Options are derived from the Entitlement(s) assigned to the user account, based on the purchased subscription.
Storage Bundle — Storage bundle assigned to the project. Derived from the selected Project Location based on the Entitlement in the purchased subscription.
Billing Mode — Billing mode assigned to the project.
Data sharing — Enables data and samples in the project to be linked to other projects.

Project — All incurred costs will be charged to the tenant of the project owner.
Tenant — Incurred costs will be charged to the tenant of the user owning the project resource (i.e., data, analysis). The only exceptions are Base tables and queries, as well as Bench compute and storage costs, which are always billed to the project owner.
process split {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
input:
path x
output:
path("split.*.tsv")
"""
split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
"""
}

process sort {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
input:
path x
output:
path '*.sorted.tsv'
"""
sort -gk1,1 $x > ${x.baseName}.sorted.tsv
"""
}

process merge {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
cpus 1
memory '512 MB'
publishDir 'out', mode: 'symlink'
input:
path x
output:
path 'merged.tsv'
"""
cat $x > merged.tsv
"""
}

nextflow.enable.dsl=2
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
params.myinput = "test.test"
workflow {
input_ch = Channel.fromPath(params.myinput)
split(input_ch)
sort(split.out.flatten())
merge(sort.out.collect())
}

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="myinput" format="TSV" type="FILE" required="true" multiValue="false">
<pd:label>myinput</pd:label>
<pd:description></pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>
Creation — Yes — You can create non-indexed folders at Projects > your_project > Data > Manage > Create non-indexed folder, or with the /api/projects/{projectId}/data:createNonIndexedFolder endpoint.
Deletion — Yes — You can delete non-indexed folders by selecting them at Projects > your_project > Data > select the folder > Manage > Delete, or with the /api/projects/{projectId}/data/{dataId}:delete endpoint.
Uploading Data — API / Bench / Analysis — Use non-indexed folders as normal folders for Analysis runs and Bench. Different methods are available with the API, such as creating temporary credentials to upload data to S3 or using /api/projects/{projectId}/data:createFileWithUploadUrl.
Downloading Data — Yes — Use non-indexed folders as normal folders for Analysis runs and Bench. Use temporary credentials to list and download data with the API.
Analysis Input/Output — Yes — Non-indexed files can be used as input for an analysis and the non-indexed folder can be used as output location. You will not be able to view the contents of the input and output in the analysis details screen.
Bench — Yes — Non-indexed folders can be used in Bench and the output from Bench can be written to non-indexed folders. Non-indexed folders are accessible across Bench workspaces within a project.
Viewing — No — The folder is a single object; you cannot view the contents.
Linking — Yes — You cannot see non-indexed folder contents.
Copying — No — Prohibited to prevent storage issues.
Moving — No — Prohibited to prevent storage issues.
Managing tags — No — You cannot see non-indexed folder contents.
Managing format — No — You cannot see non-indexed folder contents.
Use as Reference Data — No — You cannot see non-indexed folder contents.
On the Schedule page at Projects > your_project > Base > Schedule, it’s possible to create a job for importing different types of data you have access to into an existing table.
When creating or editing a schedule, Automatic import is performed when the Active box is checked. The job will run at 10 minute intervals. In addition, for both active and inactive schedules, a manual import is performed when selecting the schedule and clicking the »run button.
There are different types of schedules that can be set up:
Files
Metadata
Administrative data.
This type will load the content of specific files from this project into a table. When adding or editing this schedule you can define the following parameters:
Name (required): The name of the scheduled job
Description: Extra information about the schedule
File name pattern (required): In this field, define part of or the full file name, or the tag, that the files you want to import contain. For example, if you want to import files named sample1_reads.txt, sample2_reads.txt, …, you can fill in _reads.txt in this field to have all files that contain _reads.txt imported into the table.
Generated by Pipelines: Only files generated by these selected pipelines are taken into account. When left clear, files from all pipelines are used.
Target Base Table (required): The table to which the information needs to be added. A drop-down list with all created tables is shown. This means the table needs to be created before the schedule can be created.
Write preference (required): Define data handling; whether it can overwrite the data
Data format (required): Select the data format of the files (CSV, TSV, JSON)
Delimiter (required): Indicates which delimiter is used in the delimiter-separated file. If the delimiter is not present in the list, it can be indicated as custom.
Active: The job will run automatically if checked
Custom delimiter: the custom delimiter that is used in the file. You can only enter a delimiter here if custom delimiter is selected.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table
Advanced Options
Encoding (required): Select the encoding of the file.
Null Marker: Specifies a string that represents a null value in a CSV/TSV file.
Quote: The value (single character) that is used to quote data sections in a CSV/TSV file. When this character is encountered at the beginning and end of a field, it will be removed. For example, entering " as quote will remove the quotes from "bunny" and only store the word bunny itself.
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser can not detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
This type will create two new tables: BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL and ICA_PROJECT_SAMPLE_META_DATA. The job will load metadata (added to the samples) into ICA_PROJECT_SAMPLE_META_DATA. The process gathers the metadata from the samples via the data linked to the project and the metadata from the analyses in this project. Furthermore, the scheduler will add provenance data to BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL. This process gathers the execution details of all the analyses in the project: the pipeline name and status, the user reference, the input files (with identifiers), and the settings selected at runtime. This enables you to track the lineage of your data and to identify any potential sources of errors or biases. For example, the following query will count how many times each of the pipelines was executed and sort the result accordingly:
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

To obtain a similar table for the failed runs, you can execute the following SQL query:
SELECT PIPELINE_NAME, COUNT(*) AS Appearances
FROM BB_PROJECT_PIPELINE_EXECUTIONS_DETAIL
WHERE PIPELINE_STATUS = 'Failed'
GROUP BY PIPELINE_NAME
ORDER BY Appearances DESC;

When adding or editing this schedule you can define the following parameters:
Name (required): the name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive meta data fields: in the meta data fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: the job will run automatically if ticked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
This type will automatically create a table and load administrative data into this table. A usage overview of all executions is considered administrative data.
When adding or editing this schedule the following parameters can be defined:
Name (required): The name of this scheduled job.
Description: Extra information about the schedule.
Include sensitive metadata fields: In the metadata fields configuration, fields can be set to sensitive. When checked, those fields will also be added.
Active: The job will run automatically if checked.
Source (Tenant Administrators Only):
Project (default): All administrative data from this project will be added.
Account: All administrative data from every project in the account will be added. When a tenant admin creates the tenant-wide table with administrative data in a project and invites other users to this project, these users will see this table as well.
Schedules can be deleted. Once deleted, they will no longer run, and they will not be shown in the list of schedules.
When clicking the Run button, or Save & Run when editing, the schedule will start the job of importing the configured data in the correct tables. This way the schedule can be run manually. The result of the job can be seen in the tables.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey",
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:kms:xxx",
"arn:aws:s3:::BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:Encrypt",
"kms:GenerateDataKey",
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation",
"s3:ListBucketVersions",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:kms:xxx",
"arn:aws:s3:::BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}

{
"Sid": "AllowCrossAccountAccess",
"Effect": "Allow",
"Principal": {
"AWS": "ASSUME_ROLE_ARN"
},
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": "*"
}



(Comparison table of the move, copy, manual link, and project connector data operations.)
This tutorial aims to guide you through the process of creating CWL tools and pipelines from the very beginning. By following the steps and techniques presented here, you will gain the necessary knowledge and skills to develop your own pipelines or transition existing ones to ICA.
The foundation for every tool in ICA is a Docker image (externally published or created by the user). Here we present how to create your own Docker image for the popular tool FastQC.
Copy the contents displayed below to a text editor and save it as a Dockerfile. Make sure you use an editor which does not add formatting to the file.
FROM centos:7
WORKDIR /usr/local
# DEPENDENCIES
RUN yum -y install java-1.8.0-openjdk wget unzip perl && \
yum clean all && \
rm -rf /var/cache/yum
# INSTALLATION fastqc
RUN wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip --no-check-certificate && \
unzip fastqc_v0.11.9.zip && \
chmod a+rx /usr/local/FastQC/fastqc && rm -rf fastqc_v0.11.9.zip
# Adding FastQC to the PATH
ENV PATH $PATH:/usr/local/FastQC
# DEFAULTS
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENTRYPOINT []
## how to build the docker image
## docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:0 .
## docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:0

Open a terminal window, place this file in a dedicated folder and navigate to this folder location. Then use the following command:
docker build --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .

Check the image has been successfully built:
docker images
Check that the container is functional:
docker run --rm -i -t --entrypoint /bin/bash fastqc-0.11.9:1
Once inside the container, check that the fastqc command is responsive and prints the expected help message. Remember to exit the container.
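As a quicker, non-interactive check, you can also invoke fastqc directly. This is a small sketch that assumes the image tag from the build command above and relies on the empty ENTRYPOINT defined in the Dockerfile:
# Print the FastQC version without opening an interactive shell
docker run --rm fastqc-0.11.9:1 fastqc --version
# Print the full help text
docker run --rm fastqc-0.11.9:1 fastqc --help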
Save a tar of the previously built image locally:
docker save fastqc-0.11.9:1 -o fastqc-0.11.9:1.tar.gz
Upload your docker image .tar to an ICA project (browser upload, Connector, or CLI).
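If you use the CLI route, the upload can be scripted. The sketch below is an assumption-heavy example: it presumes the ICA command-line client (icav2) is installed and authenticated, that a project context has been entered, and that /docker-images/ is a folder you created in your project data; verify the exact subcommand and arguments with the CLI help before relying on it.
# Upload the saved image tar into the project data (check: icav2 projectdata upload --help)
icav2 projectdata upload fastqc-0.11.9:1.tar.gz /docker-images/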
In Projects > your_project > Data, select the uploaded .tar file, then click Manage > Change Format, select DOCKER and Save.
Now, outside of any Project, go to System Settings > Docker Repository and select Create > Image. Select your docker file, fill out a name and version, set the type to tool, and press Select.
While outside of any Project, go to System Settings > Tool Repository and select +Create. Fill in the mandatory fields (Name and Version) and look for a Docker image to link to the tool.
Tool creation in ICA adheres to the CWL standard.
You can create a tool either by pasting the tool definition in the code syntax field on the right, or by using the different tabs to manually define inputs, outputs, arguments, settings, and so on.
In this tutorial we will use the CWL tool syntax method. Paste the following content in the General tab.
#!/usr/bin/env cwl-runner
# (Re)generated by BlueBee Platform
$namespaces:
ilmn-tes: http://platform.illumina.com/rdf/iap/
cwlVersion: cwl:v1.0
class: CommandLineTool
label: FastQC
doc: FastQC aims to provide a simple way to do some quality control checks on raw
sequence data coming from high throughput sequencing pipelines.
inputs:
Fastq1:
type: File
inputBinding:
position: 1
Fastq2:
type:
- File
- 'null'
inputBinding:
position: 3
outputs:
HTML:
type:
type: array
items: File
outputBinding:
glob:
- '*.html'
Zip:
type:
type: array
items: File
outputBinding:
glob:
- '*.zip'
arguments:
- position: 4
prefix: -o
valueFrom: $(runtime.outdir)
- position: 1
prefix: -t
valueFrom: '2'
baseCommand:
- fastqc
Since the user needs to specify the output folder for the FastQC application (-o prefix), we are using the $(runtime.outdir) runtime parameter to point to the designated output folder.
Navigate to Projects > your_project > Flow > Pipelines > +Create > CWL Graphical.
Fill the mandatory fields and click on the Definition tab to open the Graphical Editor.
Expand the Tool Repository menu (lower right) and drag your FastQC tool into the Editor field (center).
Now drag one Input and one Output file icon (on top) into the Editor field as well. Both may be given a Name (editable fields on the right when icon is selected) and need a Format attribute. Set the Input Format to fastq and Output Format to html. Connect both Input and Output files to the matching nodes on the tool itself (mouse over the node, then hold-click and drag to connect).
Press Save, you just created your first FastQC pipeline on ICA!
First make sure you have at least one Fastq file uploaded and/or linked to your Project. You may use Fastq files available in the Bundle.
Navigate to Pipelines and select the pipeline you just created, then press Start analysis
Fill the mandatory fields and click on the + button to open the File Selection dialog box. Select one of the Fastq files available to you.
Press Start analysis on the top right, the platform is now orchestrating the pipeline execution.
Navigate to Projects > your_project > Flow > Analyses and observe that the pipeline execution is now listed and will first be in Status Requested. After a few minutes the Status should change to In Progress and then to Succeeded.
Once this Analysis succeeds click it to enter the Analysis details view. You will see the FastQC HTML output file listed on the Output files tab. Click on the file to open Data Details view. Since it is an HTML file Format there is a View tab that allows visualizing the HTML within the browser.
This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.
Click Create Cohort button.
Select the following studies to add to your cohort:
TCGA – BRCA – Breast Invasive Carcinoma
TCGA – Ovarian Serous Cystadenocarcinoma
Add a Cohort Name = TCGA Breast and Ovarian_1472
Click on Apply.
Expand Show query details to see the study makeup of your cohort.
Charts will be open by default. If not, click Show charts
Use the gear icon in the top-right to change viewable chart settings.
Tip:
Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.
The Subject tab with all Subjects list is displayed below Charts with a link to each Subject by ID and other high-level information, like Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for subject TCGA-E2-A14Y and view the data about this Subject.
Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.
Note: the Subject is a 35-year-old female with vital status and other phenotypes that feed into the Subject attribute selection criteria when creating or editing cohorts.
Click X to close the Subject details.
Click Hide charts to increase interactive landscape.
Click the Marker Frequency tab, then click the Somatic Mutation tab.
Review the gene list and mutation frequencies.
Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).
Do Subjects with PIK3CA mutations have changes in PIK3CA RNA Expression?
Click the Gene Expression tab, search for PIK3CA
PIK3CA RNA is down-regulated in 27% of the subjects relative to normal samples.
Switch from normal to disease Reference where the Subject’s denominator is the median of all disease samples in your cohort.
Note the count of matching vs. total subjects that have PIK3CA up-regulated RNA, which may indicate a distinctive sub-phenotype.
Click directly on PIK3CA gene link in the Gene Expression table.
You are brought to the Gene tab under the Gene Summary sub-tab that lists information and links to public resources about PIK3CA.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot.
There are 87 mutations distributed across the 1068 amino acid sequence, listed below the analysis tracks. These can be exported via the icon into a table.
We know that missense variants can severely disrupt translated protein activity. Deselect all Variant Types except for Missense from the Show Variant Type legend above the needle plot.
Many mutations are in the functional domains of the protein as seen by the colored boxes and labels on the x-axis of the Needle Plot.
Hover over the variant with the highest sample count in the yellow PI3Ka protein domain.
The pop-up shows variant details for the 64 Subjects observed with it: 63 in the Breast Cancer study and 1 in the Ovarian Cancer Study.
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in to the PI3Ka domain to better separate observations.
There are three different missense mutations at this locus changing the wildtype Glutamine at different frequencies to Lysine (64), Glycine (6), or Alanine (2).
The Pathogenic Variant Track shows 7 ClinVar entries for mutations stacked at this locus affecting amino acid 545. Pop-up details with pathogenicity calls, phenotypes, submitter, and a link to the ClinVar entry are shown by hovering over the purple triangles.
Note the Primate AI track and high Primate AI score.
Primate AI track displays Scores for potential missense variants, based on polymorphisms observed in primate species. Points above the dashed line for the 75th percentile may be considered likely pathogenic as cross-species sequence is highly conserved; you often see high conservancy at the functional domains. Points below the 25th percentile may be considered "likely benign".
Click the Expression tab and notice that normal breast and normal ovarian tissue have relatively high PIK3CA RNA expression in GTEx RNAseq tissue data, although the gene is ubiquitously expressed.
You can access the databases and tables within the Base module using snowSQL command-line interface. This is useful for external collaborators who do not have access to ICA core functionalities. In this tutorial we will describe how to obtain the token and use it for accessing the Base module. This tutorial does not cover how to install and configure snowSQL.
Once the Base module has been enabled within a project, the following details are shown in Projects > your_project > Project Settings > Details.
After clicking the button Create OAuth access token, the pop-up authenticator is displayed.
After clicking the button Generate snowSQL command the pop-up authenticator presents the snowSQL command.
Copy the snowSQL command and run it in the console to log in.
You can also get the OAuth access token via API by providing <PROJECT ID> and <YOUR KEY>.
API Call:
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/<PROJECT ID>/base:connectionDetails' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: <YOUR KEY>'
Response:
{
"authenticator": "oauth",
"accessToken": "XXXXXXXXXX",
"dnsName": "use1sf01.us-east-1.snowflakecomputing.com",
"userPrincipalName": "xxxxx",
"databaseName": "xxxxx",
"schemaName": "xxx",
"warehouseName": "xxxxxx",
"roleName": "xxx"
}
Template snowSQL:
snowsql -a use1sf01.us-east-1 -u <userPrincipalName> --authenticator=oauth -r <roleName> -d <databaseName> -s PUBLIC -w <warehouseName> --token="<accessToken>"
Now you can perform a variety of tasks such as:
Querying Data: execute SQL queries against tables, views, and other database objects to retrieve data from the Snowflake data warehouse.
Creating and Managing Database Objects: create tables, views, stored procedures, functions, and other database objects in Snowflake. You can also modify and delete these objects as needed.
Loading Data: load data into Snowflake from various sources such as local files, AWS S3, Azure Blob Storage, or Google Cloud Storage.
Overall, snowSQL CLI provides a powerful and flexible interface to work with Snowflake, allowing external users to manage data warehouse and perform a variety of tasks efficiently and effectively without access to the ICA core.
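To avoid copying tokens by hand, the API call and snowSQL template above can be combined into a small shell script. This is a minimal sketch: it assumes jq is installed and that the account identifier passed to -a is the dnsName with the snowflakecomputing.com suffix removed, as in the template.
#!/bin/bash
# Fetch Base connection details for a project and open a snowSQL session
PROJECT_ID="<PROJECT ID>"
API_KEY="<YOUR KEY>"
RESPONSE=$(curl -s -X POST \
  "https://ica.illumina.com/ica/rest/api/projects/${PROJECT_ID}/base:connectionDetails" \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H "X-API-Key: ${API_KEY}")
ACCOUNT=$(echo "${RESPONSE}" | jq -r '.dnsName' | sed 's/\.snowflakecomputing\.com$//')
snowsql -a "${ACCOUNT}" \
  -u "$(echo "${RESPONSE}" | jq -r '.userPrincipalName')" \
  --authenticator=oauth \
  -r "$(echo "${RESPONSE}" | jq -r '.roleName')" \
  -d "$(echo "${RESPONSE}" | jq -r '.databaseName')" \
  -s PUBLIC \
  -w "$(echo "${RESPONSE}" | jq -r '.warehouseName')" \
  --token="$(echo "${RESPONSE}" | jq -r '.accessToken')"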
Show all tables in the database:
>SHOW TABLES;
Create a new table:
create TABLE demo1(sample_name VARCHAR, count INT);
List records in a table:
SELECT * FROM demo1;
Load data from a file: To load data from a file, you can start by creating a staging area in the internal storage using the following command:
>CREATE STAGE myStage;
You can then upload the local file to the internal storage using the following command:
> PUT file:///path/to/data.tsv @myStage;
You can check if the file was uploaded properly using the LIST command:
> LIST @myStage;
Finally, load data by using the COPY INTO command. The command assumes data.tsv is a tab-delimited file. You can easily modify the following command to import a JSON file by setting TYPE=JSON.
> COPY INTO demo1(sample_name, count) FROM @mystage/data.tsv FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '\t');
Load data from a string: If you have data as a JSON string, you can import the data into the tables using the following commands.
> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name, count)
> SELECT
PARSE_JSON($myJSON_str):sample_name::STRING,
PARSE_JSON($myJSON_str):count::INT
Load data into specific columns: If you want to load only sample_name into the table, you can remove "count" from the column and value lists as below:
> SET myJSON_str = '{"sample_name": "from-json-str", "count": 1}';
> INSERT INTO demo1(sample_name)
SELECT
PARSE_JSON($myJSON_str):sample_name::STRING;
List the views of the database to which you are connected. As shared database and catalogue views are created within the project database, they will be listed. However, it does not show views which are granted via another database, role, or from bundles.
>SHOW VIEWS;
Show grants, both directly on the tables and views, and grants to roles which in turn have grants on tables and views.
>SHOW GRANTS;
Workspaces can have their own dedicated cluster, which consists of a number of nodes. First the workspace node, which is used for interacting with the cluster, is started. Once the workspace node is started, the workspace cluster can be started.
The cluster consists of 2 components
The manager node which orchestrates the workload across the members.
Between 0 and a maximum of 50 member nodes.
Static - A static cluster has a manager node and a static number of members. At start-up of the cluster, the system ensures the predefined number of members are added to the cluster. These nodes will keep running as long as the entire cluster runs. The system will not automatically remove or add nodes depending on the job load. This gives the fastest resource availability, but at additional cost as unused nodes stay active, waiting for work.
Dynamic - A dynamic cluster has a manager node and a dynamic number of members up to a predefined maximum (with a hard limit of 50). Based on the job load, the system will scale the number of members up or down. This saves resources, as only as many member nodes as needed to perform the work are used.
You manage Bench Clusters via the Illumina Connected Analytics UI in Projects > your_project > Bench > Workspaces > your_workspace > Details.
The following settings can be defined for a bench cluster:
Web access
Enable or disable web access to the cluster manager.
Dedicated Cluster Manager
Use a dedicated node for the cluster manager. This means that an entire machine of the type defined at resource model is reserved for your cluster manager. If no dedicated cluster manager is selected, one core per cluster member will be reserved for scheduling. For example, if you have 2 nodes of standard-medium (4 cores) and no dedicated cluster manager, then only 6 (2x3) cores are available to run tasks as each node reserves 1 core for the cluster manager.
Type
Choose between cluster members
Scaling interval
For static, set the number of cluster member nodes (maximum 50), for dynamic, choose the minimum and maximum (up to 50) amount of cluster member nodes.
Resource model
The type of machine on which the cluster member(s) will run. For every cluster member, one of these machines is used as a resource, so be aware of the possible cost impact when running many machines with a high individual cost.
Economy mode
Economy mode uses AWS spot instances. This halves many compute iCredit rates vs standard mode, but jobs may be interrupted. See the resource model documentation for a list of which resource models support economy pricing.
Include ephemeral storage
Select this to create scratch space for your nodes. Enabling it will make the storage size selector appear. The stored data in this space is deleted when the instance is terminated. When you deselect this option, the storage size is 0.
Storage size
How much storage space (1GB - 16 TB) should be reserved per node as dedicated scratch space, available at /scratch
Once the workspace is started, the cluster can be started at Projects > your_project > Bench > Workspaces > your_workspace > Details and the cluster can be stopped without stopping the workspace. Stopping the workspace will also stop all clusters in that workspace.
Data in a bench workspace can be divided into three groups:
Workspace data is accessible in read/write mode and can be accessed from all workspace components (workspace node, cluster manager node, cluster member nodes ) at /data. The size of the workspace data is defined at the creation of the workspace but can be increased when editing a workspace in the Illumina connected analytics UI. This is persistent storage and data remains when a workspace is shut down.
Project data can be accessed from all workspace components at /data/project. Every component will have their own dedicated mount to the project. Depending on the project data permissions you will be able to access it in either Read-Only or Read-Write mode.
Scratch data is available on the cluster members at /scratch and can be used to store intermediate results for a given job dedicated to that member. This is temporary storage, and all data is deleted when a cluster member is removed from the cluster.
All mounts occur in /data/mounts/, see data access and workspace-ctl data.
Managing these mounts is done via the workspace CLI /data/.local/bin/workspace-ctl in the workspace. Every node will have its own dedicated mount.
For fast data access, bench offers a mount solution to expose project data on every component in the workspace. This mount provides read-only access to a given location in the project data and is optimized for high read throughput per single file with concurrent access to files. It will try to utilise the full bandwidth capacity of the node.
All mounts occur in path /data/mounts/
workspace-ctl data get-mounts
For fast read-only access, link folders with the CLI command workspace-ctl data create-mount --mode read-only.
workspace-ctl data create-mount --mount-path /data/mounts/mydata --source /data/project/mydata
workspace-ctl data delete-mount --mount-path /data/mounts/mydata
ICA Cohorts comes front-loaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.
Data set | Samples | Disease area | Reference
1kGP-DRAGEN | 3202 WGS: 2504 original samples plus 698 relateds | Presumed healthy
DDD | 4293 (3664 affected), de novos only | Developmental disorders
EPI4K | 356, de novos only | Epilepsy
ASD Cohorts | 6786 (4266 affected), de novos only | Autism Spectrum disorder
De Ligt et al. | 100, de novos only | Intellectual disability
Homsy et al. | 1213, de novos only | Congenital heart disease (HP:0030680)
Lelieveld et al. | 820, de novos only | Intellectual disability
Rauch et al. | 51, de novos only | Intellectual disability
Rare Genomes Project | 315 WES (112 pedigrees) | Various | https://raregenomes.org/
TCGA | ca. 4200 WES, ca. 4000 RNAseq | 12 tumor types | https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
GEO | RNAseq | Auto-immune disorders, incl. asthma, arthritis, SLE, MS, Crohn's disease, Psoriasis, Sjögren's Syndrome | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Kidney diseases | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Central nervous system diseases | For GEO/GSE study identifiers, please refer to the in-product list of studies
GEO | RNAseq | Parkinson's disease | For GEO/GSE study identifiers, please refer to the in-product list of studies
In order to create a Tool or Bench image, a Docker image is required to run the application in a containerized environment. Illumina Connected Analytics supports both public Docker images and private Docker images uploaded to ICA.
Use Docker images built for x86 architecture or multi-platform images that support x86. You can build Docker images that support both ARM and x86 architectures.
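For example, on an ARM host (such as Apple silicon) you can explicitly target x86 at build time, or produce a multi-platform image with buildx. This is a sketch reusing the FastQC Dockerfile from the tutorial above:
# Build an x86 (amd64) image even when running on an ARM host
docker build --platform linux/amd64 --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .
# Or build a multi-platform image supporting both x86 and ARM
# (multi-platform builds are typically pushed directly to a registry with --push)
docker buildx build --platform linux/amd64,linux/arm64 --file fastqc-0.11.9.Dockerfile --tag fastqc-0.11.9:1 .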
Navigate to System Settings > Docker Repository.
Click Create > External image to add a new external image.
Add your full image URL in the Url field, e.g. docker.io/alpine:latest or registry.hub.docker.com/library/alpine:latest. Docker Name and Version will auto-populate. (Tip: do not add http:// or https:// in your URL)
Do not use :latest when the repository has rate limiting enabled as this interferes with caching and incurs additional data transfer.
(Optional) Complete the Description field.
Click Save.
The newly added image will appear in your Docker Repository list. You can differentiate between internal and external images by looking at the Source column. If this column is not visible, you can add it with the columns icon.
In order to use private images in your tool, you must first upload them as a TAR file.
Navigate to Projects > your_project .
Upload your private image as a TAR file, either by dragging and dropping the file in the Data tab, using the CLI, or a Connector. For more information, please refer to the project Data documentation.
Select your uploaded TAR file and click in the top menu on Manage > Change Format .
Select DOCKER from the drop-down menu and Save.
Navigate to System Settings > Docker Repository (outside of your project).
Click on Create > Image.
Click on the magnifying glass to find your uploaded TAR image file.
Select the appropriate region and if needed, filter on project from the drop-down menus to find your file.
Select that file.
Select the appropriate region, fill in the Docker Name, Version, cluster compatibility (only available for bench images) and whether it is a tool or a bench image and click Save.
The newly added image should appear in your Docker Repository list. Verify it is marked as Available under the Status column to ensure it is ready to be used in your tool or pipeline.
Navigate to System Settings > Docker Repository.
Either
Select the required image(s) and go to Manage > Add Region.
OR double-click on a required image, check the box matching the region you want to add, and select update.
In both cases, allow a few minutes for the image to become available in the new region (the status becomes available in table view).
To remove regions, go to Manage > Remove Region or unselect the regions from the Docker image detail view.
You can download your created Docker images at System Settings > Docker Images > your_Docker_image > Manage > Download.
In order to be able to download Docker images, the following requirements must be met:
The Docker image can not be from an entitled bundle.
Only self-created Docker images can be downloaded.
The Docker image must be an internal image and in status Available.
You can only select a single Docker image at a time for download.
You need a service connector with a download rule to download the Docker image.
Docker image size should be kept as small as practically possible. To this end, it is best practice to compress the image. After compressing and uploading the image, select your uploaded file and click Manage > Change Format in the top menu to change it to Docker format so ICA can recognize the file.
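As an illustration, piping docker save through gzip produces a compressed archive in one step (a sketch reusing the FastQC image from the tutorial above; after uploading, change the file format to DOCKER as described):
# Save and compress the image in a single step
docker save fastqc-0.11.9:1 | gzip > fastqc-0.11.9.tar.gz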
Let's create the pipeline with a JSON input form.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > JSON based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
First, we present the individual processes. Select Nextflow files > + Create and label the file split.nf. Copy and paste the following definition.
Next, select +Create and name the file sort.nf. Copy and paste the following definition.
Select +Create again and label the file merge.nf. Copy and paste the following definition.
Edit the main.nf file by navigating to the Nextflow files > main.nf tab and copying and pasting the following definition.
Here, the operators flatten and collect are used to transform the emitting channels. The flatten operator transforms a channel in such a way that every item of type Collection or Array is flattened so that each single entry is emitted separately by the resulting channel. The collect operator collects all the items emitted by a channel into a List and returns the resulting object as a sole emission.
On the Inputform files tab, edit the inputForm.json to allow selection of a file.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
The onSubmit.js and onRender.js can remain with their default scripts and are just shown here for reference.
Click the Save button to save the changes.
In this tutorial, we will demonstrate how to create and launch a DRAGEN pipeline using the CWL language.
In ICA, CWL pipelines are built using tools developed in CWL. For this tutorial, we will use the "DRAGEN Demo Tool" included with DRAGEN Demo Bundle 3.9.5.
1.) Start by selecting a project at the Projects inventory.
2.) In the details page, select Edit.
3.) In the edit mode of the details page, click the + button in the LINKED BUNDLES section.
4.) In the Add Bundle to Project window: Select the DRAGEN demo tool bundle from the list. Once you have selected the bundle, the Link Bundles button becomes available. Select it to continue.
Tip: You can select multiple bundles using
Ctrl + Left mouse button or Shift + Left mouse button.
5.) In the project details page, the selected bundle will appear under the LINKED BUNDLES section. If you need to remove a bundle, click on the - button. Click Save to save the project with linked bundles.
1.) From the project details page, select Pipelines > CWL
2.) You will be given options to create pipelines using a graphical interface or code. For this tutorial, we will select Graphical.
3.) Once you have selected the Graphical option, you will see a page with multiple tabs. The first tab is the Information page where you enter pipeline information. You can find the details for the different fields of this tab in the help documentation. The following three fields are required for the INFORMATION page.
Code: Provide pipeline name here.
Description: Provide pipeline description here.
Storage size: Select the storage size from the drop-down menu.
4.) The Documentation tab provides options for configuring the HTML description for the tool. The description appears in the tool repository but is excluded from exported CWL definitions.
5.) The Definition tab is used to define the pipeline. When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel (A) and a list of component menus (B). You can find details on each section of the component menu in the steps below.
6.) To build a pipeline, start by selecting Machine PROFILE from the component menu section on the right. All fields are required and are pre-filled with default values. Change them as needed.
The profile Name field will be updated based on the selected Resource. You can change it as needed.
Color assigns the selected color to the tool in the design view to easily identify the machine profile when more than one tool is used in the pipeline.
Tier lets you select the Standard or Economy tier for AWS instances. Standard uses on-demand EC2 instances and Economy uses spot EC2 instances, which cost less but may be interrupted. See the AWS documentation for the differences between the two instance types and the ICA pricing information for the price difference between the two tiers.
Resource lets you choose from various compute resources available. In this case, we are building a DRAGEN pipeline and we will need to select a resource with FPGA in it. Choose from FPGA resources (FPGA Medium/Large) based on your needs.
7.) Once you have selected the Machine Profile for the tool, find your tool from the Tool Repository at the bottom section of the component menu on the right. In this case, we are using the DRAGEN Demo Tool. Drag and drop the tool from the Tool Repository section to the visualization panel.
8.) The dropped tool will show the machine profile color, number of outputs and inputs, and warning to indicate missing parameters, mandatory values, and connections. Selecting the tool in the visualization panel activates the tool (DRAGEN Demo Tool) component menu. On the component menu section, you will find the details of the tool under Tool - DRAGEN Demo Tool. This section lists the inputs, outputs, additional parameters, and the machine profile required for the tool. In this case, the DRAGEN Demo Tool requires three inputs (FASTQ read 1, FASTQ read 2, and a Reference genome). The tool has two outputs (a VCF file and an output folder). The tool also has a mandatory parameter (Output File Prefix). Enter the value for the input parameter (Output File Prefix) in the text box.
9.) The top right corner of the visualization panel has icons to zoom in and out in the visualization panel followed by three icons: ref, in, and out. Based on the type of input/output needed, drag and drop the icons into the visualization area. In this case, we need three inputs (read 1, read 2, and Reference hash table.) and two outputs (VCF file and output folder). Start by dragging and dropping the first input (a). Connect the input to the tool by clicking on the blue dot at the bottom of the input icon and dragging it to the blue dot representing the first input on the tool (b). Select the input icon to activate the input component menu. The input section for the first input lets you enter the Name, Format, and other relevant information based on tool requirements. In this case, for the first input, enter the following information:
Name: FASTQ read 1
Format: FASTQ
Comments: any optional comments
10.) Repeat the step for other inputs. Note that the Reference hash table is treated as the input for the tool rather than Reference files. So, use the input icon instead of the reference icon.
11.) Repeat the process for two outputs by dragging and connecting them to the tool. Note that when connecting output to the tool, you will need to click on the blue dot at the bottom of the tool and drag it to the output.
12.) Select the tool and enter additional parameters. In this case, the tool requires Output File Prefix. Enter demo_ in the text box.
13.) Click on the Save button to save the pipeline. Once saved, you can run it from the Pipelines page under Flow from the left menus as any other pipeline.
In this tutorial, we will be using the example RNASeq pipeline to demonstrate the process of lifting a simple Nextflow pipeline over to ICA.
This approach is applicable in situations where your main.nf file contains all your pipeline logic and illustrates what the liftover process would look like.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click the +Create > Nextflow > XML based button to start creating a Nextflow pipeline.
In the Details tab, add values for the required Code (unique pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
Copy and paste the main.nf content shown below into the Nextflow files > main.nf tab. The following comparison highlights the differences between the original file and the version for deployment in ICA. The main difference is the explicit specification of containers and pods within processes. Additionally, some channels' specifications are modified, and a debugging message is added. When copying and pasting, be sure to remove the text highlighted in red (marked with -) and add the text highlighted in green (marked with +).
In the XML configuration, the input files and settings are specified. For this particular pipeline, you need to specify the transcriptome and the reads folder. Navigate to the XML Configuration tab and paste the following:
Click the Generate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
Go to the Pipelines page from the left navigation pane. Select the pipeline you just created and click Start New Analysis.
Fill in the required fields indicated by red "*" sign and click on Start Analysis button. You can monitor the run from the Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results page.
Developing on the cloud incurs inherent runtime costs due to compute and storage used to execute workflows. Here are a few tips that can facilitate development.
Leverage the cross-platform nature of these workflow languages. Both CWL and Nextflow can be run locally in addition to on ICA. When possible, testing should be performed locally before attempting to run in the cloud. For Nextflow, configuration profiles can be utilized to specify settings to be used either locally or on ICA. An example of advanced usage of a config would be applying the scratch setting to a set of process names (or labels) so that they use the higher performance local scratch storage attached to an instance instead of the shared network disk, as shown in the configuration snippet later in this section.
When trying to test on the cloud, it is oftentimes beneficial to create scripts to automate the deployment and launching / monitoring process. This can be performed either by using the ICA command-line interface or by creating your own scripts integrating with the REST API.
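As an illustration of such a script, the loop below polls an analysis until it reaches a terminal state. This is a hedged sketch rather than an official example: the analyses endpoint path, the status field name, and the status values are assumptions modeled on the API conventions shown elsewhere in this documentation, so verify them against the API reference before use.
#!/bin/bash
# Poll an analysis until it finishes (endpoint, field and status names are assumptions; check the API reference)
PROJECT_ID="<PROJECT ID>"
ANALYSIS_ID="<ANALYSIS ID>"
API_KEY="<YOUR KEY>"
while true; do
  STATUS=$(curl -s \
    "https://ica.illumina.com/ica/rest/api/projects/${PROJECT_ID}/analyses/${ANALYSIS_ID}" \
    -H 'accept: application/vnd.illumina.v3+json' \
    -H "X-API-Key: ${API_KEY}" | jq -r '.status')
  echo "$(date): analysis status is ${STATUS}"
  case "${STATUS}" in
    SUCCEEDED|FAILED|ABORTED) break ;;
  esac
  sleep 60
done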
For scenarios in which instances are terminated prematurely (for example, while using spot instances) without warning, you can implement scripts like the following to retry the job a certain number of times. Adding the following script to 'nextflow.config' enables up to five attempts (four retries) for each job, with increasing delays between each try.
Note: Adding the retry script where it is not needed might introduce additional delays.
When hardening a Nextflow pipeline to handle resource shortages (for example exit code 2147483647), an immediate retry will in most circumstances fail because the resources have not yet been made available. It is best practice to use a retry strategy with an increasing backoff delay, such as the one shown above, allowing the system time to provide the necessary resources.
When publishing your Nextflow pipeline, make sure you have defined a container such as 'public.ecr.aws/lts/ubuntu:22.04' and are not using the default container 'ubuntu:latest'.
To limit potential costs, there is a timeout of 96 hours: if the analysis does not complete within four days, it will go to a 'Failed' state. This time begins to count as soon as the input data is being downloaded. This takes place during the ICA 'Requested' step of the analysis, before going to 'In Progress'. In case parallel tasks are executed, running time is counted once. As an example, let's assume the initial period before being picked up for execution is 10 minutes and consists of the request, queueing and initializing. Then, the data download takes 20 minutes. Next, a task runs on a single node for 25 minutes, followed by 10 minutes of queue time. Finally, three tasks execute simultaneously, each of them taking 25, 28, and 30 minutes, respectively. Upon completion, this is followed by uploading the outputs for one minute. The overall analysis time is then 20 + 25 + 10 + 30 (as the longest task out of three) + 1 = 86 minutes:
If there are no available resources or your project priority is low, the time before download commences will be substantially longer.
By default, Nextflow will not generate the trace report. If you want to enable generating the report, add the section below to your userNextflow.config file.
process split {
cpus 1
memory '512 MB'
input:
path x
output:
path("split.*.tsv")
"""
split -a10 -d -l3 --numeric-suffixes=1 --additional-suffix .tsv ${x} split.
"""
}
process sort {
cpus 1
memory '512 MB'
input:
path x
output:
path '*.sorted.tsv'
"""
sort -gk1,1 $x > ${x.baseName}.sorted.tsv
"""
}
process merge {
cpus 1
memory '512 MB'
publishDir 'out', mode: 'move'
input:
path x
output:
path 'merged.tsv'
"""
cat $x > merged.tsv
"""
}
nextflow.enable.dsl=2
include { sort } from './sort.nf'
include { split } from './split.nf'
include { merge } from './merge.nf'
params.myinput = "test.test"
workflow {
input_ch = Channel.fromPath(params.myinput)
split(input_ch)
sort(split.out.flatten())
merge(sort.out.collect())
}
{
"fields": [
{
"id": "myinput",
"label": "myinput",
"type": "data",
"dataFilter": {
"dataType": "file",
"dataFormat": ["TSV"]
},
"maxValues": 1,
"minValues": 1
}
]
}
function onSubmit(input) {
var validationErrors = [];
return {
'settings': input.settings,
'validationErrors': validationErrors
};
}
function onRender(input) {
var validationErrors = [];
var validationWarnings = [];
if (input.currentAnalysisSettings === null) {
//null the first time, so it can be used in the remainder of the javascript
input.currentAnalysisSettings = input.analysisSettings;
}
switch(input.context) {
case 'Initial': {
renderInitial(input, validationErrors, validationWarnings);
break;
}
case 'FieldChanged': {
renderFieldChanged(input, validationErrors, validationWarnings);
break;
}
case 'Edited': {
renderEdited(input, validationErrors, validationWarnings);
break;
}
default:
return {};
}
return {
'analysisSettings': input.currentAnalysisSettings,
'settingValues': input.settingValues,
'validationErrors': validationErrors,
'validationWarnings': validationWarnings
};
}
function renderInitial(input, validationErrors, validationWarnings) {
}
function renderEdited(input, validationErrors, validationWarnings) {
}
function renderFieldChanged(input, validationErrors, validationWarnings) {
}
function findField(input, fieldId){
var fields = input.currentAnalysisSettings['fields'];
for (var i = 0; i < fields.length; i++){
if (fields[i].id === fieldId) {
return fields[i];
}
}
return null;
}
withName: 'process1|process2|process3' { scratch = '/scratch/' }
withName: 'process3' { stageInMode = 'copy' } // Copy the input files to scratch instead of symlinking to shared network disk
process {
maxRetries = 4
errorStrategy = { sleep(task.attempt * 60000 as long); return 'retry' } // Retry with increasing delay
}
Analysis task | 96 hour limit | Status in ICA
request | 1m (not counted) | status requested
queued | 7m (not counted) | status queued
initializing | 2m (not counted) | status initializing
input download | 20m | status preparing inputs
single task | 25m | status in progress
queue | 10m | status in progress
parallel tasks | 30m | status in progress
generating outputs | 1m | status generating outputs
completed | - | status succeeded
trace.enabled = true
trace.file = '.ica/user/trace-report.txt'
trace.fields = 'task_id,hash,native_id,process,tag,name,status,exit,module,container,cpus,time,disk,memory,attempt,submit,start,complete,duration,realtime,queue,%cpu,%mem,rss,vmem,peak_rss,peak_vmem,rchar,wchar,syscr,syscw,read_bytes,write_bytes,vol_ctxt,inv_ctxt,env,workdir,script,scratch,error_action'













#!/usr/bin/env nextflow
+nextflow.enable.dsl=2
/*
* The following pipeline parameters specify the reference genomes
* and read pairs and can be provided as command line options
*/
-params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
-params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
+println("All input parameters: ${params}")
workflow {
- read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
+ read_pairs_ch = channel.fromFilePairs("${params.reads}/*_{1,2}.fq")
- INDEX(params.transcriptome)
+ INDEX(Channel.fromPath(params.transcriptome))
FASTQC(read_pairs_ch)
QUANT(INDEX.out, read_pairs_ch)
}
process INDEX {
- tag "$transcriptome.simpleName"
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
input:
path transcriptome
output:
path 'index'
script:
"""
salmon index --threads $task.cpus -t $transcriptome -i index
"""
}
process FASTQC {
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
tag "FASTQC on $sample_id"
publishDir params.outdir
input:
tuple val(sample_id), path(reads)
output:
path "fastqc_${sample_id}_logs"
script:
- """
- fastqc.sh "$sample_id" "$reads"
- """
+ """
+ # we need to explicitly specify the output directory for fastqc tool
+ # we are creating one using sample_id variable
+ mkdir fastqc_${sample_id}_logs
+ fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
+ """
}
process QUANT {
+ container 'quay.io/nextflow/rnaseq-nf:v1.1'
+ pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-medium'
tag "$pair_id"
publishDir params.outdir
input:
path index
tuple val(pair_id), path(reads)
output:
path pair_id
script:
"""
salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
"""
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="reads" format="UNKNOWN" type="DIRECTORY" required="true" multiValue="false">
<pd:label>Folder with FASTQ files</pd:label>
<pd:description></pd:description>
</pd:dataInput>
<pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>FASTA</pd:label>
<pd:description>FASTA file</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>






ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:
Project:
Include subjects that are part of any ICA Project that you own or that is shared with you.
Sample:
Sample type such as FFPE.
Tissue type.
Sequencing technology: Whole genome DNA-sequencing, RNAseq, single-cell RNAseq, etc.
Subject:
Subject inclusion by Identifier:
Input a list of Subject Identifiers (up to 100 entries) when defining a cohort.
The Subject Identifier filter is combined using AND logic with any other applied filters.
Within the list of subject identifiers, OR logic is applied (i.e., a subject matches if it is in the provided list).
Demographics such as age, sex, ancestry.
Biometrics such as body height, body mass index.
Family and patient medical history.
Disease:
Phenotypes and diseases from standardized ontologies.
Drug:
Drugs from standardized ontologies along with specific typing, stop reasons, drug administration routes, and time points.
Molecular attributes:
Samples with a somatic mutation in one or multiple, specified genes.
Samples with a germline variant of a specific type in one or multiple, specified genes.
Samples over- or under-expressed in one or multiple, specified genes.
Samples with a copy number gain or loss involving one or multiple, specified genes.
ICA Cohorts currently uses six standard medical ontologies to 1) annotate each subject during ingestion and then to 2) search for subjects: HPO for phenotypes, MeSH, SNOMED-CT, ICD9-CM, ICD10-CM, and OMIM for diseases. By default, any 'type-ahead' search will find matches from all six; and you can limit the search to only the one(s) you prefer. When searching for subjects using names or codes from one of these ontologies, ICA Cohorts will automatically match your query against all the other ontologies, therefore returning subjects that have been ingested using a corresponding entry from another ontology.
In the 'Disease' tab, you can search for subjects diagnosed with one or multiple diseases, as well as phenotypes, in two ways:
Start typing the English name of a disease/phenotype and pick from the suggested matches. Continue typing if your disease/phenotype of interest is not listed initially.
Use the mouse to select the term or navigate to the term in the dropdown using the arrow buttons.
If applicable, the concept hierarchy is shown, with ancestors and immediate children visible.
For diagnostic hierarchies, concept children count and descendant count for each disease name is displayed.
Descendant Count: Displays next to each disease name in the tree hierarchy (e.g., "Disease (10)").
Leaf Nodes: No children count shown for leaf nodes.
Missing Counts: Children count is hidden if unavailable.
Show Term Count: A checkbox below "Age of Onset" that is checked by default. Unchecking it hides the descendant count.
Select a checkbox to include the diagnostic term along with all of its children and descendants.
Expand the categories and select or deselect specific disease concepts.
Paste one or multiple diagnostic codes separated by a pipe (‘|’).
In the 'Drug' tab, you can search for subjects who have a specific medication record:
Start typing the concept name for the drug and pick from suggested matches. Continue typing if the drug is not listed initially.
Paste one or multiple drug concept codes. ICA Cohorts currently uses RxNorm as a standard ontology during ingestion. If multiple concepts are in your instance of ICA Cohorts, they will be listed under 'Concept Ontology.'
'Drug Type' is a static list of qualifiers that denote the specific administration of the drug. For example, where the drug was dispensed.
'Stop Reason' is a static list of attributes describing a reason why a drug was stopped if available in the data ingestion.
'Drug Route' is a static list of attributes that describe the physical route of administration of the drug. For example, Intravenous Route (IV).
In the ‘Measurements’ tab, you can search for vital signs and laboratory test data leveraging LOINC concept codes.
Start typing the English name of the LOINC term, for example, ‘Body height’. A dropdown will appear with matching terms. Use the mouse or down arrows to select the term.
Upon selecting a term, the term will be available for use in a query.
Terms can be added to your query criteria.
For each term, you can set a value `Greater than or equal`, `Equals`, `Less than or equal`, `In range`, or `Any value`.
`Any value` will find any record where there is an entry for the measurement independent of an available value.
Click `Apply` to add your criteria to the query.
Click `Update Now` to update the running count of the Cohort.
Include/Exclude
As attributes are added to the 'Selected Condition' on the right-navigation panel, you can choose to include or exclude the criteria selected.
Select a criterion from 'Subject', 'Disease', and/or 'Molecular' attributes by filling in the appropriate checkbox on the respective attribute selection pages.
When selected, the attribute will appear in the right-navigation panel.
You can use the 'Include' / 'Exclude' dropdown next to the selected attribute to decide if you want to include or exclude subjects and samples matching the attribute.
Note: the semantics of 'Include' work in such a way that a subject needs to match only one or multiple of the 'included' attributes in any given category to be included in the cohort. (Category refers to disease, sex, body height, etc.) For example, if you specify multiple diseases as inclusion criteria, subjects will only need to be diagnosed with one of them. Using 'Exclude', you can exclude any subject who matches one or multiple exclusion criteria; subjects do not have to match all exclusion criteria in the same category to be excluded from the cohort.
Note: This feature is not available on the 'Project' level selections as there is no overlap between subjects in datasets.
Note: Using exclusion criteria does not account for NULL values. For example, if the Super-population 'Europeans' is excluded, subjects will be in your cohort even if they do not contain this data point.
Once you have selected Create Cohort, the above data are organized in tabs such as Project, Subject, Disease, and Molecular. Each tab then contains the aforementioned sections, among others, to help you identify cases and/or controls for further analysis. Navigate through these tabs, or search for an attribute by name to jump directly to that tab and section, and select attributes and values that are relevant to describe your subjects and samples of interest. Assign a new name to the cohort you created, and click Apply to save the cohort.
After creating a Cohort, select the Duplicate icon.
A copy of the Cohort definition will be created and tagged with "_copy".
Deleting a Cohort Definition can be accomplished by clicking the Delete Cohort icon.
This action cannot be undone.
After creating a Cohort, users can set a Cohort bookmark as Shared. By sharing a Cohort, the Cohort will be available to be applied across the project by other users with access to the Project. Cohorts created in a Project are only accessible at the scope of the user who created them. Other users in the project cannot see the cohort created unless they use this sharing functionality.
Create a Cohort using the directions above.
To make the Cohort available to other users in your Project, click the Share icon.
The Share icon will be filled in black and the Shared Status will be turned from Private to Shared.
Other users with access to Cohorts in the Project can now apply the Cohort bookmark to their data in the project.
To unshare the Cohort, click the Share icon.
The icon will turn from black to white, and other users within the project will no longer have access to this cohort definition.
A Shared Cohort can be Archived.
Select a Shared Cohort with a black Shared Cohort icon.
Click the Archive Cohort icon.
You will be asked to confirm this selection.
Upon archiving the Cohort definition, the Cohort will no longer be seen by other users in the Project.
The archived Cohort definition can be unarchived by clicking the Unarchive Cohort icon.
When the Cohort definition is unarchived, it will be visible to all users in the Project.
You can link cohorts data sets to a bundle as follows:
Create or edit a bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select Link Data Set to Bundle.
Select the data set which you want to link and +Select.
After a brief time, the cohorts data set will be linked to your bundle and ICA_BASE_100 will be logged.
If you can not find the cohorts data sets which you want to link, verify if
Your data set is part of a project (Projects > your_project > Cohorts > Data Sets)
This project is set to Data Sharing (Projects > your_project > Project Settings > Details)
You can unlink cohorts data sets from bundles as follows:
Edit the desired bundle at Bundles from the main navigation.
Navigate to Bundles > your_bundle > Cohorts > Data Sets.
Select the cohorts data set which you want to unlink.
Select Unlink Data Set from Bundle.
After a brief time, the cohorts data set will be unlinked from your bundle and ICA_BASE_101 will be logged.
ICA supports running pipelines defined using Nextflow. See this tutorial for an example.
In order to run Nextflow pipelines, the following process-level attributes within the Nextflow definition must be considered.
Nextflow version
20.10.0 (deprecated ⚠️), 22.04.3 (supported ✅), 24.10.2 (default ⭐)
Executor
Kubernetes
The following table shows when each Nextflow version is:
default (⭐) This version will be proposed when creating a new Nextflow pipeline.
supported (✅) This version can be selected when you do not want the default Nextflow version.
deprecated (⚠️) This version can not be selected for new pipelines, but pipelines using this version will still work.
removed (❌). This version can not be selected when creating new pipelines and pipelines using this version will no longer work.
The switchover always happens in the January release of that year.
v20.10.0: ⚠️ / ❌ / ❌ / ❌
v22.04.3: ✅ / ⚠️ / ❌ / ❌
v24.10.2: ⭐ / ⭐ / ✅ / ✅
v25.10.x: ✅ / ⭐ / ✅
v26.10.x: ✅ / ⭐
v27.10.x: ✅
You can select the Nextflow version while building a pipeline as follows:
GUI
Select the Nextflow version at Projects > your_project > flow > pipelines > your_pipeline > Details tab.
API
Select the Nextflow version by setting it in the optional field "pipelineLanguageVersionId".
When not set, a default Nextflow version will be used for the pipeline.
For each compute type, you can choose between the scheduler.illumina.com/lifecycle: standard (default - AWS on-demand) or scheduler.illumina.com/lifecycle: economy (AWS spot instance) tiers.
To specify a compute type for a Nextflow process, use the pod directive within each process. Set the annotation to scheduler.illumina.com/presetSize and the value to the desired compute type. A list of available compute types can be found here. The default compute type, when this directive is not specified, is standard-small (2 CPUs and 8 GB of memory).
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
Inputs are specified via the XML input form or JSON-based input form. The specified code in the XML will correspond to the field in the params object that is available in the workflow. Refer to the tutorial for an example.
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive can be used to symlink (recommended), copy or move data to the correct folder. Data will be uploaded to the ICA project after the pipeline execution completes.
publishDir 'out', mode: 'symlink'
During execution, the Nextflow pipeline runner determines the environment settings based on values passed via the command-line or via a configuration file (see Nextflow Configuration documentation). When creating a Nextflow pipeline, use the nextflow.config tab in the UI (or API) to specify a nextflow configuration file to be used when launching the pipeline.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list.
The following configuration settings will be ignored if provided as they are overridden by the system:
executor.name
executor.queueSize
k8s.namespace
k8s.serviceAccount
k8s.launchDir
k8s.projectDir
k8s.workDir
k8s.storageClaimName
k8s.storageMountPath
trace.enabled
trace.file
trace.fields
timeline.enabled
timeline.file
report.enabled
report.file
dag.enabled
dag.file
Setting a timeout of between 2 and 4 times the expected processing time with the time directive for processes or tasks will ensure that no stuck processes remain indefinitely. Stuck processes keep incurring costs for the occupied resources, so if the process can not complete within that timespan, it is safer and more economical to end the process and retry.
When you want to use a sample sheet with references to files as Nextflow input, add an extra input to the pipeline. This extra input lets the user select the samplesheet-mentioned files from their project. At run time, those files will get staged in the working directory, and when Nextflow parses the samplesheet and looks for those files without paths, it will find them there. You can not use file paths in a sample sheet without selecting the files in the input form because files are only passed as file/folder ids in the API payload when the analysis is launched.
You can include public data such as HTTP URLs because Nextflow is able to download those. Nextflow is also able to download publicly accessible S3 URLs (s3://...). You can not use Illumina's urn:ilmn:ica:region:... structure.
Illumina Connected Analytics allows you to create and assign metadata to capture additional information about samples.
Every tenant has a root metadata model that is accessible to all projects of that tenant. This allows an organization to collect the same piece of information, such as an ID number, for every sample in every project. Within this root model, you can configure multiple metadata submodels, even at different levels. These submodels inherit all fields and groups from their parent models.
Illumina recommends that you limit the number of fields or field groups you add to the root model. Fields can have various types containing single or multiple values, and field groups contain fields that belong together, such as all fields related to quality metrics. Any misconfigured items in the root model will carry over into all other tenant metadata models. Once a root model is published, the fields and groups that are defined within it cannot be deleted; only more fields can be added.
Do not use dots (.) in the metadata model names, fieldgroup names or field names as this can cause issues with field data.
When configuring a project, you can assign a published metadata model for all samples in the project. This metadata model can be any published metadata model in your tenant such as the root model, or one of the lower level submodels. When a metadata model is selected for a project, all fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
Metadata gives information about a sample and can be provided by the user, the pipeline and the API. There are 2 general categories of metadata models: Project Metadata models and Pipeline Metadata models . Both models contain metadata fields and groups.
The project metadata model is specific per tenant. A Project metadata model has metadata linked to a specific project. Values are known upfront, general information is required for each sample of a specific project, and it may include general mandatory company information.
The pipeline metadata model is linked to a pipeline, not to a project and can be shared across tenants. Values are populated during pipeline execution and it requires an output file with the name 'metadata.response.json'.
Each sample can have multiple metadata models. When you link a project metadata model to your project, you will see its groups and fields present on each sample. The root model from that tenant will be present as well, because every metadata model inherits the groups and fields specified in its parent metadata model(s). When a pipeline that contains a metadata model is executed with a single sample, its groups and fields will also be present for each analysis resulting from that pipeline execution.
In the main navigation, go to System Settings > Metadata Models. Here you will see the root metadata model and any underlying sub-metadata models. To create a new submodel, select +Create at the bottom of the screen.
The new metadata model screen shows an overview of all the higher-level metadata models. Use the down arrow next to a model name to expand it for more information.
For your new metadata model, add a unique name and optional description. Once this is done, start adding the metadata fields with the +Add button. The field type will determine the parameters which you can configure.
To edit your metadata model later on, select it and choose Manage > Edit. Keep in mind that fields can be added, but not removed once the model is published.
Text: Free text.
Keyword: Automatically complete the value based on already used values.
Numeric: Only numbers.
Boolean: True or false; cannot be multiple value.
Date: e.g. 23/02/2022.
Date time: e.g. 23/02/2022 11:43:53, saved in UTC.
Enumeration: Select a value from a list. Enter the values in the options field which appears when you have selected the enumeration type.
Field Group: Groups fields. Once you have chosen this, the +Add group field button becomes available to add fields to this group.
The following properties can be selected for groups and fields:
Required: The pipeline cannot be started with this sample until the required group/field is filled in.
Sensitive: Values of this group/field are only visible to project users of the project's own tenant. When a sample is shared across tenants, these fields will not be visible.
Multi value: This group/field can consist of multiple (grouped) values.
Filled by pipeline: Fields that need to be filled by a pipeline should be part of the same group. This group will automatically be multiple value, and its values become available after pipeline execution. This property is only available for the Field Group type.
If you have fields that are filled by the pipeline, you can generate an example JSON structure indicating what the JSON in an analysis output file named metadata.response.json should look like in order to fill in the metadata fields of this model. Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON. Only fields in groups marked as Filled by pipeline are included.
Fields cannot be both required and filled by pipeline at the same time.
Newly created and updated metadata models are not available for use within the tenant until the metadata model is published. Once a metadata model is published, fields and field groups cannot be deleted, but the names and descriptions for fields and field groups can be edited. A model can be published after verifying all parent models are published first. To publish your model, select System Settings > Metadata Models > your_metadata_model > Manage > Publish.
If a published metadata model is no longer needed, you can retire the model (except the root model). Once a model is retired, it can be published again in case you would need to reactivate it.
First, check if the model contains any submodels. A model cannot be retired if it contains any published submodels.
When you are certain you want to retire a model and all submodels are retired, select System Settings > Metadata Models > your_metadata_model > Manage > Retire Metadata Model.
To add metadata to your samples, you first need to assign a metadata model to your project.
Go to Projects > your_project > Project Settings > Details.
Select Edit.
From the Metadata Model drop-down list, select the metadata model you want to use for the project.
Select Save. All fields configured for the metadata model, and all fields in any parent models are applied to the samples in the project.
If you have a metadata model assigned to your project, you can manually fill out the defined metadata of the samples in your project:
Go to Projects > your_project > Samples > your_sample.
Click your sample to open the sample details and choose Edit Sample.
Enter all metadata information as it applies to the selected sample. All required metadata fields must be populated or the pipeline will not be able to start.
Select Save
To fill metadata by pipeline executions, a pipeline model must be created.
In the main navigation, go to Projects > your_project > Flow > Pipelines > your_pipeline.
Click on your pipeline to open the pipeline details and choose Edit.
Create or edit your model under the Metadata Model tab. Field groups should be used when configuring metadata fields that are filled by a pipeline. These fields should be part of the same field group and be configured with the Multiple Value setting enabled.
In order for your pipeline to fill the metadata model, an output file with the name metadata.response.json must be generated. After adding your group fields to the pipeline model, click on Generate example JSON to view the required format for your pipeline.
Use System Settings > Metadata Models > your_metadata_model > Manage > Generate example JSON to see an example JSON for these fields.
Field names cannot contain a dot (.); e.g. for the metric name Q30 bases (excl. dup & clipped bases), the dot after excl must be removed.
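To illustrate the general shape of this output, the Python sketch below writes a metadata.response.json from within an analysis task. The group and field names (qc_metrics, total_reads, pct_q30) are hypothetical examples; use Generate example JSON on your own model for the authoritative structure.
import json

# Hypothetical group and field names; generate the example JSON from your own
# metadata model to see the exact structure ICA expects.
metadata = {
    "qc_metrics": [          # a field group marked "Filled by pipeline" (multiple value)
        {
            "total_reads": 123456789,
            "pct_q30": 92.4  # note: no dots in the field names themselves
        }
    ]
}

# The analysis must publish this file among its outputs so ICA can pick it up.
with open("metadata.response.json", "w") as fh:
    json.dump(metadata, fh, indent=2)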
Populating metadata models of samples allows having a sample-centric view of all the metadata. It is also possible to synchronize that data into your project's Base warehouse.
In ICA, select Projects > your_project >Base > Schedule.
Select +Create > From metadata.
Type a name for your schedule, optionally add a description, and set it to active. You can select whether sensitive metadata fields should be included, as values of sensitive metadata fields will not be visible to other users outside of the project.
Select Save.
Navigate to Base > Tables in your project.
Two new table schemas should be added with your current metadata models.
This tutorial shows you how to import an nf-core Nextflow pipeline into a Bench workspace, run it and monitor the execution, deploy it as an ICA Flow pipeline, and launch a validation analysis in Flow.
Start Bench workspace
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
If using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large as nf-core pipelines often need more than 4 cores to run.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type
Start the workspace, then (if applicable) start the cluster
mkdir demo
cd demo
pipeline-dev import-from-nextflow nf-core/demo
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
The Nextflow files are pulled into the nextflow-src subfolder.
/data/demo $ pipeline-dev import-from-nextflow nf-core/demo
Creating output folder nf-core/demo
Fetching project nf-core/demo
Fetching project info
project name: nf-core/demo
repository : https://github.com/nf-core/demo
local path : /data/.nextflow/assets/nf-core/demo
main script : main.nf
description : An nf-core demo pipeline
author : Christopher Hakkaart
Pipeline “nf-core/demo” successfully imported into nf-core/demo.
Suggested actions:
cd nf-core/demo
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev launch-validation-in-flow
All nf-core pipelines conveniently define a "test" profile that specifies a set of validation inputs for the pipeline.
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster, otherwise it runs on the main workspace instance.
cd nf-core/demo
pipeline-dev run-in-bench
When a pipeline is running locally (i.e. not on a Bench cluster), you can monitor the task execution from another terminal with docker ps.
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks being pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
pipeline-dev deploy-as-flow-pipeline
After generating a few ICA-specific files (JSON input specs for the Flow launch UI and the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here). It then asks if you want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with other users following this same tutorial.
Choice: 3
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
pipeline-dev launch-validation-in-flow
This launches an analysis in ICA Flow, using the same inputs as the nf-core pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder bench-pipeline-dev/temp-data.
When looking at the main ICA navigation, you will see the following structure:
Projects are your primary work locations which contain your data and tools to execute your analyses. Projects can be considered as a binder for your work and information. You can have data contained within a project, or you can choose to make it shareable between projects.
Reference Data are reference genome sets which you use to help look for deviations and to compare your data against.
Bundles are packages of assets such as sample data, pipelines, tools and templates which you can use as a curated data set. Bundles can be provided both by Illumina and other providers, and you can even create your own bundles. You will find the Illumina-provided pipelines in bundles.
Audit/Event Logs are used for audit purposes and issue resolution.
System Settings contain general information such as the location of storage space, docker images and tool repositories.
Projects are the main dividers in ICA. They provide an access-controlled boundary for organizing and sharing resources created in the platform. The Projects view is used to manage projects within the current tenant.
To create a new project, click the Projects > + Create button.
On the project creation screen, add information to create a project. See Project Details page for information about each field.
Required fields include:
Name
1-255 characters
Must begin with a letter
Characters are limited to alphanumerics, hyphens, underscores, and spaces
Project Owner: Owner (and usually contact person) of the project. The project owner has the same rights as a project administrator, but cannot be removed from a project without first assigning another project owner. This can be done by the current project owner, the tenant administrator, or a project administrator of the current project. Reassignment is done at Projects > your_project > Project Settings > Team > Edit.
Region: Select your project location. The available options are based on the entitlement(s) associated with your purchased subscription.
Analysis Priority (Low/Medium (default)/High): Priority is balanced per tenant, with high priority analyses started first and the system progressing to the next lower priority once all higher priority analyses are running. Balance your priorities so that lower priority projects do not remain waiting for resources indefinitely.
Billing Mode: Select whether the costs of this project are charged to the tenant of the project owner or to the tenant of the user who is using the project.
Data Sharing: Enable this if you want to allow the data from this project to be linked, moved or copied and used in other projects of your tenant. Disabling this is a convenient way to prevent your data from showing up in the list of available data to be linked, moved or copied in other projects. Even though this prevents copying and linking files and folders, it does not protect against someone downloading the files or copying the contents of your files from the viewer.
Storage Bundle: This is auto-selected and appears when you select the Project Region.
Click the Save button to finish creating the project. The project will be visible from the Projects view.
Refer to the Storage Configuration documentation for details on creating a storage configuration.
During project creation, select the I want to manage my own storage checkbox to use a Storage Configuration as the data provider for the project.
With a storage configuration set, a project will have a 2-way sync with the external cloud storage provider: any data added directly to the external storage will be sync'ed into the ICA project data, and any data added to the project will be sync'ed into the external cloud storage.
Several tools are available to assist you with keeping an overview of your projects. These filters work in both list and tile view and persist across sessions.
Searching is a case-insensitive wildcard filter: any project which contains the characters will be shown. Use * as a wildcard in searches. Be aware that operators without search words are blocked and will result in the error "Unexpected error occurred when searching for projects". You can use brackets and the AND, OR and NOT operators, provided that you do not start the search with an operator (Monkey AND Banana is allowed; AND Aardvark by itself is invalid syntax).
Filter by Workgroup: Projects in ICA can be accessible to different workgroups. This drop-down list allows you to filter projects for specific workgroups. To reset the filter so it displays projects from all your workgroups, use the x on the right, which appears when a workgroup is selected.
Hidden projects: You can hide projects (Projects > your_project > Details > Hide) which you no longer use. Hiding will delete data in Base and Bench and is thus irreversible.
You can still see hidden projects if you select this option and delete the data they contain at Projects > your_project > Data to save on storage costs.
If you are using your own S3 bucket, your S3 storage will be unlinked from the project, but the data will remain in your S3 storage. Your S3 storage can then be used for other projects.
Hiding projects is not possible for externally-managed projects.
Favorites: By clicking the star next to the project name in the tile view, you set a project as a favorite. You can have multiple favorites and use the Favorites checkbox to show only those favorites. This prevents having too many projects visible.
Tile view shows a grid of projects. This view is best suited if you only have a few projects or have filtered them out by creating favourites. A single click will open the project.
List view shows a list of projects. This view allows you to add additional filters on name, description, location, user role, tenant, size and analyses. A double-click is required to open the project.
Illumina software applications which do their own data management on ICA (such as BSSH) store their resources and data in a project in the same way as manually created projects in ICA. For ICA, these projects are considered externally-managed projects, and there are a number of restrictions on which actions are allowed on externally-managed projects from within ICA. For example, you cannot delete or move externally-managed data. This is to prevent inconsistencies when these applications want to access their own project data.
You can add bundles to externally managed projects, provided those bundles do not come with additional restrictions for the project.
You can start bench workspaces in externally-managed projects. The resulting data will be stored in the externally-managed project.
Project administrators and tenant administrators can disable data sharing on externally managed projects at Projects > externally_managed_project > Project Settings > Details to prevent data from being copied or extracted.
Projects are indicated as externally-managed in the projects overview screen by a project card with a light grey accent and a lock symbol followed by "managed by app".
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column, visible in the data list view of externally-managed projects at Projects > your_project > Data.
If you have an externally-managed project and want to move the data to another project, you need to:
Copy the data from the externally-managed project to the other (new) project.
From within your external application, delete the data which is stored in the externally-managed project in ICA.
Externally-managed projects protect their notification subscriptions to ensure no user can delete them. It is possible to add your own subscriptions to externally-managed projects, see notifications for more information.
For a better understanding of how all components of ICA work, try the end-to-end tutorial.
You can share links to your project and content within projects to people who have access to it. Sharing is done by copying the URL from your browser. This URL contains both the filters and the sort options which you have applied.
ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.
To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.
The Data Sets menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.
Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.
Choose a data type among
Germline variants
Somatic mutations
RNAseq
GWAS
Choose a new study name by selecting the radio button: Create new study and entering a Study Name.
To add new data to an existing Study, select the radio button: Select from list of studies and select an existing Study Name from the dropdown.
To add data to existing records or add new records, select Job Type, Append.
Append does not wipe out any data ingested previously and can be used to ingest the molecular data in an incremental manner.
To replace data, select Job Type, Replace. If you are ingesting data again, use the Replace job type.
Enter an optional Study description.
Select the metadata model (default: Cohorts; alternatively, select OMOP version 5.4 if your data is formatted that way.)
Select the genome build your molecular data is aligned to (default: GRCh38/hg38)
For RNAseq, specify whether you want to run differential expression (see below) or only upload raw TPM.
Click Next.
Navigate to VCFs located in the Project Data.
Select each single-sample VCF or multi-sample VCF to ingest. For GWAS, select CSV files produced by Regenie.
As an alternative to selecting individual files, you can opt to select a folder instead. Toggle the radio button on Step 2 from "Select files" to "Select folder".
This option is currently only available for germline variant ingestion: any combination of small variants, structural variation, and/or copy number variants.
ICA Cohorts will scan the selected folder and all sub-folders for any VCF files or JSON files and try to match them against the Sample ID column in the metadata TSV file (Step 3).
Files not matching sample IDs will be ignored; the allowed file extensions for VCF files after the sample ID are: *.vcf.gz, *.hard-filtered.vcf.gz, *.cnv.vcf.gz, and *.sv.vcf.gz.
Files not matching sample IDs will be ignored; the allowed file extensions for JSON files after the sample ID are: *.json, *.json.gz, *.json.bgz, and *.json.gzip.
Click Next.
Navigate to the metadata (phenotype) TSV file in the project Data.
Select the TSV file or files for ingestion (a sketch of the expected tab-delimited format follows these steps).
Click Finish.
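If you assemble the subject metadata sheet programmatically, the following Python sketch writes a minimal tab-delimited file. The column header spellings are assumptions; only the required fields themselves (sample identifier, sample display name, subject identifier, subject display name, sex) come from the import form reference further below, so confirm the exact header names expected by the Cohorts importer.
import csv

# Hypothetical header spellings for the required subject/sample fields.
columns = ["sample_id", "sample_name", "subject_id", "subject_name", "sex"]
rows = [
    {"sample_id": "S001", "sample_name": "Sample 1",
     "subject_id": "P001", "subject_name": "Subject 1", "sex": "F"},
]

with open("subject_metadata.tsv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)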
All VCF types, specifically from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingest small variants.
As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.
The maximum number of files that can be part of a single manual ingestion batch is 1000.
Alternatively, users can choose a single folder and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select the molecular data files matching the samples listed in the metadata sheet, which is provided in the next step of the import process.
Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.
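As an illustration of how folder contents are matched against sample IDs and the allowed filename extensions listed earlier, the Python sketch below groups files under known sample IDs; it is a simplified approximation, not the exact logic Cohorts applies.
from pathlib import Path

# Allowed suffixes after the sample ID, per the lists above.
VCF_SUFFIXES = (".vcf.gz", ".hard-filtered.vcf.gz", ".cnv.vcf.gz", ".sv.vcf.gz")
JSON_SUFFIXES = (".json", ".json.gz", ".json.bgz", ".json.gzip")

def match_files(folder, sample_ids):
    # Group ingestible files under each known sample ID; ignore everything else.
    matches = {sid: [] for sid in sample_ids}
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        for sid in sample_ids:
            if path.name.startswith(sid) and path.name[len(sid):] in VCF_SUFFIXES + JSON_SUFFIXES:
                matches[sid].append(path)
    return matches

print(match_files("/data/vcfs", {"S001", "S002"}))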
The sample identifiers used in the VCF columns need to match the sample identifiers used in the subject/sample metadata files; accordingly, if you are starting from JSON files containing variant- and gene-level annotations provided by Illumina Nirvana, the samples listed in the header need to match the metadata files.
ICA Cohorts supports VCF files formatted according to the VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build (a simple header check is sketched after this list):
##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)
##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38
##DRAGENCommandLine= ... --ht-reference
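The Python sketch below is a simplified illustration of such a header check; it only looks for the row prefixes listed above and is not the exact validation Cohorts performs.
import gzip

# Header row prefixes that can identify the genome build (see the list above).
BUILD_MARKERS = ("##reference=", "##contig=<ID=chr1,length=248956422>", "##DRAGENCommandLine=")

def header_identifies_build(vcf_path):
    # Scan only the meta-information header of a plain or bgzipped VCF.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached the #CHROM column header; stop scanning
            if line.startswith(BUILD_MARKERS):
                return True
    return False

print(header_identifies_build("sample.hard-filtered.vcf.gz"))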
ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.
As an alternative to VCFs, ICA Cohorts accepts the Illumina Nirvana JSON output for hg38/GRCh38-aligned data for small germline variants and somatic mutations, copy number variations, and other structural variants.
ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for genes and .quant.sf for transcript-level TPM (transcripts per million).
Please also see the online documentation for the DRAGEN RNA pipeline for more information on output file formats.
ICA Cohorts currently supports upload of SNV-level GWAS results produced by Regenie and saved as CSV files.
Note: If annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole-genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it has completed successfully or failed.
As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP Common Data Model (CDM) version 5.4. Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:
PERSON (mandatory),
CONCEPT (mandatory if any of the following is provided),
CONDITION_OCCURRENCE (optional),
DRUG_EXPOSURE (optional), and
PROCEDURE_OCCURRENCE (optional).
Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.
Note that Cohorts requires that all such files do not deviate from the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.
[1] VcfMapper: https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/main_vcfmapper.py
[2] crossMap: https://crossmap.sourceforge.net/
[3] liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver
[4] Chain files:
In this tutorial, we will demonstrate how to create and launch a simple DRAGEN pipeline using the Nextflow language in the ICA GUI. More information about Nextflow on ICA can be found in the Nextflow pipeline documentation. For this example, we will implement the DRAGEN alignment and variant calling example for Paired-End FASTQ Inputs.
The first step in creating a pipeline is to select a project for the pipeline to reside in. If the project doesn't exist, create a project. For instructions on creating a project, see the project creation documentation. In this tutorial, we'll use a project called Getting Started.
After a project has been created, a DRAGEN bundle must be linked to a project to obtain access to a DRAGEN docker image. Enter the project by clicking on it, and click Edit in the Project Details page. From here, you can link a DRAGEN Demo Tool bundle into the project. The bundle that is selected here will determine the DRAGEN version that you have access to. For this tutorial, you can link DRAGEN Demo Bundle 3.9.5. Once the bundle has been linked to your project, you can now access the docker image and version by navigating back to the All Projects page, clicking on Docker Repository, and double clicking on the docker image dragen-ica-4.0.3. The URL of this docker image will be used later in the container directive for your DRAGEN process defined in Nextflow.
Select Projects > your_project > Flow > Pipelines. From the Pipelines view, click +Create Pipeline > Nextflow > XML based to start creating a Nextflow pipeline.
In the Nextflow pipeline creation view, the Details tab is used to add information about the pipeline. Add values for the required Code (pipeline name) and Description fields. Nextflow Version and Storage size defaults to preassigned values.
Next, add the Nextflow pipeline definition by navigating to the Nextflow files > main files > main.nf. You will see a text editor. Copy and paste the following definition into the text editor. Modify the container directive by replacing the current URL with the URL found in the docker image dragen-ica-4.0.3.
To specify a compute type for a Nextflow process, use the pod directive within each process.
Outputs for Nextflow pipelines are uploaded from the out folder in the attached shared filesystem. The publishDir directive specifies the output folder for a given process. Only data moved to the out folder using the publishDir directive will be uploaded to the ICA project after the pipeline finishes executing.
Refer to the Nextflow pipeline documentation for details on ICA-specific attributes within the Nextflow definition.
Next, create the input form used for the pipeline. This is done through the XML CONFIGURATION tab. More information on the specifications for the input form can be found in the input form documentation.
This pipeline takes two FASTQ files, one reference file and one sample_id parameter as input.
Paste the following XML input form into the XML CONFIGURATION text editor.
Click the Simulate button (at the bottom of the text editor) to preview the launch form fields.
Click the Save button to save the changes.
The dataInputs section specifies file inputs, which will be mounted when the pipeline executes. Parameters defined under the steps section refer to string and other input types.
Each of the dataInputs and parameters can be accessed in the Nextflow within the params object named according to the code defined in the XML (e.g. params.sample_id).
If you have no test data available, you need to link the Dragen Demo Bundle to your project at Projects > your_project > Project Settings > Details > Linked Bundles.
Go to the Projects > your_project > Flow > Pipelines page from the left navigation pane. Select the pipeline you just created and click Start Analysis.
Fill in the required fields, indicated by the asterisk (*), and click the Start Analysis button.
You can monitor the run from the Projects > your_project > Flow > Analyses page. Once the Status changes to Succeeded, you can click on the run to access the results.
Project name: The ICA project for your cohort analysis (cannot be changed).
Study name: Create or select a study. Each study represents a subset of data within the project.
Description: Short description of the data set (optional).
Job type:
Append: Appends values to any existing values. If a field supports only a single value, the value is replaced.
Replace: Overwrites existing values with the values in the uploaded file.
Subject metadata files: Subject metadata file(s) in tab-delimited format. For Append and Replace job types, the following fields are required and cannot be changed: Sample identifier, Sample display name, Subject identifier, Subject display name, Sex.
nextflow.enable.dsl = 2
process DRAGEN {
// The container must be a DRAGEN image that is included in an accepted bundle and will determine the DRAGEN version
container '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/7ecddc68-f08b-4b43-99b6-aee3cbb34524:latest'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'fpga2-medium'
pod annotation: 'volumes.illumina.com/scratchSize', value: '1TiB'
// ICA will upload everything in the "out" folder to cloud storage
publishDir 'out', mode: 'symlink'
input:
tuple path(read1), path(read2)
val sample_id
path ref_tar
output:
stdout emit: result
path '*', emit: output
script:
"""
set -ex
mkdir -p /scratch/reference
tar -C /scratch/reference -xf ${ref_tar}
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true
/opt/edico/bin/dragen --lic-instance-id-location /opt/instance-identity \\
--output-directory ./ \\
-1 ${read1} \\
-2 ${read2} \\
--intermediate-results-dir /scratch \\
--output-file-prefix ${sample_id} \\
--RGID ${sample_id} \\
--RGSM ${sample_id} \\
--ref-dir /scratch/reference \\
--enable-variant-caller true
"""
}
workflow {
DRAGEN(
Channel.of([file(params.read1), file(params.read2)]),
Channel.of(params.sample_id),
Channel.fromPath(params.ref_tar)
)
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 1</pd:label>
<pd:description>FASTQ Read 1</pd:description>
</pd:dataInput>
<pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 2</pd:label>
<pd:description>FASTQ Read 2</pd:description>
</pd:dataInput>
<pd:dataInput code="ref_tar" format="TAR" type="FILE" required="true" multiValue="false">
<pd:label>Reference</pd:label>
<pd:description>Reference TAR</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps>
<pd:step execution="MANDATORY" code="General">
<pd:label>General</pd:label>
<pd:description></pd:description>
<pd:tool code="generalparameters">
<pd:label>General Parameters</pd:label>
<pd:description></pd:description>
<pd:parameter code="sample_id" minValues="1" maxValues="1" classification="USER">
<pd:label>Sample ID</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value></pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
</pd:pipeline>
Bundles are curated data sets which combine assets such as pipelines, tools, and Base query templates. This is where you will find packaged assets such as Illumina-provided pipelines and sample data. You can create, share and use bundles in projects of your own tenant as well as projects in other tenants.
The following ICA assets can be included in bundles:
Data (link / unlink)
Samples (link / unlink)
Reference Data (add / delete)
Pipelines (link/unlink)
Tools and Tool images (link/unlink)
Base tables (read-only) (link/unlink)
The main Bundles screen has two tabs: My Bundles and Entitled Bundles. The My Bundles tab shows all the bundles that you are a member of. This tab is where most of your interactions with bundles occur. The Entitled Bundles tab shows the bundles that have been specially created by Illumina or other organizations and shared with you to use in your projects. See Access and Use an Entitled Bundle.
Some bundles come with additional restrictions such as disabling bench access or internet access when running pipelines to protect the data contained in them. When you link these bundles, the restrictions will be enforced on your project. Unlinking the bundle will not remove the restrictions.
You can not link bundles which come with additional restrictions to externally managed projects.
As of ICA v.2.29, the content in bundles is linked in such a way that any updates to a bundle are automatically propagated to the projects which have that bundle linked.
If you have created bundle links in ICA versions prior to ICA v2.29 and want to switch them over to links with dynamic updates, you need to unlink and relink them.
From the main navigation page, select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the + button, under Linked bundles.
Click on the desired bundle, then click the +Link Bundles button.
Click Save.
The assets included in the bundle will now be available in the respective pages within the Project (e.g. Data and Pipelines pages). Any updates to the assets will be automatically available in the destination project.
To unlink a bundle from a project,
Select Projects > your_project > Project Settings > Details.
Click the Edit button at the top of the Details page.
Click the (-) button, next to the linked bundle you wish to remove.
Bundles and projects have to be in the same region in order to be linked. Otherwise, the error The bundle is in a different region than the project so it's not eligible for linking will be displayed.
To create a new bundle and configure its settings, do as follows.
From the main navigation, select Bundles.
Select + Create .
Enter a unique name for the bundle.
From the Region drop-down list, select where the assets for this bundle should be stored.
Set the status of the bundle. When the status of a bundle changes, it cannot be reverted to a draft or released state.
Draft—The bundle can be edited.
Released—The bundle is released. Technically, you can still edit bundle information and add assets to the bundle, but should refrain from doing so.
Deprecated—The bundle is no longer intended for use. By default, deprecated bundles are hidden on the main Bundles screen (unless non-deprecated versions of the bundle exist). Select "Show deprecated bundles" to show all deprecated bundles. Bundles can not be recovered from deprecated status.
[optional] Configure the following settings.
Categories—Select an existing category or enter a new one.
Short Description—Enter a description for the bundle.
Metadata Model—Select a metadata model to apply to the bundle.
Enter a release version for the bundle and optionally enter a description for the version.
[Optional] Links can be added with a display name (max 100 chars) and URL (max 2048 chars).
Homepage
License
Links
Publications
[Optional] Enter any information you would like to distribute with the bundle in the Documentation section.
Select Save.
There is no option to delete bundles; they must be deprecated instead.
To make changes to a bundle:
From the main navigation, select Bundles.
Select a bundle.
Select Edit.
Modify the bundle information and documentation as needed.
Select Save.
To add assets to a bundle:
Select a bundle.
On the left-hand side, select the type of asset (such as Flow > pipelines, Base > Tables or Bench > Docker Images) you want to add to the bundle.
Depending on the asset type, select add or link to bundle.
Select the assets and confirm.
Assets must meet the following requirements before they can be added to a bundle:
For Samples and Data, the project the asset belongs to must have data sharing enabled.
The region of the project containing the asset must match the region of the bundle.
You must have permission to access the project containing the asset.
Pipelines and tools need to be in released status.
Samples must be available in a complete state.
When you link folders to a bundle, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Bundles > your_bundle > activity > Batch Jobs screen. To see more details and the progress, double-click the batch job and then double-click the individual item. This will show how many individual files are already linked.
Which batch jobs are visible as activity depends on the user role.
When creating a new bundle version, you can only add assets to the bundle. You cannot remove existing assets from a bundle when creating a new version. If you need to remove assets from a bundle, it is recommended that you create a new bundle. All users who currently have access to a bundle will automatically have access to the new version as well.
From the main navigation, select Bundles.
Select a bundle.
Select + Create new Version.
Make updates as needed and update the version number.
Select Save.
When you create a new version of a bundle, it will replace the old version in your list. To see the old version, open your new bundle and look at Bundles > your_bundle > Details > Versioning. There you can open the previous version which is contained in your new version.
Assets such as data which were added in a previous version of your bundle will be marked in green, while new content will be black.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Legal.
To add Terms of Use to a Bundle, do as follows:
Select + Create New Version.
Use the WYSIWYG editor to define Terms of Use for the selected bundle.
Click Save.
[Optional] Require acceptance by clicking the checkbox next to Acceptance required.
Acceptance required will prompt a user to accept the Terms of Use before being able to use a bundle or add the bundle to a project.
To edit the Terms of Use, repeat Steps 1-3 and use a unique version name. If you select acceptance required, you can choose to keep the acceptance status as is or require users to reaccept the terms of use. When reacceptance is required, users need to reaccept the terms in order to continue using this bundle in their pipelines. This is indicated when they want to enter projects which use this bundle.
If you want to collaborate with other people on creating a bundle and managing the assets in the bundle, you can add users to your bundle and set their permissions. You use this to create a bundle together, not to use the bundle in your projects.
From the main navigation, select Bundles > your_bundle > Bundle Settings > Team.
To invite a user to collaborate on the bundle, do as follows.
To add a user from your tenant, select Someone of your tenant and select a user from the drop-down list.
To add a user by their email address, select By email and enter their email address.
To add all the users of an entire workgroup, select Add workgroup and select a workgroup from the drop-down list.
Select the Bundle Role drop-down list and choose a role for the user or workgroup. This role defines the ability of the user or workgroup to view or edit bundle settings.
Viewer: view content without editing rights.
Contributor: view bundle content and link/unlink assets.
Administrator: full edit rights of content and configuration.
Repeat as needed to add more users.
Users are not officially added to the bundle until they accept the invitation.
To change the permissions role for a user, select the Bundle Role drop-down list for the user and select a new role.
To revoke bundle permissions from a user, select the trash icon for the user.
Select Save Changes.
Once you have finalized your bundle and added all assets and legal requirements, you can share your bundle with other tenants to use it in their projects.
Your bundle must be in released status to prevent it from being updated while it is shared.
Go to Bundles > your_bundle > Edit > Details > Bundle status and set it to Released.
Save the change.
Once the bundle is released, you can share it. Invitations are sent to an individual email address, however access is granted and extended to all users and all workgroups inside that tenant.
Go to Bundles > your_bundle > Bundle Settings > Share.
Click Invite and enter the email address of the person you want to share the bundle with. They will receive an email from which they can accept or reject the invitation to use the bundle. The invitation will show the bundle name, description and owner. The link in the invite can only be used once.
You can follow up on the status of the invitation on the Bundles > your_bundle > Bundle Settings > Share page.
If they reject the bundle, the rejection date will be shown. To re-invite that person again later on, select their email address in the list and choose Remove. You can then create a new invitation. If you do not remove the old entry before sending a new invitation, they will be unable to accept and get an error message stating that the user and bundle combination must be unique. They can also not re-use an invitation once it has been accepted or declined.
If they accept the bundle, the acceptance date will be shown. They will in turn see the bundle under Bundles > Entitled bundles. To remove access, select their email address in the list and choose Remove.
Entitled bundles are bundles created by Illumina or third parties for you to use in your projects. Entitled bundles can already be part of your tenant when it is part of your subscription. You can see your entitled bundles at Bundles > Entitled Bundles.
To use your shared entitled bundle, add the bundle to your project via Project Linking. Content shared via entitled bundles is read-only, so you cannot add or modify the contents of an entitled bundle. If you lose access to an entitled bundle previously shared with you, the bundle is unlinked and you will no longer be able to access its contents.
Data Catalogues provide views on data from Illumina hardware and processes (Instruments, Cloud software, Informatics software and Assays) so that this data can be distributed to different applications. This data consists of read-only tables to prevent updates by the applications accessing it. Access to data catalogues is included with professional and enterprise subscriptions.
Project-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists project-specific ICA pipeline analysis data)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (project-specific quality control metrics)
Tenant-level views
ICA_PIPELINE_ANALYSES_VIEW (Lists ICA pipeline analysis data)
CLARITY_SEQUENCINGRUN_VIEW_tenant (sequencing run data coming from the lab workflow software)
CLARITY_SAMPLE_VIEW_tenant (sample data coming from the lab workflow software)
CLARITY_LIBRARY_VIEW_tenant (library data coming from the lab workflow software)
CLARITY_EVENT_VIEW_tenant (event data coming from the lab workflow software)
ICA_DRAGEN_QC_METRIC_ANALYSES_VIEW (quality control metrics)
DRAGEN metrics will only have content when DRAGEN pipelines have been executed.
Analysis views will only have content when analyses have been executed.
Views containing Clarity data will only have content if you have a Clarity LIMS instance with minimum version 6.0 and the Product Analytics service installed and configured. Please see the Clarity LIMS documentation for more information.
When you use your own AWS S3 storage in a project, metrics can not be collected and thus the DRAGEN METRICS - related views can not be used.
Project members who have both Base contributor and project contributor or administrator rights, and who belong to the same tenant as the project, can add views from a Catalogue. Members of a project with the same rights who do not belong to the same tenant can remove catalogue views from a project. Therefore, if you are invited to collaborate on a project but belong to a different tenant, you can remove catalogue views, but cannot add them again.
To add Catalogue data,
Go to Projects > your_project > Base > Tables.
Select Add table > Import from Catalogue.
A list of available views will be displayed. (Note that views which are already part of your project are not listed)
Select the table you want to add and choose +Select
Catalogue data will have View as type, the same as tables which are linked from other projects.
To delete Catalogue data,
go to Projects > your_project > Base > Tables.
Select the table you want to delete and choose Delete.
A warning will be presented to confirm your choice. Once deleted, you can add the Catalogue data again if needed.
View: The name of the Catalogue table.
Description: An explanation of which data is contained in the view.
Category: The identification of the source system which provided the data.
Tenant/project: Appended to the view name as _tenant or _project. Determines whether the data is visible for all projects within the same tenant or only within the project. Only the tenant administrator can see the non-project views.
In the Projects > your_project > Base > Tables view, double-click the Catalogue table to see the details. For an overview of the available actions and details, see Tables.
In this section, we provide examples of querying selected views from the Base UI, starting with ICA_PIPELINE_ANALYSES_VIEW (project view). This table includes the following columns: TENANT_UUID, TENANT_ID, TENANT_NAME, PROJECT_UUID, PROJECT_ID, PROJECT_NAME, USER_UUID, USER_NAME, and PIPELINE_ANALYSIS_DATA. While the first eight columns contain straightforward data types (each holding a single value), the PIPELINE_ANALYSIS_DATA column is of type VARIANT, which can store multiple values in a nested structure. In SQL queries, this column returns data as a JSON object. To filter specific entries within this complex data structure, a combination of JSON functions and conditional logic in SQL queries is essential.
Since Snowflake offers robust JSON processing capabilities, the FLATTEN function can be utilized to expand JSON arrays within the PIPELINE_ANALYSIS_DATA column, allowing for the filtering of entries based on specific criteria. It's important to note that each entry in the JSON array becomes a separate row once flattened. Snowflake aligns fields outside of this FLATTEN operation accordingly, i.e. the USER_NAME record in the SQL query below is "recycled" for each flattened row.
The following query extracts
USER_NAME directly from the ICA_PIPELINE_ANALYSES_VIEW_project table.
PIPELINE_ANALYSIS_DATA:reference and PIPELINE_ANALYSIS_DATA:price. These are direct accesses into the JSON object stored in the PIPELINE_ANALYSIS_DATA column. They extract specific values from the JSON object.
Entries from the array 'steps' in the JSON object. The query uses LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) to expand the steps array within the PIPELINE_ANALYSIS_DATA JSON object into individual rows. For each of these rows, it selects various elements (like bpeResourceLifeCycle, bpeResourcePresetSize, etc.) from the JSON.
Furthermore, the query filters the rows based on the status being 'FAILED' and the stepId not containing the word 'Workflow': it allows the user to find steps which failed.
SELECT
USER_NAME as user_name,
PIPELINE_ANALYSIS_DATA:reference as reference,
PIPELINE_ANALYSIS_DATA:price as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:completionTime::TIMESTAMP as completionTime,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as price,
f.value:pricePerSecond::FLOAT as pricePerSecond,
f.value:startTime::TIMESTAMP as startTime,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
f.value:status::STRING = 'FAILED'
AND f.value:stepId::STRING NOT LIKE '%Workflow%';
Now let's have a look at the DRAGEN_METRICS_VIEW_project view. Each DRAGEN pipeline on ICA creates multiple metrics files, e.g. SAMPLE.mapping_metrics.csv, SAMPLE.wgs_coverage_metrics.csv, etc. for the DRAGEN WGS Germline pipeline. Each of these files is represented by a row in the DRAGEN_METRICS_VIEW_project table with columns ANALYSIS_ID, ANALYSIS_UUID, PIPELINE_ID, PIPELINE_UUID, PIPELINE_NAME, TENANT_ID, TENANT_UUID, TENANT_NAME, PROJECT_ID, PROJECT_UUID, PROJECT_NAME, FOLDER, FILE_NAME, METADATA, and ANALYSIS_DATA. The ANALYSIS_DATA column contains the content of the file FILE_NAME as an array of JSON objects. Similarly to the previous query, we will use the FLATTEN command. The following query extracts:
Sample name from the file names.
Two metrics 'Aligned bases in genome' and 'Aligned bases' for each sample and the corresponding values.
The query looks for files SAMPLE.wgs_coverage_metrics.csv only and sorts based on the sample name:
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.wgs_coverage_metrics.csv', 1) as sample_name,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%wgs_coverage_metrics.csv'
AND (
f.value:column_2::STRING = 'Aligned bases in genome'
OR f.value:column_2::STRING = 'Aligned bases'
)
ORDER BY
sample_name;
Lastly, you can combine these views (or rather, intermediate results derived from these views) using the WITH and JOIN commands. The SQL snippet below demonstrates how to join two intermediate results referred to as 'flattened_dragen_scrna' and 'pipeline_table'. The query:
Selects two metrics ('Invalid barcode read' and 'Passing cells') associated with single-cell RNA analysis from records where the FILE_NAME ends with 'scRNA.metrics.csv', and then stores these metrics in a temporary table named 'flattened_dragen_scrna'.
Retrieves metadata related to all scRNA analyses by filtering on the pipeline ID from the 'ICA_PIPELINE_ANALYSES_VIEW_project' view and stores this information in another temporary table named 'pipeline_table'.
Joins the two temporary tables using the JOIN operator, specifying the join condition with the ON operator.
WITH flattened_dragen_scrna AS (
SELECT DISTINCT
SPLIT_PART(FILE_NAME, '.scRNA.metrics.csv', 1) as sample_name,
ANALYSIS_UUID,
f.value:column_2::STRING as metric,
f.value:column_3::FLOAT as value
FROM
DRAGEN_METRICS_VIEW_project,
LATERAL FLATTEN(input => ANALYSIS_DATA) f
WHERE
FILE_NAME LIKE '%scRNA.metrics.csv'
AND (
f.value:column_2::STRING = 'Invalid barcode read'
OR f.value:column_2::STRING = 'Passing cells'
)
),
pipeline_table AS (
SELECT
PIPELINE_ANALYSIS_DATA:reference::STRING as reference,
PIPELINE_ANALYSIS_DATA:id::STRING as analysis_id,
PIPELINE_ANALYSIS_DATA:status::STRING as status,
PIPELINE_ANALYSIS_DATA:pipelineId::STRING as pipeline_id,
PIPELINE_ANALYSIS_DATA:requestTime::TIMESTAMP as start_time
FROM
ICA_PIPELINE_ANALYSES_VIEW_project
WHERE
PIPELINE_ANALYSIS_DATA:pipelineId = 'c9c9a2cc-3a14-4d32-b39a-1570c39ebc30'
)
SELECT * FROM flattened_dragen_scrna JOIN pipeline_table
ON
flattened_dragen_scrna.ANALYSIS_UUID = pipeline_table.analysis_id;
You can use ICA_PIPELINE_ANALYSES_VIEW to obtain the costs of individual steps of an analysis. Using the following SQL snippet, you can retrieve the costs of individual steps for every analysis run in the past week.
SELECT
USER_NAME as user_name,
PROJECT_NAME as project,
SUBSTRING(PIPELINE_ANALYSIS_DATA:reference, 1, 30) as reference,
PIPELINE_ANALYSIS_DATA:status as status,
ROUND(PIPELINE_ANALYSIS_DATA:computePrice,2) as price,
PIPELINE_ANALYSIS_DATA:totalDurationInSeconds as duration,
PIPELINE_ANALYSIS_DATA:startTime::TIMESTAMP as startAnalysis,
f.value:bpeResourceLifeCycle::STRING as bpeResourceLifeCycle,
f.value:bpeResourcePresetSize::STRING as bpeResourcePresetSize,
f.value:bpeResourceType::STRING as bpeResourceType,
f.value:durationInSeconds::INT as durationInSeconds,
f.value:price::FLOAT as priceStep,
f.value:status::STRING as status,
f.value:stepId::STRING as stepId
FROM
ICA_PIPELINE_ANALYSES_VIEW_project,
LATERAL FLATTEN(input => PIPELINE_ANALYSIS_DATA:steps) f
WHERE
PIPELINE_ANALYSIS_DATA:startTime > CURRENT_TIMESTAMP() - INTERVAL '1 WEEK'
ORDER BY
priceStep DESC;
Data Catalogue views cannot be shared as part of a Bundle.
Data size is not shown for views because views are a subset of data.
By removing Base from a project, the Data Catalogue will also be removed from that project.
As tenant-level Catalogue views can contain sensitive data, it is best to save this (filtered) data to a new table and share that table instead of sharing the entire view as part of a project. To do so, add your view to a separate project and run a query on the data at Projects > your_project > Base > Query > New Query. When the query completes, you can export the result as a new table. This ensures no new data will be added on subsequent runs.
This tutorial demonstrates how to use the ICA Python library packaged with the JupyterLab image for Bench Workspaces.
See the JupyterLab documentation for details about the JupyterLab docker image provided by Illumina.
The tutorial will show how authentication to the ICA API works and how to search, upload, download and delete data from a project into a Bench Workspace. The python code snippets are written for compatibility with a Jupyter Notebook.
Navigate to Bench > Workspaces and click Enable to enable workspaces. Select +New Workspace to create a new workspace. Fill in the required details and select JupyterLab for the Docker image. Click Save and Start to open the workspace. The following snippets of code can be pasted into the workspace you've created.
This snippet defines the required python modules for this tutorial:
# Wrapper modules
import icav2
from icav2.api import project_data_api
from icav2.model.problem import Problem
from icav2.model.project_data import ProjectData
# Helper modules
import random
import os
import requests
import string
import hashlib
import getpass
This snippet shows how to authenticate using the following methods:
ICA Username & Password
ICA API Token
# Authenticate using User credentials
username = input("ICA Username")
password = getpass.getpass("ICA Password")
tenant = input("ICA Tenant name")
url = os.environ['ICA_URL'] + '/rest/api/tokens'
r = requests.post(url, data={}, auth=(username,password),params={'tenant':tenant})
token = None
apiClient = None
if r.status_code == 200:
    token = r.content
    configuration = icav2.Configuration(
        host = os.environ['ICA_URL'] + '/rest',
        access_token = str(r.json()["token"])
    )
    apiClient = icav2.ApiClient(configuration, header_name="Content-Type", header_value="application/vnd.illumina.v3+json")
    print("Authenticated to %s" % str(os.environ['ICA_URL']))
else:
    print("Error authenticating to %s" % str(os.environ['ICA_URL']))
    print("Response: %s" % str(r.status_code))
## Authenticate using ICA API TOKEN
configuration = icav2.Configuration(
host = os.environ['ICA_URL'] + '/rest'
)
configuration.api_key['ApiKeyAuth'] = getpass.getpass()
apiClient = icav2.ApiClient(configuration, header_name="Content-Type", header_value="application/vnd.illumina.v3+json")
These snippets show how to manage data in a project. Operations shown are:
Create a Project Data API client instance
List all data in a project
Create a data element in a project
Upload a file to a data element in a project
Download a data element from a project
Search for matching data elements in a project
Delete matching data elements in a project
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Data API client instance
projectDataApiInstance = project_data_api.ProjectDataApi(apiClient)
# List all data in a project
pageOffset = 0
pageSize = 30
try:
    # Fetch the first page to learn the total number of records
    projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
    totalRecords = projectDataPagedList.total_item_count
    while pageOffset * pageSize < totalRecords:
        for projectData in projectDataPagedList.items:
            print("Path: " + projectData.data.details.path + " - Type: " + projectData.data.details.data_type)
        pageOffset = pageOffset + 1
        if pageOffset * pageSize < totalRecords:
            # Fetch the next page before continuing the loop
            projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, page_size = str(pageSize), page_offset = str(pageOffset))
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)
# Create a data element in a project
data = icav2.model.create_data.CreateData(name="test.txt",data_type = "FILE")
try:
projectData = projectDataApiInstance.create_data_in_project(projectId, create_data=data)
fileId = projectData.data.id
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->create_data_in_project: %s\n" % e)## Upload a local file to a data element in a project
# Create a local file in a Bench workspace
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
content = ''.join(random.choice(string.ascii_lowercase) for i in range(100))
f = open(filename, "a")
f.write(content)
f.close()
# Calculate MD5 hash (optional)
localFileHash = hashlib.md5(open(filename, 'rb').read()).hexdigest()
try:
    # Get Upload URL
    upload = projectDataApiInstance.create_upload_url_for_data(project_id = projectId, data_id = fileId)
    # Upload the dummy file content to the pre-signed URL
    with open(filename, 'rb') as f:
        r = requests.put(upload.url, data=f.read())
except icav2.ApiException as e:
    print("Exception when calling ProjectDataAPIApi->create_upload_url_for_data: %s\n" % e)
# Delete local dummy file
os.remove(filename)
## Download a data element from a project
try:
# Get Download URL
download = projectDataApiInstance.create_download_url_for_data(project_id=projectId, data_id=fileId)
# Download file
filename = '/tmp/'+''.join(random.choice(string.ascii_lowercase) for i in range(10))+".txt"
r = requests.get(download.url)
open(filename, 'wb').write(r.content)
# Verify md5 hash
remoteFileHash = hashlib.md5((open(filename, 'rb').read())).hexdigest()
if localFileHash != remoteFileHash:
print("Error: MD5 mismatch")
# Delete local dummy file
os.remove(filename)
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->create_download_url_for_data: %s\n" % e)# Search for matching data elements in a project
try:
projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
for projectData in projectDataPagedList.items:
print("Path: " + projectData.data.details.path + " - Name: "+projectData.data.id + " - Type: "+projectData.data.details.data_type)
except icav2.ApiException as e:
print("Exception when calling ProjectDataAPIApi->get_project_data_list: %s\n" % e)# Delete matching data elements in a project
try:
projectDataPagedList = projectDataApiInstance.get_project_data_list(project_id = projectId, full_text="test.txt")
for projectData in projectDataPagedList.items:
print("Deleting file "+projectData.data.details.path)
projectDataApiInstance.delete_data(project_id = projectId, data_id = projectData.data.id)
except icav2.ApiException as e:
print("Exception %s\n" % e)These snippets show how to get a connection to a base database and run an example query. Operations shown are:
Create a Python connection to Snowflake
Create a table
Insert data into a table
Query the table
Delete the table
Snowflake Python API documentation can be found here
This snippet imports the required Python modules for this tutorial:
# API modules
import icav2
from icav2.api import project_base_api
from icav2.model.problem import Problem
from icav2.model.base_connection import BaseConnection
# Helper modules
import os
import requests
import getpass
import snowflake.connector
# Retrieve project ID from the Bench workspace environment
projectId = os.environ['ICA_PROJECT']
# Create a Project Base API client instance
projectBaseApiInstance = project_base_api.ProjectBaseApi(apiClient)
# Get a Base Access Token
try:
baseConnection = projectBaseApiInstance.create_base_connection_details(project_id = projectId)
except icav2.ApiException as e:
print("Exception when calling ProjectBaseAPIApi->create_base_connection_details: %s\n" % e)
## Create a Python connection to Snowflake
ctx = snowflake.connector.connect(
account=os.environ["ICA_SNOWFLAKE_ACCOUNT"],
authenticator=baseConnection.authenticator,
token=baseConnection.access_token,
database=os.environ["ICA_SNOWFLAKE_DATABASE"],
role=baseConnection.role_name,
warehouse=baseConnection.warehouse_name
)
ctx.cursor().execute("USE "+os.environ["ICA_SNOWFLAKE_DATABASE"])## Create a Table
tableName = "test_table"
ctx.cursor().execute("CREATE OR REPLACE TABLE " + tableName + "(col1 integer, col2 string)")## Insert data into a table
ctx.cursor().execute(
"INSERT INTO " + tableName + "(col1, col2) VALUES " +
" (123, 'test string1'), " +
" (456, 'test string2')")## Query the table
cur = ctx.cursor()
try:
cur.execute("SELECT * FROM "+tableName)
for (col1, col2) in cur:
print('{0}, {1}'.format(col1, col2))
finally:
cur.close()
## Delete the table
ctx.cursor().execute("DROP TABLE " + tableName);

Queries can be used for data mining. On the Projects > your_project > Base > Query page:
New queries can be created and executed
Already executed queries can be found in the query history
Saved queries and query templates are listed under the saved queries tab.
All available tables and their details are listed on the New Query tab.
Queries are executed using SQL (for example Select * From table_name). When there is a syntax issue with the query, the error will be displayed on the query screen when trying to run it. The query can be immediately executed or saved for future use.
Do not use statements such as ALTER TABLE to modify your table structure, as the table will go out of sync with its definition and this will result in processing errors.
When you have duplicate column names in your query, put the columns explicitly in the select clause and use column aliases for columns with the same name.
Case-sensitive column names (such as those in the VARIANTS table) must be surrounded by double quotes. For example, select * from MY_TABLE where "PROJECT_NAME" = 'MyProject'.
The syntax for ICA case-sensitive subfields is without quotes, for example select * from MY_TABLE where ica:Tenant = 'MyTenant'. As these subfields are case sensitive, the upper- and lowercase spelling must be respected.
If you want to query data from a table shared from another tenant (indicated in green), select the table to see the unique name. In the example below, the query will be select * from demo_alpha_8298.public.TestFiles
For more information on queries, please also see the Snowflake documentation: https://docs.snowflake.com/en/user-guide/
Some tables contain columns with an array of values instead of a single value.
Suppose you have a table called YOUR_TABLE_NAME consisting of three fields. The first is a name, the second is a code and the third field is an array of data called ArrayField:
You can use the name field and code field to do queries by running
Select * from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to show specific data like the email and bundle name from the array, this becomes
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where NameField = 'Name A'.
If you want to use data in the array as your selection criteria, the expression becomes
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:boolean = true.
If your criterion is a text value in the array, delimit the text with single quotes ('). For example:
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail = '[email protected]'.
You can also use the LIKE operator with the % wildcard if you do not know the exact content.
Select ArrayField:userEmail as User_Email, ArrayField:bundleName as Bundle_Name from YOUR_TABLE_NAME where ArrayField:userEmail LIKE '%A@server%'
If the query is valid for execution, the result will be shown as a table underneath the input box. Only the first 200 characters of a string, record, or variant field are included in the query results grid. The complete value is available by clicking the "show details" link.
From within the result page of the query, it is possible to save the result in two ways:
Download: As Excel or JSON file to the computer.
Export: As a new table, as a view, or as a file to the project in CSV (comma, tab, pipe, or a custom delimiter) or JSON format. When exporting in JSON format, the result will be saved in a text file that contains a JSON object for each entry, similar to when exporting a table. The exported file can be located in the Data page under the folder named base_export_<user_supplied_name>_<auto generated unique id>.
Navigate to Projects > your_project > Base > Query.
Enter the query to execute using SQL.
Select Run Query.
Optionally, select Save Query to add the query to your saved queries list.
If the query takes more than 30 seconds without returning a result, a message is displayed to inform you that the query is still in progress; its status can be consulted on Projects > your_project > Activity > Base Jobs. Once the query has completed successfully, the results can be found on the Projects > your_project > Base > Query > Query History tab.
The query history lists all queries that were executed. Historical queries are shown with their date, executing user, returned rows and duration of the run.
Navigate to Projects > your_project > Base > Query.
Select the Query History tab.
Select a query.
Perform one of the following actions:
Open Query—Open the query in the New Query tab. You can then select Run Query to execute the query again.
Save Query—Save the query to the saved queries list.
View Results—Download the results from a query or export results to a new table, view, or file in the project. Results are available for 24 hours after the query is executed. To view results after 24 hours, you need to execute the query again.
All queries saved within the project are listed under the Saved Queries tab together with the query templates.
The saved queries can be:
Opened: This will open the query in the “New query” tab.
Saved as template: The saved query becomes a query template.
Deleted: The query is removed from the list and cannot be opened again.
The query templates can be:
Opened: This will open the query again in the “New query” tab.
Deleted: The query is removed from the list and cannot be opened again.
It is possible to edit the saved queries and templates by double-clicking on each query or template. Specifically for Query Templates, the data classification can be edited to be:
Account: The query template will be available for everyone within the account
User: The query template will be available for the user who created it
If you have saved a query, you can run the query again by selecting it from the list of saved queries.
Navigate to Projects > your_project > Base > Query.
Select the Saved Queries tab.
Select a query.
Select Open Query to open the query in the New Query tab from where it can be edited if needed and run by selecting Run Query.
Shared databases are displayed under the list of Tables as Shared Database for project <project name>.
For ICA Cohorts customers, shared databases are available in a project Base instance. For more information on specific Cohorts shared database tables that are viewable, see Cohorts Base.
Base is a genomics data aggregation and knowledge management solution suite. It is a secure and scalable integrated genomics data analysis solution which provides information management and knowledge mining. Refer to the Base documentation for more details.
This tutorial provides an example for exercising the basic operations used with Base, including how to create a table, load the table with data, and query the table.
An ICA project with access to Base
If you don't already have a project, please follow the instructions in the Project documentation to create a project.
File to import
A tab delimited gene expression file (sampleX.final.count.tsv). Example format:
HES4-NM_021170-T00001 1392
ISG15-NM_005101-T00002 46
SLC2A5-NM_003039-T00003 14
H6PD-NM_004285-T00004 30
PIK3CD-NM_005026-T00005 200
MTOR-NM_004958-T00006 156
FBXO6-NM_018438-T00007 10
MTHFR-NM_005957-T00008 154
FHAD1-NM_052929-T00009 10
PADI2-NM_007365-T00010 12
Tables are components of databases that store data in a 2-dimensional format of columns and rows. Each row represents a new data record in the table; each column represents a field in the record. On ICA, you can use Base to create custom tables to fit your data. A schema definition defines the fields in a table. On ICA you can create a schema definition from scratch, or from a template. In this activity, you will create a table for RNAseq count data, by creating a schema definition from scratch.
Go to the Projects > your_project > Base > Tables and enable Base by clicking on the Enable button.
Select Add Table > New Table.
Create your table
To create your table from scratch, select Empty Table from the Create table from dropdown.
Name your table FeatureCounts
Uncheck the box next to Include reference, to exclude reference data from your table.
Check the box next to Edit as text. This will reveal a text box that can be used to create your schema.
Copy the schema text below and paste it in into the text box to create your schema.
{
"Fields": [
{
"NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
"Name": "TranscriptID",
"Type": "STRING",
"Mode": "REQUIRED",
"Description": null,
"DataResolver": null,
"SubBluebaseFields": []
},
{
"NAME_PATTERN": "[a-zA-Z][a-zA-Z0-9_]*",
"Name": "ExpressionCount",
"Type": "INTEGER",
"Mode": "REQUIRED",
"Description": null,
"DataResolver": null,
"SubBluebaseFields": []
}
]
}
Click the Save button
Upload the sampleX.final.count.tsv file containing the final counts.
Select Data tab (1) from the left menu.
Click on the grey box (2) to choose the file to upload or drag and drop the sampleX.final.count.tsv into the grey box
Refresh the screen (3)
The uploaded file (4) will appear on the data page after successful upload.
Data can be loaded into tables manually or automatically. To load data automatically, you can set up a schedule. The schedule specifies which files' data should be automatically loaded into a table when those files are uploaded to ICA or created by an analysis on ICA. Active schedules will check for new files every 24 hours.
In this exercise, you will create a schedule to automatically load RNA transcript counts from .final.count.tsv files into the table you created above.
Go to Projects > your_project > Base > Schedule and click the + Add New button.
Select the option to load the contents from files into a table.
Create your schedule.
Name your schedule LoadFeatureCounts
Choose Project as the source of data for your table.
To specify that data from .final.count.tsv files should be loaded into your table, enter .final.count.tsv in the Search for a part of a specific ‘Original Name’ or Tag text box.
Specify your table as the one to load data into, by selecting your table (FeatureCounts) from the dropdown under Target Base Table.
Under Write preference, select Append to table. New data will be appended to your table, rather than overwriting existing data in your table.
The .final.count.tsv files that will be loaded into your table are tab-delimited (TSV) and do not contain a header row. For the Data format, Delimiter, and Header rows to skip fields, use these values:
Data format: TSV
Delimiter: Tab
Header rows to skip: 0
Click the Save button
Highlight your schedule. Click the Run button to run your schedule now.
It will take a short time to prepare and load data into your table.
Check the status of your job on your Projects > your_project > Activity page.
Click the BASE JOBS tab to view the status of scheduled Base jobs.
Click BASE ACTIVITY to view Base activity.
Check the data in the table.
Go back to your Projects > your_project > Base > Tables page.
Double-click your table to view its details.
You will land on the SCHEMA DEFINITION page.
Click the PREVIEW tab to view the records that were loaded into your table.
Click the DATA tab, to view a list of the files whose data has been loaded into your table.
To request data or information from a Base table, you can run an SQL query. You can create and run new queries or saved queries.
In this activity, we will create and run a new SQL query to find out how many records (RNA transcripts) in your table have counts greater than 100.
Go to your Projects > your_project > Base > Query page.
SELECT TranscriptID,ExpressionCount FROM FeatureCounts WHERE ExpressionCount > 100;
Paste the above query into the NEW QUERY text box
Click the Run Query button to run your query
View your query results.
Save your query for future use by clicking the Save Query button. You will be asked to "Name" the query before clicking on the "Create" button.
Find the table you want to export on the "Tables" page under BASE. Go to the table details page by double-clicking the table you want to export.
Click on the Export As File icon and complete the required fields
Name: Name of the exported file
Data Format: A table can be exported in CSV and JSON format. The exported files can be compressed using GZIP, BZ2, DEFLATE or RAW_DEFLATE.
CSV Format: In addition to Comma, the file can be Tab, Pipe or Custom character delimited.
JSON Format: Selecting JSON format exports the table in a text file containing a JSON object for each entry in the table. This is the standard Snowflake behavior.
Export to single/multiple files: This option allows the export of a table as a single (large) file or multiple (smaller) files. If "Export to multiple files" is selected, a user can provide "Maximum file size (in bytes)" for exported files. The default value is 16,000,000 bytes but can be increased to accommodate larger files. The maximum file size supported is 5 GB.
In this tutorial, we will show how to create and launch a pipeline using the Nextflow language in ICA.
This tutorial references the Basic pipeline example in the Nextflow documentation.
The first step in creating a pipeline is to create a project. For instructions on creating a project, see the Projects page. In this tutorial, the project is named Getting Started.
After creating your project,
Open the project at Projects > your_project.
Navigate to the Flow > Pipelines view in the left navigation pane.
From the Pipelines view, click +Create > Nextflow > XML based to start creating the Nextflow pipeline.
In the Nextflow pipeline creation view, the Description field is used to add information about the pipeline. Add values for the required Code (unique pipeline name), description, and size fields.
Next we'll add the Nextflow pipeline definition. The pipeline we're creating is a modified version of the Basic pipeline example from the Nextflow documentation. Modifications to the pipeline definition from the nextflow documentation include:
Add the container directive to each process with the latest ubuntu image. If no Docker image is specified, public.ecr.aws/lts/ubuntu:22.04_stable is used as default.
Add the publishDir directive with value 'out' to the reverse process.
Modify the reverse process to write the output to a file test.txt instead of stdout.
The description of the pipeline from the linked Nextflow docs:
This example shows a pipeline that is made of two processes. The first process receives a FASTA formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows receives these files and simply reverses their content using the rev command-line tool.
Resources: For each process, you can use the memory directive and cpus directive to set the Compute Type. ICA will then determine the best matching compute type based on those settings. For example, if you set memory '10240 GB' and cpus 6, ICA will determine that you need the standard-large ICA Compute Type.
Syntax example:
process iwantstandardsmallresources {
cpus 2
memory '8 GB'
...
}
Navigate to the Nextflow files > main.nf tab to add the definition to the pipeline. Since this is a single file pipeline, we don't need to add any additional definition files. Paste the following definition into the text editor:
#!/usr/bin/env nextflow
params.in = "$HOME/sample.fa"
sequences = file(params.in)
SPLIT = (System.properties['os.name'] == 'macOS' ? 'gcsplit' : 'csplit')
process splitSequences {
container 'public.ecr.aws/lts/ubuntu:22.04'
input:
file 'input.fa' from sequences
output:
file 'seq_*' into records
"""
$SPLIT input.fa '%^>%' '/^>/' '{*}' -f seq_
"""
}
process reverse {
container 'public.ecr.aws/lts/ubuntu:22.04'
publishDir 'out'
input:
file x from records
output:
file 'test.txt'
"""
cat $x | rev > test.txt
"""
}
Next we'll create the input form used when launching the pipeline. This is done through the XML Configuration tab. Since the pipeline takes in a single FASTA file as input, the XML-based input form will include a single file input.
Paste the below XML input form into the XML CONFIGURATION text editor. Click the Generate button to preview the launch form fields.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<pd:dataInputs>
<pd:dataInput code="in" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>in</pd:label>
<pd:description>fasta file input</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>
With the definition added and the input form defined, the pipeline is complete.
On the Documentation tab, you can add additional information about your pipeline. This information will be presented under the Documentation tab whenever a user starts a new analysis on the pipeline.
Click the Save button at the top right. The pipeline will now be visible from the Pipelines view within the project.
Before we launch the pipeline, we'll need to upload a FASTA file to use as input. In this tutorial, we'll use a public FASTA file from the UCSC Genome Browser. Download the chr1_GL383518v1_alt.fa.gz file and unzip to decompress the FASTA file.
To upload the FASTA file to the project, first navigate to the Data section in the left navigation pane. In the Data view, drag and drop the FASTA file from your local machine into the indicated section in the browser. Once the file upload completes, the file record will show in the Data explorer. Ensure that the format of the file is set to "FASTA".
Now that the input data is uploaded, we can proceed to launch the pipeline. Navigate to Projects > your_project > Flow > Analyses and click Start. Next, select your pipeline from the list.
In the Launch Pipeline view, the input form fields are presented along with some required information to create the analysis.
Enter a User Reference (identifier) for the analysis. This will be used to identify the analysis record after launching.
Set the Entitlement Bundle (there will typically only be a single option).
In the Input Files section, select the FASTA file for the single input file. (chr1_GL383518v1_alt.fa)
Set the Storage size to small. This will attach a 1.2TB shared file system to the environment used to run the pipeline.
With the required information set, click the button to Start Analysis.
After launching the pipeline, navigate to the Analyses view in the left navigation pane.
The analysis record will be visible from the Analyses view. The Status will transition through the analysis states as the pipeline progresses. It may take some time (depending on resource availability) for the environment to initialize and the analysis to move to the In Progress status.
Once the pipeline succeeds, the analysis record will show the "Succeeded" status. Do note that this may take considerable time if it is your first analysis because of the required resource management.
Click the analysis record to enter the analysis details view.
From the analysis details view, the logs produced by each process within the pipeline are accessible via the Steps tab.
Analysis outputs are written to an output folder in the project with the naming convention {Analysis User Reference}-{Pipeline Code}-{GUID}. (1)
Inside of the analysis output folder are the files output by the analysis processes written to the out folder. In this tutorial, the file test.txt (2) is written to by the reverse process. Navigating into the analysis output folder, clicking into the test.txt file details, and opening the VIEW tab (3) shows the output file contents.
The "Download" button (4) can be used to download the data to the local machine.
All tables created within Base are gathered on the Projects > your_project > Base > Tables page. New tables can be created and existing tables can be updated or deleted here.
To create a new table, click Projects > your_project > Base > Tables > +Create. Tables can be created from scratch or from a previously saved template. Views on data from Illumina hardware and processes can also be selected as an option.
Once a table is saved, it is no longer possible to edit the schema; only new fields can be added. The workaround is to switch to text mode, copy the schema of the table you want to modify, and paste it into a new empty table where the necessary changes can be made before saving.
Once created, do not try to modify your table column layout via the Query module as even though you can execute ALTER TABLE commands, the definitions and syntax of the table will go out of sync resulting in processing issues.
Be careful when naming tables that you want to use in bundles. Table names have to be unique per bundle, so no two tables with the same name can be part of the same bundle.
To create a table from scratch, complete the fields listed below and click the Save button. Once saved, a job will be created to create the table. To view table creation progress, navigate to the Activity page.
The table name is a required field and must be unique. The first character of the table name must be a letter, followed by letters, numbers or underscores. The description is optional.
Including or excluding references can be done by checking or un-checking the Include reference checkbox. These reference fields are not shown on the table creation page, but are added to the schema definition, which is visible after creating the table (Projects > your_project > Base > Tables > your_table > Schema definition). By including references, additional columns will be added to the table which can contain references to the data on the platform:
data_reference: reference to the data element in the Illumina platform from which the record originates
data_name: original name of the data element in the Illumina platform from which the record originates
sample_reference: reference to the sample in the Illumina platform from which the record originates
sample_name: name of the sample in the Illumina platform from which the record originates
pipeline_reference: reference to the pipeline in the Illumina platform from which the record originates
pipeline_name: name of the pipeline in the Illumina platform from which the record originates
execution_reference: reference to the pipeline execution in the Illumina platform from which the record originates
account_reference: reference to the account in the Illumina platform from which the record originates
account_name: name of the account in the Illumina platform from which the record originates
In an empty table, you can create a schema by adding a field with the +Add button for each column of the table and defining it. At any time during the creation process, it is possible to switch to the edit definition mode and back. The definition mode shows the JSON code, whereas the original view shows the fields in a table.
Each field requires:
a unique name (*1) with optional description.
a type
String – collection of characters
Bytes – raw binary data
Integer – whole numbers
Float – fractional numbers (*2)
Numeric – any number (*3)
Boolean – only options are “true” or “false”
Timestamp - Stores number of (milli)seconds passed since the Unix epoch
Date - Stores date in the format YYYY-MM-DD
Time - Stores time in the format HH:MI:SS
Datetime - Stores date and time information in the format YYYY-MM-DD HH:MI:SS
Record – has a child field
Variant - can store a value of any other type, including OBJECT and ARRAY
a mode
Required - Mandatory field
Nullable - Field is allowed to have no value
Repeated - Multiple values are allowed in this field (will be recognized as array in Snowflake)
(*1) Do not use reserved Snowflake keywords such as left, right, sample, select, table,... (https://docs.snowflake.com/en/sql-reference/reserved-keywords) for your schema name as this will lead to SQL compilation errors.
Users can create their own template by making a table which is turned into a template at Projects > your_project > Base > Tables > your_table > Manage (top right) > Save as template.
If a template is created and available/active, it is possible to create a new table based on this template. The table information and references follow the rules of the empty table but in this case the schema will be pre-filled. It is possible to still edit the schema that is based on the template.
The status of a table can be found at Projects > your_project > Base > Tables. The possible statuses are:
Available: Ready to be used, both with or without data
Pending: The system is still processing the table, there is probably a process running to fill the table with data
Deleted: The table is deleted functionally; it still exists and can be shown in the list again by clicking the Show deleted tables/views button
Additional Considerations
Tables created from empty data or from a template are available faster.
When copying a table with data, it can remain in a Pending state for a longer period of time.
Clicking on the page's refresh button will update the list.
For any available table, the following details are shown:
Table information: Name, description, status, number of records and data size.
Definition: An overview of the table schema, also available as text. Fields can be added to the schema but not deleted. Tip for deleting fields: copy the schema as text and paste it into a new empty table where the schema is still editable.
Preview: A preview of the first 50 rows of the table (when data is uploaded into the table). Select show details to see record details.
Source Data: the files that are currently uploaded into the table. You can see the Load Status of the files which can be Prepare Started, Prepare Succeeded or Prepare Failed and finally Load Succeeded or Load Failed.
From within the details of a table it is possible to perform the following actions from the Manage menu (top right) of the table:
Edit: Add fields to the table and change the table description.
Copy: Create a copy of this table in the same or a different project. In order to copy to another project, data sharing must be enabled in the details of the original project. The user also has to have access to both the original and the target project.
Export as file: Export this table as a CSV or JSON file. The exported file can be found in a project where the user has the access to download it.
Save as template: Save the schema or an edited form of it as a template.
Add data: Load additional data into the table manually. This can be done by selecting data files previously uploaded to the project, or by dragging and dropping files directly into the popup window for adding data to the table. It is also possible to load data into a table automatically via a pre-configured job, which can be set up on the Schedule page.
Delete: Delete the table.
To manually add data to your table, go to Projects > your_project > Base > Tables > your_table > Manage (top right) > Add Data
The data selection screen will show options to select the structure as CSV (comma-separated), TSV (tab-separated) or JSON (JavaScript Object Notation) and the location of your source data. In the first step, you select the data format and the files containing the data.
Data format (required): Select the format of the data which you want to import.
Write preference: Define whether data can be written to the table only when the table is empty, whether the data should be appended to the table, or whether the table should be overwritten.
Delimiter: Which delimiter is used in the delimiter separated file. If the required delimiter is not comma, tab or pipe, select custom and define the custom delimiter.
Custom delimiter: If a custom delimiter is used in the source data, it must be defined here.
Header rows to skip: The number of consecutive header rows (at the top of the table) to skip.
References: Choose which references must be added to the table.
Most of the advanced options are legacy functions and should not be used. The only exceptions are
Encoding: Select if the encoding is UTF-8 (any Unicode character) or ISO-8859-1 (first 256 Unicode characters).
Ignore unknown values: This applies to CSV-formatted files. You can use this function to handle optional fields without separators, provided that the missing fields are located at the end of the row. Otherwise, the parser cannot detect the missing separator and will shift fields to the left, resulting in errors.
If headers are used: The columns that have matching fields are loaded, those that have no matching fields are loaded with NULL and remaining fields are discarded.
If no headers are used: The fields are loaded in order of occurrence and trailing missing fields are loaded with NULL, trailing additional fields are discarded.
To see the status of your data import, go to Projects > your_project > Activity > Base Jobs, where you will see a job of type Prepare Data which will have succeeded or failed. If it has failed, you can see the error message and details by double-clicking the Base job. You can then take corrective actions if the input did not match the table design and try to run the import again (with a new copy of the file, as each input file can only be used once).
If you need to cancel the import, you can do so while it is scheduled by navigating to the Base Jobs inventory and selecting the job followed by Abort.
To see which data has been used to populate your table, go to Projects > your_project > Base > Tables > your_table > Source Data. This will list all the source data files, even those that failed to be imported. To prevent duplicate entries, these files cannot be used for another import.
Base Table schema definitions do not include an array type, but arrays can be ingested using either the Repeated mode for arrays containing a single type (i.e., String), or the Variant type.
If you have a nested JSON structure, you can import it into individual fields of your table.
For example, if your nested JSON structure contains subfields a, b and c with integer values and you want to import them into a table, you need to create a matching table. This can be done either on the Tables page or via the SQL command CREATE OR REPLACE TABLE json_data ( a INTEGER, b INTEGER, c INTEGER);
Format your JSON data to have single lines per structure.
Finally, create a schedule to import your data or perform a manual import.
The resulting table will look like this:
This tutorial shows you how to start a new pipeline from scratch
Start Bench workspace
For this tutorial, any instance size will work, even the smallest standard-small.
Select the single user workspace permissions (aka "Access limited to workspace owner"), which allows us to deploy pipelines.
A small amount of disk space (10GB) will be enough.
We are going to wrap the "gzip" linux compression tool with inputs:
1 file
compression level: integer between 1 and 9
The Nextflow code should wrap the gzip command and publish the final output in the “out” folder.
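A minimal sketch of such a wrapper, assuming the hypothetical parameter names input_file and compression_level, could look like this:
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter names; adapt them to your own conventions
params.input_file = null
params.compression_level = 6

process GZIP {
    // Publish the compressed result to the 'out' folder
    publishDir 'out', mode: 'copy'

    input:
    path input_file
    val compression_level

    output:
    path "${input_file}.gz"

    """
    gzip -c -${compression_level} ${input_file} > ${input_file}.gz
    """
}

workflow {
    GZIP(file(params.input_file), params.compression_level)
}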
Save this file as nextflow-src/main.nf, and check that it works:
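For example, using the parameter names assumed in the sketch above:
nextflow run nextflow-src/main.nf --input_file <some_local_file> --compression_level 9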
We now need to:
Use Docker
Follow some nf-core best practices to make our source+test compatible with the pipeline-dev tools
In Nextflow, Docker images can be specified at the process level
Each process may use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Specifying the Docker image is done with the container '<image_name:version>' directive, which can be specified
at the start of each process definition
or in nextflow config files (preferred when following nf-core guidelines)
For example, create nextflow-src/nextflow.config:
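As a minimal sketch, reusing the Ubuntu image referenced elsewhere in this documentation as the default container for every process:
// nextflow-src/nextflow.config
process.container = 'public.ecr.aws/lts/ubuntu:22.04'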
We can now run with nextflow's -with-docker option:
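For instance, with the same hypothetical parameters as before:
nextflow run nextflow-src/main.nf --input_file <some_local_file> --compression_level 9 -with-docker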
Following some nf-core best practices to make our source+test compatible with the pipeline-dev tools:
Here is an example of “test” profile that can be added to nextflow-src/nextflow.config to define some input values appropriate for a validation run:
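A sketch of such a profile, assuming the hypothetical parameter names used above and a small test file shipped with the project:
profiles {
    test {
        params.input_file = "${projectDir}/test-data/test.txt"   // hypothetical test input
        params.compression_level = 6
    }
}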
With this profile defined, we can now run the same test as before with this command:
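For example:
nextflow run nextflow-src/main.nf -profile test -with-docker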
A “docker” profile is also present in all nf-core pipelines. Our pipeline-dev tools will make use of it, so let’s define it:
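A minimal version, added inside the same profiles block as the test profile above, could be:
    docker {
        // Run every process inside its container
        docker.enabled = true
    }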
We can now run the same test as before with this command:
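With both profiles in place, the invocation would be along the lines of:
nextflow run nextflow-src/main.nf -profile test,docker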
We also have enough structure in place to start using the pipeline-dev command:
In order to deploy our pipeline to ICA, we need to generate the user interface input form.
This is done by using nf-core's recommended nextflow_schema.json.
For our simple example, we generate a minimal one by hand (done by using one of the nf-core pipelines as example):
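A heavily trimmed sketch, using the hypothetical parameters from this tutorial (the exact structure expected by the tooling may differ), might be:
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "gzip wrapper pipeline parameters",
    "type": "object",
    "properties": {
        "input_file": {
            "type": "string",
            "format": "file-path",
            "description": "File to compress"
        },
        "compression_level": {
            "type": "integer",
            "minimum": 1,
            "maximum": 9,
            "default": 6,
            "description": "gzip compression level"
        }
    }
}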
In the next step, this gets converted to the ica-flow-config/inputForm.json file.
We just need to create a final file, which we had skipped until now: Our project description file, which can be created via the command pipeline-dev project-info --init:
We can now run:
After generating the ICA-Flow-specific files in the ica-flow-config folder (JSON input specs for Flow launch UI + list of inputs for next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name).
It then asks if we want to update the latest version or create a new one.
Choose "3" and enter a name of your choice to avoid conflicts with all the others users following this same tutorial.
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project in order for the analysis launch to work. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
The Pipeline Development Kit in Bench makes it easy to create Nextflow pipelines for ICA Flow. This kit consists of a number of development tools which are installed in /data/.software (regardless of which Bench image is selected) and provides the following features:
Import to Bench
From public nf-core pipelines
From existing ICA Flow Nextflow pipelines
Run in Bench
Modify and re-run in Bench, providing fast development iterations
Deploy to Flow
Launch validation in Flow
Recommended workspace size: Nf-core Nextflow pipelines typically require 4 or more cores to run.
The pipeline development tools require
Conda which is automatically installed by “pipeline-dev” if conda-miniconda.installer.ica-userspace.sh is present in the image.
Nextflow (version 24.10.2 is automatically installed using conda, or you can use other versions)
git (automatically installed using conda)
jq, curl (which should be made available in the image)
Pipeline development tools work best when the following items are defined:
Nextflow profiles:
test profile, specifying inputs appropriate for a validation run
docker profile, instructing NextFlow to use Docker
nextflow_schema.json, as described in the tutorial above. This is useful for the launch UI generation. The nf-core CLI tool (installable via pip install nf-core) offers extensive help to create and maintain this schema.
ICA Flow adds one additional constraint. The output directory out is the only one automatically copied to the Project data when an ICA Flow Analysis completes. The --outdir parameter recommended by nf-core should therefore be set to --outdir=out when running as a Flow pipeline.
These are installed in /data/.software (which should be in your $PATH), the pipeline-dev script is the front-end to the other pipeline-dev-* tools.
Pipeline-dev fulfils a number of roles:
Checks that the environment contains the required tools (conda, nextflow, etc) and offers to install them if needed.
Checks that the fast data mounts are present (/data/mounts/project etc.) – it is useful to check regularly, as they get unmounted when a workspace is stopped and restarted.
Redirects stdout and stderr to .pipeline-dev.log, with the history of log files kept as .pipeline-dev.log.<log date>.
Launches the appropriate sub-tool.
Prints out errors with backtrace, to help report issues.
A pipeline-dev project relies on the following Folder structure, which is auto-generated when using the pipeline-dev import* tools.
If you start a project manually, you must follow the same folder structure.
Project base folder
nextflow-src: Platform-agnostic Nextflow code, for example the github contents of an nf-core pipeline, or your usual nextflow source code.
main.nf
nextflow.config
nextflow_schema.json
pipeline-dev.project-info: contains project name, description, etc.
nextflow-bench.config (automatically generated when needed): contains definitions for bench.
ica-flow-config: Directory of files used when deploying pipeline to Flow.
inputForm.json (if not present, gets generated from nextflow-src/nextflow_schema.json): input form as defined in ICA Flow.
onSubmit.js, onRender.js (optional, generated at the same time as inputForm.json): javascript code to go with the input form.
launchPayload_inputFormValues.json (if not present, gets generated from the test profile): used by “pipeline-dev launch-validation-in-flow”.
The above-mentioned project structure must be generated manually. The nf-core CLI tools can assist in generating the nextflow_schema.json. A tutorial goes into more detail about this use case.
A directory with the same name as the nextflow/nf-core pipeline is created, and the Nextflow files are pulled into the nextflow-src subdirectory.
A tutorial goes into more detail about this use case.
A directory called imported-flow-analysis is created and the analysis+pipeline assets are downloaded.
A tutorial goes into more detail about this use case.
Optional parameters --local / --sge can be added to force the execution on the local workspace node, or on the workspace cluster (when available). Otherwise, the presence of a cluster is automatically detected and used.
The script then launches nextflow. The full nextflow command line is printed and launched.
In case of errors, full logs are saved as .pipeline-dev.log
Nextflow can run processes with and without Docker images. In the context of pipeline development, the pipeline-dev tools assume Docker images are used, in particular during execution with the nextflow --profile docker.
In Nextflow, Docker images can be specified at the process level
This is done with the container "<image_name:version>" directive, which can be specified
in nextflow config files (preferred method when following the nf-core best practices)
or at the start of each process definition.
Each process can use a different docker image
It is highly recommended to always specify an image. If no Docker image is specified, Nextflow will report this. In ICA, a basic image will be used but with no guarantee that the necessary tools are available.
Resources such as the number of CPUs and memory can be specified per process; see the Nextflow documentation for details about the Nextflow-Docker syntax.
Bench can push/pull/create/modify Docker images, as described in the Bench documentation.
This command does the following:
Generate the JSON file describing the ICA Flow user interface.
If ica-flow-config/inputForm.json doesn’t exist: generate it from nextflow-src/nextflow_schema.json.
Generate the JSON file containing the validation launch inputs.
If ica-flow-config/launchPayload_inputFormValues.json doesn’t exist: generate it from nextflow --profile test inputs.
If local files are used as validation inputs or as default input values:
copy them to /data/project/pipeline-dev-files/temp .
get their ICA file ids.
use these file ids in the launch specifications.
If remote files are used as validation inputs or as default input values of an input of type “file” (and not “string”): do the same as above.
Identify the pipeline name to use for this new pipeline deployment:
If a deployment has already occurred in this project, or if the project was imported from an existing Flow pipeline, start from this pipeline name. Otherwise start from the project name.
Identify which already-deployed pipelines have the same base name, with or without suffixes that could be some versioning (_v<number>, _<number>, _<date>) .
Ask the user if they prefer to update the current version of the pipeline, create a new version, or enter a new name of their choice – or use the --create/--update parameters when specified, for scripting without user interactions.
A new ICA Flow pipeline gets created (except in the case of a pipeline update).
The current Nextflow version in Bench is used to select the best Nextflow version to be used in Flow
nextflow-src folder is uploaded file by file as pipeline assets.
Output Example:
The pipeline name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
Opening the URL of the pipeline and clicking on Start Analysis shows the generated user interface:
The ica-flow-config/launchPayload_inputFormValues.json file generated in the previous step is submitted to ICA Flow to start an analysis with the same validation inputs as “nextflow --profile test”.
Output Example:
The analysis name, id and URL are printed out, and if your environment allows, Ctrl+Click/Option+Click/Right click can open the URL in a browser.
In this tutorial, we will demonstrate how to create and launch a Nextflow pipeline using the ICA command line interface (CLI).
Please refer to the installation instructions for installing the ICA CLI. To authenticate, follow the steps on the authentication page.
In this tutorial, we will create an RNASeq pipeline in ICA. It includes four processes: index creation, quantification, FastQC, and MultiQC. We will also upload a Docker container to the ICA Docker repository for use within the pipeline.
The 'main.nf' file defines the pipeline that orchestrates various RNASeq analysis processes.
The script uses the following tools:
Salmon: Software tool for quantification of transcript abundance from RNA-seq data.
FastQC: QC tool for sequencing data
MultiQC: Tool to aggregate and summarize QC reports
We need a Docker container containing these tools. You can refer to the help pages to learn how to build your own Docker image with the required tools. For the sake of this tutorial, we will use the publicly available nextflow/rnaseq-nf container.
With Docker installed on your computer, download the image required for this project using the following command.
docker pull nextflow/rnaseq-nf
Create a tarball of the image to upload to ICA.
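For example, using docker save:
docker save -o rnaseq-nf.tar nextflow/rnaseq-nf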
The following commands can be used to upload the tarball to your project.
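After entering the project context, the upload is typically done with the projectdata upload command (exact syntax may vary by CLI version), for example:
icav2 projectdata upload rnaseq-nf.tar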
Add the image to the ICA Docker repository
The uploaded image can be added to the ICA docker repository from the ICA Graphical User Interface (GUI).
Change the format for the image tarball to DOCKER:
Navigate to Projects > your_project > Data.
Check the checkbox for the uploaded tarball.
Click on Manage > Change format.
In the new popup window, select "DOCKER" format and save.
To add this image to the ICA Docker repository, first click on Projects to go back to the home page.
From the ICA home page, click on the System Settings > Docker Repository > Create > Image.
This will open a new window that lets you select the region (US, EU, CA) in which your project resides and the Docker image from the bottom pane.
Edit the Name field to rename the image. For this tutorial, we will change the name to "rnaseq". Select the region, give it a version number and a description, and click "Save".
After creating a new docker image, you can click on the image to get the container URL (under Regions) for the nextflow configuration file.
Create a configuration file called "nextflow.config" in the same folder as the main.nf file above. Use the URL copied above to add the process.container line in the config file.
You can add a pod directive within a process or in the config file to specify a compute type. The following is an example of a configuration file with the 'standard-small' compute type for all processes. Please refer to the compute types documentation for a list of available compute types.
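As an illustration, such a configuration might look like the following; the container URL is a placeholder for the value copied from the Docker repository, and the presetSize annotation is the assumed mechanism for selecting an ICA compute type:
process {
    // Container URL copied from the ICA Docker repository (placeholder)
    container = '<your rnaseq container URL>'
    // Request the standard-small compute type for every process
    pod = [ annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small' ]
}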
The parameters file defines the pipeline input parameters. Refer to the documentation for detailed information on creating correctly formatted parameters files.
An empty form looks as follows:
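Based on the XML configuration shown earlier in this documentation, an empty form is essentially:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
    </pd:dataInputs>
    <pd:steps>
    </pd:steps>
</pd:pipeline>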
The input files are specified within a single dataInputs node, with each input file specified in a separate dataInput node. Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the pipeline, including, but not limited to, strings, booleans, and integers.
For this tutorial, we do not have any settings parameters, but the pipeline requires multiple file inputs. The parameters.xml file looks as follows:
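As a sketch, assuming the pipeline takes paired FASTQ reads and a transcriptome FASTA as its file inputs (adjust the codes to match the parameter names used in main.nf):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition">
    <pd:dataInputs>
        <pd:dataInput code="reads" format="FASTQ" type="FILE" required="true" multiValue="true">
            <pd:label>reads</pd:label>
            <pd:description>FASTQ read files</pd:description>
        </pd:dataInput>
        <pd:dataInput code="transcriptome" format="FASTA" type="FILE" required="true" multiValue="false">
            <pd:label>transcriptome</pd:label>
            <pd:description>Transcriptome FASTA file</pd:description>
        </pd:dataInput>
    </pd:dataInputs>
    <pd:steps/>
</pd:pipeline>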
Use the following commands to create the pipeline with the above contents in your project.
If not already in the project context, enter it by using the following command:
icav2 enter <PROJECT NAME or ID>
Create the pipeline using icav2 projectpipelines create nextflow. Example:
If you prefer to organize the processes in different folders/files, you can use the --other parameter to upload the different processes as additional files. Example:
You can refer to the documentation to explore options to automate this process.
Refer to the CLI help documentation for details on running the pipeline from the CLI.
Example command to run the pipeline from CLI:
You can get the pipeline id under "ID" column by running the following command:
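This is typically the pipeline list command, for example:
icav2 projectpipelines list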
You can get the file ids under "ID" column by running the following commands:
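Similarly, project data can be listed with, for example:
icav2 projectdata list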
Please refer to the command help (icav2 [command] --help) to determine available flags to filter the output of the above commands if necessary. You can also refer to the CLI reference for the available flags of the icav2 commands.
For more help on uploading data to ICA, please refer to the data upload documentation.
In bioinformatics and computational biology, the vast and growing amount of data necessitates methods and tools that can process and analyze data in parallel. This demand gave birth to the scatter-gather approach, an essential pattern in creating pipelines that offers efficient data handling and parallel processing capabilities. In this tutorial, we will demonstrate how to create a CWL pipeline utilizing the scatter-gather approach. For this purpose, we will use two widely known tools: fastp and multiqc. Given the functionalities of both fastp and multiqc, their combination in a scatter-gather pipeline is incredibly useful. Individual datasets can be scattered across resources for parallel preprocessing with fastp. Subsequently, the outputs from each of these parallel tasks can be gathered and fed into multiqc, generating a consolidated quality report. This method not only accelerates the preprocessing of large datasets but also offers an aggregated perspective on data quality, ensuring that subsequent analyses are built upon a robust foundation.
First, we create the two tools: fastp and multiqc. For this, we need the corresponding Docker images and CWL tool definitions. Please look up the tool import section of our help pages to learn more about how to import a tool into ICA. In a nutshell, once the CWL tool definition is pasted into the editor, the other tabs for editing the tool will be populated. To complete the tool, the user needs to select the corresponding Docker image and to provide a tool version (which can be any string).
For this demo, we will use the publicly available Docker images quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0 for fastp and docker.io/ewels/multiqc:v1.15 for multiqc. The documentation also describes how to import publicly available Docker images into ICA.
Furthermore, we will use the corresponding CWL tool definitions for fastp and multiqc.
Once the tools are created, we will create the pipeline itself using these two tools at Projects > your_project > Flow > Pipelines > CWL > Graphical:
On the Definition tab, go to the tool repository and drag and drop the two tools which you just created on the pipeline editor.
Connect the JSON output of fastp to multiqc input by hovering over the middle of the round, blue connector of the output until the icon changes to a hand and then drag the connection to the first input of multiqc. You can use the magnification symbols to make it easier to connect these tools.
Above the diagram, drag and drop two input FASTQ files and an output HTML file on to the pipeline editor and connect the blue markers to match the diagram below.
Relevant aspects of the pipeline:
Both inputs are multivalue (as can be seen on the screenshot)
Ensure that the step fastp has scattering configured: it scatters on both inputs using the scatter method 'dotproduct'. This means that as many instances of this step will be executed as there are pairs of FASTQ files. To indicate that this step is executed multiple times, the icons of both inputs have doubled borders.
Both input arrays (Read1 and Read2) must be matched. Automatic sorting of input arrays is currently not supported, so you have to take care of matching the input arrays yourself. There are two ways to achieve this (besides manual specification in the GUI):
invoke this pipeline in CLI using Bash functionality to sort the arrays
add a tool to the pipeline which takes an array of all FASTQ files, splits them by R1 and R2 suffixes, and sorts them.
We will describe the second way in more detail. The tool is based on the public Python Docker image docker.io/python:3.10 and has the following definition. In this tool, we provide the Python script spread_script.py via a Dirent entry.
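As an illustration, the core logic of such a spread_script.py could look like the following hypothetical sketch, assuming an _R1/_R2 naming convention:
#!/usr/bin/env python3
# Hypothetical sketch: split a mixed list of FASTQ paths into matched, sorted R1 and R2 arrays
import json
import sys

def spread(fastq_paths):
    r1 = sorted(p for p in fastq_paths if "_R1" in p)
    r2 = sorted(p for p in fastq_paths if "_R2" in p)
    if len(r1) != len(r2):
        raise ValueError("Unpaired FASTQ files: %d R1 vs %d R2" % (len(r1), len(r2)))
    return {"read1": r1, "read2": r2}

if __name__ == "__main__":
    # File paths are passed as command-line arguments; matched arrays are printed as JSON
    print(json.dumps(spread(sys.argv[1:]), indent=2))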
Now this tool can be added to the pipeline before the fastp step.
From the Cohorts menu in the left hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.
The query details can be accessed by clicking the triangle next to Show Query Details. The query details displays the selections used to create a cohort. The selections can be edited by clicking the pencil icon in the top right.
Charts will be open by default. If not, click Show Charts.
Use the gear icon in the top-right to change viewable chart settings.
There are four charts available to view summary counts of attributes within a cohort as histogram plots.
Click Hide Charts to hide the histograms.
Display time-stamped events and observations for a single subject on a timeline. The timeline view is only available for subjects that have time-series data.
The following attributes are displayed in the timeline view:
Diagnosed and Self-Reported Diseases:
Start and end dates
Progression vs. remission
Medication and Other Treatments:
Prescribed and self-medicated
Start date, end date, and dosage at every time point
The timeline utilizes age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, the user can now switch to Date to visualize data.
In the default view, the timeline shows the first five disease data and the first five drug/medication data in the plot. Users can choose different attributes or change the order of existing attributes by clicking on the “select attribute” button.
The x-axis shows the person’s age in years, with data points initially displayed between ages 0 to 100. Users can zoom in by selecting the desired range to visualize data points within the selected age range.
Each event is represented by a dot in the corresponding track. Events in the same track can be connected by lines to indicate the start and end period of an event.
By Default, the Subjects tab is displayed.
The Subjects tab with a list of all subjects matching your criteria is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for a specific subject by typing the Subject ID into the Search Subjects text box.
Get all details available on a subject by clicking the hyperlinked Subject ID in the Subject list.
To Exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, you can uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusion(s).
You can Export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see the bottom of this page.
Specific subjects can be removed from a Cohort.
Select the Subjects tab.
Subjects in the Cohort are checked by default.
To remove a specific subject from a Cohort, uncheck the checkbox next to subjects to remove from a Cohort.
Check box selections are maintained while browsing through the pages of the subject list.
Click Save Cohort to save the subjects you would like to exclude.
The specific subjects will no longer be counted in all analysis visualizations.
The specific excluded subjects will be saved for the Cohort.
To add the subjects back to the Cohort, re-check their checkboxes and click Save Cohort.
For each individual cohort, display a table of all observed SVs that overlap with a given gene.
Click the Marker Frequency tab, then click the Gene Expression tab.
Down-regulated genes are displayed in blue and Up-regulated genes are displayed in red.
The frequency in the cohort is displayed, along with the matching number/total, in the chart.
Genes can be searched by using the Search Genes text box.
You are brought to the Gene tab under the Gene Summary sub-tab.
Select a Gene by typing the gene name into the Search Genes text box.
A Gene Summary will be displayed that lists information and links to public resources about the selected gene.
A cytogenetic map will be displayed based on the selected gene, with a vertical orange bar representing the gene's location on the chromosome.
Click the Variants tab and Show legend and filters if it does not open by default.
Below the interactive legend, you see a set of analysis tracks: Needle Plot, Primate AI, Pathogenic variants, and Exons.
The Needle Plot allows toggling the plot by gnomAD frequency and Sample Count. Select Sample Count in the Plot by legend above the plot. You can also filter the plot to only show variants above/below a certain cut-off for gnomAD frequency (in percent) or absolute sample count.
The Needle Plot allows filtering by PrimateAI Score.
Set a lower (>=) or upper (<=) threshold for the PrimateAI Score to filter variants.
Enter the threshold value in the text box located below the gnomadFreq/SampleCount input box.
If no threshold value is entered, no filter will be applied.
The filter affects both the plot and the table when the “Display only variants shown in the plot above” toggle is enabled.
Filter preferences persist across gene views for a seamless experience.
The following filters are always shown and can be set independently: %gnomAD Frequency, Sample Count, and PrimateAI Score. Changes made to these filters are immediately reflected in both the needle plot and the variant list below.
Click on a variant's needle pin to view details about the variant from public resources and counts of variants in the selected cohort by disease category. If you want to view all subjects that carry the given variant, click on the sample count link, which will take you to the list of subjects (see above).
Use the Exon zoom bar from each end of the Amino Acid sequence to zoom in on the gene domain to better separate observations.
Hovering over the purple triangles in the Pathogenic Variant track shows pop-up details with pathogenicity calls, phenotypes, the submitter, and a link to the ClinVar entry.
Below the needle plot is a full listing of the variants displayed in the needle plot visualization.
The Display only variants shown in the plot above toggle (enabled by default) syncs the table with the Needle Plot. When the toggle is on, the table displays only the variants shown in the Needle Plot, applying all active filters (e.g., variant type, somatic/germline, sample count). When the toggle is off, all reported variants are displayed in the table and table-based filters can be used.
Export to CSV: When the views are synchronized (toggle on), the filtered list of variants can be exported to a CSV file for further analysis.
The Phenotypes tab shows a stacked horizontal bar chart which displays the molecular breakdown (disease type vs. gene) and subject count for the selected gene.
Note on "Stop Lost" Consequence Variants:
The stop_lost consequence is mapped as Frameshift, Stop lost in the tooltip.
The Stop gained|lost value includes both stop gain and stop loss variants.
When the Stop gained filter is applied, Stop lost variants will not appear in the plot or table if the "Display only variants shown in the plot above" toggle is enabled.
The Gene Expression tab shows known gene expression data from tissue types in GTEx.
The Genetic Burden Test is available for de novo variants only.
For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.
Click the Correlation Tab.
In X-axis category, select Clinical.
In X-axis Attribute, select a clinical attribute.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select another clinical attribute.
You will be shown a bubble plot comparing the first clinical attribute on the x-axis to the second clinical attribute on the y-axis.
The size of each bubble corresponds to the number of subjects falling into those categories.
To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, select a gene.
In Y-axis category, select RNA expression.
In Y-Axis Attribute, type a gene and leave Reference Type set to NORMAL.
Click Continuous to see violin plots of compared variables.
Note this comparison is for a Cancer case.
Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute, type a gene name.
In Y-axis category, select Clinical.
In Y-Axis Attribute, select a clinical attribute.
Click the Molecular Breakdown Tab.
In Enter a clinical attribute, select a clinical attribute.
In Enter a gene, select a gene by typing a gene name.
You are shown a stacked bar-chart by the clinical attribute selected values on the Y-axis.
For each attribute value the bar represents the % of Subjects with RNA Expression, Somatic Mutation, and Multiple Alterations.
Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.
If there is Copy Number Variant data in the cohort:
Click the CNV tab.
A graph shows the CNV sample percentage on the Y-axis and chromosomes on the X-axis.
Any value above zero is a copy number gain, and any value below zero is a copy number loss.
Click Chromosome: to select a specific chromosome position.
ICA allows for integrated analysis in a computational workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.
Confirm the VCF data for your analysis is in ICA Project Data.
From within your ICA Project, start a Bench Workspace (see the Bench Workspace documentation for more details).
Navigate back to ICA Cohorts.
Create a Cohort of subjects of interest using Create Cohort.
From the Subjects tab, click Export subjects... at the top-right of the subject list. The file can be downloaded to the browser or to ICA Project Data.
We suggest using export ...to Data Folder for immediate access to this data in Bench or other areas of ICA.
Create another cohort if needed for your research and repeat the last three steps.
Navigate to the Bench workspace created in the second step.
After the workspace has started up, click Access.
Find the /Project/ folder in the Workspace file navigation.
This folder will contain your cohort files created along with any pipeline output data needed for your workspace analysis.
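As a quick sanity check from the workspace terminal, you can list that folder and preview the exported cohort file. This is a minimal sketch that assumes the /Project/ folder mentioned above is reachable from the terminal and that the export was named cohort_export.tsv (a hypothetical name):

```bash
# List project data from inside the Bench workspace and preview the cohort export.
# The export file name is hypothetical; use the name you chose during export.
ls /Project/
head -n 5 /Project/cohort_export.tsv
```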
The platform provides Connectors to facilitate automation for operations on data (i.e., upload, download, linking). The connectors are helpful when you want to sync data between ICA and your local computer or link data between projects in ICA.
The ICA CLI upload/download proves beneficial when handling large files/folders, especially in situations where you're operating on a remote server by connecting from your local computer. You can use icav2 projects enter <project-name/id> to set the project context for the CLI to use for the commands when relevant. If the project context is not set, you can supply the additional parameter --project-id <project-id> to specify the project for the command.
Note: Because of how S3 manages storage, it doesn't have a concept of folders in the traditional sense. So, if you provide the "folder" ID of an empty "folder", you will not see anything downloaded.
Another option to upload data to ICA is via the API. This option is helpful when data needs to be transferred via automated scripts. You can use the following two endpoints to upload a file to ICA.
POST /api/projects/{projectId}/data - with the following body, which will create a partial file at the desired location and return a dataId for the file to be uploaded. {projectId} is the project ID for the destination project. You can find the projectId on your project's Details page (Project > Details > URN > urn:ilmn:ica:project:projectId#MyProject).
POST /api/projects/{projectId}/data/{dataId}:createUploadUrl - where dataId is the dataId from the response of the previous call. This call will generate the URL that you can use to upload the file.
Create data in the project by making the API call below. If you don't already have an API key, refer to the instructions on generating one.
In the example above, we're generating a partial file named 'tempFile.txt' within a project identified by the project ID '41d3643a-5fd2-4ae3-b7cf-b89b892228be', situated inside a folder with the folder ID 'fol.579eda846f1b4f6e2d1e08db91408069'. You can access project, file, or folder IDs either by logging into the ICA web interface or through the use of the ICA CLI.
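For example, a minimal CLI sketch for looking up those IDs (the project name is a placeholder):

```bash
# Set the project context, then list project data; the ID column shows the
# fol.* (folder) and fil.* (file) identifiers used in the API calls above.
icav2 projects enter my-project
icav2 projectdata list
```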
The response will look like this:
Retrieve the data/file ID from the response (for instance: fil.b13c782a67e24d364e0f08db9f537987) and employ the following format for the Post request - /api/projects/{projectId}/data/{dataId}:createUploadUrl:
The response will look like this:
Use the URL from the response to upload a file (tempFile.txt) as follows:
ICA allows you to directly upload/download data from ICA storage using the AWS CLI. It is especially helpful when uploading or downloading a large amount of data over an unstable internet connection. If the transfer gets interrupted midway, you can use the sync command to resume the transfer from the point it was stopped.
To connect to ICA storage, you must first download and install the AWS CLI on your local system. You will need temporary credentials for the AWS CLI to access ICA storage. You can generate temporary credentials through the ICA CLI, which can be used to authenticate the AWS CLI against ICA. The temporary credentials can be obtained using the icav2 projectdata temporarycredentials command.
If you are trying to upload data to the /cli-upload/ folder, you can get the temporary credentials to access the folder using icav2 projectdata temporarycredentials /cli-upload/. It will produce the following output with the accessKey, secretKey, and sessionToken that you will need to configure the AWS CLI to access this folder.
Copy awsTempCredentials.accessKey, awsTempCredentials.secretKey, and awsTempCredentials.sessionToken to build the credentials file ~/.aws/credentials. It should look something like this:
The temporary credentials expire in 36 hours. If the temporary credentials expire before the copy is complete, you can use the AWS sync command to resume from where it left off.
The following AWS commands demonstrate typical use. The remote path in the commands below is constructed from the output of the temporarycredentials command in this format: s3://<awsTempCredentials.bucket>/<awsTempCredentials.objectPrefix>
You can also write scripts to monitor the progress of your copy operation and regenerate and refresh the temporary credentials before they expire.
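A minimal sketch of that idea is shown below. It assumes the JSON output of the temporarycredentials command mirrors the awsTempCredentials.* keys shown above and that jq is installed; treat the field paths as assumptions and adjust them to the actual output on your system:

```bash
#!/usr/bin/env bash
# Sketch: refresh the temporary credentials and resume an interrupted upload.
# Field names below are assumed to match the awsTempCredentials.* keys shown above.
set -euo pipefail

creds=$(icav2 projectdata temporarycredentials /cli-upload/ -o json)
access_key=$(echo "$creds" | jq -r '.awsTempCredentials.accessKey')
secret_key=$(echo "$creds" | jq -r '.awsTempCredentials.secretKey')
session_token=$(echo "$creds" | jq -r '.awsTempCredentials.sessionToken')
bucket=$(echo "$creds" | jq -r '.awsTempCredentials.bucket')
prefix=$(echo "$creds" | jq -r '.awsTempCredentials.objectPrefix')

# Rewrite the [profile] section used in the examples above.
cat > ~/.aws/credentials <<EOF
[profile]
aws_access_key_id=${access_key}
aws_secret_access_key=${secret_key}
aws_session_token=${session_token}
EOF

# Resume the transfer; aws s3 sync only copies objects that are missing or changed.
aws s3 sync cli-upload "s3://${bucket}/${prefix}" --profile profile
```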
You may also use Rclone for data transfer if you prefer. The steps to generate temporary credentials are the same as above. You can run rclone config to set the keys and tokens to configure Rclone with the temporary credentials. You will need to select the advanced edit option when asked to enter the session key. After completing the config, your configuration file (~/.config/rclone/rclone.conf) should look like this:
{
"one": {
"a": "1",
"b": "1"
},
"three": {
"a": "3",
"b": "3",
"c": "3"
}
}

{"A":1,"B":1}
{"A":3,"B":3,"C":3}

1
1
1
2
3
3
3
nextflow.enable.dsl = 2
process INDEX {
input:
path transcriptome_file
output:
path 'salmon_index'
script:
"""
salmon index -t $transcriptome_file -i salmon_index
"""
}
process QUANTIFICATION {
publishDir 'out', mode: 'symlink'
input:
path salmon_index
tuple path(read1), path(read2)
val(quant)
output:
path "$quant"
script:
"""
salmon quant --libType=U -i $salmon_index -1 $read1 -2 $read2 -o $quant
"""
}
process FASTQC {
input:
tuple path(read1), path(read2)
output:
path "fastqc_logs"
script:
"""
mkdir fastqc_logs
fastqc -o fastqc_logs -f fastq -q ${read1} ${read2}
"""
}
process MULTIQC {
publishDir 'out', mode:'symlink'
input:
path '*'
output:
path 'multiqc_report.html'
script:
"""
multiqc .
"""
}
workflow {
index_ch = INDEX(Channel.fromPath(params.transcriptome_file))
quant_ch = QUANTIFICATION(index_ch, Channel.of([file(params.read1), file(params.read2)]),Channel.of("quant"))
fastqc_ch = FASTQC(Channel.of([file(params.read1), file(params.read2)]))
MULTIQC(quant_ch.mix(fastqc_ch).collect())
}

docker save nextflow/rnaseq-nf > cont_rnaseq.tar

# Enter the project context
icav2 projects enter docs
# Upload the container image to the root directory (/) of the project
icav2 projectdata upload cont_rnaseq.tar /

process.container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'

process {
container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/3cddfc3d-2431-4a85-82bb-dae061f7b65d:latest'
pod = [
annotation: 'scheduler.illumina.com/presetSize',
value: 'standard-small'
]
}

<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<dataInputs>
</dataInputs>
<steps>
</steps>
</pipeline>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="read1" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 1</pd:label>
<pd:description>FASTQ Read 1</pd:description>
</pd:dataInput>
<pd:dataInput code="read2" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>FASTQ Read 2</pd:label>
<pd:description>FASTQ Read 2</pd:description>
</pd:dataInput>
<pd:dataInput code="transcriptome_file" format="FASTA" type="FILE" required="true" multiValue="false">
<pd:label>Transcript</pd:label>
<pd:description>Transcript FASTA</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --storage-size small --description 'cli nextflow pipeline'

icav2 projectpipelines create nextflow rnaseq-docs --main main.nf --parameter parameters.xml --config nextflow.config --other index.nf:filename=processes/index.nf --other quantification.nf:filename=processes/quantification.nf --other fastqc.nf:filename=processes/fastqc.nf --other multiqc.nf:filename=processes/multiqc.nf --storage-size small --description 'cli nextflow pipeline'

icav2 projectpipelines start nextflow <pipeline_id> --input read1:<read1_file_id> --input read2:<read2_file_id> --input transcriptome_file:<transcriptome_file_id> --storage-size small --user-reference demo_run

icav2 projectpipelines list

icav2 projectdata list

> icav2 projects list #note the project-name/id.
> icav2 projects enter <project-name/id> # set the project context
> icav2 projectdata upload <localFileFolder> <remote-path> # upload localFileFolder to remote-path
#Example:
> icav2 projects enter demo
> icav2 projectdata upload localFolder /uploads/

> icav2 projectdata list # note the data-id
> icav2 projectdata download <data-id> # download the data.

{
"name": "string",
"folderId": "string",
"folderPath": "string",
"formatCode": "string",
"dataType": "FILE"
}

{
  "url": "string"
}

curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: XXXXXXXXXXXXXXXX' \
-H 'Content-Type: application/vnd.illumina.v3+json' \
-d '{
"name": "tempFile.txt",
"folderId": "fol.579eda846f1b4f6e2d1e08db91408069",
"dataType": "FILE"
}'

{
"data": {
"id": "fil.b13c782a67e24d364e0f08db9f537987",
"urn": "string",
"details": {
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"creatorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"owningProjectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"owningProjectName": "string",
"name": "string",
"path": "string",
"fileSizeInBytes": 0,
"status": "PARTIAL",
"tags": {
"technicalTags": [
"string"
],
"userTags": [
"string"
],
"connectorTags": [
"string"
],
"runInTags": [
"string"
],
"runOutTags": [
"string"
],
"referenceTags": [
"string"
]
},
"format": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"description": "string",
"mimeType": "string"
},
"dataType": "FILE",
"objectETag": "string",
"storedForTheFirstTimeAt": "2023-08-22T19:27:31.286Z",
"region": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"country": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"timeCreated": "2023-08-22T19:27:31.286Z",
"timeModified": "2023-08-22T19:27:31.286Z",
"ownerId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"tenantName": "string",
"code": "string",
"name": "string",
"region": "string"
},
"cityName": "string"
},
"willBeArchivedAt": "2023-08-22T19:27:31.286Z",
"willBeDeletedAt": "2023-08-22T19:27:31.286Z",
"sequencingRun": {
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"instrumentRunId": "string",
"name": "string"
}
}
},
"projectId": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/projects/41d3643a-5fd2-4ae3-b7cf-b89b892228be/data/fil.b13c782a67e24d364e0f08db9f537987:createUploadUrl' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: XXXXXXXXXX' \
-d ''

{
  "url": "string"
}

curl --upload-file tempFile.txt "url"

> icav2 projectdata temporarycredentials --help
This command fetches temporal AWS and Rclone credentials for a given project-data. If path is given, project id from the flag --project-id is used. If flag not present project is taken from the context
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string     api key used to call rest service

> icav2 projectdata temporarycredentials /cli-upload/
awsTempCredentials.accessKey XXXXXXXXXX
awsTempCredentials.bucket stratus-gds-use1
awsTempCredentials.objectPrefix XXXXXX/cli-upload/
awsTempCredentials.region us-east-1
awsTempCredentials.secretKey XXXXXXXX
awsTempCredentials.serverSideEncryptionAlgorithm AES256
awsTempCredentials.sessionToken XXXXXXXXXXXXXXXX

[profile]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE

#Copy single file to ICA
> aws s3 cp cp1 s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/
#Sync local folder to ICA
> aws s3 sync cli-upload s3://stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload

[s3-config]
type = s3
provider = AWS
env_auth = false
access_key_id = XXXXXXXXXX
secret_access_key = XXXXXXX
region = us-east-1
acl = private
session_token = XXXXXXXX

#Copy single file to ICA
> rclone copy file.txt s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/
#Sync local folder to ICA
> rclone sync cli-upload s3-config:stratus-gds-use1/53395234-6b20-4fb1-3587-08db9144d245/cli-upload/

mkdir demo_gzip
cd demo_gzip
echo test > test_input.txt

mkdir nextflow-src
# Create nextflow-src/main.nf using contents below
vi nextflow-src/main.nf

nextflow.enable.dsl=2
process COMPRESS {
input:
path input_file
val compression_level
output:
path "${input_file.simpleName}.gz" // .simpleName keeps just the filename
publishDir 'out', mode: 'symlink'
script:
"""
gzip -c -${compression_level} ${input_file} > ${input_file.simpleName}.gz
"""
}
workflow {
input_path = file(params.input_file)
gzip_out = COMPRESS(input_path, params.compression_level)
}

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5

process.container = 'ubuntu:latest'

nextflow run nextflow-src/ --input_file test_input.txt --compression_level 5 -with-docker

process.container = 'ubuntu:latest'
profiles {
test {
params {
input_file = 'test_input.txt'
compression_level = 5
}
}
}

nextflow run nextflow-src/ -profile test -with-docker

process.container = 'ubuntu:latest'
profiles {
test {
params {
input_file = 'test_input.txt'
compression_level = 5
}
}
docker {
docker.enabled = true
}
}

nextflow run nextflow-src/ -profile test,docker

pipeline-dev run-in-bench

{
"$defs": {
"input_output_options": {
"title": "Input/output options",
"properties": {
"input_file": {
"description": "Input file to compress",
"help_text": "The file that will get compressed",
"type": "string",
"format": "file-path"
},
"compression_level": {
"type": "integer",
"description": "Compression level to use (1-9)",
"default": 5,
"minimum": 1,
"maximum": 9
}
}
}
}
}

$ pipeline-dev project-info --init
pipeline-dev.project-info not found. Let's create it with 2 questions:
Please enter your project name: demo_gzip
Please enter a project description: Bench gzip demo

pipeline-dev deploy-as-flow-pipeline

pipeline-dev launch-validation-in-flow

/data/demo $ pipeline-dev launch-validation-in-flow
pipelineld: 331f209d-2a72-48cd-aa69-070142f57f73
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test demo_gzip
- Id: 17106efc-7884-4121-a66d-b551a782b620
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/17106efc-7884-4121-a66d-b551a782620



$ pipeline-dev import-from-nextflow <repo name e.g. nf-core/demo>
$ pipeline-dev import-from-flow [--analysis-id=…]
$ pipeline-dev run-in-bench [--local|--sge]
$ pipeline-dev deploy-as-flow-pipeline [--create|--update]
$ pipeline-dev launch-validation-in-flow



#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
label: fastp
doc: Modified from https://github.com/nigyta/bact_genome/blob/master/cwl/tool/fastp/fastp.cwl
inputs:
fastq1:
type: File
inputBinding:
prefix: -i
fastq2:
type:
- File
- 'null'
inputBinding:
prefix: -I
threads:
type:
- int
- 'null'
default: 1
inputBinding:
prefix: --thread
qualified_phred_quality:
type:
- int
- 'null'
default: 20
inputBinding:
prefix: --qualified_quality_phred
unqualified_phred_quality:
type:
- int
- 'null'
default: 20
inputBinding:
prefix: --unqualified_percent_limit
min_length_required:
type:
- int
- 'null'
default: 50
inputBinding:
prefix: --length_required
force_polyg_tail_trimming:
type:
- boolean
- 'null'
inputBinding:
prefix: --trim_poly_g
disable_trim_poly_g:
type:
- boolean
- 'null'
default: true
inputBinding:
prefix: --disable_trim_poly_g
base_correction:
type:
- boolean
- 'null'
default: true
inputBinding:
prefix: --correction
outputs:
out_fastq1:
type: File
outputBinding:
glob:
- $(inputs.fastq1.nameroot).fastp.fastq
out_fastq2:
type:
- File
- 'null'
outputBinding:
glob:
- $(inputs.fastq2.nameroot).fastp.fastq
html_report:
type: File
outputBinding:
glob:
- fastp.html
json_report:
type: File
outputBinding:
glob:
- fastp.json
arguments:
- prefix: -o
valueFrom: $(inputs.fastq1.nameroot).fastp.fastq
- |
${
if (inputs.fastq2){
return '-O';
} else {
return '';
}
}
- |
${
if (inputs.fastq2){
return inputs.fastq2.nameroot + ".fastp.fastq";
} else {
return '';
}
}
baseCommand:
- fastp

#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
label: MultiQC
doc: MultiQC is a tool to create a single report with interactive plots for multiple
bioinformatics analyses across many samples.
inputs:
files:
type:
- type: array
items: File
- 'null'
doc: Files containing the result of quality analysis.
inputBinding:
position: 2
directories:
type:
- type: array
items: Directory
- 'null'
doc: Directories containing the result of quality analysis.
inputBinding:
position: 3
report_name:
type: string
doc: Name of output report, without path but with full file name (e.g. report.html).
default: multiqc_report.html
inputBinding:
position: 1
prefix: -n
outputs:
report:
type: File
outputBinding:
glob:
- '*.html'
baseCommand:
- multiqc

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
- class: InitialWorkDirRequirement
listing:
- entry: "import argparse\nimport os\nimport json\n\n# Create argument parser\n\
parser = argparse.ArgumentParser()\nparser.add_argument(\"-i\", \"--inputFiles\"\
, type=str, required=True, help=\"Input files\")\n\n# Parse the arguments\n\
args = parser.parse_args()\n\n# Split the inputFiles string into a list of file\
\ paths\ninput_files = args.inputFiles.split(',')\n\n# Sort the input files\
\ by the base filename\ninput_files = sorted(input_files, key=lambda x: os.path.basename(x))\n\
\n\n# Separate the files into left and right arrays, preserving the order\n\
left_files = [file for file in input_files if '_R1_' in os.path.basename(file)]\n\
right_files = [file for file in input_files if '_R2_' in os.path.basename(file)]\n\
\n# Print the left files for debugging\nprint(\"Left files:\", left_files)\n\
\n# Print the left files for debugging\nprint(\"Right files:\", right_files)\n\
\n# Ensure left and right files are matched\nassert len(left_files) == len(right_files),\
\ \"Mismatch in number of left and right files\"\n\n \n# Write the left files\
\ to a JSON file\nwith open('left_files.json', 'w') as outfile:\n left_files_objects\
\ = [{\"class\": \"File\", \"path\": file} for file in left_files]\n json.dump(left_files_objects,\
\ outfile)\n\n# Write the right files to a JSON file\nwith open('right_files.json',\
\ 'w') as outfile:\n right_files_objects = [{\"class\": \"File\", \"path\"\
: file} for file in right_files]\n json.dump(right_files_objects, outfile)\n\
\n"
entryname: spread_script.py
writable: false
label: spread_items
inputs:
inputFiles:
type:
type: array
items: File
inputBinding:
separate: false
prefix: -i
itemSeparator: ','
outputs:
leftFiles:
type:
type: array
items: File
outputBinding:
glob:
- left_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
rightFiles:
type:
type: array
items: File
outputBinding:
glob:
- right_files.json
loadContents: true
outputEval: $(JSON.parse(self[0].contents))
baseCommand:
- python3
- spread_script.py


























This is an unofficial developer tool to help develop Nextflow pipelines that will run successfully on ICA. There are some syntax bugs that may get introduced in your Nextflow code. One suggestion is to run the steps as described below and then open these files in VisualStudio Code with the Nextflow plugin installed. You may also need to run smoke tests on your code to identify syntax errors you might not catch upon first glance.
This is not an official Illumina product, but is intended to make your Nextflow experience in ICA more fruitful.
Some examples of Nextflow pipelines that have been lifted over with this repo can be found here.
Some additional examples of ICA-ported Nextflow pipelines are here.
Some additional repos that can help with your ICA experience can be found below:
Monitor your analysis run in ICA and troubleshoot here
Wrap a WDL-based process in a CWL wrapper
Wrap a Nextflow-based process in a CWL wrapper
This naive wrapper will allow you to test your main.nf script. If you have a Nextflow pipeline that is more nf-core like (i.e. where you may have several subprocesses and module files), this script may be more appropriate. Any and all comments are welcome.
Parse configuration files and the Nextflow scripts (main.nf, processes, subprocesses, modules) of a pipeline and update the configuration of the pipeline with pod directives to tell ICA what compute instance to run
Strips out parameters that ICA utilizes for pipeline orchestration
Migrates manifest closure to conf/base.ica.config file
Ensures that docker is enabled
Adds workflow.onError to aid troubleshooting
Modifies the processes that reference scripts and tools in the bin/ folder of a pipeline's projectDir, so that when ICA orchestrates your Nextflow pipeline, it can find and properly execute your pipeline process
Generates parameter XML file based on nextflow_schema.json, nextflow.config, conf/ `- Take a look at this to understand a bit more of what's done with the XML, as you may want to make further edits to this file for better usability
Additional edits to ensure your pipeline runs more smoothly on ICA
Nextflow pipelines on ICA are orchestrated by kubernetes and require a parameters XML file containing data inputs (i.e. files + folders) and other string-based options for all configurable parameters to properly be passed from ICA to your Nextflow pipelines
Nextflow processes will need to contain a reference to a container --- a Docker image that will run that specific process
Nextflow processes will need a pod annotation specified for ICA to know what instance type to run the process.
A table of instance types and the associated CPU + Memory specs can be found here under a table named Compute Types
These scripts have been made to be compatible with nf-core pipelines, so you may find the concepts from the documentation here a better starting point.
The scripts mentioned below can be run in a docker image keng404/nextflow-to-icav2-config:0.0.3
This has:
nf-core installed
All Rscripts in this repo with relevant R libraries installed
The ICA CLI installed, to allow for pipeline creation and CLI templates to request pipeline runs after the pipeline is created in ICA
You'll likely need to run the image with a docker command like this for you to be able to run git commands within the container:
{% code overflow="wrap" %}
```bash
docker run -itv `pwd`:`pwd` -e HOME=`pwd` -u $(id -u):$(id -g) keng404/nextflow-to-icav2-config:0.0.3 /bin/bash
```
{% endcode %}

where `pwd` is your `$HOME` folder.
## Prerequisites
### STEP 0 Github credentials
### STEP 1 \[OPTIONAL] : create JSON of nf-core pipeline metadata or specify pipeline of interest
If you have a specific pipeline from GitHub, you can skip the step below.
You'll first need to download the python module from nf-core via a `pip install nf-core` command. Then you can use nf-core list --json to return a JSON metadata file containing current pipelines in the nf-core repository.
You can choose which pipelines to `git clone`, but as a convenience, the wrapper `nf-core.conversion_wrapper.R` will perform a git pull, parse nextflow_schema.json files and generate parameter XML files, and then read configuration and Nextflow scripts and make some initial modifications for ICA development. Lastly, these pipelines are created in an ICA project of your choosing, so you will need to generate and download an API key from the ICA domain of your choosing.
### STEP 2: Obtain API key file
Next, you'll need an API key file for ICA that can be generated using the instructions [here](https://help.ica.illumina.com/account-management/am-iam#api-keys).
### STEP 3: Create a project in ICA
Finally, you'll need to create a project in ICA. You can do this via the CLI and API, but you should be able to follow these [instructions](https://help.ica.illumina.com/home/h-projects#create-new-project) to create a project via the ICA GUI.
### STEP 4: Download and configure the ICA CLI (see STEP 2):
Install ICA CLI by following these [installation instructions](https://help.ica.illumina.com/command-line-interface/cli-installation).
A table of all CLI releases for mac, linux, and windows can be found [here](https://help.ica.illumina.com/command-line-interface/cli-releasehistory).
The Project view should be the default view after logging into your private domain (https://my_domain.login.illumina.com) and clicking on your ICA 'card' (this will redirect you to https://illumina.ica.com/ica).
### Let's do some liftovers
{% code overflow="wrap" %}
```bash
Rscript nf-core.conversion_wrapper.R --input {PIPELINE_JSON_FILE} --staging_directory {DIRECTORY_WHERE_NF_CORE_PIPELINES_ARE_LOCATED} --run-scripts {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED} --intermediate-copy-template {DIRECTORY_WHERE_THESE_R_SCRIPTS_ARE_LOCATED}/dummy_template.txt --create-pipeline-in-ica --api-key-file {API_KEY_FILE} --ica-project-name {ICA_PROJECT_NAME} --nf-core-mode
[OPTIONAL PARAMETER]
--git-repos {GIT_HUB_URL}
--pipeline-dirs {LOCAL_DIRECTORY_WITH_NEXTFLOW_PIPELINE}
```
{% endcode %}

GIT_HUB_URL can be specified to grab pipeline code from GitHub. If you intend to liftover anything in the master branch, your GIT_HUB_URL might look like https://github.com/keng404/my_pipeline. If there is a specific release tag you intend to use, you can use the convention https://github.com/keng404/my_pipeline:my_tag.
Alternatively, if you have a local copy/version of a Nextflow pipeline you'd like to convert and use in ICA, you can use the --pipeline-dirs argument to specify this.
In summary, you will need the following prerequisites, either to run the wrapper referenced above or to carry out individual steps below.
git clone nf-core pipelines of interest
Install the python module nf-core and create a JSON file using the command line nf-core list --json > {PIPELINE_JSON_FILE}
The following describes what nf-core.conversion_wrapper.R does for each Nextflow pipeline.

Rscript create_xml/nf-core.json_to_params_xml.R --json {PATH_TO_SCHEMA_JSON}

A Nextflow schema JSON is generated by nf-core's python library nf-core.
nf-core can be installed via a pip install nf-core command.

nf-core schema build -d {PATH_NF-CORE_DIR}

The wrapper then updates nextflow.config and a base config file so that they are compatible with ICA:

Rscript ica_nextflow_config.test.R --config-file {DEFAULT_NF_CONFIG} [OPTIONAL: --base-config-files {BASE_CONFIG}] [--is-simple-config]

This script will update your configuration files so that they integrate better with ICA. The flag --is-simple-config will create a base config file from a template. This flag will also be active if no arguments are supplied to --base-config-files.
Rscript develop_mode.downstream.R --config-file {DEFAULT_NF_CONFIG} --nf-script {MAIN_NF_SCRIPT} --other-workflow-scripts {OTHER_NF_SCRIPT1} --other-workflow-scripts {OTHER_NF_SCRIPT2} ... --other-workflow-scripts {OTHER_NF_SCRIPT_N}

This step adds some updates to your module scripts to allow for easier troubleshooting (i.e. copy the work folder back to ICA if an analysis fails). It also allows ICA's orchestration of your Nextflow pipeline to properly handle any script/binary in the bin/ folder of your pipeline's $projectDir.

Rscript update_xml_based_on_additional_configs.R --config-file {DEFAULT_NF_CONFIG} --parameters-xml {PARAMETERS_XML}

You may have to edit your {PARAMETERS_XML} file if these edits are unnecessary.

Rscript testing_pipelines/test_nextflow_script.R --nextflow-script {MAIN_NF_SCRIPT} --docker-image nextflow/nextflow:22.04.3 --nextflow-config {DEFAULT_NF_CONFIG}

Currently ICA supports Nextflow versions nextflow/nextflow:22.04.3 and nextflow/nextflow:20.10.0 (with 20.10.0 to be deprecated soon).

nf-core.create_ica_pipeline.R

Rscript nf-core.create_ica_pipeline.R --nextflow-script {NF_SCRIPT} --workflow-language nextflow --parameters-xml {PARAMETERS_XML} --nf-core-mode --ica-project-name {NAME} --pipeline-name {NAME} --api-key-file {PATH_TO_API_KEY_FILE}

Add the flag --developer-mode to the command line above if you have custom groovy libraries or module files referenced in your pipeline. When this flag is specified, the script will upload these files and directories to ICA and update the parameters XML file to allow you to specify directories under the parameter project_dir and files under input_files. This will ensure that these files and directories are placed in the $workflow.launchDir when the pipeline is invoked.
As a convenience, you can also get a templated CLI command to help run a pipeline (i.e. submit a pipeline request) in ICA via the following:
Rscript create_cli_templates_from_xml.R --workflow-language {xml or nextflow} --parameters-xml {PATH_TO_PARAMETERS_XML}

There will be a corresponding JSON file (i.e. a file with a file extension *ICAv2_CLI_template.json) that saves these values, which you can modify and configure to build out templates or launch the specific pipeline run you desire. You can specify the name of this JSON file with the parameter --output-json.
Once you modify this file, you can use --template-json and specify this file to create the CLI you can use to launch your pipeline.
If you have a previously successful analysis with your pipeline, you may find this approach more useful.
Where possible, these scripts search for config files that refer to a test (i.e. test.config, test_full.config, test*config) and create a boolean parameter params.ica_smoke_test that can be toggled on/off as a sanity check that the pipeline works as intended (see the sketch below). By default, this parameter is set to false.
When set to true, these test config files are loaded in your main nextflow.config.
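As a sketch of how the toggle might be used (assuming the converted pipeline exposes ica_smoke_test as an ordinary Nextflow parameter, as described above):

```bash
# Run the converted pipeline locally with the smoke-test configs loaded.
nextflow run main.nf --ica_smoke_test true
```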
Prerequisite - Launch a CWL or Nextflow pipeline to completion using the ICA CLI with the intended set of parameters.
Configure and Authenticate ICA command line interface (CLI).
Obtain a list of your projects with their associated IDs:
icav2 projects list

ID NAME OWNER
a5690b16-a739-4bd7-a62a-dc4dc5c5de6c Project1 670fd8ea-2ddb-377d-bd8b-587e7781f2b5
ccb0667b-5949-489a-8902-692ef2f31827 Project2 f1aa8430-7058-4f6c-a726-b75ddf6252eb
No of items : 2

Use the ID of the project from the previous step to enter the project context:

icav2 projects enter a5690b16-a739-4bd7-a62a-dc4dc5c5de6c

Find the pipeline you want to start from the CLI by obtaining a list of pipelines associated with your project:

icav2 projectpipelines list

ID CODE DESCRIPTION
fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 DRAGEN Somatic 3.9.5 The DRAGEN Somatic tool identifies somatic variants
b4dc6b91-5283-41f6-8095-62a5320ed092 DRAGEN Somatic Enrichment 3-10-4 The DRAGEN Somatic Enrichment pipeline identifies somatic variants which can exist at low allele frequencies in the tumor sample.
No of items : 2

Find the ID associated with your pipeline of interest.
To find the input files parameter, you can use the input command on a previously launched project analysis.
Find the previous analyses launched along with their associated IDs:
icav2 projectanalyses list

ID REFERENCE CODE STATUS
3539d676-ae99-4e5f-b7e4-0835f207e425 kyle-test-somatic-2-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 SUCCEEDED
f11e248e-9944-4cde-9061-c41e70172f20 kyle-test-somatic-1-DRAGEN Somatic 3_9_5 DRAGEN Somatic 3.9.5 FAILED
No of items : 2

List the analyses inputs by using the ID found in the previous step:

icav2 projectanalyses input 3539d676-ae99-4e5f-b7e4-0835f207e425

CODE NAMES DATA ID
BED
CNV_B_Allele_VCF
CNV_Population_B_Allele_VCF
HLA_Allele_Frequency_File
HLA_BED
HLA_reference_file_(protein_FASTA)
Microsatellites_File
Normal_BAM_File
Normal_FASTQ_Files
Panel_of_Normals
Panel_of_Normals_TAR
Reference hg38_altaware_nohla-cnv-anchored.v8.tar fil.35e27101fdec404fb37d08d9adf63307
Systematic_Noise_BED
Tumor_BAM_File
Tumor_FASTQ_Files HCC1187C_S1_L001_R1_001.fastq.gz,HCC1187C_S1_L001_R2_001.fastq.gz fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307
This will return the input file codes, as well as the file names and data IDs of the associated data used to previously launch the pipeline.
You need to use the ICA API to access the configuration settings of a project analysis that ran successfully.
Generate JWT Token from API Key or Basic login credentials
Instructions on how to get an API Key: https://illumina.gitbook.io/ica/account-management/am-iam#api-keys
If your user has access to multiple domains, you will need to add a "?tenant=($domain)" query parameter to the request.
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/tokens' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'X-API-Key: <YOUR_APIKEY>' \
-d ''

echo -ne '[email protected]:testpassword' | base64
<BASE64UN+PW>
curl -X 'POST' \
'https://ica.illumina.com/ica/rest/api/tokens' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'Authorization: Basic <BASE64UN+PW>' \
-d ''

Response to this request will provide a JWT token {"token":($token)}. Use the value of the token in further requests.
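For example, a sketch of the multi-domain form of the token request (the domain name is a placeholder):

```bash
# Token request when your user has access to multiple domains; replace <my_domain>.
curl -X 'POST' \
  'https://ica.illumina.com/ica/rest/api/tokens?tenant=<my_domain>' \
  -H 'accept: application/vnd.illumina.v3+json' \
  -H 'X-API-Key: <YOUR_APIKEY>' \
  -d ''
```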
Use the API endpoint /api/projects/{projectID}/analyses/{analysisId}/configurations to find the configuration listing all of the required and optional parameters:
curl -X 'GET' \
'https://ica.illumina.com/ica/rest/api/projects/e501a0d5-f5e7-458c-a590-586c79bb87e0/analyses/3539d676-ae99-4e5f-b7e4-0835f207e425/configurations' \
-H 'accept: application/vnd.illumina.v3+json' \
-H 'Authorization: Bearer <Token>' \
-H 'X-API-Key: <APIKEY>'

The response JSON to this API will have configuration items listed as:
{
"items": [{
"name": "DRAGEN_Somatic__enable_variant_caller",
"multiValue": false,
"values": [
"true"
]
}]
}

Structure of the final command: icav2 projectpipelines start cwl $(pipelineID) --user-reference, plus input options.
Input Options - For CLI, the entire input can be broken down as individual command line arguments
To launch the same analysis as in the GUI, use the same file IDs and parameters. If using new data, you can use the CLI command icav2 projectdata list to find new file IDs to launch a new instance of the pipeline.
Required information in Input - Input Data and Parameters
This option requires the use of --type-input STRUCTURED along with --input and --parameters.
The input parameter names such as Reference and Tumor_FASTQ_Files in the example below are from the pipeline definition where you can give the parameters a name. You can see which of these were used when the pipeline originally ran, in the Identify Input File Parameters section above. You can also look at the pipeline definitions for the input parameters, for example the code value of these XML inputs.
icav2 projectpipelines start cwl fbd6f3c3-cb70-4b35-8f57-372dce2aaf98 \
--user-reference kyle-test-somatic-9 \
--storage-size small \
--type-input STRUCTURED \
--input Reference:fil.35e27101fdec404fb37d08d9adf63307 \
--input Tumor_FASTQ_Files:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters DRAGEN_Somatic__enable_variant_caller:true \
--parameters DRAGEN_Somatic__enable_hrd:false \
--parameters DRAGEN_Somatic__enable_sv:true \
--parameters DRAGEN_Somatic__output_file_prefix:tumor \
--parameters DRAGEN_Somatic__enable_map_align:true \
--parameters DRAGEN_Somatic__cnv_use_somatic_vc_baf:false \
--parameters DRAGEN_Somatic__enable_cnv:false \
--parameters DRAGEN_Somatic__output_format:BAM \
--parameters DRAGEN_Somatic__vc_emit_ref_confidence:BP_RESOLUTION \
--parameters DRAGEN_Somatic__enable_hla:false \
--parameters DRAGEN_Somatic__enable_map_align_output:true

analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 51abe34a-2506-4ab5-adef-22df621d95d5
ownerId 47793c21-75a6-3aa8-8147-81b354d0af4d
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code DRAGEN Somatic 3.9.5
pipeline.description The DRAGEN Somatic tool identifies somatic variants which can exist at low allele frequencies in the tumor sample. The pipeline can analyze tumor/normal pairs and tumor-only sequencing data. The normal sample, if present, is used to avoid calls at sites with germline variants or systematic sequencing artifacts. Unlike germline analysis, the somatic platform makes no ploidy assumptions about the tumor sample, allowing sensitive detection of low-frequency alleles.
pipeline.id fbd6f3c3-cb70-4b35-8f57-372dce2aaf98
pipeline.language CWL
pipeline.ownerId e9dd2ff5-c9ba-3293-857e-6546c5503d76
pipeline.tenantId 55cb0a54-efab-4584-85da-dc6a0197d4c4
pipeline.tenantName ilmn-dragen
pipeline.timeCreated 2021-11-23T22:55:49Z
pipeline.timeModified 2021-12-09T16:42:14Z
reference kyle-test-somatic-9-DRAGEN Somatic 3_9_5-bc56d4b1-f90e-4039-b3a4-b11d29263e4e
status REQUESTED
summary
tenantId b5b750a6-49d4-49de-9f18-75f4f6a81112
tenantName ilmn-cci
timeCreated 2022-03-16T22:48:31Z
timeModified 2022-03-16T22:48:31Z
userReference kyle-test-somatic-9

400 Bad Request : ICA_API_004 : com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.util.UUID` from String "8f57-372dce2aaf98": UUID has to be represented by standard 36-char representation
at [Source: (io.undertow.servlet.spec.ServletInputStreamImpl); line: 1, column: 983] (through reference chain: com.bluebee.rest.v3.publicapi.dto.analysis.SearchMatchingActivationCodesForCwlAnalysisDto["pipelineId"]) (ref. c9cd9090-4ddb-482a-91b5-8471bff0be58)

Check that the pipeline ID is correct based on icav2 projectpipelines list.

404 Not Found : ICA_GNRC_001 : Could not find data with ID [fil.35dec404fb37d08d9adf63307] (ref. 91b70c3c-378c-4de2-acc9-794bf18258ec)

Check that the file ID is correct based on icav2 projectdata list.

400 Bad Request : ICA_EXEC_007 : The specified variableName [DRAGEN] does not exist. Make sure to use an existing variableName (ref. ab296d4e-9060-412c-a4c9-562c63450022)

When using Nextflow to start runs, the input-type parameter is not used, but --project-id is required.
Structure of the final command: icav2 projectpipelines start nextflow $(pipelineID) --user-reference, plus input options.
icav2 projectpipelines start nextflow b4dc6b91-5283-41f6-8095-62a5320ed092 \
--user-reference "somatic-3-10-test5" \
--project-id e501a0d5-f5e7-458c-a590-586c79bb87e0 \
--storage-size Small \
--input ref_tar:fil.35e27101fdec404fb37d08d9adf63307 \
--input tumor_fastqs:fil.e1ec77f2647f45804fe508d9aecb19c4,fil.d89018f0c7784fc4b76708d9adf63307 \
--parameters enable_map_align:true \
--parameters enable_map_align_output:true \
--parameters output_format:BAM \
--parameters enable_variant_caller:true \
--parameters vc_emit_ref_confidence:BP_RESOLUTION \
--parameters enable_cnv:false \
--parameters enable_sv:true \
--parameters repeat_genotype_enable:true \
--parameters enable_hla:false \
--parameters enable_variant_annotation:false \
--parameters output_file_prefix:Tumor

The response status can be used to determine if the pipeline was submitted successfully.
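To follow the run after submission, a minimal sketch using the analysis ID returned in the response (the grep on the table output is a convenience assumption):

```bash
# Check the analysis status; the ID comes from the start response above.
icav2 projectanalyses get <analysis_id> | grep -w status
```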
In this tutorial, we will demonstrate how to create and launch a pipeline using the CWL language using the ICA command line interface (CLI).
Please refer to these instructions for installing ICA CLI.
In this project, we will create two simple tools and build a pipeline that we can run on ICA using the CLI. The first tool (tool-fqTOfa.cwl) will convert a FASTQ file to a FASTA file. The second tool (tool-countLines.cwl) will count the number of lines in an input FASTA file. The workflow.cwl will combine the two tools to convert an input FASTQ file to a FASTA file and count the number of lines in the resulting FASTA file.
The following are the two CWL tools and scripts we will use in the project. If you are new to CWL, please refer to the CWL user guide for a better understanding of the CWL code. You will also need cwltool installed to create these tools and processes. You can find installation instructions on the CWL GitHub page.
#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
inputs:
inputFastq:
type: File
inputBinding:
position: 1
stdout: test.fasta
outputs:
outputFasta:
type: File
streamable: true
outputBinding:
glob: test.fasta
arguments:
- 'NR%4 == 1 {print ">" substr($0, 2)}NR%4 == 2 {print}'
baseCommand:
- awk

#!/usr/bin/env cwltool
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
inputFasta:
type: File
inputBinding:
position: 1
stdout: lineCount.tsv
outputs:
outputCount:
type: File
streamable: true
outputBinding:
glob: lineCount.tsv

cwlVersion: v1.0
class: Workflow
inputs:
ipFQ: File
outputs:
count_out:
type: File
outputSource: count/outputCount
fqTOfaOut:
type: File
outputSource: convert/outputFasta
steps:
convert:
run: tool-fqTOfa.cwl
in:
inputFastq: ipFQ
out: [outputFasta]
count:
run: tool-countLines.cwl
in:
inputFasta: convert/outputFasta
out: [outputCount]

Note that we don't specify the Docker image used in either tool. In such a case, the default behavior is to use the public.ecr.aws/docker/library/bash:5 image. This image contains basic functionality (sufficient to execute the wc and awk commands).
If you want to use a different public image, you can specify it using the requirements tag in the CWL file. Assuming you want to use ubuntu:latest, you need to add:
requirements:
- class: DockerRequirement
dockerPull: ubuntu:latest

If you want to use a Docker image from the ICA Docker repository, you need the link to AWS ECR from the ICA GUI. Double-click the image name in the Docker repository and copy the URL to the clipboard. Add the URL to the dockerPull key.
requirements:
- class: DockerRequirement
dockerPull: 079623148045.dkr.ecr.eu-central-1.amazonaws.com/cp-prod/XXXXXXXXXX:latest

To add a custom or public docker image to the ICA repository, refer to the Docker Repository.
Before you can use ICA CLI, you need to authenticate using the Illumina API key. Follow these instructions to authenticate.
Either create a project or use an existing project to create a new pipeline. You can create a new project using the icav2 projects create command.
% icav2 projects create basic-cli-tutorial --region c39b1feb-3e94-4440-805e-45e0c76462bf

If you do not provide the --region flag, the value defaults to the existing region when there is only one region available. When there is more than one region available, a selection must be made from the available regions at the command prompt. The region input can be determined by calling the icav2 regions list command first.
You can select the project to work on by entering the project using the icav2 projects enter command. Thus, you won't need to specify the project as an argument.
% icav2 projects enter basic-cli-tutorial

You can also use the icav2 projects list command to determine the names and IDs of the projects you have access to.

% icav2 projects list

projectpipelines is the root command to perform actions on pipelines in a project. The create command creates a pipeline in the current project.
The parameter file specifies the input with additional parameter settings for each step in the pipeline. In this tutorial, input is a FASTQ file shown inside <dataInput> tag in the parameter file. There aren't any specific settings for the pipeline steps resulting in a parameter file below with an empty <steps> tag. Create a parameter file (parameters.xml) with the following content using a text editor.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="ipFQ" format="FASTQ" type="FILE" required="true" multiValue="false">
<pd:label>ipFQ</pd:label>
<pd:description></pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps/>
</pd:pipeline>

The following command creates a pipeline called "cli-tutorial" using the workflow "workflow.cwl", the tools "tool-fqTOfa.cwl" and "tool-countLines.cwl", and the parameter file "parameters.xml", with small storage size.

% icav2 projectpipelines create cwl cli-tutorial --workflow workflow.cwl --tool tool-fqTOfa.cwl --tool tool-countLines.cwl --parameter parameters.xml --storage-size small --description "cli tutorial pipeline"

Once the pipeline is created, you can view it using the list command.
% icav2 projectpipelines list
ID CODE DESCRIPTION
6779fa3b-e2bc-42cb-8396-32acee8b6338 cli-tutorial cli tutorial pipeline

Upload data to the project using the icav2 projectdata upload command. Refer to the Data page for advanced data upload features. For this test, we will use a small FASTQ file test.fastq containing the following reads.
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36
IIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

The icav2 projectdata upload command lets you upload data to ICA.
% icav2 projectdata upload test.fastq /
oldFilename= test.fastq en newFilename= test.fastq
bucket= stratus-gds-use1 prefix= 0a488bb2-578b-404a-e09d-08d9e3343b2b/test.fastq
Using: 1 workers to upload 1 files
15:23:32: [0] Uploading /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq
15:23:33: [0] Uploaded /Users/user1/Documents/icav2_validation/for_tutorial/working/test.fastq to /test.fastq in 794.511591ms
Finished uploading 1 files in 795.244677ms
The list command lets you view the uploaded file. Note the ID of the file you want to use with the pipeline.
% icav2 projectdata list
PATH NAME TYPE STATUS ID OWNER
/test.fastq test.fastq FILE AVAILABLE fil.c23246bd7692499724fe08da020b1014 4b197387-e692-4a78-9304-c7f73ad75e44

The icav2 projectpipelines start command initiates the pipeline run. The following command runs the pipeline. Write down the ID for exploring the analysis later.
If for some reason your create command fails and needs to rerun, you might get an error (ConstraintViolationException). If so, try your command with a different name.
% icav2 projectpipelines start cwl cli-tutorial --type-input STRUCTURED --input ipFQ:fil.c23246bd7692499724fe08da020b1014 --user-reference tut-test
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
status REQUESTED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T20:42:43Z
userReference tut-test
You can check the status of the run using the icav2 projectanalyses get command.
% icav2 projectanalyses get 461d3924-52a8-45ef-ab62-8b2a29621021
analysisStorage.description 1.2 TB
analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
analysisStorage.name Small
analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
analysisStorage.tenantName ica-cp-admin
analysisStorage.timeCreated 2021-11-05T10:28:20Z
analysisStorage.timeModified 2021-11-05T10:28:20Z
endDate 2022-03-10T21:00:33Z
id 461d3924-52a8-45ef-ab62-8b2a29621021
ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.analysisStorage.description 1.2 TB
pipeline.analysisStorage.id 6e1b6c8f-f913-48b2-9bd0-7fc13eda0fd0
pipeline.analysisStorage.name Small
pipeline.analysisStorage.ownerId 8ec463f6-1acb-341b-b321-043c39d8716a
pipeline.analysisStorage.tenantId f91bb1a0-c55f-4bce-8014-b2e60c0ec7d3
pipeline.analysisStorage.tenantName ica-cp-admin
pipeline.analysisStorage.timeCreated 2021-11-05T10:28:20Z
pipeline.analysisStorage.timeModified 2021-11-05T10:28:20Z
pipeline.code cli-tutorial
pipeline.description Test, prepared parameters file from working GUI
pipeline.id 6779fa3b-e2bc-42cb-8396-32acee8b6338
pipeline.language CWL
pipeline.ownerId 7fa2b641-1db4-3f81-866a-8003aa9e0818
pipeline.tenantId d0696494-6a7b-4c81-804d-87bda2d47279
pipeline.tenantName icav2-entprod
pipeline.timeCreated 2022-03-10T13:13:05Z
pipeline.timeModified 2022-03-10T13:13:05Z
reference tut-test-cli-tutorial-eda7ee7a-8c65-4c0f-bed4-f6c2d21119e6
startDate 2022-03-10T20:42:42Z
status SUCCEEDED
summary
tenantId d0696494-6a7b-4c81-804d-87bda2d47279
tenantName icav2-entprod
timeCreated 2022-03-10T20:42:42Z
timeModified 2022-03-10T21:00:33Z
userReference tut-test
Pipelines can also be run using the JSON input type. The following is an example. Note that JSON input works only with file-based CWL pipelines (built using code, not the graphical editor in ICA).
% icav2 projectpipelines start cwl cli-tutorial --data-id fil.c23246bd7692499724fe08da020b1014 --input-json '{
"ipFQ": {
"class": "File",
"path": "test.fastq"
}
}' --type-input JSON --user-reference tut-test-json
By default, the runtime.ram and runtime.cpu values are evaluated using the compute environment running the host CWL runner. CommandLineTool steps within a CWL pipeline run on different compute environments than the host CWL runner, so the runtime.ram and runtime.cpu values evaluated within a CommandLineTool will not match the runtime environment the tool actually runs in. These values can be overridden by specifying coresMin and ramMin in the ResourceRequirement of the CommandLineTool.
Projects can be shared by updating the project's Team. You can add team members in one of the following ways:
As an existing user within the current tenant
By adding their e-mail address
As an entire workgroup within the current tenant
Select the corresponding option under Projects > your_project > Project Settings > Team > + Add.
Email invites are sent out as soon as you click the save button on the add team member dialog.
Users can accept or reject invites. The status column shows a green checkmark for accept, an orange question mark for users that have not responded and a red x for users that rejected the invite.
The project owner has administrator-level project rights. To change the project owner, select the Edit project owner button at the top right and select the new project owner from the list. This can be done by the current project owner, the tenant administrator or a project administrator of the current project.
Every user added to the project team will need to have a role assigned for specific categories of functionality in ICA. These categories are:
Project (contains data and tools to execute analysis)
Flow (secondary analysis pipelines)
Base (genomics data aggregation and analysis)
Bench (interactive data analysis)
While the categories will determine most of what a user can do or see, explicit upload and download rights need to be granted for users. Select the checkbox next to Download allowed and Upload allowed when adding a team member.
Upload and download rights are independent of the assigned role. A user with only viewer rights will still be able to perform uploads and downloads if their upload and download rights are not disabled. Likewise, an administrator can only perform uploads and downloads if their upload and download rights are enabled.
The sections below describe the roles and their allowed actions.
Create a Connector
x
x
x
x
View project resources
x
x
x
Link/Unlink data to a project
x
x
Subscribe to notifications
x
x
View Activity
x
x
Create samples
x
x
Delete/archive data
x
x
Manage notification channels
x
Manage project team
x
View analyses results
x
x
Create analyses
x
Create pipelines and tools
x
Edit pipelines and tools
x
Add docker image
x
View table records
x
x
Click on links in table
x
x
Create queries
x
x
Run queries
x
x
Export query
x
x
Save query
x
x
Export tables
x
x
Create tables
x
Load files into a table
x
Execute a notebook
x
x
Start/Stop Workspace
x
x
Create/Delete/Modify workspaces
x
Install additional tools, packages, libraries, …
x
Build a new Bench docker image
x
Create a tool for pipeline-execution
x
Pipelines defined using the "Code" mode require either an XML-based or JSON-based input form to define the fields shown on the launch view in the user interface (UI). The XML-based input form is defined in the "XML Configuration" tab of the pipeline editing view.
The input form XML must adhere to the input form schema.
During the creation of a Nextflow pipeline the user is given an empty form to fill out.
<pipeline code="" version="1.0" xmlns="xsd://www.illumina.com/ica/cp/pipelinedefinition">
<dataInputs>
</dataInputs>
<steps>
</steps>
</pipeline>
The input files are specified within a single DataInputs node. An individual input is then specified in a separate DataInput node. A DataInput node contains the following attributes:
code: a unique id. Required.
format: the format of the input, for example FASTA, TXT, JSON, or UNKNOWN. Multiple entries are possible (see the example below). Required.
type: whether the input is a FILE or a DIRECTORY. Multiple entries are not allowed. Required.
required: whether this input is required for the execution of the pipeline. Required.
multiValue: whether multiple files are allowed as input. Required.
dataFilter: TBD. Optional.
Additionally, DataInput has two elements: label for labelling the input and description for a free text description of the input.
An example of a single file input which can be in a TXT, CSV, or FASTA format.
<pd:dataInput code="in" format="TXT, CSV, FASTA" type="FILE" required="true" multiValue="false">
<pd:label>Input file</pd:label>
<pd:description>Input file can be either in TXT, CSV or FASTA format.</pd:description>
</pd:dataInput>
To use a folder as an input the following form is required:
<pd:dataInput code="fastq_folder" format="UNKNOWN" type="DIRECTORY" required="false" multiValue="false">
<pd:label>fastq folder path</pd:label>
<pd:description>Providing Fastq folder</pd:description>
</pd:dataInput>
For multiple files, set the multiValue attribute to true. The variable is then treated as a list ([]), so adapt your pipeline accordingly when changing from single value to multiValue.
<pd:dataInput code="tumor_fastqs" format="FASTQ" type="FILE" required="false" multiValue="true">
<pd:label>Tumor FASTQs</pd:label>
<pd:description>Tumor FASTQ files to be provided as input. FASTQ files must have "_LXXX" in its filename to denote the lane and "_RX" to denote the read number. If either is omitted, lane 1 and read 1 will be used in the FASTQ list. The tool will automatically write a FASTQ list from all files provided and process each sample in batch in tumor-only mode. However, for tumor-normal mode, only one sample each can be provided.
</pd:description>
</pd:dataInput>
Settings (as opposed to files) are specified within the steps node. Settings represent any non-file input to the workflow, including but not limited to strings, booleans, and integers. The following hierarchy of nodes must be followed: steps > step > tool > parameter. The parameter node must contain the following attributes:
code: a unique id. This is the parameter name that is passed to the workflow.
minValues: how many values (at least) should be specified for this setting. If this setting is required, minValues should be set to 1.
maxValues: how many values (at most) should be specified for this setting.
classification: indicates whether this setting is specified by the user.
In the code below a string setting with the identifier inp1 is specified.
<pd:steps>
<pd:step execution="MANDATORY" code="General">
<pd:label>General</pd:label>
<pd:description>General parameters</pd:description>
<pd:tool code="generalparameters">
<pd:label>generalparameters</pd:label>
<pd:description></pd:description>
<pd:parameter code="inp1" minValues="1" maxValues="3" classification="USER">
<pd:label>inp1</pd:label>
<pd:description>first</pd:description>
<pd:stringType/>
<pd:value></pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
Examples of the following types of settings are shown in the subsequent sections. Within each type, the value tag can be used to denote a default value in the UI, or can be left blank to have no default. Note that setting a default value has no impact on analyses launched via the API.
For an integer setting, use the following schema with an integerType element. To define an allowed range, use the minimumValue and maximumValue attributes.
<pd:parameter code="ht_seed_len" minValues="0" maxValues="1" classification="USER">
<pd:label>Seed Length</pd:label>
<pd:description>Initial length in nucleotides of seeds from the reference genome to populate into the hash table. Consult the DRAGEN manual for recommended lengths. Corresponds to DRAGEN argument --ht-seed-len.
</pd:description>
<pd:integerType minimumValue="10" maximumValue="50"/>
<pd:value>21</pd:value>
</pd:parameter>
Option types can be used to present a drop-down list of options in the UI. The selected option will be passed to the workflow as a string. This currently has no impact when launching from the API, however.
<pd:parameter code="cnv_segmentation_mode" minValues="0" maxValues="1" classification="USER">
<pd:label>Segmentation Algorithm</pd:label>
<pd:description> DRAGEN implements multiple segmentation algorithms, including the following algorithms, Circular Binary Segmentation (CBS) and Shifting Level Models (SLM).
</pd:description>
<pd:optionsType>
<pd:option>CBS</pd:option>
<pd:option>SLM</pd:option>
<pd:option>HSLM</pd:option>
<pd:option>ASLM</pd:option>
</pd:optionsType>
<pd:value>false</pd:value>
</pd:parameter>
Option types can also be used to specify a boolean, for example:
<pd:parameter code="output_format" minValues="1" maxValues="1" classification="USER">
<pd:label>Map/Align Output</pd:label>
<pd:description></pd:description>
<pd:optionsType>
<pd:option>BAM</pd:option>
<pd:option>CRAM</pd:option>
</pd:optionsType>
<pd:value>BAM</pd:value>
</pd:parameter>
For a string setting, use the following schema with a stringType element.
<pd:parameter code="output_file_prefix" minValues="1" maxValues="1" classification="USER">
<pd:label>Output File Prefix</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value>tumor</pd:value>
</pd:parameter>
For a boolean setting, booleanType can be used.
<pd:parameter code="quick_qc" minValues="0" maxValues="1" classification="USER">
<pd:label>quick_qc</pd:label>
<pd:description></pd:description>
<pd:booleanType/>
<pd:value></pd:value>
</pd:parameter>
One known limitation of the schema presented above is the inability to specify a parameter that can be of multiple types, e.g. File or String. One way to implement this requirement is to define two optional parameters: one for File input and one for String input. At the moment, the ICA UI does not validate whether at least one of these parameters is populated; this check can be done within the pipeline itself.
Below you can find both a main.nf and an XML configuration of a generic pipeline with two optional inputs, which can be used as a template to address similar issues. If the file parameter is set, it will be used. If the str parameter is set but file is not, the str parameter will be used. If neither is set, the pipeline aborts with an informative error message.
nextflow.enable.dsl = 2
// Define parameters with default values
params.file = false
params.str = false
// Check that at least one of the parameters is specified
if (!params.file && !params.str) {
error "You must specify at least one input: --file or --str"
}
process printInputs {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
input:
file(input_file)
script:
"""
echo "File contents:"
cat $input_file
"""
}
process printInputs2 {
container 'public.ecr.aws/lts/ubuntu:22.04'
pod annotation: 'scheduler.illumina.com/presetSize', value: 'standard-small'
input:
val(input_str)
script:
"""
echo "String input: $input_str"
"""
}
workflow {
if (params.file) {
file_ch = Channel.fromPath(params.file)
file_ch.view()
str_ch = Channel.empty()
printInputs(file_ch)
}
else {
file_ch = Channel.empty()
str_ch = Channel.of(params.str)
str_ch.view()
file_ch.view()
printInputs2(str_ch)
}
}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pd:pipeline xmlns:pd="xsd://www.illumina.com/ica/cp/pipelinedefinition" code="" version="1.0">
<pd:dataInputs>
<pd:dataInput code="file" format="TXT" type="FILE" required="false" multiValue="false">
<pd:label>in</pd:label>
<pd:description>Generic file input</pd:description>
</pd:dataInput>
</pd:dataInputs>
<pd:steps>
<pd:step execution="MANDATORY" code="general">
<pd:label>General Options</pd:label>
<pd:description locked="false"></pd:description>
<pd:tool code="general">
<pd:label locked="false"></pd:label>
<pd:description locked="false"></pd:description>
<pd:parameter code="str" minValues="0" maxValues="1" classification="USER">
<pd:label>String</pd:label>
<pd:description></pd:description>
<pd:stringType/>
<pd:value>string</pd:value>
</pd:parameter>
</pd:tool>
</pd:step>
</pd:steps>
</pd:pipeline>
Bench images are Docker containers tailored to run in ICA with the necessary permissions, configuration and resources. For more information on Docker images, please refer to the official Docker documentation.
The following steps are needed to get your bench image running in ICA.
You need to have Docker installed in order to build your images.
For your Docker bench image to work in ICA, it must run on the Linux x86 architecture, have the correct user id, and include the initialization script in the Docker file.
The following scripts must be part of your Docker bench image. Please refer to the provided examples for more details.
This script copies the ica_start.sh file, which takes care of the initialization and termination of your workspace, to the location from where it can be started by ICA when you request to start your workspace.
The user settings must be set up so that bench runs with UID 1000.
To do a clean shutdown, you can capture the SIGTERM signal, which is sent 30 seconds before the workspace is terminated.
Once you have Docker installed and completed the configuration of your Docker files, you can build your bench image.
Open the command prompt on your machine.
Navigate to the root folder of your Docker files.
Execute docker build -f Dockerfile -t mybenchimage:0.0.1 . where mybenchimage is the name you want to give to your image and 0.0.1 is the version number you want your bench image to have. For more information on this command, see the official Docker documentation.
Once the image has been built, save it as a Docker tar file with the command docker save mybenchimage:0.0.1 | bzip2 > ../mybenchimage-0.0.1.tar.bz2. The resulting tar file will appear next to the root folder of your Docker files.
Open ICA and log in.
Go to Projects > your_project > Data.
For small Docker images, upload the Docker image file which you generated in the previous step. For large Docker images, use an upload method with better performance and reliability, such as the CLI or a connector, to import the Docker image.
Select the uploaded image file and perform Manage > Change Format.
From the format list, select DOCKER and save the change.
Go to System Settings > Docker Repository > Create > Image.
Select the uploaded docker image and fill out the other details.
Name: The name by which your docker image will be seen in the list
Version: A version number to keep track of which version you have uploaded. In our example this was 0.0.1
Description: Provide a description explaining what your docker images does or is suited for.
Type: The type of this image is Bench. The Tool type is reserved for tool images.
Cluster compatible: Indicates if this Docker image is suited for use in a Bench cluster.
Access: This setting must match the available access options of your Docker image. You can choose web access (HTTP), console access (SSH) or both. What is selected here becomes available on the + New Workspace screen. Enabling an option here which your Docker image does not support will result in access denied errors when trying to run the workspace.
Regions: If your tenant has access to multiple regions, you can select to which regions to replicate the docker image.
Once the settings are entered, select Save. The creation of the Docker image typically takes between 5 and 30 minutes. The status of your Docker image will be Partial during creation and Available once completed.
Navigate to Projects > your_project > Bench > Workspaces.
Create a new workspace with + Create Workspace or edit an existing workspace.
Fill in the bench workspace details.
Save your changes.
Select Start Workspace
Wait for the workspace to start; you can then access it either via the console or the GUI.
Once your bench image has been started, you can access it via console, web or both, depending on your configuration.
Web access (HTTP) is done from either Projects > your_project > Bench > Workspaces > your_Workspace > Access tab or from the link provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
Console access (SSH) is performed from your command prompt by going to the path provided in your running workspace at Projects > your_project > Bench > Workspaces > your_Workspace > Details tab > Access section.
To execute workspace command-line interface commands, your workspace needs a way to run them, such as the inclusion of an SSH daemon, be it integrated into your web access image or into your console access. There is no need to download the workspace command-line interface; you can run it from within the workspace.
The bench image will be instantiated as a container which is forced to start as a user with UID 1000 and GID 100.
You cannot elevate your permissions in a running workspace.
Do not run containers as root as this is bad security practice.
Only the following folders are writeable:
/data
/tmp
All other folders are mounted as read-only.
For inbound access, the following ports on the container are publicly exposed, depending on the selection made at startup.
Web: TCP/8888
Console: TCP/2222
For outbound access, a workspace can be started in two modes:
Public: Access to public IPs is allowed using the TCP protocol.
Restricted: Access is allowed only to a list of URLs.
At runtime, the following Bench-specific environment variables are made available to the workspace instantiated from the Bench image.
The following files and folders will be provided to the workspace and made accessible for reading at runtime.
At runtime, ICA-related software will automatically be made available at /data/.software in read-only mode.
New versions of ICA software will be made available after a restart of your workspace.
When a bench workspace is instantiated from your selected bench image, the following script is invoked: /usr/local/bin/ica_start.sh
This script is the main process in your running workspace and must not run to completion, as that will stop the workspace and trigger a restart.
This script can be used to invoke other scripts.
When you stop a workspace, a TERM signal is sent to the main process in your bench workspace. You can trap this signal to handle the stop gracefully and shut down child processes of the main process. The workspace will be forcibly shut down after 30 seconds if your main process has not stopped within that period.
If you get the error "docker buildx build" requires exactly 1 argument when trying to build your docker image, then a possible cause is missing the last . of the command.
When you stop the workspace when users are still actively using it, they will receive a message showing a Server Connection Error.
# Init script invoked at start of a bench workspace
COPY --chmod=0755 --chown=root:root ${FILES_BASE}/ica_start.sh /usr/local/bin/ica_start.sh
# Bench workspaces need to run as user with uid 1000 and be part of group with gid 100
RUN adduser -H -D -s /bin/bash -h ${HOME} -u 1000 -G users ica
# Terminate function
function terminate() {
# Send SIGTERM to child processes
kill -SIGTERM $(jobs -p)
# Send SIGTERM to waitpid
echo "Stopping ..."
kill -SIGTERM ${WAITPID}
}
# Catch SIGTERM signal and execute terminate function.
# A workspace will be informed 30s before forcefully being shutdown.
trap terminate SIGTERM
# Hold init process until TERM signal is received
tail -f /dev/null &
WAITPID=$!
wait $WAITPID
ICA_WORKSPACE
The unique identifier related to the started workspace. This value is bound to a workspace and will never change.
32781195
ICA_CONSOLE_ENABLED
Whether Console access is enabled for this running workspace.
true, false
ICA_WEB_ENABLED
Whether Web access is enabled for this running workspace.
true, false
ICA_SERVICE_ACCOUNT_USER_API_KEY
An API key that allows interaction with ICA using the ICA CLI and is bound to the permissions defined at startup of the workspace (an example of using it is sketched below).
ICA_BENCH_URL
The host part of the public URL which provides access to the running workspace.
use1-bench.platform.illumina.com
ICA_PROJECT_UUID
The unique identifier related to the ICA project in which the workspace was started.
ICA_URL
The ICA Endpoint URL.
HTTP_PROXY
HTTPS_PROXY
The proxy endpoint in case the workspace was started in restricted mode.
HOME
The home folder.
/data
/etc/workspace-auth
Contains the SSH RSA public/private keypair which is required to run the workspace SSHD.
/data
This folder contains all data specific to your workspace.
Data in this folder is not persisted in your project and will be removed at deletion of the workspace.
/data/project
This folder contains all your project data.
/data/.software
This folder contains ICA-related software.
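As a hedged illustration of how the runtime environment variables listed above can be used, the following Python sketch calls the ICA REST API from inside a running workspace. The composition of the request URL from ICA_URL and the exact endpoint path are assumptions; check the API reference for the definitive paths.
```python
import os
import requests

# Bench-specific environment variables provided at runtime (names taken from the list above)
api_key = os.environ['ICA_SERVICE_ACCOUNT_USER_API_KEY']
ica_url = os.environ['ICA_URL']            # the ICA endpoint URL
project_id = os.environ['ICA_PROJECT_UUID']

# Assumption: the REST API is reachable under <ICA_URL>/ica/rest; adjust if your endpoint differs
response = requests.get(
    f'{ica_url}/ica/rest/api/projects/{project_id}',
    headers={'X-API-Key': api_key},
)
print(response.json())
```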



This tutorial shows you how to
import an existing ICA Flow pipeline with a supporting validation analysis
monitor the execution
Iterative development: modify pipeline code and validate in Bench
Modify nextflow code
Modify Docker image contents (Dockerfile or Interactive method)
Make sure you have access in ICA Flow to:
the pipeline you want to work with
an analysis exercising this pipeline, preferably with a short execution time, to use as validation test
For this tutorial, the instance size depends on the flow you import, and whether you use a Bench cluster:
When using a cluster, choose standard-small or standard-medium for the workspace master node
Otherwise, choose at least standard-large if you re-import a pipeline that originally came from nf-core, as they typically need 4 or more CPUs to run.
Select the "single user workspace" permissions (aka "Access limited to workspace owner "), which allows us to deploy pipelines
Specify at least 100GB of disk space
Optional: After choosing the image, enable a cluster with at least one standard-large instance type.
Start the workspace, then (if applicable) also start the cluster
mkdir demo-flow-dev
cd demo-flow-dev
pipeline-dev import-from-flow
or
pipeline-dev import-from-flow --analysis-id=9415d7ff-1757-4e74-97d1-86b47b29fb8f
The starting point is the analysis id that is used as the pipeline validation test (the pipeline id is obtained from the analysis metadata).
If no --analysis-id is provided, the tool lists all the successful analyses in the current project and lets the developer pick one.
If conda and/or nextflow are not installed, pipeline-dev will offer to install them.
A folder called imported-flow-analysis is created.
Pipeline Nextflow assets are downloaded into the nextflow-src sub-folder.
Pipeline input form and associated javascript are downloaded into the ica-flow-config sub-folder.
Analysis input specs are downloaded to the ica-flow-config/launchPayload_inputFormValues.json file.
The analysis inputs are converted into a "test" profile for Nextflow, stored - among other items - in nextflow_bench.conf
Enter the number of the entry you want to use: 21
Fetching analysis 9415d7ff-1757-4e74-97d1-86b47b29fb8f ...
Fetching pipeline bb47d612-5906-4d5a-922e-541262c966df ...
Fetching pipeline files... main.nf
Fetching test inputs
New Json inputs detected
Resolving test input ids to /data/mounts/project paths
Fetching input form..
Pipeline "GWAS pipeline_1.
_2_1_20241215_130117" successfully imported.
pipeline name: GWAS pipeline_1_2_1_20241215_130117
analysis name: Test GWAS pipeline_1_2_1_20241215_130117
pipeline id : bb47d612-5906-4d5a-922e-541262c966df
analysis id : 9415d7ff-1757-4e74-97d1-86b47b29fb8f
Suggested actions:
pipeline-dev run-in-bench
[ Iterative dev: Make code changes + re-validate with previous command ]
pipeline-dev deploy-as-flow-pipeline
pipeline-dev run-in-flow
The following command runs this test profile. If a Bench cluster is active, it runs on your Bench cluster; otherwise it runs on the main workspace instance:
cd imported-flow-analysis
pipeline-dev run-in-bench
When a pipeline is running on your Bench cluster, a few commands help to monitor the tasks and cluster. In another terminal, you can use:
qstat to see the tasks that are pending or running
tail /data/logs/sge-scaler.log.<latest available workspace reboot time> to check if the cluster is scaling up or down (it currently takes 3 to 5 minutes to get a new node)
/data/demo $ tail /data/logs/sge-scaler.log.*
2025-02-10 18:27:19,657 - SGEScaler - INFO: SGE Marked Overview - {'UNKNOWN': 0, 'DEAD': 0, 'IDLE': 0, 'DISABLED': 0, 'DELETED': 0, 'UNRESPONSIVE': 0}
2025-02-10 18:27:19,657 - SGEScaler - INFO: Job Status - Active jobs : 0, Pending jobs : 6
2025-02-10 18:27:26,291 - SGEScaler - INFO: Cluster Status - State: Transitioning,
Online Members: 0, Offline Members: 2, Requested Members: 2, Min Members: 0, Max Members: 2
The output of the pipeline is in the outdir folder
Nextflow work files are under the work folder
Log files are .nextflow.log* and output.log
Nextflow files (located in the nextflow-src folder) are easy to modify.
Depending on your environment (ssh access / docker image with JupyterLab or VNC, with and without Visual Studio code), various source code editors can be used.
code nextflow-src # Open in Visual Studio Code
code . # Open current dir in Visual Studio Code
vi nextflow-src/main.nf
After modifying the source code, you can run a validation iteration with the same command as before:
pipeline-dev run-in-bench
Modifying the Docker image is the next step.
Nextflow (and ICA) allow the Docker images to be specified at different places:
in config files such as nextflow-src/nextflow.config
in nextflow code files:
/data/demo-flow-dev $ head nextflow-src/main.nf
nextflow.enable.dsl = 2
process top_level_process {
container 'docker.io/ljanin/gwas-pipeline:1.2.1'
grep container may help locate the correct files:
Use case: Update some of the software (mimalloc) by compiling a new version
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:tmpdemo
# Create directory for Dockerfile
mkdir dirForDockerfile
cd dirForDockerfile
# Create Dockerfile
cat <<EOF > Dockerfile
FROM ${IMAGE_BEFORE}
RUN mkdir /mimalloc-compile \
&& cd /mimalloc-compile \
&& git clone -b v2.0.6 https://github.com/microsoft/mimalloc \
&& mkdir -p mimalloc/out/release \
&& cd mimalloc/out/release \
&& cmake ../.. \
&& make \
&& make install \
&& cd / \
&& rm -rf mimalloc-compile
EOF
# Build image
docker build -t ${IMAGE_AFTER} .
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
IMAGE_BEFORE=docker.io/ljanin/gwas-pipeline:1.2.1
IMAGE_AFTER=docker.io/ljanin/gwas-pipeline:1.2.2
docker run -it --rm ${IMAGE_BEFORE} bash
# Make some modifications
vi /scripts/plot_manhattan.py
<Fix "manhatten.png" into "manhattAn.png">
<Enter :wq to save and quit vi>
<Start another terminal (try Ctrl+Shift+T if using wezterm)>
# Identify container id
CONTAINER_ID=c18670335247
# Save container changes into new image layer
docker commit ${CONTAINER_ID} ${IMAGE_AFTER}
With the appropriate permissions, you can then "docker login" and "docker push" the new image.
Beware that this extension creates a lot of temp files in /tmp and in $HOME/.vscode-server. Don't include them in your image...
Update the nextflow code and/or configs to use the new image
sed --in-place "s|${IMAGE_BEFORE}|${IMAGE_AFTER}|" nextflow-src/main.nf
Validate your changes in Bench:
pipeline-dev run-in-bench
pipeline-dev deploy-as-flow-pipeline
After generating a few ICA-specific files (JSON input specs for the Flow launch UI + the list of inputs for the next step's validation launch), the tool identifies which previous versions of the same pipeline have already been deployed (in ICA Flow, pipeline versioning is done by including the version number in the pipeline name, so that is what is checked here).
It then asks if we want to update the latest version or create a new one.
Choice: 2
Creating ICA Flow pipeline dev-nf-core-demo_v4
Sending inputForm.json
Sending onRender.js
Sending main.nf
Sending nextflow.config
At the end, the URL of the pipeline is displayed. If you are using a terminal that supports it, Ctrl+click or middle-click can open this URL in your browser.
/data/demo $ pipeline-dev deploy-as-flow-pipeline
Generating ICA input specs...
Extracting nf-core test inputs...
Deploying project nf-core/demo
- Currently being developed as: dev-nf-core-demo
- Last version updated in ICA: dev-nf-core-demo_v3
- Next suggested version: dev-nf-core-demo_v4
How would you like to deploy?
1. Update dev-nf-core-demo (current version)
2. Create dev-nf-core-demo_v4
3. Enter new name
4. Update dev-nf-core-demo_v3 (latest version updated in ICA)
Sending docs/images/nf-core-demo-subway.svg
Sending docs/images/nf-core-demo_logo_dark.png
Sending docs/images/nf-core-demo_logo_light.png
Sending docs/images/nf-core-demo-subway.png
Sending docs/README.md
Sending docs/output.md
Pipeline successfully deployed
- Id : 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
- URL: https://stage.v2.stratus.illumina.com/ica/projects/1873043/pipelines/26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Suggested actions:
pipeline-dev run-in-flow
pipeline-dev launch-validation-in-flow
This launches an analysis in ICA Flow, using the same inputs as the pipeline's "test" profile.
Some of the input files will have been copied to your ICA project to allow the launch to take place. They are stored in the folder /data/project/bench-pipeline-dev/temp-data.
/data/demo $ pipeline-dev launch-validation-in-flow
pipelineId: 26bc5aa5-0218-4e79-8a63-ee92954c6cd9
Getting Analysis Storage Id
Launching as ICA Flow Analysis...
ICA Analysis created:
- Name: Test dev-nf-core-demo_v4
- Id: cadcee73-d975-435d-b321-5d60e9aec1ec
- Url: https://stage.v2.stratus.illumina.com/ica/projects/1873043/analyses/cadcee73-d975-435d-b321-5d60e9aec1ec
The main concept in Bench is the Workspace. A workspace is an instance of a Docker image that runs the framework which is defined in the image (for example JupyterLab, R Studio). In this workspace, you can write and run code and graphically represent data. You can use API calls to access data, analyses, Base tables and queries in the platform. Via the command line, R-packages, tools, libraries, IGV browsers, widgets, etc. can be installed.
You can create multiple workspaces within a project and each workspace runs on an individual node and is available in different resource sizes. Each node has local storage capacity, where files and results can be temporarily stored and exported from to be permanently stored in a Project. The size of the storage capacity can range from 1GB – 16TB.
For each workspace, you can see the status by the color.
Once a workspace is started, it will be restarted every 30 days for security reasons. Even when you have automatic shutdown configured to be more than 30 days, the workspace will be restarted after 30 days and the remaining days will be counted in the next cycle.
You can see the remaining time until the next event (Shutdown or restart) in the workspaces overview and on the workspace details.
Click Projects > Your_Project > Bench > Workspaces > + Create Workspace
Complete the following fields and save the changes.
The workspace can be edited afterwards when it is stopped, on the Details tab within the workspace. The changes will be applied when the workspace is restarted.
When Access limited to workspace owner is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Bench administrators are able to create, edit and delete workspaces and start and stop workspaces. If their permissions match or exceed those of the workspace, they can also access the workspace contents.
Contributors are able to start and stop workspaces and if their permissions match or exceed those of the workspace, they can also access the workspace contents.
The project role determines if someone is an administrator or contributor, while the dedicated workspace permissions indicate what the workspace itself can and cannot do within your project. For this reason, users need to meet or exceed the required permissions to enter this workspace and use it.
The permissions that a Bench workspace can receive are the following:
Upload rights
Download rights (required)
Project (No Access - Dataprovider - Viewer - Contributor)
Flow (No Access - Viewer - Contributor)
Base (No Access - Viewer - Contributor)
Based on these permissions, you will be able to upload or download data to your ICA project (upload and download rights) and will be allowed to take actions in the Project, Flow and Base modules related to the granted permission.
If you encounter issues when uploading/downloading data in a workspace, the security settings for that workspace may be set to not allow uploads and downloads. This can result in RequestError: send request failed and read: connection reset by peer. This is by design in restricted workspaces and thus limits data access to your project via /data/project to prevent the extraction of large amounts of (proprietary) data.
To delete a workspace, go to Projects > your_project > Bench > Workspaces > your_workspace and click “Delete”. Note that the delete option is only available when the workspace is stopped.
The workspace will not be accessible anymore, nor will it be shown in the list of workspaces. The content of it will be deleted so if there is any information that should be kept, you can either put it in a docker image which you can use to start from next time, or export it using the API.
The workspace is not always accessible. It needs to be started before it can be used. From the moment a workspace is Running, a node with a specific capacity is assigned to this workspace. From that moment on, you can start working in your workspace.
To start the workspace, follow the next steps:
Go to Projects > your_project > Bench > Workspaces > your_workspace > Details
Click on Start Workspace button
On the top of the details tab, the status changes to “Starting”. When you click on the >_Access tab, the message “The workspace is starting” appears.
Wait until the status is “Running” and the “Access” tab can be opened. This can take some time because the necessary resources have to be provisioned.
You can refresh the workspace status by selecting the round refresh symbol at the top right.
Once a workspace is running, it can be manually stopped, or it will be automatically shut down after the amount of time configured in the Automatic Shutdown field. Even with automatic shutdown, it is still best practice to stop your workspace when you no longer need it, to save costs.
When you exit a workspace, you can choose to stop the workspace or keep it running. Keeping the workspace running means that it will continue to use resources and incur associated costs. To stop the workspace, select stop in the displayed dialog. You can also stop a workspace by opening it and selecting stop at the top right.
Stopping the workspace will stop the notebook, but will not delete local data. Content will no longer be accessible and no actions can be performed until it is restarted. Any work that has been saved will stay stored.
Storage will continue to be charged until the workspace is deleted. Administrators have a delete option for the workspace in the exit screen.
The project/tenant administrator can enter and stop workspaces for their project/tenant even if they did not start those workspaces at Projects > your_project > Bench > Workspaces > your_workspace > Details. Be careful not to stop workspaces that are processing data. For security reasons, a log entry is added when a project/tenant administrator enters and exits a workspace.
You can see who is using a workspace in the workspace list view.
Once the Workspace is running, the default applications are loaded. These are defined by the start script of the docker image.
The Docker images provided by Illumina load JupyterLab by default. They also contain tutorial notebooks that can help you get started. Opening a new terminal can be done via the Launcher (+ button above the folder structure).
To ensure that packages (and other objects, including data) are permanently installed on a Bench image, a new Bench image needs to be created, using the BUILD option in Bench. A new image can only be derived from an existing one. The build process uses the DOCKERFILE method, where an existing image is the starting point for the new Docker Image (The FROM directive), and any new or updated packages are additive (they are added as new layers to the existing Docker file).
In order to create a derived image, open up the image that you would like to use as the basis and select the Build tab.
Name: By default, this is the same name as the original image and it is recommended to change the name.
Version: Required field which can be any value.
Description: The description for your docker image (for example, indicating which apps it contains).
Code: The Docker file commands must be provided in this section.
The first 4 lines of the Docker file must NOT be edited. It is not possible to start a docker file with a different FROM directive. The main docker file commands are RUN and COPY. More information on them is available in the official Docker documentation.
Once all information is present, click the Build button. Note that the build process can take a while. Once building has completed, the docker image will be available on the Data page within the Project. If the build has failed, the log will be displayed here and the log file will be in the Data list.
From within the workspace it is possible to create a tool from the Docker image.
Click the Manage > Create CWL Tool button in the top right corner of the workspace.
Give the tool a name.
Replace the description of the tool to describe what it does.
Add a version number for the tool.
Click the Docker Build tab.
Here the image that accompanies the tool will be created.
Change the name for the image.
Change the version.
Replace the description to describe what the image does.
Below the line where it says “#Add your commands below.” write the code necessary for running this docker image.
Click the General tab. This tab and all following tabs will look familiar from Flow. Enter the information required for the tool in each of the tabs. For more detailed instructions, check out the Flow documentation.
Click the Save button in the upper, right-hand corner to start the build process.
The building can take a while. When it has completed, the tool will be available in the Tool Repository.
To export data from your workspace to your local machine, it is best practice to move the data in your workspace to the /data/project/ folder so that it becomes available in your project under projects > your_project > Data. Although this storage is slow, it offers read and write access and access to the content from within ICA.
For fast read-only access, link folders with the workspace-ctl data create-mount --mode read-only command.
For fast read/write access, link non-indexed folders, which are visible but whose contents are not accessible from ICA. Use the workspace-ctl data create-mount --mode read-write command to do so. You cannot have fast read-write access to indexed folders, as the indexing mechanism on those would deteriorate the performance.
Every workspace you start has a read-only /data/.software/ folder which contains the icav2 command-line interface (and readme file).
The last tab of the workspace is the activity tab. On this tab all actions performed in the workspace are shown. For example, the creation of the workspace, starting or stopping of the workspace, etc. The activities are shown with their date, the user that performed the action and the description of the action. This page can be used to check how long the workspace has run.
In the general Activity page of the project, there is also a Bench activity tab. This shows all activities performed in all workspaces within the project, even when the workspace has been deleted. The Activity tab in the workspace only shows the action performed in that workspace. The information shown is the same as per workspace, except that here the workspace in which the action is performed is listed as well.
An Analysis is the execution of a pipeline.
You can start an analysis from both the dedicated analysis screen or from the actual pipeline.
Navigate to Projects > Your_Project > Flow > Analyses.
Select Start.
Select a single Pipeline.
Configure the analysis settings.
Select Start Analysis.
Refresh to see the analysis status. See the analysis status overview below for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Projects > Your_Project > Flow > Analyses > Manage > Abort. Refresh to see the status update.
Navigate to Projects > <Your_Project> > Flow > Pipelines
Select the pipeline you want to run or open the pipeline details of the pipeline which you want to run.
Select Start Analysis.
Configure the analysis settings.
Select Start Analysis.
View the analysis status on the Analyses page. See the analysis status overview below for more information on statuses.
If for some reason, you want to end the analysis before it can complete, select Manage > Abort on the Analyses page.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
Once an analysis has been executed, you can rerun it with the same settings or choose to modify the parameters when rerunning. Modifying the parameters is possible on a per-analysis basis. When selecting multiple analyses at once, they will be executed with the original parameters. Draft pipelines are subject to updates and thus can result in a different outcome when rerunning. ICA will display a warning message to inform you of this when you try to rerun an analysis based on a draft pipeline.
When there is an XML configuration change on a pipeline for which you want to rerun an analysis, ICA will display a warning and not fill out the parameters, as it cannot guarantee their validity for the new XML.
Some restrictions apply when trying to rerun an analysis.
To rerun one or more analyses with the same settings:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, select one or more analyses.
Select Manage > Rerun. The analyses will now be executed with the same parameters as their original run.
To rerun a single analysis with modified parameters:
Navigate to Projects > Your_Project > Flow > Analyses.
In the overview screen, open the details of the analysis you want to rerun by clicking on the analysis user reference.
Select Rerun (at the top right).
Update the parameters you want to change.
Select Start Analysis. The analysis will now be executed with the updated parameters.
When an analysis is started, the availability of resources may impact the start time of the pipeline or specific steps after execution has started. Analyses are subject to delay when the system is under high load and the availability of resources is limited.
During analysis start, ICA runs a verification on the input files to see if they are available. When it encounters files that have not completed their upload or transfer, it will report "Data found for parameter [parameter_name], but status is Partial instead of Available". Wait for the file to be available and restart the analysis.
During the execution of an analysis, logs are produced for each process involved in the analysis lifecycle. In the analysis details view, the Steps tab is used to view the steps in near real time as they are produced by the running processes. A grid layout is used for analyses with more than 50 steps and a tiled view for analyses with 50 steps or less, though you can also choose the grid layout for those by means of the tile/grid button at the top right of the analysis log tab. The Steps tab also shows which resources were used as compute type in the different main analysis steps. (For child steps, these are displayed on the parent step.)
There are system processes involved in the lifecycle of all analyses (for example downloading inputs and uploading outputs) and there are processes which are pipeline-specific, such as the processes which execute the pipeline steps. The table below describes the system processes. You can choose to display or hide these system processes with the Show technical steps option.
Additional log entries will show for the processes which execute the steps defined in the pipeline.
Each process shows as a distinct entry in the steps view with a Queue Date, Start Date, and End Date.
The time between the Start Date and the End Date is used to calculate the duration. The time of the duration is used to calculate the usage-based cost for the analysis. Because this is an active calculation, sorting on this field is not supported.
Each log entry in the Steps view contains a checkbox to view the stdout and stderr log files for the process. Clicking a checkbox adds the log as a tab to the log viewer where the log text is displayed and made available for download.
To see the price of an analysis in iCredits, look at Projects > your_project > Flow > Analyses > your_analysis > Details tab. The pricing section will show you the entitlement bundle, storage detail and price in iCredits once the analysis has succeeded, failed or been aborted.
By default, the stdout and stderr files are located in the ica_logs subfolder within the analysis output. This location can be changed by selecting a different folder in the current project at the start of the analysis. Do not use a folder which already contains log files, as these will be overwritten. To set the log file location, you can also use the CreateAnalysisLogs section of the Create Analysis API endpoint.
If you delete these files, no log information will be available on the analysis details > Steps tab.
You can access the log files from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab).
Logs can also be streamed using websocket client tooling. The API to retrieve analysis step details returns websocket URLs for each step to stream the logs from stdout/stderr during the step's execution. Upon completion, the websocket URL is no longer available.
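A minimal sketch of streaming these logs with Python, using the requests and websocket-client packages. The steps endpoint path and the response field names used below are assumptions; consult the API reference for the exact schema.
```python
import requests
import websocket  # pip install websocket-client

headers = {'X-API-Key': '<your_generated_API_key>'}
base = 'https://ica.illumina.com/ica/rest'
project_id = '<project_id>'
analysis_id = '<analysis_id>'

# Assumed endpoint returning the analysis step details, including websocket URLs per step
steps = requests.get(
    f'{base}/api/projects/{project_id}/analyses/{analysis_id}/steps',
    headers=headers,
).json()

for step in steps.get('items', []):
    # Assumed field holding the stdout websocket URL for a running step
    stdout_url = step.get('logs', {}).get('stdOutStream')
    if not stdout_url:
        continue
    ws = websocket.WebSocket()
    ws.connect(stdout_url)
    try:
        while True:
            print(ws.recv())
    except websocket.WebSocketConnectionClosedException:
        # The websocket URL is no longer available once the step completes
        pass
    finally:
        ws.close()
```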
Currently, only FOLDER type output mappings are supported.
By default, analysis outputs are directed to a new folder within the project where the analysis is launched. Analysis output mappings may be specified to redirect outputs to user-specified locations consisting of project and path. An output mapping consists of:
the source path on the local disk of the analysis execution environment, relative to the working folder.
the data type, either FILE or FOLDER
the target project ID to direct outputs to; analysis launcher must have contributor access to the project.
the target path relative to the root of the project data to write the outputs.
If the output folder already exists, any existing contents with the same filenames as those output from the pipeline will be overwritten by the new analysis.
You can jump from the Analysis Details to the individual files and folders by opening the output files tab on the detail view (Projects > your_project > Flow > Analyses > your_analysis > Output files tab > your_output_file) and selecting Open in data.
The Output files section of the analyses will always show the generated outputs, even when they have since been deleted from storage. This is done so you can always see which files were generated during the analysis. In this case it will no longer be possible to navigate to the actual output files.
You can add and remove tags from your analyses.
Navigate to Projects > Your_Project > Flow > Analyses.
Select the analyses whose tags you want to change.
Select Manage > Manage tags.
Edit the user tags, reference data tags (if applicable) and technical tags.
Select Save to confirm the changes.
Both system tags and custom tags exist. User tags are custom tags which you set to help identify and process information, while technical tags are set by the system for processing. Both run-in and run-out tags are set on data to identify which analyses use the data. Connector tags determine data entry methods and reference data tags identify where data is used as reference data.
If you want to share a link to an analysis, you can copy and paste the URL from your browser when you have the analysis open. The syntax of the analysis link is /ica/link/project/<project_id>/analysis/<analysis_id>. Likewise, workflow sessions use the syntax /ica/link/project/<project_id>/workflowSession/<workflow_session_id>. To prevent third parties from accessing data via the link when it is shared or forwarded, ICA will verify the access rights of every user when they open the link.
Input for analysis is limited to a total of 50,000 files (including multiple copies of the same file). Concurrency limits on analyses prevent resource hogging which could result in resource starvation for other tenants. Additional analyses will be queued and scheduled when currently running analyses complete and free up positions. The theoretical limit is 20, but this can be less in practice, depending on a number of external factors.
When your analysis fails, open the analysis details view (Projects > your_project > Flow > Analyses > your_analysis) and select Display failed steps. This gives you the Steps view filtered on the steps that had non-zero exit codes. If there is only one failed step which has log files, the stderr of that step will be displayed.
Exit code 55 indicates analysis failure on economy instances due to an external event such as spot termination. You can retry the analysis.
Exit code 56 indicates analysis failure due to pod disruption and deletion by Kubernetes' Pod Garbage Collector (PodGC) because the node it was running on no longer exists. You can retry the analysis.
green: running
orange: restarting
grey: stopped
red: error
Name
must be a unique name
Automatic Restart Reminder
The time (in days/hours) prior to an automatic restart (every 30 days) at which an email reminder for this event is sent out to the workspace owner. For example: 1d 2h
Automatic Shutdown
The time (in days/hours) between start of the workspace and automatic shutdown. For example: 5d 12h. When this value is more than 30 days, which is the restart period, the workspace will be restarted after 30 days and the remaining time will be counted in the next cycle. So for 50 days, you will have a restart after 30 days and then 20 days remaining before the workspace shuts down.
Automatic Shutdown Reminder
The time (in days/hours) prior to the automatic shutdown at which an email reminder for this event is sent out to the workspace owner. For example: 1d 2h
Docker image
The list of docker images includes base images from ICA and images uploaded to the docker repository for that domain.
Storage size
Represents the size of the storage available on the workspace. A storage from 1GB to 16TB can be provided.
Resource model
Size of the machine on which the workspace will run and whether or not the machine should contain a Graphics Processing Unit (GPU). See Bench pricing for available sizes.
Description
A place to provide additional information about the workspace.
Access (available after selecting Docker image)
The options here are determined by the Docker image settings. The options you select will become available on the details tab of the Workspace when it is running. Web allows to interact with the workspace via a browser. Console provides a terminal to interact with the workspace.
Cluster
When your selected Docker image is cluster compatible, the cluster settings become available. If you do not enable the cluster settings, the cluster-compatible image will be run on a single node. When enabled, you can choose if you want to use a dedicated cluster manager and if it requires web access.
For the cluster members, you can choose to use a static amount of nodes or dynamic scaling, which resources (cpu, memory and storage) are required and if you want to run in economy mode (AWS spot instances). AWS can interrupt Spot Instances with a two-minute notification which is passed on to the workspace which in turn grants a 30 second graceful shutdown period, so it is best not to use spot Instances for workloads that cannot handle individual instance interruption. For more information on clusters, see Sun Grid Engine and Spark.
Internet Access
Type of access to the internet which should be provided for this workspace. Open: Internet access is allowed. Restricted: Creates a workspace with no internet access. Access to the ICA Project Data is still available in this mode. Whitelisted URLs: Specify URLs* and paths that are allowed in a restricted workspace. Separate URLS with a new line. Only domains and subdomains in the specified URL will be allowed.
Permissions
Your workspace will operate with these permissions. For security reasons, users will need to have permissions matching what you set here to run the workspace, regardless of their role.
Access limited to workspace owner. When this field is selected, only the workspace owner can access the workspace. Everything created in that workspace will belong to the workspace owner.
Download/Upload allowed
Project/Flow/Base access
Contributor
-
-
X
when permissions match those of the workspace
Administrator
X
X
X
when permissions match those of the workspace

Analyses using external data
Allowed
-
Analyses using mount paths on input data
Allowed
-
Analyses using user-provided input json
Allowed
-
Analyses using advanced output mappings
-
-
Analyses with draft pipeline
Warn
Warn
Analyses with XML configuration change
Warn
Warn
Requested
The request to start the Analysis is being processed
No
Queued
Analysis has been queued
No
Initializing
Initializing environment and performing validations for Analysis
No
Preparing Inputs
Downloading inputs for Analysis
No
In Progress
Analysis execution is in progress
No
Generating outputs
Transferring the Analysis results
No
Aborting
Analysis has been requested to be aborted
No
Aborted
Analysis has been aborted
Yes
Failed
Analysis has finished with error
Yes
Succeeded
Analysis has finished with success
Yes
| Step | Description |
| --- | --- |
| Setup Environment | Validate analysis execution environment is prepared |
| Run Monitor | Monitor resource usage for billing and reporting |
| Prepare Input Data | Download and mount input data to the shared file system |
| Pipeline Runner | Parent process to execute the pipeline definition |
| Finalize Output Data | Upload output data |

| Timestamp | Description |
| --- | --- |
| Queue Date | The time when the process is submitted to the process scheduler for execution |
| Start Date | The time when the process has started execution |
| End Date | The time when the process has stopped execution |

```json
{
...
"analysisOutput":
[
{
"sourcePath": "out/test1",
"type": "FOLDER",
"targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
"targetPath": "/output-testing-01/"
},
{
"sourcePath": "out/test2",
"type": "FOLDER",
"targetProjectId": "4d350d0f-88d8-4640-886d-5b8a23de7d81",
"targetPath": "/output-testing-02/"
}
]
}
```

| Outputs | Logs | Result |
| --- | --- | --- |
| Default | Default | Logs are a subfolder of the analysis output. |
| Mapped | Default | Logs are a subfolder of the analysis output. |
| Default | Mapped | Outputs and logs may be separated. |
| Mapped | Mapped | Outputs and logs may be separated. |

Any operation from the ICA graphical user interface can also be performed with the API.
The following are some basic examples of how to use the API. These examples use Python as the programming language. For other languages, please see their native documentation on API usage.
An installed copy of Python (https://www.python.org/)
The package installer for Python, pip (https://pip.pypa.io/)
The Python requests library, installed with pip install requests
One of the easiest authentication methods is by means of API keys. To generate an API key, refer to the API Keys section of the documentation. This key is then used in your Python code to authenticate the API calls. It is best practice to regularly update your API keys.
API keys are valid for a single user, so any information you request is for the user to which the key belongs. For this reason, it is best practice to create a dedicated API user so you can manage the access rights for the API by managing that user's rights.
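One minimal sketch of this practice is to keep the key out of your source code and read it from an environment variable (the variable name ICA_API_KEY used below is only an example):

```python
import os
import requests

# Read the key from an environment variable (ICA_API_KEY is an example name);
# set it in your shell first, for example: export ICA_API_KEY=<your_generated_API_key>
headers = {'X-API-Key': os.environ['ICA_API_KEY']}

# Simple call to verify that the key works.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
print("Response status code: ", response.status_code)
```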
There is a dedicated API reference page where you can enter your API key, try out the different API commands, and get an overview of the available parameters.
The examples on the API reference page use curl (Client URL), while Python uses the requests library. There are a number of online tools to automatically convert from curl to Python.
To get the curl command,
Look up the endpoint you want to use on the API reference page.
Select Try it out.
Enter the necessary parameters.
Select Execute.
Copy the resulting curl command.
Never paste your API authentication key into online tools when performing curl conversion as this poses a significant security risk.
In the most basic form, the curl command
curl my.curlcommand.com
becomes
You will see the following options in the curl commands on the API reference page.
-H means header.
-X specifies the request method (such as GET or POST); the method string is passed as is, without interpretation.
becomes
This is a straightforward request without parameters which can be used to verify your connection.
The API call is
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers={'X-API-Key': '<your_generated_API_key>'})
In this example, the API key is written directly into the API call, which means you must update every API call when the key changes. A better practice is to define the headers containing your API key once and reuse them, so the key is easier to maintain. The full code then becomes
The list of event codes was returned as a single line, which makes it difficult to read, so let's pretty-print the result.
Now that we are able to retrieve information with the API, we can use it for a more practical request like retrieving a list of projects. This API request can also take parameters.
First, we pass the request without parameters to retrieve all projects.
The easiest way to pass a parameter is by appending it to the API request. The following API request will list the projects with a filter on CAT as user tag.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT', headers=headers)
If you only want entries that have both the tags CAT and WOLF, you would append them like this:
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?userTags=CAT&userTags=WOLF', headers=headers)
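As an alternative, the requests library can build the query string for you through its params argument; passing a list results in the same repeated userTags parameters as in the request above:

```python
# Equivalent to appending ?userTags=CAT&userTags=WOLF to the URL.
params = {'userTags': ['CAT', 'WOLF']}
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers, params=params)
print("Response status code: ", response.status_code)
```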
To copy data, you need to know:
Your generated API key.
The dataId of the files and folders which you want to copy (their syntax is fil.hexadecimal_identifier and fol.hexadecimal_identifier). You can select a file or folder in the GUI to see its Id (Projects > your_project > Data > your_file > Data details > Id), or you can use the /api/projects/{projectId}/data endpoint.
The destination project to which you want to copy the data.
The destination folder within the destination project to which you want to copy the data (fol.hexadecimal_identifier).
What to do when the destination files or folders already exist (OVERWRITE, SKIP or RENAME).
The full code will then be as follows:
Now that we have done individual API requests, we can combine them and use the output of one request as input for the next request. When you want to run a pipeline, you need a number of input parameters. In order to obtain these parameters, you need to make a number of API calls first and use the returned results as part of your request to run the pipeline. In the examples below, we will build up the requests one by one so you can run them individually first to see how they work. These examples only follow the happy path to keep them as simple as possible. If you program them for a full project, remember to add error handling. You can also use the GUI to get all the parameters or write them down after performing the individual API calls in this section. Then, you can build your final API call with those values fixed.
This block must be added at the beginning of your code
Previously, we requested a list of all projects; now we add a search parameter to look for a project called MyProject. (Replace MyProject with the name of the project you want to look for.)
Now that we have found our project by name, we need to get the unique project id, which we will use in the combined requests. To get the id, we add the following line to the end of the code above.
The syntax ['items'][0]['id'] means we look in the items list, take the first entry (index 0; we presume our filter was accurate enough to return only the correct result and that there are no duplicate project names), and read the id field. Similarly, you can build other expressions to get the data you want, such as ['items'][0]['urn'] to get the urn or ['items'][0]['tags']['userTags'] to get the list of user tags.
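Because these examples follow the happy path, a small guard before indexing avoids an IndexError when the search returns no matches; a minimal sketch:

```python
# The examples assume the search returned at least one project; check before indexing.
items = My_API_Data.get('items', [])
if not items:
    raise SystemExit("No project matched the search term; check the project name.")
print(items[0]['id'])
```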
Once we have the identifier we need, we add it to a variable which we will call Project_Identifier in our examples.
Once we have the identifier of our project, we can fill it out in the request to list the pipelines which are part of our project.
This will give us all the available pipelines for that project. As we will only want to run a single pipeline, we can search for our pipeline, which in this example will be the basic_pipeline. Unfortunately, this API call has no direct search parameter, so when we get the list of pipelines, we will look for the id and store that in a variable which we will call Pipeline_Identifier in our examples as follows:
Once we know the project identifier and the pipeline identifier, we can create an API request to retrieve the list of input parameters which are needed for the pipeline. We will consider a simple pipeline which only needs a file as input. If your pipeline has more input parameters, you will need to set those as well.
Here we will look for the id of the extra small storage size. This is done with the index 0 in My_API_Data['items'][0]['id'].
Now we will look for a file "testExample" which we want to use as input and store the file id.
Finally, we can run the analysis with parameters filled out.
{
"$id": "#ica-pipeline-input-form",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "ICA Pipeline Input Forms",
"description": "Describes the syntax for defining input setting forms for ICA pipelines",
"type": "object",
"additionalProperties": false,
"properties": {
"fields": {
"description": "The list of setting fields",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
}
},
"required": [
"fields"
],
"definitions": {
"ica_pipeline_input_form_field": {
"$id": "#ica_pipeline_input_form_field",
"type": "object",
"additionalProperties": false,
"properties": {
"id": {
"description": "The unique identifier for this field. Will be available with this key to the pipeline script.",
"type": "string",
"pattern": "^[a-zA-Z-0-9\\-_\\.\\s\\+\\[\\]]+$"
},
"type": {
"type": "string",
"enum": [
"textbox",
"checkbox",
"radio",
"select",
"number",
"integer",
"data",
"section",
"text",
"fieldgroup"
]
},
"label": {
"type": "string"
},
"minValues": {
"description": "The minimal amount of values that needs to be present. Default is 0 when not provided. Set to >=1 to make the field required.",
"type": "integer",
"minimum": 0
},
"maxValues": {
"description": "The maximal amount of values that needs to be present. Default is 1 when not provided.",
"type": "integer",
"exclusiveMinimum": 0
},
"minMaxValuesMessage": {
"description": "The error message displayed when minValues or maxValues is not adhered to. When not provided a default message is generated.",
"type": "string"
},
"helpText": {
"type": "string"
},
"placeHolderText": {
"description": "An optional short hint (a word or short phrase) to aid the user when the field has no value.",
"type": "string"
},
"value": {
"description": "The value for the field. Can be an array for multi-value fields. For 'number' type values the exponent needs to be between -300 and +300 and max precision is 15. For 'integer' type values the value needs to be between -100000000000000000 and 100000000000000000"
},
"minLength": {
"type": "integer",
"minimum": 0
},
"maxLength": {
"type": "integer",
"exclusiveMinimum": 0
},
"min": {
"description": "Minimal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"max": {
"description": "Maximal allowed value for 'integer' and 'number' type. Exponent needs to be between -300 and +300 and max precision is 15.",
"type": "number"
},
"choices": {
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field_choice"
}
},
"fields": {
"description": "The list of setting sub fields for type fieldgroup",
"type": "array",
"items": {
"$ref": "#/definitions/ica_pipeline_input_form_field"
}
},
"dataFilter": {
"description": "For defining the filtering when type is 'data'.",
"type": "object",
"additionalProperties": false,
"properties": {
"nameFilter": {
"description": "Optional data filename filter pattern that input files need to adhere to when type is 'data'. Eg parts of the expected filename",
"type": "string"
},
"dataFormat": {
"description": "Optional dataformat name array that input files need to adhere to when type is 'data'",
"type": "array",
"contains": {
"type": "string"
}
},
"dataType": {
"description": "Optional data type (file or directory) that input files need to adhere to when type is 'data'",
"type": "string",
"enum": [
"file",
"directory"
]
}
}
},
"regex": {
"type": "string"
},
"regexErrorMessage": {
"type": "string"
},
"hidden": {
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"emptyValuesAllowed": {
"type": "boolean",
"description": "When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false."
},
"updateRenderOnChange": {
"type": "boolean",
"description": "When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false."
},
"streamable": {
"type": "boolean",
"description": "EXPERIMENTAL PARAMETER! Only possible for fields of type 'data'. When true, the data input files will be offered in streaming mode to the pipeline instead of downloading them."
}
},
"required": [
"id",
"type"
],
"allOf": [
{
"if": {
"description": "When type is 'textbox' then 'dataFilter', 'fields', 'choices', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"textbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'checkbox' then 'dataFilter', 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"checkbox"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'radio' then 'dataFilter', 'fields', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"radio"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'select' then 'dataFilter', 'fields', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"select"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'number' or 'integer' then 'dataFilter', 'fields', 'choices', 'regex', 'regexErrorMessage', 'maxLength' and 'minLength' are not allowed",
"properties": {
"type": {
"enum": [
"number",
"integer"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"fields",
"choices",
"regex",
"regexErrorMessage",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'data' then 'dataFilter' is required and 'fields', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"data"
]
}
},
"required": [
"type"
]
},
"then": {
"required": [
"dataFilter"
],
"propertyNames": {
"not": {
"enum": [
"fields",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"max",
"min",
"maxLength",
"minLength"
]
}
}
}
},
{
"if": {
"description": "When type is 'section' or 'text' then 'disabled', 'fields', 'updateRenderOnChange', 'classification', 'value', 'minValues', 'maxValues', 'minMaxValuesMessage', 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' are not allowed",
"properties": {
"type": {
"enum": [
"section",
"text"
]
}
},
"required": [
"type"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"disabled",
"fields",
"updateRenderOnChange",
"classification",
"value",
"minValues",
"maxValues",
"minMaxValuesMessage",
"dataFilter",
"choices",
"regex",
"placeHolderText",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min"
]
}
}
}
},
{
"if": {
"description": "When type is 'fieldgroup' then 'fields' is required and then 'dataFilter', 'choices', 'placeHolderText', 'regex', 'regexErrorMessage', 'maxLength', 'minLength', 'max' and 'min' and 'emptyValuesAllowed' are not allowed",
"properties": {
"type": {
"enum": [
"fieldgroup"
]
}
},
"required": [
"type",
"fields"
]
},
"then": {
"propertyNames": {
"not": {
"enum": [
"dataFilter",
"choices",
"placeHolderText",
"regex",
"regexErrorMessage",
"maxLength",
"minLength",
"max",
"min",
"emptyValuesAllowed"
]
}
}
}
}
]
},
"ica_pipeline_input_form_field_choice": {
"$id": "#ica_pipeline_input_form_field_choice",
"type": "object",
"additionalProperties": false,
"properties": {
"value": {
"description": "The value which will be set when selecting this choice. Must be unique over the choices within a field"
},
"text": {
"description": "The display text for this choice, similar to the label of a field.",
"type": "string"
},
"selected": {
"description": "Optional. When true, this choice value is picked as the default selected value; selected=true has precedence over an eventual set field 'value'. For clarity it is better not to use 'selected' but to use the field 'value', as is used to set default values for the other field types. Only a maximum of 1 choice may have selected set to true.",
"type": "boolean"
},
"disabled": {
"type": "boolean"
},
"parent": {
"description": "Value of the parent choice item. Can be used to build hierarchical choice trees."
}
},
"required": [
"value",
"text"
]
}
}
}

import requests
response = requests.get('http://my.curlcommand.com')

curl -X 'GET' 'https://my.curlcommand.com' -H 'HeaderName: HeaderValue'

import requests
headers = {
'HeaderName': 'HeaderValue',
}
response = requests.get('https://my.curlcommand.com', headers=headers)

# The requests library will allow you to make HTTP requests.
import requests
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Display the data from the request.
print(response.json())

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/eventCodes', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Store the API request in response.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

# The requests library will allow you to make HTTP requests.
import requests
# Fill out your generated API key.
headers = {
'accept': 'application/vnd.illumina.v3+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v3+json',
}
# Enter the files and folders, the destination folder, and the action to perform when the destination data already exists.
data = '{"items": [{"dataId": "fil.0123456789abcdef"}, {"dataId": "fil.735040537abcdef"}], "destinationFolderId": "fol.1234567890abcdef", "copyUserTags": true,"copyTechnicalTags": true,"copyInstrumentInfo": true,"actionOnExist": "SKIP"}'
# Replace <Project_Identifier> with the actual identifier of the destination project.
response = requests.post(
'https://ica.illumina.com/ica/rest/api/projects/<Project_Identifier>/dataCopyBatch',
headers=headers,
data=data,
)
# Display the response status code.
print("Response status code: ", response.status_code)

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}

# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Print JSON data in readable format with indentation and sorting.
print(json.dumps(My_API_Data, indent=3, sort_keys=True))

print(My_API_Data['items'][0]['id'])

# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']

response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)

# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)

# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)

# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)

# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)

Postheaders = {
'accept': 'application/vnd.illumina.v4+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v4+json',
}
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
response = requests.post(
'https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,
)

# The requests library will allow you to make HTTP requests.
import requests
# JSON will allow us to format and interpret the output.
import json
# Replace <your_generated_API_key> with your actual generated API key here.
headers = {
'X-API-Key': '<your_generated_API_key>',
}
# Replace <your_generated_API_key> with your actual generated API key here.
Postheaders = {
'accept': 'application/vnd.illumina.v4+json',
'X-API-Key': '<your_generated_API_key>',
'Content-Type': 'application/vnd.illumina.v4+json',
}
# Find project
# Store the API request in response. Here we look for a project called "MyProject".
response = requests.get('https://ica.illumina.com/ica/rest/api/projects?search=MyProject', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Project response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the project identifier.
Project_Identifier = My_API_Data['items'][0]['id']
print("Project_Identifier: ",Project_Identifier)
# Find Pipeline
# Store the API request in response. Here we look for the list of pipelines in MyProject.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/pipelines', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Pipeline Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Store the list of pipelines for further processing.
pipelineslist = json.dumps(My_API_Data)
# Set "basic_pipeline" as the pipeline to search for. Replace this with your target pipeline.
target_pipeline = "basic_pipeline"
found_pipeline = None
# Look for the code to match basic_pipeline and store the ID.
for item in My_API_Data['items']:
    if 'pipeline' in item and item['pipeline'].get('code') == target_pipeline:
        found_pipeline = item['pipeline']
        Pipeline_Identifier = found_pipeline['id']
        break
print("Pipeline Identifier: " + Pipeline_Identifier)
# Find Parameters
# Store the API request in response. Here we look for the Parameters in basic_pipeline.
response = requests.get('https://ica.illumina.com/ica/rest/api/pipelines/'+(Pipeline_Identifier)+'/inputParameters', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find Parameters Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the parameters and store in the Parameters variable.
Parameters = My_API_Data['items'][0]['code']
print("Parameters: ",Parameters)
# Get Storage Size
# Store the API request in response. Here we look for the analysis storage size.
response = requests.get('https://ica.illumina.com/ica/rest/api/analysisStorages', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find analysisStorages Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the storage size. We will select extra small.
Storage_Size = My_API_Data['items'][0]['id']
print("Storage_Size: ",Storage_Size)
# Get Input File
# Store the API request in response. Here we look for the Files testExample.
response = requests.get('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/data?fullText=testExample', headers=headers)
# Display the response status code. Code 200 means the request succeeded.
print("Find input file Response status code: ", response.status_code)
# Put the JSON data from the response in My_API_Data.
My_API_Data = response.json()
# Get the first file ID.
InputFile = My_API_Data['items'][0]['data']['id']
print("InputFile id: ",InputFile)
# Finally, we can run the analysis with parameters filled out.
data = '{"userReference":"api_example","pipelineId":"'+(Pipeline_Identifier)+'","tags":{"technicalTags":[],"userTags":[],"referenceTags":[]},"analysisStorageId":"'+(Storage_Size)+'","analysisInput":{"inputs":[{"parameterCode":"'+(Parameters)+'","dataIds":["'+(InputFile)+'"]}]}}'
print (data)
response = requests.post('https://ica.illumina.com/ica/rest/api/projects/'+(Project_Identifier)+'/analysis:nextflow',headers=Postheaders,data=data,)
print("Post Response status code: ", response.status_code)

Pipelines defined using the "Code" mode require an XML or JSON-based input form to define the fields shown on the launch view in the user interface (UI).
To create a JSON-based Nextflow (or CWL) pipeline, go to Projects > your_project > Flow > Pipelines > +Create > Nextflow (or CWL) > JSON-based.
Three files, located on the inputform files tab, work together for evaluating and presenting JSON-based input.
inputForm.json contains the actual input form which is rendered when starting the pipeline run.
onRender.js is triggered when a value is changed.
onSubmit.js is triggered when starting a pipeline via the GUI or API.
Use + Create to add additional files and Simulate to test your inputForms.
Script execution supports cross-field validation of the values, hiding fields, making them required, and so on, based on value changes.
The JSON schema allows you to define the input parameters. See the inputForm.json page for syntax details.
textbox
Corresponds to stringType in xml.
checkbox
A checkbox that supports the option of being required, so can serve as an active consent feature. (corresponds to the booleanType in xml).
radio
A radio button group to select one from a list of choices. The values to choose from must be unique.
select
A dropdown selection to select one from a list of choices. This can be used for both single-level lists and tree-based lists.
number
The value is of Number type in javascript and Double type in java. (corresponds to doubleType in xml).
integer
Corresponds to java Integer.
data
Data such as files.
section
For splitting up fields, to give structure. Rendered as subtitles. No values are to be assigned to these fields.
text
To display informational messages. No values are to be assigned to these fields.
fieldgroup
Can contain parameters or other groups. Allows you to have repeating sets of parameters, for instance when a father|mother|child choice needs to be linked to each file input. So if you want to have the same elements multiple times in your form, combine them into a fieldgroup. Does not support the emptyValuesAllowed attribute.
These attributes can be used to configure all parameter types.
label
The display label for this parameter. Optional but recommended, id will be used if missing.
minValues
The minimum number of values that must be present. Default when not set is 0. Set to >=1 to make the field required.
maxValues
The maximum number of values that may be present. Default when not set is 1.
minMaxValuesMessage
The error message displayed when minValues or maxValues is not adhered to. When not set, a default message is generated.
helpText
A helper text about the parameter. Will be displayed in smaller font with the parameter.
placeHolderText
An optional short hint (a word or short phrase) to aid the user when the field has no value.
value
The value of the parameter. Can be considered default value.
minLength
Only applied on type="textbox". Value is a positive integer.
maxLength
Only applied on type="textbox". Value is a positive integer.
min
Minimal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
max
Maximal allowed value for 'integer' and 'number' type.
for 'integer' type fields the minimal and maximal values are -100000000000000000 and 100000000000000000.
for 'number' type fields the max precision is 15 significant digits and the exponent needs to be between -300 and +300.
choices
A list of choices, each with a "value", "text" (the label), "selected" (only one true is supported), and "disabled". "parent" can be used to build hierarchical choice trees. "availableWhen" can be used for conditional presence of the choice based on the values of other fields. Parent and value must be unique; you cannot use the same value for both.
fields
The list of sub fields for type fieldgroup.
dataFilter
For defining the filtering when type is 'data'. Use nameFilter for matching the name of the file, dataFormat for file format and dataType for selecting between files and directories. Tip: To see the data formats, open the file details in ICA and look at the Format on the data details. You can expand the dropdown list to see the syntax.
regex
The regex pattern the value must adhere to. Only applied on type="textbox".
regexErrorMessage
The optional error message when the value does not adhere to the "regex". A default message will be used if this parameter is not present. It is highly recommended to set this as the default message will show the regex which is typically very technical.
hidden
Makes this parameter hidden. Can be made visible later in onRender.js or can be used to set hardcoded values of which the user should be aware.
disabled
Shows the parameter but makes editing it impossible. The value can still be altered by onRender.js.
emptyValuesAllowed
When maxValues is greater than 1 and emptyValuesAllowed is true, the values may contain null entries. Default is false.
updateRenderOnChange
When true, the onRender javascript function is triggered each time the user changes the value of this field. Default is false.
dropValueWhenDisabled
When this is present and true and the field has disabled being true, then the value will be omitted during the submit handling (on the onSubmit result).
Streamable inputs
Adding "streamable":true to an input field of type "data" makes it a streamable input.
The onSubmit.js javascript function receives an input object which holds information about the chosen values of the input form and the pipeline and pipeline execution request parameters. This javascript function is not only triggered when submitting a new pipeline execution request in the user interface, but also when submitting one through the REST API.
settings
The value of the setting fields. Corresponds to settingValues in the onRender.js. This is a map with field id as key and an array of field values as value. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
settingValues
To maximize the opportunity for reusing code between onRender and onSubmit, the 'settings' are also exposed as settingValues like in the onRender input.
pipeline
Info about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Info about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json as saved in the pipeline, i.e. the original json, without any changes.
currentAnalysisSettings
The current input form JSON as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is 'Initial' or when the analysis is created through the CLI/API.
settings
The value of the setting fields. This allows modifying the values, applying defaults, or using information from the pipeline or analysis input object. When settings are not present in the onSubmit return value object, they are assumed to be unmodified.
validationErrors
A list of AnalysisError messages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI when viewing the analysis.
This is the object used for representing validation errors.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The zero-based index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
Receives an input object which contains information about the current state of the input form, the chosen values and the field value change that triggered the onrender call. It also contains pipeline information. Changed objects are present in the onRender return value object. Any object not present is considered to be unmodified. Changing the storage size in the start analysis screen triggers an onRender execution with storageSize as changed field.
context
"Initial"/"FieldChanged"/"Edited".
Initial is the value when first displaying the form when a user opens the start run screen.
The value is FieldChanged when a field with 'updateRenderOnChange'=true is changed by the user.
Edited (not yet supported in ICA) is used when a form is displayed again later; this is intended for draft runs or for editing the form during reruns.
changedFieldId
The id of the field that changed and which triggered this onRender call. context will be FieldChanged. When the storage size is changed, the fieldId will be storageSize.
analysisSettings
The input form json as saved in the pipeline. This is the original json, without changes.
currentAnalysisSettings
The current input form json as rendered to the user. This can contain already applied changes from earlier onRender passes. Null in the first call, when context is Initial.
settingValues
The current value of all settings fields. This is a map with field id as key and an array of field values as value for multivalue fields. For convenience, values of single-value fields are present as the individual value and not as an array of length 1. In case of fieldGroups, the value can be multiple levels of arrays. For fields of type data the values in the json are data ids (fil.xxxx). To help with validation, these are expanded and made available as an object here containing the id, name, path, format, size and a boolean indicating whether the data is external. This info can be used to validate or pick the chosen storageSize.
pipeline
Information about the pipeline: code, tenant, version, and description are all available in the pipeline object as string.
analysis
Information about this run: userReference, userName, and userTenant are all available in the analysis object as string.
storageSize
The storage size as chosen by the user. This will initially be null. StorageSize is an object containing an 'id' and 'name' property.
storageSizeOptions
The list of storage sizes available to the user when creating an analysis. Is a list of StorageSize objects containing an 'id' and 'name' property.
analysisSettings
The input form json with potential applied changes. The discovered changes will be applied in the UI.
settingValues
The current, potentially altered map of all setting values. These will be updated in the UI.
validationErrors
A list of RenderMessages representing validation errors. Submitting a pipeline execution request is not possible while there are still validation errors.
validationWarnings
A list of RenderMessages representing validation warnings. A user may choose to ignore these validation warnings and start the pipeline execution request.
storageSize
The suitable value for storageSize. Must be one of the options of input.storageSizeOptions. When absent or null, it is ignored.
validation errors and validation warnings can use 'storageSize' as fieldId to let an error appear on the storage size field. 'storageSize' is the value of the changedFieldId when the user alters the chosen storage size.
This is the object used for representing validation errors and warnings. The attributes can be used with first letter lowercase (consistent with the input object attributes) or uppercase.
fieldId / FieldId
The field which has an erroneous value. When not present, a general error/warning is displayed. To display an error on the storage size, use 'storageSize' as the fieldId.
index / Index
The zero-based index of the value which is incorrect. Use this when a particular value of a multivalue field is not correct. When not present, the entire field is marked as erroneous. The value can also be an array of indexes for use with fieldgroups. For instance, when the 3rd field of the 2nd instance of a fieldgroup is erroneous, a value of [ 1 , 2 ] is used.
message / Message
The error/warning message to display.
A Tool is the definition of a containerized application with defined inputs, outputs, and execution environment details including compute resources required, environment variables, command line arguments, and more.
Tools define the inputs, parameters, and outputs for the analysis. Tools are available for use in graphical CWL pipelines by any project in the account.
Select System Settings > Tool Repository > + Create.
Configure tool settings in the tool properties tabs. See Tool Properties.
Select Save.
The following sections describe the tool properties that can be configured in each tab.
Refer to the CWL CommandLineTool Specification for further explanation about many of the properties described below. Not all features described in the specification are supported.
Name
The name of the tool.
Description
Free text description for information purposes.
Icon
The icon for the tool.
Status
The release of the tool.
Docker image
The registered Docker image for the tool.
Categories
One or more tags to categorize the tool. Select from existing tags or type a new tag name in the field.
Tool version
The version of the tool specified by the end user. Could be any string.
Release version
The version number of the tool.
Version comment
A description of changes in the updated version.
Links
External reference links.
Documentation
The Documentation field provides options for configuring the HTML description for the tool. The description appears in the Tool Repository but is excluded from exported CWL definitions.
The release status of the tool can be one of Draft, Release Candidate, Released, or Deprecated. The Building and Build Failed options are set by the application and not during configuration.
Draft
Fully editable draft.
Release Candidate
The tool is ready for release. Editing is locked but the tool can be cloned to create a new version.
Released
The tool is released. Tools in this state cannot be edited. Editing is locked but the tool can be cloned to create a new version.
Deprecated
The tool is no longer intended for use in pipelines, but there are no restrictions placed on the tool. That is, it can still be added to new pipelines and will continue to work in existing pipelines. It is merely an indication to the user that the tool should no longer be used.
The General tab provides options to configure the basic command line.
ID
CWL identifier field
CWL version
The CWL version in use. This field cannot be changed.
Base command
Components of the command. Each argument must be added in a separate line.
Standard out
The name of the file that captures Standard Out (STDOUT) stream information.
Standard error
The name of the file that captures Standard Error (STDERR) stream information.
Requirements
The requirements for triggering an error message. (see below)
Hints
The requirements for triggering a warning message. (see below)
The Hints/Requirements include CWL features to indicate capabilities expected in the Tool's execution environment.
Inline Javascript
The Tool contains a property with a JavaScript expression to resolve its value.
Initial workdir
The workdir can be any of the following types:
String or Expression — A string or JavaScript expression, e.g., $(inputs.InputFASTA)
File or Dir — A map of one or more files or directories, in the following format: {type: array, items: [File, Directory]}
Dirent — A script in the working directory. The Entry name field specifies the file name.
Scatter feature — Indicates that the workflow platform must support the scatter and scatterMethod fields.
The Arguments tab provides options to configure base command parameters that do not require user input.
Tool arguments may be one of two types:
String or Expression — A literal string or JavaScript expression, e.g., --format=bam.
Binding — An argument constructed from the binding of an input parameter.
The following table describes the argument input fields.
Value
The literal string to be added to the base command.
String or expression
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Binding
Prefix
The string prefix.
Binding
Item separator
The separator that is used between array values.
Binding
Value from
The source string or JavaScript expression.
Binding
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Binding
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
Binding
Example
Prefix
--output-filename
Value from
$(inputs.inputSAM.nameroot).bam
Input file
/tmp/storage/SRR45678_sorted.sam
Output file
SRR45678_sorted.bam
The Inputs tab provides options to define the input files and folders for the tool. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Input options
Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or directories.
Format
The input file format.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Load contents
The setting to load the contents of the input file: the system reads up to the first 64 KiB of text from the file and populates the contents field with it.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Settings tab provides options to define parameters that can be set at the time of execution. The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be Boolean, Int, Long, Float, Double or String.
Default Value
The default value to use if the tool setting is not available.
Input options
Optional indicates the input is optional. Multi value indicates there can be more than one value for the input.
Position
The position of the argument in the final command line. If the position is not specified, the default value is set to 0 and the arguments appear in the order they were added.
Prefix
The string prefix.
Item separator
The separator that is used between array values.
Value from
The source string or JavaScript expression.
Separate
The setting to require the Prefix and Value from fields to be added as separate or combined arguments. True indicates the fields must be added as separate arguments. False indicates the fields must be added as a single concatenated argument.
Shell quote
The setting to quote the Value from field on the command line. True indicates the value field appears in the command line. False indicates the value field is entered manually.
The Outputs tab provides options to define the parameters of output files.
The following table describes the input and binding fields. Selecting multi value enables type binding options for adding prefixes to the input.
ID
The file ID.
Label
A short description of the input.
Description
A long description of the input.
Type
The input type, which can be either a file or a directory.
Output options
Optional indicates the input is optional. Multi value indicates there is more than one input file or directory. Streamable indicates the file is read or written sequentially without seeking.
Secondary files
The required secondary files or folders.
Format
The input file format.
Globs
The pattern for searching file names.
Load contents
Automatically loads the file contents: the system extracts up to the first 64 KiB of text from the file and populates the contents field with it.
Output eval
Evaluate an expression to generate the output value.
From the System Settings > Tool Repository page, select a tool.
Select Edit.
From the System Settings > Tool Repository page, select a tool.
Select the Information tab.
From the Status drop-down menu, select a status.
Select Save.
In addition to the interactive Tool builder, the platform GUI also supports working directly with the raw definition on the right hand side of the screen when developing a new Tool. This provides the ability to write the Tool definition manually or bring an existing Tool's definition to the platform.
Be careful when editing the raw tool definition as this can introduce errors.
A simple example CWL Tool definition is provided below.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
label: echo
inputs:
message:
type: string
default: testMessage
inputBinding:
position: 1
outputs:
echoout:
type: stdout
baseCommand:
- echo

After pasting into the editor, the definition is parsed and the other tabs for visually editing the Tool will populate according to the definition contents.
General Tool - includes your base command and various optional configurations.
The base command is required for your tool to run, e.g. python /path/to/script.py such that python and /path/to/script.py are added on separate lines.
Inline Javascript requirement - must be enabled if you are using Javascript anywhere in your tool definition.
Initial workdir requirement - Dirent Type
Your tool must point to a script that executes your analysis. That script can either be provided in your Docker image or defined using a Dirent. Defining a script via Dirent allows you to dynamically modify your script without updating your Docker image. To define your Dirent script, enter your script name under Entry name (e.g. runner.sh) and the script content under Entry. Then, point your base command to that custom script, e.g. bash runner.sh.
How do you reference your tool inputs and settings throughout the tool definition?
You can either reference your inputs using their position or ID.
Settings can be referenced using their defined IDs, e.g. $(inputs.InputSetting)
File/Folder inputs can be referenced using their defined IDs, followed by the desired field, e.g. $(inputs.InputFile.path). For additional information please refer to the File CWL documentation.
All inputs can also be referenced using their position, e.g. bash script.sh $1 $2
Notifications (Projects > your_project > Project Settings > Notifications) are events to which you can subscribe. When they are triggered, they deliver a message to an external target system such as email, Amazon SQS or SNS, or an HTTP POST request. The following table describes the available system events to which you can subscribe:

| Event | Code | Description | Payload type |
| --- | --- | --- | --- |
| Analysis failure | ICA_EXEC_001 | Emitted when an analysis fails | Analysis |
| Analysis success | ICA_EXEC_002 | Emitted when an analysis succeeds | Analysis |
| Analysis aborted | ICA_EXEC_027 | Emitted when an analysis is aborted either by the system or the user | Analysis |
| Analysis status change | ICA_EXEC_028 | Emitted when a state transition on an analysis occurs | Analysis |
| Base Job failure | ICA_BASE_001 | Emitted when a Base job fails | BaseJob |
| Base Job success | ICA_BASE_002 | Emitted when a Base job succeeds | BaseJob |
| Data transfer success | ICA_DATA_002 | Emitted when a data transfer is marked as Succeeded | DataTransfer |
| Data transfer stalled | ICA_DATA_025 | Emitted when a data transfer hasn't progressed in the past 2 minutes | DataTransfer |
| Data <action> | ICA_DATA_100 | Subscribing to this serves as a wildcard for all project data status changes and covers those changes that have no separate code. This does not include DataTransfer events or changes that trigger no data status changes such as adding tags to data. | ProjectData |
| Data linked to project | ICA_DATA_104 | Emitted when a file is linked to a project | ProjectData |
| Data can not be created in non-indexed folder | ICA_DATA_105 | Emitted when attempting to create data in a non-indexed folder | ProjectData |
| Data deleted | ICA_DATA_106 | Emitted when data is deleted | ProjectData |
| Data created | ICA_DATA_107 | Emitted when data is created | ProjectData |
| Data uploaded | ICA_DATA_108 | Emitted when data is uploaded | ProjectData |
| Data updated | ICA_DATA_109 | Emitted when data is updated | ProjectData |
| Data archived | ICA_DATA_110 | Emitted when data is archived | ProjectData |
| Data unarchived | ICA_DATA_114 | Emitted when data is unarchived | ProjectData |
| Job status changed | ICA_JOB_001 | Emitted when a job changes status (INITIALIZED, WAITING_FOR_RESOURCES, RUNNING, STOPPED, SUCCEEDED, PARTIALLY_SUCCEEDED, FAILED) | JobId |
| Sample completed | ICA_SMP_002 | Emitted when a sample is marked as completed | ProjectSample |
| Sample linked to a project | ICA_SMP_003 | Emitted when a sample is linked to a project | ProjectSample |
| Workflow session start | ICA_WFS_001 | Emitted when a workflow is started | WorkflowSession |
| Workflow session failure | ICA_WFS_002 | Emitted when a workflow fails | WorkflowSession |
| Workflow session success | ICA_WFS_003 | Emitted when a workflow succeeds | WorkflowSession |
| Workflow session aborted | ICA_WFS_004 | Emitted when a workflow is aborted | WorkflowSession |

When you subscribe to overlapping event codes, such as ICA_EXEC_002 (analysis success) and ICA_EXEC_028 (analysis status change), you will receive both notifications when an analysis succeeds.
Event notifications can be delivered to the following delivery targets:

| Channel | Target | Configuration |
| --- | --- | --- |
| Email | E-mail delivery | E-mail Address |
| Sqs | AWS SQS Queue | AWS SQS Queue URL |
| Sns | AWS SNS Topic | AWS SNS Topic ARN |
| Http | Webhook (POST request) | URL |

To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu and fill out the requested fields; the fields change depending on the selected delivery target.
Once created, you can disable, enable or delete the notification subscriptions at Projects > your_project > Project Settings > Notifications.
In order to allow the platform to deliver events to Amazon SQS or SNS delivery targets, a cross-account policy needs to be added to the target Amazon service.
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Principal":{
"AWS":"arn:aws:iam::<platform_aws_account>:root"
},
"Action":"<action>",
"Resource": "<arn>"
}
]
}

Substitute the variables in the example above according to the table below.
platform_aws_account
The platform AWS account ID: 079623148045
action
For SNS use SNS:Publish. For SQS, use SQS:SendMessage
arn
The Amazon Resource Name (ARN) of the target SNS topic or SQS queue
See examples for setting policies in Amazon SQS and Amazon SNS.
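As an alternative to the AWS console, the cross-account policy above can be attached with boto3. The sketch below is a minimal, illustrative example; the queue URL, queue ARN and topic ARN are placeholders, and the policy is the document shown above with the variables substituted.
import json
import boto3

# Cross-account policy from the example above, with the variables substituted.
def ica_policy(action: str, resource_arn: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::079623148045:root"},
            "Action": action,
            "Resource": resource_arn,
        }]
    })

# SQS delivery target: attach the policy to the queue (URL and ARN are placeholders).
sqs = boto3.client("sqs")
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/ica-events",
    Attributes={"Policy": ica_policy("SQS:SendMessage",
                                     "arn:aws:sqs:us-east-1:123456789012:ica-events")},
)

# SNS delivery target: attach the policy to the topic (ARN is a placeholder).
sns = boto3.client("sns")
sns.set_topic_attributes(
    TopicArn="arn:aws:sns:us-east-1:123456789012:ica-events",
    AttributeName="Policy",
    AttributeValue=ica_policy("SNS:Publish",
                              "arn:aws:sns:us-east-1:123456789012:ica-events"),
)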
To create a subscription to deliver events to an Amazon SNS topic, one can use either GUI or API endpoints.
To create a subscription via GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Select an event from the dropdown menu, insert optional filter, select the channel type (SNS), and then insert the ARN from the target SNS topic and the AWS region.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
To create a subscription to deliver events to an Amazon SQS queue, you can use either GUI or API endpoints.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > ICA event. Next, select an event from the dropdown menu, choose SQS as the way to receive the notifications, enter your SQS URL, and if applicable for that event, choose a payload version. Not all payload versions are applicable for all events and targets, so the system will filter the options out for you. Finally, you can enter a filter expression to filter which events are relevant for you. Only those events matching the expression will be received.
To create a subscription via API, use the endpoint /api/notificationChannel to create a channel and then /api/projects/{projectId}/notificationSubscriptions to create a notification subscription.
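As a minimal sketch of that API route, the Python snippet below creates a channel and then a project subscription with the requests library. The request body fields shown (type, sqsQueueUrl, eventCode, filter, channelId) are illustrative assumptions; consult the API reference for the exact schema, and replace the host, project ID, queue URL and API key with your own values.
import json
import os
import requests

ICA_BASE = "https://ica.illumina.com/ica/rest"          # base URL (assumption)
PROJECT_ID = "your-project-id"                          # placeholder
headers = {
    "X-API-Key": os.environ["ICA_API_KEY"],
    "Content-Type": "application/vnd.illumina.v3+json",
    "accept": "application/vnd.illumina.v3+json",
}

# 1. Create the delivery channel (field names are illustrative).
channel = requests.post(
    f"{ICA_BASE}/api/notificationChannels",
    headers=headers,
    data=json.dumps({"type": "SQS",
                     "sqsQueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/ica-events"}),
).json()

# 2. Subscribe the project to an event using that channel.
subscription = requests.post(
    f"{ICA_BASE}/api/projects/{PROJECT_ID}/notificationSubscriptions",
    headers=headers,
    data=json.dumps({
        "eventCode": "ICA_EXEC_002",                    # analysis success
        "filter": "[?($.status == 'SUCCEEDED')]",       # optional JsonPath filter
        "channelId": channel["id"],                     # assumes the response returns an id
    }),
).json()
print(subscription)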
Messages delivered to AWS SQS contain the following event body attributes:
correlationId
GUID used to identify the event
timestamp
Date when the event was sent
eventCode
Event code of the event
description
Description of the event
payload
Event payload
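As an illustration only, the following boto3 sketch polls an SQS delivery target and reads the attributes listed above from the message body; the queue URL is a placeholder and the body is assumed to be the raw JSON event shown in the example below.
import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ica-events"  # placeholder

sqs = boto3.client("sqs")
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)

for message in resp.get("Messages", []):
    event = json.loads(message["Body"])
    # The event body carries the attributes described above.
    print(event["eventCode"], event["correlationId"], event["timestamp"])
    print(event["description"])
    payload = event["payload"]  # event-specific payload, e.g. project data details
    # Delete the message once it has been processed.
    sqs.delete_message(QueueUrL:=QUEUE_URL if False else QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]) if False else sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])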
The following example is a Data Updated event payload sent to an AWS SQS delivery target (condensed for readability):
{
"correlationId": "2471d3e2-f3b9-434c-ae83-c7c7d3dcb4e0",
"timestamp": "2022-10-06T07:51:09.128Z",
"eventCode": "ICA_DATA_100",
"description": "Data updates",
"payload": {
"id": "fil.8f6f9511d70e4036c60908daa70ea21c",
...
}
}
Notification subscriptions will trigger for all events matching the configured event type. A filter may be configured on a subscription to limit the matching strategy to only those event payloads which match the filter.
The filter expressions leverage the JsonPath library for describing the matching pattern to be applied to event payloads. The filter must be in the format [?(<expression>)].
The Analysis Success event delivers a JSON event payload matching the Analysis data model (as output from the API to retrieve a project analysis).
{
"id": "0c2ed19d-9452-4258-809b-0d676EXAMPLE",
"timeCreated": "2021-09-20T12:23:18Z",
"timeModified": "2021-09-20T12:43:02Z",
"ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
"tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
"reference": "210920-1-CopyToolDev-9d78096d-35f4-47c9-b9b6-e0cbcEXAMPLE",
"userReference": "210920-1",
"pipeline": {
"id": "20261676-59ac-4ea0-97bd-8a684EXAMPLE",
"timeCreated": "2021-08-25T01:49:41Z",
"timeModified": "2021-08-25T01:49:41Z",
"ownerId": "15d51d71-b8a1-4b38-9e3d-74cdfEXAMPLE",
"tenantId": "022c9367-8fde-48fe-b129-741a4EXAMPLE",
"code": "CopyToolDev",
"description": "CopyToolDev",
"language": "CWL",
"pipelineTags": {
"technicalTags": ["Demo"]
}
},
"status": "SUCCEEDED",
"startDate": "2021-09-20T12:23:21Z",
"endDate": "2021-09-20T12:43:00Z",
"summary": "",
"finishedSteps": 0,
"totalSteps": 1,
"tags": {
"technicalTags": [],
"userTags": [],
"referenceTags": []
}
}
The below examples demonstrate various filters operating on the above event payload:
Filter on a pipeline, with a code that starts with ‘Copy’. You’ll need a regex expression for this:
[?($.pipeline.code =~ /Copy.*/)]
Filter on status (note that the Analysis success event is only emitted when the analysis is successful):
[?($.status == 'SUCCEEDED')]
Both payload versions V3 and V4 guarantee that the final state (SUCCEEDED, FAILED, FAILED_FINAL, ABORTED) is emitted, but the intermediate states depend on the flow, so not every intermediate state is guaranteed:
V3 can have the states REQUESTED - IN_PROGRESS - SUCCEEDED
V4 can have the states REQUESTED - QUEUED - INITIALIZING - PREPARING_INPUTS - IN_PROGRESS - GENERATING_OUTPUTS - SUCCEEDED
Filter on pipeline, having a technical tag “Demo":
[?('Demo' in $.pipeline.pipelineTags.technicalTags)]
Combination of multiple expressions using &&. It's best practice to surround each individual expression with parentheses:
[?(($.pipeline.code =~ /Copy.*/) && $.status == 'SUCCEEDED')]
Examples for other events
Filtering ICA_DATA_104 on owning project name. The top level keys on which you can filter are under the payload key, so payload is not included in this filter expression.
[?($.details.owningProjectName == 'my_project_name')]
Custom events enable triggering notification subscriptions using event types beyond the system-defined event types. When creating a custom subscription, a custom event code may be specified to use within the project. Events may then be sent to the specified event code using a POST API with the request body specifying the event payload.
Custom events can be defined using the API. In order to create a custom event for your project please follow the steps below:
Create a new custom event POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents
a. Your custom event code must be 1-20 characters long, e.g. 'ICA_CUSTOM_123'.
b. That event code will be used to reference that custom event type.
Create a new notification channel POST {ICA_URL}/ica/rest/api/notificationChannels
a. If there is already a notification channel created with the desired configuration within the same project, it is also possible to get the existing channel ID using the call GET {ICA_URL}/ica/rest/api/notificationChannels.
Create a notification subscription POST {ICA_URL}/ica/rest/api/projects/{projectId}/customNotificationSubscriptions.
a. Use the event code created in step 1.
b. Use the channel ID from step 2.
To create a subscription via the GUI, select Projects > your_project > Project Settings > Notifications > +Create > Custom event.
Once the steps above have been completed successfully, the call from the first step POST {ICA_URL}/ica/rest/api/projects/{projectId}/customEvents could be reused with the same event code to continue sending events through the same channel and subscription.
The following is a sample Python function used inside an ICA pipeline to post custom events for each failed metric (ICA_HOST, PROJECT_ID and ICA_API_KEY are assumed to be defined elsewhere, for example read from environment variables or pipeline settings):
import json
import requests

def post_custom_event(metric_name: str, metric_value: str, threshold: str, sample_name: str):
    api_url = f"{ICA_HOST}/api/projects/{PROJECT_ID}/customEvents"
    headers = {
        "Content-Type": "application/vnd.illumina.v3+json",
        "accept": "application/vnd.illumina.v3+json",
        "X-API-Key": f"{ICA_API_KEY}"
    }
    content = {"code": "ICA_CUSTOM_123", "content": {"metric_name": metric_name, "metric_value": metric_value, "threshold": threshold, "sample_name": sample_name}}
    json_data = json.dumps(content)
    response = requests.post(api_url, data=json_data, headers=headers)
    if response.status_code != 204:
        print(f"[EVENT-ERROR] Could not post metric failure event for the metric {metric_name} (sample {sample_name}).")


The following is a list of available Bench CLI commands and their options.
Please refer to the examples from the Illumina website for more details.
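For orientation, the hedged sketch below shows how a script or notebook running inside a Bench workspace (where workspace-ctl is assumed to be on the PATH) might wrap the CLI with Python's subprocess module to mount project data; the source and mount paths are placeholders, and the flags are documented in the reference below.
import subprocess

# Placeholder values; see the "data create-mount" flags in the reference below.
SOURCE = "/data/project/myData/hg38"   # or an ICA folder id such as fol.xxxxxxxx
MOUNT_PATH = "hg38data"                # created under /data/mounts/

result = subprocess.run(
    [
        "workspace-ctl", "data", "create-mount",
        "--source", SOURCE,
        "--mount-path", MOUNT_PATH,
        "--mode", "read-only",
        "--wait",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)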
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.cmd execute error: accepts 1 arg(s), received 0Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl compute [command] --help" for more information about a command.workspace-ctl compute get-cluster-details
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute get-logs
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute get-pools
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl compute scale-pool
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl data [command] --help" for more information about a command.workspace-ctl data create-mount
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree Display commands as a tree
--help-verbose Extended help topics and options
--mode string Enum:["read-only","read-write"]. Mount mode i.e. read-only, read-write
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl data delete-mount
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl data get-mounts
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl [flags]
workspace-ctl [command]
Available Commands:
completion Generate completion script
compute
data
help Help about any command
software
workspace
Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
-h, --help help for workspace-ctl
--help-tree
--help-verbose
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl [command] --help" for more information about a command.workspace-ctl help completion
To load completions:
Bash:
$ source <(yourprogram completion bash)
# To load completions for each session, execute once:
# Linux:
$ yourprogram completion bash > /etc/bash_completion.d/yourprogram
# macOS:
$ yourprogram completion bash > /usr/local/etc/bash_completion.d/yourprogram
Zsh:
# If shell completion is not already enabled in your environment,
# you will need to enable it. You can execute the following once:
$ echo "autoload -U compinit; compinit" >> ~/.zshrc
# To load completions for each session, execute once:
$ yourprogram completion zsh > "${fpath[1]}/_yourprogram"
# You will need to start a new shell for this setup to take effect.
fish:
$ yourprogram completion fish | source
# To load completions for each session, execute once:
$ yourprogram completion fish > ~/.config/fish/completions/yourprogram.fish
PowerShell:
PS> yourprogram completion powershell | Out-String | Invoke-Expression
# To load completions for every new session, run:
PS> yourprogram completion powershell > yourprogram.ps1
# and source this file from your PowerShell profile.
Usage:
workspace-ctl completion [bash|zsh|fish|powershell]
Flags:
-h, --help help for completion
workspace-ctl help compute
Usage:
workspace-ctl compute [flags]
workspace-ctl compute [command]
Available Commands:
get-cluster-details
get-logs
get-pools
scale-pool
Flags:
-h, --help help for compute
--help-tree
--help-verbose
Use "workspace-ctl compute [command] --help" for more information about a command.workspace-ctl help compute get-cluster-details
Usage:
workspace-ctl compute get-cluster-details [flags]
Flags:
-h, --help help for get-cluster-details
--help-tree
--help-verbose
workspace-ctl help compute get-logs
Usage:
workspace-ctl compute get-logs [flags]
Flags:
-h, --help help for get-logs
--help-tree
--help-verbose
workspace-ctl help compute get-pools
Usage:
workspace-ctl compute get-pools [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for get-pools
--help-tree
--help-verbose
workspace-ctl help compute scale-pool
Usage:
workspace-ctl compute scale-pool [flags]
Flags:
--cluster-id string Required. Cluster ID
-h, --help help for scale-pool
--help-tree
--help-verbose
--pool-id string Required. Pool ID
--pool-member-count int Required. New pool size
workspace-ctl help data
Usage:
workspace-ctl data [flags]
workspace-ctl data [command]
Available Commands:
create-mount Create a data mount under /data/mounts. Return newly created mount.
delete-mount Delete a data mount
get-mounts Returns the list of data mounts
Flags:
-h, --help help for data
--help-tree
--help-verbose
Use "workspace-ctl data [command] --help" for more information about a command.workspace-ctl help data create-mount
Create a data mount under /data/mounts. Return newly created mount.
Usage:
workspace-ctl data create-mount [flags]
Aliases:
create-mount, mount
Flags:
-h, --help help for create-mount
--help-tree
--help-verbose
--mount-path string Where to mount the data, e.g. /data/mounts/hg38data (or simply hg38data)
--source string Required. Source data location, e.g. /data/project/myData/hg38 or fol.bc53010dec124817f6fd08da4cf3c48a (ICA folder id)
--wait Wait for new mount to be available on all nodes before sending response
--wait-timeout int Max number of seconds for wait option. Absolute max: 300 (default 300)
workspace-ctl help data delete-mount
Delete a data mount
Usage:
workspace-ctl data delete-mount [flags]
Aliases:
delete-mount, unmount
Flags:
-h, --help help for delete-mount
--help-tree
--help-verbose
--id string Id of mount to remove
--mount-path string Path of mount to remove
workspace-ctl help data get-mounts
Returns the list of data mounts
Usage:
workspace-ctl data get-mounts [flags]
Aliases:
get-mounts, list-mounts
Flags:
-h, --help help for get-mounts
--help-tree
--help-verbose
workspace-ctl help help
Help provides help for any command in the application.
Simply type workspace-ctl help [path to command] for full details.
Usage:
workspace-ctl help [command] [flags]
Flags:
-h, --help help for help
workspace-ctl help software
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Use "workspace-ctl software [command] --help" for more information about a command.workspace-ctl help software get-server-metadata
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
workspace-ctl help software get-software-settings
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
workspace-ctl help workspace
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Use "workspace-ctl workspace [command] --help" for more information about a command.workspace-ctl help workspace get-cluster-settings
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
workspace-ctl help workspace get-connection-details
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
workspace-ctl help workspace get-workspace-settings
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Usage:
workspace-ctl software [flags]
workspace-ctl software [command]
Available Commands:
get-server-metadata
get-software-settings
Flags:
-h, --help help for software
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl software [command] --help" for more information about a command.workspace-ctl software get-server-metadata
Usage:
workspace-ctl software get-server-metadata [flags]
Flags:
-h, --help help for get-server-metadata
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl software get-software-settings
Usage:
workspace-ctl software get-software-settings [flags]
Flags:
-h, --help help for get-software-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Usage:
workspace-ctl workspace [flags]
workspace-ctl workspace [command]
Available Commands:
get-cluster-settings
get-connection-details
get-workspace-settings
Flags:
-h, --help help for workspace
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
Use "workspace-ctl workspace [command] --help" for more information about a command.workspace-ctl workspace get-cluster-settings
Usage:
workspace-ctl workspace get-cluster-settings [flags]
Flags:
-h, --help help for get-cluster-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl workspace get-connection-details
Usage:
workspace-ctl workspace get-connection-details [flags]
Flags:
-h, --help help for get-connection-details
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
workspace-ctl workspace get-workspace-settings
Usage:
workspace-ctl workspace get-workspace-settings [flags]
Flags:
-h, --help help for get-workspace-settings
--help-tree
--help-verbose
Global Flags:
--X-API-Key string
--base-path string For example: / (default "/")
--config string config file path
--debug output debug logs
--dry-run do not send the request to server
--hostname string hostname of the service (default "api:8080")
--print-curl print curl equivalent do not send the request to server
--scheme string Choose from: [http] (default "http")
You can use your own S3 bucket with Illumina Connected Analytics (ICA) for data storage. This section describes how to configure your AWS account to allow ICA to connect to an S3 bucket.
When configuring a new project in ICA to use a preconfigured S3 bucket, create a folder on your S3 bucket in the AWS console. This folder will be connected to ICA as a prefix.
Failure to create a folder will result in the root folder of your S3 bucket being assigned which will block your S3 bucket from being used for other ICA projects with the error "Conflict while updating file/folder. Please try again later."
Because S3 handles folders as key prefixes and does not send events for S3 folders, the following restrictions must be taken into account for ICA project data stored in S3.
When creating an empty folder in S3, it will not be visible in ICA.
When moving folders in S3, the original, but empty, folder will remain visible in ICA and must be manually deleted there.
When deleting a folder and its contents in S3, the empty folder will remain visible in ICA and must be manually deleted there.
Projects cannot be created with ./ as prefix since S3 does not allow uploading files with this key prefix.
The AWS S3 bucket must exist in the same AWS region as the ICA project. Refer to the table below for a mapping of ICA project regions to AWS regions:
(*) BSSH is not currently deployed on the South Korea instance, resulting in limited functionality in this region with regard to sequencer integration.
You can enable SSE using an Amazon S3-managed key (SSE-S3). Instructions for using KMS-managed (SSE-KMS) keys are found .
ICA requires cross-origin resource sharing (CORS) permissions to write to the S3 bucket for uploads via the browser. Refer to the AWS documentation (expand the "Using the S3 console" section) for instructions on enabling CORS via the AWS Management Console.
In the cross-origin resource sharing (CORS) section, enter the following content.
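If you prefer to apply the CORS configuration programmatically instead of through the console, the following boto3 sketch sets the same rules shown in the CORS JSON later in this section; the bucket name is a placeholder.
import boto3

BUCKET = "YOUR_BUCKET_NAME"  # placeholder

cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["HEAD", "GET", "PUT", "POST", "DELETE"],
            "AllowedOrigins": ["https://ica.illumina.com"],
            "ExposeHeaders": ["ETag", "x-amz-meta-custom-header"],
        }
    ]
}

s3 = boto3.client("s3")
s3.put_bucket_cors(Bucket=BUCKET, CORSConfiguration=cors_configuration)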
ICA requires specific permissions to access data in an AWS S3 bucket. These permissions are contained in an AWS IAM Policy.
Refer to the documentation for instructions on creating an AWS IAM Policy via the AWS Management Console. Use the following configuration during the process:
Paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
On Versioned or Suspended buckets, paste the JSON policy document below. Note that the example below provides access to all object prefixes in the bucket.
Replace YOUR_BUCKET_NAME with the name of the S3 bucket you created for ICA. Replace YOUR_FOLDER_NAME with the name of the folder in your S3 bucket.
To create the IAM Policy via the AWS CLI, create a local file named illumina-ica-admin-policy.json containing the policy content above and run the following command. Be sure the path to the policy document (--policy-document) leads to the path where you saved the file:
An AWS IAM User is needed to create an Access Key for ICA to connect to the AWS S3 Bucket. The policy will be attached to the IAM user to grant the user the necessary permissions.
Refer to the documentation for instructions on creating an AWS IAM User via the AWS Management Console. Use the following configuration during the process:
(optional) Set user name to "illumina_ica_admin"
Select the Programmatic access option for the type of access
Select Attach existing policies directly when setting the permissions, and choose the policy created in the previous section
(Optional) Retrieve the Access Key ID and Secret Access Key by choosing to Download .csv
To create the IAM user and attach the policy via the AWS CLI, enter the following command (AWS IAM users are global resources and do not require a region to be specified). This command creates an IAM user illumina_ica_admin, retrieves your AWS account number, and then attaches the policy to the user.
If the Access Key information was already retrieved during the IAM user creation step, skip this step.
Refer to the AWS documentation for instructions on creating an AWS Access Key via the AWS Console. See the "To create, modify, or delete another IAM user's access keys (console)" sub-section.
Use the command below to create the Access Key for the illumina_ica_admin IAM user. Note the SecretAccessKey is sensitive and should be stored securely. The access key is only displayed when this command is executed and cannot be recovered. A new access key must be created if it is lost.
The AccessKeyId and SecretAccessKey values will be provided to ICA in the next step.
Connecting your S3 bucket to ICA does not require any additional bucket policies.
By default, public access to the S3 bucket is allowed. For increased security, it is advised to block public access with the following command:
To block public access to S3 buckets on account level, you can use the AWS Console on the website.
To connect your S3 account to ICA, you need to add a storage credential in ICA containing the Access Key ID and Secret Access Key created in the previous step. From the ICA home screen, navigate to System Settings > Credentials > Create > Storage Credential to create a new storage credential.
Provide a name for the storage credentials, ensure the type is set to "AWS user" and provide the Access Key ID and Secret Access Key.
With the secret credentials created, a storage configuration can be created using the secret credential. Refer to the instructions for creating a storage configuration for details.
ICA uses AssumeRole to copy and move objects from a bucket in an AWS account to another bucket in another AWS account. To allow cross account access to a bucket, the following policy statements must be added in the S3 bucket policy:
Be sure to replace the following fields:
ASSUME_ROLE_ARN: Replace this field with the ARN of the cross account role you want to give permission to. Refer to the table below to determine which region-specific Role ARN should be used.
YOUR_BUCKET_NAME: Replace this field with the name of the S3 bucket you created for ICA.
The ARN of the cross account role you want to give permission to is specified in the Principal. Refer to the table below to determine which region-specific Role ARN should be used.
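As a minimal, hedged sketch, the boto3 snippet below appends the cross-account statement (shown in full later in this section) to the existing bucket policy; the bucket name is a placeholder and the role ARN must be taken from the region table below.
import json
import boto3
from botocore.exceptions import ClientError

BUCKET = "YOUR_BUCKET_NAME"                                             # placeholder
ASSUME_ROLE_ARN = "arn:aws:iam::079623148045:role/ica_use1_crossacct"   # US example; see the table below

cross_account_statement = {
    "Sid": "AllowCrossAccountAccess",
    "Effect": "Allow",
    "Principal": {"AWS": ASSUME_ROLE_ARN},
    "Action": ["s3:PutObject", "s3:DeleteObject", "s3:ListMultipartUploadParts",
               "s3:AbortMultipartUpload", "s3:GetObject"],
    "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
}

s3 = boto3.client("s3")

# put_bucket_policy replaces the whole policy document, so merge with any existing policy.
try:
    policy = json.loads(s3.get_bucket_policy(Bucket=BUCKET)["Policy"])
except ClientError as err:
    if err.response["Error"]["Code"] != "NoSuchBucketPolicy":
        raise
    policy = {"Version": "2012-10-17", "Statement": []}

policy["Statement"].append(cross_account_statement)
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))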
The following are common issues encountered when connecting an AWS S3 bucket through a storage configuration.
This error occurs when an existing bucket notification's event information overlaps with the notifications ICA is trying to add. Amazon S3 only allows overlapping events with non-overlapping prefixes. Depending on the conflicting notifications, the error can be presented in any of the following ways:
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid parameters for volume configuration: found conflicting storage container notifications with overlapping prefixes
Failed to update bucket policy: Configurations overlap. Configurations on the same bucket cannot share a common event type
Solution:
In the Amazon S3 Console, review your current S3 bucket's notification configuration and look for prefixes that overlap with your Storage Configuration's key prefix
Delete the existing notification that overlaps with your Storage Configuration's key prefix
ICA will perform a series of steps in the background to re-verify the connection to your bucket.
This error can occur when recreating a recently deleted storage configuration. To fix the issue, you have to delete the bucket notifications:
In the Amazon S3 Console, select the bucket for which you need to delete the notifications from the list.
Choose Properties.
Navigate to the Event Notifications section, select the check boxes for the event notifications named gds:objectcreated, gds:objectremoved and gds:objectrestore, and click Delete.
Wait 15 minutes for the storage to become available in ICA
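The same cleanup can be scripted with boto3, as in the hedged sketch below, which keeps any notifications you created yourself and removes only those whose name starts with gds:; the bucket name is a placeholder.
import boto3

BUCKET = "YOUR_BUCKET_NAME"  # placeholder

s3 = boto3.client("s3")
config = s3.get_bucket_notification_configuration(Bucket=BUCKET)

# Keep only notifications that were not created by ICA (ICA uses names starting with "gds:").
cleaned = {}
for key in ("TopicConfigurations", "QueueConfigurations", "LambdaFunctionConfigurations"):
    kept = [c for c in config.get(key, []) if not c.get("Id", "").startswith("gds:")]
    if kept:
        cleaned[key] = kept
if "EventBridgeConfiguration" in config:
    cleaned["EventBridgeConfiguration"] = config["EventBridgeConfiguration"]

s3.put_bucket_notification_configuration(Bucket=BUCKET, NotificationConfiguration=cleaned)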
Australia
ap-southeast-2
Canada
ca-central-1
Germany
eu-central-1
India
ap-south-1
Indonesia
ap-southeast-3
Israel
il-central-1
Japan
ap-northeast-1
Singapore
ap-southeast-1
South Korea*
ap-northeast-2
UK
eu-west-2
United Arab Emirates
me-central-1
United States
us-east-1
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"HEAD",
"GET",
"PUT",
"POST",
"DELETE"
],
"AllowedOrigins": [
"https://ica.illumina.com"
],
"ExposeHeaders": [
"ETag",
"x-amz-meta-custom-header"
]
}
]
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutBucketNotification",
"s3:ListBucket",
"s3:GetBucketNotification",
"s3:GetBucketLocation",
"s3:ListBucketVersions",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_FOLDER_NAME/*"
},
{
"Effect": "Allow",
"Action": [
"sts:GetFederationToken"
],
"Resource": [
"*"
]
}
]
}
aws iam create-policy --policy-name illumina-ica-admin-policy --policy-document file://illumina-ica-admin-policy.json
aws iam create-user --user-name illumina_ica_admin
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws iam attach-user-policy --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/illumina-ica-admin-policy --user-name illumina_ica_admin
aws iam create-access-key --user-name illumina_ica_admin
"AccessKey": {
"UserName": "illumina_ica_admin",
"AccessKeyId": "<access key id>",
"Status": "Active",
"SecretAccessKey": "<secret access key>",
"CreateDate": "2020-10-22 09:42:24+00:00"
}
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Principal": {
"AWS": "*"
},
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:RestoreObject",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
"Condition": {
"ArnNotLike": {
"aws:PrincipalArn": [
"arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_IAM_USER",
"arn:aws:sts::YOUR_ACCOUNT_ID:federated-user/*"
]
}
}
}
]
}
aws s3api put-public-access-block --bucket ${BUCKET_NAME} --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCrossAccountAccess",
"Effect": "Allow",
"Principal": {
"AWS": "ASSUME_ROLE_ARN"
},
"Action": [
"s3:PutObject",
"s3:DeleteObject",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::YOUR_BUCKET_NAME",
"arn:aws:s3:::YOUR_BUCKET_NAME/*"
]
}
]
}
Australia (AU)
arn:aws:iam::079623148045:role/ica_aps2_crossacct
Canada (CA)
arn:aws:iam::079623148045:role/ica_cac1_crossacct
Germany (EU)
arn:aws:iam::079623148045:role/ica_euc1_crossacct
India (IN)
arn:aws:iam::079623148045:role/ica_aps3_crossacct
Indonesia (ID)
arn:aws:iam::079623148045:role/ica_aps4_crossacct
Israel (IL)
arn:aws:iam::079623148045:role/ica_ilc1_crossacct
Japan (JP)
arn:aws:iam::079623148045:role/ica_apn1_crossacct
Singapore (SG)
arn:aws:iam::079623148045:role/ica_aps1_crossacct
South Korea (KR)
arn:aws:iam::079623148045:role/ica_apn2_crossacct
UK (GB)
arn:aws:iam::079623148045:role/ica_euw2_crossacct
United Arab Emirates (AE)
arn:aws:iam::079623148045:role/ica_mec1_crossacct
United States (US)
arn:aws:iam::079623148045:role/ica_use1_crossacct
Access Forbidden
Access forbidden: {message}
Mostly occurs because of a lack of permissions. Fix: review the IAM policy, bucket policy, and ACLs for the required permissions.
Unsupported principal
Unsupported principal: The policy type ${policy_type} does not support the Principal element. Remove the Principal element.
This can indicate that the S3 bucket policy settings have been added to the IAM policy by mistake.
Conflict
System topic is not in a valid state
Conflict
Found conflicting storage container notifications with overlapping prefixes
Conflict
Found conflicting storage container notifications for {prefix}{eventTypeMsg}
Conflict
Found conflicting storage container notifications with overlapping prefixes{prefixMsg}{eventTypeMsg}
Customer Container Notification Exists
Volume Configuration cannot be provisioned: storage container is already set up for customer's own notification
Invalid Access Key ID
Failed to update bucket policy: The AWS Access Key Id you provided does not exist in our records.
Check the status of the AWS Access Key ID in the console. If not active, activate it. If missing, create it.
Invalid Parameter
Missing credentials for storage container
Check the storage credential. AccessKeyId and/or SecretAccessKey is not set.
Invalid Parameter
Missing bucket name for storage container
Bucket name has not been set for the storage configuration.
Invalid Parameter
The storage container name has invalid characters
Storage container name can only contain lowercase letters, numbers, hyphens, and periods.
Invalid Parameter
Storage Container '{storageContainer}' does not exist
Update the storage configuration container to a valid S3 bucket.
Invalid Parameter
Invalid parameters for volume configuration: {message}
Invalid Storage Container Location
Storage container must be located in the {region} region
Update storage configuration region to match storage container region.
Invalid Storage Container Location
Storage container must be located in one of the following regions: {regions}
Update storage configuration region to match storage container region.
Missing Configuration
Missing queue name for storage container notification
Missing Configuration
Missing system topic name for storage container notification
Missing Configuration
Missing lambda ARN for storage container notification
Missing Configuration
Missing subscription name for storage container notification
Missing Storage Account Settings
The storage account '{storageAccountName}' needs HNS (Hierarchical Namespace) enabled.
Missing Storage Container Settings
Missing settings for storage container
ICA provides a Service Connector, which is a small program that runs on your local machine to sync data between the platform's cloud-hosted data store and your local computer or server. The Service Connector securely uploads data or downloads results using TLS 1.2. In order to do this, the Connector makes 2 connections:
A control connection, which the Connector uses to get configuration information from the platform, and to update the platform about its activities
A connection towards the storage node, used to transfer the actual data between your local or network storage and your cloud-based ICA storage.
This Connector runs in the background, and configuration is done in the Illumina Connected Analytics (ICA) platform, where you can add upload and download rules to meet the requirements of the current project and any new projects you may create.
The Service Connector looks at any new files and checks their size. As long as the file size is changing, it knows data is still being added to the file and it is not ready for transfer. Only when the file size is stable and does not change anymore will it consider the file to be complete and initiate transfer. Despite this, it is still best practice to not connect the Service Connector to active folders which are used as streaming output for other processes as this can result in incomplete files being transferred when the active processes have extended compute periods in which the file size remains unchanged.
The Service Connector handles integrity checking during file transfer, which requires calculating hashes on the data. In addition, transmission speed depends on the available data transfer bandwidth and connection stability. For these reasons, uploading large amounts of data can take considerable time. This can in turn result in temporarily seeing empty folders at the destination location, since these are created at the beginning of the transfer process.
Select Projects > your_project > Project Settings > Connectivity > Service Connectors.
Select + Create.
Fill out the fields in the New Connector configuration page.
Name - Enter the name of the connector.
Status - This is automatically updated with the actual status, you do not need to enter anything here.
Debug Information Accessible by Illumina (optional) - Illumina support can request connector debugging information to help diagnose issues. For security reasons, support can only collect this data if the option Debug Information Accessible by Illumina is active. You can choose to either proactively enable this when encountering issues to speed up diagnosis or to only activate it when support requests access. You can at any time revoke access again by deselecting the option.
Description (optional) - Enter any additional information you want to show for this connector.
Mode (required) - Specify if the connector can upload data, download data, both or neither.
Operating system (required) - Select your server or computer operating system.
Add any upload or download rules. See Connector Rules below.
Select Save and download the connector (top right). An initialization key will be displayed in the platform now. Copy this value as it will be needed during installation.
Launch the installer after the download completes and follow the on-screen prompts to complete the installation, including entering the initialization key copied in the previous step. Do not install the connector in the upload folder as this will result in the connector attempting to upload itself and the associated log files.
Run the downloaded .exe file. During the installation, the installer will ask for the initialization key. Fill out the initialization key you see in the platform.
The installer will create an Illumina Service Connector, register it as a Windows service, and start the service. This means that if you wait about 60 seconds and then refresh the screen in the platform using the refresh button in the top right corner of the page, the connector should display as connected.
You can only install 1 connector on Windows. If for some reason, you need to install a new one, first uninstall the old one. You only need to do this when there is a problem with your existing connector. Upgrading a connector is also possible. To do this, you don’t need to uninstall the old one first.
Double click the downloaded .dmg file. Double click Illumina Service Connector in the window that opens to start the installer. Run through the installer, and fill out the initialization key when asked for it.
To start the connector once installed or after a reboot, open the app. You can find the app on the location where you installed it. The connector icon will appear in your dock when the app is running.
In the platform on the Connectivity page, you can check whether your local connector has been connected with the platform. This can take 60 seconds after you started your connector locally, and you may need to refresh the Connectivity page using the refresh button in the top right corner of the page to see the latest status of your connector.
The connector app needs to be closed to shut down your computer. You can do this from within your dock.
Installations require Java 11 or later. You can check this with 'java --version' from a command line terminal. With Java installed, you can run the installer from the command line using the command bash illumina_unix_develop.sh.
Depending on whether you have an X server running or not, it will display a UI or follow a command line installation procedure. You can force a command line installation by adding a -c flag: bash illumina_unix_develop.sh -c.
The connector can be started by running ./illuminaserviceconnector start from the folder in which the connector was installed.
In the upload and download rules, you define different properties when setting up a connector. A connector can be used by multiple projects and a connector can have multiple upload and download rules. Configuration can be changed anytime. Changes to the configuration will be applied approximately 60 seconds after changes are made in ICA if the connector is already connected. If the connector is not already started when configuration changes are made in ICA, it will take about 60 seconds after the connector is started for the configuration changes to be propagated to the connector. The following are the different properties you can configure when setting up a connector. After adding a rule and installing the connector, you can use the Active checkbox to disable rules.
Below is an example of a new connector setup with an Upload Rule to find all files ending with .tar or .tar.gz located within the local folder C:\Users\username\data\docker-images.
An upload rule tells the connector which folder on your local disk it needs to watch for new files to upload. The connector contacts the platform every minute to pick up changes to upload rules. To configure upload rules for different projects, first switch into the desired project and select Connectivity. Choose the connector from the list and select Click to add a new upload rule and define the rule. The project field will be automatically filled with the project you are currently within.
Name
Name of the upload rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Local folder
The folder path on the local machine where files to be uploaded are stored.
File pattern
Files with filenames that match the string/pattern will be uploaded.
Location
The location the data will be uploaded to.
Project
The project the data will be uploaded to.
Description
Additional information about the upload rule.
Assign Format
Select which data format tag the uploaded files will receive. This is used for various things like filtering.
Data owner
The owner of the data after upload.
When you schedule downloads in the platform, you can choose which connector needs to download the files. That connector needs some way to know how and where it needs to download your files. That’s what a download rule is for. The connector contacts the platform every minute to pick up changes to download rules. The following are the different download rule settings.
Name
Name of the download rule.
Active
Set to true to have this rule be active. This allows you to deactivate rules without deleting them.
Order of execution
If using multiple download rules, set the order the rules are performed.
Target Local folder
The folder path on the local machine where the files will be downloaded to.
Description
Additional information about the download rule.
Format
The format the files must comply with in order to be scheduled for download.
Project
The projects the rule applies to.
You can see the service connector status by the color indicator.
green
Connected/Active
orange
Pending installation
grey
Installed/Inactive
red
-
When you set up your connector for the first time, and your sample files are located on a shared drive, it’s best to create a folder on your local disk, put one of the sample files in there, and do the connector setup with that folder. When this works, try to configure the shared drive.
Transfer to and from a shared drive may be quite slow. That means it can take up to 30 minutes after you configured a shared drive before uploads start. This is due to the integrity check the connector does for each file before it starts uploading. The connector can upload from or download to a shared drive, but there are a few conditions:
The drive needs to be mounted locally. X:\illuminaupload or /Volumes/shareddrive/illuminaupload will work, \\shareddrive\illuminaupload or smb://shareddrive/illuminaupload will not.
The user running the connector must have access to the shared drive without a password being requested.
The user who runs the Illumina Service Connector process on the Linux machine needs to have read, write and execute permissions on the installation folder.
Illumina might release new versions of the Service Connector, with improvements and/or bug fixes. You can easily download a new version of the Connector with the Download button on the Connectivity screen in the platform. After you downloaded the new installer, run it and choose the option ‘Yes, update the existing installation’.
To uninstall the connector, perform one of the following:
Windows and Linux: Run the uninstaller located in the folder where the connector was installed.
Mac: Move the Illumina Service Connector to your Trash folder.
The Connector has a log file containing technical information about what’s happening. When something doesn’t work, it often contains clues to why it doesn’t work. Interpreting this log file is not always easy, but it can help the support team to give a fast answer on what is wrong, so it is suggested to attach it to your email when you have upload or download problems. You can find this log file at the following location:
\<Installation Folder>\logs\BSC.out
Default: C:\Program Files (x86)\illumina\logs\BSC.out
/<Installation Directory>/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
Default: /Applications/Illumina Service Connector.app/Contents/java/app/logs/BSC.out
/<Installation Directory>/logs/BSC.out
Default: /usr/local/illumina
Windows
Service connector doesn't connect
First, try restarting your computer. If that doesn’t help, open the Services application (By clicking the Windows icon, and then typing services). In there, there should be a service called Illumina Service Connector. • If it doesn’t have status Running, try starting it (right mouse click -> start) • If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector. • If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
OS X
Service connector doesn't connect
Check whether the Connector is running. If it is, there should be an Illumina icon in your Dock. • If it doesn’t, log out and log back in. An Illumina service connector icon should appear in your dock. • If it still doesn’t, try starting the Connector manually from the Launchpad menu. • If it has status Running, and still does not connect, you might have a corporate proxy. Proxy configuration is currently not supported for the connector. • If you do not have a corporate proxy, and your connector still doesn’t connect, contact Illumina Technical Support, and include your connector BSC.out log files.
Linux
Service connector doesn't connect
Check whether the connector process is running with:
ps aux
Linux
Can’t define java version for connector
The connector makes use of Java version 8 or 11. If you run the installer and get the following error "Please define INSTALL4J_JAVA_HOME to point to a suitable JVM.": • When downloading the correct Java version from Oracle, note that there are two variables in the script that can be defined (INSTALL4J_JAVA_HOME_OVERRIDE & INSTALL4J_JAVA_PREFIX), but not INSTALL4J_JAVA_HOME, which is printed in the error message above. Export INSTALL4J_JAVA_HOME_OVERRIDE in your environment before running the installation script. • Note that the Java home should not point to the java executable, but to the JRE folder. For example: export INSTALL4J_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-1.8.0-openjdk-amd64 sh illumina_unix_1_13_2_0_35.sh
Linux
Corrupted installation script
If you get the following error message “gzip: sfx_archive.tar.gz: not in gzip format. I am sorry, but the installer file seems to be corrupted. If you downloaded that file please try it again. If you transfer that file with ftp please make sure that you are using binary mode.” : • This indicates the installation script file is corrupted. Editing the shell script will cause it to be corrupt. Please re-download the installation script from ICA.
Linux
Unsupported version error in log file
If the log file gives the following error "Unsupported major.minor version 52.0", an unsupported version of java is present. The connector makes use of java version 8 or 11.
Linux
Manage the connector via the CLI
• Connector installation issues:
It may be necessary to first make the connector installation script executable with:
chmod +x illumina_unix_develop.sh
Once it has been made executable, run the installation script with:
bash illumina_unix_develop.sh
It may be necessary to run with sudo depending on user permissions on the system:
sudo bash illumina_unix_develop.sh
If installing on a headless system, use the -c flag to do everything from the command line:
bash illumina_unix_develop.sh -c
• Start the connector with logging directly to the terminal (stdout), in case the log file is not present (likely due to the absence of Java version 8 or 11). From within the installation directory run:
./illuminaserviceconnector run
• Check status of connector. From within the install location run:
./illuminaserviceconnector status
• Stop the connector with:
./illuminaserviceconnector stop
• Restart the connector with:
./illuminaserviceconnector restart
Connector gets connected, but uploads won’t start
Create a new empty folder on your local disk, put a small file in there, and configure this folder as the upload folder. • If it works, and your sample files are on a shared drive, have a look at the shared drives section above. • If it works, and your sample files are on your local disk, there are a few possibilities: a) There is an error in how the upload folder name is configured in the platform. b) For big files, or on slow disks, the connector needs quite some time to start the transfer because it needs to calculate a hash to make sure there are no transfer errors. Wait up to 30 minutes, without changing anything in your Connector configuration. • If this doesn’t work, you might have a corporate proxy. Proxy configuration is currently not supported for the connector.
Upload from shared drive does not work
Follow the guidelines in the shared drives section above. Inspect the connector BSC.out log file for any error messages regarding the folder not being found. • If there is such a message, there are two options: a) An issue with the folder name, such as special characters and spaces. As a best practice, use only alphanumeric characters, underscores, dashes and periods. b) A permissions issue. In this case, ensure the user running the connector has read and write access, without a password being requested, to the network share. • If there are no messages indicating the folder cannot be found, it may be necessary to wait for some time until the integrity checks have been done. This check can take quite long on slow disks and slow networks.
Data Transfers are slow
Many factors can affect the speed: • Distance from the upload location to the storage location • Quality of the internet connection • Hardlines are preferred over WiFi • Restrictions on uploads and downloads imposed by the company or the provider. These factors can change every time the customer switches location (e.g. working from home).
The upload or download progress % goes down instead of up.
This is normal behavior. Instead of one continuous transmission, data is split into blocks so that whenever transmission issues occur, not all data has to be retried. This does result in dropping back to a lower % of transmission completed when retrying.
The Data section gives you access to the files and folders stored in the project as well as those linked to the project. Here, you can perform searches and data management operations such as moving, copying, deleting and (un)archiving.
ICA supports UTF-8 characters in file and folder names for data. Please follow the guidelines detailed below. (For more information about recommended approaches to file naming that can be applicable across platforms, please refer to the .)
See the list of supported
Data privacy should be carefully considered when adding data in ICA, either through storage configurations (i.e., AWS S3) or ICA data upload. Be aware that when adding data from cloud storage providers by creating a storage configuration, ICA will provide access to the data. Ensure the storage configuration source settings are correct and that uploads do not include unintended data, to avoid unintentional privacy breaches. More guidance can be found in the .
See
On the Projects > your_project > Data page, you can view file information and preview files.
To view file details, click the filename.
Run input tags identify the last 100 pipelines which used this file as input.
Connector tags indicate if the file was added via browser upload or connector.
To view file contents, select the checkbox at the beginning of the line and then select View from the top menu. Alternatively, you can first click on the filename to see the details and then click view to preview the file.
When you share the data view by sharing the link from your browser, filters and sorting are retained, so the recipient will see the same data in the same order.
To see the ongoing actions (copying from, copying to, moving from, moving to) on data in the data overview (Projects > your_project > Data), add the ongoing actions column from the column list. This contains a list of ongoing actions sorted by when they were created. You can also consult the data detail view for ongoing actions by clicking on the data in the overview. When clicking on an ongoing action itself, the data job details of the most recent created data job are shown.
For folders, the list of ongoing actions is displayed on top left of the folder details. When clicking the list, the data job details are shown of the most recent created data job of all actions.
When Secondary Data is added to a data record, those secondary data records are mounted in the same parent folder path as the primary data file when the primary data file is provided as an input to a pipeline. Secondary data is intended to work with the CWL secondaryFiles feature. This is commonly used with genomic data such as BAM files with companion BAM index files (refer to https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial6/ for an example).
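As a simple illustration of this layout (the paths and helper below are hypothetical, not part of ICA), a tool receiving a BAM input can expect its index to be mounted next to it and resolve it by name:

from pathlib import Path

def find_bam_index(bam_path):
    # Return the companion index expected next to a mounted BAM file, if present.
    bam = Path(bam_path)
    for candidate in (bam.parent / (bam.name + ".bai"), bam.with_suffix(".bai")):
        if candidate.exists():
            return candidate
    return None

# Example: /data/sample1.bam would resolve its index to /data/sample1.bam.bai
print(find_bam_index("/data/sample1.bam"))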
To hyperlink to data, use the following syntax:
Uploading data to the platform makes it available for consumption by analysis workflows and tools.
To upload data manually via the drag-and-drop interface in the platform UI, go to Projects > your_project > Data and either
Drag a file from your system into the Choose a file or drag it here box.
Select the Choose a file or drag it here box, and then choose a file. Select Open to upload the file.
Your files are added to the Data page with status partial during upload and become available when upload completes.
For instructions on uploading/downloading data via CLI, see .
You can copy data from the same project to a different folder or from another project to which you have access.
In order to copy data, the following rights must be assigned to the person copying the data:
The following restrictions apply when copying data:
Data in the "Partial" or "Archived" state will be skipped during a copy job.
To use data copy:
Go to the destination project for your data copy and proceed to Projects > your_project > Data > Manage > Copy From.
Optionally, use the filters (Type, Name, Status, Format or additional filters) to filter out the data or search with the search box.
Select the data (individual files or folders with data) you want to copy.
Select any meta data which you want to keep with the copied data (user tags, technical system tags or instrument information).
Select which action to take if the data already exists (overwrite existing data, don't copy, or keep both the original and the new copy by appending a number to the copied data).
Select Copy Data to copy the data to your project. You can see the progress in Projects > your_project > Activity > Batch Jobs and, if your browser permits it, a pop-up message will be displayed when the copy process completes.
The outcome can be one of the following:
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are copied.
PARTIALLY_SUCCEEDED - Some files and folders could be copied, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the copy process was running.
FAILED - None of the files and folders could be copied.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
There is a difference in copy type behavior between copying files and folders. The behavior is designed for files and it is best practice to not copy folders if there already is a folder with the same name in the destination location.
You can move data both within a project and between different projects to which you have access. If you allow notifications from your browser, a pop-up will appear when the move is completed.
Move From is used when you are in the destination location.
Move To is used when you are in the source location. Before moving the data, pre-checks are performed to verify that the data can be moved and no currently running operations are being performed on the folder. Conflicting jobs and missing permissions will be reported. Once the move has started, no other operation should be performed on the data being moved to avoid potential data loss or duplication. Adding or (un)archiving files during the move may result in duplicate folders and files with different identifiers. If this happens, you will need to manually delete the duplicate files and move the files which were skipped during the initial move.
When you move data from one location to another, you should not change the source data while the Move job is in progress. This will result in jobs getting aborted. Please expand the "Troubleshooting" section below for information on how to fix this if it occurs.
There are a number of rights and restrictions related to data move as this will delete the data in the source location.
Move jobs will fail if any data being moved is in the Partial or Archived state.
Move Data From is used when you are in the destination location.
Navigate to Projects > your_project > Data > your_destination_location > Manage > Move From.
Select the files and folders which you want to move.
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
Move Data To is used when you are in the source location. You will need to select the data you want to move from the current location and the destination to move it to.
Navigate to Projects > your_project > Data > your_source_location.
Select the files and folders which you want to move.
Select Projects > your_project > Data > your_source_location > Manage > Move To.
Select your target project and location.
Note: You can create a new folder to move data to by filling in the "New folder name (optional)" field. This does NOT rename an existing folder. To rename an existing folder, please see .
Select the Move button. Moving large amounts of data can take considerable time. You can monitor the progress at Projects > your_project > Activity > Batch Jobs.
INITIALIZED
WAITING_FOR_RESOURCES
RUNNING
STOPPED - When choosing to stop the batch job.
SUCCEEDED - All files and folders are moved.
PARTIALLY_SUCCEEDED - Some files and folders could be moved, but not all. Partially succeeded will typically occur when files were being modified or unavailable while the move process was running.
FAILED - None of the files and folders could be moved.
To see the ongoing actions on data in the data overview (Projects > your_project > Data), you can add the ongoing actions column from the column list with the three column symbol at the top right, next to the filter funnel. You can also consult the data detail view for ongoing actions by clicking on the data in the overview.
Single files can be downloaded directly from within the UI.
Select the checkbox next to the file which you want to download, followed by Download > Select Browser Download > Download.
You can also download files from their details screen. Click on the file name and select Download at the bottom of the screen. Depending on the size of your file, it may take some time to load the file contents.
You can trigger an asynchronous download via service connector using the Schedule for Download button with one or more files selected.
Select a file or files to download.
Select Download > Schedule download (for files or folders). This will display a list of all available connectors.
Select a connector and optionally, enter your email address if you want to be notified of download completion, and then select Download.
You can view the progress of the download or stop the download on the Activity page for the project.
The data records contained in a project can be exported in CSV, JSON, or Excel format.
Select one or more files to export.
Select Export.
Select the following export options:
To export only the selected files, choose Selected rows as the Rows to export option. To export all files on the page, choose Current page.
To export only the columns currently displayed, choose Visible columns as the Columns to export option.
Select the export format.
To manually archive or delete files, do as follows:
Select the checkbox next to the file or files to delete or archive.
Select Manage, and then select one of the following options:
Archive — Move the file or files to long-term storage (event code ICA_DATA_110).
Unarchive — Return the file or files from long-term storage. Unarchiving can take up to 48 hours, regardless of file size. Unarchived files can be used in analysis (event code ICA_DATA_114).
Delete — Remove the file completely (event code ICA_DATA_106).
When attempting concurrent archiving or unarchiving of the same file, a message will inform you to wait for the currently running (un)archiving to finish first.
To archive or delete files programmatically, you can use ICA's API endpoints:
Retrieve (GET) the file's information.
Modify the dates of the file to be deleted/archived.
Send (PUT) the updated information back to ICA.
Data linking creates a dynamic read-only view to the source data. You can use data linking to get access to data without running the risk of modifying the source material and to share data between projects. In addition, linking ensures changes to the source data are immediately visible and no additional storage is required. You can recognise linked data by the green color and see the owning project as part of the details.
Since this is read-only access, you cannot perform actions on linked data that need write access. Actions like (un)archiving, linking, creating, deleting, adding or moving data and folders, and copying data into the linked data are not possible.
Select Projects > your_project > Data > Manage, and then select Link.
To view data by project, select the funnel symbol, and then select Owning Project. If you only know which project the data is linked to, you can choose to filter on linked projects.
Select the checkbox next to the file or files to add.
Select Select Data.
Your files are added to the Data page. To view the linked data file, select Add filter, and then select Links.
If you link a folder instead of individual files, a warning is displayed indicating that, depending on the size of the folder, linking may take considerable time. The linking process will run in the background and the progress can be monitored on the Projects > your_project > activity > Batch Jobs screen.
To see more details, double-click the batch job.
To see how many individual files are already linked, double-click the item.
To unlink the data, go to the root level of your project and select the linked folder or if you have linked individual files separately, then you can select those linked files (limited to 100 at a time) and select Manage > Unlink. As during linking a folder, when unlinking, the progress can be monitored at Projects > your_project > activity > Batch Jobs.
To link to a data folder: https://<ServerURL>/ica/link/project/<ProjectID>/data/<FolderID>
To link to an analysis: https://<ServerURL>/ica/link/project/<ProjectID>/analysis/<AnalysisID>
ServerURL: see the browser address bar
ProjectID: at YourProject > Details > URN > urn:ilmn:ica:project:ProjectID#MyProject
FolderID: at YourProject > Data > folder > folder details > ID
AnalysisID: at YourProject > Flow > Analyses > YourAnalysis > ID
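A minimal Python sketch for assembling these links, assuming you have already looked up the IDs as described above (the server URL and IDs are placeholders):

SERVER_URL = "ica.illumina.com"   # see the browser address bar
PROJECT_ID = "your-project-id"    # placeholder
FOLDER_ID = "your-folder-id"      # placeholder
ANALYSIS_ID = "your-analysis-id"  # placeholder

data_link = f"https://{SERVER_URL}/ica/link/project/{PROJECT_ID}/data/{FOLDER_ID}"
analysis_link = f"https://{SERVER_URL}/ica/link/project/{PROJECT_ID}/analysis/{ANALYSIS_ID}"

print(data_link)
print(analysis_link)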
Rights required to copy data:
Within a project: Contributor rights and Upload and Download rights (the source and destination are the same project).
Between different projects: Download rights and Viewer rights in the source project; Upload rights and Contributor rights in the destination project.
Restrictions when copying data:
Within a project: the source data must not be linked, partial, or archived; the destination must not contain linked data.
Between different projects: data sharing must be enabled, the source data must not be partial or archived, the destination must not contain linked data, and source and destination must be in the same region.
Rights required to move data:
Within a project: Contributor rights.
Between different projects: Download rights and Contributor rights in the source project; Upload rights and Viewer rights in the destination project.
Restrictions when moving data:
Within a project: the source data must not be linked, partial, or archived; the destination must not contain linked data.
Between different projects: data sharing must be enabled, the data must be owned by the user's tenant, the source data must not be linked, partial, or archived, no externally managed projects may be involved, the destination must not contain linked data, and source and destination must be in the same region.
import requests
import json
from config import PROJECT_ID, DATA_ID, API_KEY

url_get = "https://ica.illumina.com/ica/rest/api/projects/" + PROJECT_ID + "/data/" + DATA_ID

# set the API get headers
headers = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json'
}

# set the API put headers
headers_put = {
    'X-API-Key': API_KEY,
    'accept': 'application/vnd.illumina.v3+json',
    'Content-Type': 'application/vnd.illumina.v3+json'
}

# Helper function to insert willBeArchivedAt after the field named 'region'
def insert_after_region(details_dict, timestamp):
    new_dict = {}
    for k, v in details_dict.items():
        new_dict[k] = v
        if k == 'region':
            new_dict['willBeArchivedAt'] = timestamp
        if 'willBeArchivedAt' in details_dict:
            new_dict['willBeArchivedAt'] = timestamp
    return new_dict

# 1. Make the GET request
response = requests.get(url_get, headers=headers)
response_data = response.json()

# 2. Modify the JSON data
timestamp = "2024-01-26T12:00:04Z"  # Replace with the provided timestamp
response_data['data']['details'] = insert_after_region(response_data['data']['details'], timestamp)

# 3. Make the PUT request
put_response = requests.put(url_get, data=json.dumps(response_data), headers=headers_put)
print(put_response.status_code)
Filtering
To add filters, select the funnel/filter symbol at the top right, next to the search field.
Filters are reset when you exit the current screen.
Sorting
To sort data, select the three vertical dots in the column header on which you want to sort and chose ascending or descending.
Sorting is retained when you exit the current screen.
Displaying Columns
To change which columns are displayed, select the three columns symbol and select which columns should be shown.
You can keep track of which files are externally controlled and which are ICA-managed by means of the “managed by” column.
The displayed columns are retained when you exit the current screen.


Replace
Overwrites the existing data. Folders copy their contents into an existing folder with existing files: files with the same name are replaced, new files are added, and the remaining files in the target folder remain unchanged.
Don't copy
The original files are kept. If you selected a folder, files that do not yet exist in the destination folder are added to it. Files that already exist at the destination are not copied over and the originals are kept.
Keep both
Files have a number appended to them if they already exist. If you copy folders, the folders are merged, with new files added to the destination folder and original files kept. New files with the same name get copied over into the folder with a number appended.
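The decision logic of these three options can be summarised in a short sketch. This is only an illustration of the rules described above, not ICA code; the function names and the numbering format are hypothetical:

def append_number(file_name, n=1):
    # Append a number to the copied file's name, e.g. report.txt -> report(1).txt
    stem, dot, ext = file_name.rpartition(".")
    return f"{stem}({n}).{ext}" if dot else f"{file_name}({n})"

def resolve_copy(file_name, exists_in_destination, mode):
    # Return the action taken for one file under the selected copy option.
    if not exists_in_destination:
        return f"copy {file_name}"                    # new files are always added
    if mode == "replace":
        return f"overwrite {file_name}"               # existing file is replaced
    if mode == "dont_copy":
        return f"keep original {file_name}"           # existing file is left untouched
    if mode == "keep_both":
        return f"copy as {append_number(file_name)}"  # both versions are kept
    raise ValueError(f"unknown mode: {mode}")

print(resolve_copy("report.txt", True, "keep_both"))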






A Pipeline is a series of Tools with connected inputs and outputs configured to execute in a specific order.
Linking a pipeline (Projects > your_project > Flow > Pipelines > Link) adds that pipeline to your project. This is not a copy but the actual pipeline, so any changes to the pipeline are automatically propagated to and from any project which has this pipeline linked.
You can link a pipeline if it is not already linked to your project and it is from your tenant or available in your bundle or activation code.
If you unlink a pipeline, it is removed from your project, but it remains part of the list of pipelines of your tenant, so it can be linked to other projects later on.
Pipelines are created and stored within projects.
Navigate to Projects > your_project > Flow > Pipelines > +Create.
Configure pipeline settings in the pipeline property tabs.
When creating a graphical CWL pipeline, drag connectors to link tools to input and output files in the canvas. Required tool inputs are indicated by a yellow connector.
Select Save.
Pipelines use the tool definitions that were current when the pipeline was last saved. Tool changes do not automatically propagate to the pipeline. To update the pipeline with the latest tool changes, edit the pipeline definition by removing the tool and re-adding it.
For pipeline authors sharing and distributing their pipelines, the draft, released, deprecated, and archived statuses provide a structured framework for managing pipeline availability, user communication, and transition planning. To change the pipeline status, select it at Projects > your_project > Pipelines > your_pipeline > change status.
The following sections describe the properties that can be configured in each tab of the pipeline editor.
Depending on how you design the pipeline, the displayed tabs differ between the graphical and code definitions. For CWL you have a choice on how to define the pipeline, Nextflow is always defined in code mode.
Any additional source files related to your pipeline will be displayed here in alphabetical order.
See the following pages for language-specific details for defining pipelines:
The details tab provides options for configuring basic information about the pipeline.
Code
The name of the pipeline. The name must be unique within the tenant, including linked and unlinked pipelines.
Nextflow Version
User selectable Nextflow version available only for Nextflow pipelines
Categories
One or more tags to categorize the pipeline. Select from existing tags or type a new tag name in the field.
Description
A short description of the pipeline.
Proprietary
Hide the pipeline scripts and details from users who do not belong to the tenant who owns the pipeline. This also prevents cloning the pipeline.
Status
The status of the pipeline.
Storage size
User-selectable storage size for running the pipeline. This must be large enough to run the pipeline, but setting it too large incurs unnecessary costs.
Family
A group of pipeline versions. To specify a family, select Change, and then select a pipeline or pipeline family. To change the order of the pipeline, select Up or Down. The first pipeline listed is the default and the remainder of the pipelines are listed as Other versions. The current pipeline appears in the list as this pipeline.
Version comment
A description of changes in the updated version.
Links
External reference links. (max 100 chars as name and 2048 chars as link)
The following information becomes visible when viewing the pipeline details.
ID
Unique Identifier of the pipeline.
URN
Identification of the pipeline in Uniform Resource Name
The clone action is shown in the pipeline details at the top right. Cloning a pipeline allows you to make modifications without impacting the original pipeline. When you clone a pipeline, you become the owner of the clone and must give it a unique name, because duplicate names are not allowed across all projects of the tenant. You may still see the same pipeline name twice when a pipeline linked from another tenant was cloned under that name in your tenant; the name remains unique per tenant, but both are visible in your tenant.
When you clone a Nextflow pipeline, a verification of the configured Nextflow version is done to prevent the use of deprecated versions.
The Documentation tab is where you explain to users how your pipeline works. The description appears in the tool repository but is excluded from exported CWL definitions. If no documentation has been provided, this tab will be empty.
When using graphical mode for the pipeline definition, the Definition tab provides options for configuring the pipeline using a visualization panel and a list of component menus.
Machine profiles
The machine profiles available to use with Tools in the pipeline.
Shared settings
Settings used by more than one tool in the pipeline.
Reference files
Descriptions of reference files used in the pipeline.
Input files
Descriptions of input files used in the pipeline.
Output files
Descriptions of output files used in the pipeline.
Tool
Details about the tool selected in the visualization panel.
Tool repository
A list of tools available to be used in the pipeline.
In graphical mode, you can drag and drop inputs into the visualization panel to connect them to the tools. Make sure to connect the input icons to the tool before editing the input details in the component menu. Required tool inputs are indicated by a yellow connector.
Safari is not supported as a browser for graphical editing.
This page is used to specify all relevant information about the pipeline parameters.
For each process defined by the workflow, ICA will launch a compute node to execute the process.
For each compute type, the standard (default - AWS on-demand) or economy (AWS spot instance) tiers can be selected.
When selecting an fpga instance type for running analyses on ICA, it is recommended to use the medium size. While the large size offers slight performance benefits, these do not proportionately justify the associated cost increase for most use cases.
When no type is specified, the default type of compute node is standard-small.
By default, compute nodes have no scratch space. This is an advanced setting and should only be used when absolutely necessary as it will incur additional costs and may offer only limited performance benefits because it is not local to the compute node.
For simplicity and better integration, consider using shared storage available at /ces. It is what is provided in the Small/Medium/Large+ compute types. This shared storage is used when writing files with relative paths.
Daemon sets and system processes consume approximately 1 CPU and 2 GB of memory from the base values shown in the table; for example, on a standard-medium node (4 CPUs, 16 GiB) roughly 3 CPUs and 14 GiB remain available to your process. Consumption will vary based on the activity of the pod.
Compute Type | CPUs | Mem (GiB) | Nextflow (pod.value) | CWL (type, size)
standard-small | 2 | 8 | standard-small | standard, small
standard-medium | 4 | 16 | standard-medium | standard, medium
standard-large | 8 | 32 | standard-large | standard, large
standard-xlarge | 16 | 64 | standard-xlarge | standard, xlarge
standard-2xlarge | 32 | 128 | standard-2xlarge | standard, 2xlarge
standard-3xlarge | 64 | 256 | standard-3xlarge | standard, 3xlarge
hicpu-small | 16 | 32 | hicpu-small | hicpu, small
hicpu-medium | 36 | 72 | hicpu-medium | hicpu, medium
hicpu-large | 72 | 144 | hicpu-large | hicpu, large
himem-small | 8 | 64 | himem-small | himem, small
himem-medium | 16 | 128 | himem-medium | himem, medium
himem-large | 48 | 384 | himem-large | himem, large
himem-xlarge (2) | 92 | 700 | himem-xlarge | himem, xlarge
hiio-small | 2 | 16 | hiio-small | hiio, small
hiio-medium | 4 | 32 | hiio-medium | hiio, medium
fpga2-medium (1) | 24 | 256 | fpga2-medium | fpga2, medium
fpga2-large (1) | 48 | 512 | fpga2-large | fpga2, large
fpga-medium (3) | 16 | 244 | fpga-medium | fpga, medium
fpga-large (3) | 64 | 976 | fpga-large | fpga, large
transfer-small (4) | 4 | 10 | transfer-small | transfer, small
transfer-medium (4) | 8 | 15 | transfer-medium | transfer, medium
transfer-large (4) | 16 | 30 | transfer-large | transfer, large
(3) FPGA1 instances will be decommissioned by Nov 1st 2025. Please migrate to F2 for improved capacity and performance with up to 40% reduced turnaround time for analysis.
Syntax highlighting is determined by the file type, but you can select alternative syntax highlighting with the drop-down selection list. The following formats are supported:
DIFF (.diff)
GROOVY (.groovy .nf)
JAVASCRIPT (.js .javascript)
JSON (.json)
SH (.sh)
SQL (.sql)
TXT (.txt)
XML (.xml)
YAML (.yaml .cwl)
The Nextflow project main script.
The Nextflow configuration settings.
The Common Workflow Language main script.
Multiple files can be added by selecting the +Create option at the bottom of the screen to make pipelines more modular and manageable.
See Metadata Models
On this tab, you can define patterns for detecting report files in the analysis output. When you open an analysis result window for this pipeline, an additional tab displays these report files. The goal is to provide a pipeline-specific, user-friendly representation of the analysis result.
To add a report, select the + symbol on the left side. Give the report a unique name and a regular expression matching the report file, and optionally select the format of the report. This must be the source format of the report data generated during the analysis.
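For example, a report pattern is simply a regular expression evaluated against the analysis output file names. The sketch below uses a hypothetical pattern and file names to show how such a pattern would select report files:

import re

# Hypothetical pattern a pipeline author might configure for an HTML report
report_pattern = re.compile(r".*_summary\.html$")

output_files = [
    "results/sample1_summary.html",
    "results/sample1.bam",
    "logs/run.log",
]

reports = [f for f in output_files if report_pattern.match(f)]
print(reports)  # ['results/sample1_summary.html']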
Use the following instructions to start a new analysis for a single pipeline.
Select Projects > your_project > Flow > Pipelines.
Select the pipeline you want to run, or open its pipeline details.
Select Start Analysis.
Configure analysis settings. (see below)
Select Start Analysis.
View the analysis status on the Analyses page.
Requested—The analysis is scheduled to begin.
In Progress—The analysis is in progress.
Succeeded—The analysis is complete.
Failed —The analysis has failed.
Aborted — The analysis was aborted before completing.
To end an analysis, select Abort.
To perform a completed analysis again, select Re-run.
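To monitor an analysis programmatically instead of watching the Analyses page, you can poll the project analyses endpoint in the same style as the archiving example earlier in this documentation. The endpoint path and the exact status values returned by the API are assumptions here; consult the API reference for the authoritative details:

import time
import requests
from config import PROJECT_ID, ANALYSIS_ID, API_KEY  # ANALYSIS_ID is assumed to be defined in config

# Assumed endpoint, following the URL pattern used elsewhere in this documentation
url = f"https://ica.illumina.com/ica/rest/api/projects/{PROJECT_ID}/analyses/{ANALYSIS_ID}"
headers = {
    "X-API-Key": API_KEY,
    "accept": "application/vnd.illumina.v3+json",
}

# Poll until the analysis reaches a terminal state
while True:
    status = requests.get(url, headers=headers).json().get("status")
    print("analysis status:", status)
    if status in ("SUCCEEDED", "FAILED", "ABORTED"):  # assumed API spellings of the terminal states
        break
    time.sleep(60)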
The Start Analysis screen provides the configuration options for the analysis.
User Reference
The unique analysis name.
Pipeline
This is not editable, but provides a link to the pipeline in case you want to look up its details.
User tags (optional)
One or more tags used to filter the analysis list. Select from existing tags or type a new tag name in the field.
Notification (optional)
Enter your email address if you want to be notified when the analysis completes.
Output Folder1
Select a folder in which the output folder of the analysis should be located. When no folder is selected, the output folder will be located in the root of the project.
When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax.
Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
Logs Folder
Select a folder in which the logs of the analysis should be located. When no logs folder is selected, the logs will be stored as subfolder in the output folder. When a logs folder is selected which is different from the output folder, the outputs and logs folders are separated.
Files that already exist in the logs folder will be overwritten with new versions.
When you open the folder selection dialog, you have the option to create a new folder (bottom of the screen). You can create nested folders by using the folder/subfolder syntax.
Note: Choose a folder that is empty and not in use for other analyses, as files will be overwritten.
Note: Do not use a / before the first folder or after the last subfolder in the folder creation dialog.
Pricing
Select a subscription to which the analysis will be charged.
Input
Select the input files to use in the analysis. (max. 50,000)
Settings (optional)
Provide input settings.
Resources
Select the storage size for your analysis. The available storage sizes depend on your selected Pricing subscription. See for more information.
1 When using the API, you can redirect analysis outputs to be outside of the current project.
You can abort a running analysis from either the analysis overview (Projects > your_project > Flow > Analyses > your_analysis > Manage > Abort) or from the analysis details (Projects > your_project > Flow > Analyses > your_analysis > Details tab > Abort).
You can view analysis results on the Analyses page or in the output folder on the Data page.
Select a project, and then select the Flow > Analyses page.
Select an analysis.
From the output files tab, expand the list if needed and select an output file.
If you want to add or remove any user or technical tags, you can do so from the data details view.
If you want to download the file, select Download.
To preview the file, select the View tab.
Return to Flow > Analyses > your_analysis.
View additional analysis result information on the following tabs:
Details - View information on the pipeline configuration.
Report - Shows the reports defined on the pipeline report tab.
Output files - View the output of the Analysis.
Steps - stderr and stdout information.
Nextflow timeline - Nextflow process execution timeline.
Nextflow execution - Nextflow analysis report. Showing the run times, commands, resource usage and tasks for Nextflow analyses.
Draft
Use the draft status while developing or testing a pipeline version internally.
Only share draft pipelines with collaborators who are actively involved in development.
Released
The released status signals that a pipeline is stable and ready for general use.
Share your pipeline when it is ready for broad use. Ensure users have access to current documentation and know where to find support or updates. Releasing a pipeline is only possible if all tools of that pipeline are in released status.
Deprecated
Deprecation is used when a pipeline version is scheduled for retirement or replacement. Deprecated pipelines can not be linked to bundles, but will not be unlinked from existing bundles. Users who already have access will still be able to start analyses. You can add a message (max 256 chars) when deprecating pipelines.
Deprecate in advance of archiving a pipeline, making sure the new pipeline is available in the same bundle as the deprecated pipeline. This will allow the pipeline author to link the new or alternative pipeline in the deprecation message field.
Archived
Archiving a pipeline version removes it from active use; users can no longer launch analyses. Archived pipelines can not be linked to bundles, but are not automatically unlinked from bundles or projects. You can add a message (max 256 chars) when archiving pipelines.
Warn users in advance: deprecate the pipeline before archiving to allow existing users time to transition. Use the archive message to point users to the new or alternative pipeline.
CWL Graphical
Details
Documentation
Definition
Analysis Report
Metadata Model
Report
CWL Code
Details
Documentation
Inputform files (JSON) or XML Configuration (XML)
CWL Files
Metadata Model
Report
Nextflow Code
Details
Documentation
Inputform Files (JSON) or XML Configuration (XML)
Nextflow files
Metadata Model
Report
1 DRAGEN pipelines running on fpga2 compute type will incur a DRAGEN license cost of 0.10 iCredits per gigabase of data processed, with additional discounts as shown below.
80 or less gigabase per sample - no discount - 0.10 iCredits per gigabase
> 80 to 160 gigabase per sample - 20% discount - 0.08 iCredits per gigabase
> 160 to 240 gigabase per sample - 30% discount - 0.07 iCredits per gigabase
> 240 to 320 gigabase per sample - 40% discount - 0.06 iCredits per gigabase
> 320 and more gigabase per sample - 50% discount - 0.05 iCredits per gigabase
If your DRAGEN job fails, only the compute cost is charged, no DRAGEN license cost will be charged.
DRAGEN Iterative gVCF Genotyper (iGG) will incur a license cost of 0.6216 iCredits per gigabase. For example, a 3.3-gigabase human reference sample will result in approximately 2 iCredits per sample. The associated compute costs will be based on the compute instance chosen.
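As a worked illustration of the tiers above (assuming the listed rate applies to the sample's entire volume once it falls into a bracket; actual billing is determined by Illumina and may differ), the license cost can be computed as follows:

def dragen_license_icredits(gigabases):
    # Illustrative only: pick the per-gigabase rate from the discount tiers listed above.
    if gigabases <= 80:
        rate = 0.10
    elif gigabases <= 160:
        rate = 0.08
    elif gigabases <= 240:
        rate = 0.07
    elif gigabases <= 320:
        rate = 0.06
    else:
        rate = 0.05
    return gigabases * rate

print(dragen_license_icredits(100))  # 100 gigabase at 0.08 -> 8.0 iCredits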
Find the links to CLI builds in the Releases section below.
In the Releases section below, select the matching operating system in the link column for the version you want to install. This will download the installer for that operating system.
To determine which CLI version you are currently using, navigate to your currently installed CLI and use the CLI command icav2 version. For help on this command, use icav2 version -h.
Checksums are provided alongside each downloadable CLI binary to verify file integrity. The checksums are generated using the SHA256 algorithm. To use the checksums:
Download the CLI binary for your OS
Download the corresponding checksum using the links in the table
Calculate the SHA256 checksum of the downloaded CLI binary
Compare the calculated SHA256 checksum with the downloaded checksum. If the checksums match, the integrity of the file is confirmed.
There are a variety of open source tools for calculating the SHA256 checksum. See the below tables for examples.
For CLI v2.3.0 and later:
Windows
CertUtil -hashfile ica-windows-amd64.zip SHA256
Linux
sha256sum ica-linux-amd64.zip
Mac
shasum -a 256 ica-darwin-amd64.zip
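If you prefer not to depend on platform-specific tools, the same verification can be done in Python (the binary and checksum file names below are examples; substitute the files you downloaded):

import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    # Compute the SHA256 digest of a file, reading it in chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

calculated = sha256_of("ica-linux-amd64.zip")
expected = open("ica-linux-amd64.zip.sha256").read().split()[0]  # example checksum file name
print("integrity confirmed" if calculated == expected else "checksum mismatch")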
2.40
2.39.0
2.38.0
No changes, use 2.37.0
-
2.37.0
2.36.0
2.35.0
2.34.0
2.33.0
2.32.2
2.31.0
2.30.0
2.29.0
2.28.0
2.27.0
2.26.0
2.25.0
2.24.0
2.23.0
2.22.0
2.21.0
2.19.0
2.18.0
2.17.0
2.16.0
2.15.0
2.12.0
2.10.0
2.9.0
2.8.0
2.4.0
2.3.0
2.2.0
2.1.0
2.0.0
Note: To access release history of CLI versions prior to v2.0.0, please see the ICA v1 documentation here.
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
After ingesting data into your project, phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
Post ingestion, data will be represented in Base.
Select BASE from the ICA left-navigation and click Query.
Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project <your project name>.
Cohorts tables will be displayed.
To preview the tables and fields, click each view listed.
Clicking any of these views then selecting PREVIEW on the right-hand side will show you a preview of the data in the tables.
SAMPLE_BARCODE
STRING
Sample Identifier
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
AGE
NUMERIC
Age in years
SEX
STRING
Sex field to drive annotation
POPULATION
STRING
Population Designation for 1000 Genomes Project
SUPERPOPULATION
STRING
Superpopulation Designation from 1000 Genomes Project
RACE
STRING
Race according to NIH standard
CONDITION_ONTOLOGIES
VARIANT
Diagnosis Ontology Source
CONDITION_IDS
VARIANT
Diagnosis Concept Ids
CONDITIONS
VARIANT
Diagnosis Names
HARMONIZED_CONDITIONS
VARIANT
Diagnosis High-level concept to drive UI
LIBRARYTYPE
STRING
Sequencing technology
ANALYTE
STRING
Substance sequenced
TISSUE
STRING
Tissue source
TUMOR_OR_NORMAL
STRING
Tumor designation for somatic
GENOMEBUILD
STRING
Genome Build to drive annotations - hg38 only
SAMPLE_BARCODE_VCF
STRING
Sample ID from VCF
AFFECTED_STATUS
NUMERIC
Affected, Unaffected, or Unknown for Family Based Analysis
FAMILY_RELATIONSHIP
STRING
Relationship designation for Family Based Analysis
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
DATATYPE
ARRAY
The categorization of molecular data
TECHNOLOGY
ARRAY
The sequencing method
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of last update of record
This table is an entity-attribute value table of supplied sample data matching Cohorts accepted attributes.
SAMPLE_ BARCODE
STRING
Original sample barcode used in VCF column
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
NAME
STRING
Study name
CREATEDATE
DATE
Date and time of study creation
LASTUPDATEDATE
DATE
Date and time of record update
SUBJECTID
STRING
Original identifier for the subject record
AGE
FLOAT
Age entered on subject record if applicable
SEX
STRING
-
ETHNICITY
STRING
-
STUDY
STRING
Study subject belongs to
CREATEDATE
DATE
Date and time of record creation
LASTUPDATEDATE
DATE
Date and time of record update
This table is an entity-attribute value table of supplied subject data matching Cohorts accepted attributes.
SUBJECTID
STRING
Original identifier for the subject record
ATTRIBUTE_NAME
STRING
Cohorts meta-data driven field name
ATTRIBUTE_VALUE
VARIANT
List of values entered for the field
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for disease term
OCCURRENCES
STRING
List of occurrence related data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for drug term
OCCURRENCES
STRING
List of occurrence related data of drug exposure
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for measurement term
OCCURRENCES
STRING
List of occurrences and values related to lab or measurement data
SUBJECTID
STRING
Original identifier for the subject record
TERM
STRING
Code for procedure term
OCCURRENCES
STRING
List of occurrences and values related procedure data
This table will be available for all projects with ingested molecular data
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode used in VCF column
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
CHROMOSOMEID
NUMERIC
Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt
DBSNP
STRING
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
NIRVANA_VID
STRING
Broad Institute VID: "1-12345678-A-C"
VARIANT_TYPE
STRING
Description of Variant Type (e.g. SNV, Deletion, Insertion)
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
DENOVO
BOOLEAN
true / false
GENOTYPE
STRING
"G|T"
READ_DEPTH
NUMERIC
Sequencing read depth
ALLELE_COUNT
NUMERIC
Counts of each alternate allele for each site across all samples
ALLELE_DEPTH
STRING
Unfiltered count of reads that support a given allele for an individual sample
FILTERS
STRING
Filter field from VCF. If all filters pass, field is PASS
ZYGOSITY
NUMERIC
0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
GID
NUMERIC
NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
STRING
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSC
STRING
The HGVS coding sequence name
HGVSP
STRING
The HGVS protein sequence name
This table will only be available for data sets with ingested Somatic molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Original sample barcode, used in VCF column
SUBJECTID
STRING
Identifier for Subject entity
STUDY
STRING
Study designation
GENOMEBUILD
STRING
Only hg38 is supported
CHROMOSOME
STRING
Chromosome without 'chr' prefix
DBSNP
NUMERIC
dbSNP Identifiers
VARIANT_KEY
STRING
Variant ID in the form "1:12345678:12345678:C"
MUTATION_TYPE
NUMERIC
Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant
VARIANT_CALL
NUMERIC
1=germline, 2=somatic
GENOTYPE
STRING
"G|T"
REF_ALLELE
STRING
Reference allele
ALLELE1
STRING
First allele call in the tumor sample
ALLELE2
STRING
Second allele call in the tumor sample
GENEMODEL
NUMERIC
1=Ensembl, 2=RefSeq
GENE_HGNC
STRING
HUGO/HGNC gene symbol
GENE_ID
STRING
Ensembl gene ID ("ENSG00001234")
TRANSCRIPT_ID
STRING
Ensembl ENST or RefSeq NM_
CANONICAL
BOOLEAN
Transcript designated 'canonical' by source
CONSEQUENCE
STRING
missense, stop gained, intronic, etc.
HGVSP
STRING
HGVS nomenclature for AA change: p.Pro72Ala
This table will only be available for data sets with ingested CNV molecular data.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
GENE_ID
STRING
NCBI or Ensembl gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
START_POS
NUMERIC
First affected position on the chromosome
STOP_POS
NUMERIC
Last affected position on the chromosome
VARIANT_TYPE
NUMERIC
1 = copy number gain, -1 = copy number loss
COPY_NUMBER
NUMERIC
Observed copy number
COPY_NUMBER_CHANGE
NUMERIC
Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline
SEGMENT_VALUE
NUMERIC
Average FC for the identified chromosomal segment
PROBE_COUNT
NUMERIC
Probes confirming the CNV (arrays only)
REFERENCE
NUMERIC
Baseline taken from normal samples (1) or averaged disease tissue (2)
GENE_HGNC
STRING
HUGO/HGNC gene symbol
This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.
Field Name
Type
Description
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
GENOMEBUILD
STRING
Genome build, always 'hg38'
NIRVANA_VID
STRING
Variant ID of the form 'chr-pos-ref-alt'
CHRID
STRING
Chromosome without 'chr' prefix
CID
NUMERIC
Numerical representation of the chromosome, X=23, Y=24, Mt=25
BEGIN
NUMERIC
First affected position on the chromosome
END
NUMERIC
Last affected position on the chromosome
BAND
STRING
Chromosomal band
QUALIITY
NUMERIC
Quality from the original VCF
FILTERS
ARRAY
Filters from the original VCF
VARIANT_TYPE
STRING
Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2")
VARIANT_TYPE_ID
NUMERIC
21=insertion, 22=deletion, 23=indel, 24=tandem_duplication, 25=translocation_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2")
CIPOS
ARRAY
Confidence interval around first position
CIEND
ARRAY
Confidence interval around last position
SVLENGTH
NUMERIC
Overall size of the structural variant
BONDCHR
STRING
For translocations, the other affected chromosome
BONDCID
NUMERIC
For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25
BONDPOS
STRING
For translocations, positions on the other affected chromosome
BONDORDER
NUMERIC
3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' end of the other chromosome fragment
GENOTYPE
STRING
Called genotype from the VCF
GENOTYPE_QUALITY
NUMERIC
Genotype call quality
READCOUNTSSPLIT
ARRAY
Read counts
READCOUNTSPAIRED
ARRAY
Read counts, paired end
REGULATORYREGIONID
STRING
Ensembl ID for the affected regulatory region
REGULATORYREGIONTYPE
STRING
Type of the regulatory region
CONSEQUENCE
ARRAY
Variant consequence according to SequenceOntology
TRANSCRIPTID
STRING
Ensembl or RefSeq transcript identifier
TRANSCRIPTBIOTYPE
STRING
Biotype of the transcript
INTRONS
STRING
Count of impacted introns out of the total number of introns, specified as "M/N"
GENEID
STRING
Ensembl or RefSeq gene identifier
GENEHGNC
STRING
HUGO/HGNC gene symbol
ISCANONICAL
BOOLEAN
Is the transcript ID the canonical one according to Ensembl?
PROTEINID
STRING
RefSeq or Ensembl protein ID
SOURCEID
NUMERICAL
Gene model: 1=Ensembl, 2=RefSeq
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for gene quantification results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
LABEL
STRING
Group label specified during import: Case or Control, Tumor or Normal, etc.
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
TPM
NUMERICAL
Transcripts per million
LENGTH
NUMERICAL
The length of the gene in base pairs.
EFFECTIVE_LENGTH
NUMERICAL
The length as accessible to RNA-seq, accounting for insert-size and edge effects.
NUM_READS
NUMERICAL
The estimated number of reads from the gene. The values are not normalized.
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for differential gene expression results:
Field Name
Type
Description
GENOMEBUILD
STRING
Genome build, always 'hg38'
STUDY_NAME
STRING
Study designation
SAMPLE_BARCODE
STRING
Sample barcode used in the original VCF
CASE_LABEL
STRING
Study designation
GENE_ID
STRING
Ensembl or RefSeq gene identifier
GID
NUMERIC
Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix
GENE_HGNC
STRING
HUGO/HGNC gene symbol
SOURCE
STRING
Gene model: 1=Ensembl, 2=RefSeq
BASEMEAN
NUMERICAL
FC
NUMERICAL
Fold-change
LFC
NUMERICAL
Log of the fold-change
LFCSE
NUMERICAL
Standard error for log fold-change
PVALUE
NUMERICAL
P-value
CONTROL_SAMPLECOUNT
NUMERICAL
Number of samples used as control
CONTROL_LABEL
NUMERICAL
Label used for controls
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
The build number, together with the libraries used and their licenses, is provided in the accompanying readme file.
Command line interface for the Illumina Connected Analytics, a genomics platform-as-a-service
Usage:
icav2 [command]
Available Commands:
analysisstorages Analysis storages commands
completion Generate the autocompletion script for the specified shell
config Config actions
dataformats Data format commands
help Help about any command
jobs Job commands
metadatamodels Metadata model commands
pipelines Pipeline commands
projectanalyses Project analyses commands
projectdata Project Data commands
projectpipelines Project pipeline commands
projects Project commands
projectsamples Project samples commands
regions Region commands
storagebundles Storage bundle commands
storageconfigurations Storage configurations commands
tokens Tokens commands
version The version of this application
Flags:
-t, --access-token string JWT used to call rest service
-h, --help help for icav2
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-v, --version version for icav2
-k, --x-api-key string api key used to call rest service
Use "icav2 [command] --help" for more information about a command.This is the root command for actions that act on analysis storages
Usage:
icav2 analysisstorages [command]
Available Commands:
list list of storage id's
Flags:
-h, --help help for analysisstorages
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 analysisstorages [command] --help" for more information about a command.This command lists all the analysis storage id's
Usage:
icav2 analysisstorages list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command generates custom completion functions for the icav2 tool. These functions facilitate the generation of context-aware suggestions based on the user's input and specific directives provided by the icav2 tool. For example, for the ZSH shell the completion function _icav2() is generated. It can provide suggestions for available commands, flags, and arguments depending on the context, making it easier for the user to interact with the tool without having to constantly refer to documentation.
To enable this custom completion function, you would typically include it in your Zsh configuration (e.g., in .zshrc or a separate completion script) and then use the compdef command to associate the function with the icav2 command:
compdef _icav2 icav2
This way, when the user types icav2 followed by a space and presses the TAB key, Zsh will call the _icav2 function to provide context-aware suggestions based on the user's input and the icav2 tool's directives.
Generate the autocompletion script for icav2 for the specified shell.
See each sub-command's help for details on how to use the generated script.
Usage:
icav2 completion [command]
Available Commands:
bash Generate the autocompletion script for bash
fish Generate the autocompletion script for fish
powershell Generate the autocompletion script for powershell
zsh Generate the autocompletion script for zsh
Flags:
-h, --help help for completion
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 completion [command] --help" for more information about a command.Generate the autocompletion script for the bash shell.
This script depends on the 'bash-completion' package.
If it is not installed already, you can install it via your OS's package manager.
To load completions in your current shell session:
source <(icav2 completion bash)
To load completions for every new session, execute once:
#### Linux:
icav2 completion bash > /etc/bash_completion.d/icav2
#### macOS:
icav2 completion bash > $(brew --prefix)/etc/bash_completion.d/icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion bash
Flags:
-h, --help help for bash
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the fish shell.
To load completions in your current shell session:
icav2 completion fish | source
To load completions for every new session, execute once:
icav2 completion fish > ~/.config/fish/completions/icav2.fish
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion fish [flags]
Flags:
-h, --help help for fish
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for powershell.
To load completions in your current shell session:
icav2 completion powershell | Out-String | Invoke-Expression
To load completions for every new session, add the output of the above command
to your powershell profile.
Usage:
icav2 completion powershell [flags]
Flags:
-h, --help help for powershell
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Generate the autocompletion script for the zsh shell.
If shell completion is not already enabled in your environment you will need
to enable it. You can execute the following once:
echo "autoload -U compinit; compinit" >> ~/.zshrc
To load completions in your current shell session:
source <(icav2 completion zsh)
To load completions for every new session, execute once:
#### Linux:
icav2 completion zsh > "${fpath[1]}/_icav2"
#### macOS:
icav2 completion zsh > $(brew --prefix)/share/zsh/site-functions/_icav2
You will need to start a new shell for this setup to take effect.
Usage:
icav2 completion zsh [flags]
Flags:
-h, --help help for zsh
--no-descriptions disable completion descriptions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Config command provides functions for CLI configuration management.
Usage:
icav2 config [command]
Available Commands:
get Get configuration information
reset Remove the configuration information
set Set configuration information
Flags:
-h, --help help for config
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 config [command] --help" for more information about a command.Get configuration information.
Usage:
icav2 config get [flags]
Flags:
-h, --help help for get
Remove configuration information.
Usage:
icav2 config reset [flags]
Flags:
-h, --help help for reset
Set configuration information. The following information is asked for when starting the command:
- server-url : used to form the url for the rest api's.
- x-api-key : api key used to fetch the JWT used to authenticate to the API server.
- colormode : set depending on your background color of your terminal. Input's and errors are colored. Default is 'none', meaning that no colors will be used in the output.
- table-format : Output layout, defaults to a table, other allowed values are json and yaml
Usage:
icav2 config set [flags]
Flags:
-h, --help help for set
This is the root command for actions that act on Data formats
Usage:
icav2 dataformats [command]
Available Commands:
list List data formats
Flags:
-h, --help help for dataformats
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 dataformats [command] --help" for more information about a command.This command lists the data formats you can use inside of a project
Usage:
icav2 dataformats list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Help provides help for any command in the application.
Simply type icav2 help [path to command] for full details.
Usage:
icav2 help [command] [flags]
Flags:
-h, --help help for help
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on jobs
Usage:
icav2 jobs [command]
Available Commands:
get Get details of a job
Flags:
-h, --help help for jobs
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 jobs [command] --help" for more information about a command.This command fetches the details of a job using the argument as an id (uuid).
Usage:
icav2 jobs get [job id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on metadata models
Usage:
icav2 metadatamodels [command]
Available Commands:
list list of metadata models
Flags:
-h, --help help for metadatamodels
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 metadatamodels [command] --help" for more information about a command.This command lists all the metadata models
Usage:
icav2 metadatamodels list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on pipelines
Usage:
icav2 pipelines [command]
Available Commands:
get Get details of a pipeline
list List pipelines
Flags:
-h, --help help for pipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 pipelines [command] --help" for more information about a command.This command fetches the details of a pipeline without a project context
Usage:
icav2 pipelines get [pipeline id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines without the context of a project
Usage:
icav2 pipelines list [flags]
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project analyses
Usage:
icav2 projectanalyses [command]
Available Commands:
get Get the details of an analysis
input Retrieve input of analyses commands
list List of analyses for a project
output Retrieve output of analyses commands
update Update tags of analyses
Flags:
-h, --help help for projectanalyses
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectanalyses [command] --help" for more information about a command.This command returns all the details of a analysis.
Usage:
icav2 projectanalyses get [analysis id] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve input of analyses commands
Usage:
icav2 projectanalyses input [analysisId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the analyses for a given project. Sorting can be done on
- reference
- userReference
- pipeline
- status
- startDate
- endDate
- summary
Usage:
icav2 projectanalyses list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
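Example (using a placeholder project id; the sort expression is assumed to follow the same 'field direction' form shown for other list commands): list analyses sorted by most recent start date
icav2 projectanalyses list --project-id <project_id> --sort-by "startDate desc" --max-items 50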
Retrieve output of analyses commands
Usage:
icav2 projectanalyses output [analysisId] [flags]
Flags:
-h, --help help for output
--project-id string project ID to set current project context
--raw-output Add this flag if output should be in raw format. Applies only for Cwl pipelines ! This flag needs no value, adding it sets the value to true.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Updates the user and technical tags of an analysis
Usage:
icav2 projectanalyses update [analysisId] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add to analysis. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add to analysis. Add flag multiple times for multiple values.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove from analysis. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove from analysis. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project data
Usage:
icav2 projectdata [command]
Available Commands:
archive archive data
copy Copy data to a project
create Create data id for a project
delete delete data
download Download a file/folder
downloadurl get download url
folderuploadsession Get details of a folder upload
get Get details of a data
link Link data to a project
list List data
mount Mount project data
move Move data to a project
temporarycredentials fetch temporal credentials for data
unarchive unarchive data
unlink Unlink data to a project
unmount Unmount project data
update Updates the details of a data
upload Upload a file/folder
Flags:
-h, --help help for projectdata
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectdata [command] --help" for more information about a command.This command archives data for a given project
Usage:
icav2 projectdata archive [path or data Id] [flags]
Flags:
-h, --help help for archive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
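Example (placeholder path and project id): archive a file in the current project by path
icav2 projectdata archive /SOURCE/sample1.bam --project-id <project_id>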
This command copies data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata copy [data id] or [path] [flags]
Flags:
--action-on-exist string what to do when a file or folder with the same name already exists: OVERWRITE|SKIP|RENAME (default "SKIP")
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--copy-instrument-info copy instrument info from source data to destination data
--copy-technical-tags copy technical tags from source data to destination data
--copy-user-tags copy user tags from source data to destination data
--destination-folder string folder id or path to where you want to copy the data, default root of project
-h, --help help for copy
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be copied, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
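Example (placeholder ids and paths): copy a file from another project into the /COPIES/ folder of the current project, skipping files that already exist
icav2 projectdata copy /SOURCE/sample1.fastq.gz --source-project-id <source_project_id> --destination-folder /COPIES/ --project-id <project_id> --action-on-exist SKIP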
This command creates data in a project. It takes the name of the file/folder as an argument.
Usage:
icav2 projectdata create [name] [flags]
Flags:
--data-type string (*) Data type : FILE or FOLDER
--folder-id string Id of the folder
--folder-path string Folder path under which the new project data will be created.
--format string Only allowed for file, sets the format of the file.
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
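Example (placeholder folder name and project id): create a new folder in the root of the current project
icav2 projectdata create MyFolder --data-type FOLDER --project-id <project_id>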
This command deletes data for a given project
Usage:
icav2 projectdata delete [path or dataId] [flags]
Flags:
-h, --help help for delete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Download a file/folder. The source path can be a data id or a path. The source path for downloading a folder should end with '*'. For files: the target defines either the local folder into which the download will occur, or a path with a new name for the file. If the file already exists locally, it is overwritten. For folders: if the folder does not exist locally, it will be created automatically. Overwriting an existing folder will need to be acknowledged.
Usage:
icav2 projectdata download [source data id or path] [target path] [flags]
Flags:
--exclude string Regex filter for file names to exclude from download.
--exclude-source-path Indicates that on folder download, the CLI will not create the parent folders of the downloaded folder in ICA on your local machine.
-h, --help help for download
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example 1
Using this command, all the files starting with VariantCaller- will be downloaded (prerequisite: the jq tool is installed on the machine):
icav2 projectdata list --data-type FILE --file-name VariantCaller- --match-mode FUZZY -o json | jq -r '.items[].id' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done;
Example 2
Here is an example of how to download all BAM files from a project (we are using some jq features to exclude '.bam.bai' and '.bam.md5sum' files):
icav2 projectdata list --file-name .bam --match-mode FUZZY -o json | jq -r '.items[] | select(.details.format.code == "BAM") | [.id] | @tsv' > filelist.txt; for item in $(cat filelist.txt); do echo "--- $item ---"; icav2 projectdata download $item . ; done
Tip: If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
This command returns the data download url for a given project
Usage:
icav2 projectdata downloadurl [path or data Id] [flags]
Flags:
-h, --help help for downloadurl
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a folder upload
Usage:
icav2 projectdata folderuploadsession [project id] [data id] [folder upload session id] [flags]
Flags:
-h, --help help for folderuploadsession
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of the specified data
Usage:
icav2 projectdata get [data id] or [path] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links data to a project. Use the data id, or the path plus the source project flag, to identify the data.
Usage:
icav2 projectdata link [data id] or [path] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be linked
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
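Example (placeholder ids and path): link a file from another project into the current project, first by data id and then by path
icav2 projectdata link <file_id> --project-id <project_id>
icav2 projectdata link /SOURCE/sample1.bam --source-project-id <source_project_id> --project-id <project_id>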
It is best practice to always surround your path with quotes if you want to use the * wildcard. Otherwise, you may run into situations where the command results in "accepts at most 1 arg(s), received x" as it returns folders with the same name, but different numbers of subfolders.
For more information on how to use pagination, please refer to Cursor- versus Offset-based Pagination
If you want to look up a file id from the GUI, go to that file and open the details view. The file id can be found on the top left side and will begin with fil.
This command lists the data for a given project. Page-offset can only be used in combination with sort-by. Sorting can be done on
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectdata list [path] [flags]
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--eligible-link Add this flag if output should contain only the data that is eligible for linking on the current project. This flag needs no value, adding it sets the value to true.
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for list
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is path of the parent folder. All children are selected for list, not the folder itself. This flag needs no value, adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example to list files in the folder SOURCE
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/
Example to list only subfolders in the folder SOURCE
icav2 projectdata list --project-id <project_id> --parent-folder /SOURCE/ --data-type FOLDER
This command mounts the project data as a file system directory for a given project
Usage:
icav2 projectdata mount [mount directory path] [flags]
Flags:
--allow-other Allow other users to access this project
-h, --help help for mount
--list List currently mounted projects
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
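Example (placeholder local directory and project id): mount the current project data under a local directory, then list the mounted projects
icav2 projectdata mount /mnt/ica-project --project-id <project_id>
icav2 projectdata mount --list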
This command moves data between projects. Use data id or a combination of path and --source-project-id to identify the source data. By default, the root folder of your current project will be used as destination. If you want to specify a destination, use --destination-folder to specify the destination path or folder id.
Usage:
icav2 projectdata move [data id] or [path] [flags]
Flags:
--background starts job in background on server. Does not provide upload progress updates. Use icav2 jobs get with the current job.id value
--destination-folder string folder id or path to where you want to move the data, default root of project
-h, --help help for move
--polling-interval int polling interval in seconds for job status, values lower than 30 will be set to 30 (default 30)
--project-id string project ID to set current project context
--source-project-id string project ID from where the data needs to be moved, mandatory when using source path notation
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
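Example (placeholder ids and paths): move a folder from another project into the /ARCHIVE/ folder of the current project
icav2 projectdata move /RUNS/run1/ --source-project-id <source_project_id> --destination-folder /ARCHIVE/ --project-id <project_id>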
This command fetches temporary AWS and Rclone credentials for given project data. If a path is given, the project id from the --project-id flag is used. If the flag is not present, the project is taken from the context.
Usage:
icav2 projectdata temporarycredentials [path or data Id] [flags]
Flags:
-h, --help help for temporarycredentials
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unarchives data for a given project
Usage:
icav2 projectdata unarchive [path or dataId] [flags]
Flags:
-h, --help help for unarchive
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This unlinks data from a project. Use path or id to identify the data.
Usage:
icav2 projectdata unlink [data id] or [path] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command unmounts previously mounted project data
Usage:
icav2 projectdata unmount [flags]
Flags:
--directory-path string Set path to unmount
-h, --help help for unmount
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates some details of the data. Only user/tech tags, the format, and the will-be-archived-at/will-be-deleted-at dates can be updated.
Usage:
icav2 projectdata update [data id] or [path] [flags]
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
--format-code string Format to assign to the data. Only available for files.
-h, --help help for update
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
--will-be-archived-at string Time when data will be archived. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
--will-be-deleted-at string Time when data will be deleted. Format is YYYY-MM-DD. Time is set to 00:00:00UTC time. Only available for files.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
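Example (placeholder file id and date): add a user tag to a file and schedule it for deletion
icav2 projectdata update <file_id> --add-user-tag validated --will-be-deleted-at 2026-12-31 --project-id <project_id>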
Upload a file/folder. For files: if the target path does not already exist, it will be created automatically. For folders: overwrite will need to be acknowledged. The "icapath" argument is optional.
Usage:
icav2 projectdata upload [local path] [icapath] [flags]
Flags:
--existing-sample Link to existing sample
-h, --help help for upload
--new-sample Create and link to new sample
--num-workers int number of workers to parallelize. Default calculated based on CPUs available.
--project-id string project ID to set current project context
--sample-description string Set Sample Description for new sample
--sample-id string Set Sample id of existing sample
--sample-name string Set Sample name for new sample or from existing sample
--sample-technical-tag stringArray Set Sample Technical tag for new sample
--sample-user-tag stringArray Set Sample User tag for new sample
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Example for uploading multiple files
In this example, all the fastq.gz files from source will be uploaded to target using the xargs utility.
find $source -name '*.fastq.gz' | xargs -n 1 -P 10 -I {} icav2 projectdata upload {} /$target/
Example for uploading multiple files using a CSV file
In this example we upload multiple bam files, each specified with its corresponding path in the file bam_files.csv. The files will be renamed. We are using screen in detached mode (this creates a new session without attaching to it):
while IFS=, read -r current_bam_file_name bam_path new_bam_file_name
do
screen -d -m icav2 projectdata upload ${bam_path}/${current_bam_file_name} /bam_files/${new_bam_file_name} --project-id $projectID
done <./bam_files.csv 2>./log.txt
This is the root command for actions that act on project pipelines
Usage:
icav2 projectpipelines [command]
Available Commands:
create Create a pipeline
input Retrieve input parameters of pipeline
link Link pipeline to a project
list List of pipelines for a project
start Start a pipeline
unlink Unlink pipeline from a project
Flags:
-h, --help help for projectpipelines
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines [command] --help" for more information about a command.This command creates a pipeline in the current project
Usage:
icav2 projectpipelines create [command]
Available Commands:
cwl Create a cwl pipeline
cwljson Create a cwl Json pipeline
nextflow Create a nextflow pipeline
nextflowjson Create a nextflow Json pipeline
Flags:
-h, --help help for create
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines create [command] --help" for more information about a command.icav2 projectpipelines create cwl
This command creates a CWL pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwl [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwl
--html-doc string Html documentation for the cwl pipeline
--links string links in json format
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
icav2 projectpipelines create cwljson
This command creates a CWL Json pipeline in the current project using the argument as code for the pipeline
Usage:
icav2 projectpipelines create cwljson [code] [flags]
Flags:
--category stringArray Category of the cwl pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--description string (*) Description of pipeline
-h, --help help for cwljson
--html-doc string Html documentation for the cwl pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--tool stringArray Path to the tool cwl file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--workflow string (*) Path to the workflow cwl file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
icav2 projectpipelines create nextflow
This command creates a Nextflow pipeline in the current project
Usage:
icav2 projectpipelines create nextflow [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflow
--html-doc string Html documentation for the nextflow pipeline
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--parameter string (*) Path to the parameter XML file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
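Example (placeholder file names and pipeline code; the storage size name is assumed to be one of the names returned by 'icav2 analysisstorages list', e.g. Small): create a Nextflow pipeline from a local main.nf and parameter XML file
icav2 projectpipelines create nextflow my-nextflow-pipeline --main main.nf --parameter parameters.xml --description "Demo pipeline" --storage-size Small --project-id <project_id>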
icav2 projectpipelines create nextflowjson
This command creates a Nextflow Json pipeline in the current project
Usage:
icav2 projectpipelines create nextflowjson [code] [flags]
Flags:
--category stringArray Category of the nextflow pipeline. Add flag multiple times for multiple values.
--comment string Version comments
--config string Path to the config nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--description string (*) Description of pipeline
-h, --help help for nextflowjson
--html-doc string Html documentation for the nextflow pipeline
--inputForm string (*) Path to the input form file.
--links string links in json format
--main string (*) Path to the main nextflow file. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--nextflow-version string Version of nextflow language to use.
--onRender string Path to the on render file.
--onSubmit string Path to the on submit file.
--other stringArray Path to the other nextflow file. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--otherInputForm stringArray Path to the other input form files. Add flag multiple times for multiple values. You can set a custom file name and path by adding ':filename=' and the filename with optionally the path the file should be located in.
--project-id string project ID to set current project context
--proprietary Add the flag if this pipeline is proprietary
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Retrieve input parameters of pipeline
Usage:
icav2 projectpipelines input [pipelineId] [flags]
Flags:
-h, --help help for input
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This links a pipeline to a project. Use code or id to identify the pipeline. If the code is not found, the argument is used as an id.
Usage:
icav2 projectpipelines link [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for link
--project-id string project ID to set current project context
--source-project-id string project ID from where the pipeline needs to be linked, mandatory when using pipeline code
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the pipelines for a given project
Usage:
icav2 projectpipelines list [flags]
Flags:
-h, --help help for list
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command starts a pipeline in the current project
Usage:
icav2 projectpipelines start [command]
Available Commands:
cwl Start a CWL pipeline
cwljson Start a CWL Json pipeline
nextflow Start a Nextflow pipeline
nextflowjson Start a Nextflow Json pipeline
Flags:
-h, --help help for start
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectpipelines start [command] --help" for more information about a command.icav2 projectpipelines start cwl
This command starts a CWL pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start cwl [pipeline id] or [code] [flags]
Flags:
--data-id stringArray Enter data id's as follows : dataId{optional-mount-path} . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces.
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for cwl
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
--input-json string Analysis input JSON string. JSON input works only with file-based CWL pipelines (built using code, not a graphical editor in ICA).
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--type-input string (*) Input type STRUCTURED or JSON
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
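Example (placeholder pipeline id, file id and storage size; 'reads' and 'threads' stand in for parameter codes defined by the pipeline's parameter XML): start a CWL pipeline with structured input, one input file and a single-value parameter
icav2 projectpipelines start cwl <pipeline_id> --type-input STRUCTURED --input reads:<file_id> --parameters threads:4 --user-reference my-cwl-run-01 --storage-size Small --project-id <project_id>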
icav2 projectpipelines start cwljson
This command starts a CWL Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start cwljson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for cwljson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id
matches
"fields": [
{
"id": "fieldA",
"values": [
"valueA"
]
},
{
"id": "multivalueFieldB",
"values": [
"valueB1",
"valueB2"
]
},
{
"id": "DataFieldC",
"values": [
"file.id"
]
}
]
The following example with --field and --field-data
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
matches
"fields": [
{
"id": "asection",
"values": [
"SECTION1"
]
},
{
"id": "atext",
"values": [
"this is atext text"
]
},
{
"id": "ttt",
"values": [
"tb1"
]
},
{
"id": "notallowedrole",
"values": [
"f"
]
},
{
"id": "notallowedcondition",
"values": [
"this is a not allowed text box"
]
},
{
"id": "maxagesum",
"values": [
"20"
]
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
}
],
"id": "txts1"
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
"mountPath": "/dir1/dir2"
},
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
"mountPath": "/dir3/dir4"
}
],
"id": "txts2"
}
],
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group2.0.roleForGroup2:f
"groups": [
{
"id": "group1",
"values": [
{
"values": [
{
"id": "age",
"values": [
"80"
]
},
{
"id": "role",
"values": [
"f"
]
},
{
"id": "conditions",
"values": [
"cancer",
"covid"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
},
{
"values": [
{
"id": "age",
"values": [
"20"
]
},
{
"id": "role",
"values": [
"m"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
}
]
},
{
"id": "group2",
"values": [
{
"values": [
{
"id": "roleForGroup2",
"values": [
"f"
]
}
]
}
]
}
]
icav2 projectpipelines start nextflow
This command starts a Nextflow pipeline for a given pipeline id, or for a pipeline code from the current project.
Usage:
icav2 projectpipelines start nextflow [pipeline id] or [code] [flags]
Flags:
--data-parameters stringArray Enter data-parameters as follows : parameterCode:referenceDataId . Add flag multiple times for multiple values.
-h, --help help for nextflow
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--input stringArray Enter inputs as follows : parametercode:dataId,dataId{optional-mount-path},dataId,... . Add flag multiple times for multiple values. Mount path is optional and can be absolute and relative and can not contain curly braces and commas.
--output-parent-folder string The id of the folder in which the output folder should be created.
--parameters stringArray Enter single-value parameters as code:value. Enter multi-value parameters as code:"'value1','value2','value3'". To add multiple values, add the flag multiple times.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
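Example (placeholder pipeline id, file id and storage size; 'fastqs' stands in for a parameter code defined by the pipeline's parameter XML): start a Nextflow pipeline with one input file
icav2 projectpipelines start nextflow <pipeline_id> --input fastqs:<file_id> --user-reference my-nextflow-run-01 --storage-size Small --project-id <project_id>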
icav2 projectpipelines start nextflowjson
This command starts a Nextflow Json pipeline for a given pipeline id, or for a pipeline code from the current project. See ICA CLI documentation for more information (https://help.ica.illumina.com/).
Usage:
icav2 projectpipelines start nextflowjson [pipeline id] or [code] [flags]
Flags:
--field stringArray Fields. Add flag multiple times for multiple fields. --field fieldA:value --field multivalueFieldB:value1,value2
--field-data stringArray Data fields. Add flag multiple times for multiple fields. --field-data fieldA:fil.id --field-data multivalueFieldB:fil.id1,fil.id2
--group stringArray Groups. Add flag multiple times for multiple fields in the group. --group groupA.index1.multivalueFieldA:value1,value2 --group groupA.index1.fieldB:value --group groupB.index1.fieldA:value --group groupB.index2.fieldA:value
--group-data stringArray Data groups. Add flag multiple times for multiple fields in the group. --group-data groupA.index1.multivalueFieldA:fil.id1,fil.id2 --group-data groupA.index1.fieldB:fil.id --group-data groupB.index1.fieldA:fil.id --group-data groupB.index2.fieldA:fil.id
-h, --help help for nextflowjson
--idempotency-key string Add a maximum 255 character idempotency key to prevent duplicate requests. The response is retained for 7 days so the key must be unique during that timeframe.
--output-parent-folder string The id of the folder in which the output folder should be created.
--project-id string project ID to set current project context
--reference-tag stringArray Reference tag. Add flag multiple times for multiple values.
--storage-size string (*) Name of the storage size. Can be fetched using the command 'icav2 analysisstorages list'.
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-reference string (*) User reference
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Field definition
A field can only have values (--field) and a data field can only have datavalues (--field-data). To create multiple fields or data fields, you have to repeat the flag.
For example
--field fieldA:valueA --field multivalueFieldB:valueB1,valueB2 --field-data DataFieldC:file.id
matches
"fields": [
{
"id": "fieldA",
"values": [
"valueA"
]
},
{
"id": "multivalueFieldB",
"values": [
"valueB1",
"valueB2"
]
},
{
"id": "DataFieldC",
"values": [
"file.id"
]
}
]
The following example with --field and --field-data
--field asection:SECTION1
--field atext:"this is atext text"
--field ttt:tb1
--field notallowedrole:f
--field notallowedcondition:"this is a not allowed text box"
--field maxagesum:20
--field-data txts1:fil.ade9bd0b6113431a2de108d9fe48a3d8
--field-data txts2:fil.ade9bd0b6113431a2de108d9fe48a3d7{/dir1/dir2},fil.ade9bd0b6113431a2de108d9fe48a3d6{/dir3/dir4}
matches
"fields": [
{
"id": "asection",
"values": [
"SECTION1"
]
},
{
"id": "atext",
"values": [
"this is atext text"
]
},
{
"id": "ttt",
"values": [
"tb1"
]
},
{
"id": "notallowedrole",
"values": [
"f"
]
},
{
"id": "notallowedcondition",
"values": [
"this is a not allowed text box"
]
},
{
"id": "maxagesum",
"values": [
"20"
]
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d8"
}
],
"id": "txts1"
},
{
"dataValues": [
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d7",
"mountPath": "/dir1/dir2"
},
{
"dataId": "fil.ade9bd0b6113431a2de108d9fe48a3d6",
"mountPath": "/dir3/dir4"
}
],
"id": "txts2"
}
],
Group definition
A group will only have values (--group) and a data group can only have datavalues (--group-data). Add flags multiple times for multiple groups and fields in the group.
--group group1.0.age:80
--group group1.0.role:f
--group group1.0.conditions:cancer,covid
--group-data group1.0.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group1.1.age:20
--group group1.1.role:m
--group-data group1.1.info:fil.a4f17ecf13ca4f692fd008d9fe48a3d7
--group group2.0.roleForGroup2:f
"groups": [
{
"id": "group1",
"values": [
{
"values": [
{
"id": "age",
"values": [
"80"
]
},
{
"id": "role",
"values": [
"f"
]
},
{
"id": "conditions",
"values": [
"cancer",
"covid"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
},
{
"values": [
{
"id": "age",
"values": [
"20"
]
},
{
"id": "role",
"values": [
"m"
]
},
{
"dataValues": [
{
"dataId": "fil.a4f17ecf13ca4f692fd008d9fe48a3d7"
}
],
"id": "info"
}
]
}
]
},
{
"id": "group2",
"values": [
{
"values": [
{
"id": "roleForGroup2",
"values": [
"f"
]
}
]
}
]
}
]
This unlinks a pipeline from a project. Use code or id to identify the pipeline. If the code is not found, the argument is used as an id.
Usage:
icav2 projectpipelines unlink [pipeline code] or [pipeline id] [flags]
Flags:
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on projects
Usage:
icav2 projects [command]
Available Commands:
create Create a project
enter Enter project context
exit Exit project context
get Get details of a project
list List projects
Flags:
-h, --help help for projects
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projects [command] --help" for more information about a command.This command creates a project.
Usage:
icav2 projects create [projectname] [flags]
Flags:
--billing-mode string Billing mode, defaults to PROJECT (default "PROJECT")
--data-sharing Indicates whether the data and samples created in this project can be linked to other Projects. This flag needs no value, adding it sets the value to true.
-h, --help help for create
--info string Info about the project
--metadata-model string Id of the metadata model.
--owner string Owner of the project. Default is the current user
--region string Region of the project. When not specified: takes a default when there is only 1 region, else a choice will be given.
--short-descr string Short description of the project
--storage-bundle string Id of the storage bundle. When not specified: takes a default when there is only 1 bundle, else a choice will be given.
--storage-config string An optional storage configuration id to have self managed storage.
--storage-config-sub-folder string Required when specifying a storageConfigurationId. The subfolder determines the object prefix of your self managed storage.
--technical-tag stringArray Technical tags for this project. Add flag multiple times for multiple values.
--user-tag stringArray User tags for this project. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
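Example (placeholder names; region and storage bundle are left to their defaults, which apply when only one choice exists): create a project with a short description and a user tag
icav2 projects create MyProject --short-descr "Demo project" --user-tag demo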
This command sets the project context for future commands
Usage:
icav2 projects enter [projectname] or [project id] [flags]
Flags:
-h, --help help for enter
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command switches the user back to their personal context
Usage:
icav2 projects exit [flags]
Flags:
-h, --help help for exit
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of the current project. If no project id is given, the one from the config file is used.
Usage:
icav2 projects get [project id] [flags]
Flags:
-h, --help help for get
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the projects for the current user. Page-offset can only be used in combination with sort-by. Sorting can be done on
- name
- shortDescription
- information
Usage:
icav2 projects list [flags]
Flags:
-h, --help help for list
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--sort-by string specifies the order to list items
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on project samples
Usage:
icav2 projectsamples [command]
Available Commands:
complete Set sample to complete
create Create a sample for a project
delete Delete a sample for a project
get Get details of a sample
link Link data to a sample for a project
list List of samples for a project
listdata List data from given sample
unlink Unlink data from a sample for a project
update Update a sample for a project
Flags:
-h, --help help for projectsamples
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 projectsamples [command] --help" for more information about a command.The sample status will be set to 'Available' and a sample completed event will be triggered as well.
Usage:
icav2 projectsamples complete [sampleId] [flags]
Flags:
-h, --help help for complete
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command creates a sample for a project. It takes the name of the sample as argument.
Usage:
icav2 projectsamples create [name] [flags]
Flags:
--description string Description
-h, --help help for create
--project-id string project ID to set current project context
--technical-tag stringArray Technical tag. Add flag multiple times for multiple values.
--user-tag stringArray User tag. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
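Example (placeholder names and project id): create a sample with a description and a user tag in the current project
icav2 projectsamples create Sample_01 --description "Tumor sample" --user-tag cohort-A --project-id <project_id>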
This command deletes a sample from a project. The flags determine how the sample is deleted; only one flag can be used.
Usage:
icav2 projectsamples delete [sampleId] [flags]
Flags:
--deep Delete the entire sample: sample and linked files will be deleted from your project.
-h, --help help for delete
--mark Mark a sample as deleted.
--unlink Unlinking the sample: sample is deleted and files are unlinked and available again for linking to another sample.
--with-input Delete the sample as well as its input data: sample is deleted from your project, the input files and pipeline output folders are still present in the project but will not be available for linking to a new sample.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command fetches the details of a sample using the argument as a name; if nothing is found, the argument is used as an id (uuid).
Usage:
icav2 projectsamples get [sample id] or [name] [flags]
Flags:
-h, --help help for get
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command adds data to a project sample. The argument is the id of the project sample.
Usage:
icav2 projectsamples link [sampleId] [flags]
Flags:
--data-id stringArray (*) Data id of the data that needs to be linked to the project sample. Add flag multiple times for multiple values.
-h, --help help for link
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the samples for a given project.
Usage:
icav2 projectsamples list [flags]
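Example (illustrative only; <projectId> and the tag value are placeholders):
  icav2 projectsamples list --project-id <projectId> --user-tag demo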
Flags:
-h, --help help for list
--include-deleted Include the deleted samples in the list. Default set to false.
--project-id string project ID to set current project context
--technical-tag stringArray Technical tags to filter on. Add flag multiple times for multiple values.
--user-tag stringArray User tags to filter on. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command lists the data for a given sample. It supports only offset-based pagination, and default sorting is done on timeCreated. Sorting can be done on:
- timeCreated
- timeModified
- name
- path
- fileSizeInBytes
- status
- format
- dataType
- willBeArchivedAt
- willBeDeletedAt
Usage:
icav2 projectsamples listdata [sampleId] [path] [flags]
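Example (illustrative only; <sampleId>, <projectId>, and <fileName> are placeholders):
  icav2 projectsamples listdata <sampleId> --project-id <projectId> --data-type FILE --file-name <fileName> --match-mode FUZZY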
Flags:
--data-type string Data type. Available values : FILE or FOLDER
--file-name stringArray The filenames to filter on. The filenameMatchMode-parameter determines how the filtering is done. Add flag multiple times for multiple values.
-h, --help help for listdata
--match-mode string Match mode for the file name. Available values : EXACT (default), EXCLUDE, FUZZY.
--max-items int maximum number of items to return, the limit and default is 1000
--page-offset int Page offset, only used in combination with sort-by. Offset-based pagination has a result limit of 200K rows and does not guarantee unique results across pages
--page-size int32 Page size, only used in combination with sort-by. The amount of rows to return. Use in combination with the offset or cursor parameter to get subsequent results. Default and max value of pagesize=1000 (default 1000)
--parent-folder Indicates that the given argument is the path of the parent folder. All children are selected for listing, not the folder itself. This flag needs no value; adding it sets the value to true.
--project-id string project ID to set current project context
--sort-by string specifies the order to list items (default "timeCreated Desc")
--status stringArray Add the status of the data. Available values : PARTIAL, AVAILABLE, ARCHIVING, ARCHIVED, UNARCHIVING, DELETING. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command removes data from a project sample. The argument is the id of the project sample.
Usage:
icav2 projectsamples unlink [sampleId] [flags]
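Example (illustrative only; <sampleId>, <projectId>, and <dataId> are placeholders):
  icav2 projectsamples unlink <sampleId> --project-id <projectId> --data-id <dataId>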
Flags:
--data-id stringArray (*) Data id of the data that will be removed from the project sample. Add flag multiple times for multiple values.
-h, --help help for unlink
--project-id string project ID to set current project context
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command updates a sample for a project. The name, description, user tags, and technical tags can be updated.
Usage:
icav2 projectsamples update [sampleId] [flags]
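Example (illustrative only; <sampleId>, <projectId>, and the name and tag values are placeholders):
  icav2 projectsamples update <sampleId> --project-id <projectId> --name RenamedSample --add-user-tag validated --remove-user-tag draft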
Flags:
--add-tech-tag stringArray Tech tag to add. Add flag multiple times for multiple values.
--add-user-tag stringArray User tag to add. Add flag multiple times for multiple values.
-h, --help help for update
--name string Name
--project-id string project ID to set current project context
--remove-tech-tag stringArray Tech tag to remove. Add flag multiple times for multiple values.
--remove-user-tag stringArray User tag to remove. Add flag multiple times for multiple values.
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on regions.
Usage:
icav2 regions [command]
Available Commands:
list list of regions
Flags:
-h, --help help for regions
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 regions [command] --help" for more information about a command.This command lists all the regions
Usage:
icav2 regions list [flags]
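Example (illustrative only; shown with the global output-format flag, assuming json is an accepted value alongside the default table):
  icav2 regions list -o json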
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage bundles.
Usage:
icav2 storagebundles [command]
Available Commands:
list list of storage bundles
Flags:
-h, --help help for storagebundles
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storagebundles [command] --help" for more information about a command.This command lists all the storage bundles id's
Usage:
icav2 storagebundles list [flags]
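Example (illustrative only; <apiKey> is a placeholder passed via the documented global -k flag):
  icav2 storagebundles list -k <apiKey>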
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on storage configurations.
Usage:
icav2 storageconfigurations [command]
Available Commands:
list list of storage configurations
Flags:
-h, --help help for storageconfigurations
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 storageconfigurations [command] --help" for more information about a command.This command lists all the storage configurations
Usage:
icav2 storageconfigurations list [flags]
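Example (illustrative only):
  icav2 storageconfigurations list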
Flags:
-h, --help help for list
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This is the root command for actions that act on tokens.
Usage:
icav2 tokens [command]
Available Commands:
create Create a JWT token
refresh Refresh a JWT token from basic authentication
Flags:
-h, --help help for tokens
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
Use "icav2 tokens [command] --help" for more information about a command.This command creates a JWT token from the API key.
Usage:
icav2 tokens create [flags]
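Example (illustrative only; <apiKey> is a placeholder for an API key supplied through the global -k flag):
  icav2 tokens create -k <apiKey>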
Flags:
-h, --help help for create
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command refreshes a JWT token from basic authentication with grant type JWT-bearer; the token to refresh is set with the -t flag.
Usage:
icav2 tokens refresh [flags]
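Example (illustrative only; <existingJwt> is a placeholder for the token passed through the global -t flag):
  icav2 tokens refresh -t <existingJwt>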
Flags:
-h, --help help for refresh
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service
This command displays the version of this application.
Usage:
icav2 version [flags]
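Example (illustrative only):
  icav2 version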
Flags:
-h, --help help for version
Global Flags:
-t, --access-token string JWT used to call rest service
-o, --output-format string output format (default "table")
-s, --server-url string server url to direct commands
-k, --x-api-key string api key used to call rest service