Illumina Connected Annotations
Illumina Connected Annotations, also known as Illumina Annotation Engine (IAE) or Nirvana, provides translational research-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.
Users can annotate VCF files by enabling annotation on the DRAGEN command-line or by running the standalone tool.
The input to Illumina Connected Annotations is a VCF file, and the output is a structured JSON representation of all annotation and sample information (as extracted from the VCF). Illumina Connected Annotations supports multiple alternate alleles and multiple samples.
NOTE: Before running Annotations, the external data sources, gene models, and reference genome need to be downloaded from our annotation server.
By default, the Annotations binaries are located in the /opt/dragen/<VERSION>/share/nirvana directory. This directory includes two files: the Downloader and Nirvana (Illumina Connected Annotations).
Limitations
Illumina Connected Annotations and the Downloader are compatible with the following platforms:
CentOS 7, Oracle Linux 8, and other modern Linux distributions using x64 processors.
Download Data Files
For more up-to-date and detailed documentation, see Illumina Connected Annotations Download Data.
To store annotation data files, create a top-level directory. The created directory contains three subdirectories:
Cache contains gene models.
SupplementaryAnnotation contains external data sources like dbSNP and gnomAD.
References contains the reference genome.
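Before annotating, it can help to confirm that the data directory has this layout. The following is a minimal sketch, assuming the data lives in ~/Data as in the examples in this section. The function name check_annotation_data is hypothetical and is not part of the Illumina tools.

```shell
# Hypothetical helper (not part of the Illumina tools): verify that a
# data directory contains the three expected subdirectories.
check_annotation_data() {
    data_dir="$1"
    missing=0
    for sub in Cache SupplementaryAnnotation References; do
        if [ ! -d "$data_dir/$sub" ]; then
            echo "Missing subdirectory: $data_dir/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: the data directory is ~/Data.
if check_annotation_data "$HOME/Data"; then
    echo "Annotation data layout looks complete."
else
    echo "Annotation data is incomplete; run the Downloader first."
fi
```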
The Downloader uses the following command-line options.
Option | Value | Example | Description |
---|---|---|---|
--ga | GRCh37, GRCh38, or Both | GRCh38 | Genome assembly |
--out | output directory | ~/Data | Top-level output directory |
Download data files as follows.
To create a data directory, enter the following command. This example creates the Data directory in your home directory.
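A minimal sketch of this step, assuming the directory is named Data and lives in your home directory as described above:

```shell
# Create the top-level data directory (-p: no error if it already exists).
mkdir -p "$HOME/Data"
```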
Download the files for a genome assembly. This example downloads the genome assembly GRCh38.
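The download step can be sketched as follows, using only the --ga and --out options documented above. The variable names (NIRVANA_DIR, DATA_DIR, DOWNLOADER, DOWNLOAD_CMD) are illustrative assumptions, and <VERSION> in the default path must be replaced with your installed DRAGEN version. If the Downloader is not found at that path, the sketch only prints the command it would have run.

```shell
# Assumption: default DRAGEN install location from this page; replace
# <VERSION> with your installed DRAGEN version (or set NIRVANA_DIR).
NIRVANA_DIR="${NIRVANA_DIR:-/opt/dragen/<VERSION>/share/nirvana}"
DATA_DIR="${DATA_DIR:-$HOME/Data}"
DOWNLOADER="$NIRVANA_DIR/Downloader"

# --ga:  genome assembly (GRCh37, GRCh38, or Both)
# --out: top-level output directory
set -- --ga GRCh38 --out "$DATA_DIR"
DOWNLOAD_CMD="$DOWNLOADER $*"

if [ -x "$DOWNLOADER" ]; then
    "$DOWNLOADER" "$@"
else
    echo "Downloader not found; the command would be: $DOWNLOAD_CMD"
fi
```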
You can use the same command to resynchronize the data sources with the Illumina Connected Annotations servers, including the following actions:
Remove obsolete files, such as old versions of data sources, from the output directory.
Download newer files.
The following is the created output:
NOTE: If the DRAGEN server does not have an internet connection, the Downloader executable can be copied to a non-DRAGEN server that is connected to the internet to download the annotation data. Once the download has completed, the annotation data can then be copied locally to the DRAGEN server for subsequent annotation.
Annotate Files (via DRAGEN command-line)
To automatically annotate output VCFs, please add the following command-line arguments:
Argument | Example | Description |
---|---|---|
--enable-variant-annotation | true | enables annotation if the pipeline supports it |
--variant-annotation-data | /path/to/your/NirvanaData | the location where you downloaded the Nirvana annotation files |
--variant-annotation-assembly | GRCh38 | the genome assembly - either GRCh37 or GRCh38. hg19 is handled properly by using GRCh37 |
All the command-line arguments shown together:
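As a sketch, the three arguments can be assembled with the example values from the table. The variable names are illustrative, the data path is the placeholder from the table, and the --name=value form is one accepted way of writing DRAGEN options. The remaining pipeline arguments are not shown and depend on your workflow.

```shell
# Assumption: annotation data was downloaded to this placeholder location.
ANNOTATION_DATA="/path/to/your/NirvanaData"

# The three annotation arguments, using the example values from the table.
ANNOTATION_ARGS="--enable-variant-annotation=true \
--variant-annotation-data=$ANNOTATION_DATA \
--variant-annotation-assembly=GRCh38"

# Append these to your existing DRAGEN command line, for example:
#   dragen <your other pipeline arguments> $ANNOTATION_ARGS
echo "$ANNOTATION_ARGS"
```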
Annotate Files (via standalone Illumina Connected Annotations tool)
If you have not generated a VCF file, download a VCF file using the following command.
Annotations supports uncompressed VCF files and bgzip-compressed VCF files. VCF files that have been compressed with standard gzip are not supported.
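Because bgzip and standard gzip files usually share the .gz extension, it can be useful to check which one you have. The following sketch inspects the file header: a BGZF (bgzip) block is a gzip member with the extra-field flag set and a BC subfield identifier. The helper name is_bgzip is hypothetical, and the check assumes the BC subfield comes first in the extra field, which is how bgzip writes it.

```shell
# Hypothetical helper: succeed if the file starts with a BGZF (bgzip) header.
# Bytes 0-3 are 1f 8b 08 04 (gzip magic, deflate, FLG = extra field present);
# bytes 12-13 are 42 43 ('B' 'C', the BGZF subfield identifier).
is_bgzip() {
    header="$(od -An -tx1 -N14 "$1" | tr -d '[:space:]')"
    case "$header" in
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: HiSeq.10000.vcf.gz is the example input used in this section.
if [ -f HiSeq.10000.vcf.gz ]; then
    if is_bgzip HiSeq.10000.vcf.gz; then
        echo "bgzip-compressed: supported"
    else
        echo "not a bgzip file: recompress with bgzip before annotating"
    fi
fi
```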
To annotate the file, enter the following command:
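A sketch of the annotation command, built from the options and example values documented in this section. The variable names are illustrative assumptions, the paths assume the data was downloaded to ~/Data, and <VERSION> must be replaced with your installed DRAGEN version. If the Nirvana binary is not found at that path, the sketch only prints the command it would have run.

```shell
# Assumption: default DRAGEN install location from this page; replace
# <VERSION> with your installed DRAGEN version (or set NIRVANA_DIR).
NIRVANA_DIR="${NIRVANA_DIR:-/opt/dragen/<VERSION>/share/nirvana}"
DATA_DIR="${DATA_DIR:-$HOME/Data}"
NIRVANA="$NIRVANA_DIR/Nirvana"

# Options and example values as listed in the options table.
set -- \
    -c "$DATA_DIR/Cache/" \
    --sd "$DATA_DIR/SupplementaryAnnotation/GRCh38" \
    -r "$DATA_DIR/References/Homo_sapiens.GRCh38.Nirvana.dat" \
    -i HiSeq.10000.vcf.gz \
    -o HiSeq.10000
ANNOTATE_CMD="$NIRVANA $*"

if [ -x "$NIRVANA" ]; then
    "$NIRVANA" "$@"
else
    echo "Nirvana not found; the command would be: $ANNOTATE_CMD"
fi
```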
The following are the available command line options:
Option | Value | Example | Description |
---|---|---|---|
-c | directory | ~/Data/Cache/ | Cache directory |
-r | path | ~/Data/References/Homo_sapiens.GRCh38.Nirvana.dat | Reference file path |
--sd | directory | ~/Data/SupplementaryAnnotation/GRCh38 | Supplementary annotation directory |
-i | path | HiSeq.10000.vcf.gz | Input VCF path |
-o | prefix | HiSeq.10000 | Output path prefix |
Using the example above, Annotations generates an output file called HiSeq.10000.json.gz.
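To take a quick look at the result, the compressed JSON can be streamed through standard tools. This is a minimal sketch, assuming the output can be decompressed by gzip, as the .gz extension suggests. The helper name preview_json is hypothetical.

```shell
# Hypothetical helper: print the first N bytes (default 500) of a
# gzip-compatible compressed JSON file.
preview_json() {
    gzip -dc "$1" | head -c "${2:-500}"
}

# Assumption: the annotation step above wrote HiSeq.10000.json.gz
# to the current directory.
if [ -f HiSeq.10000.json.gz ]; then
    preview_json HiSeq.10000.json.gz
fi
```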
JSON Output File
Annotations produces an output file in JSON format. Please refer to Illumina Connected Annotations JSON for a detailed description of the JSON file.
Version History
Annotations binaries have been included with DRAGEN since v3.5. The table below indicates which version of the Annotations binaries was included with each DRAGEN release, along with its AI annotation capabilities.
The Annotations binaries distributed with DRAGEN cannot be changed. Newer versions of Annotations are backward compatible and can therefore annotate output files from older DRAGEN releases.
DRAGEN version(s) | Annotations version | AI annotations |
---|---|---|
4.3 | 3.23 | spliceAI, primateAI3D |
3.9, 3.10, 4.0, 4.1, 4.2 | 3.16.1 | spliceAI, primateAI |
3.8 | 3.14 | spliceAI, primateAI |
3.6, 3.7 | 3.9.0 | spliceAI, primateAI |
3.5 | 3.6.0 | spliceAI, primateAI |