Illumina Connected Annotations, also known as Illumina Annotation Engine (IAE) or Nirvana, provides translational research-grade annotation of genomic variants, including SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.
You can annotate VCF files either by enabling annotation on the DRAGEN command line or by running the stand-alone tool.
The input to Illumina Connected Annotations is a VCF file, and the output is a structured JSON representation of all annotation and sample information extracted from the VCF. Illumina Connected Annotations handles VCFs with multiple alternate alleles and multiple samples.
NOTE: Before running Annotations, the external data sources, gene models, and reference genome need to be downloaded from our annotation server.
By default, the Annotations binaries are located in the /opt/dragen/<VERSION>/share/nirvana directory. This directory includes two executables: the Downloader and Nirvana (Illumina Connected Annotations).
Illumina Connected Annotations and the Downloader are compatible with CentOS 7, Oracle Linux 8, and other modern Linux distributions running on x64 processors.
For more up-to-date and detailed documentation, see the Illumina Connected Annotations documentation.
Download Data
To store annotation data files, create a top-level directory. After the download, this directory contains three subdirectories:
Cache contains gene models.
SupplementaryAnnotation contains external data sources like dbSNP and gnomAD.
References contains the reference genome.
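A quick way to confirm that a download produced the expected layout is to check for the three subdirectories listed above. The following minimal Python sketch does only that; it does not validate the files inside them, and the helper name missing_subdirs is hypothetical.

```python
from pathlib import Path

# Subdirectories the data directory is expected to contain after a download.
EXPECTED_SUBDIRS = ("Cache", "SupplementaryAnnotation", "References")


def missing_subdirs(data_dir):
    """Return the expected subdirectories that are absent from data_dir."""
    root = Path(data_dir).expanduser()
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("~/Data")  # example location used in this guide
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("All expected subdirectories are present.")
```

An empty list means all three subdirectories exist; it says nothing about whether their contents are current.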
The Downloader uses the following command-line options.
Download data files as follows.
To create a data directory, enter the following command. This example creates the Data directory in your home directory.
Download the files for a genome assembly. This example downloads the genome assembly GRCh38.
You can use the same command to resynchronize the data sources with the Illumina Connected Annotations servers. Resynchronizing performs the following actions:
Remove obsolete files, such as old versions of data sources, from the output directory.
Download newer files.
The Downloader creates the following output:
NOTE: If the DRAGEN server does not have an internet connection, copy the Downloader executable to an internet-connected non-DRAGEN server and download the annotation data there. After the download completes, copy the annotation data to the DRAGEN server for subsequent annotation.
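The download step can be scripted. The following minimal Python sketch only assembles the Downloader argument list from the --ga and --out options; it assumes the default install directory and an executable named Downloader inside it, keeps <VERSION> as a placeholder, and the helper name downloader_command is hypothetical. Because resynchronizing uses the same command, rerunning the same argument list is how an existing data directory would be updated.

```python
import os
import shlex

# Values accepted by --ga according to the Downloader options table.
VALID_ASSEMBLIES = ("GRCh37", "GRCh38", "Both")


def downloader_command(dragen_version, assembly, out_dir):
    """Build the Downloader argument list from the --ga and --out options.

    Assumption: the executable is <install dir>/Downloader in the default
    /opt/dragen/<VERSION>/share/nirvana directory.
    """
    if assembly not in VALID_ASSEMBLIES:
        raise ValueError(f"--ga must be one of {VALID_ASSEMBLIES}, got {assembly!r}")
    exe = f"/opt/dragen/{dragen_version}/share/nirvana/Downloader"
    return [exe, "--ga", assembly, "--out", os.path.expanduser(out_dir)]


if __name__ == "__main__":
    cmd = downloader_command("<VERSION>", "GRCh38", "~/Data")
    print(shlex.join(cmd))
    # To execute it (also resynchronizes an existing directory):
    # subprocess.run(cmd, check=True)
```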
To automatically annotate the output VCFs of a DRAGEN run, add the following arguments to the DRAGEN command line:
All the command-line arguments shown together:
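The annotation arguments are appended to an existing DRAGEN command line. As a minimal sketch, the hypothetical Python helper below (not part of DRAGEN) assembles them and maps hg19 to GRCh37, following the description of --variant-annotation-assembly; /path/to/your/NirvanaData is the placeholder from the argument table.

```python
# Assumption based on the argument table: hg19 data is annotated as GRCh37.
# No other assembly aliases are handled.
ASSEMBLY_ALIASES = {"GRCh37": "GRCh37", "hg19": "GRCh37", "GRCh38": "GRCh38"}


def dragen_annotation_args(data_dir, assembly):
    """Return the extra DRAGEN arguments that turn on variant annotation."""
    if assembly not in ASSEMBLY_ALIASES:
        raise ValueError(f"unsupported assembly: {assembly!r}")
    return [
        "--enable-variant-annotation", "true",
        "--variant-annotation-data", data_dir,
        "--variant-annotation-assembly", ASSEMBLY_ALIASES[assembly],
    ]


if __name__ == "__main__":
    # Append the printed arguments to your existing dragen command line.
    print(" ".join(dragen_annotation_args("/path/to/your/NirvanaData", "hg19")))
```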
If you do not have a VCF file, download an example VCF file using the following command.
Annotations supports uncompressed and bgzip-compressed VCF files. VCF files compressed with standard gzip are not supported.
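Standard gzip and bgzip output share the same magic bytes, so a .gz extension alone does not tell them apart. The BGZF format written by bgzip marks each block with a gzip extra field whose subfield identifier is "BC". The minimal Python sketch below classifies a file from its first 18 bytes; it is a heuristic that inspects only the first block and does not validate VCF content, and the helper names are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"


def classify_header(header):
    """Classify leading bytes as 'bgzip', 'gzip', or 'uncompressed'.

    Anything without the gzip magic bytes is treated as uncompressed.
    """
    if header[:2] != GZIP_MAGIC:
        return "uncompressed"
    is_bgzf = (
        len(header) >= 18
        and header[2] == 8                  # CM: deflate
        and (header[3] & 0x04) != 0         # FLG: FEXTRA bit set
        and header[12:14] == b"BC"          # extra subfield ID (SI1, SI2)
        and header[14:16] == b"\x02\x00"    # SLEN = 2 (little-endian)
    )
    return "bgzip" if is_bgzf else "gzip"


def vcf_compression(path):
    """Read the first 18 bytes of a file and classify its compression."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(18))


def is_supported_vcf(path):
    """Uncompressed and bgzip VCFs are accepted; plain gzip is not."""
    return vcf_compression(path) != "gzip"
```

For example, is_supported_vcf("HiSeq.10000.vcf.gz") returns False only when the file was compressed with plain gzip.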
To annotate the file, enter the following command:
The following are the available command line options:
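The option table can be turned into a command line programmatically. The minimal Python sketch below assumes the data directory layout created by the Downloader and the file naming of the GRCh38 example (References/Homo_sapiens.<assembly>.Nirvana.dat and SupplementaryAnnotation/<assembly>); the executable path is a parameter, and the helper names are hypothetical. It only builds the argument list, which could then be passed to subprocess.run.

```python
import os


def nirvana_command(nirvana_exe, data_dir, assembly, vcf_path, out_prefix):
    """Build an Annotations (Nirvana) argument list for one VCF.

    Assumes the Downloader's layout and the GRCh38 example's file naming.
    """
    data = os.path.expanduser(data_dir)
    return [
        nirvana_exe,
        "-c", os.path.join(data, "Cache"),
        "-r", os.path.join(data, "References", f"Homo_sapiens.{assembly}.Nirvana.dat"),
        "--sd", os.path.join(data, "SupplementaryAnnotation", assembly),
        "-i", vcf_path,
        "-o", out_prefix,
    ]


def expected_output(out_prefix):
    """The output file is the -o prefix with a .json.gz suffix."""
    return f"{out_prefix}.json.gz"


if __name__ == "__main__":
    cmd = nirvana_command(
        "/opt/dragen/<VERSION>/share/nirvana/Nirvana",  # assumed executable name
        "~/Data", "GRCh38", "HiSeq.10000.vcf.gz", "HiSeq.10000",
    )
    print(" ".join(cmd))
    print("Output:", expected_output("HiSeq.10000"))
    # To run it: subprocess.run(cmd, check=True)
```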
Using the example above, Annotations generates an output file called HiSeq.10000.json.gz.
Annotations produces an output file in JSON format. For a detailed description of the JSON file, see the Illumina Connected Annotations JSON documentation.
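For a quick sanity check of an output file, the JSON can be loaded with the Python standard library. This sketch assumes the top-level object has a "positions" array whose entries carry "chromosome" and "position" fields; confirm the field names against the JSON documentation. json.load reads the whole file into memory, which may be impractical for large outputs, and the helper names are hypothetical.

```python
import gzip
import json
import sys


def load_annotations(json_gz_path):
    """Read a gzip-compressed Annotations JSON file into memory."""
    with gzip.open(json_gz_path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def annotated_sites(doc):
    """Return (chromosome, position) pairs from the assumed "positions" array."""
    return [(p["chromosome"], p["position"]) for p in doc.get("positions", [])]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        doc = load_annotations(sys.argv[1])  # e.g. HiSeq.10000.json.gz
        sites = annotated_sites(doc)
        print(f"{len(sites)} annotated positions")
        for chrom, pos in sites[:5]:
            print(chrom, pos)
```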
Annotations binaries have been included with DRAGEN since v3.5. The table below lists the Annotations version bundled with each DRAGEN release, along with its AI annotation capabilities.
The Annotations binaries distributed with DRAGEN cannot be changed. Newer versions of Annotations are backward compatible and can therefore annotate output files from older DRAGEN releases.
Downloader options:

Option | Value | Example | Description |
---|---|---|---|
--ga | GRCh37, GRCh38, or Both | GRCh38 | Genome assembly |
--out | output directory | ~/Data | Top-level output directory |

DRAGEN annotation arguments:

Argument | Example | Description |
---|---|---|
--enable-variant-annotation | true | Enables annotation if the pipeline supports it |
--variant-annotation-data | /path/to/your/NirvanaData | Location where you downloaded the Nirvana annotation files |
--variant-annotation-assembly | GRCh38 | Genome assembly, either GRCh37 or GRCh38. For hg19, use GRCh37. |

Annotations options:

Option | Value | Example | Description |
---|---|---|---|
-c | directory | ~/Data/Cache/ | Cache directory |
-r | file | ~/Data/References/Homo_sapiens.GRCh38.Nirvana.dat | Reference genome file |
--sd | directory | ~/Data/SupplementaryAnnotation/GRCh38 | Supplementary annotation directory |
-i | path | HiSeq.10000.vcf.gz | Input VCF path |
-o | prefix | HiSeq.10000 | Output path prefix |

Annotations versions included with DRAGEN:

DRAGEN version(s) | Annotations version | AI annotations |
---|---|---|
4.3 | 3.23 | spliceAI, primateAI3D |
3.9, 3.10, 4.0, 4.1, 4.2 | 3.16.1 | spliceAI, primateAI |
3.8 | 3.14 | spliceAI, primateAI |
3.6, 3.7 | 3.9.0 | spliceAI, primateAI |
3.5 | 3.6.0 | spliceAI, primateAI |
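The version table can also be expressed as a lookup, for example to check whether a stand-alone Annotations release is at least as new as the one bundled with a given DRAGEN version; per the backward-compatibility note, such a release can annotate that DRAGEN version's output. The sketch below transcribes the table, compares versions numerically component by component, and makes no claim about releases older than the bundled one. The helper names are hypothetical.

```python
# Transcribed from the version table: DRAGEN releases -> (Annotations version, AI annotations).
_ROWS = [
    (("4.3",), "3.23", ("spliceAI", "primateAI3D")),
    (("3.9", "3.10", "4.0", "4.1", "4.2"), "3.16.1", ("spliceAI", "primateAI")),
    (("3.8",), "3.14", ("spliceAI", "primateAI")),
    (("3.6", "3.7"), "3.9.0", ("spliceAI", "primateAI")),
    (("3.5",), "3.6.0", ("spliceAI", "primateAI")),
]
BUNDLED = {dragen: (ica, ai) for dragens, ica, ai in _ROWS for dragen in dragens}


def _version_tuple(version):
    """Turn '3.16.1' into (3, 16, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bundled_annotations(dragen_version):
    """Look up a DRAGEN release such as '4.2' or '4.2.4'; None if not in the table."""
    major_minor = ".".join(dragen_version.split(".")[:2])
    return BUNDLED.get(major_minor)


def at_least_bundled(annotations_version, dragen_version):
    """True if annotations_version is the same as or newer than the bundled one."""
    row = bundled_annotations(dragen_version)
    if row is None:
        raise ValueError(f"DRAGEN {dragen_version} is not in the table")
    return _version_tuple(annotations_version) >= _version_tuple(row[0])
```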