
Illumina Connected Annotations

Illumina Connected Annotations, also known as Illumina Annotation Engine (IAE) or Nirvana, provides translational research-grade annotation of genomic variants: SNVs, MNVs, insertions, deletions, indels, STRs, gene fusions, and SVs (including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.

Users can annotate VCF files by enabling annotation on the DRAGEN command-line or by running the standalone tool.

Illumina Connected Annotations takes VCF files as input and produces a structured JSON representation of all annotation and sample information extracted from the VCF. It handles multiple alternate alleles and multiple samples.

NOTE: Before running Annotations, the external data sources, gene models, and reference genome need to be downloaded from our annotation server.

By default, the Annotations binaries are located in the /opt/dragen/<VERSION>/share/nirvana directory. This directory includes two files: the Downloader and Nirvana (Illumina Connected Annotations).

Limitations

Illumina Connected Annotations and the Downloader are compatible with the following platforms:

CentOS 7, Oracle Linux 8, and other modern Linux distributions using x64 processors.

Download Data Files

For more up-to-date and detailed documentation, please visit Illumina Connected Annotations Download Data.

To store annotation data files, create a top-level directory. After the download completes, this directory contains three subdirectories:

Cache contains gene models.
SupplementaryAnnotation contains external data sources like dbSNP and gnomAD.
References contains the reference genome.
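
The layout above can be checked before an annotation run. The following is a minimal sketch, not part of the product: the function name check_annotation_data is hypothetical, and it only tests that the three subdirectories exist, not that their contents are complete.

```shell
# Check that an annotation data directory has the three expected
# subdirectories. Prints one status line per subdirectory and
# returns non-zero if any of them is missing.
# Note: check_annotation_data is an illustrative helper, not an Illumina tool.
check_annotation_data() {
    data_dir=$1
    missing=0
    for sub in Cache SupplementaryAnnotation References; do
        if [ -d "$data_dir/$sub" ]; then
            echo "found:   $data_dir/$sub"
        else
            echo "missing: $data_dir/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_annotation_data ~/Data reports on the directory used throughout this page.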

The following command-line options are used.

Option   Value                     Example   Description
---------------------------------------------------------------------------
--ga     GRCh37, GRCh38, or Both   GRCh38    Genome assembly
--out    output directory          ~/Data    Top-level output directory

Download data files as follows.

To create a data directory, enter the following command. This example creates the Data directory in your home directory.

mkdir ~/Data

Download the files for a genome assembly. This example downloads the genome assembly GRCh38.

<INSTALL_PATH>/share/nirvana/Downloader --ga GRCh38 --out ~/Data

You can use the same command to resynchronize the data sources with the Illumina Connected Annotations servers, including the following actions:

Remove obsolete files, such as old versions of data sources, from the output directory.
Download newer files.

The Downloader prints output similar to the following:

---------------------------------------------------------------------------
Downloader                                          (c) 2024 Illumina, Inc.
                                                                     3.23.0
---------------------------------------------------------------------------

- downloading manifest... 37 files.

- downloading file metadata:
  - finished (00:00:00.8).

- downloading files (22.123 GB):
  - downloading 1000_Genomes_Project_Phase_3_v3_plus_refMinor.rma.idx (GRCh38)
  - downloading MITOMAP_20200224.nsa.idx (GRCh38)
  - downloading ClinVar_20200302.nsa.idx (GRCh38)
  - downloading REVEL_20160603.nsa.idx (GRCh38)
  - downloading phyloP_hg38.npd.idx (GRCh38)
  - downloading ClinGen_Dosage_Sensitivity_Map_20200131.nsi (GRCh38)
  - downloading MITOMAP_SV_20200224.nsi (GRCh38)
  - downloading dbSNP_151_globalMinor.nsa.idx (GRCh38)
  - downloading ClinGen_Dosage_Sensitivity_Map_20190507.nga (GRCh38)
  - downloading PrimateAI_0.2.nsa.idx (GRCh38)
  - downloading ClinGen_disease_validity_curations_20191202.nga (GRCh38)
  - downloading 1000_Genomes_Project_Phase_3_v3_plus.nsa.idx (GRCh38)
  - downloading SpliceAi_1.3.nsa.idx (GRCh38)
  - downloading dbSNP_153.nsa.idx (GRCh38)
  - downloading TOPMed_freeze_5.nsa.idx (GRCh38)
  - downloading MITOMAP_20200224.nsa (GRCh38)
  - downloading gnomAD_2.1.nsa.idx (GRCh38)
  - downloading ClinGen_20160414.nsi (GRCh38)
  - downloading gnomAD_gene_scores_2.1.nga (GRCh38)
  - downloading 1000_Genomes_Project_(SV)_Phase_3_v5a.nsi (GRCh38)
  - downloading MultiZ100Way_20171006.pcs (GRCh38)
  - downloading 1000_Genomes_Project_Phase_3_v3_plus_refMinor.rma (GRCh38)
  - downloading ClinVar_20200302.nsa (GRCh38)
  - downloading OMIM_20200409.nga (GRCh38)
  - downloading Both.transcripts.ndb (GRCh38)
  - downloading REVEL_20160603.nsa (GRCh38)
  - downloading PrimateAI_0.2.nsa (GRCh38)
  - downloading dbSNP_151_globalMinor.nsa (GRCh38)
  - downloading Both.sift.ndb (GRCh38)
  - downloading Both.polyphen.ndb (GRCh38)
  - downloading Homo_sapiens.GRCh38.Nirvana.dat
  - downloading 1000_Genomes_Project_Phase_3_v3_plus.nsa (GRCh38)
  - downloading phyloP_hg38.npd (GRCh38)
  - downloading SpliceAi_1.3.nsa (GRCh38)
  - downloading TOPMed_freeze_5.nsa (GRCh38)
  - downloading dbSNP_153.nsa (GRCh38)
  - downloading gnomAD_2.1.nsa (GRCh38)
  - finished (00:04:10.1).

Description                                                     Status
---------------------------------------------------------------------------
1000_Genomes_Project_(SV)_Phase_3_v5a.nsi (GRCh38)                OK
1000_Genomes_Project_Phase_3_v3_plus.nsa (GRCh38)                 OK
1000_Genomes_Project_Phase_3_v3_plus.nsa.idx (GRCh38)             OK
1000_Genomes_Project_Phase_3_v3_plus_refMinor.rma (GRCh38)        OK
1000_Genomes_Project_Phase_3_v3_plus_refMinor.rma.idx (...        OK
Both.polyphen.ndb (GRCh38)                                        OK
Both.sift.ndb (GRCh38)                                            OK
Both.transcripts.ndb (GRCh38)                                     OK
ClinGen_20160414.nsi (GRCh38)                                     OK
ClinGen_Dosage_Sensitivity_Map_20190507.nga (GRCh38)              OK
ClinGen_Dosage_Sensitivity_Map_20200131.nsi (GRCh38)              OK
ClinGen_disease_validity_curations_20191202.nga (GRCh38)          OK
ClinVar_20200302.nsa (GRCh38)                                     OK
ClinVar_20200302.nsa.idx (GRCh38)                                 OK
Homo_sapiens.GRCh38.Nirvana.dat                                   OK
MITOMAP_20200224.nsa (GRCh38)                                     OK
MITOMAP_20200224.nsa.idx (GRCh38)                                 OK
MITOMAP_SV_20200224.nsi (GRCh38)                                  OK
MultiZ100Way_20171006.pcs (GRCh38)                                OK
OMIM_20200409.nga (GRCh38)                                        OK
PrimateAI_0.2.nsa (GRCh38)                                        OK
PrimateAI_0.2.nsa.idx (GRCh38)                                    OK
REVEL_20160603.nsa (GRCh38)                                       OK
REVEL_20160603.nsa.idx (GRCh38)                                   OK
SpliceAi_1.3.nsa (GRCh38)                                         OK
SpliceAi_1.3.nsa.idx (GRCh38)                                     OK
TOPMed_freeze_5.nsa (GRCh38)                                      OK
TOPMed_freeze_5.nsa.idx (GRCh38)                                  OK
dbSNP_151_globalMinor.nsa (GRCh38)                                OK
dbSNP_151_globalMinor.nsa.idx (GRCh38)                            OK
dbSNP_153.nsa (GRCh38)                                            OK
dbSNP_153.nsa.idx (GRCh38)                                        OK
gnomAD_2.1.nsa (GRCh38)                                           OK
gnomAD_2.1.nsa.idx (GRCh38)                                       OK
gnomAD_gene_scores_2.1.nga (GRCh38)                               OK
phyloP_hg38.npd (GRCh38)                                          OK
phyloP_hg38.npd.idx (GRCh38)                                      OK
---------------------------------------------------------------------------

Peak memory usage: 52.3 MB
Time: 00:04:12.2

NOTE: If the DRAGEN server does not have an internet connection, the Downloader executable can be copied to a non-DRAGEN server that is connected to the internet to download the annotation data. Once the download has completed, the annotation data can then be copied locally to the DRAGEN server for subsequent annotation.
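
One way to move the downloaded data between the two machines is to bundle it into a single archive. The sketch below assumes tar is available on both hosts; the function names and the archive name are hypothetical, and how the archive travels between hosts (scp, removable media, and so on) is left to your environment.

```shell
# On the internet-connected host: bundle the downloaded data directory.
# $1 = data directory (for example ~/Data), $2 = archive file to create.
# Both helpers are illustrative, not part of the DRAGEN installation.
pack_annotation_data() {
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# On the DRAGEN server: unpack the bundle under an existing directory.
# $1 = archive file, $2 = destination directory.
unpack_annotation_data() {
    tar -xf "$1" -C "$2"
}
```

After pack_annotation_data ~/Data annotation_data.tar on the connected host and unpack_annotation_data annotation_data.tar ~ on the DRAGEN server, the data is available under ~/Data again.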

Annotate Files (via DRAGEN command-line)

To automatically annotate output VCFs, please add the following command-line arguments:

Argument                        Example                     Description
---------------------------------------------------------------------------
--enable-variant-annotation     true                        enables annotation if the pipeline supports it
--variant-annotation-data       /path/to/your/NirvanaData   the location where you downloaded the Nirvana annotation files
--variant-annotation-assembly   GRCh38                      the genome assembly - either GRCh37 or GRCh38. hg19 is handled properly by using GRCh37

All the command-line arguments shown together:

--enable-variant-annotation=true --variant-annotation-data=/path/to/your/NirvanaData --variant-annotation-assembly=GRCh38
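
If you launch DRAGEN from a script, the three arguments can be assembled from variables. This is a minimal sketch: annotation_args is a hypothetical helper, and it only rejects assembly names other than the two listed in the table above.

```shell
# Print the three annotation arguments for a DRAGEN command line.
# $1 = annotation data directory, $2 = genome assembly (GRCh37 or GRCh38).
# annotation_args is an illustrative helper, not a DRAGEN feature.
annotation_args() {
    data_dir=$1
    assembly=$2
    case "$assembly" in
        GRCh37|GRCh38) ;;
        *)
            echo "unsupported assembly: $assembly (use GRCh37 or GRCh38)" >&2
            return 1
            ;;
    esac
    printf '%s %s %s\n' \
        "--enable-variant-annotation=true" \
        "--variant-annotation-data=$data_dir" \
        "--variant-annotation-assembly=$assembly"
}
```

The printed string can be appended to an existing DRAGEN command with $(annotation_args /path/to/your/NirvanaData GRCh38). Because the substitution is unquoted, this approach assumes the data path contains no spaces.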

Annotate Files (via standalone Illumina Connected Annotations tool)

If you have not generated a VCF file, download a VCF file using the following command.

curl -O https://raw.githubusercontent.com/HelixGrind/DotNetMisc/master/TestFiles/HiSeq.10000.vcf.gz

Annotations supports uncompressed VCF files and bgzip-compressed VCF files. VCF files compressed with standard gzip are not supported.
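
Both bgzip and standard gzip files typically carry a .gz extension, so the file name alone does not tell them apart. The following sketch inspects the file header instead. It relies on the BGZF layout (a gzip header with the extra-field flag set and a 'BC' subfield) and assumes that 'BC' is the first extra subfield and that no other header flags are set, which is how bgzip writes its files. The function name is_bgzip is hypothetical.

```shell
# Return 0 if the file starts with a BGZF (bgzip) block header.
# $1 = path to the compressed file.
# is_bgzip is an illustrative helper, not part of Annotations.
is_bgzip() {
    # First 14 bytes as one lowercase hex string (28 characters).
    hex=$(od -An -tx1 -N14 "$1" | tr -d ' \t\n')
    case "$hex" in
        # 1f 8b = gzip magic, 08 = deflate, 04 = extra field present,
        # then 8 bytes (mtime, xfl, os, xlen), then 'B' 'C' (42 43).
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}
```

A file that fails this check but is valid gzip can usually be converted by decompressing it and recompressing the result with bgzip.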

To annotate the file, enter the following command:

<INSTALL_PATH>/share/nirvana/Nirvana -c ~/Data/Cache/ \
-r ~/Data/References/Homo_sapiens.GRCh38.Nirvana.dat \
--sd ~/Data/SupplementaryAnnotation/GRCh38 -i HiSeq.10000.vcf.gz -o HiSeq.10000

The following are the available command-line options:

Option   Value       Example                                             Description
---------------------------------------------------------------------------
-c       directory   ~/Data/Cache/                                       Cache directory
-r       path        ~/Data/References/Homo_sapiens.GRCh38.Nirvana.dat   Reference file path
--sd     directory   ~/Data/SupplementaryAnnotation/GRCh38               Supplementary annotation directory
-i       path        HiSeq.10000.vcf.gz                                  Input VCF path
-o       prefix      HiSeq.10000                                         Output path prefix

Using the example above, Annotations writes its results to a file called HiSeq.10000.json.gz and prints output similar to the following:

---------------------------------------------------------------------------
Illumina Connected Annotations                      (c) 2024 Illumina, Inc.
                                                                     3.23.0
---------------------------------------------------------------------------

Initialization                                         Time     Positions/s
---------------------------------------------------------------------------
Cache                                               00:00:01.9
SA Position Scan                                    00:00:00.4       23,867

Reference                                Preload    Annotation   Variants/s
---------------------------------------------------------------------------
chr1                                    00:00:00.4  00:00:03.7        2,651

Summary                                                Time         Percent
---------------------------------------------------------------------------
Initialization                                      00:00:02.3       25.7 %
Preload                                             00:00:00.4        5.4 %
Annotation                                          00:00:03.7       41.5 %

Peak memory usage: 1.284 GB
Time: 00:00:08.0

JSON Output File

Annotations produces an output file in JSON format. Please refer to Illumina Connected Annotations JSON for a detailed description of the JSON file.
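
For a quick look at the output without loading the whole file, the beginning of the compressed JSON can be streamed to the terminal. This sketch assumes the .json.gz file is a gzip-compatible stream that gzip -dc can read; peek_annotations is a hypothetical helper, and each line is truncated to 200 characters because individual lines may be long.

```shell
# Print the first lines of a compressed annotation file.
# $1 = path to the .json.gz file, $2 = number of lines (default 5).
# peek_annotations is an illustrative helper, not part of Annotations.
peek_annotations() {
    gzip -dc "$1" | head -n "${2:-5}" | cut -c 1-200
}
```

For example, peek_annotations HiSeq.10000.json.gz 3 shows the first three lines of the file produced above.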

Version History

Annotations binaries have been included with DRAGEN since v3.5. The table below shows which version of the Annotations binaries was included with each DRAGEN release, along with its AI annotation capabilities.

The Annotations binaries distributed with DRAGEN cannot be changed. Newer versions of Annotations are backward compatible and can therefore annotate output files from older DRAGEN releases.

DRAGEN version(s)          Annotations version   AI annotations
---------------------------------------------------------------------------
4.3                        3.23                  spliceAI, primateAI3D
3.9, 3.10, 4.0, 4.1, 4.2   3.16.1                spliceAI, primateAI
3.8                        3.14                  spliceAI, primateAI
3.6, 3.7                   3.9.0                 spliceAI, primateAI
3.5                        3.6.0                 spliceAI, primateAI
