DRAGEN (Dynamic Read Analysis for GENomics) Array secondary analysis is a powerful bioinformatics software for Illumina Infinium array-based assays. DRAGEN Array uses cutting-edge data analysis tools to provide accurate, comprehensive, and highly efficient secondary analysis to maximize genomic insights and meet your research needs across multiple applications.
DRAGEN Array is offered as a local package with a command-line interface (no specialized server or hardware required) and as a cloud-based package with an intuitive graphical user interface, as summarized in the table below.
Genotyping
Provides genotyping results for any human Infinium genotyping array.
Greater than 99.5% genotyping accuracy
Genotyping VCF in as little as 35 seconds per sample
PGx – CNV calling
Provides CNV calling on 7 target PGx genes across 10 target regions, plus genotyping outputs for Infinium microarrays with enhanced PGx content.
Greater than 95% PGx CNV accuracy
PGx – star allele annotation
Provides PGx star allele and variant coverage across 2400+ targets for over 50 genes, plus PGx CNV and genotyping outputs for Infinium microarrays with enhanced PGx content.
Assess hard to discern PGx genes, including the elusive CYP2D6 with greater than 97% call rate
Obtain all PGx analysis results in ~1 minute per sample
Methylation QC
Provides high-throughput, quantitative methylation quality control for Infinium methylation arrays.
21 algorithm-based quantitative control metrics with adjustable thresholds
Data summary plots
Proportion of CG probes passing with user defined p-value threshold
This product documentation describes the installation and setup, analysis execution, and result outputs. For the latest updates and release details, see the DRAGEN Array Release Notes. See Introducing DRAGEN™ Array 1.0 for Infinium™ Array-Based Pharmacogenomics Analysis for additional details on DRAGEN Array genotyping, PGx CNV calling and PGx star allele annotation.
The following section describes the input files required by DRAGEN Array. Product files (anything other than the IDATs) can be found on the support site.
For each sample, a pair of raw intensity files (.idat) is generated by the iScan System or NextSeq550 (for select arrays). The files provide intensities in the red and green channels for each probe on the Infinium array. More information on which arrays can be used with NextSeq550 can be found on the Illumina Knowledge page on NextSeq550.
An IDAT file is identified by the BeadChip Barcode (12-digit unique Sentrix ID, e.g. 123456789101), BeadChip Position (row and column of the sample, e.g. R01C01), and Grn (Green) or Red for the specific channel.
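The identifying parts of an IDAT file can be recovered from its name. The Python sketch below is illustrative only: it assumes the three parts are joined with underscores (for example 123456789101_R01C01_Grn.idat), and the names IDAT_NAME, IdatName, and parse_idat_name are hypothetical helpers, not part of DRAGEN Array.

```python
import re
from typing import NamedTuple

# Assumed layout: <12-digit barcode>_<RxxCxx position>_<Grn|Red>.idat
IDAT_NAME = re.compile(r"^(\d{12})_(R\d{2}C\d{2})_(Grn|Red)\.idat$")


class IdatName(NamedTuple):
    barcode: str   # BeadChip Barcode (Sentrix ID)
    position: str  # BeadChip Position, e.g. R01C01
    channel: str   # "Grn" or "Red"


def parse_idat_name(file_name: str) -> IdatName:
    """Split an IDAT file name into barcode, position, and channel."""
    match = IDAT_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a recognized IDAT file name: {file_name!r}")
    return IdatName(*match.groups())


print(parse_idat_name("123456789101_R01C01_Grn.idat"))
```

A check like this can catch renamed or incomplete IDAT pairs before an analysis is started.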
The CSV and BPM manifest files can be found on the Illumina Support Site for all commercial Infinium BeadChips or on MyIllumina for custom and semi-custom designs. DRAGEN Array only supports manifest files from the Illumina Support site. For instructions on obtaining manifest files from MyIllumina, see Illumina Knowledge article, How to access custom array product files (manifest and product definition files) in MyIllumina.
The CSV manifest file (.csv) provides complementary data to the BPM manifest file in a human-readable format. It is a required input to the genotype gtc-to-vcf command to enable VCF generation for insertion/deletion variants. The gtc-to-vcf command depends on the presence of accurate mapping information within the manifest and may produce inaccurate results if the mapping information is incorrect. Mapping information follows the implicit dbSNP standard, where:
Positions are reported with 1-based indexing.
Positions in the pseudoautosomal region (PAR) are reported with mapping position to the X chromosome.
For an insertion relative to the reference, the position of the base immediately 5' to the insertion (on the plus strand) is given.
For a deletion relative to the reference, the position of the most 5' deleted base (on the plus strand) is given.
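As a minimal illustration of the indel conventions above, the following Python sketch converts a 0-based string index into the 1-based position that the mapping information would be expected to carry. The function names and the toy reference sequence are hypothetical and are not part of DRAGEN Array.

```python
def insertion_position(insert_index: int) -> int:
    """1-based position of the base immediately 5' of an insertion.

    insert_index is the 0-based index in the plus-strand reference string
    at which the new bases are inserted (before reference[insert_index]).
    """
    if insert_index < 1:
        raise ValueError("no reference base lies 5' of the insertion")
    # The 5' neighbor sits at 0-based index insert_index - 1,
    # which is 1-based position insert_index.
    return insert_index


def deletion_position(first_deleted_index: int) -> int:
    """1-based position of the most 5' deleted base."""
    return first_deleted_index + 1


reference = "ACGTAC"  # toy plus-strand sequence, 1-based positions 1 to 6
# Insertion between C (position 2) and G (position 3): position 2 is reported.
print(insertion_position(2))
# Deletion of "GT" (0-based indices 2 and 3): position 3 is reported.
print(deletion_position(2))
```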
The cluster file (.egt) is a standard product file provided by Illumina for commercial genotyping products and it is a required input for the genotype call command in DRAGEN Array. Custom cluster files may be required for optimal genotyping performance. See section Optimizing cluster files and copy number models for additional details.
The CN (Copy Number) model file (.dat) is a required input to the copy-number call command to enable accurate copy number calling for pharmacogenomics. Illumina provides a standard CN model file for each PGx array product. See section Optimizing cluster files and copy number models for additional details.
The mask file (.msk) is a required input to the copy-number train command to enable accurate copy number training for pharmacogenomics. It does not need to be provided as an explicit input to the command line interface but should reside in the same folder as the BPM manifest. It should have the same base name as the manifest for the product. Illumina provides a mask file for each PGx array product and these can be found on the product files support page.
The PGx database file (.zip) contains the variant mapping information from Infinium PGx arrays to PGx variants. For each gene and each variant used in the star allele definitions of the gene, there is a mapping to the ID field in the SNV VCF file. Each line in the gene mapping file represents a single variant and contains the SNV VCF ID for that variant followed by the HGVS (Human Genome Variation Society) tag for the variant. The PGx database file is array specific and is one of the product files provided by Illumina for each PGx array product.
The genome FASTA file (.fa) is a text file with the reference genome sequences. The FASTA index file (.fai) contains metadata about chromosomal organization within the FASTA file for a particular species. DRAGEN Array PGx calling supports human genome builds 37 and 38. The genome FASTA file and FASTA index file are both provided by Illumina for human and should be stored together in the same input folder. For custom reference genomes, the contig identifiers in the provided genome FASTA file must exactly match the chromosome identifiers specified in the provided manifest. For a standard human product manifest, this means that the contig headers should read ">1" rather than ">chr1".
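When a custom reference genome is used, a contig naming mismatch can be caught before analysis. The Python sketch below is a hedged example, not a DRAGEN Array feature: it relies on the standard .fai layout, in which the first tab-separated column of each line is the contig name, and the function names and the list of manifest chromosomes are hypothetical inputs supplied by the user.

```python
def contig_names_from_fai(fai_text: str) -> list:
    """Return contig names (first tab-separated column) from .fai content."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def chromosomes_missing_from_reference(manifest_chroms, fai_text: str) -> list:
    """List manifest chromosome identifiers with no identically named contig."""
    contigs = set(contig_names_from_fai(fai_text))
    return sorted(set(manifest_chroms) - contigs)


# Toy index content with "chr"-prefixed contigs (lengths and offsets are made up).
fai_with_prefix = "chr1\t1000\t6\t60\t61\nchr2\t900\t1030\t60\t61\n"
print(chromosomes_missing_from_reference(["1", "2"], fai_with_prefix))
```

An empty result means every chromosome identifier in the manifest has an identically named contig in the reference.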
For local analysis, the IDAT sample sheet can be a CSV or JSON formatted file with direct paths to sample IDAT files. It enables easy analysis of samples from different directories.
Example CSV format:
Green IDAT Path,Red IDAT Path
/path/to/sample1_Grn.idat,/path/to/sample1_Red.idat
/path/to/sample2_Grn.idat,/path/to/sample2_Red.idat
/path/to/sample3_Grn.idat,/path/to/sample3_Red.idat
Example JSON format:
[
{
"Green IDAT Path": "/path/to/sample1_Grn.idat",
"Red IDAT Path": "/path/to/sample1_Red.idat"
},
{
"Green IDAT Path": "/path/to/sample2_Grn.idat",
"Red IDAT Path": "/path/to/sample2_Red.idat"
},
{
"Green IDAT Path": "/path/to/sample3_Grn.idat",
"Red IDAT Path": "/path/to/sample3_Red.idat"
}
]
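For larger projects the IDAT sample sheet can be generated rather than typed. The following Python sketch pairs green and red IDAT paths and renders them in the CSV and JSON layouts shown above. It assumes that the two files of a pair differ only in the _Grn.idat / _Red.idat suffix; the function names are hypothetical.

```python
import csv
import io
import json


def pair_idats(paths):
    """Pair *_Grn.idat with *_Red.idat files that share the same prefix."""
    greens = {p[: -len("_Grn.idat")]: p for p in paths if p.endswith("_Grn.idat")}
    reds = {p[: -len("_Red.idat")]: p for p in paths if p.endswith("_Red.idat")}
    # Samples missing either channel are skipped.
    return [(greens[key], reds[key]) for key in sorted(greens) if key in reds]


def idat_sheet_csv(pairs) -> str:
    """Render the pairs as CSV text with the documented header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Green IDAT Path", "Red IDAT Path"])
    writer.writerows(pairs)
    return buffer.getvalue()


def idat_sheet_json(pairs) -> str:
    """Render the pairs as a JSON array of objects with the documented keys."""
    rows = [{"Green IDAT Path": g, "Red IDAT Path": r} for g, r in pairs]
    return json.dumps(rows, indent=2)


pairs = pair_idats([
    "/path/to/sample1_Red.idat",
    "/path/to/sample1_Grn.idat",
    "/path/to/sample2_Grn.idat",  # no red file, so this sample is skipped
])
print(idat_sheet_csv(pairs))
```

Generating the JSON with a serializer also avoids syntax slips such as a trailing comma after the last object.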
For cloud analysis, the IDAT sample sheet can be a CSV formatted file.
beadChipName,sampleSectionName
Beadchip 1 barcode (204753010023), sample section (R01C01)
Beadchip 1 barcode (204753010023), sample section (R02C01)
Beadchip 2 barcode (204753010024), sample section (R01C01)
Beadchip 2 barcode (204753010024), sample section (R02C01)
For DRAGEN Array Methylation QC on cloud, additional optional sample sheet fields are available.
Following Sample_Group, any number of additional columns can be added to include metadata fields such as sex, sample type, and plate and well information. Columns added after the Sample_Group column may have user-defined column header values. The Sample_ID field and any additional metadata will be replicated in the Sample QC Summary output files.
The Sample_Group field will be used to populate the PCA Control Plot within the Sample QC Summary Plots file and the Principal Component Summary file. For the PCA Control Plot, each sample group will be assigned a unique color. Samples assigned to the same Sample_Group value will be the same color in the PCA Control Plot.
beadChipName,sampleSectionName,Sample_ID,Sample_Group,MetaData1
Beadchip 1 barcode (204753010023), sample section (R01C01),NA1231,Group1,F
Beadchip 1 barcode (204753010023), sample section (R02C01),NA1232,Group2,F
Beadchip 2 barcode (204753010024), sample section (R01C01),NA1233,Group2,M
Beadchip 2 barcode (204753010024), sample section (R02C01),NA1234,Group1,M
The GTC sample sheet is a CSV or JSON formatted file with direct paths to sample GTC files. It enables easy analysis of samples from different directories.
Example CSV format:
GTC Path
/path/to/sample1.gtc
/path/to/sample2.gtc
/path/to/sample3.gtc
Example JSON format:
[
{
"GTC Path": "/path/to/sample1.gtc"
},
{
"GTC Path": "/path/to/sample2.gtc"
},
{
"GTC Path": "/path/to/sample3.gtc"
}
]
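A GTC sample sheet that spans several directories can be assembled with a short script. This Python sketch collects .gtc files recursively and writes the CSV layout shown above; the function name and the folder arguments are hypothetical, and the script is not part of DRAGEN Array.

```python
import csv
from pathlib import Path


def write_gtc_sample_sheet(folders, sheet_path):
    """Write a CSV sample sheet listing every .gtc file under the given folders."""
    gtc_paths = sorted(
        str(path) for folder in folders for path in Path(folder).rglob("*.gtc")
    )
    with open(sheet_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["GTC Path"])  # header used in the example above
        writer.writerows([path] for path in gtc_paths)
    return gtc_paths
```

The resulting file can then be passed with --gtc-sample-sheet in place of --gtc-folder.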
In addition to the input files, there is a set of intermediate files, including GTC, SNV VCF, CNV VCF, and PGx CSV, which are outputs of some DRAGEN Array Local commands and inputs to other commands.
The table below summarizes the input and intermediate files, their sources, and the associated DRAGEN Array Local commands and options.
File | Source | Commands | Option
IDAT | User provided from scanning instrument | genotype call | --idat-folder
CSV Manifest | Product file from Illumina | genotype gtc-to-vcf | --csv-manifest
BPM Manifest | Product file from Illumina | copy-number train, genotype call, genotype gtc-to-bedgraph, genotype gtc-to-vcf | --bpm-manifest
Cluster File | Product file from Illumina or user created using GenomeStudio | genotype call | --cluster-file
CN Model | Product file from Illumina or user created using DRAGEN Array Local | copy-number call | --cn-model
PGx Database | Product file from Illumina | star-allele call | --database
Genome FASTA | Product file from Illumina | genotype gtc-to-vcf, copy-number train | --genome-fasta-file
IDAT Sample Sheet | User provided | genotype call | --idat-sample-sheet
GTC Sample Sheet | User provided | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-sample-sheet
GTC | DRAGEN Array output from genotype call | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-folder
SNV and CNV VCF | DRAGEN Array output from genotype gtc-to-vcf and copy-number call | star-allele call | --vcf-folder
PGx CSV | DRAGEN Array output from star-allele call | star-allele annotate | --star-alleles
The following versions of DRAGEN Array have been released:
December 2023
Improved star allele calling accuracy for Global Diversity Array with enhanced PGx (GDA-ePGx) BeadChips.
Reports star allele calls with quality scores for greater transparency and confidence.
Provides missing variant reporting to improve data quality.
Star Allele Calling
Star allele calling for genes listed in PGx Star Allele Coverage
For in-silico datasets, call rate ≥99%, diplotyping accuracy ≥ 90%
Includes reporting of the hybrid star alleles and allelic specific copy number
Provides quality score that estimates confidence in the star allele call as an additional quality metric
Star allele call rate increased through more robust error tolerance and missing data tolerance
Supporting variants and missing variants are listed and can be further reviewed
Quality score indicates confidence in result considering the missing data
Reports alternative ranked PGx star allele solutions
Allows an alternative to be investigated which may be desirable for samples with low confidence calls
Provides quality score (negative log likelihood) for alternative solutions
Function annotations for PGx genes listed in section PGx Allele Definitions and PGx Guidelines
Metabolizer and function annotations are supported for two sets of guidelines from CPIC and DPWG respectively
Activity scores are provided for CYP2C9, CYP2D6, and DPYD
CNV VCF
CNV coverage for genes listed in PGx CNVs Coverage
Compressed and indexed files for size reduction and faster reading
Updated VCF header description to indicate copy number of 5 may be reported by the software
Revised filter field delimiter to comply with VCF 4.3 specification which allows VCF parsing software to parse the file successfully
Genotyping VCF
Compressed and indexed files for size reduction and faster reading
Corrupt or invalid GTC files will abort with an error instead of skipping. The corrupt or invalid GTC files will need to be removed before proceeding.
In the gtc-to-vcf subcommand, a mismatch between BPM and CSV manifests will now cause the command to abort with an error. The mismatch will need to be addressed before proceeding.
For gtc-to-vcf, multi-allelic variants designed with multiple assays might not always collapse into one variant correctly and may be reported as two separate variants instead. Some indel variants are missing from the SNV VCF due to mapping issues between the designed indels and the reference genome.
Manifest names greater than 80 characters will cause failure when converting IDATs to GTCs.
Symbolic links for VCFs are not supported as the inputs to the “star-allele call” subcommand.
The local Linux CLI and Cloud offering do not sort the star_alleles.csv and various fields in the metabolizer_status.json. The local Windows CLI does.
PGx CNV calling and star allele calling and annotation were only validated and intended to be used with GDA_PGx_E2 product files.
The gtc-to-vcf options --unsquash-duplicates and --filter-loci should not be used when star allele calling is desired.
Only CPIC guidelines are available for star allele annotation (metabolizer status calling) for the cloud offering. For local, CPIC and DPWG are available.
DRAGEN Array provides accurate, comprehensive, and efficient analysis of Infinium microarray data. The local command-line interface makes it easy for power users to have granular control and flexibility to support large scale microarray genomic studies.
DRAGEN Array Local uses a command-line interface that allows full user control of software functionality and easy automation of tasks. The software is designed for power users and bioinformaticians. If you are new to the command-line interface, review the Command-line interface Basics.
Before downloading and installing the software, ensure the following specifications are met for best performance:
CPU: 8 cores
Memory: 16 GB or more
Hard Drive: 30 GB or more of free disk space
Operating System: One of the following:
Windows 10 or later – win10-x64
CentOS 7 or later, Ubuntu 20.04 or later – linux-x64
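Part of the specification above can be checked from a script before installation. The Python sketch below checks only the CPU core count and free disk space, because the Python standard library has no portable call for total memory. The function name is hypothetical, and treating 1 GB as 1024**3 bytes is an assumption of this sketch.

```python
import os
import shutil

MIN_CORES = 8      # from the specification above
MIN_FREE_GB = 30   # from the specification above


def host_problems(path=".", cores=None, free_bytes=None):
    """Return a list of unmet requirements (an empty list means none found).

    cores and free_bytes default to values read from the current machine;
    they can be passed explicitly to evaluate another machine's numbers.
    """
    cores = (os.cpu_count() or 0) if cores is None else cores
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    problems = []
    if cores < MIN_CORES:
        problems.append(f"{cores} CPU cores found, {MIN_CORES} expected")
    if free_bytes < MIN_FREE_GB * 1024**3:
        problems.append(f"less than {MIN_FREE_GB} GB of free disk space")
    return problems


print(host_problems())
```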
The star-allele call command in DRAGEN Array Local requires quota to run. The quota is charged per sample analyzed and can be purchased on the Illumina Product Page. Quota is used for all samples analyzed including re-analysis or low-quality samples.
The credential provided in the activation email after purchasing should be used as an input to the star-allele call command through the "--license-server-url" option. During runtime, the logs will record the remaining quota at the beginning and the end of the analysis.
Internet is required to do a software license check and ensure paid quota is available for all samples in the analysis batch. For the software license check, the following endpoint is used: license.edicogenome.com.
Please follow the steps below to install the software on your compute infrastructure:
Click on the latest DRAGEN Array version installation package for the platform of your choice. Installers for Windows and Linux are available on the Illumina Support Site.
Once the download is complete, move the DRAGEN Array installation package to the desired folder. Administrative permissions may be required for system folders, for example /usr/local/bin for Linux and C:\Program Files for Windows.
Note: Throughout the remainder of the document, Linux will be assumed in the examples.
Unzip and extract the package. The executable can be found in the dragena subfolder of the software download after extraction.
To check that the DRAGEN Array installation was successful, follow these steps:
Open a command prompt (Windows) or terminal (Linux).
[Optional] Add /path/to/dragena/, e.g. /usr/local/bin/dragena-linux-x64-DAv1.1.0/dragena/, to your PATH to access the executable from anywhere in the folder structure.
Execute the following command: /path/to/dragena/dragena version, or, if the PATH environment variable is set: dragena version
The version of the software is displayed in the terminal window if the installation was successful.
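The same installation check can be automated, for example at the start of an analysis script. This Python sketch is a hedged example: it looks up the executable on PATH and runs the version command described above. The function name is hypothetical, and because the exact text printed by the command is not specified here, the output is returned unparsed.

```python
import shutil
import subprocess


def dragena_version(executable="dragena"):
    """Return the text printed by '<executable> version', or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed, or not on PATH
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = dragena_version()
print(version if version is not None else "dragena was not found on PATH")
```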
For genotyping analysis, there is no sample minimum required to run analysis.
For PGx CNV analysis, a minimum of 24 samples is required. For a successful analysis, 22 samples must pass QC, defined as having log R dev < 0.2. With the standard hardware specification described in section Computing Requirements, up to 500 GDA-ePGx samples can be processed per analysis batch.
To optimize performance of the targeted PGx CNV caller and minimize batch effect, it is recommended to:
Group samples in the same assay batch (e.g. whole genome amplification and targeted gene amplification assay batch) into the same analysis batch.
Avoid combining sample batches processed on different reagent lots.
Analyze batches of 96 samples or more.
Samples processed in a two-week period from multiple library preparation batches can be grouped together to meet size requirement of an analysis batch. In such cases, it is recommended to use the same lot of reagents and instruments used in the workflow.
Use the CN Model and PGx Database File provided as part of the standard product files.
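The batch size rules above can be expressed as a small pre-flight check. The Python sketch below takes a list of log R dev values, one per sample, and applies the documented thresholds: at least 24 samples, at least 22 with log R dev < 0.2, and a recommended batch of 96 or more. The function name and the result keys are hypothetical; how the values are read from the output files is left to the user.

```python
MIN_BATCH_SIZE = 24      # minimum samples per PGx CNV analysis
MIN_PASSING = 22         # samples that must pass QC
LOG_R_DEV_LIMIT = 0.2    # a sample passes when log R dev is below this value
RECOMMENDED_SIZE = 96    # recommended analysis batch size


def cnv_batch_status(log_r_devs):
    """Summarize whether a batch meets the PGx CNV sample requirements."""
    total = len(log_r_devs)
    passing = sum(1 for value in log_r_devs if value < LOG_R_DEV_LIMIT)
    return {
        "samples": total,
        "passing": passing,
        "meets_minimum": total >= MIN_BATCH_SIZE and passing >= MIN_PASSING,
        "meets_recommended_size": total >= RECOMMENDED_SIZE,
    }


# 22 passing samples plus 2 failing samples: the minimum is met,
# but the batch is smaller than the recommended size.
print(cnv_batch_status([0.1] * 22 + [0.3] * 2))
```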
Use the following instructions to start the full PGx analysis, covering genotyping, PGx CNV and PGx star allele calling. Refer to Command Index for parameters for all commands.
Review section DRAGEN Array Applications for information on input files to use, sample minimums per analysis type and other best practices.
Command examples show analysis on a Linux system using folders instead of sample sheets. Windows users should adjust the file paths in the commands to follow Windows conventions, e.g., using backslash (\) instead of forward slash (/). A sample sheet can be used to select specific samples out of a folder.
Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any other desired directory if the executable was added to the PATH environment variable.
Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.
dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc
Use the genotype gtc-to-vcf command to create SNV VCF files from the GTC files generated by the genotype call command.
dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/manifest.bpm --csv-manifest /user/productfiles/manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --output-folder /user/vcf
Use the copy-number call command to call PGx CNVs from the GTC files and produce CNV VCF files. It is recommended to use the same output folder used for SNV VCF since the star-allele call command accepts one VCF folder with SNV and CNV VCFs.
dragena copy-number call --cn-model /user/productfiles/cnv_model.dat --gtc-folder /user/gtc --output-folder /user/vcf
Note: For PGx CNV calling, it is recommended that 96 or more samples passing LogRDev <= 0.2 are included in the analysis.
Use the star-allele call command to generate star allele calls using the CNV and SNV VCF files generated by the gtc-to-vcf and copy-number call commands.
dragena star-allele call --vcf-folder /user/vcf --database /user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip --output-folder /user/star-alleles --license-server-url https://username:password@license.edicogenome.com
Note: For PGx star allele calling, it is recommended to QC the samples and review the samples that have Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0 to assess the reliability of the analysis. These metrics are provided in the genotyping sample summary file (gt_sample_summary.csv).
Use the star-allele annotate command to summarize the star alleles and add metabolizer statuses to the star alleles generated by the star-allele call command. Guidelines (CPIC or DPWG) can be specified.
dragena star-allele annotate --star-alleles star_alleles.csv --guidelines CPIC --output-folder /user/metabolizer-statuses
[Optional] Use the copy-number train command to retrain the copy number model.
dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --platform LCG --output-folder /user/productfiles/cnmodelnew
Note: DRAGEN Array will overwrite older files if the same --output-folder from a previous analysis is used. If this is not desired, use a different --output-folder for re-analyses.
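The analysis steps above can be collected into a single script. The Python sketch below only builds the five command lines in run order, using the options shown in the examples, and leaves execution to the caller. The function name, the keys of the files dictionary, and the folder layout under work_dir are hypothetical. It also assumes that star_alleles.csv is written directly into the star-allele call output folder.

```python
import shlex


def pgx_pipeline_commands(files, idat_dir, work_dir, license_url, guidelines="CPIC"):
    """Return the dragena command lines for a full PGx analysis as argv lists.

    files maps "bpm", "csv", "egt", "fasta", "cn_model", and "database" to paths.
    """
    gtc_dir = f"{work_dir}/gtc"
    vcf_dir = f"{work_dir}/vcf"  # shared by SNV and CNV VCFs, as recommended
    star_dir = f"{work_dir}/star-alleles"
    return [
        ["dragena", "genotype", "call",
         "--bpm-manifest", files["bpm"], "--cluster-file", files["egt"],
         "--idat-folder", idat_dir, "--output-folder", gtc_dir],
        ["dragena", "genotype", "gtc-to-vcf",
         "--bpm-manifest", files["bpm"], "--csv-manifest", files["csv"],
         "--genome-fasta-file", files["fasta"],
         "--gtc-folder", gtc_dir, "--output-folder", vcf_dir],
        ["dragena", "copy-number", "call",
         "--cn-model", files["cn_model"],
         "--gtc-folder", gtc_dir, "--output-folder", vcf_dir],
        ["dragena", "star-allele", "call",
         "--vcf-folder", vcf_dir, "--database", files["database"],
         "--output-folder", star_dir, "--license-server-url", license_url],
        # Assumption: star_alleles.csv is written directly into star_dir.
        ["dragena", "star-allele", "annotate",
         "--star-alleles", f"{star_dir}/star_alleles.csv",
         "--guidelines", guidelines,
         "--output-folder", f"{work_dir}/metabolizer-statuses"],
    ]


example_files = {
    "bpm": "/user/productfiles/manifest.bpm",
    "csv": "/user/productfiles/manifest.csv",
    "egt": "/user/productfiles/clusterfile.egt",
    "fasta": "/user/productfiles/genome.fa",
    "cn_model": "/user/productfiles/cnv_model.dat",
    "database": "/user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip",
}
commands = pgx_pipeline_commands(
    example_files, "/user/IDATs", "/user",
    "https://username:password@license.edicogenome.com",
)
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) so that a failing step stops the run before later steps consume incomplete outputs.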
Use the following syntax when using the command-line interface:
dragena [command] [required parameters] [optional parameters]
The root command for actions that act on copy number variants.
copy-number call
Determines copy number variants given genotypes (GTC to CNV VCF).
copy-number help
Displays help information for a copy-number command.
copy-number train
Trains copy number model for a set of samples (GTC to CN Model File).
copy-number version
Displays version information for copy-number.
The command used to call copy number variants. A batch of 24 or more samples is required for analysis. For a successful analysis, 22 samples must pass QC, defined as having log R dev < 0.2.
--cn-model
[Required] Specifies the path to the copy number model parameters file (.dat).
--gtc-folder
[Required] Specifies the path to the directory where all genotype files (.gtc) are located. The command cannot be used with --gtc-sample-sheet.
This path also includes the contents of all subdirectories.
--gtc-sample-sheet
[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). The sample sheet can be in CSV or JSON format. The command cannot be used with --gtc-folder.
--debug
Includes stack traces in logs. Default is false.
--help
Displays help information for the copy-number call command.
--json-log
Outputs logs in JSON format. Default is false.
--no-bgzip
VCFs are not bgzip compressed (.gz) and no tabix index files (.tbi) are output. Default is false.
--output-folder
[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.
--version
Displays version information.
Displays help information for a copy-number command.
Trains copy number (CN) model for a set of samples. Generate a new CN model if using a customized cluster file (.egt) optimized for the specific data set.
Execute the train command using the data sets that were used to optimize the cluster file.
To use a CN model generated by the train command, the mask file for the manifest must be saved in the same directory as the manifest.
A minimum of 96 samples is required to use the copy-number train command. For optimal performance, at least 150 is recommended.
For best performance, validate the CN model using truth data before using in CN calling.
See Optimizing cluster files and copy number models for further details.
--bpm-manifest
[Required] Specifies the path to the bead pool manifest in BPM format. Assumes mask file (.msk) is in the same directory.
--genome-fasta-file
[Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.
--gtc-folder
[Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet.
This path also includes the contents of all subdirectories.
--gtc-sample-sheet
[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--platform
[Required] Specifies which microarray platform generated the data. Set this to 'LCG' for GDA-ePGx, 'EX' for GSAv4-ePGx or GCRA-ePGx.
--debug
Includes stack traces in logs. Default is false.
--disable-genome-cache
Disables the reference genome cache.
--help
Displays help information for the copy-number train command.
--json-log
Outputs logs in JSON format. Default is false.
--version
Displays version information.
--output-folder
[Optional] The location to output the CN model. By default, the output folder is the current working directory.
Displays version information for copy-number command.
The root command for genotype calling.
genotype call
Determines genotype calls (GTC) from IDAT files.
genotype gtc-to-bedgraph
Converts GTC to BedGraphs, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.
genotype gtc-to-vcf
Converts GTC to VCF.
genotype help
Displays the help information for the genotype command.
genotype version
Displays version information for the genotype command.
Determines genotype calls (GTC) from IDAT files.
--bpm-manifest
[Required] Specifies the path to the bead pool manifest in BPM format.
--cluster-file
[Required] Specifies the path to the EGT cluster file to use.
--idat-folder
[Required] Specifies the path to the directory where all intensity data IDATs (for the samples to be processed) are located. Must be in IDAT format. Cannot be used with --idat-sample-sheet.
This path also includes the contents of all subdirectories.
--idat-sample-sheet
[Required] Specifies the path to a sample sheet containing paths to intensity data IDATs. Can be in CSV or JSON format. Cannot be used with --idat-folder.
--debug
Includes stack traces in logs. Default is false.
--gencall-cutoff
GenCall score cutoff to label a NoCall. Default is 0.15.
--help
Displays help information for the genotype call command.
--json-log
Outputs logs in JSON format. Default is false.
--num-threads
Number of parallel threads to run.
--output-folder
[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the IDAT folder, if the IDAT folder is provided.
--version
Displays version information.
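The effect of the --gencall-cutoff option listed above can be pictured with a toy example. The Python sketch below is an illustration, not the DRAGEN Array implementation: it assumes that a genotype whose GenCall score is below the cutoff is reported as a NoCall and that a score equal to the cutoff is kept. The function name and the input layout are hypothetical.

```python
DEFAULT_GENCALL_CUTOFF = 0.15  # documented default of --gencall-cutoff


def apply_gencall_cutoff(calls, cutoff=DEFAULT_GENCALL_CUTOFF):
    """Replace genotypes scoring below the cutoff with "NoCall".

    calls is a list of (genotype, gencall_score) tuples.
    Assumption: scores equal to the cutoff are kept.
    """
    return [genotype if score >= cutoff else "NoCall" for genotype, score in calls]


toy_calls = [("AA", 0.90), ("AB", 0.10), ("BB", 0.15)]
print(apply_gencall_cutoff(toy_calls))
print(apply_gencall_cutoff(toy_calls, cutoff=0.5))
```

Raising the cutoff trades call rate for confidence in the remaining calls.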
Converts GTC to BedGraph files, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.
--bpm-manifest
[Required] Specifies the path to the bead pool manifest in BPM format.
--gtc-folder
[Required] Specifies the path to the directory where all genotype (.gtc) files are located. Cannot be used with --gtc-sample-sheet.
This path also includes the contents of all subdirectories.
--gtc-sample-sheet
[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--debug
Include stack traces in logs. Default is false.
--help
Displays help information for the genotype gtc-to-bedgraph command.
--json-log
Outputs logs in JSON format. Default is false.
--output-folder
[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.
--version
Displays version information.
Converts GTC (v5) to SNV VCF Files. The command is only applicable for Genotype Call Files produced by DRAGEN Array.
--bpm-manifest
[Required] Specifies the path to the bead pool manifest in BPM format.
--csv-manifest
[Required] Specifies the path to the CSV manifest with SourceSeq column.
--genome-fasta-file
[Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.
--gtc-folder
[Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet.
This path also includes the contents of all subdirectories.
--gtc-sample-sheet
[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--auxiliary-loci
Specifies the path to the VCF file with auxiliary definitions of loci, such as for multi-nucleotide variants.
--debug
Include stack traces in logs. Default is false.
--disable-genome-cache
Disables the reference genome cache.
--filter-loci
Generates a text file containing a list of probe names to be filtered.
--unsquash-duplicates
Generates unique VCF records for duplicate assays. Default is false.
--help
Displays help information for the genotype gtc-to-vcf command.
--json-log
Outputs logs in JSON format. Default is false.
--no-bgzip
VCFs are not bgzip compressed (.gz) and no tabix index files (.tbi) are output. Default is false.
--output-folder
[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if GTC folder is provided.
--version
Displays version information.
Squashing duplicates
In the manifest, there can be cases where the same variant is probed by multiple different assays. These assays may be the same design or alternate designs for the same locus. In the default mode of operation, these duplicates will be "squashed" into a single record in the VCF to reflect a true variant rather than probe genotype. The method used to incorporate information across multiple assays is defined further in the VCF description. When the --unsquash-duplicates option is provided, this "squashing" behavior is disabled, and each duplicate assay will be reported in a separate entry in the VCF file. This option is helpful when you are interested in investigating or validating the performance of individual assays, rather than trying to generate genotypes for specific variants. Note that if a locus has more than two alleles and is also queried with duplicated designs, the duplicates will not be unsquashed (i.e., in the case of multi-allelic variants). DO NOT use the --unsquash-duplicates option if doing star allele calling downstream, as that command expects squashed variants.
Genome cache
By default, the entire reference genome will be read into memory. Generally, this will be more efficient than reading data from the indexed reference on disk, at the expense of greater memory utilization. For situations in which genome caching is not desirable (low memory availability or a small input manifest), it is possible to disable this default behavior with the --disable-genome-cache option.
Auxiliary loci
Certain classes of variant types (such as multi-nucleotide variants) are not currently supported in the upstream analysis software that produces GTC files. However, it is possible to query this type of variant by creating a SNP design that differentiates the specific multi-nucleotide alleles of interest. For example, if the true source sequence is
ATGC[AT/CG]GTAA
This assay could be designed as a SNP assay with the following source sequence
ATGC[A/C]NNNN
gtc-to-vcf provides an option (--auxiliary-loci) to supply a list of auxiliary records (in VCF format) to restore the true alleles for these cases in the output VCF. There are several restrictions around this function:
The auxiliary definition must NOT be a multi-allelic variant.
The auxiliary definition must be a multi-nucleotide variant.
There must NOT be multiple array assays (e.g., duplicates) for the locus.
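The first two restrictions above can be checked on the REF and ALT fields of an auxiliary VCF record before the file is supplied. The Python sketch below is a hedged pre-check, not DRAGEN Array logic. It treats a multi-nucleotide variant as a substitution in which REF and ALT have the same length and span more than one base, which is this sketch's working definition. The function name is hypothetical. The third restriction depends on the manifest and is not checked here.

```python
def auxiliary_record_problems(ref: str, alt: str) -> list:
    """Return reasons why a REF/ALT pair is unsuitable as an auxiliary record."""
    problems = []
    if "," in alt:
        # A comma-separated ALT field lists several alternate alleles.
        problems.append("multi-allelic variants are not allowed")
    elif not (len(ref) == len(alt) and len(ref) > 1):
        problems.append("record is not a multi-nucleotide variant")
    return problems


# The AT/CG alleles from the source sequence example above pass both checks.
print(auxiliary_record_problems("AT", "CG"))
print(auxiliary_record_problems("A", "C"))
```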
Note: The genome FASTA files for human genomes are provided by Illumina on the support site.
Displays the help information for a genotype command.
Displays current DRAGEN Array Local version.
Displays the help information.
Displays current DRAGEN Array Local version.
The root command for PGx star allele calling.
star-allele call
Determines PGx star allele and variant genotypes.
star-allele annotate
Annotates PGx gene functions and produces a JSON report.
star-allele help
Displays help information for a star allele command.
star-allele version
Displays version information for star allele.
Displays help information for a star-allele command.
Displays version information for star-allele.
Calls PGx star allele diplotypes. The SNV VCF files should be generated using the DRAGEN Array gtc-to-vcf command with unsquash-duplicates off (default) and without filter loci.
--database
[Required] The PGx database file (.zip).
--license-server-url
[Required] The license server URL with credentials.
--vcf-folder
[Required] The directory containing *.snv.vcf.gz and *.cnv.vcf.gz files.
--query-license-quota
At the beginning and end of the analysis, queries the license server for the quotas on the valid license(s) and displays the result.
--debug
Includes stack traces in logs. Default is false.
--help
Displays help information for the star-allele call command.
--json-log
Outputs logs in JSON format. Default is false.
--output-folder
[Optional] Directory path to output files. Default is the current working directory.
--version
Displays version information.
Annotates and summarizes the star alleles, specifically their metabolizer statuses, and outputs the results in a consolidated JSON report. Metabolizer status is determined by direct lookup in the public PGx guidelines (CPIC or DPWG) specified by the user.
--star-alleles
[Required] Path to star alleles file (.csv) generated by the call subcommand.
--guidelines
PGx guidelines to use for annotation. Valid values are ‘CPIC’ and ‘DPWG’. Default is ‘CPIC’.
--debug
Includes stack traces in logs. Default is false.
--help
Displays help information for the star-allele annotate command.
--json-log
Outputs logs in JSON format. Default is false.
--output-folder
[Optional] Directory path to output files. Default is the current working directory.
--version
Displays version information.
DRAGEN Array Local utilizes a command-line interface which allows full user control of software functionality and easy automation of tasks. The software is designed to be used by power users and bioinformaticians.
When using the command line, consider the following tips:
Spaces cannot be part of an unquoted file name in a command. If the file name has spaces, put quotes around the file name.
To correct a typing error in a previously entered command, use the up arrow to recall the previous command, then correct the error before re-entering it.
Double-check the command. Misspellings, extra or missing dashes, and similar mistakes make the command unrecognizable to the software.
When entering paths or long names, copy and paste the values to help avoid errors.
If using Windows, use a File Explorer window to navigate to the product file or folder that is needed by the DRAGEN Array Local command. While holding down the shift button on the keyboard, right click the file and select the 'Copy as Path' option. Then paste the copied path into the command prompt to use the file or folder.
To cancel a command while it is running, press Control + C on the keyboard.
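When commands are launched from a script rather than typed, the quoting tip above can be handled by passing each argument as a separate list item. The sketch below is illustrative only: the helper name build_genotype_call and the paths are hypothetical, while the dragena genotype call options are the ones shown elsewhere in this documentation.

```python
import shlex

def build_genotype_call(bpm, egt, idat_folder, output_folder):
    """Build the argument list for 'dragena genotype call'."""
    return [
        "dragena", "genotype", "call",
        "--bpm-manifest", bpm,
        "--cluster-file", egt,
        "--idat-folder", idat_folder,
        "--output-folder", output_folder,
    ]

args = build_genotype_call(
    "/user/product files/manifest.bpm",      # hypothetical path containing a space
    "/user/product files/clusterfile.egt",   # hypothetical path containing a space
    "/user/IDATs",
    "/user/gtcs",
)

# shlex.join renders the equivalent POSIX shell command with quotes added
# around any argument that needs them.
print(shlex.join(args))

# To execute instead of print (requires dragena on the PATH):
#   import subprocess
#   subprocess.run(args, check=True)
```

Because each path is a single list item, no manual quoting is needed when the list is passed to subprocess.run. Note that shlex.join produces POSIX-style quoting, so the printed string is meant for Linux shells.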
A Cluster File (.egt) contains the cluster positions of every probe used for genotyping analysis. Illumina provides a standard cluster file for all commercial Infinium BeadChips. It may be desirable to create a custom cluster file if the one provided does not fit the data well, or if a semi-custom or custom BeadChip that does not come with a cluster file is used. GenomeStudio 2.0 is the software used to create custom cluster files.
To facilitate the review and optimization of PGx variant GenTrain cluster positions, a GenomeStudio auxiliary file is provided for each PGx Array product through the DRAGEN Array Support Site and array product files page, e.g. Infinium Global Diversity Array with Enhanced PGx Product Files. The auxiliary file is a tab-delimited text file that can be imported into GenomeStudio through Column Import. The file contains the Infinium Assay to PGx star allele mapping, covering the variants involved in DRAGEN Array PGx star allele calling.
When updating the cluster file for pharmacogenomic applications, understand the specifications for the copy number model file before beginning.
Before creating a custom cluster file, review the Infinium Genotyping Data Analysis Technical Note, the Infinium Arrays Support Webinar Video, and Custom cluster file creation for improved copy number analysis.
A Copy Number (CN) Model File (.dat) contains the data needed to make accurate copy number calls for pharmacogenomics. This file is used in the creation of CNV VCFs, which are inputs to the star allele calling command. Illumina provides a standard CN model file for all commercial PGx Infinium BeadChips. If the cluster file needs to be customized, the CN model file should also be updated using the copy-number train command, which is available in DRAGEN Array Local only. The steps are as follows:
Use GenomeStudio 2.0 to generate a new cluster file.
Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.
dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/new_clusterfile.egt --idat-folder /user/IDATs --output-folder /user/new_gtcs
Use the copy-number train command to retrain the copy number model. Note: The value for the --platform option can be found under the Assay Format heading in the CSV manifest.
dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/new_gtcs --platform LCG --output-folder /user/productfiles/new_cnmodel
Use the new_cnmodel for subsequent copy-number call commands.
Note the difference in the cluster file requirement based upon the version of DRAGEN Array used:
Version 1.1: If a CN model is used with a different cluster file, the software provides a warning but proceeds with copy number calling. As a result, a user can choose to keep using the commercial CN model from Illumina in combination with a custom, updated EGT file in the PGx analysis.
Version 1.0: The same cluster file used for copy number training must be used to generate GTC files for copy number calling. Otherwise, the software will produce an error and exit.
For reference, see the Command Index for details of the copy-number train command.
To retrain the CN model file, 96 samples must be used at minimum with 90 of those samples passing QC defined as Log R Dev less than or equal to 0.2. It is recommended to train with at least 150 samples. A greater number of samples can be advantageous, but diminishing returns and longer computation times are seen after 3,000 samples.
It is recommended to manually QC the training samples and remove samples that have Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0 so only the highest quality samples are used in the training. The same samples used to create the new cluster file should be used to retrain the CN Model. To minimize batch effect in the training sample set, the samples should be analyzed in as few batches as possible and come from the same reagent lots.
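The recommended manual QC and the training minimums can be expressed as a small filter. This is a minimal sketch under stated assumptions: the dictionary keys (sample, log_r_dev, call_rate, tga_ctrl) are hypothetical names chosen for illustration, and the values would need to be collected from your own QC outputs.

```python
MIN_SAMPLES = 96   # minimum number of training samples
MIN_PASSING = 90   # minimum number of samples with Log R Dev <= 0.2

def passes_manual_qc(row):
    """True if a sample meets all three recommended manual QC criteria."""
    return (
        row["log_r_dev"] <= 0.2
        and row["call_rate"] >= 0.99
        and row["tga_ctrl"] >= 1.0
    )

def meets_training_minimum(rows):
    """True if the sample set satisfies the stated minimums for retraining."""
    rows = list(rows)
    passing = sum(1 for r in rows if r["log_r_dev"] <= 0.2)
    return len(rows) >= MIN_SAMPLES and passing >= MIN_PASSING

def select_training_samples(rows):
    """Drop samples failing manual QC, then report whether enough remain."""
    kept = [r for r in rows if passes_manual_qc(r)]
    return kept, meets_training_minimum(kept)
```

Checking the minimums after filtering is a conservative design choice: if fewer than 96 samples remain, more samples are needed before retraining.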
The copy-number train algorithm is designed with the assumption that the copy number distribution resembles the standard population distributions. This ensures the updated CN model file is representative of the normal populations in which it will be used to calculate copy number for key pharmacogenomic targets.
Semi-custom arrays add additional content or other pre-designed Infinium booster content to enhance the commercial array content. This additional content can be analyzed for genotyping applications to obtain information on SNV and indel calls.
For pharmacogenomic applications, PGx CNV and star allele calls are limited to content included on the commercial Infinium PGx arrays. Additional semi-custom content will not be included in the pharmacogenomic results.
When designing a semi-custom array using a commercial Infinium PGx array backbone, such as the Global Diversity Array with enhanced PGx, it is important to retain all backbone content in the design, because removing content could decrease the quality of the results.
Pharmacogenomic analysis for semi-custom arrays should be run using DRAGEN Array Local. Because the PGx CNV calling and PGx star allele calling algorithms are only compatible with commercial product files (see Applications), to fully analyze semi-custom PGx beadchips some steps of the pipeline can be run twice; once with the semi-custom product files (to get complete semi-custom SNV VCF files), and once with the commercial product files (to get the PGx CNV VCF files, PGx Star Allele output, and metabolizer report).
The semi-custom product files can be used via the command-line interface in genotype call and genotype gtc-to-vcf, and in GenomeStudio, as follows:
Use GenomeStudio 2.0 to prepare a custom cluster file for the semi-custom array, following guidance outlined in Custom cluster file creation for improved copy number analysis.
Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any other desired directory if the executable was added to the PATH environment variable.
Use the genotype call command to call all semi-custom genotypes and generate custom content GTC files using IDAT files as input.
dragena genotype call --bpm-manifest /user/productfiles/semi_custom_manifest.bpm --cluster-file /user/productfiles/semi_custom_clusterfile.egt --idat-folder /user/IDATs --output-folder /user/semi_custom_gtcs
Use the genotype gtc-to-vcf command to create custom content SNV VCF files from the custom content GTC files generated by the genotype call command.
dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/semi_custom_manifest.bpm --csv-manifest /user/productfiles/semi_custom_manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/semi_custom_gtcs --output-folder /user/semi_custom_vcfs
Perform Quick Start steps 1-6 using the commercial Infinium PGx array product files to obtain PGx CNV VCFs, star allele calls, and metabolizer status annotations.
Keep the GTC files and SNV VCF files generated using the semi-custom product files in clearly labelled folders to distinguish them from the GTC and SNV VCF files generated using the commercial product files. Note that the GTC and SNV VCFs generated using the commercial product files will not contain genotypes for the semi-custom/add-on content. The GTC and SNV VCFs generated using the semi-custom product files cannot be used for downstream PGx analysis commands.
Is DRAGEN Array analysis a local (on-premises) or cloud solution? DRAGEN Array analysis is available both locally (on-premises) and in the cloud.
DRAGEN Array Local Analysis utilizes a command-line interface for power users to have granular control and flexibility to support large scale microarray genomic studies. Deployed on Windows or Linux operating systems, the local package is CPU-based and does not require a specialized server or hardware.
DRAGEN Array Cloud Analysis utilizes the user-friendly, graphical interface of BaseSpace Sequence Hub to simplify analysis setup and kickoff.
Which Infinium arrays is DRAGEN Array compatible with? Refer to the Product and Analysis Compatibility table in the Applications section.
How many samples are needed per analysis? Genotyping: As few as one sample can be used for genotyping. Multiple analysis batches can be kicked off and run in parallel.
Pharmacogenomics: A minimum of 24 samples is required for PGx CNV calling with 22 passing QC. Passing QC is defined as Log R Dev < 0.2. 96 samples are recommended for the most accurate CNV results. Multiple analysis batches can be kicked off and run in parallel.
Which PGx CNVs and star alleles are available? Please refer to the DRAGEN Array release notes.
Where can I find demo data? Demo data is available in BaseSpace under the “Demo Data” section. All array data starts with “iScan:” and includes the name of the type of analysis. Supported types of analysis can be found in the Applications section.
March 2024
Ability to genotype and produce related reports for human and non-human arrays in the cloud.
Configurable interfaces in BaseSpace that allow for flexibility and easy kickoff.
Some multi-nucleotide variant (MNV) designs reverse complement the "Allele1/2 Top" fields in the Final Report.
Genotyping only works on diploid organisms at this time. Polyploid genotyping is not currently supported.
May 2024
Adjustable thresholds to determine pass/fail status
Data summary plots for a quick visual check of each analysis batch
Determining detection p-value, beta-values, and m-values from each methylation sample
Deployment on BaseSpace™ Sequence Hub user interface for easy analysis kickoff
Adjustable thresholds for 21 built in controls, p-value detection, proportion probes passing, and offset correction within BaseSpace Sequence Hub to customize for user’s study needs
Thresholds are used to assign pass (1) or fail (0) status to each sample
Failed metrics can be highlighted for easy viewing
Pinpoint areas of failure including bisulfite conversion, staining, hybridization, etc. to identify assay steps in need of troubleshooting
Quantitative values for each control removing ambiguity with manual interpretation
Data summary plots with information on passing p-value detection and principal component analysis of beta values
Provides detection p-value, beta-values and m-values for each CG site per sample to use in downstream analysis
Standard thresholds may not be applicable for all discontinued, semi-custom or custom BeadChips and IDATs originating from NextSeq550
Built-in controls may not be available on all discontinued, semi-custom or custom BeadChips
The following Types of Analysis are currently supported by DRAGEN Array:
DRAGEN Array – Genotyping
DRAGEN Array – PGx – CNV calling
DRAGEN Array – PGx – Star allele annotation
DRAGEN Array - Methylation QC
These products/beadchips have been verified to be compatible with the following analyses and versions of DRAGEN Array:
Product | DRAGEN Array version | Type of analysis | Genome build
BovineSNP50_v3_A | v1.0, v1.1 | DRAGEN Array – Genotyping | UMD3
GDA-8v1-0_D | v1.0, v1.1 | DRAGEN Array – Genotyping | GRCh37, GRCh38
GDA_PGx-8v1-0_20042614_E | v1.0, v1.1 | DRAGEN Array – Genotyping | GRCh37, GRCh38
GDA_PGx-8v1-0_20042614_E | v1.0, v1.1 | DRAGEN Array – PGx - CNV calling | GRCh37, GRCh38
GDA_PGx-8v1-0_20042614_E | v1.0 | DRAGEN Array – PGx - Star allele annotate | GRCh38
GDA_PGx-8v1-0_20042614_G | v1.1 | DRAGEN Array – Genotyping | GRCh38
GDA_PGx-8v1-0_20042614_G | v1.1 | DRAGEN Array – PGx - CNV Calling | GRCh38
GDA_PGx-8v1-0_20042614_G | v1.1 | DRAGEN Array – PGx - Star allele annotate | GRCh38
GSA-24v3-0_A | v1.0, v1.1 | DRAGEN Array – Genotyping | GRCh37, GRCh38
GSA-PGx-48v4-0_20079540_E | v1.1 | DRAGEN Array – Genotyping | GRCh38
GSA-PGx-48v4-0_20079540_E | v1.1 | DRAGEN Array – PGx - CNV Calling | GRCh38
GSA-PGx-48v4-0_20079540_E | v1.1 | DRAGEN Array – PGx - Star allele annotate | GRCh38
GCRA-PGx-24v1-0_20084467_C | v1.1 | DRAGEN Array – Genotyping | GRCh38
GCRA-PGx-24v1-0_20084467_C | v1.1 | DRAGEN Array – PGx - CNV Calling | GRCh38
GCRA-PGx-24v1-0_20084467_C | v1.1 | DRAGEN Array – PGx - Star allele annotate | GRCh38
PRSbooster_20083382_A | v1.0, v1.1 | DRAGEN Array – Genotyping | GRCh37
EPIC-8v1-0_B5 | v1.0 | DRAGEN Array – Methylation - QC | GRCh38
EPIC-8v2-0_A2 | v1.0 | DRAGEN Array – Methylation - QC | GRCh38
MSA-48v1-0_20102838_A1 | v1.0 | DRAGEN Array – Methylation - QC | GRCh38
Summary
Provides genotyping results for any human Infinium genotyping array.
Variant types detected
SNV
Indel
Sample minimum
1 sample
Arrays supported
Related Local Commands
Genotype Call
Genotype GTC-to-VCF
Related Cloud Specifics
Select Type of Analysis DRAGEN Array - Genotyping from the dropdown. Max 1152 samples are supported.
Inputs
Outputs
Per sample:
Per analysis batch:
Cost
Summary
Provides CNV calling on 7 target PGx genes across 10 target regions, plus genotyping outputs.
Variant types detected
SNV
Indel
CNV
Sample minimum
Minimum of 24 samples with 22 passing QC defined as Log R Dev < 0.2. 96 samples are recommended for best results.
Arrays supported
Related Local Commands
Genotype Call
Genotype GTC-to-VCF [optional]
Copy-number Call
Related Cloud Specifics
Select Type of Analysis DRAGEN Array - PGx – CNV calling from the dropdown. Max 384 samples are supported.
Inputs
Outputs
Per sample:
Per analysis batch:
Cost
Summary
Provides PGx annotation on over 50 genes, plus PGx CNV and genotyping outputs
Variant types detected
SNV
Indel
CNV
Star allele diplotype
Sample minimum
Minimum of 24 samples with 22 passing QC defined as Log R Dev < 0.2. 96 samples are recommended for best results.
Arrays supported
Related Local Commands
Genotype call
Genotype GTC-to-VCF [optional]
Copy-number call
Star-allele call
Star-allele annotate
Related Cloud Specifics
Select Type of Analysis DRAGEN Array - PGx – Star Allele Annotation from the dropdown. Max 384 samples are supported.
Inputs
Outputs
Per sample:
Per analysis batch:
Cost
Local: Per sample analysis.
Summary
Provides methylation QC for Infinium methylation arrays.
Variant types detected
N/A
Sample minimum
1 sample
Arrays supported
Recommended thresholds and all built-in control probes are available for Methylation Screening Array (MSA) and MethylationEPIC (v1 & v2) originating from iScan. In non-human and custom arrays, availability of built-in QC probes may vary, and failure thresholds must be defined by the user.
Related Local Commands
Not available on DRAGEN Array Local.
Related Cloud Specifics
Inputs
Outputs
Cost
The following section describes the outputs produced by DRAGEN Array.
DRAGEN Array produces one CNV variant call file (VCF) (*.cnv.vcf) per sample to report the CN status on the gene and sub gene level, along with the CN events for PGx targets.
The CNV VCF output file follows the standard VCF format. The QUAL field in the VCF file measures the CNV call quality, a Phred-scaled score with a minimum of 0 and a cap of 60. Low-quality calls (QUAL < 7) are flagged by the Q7 filter. Low-quality samples with LogRDev greater than the 0.2 threshold are flagged with the SampleQuality filter.
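To illustrate the QUAL scale and the two filters, the sketch below applies the standard Phred definition, Q = -10 * log10(error probability), and clamps the result to the range 0 to 60. How DRAGEN Array derives the underlying error probability is not described here, so the function names and the p_error input are assumptions made for illustration only.

```python
import math

def phred_qual(p_error: float) -> float:
    """Phred-scale an error probability and clamp it to the range 0 to 60."""
    if p_error <= 0:
        return 60.0                       # zero error probability hits the cap
    q = -10.0 * math.log10(p_error)
    return min(60.0, max(0.0, q))

def cnv_filters(qual: float, log_r_dev: float) -> list[str]:
    """Return the filter labels that apply to one call, or ['PASS']."""
    filters = []
    if qual < 7:
        filters.append("Q7")              # low-quality call
    if log_r_dev > 0.2:
        filters.append("SampleQuality")   # noisy sample
    return filters or ["PASS"]
```

Under this definition, a QUAL of 7 corresponds to an error probability of roughly 0.2, and a QUAL of 60 to one in a million or better.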
The CNV VCF files are by default bgzipped (Block GZIP) and have the “.gz” extension. The compression saves storage space and facilitates efficient lookup when indexed with the TBI Index File. To view these files as plain text, they can be uncompressed with bgzip from Samtools or other third-party tools. The CNV VCF must be bgzipped and indexed to be used in downstream DRAGEN Array commands, such as star allele calling.
The CNV VCF output file includes the following content.
##fileformat=VCFv4.1
##source=dragena 1.1.0
##genomeBuild=38
##reference=file:///hg38_with_alt/hg38_nochr_MT.fa
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events. CN=5 indicates 5 or 5+">
##FORMAT=<ID=NR,Number=1,Type=Float,Description="Aggregated normalized intensity">
##ALT=<ID=CNV,Description="Copy number variant region">
##FILTER=<ID=Q7,Description="Quality below 7">
##FILTER=<ID=SampleQuality,Description="Sample was flagged as potentially low-quality due to high noise levels.">
##INFO=<ID=CNVLEN,Number=1,Type=Integer,Description="Number of bases in CNV hotspot">
##INFO=<ID=PROBE,Number=1,Type=Integer,Description="Number of probes assayed for CNV hotspot">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of CNV hotspot">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Structural Variant Type">
##CNVOverallPloidy=1.8
##CNVGCCorrect=True
##contig=<ID=1,length=248956422>
##contig=<ID=4,length=190214555>
##contig=<ID=10,length=133797422>
##contig=<ID=16,length=90338345>
##contig=<ID=19,length=58617616>
##contig=<ID=22,length=50818468>
##contig=<ID=22_KI270879v1_alt,length=304135>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 204619760001_R01C01
1 109687842 CNV:GSTM1:chr1:109687842:109693526 N <CNV> 60 PASS CNVLEN=5685;PROBE=124;END=109693526;SVTYPE=CNV CN:NR 2:0.966631132771593
4 68537222 CNV:UGT2B17:chr4:68537222:68568499 N <CNV> 60 PASS CNVLEN=31278;PROBE=383;END=68568499;SVTYPE=CNV CN:NR 0:0.376696837881692
10 133527374 CNV:CYP2E1:chr10:133527374:133539096 N <CNV> 60 PASS CNVLEN=11723;PROBE=194;END=133539096;SVTYPE=CNV CN:NR 2:0.980059731860893
16 28615068 CNV:SULT1A1:chr16:28603587:28613544 N <CNV> 57 PASS CNVLEN=8315;PROBE=164;END=28623382;SVTYPE=CNV CN:NR 2:0.980552325552963
19 40844791 CNV:CYP2A6.intron.7:chr19:40844791:40845293 N <CNV> 60 PASS CNVLEN=503;PROBE=38;END=40845293;SVTYPE=CNV CN:NR 2:0.9663775484762
19 40850267 CNV:CYP2A6.exon.1:chr19:40850267:40850414 N <CNV> 60 PASS CNVLEN=148;PROBE=21;END=40850414;SVTYPE=CNV CN:NR 2:0.9663775484762
22 42126498 CNV:CYP2D6.exon.9:chr22:42126498:42126752 N <CNV> 48 PASS CNVLEN=255;PROBE=370;END=42126752;SVTYPE=CNV CN:NR 2:0.981703411438716
22 42129188 CNV:CYP2D6.intron.2:chr22:42129188:42129734 N <CNV> 10 PASS CNVLEN=547;PROBE=333;END=42129734;SVTYPE=CNV CN:NR 2:0.965498002434641
22 42130886 CNV:CYP2D6.p5:chr22:42130886:42131379 N <CNV> 60 PASS CNVLEN=494;PROBE=172;END=42131379;SVTYPE=CNV CN:NR 2:0.970341562236357
22_KI270879v1_alt 270316 CNV:GSTT1:chr22_KI270879v1_alt:270316:278477 N <CNV> 60 PASS CNVLEN=8162;PROBE=91;END=278477;SVTYPE=CNV CN:NR 2:1.01191145130511
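The data lines above can be read with a few lines of standard-library Python. This is a minimal sketch for single-sample CNV VCF lines like the ones shown; the function name parse_cnv_record and the output keys are hypothetical, and the input is assumed to be already decompressed.

```python
def parse_cnv_record(line: str) -> dict:
    """Parse one data line of a single-sample CNV VCF into a dictionary."""
    # VCF columns are tab-delimited; split() also accepts the spaces shown above.
    chrom, pos, rec_id, _ref, _alt, qual, flt, info, fmt, sample = line.split()
    info_map = dict(item.split("=", 1) for item in info.split(";"))
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "region": rec_id.split(":")[1],          # for example "CYP2D6.exon.9"
        "chrom": chrom,
        "start": int(pos),
        "end": int(info_map["END"]),
        "probes": int(info_map["PROBE"]),
        "qual": float(qual),
        "filter": flt,
        "copy_number": int(sample_map["CN"]),
        "normalized_intensity": float(sample_map["NR"]),
    }

record = parse_cnv_record(
    "4 68537222 CNV:UGT2B17:chr4:68537222:68568499 N <CNV> 60 PASS "
    "CNVLEN=31278;PROBE=383;END=68568499;SVTYPE=CNV CN:NR 0:0.376696837881692"
)
print(record["region"], record["copy_number"])
```

For the UGT2B17 line from the example, the parsed copy number is 0.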
The software produces one genotyping variant call file (*.snv.vcf) per sample, covering single nucleotide variants (SNV) and indels for the sample. It reports GenCall score (GS), B Allele Frequency (BAF), and Log R Ratio (LRR) per variant. The VCF file output follows the VCF 4.1 format.
Some additional details:
Genotypes are adjusted to reflect the sample ploidy. Calls are haploid for loci on Y, MT, and non-PAR chromosome X for males.
Multiple SNPs in the input manifest which are mapped to the same chromosomal coordinate (e.g. tri-allelic loci or duplicated sites) are collapsed into one VCF entry and a combined genotype generated. To produce the combined genotype, the set of all possible genotypes is enumerated based on the queried alleles. Genotypes which are not possible based on called alleles and assay design limitations (e.g. Infinium II designs cannot distinguish between A/T and C/G calls) are filtered. If only one consistent genotype remains after the filtering process, then the site is assigned this genotype. Otherwise, the genotype is ambiguous (more than 1) or inconsistent (less than 1) and a no-call is returned.
Certain SNV and indel calls can be skipped when reported in the VCF. Skipped data can include unmapped loci, intensity-only probes used for CNV identification, and indels that do not map back to the genome. See Warning/Error Messages and Logs for messages that may be seen with DRAGEN Array Local related to the skipped data.
The BAF and LRR are oriented with Ref as A and Alt as B relative to the reference genome, while GS is agnostic to the reference genome. Users familiar with GenomeStudio may observe BAF and LRR reported in the VCF as 1 minus the value reported in GenomeStudio depending on the Ref Alt allele orientation with the reference genome. GenomeStudio reports these values based on the information in the manifest without knowledge of the reference genome.
The SNV VCF files are by default bgzipped (Block GZIP) and have the “.gz” extension. The compression saves storage space and facilitates efficient lookup when indexed with the TBI Index File. To view these files as plain text, they can be uncompressed with bgzip from Samtools or other third-party tools. The SNV VCF must be bgzipped and indexed to be used in downstream DRAGEN Array commands, such as star allele calling.
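The collapsing of several assays at one coordinate into a combined genotype can be pictured as enumerate, filter, and accept only a unique survivor. The sketch below is a deliberately simplified illustration and not the DRAGEN Array implementation: it assumes diploid calls, uses a hypothetical consistency rule (the candidate alleles that an assay can see must equal the alleles that assay called), and ignores the Infinium II design limitation.

```python
from itertools import combinations_with_replacement

def combine_genotypes(assays):
    """Combine calls from several assays at one position.

    assays: list of (queried_alleles, called_alleles) pairs, for example
            ({"A", "C"}, ("C", "C")).
    Returns the single consistent genotype as a tuple, or None for a no-call.
    """
    all_alleles = sorted(set().union(*(set(q) for q, _ in assays)))
    consistent = []
    # Enumerate every possible diploid genotype over the queried alleles.
    for candidate in combinations_with_replacement(all_alleles, 2):
        # Simplified rule (assumption): the part of the candidate that each
        # assay can observe must match the alleles that assay reported.
        if all(set(candidate) & set(q) == set(c) for q, c in assays):
            consistent.append(candidate)
    # Exactly one survivor is a call; zero (inconsistent) or several
    # (ambiguous) survivors give a no-call.
    return consistent[0] if len(consistent) == 1 else None
```

For a tri-allelic site queried by an A/C assay that calls C/C and an A/G assay that calls G/G, the only consistent genotype is C/G. Two duplicate A/C assays that call A/A and A/C disagree, so the result is a no-call.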
The SNV VCF output file includes the following content. The last row shows an example of a variant call.
##fileformat=VCFv4.1
##source=dragena 1.1.0
##genomeBuild=38
##reference=file:///genomes/38/genome.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GS,Number=1,Type=Float,Description="GenCall score. For merged multi-assay or multi-allelic records, min GenCall score is reported.">
##FORMAT=<ID=BAF,Number=1,Type=Float,Description="B Allele Frequency">
##FORMAT=<ID=LRR,Number=1,Type=Float,Description="LogR ratio">
##contig=<ID=1,length=248956422>
##contig=<ID=2,length=242193529>
##contig=<ID=3,length=198295559>
##contig=<ID=4,length=190214555>
##contig=<ID=5,length=181538259>
##contig=<ID=6,length=170805979>
##contig=<ID=7,length=159345973>
##contig=<ID=8,length=145138636>
##contig=<ID=9,length=138394717>
##contig=<ID=10,length=133797422>
##contig=<ID=11,length=135086622>
##contig=<ID=12,length=133275309>
##contig=<ID=13,length=114364328>
##contig=<ID=14,length=107043718>
##contig=<ID=15,length=101991189>
##contig=<ID=16,length=90338345>
##contig=<ID=17,length=83257441>
##contig=<ID=18,length=80373285>
##contig=<ID=19,length=58617616>
##contig=<ID=20,length=64444167>
##contig=<ID=21,length=46709983>
##contig=<ID=22,length=50818468>
##contig=<ID=MT,length=16569>
##contig=<ID=X,length=156040895>
##contig=<ID=Y,length=57227415>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 202937470021_R06C01
1 2290399 rs878093 G A . PASS . GT:GS:BAF:LRR 0/1:0.7923:0.50724137:0.14730307
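The GT field uses allele indices (0 for REF, 1 and higher for ALT alleles), which is standard VCF behavior. As a minimal sketch, the hypothetical helpers below translate the indices of a single-sample SNV VCF line into bases, including haploid calls and no-calls.

```python
def called_alleles(ref: str, alt: str, gt: str):
    """Translate a VCF GT string such as '0/1' into a list of bases."""
    alleles = [ref] + alt.split(",")
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return None                      # no-call
    return [alleles[int(i)] for i in indices]

def decode_snv_line(line: str):
    """Return (variant ID, called bases) for one single-sample data line."""
    fields = line.split()                # VCF is tab-delimited; spaces also work
    ref, alt = fields[3], fields[4]
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return fields[2], called_alleles(ref, alt, sample["GT"])

example = ("1 2290399 rs878093 G A . PASS . GT:GS:BAF:LRR "
           "0/1:0.7923:0.50724137:0.14730307")
print(decode_snv_line(example))
```

For the example line, the heterozygous 0/1 call decodes to G and A.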
The genotype call algorithm produces one genotype call file (.gtc) per sample analyzed. The Genotype Call (GTC) file contains the small variant (SNV and indel) genotype for each marker specified by the product, along with sample quality metrics. The marker location is not included and must be extracted from the manifest file. The GTC file uses a proprietary binary format that can be parsed using the Illumina open-source tool BeadArray Library File Parser.
The BedGraph file contains the log R ratios from the genotyping algorithm for use in visual tools.
The Star Allele CSV file is an intermediate file generated by the star-allele call command and serves as the input to the star-allele annotate command. It contains all the star allele calls for all samples in a run. Each row in the file provides either a star allele diplotype or simple variant call for a PGx-related gene. Star allele diplotype calls for a sample and a gene may span multiple lines where alternative solutions can be listed.
The Star Allele CSV file also contains meta information marked by # at the top of the file for the genome build and PGx database used for the star allele calling.
The star_allele.csv file contains the following details per sample:
Sample
Sentrix barcode and position of the sample.
Rank
Rank of a single star allele solution for a gene. The top solution based on quality score is ranked as 1 with the alternative solutions ranked lower.
Gene or Variant
The gene symbol, or gene symbol plus rsID for variants.
Type
‘Haplotype’ (star allele) or ‘Variant’ PGx calling type.
Solution
Star allele or variant solution. If diploid, variant solutions have the format of Allele1/Allele2.
Solution Long
Long format solution for star alleles. The field has the following format: Structural Variant Type: Underlying Star allele.
An example of a long solution is: Complete: CYP2D6*4, Complete: CYP2D6*10, CYP2D6*68: CYP2D6*4 where there are two complete alleles that have CYP2D6*4 and CYP2D6*10 haplotypes and one CYP2D6*68 structural variant that has a CYP2D6*4 haplotype configuration.
Supporting Variants
All variants present in the array that support the star allele solution. The field has the following format: Long Solution Star Allele: (Supporting Variants).
Each supporting variant is listed with essential information extracted from the SNV VCF to assist with troubleshooting, including Chromosome, Location, Reference allele, Alternative allele, Genotype, GenCall score (GS), and B-allele frequency (BAF).
Missing/Masked Core Variants
All variants not present in the array or not called in the SNV VCF file for the star allele. The field has the following format: Long Solution Star-Allele: (Missing Variants).
All Missing Variants in Array
All core definition variants that are not on the array or are not called in the SNV VCF along with the associated star alleles that are impacted. The field has the following format: Missing Variant: (List of impacted star alleles).
Collapsed Star-Alleles
Star alleles that cannot be distinguished from the solution star allele given the input array’s content. The field has the following format: Long Solution Star-Allele: (List of collapsed star alleles).
The most frequent star allele based on the population frequency of PGx alleles will be the star allele in the solution.
Score
Quality score of the solution including the population frequency of PGx alleles. The score ranges from 0 to 1.
Raw Score
Raw quality score of the solution without including the population frequency of PGx alleles. The score ranges from 0 to 1.
Copy Number Solution
Estimated copy number for each gene region. The field has the following format: Gene Region: Copy Number.
Below is an example of the first 5 columns from a star allele CSV file:
Sample,Rank,Gene or Variant,Type,Solution
204650490282_R02C01,1,CYP2C9,Haplotype,*9/*11
204650490282_R02C01,1,CYP2C19,Haplotype,*2/*10
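Because alternative solutions are listed on additional rows with lower ranks, a common first step is to keep only the rank 1 solution per sample and gene. The following is a minimal sketch using the column names from the example; the function name top_solutions and the "#example-meta" line are hypothetical, and the exact layout of the meta lines is an assumption beyond their leading # character.

```python
import csv

def top_solutions(csv_text: str) -> dict:
    """Map (sample, gene or variant) to the rank 1 solution."""
    # Skip blank lines and the meta information lines that start with '#'.
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines)
    return {
        (row["Sample"], row["Gene or Variant"]): row["Solution"]
        for row in reader
        if row["Rank"] == "1"
    }

example = """#example-meta
Sample,Rank,Gene or Variant,Type,Solution
204650490282_R02C01,1,CYP2C9,Haplotype,*9/*11
204650490282_R02C01,1,CYP2C19,Haplotype,*2/*10
"""
print(top_solutions(example))
```

If a file contained more than one rank 1 row for the same sample and gene, this sketch would keep only the last one.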
The software produces genotype summary files (gt_sample_summary.csv and gt_sample_summary.json) that contain the following details per sample:
Sample ID
Sample Name
Sample Folder
Autosomal Call Rate
Call Rate
Log R Ratio Std Dev
Sex Estimate
TGA_Ctrl_5716 Norm R
The TGA_Ctrl_5716 Norm R field is specific to PGx products (e.g., Global Diversity Array with enhanced PGx). The field value is the Normalized R value of one probe and is meant as an assay control, where a value < 1 indicates that the sample failed in the TGA (Targeted Gene Amplification) process. If the product does not have this probe, the field is not included in the gt_sample_summary.
DRAGEN Array Cloud produces a Final Report (gtc_final_report.csv) per analysis batch similar to the one available in GenomeStudio. It contains the following details per locus per sample:
SNP Name
SNP identifier.
SNP
SNP alleles as reported by assay probes. Alleles on the Design strand (the ILMN strand) are listed in order of Allele A/B.
Sample ID
Sample identifier.
Allele 1 – Top
Allele 1 corresponds to Allele A and is reported on the Top strand.
Allele 2 – Top
Allele 2 corresponds to Allele B and is reported on the Top strand.
Allele 1 – Forward
Allele 1 corresponds to Allele A and is reported on the Forward strand.
Allele 2 – Forward
Allele 2 corresponds to Allele B and is reported on the Forward strand.
Allele 1 – Plus
Allele 1 corresponds to Allele A and is reported on the Plus strand.
Allele 2 – Plus
Allele 2 corresponds to Allele B and is reported on the Plus strand.
GC Score
Quality metric calculated for each genotype (data point), and ranges from 0 to 1.
GT Score
The SNP cluster quality. Score for a SNP from the GenTrain clustering algorithm.
Log R Ratio
Base-2 log of the normalized R value over the expected R value for the theta value (interpolated from the R-values of the clusters). For loci categorized as intensity only, the value is adjusted so that the expected R value is the mean of the cluster.
B Allele Freq
B allele frequency for this sample as interpolated from known B allele frequencies of 3 canonical clusters: 0, 0.5 and 1 if it is equal to or greater than the theta mean of the BB cluster. B Allele Freq is between 0 and 1, or set to NaN for loci categorized as intensity only.
Chr
Chromosome containing the SNP.
Position
SNP chromosomal position.
Note: Analyses on products with large numbers of loci (>1 million) and large numbers of samples (>100) yield a large (50+ gigabyte) Final Report that is difficult to download and review. If large batches are desired, it is recommended to create analysis configurations that do not produce this report.
For more information on interpreting DNA strand and allele information, see Illumina Knowledge article How to interpret DNA strand and allele information for Infinium genotyping array data.
DRAGEN Array Cloud produces a Locus Summary (locus_summary.csv) per analysis batch similar to the one available in GenomeStudio. It contains the following details per locus:
Locus_Name
Locus name from the manifest file.
Illumicode_Name
Locus ID from the manifest file.
#No_Calls
Number of loci with GenCall scores below the call region threshold.
#Calls
Number of loci with GenCall scores above the call region threshold.
Call_Freq
Call frequency or call rate calculated as follows: #Calls/(#No_Calls + #Calls)
A/A_Freq
Frequency of homozygote allele A calls.
A/B_Freq
Frequency of heterozygote calls.
B/B_Freq
Frequency of homozygote allele B calls.
Minor_Freq
Frequency of the minor allele.
Gentrain_Score
Quality score for samples clustered for this locus.
50%_GC_Score
50th percentile GenCall score for all samples.
10%_GC_Score
10th percentile GenCall score for all samples.
Het_Excess_Freq
Heterozygote excess frequency, calculated as (Observed - Expected)/Expected for the heterozygote class. If $f_{ab}$ is the heterozygote frequency observed at a locus, and $p$ and $q$ are the major and minor allele frequencies, then the het excess calculation is: $(f_{ab} - 2pq)/(2pq + \varepsilon)$
ChiTest_P100
Hardy-Weinberg p-value estimate calculated using genotype frequency. The value is calculated with 1 degree of freedom and is normalized to 100 individuals.
Cluster_Sep
Cluster separation score.
AA_T_Mean
Normalized theta angles mean for the AA genotype.
AA_T_Std
Normalized theta angles standard deviation for the AA genotype.
AB_T_Mean
Normalized theta angles mean for the AB genotype.
AB_T_Std
Standard deviation of the normalized theta angles for the AB genotype.
BB_T_Mean
Normalized theta angles mean for the BB genotypes.
BB_T_Std
Standard deviation of the normalized theta angles for the BB genotypes.
AA_R_Mean
Normalized R value mean for the AA genotypes.
AA_R_Std
Standard deviation of the normalized R value for the AA genotypes.
AB_R_Mean
Normalized R value mean for the AB genotypes.
AB_R_Std
Standard deviation of the normalized R value for the AB genotypes.
BB_R_Mean
Normalized R value mean for the BB genotypes.
BB_R_Std
Standard deviation of the normalized R value for the BB genotypes.
Plus/Minus Strand
Designated "+" or "-" with respect to the reference genome strand. "U" designates unknown.
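As a rough illustration of the Call_Freq, Minor_Freq, and Het_Excess_Freq definitions above, the following sketch derives them from genotype counts at one locus. It assumes that genotype frequencies are computed over called samples only and that the small constant ε is 1e-9; neither detail is stated in this documentation, and the function name is hypothetical.

```python
def locus_frequencies(n_aa, n_ab, n_bb, n_no_calls, eps=1e-9):
    """Compute call frequency, minor allele frequency, and het excess.

    eps stands in for the small constant in the het excess denominator;
    its value here is an assumption of this sketch.
    """
    n_calls = n_aa + n_ab + n_bb
    if n_calls == 0:
        raise ValueError("No called genotypes at this locus")
    # Call_Freq = #Calls / (#No_Calls + #Calls)
    call_freq = n_calls / (n_no_calls + n_calls)
    # Genotype frequencies over called samples (assumed).
    f_aa, f_ab, f_bb = n_aa / n_calls, n_ab / n_calls, n_bb / n_calls
    # Allele frequencies implied by the genotype frequencies.
    freq_a = f_aa + f_ab / 2
    freq_b = f_bb + f_ab / 2
    expected_het = 2 * freq_a * freq_b
    return {
        "Call_Freq": call_freq,
        "Minor_Freq": min(freq_a, freq_b),
        "Het_Excess_Freq": (f_ab - expected_het) / (expected_het + eps),
    }

# Hypothetical counts: genotypes in Hardy-Weinberg proportions, 2 no-calls.
balanced = locus_frequencies(12, 24, 12, 2)
# Hypothetical counts: no heterozygotes at all.
no_hets = locus_frequencies(24, 0, 24, 0)
```

With these inputs, the first locus has a het excess of 0 because the observed heterozygote frequency matches 2pq, while the second approaches -1 because no heterozygotes were observed.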
The sample summary contains key statistics for each sample in a batch, including the following details per sample:
Sample ID
Sample Name
Sample Folder
The copy number batch summary file (cn_batch_summary.csv) shows the total copy number gain, loss, and neutral (CN=2) values for each target region across all the samples in the analysis.
Example copy number batch summary file content:
Target Region,Total CN gain,Total CN loss,Total CN neutral
CYP2A6.exon.1,0,1,47
CYP2A6.intron.7,0,1,47
CYP2D6.exon.9,2,4,42
CYP2D6.intron.2,7,2,39
CYP2D6.p5,13,2,33
CYP2E1,2,0,46
GSTM1,0,42,6
GSTT1,0,33,15
SULT1A1,0,0,48
UGT2B17,0,34,14
All Target Regions,24,119,337
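The example content above can be checked programmatically. The sketch below parses the example text with Python's standard csv module and confirms that the All Target Regions row equals the column-wise sums of the per-region rows. The function name is hypothetical, and the check assumes the file layout matches the example shown.

```python
import csv
import io

CN_BATCH_SUMMARY = """Target Region,Total CN gain,Total CN loss,Total CN neutral
CYP2A6.exon.1,0,1,47
CYP2A6.intron.7,0,1,47
CYP2D6.exon.9,2,4,42
CYP2D6.intron.2,7,2,39
CYP2D6.p5,13,2,33
CYP2E1,2,0,46
GSTM1,0,42,6
GSTT1,0,33,15
SULT1A1,0,0,48
UGT2B17,0,34,14
All Target Regions,24,119,337
"""

COLUMNS = ("Total CN gain", "Total CN loss", "Total CN neutral")

def check_cn_batch_summary(text):
    """Return (computed sums, reported totals, samples per region)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    regions = [r for r in rows if r["Target Region"] != "All Target Regions"]
    total = next(r for r in rows if r["Target Region"] == "All Target Regions")
    sums = {c: sum(int(r[c]) for r in regions) for c in COLUMNS}
    reported = {c: int(total[c]) for c in COLUMNS}
    # Gain + loss + neutral for a region is the number of samples called there.
    samples = {r["Target Region"]: sum(int(r[c]) for c in COLUMNS) for r in regions}
    return sums, reported, samples

sums, reported, samples = check_cn_batch_summary(CN_BATCH_SUMMARY)
```

For the example above, the computed sums match the reported totals, and every target region accounts for the same 48 samples.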
The following scenarios result in a warning or error message:
Manifest file used to generate GTC is not the same as the manifest file used to generate the CN model.
FASTA files and FASTA index files do not match.
For the following scenarios, the software reports messages to the terminal output (as either a warning or an error):
Indel processing for GTC to VCF conversion failed.
The input folder does not contain the required input files.
An input file is corrupt.
Examples of such notifications can include the following:
Failed to normalize and gencall sample: {sample_id}, it will be skipped. Error: The given key '{loci_id}' was not present in the dictionary.
Warning
This generally occurs because of a mismatch between the manifest (bpm) and cluster file (egt) (i.e., the cluster file was generated via a different manifest). To remedy the issue, use the manifest and cluster files intended for use together.
Reference allele is not queried for locus: {identifier}
Warning
The true reference allele does not match any alleles in the manifest. This warning is common for MNVs and will be addressed in future versions of the software.
Skipping non-mapped locus: {identifier}
Warning
Locus has no chromosome position (usually 0). These loci may be used for quality purposes or CNV calling only.
Skipping intensity only locus: {identifier}
Warning
Similar to non-mapped loci, intensity-only probes have applications other than creating variants for SNV VCFs, such as CNV calling.
Skipping indel: {identifier}
Warning
Indel context (deletion/insertion) could not be determined.
Failed to process entry for record: {identifier}
Warning
Unable to determine reference allele for indel.
Incomplete match of source sequence to genome for indel: {identifier}
Warning
Indel not properly mapped to the reference genome.
Failed to combine genotypes due to ambiguity - exm1068284 (InfiniumII): TT, ilmnseq_rs1131690890_mnv (InfiniumII): AA, rs1131690890_mnv (InfiniumII): AA
Warning
Detailed information about a NoCall ("./.") in the VCF that results from combining multiple probes that assay the same variant with conflicting results. The example here is two probes with homozygous REF genotypes (AA) and one probe with a homozygous ALT genotype (TT).
Cluster file ({GTC.egt}) is not the same as CN Model Cluster file ({CN_Model.egt}).
Warning
The cluster file used to generate the GTCs for copy number calling is not the same as the one used to generate the GTCs for the copy number training that created the input CN model. Though the CNV model is robust to minor cluster file updates, CNV training should be considered when there are significant updates in the cluster file. To remove the warning, do one of the following: re-run copy number training with new GTCs generated via the new cluster file during genotyping, use a different CN model with the expected cluster file, or use GTCs for copy number calling that were generated with the same cluster file used to generate the input CN model.
{numPassingSamples} sample(s) passed QC.
Requires at least {minPassingSamples} samples to proceed.
Error
CNV calling is batch dependent and requires a certain number of high-quality samples to make accurate calls. To resolve the error, add more high-quality samples to the analysis batch.
Invalid manifest file path {manifestPath}
Error
The application could not find the provided manifest file, possibly because of user error in specifying the path.
Failed to load cluster file: {e.Message}
Error
Corrupted file or unsupported version.
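When reviewing long terminal logs, it can help to tally the notifications listed above by severity. The following is a minimal sketch that matches the fixed text fragments of those messages; it assumes each message appears on a single line of output (possibly with a prefix such as a timestamp), and the function name and example lines are hypothetical.

```python
# Fixed fragments of the messages listed above, paired with their severity.
MESSAGE_SEVERITY = [
    ("Failed to normalize and gencall sample:", "Warning"),
    ("Reference allele is not queried for locus:", "Warning"),
    ("Skipping non-mapped locus:", "Warning"),
    ("Skipping intensity only locus:", "Warning"),
    ("Skipping indel:", "Warning"),
    ("Failed to process entry for record:", "Warning"),
    ("Incomplete match of source sequence to genome for indel:", "Warning"),
    ("Failed to combine genotypes due to ambiguity", "Warning"),
    ("is not the same as CN Model Cluster file", "Warning"),
    ("Requires at least", "Error"),
    ("Invalid manifest file path", "Error"),
    ("Failed to load cluster file:", "Error"),
]

def classify_message(line):
    """Return 'Warning', 'Error', or None if the line is not recognized."""
    for fragment, severity in MESSAGE_SEVERITY:
        if fragment in line:
            return severity
    return None

def count_by_severity(lines):
    """Count recognized messages per severity across an iterable of lines."""
    counts = {"Warning": 0, "Error": 0}
    for line in lines:
        severity = classify_message(line)
        if severity is not None:
            counts[severity] += 1
    return counts

# Hypothetical log lines for illustration.
example_counts = count_by_severity([
    "Skipping indel: example_locus_1",
    "Skipping non-mapped locus: example_locus_2",
    "Invalid manifest file path /data/example.bpm",
    "Processing complete",
])
```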
The star allele JSON file is produced per sample. It contains the fields present in the star allele CSV file as well as additional metadata and annotations.
Fields included in the star allele JSON header are described below.
softwareVersion
DRAGEN Array software version, e.g. dragena 1.0.0.
genomeBuild
Genome build, e.g. hg38.
starAlleleDatabaseSources
Public databases with versions used as the sources of the star allele definitions and population frequencies.
phenotypeDatabaseSources
Public databases with versions used as the sources of the star allele phenotypes.
mappingFile
The PGx database file used for the star allele calling.
pgxGuideline
The PGx guidelines used for metabolizer status/phenotype annotations, e.g. CPIC or DPWG.
sampleId
Sentrix barcode and position of the sample.
locusAnnotations
The star allele call information.
Fields included in the star allele call (locusAnnotations) information are described below.
gene
The gene symbol.
callType
‘Star Allele’ or ‘Variant’ PGx calling type.
genotype
Most likely star allele or variant solution. If diploid, variant solutions have the format of Allele1/Allele2.
activityScore
Activity score annotation for the gene's determined genotype, based on the public PGx guidelines (CPIC or DPWG).
phenotypeDatabaseAnnotation
Metabolizer status and function annotations of the determined genotype of the gene based on lookup into public PGx guidelines CPIC or DPWG per user choice.
qualityScore
Quality score of the solution including the population frequency of PGx alleles. The score ranges from 0 to 1.
rawScore
Raw quality score of the solution without including the population frequency of PGx alleles. The score ranges from 0 to 1.
supportingVariants
All variants present in the array that support the star allele solution. The field provides an array (list) of supporting Variants.
Each supporting variant is listed with essential information extracted from the SNV VCF to assist with troubleshooting, including Chromosome (chrom), Location (pos), Reference allele (ref), Alternative allele (alt), Genotype (gt), GenCall score (gs), B-allele frequency (baf), the variant ID (id), and the associated star allele IDs (alleleIds).
candidateSolutions
The set of alternative star allele calling solutions. This is only relevant for genes of the ‘Star Allele’ call type.
missingVariantSites
All core variants that are not available (e.g. not on the array, or no calls in the SNV VCF) for star allele calling for this gene. For star alleles, the field provides an array (list) of variant "id" and impacted "alleleIds" pairs.
allelesTested
Alleles that are covered by the star allele caller. The capability to call star alleles is also dependent on array content coverage and data quality. This field is defined by the array's content and will be the same across all samples.
Fields included in the candidateSolution section, only available for star allele call type, are described below.
rank
Rank of a single star allele solution for a gene. The top solution based on quality score is ranked as 1 with the alternative solutions ranked lower.
genotype
Star allele or variant solution. If diploid, variant solutions have the format of Allele1/Allele2.
activityScore
Activity score annotation for the gene's determined genotype, based on the public PGx guidelines (CPIC or DPWG).
phenotype
Metabolizer status and function annotations of the determined genotype of the gene based on lookup into public PGx guidelines CPIC or DPWG per user choice.
qualityScore
Quality score of the solution including the population frequency of PGx alleles. The score ranges from 0 to 1.
rawScore
Raw quality score of the solution without including the population frequency of PGx alleles. The score ranges from 0 to 1.
alleles
The composite alleles of the candidate genotype solution.
solutionLong
Long format solution for star alleles. The field has the following format: Structural Variant Type: Underlying Star allele.
An example of a long solution is: Complete: CYP2D6*4, Complete: CYP2D6*10, CYP2D6*68: CYP2D6*4, where there are two complete alleles that have CYP2D6*4 and CYP2D6*10 haplotypes and one CYP2D6*68 structural variant that has a CYP2D6*4 haplotype configuration.
supportingVariants
All variants present in the array that support the star allele solution. The field provides an array (list) of supporting Variants.
Each supporting variant is listed with essential information extracted from the SNV VCF to assist with troubleshooting, including Chromosome (chrom), Location (pos), Reference allele (ref), Alternative allele (alt), Genotype (gt), GenCall score (gs), B-allele frequency (baf), and the variant ID (id).
missingVariantSites
All variants not present in the array or not called in the SNV VCF file for the star allele solution. The field provides an array (list) of missing variants.
collapsedAlleles
Star alleles that cannot be distinguished from the solution star allele given the input array’s content. The field has the following format: Long Solution Star-Allele: (List of collapsed star alleles).
The most frequent star allele based on the population frequency of PGx alleles will be the star allele in the solution.
copyNumberRegions
Gene regions for the copy numbers listed in copyNumberSolution.
copyNumberSolution
Estimated copy number for each gene region listed in copyNumberRegions.
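To show how the fields described above might be consumed, the following sketch summarizes the top call and the alternative solutions per gene. The nesting is an assumption based on the field descriptions (locusAnnotations as a list of objects, each with an optional candidateSolutions list), and every value in the sample record is made up for illustration and is not taken from real output.

```python
# Hypothetical record shaped after the field descriptions above.
# All values are illustrative only.
EXAMPLE_REPORT = {
    "genomeBuild": "hg38",
    "pgxGuideline": "CPIC",
    "locusAnnotations": [
        {
            "gene": "CYP2D6",
            "callType": "Star Allele",
            "genotype": "*1/*4",
            "qualityScore": 0.9,
            "candidateSolutions": [
                {"rank": 2, "genotype": "*1/*10", "qualityScore": 0.1},
                {"rank": 1, "genotype": "*1/*4", "qualityScore": 0.9},
            ],
        }
    ],
}

def summarize_calls(report):
    """Return one summary dict per gene: top call plus ranked alternatives."""
    summary = []
    for locus in report.get("locusAnnotations", []):
        # Sort by rank so that rank 1 (the top solution) comes first.
        candidates = sorted(locus.get("candidateSolutions", []),
                            key=lambda solution: solution["rank"])
        summary.append({
            "gene": locus["gene"],
            "genotype": locus["genotype"],
            "qualityScore": locus["qualityScore"],
            "alternatives": [c["genotype"] for c in candidates if c["rank"] > 1],
        })
    return summary

summary = summarize_calls(EXAMPLE_REPORT)
```

A real file would first be loaded with the standard json module (for example, json.load on an open file handle) before being passed to a function like this one.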
Example of JSON file content:
The TBI (TABIX) index file is associated with the bgzipped VCF files. It allows for data line lookup in VCF files for quick data retrieval. The format is a tab-delimited genome index file developed by Samtools as part of the HTSlib utilities. For more information, visit the Samtools website.
The software produces a control probe output file ({BeadChipBarcode}_{Position}_ctrl.tsv.gz) per sample that includes the raw methylated and unmethylated values for each control probe.
Each control probe has an address, type, color channel, name, and probe ID. It also provides the raw signal for methylated green (MG), methylated red (MR), unmethylated green (UG) and unmethylated red (UR).
The file can help identify which probes are available on a given BeadChip.
The software produces a CG output file ({BeadChipBarcode}_{Position}_cgs.tsv.gz) per sample that includes beta values, m-values and detection p-values for each CG site.
Beta values measure methylation levels in a linear fashion for easy interpretation. Unmethylated probes are close to zero and methylated probes are close to 1.
M-values are log-transformed beta values, which provide a more representative measure of methylation.
Detection p-values measure the likelihood that the signal is background noise. It is recommended that probes with a p-value > 0.05 be excluded from analysis, as they are likely background noise.
See the High-throughput Infinium methylation array QC using DRAGEN Array Methylation QC software tech note for further detail on the calculation of these metrics.
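The relationship between beta values, M-values, and the detection p-value filter can be sketched as follows. The sketch uses the commonly cited logit definition M = log2(beta / (1 - beta)); the exact transform used by the software is described in the tech note and may differ. The clipping constant, function names, and probe IDs are hypothetical.

```python
import math

def m_value(beta, eps=1e-6):
    """Convert a beta value to an M-value with the common logit transform.

    beta is clipped to [eps, 1 - eps] to avoid infinite values at 0 and 1;
    the clipping constant is an assumption of this sketch.
    """
    clipped = min(max(beta, eps), 1 - eps)
    return math.log2(clipped / (1 - clipped))

def filter_detected(cg_rows, p_threshold=0.05):
    """Keep (probe_id, beta, p_value) rows whose p-value is at or below the threshold."""
    return [row for row in cg_rows if row[2] <= p_threshold]

# Hypothetical CG rows: (probe ID, beta value, detection p-value).
rows = [
    ("cg_example_1", 0.80, 0.001),
    ("cg_example_2", 0.50, 0.200),  # likely background noise, excluded
    ("cg_example_3", 0.05, 0.010),
]
kept = filter_detected(rows)
m_values = {probe: m_value(beta) for probe, beta, _ in kept}
```

A beta value of 0.5 maps to an M-value of 0, and values near 0 or 1 map to large negative or positive M-values, which spreads out the extremes that beta values compress.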
The software produces methylation sample QC summary in .xlsx and .tsv file formats (sample_qc_summary.xlsx and sample_qc_summary.tsv) per analysis batch, which provides per sample QC data for all samples in the batch.
The QC summary provides details on 21 control metrics (see tables below), which are computed in the same way as in the BeadArray Controls Reporter software from Illumina. In addition, it provides average red and green raw and normalized signals, time of scanning, proportion of probes passing, overall sample pass/fail status, and the failure codes for control metrics that did not pass. A sample passes only if all 21 control metrics pass. The QC summary .xlsx file further highlights failing parameters for easy viewing.
The QC summary files contain the following fields:
Sentrix_ID
12-digit BeadChip Barcode associated with the sample.
Sentrix_Position
Row and column on the BeadChip, e.g. R01C01.
Sample_ID
Optional field that can be indicated using IDAT Sample Sheet
User Defined Meta Data
Optional field(s) that can be indicated using IDAT Sample Sheet. Any number of fields indicated will appear in this output file.
restoration
The default threshold is 0.
If using the FFPE DNA Restore Kit, the restoration control identifies success of the FFPE restoration chemistry. Change the threshold from 0 to 1 if the FFPE DNA Restore Kit was used.
The green channel intensity is higher than Background. Therefore, the metric provided is the Green Channel Intensity/Background.
staining_green
staining_red
Staining controls are used to examine the efficiency of the staining step in both the red and green channels. These controls are independent of the hybridization and extension step.
The green channel shows a higher signal for biotin staining when compared to biotin background, whereas the red channel shows higher signal for DNP staining when compared to DNP background.
The metric provided for green is (Biotin High value)/(Biotin Bkg value), and the metric provided for red is (DNP High value)/(DNP Bkg value).
The default threshold is 5. This threshold can be increased on some scanners.
extension_green
extension_red
Extension controls test the extension efficiency of A, T, C, and G nucleotides from a hairpin probe, and are therefore sample independent.
In the green channel, the lowest intensity for C or G is always greater than the highest intensity for A or T.
The metric provided is the (lowest of the C or G intensity)/ (highest of A or T extension) for a single sample.
The default threshold is 5. This threshold can be increased on some scanners.
hybridization_high_medium
hybridization_medium_low
Hybridization controls test the overall performance of the Infinium Assay using synthetic targets instead of amplified DNA. These synthetic targets complement the sequence on the array, allowing the probe to extend on the synthetic target as a template. Synthetic targets are present in the Hybridization Buffer at 3 levels, monitoring the response from high-concentration (5 pM), medium concentration (1 pM), and low concentration (0.2 pM) targets. All bead type IDs result in signals with various intensities, corresponding to the concentrations of the initial synthetic targets.
The value for high concentration is always higher than medium and the value for medium concentration is always higher than low.
The metric provided is the value of high/medium and the value of medium/low.
The default thresholds are 1. Do not change the default threshold.
target_removal1
target_removal2
Target removal controls test the efficiency of the stripping step after the extension reaction. In contrast to allele-specific extension, the control oligos are extended using the probe sequence as a template. This process generates labeled targets. The probe sequences are designed such that extension from the probe does not occur. All target removal controls result in low signal compared to the hybridization controls, indicating that the targets were removed efficiently after extension. Target removal controls are present in the Hybridization Buffer.
The Background for the same sample is close to or larger than either control.
The metric provided is Background/Control Intensity.
The default threshold is 1. Do not change the default threshold; however, the offset correction can be changed.
bisulfite_conversion1_green
bisulfite_conversion1_background_green
bisulfite_conversion1_red
bisulfite_conversion1_background_red
These controls assess the efficiency of bisulfite conversion of the genomic DNA. The Infinium Methylation probes query a [C/T] polymorphism created by bisulfite conversion of non-CpG cytosines in the genome.
These controls use Infinium I probe design and allele-specific single base extension to monitor efficiency of bisulfite conversion. If the bisulfite conversion reaction was successful, the "C" (Converted) probes match the converted sequence and get extended. If the sample has unconverted DNA, the "U" (Unconverted) probes get extended. There are no underlying C bases in the primer landing sites, except for the query site itself.
The calculation is done in both the green and red channels separately to provide 2 unique sets of values:
Green Channel
Lowest value of C1 or C2 / Highest value of U1 or U2. The default threshold is 1. This value can be increased for some scanners.
Background/(Highest value of U1 or U2). The default threshold is 1. Do not change the default threshold; however, the offset correction can be changed.
Red Channel
Lowest value of C3, 4, or 5 / Highest value of U3, 4, or 5. The default threshold is 1. This value can be increased for some scanners.
Background /(Highest value of U4, U5, or U6). The default threshold is 1. Do not change the default threshold; however, the offset correction can be changed.
bisulfite_conversion2
bisulfite_conversion2_background
These controls assess the efficiency of bisulfite conversion of the genomic DNA. The Infinium Methylation probes query a [C/T] polymorphism created by bisulfite conversion of non-CpG cytosines in the genome.
These controls use Infinium II probe design and single base extension to monitor efficiency of bisulfite conversion. If the bisulfite conversion reaction was successful, the "A" base gets incorporated and the probe has intensity in the red channel. If the sample has unconverted DNA, the "G" base gets incorporated across the unconverted cytosine, and the probe has elevated signal in the green channel.
The calculation is done using both channels for 1 set of numbers returned.
The following metrics are provided:
(Lowest of red C 1, 2, 3, or 4) / (Highest of green C 1, 2, 3, or 4). The default threshold is 1. This value can be increased for some scanners.
Background/(Highest C1, C2, C3, or C4 green). The default threshold is 1. Do not change the default threshold; however, the offset correction can be changed.
specificity1_green
specificity1_red
Specificity controls are designed to monitor potential nonspecific primer extension for Infinium I and Infinium II assay probes. Specificity controls are designed against nonpolymorphic T sites.
These controls are designed to monitor allele-specific extension for Infinium I probes. The methylation status of a particular cytosine is carried out following bisulfite treatment of DNA by using query probes for unmethylated and methylated state of each CpG locus. In assay oligo design, the A/T match corresponds to the unmethylated status of the interrogated C, and G/C match corresponds to the methylated status of C. G/T mismatch controls check for nonspecific detection of methylation signal over unmethylated background. PM controls correspond to A/T perfect match and give high signal. MM controls correspond to G/T mismatch and give low signal.
The metrics provided are the ratio of the lowest PM/highest MM in each channel.
The default threshold is 1. Do not change the default threshold.
specificity2
specificity2_background
Specificity controls are designed to monitor potential nonspecific primer extension for Infinium I and Infinium II assay probes. Specificity controls are designed against nonpolymorphic T sites.
These controls are designed to monitor extension specificity for Infinium II probes and check for potential nonspecific detection of methylation signal over unmethylated background. Specificity II probes incorporate the "A" base across the nonpolymorphic T and have intensity in the Red channel. If there was nonspecific incorporation of the "G" base, the probe has elevated signal in the Green channel.
The following metrics are provided:
(Lowest intensity of S1, S2, or S3 red) / (Highest intensity of S1, S2, or S3 green). The default threshold is 1. Do not change the default threshold.
Background/(Highest intensity S1, S2, S3, or S4 green). The default threshold is 1. Do not change the default threshold; however, the offset correction can be changed.
nonpolymorphic_green
nonpolymorphic_red
Nonpolymorphic controls test the overall performance of the assay, from amplification to detection, by querying a particular base in a nonpolymorphic region of the genome. They let you compare assay performance across different samples. One nonpolymorphic control has been designed for each of the 4 nucleotides (A, T, C, and G).
In the green channel, the lowest intensity of C or G is always greater than the highest intensity of A or T.
The metric provided is the (lowest intensity for C or G) /(highest intensity for A or T) for a single sample.
The default threshold is 5. This value can be increased for some scanners.
avg_green_raw
avg_red_raw
Average green and red raw signal for the given sample.
avg_green_norm
avg_red_norm
Average green and red signal after dye bias correction and noob normalization for the given sample.
ScanTime
The date (MM/DD/YY) and time (HH:MM) that the sample was scanned by the iScan system.
NProbes
Number of probes on the BeadChip, including SNP and CG probes
NPassDetection
Number of probes on the BeadChip that passed detection p-value at the threshold defined.
prop_probes_passing
The proportion of probes passing defined as the number of probes passing detection p-value divided by the total number of probes on the BeadChip.
passQC
1 = sample passed all QC metrics for the thresholds defined
0 = sample did not pass all QC metrics for the thresholds defined
failCodes
The list of parameters that failed QC for the thresholds defined.
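The relationship between the control metrics, prop_probes_passing, passQC, and failCodes can be sketched as below. The sketch assumes that a metric passes when its value is strictly greater than its threshold and that all metrics are numeric; the function name, the shape of the failCodes value, and the example numbers are hypothetical, and only three of the 21 metrics are shown.

```python
def evaluate_sample(metrics, thresholds, n_pass_detection, n_probes):
    """Derive prop_probes_passing, passQC, and failCodes for one sample.

    metrics and thresholds map control metric names to numbers. A metric
    is treated as passing when value > threshold (assumed comparison).
    """
    fail_codes = [name for name, value in metrics.items()
                  if not value > thresholds[name]]
    return {
        "prop_probes_passing": n_pass_detection / n_probes,
        # The sample passes only if every control metric passes.
        "passQC": 0 if fail_codes else 1,
        "failCodes": fail_codes,
    }

# Hypothetical values for three of the control metrics.
thresholds = {"staining_green": 5, "hybridization_high_medium": 1, "target_removal1": 1}
metrics = {"staining_green": 12.0, "hybridization_high_medium": 1.8, "target_removal1": 0.7}
result = evaluate_sample(metrics, thresholds, n_pass_detection=9000, n_probes=10000)
```

Here the sample fails QC because target_removal1 does not exceed its threshold, even though 90% of probes pass detection.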
The control metrics in the QC summary files are calculated as follows. The default value of 3,000 for the background correction offset (x) can be modified and applies to all background calculations indicated with (bkg + x). Note that the table uses the default thresholds for EPIC arrays as an example; default thresholds vary by methylation array. See section Threshold Adjustment for additional details.
Control
Calculation
Additional Information
Restoration Green > bkg
If using the FFPE Restore kit, change the default threshold from 0 to 1.
bkg = Extension Green highest A or T intensity
Staining Green
Biotin High > Biotin Bkg
(High/Biotin Bkg) > 5
Staining Red
DNP High > DNP Bkg
(High/DNP Bkg) > 5
Extension Green Lowest CG/Highest AT
(C or G/A or T) > 5
Green channel—Lowest C or G intensity is used; highest A or T intensity is used.
Extension Red
Lowest AT/Highest CG
(A or T/C or G) > 5
Red channel—Lowest A or T intensity is used; highest C or G intensity is used.
Hybridization Green High > Medium > Low
(High/Med) > 1 (Med/Low) > 1
Target Removal Green ctrl 1 ≤ bkg
((bkg + x)/ctrl) > 1
bkg = Extension Green highest A or T intensity
Target Removal Green ctrl 2 ≤ bkg
((bkg + x)/ctrl) > 1
bkg = Extension Green highest A or T intensity
Bisulfite Conversion I Green
C1, 2 > U1, 2
(C/U) > 1
Lowest C intensity is used. Highest U intensity is used.
Bisulfite Conversion I Green
U ≤ bkg
For MSA arrays, the default is 0.5
Highest U intensity is used.
Green channel—bkg = Extension Green highest AT
Bisulfite Conversion I Red C3, 4, 5 > U3, 4, 5
(C/U) > 1
Lowest C intensity is used. Highest U intensity is used.
Bisulfite Conversion I Red U ≤ bkg
For MSA arrays, the default is 0.5
Highest U intensity is used.
Red Channel—bkg = Extension Red highest CG
Bisulfite Conversion II C Red > C Green
For MSA arrays, the default is 0.5
Lowest C Red intensity is used. Highest C Green intensity is used.
Bisulfite Conversion II C green ≤ bkg
For MSA arrays, the default is 0.5
Highest C Green intensity is used.
Green channel—bkg = Extension Green highest AT
Specificity I Green PM > MM
(PM/MM) > 1
Lowest PM intensity is used. Highest MM intensity is used
Specificity I Red PM > MM
(PM/MM) > 1
Lowest PM intensity is used. Highest MM intensity is used
Specificity II
S Red > S Green
(S Red/ S Green) > 1
Lowest S Red intensity is used. Highest S Green intensity is used.
Specificity II
S Green ≤ bkg
((bkg + x)/ S green) > 1
bkg = Extension Green highest A or T intensity
Highest S Green intensity is used.
Nonpolymorphic Green Lowest CG/ Highest AT
Lowest C or G intensity is used; highest A or T intensity is used
For MSA arrays, the default threshold is 2.5
Nonpolymorphic Red Lowest AT/ Highest CG
Lowest A or T intensity is used; highest C or G intensity is used
For MSA arrays, the default threshold is 3
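A few of the calculations in the table above can be sketched as follows, using the default background correction offset of 3,000 and the EPIC default thresholds. The function names and intensity values are hypothetical, and the sketch is not the software's implementation.

```python
BACKGROUND_OFFSET = 3000  # default value of x in (bkg + x)

def extension_green(a, t, c, g):
    """Lowest C or G intensity over highest A or T intensity (green channel)."""
    return min(c, g) / max(a, t)

def staining_ratio(high, background):
    """Staining control: High value over Bkg value (Biotin or DNP)."""
    return high / background

def background_ratio(bkg, control, offset=BACKGROUND_OFFSET):
    """Background-type metrics of the form (bkg + x) / ctrl."""
    return (bkg + offset) / control

# Hypothetical green-channel extension control intensities.
ext = {"A": 500, "T": 700, "C": 20000, "G": 25000}
ext_green = extension_green(ext["A"], ext["T"], ext["C"], ext["G"])

# bkg = Extension Green highest A or T intensity.
bkg = max(ext["A"], ext["T"])
target_removal_1 = background_ratio(bkg, control=1200)

checks = {
    "extension_green": ext_green > 5,          # (C or G / A or T) > 5
    "target_removal1": target_removal_1 > 1,   # ((bkg + x) / ctrl) > 1
}
```

The offset keeps a low background from failing a sample when the control intensity is only modestly above it, which is why changing x shifts every (bkg + x) metric at once.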
The software produces a methylation sample QC summary plots file (sample_qc_summary.pdf) per analysis batch, which contains two QC summary plots for quick visual review.
The file contains the following control plots:
Proportion of Probes Passing Threshold
Histogram of the proportion of probes passing the p-value detection threshold. Samples passing QC are shown in one color, and samples failing QC are shown in another color.
Principal Component Analysis (PCA)
Uses beta values for all analytical probes to compare samples. Principal component analysis (PCA) is applied to the beta values to reduce the dimensionality of the data to two “principal components” that reflect the most variation across samples. If more than 100 samples are used in the analysis, a random subset of 10,000 probes is used for PCA to reduce computational burden. The PCA control plot assigns a unique color to each sample group defined by the IDAT Sample Sheet. If no groups were assigned, all samples appear in the same color. Sample groups may cluster together and can be used to explain some of the variation. Coordinates used to plot each sample in the PCA control plot are provided in the pcs.tsv.gz output file (see below).
The software produces a methylation principal component summary file (pcs.tsv.gz) per analysis batch which provides principal component data for each sample within the batch. This can be used to identify the specific samples associated with points on the PCA control plot within the Methylation Sample QC Control Plots output file.
The files contain the following fields:
blank
BeadChip Barcode and Position, e.g. 123456789101_R01C01.
principal component 1
The variable of the first axis for the Principal Component Analysis
principal component 2
The variable of the second axis for the Principal Component Analysis
Sample_Group
Sample group defined by the user in the IDAT Sample Sheet. If no sample group was defined, all samples will show NA.
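To match points on the PCA control plot to samples, the pcs.tsv.gz file can be read with Python's standard library. The sketch below assumes a tab-delimited file with one header row and four columns in the order listed above; the header labels written in the demo file (PC1, PC2) and the coordinate values are hypothetical.

```python
import csv
import gzip
import os
import tempfile

def read_pcs(path):
    """Map each sample (BeadChip Barcode_Position) to (PC1, PC2, Sample_Group)."""
    samples = {}
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header row (first header cell is blank)
        for sample, pc1, pc2, group in reader:
            samples[sample] = (float(pc1), float(pc2), group)
    return samples

# Write a small demo file with hypothetical values, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "pcs.tsv.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    handle.write("\tPC1\tPC2\tSample_Group\n")
    handle.write("123456789101_R01C01\t-4.2\t1.5\tNA\n")
    handle.write("123456789101_R02C01\t3.9\t-0.7\tNA\n")

pcs = read_pcs(demo_path)
```

Reading columns by position rather than by name avoids depending on the exact header labels, which are not specified here.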
The software produces two methylation manifest files:
Manifest in Sesame format (probes.csv)
Additional information for control probes (controls.csv)
The probes.csv file has the following columns:
Probe_ID
This is a unique identifier for each probe. It corresponds to the IlmnID column in the standard Illumina manifest format or ctl_[AddressA_ID] for control probes.
U
This corresponds to the AddressA_ID column in the standard Illumina manifest format.
M
This corresponds to the AddressB_ID column in the standard Illumina manifest format.
col
This is the color channel for Infinium I probes (R/G). For Infinium II probes, this column will be NA.
The controls.csv file has the following columns:
Address
The address of the probe
Type
The control probe type
Color_Channel
A color used to denote certain control probes in legacy software
Name
A human readable identifier for certain control probes
Probe_ID
This is a unique identifier for each probe. It corresponds to the IlmnID column in the standard Illumina manifest format or ctl_[AddressA_ID] for control probes.
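The column correspondence for probes.csv can be sketched as a mapping from one row of the Assay section of a standard Illumina manifest. The sketch assumes that the manifest's Color_Channel values are "Red" and "Grn", that they map to R and G, and that empty values are written as NA; these details and the probe IDs are assumptions for illustration, not the software's conversion code.

```python
REQUIRED_COLUMNS = ("IlmnID", "AddressA_ID", "AddressB_ID", "Color_Channel")
CHANNEL_CODES = {"Red": "R", "Grn": "G"}  # assumed mapping to R/G

def to_probe_row(assay_row):
    """Map one Illumina manifest Assay row (dict) to probes.csv columns."""
    missing = [c for c in REQUIRED_COLUMNS if c not in assay_row]
    if missing:
        raise ValueError("Missing required columns: " + ", ".join(missing))
    return {
        "Probe_ID": assay_row["IlmnID"],
        "U": assay_row["AddressA_ID"],
        # Rows with no AddressB_ID are written as NA (assumed).
        "M": assay_row["AddressB_ID"] or "NA",
        # R or G when a color channel is given, NA otherwise.
        "col": CHANNEL_CODES.get(assay_row["Color_Channel"], "NA"),
    }

# Hypothetical manifest rows for illustration.
type_one = to_probe_row({"IlmnID": "cg_example_1", "AddressA_ID": "1001",
                         "AddressB_ID": "1002", "Color_Channel": "Red"})
type_two = to_probe_row({"IlmnID": "cg_example_2", "AddressA_ID": "2001",
                         "AddressB_ID": "", "Color_Channel": ""})
```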
The following scenarios result in a warning or error message:
Missing IDATs or manifest
Incorrect sample sheet formatting
Duplicate BeadChip Barcode and Position within the sample sheet
Missing control or assay probes
Missing required columns in the manifest
Unable to compute certain metrics
Examples of such notifications can include the following:
Log
Error
Type
Cause
write_samplesheet.log
No IDATs found
Error
No IDATs provided for analysis
format_samplesheet.log
No samples in sample sheet
Error
No samples in user’s sample sheet input
format_samplesheet.log
Sample sheet not correctly formatted
Error
Sample sheet is not in CSV format or header lines do not start with “<”
format_samplesheet.log
beadChipName and sampleSectionName columns are required for the sample sheet.
Error
Sample sheet does not contain required columns: beadChipName and sampleSectionName.
format_samplesheet.log
Warning: <Number> samples have duplicate Sample_ID
Warning
X lines in the sample sheet have duplicate <beadChipName>_<sampleSectionName>. Duplicates are dropped from analysis.
convert_manifest_ilmn_sesame.log
Missing control probes in manifest
Error
Missing “[Controls]” line in CSV manifest
convert_manifest_ilmn_sesame.log
Probe section not found
Error
Missing “[Assay]” line in CSV manifest
convert_manifest_ilmn_sesame.log
Missing required columns: IlmnID, AddressA_ID, AddressB_ID, Color_Channel
Error
Missing one of required columns in Assay section of manifest
convert_manifest_ilmn_sesame.log
Controls not formatted correctly. Must have 4 columns (Address,Type,Color_Channel,Name)
Error
Missing one of required columns in Control section of manifest
run_sesame_gs.log
Missing sample: <Sample_ID>
Error
Missing IDATs for a particular sample
run_sesame_gs.log
No scan time available
Warning
No scan time in IDAT
run_sesame_gs.log
Prep failed
Error
Dye bias correction or noob failure for sample
run_sesame_gs.log
Warning: missing control probe types <Missing probes>
Warning
Missing control probe types to compute a BACR metric. Metric will be set to NA.
run_sesame_gs.log
Warning: missing control probe names <Missing probe types>
Warning
Missing control probes to compute a BACR metric. Metric will be set to NA.
qc.log
No features, skipping PCA plot
Warning
No common betas found in all samples. This may occur if a sample has no signal intensity in the IDAT files.
DRAGEN Array Cloud utilizes the user-friendly graphical interface of BaseSpace Sequence Hub to simplify DRAGEN Array analysis setup and kickoff. Optional integration with the iScan System allows data to be streamed directly from the instrument to the cloud platform. Analysis data is stored on the Illumina Connected Platform providing secure storage for both microarray and sequencing data.
The following prerequisites are needed to get started with DRAGEN Array Cloud:
Illumina Connected Analytics subscription: An ICA Basic, Professional, or Enterprise subscription can be used; each includes access to BaseSpace Sequence Hub. Follow the Illumina Software Registration Guide to register the software.
Workgroup setup: Workgroups must be created before login. Using a workgroup allows all members of the workgroup to share access to resources, analyses, and data. Learn more about managing a Workgroup.
Designating a workgroup as ‘Collaborative’ allows projects to be shared with collaborators or Illumina Tech Support to assist with troubleshooting. To create a collaborative workgroup, select the Enable collaborators outside of this domain checkbox during workgroup creation.
Software consumables: iCredits can be purchased for storage on the cloud platform and analysis pipelines with a compute charge. Per sample analysis can be purchased for relevant pipelines as listed in section Applications. Follow the Illumina Software Registration Guide (found under Example 3: Configuring the Software Consumables) to register the software consumables.
[Optional] iScan integration: The iScan System is integrated with Illumina Connected Platform and can send IDATs for further analysis. The iScan System must be running iScan Control Software version 4.2.1 or later.
EULA acceptance: Accept all necessary End User License Agreements in BaseSpace Sequence Hub before scanning begins.
Internet connection: For uploading product files or IDATs, a network connection 1 GbE or faster is recommended.
Note: Accessioning BeadChips before scanning and starting analysis is no longer a required step and has been automated within the system.
Before beginning analysis, ensure workgroup context is being used so analysis can be viewed by all members of your workgroup. The name of your workgroup should appear in the top right corner.
Use the following steps to run the Microarray Analysis Setup on BaseSpace Sequence Hub:
Select the Runs tab
Select New Run
Select Microarray Analysis Setup
Enter the Analysis Name (Figure 1)
Use the Select Project link to choose the project for your output files. To select an existing project, click the radio button next to the desired project name. You can also create a project by clicking the New button in the project selection window.
Select the Type of Analysis. Further detail on each Type of Analysis is available in section Applications. Note: For PGx CNV calling, it is recommended that the analysis include 96 or more samples passing Log R Dev <= 0.2. For PGx star allele calling, it is recommended to QC the samples and review any samples with Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0 to assess the reliability of the analysis. These metrics are provided in the genotyping sample summary file (gt_sample_summary.csv).
(Optional) Create a custom configuration via the "Add Custom Configuration" option in Configuration Settings. Custom configurations must be assigned a name and product files can be uploaded or selected (Figure 2). Custom configuration options vary by Type of Analysis including:
DRAGEN Array - Genotyping provides flexibility for turning specific output files on or off and for adjusting the GenCall score cutoff. It is recommended to turn off VCF output for non-human species and Final Report output for large sample numbers.
DRAGEN Array - Methylation - QC provides options to adjust thresholds as detailed in section DRAGEN Array Methylation QC Threshold Adjustment.
Select your preferred option in the Configuration Settings drop-down menu. Configuration setup will vary based on the Type of Analysis selected. More details are available in section Applications.
Select Next
Select either Import Sample Sheet, Select BeadChips, or Import IDAT Files (Figure 3)
Import Sample Sheet presents a link to upload sample sheet. Users may download a template sample sheet by selecting the Download Template link.
Select BeadChips allows users to select BeadChips from the displayed list of available BeadChips. If selecting specific samples within the BeadChip is desired the Import Sample Sheet option should be used.
Import IDAT Files allows users to upload the IDAT files from a local folder to the cloud platform for use with the current and future analyses by users within the same workgroup.
Select Launch Analysis
On the Analyses tab, view the analysis status, e.g., initializing or complete.
After the analysis is complete, select the analysis and select the Files tab.
From the Files tab, select the Output folder.
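The review criteria given in the Type of Analysis note above (Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0) can be applied to gt_sample_summary.csv after the output folder is downloaded. The sketch below is illustrative only: the column names in COLUMNS are hypothetical placeholders, since the actual header of gt_sample_summary.csv is not shown here, and should be replaced with the names found in your file.

```python
import csv
import io

# Review criteria quoted from the note on PGx star allele calling.
LOG_R_DEV_MAX = 0.2
CALL_RATE_MIN = 0.99
TGA_CONTROL_MIN = 1.0

# Hypothetical column names; check the header of your gt_sample_summary.csv.
COLUMNS = {
    "sample": "Sample_ID",
    "log_r_dev": "Log_R_Dev",
    "call_rate": "Call_Rate",
    "tga": "TGA_Ctrl",
}


def samples_to_review(csv_text, columns=COLUMNS):
    """Map each sample that meets a review criterion to the reasons why."""
    flagged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        reasons = []
        if float(row[columns["log_r_dev"]]) > LOG_R_DEV_MAX:
            reasons.append("Log R Dev > 0.2")
        if float(row[columns["call_rate"]]) < CALL_RATE_MIN:
            reasons.append("call rate < 0.99")
        if float(row[columns["tga"]]) < TGA_CONTROL_MIN:
            reasons.append("TGA Control probe < 1.0")
        if reasons:
            flagged[row[columns["sample"]]] = reasons
    return flagged
```

A flagged sample is not necessarily a failed sample; the list is a starting point for assessing the reliability of its star allele calls.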
The data management tab allows you to view and manage all your scanned IDAT files in the cloud. Before viewing, ensure workgroup context is being used so all data available to your workgroup can be seen. The name of your workgroup should appear in the top right corner. For more information, see BaseSpace Data Management.
When using DRAGEN Array – Methylation – QC cloud analysis type, additional customization options will appear after product files are selected within Configuration Settings. Adjustments to these thresholds will be saved as part of the Configuration Setting. Thresholds can be adjusted based on study objectives. Adjusting thresholds will impact the pass or fail status of samples in the output files.
Illumina recommends thresholds for MethylationEPIC v1 & v2 and Methylation Screening Array (MSA). Users may use these thresholds as a starting point when defining thresholds for their custom or semi-custom BeadChip or other Infinium Methylation arrays. Further tuning may be required based on the BeadChip used, laboratory conditions, iScan settings, bisulfite conversion methods, FFPE sample type, etc. A dataset that the user deems acceptable based on the proportion of probes passing can be used for these additional threshold adjustments.
To customize thresholds, use the toggle to allow additional thresholds to be displayed and adjust as desired by typing in a numeric value or using the arrows to adjust up or down. Further detail of these thresholds including calculation method can be found in the Methylation Sample QC Summary Files section.
The recommended thresholds are pre-set within the software for MethylationEPIC and Methylation Screening Array with the following values:
0
0
StainingGreen
5
5
StainingRed
5
5
ExtensionGreen
5
5
ExtensionRed
5
5
HybridizationHighMedium
1
1
HybridizationMediumLow
1
1
TargetRemoval1
1
1
TargetRemoval2
1
1
BisulfiteConversion1Green
1
1
BisulfiteConversion1BackgroundGreen
0.5
1
BisulfiteConversion1Red
1
1
BisulfiteConversion1BackgroundRed
0.5
1
BisulfiteConversion2
0.5
1
BisulfiteConversion2Background
0.5
1
Specificity1Green
1
1
Specificity1Red
1
1
Specificity2
1
1
Specificity2Background
1
1
NonpolymorphicGreen
2.5
5
NonpolymorphicRed
3
5
BgCorrectionOffset
3000
3000
PvalThreshold
0.05
0.05
The first 21 rows in the tables correspond to the 21 control metrics used in the methylation sample QC. See section Methylation Sample QC Summary Files for details.
DRAGEN Array Methylation QC software provides automated methylation sample QC using assay control probes on the Infinium Methylation Arrays. Unlike the manual visual QC in GenomeStudio, DRAGEN Array utilizes 21 numerical metrics defined based on the control probes and uses standard thresholds to determine the pass/fail status of a sample. Unlike GenomeStudio, DRAGEN Array does not use probe detection rate (the proportion of probes passing at a given p-value threshold) to determine sample pass/fail status. For more information, see the High-throughput Infinium methylation array QC using DRAGEN Array Methylation QC software tech note.
DRAGEN Array Methylation QC performs background normalization, dye bias correction, and detection p-value calculation differently in comparison to the GenomeStudio Methylation module, leading to differences in probe detection p-values and detection rates. For the GenomeStudio Methylation Module, non-cancer samples at standard DNA input typically have detection rate > 96%. The detection rates from DRAGEN Array Methylation QC are typically lower compared to GenomeStudio, because the detection p-value from DRAGEN Array is more stringent than that from the GenomeStudio Methylation Module. The table below shows example detection rates from the DRAGEN Array Methylation QC software from MSA (Methylation Screening Array) datasets.
A
86%
93%
220
B
61%
83%
951
C
63%
85%
34
D
77%
85%
22
Note that only samples passing QC are included and all samples are at or above 50 ng DNA input. The detection p-value threshold is 0.05.
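The detection rate reported above is the proportion of probes passing at a given p-value threshold. As a minimal sketch of that arithmetic (not the DRAGEN Array implementation, whose p-value calculation is not reproduced here), the function below takes per-probe detection p-values and returns the fraction that pass. Treating "passing" as strictly below the threshold is an assumption.

```python
def detection_rate(p_values, threshold=0.05):
    """Return the proportion of probes whose detection p-value passes.

    Assumption: a probe passes when its p-value is strictly below threshold.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("no probes supplied")
    passing = sum(1 for p in p_values if p < threshold)
    return passing / len(p_values)
```

For example, detection_rate([0.01, 0.2, 0.04, 0.5]) evaluates two of four probes as passing. As stated above, DRAGEN Array does not use this value to decide sample pass/fail status.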
The firewall protects the iScan control computer by filtering incoming traffic to remove potential threats. The firewall is enabled by default to block all inbound connections. Keep the firewall enabled and allow outbound connections.
For the instrument to connect to BaseSpace Sequence Hub, you will need to add regional platform endpoints and instrument specific endpoints to the allow list on your firewall. Regional endpoints and further detail can be found in Security and Networking for Illumina instrument control computers.
The following table shows the applicable endpoints for the iScan.
ica.illumina.com
Required
Send IDAT files to ICA
o.ss2.us
Required
Certificate authorization
ocsp.digicert.com
Required
Certificate authorization
ocsp.pki.goog/gsr2
Required
Certificate authorization
ocsp.rootca1.amazontrust.com
Required
Certificate authorization
ocsp.rootg2.amazontrust.com
Required
Certificate authorization
ocsp.sca1b.amazontrust.com
Required
Certificate authorization
fonts.gstatic.com
Required
Display fonts
fonts.googleapis.com
Recommended
Display fonts
cdn.walkme.com
Recommended
Telemetry
cdn3.userzoom.com
Recommended
Telemetry
dpm.demdex.net
Recommended
Telemetry
illuminainc.demdex.net
Recommended
Telemetry
illuminainc.tt.omtrdc.net
Recommended
Telemetry
smetrics.illumina.com
Recommended
Telemetry
google.com
Recommended
Telemetry
google-analytics.com
Recommended
Telemetry
stats.g.doubleclick.net
Recommended
Telemetry
illumina.com
Optional
Access Illumina support material
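Before scanning, it can be useful to confirm that outbound connections to the required endpoints are not blocked. The sketch below is a generic TCP reachability check, not an Illumina tool. The table above does not list ports, so the default of 443 is an assumption (certificate status endpoints are commonly reached over port 80 instead), and regional platform endpoints are not included. The connect parameter exists so the check can be exercised without network access.

```python
import socket

# Host names of the endpoints marked "Required" in the table above.
REQUIRED_HOSTS = [
    "ica.illumina.com",
    "o.ss2.us",
    "ocsp.digicert.com",
    "ocsp.pki.goog",
    "ocsp.rootca1.amazontrust.com",
    "ocsp.rootg2.amazontrust.com",
    "ocsp.sca1b.amazontrust.com",
    "fonts.gstatic.com",
]


def tcp_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable_hosts(hosts, port=443, connect=tcp_connect):
    """Return the hosts for which the connect callable reports failure."""
    return [host for host in hosts if not connect(host, port)]
```

Calling unreachable_hosts(REQUIRED_HOSTS) from the iScan control computer returns the hosts that could not be reached on the chosen port; an empty list means every connection attempt succeeded.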
Notes on IDAT fail status: The iScan System marks a sample with a FAIL status if the registration quality for that particular section is too poor. Samples selected for analysis that are marked with FAIL status are excluded from analysis and produce no results, even though IDATs are generated. The registration quality can be found in the metrics.txt file.
Project sharing allows a user to share files with users outside the workgroup for collaboration or with Illumina Tech Support for troubleshooting. To share a project on BaseSpace Sequence Hub, first set the Workgroup type as ‘Collaborative’ during Workgroup setup, and then use the following steps to obtain a link to your project. The project can then be accessed by anyone with the link. All files in the project are shared.
Navigate to the Projects tab
Click the button next to the desired project
Select the Share button above the list (Figure 3)
Select the Get Link option to activate a link for the project
Copy the link and send it to the desired recipient(s)
Additional Notes:
The project owner maintains ownership and write access. If the project owner deletes the data, collaborators lose access to it.
Either the sending or the receiving domain must be collaborative
https://help.basespace.illumina.com/microarray/getting-started -> Workgroup Setup
Must be in the same AWS regional instance
https://help.basespace.illumina.com/manage-your-account/regions -> "Data cannot be transferred directly between instances, however you can download and share data separately."
For Enterprise domains, use this same method (share-by-link, not share-by-transfer)
https://help.basespace.illumina.com/collaborate/share-with-collaborators/share-by-link
September 2024
New EX PGx beadchips enabled for PGx analysis
Increased coverage of high priority PGx genes
Custom optimized .egt files accepted in PGx analysis
Up-to-date database reflecting latest versions of public PGx resources
DPWG guidelines now available for metabolizer status calling on cloud analysis
DRAGEN Array supports multiple PGx products
Two new EX PGx beadchips enabled through genotyping, PGx CNV calling, and star allele annotation
Infinium Global Screening Array with Enhanced PGx-48 v4.0 Kit
Infinium Global Clinical Research Array with Enhanced PGx-24 v1.0 Kit
In total 3 PGx products supported
GDA-ePGx
GDA_PGx-8v1-0_20042614_G2
38
GSAv4-ePGx
GSA-PGx-48v4-0_20079540_E2
38
GCRA-ePGx
GCRA-PGx-24v1-0_20084467_C2
38
Increased coverage of high priority PGx genes
Star allele annotation now covers CYP2E1, CYP1A2, ABCG2, CYP2C8, HMGCR, UGT1A4, UGT2B15, F13A1, and HLA-B*15:02
CNV calling now covers SULT1A1
Extended bi-allelic PGx variants from source databases to multi-allelic variants based on the designs in the supported PGx products.
See PGx Star Allele Coverage and PGx CNV Coverage for the full coverage lists.
Allows flexibility for GTCs generated with a custom cluster file (.egt) to be used with the commercial CN model file (.dat). This alleviates the burden to retrain the CN model file.
The cluster file is a required input for the genotype call command in DRAGEN Array. The CN (Copy Number) model file is a required input to the copy-number call command to enable accurate copy number calling for pharmacogenomics. Custom cluster files and CN model files may be required for optimal genotyping and PGx performance. See section Optimizing cluster files and copy number models for additional details.
Database revision reflecting PGx Allele Definitions and PGx Guidelines updates.
Standardization of star allele JSON output file
Renamed databaseSources to phenotypeDatabaseSources and starAlleleDatabaseSources
Renamed Phenotype to PhenotypeDatabaseAnnotation
Combined missingVariants and allMissingVariants to missingVariantSites
JSONized supportingVariants and missingVariants at the gene and candidate solution allele levels
Removed redundant info in the Alleles fields
Updated VCF tabix indexing, improving performance and disk usage for SNV VCF.
Some simple variants have REF and ALT delimited by _ instead of > in the star_alleles.csv and metabolizer status JSON files (e.g., "ryr1.38577931a_c" instead of "ryr1.38577931a>c")
Some multi-nucleotide variant (MNV) designs reverse complement the "Allele1/2 Top" fields in the Final Report
Occasional star-allele solution score discordance between Linux and Windows OS, with concordant solution ranking.
Rare intermittent memory issues during star allele calling. Example error message: "The model has been changed since the solution was last computed." To work around the issue, restart star allele calling or run it on a machine with more memory.
Star allele calling does not support novel alleles; only alleles defined in the PharmVar and PharmGKB databases are called.
CYP2D6 non-*36 star alleles with exon 9 conversion, such as *83, are reported as *36 with *83 as an underlying allele.
Genotyping only supports diploid organisms. Polyploid genotyping is currently not supported.
DRAGEN Array was only validated for, and is intended to be used with, commercial PGx beadchips with specified manifests (see table above). PGx star allele annotation is not backwards compatible with the v1.0 manifest version; for example, GDA_PGx-8v1-0_20042614_E2 is supported in DRAGEN Array v1.0, while GDA_PGx-8v1-0_20042614_G2 is supported in DRAGEN Array v1.1.
Command line options unsquash-duplicates and filter-loci for gtc-to-vcf conversion should not be used when star allele calling is desired. In addition, VCFs must be gzipped and tabix indexed (the default for gtc-to-vcf) to be used in star allele calling.
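One of the known issues listed above is that some simple variants have REF and ALT delimited by _ instead of > in the star_alleles.csv and metabolizer status JSON files. When such identifiers feed a downstream script, they can be normalized on read. The following is a user-side workaround sketch, not an official fix; the function name normalize_variant_id is hypothetical, and the pattern assumes the identifier ends with a position followed by nucleotide letters, as in the documented example.

```python
import re

# A trailing "<ref>_<alt>" pair of nucleotide letters that directly follows
# a digit, e.g. the "a_c" at the end of "ryr1.38577931a_c".
_REF_ALT = re.compile(r"(?<=\d)([ACGTacgt]+)_([ACGTacgt]+)$")


def normalize_variant_id(variant_id):
    """Rewrite a trailing ref_alt pair as ref>alt; leave other IDs unchanged."""
    return _REF_ALT.sub(r"\1>\2", variant_id)
```

With the documented example, normalize_variant_id("ryr1.38577931a_c") returns "ryr1.38577931a>c". Identifiers that already use > are returned unchanged.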
For support, questions, and feedback on DRAGEN Array, please contact Illumina Tech Support at techsupport@illumina.com.
Product features and benefits and allows product ordering.
Support site for DRAGEN Array which includes installers and product documentation.
Illumina Software Resources article with technical details on DRAGEN Array v1.0 Methylation QC.
Illumina Software Resources article with technical details on DRAGEN Array v1.0 PGx analysis.
Lab setup and maintenance information for Infinium assays.
List of consumables and equipment used in Infinium assays.
Instructions for operating and maintaining the iScan System.
Instructions for using the Polygenic Risk Score – Predict Module.
Instructions for using the hosted environment Illumina Connected Analytics.
Instructions for using the hosted environment BaseSpace Sequence Hub.
DRAGEN Array star allele calling leverages the star allele definitions provided by PharmVar and PharmGKB. DRAGEN Array star allele phenotype annotation, using the “star-allele annotate” command, is achieved through direct lookup into public PGx guidelines CPIC or DPWG, which is selected by the user when running DRAGEN Array.
See table below for details of the data sources.
DRAGEN Array “star-allele annotate” command provides both metabolizer status and activity score annotations for genes covered by the CPIC and DPWG guidelines.
Specifically, CPIC metabolizer/phenotype annotations are supported for CACNA1S, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP3A5, DPYD, G6PD, MT-RNR1, NUDT15, RYR1, SLCO1B1, TPMT, UGT1A1, CFTR, IFNL3/IFNL4, and VKORC1; among these, activity scores are supported for CYP2C9, CYP2D6, and DPYD. DPWG metabolizer/phenotype annotations are supported for CYP1A2, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP3A4, CYP3A5, DPYD, NUDT15, SLCO1B1, TPMT, UGT1A1, VKORC1, and F5; among these, activity scores are supported for CYP2D6 and DPYD.
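The gene lists above can be captured as a small lookup, which is convenient when deciding which guideline to select for a given gene panel. This sketch only restates the lists in this section; the function name annotation_support is hypothetical and is not part of the DRAGEN Array command line.

```python
# Genes with metabolizer/phenotype annotation per guideline, as listed above.
PHENOTYPE_GENES = {
    "CPIC": {
        "CACNA1S", "CYP2B6", "CYP2C19", "CYP2C9", "CYP2D6", "CYP3A5", "DPYD",
        "G6PD", "MT-RNR1", "NUDT15", "RYR1", "SLCO1B1", "TPMT", "UGT1A1",
        "CFTR", "IFNL3/IFNL4", "VKORC1",
    },
    "DPWG": {
        "CYP1A2", "CYP2B6", "CYP2C19", "CYP2C9", "CYP2D6", "CYP3A4", "CYP3A5",
        "DPYD", "NUDT15", "SLCO1B1", "TPMT", "UGT1A1", "VKORC1", "F5",
    },
}

# Subset of the genes above for which activity scores are also supported.
ACTIVITY_SCORE_GENES = {
    "CPIC": {"CYP2C9", "CYP2D6", "DPYD"},
    "DPWG": {"CYP2D6", "DPYD"},
}


def annotation_support(gene, guideline):
    """Report which annotations are available for a gene under a guideline."""
    if guideline not in PHENOTYPE_GENES:
        raise ValueError(f"unknown guideline: {guideline}")
    return {
        "phenotype": gene in PHENOTYPE_GENES[guideline],
        "activity_score": gene in ACTIVITY_SCORE_GENES[guideline],
    }
```

For instance, CYP2C9 has an activity score under CPIC but only a metabolizer/phenotype annotation under DPWG, which the lookup makes explicit.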
DRAGEN Array PGx extends any single allele variant definitions obtained from PharmVar or PharmGKB that have multiple alleles in Illumina's product files to include all alleles of the Multi Allelic Variant (MAV). The table below shows the MAVs that were extended in the DRAGEN Array Database to cover all alleles for that MAV that are in the product files. Allele Name describes the allele that was added to the database.
When the reference genome changes, the definition of a star allele sometimes needs to be updated accordingly. For G6PD, the alleles "Mediterranean Haplotype" and "Mediterranean, Dallas, Panama, Sassari, Cagliari, Birmingham" are defined by two variants, rs5030868 and rs2230037.
In genome build GRCh37, "Mediterranean Haplotype" is defined by rs2230037 G>A and rs5030868 G>A, and "Mediterranean, Dallas, Panama, Sassari, Cagliari, Birmingham" is defined by rs5030868 G>A, with rs2230037 reference allele G.
In genome build GRCh38, "Mediterranean Haplotype" is defined by rs5030868 G>A, with rs2230037 reference allele A, and "Mediterranean, Dallas, Panama, Sassari, Cagliari, Birmingham" is defined by rs2230037 A>G and rs5030868 G>A.
Variant rs2230037 is ignored in all G6PD alleles other than these two Mediterranean alleles.
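The build dependence described above can be made concrete by resolving each definition into the bases actually carried on the haplotype. In the sketch below, which encodes only the statements in this section, a value of None means "reference allele", and the reference allele of rs2230037 is G in GRCh37 and A in GRCh38. The data structure and function name are illustrative assumptions, not the DRAGEN Array database format.

```python
# Reference allele of rs2230037 in each genome build, as described above.
RS2230037_REF = {"GRCh37": "G", "GRCh38": "A"}

MED_HAPLOTYPE = "Mediterranean Haplotype"
MED_OTHER = "Mediterranean, Dallas, Panama, Sassari, Cagliari, Birmingham"

# Definitions relative to each build; None means the reference allele.
DEFINITIONS = {
    "GRCh37": {
        MED_HAPLOTYPE: {"rs5030868": "A", "rs2230037": "A"},
        MED_OTHER: {"rs5030868": "A", "rs2230037": None},
    },
    "GRCh38": {
        MED_HAPLOTYPE: {"rs5030868": "A", "rs2230037": None},
        MED_OTHER: {"rs5030868": "A", "rs2230037": "G"},
    },
}


def observed_alleles(build, allele_name):
    """Resolve a build-specific definition into the bases on the haplotype."""
    resolved = dict(DEFINITIONS[build][allele_name])
    if resolved["rs2230037"] is None:
        resolved["rs2230037"] = RS2230037_REF[build]
    return resolved
```

Resolved this way, each allele carries the same bases in both builds (A at rs2230037 for "Mediterranean Haplotype", G for the other allele); only the choice of which base counts as the variant changes with the reference.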
The PGx genes and star/variant alleles listed below can be detected by DRAGEN Array v1.1 if available on the microarray. Known and novel star alleles not in the below list will not be reported. Star allele definitions are sourced from PharmVar and PharmGKB.
Among the PGx genes, HLA-A, HLA-B, and IFNL3/IFNL4 alleles are covered through tagging variants, specifically HLA-A,*31:01 (rs1061235.A>T); HLA-B,*15:02 (rs144012689.T>A); HLA-B,*57:01 (rs2395029.T>G); HLA-B,*58:01 (rs9263726.G>A); IFNL3/4, rs12979860 variant (T). Reliability of the tagging SNPs varies depending on the population. Additional information on PGx gene types, variant type versus star allele type, can be found here:
PGx star alleles can only be called when the related variants in the star allele definition are present in a PGx product. An auxiliary file ([Product]_GS_import.txt) is provided for each product with the PGx variants and associated star alleles. The product files pages that contain the auxiliary files are listed in the table below.
APOE: GSAv4-ePGx and GCRA-ePGx do not support calling E2 and E4 due to the lack of functional probes for rs7412 and rs429358.
CYP2A6: GDA-ePGx does not support *5 due to lack of coverage for *5 core variants.
CYP4F2: for all three products
*1 and *2 are not distinguishable due to the lack of probes for rs30193105. Samples with *2 will be called as *1.
*3 and *4 are not distinguishable due to the lack of probes for rs30193105, while *3 core variant rs2108622 is covered by all three products. Samples with *4 will be called as *3.
UGT1A1: *28 (rs8175347 [TA]8) and *37 (rs8175347 [TA]9) are not covered in all three PGx products due to the lack of functional probes.
UGT2B15: GSAv4-ePGx and GCRA-ePGx do not support *4 or *5 due to the lack of probes for rs4148269 and rs1902023.
During DRAGEN Array star allele calling, poorly performing PGx variants are masked and treated as "No Calls". Star alleles that are solely defined by the masked variants will NOT be called by DRAGEN Array. The tables below provide the variants that are masked per product, with each row representing a single variant. The Variant_ID matches the ID field of the corresponding SNV VCF entry of the PGx product.
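The masking rule above (a star allele defined solely by masked variants is not called) can be illustrated with a short set-based sketch. This is not the DRAGEN Array caller, whose solution search is not described here; the function names are hypothetical, and treating an allele with no defining variants (such as a reference allele) as unaffected by masking is an assumption. The parser reflects that the masked-variant tables list several Variant_ID entries per row, separated by either semicolons or commas.

```python
import re


def parse_variant_ids(cell):
    """Split one masked-variant table cell on ';' or ',' into Variant_IDs."""
    return [v.strip() for v in re.split(r"[;,]", cell) if v.strip()]


def callable_star_alleles(definitions, masked_ids):
    """Return star alleles that are not solely defined by masked variants.

    definitions maps a star allele name to the set of Variant_IDs that
    define it (illustrative structure, not the product database format).
    """
    masked = set(masked_ids)
    return sorted(
        allele
        for allele, variants in definitions.items()
        # Assumption: an allele with no defining variants stays callable.
        if not variants or set(variants) - masked
    )
```

For example, with toy definitions {"*1": set(), "*A": {"v1"}, "*B": {"v1", "v2"}} and v1 masked, the allele *A is dropped because its only defining variant is masked, while *1 and *B remain.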
Any human Infinium genotyping array including custom and semi-custom to create a SNV VCF output. Illumina provides required to map to the reference genome for human, genome build 37 and 38. DRAGEN Array Cloud offers additional output formats including Locus Summary and Final Report which are applicable for Infinium arrays for human and non-human species.
•
• [may be pre-setup on cloud]
• [may be pre-setup on cloud]
• [pre-setup on cloud]
• [optional on cloud and local]
•
• [optional on cloud and local]
• [optional on cloud and local]
•
• [cloud only]
• [cloud only]
•
Local: No cost download from .
Cloud: to analyze and store data as needed.
Check Product & Analysis Compatibility here
See for further detail.
•
• [may be pre-setup on cloud]
• [may be pre-setup on cloud]
• [pre-setup on cloud]
• [pre-setup on cloud]
• [optional on cloud and local]
•
• [optional on local]
• [optional on local]
•
• [optional on local]
•
•
•
•
Local: No cost download from .
Cloud: to analyze and store data as needed.
Check Product & Analysis Compatibility here
See for further detail.
•
• [may be pre-setup on cloud]
• [may be pre-setup on cloud]
• [pre-setup on cloud]
• [pre-setup on cloud]
• [pre-setup on cloud]
• [optional on cloud and local]
•
• [optional on local]
• [optional on local]
•
• [optional on local]
•
•
•
•
•
•
Cloud: Per sample analysis. to store data as needed.
Visit the to learn more.
Select Type of Analysis DRAGEN Array – Methylation – QC from the dropdown. Adjust customizable thresholds as desired. Further detail can be found in Additional information for . A maximum of 1152 samples are supported, when sample sheet is used.
• [from iScan instrument] • [may be pre-setup on cloud] • [optional on cloud]
Per sample: • • Per analysis batch: • • • • •
Cloud: to analyze and store data as needed.
(Green/(bkg+x))>
((bkg + x)/U) >
((bkg + x)/U) >
(C Red/ C Green) >
((bkg + x)/C Green) >
(C or G/ A or T) >
(A or T/ C or G) >
DRAGEN Array - Star allele annotation provides an option to change the default metabolizer status database used from CPIC to DPWG.
Instructions on how to use the auxiliary file can be found here: .
PharmVar
6.1
https://www.pharmvar.org
PharmGKB
Snapshot-2024.05.16
https://www.pharmgkb.org/
UGT Alleles Nomenclature
2010.12.21
https://www.pharmacogenomics.pha.ulaval.ca/ugt-alleles-nomenclature/
Human Cytochrome P450 (CYP) Allele Nomenclature Database Legacy Content
July 2024
https://www.pharmvar.org/htdocs/archive/index_original.htm
CPIC guidelines
1.38.0
https://cpicpgx.org/guidelines/
https://github.com/cpicpgx/cpic-data/
DPWG guidelines
June 2023
https://www.pharmgkb.org/page/dpwgMapping
CACNA1S.rs1800559
rs1800559.C>A
NC_000001.11:g.201060815C>A
CFTR.rs113993958
rs113993958.G>A
NC_000007.14:g.117530953G>A
CFTR.rs113993958
rs113993958.G>T
NC_000007.14:g.117530953G>T
CFTR.rs11971167
rs11971167.G>T
NC_000007.14:g.117642528G>T
CFTR.rs121908755
rs121908755.G>T
NC_000007.14:g.117587800G>T
CFTR.rs121909005
rs121909005.T>C
NC_000007.14:g.117587801T>C
CFTR.rs121909020
rs121909020.G>C
NC_000007.14:g.117611640G>C
CFTR.rs150212784
rs150212784.T>C
NC_000007.14:g.117611595T>C
CFTR.rs193922525
rs193922525.G>C
NC_000007.14:g.117664770G>C
CFTR.rs267606723
rs267606723.G>T
NC_000007.14:g.117642451G>T
CFTR.rs397508288
rs397508288.A>C
NC_000007.14:g.117590409A>C
CFTR.rs397508759
rs397508759.G>T
NC_000007.14:g.117534363G>T
CFTR.rs74551128
rs74551128.C>T
NC_000007.14:g.117548795C>T
CFTR.rs75039782
rs75039782.C>G
NC_000007.14:g.117639961C>G
CFTR.rs77834169
rs77834169.C>A
NC_000007.14:g.117530974C>A
CFTR.rs77834169
rs77834169.C>G
NC_000007.14:g.117530974C>G
CFTR.rs77932196
rs77932196.G>C
NC_000007.14:g.117540270G>C
CFTR.rs77932196
rs77932196.G>T
NC_000007.14:g.117540270G>T
CFTR.rs78655421
rs78655421.G>C
NC_000007.14:g.117530975G>C
CFTR.rs78655421
rs78655421.G>T
NC_000007.14:g.117530975G>T
COMT.rs13306278
rs13306278.C>G
NC_000022.11:g.19941504C>G
DPYD.rs114096998
rs114096998.2.G>C
NC_000001.11:g.97078987G>C
DPYD.rs140602333
rs140602333.G>T
NC_000001.11:g.97573919G>T
DPYD.rs142619737
rs142619737.C>G
NC_000001.11:g.97515851C>G
DPYD.rs143154602
rs143154602.G>T
NC_000001.11:g.97593289G>T
DPYD.rs145548112
rs145548112.C>A
NC_000001.11:g.97306195C>A
DPYD.rs190951787
rs190951787.G>T
NC_000001.11:g.97515889G>T
DPYD.rs200687447
rs200687447.2.C>A
NC_000001.11:g.97193209C>A
DPYD.rs3918289
rs3918289.G>A
NC_000001.11:g.97450059G>A
DPYD.rs3918290
rs3918290.C>G
NC_000001.11:g.97450058C>G
DPYD.rs6670886
rs6670886.C>A
NC_000001.11:g.97699506C>A
DPYD.rs72549304
rs72549304.G>C
NC_000001.11:g.97549609G>C
DPYD.rs72549304
rs72549304.G>T
NC_000001.11:g.97549609G>T
DPYD.rs748620513
rs748620513.C>A
NC_000001.11:g.97573799C>A
DPYD.rs748639205
rs748639205.A>G
NC_000001.11:g.97082415A>G
DPYD.rs760663364
rs760663364.G>C
NC_000001.11:g.97515928G>C
DPYD.rs777425216
rs777425216.C>A
NC_000001.11:g.97515815C>A
RYR1.38499667G>A
NC_000019.10:g.38499667G>T
NC_000019.10:g.38499667G>T
RYR1.rs118192116
rs118192116.C>T
NC_000019.10:g.38451850C>T
RYR1.rs118192151
rs118192151.G>C
NC_000019.10:g.38584974G>C
RYR1.rs118204423
rs118204423.G>A
NC_000019.10:g.38457539G>A
RYR1.rs142474192
rs142474192.G>T
NC_000019.10:g.38443790G>T
RYR1.rs143988412
rs143988412.A>G
NC_000019.10:g.38580066A>G
RYR1.rs1801086
rs1801086.G>T
NC_000019.10:g.38446710G>T
RYR1.rs186983396
rs186983396.C>G
NC_000019.10:g.38442434C>G
RYR1.rs193922762
rs193922762.C>A
NC_000019.10:g.38448673C>A
RYR1.rs193922767
rs193922767.G>A
NC_000019.10:g.38452996G>A
RYR1.rs193922772
rs193922772.G>A
NC_000019.10:g.38457546G>A
RYR1.rs193922826
rs193922826.C>G
NC_000019.10:g.38504319C>G
RYR1.rs193922838
rs193922838.G>A
NC_000019.10:g.38529036G>A
RYR1.rs193922842
rs193922842.C>T
NC_000019.10:g.38543821C>T
RYR1.rs370634440
rs370634440.G>T
NC_000019.10:g.38463499G>T
GDA-ePGx
GDAePGx_G2_GS_import.txt
GSAv4-ePGx
GSAePGx_E2_GS_import.txt
GCRA-ePGx
GCRAePGx_E2_GS_import.txt
GDA_PGx-8v1-0_20042614_G
CYP1A2
ilmnseq_rs35694136_ilmnfwd;ilmnseq_rs35694136_ilmnfwd_ilmndup1;ilmnseq_rs35694136_ilmnfwd_ilmndup2;ilmnseq_rs35694136_ilmnfwd_ilmndup3;ilmnseq_rs35694136_ilmnfwd_ilmndup4;ilmnseq_rs35694136_ilmnfwd_ilmndup5;ilmnseq_rs35694136_ilmnfwd_ilmndup6;ilmnseq_rs35694136_ilmnfwd_ilmndup7
GDA_PGx-8v1-0_20042614_G
CYP2D6
ilmnseq_rs72549352_ilmnrev_F2BTindel_deg3a3b3_IlmnRep;ilmnseq_rs72549352_ilmnrev_F2BTindel_deg3a3b3_ilmndup1;ilmnseq_rs72549352_ilmnrev_F2BTindel_deg3a3b3_ilmndup3;ilmnseq_rs72549352_ilmnrev_F2BTindel_ilmndup1;ilmnseq_rs72549352_ilmnrev_F2BTindel_ilmndup3
GDA_PGx-8v1-0_20042614_G
CYP4F2
ilmnseq_rs4020346_ilmnfwd
GDA_PGx-8v1-0_20042614_G
UGT1A1
ilmnseq_rs8175347_ilmnfwd_F2BTindel;ilmnseq_rs8175347_ilmnfwd_F2BTindel_ilmndup1;ilmnseq_rs8175347_ilmnrev;ilmnseq_rs8175347_ilmnrev_ilmndup1;ilmnseq_rs8175347_ilmnrev_ilmndup2;ilmnseq_rs8175347_ilmnrev_ilmndup3
GSA-PGx-48v4-0_20079540_E
CYP1A2
IlmnSeq_rs35694136_IlmnFWD;ilmnseq_rs35694136_ilmnfwd_ilmndup2;ilmnseq_rs35694136_ilmnfwd_ilmndup3;ilmnseq_rs35694136_ilmnfwd_ilmndup4;ilmnseq_rs35694136_ilmnfwd_ilmndup5;ilmnseq_rs35694136_ilmnfwd_ilmndup6;ilmnseq_rs35694136_ilmnfwd_ilmndup7
GSA-PGx-48v4-0_20079540_E
CYP2C19
IlmnSeq_rs367543002,ilmnseq_rs367543002_ilmnfwd,ilmnseq_rs367543002_ilmnfwd_ilmndup1,ilmnseq_rs367543002_ilmnrev_deg3a1b0_ilmndup1,rs367543002
GSA-PGx-48v4-0_20079540_E
CYP2C19
ilmnseq_rs17882687_ilmnfwd_ilmndup2,ilmnseq_rs17882687_ilmnrev,ilmnseq_rs17882687_ilmnrev_ilmndup1,ilmnseq_rs17882687_ilmnrev_ilmndup2
GSA-PGx-48v4-0_20079540_E
CYP2C19
IlmnSeq_rs113934938,ilmnseq_rs113934938_ilmnfwd,ilmnseq_rs113934938_ilmnfwd_ilmndup1,ilmnseq_rs113934938_ilmnfwd_ilmndup2,rs113934938
GSA-PGx-48v4-0_20079540_E
CYP2C9
10:96701973,ilmnseq_rs774607211_ilmnfwd_ilmndup1,ilmnseq_rs774607211_ilmnfwd_ilmndup2
GSA-PGx-48v4-0_20079540_E
CYP2D6
ilmnseq_rs1135836_ilmnrev_deg3a3b0
GSA-PGx-48v4-0_20079540_E
CYP2D6
PGX_IlmnSeq_rs769157652_BEST,ilmnseq_rs769157652_ilmnrev_F2BT,ilmnseq_rs769157652_ilmnrev_deg3a1b0
GSA-PGx-48v4-0_20079540_E
CYP4F2
ilmnseq_rs4020346_ilmnfwd
GSA-PGx-48v4-0_20079540_E
OPRM1
ilmnseq_rs9384179.1_F2BT
GSA-PGx-48v4-0_20079540_E
UGT1A1
ilmnseq_rs8175347.2_ilmnrev_F2BTindel_cei_ilmndup31
GCRA-PGx-24v1-0_20084467_C
COMT
ilmnseq_rs7287550_ilmnfwd_F2BT
GCRA-PGx-24v1-0_20084467_C
CYP1A2
IlmnSeq_rs35694136;IlmnSeq_rs35694136_IlmnFWD;ilmnseq_rs35694136_ilmnfwd_ilmndup2;ilmnseq_rs35694136_ilmnfwd_ilmndup5;ilmnseq_rs35694136_ilmnfwd_ilmndup6;rs35694136
GCRA-PGx-24v1-0_20084467_C
CYP2C19
ilmnseq_rs367543002_ilmnfwd
GCRA-PGx-24v1-0_20084467_C
CYP2C19
ilmnseq_rs17882687_ilmnfwd_ilmndup2,ilmnseq_rs17882687_ilmnrev_ilmndup1
GCRA-PGx-24v1-0_20084467_C
CYP2C19
IlmnSeq_rs113934938,ilmnseq_rs113934938_ilmnfwd,ilmnseq_rs113934938_ilmnfwd_ilmndup1,ilmnseq_rs113934938_ilmnfwd_ilmndup2,rs113934938
GCRA-PGx-24v1-0_20084467_C
CYP2C9
ilmnseq_rs774607211_ilmnrev,ilmnseq_rs774607211_ilmnrev_ilmndup2
GCRA-PGx-24v1-0_20084467_C
CYP2D6
ilmnseq_rs2004511_dup1
GCRA-PGx-24v1-0_20084467_C
CYP2D6
PGX_IlmnSeq_rs769157652_BEST,ilmnseq_rs769157652_ilmnrev,ilmnseq_rs769157652_ilmnrev_deg3a1b0,ilmnseq_rs769157652_ilmnrev_deg3a1b0_ilmndup1,ilmnseq_rs769157652_ilmnrev_ilmndup1,seq-rs61737947
GCRA-PGx-24v1-0_20084467_C
CYP4F2
ilmnseq_rs4020346_ilmnfwd
GCRA-PGx-24v1-0_20084467_C
OPRM1
ilmnseq_rs9384179.1_F2BT
ABCG2
Reference;rs2231142.G>T
ADH1B
Reference;rs1229984.T>C;rs1229984.T>G;rs1229985.A>G;rs17033.T>C;rs1789891.C>A;rs2018417.C>A;rs2018417.C>T;rs2066702.G>A;rs75967634.C>T
ALDH2
Reference;rs671.G>A
ANK3
Reference;rs143414470.T>C
ANKK1
Reference;rs1800497.G>A;rs2587550.G>A;rs2734849.A>C;rs2734849.A>G;rs4938013.A>C;rs4938013.A>G;rs4938013.A>T;rs7118900.G>A;rs7118900.G>C
APOE
E2;E3;E4
ATM
Reference;rs11212570.G>A;rs11212570.G>T;rs11212617.C>A;rs1801516.G>A;rs620815.T>A;rs620815.T>C
BDNF
Reference;rs10835210.C>A;rs10835210.C>G;rs11030101.A>G;rs11030101.A>T;rs11030104.A>G;rs11030118.G>A;rs11030119.G>A;rs11030119.G>T;rs1491850.T>C;rs16917234.T>A;rs16917234.T>C;rs1967554.A>C;rs2030324.A>G;rs61888800.G>T;rs6265.C>T;rs7103411.C>T;rs7124442.C>G;rs7124442.C>T;rs7127507.T>C;rs7934165.G>A;rs962369.T>C;rs988748.C>G
CACNA1C
Reference;rs1006737.G>A;rs1034936.C>A;rs1034936.C>G;rs1034936.C>T;rs1051375.G>A;rs1051375.G>C;rs10774053.A>C;rs10774053.A>G;rs10848635.T>A;rs10848635.T>C;rs11062040.C>T;rs12813888.A>C;rs12813888.A>T;rs2041135.T>C;rs215976.C>G;rs215976.C>T;rs215994.T>C;rs216008.C>T;rs216013.A>G;rs2238032.T>C;rs2238032.T>G;rs2238087.C>G;rs2238087.C>T;rs2239050.G>A;rs2239050.G>C;rs2239128.T>A;rs2239128.T>C;rs2283271.T>A;rs723672.C>A;rs723672.C>G;rs723672.C>T;rs7295250.T>C;rs7316246.G>A;rs7316246.G>C;rs758723.T>A;rs758723.T>C
CACNA1S
Reference;rs1800559.C>A;rs1800559.C>T;rs772226819.G>A
CFTR
Reference;rs113993958.G>A;rs113993958.G>C;rs113993958.G>T;rs115545701.C>T;rs11971167.G>A;rs11971167.G>T;rs121908752.T>G;rs121908753.G>A;rs121908755.G>A;rs121908755.G>T;rs121908757.A>C;rs121909005.T>C;rs121909005.T>G;rs121909013.G>A;rs121909020.G>A;rs121909020.G>C;rs121909041.T>C;rs141033578.C>T;rs150212784.T>C;rs150212784.T>G;rs186045772.T>A;rs193922525.G>A;rs193922525.G>C;rs200321110.G>A;rs202179988.C>T;rs267606723.G>A;rs267606723.G>T;rs368505753.C>T;rs397508256.G>A;rs397508288.A>C;rs397508288.A>G;rs397508387.G>T;rs397508442.C>T;rs397508513.A>C;rs397508537.C>A;rs397508759.G>A;rs397508759.G>T;rs397508761.A>G;rs74503330.G>A;rs74551128.C>A;rs74551128.C>T;rs75039782.C>G;rs75039782.C>T;rs75527207.G>A;rs75541969.G>C;rs76151804.A>G;rs77834169.C>A;rs77834169.C>G;rs77834169.C>T;rs77932196.G>A;rs77932196.G>C;rs77932196.G>T;rs78655421.G>A;rs78655421.G>C;rs78655421.G>T;rs78769542.G>A;rs80224560.G>A;rs80282562.G>A
COMT
Reference;rs13306278.C>T;rs165599.G>A;rs165599.G>C;rs165722.C>T;rs165728.C>A;rs165728.C>G;rs165728.C>T;rs165774.G>A;rs174675.T>C;rs174696.C>A;rs174696.C>T;rs174699.C>T;rs2020917.C>T;rs2075507.G>A;rs2075507.G>C;rs2075507.G>T;rs2239393.A>G;rs4633.C>T;rs4646312.T>C;rs4646316.C>G;rs4646316.C>T;rs4680.G>A;rs4818.C>G;rs4818.C>T;rs5746849.A>G;rs5993882.T>C;rs5993882.T>G;rs5993883.T>G;rs6267.G>A;rs6267.G>T;rs6269.A>G;rs6269.A>T;rs7287550.T>C;rs7287550.T>G;rs737865.A>G;rs737866.T>A;rs737866.T>C;rs740603.A>G;rs9332377.C>A;rs9332377.C>T;rs933271.T>A;rs933271.T>C;rs9606186.C>A;rs9606186.C>G;rs9606186.C>T
CYP1A2
*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*1A;*1B;*1C;*1D;*1E;*1F;*1G;*1J;*1K;*1L;*1M;*1N;*1P;*1Q;*1R;*1S;*1T;*1U;*1V;*2;*20;*21;*3;*4;*5;*6;*7;*8;*9
CYP2A6
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*1x2;*2;*20;*21;*22;*23;*24;*25;*26;*27;*28;*31;*34;*35;*36;*37;*38;*39;*4;*40;*41;*42;*43;*44;*45;*46;*48;*49;*5;*50;*51;*52;*53;*54;*55;*56;*6;*7;*8;*9
CYP2B6
*1;*10;*11;*12;*13;*14;*15;*17;*18;*19;*2;*20;*21;*22;*23;*24;*25;*26;*27;*28;*3;*31;*32;*33;*34;*35;*36;*37;*38;*39;*4;*40;*41;*42;*43;*44;*45;*46;*47;*48;*49;*5;*6;*7;*8;*9
CYP2C19
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*2;*22;*23;*24;*25;*26;*28;*29;*3;*30;*31;*32;*33;*34;*35;*38;*39;*4;*5;*6;*7;*8;*9
CYP2C8
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*2;*3;*4;*5;*6;*7;*8;*9
CYP2C9
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*2;*20;*21;*22;*23;*24;*25;*26;*27;*28;*29;*3;*30;*31;*32;*33;*34;*35;*36;*37;*38;*39;*4;*40;*41;*42;*43;*44;*45;*46;*47;*48;*49;*5;*50;*51;*52;*53;*54;*55;*56;*57;*58;*59;*6;*60;*61;*62;*63;*64;*65;*66;*67;*68;*69;*7;*70;*71;*72;*73;*74;*75;*76;*77;*78;*79;*8;*80;*81;*82;*83;*84;*85;*9
CYP2D6
*1;*1-*90;*10;*100;*101;*102;*103;*104;*105;*106;*107;*108;*109;*10x2;*11;*110;*111;*112;*113;*114;*115;*116;*117;*118;*119;*12;*120;*121;*122;*123;*124;*125;*126;*127;*128;*129;*13;*13-*1;*13-*2;*13-*4-*68;*130;*131;*132;*133;*134;*135;*136;*137;*138;*139;*13x2-*1;*13x2-*2;*14;*140;*141;*142;*143;*144;*145;*146;*147;*148;*149;*15;*150;*151;*152;*153;*154;*155;*156;*157;*158;*159;*160;*161;*162;*163;*164;*165;*166;*167;*168;*169;*17;*170;*171;*172;*17x2;*18;*19;*1x2;*2;*20;*21;*22;*23;*24;*25;*26;*27;*28;*29;*29x2;*2x2;*3;*30;*31;*32;*33;*34;*35;*35x2;*36;*36;*36-*10;*36-*10x2;*36x2-*10;*36x3-*10;*37;*38;*39;*4;*40;*41;*42;*43;*43x2;*44;*45;*46;*47;*48;*49;*4M;*4N-*4;*4x2;*5;*50;*51;*52;*53;*54;*55;*56;*58;*59;*6;*60;*62;*64;*65;*68;*68-*4;*69;*7;*70;*71;*72;*73;*74;*75;*8;*81;*82;*83;*84;*85;*86;*87;*88;*89;*9;*90;*91;*92;*93;*94;*95;*96;*97;*98;*99;*9x2
CYP2E1
*1A;*1B;*2;*3;*4;*5A;*5B;*6;*7A;*7B;*7C
CYP3A4
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*2;*20;*21;*22;*23;*24;*26;*28;*29;*3;*30;*31;*32;*33;*34;*35;*37;*38;*39;*4;*40;*41;*42;*43;*44;*45;*46;*47;*48;*5;*6;*7;*8;*9
CYP3A5
*1;*3;*6;*7;*8;*9
CYP4F2
*1;*10;*11;*12;*13;*14;*15;*17;*2;*3;*4;*5;*6;*7;*8;*9
DPYD
Reference;rs111858276.T>C;rs112766203.1.G>A;rs112766203.2.G>C;rs114096998.1.G>T;rs114096998.2.G>A;rs114096998.2.G>C;rs115232898.T>C;rs116364703.T>A;rs1180771326.T>C;rs137878450.C>A;rs137999090.C>T;rs138391898.C>T;rs138545885.C>A;rs138616379.C>T;rs139459586.A>C;rs139834141.C>T;rs140039091.C>G;rs140114515.C>T;rs140602333.G>A;rs140602333.G>T;rs140989814.C>G;rs141044036.T>C;rs141439344.C>T;rs141462178.T>C;rs141726921.C>T;rs142512579.C>T;rs142619737.C>G;rs142619737.C>T;rs143154602.G>A;rs143154602.G>T;rs143815742.1.C>A;rs143815742.2.C>T;rs143879757.1.G>T;rs143879757.2.G>A;rs143986398.G>C;rs144395748.1.G>C;rs144395748.2.G>T;rs144935781.T>C;rs145112791.G>A;rs145529148.T>C;rs145548112.C>A;rs145548112.C>T;rs145773863.C>T;rs146356975.T>C;rs146529561.G>A;rs147545709.G>A;rs147601618.A>G;rs148799944.C>G;rs148994843.C>T;rs150036960.G>C;rs150385342.1.C>T;rs150385342.2.C>A;rs150437414.A>G;rs151074666.C>T;rs17376848.A>G;rs1801158.C>T;rs1801159.T>C;rs1801160.C>T;rs1801265.A>G;rs1801266.G>A;rs1801267.C>T;rs1801268.C>A;rs183105782.A>G;rs183385770.C>T;rs186169810.A>C;rs187713395.A>G;rs188052243.T>C;rs190577302.G>C;rs190951787.G>C;rs190951787.G>T;rs199549923.G>T;rs199634007.G>T;rs199646142.C>T;rs199777072.C>T;rs200064537.A>T;rs200296941.T>C;rs200562975.T>C;rs200643089.A>C;rs200687447.1.C>T;rs200687447.2.C>A;rs200687447.2.C>G;rs200693895.A>G;rs200709381.T>G;rs201018345.C>T;rs201035051.T>G;rs201268750.G>T;rs201433243.C>T;rs201615754.1.C>A;rs201615754.2.C>T;rs201648613.C>G;rs201785202.G>A;rs202144771.G>A;rs202212118.C>A;rs2297595.T>C;rs267598785.G>A;rs267598786.C>T;rs267598789.G>A;rs367619008.T>C;rs368146607.T>G;rs368152149.T>C;rs368327291.C>G;rs368519011.T>C;rs368970772.G>T;rs369103276.A>G;rs369575517.G>A;rs370569731.1.C>G;rs370569731.2.C>T;rs370615432.C>A;rs370707404.A>G;rs371258350.C>T;rs371313778.C>T;rs371587702.1.G>A;rs371587702.2.G>C;rs371792178.1.G>A;rs371792178.2.G>C;rs372058915.T>C;rs372307932.A>T;rs372909322.T>C;rs374527058.A>G;rs374531732.C>T;rs374825099.1.G>T;rs374825099.2.G>C;rs
374827081.G>C;rs375436137.C>T;rs375990187.A>G;rs376073289.1.C>T;rs376073289.2.C>A;rs376128878.G>T;rs376273539.G>C;rs377143350.C>T;rs377169736.C>G;rs3918289.G>A;rs3918289.G>C;rs3918290.C>G;rs3918290.C>T;rs45589337.T>C;rs527580106.T>C;rs528152707.C>A;rs528430685.G>A;rs528768620.C>T;rs529019871.T>C;rs532341730.A>T;rs536577604.T>C;rs538336580.T>A;rs538703919.G>A;rs547099198.G>A;rs548783838.C>T;rs55674432.C>A;rs556933127.A>C;rs557220418.G>A;rs558354142.G>A;rs55886062.1.A>C;rs55886062.2.A>T;rs559427764.C>A;rs55971861.T>G;rs56005131.G>T;rs56038477.C>T;rs568169006.T>C;rs568367673.C>A;rs569661196.A>G;rs570122671.G>A;rs571114616.A>G;rs573299212.C>T;rs575763449.G>A;rs575853463.C>T;rs576409484.T>A;rs57918000.G>A;rs59086055.G>A;rs60139309.T>C;rs60511679.A>C;rs61622928.C>T;rs61757362.G>A;rs6670886.C>A;rs6670886.C>T;rs672601273.1.C>A;rs672601273.2.C>T;rs672601275.T>G;rs672601276.C>A;rs672601282.G>A;rs672601284.C>T;rs672601285.T>C;rs672601287.T>G;rs672601288.C>A;rs67376798.T>A;rs72547601.T>C;rs72547602.T>A;rs72549303.del;rs72549304.G>A;rs72549304.G>C;rs72549304.G>T;rs72549305.T>C;rs72549306.1.C>A;rs72549306.2.C>T;rs72549307.T>C;rs72549308.T>G;rs72549309.ATGA[1];rs72549310.G>A;rs72975710.1.G>A;rs72975710.2.G>C;rs745512069.G>A;rs745704371.G>C;rs745833535.T>C;rs745911874.C>T;rs745982505.1.T>C;rs745982505.2.T>A;rs746115989.C>T;rs746329786.T>A;rs746777181.C>T;rs747132274.C>G;rs747161261.C>T;rs747627716.A>C;rs747633945.C>T;rs747858350.G>A;rs747872037.C>A;rs748214188.A>T;rs748235192.1.T>A;rs748235192.2.T>C;rs748266854.G>A;rs748320430.A>C;rs748620513.C>A;rs748620513.C>G;rs748639205.A>C;rs748639205.A>G;rs748853941.T>C;rs748958293.G>A;rs748974194.G>A;rs749157068.C>A;rs749269410.C>T;rs749354734.A>T;rs749586100.T>A;rs749699298.A>C;rs749982106.G>A;rs750147471.T>C;rs75017182.G>C;rs750224169.G>A;rs750423752.A>C;rs750687600.C>T;rs750721736.A>T;rs751049055.C>A;rs751104498.T>C;rs751113340.G>A;rs751190912.G>A;rs751340819.A>G;rs751374989.T>A;rs751399062.G>T;rs751841116.1.C>A;rs751841116.2.C>T;rs751848
058.T>A;rs752020412.C>T;rs752228747.G>A;rs752388408.C>T;rs752518145.C>A;rs752985272.C>A;rs753166888.C>G;rs753217888.G>C;rs753296078.C>G;rs753419296.C>G;rs753527420.C>G;rs753707032.G>A;rs753710779.G>A;rs753820482.T>C;rs753950237.G>A;rs754028972.A>G;rs754125729.1.G>A;rs754125729.2.G>T;rs754467630.G>A;rs754786483.T>C;rs755155824.C>A;rs755407188.T>G;rs755416212.C>T;rs755428442.C>G;rs755645831.A>C;rs755692084.T>G;rs755729055.T>C;rs756020314.G>C;rs756372042.A>G;rs756613407.T>C;rs756684474.T>C;rs756890859.T>C;rs756992995.C>T;rs757155354.T>C;rs757227327.C>T;rs757342874.C>T;rs757376267.C>A;rs757695236.C>T;rs757954074.C>T;rs757958938.T>C;rs757994597.G>A;rs758154803.A>G;rs758489611.C>T;rs758514990.C>T;rs758649719.C>T;rs758699471.T>C;rs759082282.C>A;rs759249769.G>T;rs759424419.A>T;rs759479759.T>C;rs759562628.T>G;rs759766897.T>C;rs759967863.A>G;rs760038956.C>T;rs760222167.T>C;rs760235888.C>T;rs760485592.G>A;rs760553268.G>C;rs760570391.A>G;rs760663364.G>A;rs760663364.G>C;rs761302217.T>C;rs761351410.G>A;rs761479700.G>C;rs761555670.T>C;rs761609256.T>G;rs762083671.T>A;rs762102298.A>C;rs762198241.G>A;rs762430779.G>T;rs762446803.A>C;rs762468894.G>C;rs762523739.T>A;rs762533012.C>T;rs762598766.T>C;rs762779297.T>C;rs762858106.C>T;rs762911226.T>A;rs763008163.T>G;rs763061658.A>G;rs763449831.C>T;rs763506271.T>C;rs763557204.A>G;rs763572567.T>G;rs763623595.A>C;rs763784786.G>C;rs763862486.C>T;rs763893877.T>C;rs763984510.G>C;rs764111543.C>T;rs764270260.G>A;rs764555085.A>G;rs764635955.G>T;rs764666241.C>A;rs764679468.A>C;rs764945792.C>T;rs765001324.C>T;rs765034707.C>A;rs765075551.T>C;rs765131182.G>A;rs765247038.G>A;rs765309287.G>T;rs765465250.T>C;rs765640386.C>A;rs765990958.G>A;rs766411970.A>C;rs766438205.T>C;rs766635900.C>T;rs766700777.C>G;rs766761199.T>G;rs766833304.G>C;rs766885021.A>C;rs767200577.T>C;rs767376585.C>G;rs767437717.G>T;rs767464878.C>A;rs767468952.C>T;rs767482279.A>G;rs767547827.G>C;rs767818267.C>T;rs767836989.T>C;rs767986711.T>G;rs768117152.T>C;rs768157853.G>C;rs768200107.T>G;rs76
8288280.T>C;rs768501828.T>C;rs768507975.A>T;rs768680499.G>T;rs768915005.C>T;rs769190350.T>A;rs769306962.C>T;rs769466648.1.T>G;rs769466648.2.T>C;rs769514867.G>T;rs769696395.T>C;rs769709846.T>C;rs769820114.C>T;rs769847078.T>C;rs769932607.G>A;rs770229152.T>A;rs770566506.A>G;rs770958862.G>A;rs771194906.A>G;rs771534236.T>C;rs771536388.C>T;rs771573678.T>A;rs771646887.C>T;rs771648776.T>C;rs771885007.A>G;rs771930534.1.A>T;rs771930534.2.A>G;rs772097379.G>A;rs772264512.G>A;rs772320654.T>C;rs772358811.C>G;rs772544099.G>T;rs772826416.A>G;rs772906420.C>T;rs773159364.C>G;rs773407491.T>C;rs773584401.C>A;rs773652644.T>C;rs773815814.1.C>A;rs773815814.2.C>T;rs773868825.C>T;rs773983635.A>T;rs774134971.T>C;rs774500505.A>T;rs774579695.1.C>T;rs774799003.G>A;rs774883578.A>C;rs775494607.G>A;rs775526810.C>A;rs775570841.G>C;rs775601164.G>A;rs775926386.G>C;rs776082092.C>T;rs776236081.C>T;rs776289153.C>T;rs776321529.G>C;rs776662759.T>G;rs776973423.C>T;rs776984091.T>C;rs777220476.1.C>T;rs777220476.2.C>A;rs777238016.T>C;rs777347164.C>T;rs777368221.A>C;rs777425216.C>A;rs777425216.C>T;rs777560627.G>A;rs777673186.G>C;rs777902288.T>A;rs778022685.C>T;rs778054451.C>T;rs778141885.T>C;rs778298325.C>T;rs778601245.C>T;rs778754188.A>G;rs778760295.C>G;rs778776264.T>C;rs778867644.T>C;rs778911905.A>C;rs779465366.A>G;rs779557503.G>A;rs779573574.T>A;rs779728902.A>T;rs779925747.T>G;rs779967271.T>C;rs780025995.G>A;rs780047918.T>C;rs780120302.T>C;rs78060119.C>A;rs780813130.C>T;rs780873985.T>C;rs780885126.T>C;rs781184141.T>C;rs80081766.C>T;rs866110709.C>T;rs866869468.C>A;rs867143119.C>A;rs867226255.C>T;rs867232786.C>T;rs867600987.C>T;rs868047175.C>T;rs868235016.C>T
DRD2
Reference;rs1076560.C>A;rs1076560.C>G;rs1076563.A>C;rs1079596.C>A;rs1079596.C>T;rs1079597.C>T;rs1079598.A>G;rs1079598.A>T;rs1110976.T>G;rs11214607.T>G;rs1124491.G>A;rs1124491.G>C;rs1124493.T>G;rs1125394.T>C;rs12364283.A>G;rs12574471.C>G;rs12574471.C>T;rs17601612.G>C;rs1799732._113475530insG;rs1799732.dup;rs1799978.T>C;rs1800497.G>A;rs1800498.G>A;rs1801028.G>C;rs2075652.G>A;rs2234689.G>C;rs2283265.C>A;rs2440390.T>C;rs2514218.C>T;rs2587548.G>A;rs2587548.G>C;rs2734833.G>A;rs2734841.A>C;rs2734841.A>G;rs2734841.A>T;rs2734842.G>C;rs4274224.G>A;rs4274224.G>C;rs4436578.C>G;rs4436578.C>T;rs4460839.C>G;rs4460839.C>T;rs4648317.G>A;rs4648318.T>A;rs4648318.T>C;rs4648318.T>G;rs4936274.A>G;rs4936274.A>T;rs6275.A>G;rs6277.G>A;rs6279.G>C;rs7122246.G>A;rs7131056.A>C;rs7131056.A>G;rs7131440.C>T
F13A1
Reference;rs5985.C>A;rs5985.C>T
F2
Reference;rs1799963.G>A;rs3136516.G>A;rs5896.C>G;rs5896.C>T
F5
Reference;rs6025.C>T
FKBP5
Reference;rs1360780.T>A;rs1360780.T>C;rs17614642.T>C;rs3800373.C>A;rs3800373.C>G;rs4713916.A>C;rs4713916.A>G;rs4713916.A>T;rs73748206.C>T;rs9380524.C>A;rs9380524.C>T
G6PD
202G>A_376A>G_1264C>G;A;A- 202A_376G;A- 680T_376G;A- 968C_376G;Aachen;Abeno;Acrokorinthos;Alhambra;Amazonia;Amiens;Amsterdam;Anadia;Ananindeua;Andalus;Arakawa;Asahi;Asahikawa;Aures;Aveiro;B (reference);Bajo Maumere;Bangkok;Bangkok Noi;Bao Loc;Bari;Belem;Beverly Hills, Genova, Iwate, Niigata, Yamaguchi;Brighton;Buenos Aires;Cairo;Calvo Mackenna;Campinas;Canton, Taiwan-Hakka, Gifu-like, Agrigento-like;Cassano;Chatham;Chikugo;Chinese-1;Chinese-5;Cincinnati;Cleveland Corum;Clinic;Coimbra Shunde;Cosenza;Costanzo;Covao do Lobo;Crispim;Dagua;Durham;Farroupilha;Figuera da Foz;Flores;Fukaya;Fushan;Gaohe;Georgia;Gidra;Gond;Guadalajara;Guangzhou;Haikou;Hammersmith;Harilaou;Harima;Hartford;Hechi;Hermoupolis;Honiara;Ierapetra;Ilesha;Insuli;Iowa, Walter Reed, Springfield;Iwatsuki;Japan, Shinagawa;Kaiping, Anant, Dhon, Sapporo-like, Wosera;Kalyan-Kerala, Jamnaga, Rohini;Kambos;Kamiube, Keelung;Kamogawa;Kawasaki;Kozukata;Krakow;La Jolla;Lages;Lagosanto;Laibin;Lille;Liuzhou;Loma Linda;Ludhiana;Lynwood;Madrid;Mahidol;Malaga;Manhattan;Mediterranean Haplotype;Mediterranean, Dallas, Panama, Sassari, Cagliari, Birmingham;Metaponto;Mexico City;Miaoli;Minnesota, Marion, Gastonia, LeJeune;Mira d'Aire;Mizushima;Montalbano;Montpellier;Mt Sinai;Munich;Murcia Oristano;Musashino;Namouru;Nankang;Nanning;Naone;Nara;Nashville, Anaheim, Portici;Neapolis;Nice;Nilgiri;No name;North Dallas;Olomouc;Omiya;Orissa;Osaka;Palestrina;Papua;Partenope;Pawnee;Pedoplis-Ckaro;Piotrkow;Plymouth;Praha;Puerto Limon;Quing Yan;Radlowo;Rehevot;Rignano;Riley;Riverside;Roubaix;S. 
Antioco;Salerno Pyrgos;Santa Maria;Santiago;Santiago de Cuba, Morioka;Sao Borja;Seattle, Lodi, Modena, Ferrara II, Athens-like;Seoul;Serres;Shenzen;Shinshu;Sibari;Sierra Leone;Sinnai;Songklanagarind;Split;Stonybrook;Sugao;Sumare;Sunderland;Surabaya;Suwalki;Swansea;Taipei, Chinese-3;Telti, Kobe;Tenri;Tokyo, Fukushima;Toledo;Tomah;Tondela;Torun;Tsukui;Ube Konan;Union,Maewo, Chinese-2, Kalo;Urayasu;Utrecht;Valladolid;Vancouver;Vanua Lava;Viangchan, Jammu;Villeurbanne;Volendam;Wayne;West Virginia;Wexham;Wisconsin;Yunan
GRIK1
Reference;rs2832407.C>A;rs2832407.C>T
GRIK4
Reference;rs12800734.G>A;rs1954787.T>C
GRIN2B
Reference;rs1019385.C>A;rs1072388.G>A;rs1072388.G>C;rs1806191.G>A;rs1806191.G>T;rs1806201.G>A;rs2058878.T>A;rs2058878.T>C;rs2160733.A>C;rs2160734.C>G;rs2160734.C>T;rs2284411.C>T;rs890.A>C;rs890.A>G
HLA-A
*31:01;Reference
HLA-B
*15:02;*57:01;*58:01;Reference
HMGCR
Reference;rs10474433.T>C;rs10474433.T>G;rs12654264.A>T;rs17238540.T>G;rs17244841.A>T;rs17671591.C>T;rs3846662.A>G;rs3846662.A>T
HTR2A
Reference;rs17288723.T>C;rs17289304.T>C;rs17289304.T>G;rs1928040.G>A;rs1928040.G>C;rs2274639.C>G;rs2274639.C>T;rs2770296.C>G;rs2770296.C>T;rs3742278.A>G;rs3803189.T>G;rs6305.G>A;rs6311.C>A;rs6311.C>T;rs6312.C>A;rs6312.C>G;rs6312.C>T;rs6313.G>A;rs6313.G>C;rs6314.G>A;rs659734.G>A;rs659734.G>C;rs659734.G>T;rs7997012.A>C;rs7997012.A>G;rs7997012.A>T;rs9316233.C>A;rs9316233.C>G;rs9316233.C>T;rs9567746.A>C;rs9567746.A>G
HTR2C
Reference;rs1023574.C>G;rs1023574.C>T;rs12836771.A>G;rs1414334.C>G;rs2497538.A>C;rs3813928.G>A;rs3813929.C>G;rs3813929.C>T;rs498207.G>A;rs518147.C>A;rs518147.C>G;rs539748.C>T;rs6318.C>G;rs6318.C>T;rs9698290.T>A;rs9698290.T>C
IFNL3/4
Reference;rs12979860 variant (T)
IL6
Reference;rs10242595.G>A;rs10242595.G>C;rs10242595.G>T;rs10499563.T>C;rs1524107.C>G;rs1524107.C>T;rs1800795.C>G;rs1800795.C>T;rs1800796.G>A;rs1800796.G>C;rs1800797.A>C;rs1800797.A>G;rs1800797.A>T;rs2066992.G>A;rs2066992.G>C;rs2066992.G>T;rs2069835.T>C;rs2069837.A>C;rs2069837.A>G;rs2069840.C>G
ITGB3
Reference;rs11871251.G>A;rs11871251.G>C;rs2317676.A>G;rs3785873.G>A;rs3785873.G>T;rs58847127.G>A;rs58847127.G>C;rs58847127.G>T;rs5918.T>C;rs8069732.C>A;rs8069732.C>T
KIF6
Reference;rs20455.A>G;rs9462535.C>A;rs9462535.C>G;rs9462535.C>T;rs9471077.A>G
LPA
Reference;rs10455872.A>G;rs3798220.T>C
MT-RNR1
NC_012920.1:m.1520T>C;NC_012920.1:m.1537C>T;NC_012920.1:m.1556C>T;NC_012920.1:m.669T>C;NC_012920.1:m.747A>G;NC_012920.1:m.786G>A;NC_012920.1:m.807A>C;NC_012920.1:m.807A>G;NC_012920.1:m.839A>G;NC_012920.1:m.896A>G;NC_012920.1:m.930A>G;NC_012920.1:m.960delC;NC_012920.1:m.988G>A;Reference;rs1556422499.delT;rs200887992.G>A;rs267606617.A>G;rs267606618.T>C;rs267606619.C>T;rs28358569.A>G;rs28358571.T>C;rs28358572.T>C;rs3888511.T>G;rs56489998.A>G
MTHFR
Reference;rs1476413.C>G;rs1476413.C>T;rs17367504.A>G;rs17421511.G>A;rs1801131.T>G;rs1801133.G>A;rs1801133.G>C;rs2274976.C>T;rs3737967.G>A;rs4846051.G>A;rs4846051.G>C;rs4846051.G>T
NUDT15
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*2;*20;*3;*4;*5;*6;*7;*8;*9
OPRD1
Reference;rs1042114.G>C;rs1042114.G>T;rs10753331.G>A;rs10753331.G>T;rs12749204.A>G;rs204047.G>C;rs204047.G>T;rs204055.T>A;rs204055.T>C;rs204069.A>G;rs204076.T>A;rs204076.T>C;rs204076.T>G;rs2234918.C>G;rs2234918.C>T;rs2236855.C>A;rs2236855.C>G;rs2236857.T>C;rs2236861.G>A;rs2298895.A>T;rs2298896.T>G;rs2298897.C>G;rs3766951.T>C;rs419335.A>G;rs421300.A>C;rs421300.A>G;rs4654327.G>A;rs4654327.G>T;rs482387.G>A;rs482387.G>C;rs508448.A>G;rs529520.A>C;rs529520.A>G;rs533123.G>A;rs533123.G>C;rs569356.A>G;rs581111.A>C;rs581111.A>G;rs581111.A>T;rs6669447.T>C;rs678849.C>G;rs678849.C>T;rs680090.G>A;rs760589.G>A;rs797397.G>A
OPRK1
Reference;rs10111937.C>T;rs1051660.C>A;rs1051660.C>G;rs1051660.C>T;rs16918842.C>A;rs16918842.C>T;rs16918875.G>A;rs16918909.A>G;rs16918941.A>G;rs3802279.C>T;rs3802281.T>C;rs3808627.C>G;rs3808627.C>T;rs6473797.T>C;rs6473799.A>G;rs6985606.T>A;rs6985606.T>C;rs7016778.A>T;rs702764.T>C;rs702764.T>G;rs7813478.T>C;rs963549.C>T;rs997917.T>C
OPRM1
Reference;rs10457090.A>G;rs10457090.A>T;rs10485057.A>G;rs10485058.A>G;rs10485060.C>A;rs1074287.A>G;rs11575856.G>A;rs12190259.A>C;rs12205732.G>A;rs12209447.C>T;rs12210856.T>G;rs1294092.A>G;rs1319339.T>A;rs1319339.T>C;rs13195018.A>C;rs13195018.A>T;rs13203628.A>G;rs1323040.A>G;rs1323042.G>C;rs1323042.G>T;rs1381376.C>A;rs1381376.C>G;rs1381376.C>T;rs1461773.G>A;rs17174629.A>G;rs17174794.C>G;rs17174794.C>T;rs17174801.A>G;rs17180982.dup;rs17181352.A>G;rs1799971.A>G;rs1799972.C>A;rs1799972.C>G;rs1799972.C>T;rs1852629.T>A;rs1852629.T>C;rs1852629.T>G;rs2010884.G>A;rs2075572.G>C;rs2236256.C>A;rs2236257.G>C;rs2236258.C>G;rs2236258.C>T;rs2236259.T>A;rs2236259.T>C;rs2236259.T>G;rs2281617.C>G;rs2281617.C>T;rs3778148.G>T;rs3778150.T>C;rs3778151.T>C;rs3778152.A>G;rs3778156.A>G;rs3798676.C>T;rs3798677.A>G;rs3798678.A>C;rs3798678.A>G;rs3798683.G>A;rs3798688.G>T;rs3823010.G>A;rs483481.G>A;rs483481.G>C;rs4870266.G>A;rs495491.A>G;rs497976.G>A;rs497976.G>T;rs499796.A>G;rs506247.A>C;rs510769.C>T;rs511435.C>G;rs511435.C>T;rs518596.G>A;rs524731.C>A;rs527434.T>A;rs527434.T>C;rs538174.T>C;rs540825.A>C;rs540825.A>G;rs540825.A>T;rs544093.G>A;rs544093.G>T;rs548646.T>A;rs548646.T>C;rs548646.T>G;rs553202.C>T;rs558025.A>G;rs558948.C>G;rs558948.C>T;rs562859.C>A;rs562859.C>G;rs562859.C>T;rs563649.C>T;rs569284.A>C;rs583664.T>C;rs589046.C>T;rs598160.G>A;rs598160.G>C;rs598682.A>C;rs598682.A>G;rs598682.A>T;rs599548.G>A;rs606545.G>A;rs606545.G>C;rs609148.G>A;rs609148.G>T;rs609623.T>A;rs609623.T>C;rs610231.G>A;rs610231.G>C;rs613355.C>A;rs613355.C>G;rs613355.C>T;rs618207.A>C;rs618207.A>G;rs618207.A>T;rs62436463.C>T;rs62638690.G>T;rs632499.A>C;rs632499.A>G;rs632499.A>T;rs639855.C>A;rs639855.C>G;rs642489.G>A;rs642489.G>T;rs644261.G>A;rs644261.G>C;rs644261.G>T;rs645027.A>G;rs647192.G>A;rs647192.G>C;rs648007.A>C;rs648007.A>G;rs648893.A>G;rs650825.G>A;rs6557337.C>A;rs6557337.C>T;rs658156.A>C;rs658156.A>G;rs658156.A>T;rs671531.A>G;rs671531.A>T;rs675026.A>C;rs675026.A>G;rs677830.C>A;rs677830.C>G;rs677830.C>T;rs6812
43.T>A;rs681243.T>C;rs6902403.T>C;rs6912029.G>T;rs73576470.A>G;rs7748401.T>G;rs7763748.C>A;rs7763748.C>T;rs7776341.A>C;rs79910351.C>T;rs9282815.C>A;rs9282815.C>T;rs9322446.G>A;rs9322447.A>C;rs9322447.A>G;rs9322447.A>T;rs9322453.G>C;rs9371773.G>A;rs9371776.G>A;rs9384174.C>G;rs9384174.C>T;rs9384179.G>A;rs9384179.G>T;rs9397685.A>G;rs9397685.A>T;rs9397687.C>T;rs9479757.G>A;rs9479779.A>G
RYR1
NC_000019.10:g.38440818G>C;NC_000019.10:g.38444179C>A;NC_000019.10:g.38444252G>T;NC_000019.10:g.38444257A>C;NC_000019.10:g.38444257A>G;NC_000019.10:g.38448680_38448681insGGA;NC_000019.10:g.38448715G>A;NC_000019.10:g.38451785C>A;NC_000019.10:g.38452985C>T;NC_000019.10:g.38455253C>G;NC_000019.10:g.38455254T>C;NC_000019.10:g.38455347T>C;NC_000019.10:g.38455504G>T;NC_000019.10:g.38466392G>A;NC_000019.10:g.38469404A>C;NC_000019.10:g.38485679T>C;NC_000019.10:g.38486095A>G;NC_000019.10:g.38490642A>C;NC_000019.10:g.38494454G>A;NC_000019.10:g.38496455G>A;NC_000019.10:g.38499234T>C;NC_000019.10:g.38499642C>A;NC_000019.10:g.38499667G>A;NC_000019.10:g.38499667G>T;NC_000019.10:g.38499680T>A;NC_000019.10:g.38499683G>A;NC_000019.10:g.38499696C>G;NC_000019.10:g.38499719A>G;NC_000019.10:g.38499730G>A;NC_000019.10:g.38499985A>T;NC_000019.10:g.38500000G>A;NC_000019.10:g.38502669C>G;NC_000019.10:g.38504298G>A;NC_000019.10:g.38506508C>G;NC_000019.10:g.38506865C>T;NC_000019.10:g.38507821C>T;NC_000019.10:g.38512279G>A;NC_000019.10:g.38515052C>T;NC_000019.10:g.38516181T>C;NC_000019.10:g.38516208G>C;NC_000019.10:g.38517470T>C;NC_000019.10:g.38517523T>A;NC_000019.10:g.38519424C>A;NC_000019.10:g.38519432A>T;NC_000019.10:g.38519447A>G;NC_000019.10:g.38525432C>T;NC_000019.10:g.38527710G>C;NC_000019.10:g.38528372G>T;NC_000019.10:g.38529002G>C;NC_000019.10:g.38529042C>T;NC_000019.10:g.38543380A>T;NC_000019.10:g.38543566G>A;NC_000019.10:g.38543810C>T;NC_000019.10:g.38548253A>T;NC_000019.10:g.38561140G>C;NC_000019.10:g.38561213C>T;NC_000019.10:g.38561362G>A;NC_000019.10:g.38561363G>T;NC_000019.10:g.38565023T>G;NC_000019.10:g.38570649C>G;NC_000019.10:g.38577931A>C;NC_000019.10:g.38578205G>T;NC_000019.10:g.38580039_38580040delinsAA;NC_000019.10:g.38580041C>A;NC_000019.10:g.38580126C>G;NC_000019.10:g.38580397G>C;NC_000019.10:g.38580416C>T;NC_000019.10:g.38585078A>G;NC_000019.10:g.38585099G>A;NC_000019.10:g.38586190A>G;NC_000019.10:g.38587362G>C;NC_000019.10:g.38587363G>C;Reference;rs111272095.C>T;rs11
1364296.G>A;rs111565359.G>A;rs111657878.T>C;rs111888148.G>A;rs112151058.G>A;rs112196644.A>G;rs112563513.G>A;rs112596687.T>A;rs112772310.G>A;rs113210953.A>G;rs113332073.G>A;rs113332073.G>T;rs117886618.C>G;rs118192113.C>A;rs118192116.C>G;rs118192116.C>T;rs118192121.A>C;rs118192122.G>A;rs118192123.T>C;rs118192124.C>T;rs118192126.A>G;rs118192130.G>A;rs118192135.G>A;rs118192140.C>T;rs118192151.G>A;rs118192151.G>C;rs118192158.G>A;rs118192159.C>G;rs118192160.G>A;rs118192160.G>T;rs118192161.C>T;rs118192162.A>C;rs118192162.A>G;rs118192163.G>A;rs118192163.G>C;rs118192163.G>T;rs118192167.A>G;rs118192168.G>A;rs118192170.T>C;rs118192172.C>T;rs118192175.C>T;rs118192176.G>A;rs118192177.C>G;rs118192177.C>T;rs118192178.C>G;rs118192178.C>T;rs118192181.C>T;rs118204421.C>T;rs118204422.T>C;rs118204423.G>A;rs118204423.G>C;rs121918592.G>A;rs121918592.G>C;rs121918593.G>A;rs121918594.G>A;rs121918594.G>T;rs121918595.C>T;rs121918596._38499648delGAG;rs137932199.G>A;rs137933390.A>G;rs138874610.G>A;rs139161723.G>A;rs139647387.A>G;rs140152019.G>A;rs140616359.G>A;rs141646642.C>G;rs141942845.G>A;rs142474192.G>A;rs142474192.G>T;rs143398211.G>A;rs143520367.C>T;rs143987857.G>A;rs143988412.A>G;rs143988412.A>T;rs144336148.G>A;rs144685735.C>T;rs145573319.A>G;rs145801146.C>T;rs146306934.G>A;rs146429605.A>G;rs146504767.G>A;rs146876145.C>T;rs147136339.A>G;rs147213895.A>G;rs147303895.G>A;rs147707463.C>T;rs147723844.A>G;rs148399313.G>A;rs148623597.G>A;rs150396398.G>C;rs151029675.C>T;rs151119428.G>A;rs1801086.G>A;rs1801086.G>C;rs1801086.G>T;rs180714609.G>A;rs186983396.C>G;rs186983396.C>T;rs192863857.C>T;rs193922744.T>G;rs193922745._38440752delTGA;rs193922746.A>G;rs193922747.T>C;rs193922748.C>T;rs193922749.C>A;rs193922750.C>A;rs193922751.G>A;rs193922752.A>G;rs193922753.G>A;rs193922753.G>T;rs193922754.G>A;rs193922755.G>A;rs193922756.A>G;rs193922757.C>T;rs193922759.G>A;rs193922760.A>T;rs193922761.G>T;rs193922762.C>A;rs193922762.C>T;rs193922764.C>A;rs193922764.C>G;rs193922764.C>T;rs193922766.G>A;rs193922766.G>T;rs
193922767.G>A;rs193922767.G>T;rs193922768.C>A;rs193922768.C>T;rs193922769.T>C;rs193922769.T>G;rs193922770.C>T;rs193922772.G>A;rs193922772.G>T;rs193922775.C>T;rs193922776.C>T;rs193922777.C>T;rs193922781.C>T;rs193922782.T>G;rs193922783.T>A;rs193922788.G>C;rs193922789.G>A;rs193922790.A>T;rs193922791.C>T;rs193922792.G>T;rs193922793.T>A;rs193922795.G>A;rs193922797.G>A;rs193922798.G>C;rs193922799.G>A;rs193922801.A>G;rs193922802.G>A;rs193922803.C>T;rs193922804.A>G;rs193922805.T>G;rs193922806.C>G;rs193922807.G>C;rs193922809.G>A;rs193922810.G>A;rs193922810.G>T;rs193922812.C>T;rs193922813.G>C;rs193922815.G>A;rs193922815.G>C;rs193922816.C>T;rs193922817.C>T;rs193922818.G>A;rs193922819.T>C;rs193922822.C>G;rs193922822.C>T;rs193922824.C>T;rs193922826.C>G;rs193922826.C>T;rs193922827.G>C;rs193922828.G>A;rs193922829.G>A;rs193922830.C>T;rs193922831.T>A;rs193922832.G>A;rs193922833.G>A;rs193922834.G>A;rs193922838.G>A;rs193922838.G>T;rs193922839.G>A;rs193922840.T>G;rs193922842.C>G;rs193922842.C>T;rs193922843.G>T;rs193922844.C>A;rs193922848.A>T;rs193922849.C>A;rs193922850.T>C;rs193922852.G>C;rs193922852.G>T;rs193922853.A>T;rs193922855.C>T;rs193922860.G>A;rs193922862._38572267delinsCT;rs193922863.C>T;rs193922864.T>C;rs193922865.T>G;rs193922866.G>A;rs193922867.C>T;rs193922868.G>A;rs193922873.G>A;rs193922873.G>T;rs193922874.T>C;rs193922876.C>T;rs193922877.delA;rs193922878.C>G;rs193922879.G>A;rs193922880.C>G;rs193922883.T>C;rs193922888.G>A;rs193922895.C>A;rs193922896.G>T;rs193922898.T>A;rs199738299.A>G;rs199870223.C>T;rs200766617.G>A;rs201321695.A>G;rs2145447772.G>A;rs2145447772.G>C;rs28933396.G>A;rs28933396.G>T;rs28933397.C>T;rs34390345.A>G;rs34694816.A>G;rs34934920.C>T;rs35180584.C>G;rs35364374.G>T;rs370634440.G>A;rs370634440.G>T;rs372958050.T>C;rs373406011.C>T;rs375626634.T>C;rs375915752.C>T;rs376149732.C>T;rs4802584.C>G;rs537994744.G>A;rs549201486.C>T;rs551223467.C>T;rs553055844.G>A;rs55876273.G>C;rs587784372.C>T;rs63749869.G>A;rs727504129.C>T;rs746818096.T>A;rs747177274.G>C;rs748575133.T
>A;rs749040743.G>A;rs751180702.G>A;rs752652072.C>T;rs754476250.C>T;rs754785770.A>G;rs755088027.G>A;rs756850145.A>G;rs757753317.G>A;rs759500310.T>C;rs761616815.G>A;rs762401851.G>A;rs763112609.C>T;rs763352221.C>T;rs767553612.A>G;rs768360593.G>A;rs768535909.T>C;rs769482889.C>T;rs770593660.G>C;rs771058055.G>A;rs771741606.C>T;rs773040531.A>G;rs778241277.G>A;rs781104539.A>G;rs781126470.C>T;rs901087791.G>A;rs914804033.G>A;rs914804033.G>C;rs917523269.C>T;rs936513262.G>A;rs959170123.G>A;rs976108591.A>G;rs995399438.T>C
SLCO1B1
*1;*10;*11;*12;*13;*14;*15;*16;*19;*2;*20;*23;*24;*25;*26;*27;*28;*29;*3;*30;*31;*32;*33;*34;*36;*37;*38;*39;*4;*40;*41;*42;*43;*44;*45;*46;*47;*5;*6;*7;*8;*9
TNF
Reference;rs1799724.C>T;rs1799964.T>C;rs1800610.G>A;rs1800629.G>A;rs1800630.C>A;rs1800750.G>A;rs2736195.A>G;rs3093548.C>T;rs3093662.A>G;rs3093726.T>C;rs361525.G>A;rs4248158.C>T;rs4248159.C>A;rs4248160.G>A;rs4248163.C>A;rs4248163.C>G;rs4248163.C>T;rs4647198.C>T;rs4987086.G>A;rs55634887.G>A;rs55994001.C>A;rs55994001.C>T
TPMT
*1;*10;*11;*12;*13;*14;*15;*16;*17;*18;*19;*2;*20;*21;*22;*23;*24;*25;*26;*27;*28;*29;*30;*31;*32;*33;*34;*35;*36;*37;*38;*39;*3A;*3B;*3C;*4;*40;*41;*42;*43;*44;*5;*6;*7;*8;*9
UGT1A1
*1;*27;*28;*36;*37;*6;*80;*80+*28;*80+*37
UGT1A4
*1a;*1b;*1c;*2;*3a;*3b;*4;*7
UGT2B15
*1;*2;*3;*4;*5;*6;*7
VKORC1
Reference;rs9923231 variant (T)
YEATS4
Reference;rs7297610.C>T
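Each coverage entry above is a semicolon-delimited list that mixes star alleles, rsID-based variants, and a Reference entry. The following Python sketch shows one way such a cell could be split and classified for downstream filtering. It is illustrative only and is not part of DRAGEN Array; the function names (split_alleles, classify, star_sort_key) are hypothetical, and the three sample rows are copied from the table above.

```python
import re

# Sample rows copied from the coverage table (gene -> semicolon-delimited cell).
COVERAGE = {
    "UGT1A1": "*1;*27;*28;*36;*37;*6;*80;*80+*28;*80+*37",
    "F2": "Reference;rs1799963.G>A;rs3136516.G>A;rs5896.C>G;rs5896.C>T",
    "VKORC1": "Reference;rs9923231 variant (T)",
}


def split_alleles(cell):
    """Split one table cell on ';' and drop empty fragments."""
    return [item.strip() for item in cell.split(";") if item.strip()]


def classify(allele):
    """Rough label for an entry (hypothetical categories, not an Illumina scheme)."""
    if allele == "Reference":
        return "reference"
    if allele.startswith("*"):
        return "star"
    if re.match(r"rs\d+", allele):
        return "rsid"
    return "other"


def star_sort_key(allele):
    """Sort star alleles by their leading number instead of lexically."""
    match = re.match(r"\*(\d+)(.*)", allele)
    if match:
        return (int(match.group(1)), match.group(2))
    return (float("inf"), allele)


if __name__ == "__main__":
    for gene, cell in COVERAGE.items():
        alleles = split_alleles(cell)
        labels = sorted({classify(a) for a in alleles})
        print(gene, len(alleles), labels)
    print(sorted(split_alleles(COVERAGE["UGT1A1"]), key=star_sort_key))
```

The table lists star alleles in lexical order (for example, *1, *27, *28, *6), so a numeric sort key like the one above can make long lists such as CYP2D6 easier to scan.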
Copy number variation can be detected for the genes and regions listed below. Chromosome locations are based on GRCh38.
Gene      Target region     Chromosome          Start       End
GSTM1     GSTM1             1                   109687842   109693526
UGT2B17   UGT2B17           4                   68537222    68568499
CYP2E1    CYP2E1            10                  133527374   133539096
SULT1A1   SULT1A1           16                  28603587    28613544
CYP2A6    CYP2A6.intron.7   19                  40844791    40845293
CYP2A6    CYP2A6.exon.1     19                  40850267    40850414
CYP2D6    CYP2D6.exon.9     22                  42126498    42126752
CYP2D6    CYP2D6.intron.2   22                  42129188    42129734
CYP2D6    CYP2D6.p5         22                  42130886    42131379
GSTT1     GSTT1             22_KI270879v1_alt   270316      278477
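To make the region list easier to use programmatically, the sketch below encodes the ten CNV target regions and looks up which region, if any, contains a given GRCh38 position. It is a minimal illustration rather than DRAGEN Array code: the names CnvRegion, CNV_REGIONS, and regions_at are hypothetical, and treating the start and end coordinates as 1-based and inclusive is an assumption that this document does not confirm.

```python
from typing import NamedTuple


class CnvRegion(NamedTuple):
    gene: str
    region: str
    chrom: str
    start: int
    end: int


# Values copied from the CNV target region list (GRCh38).
CNV_REGIONS = [
    CnvRegion("GSTM1", "GSTM1", "1", 109687842, 109693526),
    CnvRegion("UGT2B17", "UGT2B17", "4", 68537222, 68568499),
    CnvRegion("CYP2E1", "CYP2E1", "10", 133527374, 133539096),
    CnvRegion("SULT1A1", "SULT1A1", "16", 28603587, 28613544),
    CnvRegion("CYP2A6", "CYP2A6.intron.7", "19", 40844791, 40845293),
    CnvRegion("CYP2A6", "CYP2A6.exon.1", "19", 40850267, 40850414),
    CnvRegion("CYP2D6", "CYP2D6.exon.9", "22", 42126498, 42126752),
    CnvRegion("CYP2D6", "CYP2D6.intron.2", "22", 42129188, 42129734),
    CnvRegion("CYP2D6", "CYP2D6.p5", "22", 42130886, 42131379),
    CnvRegion("GSTT1", "GSTT1", "22_KI270879v1_alt", 270316, 278477),
]


def regions_at(chrom, pos):
    """Return names of CNV target regions containing the position.

    Assumption: start and end are 1-based and inclusive.
    A leading "chr" on the chromosome name is ignored.
    """
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return [
        r.region
        for r in CNV_REGIONS
        if r.chrom == chrom and r.start <= pos <= r.end
    ]


if __name__ == "__main__":
    print(regions_at("chr22", 42126600))
    print(regions_at("19", 40850300))
    print(len({r.gene for r in CNV_REGIONS}), "genes,", len(CNV_REGIONS), "regions")
```

Note that GSTT1 is located on an alternate contig (22_KI270879v1_alt) rather than on the primary chromosome 22 assembly, so a lookup against chromosome 22 does not match it.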
The version history for DRAGEN Array product documentation:
Version   Date             Description of change
01        December 2023    Initial release.
02        March 2024       Added details for DRAGEN Array v1.0.0 cloud genotype pipeline release.
03        May 2024         Added details for DRAGEN Array methylation QC pipeline v1.0.0 release. Error correction in the CNV VCF example (CN=4 to CN=5).
04        September 2024   DRAGEN Array v1.1.0 release.