
DRAGEN Array v1.0

Overview

Welcome to DRAGEN Array

DRAGEN (Dynamic Read Analysis for GENomics) Array secondary analysis is a powerful bioinformatics software for Illumina Infinium array-based assays. DRAGEN Array uses cutting-edge data analysis tools to provide accurate, comprehensive, and highly efficient secondary analysis to maximize genomic insights and meet your research needs across multiple applications.

DRAGEN Array is offered as a local package with a command-line interface (no specialized server or hardware required) and as a cloud-based package with an intuitive graphical user interface, as summarized in the table below.

DRAGEN Array Applications

The following Types of Analysis are currently supported by DRAGEN Array:

DRAGEN Array – Genotyping
DRAGEN Array – PGx – CNV calling
DRAGEN Array – PGx – Star allele annotation
DRAGEN Array – Methylation QC


Product Guides

DRAGEN Array Cloud Analysis

DRAGEN Array Cloud Analysis Overview

DRAGEN Array Cloud utilizes the user-friendly graphical interface of BaseSpace Sequence Hub to simplify DRAGEN Array analysis setup and kickoff. Optional integration with the iScan System allows data to be streamed directly from the instrument to the cloud platform. Analysis data is stored on the Illumina Connected Platform providing secure storage for both microarray and sequencing data.

Getting Started

The following prerequisites are needed to get started with DRAGEN Array Cloud:

Illumina Connected Analytics subscription: An ICA Basic, Professional, or Enterprise subscription can be used. Each includes access to BaseSpace Sequence Hub. Follow the Illumina Software Registration Guide to register the software.
Workgroup setup: Workgroups must be created before login. Using a workgroup allows all members of the workgroup to share access to resources, analyses, and data. Learn more about managing a Workgroup.
- Designating a workgroup as ‘Collaborative’ allows projects to be shared with collaborators or Illumina Tech Support to assist with troubleshooting. To create a collaborative workgroup, select the Enable collaborators outside of this domain checkbox during workgroup creation.
Software consumables: iCredits can be purchased for storage on the cloud platform and analysis pipelines with a compute charge. Per sample analysis can be purchased for relevant pipelines as listed in section Applications. Follow the Illumina Software Registration Guide (found under Example 3: Configuring the Software Consumables) to register the software consumables.
[Optional] iScan integration: The iScan System is integrated with Illumina Connected Platform and can send IDATs for further analysis. The iScan System must be running iScan Control Software version 4.2.1 or later.
- Instructions to Use Illumina Connected Analytics (ICA) with the iScan System
- Troubleshooting iScan integration
EULA acceptance: Accept all necessary End User License Agreements in BaseSpace Sequence Hub before scanning begins.
Internet connection: For uploading product files or IDATs, a network connection 1 GbE or faster is recommended.

Note: Accessioning BeadChips before scanning and starting analysis is no longer a required step and has been automated within the system.

Running Analysis

Before beginning analysis, ensure workgroup context is being used so analysis can be viewed by all members of your workgroup. The name of your workgroup should appear in the top right corner.

Use the following steps to run the Microarray Analysis Setup on BaseSpace Sequence Hub:

Select the Runs tab
Select New Run
Select Microarray Analysis Setup
Enter the Analysis Name (Figure 1)
Use the Select Project link to choose the project for your output files. To select an existing project, click the radio button next to the desired project name. You can also create a project by clicking the New button in the project selection window.
Select the Type of Analysis. Further detail on each Type of Analysis is available in section Applications.
(Optional) Create a custom configuration via the "Add Custom Configuration" option in Configuration Settings. Custom configurations must be assigned a name and product files can be uploaded or selected (Figure 2). Custom configuration options vary by Type of Analysis including:

DRAGEN Array - Genotyping provides flexibility for turning specific output files on or off and adjusting the GenCall score cutoff. It is recommended to turn off VCF output for non-human species and Final Report output for large sample numbers.
DRAGEN Array - Methylation - QC provides options to adjust thresholds as detailed in section DRAGEN Array Methylation QC Threshold Adjustment.

Select your preferred option in the Configuration Settings drop-down menu. Configuration setup will vary based on the Type of Analysis selected. More details are available in section Applications.
Select Next
Select either Import Sample Sheet, Select BeadChips, or Import IDAT Files (Figure 3)

Import Sample Sheet presents a link to upload sample sheet. Users may download a template sample sheet by selecting the Download Template link.
Select BeadChips allows users to select BeadChips from the displayed list of available BeadChips. To select specific samples within a BeadChip, use the Import Sample Sheet option instead.
Import IDAT Files allows users to upload the IDAT files from a local folder to the cloud platform for use with the current and future analyses by users within the same workgroup.

Select Launch Analysis

View Outputs

On the Analyses tab, view the analysis status, e.g., initializing or complete.
After the analysis is complete, select the analysis and select the Files tab.
From the Files tab, select the Output folder.

DRAGEN Array Methylation QC

Threshold Adjustment

When using DRAGEN Array – Methylation – QC cloud analysis type, additional customization options will appear after product files are selected within Configuration Settings. Adjustments to these thresholds will be saved as part of the Configuration Setting. Thresholds can be adjusted based on study objectives. Adjusting thresholds will impact the pass or fail status of samples in the output files.

Illumina recommends thresholds for MethylationEPIC v1 & v2 and Methylation Screening Array (MSA). Users may use these thresholds as a starting point when defining thresholds for their custom or semi-custom BeadChip or other Infinium Methylation arrays. Further tuning may be required based on the BeadChip used, laboratory conditions, iScan settings, bisulfite conversion methods, FFPE sample type, etc. A dataset deemed acceptable to the user based on the proportion of probes passing can be used for these additional threshold adjustments.

To customize thresholds, use the toggle to allow additional thresholds to be displayed and adjust as desired by typing in a numeric value or using the arrows to adjust up or down. Further detail of these thresholds including calculation method can be found in the Methylation Sample QC Summary Files section.

The recommended thresholds are pre-set within the software for MethylationEPIC and Methylation Screening Array with the following values:

Threshold

Methylation Screening Array

MethylationEPIC

StainingGreen

StainingRed

ExtensionGreen

ExtensionRed

HybridizationHighMedium

HybridizationMediumLow

TargetRemoval1

TargetRemoval2

BisulfiteConversion1Green

BisulfiteConversion1BackgroundGreen

0.5

BisulfiteConversion1Red

BisulfiteConversion1BackgroundRed

0.5

BisulfiteConversion2

0.5

BisulfiteConversion2Background

0.5

Specificity1Green

Specificity1Red

Specificity2

Specificity2Background

NonpolymorphicGreen

2.5

NonpolymorphicRed

BgCorrectionOffset

3000

PvalThreshold

0.05

The first 21 rows in the tables correspond to the 21 control metrics used in the methylation sample QC. See section Methylation Sample QC Summary Files for details.

DRAGEN Array Methylation QC and GenomeStudio Methylation Module Differences

DRAGEN Array Methylation QC software provides automated methylation sample QC using assay control probes on the Infinium Methylation Arrays. Unlike the manual visual QC in GenomeStudio, DRAGEN Array utilizes 21 numerical metrics defined based on the control probes and uses standard thresholds to determine the pass/fail status of a sample. Unlike GenomeStudio, probe detection rate (the proportion of probes passing at a given p-value threshold) is not utilized to determine sample pass/fail status in DRAGEN Array.

DRAGEN Array Methylation QC performs background normalization, dye bias correction, and detection p-value calculation differently in comparison to the GenomeStudio Methylation module, leading to differences in probe detection p-values and detection rates. For the GenomeStudio Methylation Module, non-cancer samples at standard DNA input typically have detection rate > 96%. The detection rates from DRAGEN Array Methylation QC are typically lower compared to GenomeStudio, because the detection p-value from DRAGEN Array is more stringent than that from the GenomeStudio Methylation Module. The table below shows example detection rates from the DRAGEN Array Methylation QC software from MSA (Methylation Screening Array) datasets.

Dataset

Min detection rate

Mean detection rate

Sample Count

86%

93%

220

61%

83%

951

63%

85%

77%

85%

Note that only samples passing QC are included and all samples are at or above 50 ng DNA input. The detection p-value threshold is 0.05.
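As an illustration of how a detection rate relates to the p-value threshold, the following minimal Python sketch computes the proportion of probes passing for one sample. The function name and the use of a strict "less than" comparison are assumptions made for this example. It does not reproduce how DRAGEN Array calculates detection p-values.

```python
from typing import Iterable


def detection_rate(p_values: Iterable[float], threshold: float = 0.05) -> float:
    """Return the fraction of probes whose detection p-value is below the threshold.

    Assumption: a probe counts as detected when p < threshold (strict comparison).
    """
    values = list(p_values)
    if not values:
        raise ValueError("no probe p-values supplied")
    passing = sum(1 for p in values if p < threshold)
    return passing / len(values)


# Five hypothetical probe p-values; three of them fall below 0.05.
example_p_values = [0.001, 0.2, 0.04, 0.0005, 0.5]
print(f"{detection_rate(example_p_values):.0%}")
```

Raising or lowering the threshold argument shows how strongly the reported rate depends on the chosen cutoff.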

Known Issues

DRAGEN Array Methylation QC cloud v1.0.0 is released on BSSH version 7.21. There are two known issues with the BSSH UI that impact Methylation QC analyses.

Sample sheets with more than 144 samples lead to undefined failures. The issue impacts Methylation QC as well as other analysis types. To process larger sample numbers (up to 1152 for DRAGEN Array Methylation QC), separate the samples into batches of 144 when using a sample sheet, or select BeadChips directly from the BeadChip table.
Uploading custom methylation BPM manifests results in upload failure. The issue only impacts Methylation QC and not other analysis types. For assistance uploading these manifests, contact Illumina Tech Support.
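The sample sheet workaround above can be scripted. The sketch below splits a list of sample sheet rows into batches of at most 144 so that each batch can be saved as its own sample sheet. The function name is hypothetical and the rows are treated as opaque values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(rows: Sequence[T], batch_size: int = 144) -> List[List[T]]:
    """Split sample sheet rows into consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]


# 300 hypothetical sample rows are split into batches of 144, 144, and 12.
rows = [f"sample_{n}" for n in range(300)]
print([len(batch) for batch in split_into_batches(rows)])
```

With the 1152-sample maximum stated for Methylation QC, this yields eight full batches.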

Troubleshooting and Additional Support

Troubleshooting iScan integration

The firewall protects the iScan control computer by filtering incoming traffic to remove potential threats. The firewall is enabled by default to block all inbound connections. Keep the firewall enabled and allow outbound connections.

For the instrument to connect to BaseSpace Sequence Hub, you will need to add regional platform endpoints and instrument specific endpoints to the allow list on your firewall. Regional endpoints and further detail can be found in Security and Networking for Illumina instrument control computers.

The following table shows the applicable endpoints for the iScan.

Endpoint

Sharing a project

Project sharing allows a user to share files with users outside the workgroup for collaboration or with Illumina Tech Support for troubleshooting. To share a project on BaseSpace Sequence Hub, first set the Workgroup type as ‘Collaborative’ during Workgroup setup, and then use the following steps to obtain a link to your project. The project can then be accessed by anyone with the link. All files in the project are shared.

Navigate to the Projects tab
Click the button next to the desired project
Select the Share button above the list (Figure 3)
Select the Get Link option to activate a link for the project
Copy the link and send it to the desired recipient(s)

Note: The project owner maintains ownership and write access. If the project owner deletes the data, the collaborators lose access to it.

DRAGEN Array Local Analysis

DRAGEN Array Local Overview

DRAGEN Array provides accurate, comprehensive, and efficient analysis of Infinium microarray data. The local command-line interface makes it easy for power users to have granular control and flexibility to support large scale microarray genomic studies.

Getting Started

Computing Requirements

Before downloading and installing the software, ensure the following specifications are met for best performance:

Quota Specifications

Internet is required to do a software license check and ensure paid quota is available for all samples in the analysis batch. For the software license check, the following endpoint is used: license.edicogenome.com.

Installation

Please follow the steps below to install the software on your compute infrastructure:

Unzip and extract the package. The executable can be found in the dragena subfolder of the software download after extraction.
To check that the DRAGEN Array installation was successful, follow these steps:
- Open a command prompt (Windows) or terminal (Linux).
- [Optional] Add /path/to/dragena/, e.g. /usr/local/bin/dragena-linux-x64-DAv1.0.0/dragena/, to your PATH – to access the executable anywhere in the folder structure
- Execute the following command: /path/to/dragena/dragena version, or if the environmental variable PATH is set: dragena version

If the installation was successful, the version of the software is displayed in the terminal window.
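For automated setups, the same check can be run from a script. The following Python sketch looks for the executable on the PATH and runs the version command described above. The function name is an assumption for illustration, and the format of the version text is not specified here, so the output is returned unparsed.

```python
import shutil
import subprocess
from typing import Optional


def dragena_version(executable: str = "dragena") -> Optional[str]:
    """Return the text printed by `<executable> version`, or None if unavailable."""
    path = shutil.which(executable)  # None when the executable is not on the PATH
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Calling dragena_version() returns None when the executable cannot be found, which usually means the dragena folder has not been added to the PATH.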

Run DRAGEN Array Local

For genotyping analysis, there is no sample minimum required to run analysis.

To optimize performance of the targeted PGx CNV caller and minimize batch effect, it is recommended to:

Analyze samples that were processed together in one batch
Avoid combining sample batches processed on different reagent lots.
Analyze batches of 96 samples or more
Use the CN Model and PGx Database File provided as part of the standard product files

Quick Start

Command examples show analysis for a Linux system using folders instead of sample sheets. For Windows users, make sure to adapt the file paths in the commands to Windows conventions, e.g., using backslash (\) instead of forward-slash (/). A sample sheet can be used to select specific samples out of a folder.

Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any other desired directory if the executable was added to the PATH environmental variable.
Use the genotype call command to call genotypes and generate GTC files using IDAT files as input. dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc
Use the genotype gtc-to-vcf command to create SNV VCF files from the GTC files generated by the genotype call command. dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/manifest.bpm --csv-manifest /user/productfiles/manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --output-folder /user/vcf
Use the copy-number call command to call PGx CNVs from the GTC files and produce CNV VCF files. It is recommended to use the same output folder used for SNV VCF since the star-allele call command accepts one VCF folder with SNV and CNV VCFs. dragena copy-number call --cn-model /user/productfiles/cnv_model.dat --gtc-folder /user/gtc --output-folder /user/vcf
Use the star-allele call command to generate star allele calls using the CNV and SNV VCF files generated by the gtc-to-vcf and copy-number call commands. dragena star-allele call --vcf-folder /user/vcf --database /user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip --output-folder /user/star-alleles --license-server-url https://username:password@license.edicogenome.com
Use the star-allele annotate command to summarize the star alleles and add metabolizer statuses to the star alleles generated by the star-allele call command. Guidelines (CPIC or DPWG) can be specified. dragena star-allele annotate --star-alleles star_alleles.csv --guidelines CPIC --output-folder /user/metabolizer-statuses
[Optional] Use the copy-number train command to retrain the copy number model. dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --platform LCG --output-folder /user/productfiles/cnmodelnew
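The Quick Start steps can be chained from a script. The Python sketch below only assembles the argument lists for the genotype call, gtc-to-vcf, copy-number call, star-allele call, and star-allele annotate steps, using the options shown above. The function name and dictionary keys are hypothetical, and the paths are the example paths from this section.

```python
import shlex
from typing import Dict, List


def build_pgx_commands(p: Dict[str, str]) -> List[List[str]]:
    """Return the Quick Start commands, in run order, as argument lists."""
    return [
        ["dragena", "genotype", "call",
         "--bpm-manifest", p["bpm_manifest"],
         "--cluster-file", p["cluster_file"],
         "--idat-folder", p["idat_folder"],
         "--output-folder", p["gtc_folder"]],
        ["dragena", "genotype", "gtc-to-vcf",
         "--bpm-manifest", p["bpm_manifest"],
         "--csv-manifest", p["csv_manifest"],
         "--genome-fasta-file", p["genome_fasta"],
         "--gtc-folder", p["gtc_folder"],
         "--output-folder", p["vcf_folder"]],
        # SNV and CNV VCFs share one folder, as recommended above.
        ["dragena", "copy-number", "call",
         "--cn-model", p["cn_model"],
         "--gtc-folder", p["gtc_folder"],
         "--output-folder", p["vcf_folder"]],
        ["dragena", "star-allele", "call",
         "--vcf-folder", p["vcf_folder"],
         "--database", p["pgx_database"],
         "--output-folder", p["star_allele_folder"],
         "--license-server-url", p["license_server_url"]],
        ["dragena", "star-allele", "annotate",
         "--star-alleles", p["star_alleles_csv"],
         "--guidelines", p["guidelines"],
         "--output-folder", p["metabolizer_folder"]],
    ]


example_paths = {
    "bpm_manifest": "/user/productfiles/manifest.bpm",
    "csv_manifest": "/user/productfiles/manifest.csv",
    "cluster_file": "/user/productfiles/clusterfile.egt",
    "genome_fasta": "/user/productfiles/genome.fa",
    "cn_model": "/user/productfiles/cnv_model.dat",
    "pgx_database": "/user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip",
    "idat_folder": "/user/IDATs",
    "gtc_folder": "/user/gtc",
    "vcf_folder": "/user/vcf",
    "star_allele_folder": "/user/star-alleles",
    "star_alleles_csv": "star_alleles.csv",
    "guidelines": "CPIC",
    "metabolizer_folder": "/user/metabolizer-statuses",
    "license_server_url": "https://username:password@license.edicogenome.com",
}

for command in build_pgx_commands(example_paths):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) in order, so that a failing step stops the pipeline. Because the license URL carries credentials, avoid writing the assembled commands to shared logs.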

Command Index

Use the following syntax when using the command-line interface:

dragena [command] [required parameters] [optional parameters]

copy-number

The root command for actions that act on copy number variants.

copy-number call

The command used to call copy number variants. A batch of 24 or more samples is required for analysis. For a successful analysis, 22 samples must pass QC, defined as having log R dev < 0.2.
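A batch can be checked against these requirements before it is submitted. The Python sketch below assumes the log R dev value of each sample is already available from an earlier QC step. The function name is hypothetical, and the count of 22 is treated as a minimum.

```python
from typing import Sequence


def cnv_batch_is_sufficient(
    log_r_devs: Sequence[float],
    min_samples: int = 24,
    min_passing: int = 22,
    max_log_r_dev: float = 0.2,
) -> bool:
    """Check a batch against the stated copy-number call requirements."""
    # A sample passes QC when its log R dev is below the limit.
    passing = sum(1 for value in log_r_devs if value < max_log_r_dev)
    return len(log_r_devs) >= min_samples and passing >= min_passing


# 22 clean samples plus 2 noisy samples meets both requirements.
print(cnv_batch_is_sufficient([0.1] * 22 + [0.3] * 2))
```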

copy-number help

Displays help information for a copy-number command.

copy-number train

Trains copy number (CN) model for a set of samples. Generate a new CN model if using a customized cluster file (.egt) optimized for the specific data set.

Execute the train command using the data sets that were used to optimize the cluster file.
To use a CN model generated by the train command, the mask file for the manifest must be saved in the same directory as the manifest.
A minimum of 96 samples is required to use the copy-number train command. For optimal performance, at least 150 is recommended.
For best performance, validate the CN model using truth data before using in CN calling.

copy-number version

Displays version information for copy-number command.

genotype

The root command for genotype calling.

genotype call

Determines genotype calls (GTC) from IDAT files.

genotype gtc-to-bedgraph

Converts GTC to BedGraph files, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.

genotype gtc-to-vcf

Creates SNV VCF files from GTC files.

genotype help

Displays the help information for a genotype command.

genotype version

Displays current DRAGEN Array Local version.

help

Displays the help information.

version

Displays current DRAGEN Array Local version.

star-allele

The root command for PGx star allele calling.

star-allele help

Displays help information for a star-allele command.

star-allele version

Displays version information for star-allele.

star-allele call

Calls PGx star allele diplotypes. The SNV VCF files should be generated using the DRAGEN Array gtc-to-vcf command with unsquash-duplicates off (default) and without filter loci.

star-allele annotate

Annotates and summarizes the star alleles, specifically for metabolizer statuses, and outputs them in a consolidated JSON report. Metabolizer status is determined through direct lookup in the public PGx guidelines, CPIC or DPWG, as specified by the user.

Troubleshooting and Additional Support

Tips for using the Command-line interface

DRAGEN Array Local utilizes a command-line interface which allows full user control of software functionality and easy automation of tasks. The software is designed to be used by power users and bioinformaticians.

When using command-line consider the following tips:

Unquoted spaces cannot be part of a file name in a command. If a file name contains spaces, enclose it in quotes.
To correct a typing error in a previously entered command, use the up arrow to repeat the previous command, then correct the error before re-entering it.
Double check the command. Misspelling, extra, or missing dashes, etc. will cause the command to be unrecognizable by the software.
- When entering paths or long names, copy and paste the values to help avoid errors.
- If using Windows, use a File Explorer window to navigate to the product file or folder that is needed by the DRAGEN Array Local command. While holding down the shift button on the keyboard, right click the file and select the 'Copy as Path' option. Then paste the copied path into the command prompt to use the file or folder.
To cancel a command while it is running, press Control + C on the keyboard.

Optimizing cluster files and copy number models

When updating the cluster file for pharmacogenomic applications, understand the specifications for the copy number model file before beginning.

To retrain the CN model file, 96 samples must be used at minimum with 90 of those samples passing QC defined as Log R Dev less than or equal to 0.2. It is recommended to train with at least 150 samples. A greater number of samples can be advantageous, but diminishing returns and longer computation times are seen after 3,000 samples.

It is recommended to manually QC the training samples and remove samples that have Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0 so that only the highest quality samples are used in the training. The same samples used to create the new cluster file should be used to retrain the CN Model. To minimize batch effect in the training sample set, the samples should be analyzed in as few batches as possible and come from the same reagent lots.
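The manual QC step can be expressed as a simple filter. The Python sketch below keeps only samples that meet all three criteria and then checks the stated minimums of 96 samples with 90 passing Log R Dev. The SampleQC class and its field names are hypothetical; how the three metrics are obtained is outside the scope of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleQC:
    sample_id: str
    log_r_dev: float
    call_rate: float
    tga_control: float


def select_training_samples(samples: List[SampleQC]) -> List[SampleQC]:
    """Drop samples with Log R Dev > 0.2, call rate < 0.99, or TGA Control < 1.0."""
    return [
        s for s in samples
        if s.log_r_dev <= 0.2 and s.call_rate >= 0.99 and s.tga_control >= 1.0
    ]


def meets_training_minimum(samples: List[SampleQC]) -> bool:
    """Check the minimum of 96 samples with 90 passing Log R Dev <= 0.2."""
    passing = sum(1 for s in samples if s.log_r_dev <= 0.2)
    return len(samples) >= 96 and passing >= 90


# Hypothetical batch: 100 good samples and one noisy sample.
batch = [SampleQC(f"S{n}", 0.12, 0.995, 1.4) for n in range(100)]
batch.append(SampleQC("S_noisy", 0.35, 0.97, 0.8))
kept = select_training_samples(batch)
print(len(kept), meets_training_minimum(kept))
```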

The copy-number train algorithm is designed with the assumption that the copy number distribution resembles the standard population distributions. This ensures the updated CN model file is representative of the normal populations in which it will be used to calculate copy number for key pharmacogenomic targets.

Pharmacogenomic analysis for semi-custom arrays

When designing a semi-custom array using a commercial Infinium PGx array backbone, such as the Global Diversity Array with enhanced PGx, it is important to retain all backbone content in the design because removing content could decrease the quality of results.

Input Files

The following section describes the input files required by DRAGEN Array.

IDAT Files

For each sample, a pair of raw intensity files (.idat) is generated by the iScan System or NextSeq 550 (for non-methylation arrays). The files provide intensities in the red and green channels for each probe on the Infinium array.

An IDAT file is identified by the BeadChip Barcode (12-digit unique Sentrix ID, e.g., 123456789101), BeadChip Position (row and column of the sample, e.g., R01C01), and Grn (Green) or Red for the specific channel.
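The three identifying parts can be extracted from a file name with a short script. The Python sketch below assumes the parts are joined by underscores in the order barcode, position, channel, as in 123456789101_R01C01_Grn.idat. That exact layout, and the function name, are assumptions for this example.

```python
import re
from typing import Optional, Tuple

# Assumed layout: <12-digit barcode>_<RxxCxx position>_<Grn|Red>.idat
IDAT_NAME = re.compile(
    r"^(?P<barcode>\d{12})_(?P<position>R\d{2}C\d{2})_(?P<channel>Grn|Red)\.idat$"
)


def parse_idat_name(file_name: str) -> Optional[Tuple[str, str, str]]:
    """Return (barcode, position, channel), or None if the name does not match."""
    match = IDAT_NAME.match(file_name)
    if match is None:
        return None
    return match.group("barcode"), match.group("position"), match.group("channel")


print(parse_idat_name("123456789101_R01C01_Grn.idat"))
```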

Manifest Files

The CSV and BPM manifest files can be found on the Illumina Support Site for all commercial Infinium BeadChips or on MyIllumina for custom and semi-custom designs. For instructions on obtaining manifest files from MyIllumina, see Illumina Knowledge article, How to access custom array product files (manifest and product definition files) in MyIllumina.

The CSV manifest file (.csv) provides complementary data to the BPM manifest file in a human readable format. It is a required input to the genotype gtc-to-vcf command to enable VCF generation for insertion/deletion variants.

Cluster File

The cluster file (.egt) is a standard product file provided by Illumina for commercial genotyping products and it is a required input for the genotype call command in DRAGEN Array. Custom cluster files may be required for optimal genotyping performance. See section Optimizing cluster files and copy number models for additional details.

CN Model File

The CN (Copy Number) model file (.dat) is a required input to the copy-number call command to enable accurate copy number calling for pharmacogenomics. Illumina provides a standard CN model file for each PGx array product. See section Optimizing cluster files and copy number models for additional details.

PGx Database File

The PGx database file (.zip) contains the variant mapping information from Infinium PGx arrays to PGx variants. For each gene and each variant used in the star allele definitions of the gene, there is a mapping to the ID field in the SNV VCF file. Each line in the gene mapping file represents a single variant and contains the SNV VCF ID for that variant followed by the HGVS (Human Genome Variation Society) tag for the variant. The PGx database file is array specific and is one of the product files provided by Illumina for each PGx array product.

Genome FASTA Files

The genome FASTA file (.fa) is a text file with the reference genome sequences. The FASTA index file (.fai) contains metadata about the arrangement of chromosomes within the FASTA file for a particular species. DRAGEN Array PGx calling supports human genome builds 37 and 38. The genome FASTA file and FASTA index file are both provided by Illumina for human species and should be stored together in the same input folder.

IDAT Sample Sheet

For local analysis, the IDAT sample sheet can be a CSV or JSON formatted file with direct paths to sample IDAT files. It enables easy analysis of samples from different directories.

Example CSV format:

Green IDAT Path,Red IDAT Path

/path/to/sample1_Grn.idat,/path/to/sample1_Red.idat

/path/to/sample2_Grn.idat,/path/to/sample2_Red.idat

/path/to/sample3_Grn.idat,/path/to/sample3_Red.idat

Example JSON format:

[
  {
    "Green IDAT Path": "/path/to/sample1_Grn.idat",
    "Red IDAT Path": "/path/to/sample1_Red.idat"
  },
  {
    "Green IDAT Path": "/path/to/sample2_Grn.idat",
    "Red IDAT Path": "/path/to/sample2_Red.idat"
  },
  {
    "Green IDAT Path": "/path/to/sample3_Grn.idat",
    "Red IDAT Path": "/path/to/sample3_Red.idat"
  }
]
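A local IDAT sample sheet in the CSV format above can be generated rather than typed by hand. The Python sketch below pairs green and red files found in one or more folders and writes the two-column sheet. It assumes file names end in _Grn.idat and _Red.idat, and the function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_idats(folders: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Find green IDAT files and keep those with a matching red file."""
    pairs = []
    for folder in folders:
        for green in sorted(Path(folder).glob("*_Grn.idat")):
            red = green.with_name(green.name.replace("_Grn.idat", "_Red.idat"))
            if red.exists():  # skip samples with a missing channel
                pairs.append((green, red))
    return pairs


def write_idat_sample_sheet(pairs: List[Tuple[Path, Path]], out_path: Path) -> None:
    """Write the pairs using the header shown in the CSV example."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Green IDAT Path", "Red IDAT Path"])
        for green, red in pairs:
            writer.writerow([str(green), str(red)])
```

For example, write_idat_sample_sheet(pair_idats([Path("/user/IDATs")]), Path("samples.csv")) produces a sheet covering every complete pair in that folder.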

For cloud analysis, the IDAT sample sheet can be a CSV formatted file.

beadChipName,sampleSectionName

Beadchip 1 barcode (204753010023), sample section (R01C01)

Beadchip 1 barcode (204753010023), sample section (R02C01)

Beadchip 2 barcode (204753010024), sample section (R01C01)

Beadchip 2 barcode (204753010024), sample section (R02C01)

For DRAGEN Array Methylation QC on cloud, additional optional sample sheet fields are available.

Following Sample_Group, any number of additional columns can be added to include meta data fields such as sex, sample type, plate and well information, etc. Additional columns added after the Sample_Group column may have user-defined column header values. The Sample_ID field and any additional meta data added will be replicated in the Sample QC Summary output files.

The Sample_Group field will be used to populate the PCA Control Plot within the Sample QC Summary Plots file and the Principal Component Summary file. For the PCA Control Plot, each sample group will be assigned a unique color. Samples assigned to the same Sample_Group value will be the same color in the PCA Control Plot.

beadChipName,sampleSectionName,Sample_ID,Sample_Group,MetaData1

Beadchip 1 barcode (204753010023), sample section (R01C01),NA1231,Group1,F

Beadchip 1 barcode (204753010023), sample section (R02C01),NA1232,Group2,F

Beadchip 2 barcode (204753010024), sample section (R01C01),NA1233,Group2,M

Beadchip 2 barcode (204753010024), sample section (R02C01),NA1234,Group1,M
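The cloud Methylation QC sample sheet can also be built programmatically. The Python sketch below writes the four named columns followed by any user-defined metadata columns. It assumes each cell holds the bare value, for example 204753010023 and R01C01, rather than the descriptive text used in the illustration above. The function name is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def build_methylation_sheet(
    rows: Iterable[Dict[str, str]], extra_columns: Sequence[str] = ()
) -> str:
    """Return CSV text with the standard columns followed by metadata columns."""
    header = ["beadChipName", "sampleSectionName", "Sample_ID", "Sample_Group",
              *extra_columns]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing optional values are written as empty cells.
        writer.writerow({name: row.get(name, "") for name in header})
    return buffer.getvalue()


samples = [
    {"beadChipName": "204753010023", "sampleSectionName": "R01C01",
     "Sample_ID": "NA1231", "Sample_Group": "Group1", "MetaData1": "F"},
    {"beadChipName": "204753010024", "sampleSectionName": "R02C01",
     "Sample_ID": "NA1234", "Sample_Group": "Group1", "MetaData1": "M"},
]
print(build_methylation_sheet(samples, extra_columns=["MetaData1"]))
```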

GTC Sample Sheet

The GTC sample sheet is a CSV or JSON formatted file with direct paths to sample GTC files. It enables easy analysis of samples from different directories.

Example CSV format:

GTC Path

/path/to/sample1.gtc

/path/to/sample2.gtc

/path/to/sample3.gtc

Example JSON format:

[

{

"GTC Path": "/path/to/sample1.gtc"

},

{

"GTC Path": "/path/to/sample2.gtc"

},

{

"GTC Path": "/path/to/sample3.gtc"

}

]

Input File Summary Table

In addition to the input files, there is a set of intermediate files, including GTC, SNV VCF, CNV VCF, and PGx CSV, which are outputs of some DRAGEN Array Local commands and inputs to other commands.

The table below summarizes the input and intermediate files, their sources, and the associated DRAGEN Array Local commands and options.

| Input File | Source | Command | Option |
| --- | --- | --- | --- |
| IDAT | User provided from scanning instrument | genotype call | --idat-folder |
| CSV Manifest | Product file from Illumina | genotype gtc-to-vcf | --csv-manifest |
| BPM Manifest | Product file from Illumina | copy-number train, genotype call, genotype gtc-to-bedgraph, genotype gtc-to-vcf | --bpm-manifest |
| Cluster File | Product file from Illumina or user created using GenomeStudio | genotype call | --cluster-file |
| CN Model | Product file from Illumina or user created using DRAGEN Array Local | copy-number call | --cn-model |
| PGx Database | Product file from Illumina | star-allele call | --database |
| Genome FASTA | Product file from Illumina | genotype gtc-to-vcf, copy-number train | --genome-fasta-file |
| IDAT Sample Sheet | User provided | genotype call | --idat-sample-sheet |
| GTC Sample Sheet | User provided | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-sample-sheet |
| GTC | DRAGEN Array output from genotype call | genotype gtc-to-bedgraph, genotype gtc-to-vcf, copy-number call, copy-number train | --gtc-folder |
| SNV and CNV VCF | DRAGEN Array output from genotype gtc-to-vcf and copy-number call | star-allele call | --vcf-folder |
| PGx CSV | DRAGEN Array output from star-allele call | star-allele annotate | --star-alleles |

Output Files

The following section describes the outputs produced by DRAGEN Array.

CNV VCF File

DRAGEN Array produces one CNV variant call file (VCF) (*.cnv.vcf) per sample to report the CN status on the gene and sub gene level, along with the CN events for PGx targets.

The CNV VCF output file follows the standard VCF format. The QUAL field in the VCF file measures the CNV call quality. The CNV call quality is a Phred-scaled score with a minimum value of 0 and a maximum of 60. Low quality calls (QUAL < 7) are flagged by the Q7 filter. Low quality samples with LogRDev greater than the 0.2 threshold are flagged with the SampleQuality filter.
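To give a sense of scale for the QUAL values, the Python sketch below converts a Phred-scaled score to an error probability using the standard Phred relationship, P = 10^(-Q/10). Applying that standard definition to the CNV QUAL field is an assumption of this example, and the function names are hypothetical.

```python
def phred_to_error_probability(qual: float) -> float:
    """Convert a Phred-scaled score to an error probability (standard definition)."""
    return 10 ** (-qual / 10)


def is_flagged_q7(qual: float) -> bool:
    """Mirror the Q7 filter: calls with QUAL below 7 are flagged as low quality."""
    return qual < 7


for qual in (60, 10, 7):
    print(qual, phred_to_error_probability(qual), is_flagged_q7(qual))
```

Under this definition, the cap of 60 corresponds to an error probability of one in a million, and the Q7 cutoff to roughly one in five.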

The CNV VCF output file includes the following content.

##fileformat=VCFv4.1

##source=dragena 1.0.0

##genomeBuild=38

##reference=file:///hg38_with_alt/hg38_nochr_MT.fa

##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events. CN=5 indicates 5 or 5+">

##FORMAT=<ID=NR,Number=1,Type=Float,Description="Aggregated normalized intensity">

##ALT=<ID=CNV,Description="Copy number variant region">

##FILTER=<ID=Q7,Description="Quality below 7">

##FILTER=<ID=SampleQuality,Description="Sample was flagged as potentially low-quality due to high noise levels.">

##INFO=<ID=CNVLEN,Number=1,Type=Integer,Description="Number of bases in CNV hotspot">

##INFO=<ID=PROBE,Number=1,Type=Integer,Description="Number of probes assayed for CNV hotspot">

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of CNV hotspot">

##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Structural Variant Type">

##CNVOverallPloidy=1.8

##CNVGCCorrect=True

##contig=<ID=1,length=248956422>

##contig=<ID=4,length=190214555>

##contig=<ID=10,length=133797422>

##contig=<ID=16,length=90338345>

##contig=<ID=19,length=58617616>

##contig=<ID=22,length=50818468>

##contig=<ID=22_KI270879v1_alt,length=304135>

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 204619760001_R01C01

1 109687842 CNV:GSTM1:chr1:109687842:109693526 N <CNV> 60 PASS CNVLEN=5685;PROBE=124;END=109693526;SVTYPE=CNV CN:NR 2:0.966631132771593

4 68537222 CNV:UGT2B17:chr4:68537222:68568499 N <CNV> 60 PASS CNVLEN=31278;PROBE=383;END=68568499;SVTYPE=CNV CN:NR 0:0.376696837881692

10 133527374 CNV:CYP2E1:chr10:133527374:133539096 N <CNV> 60 PASS CNVLEN=11723;PROBE=194;END=133539096;SVTYPE=CNV CN:NR 2:0.980059731860893

16 28615068 CNV:SULT1A1:chr16:28615068:28623382 N <CNV> 57 PASS CNVLEN=8315;PROBE=164;END=28623382;SVTYPE=CNV CN:NR 2:0.980552325552963

19 40844791 CNV:CYP2A6.intron.7:chr19:40844791:40845293 N <CNV> 60 PASS CNVLEN=503;PROBE=38;END=40845293;SVTYPE=CNV CN:NR 2:0.9663775484762

19 40850267 CNV:CYP2A6.exon.1:chr19:40850267:40850414 N <CNV> 60 PASS CNVLEN=148;PROBE=21;END=40850414;SVTYPE=CNV CN:NR 2:0.9663775484762

22 42126498 CNV:CYP2D6.exon.9:chr22:42126498:42126752 N <CNV> 48 PASS CNVLEN=255;PROBE=370;END=42126752;SVTYPE=CNV CN:NR 2:0.981703411438716

22 42129188 CNV:CYP2D6.intron.2:chr22:42129188:42129734 N <CNV> 10 PASS CNVLEN=547;PROBE=333;END=42129734;SVTYPE=CNV CN:NR 2:0.965498002434641

22 42130886 CNV:CYP2D6.p5:chr22:42130886:42131379 N <CNV> 60 PASS CNVLEN=494;PROBE=172;END=42131379;SVTYPE=CNV CN:NR 2:0.970341562236357

22_KI270879v1_alt 270316 CNV:GSTT1:chr22_KI270879v1_alt:270316:278477 N <CNV> 60 PASS CNVLEN=8162;PROBE=91;END=278477;SVTYPE=CNV CN:NR 2:1.01191145130511
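Records like those above can be read with a few lines of Python. The sketch below splits one record on whitespace, which works for the example as printed here, and extracts the target name from the ID field along with the CN and NR sample values. The function name and the returned dictionary keys are hypothetical.

```python
from typing import Dict, Union

Value = Union[str, int, float]


def parse_cnv_record(line: str) -> Dict[str, Value]:
    """Parse one CNV VCF data line into a small dictionary."""
    fields = line.split()
    chrom, pos, record_id, _ref, _alt, qual, flt, info, fmt, sample = fields[:10]
    info_map = dict(item.split("=", 1) for item in info.split(";"))
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom,
        "start": int(pos),
        "end": int(info_map["END"]),
        # ID looks like CNV:<target>:<contig>:<start>:<end>
        "target": record_id.split(":")[1],
        "qual": float(qual),
        "filter": flt,
        "probes": int(info_map["PROBE"]),
        "copy_number": int(sample_map["CN"]),
        "normalized_intensity": float(sample_map["NR"]),
    }


record = (
    "4 68537222 CNV:UGT2B17:chr4:68537222:68568499 N <CNV> 60 PASS "
    "CNVLEN=31278;PROBE=383;END=68568499;SVTYPE=CNV CN:NR 0:0.376696837881692"
)
parsed = parse_cnv_record(record)
print(parsed["target"], parsed["copy_number"], parsed["filter"])
```

For production use, a dedicated VCF library is a safer choice than manual splitting, since it also handles header lines and multiple samples.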

SNV VCF File

The software produces one genotyping variant call file (*.snv.vcf) per sample, covering single nucleotide variants (SNV) and indels for the sample. It reports GenCall score (GS), B Allele Frequency (BAF), and Log R Ratio (LRR) per variant.

The BAF and LRR are oriented with Ref as A and Alt as B relative to the reference genome, while GS is agnostic to the reference genome. Users familiar with GenomeStudio may observe BAF and LRR reported in the VCF as 1 minus the value reported in GenomeStudio depending on the Ref Alt allele orientation with the reference genome. GenomeStudio reports these values based on the information in the manifest without knowledge of the reference genome.
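The orientation difference for BAF can be sketched as follows. This is a minimal Python illustration, not DRAGEN Array code: the function name and the `ref_is_allele_a` flag are hypothetical, and whether a given locus needs the flip depends on how the manifest alleles map to the reference genome. The GenomeStudio value in the example is made up for illustration.

```python
import math


def genomestudio_to_vcf_baf(baf_gs, ref_is_allele_a):
    """Return a BAF oriented with Ref as A and Alt as B.

    ref_is_allele_a is a hypothetical flag: True when the manifest's A allele
    is the reference allele, False when the A/B orientation is swapped.
    """
    return baf_gs if ref_is_allele_a else 1.0 - baf_gs


# Hypothetical GenomeStudio value for a locus whose A/B alleles are swapped
vcf_baf = genomestudio_to_vcf_baf(0.49275863, ref_is_allele_a=False)
print(round(vcf_baf, 8))
print(math.isclose(vcf_baf, 0.50724137))
```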

The SNV VCF output file includes the following content. The last row shows an example variant call.

##fileformat=VCFv4.1

##source=dragena 1.0.0

##genomeBuild=38

##reference=file:///genomes/38/genome.fa

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

##FORMAT=<ID=GS,Number=1,Type=Float,Description="GenCall score. For merged multi-assay or multi-allelic records, min GenCall score is reported.">

##FORMAT=<ID=BAF,Number=1,Type=Float,Description="B Allele Frequency">

##FORMAT=<ID=LRR,Number=1,Type=Float,Description="LogR ratio">

##contig=<ID=1,length=248956422>

##contig=<ID=2,length=242193529>

##contig=<ID=3,length=198295559>

##contig=<ID=4,length=190214555>

##contig=<ID=5,length=181538259>

##contig=<ID=6,length=170805979>

##contig=<ID=7,length=159345973>

##contig=<ID=8,length=145138636>

##contig=<ID=9,length=138394717>

##contig=<ID=10,length=133797422>

##contig=<ID=11,length=135086622>

##contig=<ID=12,length=133275309>

##contig=<ID=13,length=114364328>

##contig=<ID=14,length=107043718>

##contig=<ID=15,length=101991189>

##contig=<ID=16,length=90338345>

##contig=<ID=17,length=83257441>

##contig=<ID=18,length=80373285>

##contig=<ID=19,length=58617616>

##contig=<ID=20,length=64444167>

##contig=<ID=21,length=46709983>

##contig=<ID=22,length=50818468>

##contig=<ID=MT,length=16569>

##contig=<ID=X,length=156040895>

##contig=<ID=Y,length=57227415>

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 202937470021_R06C01

1 2290399 rs878093 G A . PASS . GT:GS:BAF:LRR 0/1:0.7923:0.50724137:0.14730307

Genotype Call (GTC) File

BedGraph File

The BedGraph file contains the log R ratios from the genotyping algorithm for use in visual tools.

Star Allele CSV File

The Star Allele CSV file is an intermediate file generated by the star-allele call command and serves as the input to the star-allele annotate command. It contains all the star allele calls for all samples in a run. Each row in the file provides either a star allele diplotype or simple variant call for a PGx-related gene. Star allele diplotype calls for a sample and a gene may span multiple lines where alternative solutions can be listed.

The Star Allele CSV file also contains meta information marked by # at the top of the file for the genome build and PGx database used for the star allele calling.

The star_allele.csv file contains the following details per sample:

Below is an example of the first five columns from a star allele CSV file:

Sample,Rank,Gene or Variant,Type,Solution

204650490282_R02C01,1,CYP2C9,Haplotype,*9/*11

204650490282_R02C01,1,CYP2C19,Haplotype,*2/*10
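A file with this layout can be read with a standard CSV parser once the meta information lines are separated out. The sketch below is a minimal Python example under stated assumptions: the helper name `read_star_alleles` is hypothetical, the meta line in the sample text is a placeholder (the actual meta content is not reproduced here), and only the five columns shown above are used.

```python
import csv
import io


def read_star_alleles(handle):
    """Return (meta_lines, rows) from an open star allele CSV file."""
    meta, data = [], []
    for line in handle:
        # Meta information lines are marked by '#' at the top of the file
        (meta if line.startswith("#") else data).append(line)
    rows = list(csv.DictReader(data))  # first non-meta line is the header
    return [m.rstrip("\n") for m in meta], rows


example = io.StringIO(
    "# placeholder meta line (real files list genome build and PGx database)\n"
    "Sample,Rank,Gene or Variant,Type,Solution\n"
    "204650490282_R02C01,1,CYP2C9,Haplotype,*9/*11\n"
    "204650490282_R02C01,1,CYP2C19,Haplotype,*2/*10\n"
)
meta, rows = read_star_alleles(example)
for row in rows:
    print(row["Sample"], row["Gene or Variant"], row["Solution"])
```

Because a diplotype call for one sample and gene may span several lines with alternative solutions, downstream code would typically group rows by Sample and Gene or Variant and order them by Rank.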

Genotype Summary Files

The software produces genotype summary files (gt_sample_summary.csv and gt_sample_summary.json) that contain the following details per sample:

Sample ID
Sample Name
Sample Folder
Autosomal Call Rate
Call Rate
Log R Ratio Std Dev
Sex Estimate
TGA_Ctrl_5716 Norm R

The TGA_Ctrl_5716 Norm R field is specific to PGx products (e.g., Global Diversity Array with enhanced PGx). The field value is the Normalized R value of one probe and serves as an assay control: a value below 1 indicates that the sample failed the TGA (Targeted Gene Amplification) process. If the product does not have this probe, the field is not included in the gt_sample_summary files.

Final Report

DRAGEN Array Cloud produces a Final Report (gtc_final_report.csv) per analysis batch similar to the one available in GenomeStudio. It contains the following details per locus per sample:

Note: Analyses on products with large numbers of loci (>1 million) and large numbers of samples (>100) yield a large (50+ gigabyte) Final Report that is difficult to download and review. For large batches, it is recommended to create analysis configurations that do not produce this report.

Locus Summary

DRAGEN Array Cloud produces a Locus Summary (locus_summary.csv) per analysis batch similar to the one available in GenomeStudio. It contains the following details per locus:

CN Summary File

The CN summary file contains key statistics for each sample in a batch, including the following details per sample:

Sample ID
Sample Name
Sample Folder

Copy Number Batch File

The copy number batch summary file (cn_batch_summary.csv) shows the total copy number gain, loss, and neutral (CN=2) values for each target region across all the samples in the analysis.

Example copy number batch summary file content:

Target Region,Total CN gain,Total CN loss,Total CN neutral

CYP2A6.exon.1,0,1,47

CYP2A6.intron.7,0,1,47

CYP2D6.exon.9,2,4,42

CYP2D6.intron.2,7,2,39

CYP2D6.p5,13,2,33

CYP2E1,2,0,46

GSTM1,0,42,6

GSTT1,0,33,15

SULT1A1,0,0,48

UGT2B17,0,34,14

All Target Regions,24,119,337
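The "All Target Regions" row can be cross-checked against the per-region rows. The following minimal Python sketch (the helper name `check_totals` is hypothetical) parses the example content above and confirms that each column of the final row equals the sum of the region rows.

```python
import csv
import io

SUMMARY = """Target Region,Total CN gain,Total CN loss,Total CN neutral
CYP2A6.exon.1,0,1,47
CYP2A6.intron.7,0,1,47
CYP2D6.exon.9,2,4,42
CYP2D6.intron.2,7,2,39
CYP2D6.p5,13,2,33
CYP2E1,2,0,46
GSTM1,0,42,6
GSTT1,0,33,15
SULT1A1,0,0,48
UGT2B17,0,34,14
All Target Regions,24,119,337
"""

COLUMNS = ["Total CN gain", "Total CN loss", "Total CN neutral"]


def check_totals(text):
    """Sum the per-region rows and compare them with the 'All Target Regions' row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    regions = [r for r in rows if r["Target Region"] != "All Target Regions"]
    total_row = next(r for r in rows if r["Target Region"] == "All Target Regions")
    sums = {c: sum(int(r[c]) for r in regions) for c in COLUMNS}
    consistent = all(sums[c] == int(total_row[c]) for c in COLUMNS)
    return sums, consistent


sums, consistent = check_totals(SUMMARY)
print(sums, consistent)
```

In this example every region row adds up to 48, which is consistent with a batch of 48 samples each receiving one gain, loss, or neutral call per target region.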

Warning/Error Messages and Logs

The following scenarios result in a warning or error message:

Manifest file used to generate GTC is not the same as the manifest file used to generate the CN model.
FASTA files and FASTA index files do not match.

For the following scenarios, the software reports messages to the terminal output (as either a warning or an error):

Indel processing for GTC to VCF conversion failed.
The input folder does not contain the required input files.
An input file is corrupt.

Examples of such notifications can include the following:

Star allele JSON File

Fields included in the star allele JSON header are described below.

Fields included in the star allele call (locusAnnotations) information are described below.

Example of JSON file content:

{
  "softwareVersion": "dragena 1.0.0",
  "genomeBuild": "hg38",
  "databaseSources": "PharmVar Version: 6.0.5, PharmGKB Database Version: Snapshot-2023.08.30, CPIC Database Version: 1.30.0",
  "mappingFile": "gda_mapping_53e0931.zip",
  "pgxGuideline": "CPIC",
  "sampleId": "204619760027_R01C01",
  "locusAnnotations": [
    {
      "gene": "CYP2C9",
      "callType": "Star Allele",
      "genotype": "*1/*1",
      "activityScore": "2",
      "phenotype": "Normal Metabolizer",
      "qualityScore": "0.9999",
      "rawScore": "0.9999",
      "supportingVariants": "Complete: *1 ( )",
      "candidateSolutions": [
        {
          "rank": 1,
          "genotype": "*1/*1",
          "activityScore": "2",
          "phenotype": "Normal Metabolizer",
          "qualityScore": 0.9999,
          "rawScore": 0.9999,
          "alleles": [
            {
              "solutionLong": "Complete: *1",
              "supportingVariants": "Complete: *1 ( )",
              "missingVariants": "Complete: *1 ( )",
              "collapsedAlleles": "Complete: *1 ( )"
            }
          ],
          "copyNumberRegions": "p5,exon.1,intron.1,exon.2,intron.2,exon.3,intron.3,exon.4,intron.4,exon.5,intron.5,exon.6,intron.6,exon.7,intron.7,exon.8,intron.8,exon.9,p3",
          "copyNumberSolution": "2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2"
        }
      ],

TBI Index File

Methylation Control Probe Output File

The software produces a control probe output file ({BeadChipBarcode}_{Position}_ctrl.tsv.gz) per sample that includes the raw methylated and unmethylated values for each control probe.

Each control probe has an address, type, color channel, name, and probe ID. The file also provides the raw signal for methylated green (MG), methylated red (MR), unmethylated green (UG), and unmethylated red (UR).

The file can help identify which probes are available on a given BeadChip.

Methylation CG Output File

The software produces a CG output file ({BeadChipBarcode}_{Position}_cgs.tsv.gz) per sample that includes beta values, m-values and detection p-values for each CG site.

Beta values measure methylation levels in a linear fashion for easy interpretation. Unmethylated probes are close to zero and methylated probes are close to 1.

M-values are log-transformed beta values, which provide a more representative measure of methylation.

Detection p-values measure the likelihood that the signal is background noise. It is recommended that probes with a detection p-value > 0.05 be excluded from analysis, as their signal is likely background noise.
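The relationship between beta values and M-values, and the detection p-value filter, can be sketched in Python as shown below. This is an illustration under stated assumptions: the exact transform used by the software is not given here, so the sketch uses the commonly used base-2 logit, M = log2(beta / (1 - beta)); the function names, the dictionary keys `beta` and `pval`, and the example values are hypothetical.

```python
import math


def beta_to_m(beta):
    """Base-2 logit of a beta value (assumed transform; beta strictly in (0, 1))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def keep_detected(sites, p_threshold=0.05):
    """Keep CG sites whose detection p-value does not exceed the threshold."""
    return [s for s in sites if s["pval"] <= p_threshold]


# Made-up values for illustration only
sites = [
    {"id": "site_1", "beta": 0.5, "pval": 0.001},
    {"id": "site_2", "beta": 0.8, "pval": 0.20},   # excluded: p-value > 0.05
    {"id": "site_3", "beta": 0.1, "pval": 0.04},
]
for site in keep_detected(sites):
    print(site["id"], round(beta_to_m(site["beta"]), 3))
```

Under this transform a beta value of 0.5 maps to an M-value of 0, and values near 0 or 1 map to large negative or positive M-values.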

Methylation Sample QC Summary Files

The software produces a methylation sample QC summary in .xlsx and .tsv file formats (sample_qc_summary.xlsx and sample_qc_summary.tsv) per analysis batch, which provides per-sample QC data for all samples in the batch.

The QC summary provides details on 21 control metrics (see tables below), which are computed in the same way as in the BeadArray Control Reporter software from Illumina. In addition, it provides average red and green raw and normalized signals, time of scanning, proportion of probes passing, overall sample pass/fail status, and the failure codes for control metrics that did not pass. A sample passes only if all 21 control metrics pass. The QC summary .xlsx file further highlights failing parameters for easy viewing.

The QC summary files contain the following fields:

Methylation Sample QC Summary Plots

The software produces a methylation sample QC summary plots file (sample_qc_summary.pdf) per analysis batch, which contains two QC summary plots for quick visual review.

The file contains the following control plots:

Methylation Principal Component Summary

The software produces a methylation principal component summary file (pcs.tsv.gz) per analysis batch, which provides principal component data for each sample within the batch. This can be used to identify the specific samples associated with points on the PCA control plot within the Methylation Sample QC Summary Plots output file.

The files contain the following fields:

Methylation Manifest Files

The software produces two methylation manifest files:

Manifest in Sesame format (probes.csv)
Additional information for control probes (controls.csv)

The probes.csv file has the following columns:

The controls.csv file has the following columns:

Methylation Warning/Error Messages and Logs

The following scenarios result in a warning or error message:

Missing IDATs or manifest
Incorrect sample sheet formatting
Duplicate BeadChip Barcode and Position within the sample sheet
Missing control or assay probes
Missing required columns in the manifest
Unable to compute certain metrics

Examples of such notifications can include the following:

Reference

Release Notes

The following versions of DRAGEN Array have been released:

DRAGEN Array v1.0.0 Release Notes
- DRAGEN Array Genotyping Cloud v1.0.0 Release Notes
- DRAGEN Array Methylation QC Cloud v1.0.0 Release Notes

DRAGEN Array Cloud Analysis

DRAGEN Array Cloud Analysis Overview

Getting Started

The following prerequisites are needed to get started with DRAGEN Array Cloud:

Illumina Connected Analytics subscription: An ICA Basic, Professional, or Enterprise subscription can be used; each includes access to BaseSpace Sequence Hub. Follow the Illumina Software Registration Guide to register the software.
Workgroup setup: Workgroups must be created before login. Using a workgroup allows all members of the workgroup to share access to resources, analyses, and data. Learn more about managing a Workgroup.
- Designating a workgroup as ‘Collaborative’ allows projects to be shared with collaborators or Illumina Tech Support to assist with troubleshooting. To create a collaborative workgroup, select the Enable collaborators outside of this domain checkbox during workgroup creation.
Software consumables: iCredits can be purchased for storage on the cloud platform and analysis pipelines with a compute charge. Per sample analysis can be purchased for relevant pipelines as listed in section Applications. Follow the Illumina Software Registration Guide (found under Example 3: Configuring the Software Consumables) to register the software consumables.
[Optional] iScan integration: The iScan System is integrated with Illumina Connected Platform and can send IDATs for further analysis. The iScan System must be running iScan Control Software version 4.2.1 or later.
- Instructions to Use Illumina Connected Analytics (ICA) with the iScan System
- Troubleshooting iScan integration
EULA acceptance: Accept all necessary End User License Agreements in BaseSpace Sequence Hub before scanning begins.
Internet connection: For uploading product files or IDATs, a network connection 1 GbE or faster is recommended.

Note: Accessioning BeadChips before scanning and starting analysis is no longer a required step and has been automated within the system.

Running Analysis

Before beginning analysis, ensure workgroup context is being used so analysis can be viewed by all members of your workgroup. The name of your workgroup should appear in the top right corner.

Use the following steps to run the Microarray Analysis Setup on BaseSpace Sequence Hub:

Select the Runs tab
Select New Run
Select Microarray Analysis Setup
Enter the Analysis Name (Figure 1)
Use the Select Project link to choose the project for your output files. To select an existing project, click the radio button next to the desired project name. You can also create a project by clicking the New button in the project selection window.
Select the Type of Analysis. Further detail on each Type of Analysis is available in section Applications.
(Optional) Create a custom configuration via the "Add Custom Configuration" option in Configuration Settings. Custom configurations must be assigned a name and product files can be uploaded or selected (Figure 2). Custom configuration options vary by Type of Analysis including:

DRAGEN Array - Genotyping provides flexibility for turning specific output files on or off and adjusting the GenCall score cutoff. It is recommended to turn off VCF output for non-human species and Final Report output for large sample numbers.
DRAGEN Array - Methylation - QC provides options to adjust thresholds as detailed in section DRAGEN Array Methylation QC Threshold Adjustment.

Select your preferred option in the Configuration Settings drop-down menu. Configuration setup varies based on the Type of Analysis selected. More details are available in section Applications.
Select Next
Select either Import Sample Sheet, Select BeadChips, or Import IDAT Files (Figure 3)

Import Sample Sheet presents a link to upload a sample sheet. Users may download a template sample sheet by selecting the Download Template link.
Select BeadChips allows users to select BeadChips from the displayed list of available BeadChips. To select specific samples within a BeadChip, use the Import Sample Sheet option instead.
Import IDAT Files allows users to upload the IDAT files from a local folder to the cloud platform for use with the current and future analyses by users within the same workgroup.

Select Launch Analysis

View Outputs

On the Analyses tab, view the analysis status, e.g., initializing or complete.
After the analysis is complete, select the analysis and select the Files tab.
From the Files tab, select the Output folder.

DRAGEN Array Methylation QC

Threshold Adjustment

The recommended thresholds are pre-set within the software for MethylationEPIC and Methylation Screening Array with the following values:

Threshold

Methylation Screening Array

MethylationEPIC

StainingGreen

StainingRed

ExtensionGreen

ExtensionRed

HybridizationHighMedium

HybridizationMediumLow

TargetRemoval1

TargetRemoval2

BisulfiteConversion1Green

BisulfiteConversion1BackgroundGreen

0.5

BisulfiteConversion1Red

BisulfiteConversion1BackgroundRed

0.5

BisulfiteConversion2

0.5

BisulfiteConversion2Background

0.5

Specificity1Green

Specificity1Red

Specificity2

Specificity2Background

NonpolymorphicGreen

2.5

NonpolymorphicRed

BgCorrectionOffset

3000

PvalThreshold

0.05

The first 21 rows in the tables correspond to the 21 control metrics used in the methylation sample QC. See section Methylation Sample QC Summary Files for details.

DRAGEN Array Methylation QC and GenomeStudio Methylation Module Differences

Dataset

Min detection rate

Mean detection rate

Sample Count

86%

93%

220

61%

83%

951

63%

85%

77%

85%

Note that only samples passing QC are included and all samples are at or above 50 ng DNA input. The detection p-value threshold is 0.05.

Known Issues

DRAGEN Array Methylation QC cloud v1.0.0 is released on BSSH version 7.21. There are two known issues with the BSSH UI that impact Methylation QC analyses.

Sample sheets with more than 144 samples lead to undefined failures. The issue impacts methylation QC as well as other analysis types. To process larger sample numbers (up to 1152 for DRAGEN Array methylation QC), samples can be separated into batches of 144 when using a sample sheet, or BeadChips can be selected directly from the BeadChip table.
Uploading custom methylation BPM manifests results in upload failure. The issue impacts only methylation QC and not other analysis types. For assistance uploading these manifests, contact Illumina Tech Support.
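As an illustration of the sample sheet workaround, the following minimal Python sketch splits a list of sample identifiers into batches of at most 144. The helper name `split_into_batches` and the placeholder identifiers are hypothetical; the sketch does not read or write the actual sample sheet format and does not try to keep samples from the same BeadChip in the same batch.

```python
def split_into_batches(samples, batch_size=144):
    """Split a list of sample identifiers into consecutive batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Placeholder identifiers for illustration only
samples = [f"sample_{n:04d}" for n in range(300)]
batches = split_into_batches(samples)
print([len(b) for b in batches])
```

Each batch could then be written to its own sample sheet and submitted as a separate analysis.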

Troubleshooting and Additional Support

Troubleshooting iScan integration

The following table shows the applicable endpoints for the iScan.

Endpoint

Sharing a project

Navigate to the Projects tab
Click the button next to the desired project
Select the Share button above the list (Figure 3)
Select the Get Link option to activate a link for the project
Copy the link and send it to the desired recipient(s)

Note: The project owner maintains ownership and write access. If the project owner deletes the data, the collaborators lose access to it.

DRAGEN Array Local Analysis

DRAGEN Array Local Overview

Getting Started

Computing Requirements

Before downloading and installing the software, ensure the following specifications are met for best performance:

Quota Specifications

The star-allele call command in DRAGEN Array Local requires quota to run. The quota is charged per sample analyzed and can be purchased on the . Quota is used for all samples analyzed including re-analysis or low-quality samples.

The credential provided in the activation email after purchasing should be used as an input to the star-allele call command through the "--license-server-url" option. During runtime, the will record the remaining quota at the beginning and the end of the analysis.

Installation

Please follow the steps below to install the software on your compute infrastructure:

Click on the DRAGEN Array v1.0 installation package for the platform of your choice. Installers for Windows and Linux are available on the . Once the download is complete, move the DRAGEN Array v1.0 installation package to the desired folder. Administrative permissions may be required for system folders, for example /usr/local/bin on Linux and C:\Program Files on Windows. Note: Throughout the remainder of the document, Linux is assumed in the examples.
Unzip and extract the package. The executable can be found in the dragena subfolder of the software download after extraction.
To check that the DRAGEN Array installation was successful, follow these steps:
- Open a command prompt (Windows) or terminal (Linux).
- [Optional] Add /path/to/dragena/, e.g. /usr/local/bin/dragena-linux-x64-DAv1.0.0/dragena/, to your PATH to access the executable from anywhere in the folder structure
- Execute the following command: /path/to/dragena/dragena version, or if the PATH environment variable is set: dragena version

If the installation was successful, the version of the software is displayed in the terminal window.

Run DRAGEN Array Local

For CNV PGx analysis, a minimum of 24 samples is required to run analysis. For a successful analysis, 22 samples must pass QC defined as having log R dev < 0.2. With a standard hardware specification in section , up to 500 GDA-ePGx samples can be processed per analysis batch.

For genotyping analysis, there is no sample minimum required to run analysis.
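The sample minimums for CNV PGx analysis can be checked before launching a run. The sketch below is a minimal Python illustration, not part of DRAGEN Array: the function name `check_cnv_batch` is hypothetical, and it assumes that "log R dev" corresponds to a per-sample Log R Ratio standard deviation that has already been collected into a dictionary.

```python
MIN_SAMPLES = 24      # minimum batch size for CNV PGx analysis
MIN_PASSING = 22      # samples that must pass QC
MAX_LOG_R_DEV = 0.2   # QC passes when log R dev is below this value


def check_cnv_batch(log_r_devs):
    """Check a batch against the documented minimums.

    log_r_devs maps sample ID -> log R dev (assumed to be available per sample).
    Returns (ok, message).
    """
    passing = [s for s, dev in log_r_devs.items() if dev < MAX_LOG_R_DEV]
    if len(log_r_devs) < MIN_SAMPLES:
        return False, f"{len(log_r_devs)} samples in batch; {MIN_SAMPLES} required"
    if len(passing) < MIN_PASSING:
        return False, f"{len(passing)} samples pass QC; {MIN_PASSING} required"
    return True, "batch meets the sample minimums"


# Made-up values: 22 passing samples and 2 failing samples
batch = {f"s{n}": 0.1 for n in range(22)}
batch.update({"s22": 0.3, "s23": 0.25})
print(check_cnv_batch(batch))
```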

To optimize performance of the targeted PGx CNV caller and minimize batch effect, it is recommended to:

Analyze samples that were processed together in one batch
Avoid combining sample batches processed on different reagent lots.
Analyze batches of 96 samples or more
Use the CN Model and PGx Database File provided as part of the standard product files

Quick Start

Use the following instructions to start the full PGx analysis, covering genotyping, PGx CNV and PGx star allele calling. Refer to for parameters for all commands.

Review section for information on input files to use, sample minimums per analysis type and other best practices.

Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any other directory if the executable was added to the PATH environment variable.
Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.
dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc
Use the genotype gtc-to-vcf command to create SNV VCF files from the GTC files generated by the genotype call command.
dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/manifest.bpm --csv-manifest /user/productfiles/manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --output-folder /user/vcf
Use the copy-number call command to call PGx CNVs from the GTC files and produce CNV VCF files. It is recommended to use the same output folder used for the SNV VCF files, since the star-allele call command accepts one VCF folder containing both SNV and CNV VCFs.
dragena copy-number call --cn-model /user/productfiles/cnv_model.dat --gtc-folder /user/gtc --output-folder /user/vcf
Use the star-allele call command to generate star allele calls using the CNV and SNV VCF files generated by the gtc-to-vcf and copy-number call commands.
dragena star-allele call --vcf-folder /user/vcf --database /user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip --output-folder /user/star-alleles --license-server-url https://username:password@license.edicogenome.com
Use the star-allele annotate command to summarize the star alleles and add metabolizer statuses to the star alleles generated by the star-allele call command. Guidelines (CPIC or DPWG) can be specified.
dragena star-allele annotate --star-alleles star_alleles.csv --guidelines CPIC --output-folder /user/metabolizer-statuses
[Optional] Use the copy-number train command to retrain the copy number model.
dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --platform LCG --output-folder /user/productfiles/cnmodelnew
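For users who prefer to script the Quick Start sequence, the following minimal Python sketch assembles the five analysis commands as argument lists and can run them in order. It is an illustration only: the function names are hypothetical, the paths and license URL are the placeholder values from the steps above, and the location of star_alleles.csv inside the star-allele call output folder is an assumption. Only options that appear in the steps above are used.

```python
import subprocess


def build_pgx_commands(product_dir, idat_dir, work_dir, license_url, guideline="CPIC"):
    """Return the genotyping, CNV, and star allele commands as argument lists."""
    gtc = f"{work_dir}/gtc"
    vcf = f"{work_dir}/vcf"            # shared by SNV and CNV VCF files
    star = f"{work_dir}/star-alleles"
    return [
        ["dragena", "genotype", "call",
         "--bpm-manifest", f"{product_dir}/manifest.bpm",
         "--cluster-file", f"{product_dir}/clusterfile.egt",
         "--idat-folder", idat_dir,
         "--output-folder", gtc],
        ["dragena", "genotype", "gtc-to-vcf",
         "--bpm-manifest", f"{product_dir}/manifest.bpm",
         "--csv-manifest", f"{product_dir}/manifest.csv",
         "--genome-fasta-file", f"{product_dir}/genome.fa",
         "--gtc-folder", gtc,
         "--output-folder", vcf],
        ["dragena", "copy-number", "call",
         "--cn-model", f"{product_dir}/cnv_model.dat",
         "--gtc-folder", gtc,
         "--output-folder", vcf],
        ["dragena", "star-allele", "call",
         "--vcf-folder", vcf,
         "--database", f"{product_dir}/GDA_ePGx_E2_DAv1.0.0.zip",
         "--output-folder", star,
         "--license-server-url", license_url],
        ["dragena", "star-allele", "annotate",
         # Assumption: star-allele call writes star_alleles.csv to its output folder
         "--star-alleles", f"{star}/star_alleles.csv",
         "--guidelines", guideline,
         "--output-folder", f"{work_dir}/metabolizer-statuses"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_pgx_commands(
    "/user/productfiles", "/user/IDATs", "/user",
    "https://username:password@license.edicogenome.com",
)
for cmd in commands:
    print(" ".join(cmd))
# run_all(commands)  # uncomment to execute; requires dragena on the PATH
```

Passing each command as a list of arguments avoids shell quoting problems when paths contain spaces.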

Command Index

Use the following syntax when using the command-line interface:

dragena [command] [required parameters] [optional parameters]

copy-number

The root command for actions that act on copy number variants.

Command

Description

copy-number call

The command used to call copy number variants. A batch of 24 samples or more is required for analysis. For a successful analysis, 22 samples must pass QC, defined as having log R dev < 0.2.

Option

Description

copy-number help

Displays help information for a copy-number command.

copy-number train

Trains a copy number (CN) model for a set of samples. Generate a new CN model if using a customized cluster file (.egt) optimized for the specific data set.

Execute the train command using the data sets that were used to optimize the cluster file.
To use a CN model generated by the train command, the mask file for the manifest must be saved in the same directory as the manifest.
A minimum of 96 samples is required to use the copy-number train command. For optimal performance, at least 150 is recommended.
For best performance, validate the CN model using truth data before using in CN calling.

See for further details.

Option

Description

copy-number version

Displays version information for the copy-number command.

genotype

The root command for genotype calling.

Command

Description

genotype call

Determines genotype calls (GTC) from IDAT files.

Option

Description

genotype gtc-to-bedgraph

Converts GTC to BedGraph files, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.

Option

Description

genotype gtc-to-vcf

Converts GTC files to VCF files. The command is only applicable to GTC files produced by DRAGEN Array.

Option

Description

genotype help

Displays the help information for a genotype command.

genotype version

Displays current DRAGEN Array Local version.

help

Displays the help information.

version

Displays current DRAGEN Array Local version.

star-allele

The root command for PGx star allele calling.

Command

Description

star-allele help

Displays help information for a star-allele command.

star-allele version

Displays version information for star-allele.

star-allele call

Calls PGx star allele diplotypes. The SNV VCF files should be generated using the DRAGEN Array gtc-to-vcf command with unsquash-duplicates off (default) and without filter loci.

Option

Description

star-allele annotate

Option

Description

Troubleshooting and Additional Support

Tips for using the Command-line interface

When using the command-line interface, consider the following tips:

Unquoted spaces cannot be part of a file name in a command. If a file name contains spaces, enclose the file name in quotes.
To correct a typing error in a previously entered command, use the up arrow to repeat the previous command, then correct the error before re-entering it.
Double check the command. Misspellings, extra or missing dashes, and similar errors make the command unrecognizable to the software.
- When entering paths or long names, copy and paste the values to help avoid errors.
- If using Windows, use a File Explorer window to navigate to the product file or folder that is needed by the DRAGEN Array Local command. While holding down the shift button on the keyboard, right click the file and select the 'Copy as Path' option. Then paste the copied path into the command prompt to use the file or folder.
To cancel a command while it is running, press Control + C on the keyboard.
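When commands are assembled by a script rather than typed, quoting of paths with spaces can be handled automatically. The following minimal Python sketch uses the standard library `shlex.join` function (Python 3.8 or later) to build a command line for a POSIX shell such as the Linux terminal; the folder name containing a space is a made-up example.

```python
import shlex

# The product files folder name contains a space (made-up example path)
args = [
    "dragena", "genotype", "call",
    "--bpm-manifest", "/user/product files/manifest.bpm",
    "--cluster-file", "/user/product files/clusterfile.egt",
    "--idat-folder", "/user/IDATs",
    "--output-folder", "/user/gtc",
]

# shlex.join quotes only the arguments that need it
command_line = shlex.join(args)
print(command_line)
```

The quoting rules applied by `shlex` are those of POSIX shells and do not apply to the Windows command prompt.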

Optimizing cluster files and copy number models

A cluster file (.egt) contains the cluster positions of every probe used for genotyping analysis. Illumina provides a standard cluster file for all commercial Infinium BeadChips. It may be desirable to create a custom cluster file if the one provided does not fit the data well, or if a semi-custom or custom BeadChip that does not come with a cluster file is used. is the software used to create custom cluster files.

To facilitate the review and optimization of PGx variant GenTrain cluster positions, a GenomeStudio auxiliary file is provided for each PGx Array product through the and array product files page, e.g. . The auxiliary file is a tab-delimited text file that can be imported into GenomeStudio through Column Import. The file contains the Infinium Assay to PGx star allele mapping, covering the variants involved in DRAGEN Array PGx star allele calling.

When updating the cluster file for pharmacogenomic applications, understand the specifications for the copy number model file before beginning.

Before creating a custom cluster file, review the , the , and .

A CN model file (.dat) contains the data needed to make accurate copy number calls for pharmacogenomics. This file is used in the creation of CNV VCFs, which are inputs to the star allele calling command. Illumina provides a standard CN model file for all commercial PGx Infinium BeadChips. If the cluster file needs to be customized, the CN model file should also be updated using the copy-number train command, which is available with DRAGEN Array Local only. Review the for details of this command.

Pharmacogenomic analysis for semi-custom arrays

Semi-custom arrays add additional content or other pre-designed to enhance the commercial array content. This additional content can be analyzed for to obtain information on SNV and indel calls.

For , PGx CNV and star allele calls are limited to content included on the commercial Infinium PGx arrays. Additional semi-custom content will not be included in the pharmacogenomic results.

Pharmacogenomic analysis for semi-custom arrays should be run using . The genotype call, copy-number call, and star-allele call commands should all be run using the commercial Infinium PGx array product files.

Output Files

The following section describes the outputs produced by DRAGEN Array.

CNV VCF File

DRAGEN Array produces one CNV variant call file (VCF) (*.cnv.vcf) per sample to report the CN status on the gene and sub gene level, along with the CN events for PGx targets.

The CNV VCF files are by default bgzipped (Block GZIP) and have the “.gz” extension. The compression saves storage space and facilitates efficient lookup when indexed with the TBI Index File. To view these files as plain text, they can be uncompressed with from Samtools or other third-party tools. The CNV VCF must be bgzipped and indexed to be used in downstream DRAGEN Array commands, such as star allele calling.
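Because the BGZF format used by bgzip is a series of standard gzip members, bgzipped VCF files can also be read as text without uncompressing them on disk. The following minimal Python sketch uses the standard library `gzip` module to count the data records in a compressed VCF; the function name `count_vcf_records` and the file name in the usage comment are hypothetical. Reading the file this way leaves the original bgzipped and indexed file unchanged for downstream DRAGEN Array commands.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (non-header, non-empty) in a gzip- or bgzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


# Example usage (hypothetical file name):
# print(count_vcf_records("/user/vcf/204619760001_R01C01.cnv.vcf.gz"))
```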