# DRAGEN Array Local Analysis

## DRAGEN Array Local Overview <a href="#toc150786120" id="toc150786120"></a>

DRAGEN Array provides accurate, comprehensive, and efficient analysis of Infinium microarray data. The local command-line interface gives power users granular control and the flexibility to support large-scale microarray genomic studies.

## Getting Started <a href="#toc150786121" id="toc150786121"></a>

DRAGEN Array Local uses a command-line interface, which gives users full control of software functionality and makes tasks easy to automate. The software is designed for power users and bioinformaticians. If you are new to the command line, review [Command-line interface Basics](#toc150786129).

### Computing Requirements <a href="#computing_requirements" id="computing_requirements"></a>

Before downloading and installing the software, ensure the following specifications are met for best performance:

| Category         | Recommendation                                                                                                                             |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| CPU              | 8 cores                                                                                                                                    |
| Memory           | 16 GB or more available                                                                                                                    |
| Hard Drive       | 30 GB or more of free disk space                                                                                                           |
| Operating System | <p>One of the following:</p><ul><li>Windows 10 or later – win10-x64</li><li>CentOS 7 or later, Ubuntu 20.04 or later – linux-x64</li></ul> |

### Quota Specifications <a href="#toc150786123" id="toc150786123"></a>

The star-allele call command in DRAGEN Array Local requires quota to run. Quota is charged per sample analyzed and can be purchased on the [Illumina Product Page](https://www.illumina.com/products/by-type/informatics-products/dragen-array-secondary-analysis.html). Quota is consumed for every sample analyzed, including re-analyzed and low-quality samples.

Pass the credential provided in the activation email after purchase to the star-allele call command with the `--license-server-url` option. During the run, the [logs](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/output-files#toc150786153) record the remaining quota at the beginning and at the end of the analysis.

An internet connection is required for the software license check, which ensures that paid quota is available for all samples in the analysis batch. The license check uses the following endpoints:

* In v1.0 and v1.1: `license.edicogenome.com`
* In v1.2+: `license.dragen.illumina.com`

**NOTE:** Do not use `license.dragen.illumina.com` license server URLs with DRAGEN Array v1.0 or v1.1, because that domain only works with v1.2 and later.\
This is described in the [1.0.0](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/reference/release-notes/dragen-array-v1.1.0-release-notes#known-issues) and [1.1.0](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/reference/release-notes/dragen-array-v1.1.0-release-notes#known-issues) known issues.

## Installation <a href="#toc150786124" id="toc150786124"></a>

Please follow the steps below to install the software on your compute infrastructure:

1. Download the latest DRAGEN Array installation package for your platform. Installers for Windows and Linux are available on the [Illumina Support Site](https://support.illumina.com/array/array_software/dragen-array-secondary-analysis/downloads.html).\
   \
   Once the download completes, move the installation package to the desired folder. Administrative permissions may be required for system folders, for example `/usr/local/bin` on Linux and `C:\Program Files` on Windows.\
   \
   **Note**: Throughout the rest of this document, the examples assume Linux.
2. Unzip the package. After extraction, the executable is located in the `dragena` subfolder.
3. To check that the DRAGEN Array installation was successful, follow these steps:
   * Open a command prompt (Windows) or terminal (Linux).
   * \[Optional] Add `/path/to/dragena/` (for example, `/usr/local/bin/dragena-linux-x64-DAv1.1.0/dragena/`) to your PATH so that the executable can be run from any directory.
   * Run `/path/to/dragena/dragena version`, or `dragena version` if the folder was added to PATH.

If the installation succeeded, the software version is displayed in the terminal window.
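
On Linux, the optional PATH step can be scripted. The following is a minimal sketch assuming a Bash shell and the example install folder shown above; `add_to_path` and `DRAGENA_DIR` are hypothetical names, not part of DRAGEN Array. The helper only prepends the folder if it is not already on PATH, so sourcing the snippet repeatedly does not grow PATH.

```bash
#!/usr/bin/env bash
# Sketch: add the dragena folder to PATH (if not already present) and print
# the installed version. DRAGENA_DIR is the example path from this section;
# replace it with your own install location.
DRAGENA_DIR=/usr/local/bin/dragena-linux-x64-DAv1.1.0/dragena

add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;     # prepend so this copy is found first
  esac
  export PATH
}

add_to_path "$DRAGENA_DIR"
if command -v dragena >/dev/null 2>&1; then
  dragena version
else
  echo "dragena not found in $DRAGENA_DIR" >&2
fi
```

To keep the change across sessions, the `add_to_path` definition and call can be placed in a shell startup file such as `~/.bashrc`.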

## Run DRAGEN Array Local <a href="#toc150786125" id="toc150786125"></a>

Genotyping and cytogenetic analyses have no minimum number of samples.

PGx CNV analysis requires a minimum of 24 samples, and at least 22 of them must pass QC, defined as log R dev < 0.2. With the standard hardware described in [Computing Requirements](#computing_requirements), up to 500 GDA-ePGx samples can be processed per analysis batch.
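
Before starting PGx CNV calling, a batch can be checked against these minimums from a per-sample QC table. This is a minimal sketch, not a DRAGEN Array feature: it assumes a comma-separated file with one header row and no quoted commas, and it takes the log R deviation column name as an argument because the exact header used in the genotyping sample summary is an assumption here. `check_cnv_batch` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check a batch against the PGx CNV minimums (>= 24 samples in total,
# >= 22 with log R dev < 0.2). Prints "<total> <passing>" and returns 0 if
# both minimums are met, 1 if not, 2 if the column is missing.
# ASSUMPTIONS: comma-separated file, one header row, no quoted commas; the
# column name is an argument because the real header may differ.
check_cnv_batch() {
  local summary="$1" column="$2"
  awk -F',' -v col="$column" '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) exit
      next
    }
    {
      total++
      # An empty value counts as failing QC rather than as 0.
      if ($idx != "" && $idx + 0 < 0.2) pass++
    }
    END {
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      printf "%d %d\n", total, pass
      if (total >= 24 && pass >= 22) exit 0
      exit 1
    }' "$summary"
}
```

For example, `check_cnv_batch gt_sample_summary.csv "<log R dev column>"` prints the sample and passing counts; substitute the column header actually present in your file.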

To optimize performance of the targeted PGx CNV caller and minimize batch effect, it is recommended to:

* Group samples from the same assay batch (e.g., the same whole genome amplification and targeted gene amplification batch) into the same analysis batch.
* Avoid combining sample batches processed with different reagent lots.
* Analyze batches of 96 samples or more.
* To meet the size requirement of an analysis batch, samples processed within a two-week period from multiple library preparation batches can be grouped together. In such cases, use the same reagent lots and instruments throughout the workflow.
* Use the CN model and PGx database file provided with the standard product files.

## Quick Start <a href="#toc150786126" id="toc150786126"></a>

See [DRAGEN Array Applications](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/overview/our-features) for the input files to use, the sample minimums for each analysis type, and other best practices.

The command examples show analysis on a Linux system using input folders rather than sample sheets. On Windows, adapt the file paths to Windows conventions, for example by using backslashes (\\) instead of forward slashes (/). To analyze only specific samples from a folder, use a sample sheet.

**Note**: DRAGEN Array overwrites existing files when the `--output-folder` of a previous analysis is reused. To keep earlier results, use a different `--output-folder` for each re-analysis.
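
One way to avoid accidental overwrites is to create a new, timestamped output folder for every run. This is a minimal sketch assuming Bash and GNU or BSD `date`; `new_output_folder` is a hypothetical helper, not a dragena option.

```bash
#!/usr/bin/env bash
# Sketch: create a fresh, timestamped output folder for each (re-)analysis so
# earlier results are not overwritten. Prints the new folder path.
new_output_folder() {
  local base="$1"
  local dir="$base/run-$(date +%Y%m%d-%H%M%S)"
  # The final mkdir has no -p, so it fails if the folder already exists,
  # which guards against reusing a previous run's folder.
  mkdir -p "$base" && mkdir "$dir" && printf '%s\n' "$dir"
}
```

For example, `out=$(new_output_folder /user/gtc)` followed by `dragena genotype call ... --output-folder "$out"` writes each run to its own folder.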

### PGx

Use the following instructions to run the full PGx analysis, covering genotyping, PGx CNV calling, and PGx star allele calling. See [Command Index](#command_index_1) for the parameters of all commands.

1. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any directory if the executable's folder was added to the PATH environment variable.
2. Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.\
   `dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc`
3. Use the genotype gtc-to-vcf command to create SNV VCF files from the GTC files generated by the genotype call command.\
   `dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/manifest.bpm --csv-manifest /user/productfiles/manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --output-folder /user/vcf`
4. Use the pgx copy-number call command to call PGx CNVs from the GTC files and produce CNV VCF files. Write them to the same output folder as the SNV VCFs, because the star-allele call command accepts a single VCF folder containing both SNV and PGx CNV VCFs.\
   `dragena pgx copy-number call --cn-model /user/productfiles/cnv_model.dat --gtc-folder /user/gtc --output-folder /user/vcf`\
   \
   **Note**: For PGx CNV calling, it is recommended to include 96 or more samples passing LogRDev <= 0.2 in the analysis.
5. Use the pgx star-allele call command to generate star allele calls from the SNV and CNV VCF files produced by the gtc-to-vcf and copy-number call commands.\
   `dragena pgx star-allele call --vcf-folder /user/vcf --database /user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip --output-folder /user/star-alleles --license-server-url https://username:password@license.dragen.illumina.com`\
   \
   **Note**: For PGx star allele calling, it is recommended to QC the samples and review any sample with Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0 to assess the reliability of the analysis. These metrics are provided in the genotyping sample summary file (gt\_sample\_summary.csv).
6. Use the pgx star-allele annotate command to summarize the star alleles generated by the star-allele call command and add metabolizer statuses. The guideline (CPIC or DPWG) can be specified.\
   `dragena pgx star-allele annotate --star-alleles star_alleles.csv --guidelines CPIC --output-folder /user/metabolizer-statuses`
7. \[Optional] Use the pgx copy-number train command to retrain the copy number model.\
   `dragena pgx copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --platform LCG --output-folder /user/productfiles/cnmodelnew`
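
Steps 2 to 6 can be chained in a script so that the run stops at the first failing step. This is a minimal sketch using the example paths from this section. `run_pgx`, `DRAGENA`, and `LICENSE_URL` are hypothetical names, and the location of `star_alleles.csv` inside the star-allele output folder is an assumption. The optional training step is left out.

```bash
#!/usr/bin/env bash
# Sketch: PGx steps 2-6 chained into one function that stops at the first
# failing step. Paths are the example paths from this section.
# DRAGENA (hypothetical) overrides the executable, e.g. DRAGENA=echo prints
# the commands instead of running them.
# LICENSE_URL (hypothetical) holds the license server URL from the activation
# email; it is checked before any step runs.
run_pgx() {
  local dragena="${DRAGENA:-dragena}" pf=/user/productfiles
  if [ -z "${LICENSE_URL:-}" ]; then
    echo "set LICENSE_URL before running the PGx pipeline" >&2
    return 1
  fi
  "$dragena" genotype call --bpm-manifest "$pf/manifest.bpm" \
    --cluster-file "$pf/clusterfile.egt" \
    --idat-folder /user/IDATs --output-folder /user/gtc || return
  "$dragena" genotype gtc-to-vcf --bpm-manifest "$pf/manifest.bpm" \
    --csv-manifest "$pf/manifest.csv" --genome-fasta-file "$pf/genome.fa" \
    --gtc-folder /user/gtc --output-folder /user/vcf || return
  # SNV and PGx CNV VCFs go to one folder, as star-allele call expects.
  "$dragena" pgx copy-number call --cn-model "$pf/cnv_model.dat" \
    --gtc-folder /user/gtc --output-folder /user/vcf || return
  "$dragena" pgx star-allele call --vcf-folder /user/vcf \
    --database "$pf/GDA_ePGx_E2_DAv1.0.0.zip" \
    --output-folder /user/star-alleles \
    --license-server-url "$LICENSE_URL" || return
  # ASSUMPTION: star_alleles.csv is written to the star-allele output folder.
  "$dragena" pgx star-allele annotate \
    --star-alleles /user/star-alleles/star_alleles.csv \
    --guidelines CPIC --output-folder /user/metabolizer-statuses
}
```

Running `DRAGENA=echo LICENSE_URL=... run_pgx` prints the five commands for review; without `DRAGENA=echo`, the analysis runs.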

### Cytogenetics

Use the following instructions to run the full cytogenetics analysis, covering genotyping, CNV and LOH calling, and annotation. See [Command Index](#command_index_1) for the parameters of all commands.

1. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed, or to any directory if the executable's folder was added to the PATH environment variable.
2. Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.\
   `dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc`
3. Use the cyto call command to call copy number variants (CNVs) and loss of heterozygosity (LOH) from the GTC files.\
   `dragena cyto call --cn-model /user/productfiles/cyto_model.dat --gtc-folder /user/gtc --output-folder /user/vcf`
4. Use the cyto annotate command to generate JSON annotation files with gene annotations, cytogenetic bands, various QC fields, and the variant information from the VCFs.\
   `dragena cyto annotate --annotation-db /user/productfiles/CytoAnnotateData_DAv1.2.0.zip --vcf-folder /user/vcf --output-folder /user/cyto-annotations`
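
The three cytogenetics steps can likewise be wrapped in one function that stops at the first failing step. This is a minimal sketch with the example paths from this section; `run_cyto` and the `DRAGENA` override are hypothetical names. Unlike the PGx workflow, no license URL is involved here.

```bash
#!/usr/bin/env bash
# Sketch: the cytogenetics steps above chained into one function that stops
# at the first failing step. DRAGENA (hypothetical) overrides the executable,
# e.g. DRAGENA=echo prints the commands instead of running them.
run_cyto() {
  local dragena="${DRAGENA:-dragena}" pf=/user/productfiles
  "$dragena" genotype call --bpm-manifest "$pf/manifest.bpm" \
    --cluster-file "$pf/clusterfile.egt" \
    --idat-folder /user/IDATs --output-folder /user/gtc || return
  "$dragena" cyto call --cn-model "$pf/cyto_model.dat" \
    --gtc-folder /user/gtc --output-folder /user/vcf || return
  "$dragena" cyto annotate \
    --annotation-db "$pf/CytoAnnotateData_DAv1.2.0.zip" \
    --vcf-folder /user/vcf --output-folder /user/cyto-annotations
}
```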

## Command Index <a href="#command_index_1" id="command_index_1"></a>

Use the following syntax when using the command-line interface:

`dragena [module] [sub-module (pgx only)] [command] [required parameters] [optional parameters]`

### **pgx**

The root command for the pgx module.

| Command     | Description                                    |
| ----------- | ---------------------------------------------- |
| copy-number | Call and train copy number variants.           |
| star-allele | Star Allele Caller for Illumina Microarrays    |
| help        | Display more information on a specific command |
| version     | Display version information.                   |

### **pgx copy-number**

The root command for actions that act on pgx copy number variants.

| Command                 | Description                                                           |
| ----------------------- | --------------------------------------------------------------------- |
| pgx copy-number call    | Determines copy number variants given genotypes (GTC to CNV VCF).     |
| pgx copy-number help    | Displays help information for a copy-number command.                  |
| pgx copy-number train   | Trains copy number model for a set of samples (GTC to CN Model File). |
| pgx copy-number version | Displays version information for copy-number.                         |

### **pgx copy-number call**

Calls PGx copy number variants. A batch of at least 24 samples is required, and at least 22 of them must pass QC, defined as log R dev < 0.2.

| Option             | Description                                                                                                                                                                                                               |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --cn-model         | \[Required] Specifies the path to the copy number model parameters file (.dat).                                                                                                                                           |
| --gtc-folder       | <p>\[Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet.</p><p>This path also includes the contents of all subdirectories.</p> |
| --gtc-sample-sheet | \[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). The sample sheet can be in CSV or JSON format. Cannot be used with --gtc-folder.                  |
| --debug            | Includes stack traces in logs. Default is false.                                                                                                                                                                          |
| --help             | Displays help information for the copy-number call command.                                                                                                                                                               |
| --json-log         | Outputs logs in JSON format. Default is false.                                                                                                                                                                            |
| --no-bgzip         | Outputs VCFs without bgzip compression (.gz) and without tabix index files (.tbi). Default is false.                                                                                                      |
| --output-folder    | \[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.                           |
| --version          | Displays version information.                                                                                                                                                                                             |

### **pgx copy-number help**

Displays help information for a copy-number command.

### **pgx copy-number train**

Trains a PGx copy number (CN) model on a set of samples. Generate a new PGx CN model when using a customized cluster file (.egt) optimized for a specific data set.

* Run the train command on the data sets that were used to optimize the cluster file.
* To use a PGx CN model generated by the train command, the mask file for the manifest must be saved in the same directory as the manifest.
* The copy-number train command requires a minimum of 96 samples. For optimal performance, at least 150 samples are recommended.
* For best performance, validate the PGx CN model against truth data before using it for PGx CN calling.

See [Optimizing cluster files and copy number models](#optimizing_cluster_files) for further details.
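
Before training, the number of GTC files in the input folder can be compared with the 96-sample minimum and the 150-sample recommendation. This is a minimal sketch that assumes one GTC file per sample and applies no QC filter; `check_train_size` is a hypothetical helper. Like `--gtc-folder`, it includes subdirectories.

```bash
#!/usr/bin/env bash
# Sketch: count GTC files in a folder (including subfolders) and compare the
# count with the documented minimum (96) and recommendation (150).
# Returns 0 = meets recommendation, 1 = meets minimum only, 2 = below minimum.
# ASSUMPTION: one GTC file per sample; no QC filtering is applied.
check_train_size() {
  local n
  n=$(find "$1" -type f -name '*.gtc' | wc -l)
  n=$((n))   # normalize any whitespace padding from wc
  echo "$n GTC files found"
  if [ "$n" -ge 150 ]; then return 0; fi
  if [ "$n" -ge 96 ]; then return 1; fi
  return 2
}
```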

| Option                 | Description                                                                                                                                                                                                   |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --bpm-manifest         | \[Required] Specifies the path to the bead pool manifest in BPM format. Assumes mask file (.msk) is in the same directory.                                                                                    |
| --genome-fasta-file    | \[Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.                                                                                      |
| --gtc-folder           | <p>\[Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet.</p><p>This path also includes the contents of all subdirectories.</p> |
| --gtc-sample-sheet     | \[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.                                                   |
| --platform             | \[Required] Specifies which microarray platform generated the data. Set this to 'LCG' for GDA-ePGx, 'EX' for GSAv4-ePGx or GCRA-ePGx.                                                                         |
| --debug                | Includes stack traces in logs. Default is false.                                                                                                                                                              |
| --disable-genome-cache | Disables the reference genome cache.                                                                                                                                                                          |
| --help                 | Displays help information for the copy-number train command.                                                                                                                                                  |
| --json-log             | Outputs logs in JSON format. Default is false.                                                                                                                                                                |
| --version              | Displays version information.                                                                                                                                                                                 |
| --output-folder        | \[Optional] Specifies the folder where the CN model is saved. Defaults to the current working directory.                                                                                                      |

### **pgx copy-number version**

Displays version information for the pgx copy-number command.

### **genotype**

The root command for genotype calling.

| Command                  | Description                                                                                                                                    |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| genotype call            | Determines genotype calls (GTC) from IDAT files.                                                                                               |
| genotype gtc-to-bedgraph | Converts GTC to BedGraphs, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files. |
| genotype gtc-to-vcf      | Converts GTC to VCF.                                                                                                                           |
| genotype help            | Displays the help information for the genotype command.                                                                                        |
| genotype version         | Displays version information for the genotype command.                                                                                         |

### **genotype call**

Determines genotype calls (GTC) from IDAT files.

| Option              | Description                                                                                                                                                                                                                                                             |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --bpm-manifest      | \[Required] Specifies the path to the bead pool manifest in BPM format.                                                                                                                                                                                                 |
| --cluster-file      | \[Required] Specifies the path to the EGT cluster file to use.                                                                                                                                                                                                          |
| --idat-folder       | <p>\[Required] Specifies the path to the directory where all intensity data IDATs (for the samples to be processed) are located. Must be in IDAT format. Cannot be used with --idat-sample-sheet.</p><p>This path also includes the contents of all subdirectories.</p> |
| --idat-sample-sheet | \[Required] Specifies the path to a sample sheet containing paths to intensity data IDATs. Can be in CSV or JSON format. Cannot be used with --idat-folder.                                                                                                             |
| --debug             | Includes stack traces in logs. Default is false.                                                                                                                                                                                                                        |
| --gencall-cutoff    | GenCall score cutoff to label a NoCall. Default is 0.15.                                                                                                                                                                                                                |
| --help              | Displays help information for the genotype call command.                                                                                                                                                                                                                |
| --json-log          | Outputs logs in JSON format. Default is false.                                                                                                                                                                                                                          |
| --num-threads       | Number of parallel threads to run.                                                                                                                                                                                                                                      |
| --output-folder     | \[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the IDAT folder, if the IDAT folder is provided.                                                                       |
| --version           | Displays version information.                                                                                                                                                                                                                                           |

### **genotype gtc-to-bedgraph**

Converts GTC to BedGraph files, producing BedGraph formatted visualization files from the Log R Ratio and B-allele frequency data contained in the GTC intermediate files.

| Option             | Description                                                                                                                                                                                                   |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --bpm-manifest     | \[Required] Specifies the path to the bead pool manifest in BPM format.                                                                                                                                       |
| --gtc-folder       | <p>\[Required] Specifies the path to the directory where all genotype (.gtc) files are located. Cannot be used with --gtc-sample-sheet.</p><p>This path also includes the contents of all subdirectories.</p> |
| --gtc-sample-sheet | \[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.                                                   |
| --debug            | Includes stack traces in logs. Default is false.                                                                                                                                                              |
| --help             | Displays help information for the genotype gtc-to-bedgraph command.                                                                                                                                           |
| --json-log         | Outputs logs in JSON format. Default is false.                                                                                                                                                                |
| --output-folder    | \[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.               |
| --smoothing        | \[Optional] Smoothing window size, specifying the number of probes on each side of the center probe used for smoothing LRR. Default is 0.                                                                     |
| --version          | Displays version information.                                                                                                                                                                                 |

**NOTE:** The `--smoothing` option is not functioning as intended due to a bug in v1.2.0. See the [Release Notes](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/reference/release-notes/dragen-array-v1.2.0-release-notes#known-issues) for more details.

### **genotype gtc-to-vcf**

Converts GTC (v5) files to [SNV VCF Files](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/output-files#snv_vcf_file). The command only applies to [Genotype Call Files](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/output-files#genotype_call_file) produced by DRAGEN Array.

| Option                 | Description                                                                                                                                                                                                   |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --bpm-manifest         | \[Required] Specifies the path to the bead pool manifest in BPM format.                                                                                                                                       |
| --csv-manifest         | \[Required] Specifies the path to the CSV manifest with a SourceSeq column.                                                                                                                                   |
| --genome-fasta-file    | \[Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.                                                                                      |
| --gtc-folder           | <p>\[Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet.</p><p>This path also includes the contents of all subdirectories.</p> |
| --gtc-sample-sheet     | \[Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.                                                   |
| --auxiliary-loci       | Specifies the path to the VCF file with auxiliary definitions of loci, such as for multi-nucleotide variants.                                                                                                 |
| --debug                | Includes stack traces in logs. Default is false.                                                                                                                                                              |
| --disable-genome-cache | Disables the reference genome cache.                                                                                                                                                                          |
| --filter-loci          | Specifies the path to a text file containing a list of probe names to be filtered.                                                                                                                            |
| --unsquash-duplicates  | Generates unique VCF records for duplicate assays. Default is false.                                                                                                                                          |
| --help                 | Displays help information for the genotype gtc-to-vcf command.                                                                                                                                                |
| --json-log             | Outputs logs in JSON format. Default is false.                                                                                                                                                                |
| --no-bgzip             | Outputs VCFs without bgzip compression (.gz) and without tabix index files (.tbi). Default is false.                                                                                                          |
| --output-folder        | \[Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.               |
| --version              | Displays version information.                                                                                                                                                                                 |

#### Squashing duplicates

In the manifest, the same variant can be probed by multiple assays, which may be identical designs or alternate designs for the same locus. By default, these duplicates are "squashed" into a single VCF record so that the record reflects the variant rather than an individual probe genotype. The method used to combine information across multiple assays is described in the [VCF description](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/output-files#snv_vcf_file). With the `--unsquash-duplicates` option, squashing is disabled and each duplicate assay is reported as a separate VCF record. This option is useful for investigating or validating the performance of individual assays rather than generating genotypes for specific variants. Duplicates of a multi-allelic locus (more than two alleles) are not unsquashed.

**DO NOT** use the `--unsquash-duplicates` option if you plan to run star allele calling downstream, because that command expects squashed variants.

#### Genome cache

By default, the entire reference genome is read into memory. This is generally more efficient than reading data from the indexed reference on disk, at the expense of greater memory use. When genome caching is not desirable (for example, when memory is limited or the input manifest is small), you can disable it with the `--disable-genome-cache` option.
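If you script `gtc-to-vcf` runs, both toggles can be exposed as options. The minimal Python sketch below builds the argument list from the flags named in this guide. It assumes that `--disable-genome-cache` and `--unsquash-duplicates` are bare switches that take no value. The helper name and file paths are hypothetical.

```python
from typing import List


def gtc_to_vcf_args(bpm: str, csv: str, fasta: str, gtc_folder: str,
                    output_folder: str,
                    disable_genome_cache: bool = False,
                    unsquash_duplicates: bool = False) -> List[str]:
    """Build an argument list for `dragena genotype gtc-to-vcf` (hypothetical helper)."""
    args = ["dragena", "genotype", "gtc-to-vcf",
            "--bpm-manifest", bpm,
            "--csv-manifest", csv,
            "--genome-fasta-file", fasta,
            "--gtc-folder", gtc_folder,
            "--output-folder", output_folder]
    if disable_genome_cache:
        # Read reference sequence from the indexed FASTA on disk instead of
        # caching the whole genome in memory (lower memory use, usually slower).
        args.append("--disable-genome-cache")
    if unsquash_duplicates:
        # One VCF record per duplicate assay; do not use before star allele calling.
        args.append("--unsquash-duplicates")
    return args


low_memory_args = gtc_to_vcf_args(
    "/user/productfiles/manifest.bpm", "/user/productfiles/manifest.csv",
    "/user/productfiles/genome.fa", "/user/gtcs", "/user/vcfs",
    disable_genome_cache=True)
```

The list can be passed directly to `subprocess.run(low_memory_args, check=True)`, which avoids shell quoting issues.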

#### Auxiliary loci

Certain classes of variant types (such as multi-nucleotide variants) are not currently supported in the upstream analysis software that produces GTC files. However, it is possible to query this type of variant by creating a SNP design that differentiates the specific multi-nucleotide alleles of interest. For example, if the true source sequence is:

ATGC\[AT/CG]GTAA

This assay could be designed as a SNP assay with the following source sequence:

ATGC\[A/C]NNNN

`gtc-to-vcf` provides an option (`--auxiliary-loci`) to supply a list of auxiliary records (in VCF format) that restore the true alleles for these cases in the output VCF. This option has the following restrictions:

* The auxiliary definition must NOT be a multi-allelic variant.
* The auxiliary definition must be a multi-nucleotide variant.
* There must NOT be multiple array assays (e.g., duplicates) for the locus.
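Before building an auxiliary file, each definition can be checked against the three restrictions above. The Python sketch below does that and formats a tab-separated VCF data line for the `AT/CG` example. It assumes that a multi-nucleotide variant means a single REF and ALT of equal length greater than one. The position, record ID, helper names, and the `.` placeholders for QUAL, FILTER, and INFO are hypothetical. Which fields `gtc-to-vcf` actually uses to match an auxiliary record to its array assay is not stated here.

```python
from typing import List


def check_auxiliary_record(ref: str, alts: List[str], n_assays_at_locus: int) -> List[str]:
    """Return restriction violations for one auxiliary definition (empty list = OK)."""
    problems = []
    if len(alts) != 1:
        problems.append("multi-allelic definitions are not allowed")
    elif not (len(ref) > 1 and len(ref) == len(alts[0])):
        # Assumed MNV definition: equal-length REF/ALT longer than one base.
        problems.append("definition must be a multi-nucleotide variant")
    if n_assays_at_locus > 1:
        problems.append("locus is probed by multiple array assays")
    return problems


def auxiliary_vcf_line(chrom: str, pos: int, record_id: str, ref: str, alt: str) -> str:
    """Minimal VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO."""
    return "\t".join([chrom, str(pos), record_id, ref, alt, ".", ".", "."])


# ATGC[AT/CG]GTAA -> REF "AT", ALT "CG" at a hypothetical position.
if not check_auxiliary_record("AT", ["CG"], n_assays_at_locus=1):
    print(auxiliary_vcf_line("chr1", 1005, "mnv_example", "AT", "CG"))
```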

**Note:** The genome fasta files for human genomes are provided by Illumina on the [support site](https://support.illumina.com/array/array_software/dragen-array-secondary-analysis/downloads.html).

### **genotype help**

Displays the help information for a genotype command.

### **genotype version**

Displays current DRAGEN Array Local version.

### **help**

Displays the first-layer help information.

### **version**

Displays current DRAGEN Array Local version.

### **pgx star-allele**

The root command for PGx star allele calling.

| Command                  | Description                                          |
| ------------------------ | ---------------------------------------------------- |
| pgx star-allele call     | Determines PGx star allele and variant genotypes.    |
| pgx star-allele annotate | Annotates PGx gene functions and produces a JSON report. |
| pgx star-allele help     | Displays help information for a star allele command. |
| pgx star-allele version  | Displays version information for star allele.        |

### **pgx star-allele help**

Displays help information for a star-allele command.

### **pgx star-allele version**

Displays version information for star-allele.

### **pgx star-allele call**

Calls PGx star allele diplotypes. The SNV VCF files should be generated using the DRAGEN Array `gtc-to-vcf` command with `--unsquash-duplicates` turned off (the default) and without filter loci.

| Option                | Description                                                                                                                             |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| --database            | \[Required] The PGx database file (.zip).                                                                                               |
| --license-server-url  | \[Required] The license server URL with credentials.                                                                                    |
| --vcf-folder          | \[Required] The directory containing \*.snv.vcf.gz and \*.cnv.vcf.gz files.                                                             |
| --query-license-quota | Queries the license server for the quotas on the valid license(s) at the beginning and end of the analysis and displays the result.     |
| --debug               | Includes stack traces in logs. Default is false.                                                                                        |
| --help                | Displays help information for the star-allele call command.                                                                             |
| --json-log            | Outputs logs in JSON format. Default is false.                                                                                          |
| --output-folder       | \[Optional] Directory path to output files. Default is the current working directory.                                                   |
| --version             | Displays version information.                                                                                                           |
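`star-allele call` reads both file types from a single `--vcf-folder`, so checking for unpaired files first can prevent a failed run. The Python sketch below assumes that the SNV and CNV VCFs for a sample share the same file-name prefix, for example `sample1.snv.vcf.gz` and `sample1.cnv.vcf.gz`. That naming convention and the helper name `pair_vcfs` are assumptions, not documented behavior.

```python
from pathlib import Path
from typing import Dict, List

SNV_SUFFIX = ".snv.vcf.gz"
CNV_SUFFIX = ".cnv.vcf.gz"


def pair_vcfs(vcf_folder: str) -> Dict[str, List[str]]:
    """Report which sample prefixes have both, or only one, of the two VCF types."""
    folder = Path(vcf_folder)
    snv = {p.name[: -len(SNV_SUFFIX)] for p in folder.glob("*" + SNV_SUFFIX)}
    cnv = {p.name[: -len(CNV_SUFFIX)] for p in folder.glob("*" + CNV_SUFFIX)}
    return {
        "paired": sorted(snv & cnv),
        "missing_cnv": sorted(snv - cnv),  # SNV VCF present, CNV VCF absent
        "missing_snv": sorted(cnv - snv),  # CNV VCF present, SNV VCF absent
    }
```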

### **pgx star-allele annotate**

Annotates and summarizes the star alleles, specifically their metabolizer statuses, and outputs them in a consolidated JSON report. Metabolizer status is determined by direct lookup in the public PGx guidelines (CPIC or DPWG) specified by the user.

| Option          | Description                                                                                  |
| --------------- | -------------------------------------------------------------------------------------------- |
| --star-alleles  | \[Required] Path to star alleles file (.csv) generated by the call subcommand.               |
| --guidelines    | PGx guidelines to use for annotation. Valid values are ‘CPIC’ and ‘DPWG’. Default is ‘CPIC’. |
| --debug         | Includes stack traces in logs. Default is false.                                             |
| --help          | Displays help information for the star-allele annotate command.                              |
| --json-log      | Outputs logs in JSON format. Default is false.                                               |
| --output-folder | \[Optional] Directory path to output files. Default is the current working directory.        |
| --version       | Displays version information.                                                                |

### **cyto**

The root command for cytogenetics CNV/LOH calling and annotation.

| Command       | Description                                                                 |
| ------------- | --------------------------------------------------------------------------- |
| cyto call     | Determines copy number variants and loss of heterozygosity given genotypes. |
| cyto annotate | Annotates samples and generates cytogenetics JSON reports.                  |
| cyto help     | Displays more information on a specific command.                            |
| cyto version  | Displays version information.                                               |

### **cyto help**

Displays more information on a specific command.

### **cyto version**

Displays version information.

### **cyto call**

Determines copy number variants (CNV) and loss/absence of heterozygosity (LOH/AOH) given genotypes.

| Option             | Description                                                                                                                                  |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| --cn-model         | \[Required] Path to cyto model parameters file (.dat).                                                                                       |
| --gtc-folder       | \[Required] Folder containing genotype files (.gtc). Cannot be used in conjunction with --gtc-sample-sheet.                                  |
| --gtc-sample-sheet | \[Required] Sample sheet with paths to genotype files (.gtc), can be in CSV or JSON format. Cannot be used in conjunction with --gtc-folder. |
| --debug            | Logs will include stack traces. Default is false.                                                                                            |
| --help             | Display this help screen.                                                                                                                    |
| --json-log         | Logs will be output in JSON format. Default is false.                                                                                        |
| --no-bgzip         | VCFs are not bgzip compressed (.gz) and no tabix index files (.tbi) are output. Default is false.                                            |
| --output-folder    | \[Optional] Directory path to output files. Default is the current working directory.                                                        |
| --version          | Displays version information.                                                                                                                |
| --min-cnv-probes   | Minimum CNV size (probes). Default is 10.                                                                                                    |
| --min-cnv-size     | Minimum CNV size (kb). Default is 0.                                                                                                         |
| --min-loh-probes   | Minimum LOH size (probes). Default is 500.                                                                                                   |
| --min-loh-size     | Minimum LOH size (kb). Default is 3000.                                                                                                      |
| --smoothing        | Smoothing window size, specifying the number of probes on each side of the center probe used for smoothing LRR values. Default is 5.         |

#### Notes

* More than 10 events (DEL/DUP/AOH) on a single chromosome indicates that the sample needs visual inspection.
* A LogRDev greater than 0.2 indicates a low-quality sample.
* VCFs generated on Windows cannot be manually uploaded to [Emedgene](https://help.connected.illumina.com/emedgene). Only VCFs generated on Linux can be uploaded.
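The first two rules of thumb above can be applied automatically once the per-sample values are available. The minimal Python sketch below takes a sample's LogRDev and its called events as `(chromosome, event type)` pairs. How those values are extracted from the `cyto call` output is outside the scope of this sketch, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

EVENT_TYPES = {"DEL", "DUP", "AOH"}


def cyto_review_flags(log_r_dev: float,
                      events: Iterable[Tuple[str, str]],
                      max_events_per_chrom: int = 10,
                      max_log_r_dev: float = 0.2) -> List[str]:
    """Return human-readable review flags for one sample."""
    flags = []
    if log_r_dev > max_log_r_dev:
        flags.append(f"low-quality sample: LogRDev {log_r_dev} > {max_log_r_dev}")
    # Count only DEL/DUP/AOH events per chromosome.
    per_chrom = Counter(chrom for chrom, kind in events if kind in EVENT_TYPES)
    for chrom, n in sorted(per_chrom.items()):
        if n > max_events_per_chrom:
            flags.append(f"{chrom}: {n} events, inspect visually")
    return flags
```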

### **cyto annotate**

Annotates samples and generates cytogenetic JSON reports.

| Option           | Description                                                                           |
| ---------------- | ------------------------------------------------------------------------------------- |
| --debug          | Logs will include stack traces. Default is false.                                     |
| --help           | Display this help screen.                                                             |
| --json-log       | Logs will be output in JSON format. Default is false.                                 |
| --annotation-db  | \[Required] Database for variant annotations.                                         |
| --vcf-folder     | \[Required] The directory containing the \*.cnv.vcf.gz files.                         |
| --output-folder  | \[Optional] Directory path to output files. Default is the current working directory. |
| --version        | Displays version information.                                                         |
| --min-del-probes | Minimum deletion CNV size (probes). Default is 10.                                    |
| --min-del-size   | Minimum deletion CNV size (kb). Default is 0.                                         |
| --min-dup-probes | Minimum duplication CNV size (probes). Default is 10.                                 |
| --min-dup-size   | Minimum duplication CNV size (kb). Default is 0.                                      |
| --min-loh-probes | Minimum LOH size (probes). Default is 500.                                            |
| --min-loh-size   | Minimum LOH size (kb). Default is 3000.                                               |
| --min-qual       | Minimum CNV and LOH quality score. Default is 20.                                     |
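The Python sketch below shows one plausible reading of how the default size and quality thresholds above apply to individual events. It assumes that an event is kept only if it meets every threshold for its type, and that the comparisons are inclusive (`>=`). The tool's exact comparison semantics are not stated in this guide. The `CytoEvent` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CytoEvent:
    kind: str       # "DEL", "DUP" or "LOH"
    probes: int
    size_kb: float
    qual: float


# Defaults taken from the cyto annotate option table.
MIN_PROBES_AND_KB = {
    "DEL": (10, 0.0),      # --min-del-probes, --min-del-size
    "DUP": (10, 0.0),      # --min-dup-probes, --min-dup-size
    "LOH": (500, 3000.0),  # --min-loh-probes, --min-loh-size
}
MIN_QUAL = 20.0            # --min-qual


def passes_thresholds(ev: CytoEvent) -> bool:
    """Assumed rule: keep an event only if it meets every threshold (inclusive)."""
    min_probes, min_kb = MIN_PROBES_AND_KB[ev.kind]
    return ev.probes >= min_probes and ev.size_kb >= min_kb and ev.qual >= MIN_QUAL
```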

#### Notes

* The metadata file `cyto.cnv.dat`, which `cyto call` writes alongside the VCFs, must remain in the folder passed to `--vcf-folder` for `cyto annotate`.
* `cyto annotate` requires bgzip-compressed, tabix-indexed VCFs. Do not use the `--no-bgzip` flag with `cyto call` if its VCF files will be used by the `cyto annotate` command.
* The `cyto annotate` step needs at least 5 GB of free disk space.
* You can safely ignore log messages that say `No credential is provided`. This is a known issue described in the [1.2.0 release notes](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/reference/release-notes/dragen-array-v1.2.0-release-notes#known-issues).
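The notes above can be checked before `cyto annotate` is launched. The Python sketch below verifies that `cyto.cnv.dat` is present, that each `*.cnv.vcf.gz` has a `.tbi` index next to it, and that enough disk space is free. It assumes that the 5 GB requirement applies to the drive holding the VCF folder, and it treats 5 GB as 5 GiB. The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

MIN_FREE_BYTES = 5 * 1024**3  # "at least 5 GB", interpreted here as GiB


def cyto_annotate_preflight(vcf_folder: str, min_free_bytes: int = MIN_FREE_BYTES) -> List[str]:
    """Return a list of problems that would likely break cyto annotate."""
    folder = Path(vcf_folder)
    problems = []
    if not (folder / "cyto.cnv.dat").is_file():
        problems.append("cyto.cnv.dat is missing from the VCF folder")
    vcfs = sorted(folder.glob("*.cnv.vcf.gz"))
    if not vcfs:
        problems.append("no *.cnv.vcf.gz files found (was --no-bgzip used?)")
    for vcf in vcfs:
        # Tabix indexes are written next to the VCF as <name>.tbi.
        if not vcf.with_name(vcf.name + ".tbi").is_file():
            problems.append(f"missing tabix index for {vcf.name}")
    free = shutil.disk_usage(folder).free
    if free < min_free_bytes:
        problems.append(f"only {free / 1024**3:.1f} GiB free; at least 5 GB needed")
    return problems
```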

## Troubleshooting and Additional Support <a href="#toc150786128" id="toc150786128"></a>

### Tips for using the Command-line interface <a href="#toc150786129" id="toc150786129"></a>

DRAGEN Array Local utilizes a command-line interface which allows full user control of software functionality and easy automation of tasks. The software is designed to be used by power users and bioinformaticians.

When using the command line, consider the following tips:

* Spaces cannot be part of an unquoted file name in a command. If a file name or path contains spaces, enclose it in quotes.
* To correct a typing error in a previously entered command, use the up arrow to repeat the previous command, then correct the error before re-entering it.
* Double-check the command. Misspellings, extra or missing dashes, and similar errors prevent the software from recognizing the command.
  * When entering paths or long names, copy and paste the values to help avoid errors.
  * If using Windows, use a File Explorer window to navigate to the product file or folder needed by the DRAGEN Array Local command. While holding down the Shift key, right-click the file and select the 'Copy as Path' option. Then paste the copied path into the command prompt.
* To cancel a command while it is running, press Control + C on the keyboard.
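If commands are generated by a script, quoting can be handled automatically. The Python sketch below uses the standard-library `shlex.join` (Python 3.8+) to print a copy-pasteable POSIX shell command in which a path containing a space is quoted. The `My IDATs` folder name is a hypothetical example. The quoting rules differ for the Windows command prompt, where double quotes are used instead. Passing the list directly to `subprocess.run` needs no quoting at all.

```python
import shlex
from typing import List


def show_command(args: List[str]) -> str:
    """Render an argument list as a POSIX shell command with safe quoting."""
    return shlex.join(args)


args = ["dragena", "genotype", "call",
        "--bpm-manifest", "/user/productfiles/manifest.bpm",
        "--cluster-file", "/user/productfiles/new_clusterfile.egt",
        "--idat-folder", "/user/My IDATs",  # hypothetical folder name with a space
        "--output-folder", "/user/new_gtcs"]
print(show_command(args))
```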

### Optimizing cluster files and copy number models <a href="#optimizing_cluster_files" id="optimizing_cluster_files"></a>

A [Cluster File](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/input-files#toc150786136) (.egt) contains the cluster positions of every probe used for genotyping analysis. Illumina provides a standard cluster file for all commercial Infinium BeadChips. A custom cluster file may be desirable if the provided one does not fit the data well, or if a semi-custom or custom BeadChip that does not come with a cluster file is used. [GenomeStudio 2.0](https://www.illumina.com/techniques/microarrays/array-data-analysis-experimental-design/genomestudio.html) is the software used to create custom cluster files.

To facilitate the review and optimization of PGx variant GenTrain cluster positions, a GenomeStudio auxiliary file is provided for each PGx Array product through the [DRAGEN Array Support Site](https://support.illumina.com/array/array_software/dragen-array-secondary-analysis.html) and array product files page, e.g. [Infinium Global Diversity Array with Enhanced PGx Product Files](https://support.illumina.com/array/array_kits/infinium-global-diversity-pgx/product-files.html). The auxiliary file is a tab-delimited text file that can be imported into GenomeStudio through Column Import. The file contains the Infinium Assay to PGx star allele mapping, covering the variants involved in DRAGEN Array PGx star allele calling.

When updating the cluster file for pharmacogenomic applications, review the specifications for the copy number model file before beginning.

Before creating a custom cluster file, review the [Infinium Genotyping Data Analysis Technical Note](https://www.illumina.com/Documents/products/technotes/technote_infinium_genotyping_data_analysis.pdf), the [Infinium Arrays Support Webinar Video](https://youtu.be/4JTrbMUbVN0?si=ZgRDLwN6umGBhv2G), and [Custom cluster file creation for improved copy number analysis](https://www.illumina.com/content/dam/illumina/gcs/assembled-assets/marketing-literature/custom-cluster-file-tech-note-m-gl-02142/custom-cluster-file-tech-note-m-gl-02142.pdf).

A [PGx Copy Number (CN) Model File](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/input-files#cn_model_file) (.dat) contains the data needed to make accurate copy number calls for pharmacogenomics. This file is used to create the CNV VCFs that are inputs to the star allele calling command. Illumina provides a standard CN model file for all commercial PGx Infinium BeadChips. If the cluster file needs to be customized, the CN model file should also be updated using the `copy-number train` command, which is available only in DRAGEN Array Local:

1. Use GenomeStudio 2.0 to generate a new cluster file.
2. Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.\
   `dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/new_clusterfile.egt --idat-folder /user/IDATs --output-folder /user/new_gtcs`
3. Use the copy-number train command to retrain the copy number model. **Note: The value for the --platform option can be found in the `Assay Format` header value of the CSV manifest.**\
   `dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/new_gtcs --platform LCG --output-folder /user/productfiles/new_cnmodel`
4. Use the `new_cnmodel` for subsequent `copy-number call` commands.
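Steps 2 and 3 are linked: the GTC output folder of `genotype call` becomes the `--gtc-folder` input of `copy-number train`. The Python sketch below builds both commands with the flags shown in those steps, prints them in dry-run mode, and otherwise runs them in order, stopping at the first failure. The helper names are hypothetical. `LCG` is only the example platform value from step 3, so take the real value from your CSV manifest.

```python
import subprocess
from typing import List


def retrain_commands(bpm: str, egt: str, fasta: str, idat_folder: str,
                     gtc_folder: str, platform: str, cnmodel_folder: str) -> List[List[str]]:
    """Steps 2 and 3 as argument lists; step 2's output feeds step 3."""
    return [
        ["dragena", "genotype", "call",
         "--bpm-manifest", bpm, "--cluster-file", egt,
         "--idat-folder", idat_folder, "--output-folder", gtc_folder],
        ["dragena", "copy-number", "train",
         "--bpm-manifest", bpm, "--genome-fasta-file", fasta,
         "--gtc-folder", gtc_folder, "--platform", platform,
         "--output-folder", cnmodel_folder],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


cmds = retrain_commands(
    "/user/productfiles/manifest.bpm", "/user/productfiles/new_clusterfile.egt",
    "/user/productfiles/genome.fa", "/user/IDATs", "/user/new_gtcs",
    "LCG", "/user/productfiles/new_cnmodel")
run_all(cmds)  # dry run: prints the two commands
```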

Note the difference in the cluster file requirement based upon the version of DRAGEN Array used:

* **Version 1.1**: If using a CN model with a different cluster file, the software will provide a warning but will proceed with copy number calling. As a result, a user can choose to keep using the commercial CN model from Illumina in combination with a custom updated EGT file in the PGx analysis.
* **Version 1.0**: The same cluster file used for copy number training must be used to generate GTC files for copy number calling. Otherwise, the software will produce an error and exit.

For reference, see the [Command Index](#command_index_1) for details of `copy-number train` command.

To retrain the CN model file, at least 96 samples must be used, and at least 90 of those samples must pass QC, defined as a Log R Dev less than or equal to 0.2. Training with at least 150 samples is recommended. Larger sample sets can be advantageous, but returns diminish and computation times grow beyond 3,000 samples.

It is recommended to manually QC the training samples and remove samples that have Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0, so that only the highest-quality samples are used in training. The same samples used to create the new cluster file should be used to retrain the CN model. To minimize batch effects in the training sample set, analyze the samples in as few batches as possible and use the same reagent lots.
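The sample-count rules and recommended QC filters above can be checked with a short script before training. The Python sketch below applies them to per-sample metrics that you would export yourself, for example from GenomeStudio. The `SampleQC` fields, including `tga_control` for the TGA Control probe value, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleQC:
    name: str
    log_r_dev: float
    call_rate: float
    tga_control: float  # hypothetical field for the TGA Control probe value


def recommended_filter(samples: List[SampleQC]) -> List[SampleQC]:
    """Keep samples with Log R Dev <= 0.2, call rate >= 0.99 and TGA Control >= 1.0."""
    return [s for s in samples
            if s.log_r_dev <= 0.2 and s.call_rate >= 0.99 and s.tga_control >= 1.0]


def check_training_set(samples: List[SampleQC]) -> List[str]:
    """Check the minimum (required) and recommended training-set sizes."""
    messages = []
    n_pass = sum(1 for s in samples if s.log_r_dev <= 0.2)
    if len(samples) < 96:
        messages.append(f"error: {len(samples)} samples, at least 96 required")
    if n_pass < 90:
        messages.append(f"error: {n_pass} samples pass Log R Dev <= 0.2, at least 90 required")
    if len(samples) < 150:
        messages.append(f"warning: {len(samples)} samples, at least 150 recommended")
    return messages
```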

The copy-number train algorithm is designed with the assumption that the copy number distribution resembles the standard population distributions. This ensures the updated CN model file is representative of the normal populations in which it will be used to calculate copy number for key pharmacogenomic targets.

### Pharmacogenomic analysis for semi-custom arrays <a href="#toc150786131" id="toc150786131"></a>

Semi-custom arrays add custom content or other pre-designed [Infinium booster content](https://www.illumina.com/science/consortia/human-consortia.html) to enhance the commercial array content. This additional content can be analyzed for [genotyping applications](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/overview/our-features#toc150786108) to obtain SNV and indel calls.

For [pharmacogenomic applications](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/overview/our-features#toc150786109), PGx CNV and star allele calls are limited to content included on the commercial Infinium PGx arrays. Additional semi-custom content will not be included in the pharmacogenomic results.

When designing a semi-custom array on a commercial Infinium PGx array backbone, such as the Global Diversity Array with Enhanced PGx, retain all backbone content in the design, because removing content could decrease the quality of the results.

Pharmacogenomic analysis for semi-custom arrays should be run using [DRAGEN Array Local](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/product-guides/dragen-array-local-analysis). The PGx CNV calling and PGx star allele calling algorithms are only compatible with commercial product files (see [Applications](https://help.connected.illumina.com/dragen-array/dragen-array-v1.2/overview/our-features)). To fully analyze semi-custom PGx BeadChips, some steps of the pipeline are therefore run twice: once with the semi-custom product files (to get complete semi-custom SNV VCF files), and once with the commercial product files (to get the PGx CNV VCF files, PGx star allele output, and metabolizer report).

The semi-custom product files can be used with the `genotype call` and `genotype gtc-to-vcf` commands on the command line, and in GenomeStudio:

1. Use GenomeStudio 2.0 to prepare a custom cluster file for the semi-custom array, following the guidance in [Custom cluster file creation for improved copy number analysis](https://www.illumina.com/content/dam/illumina/gcs/assembled-assets/marketing-literature/custom-cluster-file-tech-note-m-gl-02142/custom-cluster-file-tech-note-m-gl-02142.pdf).
2. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed. If the executable was added to the PATH environment variable, any other working directory can be used instead.
3. Use the genotype call command to call all semi-custom genotypes and generate custom content GTC files using IDAT files as input.\
   `dragena genotype call --bpm-manifest /user/productfiles/semi_custom_manifest.bpm --cluster-file /user/productfiles/semi_custom_clusterfile.egt --idat-folder /user/IDATs --output-folder /user/semi_custom_gtcs`
4. Use the genotype gtc-to-vcf command to create custom content SNV VCF files from the custom content GTC files generated by the genotype call command.\
   `dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/semi_custom_manifest.bpm --csv-manifest /user/productfiles/semi_custom_manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/semi_custom_gtcs --output-folder /user/semi_custom_vcfs`
5. Perform [Quick Start](#_toc150786126) steps 1-6 using the **commercial** Infinium PGx array product files to obtain PGx CNV VCFs, star allele calls, and metabolizer status annotations.

Keep the GTC files and SNV VCF files generated using the semi-custom product files in clearly labelled folders to distinguish them from the GTC and SNV VCF files generated using the commercial product files. Note that the GTC and SNV VCFs generated using the commercial product files will not contain genotypes for the semi-custom/add-on content. The GTC and SNV VCFs generated using the semi-custom product files cannot be used for downstream PGx analysis commands.
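One way to keep the two sets of outputs apart is to fix the folder layout in a script and reject semi-custom folders as PGx inputs. The Python sketch below shows this approach. The folder names under the project root and the helper names are hypothetical labelling choices, not a layout that DRAGEN Array requires.

```python
from pathlib import Path
from typing import Dict


def output_layout(project_root: str) -> Dict[str, Path]:
    """Hypothetical, clearly labelled folders for the two runs."""
    root = Path(project_root)
    return {
        "semi_custom_gtcs": root / "semi_custom" / "gtcs",
        "semi_custom_vcfs": root / "semi_custom" / "snv_vcfs",
        "commercial_gtcs": root / "commercial_pgx" / "gtcs",
        "commercial_vcfs": root / "commercial_pgx" / "vcfs",
    }


def check_pgx_input(folder: str, layout: Dict[str, Path]) -> None:
    """Raise if a folder from the semi-custom run is passed to a PGx command."""
    semi_root = layout["semi_custom_gtcs"].parent.resolve()
    path = Path(folder).resolve()
    if path == semi_root or semi_root in path.parents:
        raise ValueError(f"{folder} holds semi-custom outputs; "
                         "use files generated with the commercial product files")
```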
