# DNA Germline Panel UMI

A DRAGEN recipe is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. This recipe lists the recommended pipeline-specific command-line options. Some default parameters are included for clarity and are marked with comments.

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN graph hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
# Inputs 
--fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional with BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-sort true                      #default=true 
# UMI 
--umi-enable true 
--umi-source STRING                     #Default='qname' 
--umi-library-type STRING               #e.g. random-duplex 
--umi-metrics-interval-file $BED 
--remove-duplicates false 
--umi-min-supporting-reads 1            #Default=2 
# Small variant caller 
--enable-variant-caller true 
--vc-target-bed $VC_TARGET_BED 
# Annotation 
--variant-annotation-data PATH 
--variant-annotation-assembly GRCh37/8 
--enable-variant-annotation true 
# SV 
--enable-sv true 
--sv-exome true 
--sv-call-regions-bed $SV_TARGET_BED 
# CNV 
--enable-cnv true 
--cnv-target-bed $PATH 
--cnv-combined-counts $PATH             #CNV PON 
# HLA genotyper 
--enable-hla true 
--hla-enable-class-2 true               #optional if assay covers class II HLA regions 
--hla-as-filter-min-threshold 29.0      #panel specific setting 
--hla-as-filter-ratio-threshold 0.85    #panel specific setting 
```
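The listing above annotates each option with a trailing comment and has no line continuations, so it reads as a reference rather than a paste-ready command. The following is a minimal sketch of how a subset of it might be assembled into a single invocation. Every path and ID is a hypothetical placeholder, `random-duplex` is only the example library type from the listing, and the function prints the command instead of running it so that the sketch works on a machine without DRAGEN.

```shell
#!/bin/sh
# Sketch only: all values below are hypothetical placeholders, not recommendations.
DRAGEN_BIN="${DRAGEN_BIN:-/opt/dragen/VERSION/bin/dragen}"
REF_DIR="${REF_DIR:-/path/to/graph_hashtable}"
OUTPUT="${OUTPUT:-/path/to/output}"
PREFIX="${PREFIX:-sample1}"
FASTQ_LIST="${FASTQ_LIST:-/path/to/fastq_list.csv}"
SAMPLE_ID="${SAMPLE_ID:-sample1}"
UMI_BED="${UMI_BED:-/path/to/umi_metrics.bed}"
VC_TARGET_BED="${VC_TARGET_BED:-/path/to/vc_targets.bed}"

# Print the assembled command on one line.
# Comments cannot follow a backslash continuation, so they are kept up here.
# To run DRAGEN for real, drop the leading `echo`.
print_umi_panel_cmd() {
    echo "$DRAGEN_BIN" \
        --ref-dir "$REF_DIR" \
        --output-directory "$OUTPUT" \
        --output-file-prefix "$PREFIX" \
        --fastq-list "$FASTQ_LIST" \
        --fastq-list-sample-id "$SAMPLE_ID" \
        --enable-map-align true \
        --umi-enable true \
        --umi-library-type random-duplex \
        --umi-metrics-interval-file "$UMI_BED" \
        --remove-duplicates false \
        --umi-min-supporting-reads 1 \
        --enable-variant-caller true \
        --vc-target-bed "$VC_TARGET_BED"
}

print_umi_panel_cmd
```

The annotation, SV, CNV, and HLA options from the listing can be appended in the same way once their input files are available.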

## Notes and additional options

### Hashtable

For DRAGEN germline runs, it is recommended to use the graph hashtable.

See: [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html)

### Input options

DRAGEN accepts input as a FASTQ list, FASTQ files, BAM, or CRAM.

FQ list Input

```
--fastq-list $PATH 
--fastq-list-sample-id $STRING 
```
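As an illustration of what the FASTQ list might contain, the sketch below writes a small CSV and counts the rows that belong to one sample. The column layout (`RGID,RGSM,RGLB,Lane,Read1File,Read2File`) is an assumption based on the DRAGEN FASTQ list format, as is the reading that `--fastq-list-sample-id` selects rows by their `RGSM` value; confirm both against the product guide for your DRAGEN version. All IDs and file paths are hypothetical.

```shell
#!/bin/sh
# Sketch: write a FASTQ list with two samples and count the rows for one of them.
# IDs and paths are hypothetical; the column layout is an assumption (see text).
WORKDIR="$(mktemp -d)"
FASTQ_LIST="$WORKDIR/fastq_list.csv"

cat > "$FASTQ_LIST" <<'EOF'
RGID,RGSM,RGLB,Lane,Read1File,Read2File
FC1.1,SampleA,LibA,1,/data/SampleA_L001_R1.fastq.gz,/data/SampleA_L001_R2.fastq.gz
FC1.2,SampleA,LibA,2,/data/SampleA_L002_R1.fastq.gz,/data/SampleA_L002_R2.fastq.gz
FC1.3,SampleB,LibB,1,/data/SampleB_L001_R1.fastq.gz,/data/SampleB_L001_R2.fastq.gz
EOF

# Count the data rows whose RGSM column (field 2) equals the given sample ID.
rows_for_sample() {
    awk -F, -v id="$2" 'NR > 1 && $2 == id { n++ } END { print n + 0 }' "$1"
}

rows_for_sample "$FASTQ_LIST" SampleA
```

A count of zero for the ID passed to `--fastq-list-sample-id` is a quick hint that the ID and the list do not agree.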

FQ Input

```
--fastq-file1 $PATH 
--fastq-file2 $PATH 
--RGSM $STRING 
--RGID $STRING 
```

BAM Input

```
--bam-input $PATH 
```

CRAM Input

```
--cram-input $PATH 
```

### Mapping and Aligning

| Option                           | Description                                                                                          |
| -------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `--enable-map-align true`        | Optionally disable map & align (default=true).                                                       |
| `--enable-map-align-output true` | Optionally save the output BAM (default=false).                                                      |
| `--Aligner.clip-pe-overhang 2`   | Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run. |

### UMI

| Option                             | Description                                                                                                                                                                                                                                                                                                                      |
| ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--umi-source STRING`              | Specify the input type for the UMI sequence. Options: `qname`, `fastq`, `bamtag`.                                                                                                                                                                                                                                                |
| `--umi-library-type STRING`        | Set the batch option for different UMIs correction. Options: `random-duplex`, `random-simplex`, `nonrandom-duplex`.                                                                                                                                                                                                              |
| `--umi-nonrandom-whitelist $PATH`  | If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.                                                                                                                                                                                                   |
| `--umi-correction-table $PATH`     | If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: \<INSTALL\_PATH>/resources/umi/umi\_correction\_table.txt.gz.                                                                                                                     |
| `--umi-min-supporting-reads INT`   | Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with a setting of 1. A setting of 2 may be appropriate for samples with ultra-deep coverage (e.g. ctDNA).                   |
| `--umi-metrics-interval-file $BED` | Target region in BED format.                                                                                                                                                                                                                                                                                                     |
| `--umi-emit-multiplicity both`     | Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see [Merge Duplex UMIs](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-dna-pipeline/unique-molecular-identifiers#merge-duplex-umis). |
| `--umi-start-mask-length INT`      | Number of additional bases to ignore from the start of each read. The default is 0. To reduce false positives, optionally set to 1.                                                                                                                                                                                              |
| `--umi-end-mask-length INT`        | Number of additional bases to ignore from the end of each read. The default is 0. To reduce false positives, optionally set to 3.                                                                                                                                                                                                |

For more information see: [UMI Options](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-dna-pipeline/unique-molecular-identifiers#umi-options).
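For nonrandom UMIs, the whitelist passed to `--umi-nonrandom-whitelist` holds one UMI sequence per line. A quick format check before a run can catch blank lines, stray whitespace, or Windows line endings. The sketch below is one possible check, under the assumption that UMI sequences use only the letters A, C, G, and T; the function name and the example sequences are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a UMI whitelist has exactly one sequence per line.
# Assumption: sequences consist only of the letters A, C, G, T.
check_umi_whitelist() {
    # grep prints any offending lines with their line numbers;
    # it exits 0 when at least one such line exists, so invert the result.
    if grep -n -v -E '^[ACGT]+$' "$1"; then
        return 1
    fi
    return 0
}

# Demo with hypothetical UMI sequences.
WORKDIR="$(mktemp -d)"
printf 'ACGTAC\nTTGCAA\nGGATCC\n' > "$WORKDIR/umi_whitelist.txt"
check_umi_whitelist "$WORKDIR/umi_whitelist.txt" && echo "whitelist format OK"
```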

### SNV

The DRAGEN SNV VC employs machine-learning-based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives, and reduce zygosity errors. No additional setup is required: DRAGEN-ML is enabled by default when running the germline SNV VC on hg19 or hg38.

Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

| Option                                      | Description                                                                                                            |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `--vc-target-bed`                           | Limit variant calling to region of interest.                                                                           |
| `--vc-combine-phased-variants-distance INT` | Maximum distance over which phased variants will be combined. Set to 0 to disable. Valid range is \[0; 15] (Default=2) |
| `--vc-emit-ref-confidence GVCF`             | To enable gVCF output.                                                                                                 |
| `--vc-enable-vcf-output`                    | To enable VCF file output during a gVCF run, set to true. The default value is false.                                  |

For more detail on the small variant caller in somatic mode, see [Somatic Mode](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-v4.3/dragen-dna-pipeline/small-variant-calling/somatic-mode).

### Annotation

For instructions on how to download the Nirvana annotation database, please refer to [Nirvana](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-v4.3/nirvana)

### HLA

| Option                            | Description                                                                                                                     |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `--enable-hla`                    | Enable HLA typer (this setting by default will only genotype class 1 genes)                                                     |
| `--hla-as-filter-min-threshold`   | Internal option to set the minimum alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels. |
| `--hla-as-filter-ratio-threshold` | Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.  |
| `--hla-enable-class-2`            | Extend genotyping to HLA class 2 genes (default=true).                                                                          |

### CNV

| Option                                | Description                                                                                                                                                                                            |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--cnv-enable-gcbias-correction true` | Enable or disable GC bias correction when generating target counts.                                                                                                                                    |
| `--cnv-segmentation-mode $SEG_MODE`   | Option to override the default segmentation algorithm. Defaults include `slm` for germline WGS, `aslm` for somatic WGS, and `hslm` for targeted analysis.                                              |
| `--cnv-segmentation-bed $PATH`        | If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed. |

For more information, see [CNV Calling](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-v4.3/dragen-dna-pipeline/cnv-calling).

### CNV Panel of Normals (PON)

The panel of normals mode uses a set of matched normal samples to determine the baseline level from which to call CNV events. These matched normal samples should be derived from the same library prep and sequencing workflow that was used for the case sample. CNV requires PON files for all targeted analyses (including panels, exomes, germline, tumor-only and tumor-normal workflows). It is recommended to use 30-100 normal samples when building the PON, but fewer may be used. If sample coverage noise is relatively stable, as few as 5 PON samples may yield acceptable results.

Follow the two steps below to generate a CNV PON:

**Step 1. Generate target counts of individual normal samples.**

Any options used for panel of normals generation (BED file, GC Bias Correction, etc) should be matched when processing the case sample.

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN graph hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
--fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
--fastq-list-sample-id $STRING 
--enable-cnv true 
--cnv-target-bed $PATH 
```

**Step 2. Combined counts generation.**

Individual PON counts can be merged into a single `<prefix>.combined.counts.txt.gz` file.

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN graph hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
--enable-cnv true 
--cnv-generate-combined-counts true 
--cnv-normals-list $CNV_NORMALS_LIST 
```

`$CNV_NORMALS_LIST` is a text file that lists the path to each target counts file generated in Step 1, one path per line (either `.target.counts.gz` or `.target.counts.gc-corrected.gz`). The output is a PON file with the suffix `.combined.counts.txt.gz`. Pass this PON file to case sample runs of DRAGEN CNV with the `--cnv-combined-counts` option.
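The normals list can be built by hand, but collecting the paths with `find` avoids typos when there are many samples. The sketch below is one way to do this, assuming all Step 1 outputs sit under a single directory; the function name, directory layout, and file names are hypothetical. It selects a single counts file type per list, so that GC-corrected and uncorrected counts are not mixed.

```shell
#!/bin/sh
# Sketch: collect Step 1 target counts files into a normals list, one path per line.
# Usage: make_normals_list COUNTS_DIR LIST_FILE [NAME_PATTERN]
make_normals_list() {
    counts_dir="$1"
    list_file="$2"
    # Default to the GC-corrected counts; pass '*.target.counts.gz' for uncorrected.
    pattern="${3:-*.target.counts.gc-corrected.gz}"
    find "$counts_dir" -type f -name "$pattern" | sort > "$list_file"
    # Report how many normals were found, for comparison with the PON size guidance.
    echo "normals listed: $(wc -l < "$list_file" | tr -d ' ')" >&2
}

# Demo with empty placeholder files standing in for real target counts.
DEMO_DIR="$(mktemp -d)"
for s in normal1 normal2 normal3; do
    : > "$DEMO_DIR/$s.target.counts.gc-corrected.gz"
done
make_normals_list "$DEMO_DIR" "$DEMO_DIR/cnv_normals.txt"
cat "$DEMO_DIR/cnv_normals.txt"
```

The resulting file is what `--cnv-normals-list` expects in Step 2.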

For more information, see [Panel of Normals](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-v4.3/dragen-dna-pipeline/cnv-calling).

In some cases, an in-run PON containing germline samples from the same batch (i.e. sample source, DNA extraction, library prep and sequencing run) may provide superior normalization.

Analysis of a full batch of germline samples with an automatically generated in-run PON can be performed using [DRAGEN Enrichment on BaseSpace](https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/dragen-enrichment.html) or DRAGEN Germline Enrichment on [ICA](https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html).

CNV PONs can also be built in the cloud using the [DRAGEN Baseline Builder App on BaseSpace](https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/dragen-baseline-builder.html) or the DRAGEN Systematic Noise File Builder Pipeline on [ICA](https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html).
