DNA Germline WES UMI
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON. See 'In-run PON' section below.
# HLA genotyper
--enable-hla true
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--enable-targeted true
--targeted-pon $PATH #Targeted PON. See 'In-run PON' section below.
--targeted-systematic-noise $PATH #Targeted systematic noise file
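The recipe above lists one option per line with trailing comments, so it cannot be pasted into a shell as a single command: a `#` comment after a line-continuation backslash ends the command. The following is a minimal sketch, assuming a POSIX shell, that builds the argument list one option at a time so each option can keep its comment. Only a subset of the recipe is shown. All variable values are placeholders, and `FASTQ_LIST` and `SAMPLE_ID` are hypothetical names chosen for this sketch, because the recipe's `$PATH` placeholder shares its name with the shell's executable search path and should not be assigned in a real script.

```shell
#!/bin/sh
# Placeholder values (assumptions for illustration); replace with real paths.
VERSION=${VERSION:-x.y.z}
REF_DIR=${REF_DIR:-/path/to/hashtable}
OUTPUT=${OUTPUT:-/path/to/output}
PREFIX=${PREFIX:-sample1}
FASTQ_LIST=${FASTQ_LIST:-/path/to/fastq_list.csv}
SAMPLE_ID=${SAMPLE_ID:-sample1}
VC_TARGET_BED=${VC_TARGET_BED:-/path/to/targets.bed}

# Build the argument list incrementally so inline comments stay legal.
set -- "/opt/dragen/$VERSION/bin/dragen"            # DRAGEN install path
set -- "$@" --ref-dir "$REF_DIR"                    # pangenome hashtable
set -- "$@" --output-directory "$OUTPUT"
set -- "$@" --output-file-prefix "$PREFIX"
set -- "$@" --fastq-list "$FASTQ_LIST"
set -- "$@" --fastq-list-sample-id "$SAMPLE_ID"
set -- "$@" --enable-map-align true
set -- "$@" --umi-enable true
set -- "$@" --umi-min-supporting-reads 1            # default is 2
set -- "$@" --enable-variant-caller true
set -- "$@" --vc-target-bed "$VC_TARGET_BED"

# Dry run: print the assembled command for review.
DRAGEN_CMD=$(printf '%s ' "$@")
echo "$DRAGEN_CMD"
```

To run the command instead of printing it, replace the last two lines with `"$@"`. The remaining recipe options (UMI source and library type, annotation, SV, CNV, HLA, Targeted caller) can be appended in the same way.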
Notes and additional options
Hashtable
For DRAGEN germline runs, it is recommended to use the pangenome hashtable.
See: Product Files
Input options
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING
FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
BAM Input
--bam-input $PATH
CRAM Input
--cram-input $PATH
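As a rough illustration of the input options above, the sketch below maps an input file to the matching DRAGEN input option based on its extension. The helper name `dragen_input_args` is hypothetical, and the extension mapping (including `.csv` for a FASTQ list) is an assumption of this sketch, not DRAGEN behavior. It prints only the primary input option: FASTQ list input still needs `--fastq-list-sample-id`, and FASTQ input still needs `--fastq-file2` (for paired reads), `--RGSM`, and `--RGID`.

```shell
# Print the primary DRAGEN input option for a given file (hypothetical helper).
dragen_input_args() {
  case "$1" in
    *.csv)              printf '%s\n' "--fastq-list $1" ;;
    *.fastq.gz|*.fq.gz) printf '%s\n' "--fastq-file1 $1" ;;
    *.bam)              printf '%s\n' "--bam-input $1" ;;
    *.cram)             printf '%s\n' "--cram-input $1" ;;
    *)
      # Anything else is rejected rather than guessed at.
      echo "unsupported input: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `dragen_input_args sample.cram` prints `--cram-input sample.cram`.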
Mapping and Aligning
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
UMI
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for the different UMI correction types. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.
For more information see: UMI Options.
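For nonrandom UMIs, the whitelist passed to `--umi-nonrandom-whitelist` contains one UMI sequence per line. A quick sanity check before a run could look like the sketch below. It assumes an uncompressed text file and assumes that a valid UMI consists only of the bases A, C, G, and T; neither assumption comes from the DRAGEN documentation, and the helper name `check_umi_whitelist` is hypothetical.

```shell
# Fail if any line of the whitelist is not a plain A/C/G/T string.
check_umi_whitelist() {
  # A missing or unreadable file is reported separately from bad content.
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # grep -v selects the lines that do NOT match; -n prefixes line numbers.
  if grep -vnE '^[ACGT]+$' "$1"; then
    echo "invalid whitelist entries in $1" >&2
    return 1
  fi
}
```

Offending lines are printed with their line numbers, and empty lines count as invalid.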
SNV
DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.
Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
Annotation
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
HLA
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
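The HLA filter thresholds above depend on the assay type. A small sketch of that selection, using only the values stated in this section, is shown below. The helper name `hla_filter_args` and the assay labels (`wes`, `wgs`, `panel`) are hypothetical.

```shell
# Print the HLA options for an assay type, using the thresholds listed above.
hla_filter_args() {
  case "$1" in
    wes|wgs)
      # Documented defaults, stated explicitly for clarity.
      echo "--enable-hla true --hla-as-filter-min-threshold 59 --hla-as-filter-ratio-threshold 0.67"
      ;;
    panel)
      echo "--enable-hla true --hla-as-filter-min-threshold 29 --hla-as-filter-ratio-threshold 0.85"
      ;;
    *)
      echo "unknown assay type: $1" >&2
      return 1
      ;;
  esac
}
```

The output is intended for unquoted expansion on a DRAGEN command line, for example `$(hla_filter_args panel)`.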
CNV
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
For more information, see CNV Calling.
In-run PON
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
For Targeted Caller PON requirements and generation options see Targeted Caller | Exome calling.
CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. Follow the steps below to generate the CNV and Targeted Caller PON files. Note that Targeted Caller is only supported with the Illumina CS/PGx Custom Enrichment Research Panel.
Step 1. Generate CNV target counts and Targeted exome counts of individual samples from the sequencing run.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--targeted-generate-exome-counts true
Step 2. CNV combined counts file generation.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
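The normals list for step 2 can be assembled from the step 1 output directory. The following is a minimal sketch, assuming all step 1 runs wrote their outputs under one directory tree. The helper name `make_cnv_normals_list` is hypothetical. The third argument selects which of the two count file types to collect and defaults to the GC-corrected files.

```shell
# Write one path per line for every matching target counts file.
make_cnv_normals_list() {
  counts_dir=$1
  list_file=$2
  suffix=${3:-target.counts.gc-corrected.gz}
  # Sorting keeps the list stable between invocations.
  find "$counts_dir" -type f -name "*.$suffix" | sort > "$list_file"
  # Succeed only if at least one counts file was found.
  [ -s "$list_file" ]
}
```

For example, `make_cnv_normals_list "$OUTPUT" cnv_normals.txt` collects the GC-corrected files, and passing `target.counts.gz` as the third argument collects the uncorrected files instead. The listed paths are absolute when the directory argument is absolute. Any sample that should stay out of the PON needs to be removed from the list before step 2.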
Step 3. Targeted Caller PON file generation.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--targeted-pon-counts-list $TARGETED_PON_COUNTS_LIST
$TARGETED_PON_COUNTS_LIST is a text file with one line for each path to a Targeted Caller exome counts file generated in step 1 (<output-file-prefix>.targeted.exome.counts.json.gz). Individual exome counts files are merged into a single <output-file-prefix>.targeted.pon.json.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN Targeted Caller using the --targeted-pon option.
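Because both PON steps take a plain text file with one path per line, a stale or mistyped entry is easy to miss. The sketch below checks that every path in such a list file points to a readable file before the PON generation command is started. The helper name `check_counts_list` is hypothetical, and the check is a convenience of this sketch rather than part of DRAGEN.

```shell
# Report every listed path that is missing or unreadable.
check_counts_list() {
  list_file=$1
  status=0
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # skip blank lines
    if [ ! -r "$path" ]; then
      echo "missing or unreadable: $path" >&2
      status=1
    fi
  done < "$list_file"
  return "$status"
}
```

The function returns a non-zero status if any entry fails, so it can gate the run, for example `check_counts_list "$TARGETED_PON_COUNTS_LIST" && echo "list OK"`.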
Targeted Caller
A systematic noise file corresponding to one of the pre-built pangenome references can be downloaded from the [DRAGEN Software Support Site page](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).