
DNA Somatic Tumor-Only Solid WES

The DRAGEN recipe includes the recommended pipeline-specific commands.

Notes and additional options

Hashtable

For DRAGEN somatic runs it is recommended to use the linear hashtable.

Input options

DRAGEN input sources include: fastq list, fastq, bam, or cram.

FQ list Input

FQ Input

BAM Input

CRAM Input

Mapping and Aligning

Duplicate Marking

Fractional (Raw Reads) Downsampling

DRAGEN can subsample a random fraction of the reads in an input file using the fractional downsampler. You can use downsampling to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (for example, no trimming or filtering).

Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce the coverage of the normal sample.
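As a rough illustration of the coverage arithmetic, the fraction of reads to keep is approximately the target coverage divided by the current coverage. The sketch below is a hypothetical helper, not a DRAGEN command: the function name `downsample_fraction` and the example coverages are assumptions, and the resulting value still has to be supplied to the fractional downsampler through the options documented for your DRAGEN version.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRAGEN): compute the fraction of reads to
# keep so that a sample at CURRENT_COV lands near TARGET_COV.
# Assumes mean coverage scales linearly with the number of reads kept.
downsample_fraction() {
    target_cov=$1
    current_cov=$2
    awk -v t="$target_cov" -v c="$current_cov" 'BEGIN {
        if (c <= 0) { print "current coverage must be positive" > "/dev/stderr"; exit 1 }
        f = t / c
        if (f > 1) f = 1    # cap at 1: downsampling cannot add reads
        printf "%.3f\n", f
    }'
}

# Example with assumed values: matched normal at 150x, desired coverage 60x.
downsample_fraction 60 150    # prints 0.400
```

The cap at 1 simply means that a sample already below the target coverage is left untouched.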

SNV

High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

HLA

CNV

| Option | Description |
| --- | --- |
| --cnv-enable-gcbias-correction true | Enable or disable GC bias correction when generating target counts. |
| --cnv-segmentation-mode $SEG_MODE | Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis. |
| --cnv-segmentation-bed $PATH | If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed. |
| --cnv-population-b-allele-vcf $POP_VCF | |
| --heme-cnv true | Configures DRAGEN to use CNV settings for HEME. |

Annotation

TMB

The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. As a result, the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

| Option | Description |
| --- | --- |
| --tmb-vaf-threshold FLOAT | Minimum variant allele frequency for usable variants (default=0.05). |
| --vc-callability-tumor-thresh INT | Required read coverage to use a site (default=50). |
| --tmb-enable-proxi-filter BOOL | Use variant VAF information to increase germline filtering. Recommended for Tumor-Only, but not for Tumor-Normal. May be overly aggressive at tagging variants as germline (default=false). |

MSI

| Option | Description and recommended setting |
| --- | --- |
| --msi-coverage-threshold INT | Minimum coverage for a microsatellite: 60 (default) |
| --msi-distance-threshold FLOAT | Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default) |

SV

| Option | Description |
| --- | --- |
| --sv-call-regions-bed | Specifies a BED file containing the set of regions to call. The file can optionally be compressed in gzip or bgzip format. |
| --sv-exclusion-bed | Specifies a BED file containing the set of regions to exclude from SV calling. The file can optionally be compressed in gzip or bgzip format. |
| --enable-variant-deduplication true | Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the variants from the SV VCF that do not match a variant in the SNV VCF. The deduplicated SV VCF file uses the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. |
| --sv-systematic-noise $BEDPE | Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for Tumor-Normal, but strongly recommended for Tumor-Only. |
| --sv-somatic-ins-tandup-hotspot-regions-bed $BED | Specifies a custom BED file of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with 300 bp of padding on both sides. |
| --sv-min-candidate-variant-size | Run the SV caller and report all SVs/indels at or above this size. The default value is 10. |
| --sv-min-scored-variant-size | After candidate identification, only score and report SVs/indels at or above this size. The default value is 50. This parameter does not affect the somatic hotspot regions. |

| Option | Recommended Value for Liquid Tumors (e.g. AML/MLL) |
| --- | --- |
| --heme-sv true | Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL). |
| --sv-min-scored-variant-size $INT | 100000 |

For more information, see Structural Variant Calling.

Resource Files

DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

SNV Systematic Noise

Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

Prebuilt

Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

| Prebuilt WES/WGS noise files | Description |
| --- | --- |
| WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FF |
| FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FFPE (only hg38) |
| WES_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WES FF and FFPE |

Custom

Prebuilt systematic noise files are available for WES or WGS applications. For these applications, it is considered optional to build custom noise files. For high-sensitivity applications, including panels, it is required to build custom noise files. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30–70 normals when building a noise file, but fewer can be used.

Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
# --tumor-fastq-list: see 'Input Options' for FQ, BAM or CRAM
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --tumor-fastq-list $PATH \
  --tumor-fastq-list-sample-id $STRING \
  --vc-detect-systematic-noise=true \
  --vc-target-bed-padding 500 \
  --vc-enable-germline-tagging=true \
  --variant-annotation-data $PATH

For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not GVCFs) from step 1 and create a lines file ${VCF_LIST} that lists one file per line.
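The ${VCF_LIST} file can be assembled with standard shell tools. The sketch below assumes that every step 1 run wrote its output under a common root directory and that the hard-filtered VCFs end in `.hard-filtered.vcf.gz`; the function name `build_vcf_list` and the directory layout are hypothetical and should be adapted to your own output structure.

```shell
#!/bin/sh
# Hypothetical helper: write the absolute path of every hard-filtered VCF
# found under RUNS_DIR to LIST_FILE, one per line.
# Assumption: VCFs are named <prefix>.hard-filtered.vcf.gz. GVCFs named
# <prefix>.hard-filtered.gvcf.gz do not match the pattern and are skipped.
build_vcf_list() {
    runs_dir=$(cd "$1" && pwd) || return 1    # resolve to an absolute path
    list_file=$2
    find "$runs_dir" -type f -name '*.hard-filtered.vcf.gz' | sort > "$list_file"
    # Return non-zero if nothing was found, so an empty list is not used.
    [ -s "$list_file" ]
}
```

A call such as `build_vcf_list /path/to/step1_runs "${VCF_LIST}"` would then produce the file consumed by step 2. Sorting is not required; it only makes the list reproducible between runs.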

Step 2. Generate the final noise file.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --build-sys-noise-vcfs-list ${VCF_LIST}

The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

SV Systematic Noise

Systematic noise files are recommended for Tumor-Normal workflows and are considered essential for reducing false-positive (FP) calls in Tumor-Only workflows.

Prebuilt

Prebuilt SV systematic noise files can be downloaded here: Product Files.

| Prebuilt WES/WGS noise files | Description |
| --- | --- |
| WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz | For WGS/WES FF/FFPE |
| IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz | For HEME |

Custom

Building a custom systematic noise file is considered optional for WES or WGS applications, but is strongly recommended for high-sensitivity applications such as panels. For best accuracy, the normal samples should closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
# --tumor-fastq-list: see 'Input Options' for FQ, BAM or CRAM
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --tumor-fastq-list $PATH \
  --tumor-fastq-list-sample-id $STRING \
  --sv-detect-systematic-noise true

Step 2. Build the BEDPE file using input VCFs from previous step.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
# $VCF_LIST: one VCF per line
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --sv-build-systematic-noise-vcfs-list $VCF_LIST

Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
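When building locally, it can help to confirm before step 2 that the $VCF_LIST file points at real files and to see how many normals it contains relative to the recommended range. The following sketch is a hypothetical pre-flight check, not a DRAGEN feature; the function name `check_vcf_list` is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a lines file such as $VCF_LIST.
# Prints the number of listed files that exist. Returns non-zero if any
# listed path is missing or if no existing file is listed.
check_vcf_list() {
    list_file=$1
    found=0
    missing=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue            # skip blank lines
        if [ -f "$path" ]; then
            found=$((found + 1))
        else
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list_file"
    echo "$found"
    [ "$missing" -eq 0 ] && [ "$found" -gt 0 ]
}
```

For example, `check_vcf_list "$VCF_LIST"` prints the count of usable VCFs, which can then be compared against the 30-100 normals suggested above.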

CNV Panel of Normals (PON)

The panel of normals mode uses a set of matched normal samples to determine the baseline level from which to call CNV events. These matched normal samples should be derived from the same library prep and sequencing workflow that was used for the case sample. CNV requires PON files for all targeted analyses (including panels, exomes, germline, tumor-only and tumor-normal workflows). It is recommended to use 30-100 normal samples when building the PON, but fewer may be used. If sample coverage noise is relatively stable, as few as 5 PON samples may yield acceptable results.

If a matched normal is available it is recommended to include it in the PON.

Follow the two steps below to generate a CNV PON:

Step 1. Generate target counts of individual normal samples.

Any options used for panel of normals generation (BED file, GC bias correction, etc.) should be matched when processing the case sample.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
# --tumor-fastq-list: see 'Input Options' for FQ, BAM or CRAM
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --tumor-fastq-list $PATH \
  --tumor-fastq-list-sample-id $STRING \
  --enable-cnv true \
  --cnv-target-bed $PATH

Step 2. Combined counts generation.

Individual PON counts can be merged into a single <prefix>.combined.counts.txt.gz file.

  
# /opt/dragen/$VERSION/bin/dragen: DRAGEN install path
# --ref-dir: path to DRAGEN linear hashtable
# --intermediate-results-dir: e.g. SSD /staging
/opt/dragen/$VERSION/bin/dragen \
  --ref-dir $REF_DIR \
  --output-directory $OUTPUT \
  --intermediate-results-dir $PATH \
  --output-file-prefix $PREFIX \
  --enable-cnv true \
  --cnv-generate-combined-counts true \
  --cnv-normals-list $CNV_NORMALS_LIST

$CNV_NORMALS_LIST is a single lines file containing the path to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
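One possible way to create $CNV_NORMALS_LIST is sketched below. It assumes all step 1 runs share a common root directory, and it collects only one of the two file kinds per list on the assumption that GC-corrected and uncorrected counts should not be combined in the same PON. The function name `build_cnv_normals_list` and the `gc`/`raw` mode names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the per-sample target counts files from step 1.
# MODE selects which suffix to collect, so a list holds a single kind:
#   gc  -> *.target.counts.gc-corrected.gz (default)
#   raw -> *.target.counts.gz
build_cnv_normals_list() {
    runs_dir=$(cd "$1" && pwd) || return 1    # resolve to an absolute path
    list_file=$2
    mode=${3:-gc}
    case $mode in
        gc)  pattern='*.target.counts.gc-corrected.gz' ;;
        raw) pattern='*.target.counts.gz' ;;
        *)   echo "mode must be gc or raw" >&2; return 1 ;;
    esac
    find "$runs_dir" -type f -name "$pattern" | sort > "$list_file"
    # Return non-zero if no counts file was found.
    [ -s "$list_file" ]
}
```

For example, `build_cnv_normals_list /path/to/step1_runs "$CNV_NORMALS_LIST" gc` would write the GC-corrected counts paths, which matches a case sample run that also enables GC bias correction.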

For more information, see Panel of Normals.

CNV PONs can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.