DRAGEN Recipe - Germline WES

Overview

This recipe is for processing whole exome sequencing data for germline workflows.

Example Command Line

In most scenarios, taking the union of the command-line options from the single-caller scenarios will work.

1. Configure the INPUT options.
2. Configure the OUTPUT options.
3. Configure MAP/ALIGN depending on whether realignment is desired.
4. Configure the VARIANT CALLERs based on the application.
5. Configure any additional options.

Build up the necessary options for each component separately, so that they can be reused in the final command line.

We highly recommend using a pangenome reference for human samples (excluding RNA). For more details, refer to .

The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.

#!/bin/bash
set -euo pipefail

# Path to DRAGEN hashtable
DRAGEN_HASH_TABLE=<REF_DIR> # pangenome reference for human samples

# Path to output directory for the DRAGEN run
OUTPUT=<OUT_DIR>

# File prefix for DRAGEN output files
PREFIX=<OUT_PREFIX>

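# Define the input sources: FASTQ list, FASTQ, BAM, or CRAM.
# Because of `set -u`, every variable referenced below must be set;
# delete the input blocks you do not use.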
INPUT_FASTQ_LIST="
  --fastq-list $FASTQ_LIST \
  --fastq-list-sample-id $FASTQ_LIST_SAMPLE_ID \
"

INPUT_FASTQ="
  --fastq-file1 $FASTQ1 \
  --fastq-file2 $FASTQ2 \
  --RGSM $RGSM \
  --RGID $RGID \
"

INPUT_BAM="
  --bam-input $BAM \
"

INPUT_CRAM="
  --cram-input $CRAM \
"

# Select the input source; this example uses INPUT_FASTQ_LIST
INPUT_OPTIONS="
  --ref-dir $DRAGEN_HASH_TABLE \
  $INPUT_FASTQ_LIST \
"

OUTPUT_OPTIONS="
  --output-directory $OUTPUT \
  --output-file-prefix $PREFIX \
"

MA_OPTIONS="
  --enable-map-align true \
  --enable-sort true \
  --enable-duplicate-marking true \
"

CNV_OPTIONS="
  --enable-cnv true \
  --cnv-target-bed $CNV_TARGET_BED \
  --cnv-combined-counts $CNV_PANEL_OF_NORMALS \
"

SNV_OPTIONS="
  --enable-variant-caller true \
  --vc-target-bed $VC_TARGET_BED \
"

SV_OPTIONS="
  --enable-sv true \
  --sv-exome true \
  --sv-call-regions-bed $SV_TARGET_BED \
"

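# Keep --hla-enable-class-2 only if the panel has sufficient coverage for class II HLA typing.
# (A comment cannot live inside the quoted string: it would become part of the command.)
HLA_OPTIONS="
  --enable-hla=true \
  --hla-enable-class-2=true \
"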

# Construct final command line
CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $MA_OPTIONS \
  $CNV_OPTIONS \
  $SNV_OPTIONS \
  $SV_OPTIONS \
  $HLA_OPTIONS \
"

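# Execute
# $CMD is intentionally left unquoted so that the shell splits it into
# separate arguments. This only works if no path contains whitespace.
echo $CMD
$CMD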
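
The template above keeps each option group in a string, which relies on word splitting and breaks as soon as a path contains a space. As a minimal sketch of the same build-up using bash arrays, the following selects one input source and assembles the command. The `INPUT_TYPE` selector and all sample paths are hypothetical placeholders, only flags that appear in the template are used, and the command is printed rather than executed so the sketch runs without DRAGEN installed.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical placeholder values, for illustration only.
DRAGEN_HASH_TABLE="/refs/hash table"   # contains a space on purpose
FASTQ_LIST="/data/fastq_list.csv"
FASTQ_LIST_SAMPLE_ID="sample1"
BAM="/data/sample1.bam"
INPUT_TYPE="fastq_list"                # hypothetical selector: fastq_list or bam

# Pick exactly one input source.
case "$INPUT_TYPE" in
  fastq_list)
    INPUT_OPTIONS=(--fastq-list "$FASTQ_LIST" --fastq-list-sample-id "$FASTQ_LIST_SAMPLE_ID")
    ;;
  bam)
    INPUT_OPTIONS=(--bam-input "$BAM")
    ;;
  *)
    echo "unknown INPUT_TYPE: $INPUT_TYPE" >&2
    exit 1
    ;;
esac

MA_OPTIONS=(--enable-map-align true --enable-sort true --enable-duplicate-marking true)
SNV_OPTIONS=(--enable-variant-caller true)

# Each array element stays a single argument, even if it contains spaces.
CMD=(dragen --ref-dir "$DRAGEN_HASH_TABLE" "${INPUT_OPTIONS[@]}" "${MA_OPTIONS[@]}" "${SNV_OPTIONS[@]}")

# Dry run: print a shell-quoted version of the command.
printf '%q ' "${CMD[@]}"
echo

# To execute for real, run: "${CMD[@]}"
```

The other option groups (CNV, SV, HLA, output) can be added to `CMD` in the same way, one array per component.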

Additional Notes and Options

CNV

Please include the matched normal sample in the CNV panel of normals.

Option                                 Description
--cnv-enable-gcbias-correction true    Enable GC bias correction of the target counts

Generating Panel of Normals (PON)

WES CNV calling requires a PON file. Follow the two steps below to generate the CNV PON:

Target counts generation (per normal sample): Generate target counts for each normal sample to serve as the baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match those used when processing the case sample.

CNV_PON_OPTIONS="
  --enable-cnv true \
  --cnv-target-bed $CNV_TARGET_BED \
"

CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $CNV_PON_OPTIONS \
"

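Since the target counts step runs once per normal sample, it is convenient to wrap it in a loop. The following is a rough sketch under stated assumptions: the sample IDs, the paths, the shared FASTQ list, and the helper `target_counts_cmd` are all hypothetical, and the sketch prints each command instead of running DRAGEN. It uses only options shown elsewhere in this recipe.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical placeholders, for illustration only.
DRAGEN_HASH_TABLE="/refs/hashtable"
CNV_TARGET_BED="/refs/exome_targets.bed"
FASTQ_LIST="/data/normals_fastq_list.csv"
PON_DIR="/data/pon"
NORMAL_SAMPLE_IDS=(normal1 normal2 normal3)

# Print the target counts command for one normal sample (dry run).
target_counts_cmd() {
  local sample_id="$1"
  local cmd=(
    dragen
    --ref-dir "$DRAGEN_HASH_TABLE"
    --fastq-list "$FASTQ_LIST"
    --fastq-list-sample-id "$sample_id"
    --output-directory "$PON_DIR/$sample_id"
    --output-file-prefix "$sample_id"
    --enable-cnv true
    --cnv-target-bed "$CNV_TARGET_BED"
  )
  printf '%q ' "${cmd[@]}"
  echo
}

# One command per normal sample, each with its own output directory and prefix.
for sample_id in "${NORMAL_SAMPLE_IDS[@]}"; do
  target_counts_cmd "$sample_id"
done
```

Using the same `--cnv-target-bed` for every normal keeps the resulting target counts files comparable, in line with the note above about matching options.
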
Combined counts generation: Merge the individual target counts files into a single <prefix>.combined.counts.txt.gz file.

CNV_COMBINED_COUNTS_OPTIONS="
  --enable-cnv true \
  --cnv-generate-combined-counts true \
  --cnv-normals-list $CNV_NORMALS_LIST \
"

CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $CNV_COMBINED_COUNTS_OPTIONS \
"

$CNV_NORMALS_LIST is a single text file with the paths to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use this PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
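
The normals list can be written by hand, but collecting the paths with `find` is less error-prone. The sketch below assumes one path per line, a hypothetical `PON_DIR` layout with one subdirectory per normal sample, and GC-corrected counts; the empty stand-in files exist only so the example runs without DRAGEN output.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical directory holding the per-sample outputs from step 1.
PON_DIR="$(mktemp -d)"
CNV_NORMALS_LIST="$PON_DIR/cnv_normals.txt"

# Demo only: create empty stand-in files in place of real DRAGEN outputs.
for s in normal1 normal2 normal3; do
  mkdir -p "$PON_DIR/$s"
  : > "$PON_DIR/$s/$s.target.counts.gc-corrected.gz"
done

# Collect one path per line, sorted so the list is reproducible.
find "$PON_DIR" -type f -name '*.target.counts.gc-corrected.gz' | sort > "$CNV_NORMALS_LIST"

# Fail early if no target counts files were found.
if [ ! -s "$CNV_NORMALS_LIST" ]; then
  echo "no target counts files found under $PON_DIR" >&2
  exit 1
fi

cat "$CNV_NORMALS_LIST"
```

Matching on a single suffix keeps GC-corrected and uncorrected counts from ending up in the same list.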

SNV

HLA

Option                Description
enable-hla            Enable HLA typer (this setting by default will only genotype class 1 genes)
hla-enable-class-2    Extend genotyping to HLA class 2 genes
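
Because class 2 typing is only appropriate when the panel covers those genes well, it can help to make it an explicit switch. This is a small sketch, not part of the recipe itself: the helper `hla_options` and its true/false argument are hypothetical, and it only prints the two options listed above.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the HLA options, one per line.
# Pass "true" only if the panel has sufficient coverage for class 2 genes.
hla_options() {
  local class_2="$1"
  local opts=(--enable-hla=true)
  if [ "$class_2" = "true" ]; then
    opts+=(--hla-enable-class-2=true)
  fi
  printf '%s\n' "${opts[@]}"
}

hla_options false   # class 1 genes only
hla_options true    # class 1 and class 2 genes
```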


