A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
Step 0: Fastq generation
# /opt/dragen/$VERSION/bin/dragen is the DRAGEN install path.
# The last two options apply only when using ORA compression (.fastq.ora)
# rather than gzip (.fastq.gz); omit them otherwise.
/opt/dragen/$VERSION/bin/dragen \
  --output-directory $OUTPUT_DIR \
  --sample-sheet $SAMPLE_SHEET \
  --bcl-input-directory $RUN_FOLDER \
  --bcl-conversion-only true \
  --strict-mode true \
  --ora-reference $ORA_REFERENCE \
  --fastq-compression-format dragen
BCL conversion is optional if FASTQ data already exists. If starting from BCL files, this step must be completed before running the MRD pipeline to ensure sample-specific FASTQs are available as input.
Step 1: Read alignment and targeted variant calling
Step 1A: Read alignment and targeted germline variant calling (FFPE)
Step 1B: Read alignment and targeted germline variant calling (BC/Plasma)
Read alignment and targeted variant calling notes
For consistency, use the linear reference. For FFPE samples, use --Aligner.hard-clips=7 to use hard clipping for all alignment types; omit this parameter for Buffy Coat or Plasma samples.
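The FFPE-only hard-clipping rule can be sketched as a small shell helper. This is a minimal illustration, not part of the recipe: the function name hard_clip_args and the sample type labels FFPE, BC, and PLASMA are hypothetical; only the --Aligner.hard-clips=7 option comes from the note above.

```shell
# Hypothetical helper: print the extra aligner option for a given sample type.
# The labels FFPE, BC, and PLASMA are illustrative, not DRAGEN keywords.
hard_clip_args() {
  case "$1" in
    FFPE)
      # FFPE samples: hard clipping for all alignment types
      printf '%s\n' '--Aligner.hard-clips=7'
      ;;
    BC|PLASMA)
      # Buffy Coat and Plasma samples: omit the option entirely
      ;;
    *)
      printf 'unknown sample type: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

The output can be appended unquoted to a DRAGEN command line, for example as $(hard_clip_args "$SAMPLE_TYPE"), so that an empty result adds no argument.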
Step 2: Fingerprint generation + QC
Step 2A: Fingerprint generation and FFPE normal-aware contamination QC
Fingerprint generation notes
As with all somatic runs, the linear hashtable is recommended for the DRAGEN MRD pipeline. DRAGEN hashtables can be downloaded from Product Files.
It is recommended to use --vc-target-bed $BED or --vc-excluded-regions-bed $BED to limit fingerprint calls to high-confidence regions. Construct a BED file that covers only easily mapped regions and excludes Alu elements and other highly repetitive regions, where recurring noise tends to be more frequent.
Use a systematic noise file to further reduce false positives. Prebuilt systematic noise BED files can be downloaded from Product Files.
It is recommended to use --qc-somatic-contam-vcf for FFPE normal-aware contamination detection. The somatic contamination VCF files are bundled with every DRAGEN installation at /opt/dragen/<version>/resources/qc/. For the hg38 reference, use somatic_sample_cross_contamination_resource_hg38.vcf.gz.
For samples not run in matched Tumor/Normal mode (i.e., BC and Plasma samples in Step 1B), a germline contamination VCF file can be passed to --qc-cross-cont-vcf. These files are also located at /opt/dragen/<version>/resources/qc/. For the hg38 reference, use sample_cross_contamination_resource_hg38.vcf.gz.
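The choice between the two contamination resources can be sketched as a shell helper. This is an illustration under stated assumptions: the function name contam_args and the sample type labels are hypothetical, and the version string is passed in by the caller. The option names, directory, and hg38 file names are the ones given in the notes above.

```shell
# Hypothetical helper: print the contamination QC option and hg38 resource path.
# $1 = sample type label (illustrative: FFPE, BC, PLASMA)
# $2 = installed DRAGEN version string, used to build the resource directory
contam_args() {
  qc_dir="/opt/dragen/$2/resources/qc"
  case "$1" in
    FFPE)
      # Matched Tumor/Normal mode: somatic (normal-aware) contamination VCF
      printf '%s %s\n' '--qc-somatic-contam-vcf' \
        "$qc_dir/somatic_sample_cross_contamination_resource_hg38.vcf.gz"
      ;;
    BC|PLASMA)
      # Not run in matched Tumor/Normal mode: germline contamination VCF
      printf '%s %s\n' '--qc-cross-cont-vcf' \
        "$qc_dir/sample_cross_contamination_resource_hg38.vcf.gz"
      ;;
    *)
      printf 'unknown sample type: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For other references, substitute the matching resource file from the same directory.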
Step 2B: FFPE/BC sample matching QC
Step 3: MRD detection
MRD detection notes
The command line parameters that control MRD detect are:
Parameter Name | Description
--enable-mrd | Enables MRD detect. Default = "false".
--mrd-probes-file | Path to the individual's tumor fingerprint VCF file.
--mrd-score-threshold | Threshold used to determine the presence/absence of residual cancer DNA in the plasma. Default = 4.0.
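As a rough sketch, the three MRD options could be assembled as follows. The function name mrd_args is hypothetical, and the fragment covers only the MRD-specific options; the remaining options of the Step 3 command are not shown here.

```shell
# Hypothetical helper: print only the MRD-specific options for a plasma run.
# $1 = path to the individual's tumor fingerprint VCF file
# $2 = score threshold (optional; falls back to the documented default of 4.0)
mrd_args() {
  printf '%s %s %s %s %s %s\n' \
    '--enable-mrd' 'true' \
    '--mrd-probes-file' "$1" \
    '--mrd-score-threshold' "${2:-4.0}"
}
```

Because the result is meant to be expanded unquoted onto a command line, this sketch assumes the fingerprint path contains no whitespace.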
The MRD detect module writes a summary file to the standard DRAGEN output directory, named with the standard output prefix followed by .mrd_summary.json. The file is valid JSON and contains an array of JSON objects. Because DRAGEN processes one sample at a time, the array has length one.
The output JSON will include the following two fields of interest:
Run[1].TumorEstimate.illumina.eVAF
Run[1].TumorEstimate.illumina.score
The "eVAF" (estimated Variant Allele Frequency) is the estimated fraction of cancer DNA in the plasma sample.
The "score" can be used to determine presence/absence of residual cancer DNA in the plasma. A higher score indicates that the presence of cancer DNA is more likely. The exact threshold score used to indicate a positive ctDNA status may depend on sample quality and coverage, and can be optimized for a specific pipeline. This threshold is expected to fall between 4 and 7 in most cases.
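A minimal sketch of applying a score threshold in a downstream script is shown below. It assumes the score has already been read from the summary JSON, and that a score equal to the threshold counts as positive; the source does not state whether the comparison is inclusive. The function name mrd_status is hypothetical.

```shell
# Hypothetical helper: classify an MRD score against a threshold.
# $1 = score taken from the summary JSON
# $2 = threshold (optional; falls back to 4.0, the --mrd-score-threshold default)
# Assumption: a score equal to the threshold is reported as positive.
mrd_status() {
  awk -v s="$1" -v t="${2:-4.0}" 'BEGIN {
    # Adding 0 forces a numeric (not string) comparison
    if (s + 0 >= t + 0) print "positive"; else print "negative"
  }'
}
```

For example, mrd_status 5.2 prints positive with the default threshold, while mrd_status 5.2 7 prints negative, which shows how the call depends on the threshold chosen for a given pipeline.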
Step 4: Plasma QC
Step 4A: Plasma/BC sample matching QC
Step 4B: Plasma contamination QC
Plasma QC notes
Similar to the MRD detection step, the output JSON will include the "eVAF" field. The "eVAF", multiplied by two, can be used as a proxy for plasma contamination from a different human.
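The doubling can be sketched as a one-line helper. The function name contamination_proxy is hypothetical, and the input is assumed to be the eVAF value already read from the output JSON, expressed as a fraction.

```shell
# Hypothetical helper: contamination proxy = 2 x eVAF.
# $1 = eVAF from the output JSON, expressed as a fraction
contamination_proxy() {
  awk -v e="$1" 'BEGIN { printf "%.4f\n", 2 * e }'
}
```

For example, an eVAF of 0.0125 gives a contamination proxy of 0.0250.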