# DNA Somatic Tumor-Normal MRD

> **For conceptual background and pipeline overview, see** [**DRAGEN MRD Pipeline Overview**](https://help.connected.illumina.com/dragen/product-guides/dragen-v4.5/mrd)

A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

## Step 0: FASTQ generation

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--output-directory $OUTPUT_DIR 
--sample-sheet $SAMPLE_SHEET 
--bcl-input-directory $RUN_FOLDER 
--bcl-conversion-only true 
--strict-mode true 
# include the next two options only if using ORA compression (.fastq.ora) rather than gzip (.fastq.gz) 
--ora-reference $ORA_REFERENCE 
--fastq-compression-format dragen 
```

BCL conversion can be skipped if FASTQ files already exist. When starting from BCL files, complete this step before running the MRD pipeline so that sample-specific FASTQ files are available as input.
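
The recipe blocks in this document list one option per line with inline `#` annotations, so they cannot be pasted into a shell unchanged. As a minimal sketch (not part of DRAGEN), the Python helper below turns such a block into an argument list suitable for `subprocess.run`. The name `recipe_to_argv` and all example values are hypothetical, and the sketch assumes that no option in the recipe text contains a literal `#`.

```python
from string import Template


def recipe_to_argv(recipe_text, values):
    """Convert an annotated recipe block into an argv list.

    Hypothetical helper: strips `#` comments, splits on whitespace, then
    fills `$NAME` placeholders from `values`. Substitution happens per token,
    so a value containing spaces stays a single argument.
    """
    argv = []
    for line in recipe_text.splitlines():
        code = line.split("#", 1)[0]  # drop inline and full-line comments
        for token in code.split():
            # Template.substitute raises KeyError if a placeholder is missing
            argv.append(Template(token).substitute(values))
    return argv


recipe = """
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path
--output-directory $OUTPUT_DIR
--sample-sheet $SAMPLE_SHEET
--bcl-input-directory $RUN_FOLDER
--bcl-conversion-only true
--strict-mode true
"""

# Placeholder values for illustration only
argv = recipe_to_argv(recipe, {
    "VERSION": "x.y.z",
    "OUTPUT_DIR": "/data/out",
    "SAMPLE_SHEET": "/data/run/SampleSheet.csv",
    "RUN_FOLDER": "/data/run",
})
```

On a DRAGEN server, `subprocess.run(argv, check=True)` would then launch the command. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.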

## Step 1: Read alignment and targeted variant calling

### Step 1A: Read alignment and targeted germline variant calling (FFPE)

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--validate-pangenome-reference false    #required 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH 
--output-file-prefix $PREFIX 
--events-log-file $EVENTS_LOG_FILE 
--watchdog-active-timeout 1800 
--watchdog-idle-timeout 1800 
# Inputs (e.g. FQ list) 
--fastq-list $PATH 
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional for BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-duplicate-marking true         #default=true 
--enable-targeted false 
--Aligner.hard-clips 7                  #for FFPE samples only, uses hard clipping for all alignment types 
# Small variant caller 
--enable-variant-caller true            #targeted germline calling for ~37K SNP sites with high population allele frequencies (typically close to 50% VAF) 
--vc-target-bed $COMMON_GERMLINE_TARGET_BED 
# QC 
--gc-metrics-enable true 
--qc-coverage-ignore-overlaps false     #de-duplicated conventional coverage is output rather than molecular coverage 
--qc-cross-cont-vcf $QC_CROSS_CONTAMINATION_VCF 
# ORA 
--ora-reference $ORA_REFERENCE 
```

### Step 1B: Read alignment and targeted germline variant calling (BC/Plasma)

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--validate-pangenome-reference false    #required 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH 
--output-file-prefix $PREFIX 
--events-log-file $EVENTS_LOG_FILE 
--watchdog-active-timeout 1800 
--watchdog-idle-timeout 1800 
# Inputs (e.g. FQ list) 
--fastq-list $PATH 
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional for BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-duplicate-marking true         #default=true 
--enable-targeted false 
# Small variant caller 
--enable-variant-caller true            #targeted germline calling for ~37K SNP sites with high population allele frequencies (typically close to 50% VAF) 
--vc-target-bed $COMMON_GERMLINE_TARGET_BED 
# QC 
--gc-metrics-enable true 
--qc-coverage-ignore-overlaps false     #de-duplicated conventional coverage is output rather than molecular coverage 
--qc-cross-cont-vcf $QC_CROSS_CONTAMINATION_VCF 
# ORA 
--ora-reference $ORA_REFERENCE 
```

### Read alignment and targeted variant calling notes

For consistency, use the linear reference hashtable. For FFPE samples, add `--Aligner.hard-clips 7` so that hard clipping is used for all alignment types; omit this parameter for buffy coat (BC) or plasma samples.
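
When Steps 1A and 1B are driven from one script, the only difference between them is the hard-clipping option. A minimal sketch of that branch follows; the function name and the sample-type labels (`"FFPE"`, `"BC"`, `"PLASMA"`) are hypothetical and chosen only for illustration.

```python
def alignment_extras(sample_type):
    """Return the extra aligner options for a Step 1 run.

    The labels "FFPE", "BC", and "PLASMA" are hypothetical names used by
    this sketch; they are not DRAGEN identifiers.
    """
    sample_type = sample_type.upper()
    if sample_type == "FFPE":
        # FFPE only: hard clipping for all alignment types
        return ["--Aligner.hard-clips", "7"]
    if sample_type in ("BC", "PLASMA"):
        # Buffy coat and plasma runs omit the option entirely
        return []
    raise ValueError(f"unknown sample type: {sample_type!r}")
```

The returned list can be appended to the shared Step 1 argument list, so the FFPE and BC/plasma commands stay otherwise identical.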

## Step 2: Fingerprint generation + QC

### Step 2A: Fingerprint generation and FFPE normal-aware contamination QC

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH 
--output-file-prefix $PREFIX 
--events-log-file $EVENTS_LOG_FILE 
# Inputs (BAM files from Step 1) 
--bam-input $BUFFY_COAT_BAM 
--tumor-bam-input $FFPE_BAM 
# M/A 
--enable-map-align false 
--enable-map-align-output false 
# Small variant caller 
--enable-variant-caller true 
--vc-enable-germline-tagging true 
--vc-target-bed $HIGH_CONFIDENCE_REGIONS 
--vc-systematic-noise $VC_SYSTEMATIC_NOISE 
--vc-germline-tagging-db-files $VC_GERMLINE_TAGGING_DB_FILES 
# FP 
--mrd-fingerprint true 
# QC 
--qc-somatic-contam-vcf $QC_SOMATIC_CONTAMINATION_VCF 
```

### Fingerprint generation notes

As with all somatic runs, the linear hashtable is recommended for the DRAGEN MRD pipeline. DRAGEN hashtables can be downloaded from [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).

Use `--vc-target-bed $BED` or `--vc-excluded-regions-bed $BED` to limit fingerprint calls to high-confidence regions. Construct a BED file that covers only easily mapped regions and excludes Alu elements and other highly repetitive regions, where recurring noise tends to be more frequent.
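
One way to build such a target BED is to subtract the repetitive intervals from a set of candidate regions. The sketch below shows the interval arithmetic in plain Python, assuming intervals are held in memory as `(chrom, start, end)` tuples in BED convention (0-based, half-open). The function name and the example coordinates are hypothetical; in practice a dedicated tool such as `bedtools subtract` performs the same operation on files.

```python
from collections import defaultdict


def subtract_bed(keep, exclude):
    """Return the parts of `keep` not covered by any interval in `exclude`.

    Both arguments are lists of (chrom, start, end) tuples using BED
    coordinates (0-based start, exclusive end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in exclude:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    result = []
    for chrom, start, end in keep:
        cursor = start  # left edge of the region not yet emitted or excluded
        for ex_start, ex_end in by_chrom.get(chrom, []):
            if ex_end <= cursor:
                continue  # excluded interval lies entirely before the cursor
            if ex_start >= end:
                break  # sorted by start, so nothing later can overlap
            if ex_start > cursor:
                result.append((chrom, cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((chrom, cursor, end))
    return result


# Made-up coordinates for illustration only
candidates = [("chr1", 100, 200)]
repeats = [("chr1", 120, 150), ("chr1", 140, 160), ("chr2", 0, 1000)]
high_confidence = subtract_bed(candidates, repeats)
```

With the made-up inputs above, the two overlapping repeat intervals on `chr1` are removed from the candidate region and the remainder is returned as two separate intervals.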

Use a systematic noise file to further reduce false positives. Prebuilt systematic noise BED files can be downloaded from [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html):

| Prebuilt WGS noise files                           | Description              |
| -------------------------------------------------- | ------------------------ |
| `WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz`      | For WGS FF               |
| `FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz` | For WGS FFPE (only hg38) |

For more information, see [SNV systematic noise](https://help.connected.illumina.com/dragen/product-guides/dragen-dna-pipeline/small-variant-calling/somatic-mode#systematic-noise-filtering). To download germline annotation files, refer to [Nirvana](https://help.connected.illumina.com/dragen/product-guides/dragen-v4.5/nirvana).

### FFPE normal-aware contamination QC notes

It is recommended to use `--qc-somatic-contam-vcf` for FFPE normal-aware contamination detection. The somatic contamination VCF files are bundled with every DRAGEN installation at `/opt/dragen/<version>/resources/qc/`. For the hg38 reference, use `somatic_sample_cross_contamination_resource_hg38.vcf.gz`.

For samples not run in matched Tumor/Normal mode (i.e., BC and Plasma samples in Step 1B), a germline contamination VCF file can be passed to `--qc-cross-cont-vcf`. These files are also located at `/opt/dragen/<version>/resources/qc/`. For the hg38 reference, use `sample_cross_contamination_resource_hg38.vcf.gz`.
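
The choice between the two contamination resources can be captured in a small helper. This is a sketch only: the function name is hypothetical, the version string is a placeholder supplied by the caller, and the file names are the hg38 resources named above.

```python
from pathlib import PurePosixPath


def contamination_qc_option(dragen_version, matched_tumor_normal):
    """Return the contamination QC option and bundled hg38 resource path.

    `dragen_version` fills the <version> part of the install path and must
    be supplied by the caller.
    """
    qc_dir = PurePosixPath("/opt/dragen") / dragen_version / "resources" / "qc"
    if matched_tumor_normal:
        # Normal-aware contamination detection (Step 2A)
        resource = qc_dir / "somatic_sample_cross_contamination_resource_hg38.vcf.gz"
        return ["--qc-somatic-contam-vcf", str(resource)]
    # Samples not run in matched tumor/normal mode (Step 1)
    resource = qc_dir / "sample_cross_contamination_resource_hg38.vcf.gz"
    return ["--qc-cross-cont-vcf", str(resource)]
```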

### Step 2B: FFPE/BC sample matching QC

```
  
/opt/dragen/$VERSION/bin/dragen 
--ref-dir $REF_DIR 
--output-directory $OUTPUT_DIR 
--output-file-prefix $SAMPLE_ID 
--events-log-file $EVENTS_LOG_FILE 
# Inputs 
--checkfingerprint-expected-vcf $BUFFY_COAT_TARGETED_GERMLINE_VCF 
--checkfingerprint-observed-vcf $FFPE_TARGETED_GERMLINE_VCF 
# QC 
--enable-checkfingerprint true 
```

## Step 3: MRD detection

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR 
--output-directory $OUTPUT_DIR 
--output-file-prefix $SAMPLE_ID 
--events-log-file $EVENTS_LOG_FILE 
# Inputs 
--bam-input $PLASMA_BAM 
--mrd-probes-file $FINGERPRINT_VCF 
# M/A 
--enable-map-align false 
--enable-map-align-output false 
--enable-sort false 
# MRD 
--enable-mrd true 
```

### MRD detection notes

The command-line parameters that control MRD detection are:

| Parameter Name          | Description                                                                                           |
| ----------------------- | ----------------------------------------------------------------------------------------------------- |
| `--enable-mrd`          | Enables MRD detection. Default = `false`.                                                             |
| `--mrd-probes-file`     | Path to the individual's tumor fingerprint VCF file.                                                  |
| `--mrd-score-threshold` | Threshold used to determine the presence/absence of residual cancer DNA in the plasma. Default = 4.0. |

The MRD detection module writes a summary file to the standard DRAGEN output directory, named with the output prefix: `<prefix>.mrd_summary.json`. The file is valid JSON and contains an array of JSON objects. DRAGEN processes one sample per run, so the array has length one.

The output JSON includes the following two fields of interest:

* `Run[1].TumorEstimate.illumina.eVAF`
* `Run[1].TumorEstimate.illumina.score`

The "eVAF" (estimated Variant Allele Frequency) is the estimated fraction of cancer DNA in the plasma sample.

The "score" can be used to determine the presence or absence of residual cancer DNA in the plasma. A higher score indicates that the presence of cancer DNA is more likely. The exact threshold used to call a positive ctDNA status may depend on sample quality and coverage, and can be optimized for a specific pipeline. This threshold is expected to fall between 4 and 7 in most cases.
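
Downstream code typically reads the two fields from the summary file and applies the chosen threshold. The sketch below makes several assumptions that should be checked against a real output file: the file name is taken to be `<prefix>.mrd_summary.json`, the `TumorEstimate` object is located by searching the parsed JSON rather than by a fixed path (so the exact top-level layout does not matter), and a score equal to the threshold is treated as positive. All function names are hypothetical.

```python
import json
from pathlib import Path


def summary_path(output_dir, prefix):
    # Assumption: the summary is named <prefix>.mrd_summary.json
    return Path(output_dir) / f"{prefix}.mrd_summary.json"


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def read_mrd_result(summary_text, score_threshold=4.0):
    """Extract eVAF and score, and compare the score to a threshold.

    The default of 4.0 mirrors the documented default of
    --mrd-score-threshold. Treating score == threshold as positive is an
    assumption of this sketch.
    """
    estimate = find_key(json.loads(summary_text), "TumorEstimate")
    if estimate is None:
        raise ValueError("no TumorEstimate object found in summary")
    illumina = estimate["illumina"]
    score = float(illumina["score"])
    return {
        "eVAF": float(illumina["eVAF"]),
        "score": score,
        "ctdna_detected": score >= score_threshold,
    }
```

A call such as `read_mrd_result(summary_path(out_dir, prefix).read_text(), score_threshold=5.0)` would apply a stricter, pipeline-specific threshold to the same output.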

## Step 4: Plasma QC

### Step 4A: Plasma/BC sample matching QC

```
  
/opt/dragen/$VERSION/bin/dragen 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--validate-pangenome-reference false    #required 
--output-directory $OUTPUT 
--output-file-prefix $PREFIX 
--events-log-file $EVENTS_LOG_FILE 
# Inputs 
--checkfingerprint-expected-vcf $BUFFY_COAT_TARGETED_GERMLINE_VCF 
--checkfingerprint-observed-vcf $PLASMA_TARGETED_GERMLINE_VCF 
# QC 
--enable-checkfingerprint true 
```

### Step 4B: Plasma contamination QC

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR 
--output-directory $OUTPUT_DIR 
--output-file-prefix $SAMPLE_ID 
--events-log-file $EVENTS_LOG_FILE 
# Inputs 
--bam-input $PLASMA_BAM 
# M/A 
--enable-map-align false 
--enable-map-align-output false 
--enable-sort false 
# MRD 
--enable-mrd true 
--mrd-probes-file $COMMON_GERMLINE_VCF 
--mrd-blocklist $BUFFY_COAT_TARGETED_GERMLINE_VCF 
```

### Plasma QC notes

Similar to the MRD detection step, the output JSON will include the following field of interest:

```
 Run[1].TumorEstimate.illumina.eVAF 
```

Twice the "eVAF" can be used as a proxy for the level of plasma contamination by DNA from a different individual.
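
As a small illustration of that arithmetic, the sketch below doubles the eVAF reported by Step 4B and compares it with a caller-supplied limit. The function names are hypothetical, and no particular limit is recommended here; any cutoff is a pipeline-specific choice.

```python
def contamination_proxy(evaf):
    """Return the contamination proxy: two times the Step 4B eVAF."""
    if evaf < 0:
        raise ValueError("eVAF must be non-negative")
    return 2.0 * evaf


def exceeds_contamination_limit(evaf, limit):
    """Compare the proxy against a caller-chosen limit (no default given)."""
    return contamination_proxy(evaf) > limit
```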
