Contamination Detection

The DRAGEN cross-sample contamination module estimates the fraction of sequencing reads originating from another human sample using a probabilistic mixture model.

DRAGEN provides two contamination detection modes. The appropriate mode depends on sample type, coverage, and expected contamination level.

Quick Decision Guide

| What are you running? | Sample characteristics | Setting to use | What DRAGEN does |
| --- | --- | --- | --- |
| General germline or somatic (default) | ≥ 20× coverage; FFPE/CNV/LOH allowed | --qc-detect-contamination=true | Runs GATK-based model; automatically falls back to legacy VerifyBamID-like model if GATK fails (e.g. high contamination) |
| RNA-seq | Variable expression and coverage | --qc-detect-contamination=true | Runs GATK-based model in experimental mode; results are best-effort and qualitative |
| Low coverage germline | Low coverage (~10×), no FFPE/CNV/LOH | --qc-cross-cont-vcf | Runs legacy VerifyBamID-like model directly; robust at low coverage |
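
The decision guide above can be encoded as a small lookup when many samples are configured by script. The following Python sketch is illustrative only: the function name, the sample-type labels, and the use of 20× as a hard cut-off are assumptions made for this example, not part of DRAGEN. Only the two command-line settings come from this page.

```python
from typing import List

# Settings named on this page.
GATK_BASED = "--qc-detect-contamination=true"
LEGACY = "--qc-cross-cont-vcf"


def contamination_args(sample_type: str, mean_coverage: float,
                       clean_germline: bool, markers_vcf: str = "") -> List[str]:
    """Return the contamination arguments suggested by the decision guide.

    sample_type: "germline", "somatic", or "rna" (labels invented for this sketch).
    clean_germline: True when no FFPE damage, CNV, or LOH is expected.
    markers_vcf: population marker VCF, needed only for the legacy model.
    """
    low_coverage = mean_coverage < 20  # assumed cut-off, based on the ">= 20x" row
    if sample_type == "germline" and low_coverage and clean_germline:
        if not markers_vcf:
            raise ValueError("the legacy model needs a population marker VCF")
        return [LEGACY, markers_vcf]
    # The default row and the RNA-seq row both use the GATK-based model.
    return [GATK_BASED]


print(contamination_args("germline", 10, True, "population_markers.vcf"))
```

The returned list can be appended to the rest of a DRAGEN command line assembled elsewhere.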

Fallback Mechanism

When --qc-detect-contamination=true is specified, DRAGEN:

First attempts contamination estimation using the GATK-based model
Automatically falls back to the legacy VerifyBamID-like model if the GATK-based model fails to converge, most commonly at high contamination levels

No additional settings are required to enable fallback behavior.
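
The behavior described above resembles a try-then-fall-back pattern. The Python sketch below illustrates that pattern only: the estimator callables and the ConvergenceError exception are hypothetical stand-ins, and nothing here reflects how DRAGEN implements the fallback internally.

```python
from typing import Callable, Tuple


class ConvergenceError(Exception):
    """Hypothetical signal that a model failed to converge."""


def estimate_with_fallback(primary: Callable[[], float],
                           fallback: Callable[[], float]) -> Tuple[float, str]:
    """Try the primary estimator; use the fallback only if it fails to converge."""
    try:
        return primary(), "primary"
    except ConvergenceError:
        return fallback(), "fallback"


def gatk_like() -> float:
    # Stand-in for a primary model that cannot converge on this sample.
    raise ConvergenceError("did not converge")


def legacy_like() -> float:
    # Stand-in for the fallback model; the value is made up.
    return 0.25


print(estimate_with_fallback(gatk_like, legacy_like))
```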

GATK-Based Contamination Detection (Default)

Use for: Germline, tumor-only, and tumor-normal workflows. This is the recommended default.

Enable

--qc-detect-contamination=true

Population Marker Resources

/opt/dragen/<VERSION>/resources/qc/somatic_sample_cross_contamination_resource_*.vcf.gz

Resources are provided for hg19, hg38, and hs37d5.

Markers can also be supplied explicitly:

--qc-somatic-contam-vcf <population_markers.vcf>
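
To check which bundled marker files an installation provides, the glob pattern above can be expanded with a few lines of Python. This is a sketch under assumptions: the helper name and the root parameter are invented for illustration, and the optional filter assumes the reference name (for example hg38) appears in the file name, which this page does not state explicitly.

```python
import glob
import os
from typing import List


def find_marker_resources(version: str, reference: str = "",
                          root: str = "/opt/dragen") -> List[str]:
    """List bundled marker VCFs for the GATK-based model in one DRAGEN version.

    reference: optional substring filter such as "hg38" (assumes the
    reference name is part of the file name). Empty means no filtering.
    """
    pattern = os.path.join(
        root, version, "resources", "qc",
        "somatic_sample_cross_contamination_resource_*.vcf.gz",
    )
    matches = sorted(glob.glob(pattern))
    return [m for m in matches if reference in os.path.basename(m)]
```

A selected path could then be passed to --qc-somatic-contam-vcf if the markers need to be supplied explicitly.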

Behavior

Accounts for FFPE damage, copy number variation (CNV), and loss of heterozygosity (LOH)
Empirically adjusts base qualities to reduce FFPE deamination and oxidation noise
Optimized for low-to-moderate contamination levels

RNA-seq Support (Experimental)

The GATK-based model (--qc-detect-contamination=true) can also be run on RNA-seq data.

Limitations

Estimates are less stable than for DNA because of expression and coverage variability
Results are qualitative indicators only
Feature is experimental

Legacy Contamination Detection (VerifyBamID-like)

Use for: Clean germline samples, especially at low coverage (~10×). DRAGEN also runs this model automatically when the GATK-based model falls back.

Enable

--qc-cross-cont-vcf <population_markers.vcf>

Population Marker Resources

/opt/dragen/<VERSION>/resources/qc/sample_cross_contamination_resource_*.vcf.gz

Resources are provided for hg19, hg38, and hs37d5.

Behavior

Models the sample as a mixture of individuals
Performs well on clean germline data
Robust at low coverage
Can remain informative at high contamination
Not robust to FFPE damage, CNVs, or extended runs of homozygosity (ROH)

Output and Interpretation

The contamination estimate is reported as a fraction:

MAPPING/ALIGNING SUMMARY Estimated sample contamination 0.011

This corresponds to 1.1% contamination.
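
When post-processing many runs, the reported fraction can be extracted and converted to a percentage programmatically. The sketch below assumes the value is the last field of a line containing the text "Estimated sample contamination", separated by commas or whitespace, and that an unavailable estimate appears as NA. The function name is illustrative.

```python
import re
from typing import Optional

METRIC_NAME = "Estimated sample contamination"


def parse_contamination(line: str) -> Optional[float]:
    """Return the contamination fraction from a metrics line, or None.

    None is returned when the line is for a different metric or the
    value is not numeric (for example "NA").
    """
    if METRIC_NAME not in line:
        return None
    # The value is assumed to be the last comma- or whitespace-separated field.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    try:
        return float(fields[-1])
    except ValueError:
        return None


line = "MAPPING/ALIGNING SUMMARY Estimated sample contamination 0.011"
fraction = parse_contamination(line)
if fraction is not None:
    print(f"{fraction * 100:.1f}% contamination")  # prints "1.1% contamination"
```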

Interpretation Guidance

Contamination should be well below the minimum allele frequency (AF) of interest
Example: at 1% contamination, variants below ~5% AF may be unreliable
The estimate saturates at ~30% contamination, so true contamination above that level is not distinguished
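
The guidance above can be turned into a simple screening check. In this Python sketch, the 5× ratio is inferred from the single example above (1% contamination versus ~5% AF) and the 0.30 level from the saturation point. Both are adjustable assumptions rather than documented DRAGEN thresholds, and the function name is hypothetical.

```python
def assess_contamination(fraction: float, min_af: float,
                         safety_factor: float = 5.0,
                         saturation: float = 0.30) -> str:
    """Classify a contamination estimate relative to the lowest AF of interest.

    safety_factor: assumed ratio between the lowest reliable AF and the
    contamination fraction (5 matches the 1% -> ~5% example).
    saturation: level at which the estimate stops being quantitative.
    """
    if fraction >= saturation:
        return "saturated"   # read as "at least this contaminated"
    if fraction * safety_factor > min_af:
        return "unreliable"  # variants near min_af may stem from contamination
    return "ok"


print(assess_contamination(0.011, min_af=0.05))
```

A laboratory would normally replace both defaults with thresholds from its own validation.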

Coverage and Validity Requirements

Contamination estimation requires ≥100 valid pileups.

A pileup is valid if:

Coverage ≥ 10×
≥ 95% of reads are valid

Soft-clipped reads are excluded. Excessive soft clipping is often caused by untrimmed adapters. If contamination is reported as NA, inspect marker loci in IGV and correct adapter issues upstream.
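
The validity rules above amount to a per-locus filter followed by a count. The following Python sketch illustrates that logic under assumptions: the Pileup record and the function names are invented, coverage is taken to mean the total number of reads overlapping a marker locus, and the defaults mirror the values stated in this section. It is not DRAGEN's implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Pileup:
    total_reads: int  # reads overlapping the marker locus (assumed meaning of coverage)
    valid_reads: int  # reads that pass validity checks, e.g. not soft-clipped


def is_valid_pileup(p: Pileup, min_cov: int = 10,
                    min_valid_ratio: float = 0.95) -> bool:
    """A pileup is valid when coverage and the valid-read fraction meet the minimums."""
    if p.total_reads == 0 or p.total_reads < min_cov:
        return False
    return p.valid_reads / p.total_reads >= min_valid_ratio


def can_estimate(pileups: Iterable[Pileup], min_valid_pileups: int = 100) -> bool:
    """Estimation needs enough valid pileups; otherwise the result would be NA."""
    return sum(is_valid_pileup(p) for p in pileups) >= min_valid_pileups
```

With heavy soft clipping, many loci fail the valid-read ratio, so the count of valid pileups can drop below 100 even when raw coverage looks adequate.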

Legacy Model–Specific Settings

| Setting | Description |
| --- | --- |
| --qc-contam-min-cov | Minimum coverage per pileup (default: 10). |
| --qc-contam-min-valid-read-ratio | Minimum fraction of valid reads (default: 0.95). Can be lowered to ~0.75, but adapter trimming issues should be fixed instead. |

Key Takeaways

Use GATK-based contamination detection for most workflows
Use the legacy model for low-coverage clean germline samples
High contamination triggers automatic fallback when using --qc-detect-contamination=true
RNA-seq support is experimental
