CheckFingerprint
CheckFingerprint identifies whether two or more sequencing datasets originate from the same individual. It supports multiple comparison modes depending on the available inputs and the desired trade-off between statistical rigor, runtime, and scalability.
LOD-based CheckFingerprint modes are broadly based on Picard CheckFingerprint and report logarithmic odds (LOD) scores to assess sample identity using probabilistic genotype modeling and haplotype blocks.
Pairwise pileup comparison mode is a new, experimental method that reports a simple MatchRate based on direct genotype concordance. This mode is intended for rapid screening and large-scale comparisons.
A positive LOD score or a high MatchRate indicates that samples are likely derived from the same individual.
CheckFingerprint Modes (Summary)
CheckFingerprint supports two types of identity metrics:
LOD score (logarithmic odds) quantifies how much more likely two samples are to come from the same person than from different people. A positive score indicates a likely match, with higher values indicating stronger evidence.
Match rate is a simpler 0–1 concordance score based on direct genotype comparison, where values at or above roughly 0.90–0.95 indicate a likely match.
Mode: From reads (generate VCF)
Inputs: BAM/CRAM + expected genotype VCF. Requires the DRAGEN germline small variant caller to be enabled.
Metric: LOD score
Speed: Medium
Best for: WGS or larger datasets; best general-purpose option

Mode: Precomputed VCF
Inputs: One or more observed genotype VCFs + expected genotype VCF (no BAM needed). Small variant caller is skipped.
Metric: LOD score
Speed: Fast
Best for: VCFs (must be germline) already available

Mode: Pairwise pileup (experimental)
Inputs: Pileup files generated during DRAGEN contamination detection
Metric: Match rate
Speed: Very fast
Best for: Batch screening across many samples; requires contamination detection to have been run
Processing Flow
LOD-Based Modes (On-the-fly VCF / Precomputed Germline VCF)
In LOD-based modes, CheckFingerprint uses reference-specific haplotype map files (*.map), bundled with DRAGEN and automatically selected based on the reference, to define curated SNPs grouped into haplotype (linkage disequilibrium) blocks. Genotype likelihoods are estimated from VCF PL values, and evidence is aggregated at the haplotype level to avoid over-counting correlated variants. A logarithmic odds (LOD) score is then computed to quantify how much more likely the samples originate from the same individual than from different individuals.
Processing steps:
Select SNPs from reference-specific haplotype maps
Estimate genotype likelihoods from VCF PL values
Aggregate evidence across haplotype blocks
Compute LOD scores for sample pairs
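As a small illustration of the likelihood-estimation step: VCF PL values are phred-scaled genotype likelihoods, so they can be converted back to normalized likelihoods as sketched below. The function name and the example PL values are hypothetical, and this is not DRAGEN's implementation; it only shows the standard PL conversion.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values to normalized genotype likelihoods.

    `pl` is expected in VCF order for a biallelic site:
    (hom-ref, het, hom-alt). PL = -10 * log10(likelihood), up to a constant.
    """
    raw = [10 ** (-p / 10) for p in pl]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical PL triple strongly favoring the hom-ref genotype.
likelihoods = pl_to_likelihoods([0, 30, 300])
print(likelihoods)
```

Normalizing the values makes them comparable across sites before any evidence is combined at the haplotype-block level.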
LOD-based modes provide a statistically rigorous identity assessment and are recommended for final confirmation.
Pairwise Pileup Mode (Experimental)
In pairwise pileup mode, CheckFingerprint performs a fast, direct comparison of genotypes using pileup files, without haplotype modeling or probabilistic inference. This mode is optimized for rapid screening and large-scale, multi-sample comparisons.
Processing steps:
Load pileup files for all input samples
Select overlapping marker sites across samples
Apply minimum depth and heterozygosity filters
Exclude uninformative sites (e.g. homozygous reference in both samples)
Compare genotypes at remaining sites
Compute a MatchRate for all pairwise sample comparisons
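The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DRAGEN's implementation: the in-memory pileup representation (site → reference and alternate read counts), the function names, and the choice to count passing sites before removing uninformative ones are all hypothetical.

```python
from itertools import combinations


def call_genotype(ref_count, alt_count, min_depth=10, het_width=0.5):
    """Classify a site by allele frequency; return None if depth is too low."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return None
    af = alt_count / depth
    low, high = 0.5 - het_width / 2, 0.5 + het_width / 2
    if af < low:
        return "HOM_REF"
    if af > high:
        return "HOM_ALT"
    return "HET"


def match_rate(sample_a, sample_b, min_passing_sites=500):
    """Return matching / (matching + mismatching), or None (reported as NA)."""
    passing = matching = mismatching = 0
    for site in sample_a.keys() & sample_b.keys():  # overlapping sites only
        gt_a = call_genotype(*sample_a[site])
        gt_b = call_genotype(*sample_b[site])
        if gt_a is None or gt_b is None:
            continue  # fails the depth filter in at least one sample
        passing += 1  # assumption: counted before the uninformative filter
        if gt_a == gt_b == "HOM_REF":
            continue  # uninformative: homozygous reference in both samples
        if gt_a == gt_b:
            matching += 1
        else:
            mismatching += 1
    if passing < min_passing_sites or matching + mismatching == 0:
        return None
    return matching / (matching + mismatching)


# Toy data: site -> (ref_count, alt_count). Values are made up.
samples = {
    "A": {1: (20, 0), 2: (10, 10), 3: (0, 20), 4: (5, 0)},
    "B": {1: (20, 0), 2: (11, 9), 3: (10, 10), 4: (20, 0)},
}
for name_a, name_b in combinations(sorted(samples), 2):
    rate = match_rate(samples[name_a], samples[name_b], min_passing_sites=3)
    print(name_a, name_b, rate)
```

The default arguments mirror the documented defaults for the pairwise pileup settings; the toy example lowers the minimum passing sites only so that a four-site input produces a value.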
Interpretation of Results
LOD-Based Modes
LOD > 0: samples likely from the same individual
LOD < 0: samples likely from different individuals
LOD ≈ 0: inconclusive (often due to low coverage)
LOD scores are reported on a base-10 logarithmic scale. For example, a LOD of 4 indicates that the observed data are 10,000× more likely if the samples come from the same individual than if they come from different individuals.
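The base-10 scale can be made concrete with a short sketch. The function names and the width of the "inconclusive" band are hypothetical choices for illustration; the documentation only states that LOD ≈ 0 is inconclusive and does not define a numeric cutoff.

```python
def lod_to_likelihood_ratio(lod):
    """Convert a base-10 LOD score to a likelihood ratio (same vs. different)."""
    return 10 ** lod


def interpret_lod(lod, inconclusive_band=1.0):
    """Label a LOD score; `inconclusive_band` is an assumed, illustrative cutoff."""
    if abs(lod) < inconclusive_band:
        return "inconclusive"
    return "likely same individual" if lod > 0 else "likely different individuals"


for lod in (4, 0.2, -3):
    print(lod, lod_to_likelihood_ratio(lod), interpret_lod(lod))
```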
Pairwise Pileup Mode
MatchRate ≥ 0.90–0.95: samples likely from the same individual
Lower MatchRate: samples likely from different individuals
MatchRate = NA: insufficient overlapping informative sites
MatchRate is intended for screening and triage, not formal identity confirmation.
Command-Line Options
[Required]
--enable-checkfingerprint true
[Required for LOD-Based Modes]
--checkfingerprint-expected-vcf <expected.vcf>
The expected VCF may contain one or multiple samples. The input sample is compared independently against each expected sample.
[Mode Selection Options]
--checkfingerprint-enable-vcf-comparison true
Enable VCF comparison mode (required for both the precomputed and on-the-fly modes)
--checkfingerprint-observed-vcf <vcf>
Enable precomputed VCF comparison mode
--checkfingerprint-pairwise-read-files <pileup>
Enable pairwise pileup mode (repeatable)
[Optional – Advanced (LOD-Based Modes)]
--checkfingerprint-haplotype-map <map_file>
Specify a custom haplotype map file. By default, DRAGEN automatically selects a reference-specific haplotype map bundled with the software.
[Pairwise Pileup Mode Settings]
--checkfingerprint-pairwise-min-depth
Minimum depth required at a locus
Default: 10
--checkfingerprint-pairwise-het-width
Total width of the allele-frequency (AF) window centered on 0.5 that is used to classify heterozygous sites (e.g. a width of 0.5 gives AF 0.25–0.75)
Default: 0.5
--checkfingerprint-pairwise-min-passing-sites
Minimum overlapping passing sites required to compute MatchRate
Default: 500
[Tumor-Aware Settings – LOD Modes]
--checkfingerprint-enable-tumor-aware true
Enable tumor-aware LOD computation
--checkfingerprint-loss-of-het-rate
Rate at which heterozygous sites become homozygous due to LOH
Default: 0.5
Command-Line Examples
On-the-fly VCF Comparison Mode
Most applicable for: Whole-genome sequencing (WGS) datasets (≈30× coverage) and general-purpose identity checking.
Standalone VCF Comparison Mode
Most applicable for: VCF-only workflows where both observed and expected VCFs are already available.
Pairwise Pileup Comparison Mode (Experimental)
Most applicable for: Rapid batch-level screening of many samples (e.g. WGS runs), duplicate detection, and large-scale identity sanity checks.
Pileup files can be generated by:
DRAGEN contamination detection during the map-align step (--qc-detect-contamination true)
External tools such as samtools mpileup
Outputs
LOD-Based Modes
<prefix>.CheckFingerprint.summary.txt
<prefix>.CheckFingerprint.detail.txt
Pairwise Pileup Mode Output
<prefix>.CheckFingerprint.pairwise.csv
The CSV file contains all pairwise sample comparisons, sorted by MatchRate (highest to lowest):
SampleA / SampleB: Input pileup file names
OverlappingSites: Total shared loci
PassingSites: Loci passing depth and genotype filters
UninformativeSites: Loci where both samples are homozygous reference
MatchingGenotypes: Matching genotype calls
MismatchingGenotypes: Mismatching genotype calls
MatchRate: Matching / (Matching + Mismatching), or NA
If PassingSites is below the --checkfingerprint-pairwise-min-passing-sites threshold, MatchRate is reported as NA.
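For triage, the pairwise CSV can be filtered with a short script. The sketch below assumes the file has a header row whose column names match the fields listed above, with SampleA and SampleB as separate columns; that layout is an assumption, and the example rows are made-up values rather than real output.

```python
import csv
import io


def likely_matches(csv_text, threshold=0.95):
    """Return (SampleA, SampleB, MatchRate) for pairs at or above `threshold`.

    Rows with MatchRate reported as NA are skipped.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["MatchRate"] == "NA":
            continue
        rate = float(row["MatchRate"])
        if rate >= threshold:
            pairs.append((row["SampleA"], row["SampleB"], rate))
    return pairs


# Made-up example content; the header layout is assumed, not confirmed.
example = """SampleA,SampleB,OverlappingSites,PassingSites,UninformativeSites,MatchingGenotypes,MismatchingGenotypes,MatchRate
s1.pileup,s2.pileup,1000,900,300,590,10,0.983
s1.pileup,s3.pileup,1000,880,310,300,270,0.526
s2.pileup,s3.pileup,400,120,50,0,0,NA
"""

print(likely_matches(example))
```

To read a real file, pass the file's contents (for example, from `open(path).read()`) in place of the example string.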
Limitations
Pairwise pileup mode:
Experimental; intended for rapid screening
Non-probabilistic and haplotype-free
Less sensitive for low-coverage data or targeted panels
LOD-based modes:
Tumor-aware LOD assumes loss of heterozygosity
Observed and expected VCFs should originate from the same pipeline
Compatible only with DRAGEN germline and tumor-only pipelines
