TruPath Outputs
This page describes key DRAGEN secondary analysis outputs and metrics.
When run in proximity mode with the Illumina TruPath Genome prep, the DRAGEN Germline pipeline produces a layered set of outputs. These begin with proximity‑aware templates and alignments, expand into long‑range phasing, and culminate in phased small variant calls, haplotype‑resolved structural variants (SVs), small variants in paralogous regions (MRJD), STR expansion calls, and colocation‑validated break ends, all backed by extensive QC and reporting. Each algorithm consumes the same underlying proximity signal but exposes its results through standard genomics file formats (BAM, VCF, CSV, JSON, Cooler), which keeps the workflow interoperable with existing tools.
For more information on the DRAGEN algorithms, features, and outputs supporting Illumina TruPath Genome prep, see the DRAGEN User Guide.

1. Proximity Linking Model & Template Reconstruction
A non‑linear proximity linking model is fit per read group using flow cell (X,Y) distance and genomic distance to reconstruct long DNA templates. This is the foundational signal reused by each downstream algorithm.
Key Output Files:
Proximity‑aware BAM/CRAM
Reads tagged with BX:Z (template ID)
Template metrics CSVs (WGS + QC regions):
<prefix>.<qc>_template_subpairs.csv
<prefix>.<qc>_template_gdist.csv
<prefix>.<qc>_template_xdist.csv
<prefix>.<qc>_template_ydist.csv
<prefix>.<qc>_template_thresholds.csv
Link metrics CSVs:
<prefix>.<qc>_proximity_gdist.csv
<prefix>.<qc>_proximity_xdist.csv
<prefix>.<qc>_proximity_ydist.csv
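As a quick sanity check after a run, the template and link metrics file names listed above can be generated from the output prefix and compared against the output directory. The following is a minimal sketch, not part of DRAGEN: the helper names, the example prefix `sample1`, and the use of `wgs` as the `<qc>` value are illustrative assumptions.

```python
from pathlib import Path

# Suffixes taken from the file lists in this section.
TEMPLATE_SUFFIXES = [
    "template_subpairs",
    "template_gdist",
    "template_xdist",
    "template_ydist",
    "template_thresholds",
]
LINK_SUFFIXES = ["proximity_gdist", "proximity_xdist", "proximity_ydist"]


def expected_metric_files(prefix, qc):
    """Return the expected template and link metrics CSV names.

    `prefix` and `qc` fill the <prefix> and <qc> placeholders; both are
    supplied by the caller (hypothetical helper, not a DRAGEN API).
    """
    return [f"{prefix}.{qc}_{suffix}.csv"
            for suffix in TEMPLATE_SUFFIXES + LINK_SUFFIXES]


def missing_metric_files(output_dir, prefix, qc):
    """Return the expected metrics files that are absent from output_dir."""
    out = Path(output_dir)
    return [name for name in expected_metric_files(prefix, qc)
            if not (out / name).is_file()]


if __name__ == "__main__":
    for name in expected_metric_files("sample1", "wgs"):
        print(name)
```

Calling `missing_metric_files("/path/to/output", "sample1", "wgs")` returns an empty list when all eight files are present.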
2. Proximity-Aware Mapping & Alignment
Uses proximity link probabilities as an additional alignment score to resolve ambiguous mappings that standard short-read alignment cannot resolve.
Key Output Files:
Mapped BAM/CRAM
Improved placement in repeats/paralogs
Carries proximity and template tags
3. Read Phasing & Personalized Haplotypes
Combines haplotype databases with long‑range proximity links to create long, confident phase blocks, enabling haplotype‑aware variant calling and downstream analyses.
Key Output Files:
Phased BAM/CRAM
Tags: HP (haplotype), PS (phase block), pp (phasing confidence)
Personalized haplotypes TSV
<prefix>.personal_haplotypes.tsv.gz
Phase block GTF
<prefix>.phase_blocks.gtf.gz
Imputed variants VCF
<prefix>.personal.vcf.gz
Phasing metrics CSV
<prefix>.phasing_summary_stats.csv
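To illustrate how the phasing tags might be consumed downstream, the sketch below parses the optional fields of SAM text records (for example, records streamed from the phased BAM with a SAM-format viewer) and groups read names by phase block and haplotype. It relies only on the generic SAM `TAG:TYPE:VALUE` layout. The function names and the example record are hypothetical, and the value types of `HP`, `PS`, and `pp` in real DRAGEN output are not assumed.

```python
from collections import defaultdict


def parse_optional_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields of one SAM record.

    The first 11 tab-separated columns are the mandatory SAM fields;
    everything after them is an optional tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":
            value = int(value)
        elif value_type == "f":
            value = float(value)
        tags[tag] = value
    return tags


def group_reads_by_phase(sam_lines):
    """Map (PS, HP) to the names of reads carrying both tags."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        tags = parse_optional_tags(line)
        if "HP" in tags and "PS" in tags:
            read_name = line.split("\t", 1)[0]
            groups[(tags["PS"], tags["HP"])].append(read_name)
    return dict(groups)


if __name__ == "__main__":
    # Synthetic record for illustration only; not real DRAGEN output.
    record = "\t".join([
        "read1", "99", "chr1", "10001", "60", "4M", "=", "10101", "104",
        "ACGT", "FFFF", "HP:i:1", "PS:i:10001", "BX:Z:tmpl42",
    ])
    print(group_reads_by_phase([record]))
```

Reads without both `HP` and `PS` are skipped, so unphased reads do not appear in the result.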
4. Proximity-Aware Structural Variant (SV) Calling
Uses haplotype‑segregated assemblies and phasing‑aware machine learning (ML) models to improve SV detection and genotyping in single‑sample germline WGS. Proximity information enters indirectly through phasing.
Key Output Files:
SV VCF
Includes TruPath‑specific fields:
PHASEDASMML_UPDATEDMLQS
ColocationSum (when colocation filter applied)
5. Multi-Region Joint Detection (MRJD)
Performs de novo, haplotype‑resolved small‑variant calling in paralogous regions, estimating copy number and assigning variants to specific gene copies or haplotypes using proximity information.
Key Output Files:
Primary MRJD VCF
<prefix>.mrjd.hard-filtered.vcf.gz
MRJD JSON summary
<prefix>.mrjd.json
Copy number, region/haplotype assignments, run status
MRJD phased BAM
<prefix>.mrjd.phased.bam
Tags: HP (copy), PC (confidence), PS, BX
Supporting files directory
mrjd_supporting_files/
Multi‑column VCFs per paralog set
Reference region alignments SAM
6. STR Calling with IRR Recovery
Recovers otherwise unmapped in‑repeat reads using proximity, enabling more accurate sizing of large STR expansions and improving genotyping in heterozygous cases.
Key Output Files:
STR VCF (standard DRAGEN‑STR format)
BAM/CRAM with IRR tags
tr tag encoding the recovered repeat motif
7. Colocation Analysis & Filtering
Summarizes long‑range genomic interactions from proximity‑linked reads and uses that signal to validate or filter SV break ends lacking molecular support.
Key Output Files:
Cooler file
Sparse colocation matrix (Hi‑C‑like)
SV VCF annotations
NORMALIZED_COLOC_SUM
ColocationSum filter (when applied)
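For readers unfamiliar with Hi‑C‑like matrices, the sketch below shows the general idea of a sparse contact matrix: pairs of linked positions are assigned to fixed-size genomic bins, and only bin pairs with at least one link are stored. This is a generic illustration and not the DRAGEN colocation algorithm. The function name, the single-contig simplification, and the example coordinates are assumptions.

```python
from collections import Counter


def bin_link_pairs(link_pairs, bin_size):
    """Count links per bin pair on a single contig.

    link_pairs: iterable of (pos_a, pos_b) genomic positions of two
                linked reads (hypothetical input format).
    bin_size:   bin width in base pairs.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j, so
    only the upper triangle of the symmetric matrix is stored.
    """
    counts = Counter()
    for pos_a, pos_b in link_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i > j:
            i, j = j, i
        counts[(i, j)] += 1
    return counts


if __name__ == "__main__":
    pairs = [(100, 2500), (2600, 150), (900, 950)]
    for (i, j), n in sorted(bin_link_pairs(pairs, 1000).items()):
        print(i, j, n)
```

Storing only non-empty bin pairs keeps the structure small, since most bin pairs in a genome-wide matrix carry no links.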
8. Proximity‑Filtered Coverage & Reporting
Provides QC and interpretability for proximity data quality, template structure, and phasing performance, integrated into the standard DRAGEN Report.
Key Output Files:
Proximity coverage CSVs (per link‑quality threshold):
<prefix>_proximity_linkqual<q>_coverage_metrics.csv
<prefix>_proximity_linkqual<q>_hist.csv
<prefix>_proximity_linkqual<q>_fine_hist.csv
<prefix>_proximity_linkqual<q>_overall_mean_cov.csv
<prefix>_proximity_linkqual<q>_contig_mean_cov.csv
9. DRAGEN Reports (TruPath Germline WGS)
Dedicated Proximity tab with QC metrics and visualizations. See the DRAGEN Reports section of the DRAGEN User Guide for additional information.
Key Metrics:
Fit RMSE - An estimate of how different the estimated probabilities can be between the parametric and non-parametric models, on the phred scale
Q25 Proximity Rate - Percentage of read-pairs with at least one neighbor above Q25, on the phred scale
Q25 Proximity Coverage - Average alignment coverage over genome with link-quality ≥Q25, on the phred scale
P75 Template Size - The size of linked template molecules at the 75th percentile
NG50 - The size of the smallest phasing block required to phase 50% of the genome
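Two of these definitions can be made concrete with a short sketch. The first helper converts a phred-scaled quality to a probability using the usual convention Q = -10·log10(p); whether DRAGEN link quality follows exactly this convention is an assumption here. The second computes NG50 as defined above from a list of phase block lengths. Both function names and the example numbers are illustrative, and this is not how DRAGEN itself computes the reported values.

```python
def phred_to_error_prob(q):
    """Convert a phred-scaled quality Q to a probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def ng50(block_lengths, genome_size):
    """Return the NG50 of a set of phase blocks.

    Blocks are taken from largest to smallest until their combined
    length reaches half of genome_size; the length of the last block
    taken is the NG50. Returns None if the blocks together cover less
    than half of the genome.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


if __name__ == "__main__":
    print(phred_to_error_prob(25))
    print(ng50([50, 30, 20, 10], genome_size=200))
```

Under the phred convention, Q25 corresponds to a probability of about 0.003, so a Q25 threshold keeps only links that the model considers very unlikely to be wrong.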
Key Plots:
Template Genomic Span, displaying the distribution of template genomic lengths from <prefix>.wgs_template_gdist.csv

