TruPath Outputs

This page describes key DRAGEN secondary analysis outputs and metrics.

The proximity-mode-enabled DRAGEN Germline pipeline, used with the Illumina TruPath Genome prep, produces a layered set of outputs. These begin with proximity‑aware templates and alignments, expand into long‑range phasing, and culminate in phased small variant calls, haplotype‑resolved SVs, small variant calls in paralogous regions (MRJD), STR expansion calls, and colocation‑validated breakends, all backed by QC metrics and reporting. Each algorithm consumes the same underlying proximity signal but exposes its results through standard genomics formats (BAM, VCF, CSV, JSON, Cooler), so the outputs remain interoperable with existing tools.

For more information on the DRAGEN algorithms, features, and outputs supporting Illumina TruPath Genome prep, see the DRAGEN User Guide.

1. Proximity Linking Model & Template Reconstruction

A non‑linear proximity linking model is fit per read group using flow cell (X,Y) distance and genomic distance to reconstruct long DNA templates. This is the foundational signal reused by each downstream algorithm.
Key Output Files:
- Proximity‑aware BAM/CRAM
  - Reads tagged with BX:Z (template ID)
- Template metrics CSVs (WGS + QC regions):
  - <prefix>.<qc>_template_subpairs.csv
  - <prefix>.<qc>_template_gdist.csv
  - <prefix>.<qc>_template_xdist.csv
  - <prefix>.<qc>_template_ydist.csv
  - <prefix>.<qc>_template_thresholds.csv
- Link metrics CSVs:
  - <prefix>.<qc>_proximity_gdist.csv
  - <prefix>.<qc>_proximity_xdist.csv
  - <prefix>.<qc>_proximity_ydist.csv
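Because reads in the proximity-aware alignments carry a BX:Z template ID, reads can be regrouped into their reconstructed templates by that tag. The following is a minimal sketch that works on SAM text lines (for example, decoded from the BAM/CRAM with an external tool). The function name and the example records are hypothetical and for illustration only.

```python
from collections import defaultdict


def group_reads_by_template(sam_lines):
    """Map each BX:Z template ID to the names of the reads that carry it."""
    templates = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        read_name = fields[0]
        # Optional tags start at the 12th SAM column (index 11).
        for tag in fields[11:]:
            if tag.startswith("BX:Z:"):
                templates[tag[len("BX:Z:"):]].append(read_name)
                break
    return dict(templates)


# Hypothetical SAM records, not real DRAGEN output.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:T1",
    "r2\t0\tchr1\t900\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:T1",
    "r3\t0\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:T2",
    "r4\t0\tchr2\t700\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
templates = group_reads_by_template(example_sam)
```

Reads without a BX:Z tag (r4 in the example) are simply left out of the grouping.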

2. Proximity-Aware Mapping & Alignment

Uses proximity link probabilities as an additional alignment score to resolve ambiguous mappings that standard short-read alignment cannot.
Key Output Files:
- Mapped BAM/CRAM
  - Improved placement in repeats/paralogs
  - Carries proximity and template tags

3. Read Phasing & Personalized Haplotypes

Combines haplotype databases with long‑range proximity links to create long, confident phase blocks, enabling haplotype‑aware variant calling and downstream analyses.
Key Output Files:
- Phased BAM/CRAM
  - Tags: HP (haplotype), PS (phase block), pp (phasing confidence)
- Personalized haplotypes TSV
  - <prefix>.personal_haplotypes.tsv.gz
- Phase block GTF
  - <prefix>.phase_blocks.gtf.gz
- Imputed variants VCF
  - <prefix>.personal.vcf.gz
- Phasing metrics CSV
  - <prefix>.phasing_summary_stats.csv
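As a sketch of how the phase block GTF might be consumed, the snippet below computes block lengths from the standard GTF start and end columns (columns 4 and 5, 1-based and inclusive). The source, feature, and attribute values in the example lines are hypothetical; the actual contents of `<prefix>.phase_blocks.gtf.gz` should be checked against the DRAGEN User Guide.

```python
def phase_block_lengths(gtf_lines):
    """Return the length in bases of every record in an iterable of GTF lines."""
    lengths = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and comments
            continue
        cols = line.rstrip("\n").split("\t")
        start, end = int(cols[3]), int(cols[4])  # GTF columns 4 and 5
        lengths.append(end - start + 1)  # coordinates are 1-based, inclusive
    return lengths


# Hypothetical GTF records, not real DRAGEN output.
example_gtf = [
    "# comment line",
    'chr1\tsketch\tphase_block\t1001\t5000\t.\t.\t.\tblock_id "1";',
    'chr1\tsketch\tphase_block\t7001\t7500\t.\t.\t.\tblock_id "2";',
]
lengths = phase_block_lengths(example_gtf)
```

For the compressed file, the same function can be fed from `gzip.open(path, "rt")`, which yields text lines.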

4. Proximity-Aware Structural Variant (SV) Calling

Uses haplotype‑segregated assemblies and phasing‑aware machine learning (ML) models to improve SV detection and genotyping in single‑sample germline WGS. Proximity information enters indirectly through phasing.
Key Output Files:
- SV VCF
  - Includes TruPath‑specific fields:
    - PHASEDASM
    - ML_UPDATED
    - MLQS
    - ColocationSum (when colocation filter applied)
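The fields listed above can be pulled out of an SV VCF record with plain text parsing. This sketch assumes that PHASEDASM, ML_UPDATED, and MLQS appear as INFO keys and that ColocationSum appears in the FILTER column; both placements are assumptions and should be confirmed against the VCF header. The function names and the example record are hypothetical.

```python
def parse_info(info_field):
    """Parse a VCF INFO string into a dict; flag keys map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def trupath_sv_fields(vcf_line):
    """Extract the TruPath-related fields from one tab-separated VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # INFO is the 8th VCF column
    filters = [] if cols[6] in (".", "PASS") else cols[6].split(";")
    return {
        "phased_asm": info.get("PHASEDASM"),    # assumed INFO key
        "ml_updated": info.get("ML_UPDATED"),   # assumed INFO key
        "mlqs": info.get("MLQS"),               # assumed INFO key
        "colocation_filtered": "ColocationSum" in filters,  # assumed FILTER ID
    }


# Hypothetical record, not real DRAGEN output.
record = "chr1\t1000\tsv1\tN\t<DEL>\t50\tColocationSum\tSVTYPE=DEL;END=2000;PHASEDASM;MLQS=37"
fields = trupath_sv_fields(record)
```

Values are returned as strings (or True for flags) so that no type is assumed beyond what the header declares.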

5. Multi-Region Joint Detection (MRJD)

Performs de novo, haplotype‑resolved small‑variant calling in paralogous regions, estimating copy number and assigning variants to specific gene copies or haplotypes using proximity information.
Key Output Files:
- Primary MRJD VCF
  - <prefix>.mrjd.hard-filtered.vcf.gz
- MRJD JSON summary
  - <prefix>.mrjd.json
  - Copy number, region/haplotype assignments, run status
- MRJD phased BAM
  - <prefix>.mrjd.phased.bam
  - Tags: HP (copy), PC (confidence), PS, BX
- Supporting files directory
  - mrjd_supporting_files/
    - Multi‑column VCFs per paralog set
    - Reference region alignments (SAM)

6. STR Calling with IRR Recovery

Recovers otherwise unmapped in‑repeat reads using proximity, enabling more accurate sizing of large STR expansions and improving genotyping in heterozygous cases.
Key Output Files:
- STR VCF (standard DRAGEN‑STR format)
- BAM/CRAM with IRR tags
  - tr tag encoding recovered repeat motif

7. Colocation Analysis & Filtering

Summarizes long‑range genomic interactions from proximity‑linked reads and uses that signal to validate SV breakends or filter those lacking molecular support.
Key Output Files:
- Cooler file
  - Sparse colocation matrix (Hi‑C‑like)
- SV VCF annotations
  - NORMALIZED_COLOC_SUM
  - ColocationSum filter (when applied)
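To illustrate what a sparse, Hi-C-like colocation matrix holds, the sketch below bins pairs of genomic positions into fixed-size bins and stores only the non-zero upper-triangle cells. It is a conceptual illustration for a single chromosome, not DRAGEN's colocation algorithm; the function name, bin size, and position pairs are hypothetical.

```python
from collections import Counter


def bin_contacts(position_pairs, bin_size):
    """Count linked position pairs per (bin_i, bin_j) cell, keeping i <= j."""
    counts = Counter()
    for pos_a, pos_b in position_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i > j:  # the matrix is symmetric, so store the upper triangle only
            i, j = j, i
        counts[(i, j)] += 1
    return counts


# Hypothetical linked read positions on one chromosome.
pairs = [(100, 15000), (900, 14000), (12000, 500), (20000, 21000)]
matrix = bin_contacts(pairs, bin_size=10000)
```

Only cells with at least one contact are stored, which is what keeps such a matrix sparse at fine bin sizes.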

8. Proximity‑Filtered Coverage & Reporting

Provides QC and interpretability for proximity data quality, template structure, and phasing performance, integrated into the standard DRAGEN Report.
Key Output Files:
- Proximity coverage CSVs (per link‑quality threshold):
  - <prefix>_proximity_linkqual<q>_coverage_metrics.csv
  - <prefix>_proximity_linkqual<q>_hist.csv
  - <prefix>_proximity_linkqual<q>_fine_hist.csv
  - <prefix>_proximity_linkqual<q>_overall_mean_cov.csv
  - <prefix>_proximity_linkqual<q>_contig_mean_cov.csv
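The naming pattern above can be expanded programmatically, for example to check that a run directory contains all coverage files for a given threshold. This is a minimal sketch that assumes `<q>` is written as a plain integer (such as 25); the function names and the example prefix are hypothetical.

```python
from pathlib import Path

# File name suffixes taken from the list above.
SUFFIXES = (
    "coverage_metrics",
    "hist",
    "fine_hist",
    "overall_mean_cov",
    "contig_mean_cov",
)


def proximity_coverage_files(prefix, link_quality):
    """Return the expected coverage CSV names for one link-quality threshold."""
    return [
        f"{prefix}_proximity_linkqual{link_quality}_{suffix}.csv"
        for suffix in SUFFIXES
    ]


def missing_files(output_dir, prefix, link_quality):
    """Return the expected file names that are not present in output_dir."""
    return [
        name
        for name in proximity_coverage_files(prefix, link_quality)
        if not (Path(output_dir) / name).is_file()
    ]


expected = proximity_coverage_files("sample", 25)
```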

9. DRAGEN Reports (TruPath Germline WGS)

The report includes a dedicated Proximity tab with QC metrics and visualizations. See the DRAGEN Reports section of the DRAGEN User Guide for additional information.
Key Metrics:
- Fit RMSE - Root-mean-square error between the probabilities estimated by the parametric and non-parametric models, expressed on the Phred scale
- Q25 Proximity Rate - Percentage of read pairs with at least one neighbor whose link quality is above Q25 (Phred scale)
- Q25 Proximity Coverage - Average alignment coverage over the genome at link quality ≥ Q25 (Phred scale)
- P75 Template Size - The size of linked template molecules at the 75th percentile
- NG50 - The phase block length at which blocks of that length or longer together cover 50% of the genome
Key Plots:
- Template Genomic Span, displaying the distribution of template genomic lengths from <prefix>.wgs_template_gdist.csv
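Two of the metrics above follow standard definitions that can be sketched in a few lines. The first helper applies the usual Phred relation Q = -10·log10(p), under the assumption that link quality uses that convention; the second computes NG50 from a list of phase block lengths and a genome size. Function names and example numbers are hypothetical and are not taken from DRAGEN.

```python
def phred_to_probability(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def ng50(block_lengths, genome_size):
    """Return the block length at which the longest blocks cover half the genome.

    Returns 0 if the blocks together cover less than 50% of genome_size.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Hypothetical phase block lengths for a genome of 200 bases.
example_ng50 = ng50([50, 40, 30, 20, 10], genome_size=200)
q25_error = phred_to_probability(25)
```

In the example, the three longest blocks (50 + 40 + 30) are the first to reach half of the 200-base genome, so the NG50 is 30. Unlike N50, NG50 is normalized by genome size rather than by the total length of the blocks.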

