DRAGEN Fragmentomics
Last updated
Last updated
Fragmentomics is the study of fragmentation patterns of cell-free DNA or circulating turmor DNA (ctDNA). DNA molecules are released into the plasma from various tissues and cell types. Fragmentation features, such as fragment sizes and end motifs, of the cell-free DNA contains the characteristics of their tissue of origin. Studies have shown that fragmentation features are distinct between cancer and noncancer cells derived ctDNA. The use of genome-wide fragment profile of cell-free DNA has proven to be a powerful tool to infer cancer status and their tissue of origin. The DRAGEN fragmentomics component computes three fragmentomics metrics as following.[1]
Fragment profile
End motif frequency
Window protection score (WPS)
The fragmentomics component works by taking aligned reads from the mapper, calculating per read metrics, and finally tabulating into per-bin or target region metrics. DRAGEN first gets the chromosome sizes from the reference genome. Only autosomes and X, Y chromosomes are considered for fragment profile calculation. The genome is binned with the bin size specified by the user. Each aligned read is processed sequentially. Only reads satisfied with the following criteria are considered: 1) mapped, 2) mate-mapped, 3) not PCR duplicates, 4) primary alignment, 5) mapping quality no less than minimum mapq specified by the user. Reads that have template length within the short fragment size ranges are counted as short fragment. Reads that have template length within the long fragment size range are counted as long fragment. The fragment profile is calculated as the ratio of short-to-long fragment counts for each genomic bin. Genome-wide short fragment counts, long fragment counts, and their ratio are normalized against the GC bias of each genomic bin using the GC correction module from DRAGEN CNV component.
End motif frequency calculation is enabled when --fragmentomics-end-motif-len
is set to positive integers. Unmapped, duplicated, or secondary alignments are excluded for end motif frequency calculation. The first x basepair sequences (x is specified by --fragmentomics-end-motif-len
) at the 5' end of the reads is tabulated into a frequency dictionary with keys being the sequences and values being the frequencies. If the first x basepair contains any 'N's, this read will be ignored. After all reads are processed, the frequency table is sorted by the sequences in alphabetic order.
Window protection score (WPS) calculation is enabled when a target region is provided with --fragmentomics-wps-target-file
. This file must be a BED format text file with three columns. Each row in the file represents a 120-bp region for which WPS will be calculated. An interval tree will be constructed for the target regions. Then each aligned read is processed sequentially, and unmapped, duplicated, or secondary alignments are excluded. Any read with 5' end falling in a target region increments the read count for the region by one. Forward and reverse reads are counted separately. If a read fully spans the region, the fully-span read count for the region increments by one. After all reads are processed, WPS is calculated for each target region. Two ways of WPS calculation are supported, 1) number of fully spanning rads subtracted by the number of reads with 5' ending in the region. 2) percentage of reads ending in the region of all reads mapped to this region.
DRAGEN Fragmentomics currently supports Tumor-only
and Normal-only
sequencing data from TSO500/WES/WGS ctDNA assays. The results for Tumor-Normal
pair data are undefined because ctDNA data are derived from mixture of tumor and normal DNA. Therefore, users should avoid running Fragmentomics in Tumor-Normal
mode.
Enable the Fragmentomics component:
The target regions file is used only in window protection score calculation. The target regions file is in BED format with three columns.
Users can provide a blocklist of regions to remove reads from fragment profile calculation. For example, low mappability regions. This file is in BED format with three columns.
The system should output the fragment profile file, and optionally the end motif frequency file or WPS file if either or both are enabled.
The fragment profile file is in the following format:
The end motif frequency file is in the following format:
The WPS file is in the following format:
Y. M. DENNIS LO, DIANA S. C. HAN, PEIYONG JIANG, ROSSA W. K. CHIU. Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science. 2021. DOI: 10.1126/science.aaw3616