# DRAGEN Fragmentomics

Fragmentomics is the study of the fragmentation patterns of cell-free DNA (cfDNA), including circulating tumor DNA (ctDNA). DNA molecules are released into plasma from various tissues and cell types, and fragmentation features of cfDNA, such as fragment sizes and end motifs, carry characteristics of their tissue of origin. Studies have shown that fragmentation features differ between cancer-derived and non-cancer-derived cfDNA, and genome-wide cfDNA fragment profiles have proven to be a powerful tool for inferring cancer status and tissue of origin.\[1] DRAGEN supports three fragmentomics components, which can be run independently or combined in a single run:

1. Fragment profile
2. End motif frequency
3. Window protection score (WPS)

![workflow](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-b2bc8975a01dfea4b9308e2e20de22019bd394fa%2FFragmentomics_workflow.png?alt=media)

The fragmentomics workflow processes aligned reads from the mapper, calculates per-read metrics, and tabulates them into per-bin or target-region metrics.

DRAGEN first gets chromosome sizes from the reference genome. Only autosomes and chromosomes X and Y are considered for fragment profile calculation. The genome is binned using the bin size specified by the user. Each aligned read is processed sequentially. Only reads satisfying all of the following criteria are considered:

1. Mapped
2. Mate mapped
3. Not a PCR duplicate
4. Primary alignment
5. Mapping quality no less than the minimum MAPQ specified by the user

Reads with template lengths within the short-fragment size range are counted as short fragments. Reads with template lengths within the long-fragment size range are counted as long fragments. The fragment profile is calculated as the ratio of short-to-long fragment counts for each genomic bin. Genome-wide short-fragment counts, long-fragment counts, and their ratio are normalized against the GC bias of each genomic bin using the GC correction module from the DRAGEN CNV component.
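The read filtering and per-bin counting described above can be sketched in Python as follows. This is an illustration only, not DRAGEN's implementation. The `Read` tuple and both function names are hypothetical, size ranges are assumed to be inclusive, reads are assigned to a bin by their leftmost aligned position, supplementary alignments are treated as non-primary, and a bin with no long fragments gets a ratio of NaN. GC correction is omitted.

```python
from collections import defaultdict
from typing import NamedTuple

# SAM flag bits as defined in the SAM specification.
UNMAPPED, MATE_UNMAPPED = 0x4, 0x8
SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x100, 0x400, 0x800


class Read(NamedTuple):  # hypothetical minimal read record
    chrom: str
    pos: int    # 0-based leftmost aligned position
    flag: int   # SAM bitwise flag
    mapq: int
    tlen: int   # signed template length


def passes_filters(read, min_mapq=30):
    """Mapped, mate mapped, not duplicate, primary, and MAPQ >= min_mapq."""
    bad = UNMAPPED | MATE_UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY
    return (read.flag & bad) == 0 and read.mapq >= min_mapq


def fragment_profile(reads, bin_size=100_000, short=(100, 150),
                     long=(151, 220), min_mapq=30):
    """Return {(chrom, bin_start): (short_count, long_count, ratio)}."""
    counts = defaultdict(lambda: [0, 0])
    for read in reads:
        if not passes_filters(read, min_mapq):
            continue
        size = abs(read.tlen)
        # Assumption: a read belongs to the bin containing its leftmost base.
        key = (read.chrom, read.pos // bin_size * bin_size)
        if short[0] <= size <= short[1]:
            counts[key][0] += 1
        elif long[0] <= size <= long[1]:
            counts[key][1] += 1
    profile = {}
    for key, (n_short, n_long) in counts.items():
        # Assumption: the ratio is undefined (NaN) when there are no long fragments.
        ratio = n_short / n_long if n_long else float("nan")
        profile[key] = (n_short, n_long, ratio)
    return profile
```

The default arguments mirror the documented defaults for `--fragmentomics-bin-size`, `--fragmentomics-min-mapq`, and the short and long fragment size options.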

End motif frequency calculation is enabled with `--enable-fragmentomics-end-motif true`. The motif length is controlled by `--fragmentomics-end-motif-len`. Unmapped reads, duplicate reads, and secondary alignments are excluded from end motif frequency calculation. The first x bases at the 5' end of each read, where x is specified by `--fragmentomics-end-motif-len`, are tabulated into a frequency dictionary with the sequences as keys and their counts as values. If the first x bases contain any `N` characters, the read is ignored. After all reads are processed, the frequency table is sorted alphabetically by sequence. End motif analysis also supports dedicated fragment-size filtering through `--fragmentomics-end-motif-fragment-min-size` and `--fragmentomics-end-motif-fragment-max-size`.
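The tabulation step can be illustrated with a minimal Python sketch. It is not DRAGEN's code: the function name is hypothetical, the input is assumed to be read sequences already oriented 5' to 3' and already filtered for mapping status, duplicates, and fragment size, and reads shorter than the motif length are skipped as an additional assumption.

```python
from collections import Counter


def end_motif_frequencies(read_seqs, motif_len=4):
    """Count the first `motif_len` bases of each read, sorted by motif."""
    counts = Counter()
    for seq in read_seqs:
        motif = seq[:motif_len].upper()
        # Skip motifs containing N; skipping short reads is an assumption.
        if len(motif) < motif_len or "N" in motif:
            continue
        counts[motif] += 1
    # Sort alphabetically by sequence, as in the output file.
    return dict(sorted(counts.items()))
```

For example, `end_motif_frequencies(["AAACGT", "AAAAGG", "NAAATT", "AAACTT"])` counts `AAAC` twice and `AAAA` once, and ignores the read whose first four bases contain `N`.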

Window protection score (WPS) calculation is enabled with `--enable-fragmentomics-wps true`. A target region file should generally be specified with `--fragmentomics-wps-target-file`. If no target region file is provided, DRAGEN runs WPS across full chromosomes, which is not recommended because WPS signals are typically sparse genome-wide and analysis is usually intended for selected regions of interest. The target region file must be a BED-format text file with at least three columns (chrom, start, end). Each row defines a region of interest (ROI). DRAGEN automatically tiles each ROI with sliding windows whose size is controlled by `--fragmentomics-wps-window-size` (default = 120). Optional flanking bases can be added with `--fragmentomics-wps-region-left-padding` and `--fragmentomics-wps-region-right-padding`. Reads are counted at each window based on 5' end position and strand orientation; reads fully spanning the window are also tracked. After all reads are processed, DRAGEN reports the WPS and related per-window count metrics. WPS analysis also supports dedicated fragment-size filtering through `--fragmentomics-wps-fragment-min-size` and `--fragmentomics-wps-fragment-max-size`.
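The following Python sketch illustrates window tiling and per-window counting under several stated assumptions; it is not DRAGEN's implementation. The `Read` tuple and function names are hypothetical. Coordinates are treated as inclusive, windows advance one base at a time and stay inside the padded region, a read is "fully spanning" only if it extends beyond both window edges, and the score is taken as the spanning count minus the reads whose 5' end falls inside the window, following the commonly used WPS definition.

```python
from typing import NamedTuple


class Read(NamedTuple):  # hypothetical minimal read record
    start: int         # leftmost aligned base (inclusive)
    end: int           # rightmost aligned base (inclusive)
    is_reverse: bool   # True if mapped to the reverse strand


def tile_windows(region_start, region_end, window_size=120,
                 left_pad=0, right_pad=0):
    """Yield inclusive (start, end) windows, one per base, inside the padded region."""
    lo = max(0, region_start - left_pad)
    hi = region_end + right_pad
    for start in range(lo, hi - window_size + 2):
        yield start, start + window_size - 1


def window_metrics(reads, win_start, win_end):
    """Count forward/reverse 5' ends inside the window and fully spanning reads."""
    forward = reverse = span = 0
    for read in reads:
        # Assumption: spanning means strictly covering both window edges.
        if read.start < win_start and read.end > win_end:
            span += 1
            continue
        five_prime = read.end if read.is_reverse else read.start
        if win_start <= five_prime <= win_end:
            if read.is_reverse:
                reverse += 1
            else:
                forward += 1
    return {
        "ForwardCount": forward,
        "ReverseCount": reverse,
        "FullySpanCount": span,
        # Assumption: WPS = spanning reads minus reads ending inside the window.
        "WPS": span - forward - reverse,
    }
```

A window with many spanning reads and few read ends scores high, which is the pattern expected for protected (for example, nucleosome-bound) DNA.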

## Supported assays and DRAGEN modes

DRAGEN Fragmentomics currently supports `Tumor-only` and `Normal-only` sequencing data from TSO500/WES/WGS ctDNA assays. The results for `Tumor-Normal` pair data are undefined because ctDNA data are derived from a mixture of tumor and normal DNA. Therefore, users should **avoid** running Fragmentomics in `Tumor-Normal` mode.

## Command-Line Options

#### Component enablement options:

Enable fragment profile calculation:

```
--enable-fragmentomics true
```

Enable end motif calculation:

```
--enable-fragmentomics-end-motif true
```

Enable WPS calculation:

```
--enable-fragmentomics-wps true
```

#### Optional parameters:

```
    --fragmentomics-bin-size                      Uint. Default 100000
    --fragmentomics-num-threads                   Uint. Default 12
    --fragmentomics-min-mapq                      Uint. Default 30
    --fragmentomics-short-fragment-min-size       Uint. Default 100
    --fragmentomics-short-fragment-max-size       Uint. Default 150
    --fragmentomics-long-fragment-min-size        Uint. Default 151
    --fragmentomics-long-fragment-max-size        Uint. Default 220
    --fragmentomics-num-gc-bins                   Uint. Default 25
    --fragmentomics-gc-enable-smoothing           Bool. Default true
    --fragmentomics-end-motif-len                 Uint. Default 4
    --fragmentomics-end-motif-fragment-min-size   Uint. Default 50
    --fragmentomics-end-motif-fragment-max-size   Uint. Default 1500
    --fragmentomics-wps-target-file               String 
    --fragmentomics-wps-window-size               Uint. Default 120
    --fragmentomics-wps-region-left-padding       Uint. Default 0
    --fragmentomics-wps-region-right-padding      Uint. Default 0
    --fragmentomics-wps-fragment-min-size         Uint. Default 50
    --fragmentomics-wps-fragment-max-size         Uint. Default 1500
    --fragmentomics-exclude-bed                   String
```

### Target regions for window protection score

The target regions file is used only for window protection score calculation. The file must be in BED format with at least three columns (chrom, start, end); additional annotation columns such as a transcript ID are permitted. Each row defines a region of interest. DRAGEN automatically tiles each region into sliding windows based on `--fragmentomics-wps-window-size`, with optional left and right padding applied before tiling.

If a target region set is not readily available, common regions of interest include transcription start sites, promoters, and DHS/open chromatin regions. Gene-based features such as transcription start sites can be derived from the GENCODE gene annotations: <https://www.gencodegenes.org/human/>.

```
chr1    1615319    1615560    ENST00000958539.1
chr1    1615322    1615563    ENST00000889171.1
```

Note: The example above shows two TSS intervals from GENCODE (ENST00000958539.1 and ENST00000889171.1), each padded by 120 bp on both sides (e.g., the TSS at chr1:1615439–1615440 is expanded to chr1:1615319–1615560). Because the padding is already incorporated into the coordinates, `--fragmentomics-wps-region-left-padding` and `--fragmentomics-wps-region-right-padding` should be left at their defaults (0) unless additional flanking sequence is desired.
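As an illustration of how such pre-padded rows might be prepared, the sketch below expands a TSS interval by a fixed number of bases and formats it as a tab-separated BED line. The helper name `padded_bed_line` is hypothetical, and the sketch clamps only at position 0; it does not check chromosome lengths.

```python
def padded_bed_line(chrom, start, end, pad=120, name=""):
    """Return a BED row for [start, end) expanded by `pad` bases on each side."""
    padded_start = max(0, start - pad)  # never go below the chromosome start
    padded_end = end + pad              # chromosome length is not checked here
    fields = [chrom, str(padded_start), str(padded_end)]
    if name:
        fields.append(name)             # optional annotation column
    return "\t".join(fields)


# Reproduces the first row of the example above.
print(padded_bed_line("chr1", 1615439, 1615440, pad=120, name="ENST00000958539.1"))
```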

### Exclude regions for fragment profile

Users can provide a blocklist of regions, such as low-mappability regions, whose reads are excluded from the fragment profile calculation. The blocklist is specified with `--fragmentomics-exclude-bed` and must be a BED-format file with three columns.

```
chr1    1       1000
chr2    1000    2000
```

## Example command-line options for FASTQ input of WGS ctDNA

The following example enables all three fragmentomics components in a single run.

```
dragen \
	--ref-dir $REF \
	--fastq-file1 $fastq1 \
	--fastq-file2 $fastq2 \
	--RGID "test" \
	--RGSM "test" \
	--enable-map-align true \
	--enable-sort false \
	--generate-ploidy-vcf false \
	--enable-cnv false \
	--enable-fragmentomics true \
	--enable-fragmentomics-end-motif true \
	--enable-fragmentomics-wps true \
	--fragmentomics-exclude-bed hg38_exclude.bed \
	--fragmentomics-bin-size 100000 \
	--fragmentomics-num-threads 12 \
	--fragmentomics-min-mapq 30 \
	--fragmentomics-short-fragment-min-size 100 \
	--fragmentomics-short-fragment-max-size 150 \
	--fragmentomics-long-fragment-min-size 151 \
	--fragmentomics-long-fragment-max-size 220 \
	--fragmentomics-num-gc-bins 25 \
	--fragmentomics-gc-enable-smoothing true \
	--fragmentomics-end-motif-len 4 \
	--fragmentomics-end-motif-fragment-min-size 50 \
	--fragmentomics-end-motif-fragment-max-size 1500 \
	--fragmentomics-wps-target-file hg38_regions_of_interest.bed \
	--fragmentomics-wps-window-size 120 \
	--fragmentomics-wps-region-left-padding 0 \
	--fragmentomics-wps-region-right-padding 0 \
	--fragmentomics-wps-fragment-min-size 50 \
	--fragmentomics-wps-fragment-max-size 1500 \
	--output-directory $outdir \
	--output-file-prefix $outprefix
```

## Fragmentomics Output

DRAGEN outputs the fragment profile file, end motif frequency file, and WPS file for whichever components are enabled.

The fragment profile file is in the following format:

```
Chr   Start   End     ShortFragCounts  LongFragCounts  ShortToLongRatio  GCBias    ShortFragCountsCorrected  LongFragCountsCorrected  ShortToLongRatioCorrected
chr1  0       100000  4                7                0.571429          0.424522  3.921048                  6.925498                 0.572042
chr1  100000  200000  1                2                0.500000          0.436106  0.980262                  1.978714                 0.500537
chr1  200000  300000  0                2                0.000000          0.391445  0.000000                  2.008326                 0.000000
```

The end motif frequency file is in the following format:

```
Motif   Frequency
AAAA    111559
AAAC    39204
AAAG    56773
AAAT    68437
```

The WPS file includes the following columns:

| Column         | Description                                                         |
| -------------- | ------------------------------------------------------------------- |
| Chr            | Chromosome name                                                     |
| windowStart    | Start coordinate of the WPS window                                  |
| windowEnd      | End coordinate of the WPS window                                    |
| windowCenter   | Center coordinate of the WPS window                                 |
| ForwardCount   | Read count for forward-mapped reads with a 5' end within the region |
| ReverseCount   | Read count for reverse-mapped reads with a 5' end within the region |
| FullySpanCount | Read count for mapped reads fully spanning the region               |
| WPS            | Window protection score for the region                              |
| TotalCount     | Total read count in the region                                      |
| RatioForward   | Ratio of `ForwardCount` to `TotalCount`                             |
| RatioReverse   | Ratio of `ReverseCount` to `TotalCount`                             |

The WPS file is in the following format:

```
Chr     windowStart  windowEnd  windowCenter  ForwardCount  ReverseCount  FullySpanCount  WPS  TotalCount  RatioForward  RatioReverse
chr1    2488041      2488160    2488100       3             21            314             290  355         0.0084507     0.0591549
chr1    2488042      2488161    2488101       3             32            285             250  366         0.00819672    0.0874317
chr1    2488043      2488162    2488102       4             41            255             210  376         0.0106383     0.109043
```
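As a quick sanity check when post-processing this file, the Python sketch below parses the example rows and tests relationships between the columns. The relationships are inferred from the three example rows only and are not a documented formula: in these rows `WPS` equals `FullySpanCount` minus `ForwardCount` and `ReverseCount`, `windowCenter` is the floored midpoint of the window, and the two ratio columns are the strand counts divided by `TotalCount`. The function names are hypothetical, and the parser assumes whitespace-delimited columns.

```python
WPS_EXAMPLE = """\
Chr windowStart windowEnd windowCenter ForwardCount ReverseCount FullySpanCount WPS TotalCount RatioForward RatioReverse
chr1 2488041 2488160 2488100 3 21 314 290 355 0.0084507 0.0591549
chr1 2488042 2488161 2488101 3 32 285 250 366 0.00819672 0.0874317
chr1 2488043 2488162 2488102 4 41 255 210 376 0.0106383 0.109043
"""


def parse_wps(text):
    """Parse whitespace-delimited WPS output into a list of dicts (strings)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def row_is_consistent(row, tol=1e-5):
    """Check the column relationships observed in the example rows."""
    fwd, rev = int(row["ForwardCount"]), int(row["ReverseCount"])
    span, total = int(row["FullySpanCount"]), int(row["TotalCount"])
    start, end = int(row["windowStart"]), int(row["windowEnd"])
    return (
        int(row["WPS"]) == span - fwd - rev
        and int(row["windowCenter"]) == (start + end) // 2
        and abs(float(row["RatioForward"]) - fwd / total) < tol
        and abs(float(row["RatioReverse"]) - rev / total) < tol
    )


print(all(row_is_consistent(row) for row in parse_wps(WPS_EXAMPLE)))
```

Note that `TotalCount` in the example rows is larger than the sum of the three count columns, so it should be read from the file rather than recomputed from them.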

## Reference

1. Y. M. Dennis Lo, Diana S. C. Han, Peiyong Jiang, Rossa W. K. Chiu. Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science. 2021. DOI: 10.1126/science.aaw3616
