# Fractional (Raw Reads) Downsampling

DRAGEN can subsample a random fraction of reads from an input file using the fractional downsampler. You can use downsampling to simulate lower amounts of sequencing from an existing data set. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g., no trimming or filtering). In this sense it behaves similarly to popular downsampling tools such as `samtools view -s` or `seqtk`.

To enable fractional downsampling, set the `--enable-fractional-down-sampler` command line option to `true`.

Any valid sequencing data format that is compatible with the DRAGEN Host Software can be used. For more information on compatible input options, see [Input Options](https://help.connected.illumina.com/dragen/dragen-v4.3/product-guide/dragen-host-software#input-options).

## Determining an Appropriate Downsampling Fraction

DRAGEN generates metrics that can be used to determine an appropriate downsampling fraction:

* DRAGEN BCL Demux generates a `Demultiplex_Stats.csv` file that contains the '# Reads' column, i.e., the number of pass-filter read fragments (read pairs) for each sample and lane.
* DRAGEN Mapping and Aligning generates a `.mapping_metrics.csv` file that contains the 'Total input reads' metric, i.e., the total number of reads (not pairs) present in the original input files.

The downsampling fraction can be estimated as follows:

* estimated original coverage = (total number of reads \[not pairs] \* estimated read length) / size of the genome or enrichment region
* downsampling fraction = desired coverage / estimated original coverage

Adjustments may be required for samples with a high fraction of duplicate-marked reads or short fragments with overlapping mates.
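
The estimate above can be sketched in a few lines of Python. This is a minimal illustration, not part of DRAGEN: the function names and the example values (read count, read length, genome size) are hypothetical. The fraction is computed as the desired coverage divided by the estimated original coverage, and is capped at 1.0 because downsampling cannot add reads. If you start from the '# Reads' column, which counts read pairs, the count would presumably need to be doubled for paired-end data before applying this formula.

```python
def estimate_coverage(total_reads: int, read_length: float, target_size: int) -> float:
    """Estimated original coverage = (reads * read length) / target size."""
    if total_reads <= 0 or read_length <= 0 or target_size <= 0:
        raise ValueError("all inputs must be positive")
    return total_reads * read_length / target_size


def downsampling_fraction(original_coverage: float, desired_coverage: float) -> float:
    """Fraction of reads to keep to go from original to desired coverage."""
    if original_coverage <= 0 or desired_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Downsampling cannot add reads, so cap the result at 1.0 (keep everything).
    return min(1.0, desired_coverage / original_coverage)


# Hypothetical example values, not taken from a real run.
total_reads = 800_000_000     # 'Total input reads' (individual reads, not pairs)
read_length = 150             # estimated read length in bp
genome_size = 3_100_000_000   # approximate human genome size in bp

original = estimate_coverage(total_reads, read_length, genome_size)
fraction = downsampling_fraction(original, desired_coverage=30)
print(f"estimated original coverage: {original:.1f}x")
print(f"downsampling fraction: {fraction:.3f}")
```

The sketch does not account for duplicate-marked reads or overlapping mates, so treat the result as a starting point.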

## Command Line Options

In addition to enabling fractional downsampling, you must set the fraction of reads to keep. To set the fraction, use `--down-sampler-normal-subsample`, `--down-sampler-tumor-subsample`, or both, depending on the input files.

You can also specify a seed using `--down-sampler-random-seed` to generate different subsamples of the input data set.

| Option                             | Description                                                                                                 |
| ---------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `--enable-fractional-down-sampler` | Set to `true` to enable fractional downsampling. The default value is `false`.                              |
| `--down-sampler-normal-subsample`  | Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%). |
| `--down-sampler-tumor-subsample`   | Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).  |
| `--down-sampler-random-seed`       | Specify the random seed. Different seeds yield different subsamples of the input. The default value is 42.  |
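
As an illustration of how these options fit together, the following Python sketch assembles only the downsampling-related arguments for a DRAGEN command. The helper names are hypothetical, and two points are assumptions rather than documented behavior: that a usable fraction lies in the range (0, 1], and that options are passed in `--option value` form. Input, reference, and output options are deliberately omitted and must be added for a real run.

```python
from typing import List, Optional


def _check_fraction(name: str, value: float) -> None:
    # Assumption: a usable fraction lies in (0, 1]; 1.0 keeps all reads.
    if not 0.0 < value <= 1.0:
        raise ValueError(f"{name} must be in (0, 1], got {value}")


def downsampler_options(
    normal_fraction: Optional[float] = None,
    tumor_fraction: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Build only the downsampling-related arguments for a DRAGEN command."""
    if normal_fraction is None and tumor_fraction is None:
        raise ValueError("set at least one of normal_fraction or tumor_fraction")

    args = ["--enable-fractional-down-sampler", "true"]
    if normal_fraction is not None:
        _check_fraction("normal_fraction", normal_fraction)
        args += ["--down-sampler-normal-subsample", str(normal_fraction)]
    if tumor_fraction is not None:
        _check_fraction("tumor_fraction", tumor_fraction)
        args += ["--down-sampler-tumor-subsample", str(tumor_fraction)]
    if seed is not None:
        # Omitting the seed leaves the documented default (42) in effect.
        args += ["--down-sampler-random-seed", str(seed)]
    return args


# Example: keep 77.5% of the normal reads and use a non-default seed.
print(" ".join(downsampler_options(normal_fraction=0.775, seed=7)))
```

Changing only the seed between calls would request a different random subsample at the same fraction.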
