DRAGEN can subsample a random fraction of the reads in an input file using the fractional downsampler. You can use fractional downsampling to simulate different amounts of sequencing from the same data set. DRAGEN randomly subsamples reads from primary analysis without any modification (for example, no trimming and no filtering).
To enable fractional downsampling, set the `--enable-fractional-down-sampler` command line option to true.
Any valid sequencing data format that is compatible with the DRAGEN Host Software can be used. For more information on compatible input options, see Input Options.
In addition to enabling fractional downsampling, you must set the fraction of reads to keep. To set the subsample fraction, use `--down-sampler-normal-subsample` and/or `--down-sampler-tumor-subsample`, depending on the input files.
You can also specify a seed using `--down-sampler-random-seed` to generate different subsamples of the input data set.
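The role of the seed can be illustrated with a minimal Python sketch. This is not DRAGEN's implementation; it only assumes a simple model in which each read is kept independently with probability equal to the subsample fraction. The function name `subsample` and the synthetic read names are hypothetical.

```python
import random


def subsample(reads, fraction, seed=42):
    """Keep each read independently with probability `fraction`.

    Illustrative model only; DRAGEN's internal sampling method is not
    described in this document.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # private generator, so results depend only on the seed
    return [read for read in reads if rng.random() < fraction]


# Synthetic stand-ins for reads (hypothetical data).
reads = [f"read_{i}" for i in range(10_000)]

first = subsample(reads, 0.25, seed=42)
repeat = subsample(reads, 0.25, seed=42)  # same seed: identical subsample
other = subsample(reads, 0.25, seed=7)    # different seed: different subsample

print(len(first), first == repeat, first == other)
```

Under this model, rerunning with the same seed reproduces the same subsample, while changing the seed draws a different subset of roughly the same size.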
Option | Description |
---|---|
`--enable-fractional-down-sampler` | Set to true to enable fractional downsampling. The default value is false. |
`--down-sampler-normal-subsample` | Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%). |
`--down-sampler-tumor-subsample` | Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%). |
`--down-sampler-random-seed` | Specify the random seed. Use different seeds to generate different subsamples of the same input data. The default value is 42. |
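The options above could be assembled into command line arguments with a small helper such as the following sketch. The helper name `fractional_downsampler_args` is hypothetical, and the range check on the fraction is an assumption rather than documented DRAGEN behavior. Only the options listed in the table are emitted; input and output arguments are left to the caller.

```python
def fractional_downsampler_args(normal_fraction=None, tumor_fraction=None, seed=None):
    """Return the fractional downsampler options as an argument list."""
    if normal_fraction is None and tumor_fraction is None:
        raise ValueError("set a normal and/or tumor subsample fraction")

    args = ["--enable-fractional-down-sampler", "true"]
    for flag, fraction in (
        ("--down-sampler-normal-subsample", normal_fraction),
        ("--down-sampler-tumor-subsample", tumor_fraction),
    ):
        if fraction is None:
            continue
        # Assumption: a useful fraction lies in (0, 1]; 1.0 keeps every read.
        if not 0.0 < fraction <= 1.0:
            raise ValueError(f"{flag} must be greater than 0 and at most 1")
        args += [flag, str(fraction)]

    if seed is not None:
        args += ["--down-sampler-random-seed", str(seed)]
    return args


# Keep 50% of the normal reads, using a non-default seed.
print(" ".join(fractional_downsampler_args(normal_fraction=0.5, seed=7)))
```

The returned list can be appended to the rest of a DRAGEN command line, for example when launching runs from a script.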
Using downsampling, DRAGEN can reserve a random subset of fragments that is output separately from the normal alignment outputs. You can use downsampling to generate data sets for performing comparisons between samples or between replicates. DRAGEN samples fragments after performing any hardware-accelerated trimming or filtering functions, which enables DRAGEN to rapidly create analysis-ready test data sets.
To enable downsampling, set the `--enable-down-sampler` command line option to true.
You can use any valid sequencing data format that is compatible with the DRAGEN Host Software. For more information on compatible input options, see Input Options.
DRAGEN downsampling outputs the reserved subset of data in FASTQ format. If the input is paired-end, DRAGEN outputs two FASTQ files that contain subsampled data. If the input is unpaired, DRAGEN outputs one FASTQ file.
In addition to enabling the downsampling command line option, you must set the quantity of fragments to downsample. To set the quantity of fragments, use either `--down-sampler-fragments` or `--down-sampler-coverage`.
If you specify a coverage level, you must also specify a genome using the `--ref-dir` option or manually specify the genome size using `--down-sampler-genome-size`. If you specify both a fragment limit and a coverage limit, DRAGEN applies both quantity limits and keeps whichever result is smaller.
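The "whichever result is smaller" rule can be sketched as follows. The conversion from coverage to a fragment count is an assumption made for illustration (coverage × genome size ÷ bases per fragment); this document does not state the formula DRAGEN uses. The function name and the `bases_per_fragment` parameter are hypothetical, and 0 is treated as "not set" to mirror the option defaults.

```python
import math


def effective_fragment_limit(fragments=0, coverage=0, genome_size=0,
                             bases_per_fragment=300):
    """Return the smaller of the fragment limit and the coverage-derived limit."""
    limits = []
    if fragments > 0:
        limits.append(fragments)
    if coverage > 0:
        if genome_size <= 0:
            raise ValueError("a coverage limit needs a genome size")
        # Assumed conversion: total bases wanted divided by bases per fragment
        # (300 would correspond to 2 x 150 bp paired-end reads).
        limits.append(math.floor(coverage * genome_size / bases_per_fragment))
    if not limits:
        raise ValueError("set a fragment limit, a coverage limit, or both")
    return min(limits)


# Both limits set: the smaller one applies.
print(effective_fragment_limit(fragments=100_000_000, coverage=30,
                               genome_size=3_100_000_000))
```

In this example the coverage target alone would allow more fragments than the explicit fragment limit, so the fragment limit is the one that takes effect.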
Option | Description |
---|---|
`--enable-down-sampler` | Set to true to enable downsampling. The default value is false. If enabled, you must set either `--down-sampler-fragments` or `--down-sampler-coverage`. |
`--down-sampler-num-threads` | Specify the number of threads to use for downsampled reads. The default value is 8. |
`--down-sampler-random-seed` | Set the random seed for downsampled fragments. The default value is 42. |
`--down-sampler-genome-size` | Set the target genome size for downsampling coverage. The default value is 0. The `--down-sampler-genome-size` option is not compatible with the `--ref-dir` option. |
`--down-sampler-fragments` | Specify the target number of fragments for downsampling. The default value is 0. |
`--down-sampler-coverage` | Set the target genomic coverage for downsampling. The default value is 0. If enabled, you must set either `--ref-dir` or `--down-sampler-genome-size`. |
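The constraints in the table can be checked before launching a run. The following Python sketch is a hypothetical helper, not part of DRAGEN: it validates the option combinations described above and returns the corresponding arguments. Treating 0 as "not set" follows the documented defaults; the reference path in the usage example is a placeholder.

```python
def down_sampler_args(fragments=0, coverage=0, genome_size=0,
                      ref_dir=None, seed=None, num_threads=None):
    """Validate downsampler settings and return them as an argument list."""
    if fragments <= 0 and coverage <= 0:
        raise ValueError(
            "set --down-sampler-fragments or --down-sampler-coverage")
    if ref_dir and genome_size > 0:
        raise ValueError(
            "--down-sampler-genome-size is not compatible with --ref-dir")
    if coverage > 0 and not ref_dir and genome_size <= 0:
        raise ValueError(
            "a coverage target needs --ref-dir or --down-sampler-genome-size")

    args = ["--enable-down-sampler", "true"]
    if fragments > 0:
        args += ["--down-sampler-fragments", str(fragments)]
    if coverage > 0:
        args += ["--down-sampler-coverage", str(coverage)]
    if genome_size > 0:
        args += ["--down-sampler-genome-size", str(genome_size)]
    if ref_dir:
        args += ["--ref-dir", ref_dir]
    if seed is not None:
        args += ["--down-sampler-random-seed", str(seed)]
    if num_threads is not None:
        args += ["--down-sampler-num-threads", str(num_threads)]
    return args


# Target 30x coverage against a reference directory (placeholder path).
print(" ".join(down_sampler_args(coverage=30, ref_dir="/path/to/reference")))
```

Failing early in a wrapper like this avoids starting a run that DRAGEN would reject because of a missing or conflicting option.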