Filter Duplicate Variants
DRAGEN can find and remove variants that are common to separate VCF files. DRAGEN supports the following modes:
Small indel deduplication: If using a structural variant (SV) VCF and a small variant (SNV) VCF, DRAGEN filters all small indels in the SV VCF that also appear in the small variant VCF and are passing there (`PASS` in the `FILTER` column of the small variant VCF file). DRAGEN does not change the SV and SNV VCF files. Instead, it creates a new VCF that contains the variants from the SV VCF that do not match a variant in the SNV VCF file. The name of the new deduplicated SV VCF file uses the prefix passed by `--output-file-prefix`, followed by the suffix `sv.small_indel_dedup.vcf.gz`. The diagram below describes the small indel deduplication pipeline. You must provide the reference genome used to generate the VCF files so that DRAGEN can normalize the variants. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. This feature is useful when running both the SV and SNV callers in somatic workflows: using both callers can increase sensitivity, and deduplication prevents duplicated variants in genes such as FLT3 and KMT2A.
SMN deduplication: If using a small variant VCF and an ExpansionHunter VCF, DRAGEN filters any lines in the small variant VCF that have the same chromosome and position as lines in the ExpansionHunter VCF with the INFO tag `VARID=SMN`. A reference genome is not required.
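The normalization step in small indel deduplication (trimming and left shifting by up to 500 bases) can be illustrated with a short Python sketch. This is not DRAGEN's implementation: the function names, the tuple record layout `(chrom, pos, ref, alt, filter)`, and the in-memory `genome` dictionary are all assumptions made for illustration, and symbolic alleles such as `<DEL>` are not handled. It only shows why two different representations of the same indel compare equal after normalization.

```python
def normalize(pos, ref, alt, chrom_seq, max_shift=500):
    """Trim and left-shift one variant (illustrative only).

    pos is 1-based; chrom_seq is the reference sequence of the chromosome.
    Alleles are assumed to be non-empty, explicit, uppercase base strings.
    """
    shifted = 0
    while ref[-1] == alt[-1]:
        if len(ref) > 1 and len(alt) > 1:
            # Both alleles have a shared trailing base to spare: trim it.
            ref, alt = ref[:-1], alt[:-1]
        elif ref != alt and pos > 1 and shifted < max_shift:
            # One allele is a single base: drop the shared trailing base and
            # prepend the preceding reference base (a one-base left shift).
            pos -= 1
            base = chrom_seq[pos - 1]
            ref, alt = base + ref[:-1], base + alt[:-1]
            shifted += 1
        else:
            break
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def dedup_small_indels(sv_records, small_records, genome, max_shift=500):
    """Keep SV records that do not match a PASS small variant after normalization."""
    passing = {
        (chrom, *normalize(pos, ref, alt, genome[chrom], max_shift))
        for chrom, pos, ref, alt, flt in small_records
        if flt == "PASS"
    }
    return [
        rec for rec in sv_records
        if (rec[0], *normalize(rec[1], rec[2], rec[3], genome[rec[0]], max_shift))
        not in passing
    ]


# Toy data (made up): the same 2-base deletion written two different ways.
genome = {"chr1": "GGGCACACACAGGG"}
small = [("chr1", 9, "ACA", "A", "PASS")]
sv = [("chr1", 3, "GCAC", "GC", "PASS")]
remaining = dedup_small_indels(sv, small, genome)
```

In this toy case both records normalize to the same position and alleles, so the SV record is treated as a duplicate and dropped. A small variant record whose `FILTER` value is not `PASS` would not cause any SV record to be removed.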
Use the following command line options to input VCF or gVCF files. The input files are not altered.
`vd-sv-vcf`: Specify a structural variant VCF or gVCF.
`vd-small-variant-vcf`: Specify a small variant VCF or gVCF.
`vd-eh-vcf`: Specify an ExpansionHunter VCF or gVCF.
DRAGEN determines the name and type of the output file as follows.
Component | Description |
---|---|
Output prefix | If a value is specified for |
Deduplication mode | The prefix is followed by |
File type | The output file type matches the input file type (VCF or gVCF). If |
You can use the following command line options for variant deduplication.
Option | Description |
---|---|
| To enable variant deduplication, set to |
| To generate tabix index files, set to `true`. The default is `true`. |
| To log matching lines to a text file, set to `true`. The default is `false`. For each match, the two matching lines are written one after the other, followed by a newline. |
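As a rough illustration of SMN deduplication and the match log, the following Python sketch filters small variant lines by chromosome and position against ExpansionHunter lines tagged `VARID=SMN`. It is a sketch under stated assumptions, not DRAGEN's code: the function names are hypothetical, the inputs are plain lists of uncompressed VCF lines, and the log layout (small variant line first, ExpansionHunter line second, then a blank line) is one reading of the description above.

```python
def smn_sites(eh_lines):
    """Map (chrom, pos) to the ExpansionHunter line carrying INFO tag VARID=SMN."""
    sites = {}
    for line in eh_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        # INFO is the 8th VCF column; entries are separated by ';'.
        if len(fields) >= 8 and "VARID=SMN" in fields[7].split(";"):
            sites[(fields[0], fields[1])] = line
    return sites


def smn_dedup(small_lines, eh_lines):
    """Return (kept small variant lines, match log text)."""
    sites = smn_sites(eh_lines)
    kept, log = [], []
    for line in small_lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            fields = line.split("\t")
            key = (fields[0], fields[1]) if len(fields) >= 2 else None
            if key in sites:
                # Assumed log layout: the two matching lines, then a blank line.
                log.append(f"{line}\n{sites[key]}\n\n")
                continue  # the matching small variant line is filtered out
        kept.append(line)
    return kept, "".join(log)


# Toy lines with made-up coordinates, for illustration only.
eh = ["chr5\t1000\t.\tC\tT\t.\tPASS\tVARID=SMN"]
small = ["chr5\t1000\t.\tC\tT\t50\tPASS\t.", "chr5\t2000\t.\tA\tG\t50\tPASS\t."]
kept, match_log = smn_dedup(small, eh)
```

Matching uses only the chromosome and position columns, as the mode description states, so no reference genome or allele normalization is involved.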
The following is an example command for an SMN deduplication standalone run:
You can also run small indel deduplication automatically on outputs from the DRAGEN joint caller when both the structural variant and small variant callers are enabled. To run small indel deduplication automatically, set `enable-variant-deduplication` to `true`, and make sure the `vd-sv-vcf`, `vd-small-variant-vcf`, and `vd-eh-vcf` input options are not set. Only small indel deduplication can be run automatically.
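The rules for which mode runs can be summarized in a small Python sketch. The function `dedup_mode` and the dictionary of options are hypothetical and exist only to restate the rules in this section; the small variant input is keyed by `vd-small-variant-vcf`, the name used in the input option list. Combinations that this section does not describe are reported as such rather than guessed.

```python
def dedup_mode(options):
    """Return the deduplication mode implied by a dict of option values."""
    enabled = str(options.get("enable-variant-deduplication", "false")).lower()
    if enabled != "true":
        return "disabled"
    sv = bool(options.get("vd-sv-vcf"))
    small = bool(options.get("vd-small-variant-vcf"))
    eh = bool(options.get("vd-eh-vcf"))
    if not (sv or small or eh):
        # No input VCFs given: deduplication runs on the callers' own outputs.
        return "automatic small indel deduplication"
    if sv and small and not eh:
        return "small indel deduplication"
    if small and eh and not sv:
        return "SMN deduplication"
    return "combination not described in this section"


mode = dedup_mode({"enable-variant-deduplication": "true"})
```

The automatic case additionally requires that both the structural variant and small variant callers are enabled in the same run, which this sketch does not check.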
The following is an example command for an automatic small indel deduplication run.