De Novo Small Variant Filtering
The filtering step identifies de novo variant calls from the joint calling workflow that lie in regions with ploidy changes. De novo calling can have reduced specificity in regions where at least one pedigree member has a non-diploid genotype. The de novo variant filter marks these variants and can therefore improve the specificity of the call set.
The filter uses the structural and copy number variant calls of the pedigree. If a de novo variant overlaps a ploidy-changing SV, the FORMAT/DN field in the proband changes from the original DeNovo value to DeNovoSV. If it overlaps a ploidy-changing CNV, the value changes to DeNovoCNV. All other variant details remain unchanged, and every variant in the input VCF is also present in the filtered output VCF. Structural or copy number variants that do not change ploidy, such as inversions, are not considered in the filtering. As an example, a de novo SNV call in the input VCF overlapping an SV duplication in the proband, mother, or father would be represented in the filtered output VCF as follows:
The following is an example command line for running the de novo filtering, based on the files returned by the joint calling workflows:
De Novo Small Variant Filtering Options
The following options are used for de novo variant filtering:
--dn-input-vcf---Joint small variant VCF from the de novo calling step to be filtered.
--dn-output-vcf---File location to which the filtered VCF should be written. If not specified, the input VCF is overwritten.
--dn-sv-vcf---Joint structural variant VCF from the SV calling step. If omitted, checks for overlapping structural variants are skipped.
--dn-cnv-vcf---Joint copy number variant VCF from the CNV calling step. If omitted, checks for overlapping copy number variants are skipped.
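The DN relabeling described at the start of this section can be sketched in Python as follows. This is a minimal illustration, not DRAGEN's implementation, and it makes several assumptions:

- The record classes and the `relabel_dn` function are hypothetical.
- Only `DEL` and `DUP` are treated as ploidy-changing SV types.
- Each SV or CNV list already pools the events of the proband, mother, and father.
- An SV overlap takes precedence over a CNV overlap when both apply.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical, simplified records; a real tool would parse these from the joint VCFs.
@dataclass(frozen=True)
class SmallVariant:
    chrom: str
    pos: int   # 1-based start position
    ref: str   # reference allele, used to compute the variant's end
    dn: str    # proband FORMAT/DN value, e.g. "DeNovo"

@dataclass(frozen=True)
class LargeVariant:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"

# Assumption: deletions and duplications change ploidy; balanced events such as
# inversions do not and are ignored.
PLOIDY_CHANGING = {"DEL", "DUP"}

def overlaps_ploidy_change(v: SmallVariant, events: List[LargeVariant]) -> bool:
    """True if the small variant overlaps any ploidy-changing event."""
    v_end = v.pos + len(v.ref) - 1
    return any(
        e.svtype in PLOIDY_CHANGING and e.chrom == v.chrom
        and v.pos <= e.end and e.start <= v_end
        for e in events
    )

def relabel_dn(variants: List[SmallVariant],
               sv_calls: Optional[List[LargeVariant]] = None,
               cnv_calls: Optional[List[LargeVariant]] = None) -> List[SmallVariant]:
    """Return every input variant; only DeNovo labels may change."""
    out = []
    for v in variants:
        label = v.dn
        if v.dn == "DeNovo":
            # A missing SV or CNV list skips that check, mirroring the options above.
            if sv_calls is not None and overlaps_ploidy_change(v, sv_calls):
                label = "DeNovoSV"
            elif cnv_calls is not None and overlaps_ploidy_change(v, cnv_calls):
                label = "DeNovoCNV"
        out.append(replace(v, dn=label))
    return out
```

The function copies every record and rewrites only the DN label. The output therefore keeps all input variants, matching the behavior described for the filtered VCF.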
Germline Small Variant Hard Filtering
DRAGEN provides post-VCF variant filtering based on annotations present in the VCF records. Default and non-default variant hard filtering are described below. DRAGEN's algorithms incorporate the hypothesis of correlated errors within the core of the variant caller. As a result, the pipeline is better at distinguishing true variants from noise, which substantially reduces the dependence on post-VCF filtering. For this reason, the default post-VCF filtering in DRAGEN is very simple.
Default Small Variant Hard Filtering
The default filters in the germline pipeline are as follows:
##FILTER=<ID=DRAGENSnpHardQUAL,Description="Set if true:QUAL < 10.41 (3 when ML recalibration is enabled)">
##FILTER=<ID=DRAGENIndelHardQUAL,Description="Set if true:QUAL < 7.83 (3 when ML recalibration is enabled)">
##FILTER=<ID=LowDepth,Description="Set if true:DP <= 1">
##FILTER=<ID=PloidyConflict,Description="Genotype call from variant caller not consistent with chromosome ploidy">
DRAGENSnpHardQUAL and DRAGENIndelHardQUAL: For all contigs other than the mitochondrial contig, the default hard filtering consists of thresholding the QUAL value only. SNPs and indels use different default thresholds: SNPs are filtered when QUAL < 10.41 and indels when QUAL < 7.83. Both thresholds are 3 when ML recalibration is enabled.
LowDepth: This filter is applied to all variant calls with INFO/DP <= 1.
PloidyConflict: This filter is applied to all variant calls on chrY of a female subject, whether female is specified on the DRAGEN command line or detected by the ploidy estimator.
DRAGEN processes the mitochondrial contig through a continuous AF pipeline, which is similar to the somatic variant calling pipeline. Refer to Mitochondrial Calling for the filtering details.
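The default filter logic above can be summarized with a small Python sketch. It is illustrative only. The `Call` record, the `apply_default_filters` and `filter_column` functions, and the contig names assumed for the mitochondrial contig and chrY are all hypothetical. Mitochondrial calls are passed through unchanged because they are filtered by the separate continuous AF pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Call:
    """Hypothetical minimal view of one small variant record."""
    chrom: str
    qual: float
    dp: int        # INFO/DP
    is_snp: bool   # False for indels
    filters: List[str] = field(default_factory=list)

# Assumption: contig naming depends on the reference build.
MITO_CONTIGS = {"chrM", "MT"}
CHRY_CONTIGS = {"chrY", "Y"}

def apply_default_filters(call: Call, ml_recal: bool = False,
                          sex: Optional[str] = None) -> Call:
    """Append default FILTER IDs; `sex` comes from the command line or ploidy estimator."""
    if call.chrom in MITO_CONTIGS:
        return call  # handled by the mitochondrial (continuous AF) pipeline instead
    if call.is_snp:
        if call.qual < (3.0 if ml_recal else 10.41):
            call.filters.append("DRAGENSnpHardQUAL")
    elif call.qual < (3.0 if ml_recal else 7.83):
        call.filters.append("DRAGENIndelHardQUAL")
    if call.dp <= 1:
        call.filters.append("LowDepth")
    if sex == "female" and call.chrom in CHRY_CONTIGS:
        call.filters.append("PloidyConflict")
    return call

def filter_column(call: Call) -> str:
    """VCF FILTER value: PASS when no filter applies, else semicolon-joined IDs."""
    return ";".join(call.filters) if call.filters else "PASS"
```

Consider a SNP with QUAL 9.0. It is filtered by default, but it passes when `ml_recal=True` because the threshold drops to 3.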
Non-Default Small Variant Hard Filtering
DRAGEN supports basic filtering of variant calls as described in the VCF standard. You can apply any number of filters with the --vc-hard-filter option, which takes a semicolon-delimited list of expressions, as follows:
where the list of criteria is itself a list of expressions, delimited by the || (OR) operator in this format:
The meaning of these expression elements is as follows:
filterID---The name of the filter, which is entered in the FILTER column of the VCF file for calls that are filtered by that expression.
snp/indel/all---The subset of variant calls to which the expression should be applied.
annotation ID---The variant call record annotation for which values should be checked for the filter. Supported annotations include FS, MQ, MQRankSum, QD, and ReadPosRankSum.
comparison operator---The numeric comparison operator to use for comparing to the specified filter value. Supported operators include <, ≤, =, ≠, ≥, and >.
The following example expression labels some SNPs and indels:
- It marks with the label "SNP filter" any SNPs with FS < 2.1 or MQ < 100.
- It marks with "indel filter" any indels with FS < 2.2 or MQ < 110.
This example is for illustration purposes only and is NOT recommended for use with DRAGEN V3 output. Illumina recommends using the default hard filters. The only supported operation for combining value comparisons is OR, and there is no support for arithmetic combinations of multiple annotations. More complex expressions may be supported in the future.
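To make the OR-only semantics concrete, here is a hedged Python sketch that evaluates expressions like the FS/MQ example above. It does not parse the full --vc-hard-filter string. It makes the following assumptions:

- Each expression is a hypothetical (filterID, subset, criteria) tuple.
- The ASCII spellings <=, ==, != and >= are equivalents of the symbols listed above.
- A record that lacks an annotation does not trigger that criterion.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple

# Comparison operators from the text, plus assumed ASCII spellings.
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt, "≤": operator.le, "<=": operator.le,
    "=": operator.eq, "==": operator.eq,
    "≠": operator.ne, "!=": operator.ne,
    "≥": operator.ge, ">=": operator.ge,
    ">": operator.gt,
}
SUPPORTED = {"FS", "MQ", "MQRankSum", "QD", "ReadPosRankSum"}

# One criterion: <annotation ID> <operator> <numeric value>; two-character operators first.
CRITERION = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|≤|≥|≠|<|>|=)\s*(-?\d+(?:\.\d+)?)\s*$")

Criterion = Tuple[str, Callable[[float, float], bool], float]

def parse_criteria(text: str) -> List[Criterion]:
    """Split an OR-combined criteria list on '||' and parse each comparison."""
    parsed = []
    for part in text.split("||"):
        m = CRITERION.match(part)
        if m is None:
            raise ValueError(f"cannot parse criterion: {part.strip()!r}")
        ann, op, value = m.groups()
        if ann not in SUPPORTED:
            raise ValueError(f"unsupported annotation: {ann}")
        parsed.append((ann, OPS[op], float(value)))
    return parsed

def apply_hard_filters(annotations: Dict[str, float], is_snp: bool,
                       expressions: List[Tuple[str, str, str]]) -> List[str]:
    """Return the filter IDs whose expression matches this record."""
    hits = []
    for filter_id, subset, criteria in expressions:
        if subset not in ("snp", "indel", "all"):
            raise ValueError(f"unknown subset: {subset}")
        if (subset == "snp" and not is_snp) or (subset == "indel" and is_snp):
            continue
        # OR semantics: any single matching comparison sets the filter.
        if any(ann in annotations and op(annotations[ann], value)
               for ann, op, value in parse_criteria(criteria)):
            hits.append(filter_id)
    return hits
```

With the two example expressions, consider an indel with FS 5.0 and MQ 105.0. It gets "indel filter", because MQ < 110 alone is enough under OR semantics.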
Orientation Bias Filter
The orientation bias filter is designed to reduce noise typically associated with the following:
Pre-adapter artifacts introduced during genomic library preparation (e.g., a combination of heat, shearing, and metal contaminants can result in 8-oxoguanine base pairing with either cytosine or adenine, ultimately leading to G→T transversion mutations during PCR amplification), or
FFPE (formalin-fixed paraffin-embedded) artifacts, which stem from formaldehyde deamination of cytosines and result in C→T transition mutations.
The orientation bias filter can only be used on somatic pipelines. To enable the filter, set the --vc-enable-orientation-bias-filter option to true. The default is false.
You can specify the artifact type to be filtered with the --vc-orientation-bias-filter-artifacts option. The default is C/T,G/T, which correspond to FFPE and OxoG artifacts, respectively. Valid values include C/T, G/T, or C/T,G/T,C/A.
An artifact (or an artifact and its reverse complement) cannot be listed twice. For example, C/T,G/A is not valid, because C→T and G→A are reverse complements of each other.
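The duplicate rule can be checked mechanically. The following Python sketch parses a comma-separated REF/ALT artifact list and rejects any artifact that, or whose reverse complement, already appears. The `parse_artifacts` function and its error messages are hypothetical, not part of DRAGEN.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def parse_artifacts(spec: str) -> List[Tuple[str, str]]:
    """Parse e.g. "C/T,G/T" into [("C", "T"), ("G", "T")], enforcing the duplicate rule."""
    seen = set()
    artifacts = []
    for item in spec.split(","):
        ref, sep, alt = item.strip().partition("/")
        if not sep or ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
            raise ValueError(f"invalid artifact: {item!r}")
        # The same substitution read on the opposite strand: complement both bases.
        rev_comp = (COMPLEMENT[ref], COMPLEMENT[alt])
        if (ref, alt) in seen or rev_comp in seen:
            raise ValueError(f"{item!r} duplicates an artifact already listed")
        seen.add((ref, alt))
        artifacts.append((ref, alt))
    return artifacts
```

Applied literally, this check also rejects C/T,G/T,C/A, because G→T and C→A are likewise reverse complements of each other.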
The orientation bias filter adds the following information:
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=OBC,Number=1,Type=String,Description="Orientation Bias Filter base context">
##FORMAT=<ID=OBPa,Number=1,Type=String,Description="Orientation Bias prior for artifact">
##FORMAT=<ID=OBParc,Number=1,Type=String,Description="Orientation Bias prior for reverse compliment artifact">
##FORMAT=<ID=OBPsnp,Number=1,Type=String,Description="Orientation Bias prior for real variant">
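F1R2 and F2R1 are declared with Number=R, so each holds one count per allele: the reference allele first, then each ALT allele. The Python sketch below reads them for one sample and computes the fraction of each allele's supporting reads in F1R2 orientation. The helper names are hypothetical. The sketch only reads the annotations; it is not the filter's own prior-based model.

```python
import math
from typing import List, Tuple

def orientation_counts(format_keys: str, sample: str) -> List[Tuple[int, int]]:
    """Return (F1R2, F2R1) read counts per allele, reference allele first."""
    values = dict(zip(format_keys.split(":"), sample.split(":")))
    f1r2 = [int(x) for x in values["F1R2"].split(",")]
    f2r1 = [int(x) for x in values["F2R1"].split(",")]
    if len(f1r2) != len(f2r1):
        raise ValueError("F1R2 and F2R1 must have one value per allele")
    return list(zip(f1r2, f2r1))

def f1r2_fraction(counts: Tuple[int, int]) -> float:
    """Fraction of an allele's supporting reads in F1R2 orientation (NaN if none)."""
    f1r2, f2r1 = counts
    total = f1r2 + f2r1
    return f1r2 / total if total else math.nan
```

Consider FORMAT keys `GT:AD:F1R2:F2R1` and sample values `0/1:20,10:11,9:9,1`. The counts are [(11, 9), (9, 1)]. The ALT allele's F1R2 fraction of 0.9 shows its supporting reads are concentrated in one orientation.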
Note that the orientation bias filter (OBF) runs as a standalone process after DRAGEN completes. The VC metrics computed by the DRAGEN SNV caller are not updated. They do not reflect the additional variants filtered at this stage.