DRAGEN supports force genotyping (ForceGT) for small variant calling. To enable ForceGT, pass the `--vc-forcegt-vcf` option with a list of small variants to force genotype. The list can be provided as a `*.vcf` or `*.vcf.gz` file.
The current limitations of ForceGT are as follows:
ForceGT is supported for germline small variant calling in the V3 mode. The V1, V2, and V2+ modes are not supported.
ForceGT is also supported for somatic small variant calling.
ForceGT variants do not propagate through joint genotyping.
DRAGEN supports only a single ForceGT VCF input file, which must meet the following requirements:
The input must be a valid VCF file according to version 4.2 of the VCF standard. For example, it must have at least eight tab-delimited columns, and records must be sorted by reference contig and position.
The header must list the same contigs as the reference used for variant calling, and every variant must refer to one of these contig names.
Variants must be normalized (parsimonious and left-aligned; see below).
The file must not contain multinucleotide or complex variants (for example, AT -> C), that is, variants that require more than one substitution, insertion, or deletion to change the REF allele into the ALT allele. Such variants are ignored.
Deletions longer than 50 bp are filtered out.
Each variant is called only once; duplicate entries are ignored.
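Several of these requirements can be checked before a run. The following is a minimal pre-flight sketch, not part of DRAGEN: the helper names `open_vcf`, `is_simple`, and `check_records` are hypothetical. It assumes contigs are declared in `##contig=<ID=...>` header lines (as in VCF 4.2) and treats a variant as "simple" only if it is an SNV or a VCF-anchored insertion or deletion. It does not check left alignment (that needs the reference sequence) or compare the header contigs against the reference.

```python
import gzip
from typing import Iterable, List, Set, Tuple

MAX_DELETION_BP = 50  # deletions longer than this are filtered out by ForceGT
BASES = set("ACGTN")


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def is_simple(ref: str, alt: str) -> bool:
    """True for an SNV or an anchored insertion/deletion (a single edit)."""
    if not set(ref + alt) <= BASES:
        return False  # symbolic or unexpected alleles are not handled in this sketch
    if len(ref) == 1 and len(alt) == 1:
        return ref != alt
    # Anchored indel: alleles share the first base and one allele is only that base.
    return ref[0] == alt[0] and len(ref) != len(alt) and min(len(ref), len(alt)) == 1


def check_records(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was detected."""
    problems: List[str] = []
    contig_order = {}
    seen: Set[Tuple[str, int, str, str]] = set()
    last = None  # (contig index, POS) of the previous record
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##contig=<ID="):
            cid = line[len("##contig=<ID="):].split(",")[0].rstrip(">")
            contig_order[cid] = len(contig_order)
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            problems.append(f"line {n}: fewer than 8 tab-delimited columns")
            continue
        chrom, pos, ref = cols[0], int(cols[1]), cols[3].upper()
        if chrom not in contig_order:
            problems.append(f"line {n}: contig {chrom} is not listed in the header")
            continue
        key = (contig_order[chrom], pos)
        if last is not None and key < last:
            problems.append(f"line {n}: record is out of order")
        last = key
        for alt in cols[4].upper().split(","):
            if not is_simple(ref, alt):
                problems.append(f"line {n}: {ref}>{alt} is not a simple normalized variant")
            elif len(ref) - len(alt) > MAX_DELETION_BP:
                problems.append(f"line {n}: deletion longer than {MAX_DELETION_BP} bp")
            if (chrom, pos, ref, alt) in seen:
                problems.append(f"line {n}: duplicate entry {chrom}:{pos} {ref}>{alt}")
            seen.add((chrom, pos, ref, alt))
    return problems


example = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=1000>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr1\t200\t.\tAT\tC\t.\t.\t.\n",  # complex variant: ignored by ForceGT
    "chr1\t150\t.\tC\tCA\t.\t.\t.\n",  # sorted incorrectly
]
for problem in check_records(example):
    print(problem)
```

For a real file, use `with open_vcf(path) as fh: check_records(fh)`. On the in-memory example, the sketch reports the complex `AT>C` record on line 5 and the out-of-order record on line 6.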
The following nonnormalized variant will cause undefined behavior in DRAGEN:
Instead of…
use…
Force genotyping requires an input VCF and can be used with DRAGEN in VCF, GVCF, or VCF+GVCF mode. In all cases, the output file(s) contain all regular calls and the ForceGT variants, tagged as follows:
For a ForceGT variant that was not called by the variant caller (that is, not common to both), the call is tagged with FGT in the INFO field.
For a germline ForceGT variant that was also called by the variant caller with a FILTER value of PASS, the call is tagged with NML;FGT in the INFO field (NML denotes normal). In somatic mode, such a call is tagged with FGT;SOM.
For a PASS call made by the variant caller with no corresponding ForceGT input (a normal call), no extra tags are added (neither NML nor FGT).
This scheme distinguishes calls present only because of ForceGT, calls common to both the ForceGT input and normal calling, and normal calls.
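When post-processing the output, the three categories can be recovered from the INFO column. The sketch below is a hypothetical helper, not a DRAGEN tool. It assumes FGT, NML, and SOM appear as INFO keys exactly as described above, and that SOM marks only calls shared with the variant caller; verify this against your own somatic output before relying on it.

```python
from typing import Set


def info_keys(info: str) -> Set[str]:
    """Keys present in a VCF INFO column; flags and key=value pairs alike."""
    if info in ("", "."):
        return set()
    return {field.split("=", 1)[0] for field in info.split(";") if field}


def classify_call(info: str) -> str:
    """Map an output record's INFO tags onto the three categories above."""
    keys = info_keys(info)
    if "FGT" not in keys:
        return "normal"        # no FGT tag: a regular variant-caller call
    if "NML" in keys or "SOM" in keys:
        return "common"        # NML;FGT (germline) or FGT;SOM (somatic)
    return "forcegt_only"      # FGT alone: present only because it was in the input


for info in ["DP=31;FGT", "NML;FGT;DP=40", "FGT;SOM", "DP=25"]:
    print(info, "->", classify_call(info))
```

Checking for the FGT key first keeps normal calls (no tags at all) separate, and the presence of NML or SOM then splits shared calls from ForceGT-only calls.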
All variants in the input ForceGT VCF are genotyped and reported in the output file. The reported GTs for these variants are listed in the condition table below.
If DRAGEN calls a variant that differs from the one specified in the input ForceGT VCF, the output contains multiple entries at that position:
One entry for the default DRAGEN variant call
One entry for each variant in the input ForceGT VCF at that position
If a target BED file is provided along with the input ForceGT VCF, the output contains only the ForceGT variants that overlap the BED regions.
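To predict which ForceGT variants will survive a BED restriction, the coordinate systems have to be reconciled: VCF `POS` is 1-based, while BED intervals are 0-based and half-open. The sketch below uses hypothetical helpers `read_bed` and `overlaps_bed` and assumes a variant overlaps when any base of its REF span falls inside a region; DRAGEN's exact overlap rule (for example, how the anchor base of an indel is treated) is not specified here.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end) in BED 0-based, half-open coordinates


def read_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping blank, comment, and header lines."""
    regions: List[Region] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def overlaps_bed(chrom: str, pos: int, ref: str, regions: List[Region]) -> bool:
    """True if the REF span of a VCF record (1-based POS) overlaps any BED region."""
    start = pos - 1           # convert 1-based VCF POS to 0-based
    end = start + len(ref)    # half-open end of the bases covered by REF
    return any(c == chrom and start < r_end and r_start < end
               for c, r_start, r_end in regions)


bed = read_bed(["chr1\t99\t200\n"])          # covers 1-based positions 100..200
print(overlaps_bed("chr1", 100, "A", bed))   # True: first base of the region
print(overlaps_bed("chr1", 201, "A", bed))   # False: just past the region end
```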
Condition | Reported GT |
---|---|
At a position with no coverage | ./. or . |
At a position with coverage but no reads supporting the ALT allele | 0/0 or 0 |
At a position with coverage and reads supporting the ALT allele | Dependent on the pipeline (germline or somatic) |