Targeted Caller

Repetitive regions in the human genome pose a challenge for general variant calling approaches, which typically cannot make use of potentially misplaced MAPQ0 reads. Furthermore, high sequence homology between some genes and a pseudogene paralog can lead to a wide variety of common structural variants (SVs) in the population, requiring specialized targeted calling approaches. DRAGEN supports targeted calling for a number of genes/targets, as described in the subsequent target-specific sections.

The targeted caller can be enabled for all targets using the command-line option --enable-targeted=true, or for a subset of targets by providing a space-separated list of target names. The supported target names are: cyp2b6, cyp2d6, cyp21a2, gba, hba, lpa, rh, and smn. For a list of all supported targeted caller options along with their default values, see Targeted Caller Options. The targeted caller produces a <output-file-prefix>.targeted.json file containing a summary of the variant caller results for each target. Details of individual variant calls are reported in VCF format in the <output-file-prefix>.targeted.vcf.gz output file.

Input Data

The targeted caller requires WGS data aligned to a human reference genome with at least 30x coverage. The caller may be less reliable at lower coverage. Human reference genome builds based on hg19, hs37d5 (including GRCh37), or hg38 are supported. The targeted caller should not be enabled with low-coverage, exome or enrichment sequencing data.

Output Files

Targeted JSON File

The targeted caller generates a <output-file-prefix>.targeted.json file in the output directory. The output file is a JSON formatted file containing the fields below.

| Fields in JSON | Explanation | Type and Possible Values | Present |
| --- | --- | --- | --- |
| sampleId | The sample name. | string | always |
| softwareVersion | The version of DRAGEN. | string | always |
| phenotypeDatabaseSources | Resources used for calling metabolism status (phenotype). | JSON array of strings | CYP2B6 or CYP2D6 caller is enabled |
| cyp2b6 | The CYP2B6 caller fields. | dictionary | CYP2B6 caller is enabled |
| cyp2d6 | The CYP2D6 caller fields. | dictionary | CYP2D6 caller is enabled |
| cyp21a2 | The CYP21A2 caller fields. | dictionary | CYP21A2 caller is enabled |
| gba | The GBA caller fields. | dictionary | GBA caller is enabled |
| hba | The HBA caller fields. | dictionary | HBA caller is enabled |
| lpa | The LPA caller fields. | dictionary | LPA caller is enabled |
| rh | The Rh caller fields. | dictionary | Rh caller is enabled |
| smn | The SMN caller fields. | dictionary | SMN caller is enabled |
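
The top-level layout in the table above can be checked with a few lines of Python. The following is an illustrative sketch only: it loads a targeted.json document, lists which per-target dictionaries are present, and checks that phenotypeDatabaseSources appears only when the CYP2B6 or CYP2D6 caller is enabled. The example document and the helper names (load_targeted_json, enabled_targets, phenotype_sources_consistent) are hypothetical, and the contents of each per-target dictionary, described in the target-specific sections, are not modeled.

```python
import json

# Top-level target keys listed in the targeted.json field table.
TARGET_KEYS = ("cyp2b6", "cyp2d6", "cyp21a2", "gba", "hba", "lpa", "rh", "smn")

# Hypothetical, heavily trimmed targeted.json content for illustration only.
# Real per-target dictionaries contain caller-specific fields.
EXAMPLE_JSON = """
{
  "sampleId": "NA12878",
  "softwareVersion": "example-version",
  "phenotypeDatabaseSources": ["example-source"],
  "cyp2d6": {},
  "smn": {}
}
"""


def load_targeted_json(path: str) -> dict:
    """Read a <output-file-prefix>.targeted.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def enabled_targets(doc: dict) -> list:
    """Return the target names whose caller dictionaries are present."""
    return [key for key in TARGET_KEYS if isinstance(doc.get(key), dict)]


def phenotype_sources_consistent(doc: dict) -> bool:
    """True when phenotypeDatabaseSources is present exactly when CYP2B6 or CYP2D6 is enabled."""
    has_pgx_target = "cyp2b6" in doc or "cyp2d6" in doc
    return ("phenotypeDatabaseSources" in doc) == has_pgx_target


if __name__ == "__main__":
    doc = json.loads(EXAMPLE_JSON)
    print(doc["sampleId"], enabled_targets(doc), phenotype_sources_consistent(doc))
```

For a real run, load_targeted_json would be pointed at the file written to the output directory.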

Targeted VCF File

The targeted caller generates a <output-file-prefix>.targeted.vcf.gz file in the output directory. The output file is a VCFv4.2 formatted file. The targets that have VCF output are: cyp21a2, gba, hba, lpa, rh, and smn. Small variants, structural variants, and copy number variants are reported in the same VCF file. The <output-file-prefix>.targeted.vcf.gz file includes the following source header line:

##source=DRAGEN_TARGETED

For lpa, rh and smn targets, the EVENT and EVENTTYPE INFO fields are used to identify the called variants.

The EVENT and EVENTTYPE INFO fields were formally introduced in VCFv4.4 to enable the representation of complex rearrangements: the EVENT field groups all the related VCF records together, and the EVENTTYPE field classifies the event. The corresponding header lines are as follows:

##INFO=<ID=EVENT,Number=A,Type=String,Description="Event name">
##INFO=<ID=EVENTTYPE,Number=A,Type=String,Description="Type of associated event">

However, the use of EVENT is not limited to complex rearrangements and can be used to associate nonsymbolic alleles, for example in cases of variant position ambiguity in high homology regions.

Since the EVENTTYPE values are implementation-defined, custom EVENTTYPE header lines are included to describe each EVENTTYPE.

##EVENTTYPE=<ID=GENE_CONVERSION,Description="Gene conversion event">
##EVENTTYPE=<ID=VARIANT_IN_HOMOLOGY_REGION,Description="Variant in homology region">
##EVENTTYPE=<ID=VNTR,Description="Variable number tandem repeat">
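
Because related records share an EVENT value, a downstream script can regroup them. The following Python sketch is one possible approach, not part of DRAGEN: it groups VCF data lines by (EVENT, EVENTTYPE). It assumes one ALT allele per record, so the Number=A fields EVENT and EVENTTYPE carry a single value; the helper names are hypothetical. The example records are the lpa/smn-style records from the high homology region example in this section, written tab-separated as in a real VCF.

```python
import gzip
from collections import defaultdict


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column; flag fields (no '=') map to True."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        if not item:
            continue  # skip empty entries, e.g. from a stray trailing ';'
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def group_by_event(lines) -> dict:
    """Group VCF data lines by (EVENT, EVENTTYPE); records without EVENT are skipped.

    Assumes a single ALT allele per record (EVENT/EVENTTYPE are Number=A).
    Each group holds (CHROM, POS, FILTER) tuples in file order.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if "EVENT" in info:
            key = (info["EVENT"], info.get("EVENTTYPE", "."))
            groups[key].append((cols[0], int(cols[1]), cols[6]))
    return dict(groups)


def read_vcf_lines(path: str):
    """Yield text lines from a bgzip-compressed VCF such as *.targeted.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        yield from handle


# The two records from the high homology region example.
EXAMPLE = [
    "chr1\t100\t.\tA\tT\t.\tTargetedRepeatConflict\t"
    "EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION\tGT\t0/0/0/1",
    "chr1\t200\t.\tA\tT\t.\tTargetedRepeatConflict\t"
    "EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION\tGT\t0/0/0/1",
]

if __name__ == "__main__":
    for (event, event_type), records in group_by_event(EXAMPLE).items():
        print(event, event_type, records)
```

BGZF files are valid multi-member gzip streams, so Python's gzip module can read the targeted VCF sequentially; random access by region would need an indexed reader instead.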

For cyp21a2, gba, and hba targets, the ALLELE_ID INFO field is used to identify the called variant alleles.

##INFO=<ID=ALLELE_ID,Number=R,Type=String,Description="Identifier for each allele">

The missing value . is used when no identifier is available (e.g. a wild type allele) or applicable (e.g. allele index 0 for a structural variant record).
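
Since ALLELE_ID is declared with Number=R, its values line up with REF followed by each ALT allele. The sketch below pairs them up and turns the missing value "." into None. It is an illustration with hypothetical helper names, not a DRAGEN utility; the example REF, ALT and INFO values come from the hba overlapping structural variant record shown later in this section.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def allele_ids(ref: str, alt: str, info: str) -> list:
    """Pair each allele (REF first, then each ALT) with its ALLELE_ID.

    ALLELE_ID is Number=R, so there must be one value per allele.
    The missing value '.' is returned as None.
    """
    alleles = [ref] + alt.split(",")
    raw = parse_info(info).get("ALLELE_ID")
    if raw is None:
        return [(allele, None) for allele in alleles]
    values = raw.split(",")
    if len(values) != len(alleles):
        raise ValueError(f"expected {len(alleles)} ALLELE_ID values, got {len(values)}")
    return [(allele, None if v == "." else v) for allele, v in zip(alleles, values)]


if __name__ == "__main__":
    info = "END=174517;IMPRECISE;SVLEN=4255,4255;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a4.2,aaa4.2"
    print(allele_ids("G", "<DEL>,<DUP>", info))
```

Here allele index 0 of the structural variant record has no identifier, which matches the "not applicable" case described above.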

Nonrecombinant-like Variants In High Homology Regions

For target variants in a high homology region, each variant is reported ambiguously at all corresponding homologous positions (i.e., in both the pseudogene and the target gene). If absolute certainty that these variants are located in the target gene (e.g., in gba or cyp21a2) is required, additional analysis of these variants can be performed.

For lpa and smn, the ploidy of the called genotype (FORMAT/GT field) corresponds to the combined copy number from all the homologous positions. For cyp21a2, gba and hba, this "joint" genotype from all the homologous positions is instead reported in a separate FORMAT/JGT field, which is then collapsed into a diploid genotype and reported in the FORMAT/GT field. The following fields are reported for "joint" calls:

##INFO=<ID=JIDS,Number=.,Type=String,Description="IDs (from ID column) of calls associated with a joint genotype call in duplicated regions">
##FORMAT=<ID=JGT,Number=1,Type=String,Description="Joint genotype in duplicated regions">
##FORMAT=<ID=JGQ,Number=1,Type=Integer,Description="Quality of joint genotype in duplicated regions">
##FORMAT=<ID=JPL,Number=.,Type=Integer,Description="Normalized, Phred-scaled likelihoods for joint genotypes as defined in the VCF specification">
##FORMAT=<ID=JQL,Number=1,Type=Float,Description="Phred-scaled likelihood for homozygous reference joint genotype call">
##FORMAT=<ID=JVQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant joint genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FORMAT=<ID=JDP,Number=1,Type=Integer,Description="Total depth from all alleles in duplicated regions">
##FORMAT=<ID=JAD,Number=R,Type=Integer,Description="Total read depth for each allele in duplicated regions">
##FORMAT=<ID=JAF,Number=A,Type=Float,Description="Allele frequency for each alt allele in duplicated regions">

Note that the FORMAT/GQ and FORMAT/JGQ fields contain the unconditional genotype quality, unlike the VCF spec where FORMAT/GQ is defined as the genotype quality conditioned on the site being variant.
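
To make the relationship between the joint and collapsed genotypes concrete, the following Python sketch reads the FORMAT/JGT, FORMAT/GT and FORMAT/JGQ values of one sample and reports the joint ploidy and the number of non-reference allele copies. The FORMAT and sample strings are hypothetical illustrative values, not real DRAGEN output, and the sketch does not attempt to reproduce how DRAGEN collapses JGT into GT.

```python
def sample_fields(format_col: str, sample_col: str) -> dict:
    """Zip the FORMAT keys with one sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def gt_alleles(gt: str) -> list:
    """Split a phased or unphased genotype into allele indices; '.' becomes None."""
    return [None if a == "." else int(a) for a in gt.replace("|", "/").split("/")]


def describe_joint_call(format_col: str, sample_col: str) -> dict:
    """Summarize the joint (JGT) and collapsed diploid (GT) genotypes of one record."""
    fields = sample_fields(format_col, sample_col)
    joint = gt_alleles(fields["JGT"])
    return {
        # Ploidy of JGT = combined copy number over the homologous positions.
        "joint_ploidy": len(joint),
        # Counts every non-reference index, including any '*' allele index.
        "joint_alt_copies": sum(1 for a in joint if a not in (None, 0)),
        "diploid_gt": gt_alleles(fields["GT"]),
        "jgq": int(fields["JGQ"]) if "JGQ" in fields else None,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(describe_joint_call("GT:GQ:JGT:JGQ", "0/1:3:0/0/0/1:40"))
```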

In the example depicted in the High Homology Region Variant Example figure, there are two genes A and B that include a high homology region. The usual process to call variants in these regions is to make a joint pileup of the reads aligning to both genes A and B and to call the variants using a model with a ploidy proportional to the total copy number of the regions. This produces divergent possible genotypes that are equally likely, since the variant cannot be confidently placed in either gene A or gene B. For lpa and smn, the variant would be reported as follows:

chr1 100 . A T . TargetedRepeatConflict EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT 0/0/0/1
chr1 200 . A T . TargetedRepeatConflict EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT 0/0/0/1

Given the unconventional ploidy of the FORMAT/GT field used in this representation, a TargetedRepeatConflict filter is applied to these records. The header line for the filter is the following.

##FILTER=<ID=TargetedRepeatConflict,Description="Set if call is in a targeted repeat region that cannot be placed">

For cyp21a2, gba and hba, a conventional diploid FORMAT/GT is reported, so no TargetedRepeatConflict filter is applied. Due to the ambiguity in placing target variants in high homology regions, the corresponding QUAL and FORMAT/GQ fields can be much lower than for conventional small variant calls (e.g., Phred 3 for a single variant allele copy across two homologous diploid positions). Therefore, instead of filtering on QUAL and FORMAT/GQ, these records are filtered based on the FORMAT/JVQL and FORMAT/JGQ fields:

##FILTER=<ID=TargetedLowJGQ,Description="Set if call has JGQ < 3.">
##FILTER=<ID=TargetedLowJVQL,Description="Set if call has JVQL < 3.00.">
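
The "Phred 3" figure quoted above is the standard Phred conversion, Q = -10 * log10(p), of a 50% error probability. The short Python check below works under the illustrative assumption that a single variant allele copy observed across two homologous diploid positions is equally likely to belong to either position, so a diploid genotype at any one position has about a 0.5 chance of being wrong.

```python
import math


def phred(p_error: float) -> float:
    """Convert an error probability to a Phred-scaled quality: -10 * log10(p)."""
    return -10.0 * math.log10(p_error)


# Illustrative assumption: placement of the single variant copy is a coin flip
# between the two homologous positions.
ambiguous_gq = phred(0.5)

if __name__ == "__main__":
    print(f"{ambiguous_gq:.2f}")  # 3.01
```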

Since the wild type alleles at homologous positions may be different from each other or different from the reference alleles, an additional filter is applied when only wild type alleles are detected across the homologous positions. This avoids making ambiguous variant calls when no target variant of interest is detected.

##FILTER=<ID=TargetedWT,Description="Region-ambiguous targeted call with GT containing only wild type alleles, ignoring any overlapping deletions.">

Rh Gene Conversion Events

When a gene conversion event is identified in rh, a small variant is reported at each differentiating site in the acceptor region.

In the example depicted in the Gene Conversion Example figure, there are two genes A and B, and gene A is the acceptor of a gene conversion from gene B (green box in the figure). Gene conversions are identified by observing changes in copy number at differentiating sites (blue and pink bars in the figure) across consecutive regions. Copy number changes between regions define the breakends of the gene conversion. An equivalent VCF representation of a gene conversion would use CNV and SV entries with breakends corresponding to the donor/acceptor regions; however, only the small variant representation is currently supported.

chr1 121 . A T . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
...
chr1 280 . G A . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121

When a gene conversion event is detected, some differentiating sites may have a genotype that is inconsistent with that event. In these cases, the RecombinantConflict filter is applied. The RecombinantConflict filter is defined by the following header line.

##FILTER=<ID=RecombinantConflict,Description="Set if call has a copy number that conflicts with a recombinant variant">

In the example, the resulting representation is as follows.

chr1 121 . A T . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
...
chr1 144 . C T . RecombinantConflict EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 1|1:121
chr1 153 . A G . RecombinantConflict EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT 0/0
...
chr1 280 . G A . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
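
When reviewing an rh gene conversion, it can help to separate the PASS differentiating sites from those flagged RecombinantConflict. The Python sketch below does this for one EVENT and reports the range spanned by the PASS sites. That range is an approximation chosen for illustration: the breakends are defined by copy number changes between regions, so they lie between differentiating sites rather than on them. The helper names are hypothetical; the records are the ones from the example above, written tab-separated.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def summarize_event(lines, event_name: str) -> dict:
    """Split one EVENT's records into PASS and filtered sites and report the PASS span."""
    passed, filtered = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if parse_info(cols[7]).get("EVENT") != event_name:
            continue
        pos, flt = int(cols[1]), cols[6]
        if flt == "PASS":
            passed.append(pos)
        else:
            filtered.append((pos, flt))
    # Approximate span: first to last PASS differentiating site.
    span = (min(passed), max(passed)) if passed else None
    return {"pass": passed, "filtered": filtered, "span": span}


INFO = "EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION"
EXAMPLE = [
    f"chr1\t121\t.\tA\tT\t.\tPASS\t{INFO}\tGT:PS\t0|1:121",
    f"chr1\t144\t.\tC\tT\t.\tRecombinantConflict\t{INFO}\tGT:PS\t1|1:121",
    f"chr1\t153\t.\tA\tG\t.\tRecombinantConflict\t{INFO}\tGT\t0/0",
    f"chr1\t280\t.\tG\tA\t.\tPASS\t{INFO}\tGT:PS\t0|1:121",
]

if __name__ == "__main__":
    print(summarize_event(EXAMPLE, "GC_AB"))
```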

Nonallelic Homologous Recombination

For cyp21a2 and gba, nonallelic homologous recombination can result in gene deletion or duplication (reciprocal recombination) or in gene conversion (nonreciprocal recombination). Both gene deletion and gene conversion can introduce loss-of-function variants, and in both cases the targeted caller reports these variants in the target gene. In the case of gene deletion, the differentiating sites at the nontarget (i.e., pseudogene) positions contain the overlapping deletion allele *, while the differentiating sites in the target contain any variant alleles. Although an equivalent VCF representation would be to report the deletion as a single structural variant VCF record, reporting small variant VCF records in the target gene allows identification of the specific mutations that may occur in a gene transcript and is well suited to annotation using HGVS nomenclature. Similarly, for gene conversions, variants are reported at differentiating sites in the target gene rather than as pairs of structural variant breakends.

##FORMAT=<ID=VQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FILTER=<ID=RecombinantLowVQL,Description="Region-ambiguous targeted call at recombinant site with VQL below 0.50.">
##FILTER=<ID=RecombinantREF,Description="Region-ambiguous targeted call at recombinant site with GT containing only reference alleles, ignoring any overlapping deletions.">

Overlapping Structural Variant Representation

The use of GT=0 for symbolic structural variant alleles is formally disambiguated in VCFv4.4, which specifies that "GT=0 indicates the absence of any of the ALT symbolic structural variants defined in the record". This convention allows compound heterozygous overlapping structural variants to be reported.

For the hba genotype depicted in the Overlapping Variants Representation Example figure, the two overlapping SVs can be represented as follows:

chr16	170262	.	G	<DEL>,<DUP>	.	.	END=174517;IMPRECISE;SVLEN=4255,4255;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a4.2,aaa4.2	GT	0/2
chr16	173301	.	A	<DEL>,<DUP>	.	.	END=177104;IMPRECISE;SVLEN=3804,3804;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a3.7,aaa3.7	GT	0/1

The relevant header lines for the VCF records above are:

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=A,Type=Integer,Description="Length of structural variant">
##INFO=<ID=SVCLAIM,Number=A,Type=String,Description="Claim made by the structural variant call. Valid values are D, J, DJ for abundance, adjacency and both respectively.">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
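
Reading such records back means combining GT with ALLELE_ID. The Python sketch below returns, for each allele in GT, the corresponding ALLELE_ID, treating GT=0 on a symbolic SV record as "none of this record's ALT SVs" per VCFv4.4 and returning None for it. The function name is hypothetical, and the records are the two hba records above; the sketch only lists the called non-reference allele IDs and does not derive an HBA genotype name.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            fields[key] = value if sep else True
    return fields


def called_allele_ids(ref: str, alt: str, info: str, gt: str) -> list:
    """Return the ALLELE_ID of each allele in GT (None for 0 or missing).

    Following VCFv4.4, GT=0 on a symbolic SV record means none of the record's
    ALT structural variants is present. If an ALT has no ID ('.'), the ALT
    allele itself is returned instead.
    """
    alleles = [ref] + alt.split(",")
    ids = parse_info(info)["ALLELE_ID"].split(",")
    result = []
    for index in gt.replace("|", "/").split("/"):
        if index in (".", "0"):
            result.append(None)
        else:
            i = int(index)
            result.append(ids[i] if ids[i] != "." else alleles[i])
    return result


RECORDS = [
    ("G", "<DEL>,<DUP>",
     "END=174517;IMPRECISE;SVLEN=4255,4255;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a4.2,aaa4.2", "0/2"),
    ("A", "<DEL>,<DUP>",
     "END=177104;IMPRECISE;SVLEN=3804,3804;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a3.7,aaa3.7", "0/1"),
]

if __name__ == "__main__":
    present = [aid for rec in RECORDS for aid in called_allele_ids(*rec) if aid]
    print(present)
```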

Variable Number Tandem Repeat Representation

In the example depicted in the VNTR Example figure, there is a Variable Number Tandem Repeat (VNTR) region composed of three repeat units in the reference. The CN INFO field is used to report the allele copy number, the CN FORMAT field is used to report the total copy number of the region (the sum of the allele copy numbers), and the REPCN FORMAT field is used to report the repeat unit copy number, equal to the allele copy number multiplied by the number of repeat units in the reference.

This VNTR can be represented as follows:

chr1 100 . A <DUP>,<DUP> . . END=400;EVENT=A;EVENTTYPE=VNTR;SVCLAIM=D;SVLEN=300;CN=2.6,4.3   GT:CN:REPCN 1|2:6.9:8|13

The REPCN and CN header lines are:

##FORMAT=<ID=REPCN,Number=1,Type=String,Description="Number of repeat units spanned by the allele">
##INFO=<ID=CN,Number=A,Type=Float,Description="Copy number of CNV / breakpoint">
##FORMAT=<ID=CN,Number=1,Type=Float,Description="Estimated copy number">
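
The arithmetic behind the example record can be checked directly: FORMAT/CN is 2.6 + 4.3 = 6.9, and REPCN is 2.6 × 3 = 7.8 and 4.3 × 3 = 12.9 repeat units. The record reports 8 and 13, so the Python sketch below rounds REPCN to the nearest integer and CN to one decimal; that rounding is an assumption inferred from the example, not a documented rule, and the function name is hypothetical.

```python
REF_REPEAT_UNITS = 3  # repeat units in the reference, from the VNTR example


def vntr_fields(allele_cns, ref_units=REF_REPEAT_UNITS):
    """Derive FORMAT/CN and FORMAT/REPCN from per-allele INFO/CN values.

    FORMAT/CN is the sum of the allele copy numbers; REPCN is each allele CN
    times the reference repeat-unit count. Rounding (one decimal for CN,
    nearest integer for REPCN) is an assumption matching the example record.
    """
    total_cn = round(sum(allele_cns), 1)
    repcn = [round(cn * ref_units) for cn in allele_cns]
    return total_cn, repcn


if __name__ == "__main__":
    total, repcn = vntr_fields([2.6, 4.3])  # INFO/CN=2.6,4.3
    print(total, "|".join(str(r) for r in repcn))  # 6.9 8|13
```

Note that Python's round() uses round-half-to-even, which would only matter for values landing exactly on .5.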

Additional Filters

For lpa, rh and smn, the TargetedLowQual filter is applied if the QUAL of a target variant is less than 3.00.

##FILTER=<ID=TargetedLowQual,Description="Set if call has QUAL < 3.00">

Similarly, for cyp21a2 and gba, the TargetedLowVQL filter is applied if the VQL of a target variant in a low-homology region is less than 3.00.

##FORMAT=<ID=VQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FILTER=<ID=TargetedLowVQL,Description="Set if call has VQL < 3.00.">

The TargetedLowGQ filter is applied if the targeted variant has a GQ less than 3 and no JGQ field is present.

##FILTER=<ID=TargetedLowGQ,Description="Set if call has GQ < 3 and JGQ is not present.">
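
The thresholds of these additional filters, as stated in the text and header descriptions above, can be summarized in a small Python function. This is a sketch of the documented conditions for reasoning about filtered records, not DRAGEN's implementation; the function name and its arguments are hypothetical, and the low_homology flag stands in for the caller's own low-homology region determination.

```python
LOW_QUAL_TARGETS = {"lpa", "rh", "smn"}
LOW_VQL_TARGETS = {"cyp21a2", "gba"}


def additional_filters(target, qual=None, gq=None, jgq=None, vql=None,
                       low_homology=False):
    """Return the additional filter names whose stated thresholds are met."""
    filters = []
    # TargetedLowQual: lpa, rh and smn with QUAL < 3.00.
    if target in LOW_QUAL_TARGETS and qual is not None and qual < 3.0:
        filters.append("TargetedLowQual")
    # TargetedLowVQL: cyp21a2 and gba, low-homology region, VQL < 3.00.
    if (target in LOW_VQL_TARGETS and low_homology
            and vql is not None and vql < 3.0):
        filters.append("TargetedLowVQL")
    # TargetedLowGQ: GQ < 3 and no JGQ present.
    if gq is not None and gq < 3 and jgq is None:
        filters.append("TargetedLowGQ")
    return filters
```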

Merging Targeted Calls In The hard-filtered Files

When the small variant caller is enabled, the targeted small variant VCF calls can be merged into the <output-file-prefix>.hard-filtered.vcf.gz and <output-file-prefix>.hard-filtered.gvcf.gz files (hereafter, the hard-filtered files). The --targeted-merge-vc command-line option controls which targets have their small variant VCF records merged into the hard-filtered files. For example, --targeted-merge-vc rh merges the calls from the rh caller into the hard-filtered files, and --targeted-merge-vc rh hba merges the calls from the rh and hba targets. The value true merges the calls from all supported targets into the hard-filtered files, while the value false merges no calls.

The targeted calls merged into the hard-filtered files are marked with a TARGETED INFO flag.

When enabled, targeted small variants are merged into the hard-filtered files regardless of any regions that may be provided using the --vc-target-bed option.

Merging Strategy

The merging strategy for targeted small variant calls prioritizes the targeted calls over calls from the germline small variant caller. When a germline small variant call overlaps a targeted caller call, the small variant call is filtered with a TargetedConflict filter if either of the following holds:

  • The targeted caller call is PASS.

  • The small variant call and targeted caller call have incompatible genotypes and the targeted caller call is not filtered with the TargetedLowGQ filter.

The strategy is summarized in the following examples.

  1. The TARGETED call is PASS.

chr1 100 . A	C	. TargetedConflict 	.			GT 0/1
chr1 100 . A	C	. PASS 				TARGETED 	GT 1/1

  2. The TARGETED call and the small variant call do not overlap.

chr1 110 . T	TCA	. PASS 				. 			GT 0/1
chr1 111 . G	A	. PASS 				TARGETED 	GT 0/1

  3. The TARGETED call is filtered with TargetedLowQual and has a discordant variant representation with the overlapping small variant call.

chr1 120 . ATTC A	. TargetedConflict	.			GT 0/1
chr1 121 . T	A	. TargetedLowQual	TARGETED 	GT 0/1
chr1 125 . TCAC T	. TargetedLowQual	TARGETED	GT 0/1
chr1 126 . C	G	. TargetedConflict	.			GT 0/1

  4. The TARGETED call is filtered with TargetedLowQual and has a discordant genotype with the overlapping small variant call.

chr1 130 . C	G	. TargetedConflict	.			GT 0/1
chr1 130 . C	G	. TargetedLowQual	TARGETED 	GT 1/1

  5. The TARGETED call is filtered with TargetedLowGQ and has a discordant genotype with the overlapping small variant call.

chr1 140 . AC	A	. PASS			.			GT:GQ 0/1:5
chr1 140 . A	T	. TargetedLowGQ	TARGETED 	GT:GQ 1/1:2
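
The two merging rules can be expressed as a small decision function and checked against the five examples above. The Python sketch below is illustrative only. Overlap is approximated by the reference span of REF, and "compatible genotypes" is replaced by a hypothetical stand-in (same position, REF, ALT and GT), because the exact compatibility test DRAGEN uses is not described here. The Call type and function names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    pos: int
    ref: str
    alt: str
    gt: str
    filters: tuple = ("PASS",)

    @property
    def end(self) -> int:
        """Last reference base covered by REF."""
        return self.pos + len(self.ref) - 1


def overlaps(a: Call, b: Call) -> bool:
    return a.pos <= b.end and b.pos <= a.end


def compatible(small: Call, targeted: Call) -> bool:
    # Hypothetical stand-in for "compatible genotypes": same allele and same GT.
    return (small.pos, small.ref, small.alt, small.gt) == (
        targeted.pos, targeted.ref, targeted.alt, targeted.gt)


def merged_small_call_filters(small: Call, targeted: Call) -> tuple:
    """Filters of a germline small variant call after considering one targeted call."""
    if not overlaps(small, targeted):
        return small.filters
    targeted_pass = targeted.filters == ("PASS",)
    discordant = (not compatible(small, targeted)
                  and "TargetedLowGQ" not in targeted.filters)
    if targeted_pass or discordant:
        # As in the examples, the small variant call is marked TargetedConflict.
        return ("TargetedConflict",)
    return small.filters
```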

Command-Line Examples

FASTQ Input Example

The following command-line example runs the targeted caller from FASTQ input:

dragen \
	-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
	--fastq-file1 /staging/test/data/NA12878_R1.fastq \
	--fastq-file2 /staging/test/data/NA12878_R2.fastq \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--RGID DRAGEN_RGID \
	--RGSM NA12878 \
	--enable-targeted=true

Prealigned BAM Input Example

The following command-line example runs only the cyp21a2 target, using BAM input without realignment:

dragen \
	-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
	--bam-input /staging/test/data/NA12878.bam \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--enable-map-align=false \
	--enable-targeted=cyp21a2

Calls at differentiating sites within the recombinant variant calling region contain the same "joint" fields that are reported for nonrecombinant-like variants in high homology regions (see Nonrecombinant-like Variants In High Homology Regions). However, the collapsed diploid FORMAT/GT is based on any detected recombination events. Because detected recombinant variants are placed in the target gene, these records are filtered differently from the ambiguously placed, nonrecombinant-like variants in high homology regions. The INFO/Recombinant flag is added to calls derived from recombinant variant calling to distinguish them from nonrecombinant-like variant calls in high homology regions. The FORMAT/VQL field is used to apply the RecombinantLowVQL filter to low-quality recombinant variants, and the RecombinantREF filter is applied when the collapsed diploid FORMAT/GT contains only reference alleles.

The targeted caller can be enabled in parallel with other components as part of a human WGS germline analysis workflow (see DRAGEN Recipe - Germline WGS).
