The GBA Caller is capable of detecting both recombinant-like and nonrecombinant-like variants in the GBA gene from whole-genome sequencing (WGS) data. Disruption of all copies of the GBA gene in an individual causes the autosomal recessive disorder Gaucher disease, and carriers are at increased risk of Parkinson's disease and Lewy body dementia. Due to high sequence similarity with its pseudogene paralog GBAP1, calling recombinant-like variants in GBA requires a specialized caller.
To enable the GBA Caller, use --enable-gba=true as part of a germline-only WGS analysis workflow. The GBA Caller is disabled by default and requires WGS data aligned to a human reference genome with at least 30x coverage.
The GBA Caller performs the following steps:
1. Determine the total combined GBA and GBAP1 copy number.
2. Detect nonrecombinant-like variants from a set of 111 known variants.
3. Assemble phased haplotypes in the exon 9-11 region where recombinant variants occur.
4. Detect any GBAP1 -> GBA breakpoints that are consistent with one of the 7 known recombinant-like variants.
A 10 kb region of unique sequence in between GBA and GBAP1 is used to compute the copy number change due to reciprocal recombination events. Reads that align to this 10 kb region are counted and the count is normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. The 3000 normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The total combined GBA and GBAP1 copy number is then calculated as two more than the copy number of this 10 kb region.
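The arithmetic described above can be sketched in Python as follows. This is a minimal illustration, not the DRAGEN implementation: the function and parameter names are hypothetical, and the use of a median baseline and simple rounding to the nearest integer are assumptions made for the example.

```python
from statistics import median


def total_gba_gbap1_copy_number(spacer_read_count, spacer_length_bp,
                                norm_region_counts, norm_region_length_bp=2000):
    """Estimate the total GBA + GBAP1 copy number from depth in the unique region.

    spacer_read_count: reads aligned to the unique region between GBA and GBAP1
    norm_region_counts: read counts for the preselected normalization regions
    """
    # Diploid baseline: per-base read density across the normalization regions.
    # Taking the median is an assumption chosen for robustness to outliers.
    baseline_per_bp = median(c / norm_region_length_bp for c in norm_region_counts)

    # Per-base read density in the unique region between the two genes.
    spacer_per_bp = spacer_read_count / spacer_length_bp

    # The baseline corresponds to 2 copies, so scale the ratio by 2.
    # Rounding to the nearest integer is an assumption of this sketch.
    spacer_cn = round(2 * spacer_per_bp / baseline_per_bp)

    # Total combined copy number is two more than the unique-region copy number.
    return spacer_cn + 2


baseline_counts = [400, 420, 380]  # toy data; the text describes 3000 regions
normal = total_gba_gbap1_copy_number(2000, 10_000, baseline_counts)
deletion = total_gba_gbap1_copy_number(1000, 10_000, baseline_counts)
```

With these toy counts, a sample whose unique-region depth matches the baseline yields a total of 4, and a sample with half that depth yields 3.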
Of the known nonrecombinant-like variants, some are in unique (nonhomologous) regions of GBA with high mapping quality. Only reads mapping to GBA are used for calling variants in nonhomologous regions. The other variants occur in homologous regions of GBA/GBAP1, where reads mapping to either GBA or GBAP1 are used for variant calling.
For each variant, reads containing the variant allele and the nonvariant alleles are counted. A binomial model that incorporates the sequencing error rate is then used to determine the most likely variant allele copy number (0 for nonvariant).
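One way to picture such a model is the Python sketch below. It is an illustration under stated assumptions rather than the caller's actual code: the function name, the default error rate of 0.01, and the way the error rate is folded into the expected variant read fraction are all hypothetical choices.

```python
from math import comb, log


def call_variant_copy_number(variant_reads, nonvariant_reads, total_cn,
                             error_rate=0.01):
    """Return the variant allele copy number with the highest binomial likelihood.

    total_cn: number of gene copies contributing reads at this site, for example
    the GBA copy number in a unique region or the combined GBA + GBAP1 copy
    number in a homologous region (how DRAGEN derives it is not modeled here).
    """
    n = variant_reads + nonvariant_reads
    best_cn, best_loglik = 0, float("-inf")
    for k in range(total_cn + 1):
        true_fraction = k / total_cn
        # Expected fraction of variant reads after allowing for sequencing
        # errors in both directions (assumed symmetric error model).
        p = true_fraction * (1 - error_rate) + (1 - true_fraction) * error_rate
        loglik = (log(comb(n, variant_reads))
                  + variant_reads * log(p)
                  + nonvariant_reads * log(1 - p))
        if loglik > best_loglik:
            best_cn, best_loglik = k, loglik
    return best_cn


het_unique = call_variant_copy_number(20, 20, total_cn=2)
one_of_four = call_variant_copy_number(10, 30, total_cn=4)
noise_only = call_variant_copy_number(2, 38, total_cn=2)
```

In this sketch, a handful of variant reads among many nonvariant reads is explained by the error rate and is called as copy number 0.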
For a list of the supported nonrecombinant-like variants, refer to the targeted/gba/target_variants_*.tsv files located in the resources directory of the DRAGEN install location.
A collection of 10 differentiating sites in the exon 9-11 region of GBA is used to detect the GBA and GBAP1 haplotypes present in the sample. An iterative phasing algorithm builds up haplotypes that are supported by the read data. The phasing algorithm starts with seed sites, which are then iteratively extended to neighboring sites. At each iteration, reads that can be unambiguously assigned to one of the detected partial haplotypes are used to extend each partial haplotype to its next neighboring site. Iteration continues until all sites have been extended. Some haplotypes may have sites that are unresolved (i.e. ambiguous), but these haplotypes can still participate in GBA -> GBAP1 breakpoint detection.
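The seed-and-extend idea can be sketched in Python as shown below. This is a heavily simplified illustration, not the DRAGEN algorithm: it uses a single seed site, a fixed read-support threshold, and a one-letter allele encoding ("G" for the GBA base, "P" for the GBAP1 base), all of which are assumptions, as are the function names.

```python
from collections import Counter


def is_consistent(read, hap):
    """True if the read overlaps the partial haplotype and agrees with it
    at every shared site that is already resolved."""
    shared = [s for s in read if hap.get(s) is not None]
    return bool(shared) and all(read[s] == hap[s] for s in shared)


def phase_sites(reads, n_sites, seed=0, min_support=2):
    """reads: list of {site_index: allele} dicts. Returns haplotype strings,
    with '?' marking sites that could not be resolved."""
    # Seed: every allele with enough support at the seed site starts a haplotype.
    seed_counts = Counter(r[seed] for r in reads if seed in r)
    haps = [{seed: a} for a, c in sorted(seed_counts.items()) if c >= min_support]

    # Extend to the right of the seed first, then to the left.
    order = list(range(seed + 1, n_sites)) + list(range(seed - 1, -1, -1))
    for site in order:
        extended = []
        for hap in haps:
            votes = Counter()
            for read in reads:
                if site not in read:
                    continue
                # Only reads matching exactly one partial haplotype may vote.
                owners = [h for h in haps if is_consistent(read, h)]
                if len(owners) == 1 and owners[0] is hap:
                    votes[read[site]] += 1
            alleles = [a for a, c in sorted(votes.items()) if c >= min_support]
            if not alleles:
                extended.append({**hap, site: None})  # unresolved site
            for allele in alleles:
                # Two supported alleles split the partial haplotype in two.
                extended.append({**hap, site: allele})
        haps = extended
    return ["".join(h[s] or "?" for s in range(n_sites)) for h in haps]


toy_reads = (
    [{0: "G", 1: "G"}] * 2 + [{1: "G", 2: "G"}] * 2    # wild-type-like reads
    + [{0: "G", 1: "P"}] * 2 + [{1: "P", 2: "P"}] * 2  # reads with a G -> P switch
)
haplotypes = phase_sites(toy_reads, n_sites=3)
```

On the toy reads, the sketch recovers one all-GBA haplotype and one haplotype that switches from GBA to GBAP1 alleles after the first site.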
If any of the 10 differentiating sites in exon 9-11 indicates that there are no wild-type GBA allele copies, the sample is called as homozygous variant, and the recombinant-like variant that best matches the depth calls at the 10 sites is reported.
When the sample is not homozygous variant, the phased haplotypes are used to detect heterozygous variants. The detected haplotypes are compared against a set of 7 known recombinant-like variants: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, and c.1263del+RecTL. Whenever a detected haplotype has a GBA -> GBAP1 or GBAP1 -> GBA transition that is consistent with one of these 7 known recombinant-like variants, the transition is considered a candidate breakpoint for calling that recombinant-like variant. Reads containing phasing information for the two sites flanking each candidate breakpoint are used for variant calling. When the read data supports the hypothesis that the sample contains at least one copy of a candidate breakpoint, the associated haplotype is a recombinant haplotype candidate. Recombinant haplotype candidates are sorted by likelihood and the number of variant sites. If no wild-type haplotype was detected, DRAGEN reports any detected homozygous recombinant haplotype, or up to two different recombinant haplotypes (i.e. compound heterozygous) if detected. If any wild-type haplotype was found, DRAGEN reports a maximum of one recombinant haplotype. When no recombinant haplotypes are detected, two wild-type haplotypes are reported.
The caller can detect the following recombinant variant haplotypes: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, and c.1263del+RecTL. Note: RecNciI, RecTL, and c.1263del+RecTL may be deletion-like recombinant variants. A deletion-like recombinant variant haplotype (as opposed to a gene conversion-like recombinant variant haplotype) is defined as a haplotype with one or fewer switch sites (transitions from a GBAP1 allele to a GBA allele).
The table below shows the HGVS identifiers associated with each recombinant variant haplotype.

Recombinant variant haplotype | HGVS identifiers |
---|---|
A495P | NM_000157.4:c.1483G>C |
L483P | NM_000157.4:c.1448T>C |
D448H | NM_000157.4:c.1342G>C |
c.1263del | NM_000157.4:c.1265_1319del |
RecNciI | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C |
RecTL | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C, NM_000157.4:c.1342G>C |
c.1263del+RecTL | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C, NM_000157.4:c.1342G>C, NM_000157.4:c.1265_1319del |

The GBA Caller generates its output in the targeted caller output file <output-file-prefix>.targeted.json that also contains calls from other targets (see Targeted JSON File).

Fields in JSON | Explanation | Type and Possible Values |
---|---|---|
totalCopyNumber | Total copy number of all GBA and GBAP1 genes including hybrids | nonnegative integer |
deletionBreakpointInGene | true if CN <= 3 and a deletion-like recombinant variant haplotype is detected; false if CN <= 3 and no deletion-like recombinant variant is detected; null (i.e. unknown) if totalCopyNumber > 3 | true, false, null |
recombinantHaplotypes | List of detected haplotypes arising from nonallelic homologous recombination variant calling | Array of two strings. Each string consists of all associated allele IDs (if any) within the haplotype. Consecutive IDs in the same haplotype are separated by a '+'. |
variants | List of single site, nonrecombinant-like variants (i.e. not arising from nonallelic homologous recombination). An empty list if no variants are detected. | Array of nonrecombinant-like variants. |

Each nonrecombinant-like variant reported in the variants array will have the fields below.

Fields in JSON | Explanation | Type and Possible Values |
---|---|---|
alleleId | HGVS identifier of the variant allele | string |
alleleCopyNumber | Copy number of the allele in the called genotype | nonnegative integer |
genotypeQuality | Phred-scaled quality for the called genotype | nonnegative integer |
filter | Filter for the called genotype | string. "PASS" when not filtered |

Recombinant-like and nonrecombinant-like variants are reported in VCF format. See Targeted VCF File for details about how these variants are reported in VCF.

An example of the GBA caller content in the <output-file-prefix>.targeted.json output file is shown below.
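The Python sketch below illustrates one way the documented fields might be consumed downstream. The record literal is hand-written from the field descriptions in this section and is hypothetical, not actual DRAGEN output: the values, the nonrecombinant-like variant chosen, the use of an empty string for a wild-type haplotype, and the use of HGVS identifiers as the '+'-separated allele IDs are all assumptions. The function takes the GBA object directly because its location inside the targeted JSON file is not described here.

```python
import json
import re


def summarize_gba(record):
    """Summarize a GBA caller record given as a dict with the documented fields."""
    # Split each haplotype string into its allele IDs. Splitting only on a '+'
    # that is followed by a transcript prefix avoids breaking HGVS names that
    # themselves contain '+', such as intronic positions (assumed NM_ prefix).
    haplotype_alleles = [
        re.split(r"\+(?=NM_)", hap) if hap else []
        for hap in record["recombinantHaplotypes"]
    ]
    # Keep nonrecombinant-like variants that passed filtering.
    passing_variants = [
        (v["alleleId"], v["alleleCopyNumber"])
        for v in record["variants"]
        if v["filter"] == "PASS" and v["alleleCopyNumber"] > 0
    ]
    return {
        "total_copy_number": record["totalCopyNumber"],
        "deletion_breakpoint": record["deletionBreakpointInGene"],
        "haplotype_alleles": haplotype_alleles,
        "passing_variants": passing_variants,
    }


# Hypothetical record for illustration only.
example_record = json.loads("""
{
  "totalCopyNumber": 4,
  "deletionBreakpointInGene": null,
  "recombinantHaplotypes": ["", "NM_000157.4:c.1483G>C+NM_000157.4:c.1448T>C"],
  "variants": [
    {
      "alleleId": "NM_000157.4:c.1226A>G",
      "alleleCopyNumber": 1,
      "genotypeQuality": 30,
      "filter": "PASS"
    }
  ]
}
""")
summary = summarize_gba(example_record)
```

Here the second haplotype string decomposes into the two identifiers listed for RecNciI in the haplotype table, and the null deletionBreakpointInGene value is consistent with a total copy number above 3.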