# HBA Caller

The HBA Caller is capable of genotyping the *HBA1* and *HBA2* genes from whole-genome sequencing (WGS) and whole-exome sequencing (WES) data. Due to high sequence similarity between the genes, a specialized caller is necessary to resolve the possible genotypes of the pair of genes. We consider regions surrounding the *HBA1* and *HBA2* sites to resolve the possible *HBA1* and *HBA2* genotypes.

The HBA Caller performs the following steps:

1. Determines total copy number from read depth of the regions surrounding the *HBA1* and *HBA2* sites.
2. Determines HBA genotypes based on the copy number of the regions surrounding the *HBA1* and *HBA2* sites.
3. Calls small variants in the *HBA1* and *HBA2* regions based on the region copy number derived from the genotype along with allele counts from read information.

For a comprehensive evaluation of the HBA caller, see [HBA targeted caller blog post](https://www.illumina.com/science/genomics-research/articles/HBA-targeted-caller.html).

For information about enabling the HBA caller see [Targeted Caller](https://help.connected.illumina.com/dragen/dragen-v4.4/product-guide/dragen-v4.4/dragen-dna-pipeline/targeted-caller/broken-reference).

## Total Copy Number of the regions surrounding the *HBA1* and *HBA2* sites

The first step of HBA calling is to determine the copy number of the regions surrounding the *HBA1* and *HBA2* sites. Reads aligned to the regions are counted. The counts in each region are corrected for GC-bias, and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in preselected regions across the genome. Finally, a Gaussian Mixture Model (GMM) is used to obtain the integer region copy number from the region normalized counts.

## Genotyping

The genotyping step attempts to identify the two likely haplotypes described in the following table, where "a" stands for a functional copy of either *HBA1* or *HBA2*, "-" stands for a nonfunctional/missing copy of either *HBA1* or *HBA2*, while "3.7", "4.2", and "20.5" describe the recombinant event that likely caused the deletion/duplication of the functional HBA copy.

| Genotype        |
| --------------- |
| aaa3.7/aa       |
| aaa4.2/aa       |
| aaa20.5/aa      |
| aaa20.5/aaa3.7  |
| aaa20.5/aaa4.2  |
| aaa20.5/aaa20.5 |
| aa/aa           |
| -a3.7/aa        |
| -a4.2/aa        |
| -a20.5/aa       |
| --/aaa3.7       |
| --/aaa4.2       |
| --/aaa20.5      |
| -a3.7/aaa20.5   |
| -a4.2/aaa20.5   |
| -a20.5/aaa3.7   |
| -a20.5/aaa4.2   |
| -a3.7/-a3.7     |
| -a4.2/-a4.2     |
| -a20.5/-a20.5   |
| -a3.7/-a4.2     |
| -a20.5/-a3.7    |
| -a20.5/-a4.2    |
| --/aa           |
| --/-a3.7        |
| --/-a4.2        |
| --/-a20.5       |
| --/--           |

The quality of fitting the region copy number calls to one of these genotypes is reported in the `FORMAT/TargetedSVModelQual` field for each SV VCF record and the `FILTER/TargetedSVModelQual` is applied to each record when the phred-scaled quality is below the threshold specified in the VCF header.

## Small Variant Calling

18 small variants are detected from the read alignments. These variants occur in homologous regions of *HBA1* and *HBA2* where reads mapping to either *HBA1* or *HBA2* are used for variant calling.

For each variant, reads containing either the variant allele or the nonvariant allele are counted and a binomial model is used to determine the likelihood for each possible variant allele copy number up to the maximum possible as determined from the *HBA1*/*HBA2* genotyping.

## HBA Output File

The HBA Caller generates its output in the targeted caller output file `<output-file-prefix>.targeted.json` that also contains calls from other targets (see [Targeted JSON File](https://help.connected.illumina.com/dragen/dragen-v4.4/product-guide/dragen-v4.4/dragen-dna-pipeline/targeted-caller/broken-reference)).

| Fields in JSON  | Explanation                                                    | Type and Possible Values                          |
| --------------- | -------------------------------------------------------------- | ------------------------------------------------- |
| totalCopyNumber | Total copy number of *HBA1* and *HBA2* genes including hybrids | nonnegative integer                               |
| genotype        | The HBA genotype.                                              | string                                            |
| genotypeFilter  | The HBA genotype filter.                                       | string, \[PASS, HBALowGQ, HBALowPValue, No\_call] |
| genotypeQual    | The HBA Phred genotype quality.                                | double                                            |
| variants        | List of detected homology region variants in *HBA1*/*HBA2*.    | Array of variants                                 |

Each variant reported in the `variants` array will have the fields below.

| Fields in JSON   | Explanation                                      | Type and Possible Values         |
| ---------------- | ------------------------------------------------ | -------------------------------- |
| alleleId         | HGVS identifier of the variant allele            | string                           |
| alleleCopyNumber | Copy number of the allele in the called genotype | nonnegative integer              |
| genotypeQuality  | Phred-scaled quality for the called genotype     | nonnegative integer              |
| filter           | Filter for the called genotype                   | string. "PASS" when not filtered |

Structural variant and homology region variants are reported in VCF format. See [Targeted VCF File](https://help.connected.illumina.com/dragen/dragen-v4.4/product-guide/dragen-v4.4/dragen-dna-pipeline/targeted-caller/broken-reference) for details about how these variants are reported in VCF.

### Output File Example

An example of the HBA caller content in the `<output-file-prefix>.targeted.json` output file is shown below.

```
  "hba": {
    "totalCopyNumber": 4,
    "genotype": "aa/aa",
    "genotypeQuality": 93,
    "genotypeFilter": "PASS",
    "variants": [
      {
        "alleleId": "NM_000517.6:c.95+1G>A",
        "alleleCopyNumber": 1,
        "genotypeQuality": 26,
        "filter": "PASS"
      }
    ]
  }
```
