# CYP21A2 Caller

The CYP21A2 Caller genotypes the *CYP21A2* gene from whole-genome sequencing (WGS) data. Because *CYP21A2* shares high sequence similarity with its pseudogene paralog *CYP21A1P* and is affected by a wide variety of common structural variants (SVs), a specialized caller is needed to resolve variants in this region.

The CYP21A2 calling workflow is broken up into the following major stages:

1. Loading input configuration
2. Processing read data
3. Analyzing read data

Read data analysis is further split into the following steps:

1. Determine total *CYP21A2* and *CYP21A1P* copy number from read depth.
2. Call small variants in *CYP21A2* copies.
3. Phase reads to detect common variants and recombination events.
4. Identify most likely haplotypes.

The CYP21A2 Caller requires WGS data with at least 30x coverage, aligned to a human reference genome.

## Total *CYP21A2* and *CYP21A1P* Copy Number

The first step of CYP21A2 calling is to determine the combined copy number of *CYP21A2* and *CYP21A1P*. Reads aligned to regions in either *CYP21A2* or *CYP21A1P* are counted. The counts in each region are corrected for GC bias and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in 3000 preselected 2 kb regions across the genome. These 3000 normalization regions were randomly selected from the portion of the reference genome that has stable coverage across population samples. The combined *CYP21A2* and *CYP21A1P* copy number is then calculated from the average sequencing depth across the *CYP21A2* and *CYP21A1P* regions.
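The normalization described above can be sketched roughly as follows. This is a minimal illustration, not DRAGEN's implementation: the GC binning scheme, the use of medians, and every function name are assumptions made for the example. Only the 2 kb normalization region size and the diploid baseline come from the text.

```python
from statistics import median

NORM_REGION_BP = 2000  # normalization regions are 2 kb each (from the text)


def gc_bin(gc_fraction, bin_pct=5):
    """Assign a GC fraction to a coarse bin (assumed 5% wide)."""
    return int(round(gc_fraction * 100)) // bin_pct


def gc_factors(norm_regions):
    """Per-bin GC correction factors from (gc_fraction, read_count) pairs."""
    overall = median(count for _, count in norm_regions)
    by_bin = {}
    for gc, count in norm_regions:
        by_bin.setdefault(gc_bin(gc), []).append(count)
    # A factor above 1 means the bin is over-represented relative to the genome.
    return {b: median(counts) / overall for b, counts in by_bin.items()}


def total_copy_number(targets, norm_regions):
    """Estimate combined copy number.

    targets: list of (gc_fraction, read_count, length_bp) for gene/pseudogene regions.
    norm_regions: list of (gc_fraction, read_count) for the 2 kb normalization regions.
    """
    factors = gc_factors(norm_regions)
    # Diploid baseline: GC-corrected depth per base across normalization regions.
    baseline = median(
        count / factors[gc_bin(gc)] / NORM_REGION_BP for gc, count in norm_regions
    )
    # Relative depth of each target region after GC correction.
    relative = [
        count / factors.get(gc_bin(gc), 1.0) / length / baseline
        for gc, count, length in targets
    ]
    # Average across regions, scaled so that baseline depth equals 2 copies.
    return 2.0 * sum(relative) / len(relative)
```

The function returns an unrounded estimate; how the caller converts this value to the integer `totalCopyNumber` is not described here.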

## Nonrecombinant-like Variant Calling

Of the known nonrecombinant-like variants, some are in unique (nonhomologous) regions of *CYP21A2* with high mapping quality. Only reads mapping to *CYP21A2* are used for calling variants in nonhomologous regions. The other variants occur in homologous regions of *CYP21A2*/*CYP21A1P*, where reads mapping to either gene are used for variant calling.

For each variant, reads containing either the variant allele or the nonvariant allele are counted. A binomial model that incorporates the sequencing error rate is then used to determine the most likely variant copy number (0 for nonvariant).
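As a rough sketch of how such a binomial model can work, the example below scores each candidate variant copy number by the likelihood of the observed read counts and returns the best one. The function name, the default error rate of 0.01, and the choice of `total_copies` as the denominator are assumptions for illustration and are not taken from DRAGEN.

```python
from math import log


def most_likely_variant_copy_number(variant_reads, other_reads, total_copies,
                                    error_rate=0.01):
    """Return the variant copy number (0..total_copies) with the highest
    binomial likelihood given the allele-supporting read counts.

    total_copies must be at least 1 and error_rate must lie strictly
    between 0 and 1 (both are assumptions of this sketch).
    """
    best_cn, best_loglik = 0, float("-inf")
    for cn in range(total_copies + 1):
        frac = cn / total_copies
        # Expected fraction of variant reads, allowing for sequencing errors
        # that flip a read from one allele to the other.
        p = frac * (1.0 - error_rate) + (1.0 - frac) * error_rate
        # Binomial log-likelihood; the binomial coefficient is constant
        # across candidates and is omitted.
        loglik = variant_reads * log(p) + other_reads * log(1.0 - p)
        if loglik > best_loglik:
            best_cn, best_loglik = cn, loglik
    return best_cn
```

For example, with 15 variant reads, 45 nonvariant reads, and 4 total copies, the observed variant fraction of 0.25 is best explained by a variant copy number of 1.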

For a list of the supported nonrecombinant-like variants, refer to the `targeted/cyp21a2/target_variants_*.tsv` files located in the `resources` directory of the DRAGEN install location.

## Nonallelic Homologous Recombination Variant Calling

To further analyze the homologous region, DRAGEN phases reads covering differentiating sites and known variant sites. Whenever a detected haplotype has a *CYP21A2*->*CYP21A1P* or *CYP21A1P*->*CYP21A2* transition that is consistent with one of the known recombinant-like variants, the transition is considered a candidate breakpoint for calling those variants. Reads containing phasing information for the two sites flanking each candidate breakpoint are used for variant calling. When the read data support the hypothesis that the sample contains at least one copy of a candidate breakpoint, the associated haplotype becomes a recombinant haplotype candidate. Recombinant haplotype candidates are sorted by likelihood and the number of variant sites.

If no wild type haplotype was detected, DRAGEN reports any detected homozygous recombinant haplotype, or up to two different recombinant haplotypes (that is, compound heterozygous) if detected. If any wild type haplotype was found, DRAGEN reports a maximum of one recombinant haplotype. When no recombinant haplotypes are detected, two wild type haplotypes are reported.
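The reporting rules can be summarized in a short sketch. The data layout (a ranked list of `(allele_ids, is_homozygous)` tuples), the `WILD_TYPE` label, and the function name are hypothetical. The precedence given to a homozygous candidate over two different candidates is also an assumption, since the text does not state which takes priority.

```python
WILD_TYPE = "WT"  # hypothetical label for a wild type haplotype


def select_reported_haplotypes(candidates, wild_type_detected):
    """Pick the haplotypes to report.

    candidates: list of (allele_ids, is_homozygous) tuples, best-ranked first.
    wild_type_detected: True if any wild type haplotype was found.
    """
    # No recombinant candidates: report two wild type haplotypes.
    if not candidates:
        return [WILD_TYPE, WILD_TYPE]
    # A wild type haplotype is present: at most one recombinant is reported.
    if wild_type_detected:
        return [candidates[0][0], WILD_TYPE]
    # No wild type: prefer a homozygous recombinant haplotype (assumed order).
    for allele_ids, is_homozygous in candidates:
        if is_homozygous:
            return [allele_ids, allele_ids]
    # Otherwise report up to two different recombinant haplotypes.
    return [allele_ids for allele_ids, _ in candidates[:2]]
```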

For a list of recombinant variant sites, refer to the `targeted/cyp21a2/recombinant_variants_*.tsv` files located in the `resources` directory of the DRAGEN install location.

Note that NM\_000500.9:c.710\_719delinsACGAGGAGAA will be reported as the following three variants on the same haplotype: NM\_000500.9:c.710T>A, NM\_000500.9:c.713T>A, and NM\_000500.9:c.719T>A.

## CYP21A2 Output File

The CYP21A2 Caller writes its results to the targeted caller output file `<output-file-prefix>.targeted.json`, which also contains calls from other targets (see [Targeted JSON File](https://help.connected.illumina.com/dragen/product-guides/dragen-v4.5/dragen-dna-pipeline/targeted-caller/..#targeted-json-file)).

### Output File Example

An example of the CYP21A2 Caller content in the `<output-file-prefix>.targeted.json` output file is shown below.

```json
{
  "cyp21a2": {
    "totalCopyNumber": 4,
    "deletionBreakpointInGene": null,
    "recombinantHaplotypes": [
      "NM_000500.9:c.955C>T",
      ""
    ],
    "recombinantHaplotypesFilter": "RecombinantSiteDepthMismatch",
    "phasedHaplotypes": {
      "targetAlleleDepths": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 1, 2],
      "targetAlleleDepthsQual": 2.6241045542695463,
      "rawHaplotypes": [
        "................TT",
        "................NT",
        "................NN",
        ".........TTTTT....",
        ".........NNNN.....",
        ".TTTTTTT..........",
        ".NNNNNN...........",
        "T.................",
        "N................."
      ],
      "depthMatchedHaplotypes": {
        "targetAlleleDepths": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2],
        "targetAlleleDepthsQual": 0.1321771093877906,
        "numMatchingHaplotypeSets": 52,
        "topHaplotypeSets": [
          {
            "populationPriorQual": 12.124787664997037,
            "haplotypes": [
              {
                "recombinantAlleleIDs": "",
                "haplotype": "TTTTTTTT.TTTTT..TT",
                "copyNumber": 1
              },
              {
                "recombinantAlleleIDs": "NM_000500.9:c.92C>T+NM_000500.9:c.293-13C>G+NM_000500.9:c.332_339del+NM_000500.9:c.955C>T",
                "haplotype": "NNNNNNN..TTTTT..NT",
                "copyNumber": 1
              },
              {
                "recombinantAlleleIDs": "NM_000500.9:c.92C>T+NM_000500.9:c.293-13C>G+NM_000500.9:c.332_339del+NM_000500.9:c.710T>A+NM_000500.9:c.713T>A+NM_000500.9:c.719T>A+NM_000500.9:c.955C>T+NM_000500.9:c.1069C>T",
                "haplotype": "NNNNNNN..NNNN...NN",
                "copyNumber": 2
              }
            ]
          },
          {
            "populationPriorQual": 0.14664469798819596,
            "haplotypes": [
              {
                "recombinantAlleleIDs": "NM_000500.9:c.955C>T",
                "haplotype": "TTTTTTTT.TTTTT..NT",
                "copyNumber": 1
              },
              {
                "recombinantAlleleIDs": "NM_000500.9:c.92C>T+NM_000500.9:c.293-13C>G+NM_000500.9:c.332_339del",
                "haplotype": "NNNNNNN..TTTTT..TT",
                "copyNumber": 1
              },
              {
                "recombinantAlleleIDs": "NM_000500.9:c.92C>T+NM_000500.9:c.293-13C>G+NM_000500.9:c.332_339del+NM_000500.9:c.710T>A+NM_000500.9:c.713T>A+NM_000500.9:c.719T>A+NM_000500.9:c.955C>T+NM_000500.9:c.1069C>T",
                "haplotype": "NNNNNNN..NNNN...NN",
                "copyNumber": 2
              }
            ]
          }
        ]
      }
    },
    "variants": [
      {
        "alleleId": "NM_000500.9:c.1360C>T",
        "alleleCopyNumber": 2,
        "genotypeQuality": 18,
        "filter": "PASS"
      }
    ]
  }
}
```
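For downstream processing, the CYP21A2 section can be read with any JSON parser. The following is a minimal sketch that relies only on the field names visible in the example above. The helper names and the summary layout are hypothetical, and the meaning attached to each field should be confirmed against the Targeted JSON File documentation.

```python
import json


def load_targeted_json(path):
    """Load a `<output-file-prefix>.targeted.json` file into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_cyp21a2(targeted):
    """Extract a few top-level CYP21A2 fields from the parsed targeted JSON."""
    call = targeted["cyp21a2"]
    return {
        "total_copy_number": call["totalCopyNumber"],
        "recombinant_haplotypes": call["recombinantHaplotypes"],
        "recombinant_filter": call["recombinantHaplotypesFilter"],
        # Keep only variants whose filter field is PASS.
        "passing_variants": [
            v["alleleId"] for v in call.get("variants", []) if v["filter"] == "PASS"
        ],
    }
```

A typical call would be `summarize_cyp21a2(load_targeted_json("sample.targeted.json"))`, where the file name is a placeholder.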
