# LPA Caller

The *LPA* Caller determines the *LPA* Kringle-IV-2 (KIV-2) VNTR unit copy number from whole-genome sequencing (WGS) data. Because the KIV-2 repeat units are highly similar in sequence, a specialized caller is necessary to resolve the VNTR unit copy number.

The *LPA* Caller performs the following steps:

1. Determines total *LPA* KIV-2 VNTR unit copy number.
2. Determines the heterozygous *LPA* KIV-2 VNTR unit copy number if heterozygous KIV-2 markers are present.
3. Calls small variants in the *LPA* KIV-2 VNTR region based on the unit copy number along with allele counts from read information.

The *LPA* Caller requires WGS data aligned to a human reference genome with at least 30x coverage. Reference genome builds must be based on `hg19`, `GRCh37`, or `hg38`.

## Total *LPA* KIV-2 VNTR Unit Copy Number

The first step of *LPA* calling is to determine the total unit copy number of *LPA* KIV-2. Reads aligned to the *LPA* KIV-2 region are counted. The counts in each region are corrected for GC bias and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in 3000 preselected 2 kb regions across the genome. These 3000 normalization regions were randomly selected from the portion of the reference genome that has stable coverage across population samples.
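The normalization idea can be illustrated with a minimal Python sketch. It is not the caller's implementation: the function name, the GC binning scheme, and the use of a per-bin median as the diploid baseline are all assumptions made for illustration.

```python
from statistics import median


def normalized_copy_number(target_depth, target_gc, norm_regions,
                           gc_bin_width=0.05, ploidy=2):
    """Estimate a copy number for a target region (hypothetical helper).

    norm_regions is a list of (depth, gc_fraction) pairs, one per
    normalization region that is assumed to be diploid.
    """
    # Group normalization-region depths by GC bin (assumed binning scheme).
    bins = {}
    for depth, gc in norm_regions:
        bins.setdefault(int(gc / gc_bin_width), []).append(depth)

    target_bin = int(target_gc / gc_bin_width)
    if target_bin not in bins:
        raise ValueError("no normalization regions share the target's GC bin")

    # The median depth of diploid regions with similar GC content serves as
    # the GC-corrected diploid baseline.
    baseline = median(bins[target_bin])
    return ploidy * target_depth / baseline


# Toy numbers only; these are not real coverage values.
regions = [(30.0, 0.41), (32.0, 0.42), (28.0, 0.43), (50.0, 0.61)]
copy_number = normalized_copy_number(98.0, 0.42, regions)
print(round(copy_number, 3))
```

Using many normalization regions, as the caller does with its 3000 preselected regions, makes a baseline of this kind less sensitive to local coverage fluctuations.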

## Heterozygous *LPA* KIV-2 VNTR Unit Copy Number

The second step of *LPA* calling is to determine the heterozygous unit copy number of *LPA* KIV-2. Heterozygous unit copy number is determined using two specific linked SNV sites that act as a combined marker allele, which is concordant across every copy of the repeat unit within a haplotype. That is, if any copy of the repeat unit in an *LPA* haplotype contains the ALT alleles at those two SNV sites, then every copy of the repeat unit in that *LPA* haplotype contains the ALT alleles at those two sites. The relative read coverage for the ALT and REF cases at these sites can therefore be used to determine the proportion of the total copy number across the KIV-2 repeat array that belongs to each haplotype.
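The proportional split described above can be sketched in a few lines of Python. This is a simplified illustration rather than the caller's logic: the function name is hypothetical, and treating a zero read count as a homozygous call is an assumption (a real caller would need to tolerate sequencing noise).

```python
def split_copy_number(total_copy_number, ref_marker_reads, alt_marker_reads):
    """Split a total copy number by marker read support (hypothetical helper).

    Returns (ref_allele_copy_number, alt_allele_copy_number), or (None, None)
    when only one marker state is observed.
    """
    total_reads = ref_marker_reads + alt_marker_reads
    if total_reads == 0:
        raise ValueError("no reads cover the marker sites")

    # Assumption: a strict zero count means the markers are homozygous.
    if ref_marker_reads == 0 or alt_marker_reads == 0:
        return None, None

    ref_fraction = ref_marker_reads / total_reads
    ref_copies = total_copy_number * ref_fraction
    return ref_copies, total_copy_number - ref_copies


# Toy read counts only.
ref_copies, alt_copies = split_copy_number(40.0, 180, 220)
print(ref_copies, alt_copies)
```

By construction the two allele copy numbers sum to the total, which matches the relationship between `refMarkerAlleleCopyNumber`, `altMarkerAlleleCopyNumber`, and `kiv2CopyNumber` in the heterozygous JSON example later in this section.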

## Small Variant Calling

Two small variants are detected from the read alignments. These variants occur in the *LPA* KIV-2 VNTR region, where reads mapping to any of the 6 units in the reference are used for variant calling.

For each variant, reads containing either the variant allele or the nonvariant allele are counted. A binomial model is then used to determine the likelihood of each possible variant allele copy number, from zero up to the maximum determined by the *LPA* KIV-2 VNTR unit copy number.
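A binomial model of this kind can be sketched as follows. The sketch makes several assumptions that the text does not specify: a fixed per-read error rate, a flat prior over copy numbers, and the function name itself. It works in log space and drops the binomial coefficient, which is the same for every candidate copy number.

```python
import math


def alt_copy_number_posteriors(alt_reads, ref_reads, total_copies,
                               error_rate=0.01):
    """Posterior over ALT copy numbers 0..total_copies (hypothetical helper).

    Assumes a flat prior and a symmetric per-read error rate.
    """
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")

    log_likelihoods = []
    for k in range(total_copies + 1):
        frac = k / total_copies
        # Expected fraction of ALT-supporting reads, allowing for read errors.
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        log_likelihoods.append(
            alt_reads * math.log(p) + ref_reads * math.log(1 - p)
        )

    # Normalize with the log-sum-exp trick to avoid underflow.
    peak = max(log_likelihoods)
    weights = [math.exp(value - peak) for value in log_likelihoods]
    total = sum(weights)
    return [weight / total for weight in weights]


# Toy read counts only: about 1 in 39 reads supports the ALT allele.
posteriors = alt_copy_number_posteriors(30, 1140, 39)
best = max(range(len(posteriors)), key=posteriors.__getitem__)
print(best)
```

A Phred-scaled quality could then be derived from the posterior of the best copy number, although how the caller computes `altCopyNumberQuality` is not described here.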

## *LPA* Output File

The *LPA* Caller generates a `<output-file-prefix>.targeted.json` file in the output directory. The output file is a JSON-formatted file containing the fields below.

| Fields in JSON | Explanation                              | Type and Possible Values |
| -------------- | ---------------------------------------- | ------------------------ |
| sample         | The sample name.                         | string                   |
| dragenVersion  | The version of DRAGEN.                   | string                   |
| lpa            | The LPA targeted caller specific fields. | dictionary               |

The `lpa` fields are defined below.

| Fields in JSON            | Explanation                                                                                               | Type and Possible Values                                                                          |
| ------------------------- | --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| kiv2CopyNumber            | Total KIV-2 unit copy number                                                                              | float                                                                                             |
| refMarkerAlleleCopyNumber | Null if Homozygous REF/ALT markers call                                                                   | float, null                                                                                       |
|                           | Float if Heterozygous markers call and stores the KIV-2 unit copy number of the allele having REF markers |                                                                                                   |
| altMarkerAlleleCopyNumber | Null if Homozygous REF/ALT markers call                                                                   | float, null                                                                                       |
|                           | Float if Heterozygous markers call and stores the KIV-2 unit copy number of the allele having ALT markers |                                                                                                   |
| type                      | "Heterozygous markers call" if we observe both REF and ALT markers                                        | string, "Heterozygous markers call", "Homozygous REF markers call", "Homozygous ALT markers call" |
|                           | "Homozygous REF markers call" if we observe only REF markers                                              |                                                                                                   |
|                           | "Homozygous ALT markers call" if we observe only ALT markers                                              |                                                                                                   |
| variants                  | List of known variants that were detected in the KIV-2 region.                                            | list of variants                                                                                  |

The fields of each entry in `variants` are defined below.

| Fields in JSON       | Explanation                               | Type and Possible Values |
| -------------------- | ----------------------------------------- | ------------------------ |
| hgvs                 | HGVS identifier of the variant            | string                   |
| qual                 | Phred QUAL score of the variant           | double                   |
| altCopyNumber        | Copy number of the ALT variant            | double                   |
| altCopyNumberQuality | Phred quality of the ALT copy number call | double                   |

The *LPA* Caller also generates a `<output-file-prefix>.targeted.vcf[.gz]` file in the output directory. The output file is a `VCFv4.2`-formatted file that can optionally be compressed.

### JSON Output File Examples

Examples of the *LPA* Caller content in the output JSON file are shown below.

```
{
  "lpa": {
    "kiv2CopyNumber": 39.236019179048256,
    "refMarkerAlleleCopyNumber": 17.84675290975861,
    "altMarkerAlleleCopyNumber": 21.389266269289646,
    "type": "Heterozygous markers call",
    "variants": [
      {
        "hgvs": "LPA:4925G>A",
        "qual": 0.0,
        "altCopyNumber": 0,
        "altCopyNumberQuality": 150.0
      },
      {
        "hgvs": "LPA:4733G>A",
        "qual": 135.16839207370617,
        "altCopyNumber": 1,
        "altCopyNumberQuality": 18.641166763477106
      }
    ]
  }
}
```

```
{
  "lpa": {
    "kiv2CopyNumber": 28.709054093341653,
    "refMarkerAlleleCopyNumber": null,
    "altMarkerAlleleCopyNumber": null,
    "type": "Homozygous REF markers call",
    "variants": [
      {
        "hgvs": "LPA:4925G>A",
        "qual": 0.0,
        "altCopyNumber": 0,
        "altCopyNumberQuality": 150.0
      },
      {
        "hgvs": "LPA:4733G>A",
        "qual": 132.8341692747592,
        "altCopyNumber": 1,
        "altCopyNumberQuality": 15.776967125542765
      }
    ]
  }
}
```
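As a rough illustration of consuming this output, the following Python sketch summarizes the `lpa` block using only the fields documented above. The function name and the shape of the returned summary are hypothetical. The JSON is inlined, in condensed form, from the heterozygous example so that the snippet is self-contained; a real workflow would load the `<output-file-prefix>.targeted.json` file with `json.load()` instead.

```python
import json

# Condensed inline copy of the heterozygous example above.
EXAMPLE = """
{
  "lpa": {
    "kiv2CopyNumber": 39.236019179048256,
    "refMarkerAlleleCopyNumber": 17.84675290975861,
    "altMarkerAlleleCopyNumber": 21.389266269289646,
    "type": "Heterozygous markers call",
    "variants": [
      {"hgvs": "LPA:4925G>A", "qual": 0.0,
       "altCopyNumber": 0, "altCopyNumberQuality": 150.0},
      {"hgvs": "LPA:4733G>A", "qual": 135.16839207370617,
       "altCopyNumber": 1, "altCopyNumberQuality": 18.641166763477106}
    ]
  }
}
"""


def summarize_lpa(report):
    """Collect the main LPA results from a parsed report (hypothetical helper)."""
    lpa = report["lpa"]
    ref_copies = lpa["refMarkerAlleleCopyNumber"]
    alt_copies = lpa["altMarkerAlleleCopyNumber"]
    # Both per-allele fields are null unless the markers are heterozygous.
    per_allele = None
    if ref_copies is not None and alt_copies is not None:
        per_allele = (ref_copies, alt_copies)
    return {
        "total_copy_number": lpa["kiv2CopyNumber"],
        "call_type": lpa["type"],
        "allele_copy_numbers": per_allele,
        # Variants with at least one ALT copy are treated as detected.
        "detected_variants": [
            v["hgvs"] for v in lpa["variants"] if v["altCopyNumber"] > 0
        ],
    }


summary = summarize_lpa(json.loads(EXAMPLE))
print(summary)
```

Checking for `null` before using the per-allele values matters because both fields are null for homozygous marker calls, as in the second example.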

### VCF Output File Examples

The following are example output files:

```
##fileformat=VCFv4.2
...
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00096
chr6	160613491	.	A	<DUP>,<DUP>	.	PASS	SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=2.974459,3.564878;EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,.	GT:CN:REPCN:PS	1|2:6.539337:18|21:160613491
chr6	160613786	.	T	G	.	PASS	EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0|1:160613491
chr6	160614754	.	C	G	.	PASS	EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0|1:160613491
chr6	160618484	.	C	T	0.00	TargetedLowQual;TargetedRepeatConflict	EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6	160618676	.	C	T	135.17	TargetedRepeatConflict	EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:19
...
chr6	160641520	.	T	G	.	PASS	EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0|1:160613491
chr6	160642488	.	C	G	.	PASS	EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0|1:160613491
chr6	160646214	.	C	T	0.00	TargetedLowQual;TargetedRepeatConflict	EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6	160646406	.	C	T	135.17	TargetedRepeatConflict	EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:19
```

```
##fileformat=VCFv4.2
...
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00097
chr6	160613491	.	A	<DUP>,<CNV>	.	PASS	SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=.,.;EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,.	GT:CN:REPCN:PS	1|2:4.784842:.|.:160613491
chr6	160613786	.	T	G	.	PASS	EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0/0:160613491
chr6	160614754	.	C	G	.	PASS	EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0/0:160613491
chr6	160618484	.	C	T	0.00	TargetedLowQual;TargetedRepeatConflict	EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6	160618676	.	C	T	132.83	TargetedRepeatConflict	EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:16
...
chr6	160641520	.	T	G	.	PASS	EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0/0:160613491
chr6	160642488	.	C	G	.	PASS	EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:PS	0/0:160613491
chr6	160646214	.	C	T	0.00	TargetedLowQual;TargetedRepeatConflict	EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6	160646406	.	C	T	132.83	TargetedRepeatConflict	EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION	GT:GQ	0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:16
```
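The sample columns in these records can be read with plain string handling, as in the Python sketch below. The helper names are hypothetical, and the interpretation of the high-ploidy `GT` field (counting non-reference alleles to obtain an ALT copy number) is an assumption based on the examples, where it agrees with `altCopyNumber` in the JSON output.

```python
import re


def sample_fields(vcf_line):
    """Map FORMAT keys to the first sample's values (hypothetical helper)."""
    columns = vcf_line.rstrip("\n").split("\t")
    return dict(zip(columns[8].split(":"), columns[9].split(":")))


def parse_repcn(repcn):
    """Return per-haplotype repeat counts, or None where the value is '.'."""
    return [None if part == "." else int(part)
            for part in re.split(r"[/|]", repcn)]


def alt_copies_from_gt(gt):
    """Count non-reference alleles in a GT string of any ploidy."""
    return sum(1 for allele in re.split(r"[/|]", gt)
               if allele not in ("0", "."))


# KIV-2 record from the first example, rebuilt with explicit tab separators.
kiv2_line = "\t".join([
    "chr6", "160613491", ".", "A", "<DUP>,<DUP>", ".", "PASS",
    "SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=2.974459,3.564878;"
    "EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,.",
    "GT:CN:REPCN:PS", "1|2:6.539337:18|21:160613491",
])

fields = sample_fields(kiv2_line)
repeat_counts = parse_repcn(fields["REPCN"])
print(repeat_counts, float(fields["CN"]))
```

For the homozygous example, `REPCN` is `.|.`, so `parse_repcn` returns `[None, None]` rather than raising an error.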
