The LPA Caller identifies the LPA Kringle-IV-2 (KIV-2) VNTR unit copy number from whole-genome sequencing (WGS) data. Because of the high sequence similarity between the repeat units, a specialized caller is necessary to resolve the VNTR unit copy number.
The LPA Caller performs the following steps:
1. Determines the total LPA KIV-2 VNTR unit copy number.
2. Determines the heterozygous LPA KIV-2 VNTR unit copy number if heterozygous KIV-2 markers are present.
3. Calls small variants in the LPA KIV-2 VNTR region based on the unit copy number and on allele counts from read information.
The LPA Caller requires WGS data aligned to a human reference genome with at least 30x coverage. Reference genome builds must be based on hg19, GRCh37, or hg38.
Total LPA KIV-2 VNTR Unit Copy Number
The first step of LPA calling is to determine the total unit copy number of LPA KIV-2. Reads aligned to the LPA KIV-2 region are counted. The counts in each region are corrected for GC bias and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in 3000 preselected 2 kb regions across the genome. These 3000 normalization regions were randomly selected from the portion of the reference genome that has stable coverage across population samples.
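The normalization described above can be illustrated with a minimal sketch. It assumes a simple per-GC-bin median correction, which is not necessarily the method the LPA Caller uses; the function names, the bin count, and the input layout (depth and GC fraction per region) are all hypothetical.

```python
from statistics import median

def gc_bin(gc_fraction, n_bins=20):
    """Map a GC fraction in [0, 1] to an integer bin index."""
    return min(int(gc_fraction * n_bins), n_bins - 1)

def gc_correction_factors(norm_regions, n_bins=20):
    """Derive per-GC-bin scale factors from the normalization regions.

    norm_regions: list of (depth, gc_fraction) tuples, one per region.
    Returns (factors, baseline), where baseline is the median depth over
    all normalization regions and factors[bin] rescales a depth so that
    the bin median matches the baseline.
    """
    baseline = median(depth for depth, _ in norm_regions)
    by_bin = {}
    for depth, gc in norm_regions:
        by_bin.setdefault(gc_bin(gc, n_bins), []).append(depth)
    factors = {}
    for b, depths in by_bin.items():
        bin_median = median(depths)
        if bin_median > 0:
            factors[b] = baseline / bin_median
    return factors, baseline

def normalized_copy_number(target_regions, norm_regions, n_bins=20):
    """Estimate the copy number of the target relative to a diploid baseline.

    target_regions: list of (depth, gc_fraction) tuples covering the target.
    Bins with no normalization data are left uncorrected (factor 1.0).
    """
    factors, baseline = gc_correction_factors(norm_regions, n_bins)
    corrected = [
        depth * factors.get(gc_bin(gc, n_bins), 1.0)
        for depth, gc in target_regions
    ]
    mean_depth = sum(corrected) / len(corrected)
    return 2.0 * mean_depth / baseline
```

In this sketch a target whose corrected depth is three times the baseline yields a copy number of 6, because the baseline is defined as two copies.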
Heterozygous LPA KIV-2 VNTR Unit Copy Number
The second step of LPA calling is to determine the heterozygous unit copy number of LPA KIV-2. Heterozygous unit copy number is determined using two linked SNV sites that together act as a marker allele, because the marker is concordant across every copy of the repeat unit within a haplotype. That is, if any copy of the repeat unit in an LPA haplotype contains the ALT alleles at those two SNV sites, then every copy of the repeat unit in that LPA haplotype contains the ALT alleles at those two sites. The relative read coverage of the ALT and REF alleles at these sites can therefore be used to determine the proportion of the overall copy number across the KIV-2 repeat array that belongs to each haplotype.
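As a rough illustration of this proportional split, the sketch below divides a total copy number between two haplotypes using read counts at the marker sites. The function name and the `min_alt_fraction` threshold are hypothetical and only stand in for whatever test the caller applies to decide that a heterozygous marker is present.

```python
def split_copy_number(total_cn, alt_reads, ref_reads, min_alt_fraction=0.05):
    """Split a total copy number between ALT-marked and REF-marked haplotypes.

    Returns (alt_cn, ref_cn), or None when the marker does not look
    heterozygous (no reads, or ALT fraction too close to 0 or 1).
    min_alt_fraction is an assumed threshold, not a documented parameter.
    """
    depth = alt_reads + ref_reads
    if depth == 0:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < min_alt_fraction or alt_fraction > 1 - min_alt_fraction:
        return None
    alt_cn = total_cn * alt_fraction
    return alt_cn, total_cn - alt_cn
```

For example, `split_copy_number(40.0, 300, 500)` assigns 37.5% of the 40 copies to the ALT-marked haplotype and the remainder to the REF-marked haplotype.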
Small Variant Calling
Two small variants are detected from the read alignments. These variants occur in the LPA KIV-2 VNTR region, where reads mapping to any of the 6 units in the reference are used for variant calling.
For each variant, reads containing either the variant allele or the nonvariant allele are counted. A binomial model is then used to determine the likelihood of each possible variant allele copy number, from zero up to the maximum possible as determined from the LPA KIV-2 VNTR unit copy number.
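A binomial model of this kind can be sketched as follows. This is an illustration under stated assumptions, not the caller's implementation: the function names are hypothetical, and the `error_rate` term is an assumed per-read error probability added so that the zero-copy and all-copies hypotheses do not have zero likelihood when a few discordant reads are observed.

```python
from math import comb, log

def copy_number_log_likelihoods(alt_reads, ref_reads, max_copies, error_rate=0.01):
    """Binomial log-likelihood of each variant allele copy number 0..max_copies.

    Under the hypothesis of k variant copies out of max_copies units, the
    expected variant read fraction is k / max_copies, adjusted by an assumed
    symmetric error rate. max_copies must be at least 1.
    """
    n = alt_reads + ref_reads
    log_coeff = log(comb(n, alt_reads))
    log_likelihoods = []
    for k in range(max_copies + 1):
        p = k / max_copies
        p = p * (1 - error_rate) + (1 - p) * error_rate
        log_likelihoods.append(
            log_coeff + alt_reads * log(p) + ref_reads * log(1 - p)
        )
    return log_likelihoods

def most_likely_copies(alt_reads, ref_reads, max_copies, error_rate=0.01):
    """Return the variant allele copy number with the highest likelihood."""
    lls = copy_number_log_likelihoods(alt_reads, ref_reads, max_copies, error_rate)
    return max(range(len(lls)), key=lls.__getitem__)
```

With 50 variant reads out of 1000 and a maximum of 20 copies, the observed fraction of 0.05 is closest to one copy in 20, so `most_likely_copies(50, 950, 20)` selects a single variant copy.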
LPA Output File
The LPA Caller generates a <output-file-prefix>.targeted.json file in the output directory. The output file is a JSON formatted file containing the fields below.
The lpa fields are defined as below.
For the variants the fields are defined as below.
The LPA Caller also generates a <output-file-prefix>.targeted.vcf[.gz] file in the output directory. The output file is in VCF v4.2 format and may be compressed.
JSON Output File Examples
Examples of the LPA Caller content in the output JSON file are shown below.
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096
chr6 160613491 . A <DUP>,<DUP> . PASS SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=2.974459,3.564878;EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,. GT:CN:REPCN:PS 1|2:6.539337:18|21:160613491
chr6 160613786 . T G . PASS EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0|1:160613491
chr6 160614754 . C G . PASS EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0|1:160613491
chr6 160618484 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6 160618676 . C T 135.17 TargetedRepeatConflict EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:19
...
chr6 160641520 . T G . PASS EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0|1:160613491
chr6 160642488 . C G . PASS EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0|1:160613491
chr6 160646214 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6 160646406 . C T 135.17 TargetedRepeatConflict EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:19
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00097
chr6 160613491 . A <DUP>,<CNV> . PASS SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=.,.;EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,. GT:CN:REPCN:PS 1|2:4.784842:.|.:160613491
chr6 160613786 . T G . PASS EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0/0:160613491
chr6 160614754 . C G . PASS EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0/0:160613491
chr6 160618484 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6 160618676 . C T 132.83 TargetedRepeatConflict EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:16
...
chr6 160641520 . T G . PASS EVENT=LPA:296T>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0/0:160613491
chr6 160642488 . C G . PASS EVENT=LPA:1264C>G;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:PS 0/0:160613491
chr6 160646214 . C T 0.00 TargetedLowQual;TargetedRepeatConflict EVENT=LPA:4925G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0:150
chr6 160646406 . C T 132.83 TargetedRepeatConflict EVENT=LPA:4733G>A;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT:GQ 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1:16
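The KIV-2 VNTR records in the examples above can be read with a short parser such as the sketch below. The function name and the returned dictionary keys are hypothetical. In the first example, each REPCN value (18|21) equals the corresponding INFO CN value multiplied by 6 and rounded, which is consistent with the 6 reference units spanned by the record; this relationship is an observation from that example, not a documented rule.

```python
def parse_kiv2_record(line):
    """Extract copy-number fields from a single-sample KIV-2 VNTR VCF record.

    Missing values (".") are returned as None.
    """
    fields = line.split()
    info = dict(item.split("=", 1) for item in fields[7].split(";"))
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    def to_float(text):
        return None if text == "." else float(text)

    def to_int(text):
        return None if text == "." else int(text)

    return {
        "total_cn": to_float(sample["CN"]),
        "allele_cn": [to_float(x) for x in info["CN"].split(",")],
        "repcn": [to_int(x) for x in sample["REPCN"].replace("|", "/").split("/")],
    }

# First example record (sample HG00096), copied from the VCF example.
record = (
    "chr6 160613491 . A <DUP>,<DUP> . PASS "
    "SVCLAIM=D,D;END=160646772;SVLEN=33281,33281;CN=2.974459,3.564878;"
    "EVENT=LPA:KIV2,.;EVENTTYPE=VNTR,. "
    "GT:CN:REPCN:PS 1|2:6.539337:18|21:160613491"
)
parsed = parse_kiv2_record(record)
# Observed relationship in this example: REPCN == round(allele CN * 6).
units = [round(cn * 6) for cn in parsed["allele_cn"]]
```

When the heterozygous copy number cannot be resolved, as in the second example (HG00097), the per-allele CN and REPCN values are "." and the parser returns None for them, leaving only the total CN.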