How does Emedgene calculate variant effect and severity?

Variant effect

For each variant that is mapped to the reference genome, Emedgene uses Ensembl’s Variant Effect Predictor (VEP) and the RefSeq (NCBI) library of transcripts to calculate variant effect. VEP uses a set of consequence terms defined by the Sequence Ontology (SO), including immediately recognizable terms like “missense_variant” and “frameshift_variant” as well as some more esoteric ones like “non_coding_transcript_exon_variant”.

The full list of terms, along with detailed descriptions and severity impact categories can be found in the link below.

Importantly, each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on different Emedgene pipeline versions as described here.

Variant severity

Variant severity, also known as variant impact, is a subjective assessment of the severity of a variant consequence.

Severity is usually categorized as modifier, low, moderate or high:

  • Modifier severity is used for non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. Intergenic and non-coding variants are classic examples.

  • Low severity is used for variants that are assumed to be mostly harmless or unlikely to change protein function. This includes synonymous variants.

  • Moderate severity is used for non-disruptive variants that might change protein effectiveness, such as missense variants and in-frame insertions/deletions.

  • High severity is used for variants that are assumed to have a disruptive impact on the protein, such as by causing protein truncation, loss of the open reading frame, and/or triggering nonsense-mediated decay.

Most of the time, variant effect and variant severity on Emedgene are consistent with VEP. However, genomics is a field defined by exceptions. The key factors outlined below are ones that the Emedgene genetics team believes are critical to account for when assigning severity.

For small variants (SNV):

  1. Splice prediction: A small variant is upgraded to HIGH severity if its splicing prediction is high (dbscSNV > 0.6 or max spliceAI > 0.8), or to MODERATE if its splicing prediction is moderate (max spliceAI > 0.2 or dbscSNV > 0.5).

  2. Conservation: Synonymous variants and splice region variants that are highly conserved (GERP score > 0.9 or PhastCons100 > 0.2) will be upgraded to MODERATE.

  3. Non-coding RNA disease genes: The severity of a small variant will be upgraded to MODERATE if the variant is within a list of RNA genes known to be associated with disease. The current list of RNA genes is:

ATXN8OS, GNAS-AS1, H19, HELLPAR, KCNQ1OT1, LINC00237, LINC00299, MEG3, MIAT, MIR137, MIR140, MIR184, MIR19B1, MIR204, MIR2861, MIR4718, MIR605, MIR96, MIR99A, RMRP, RNU12, RNU4ATAC, SNORD116-1, SNORD118, TERC, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-TM, MT-TW, MT-TA, MT-TN, MT-TC, MT-TY, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP, RNU7-1*, RNU4-2*. *Added in V35.2
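The three upgrade rules above can be sketched in code. This is a minimal illustration, not Emedgene's implementation: the `SmallVariant` fields, the function names, the abbreviated gene set, and the choices to treat a missing score as "no evidence" and to never lower a severity are all assumptions made for the example. Only the thresholds and the rule conditions come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Severity levels from lowest to highest.
SEVERITY_ORDER = ["MODIFIER", "LOW", "MODERATE", "HIGH"]

# Abbreviated subset of the RNA disease gene list above (illustration only).
RNA_DISEASE_GENES = {"RMRP", "TERC", "RNU4ATAC", "MT-TL1", "RNU4-2"}

# Effects eligible for the conservation upgrade.
CONSERVATION_EFFECTS = {"synonymous_variant", "splice_region_variant"}


@dataclass
class SmallVariant:
    """Hypothetical record; field names are assumptions, not Emedgene's schema."""
    gene: str
    effect: str
    vep_severity: str
    dbscsnv: Optional[float] = None
    spliceai_max: Optional[float] = None
    gerp: Optional[float] = None
    phastcons100: Optional[float] = None


def _gt(value: Optional[float], threshold: float) -> bool:
    # Assumption: a missing score never triggers an upgrade.
    return value is not None and value > threshold


def _upgrade(current: str, candidate: str) -> str:
    # An upgrade can raise severity but never lower it.
    return max(current, candidate, key=SEVERITY_ORDER.index)


def adjusted_severity(v: SmallVariant) -> str:
    severity = v.vep_severity
    # 1. Splice prediction: high first, then moderate.
    if _gt(v.dbscsnv, 0.6) or _gt(v.spliceai_max, 0.8):
        severity = _upgrade(severity, "HIGH")
    elif _gt(v.spliceai_max, 0.2) or _gt(v.dbscsnv, 0.5):
        severity = _upgrade(severity, "MODERATE")
    # 2. Conservation: only synonymous and splice region variants.
    if v.effect in CONSERVATION_EFFECTS and (
        _gt(v.gerp, 0.9) or _gt(v.phastcons100, 0.2)
    ):
        severity = _upgrade(severity, "MODERATE")
    # 3. Non-coding RNA disease genes.
    if v.gene in RNA_DISEASE_GENES:
        severity = _upgrade(severity, "MODERATE")
    return severity
```

For example, under these assumptions a synonymous variant with a VEP severity of LOW and a max spliceAI of 0.3 would come out as MODERATE, while a variant already rated HIGH stays HIGH regardless of its scores.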

For CNV/SV:

VEP annotates CNVs with overlapping genomic features and designates them with the following effects: transcript amplification (DUP), feature elongation (DUP, INS), feature truncation (DEL), and transcript ablation (DEL). However, the severity assigned by VEP for CNVs does not reflect the complexity of CNV effects on protein function and in our experience is not suitable for genome analysis and filtering.

On Emedgene, each variant is annotated with regard to its overlap with three different types of regions: ‘coding regions’, ‘clinical regions’, and ‘full gene’ regions (see here for a more detailed description of the BED files used in the system).

The region annotation is then used to assess severity for CNV and SV as follows:

| Variant type | High | Moderate | Low | Modifier |
| --- | --- | --- | --- | --- |
| Deletion (DEL) | Coding regions | Clinical regions and not in coding regions | Full gene and not in clinical regions | No overlap with any BED |
| Gain (DUP) | Intragenic (coding regions but not entire gene region) | Coding regions / clinical regions, not intragenic | Full gene and not in clinical regions | No overlap with any BED |
| Insertion (INS) | Coding regions | Clinical regions and not in coding regions | Full gene and not in clinical regions | None |

Table 1: CNV/SV severity table. For each category of CNV/SV, the types of regions that overlap a given variant required to trigger the severity classification are shown.
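Table 1 can be read as a short decision function. The sketch below is an illustration under stated assumptions, not Emedgene's code: the function name, the boolean overlap flags, and the `covers_entire_gene` flag used to decide whether a gain is intragenic are all hypothetical. Because the table lists no Modifier condition for insertions, the sketch returns `None` for an insertion with no overlap; how the platform actually handles that case is not stated here.

```python
from typing import Optional


def cnv_sv_severity(
    sv_type: str,
    coding: bool,
    clinical: bool,
    full_gene: bool,
    covers_entire_gene: bool = False,
) -> Optional[str]:
    """Map region overlaps to a severity following Table 1.

    coding / clinical / full_gene: whether the variant overlaps a region
    in the corresponding BED file (hypothetical flags).
    covers_entire_gene: whether a gain spans the whole gene region, in
    which case it is not treated as intragenic (assumption).
    """
    if sv_type == "DUP":
        if coding and not covers_entire_gene:
            return "HIGH"  # intragenic gain
        if coding or clinical:
            return "MODERATE"  # coding/clinical overlap, not intragenic
    elif sv_type in ("DEL", "INS"):
        if coding:
            return "HIGH"
        if clinical:
            return "MODERATE"
    else:
        raise ValueError(f"unsupported variant type: {sv_type}")

    if full_gene:
        return "LOW"
    # Table 1 gives no Modifier condition for insertions.
    return None if sv_type == "INS" else "MODIFIER"
```

The checks run from highest to lowest severity, so a variant that overlaps several region types receives the most severe category that applies.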

For STR variants:

Emedgene uses an internal annotation for STR variants. More details are available on request from techsupport@illumina.com.

Known limitations

  • The list of RNA genes known to be associated with disease is updated over time as part of pipeline updates.

  • Emedgene does not provide VEP annotation for non-coding regulatory data.

  • The “main effect” filter under the advanced “variant effect filters” in the Emedgene UI lists effects grouped by their severity. The following effects are currently missing from the advanced filter and will be added in V34.6 and V35.3.

| Effect name | Severity |
| --- | --- |
| splice_donor_5th_base_variant | LOW |
| splice_donor_region_variant | LOW |
| splice_polypyrimidine_tract_variant | LOW |
| coding_sequence_variant | MODIFIER |
| mature_miRNA_variant | MODIFIER |
| NMD_transcript_variant | MODIFIER |
| coding_transcript_variant | MODIFIER |
| regulatory_region_ablation | MODIFIER |
| regulatory_region_variant | MODIFIER |
| intergenic_variant | MODIFIER |
| sequence_variant | MODIFIER |
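For readers who work with exported variant data outside the UI, the effects listed above can be kept in a small lookup and grouped by severity. This is only a sketch: the dictionary and function names are hypothetical, and nothing here reflects how the Emedgene filter is implemented. The effect names and severities are copied from the list above.

```python
from collections import defaultdict

# Effects missing from the advanced filter, with their severities
# (copied from the list above).
MISSING_FILTER_EFFECTS = {
    "splice_donor_5th_base_variant": "LOW",
    "splice_donor_region_variant": "LOW",
    "splice_polypyrimidine_tract_variant": "LOW",
    "coding_sequence_variant": "MODIFIER",
    "mature_miRNA_variant": "MODIFIER",
    "NMD_transcript_variant": "MODIFIER",
    "coding_transcript_variant": "MODIFIER",
    "regulatory_region_ablation": "MODIFIER",
    "regulatory_region_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
    "sequence_variant": "MODIFIER",
}


def effects_by_severity(effect_to_severity):
    """Group effect names under their severity, preserving input order."""
    grouped = defaultdict(list)
    for effect, severity in effect_to_severity.items():
        grouped[severity].append(effect)
    return dict(grouped)


grouped = effects_by_severity(MISSING_FILTER_EFFECTS)
```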
