
Which browser should I use with Emedgene?

To fully enjoy all the features available on the Emedgene platform, we recommend using the latest version of Google Chrome.

Emedgene annotations and update frequency

Every case is annotated with the resources listed in the attached table, including the proprietary Illumina prediction scores PrimateAI-3D and SpliceAI. All annotations are versioned; the versions are saved per case and recorded in a Versions tab. Key variant significance and knowledge graph databases are updated monthly, so that the most up-to-date information is available during analysis.

Attachment: Illumina_Connected_Software_Emedgene_Annotation_Schema.pdf (PDF, 4 MB)

How do I use developer tools to collect logs?

Our team prides itself on helping you troubleshoot issues when they do occur. Sometimes we need to understand the problem in the context of a customer's storage, internet service, and local security infrastructure. We want to preview the network interaction between the customer and our back end services. In these cases, if we are unable to reproduce the bug locally, you can help us help you by collecting the developer tool logs. Here are the steps to follow (example for Google Chrome):

1. Open the platform and go to View > Developer > Developer tools. Alternatively, right-click anywhere in the browser window and select ‘Inspect’.
2. The tools will open on the right-hand side of your screen. There are 4 tabs across the top: Elements, Console, Sources, Network. Select Network.
3. Under Network, make sure the log is preserved:
   a. Chrome / Chromium / Microsoft Edge / Safari: Check the box next to ‘Preserve log’.
   b. Firefox: Click on the Settings button (gear icon) in the Network box, and make sure ‘Persist Logs’ is selected.
4. Navigate to the page of interest and reproduce the steps you took when you first encountered the issue.
5. When you experience the issue, save the log:
   a. Chrome / Chromium / Microsoft Edge / Safari: Click on the ‘Export’ or ‘Export as HAR’ button, and save the file in the desired location.
   b. Firefox: Click on the Settings button (gear icon) in the Network box, and click ‘Save all as HAR’.

We will begin troubleshooting right away.
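Before sending a HAR file, it can be useful to check that it actually captured the failing requests. The sketch below is an illustration only, not an Emedgene tool: it relies on the standard HAR layout (`log.entries[].request` / `response.status`), and the function name `failed_requests` is hypothetical. It treats status 0 (commonly recorded for blocked or aborted requests) and 4xx/5xx as failures.

```python
import json


def failed_requests(har_text):
    """Return (status, method, url) for HAR entries with status 0 or >= 400."""
    har = json.loads(har_text)
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        status = entry.get("response", {}).get("status", 0)
        if status == 0 or status >= 400:
            failures.append((status, request.get("method"), request.get("url")))
    return failures
```

To use it, read the exported file with `open(path, encoding="utf-8").read()` and pass the text to `failed_requests`.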

Can I analyze Illumina Complete Long Reads in Emedgene?

Yes. Emedgene has supported the recommended interpretation flow for Illumina Complete Long Reads (ICLR) since version 32.0.

The ICLR interpretation flow in Emedgene includes:

Ingesting combined output of long-read/short-read callers for both SVs and SNVs, along with the long-read BAM files. Short-read callers from VCF (CNV read-depth and STR) can be additionally used starting from DRAGEN 4.2;
ICLR calling by the BSSH application;
Ingesting VCF and BAM files through the existing BSSH integration;
Explainable AI shortlisting of ICLR variants and automated ACMG classification, when relevant.

Current limitations:

Complex SV variants like inversions and translocations are not yet supported for interpretation.
ICLR cases must start from VCF files; Emedgene does not support case accessioning from both VCF and FASTQ sources.

How do I prepare VCF files generated by DRAGEN MANTA to be used as input for Emedgene?

Each variant type (i.e., SNV/indel, CNV, SV, STR) ingested into the Emedgene platform requires a separate VCF file. To ingest a DRAGEN Manta VCF produced by a DRAGEN version < 4.2, one line in the header of the VCF must be changed:

Replace

##source=DRAGEN <version>

with

##source=MANTA-DRAGEN <version>

Example:

Replace

##source=DRAGEN 05.121.645.4.0.3

with

##source=MANTA-DRAGEN 05.121.645.4.0.3

Note:

Variant types currently annotated and displayed in Emedgene are DEL, DUP and INS.
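The header change described above can be scripted. This is a minimal sketch, assuming the VCF has already been read into a list of lines without trailing newlines; the function name `relabel_manta_source` is hypothetical. Only lines beginning with `##source=DRAGEN ` are touched, so variant records pass through unchanged.

```python
OLD_PREFIX = "##source=DRAGEN "
NEW_PREFIX = "##source=MANTA-DRAGEN "


def relabel_manta_source(lines):
    """Rewrite the ##source header line so it names MANTA-DRAGEN."""
    result = []
    for line in lines:
        if line.startswith(OLD_PREFIX):
            # Keep the version string that follows the prefix as-is.
            line = NEW_PREFIX + line[len(OLD_PREFIX):]
        result.append(line)
    return result
```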

Source of gnomAD data for small variants on GRCh38

Implementation for Emedgene pipeline version < 34.0:

gnomAD v3.1 provides GRCh38 data for genomes only.

In previous gnomAD versions, exome data is available only for GRCh37.

In order to retain the wealth of exome data for our GRCh38 customers, Emedgene's GRCh38 small variant gnomAD data combines the following:

Liftover of GRCh37 exome data from gnomAD v2 to GRCh38
GRCh38 genome data from gnomAD v3

Known limitation:

This implementation contains some duplicate variants between v2 and v3.

Implementation for Emedgene pipeline version 34.0 and later:

gnomAD has since released gnomAD v3.1.2 (non-v2), which includes only v3 variants that are not present in v2 and resolves the duplication issue. Emedgene supports this version starting with pipeline 34.0.

How are MNVs handled on the platform?

During data processing, MNVs (multi-nucleotide variants) are divided into consecutive SNVs. The resulting SNVs include INFO and FORMAT fields, mirroring the original record.

In the platform's user interface, SNV variants occurring within a 2-nucleotide distance are marked with the 'Suspected MNP' flag.
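The two behaviours above can be illustrated with a small sketch. It is not the platform's implementation: the function names are hypothetical, the MNV is assumed to be an equal-length substitution, bases that match the reference are assumed to be skipped, and "within a 2-nucleotide distance" is assumed to mean a position difference of at most 2 on the same chromosome.

```python
def split_mnv(chrom, pos, ref, alt):
    """Split an equal-length MNV into per-base SNV tuples (chrom, pos, ref, alt)."""
    if len(ref) != len(alt):
        raise ValueError("expected an equal-length substitution")
    return [
        (chrom, pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a  # assumption: reference-matching bases produce no SNV
    ]


def suspected_mnp(positions, max_distance=2):
    """Return the SNV positions (one chromosome) that have a neighbour within max_distance."""
    ordered = sorted(positions)
    flagged = set()
    for left, right in zip(ordered, ordered[1:]):
        if right - left <= max_distance:
            flagged.update((left, right))
    return flagged
```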


Support for gene lists with up to 10,000 genes

In 32.0, we increased flexibility to support customer workflows by enabling the creation of large gene lists of up to 10,000 genes. The previous limit was 900 genes.

Gene lists can be created via the API and the UI. Very large gene lists (>6700 genes) may return a false error message during creation, despite being successfully created.

Genomic Regions by Case Type

Emedgene uses regional information to process genomic data in a variety of ways. Regional information influences the variants presented in gene panels, exomes, and genomes, variant quality and variant annotations.

V35.0 and up:

| Case Type | Regions of Interest BED | Default Region of Interest BED | QC BED |
| --- | --- | --- | --- |
| Research Genome | Any customer-selected BED | None | Optional |
| Whole Genome | Any customer-selected BED | Full Genes | Optional |
| Exome | Any customer-selected BED | Clinical Regions | Optional |
| Custom Panel | Any customer-selected BED | Clinical Regions | Optional |

V35.0+ Regions of Interest BEDs are applied as follows:

Research Genome: Customer can utilize any BED to restrict analysis. By default, there is no intersection and all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Customer can utilize any BED to restrict analysis. If no custom BED is selected, the default BED selected will be the "Full Genes" BED file (see below for more details), and only variants contained within this BED will be presented. The exception is CNV variants, which are always fully present.
Exome: Customer can utilize any BED to restrict analysis. If no custom BED is selected, variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.
Custom Panel: Customer can utilize any BED to restrict analysis, and also upload a separate BED file for QC. When the Panel BED file is provided in Gene Panels, variants processed from a VCF/FASTQ will be only those located within the ranges of the selected panel BED file. Typically the BED files are unique to each enrichment kit panel. If no kit BED is available, the 'Clinical Regions' BED will be used.
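The V35.0+ selection logic amounts to "use the custom BED if one was selected, otherwise fall back to the default for the case type". A minimal sketch of that reading, with a hypothetical function name and the defaults taken from the V35.0+ description:

```python
# Default Regions of Interest BED per case type (V35.0 and up).
DEFAULT_ROI_BED = {
    "Research Genome": None,  # no intersection, all variants viewable
    "Whole Genome": "Full Genes",
    "Exome": "Clinical Regions",
    "Custom Panel": "Clinical Regions",
}


def effective_roi_bed(case_type, custom_bed=None):
    """Return the BED that restricts analysis, or None for no restriction."""
    if custom_bed is not None:
        return custom_bed
    return DEFAULT_ROI_BED[case_type]
```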

All default BED files are available for download at the end of this article.

Custom BED files are managed on the Settings page, in the BED files card.

Prior to V35.0:

| Case Type | Regions of Interest BED | QC BED |
| --- | --- | --- |
| Research Genome | None | Optional |
| Whole Genome | Full Genes | Optional |
| Exome | Clinical Regions | Optional |
| Custom Panel | Clinical Regions OR custom BED uploaded to kit. If a custom BED was uploaded to a kit, it will be used. | Optional (same as regions of interest BED if one was used). |

Prior to V35.0, Regions of Interest BEDs are applied as follows:

Research Genome: No intersection, all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Only variants contained within the "Full Genes" BED file will be displayed (see below for more details), with the exception of CNV variants, which are always fully present.
Exome: Variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.
Custom Panel: When the Panel BED file is provided in Gene Panels, variants processed from a VCF/FASTQ will be only those located within the ranges of the selected panel BED file. BED files are usually provided by the customer and are unique to each enrichment kit panel.

The following are the latest regional BEDs in the Emedgene platform, designed based on the indicated sources for GRCh37 and GRCh38:

Clinical Regions - available at bottom of page

This is a comprehensive BED file that includes every clinically relevant region. The following are included:

“RefSeq Curated” and “GENCODE” regions with flanking areas of 50bp on each side
5' UTR and 3' UTR regions for protein-coding genes (based on RefSeq)
OMIM disease-related RNA genes (50bp flanking)
All Clinvar Pathogenic variants regions (flanking 50bp)
Promoters region (EPDnew human version 006, flanking 50bp)
Known STR regions (DRAGEN 4.0 specification file)
All microRNA genes (flanking 50bp based on HGNC)
Full mtDNA region

For consistency, the GRCh38 version includes the lifted over regions of GRCh37 (liftover using CrossMap).

Full Genes - available at bottom of page

A BED file covering a wide range of genomic regions. It contains:

"RefSeq ALL" transcripts and "GENCODE" full genes regions with 5Kbp upstream and 5Kbp downstream
Within this range, all “Clinical Regions” are included
All dosage regions (HI/TS sig level 1, 2 or 3)

Moreover, liftover versions of both reference regions were included, for the current and previous range versions.

Sources:

Liftover done using CrossMap (v0.5.2), chain hg19ToHg38.over.chain.gz
NCBI RefSeq regions are based on the release 105 (hg19) and 110 (hg38)
Gencode regions are based on the release V19 (hg19) and V41 (hg38)
All microRNA genes based on HGNC miRNA definition December 2022
ClinGen Dosage region Dec 2022
Promoters from EPDnew human version V6
mtDNA CRS
RNA disease genes based on OMIM and HGNC (Dec 2022): ATXN8OS, TERC, IL12A-AS1, FAAHP1, NUTM2B-AS1, GAS8-AS1, RNU12, MIR204, IGHG2, SLC7A2-IT1, MIR99A, RMRP, XIST, MEG3, DIRC3, MIR17HG, GNAS-AS1, LRTOMT, LINC00299, DUX4L1, MIR137, MIR140, MIR605, SNORD118, RNU4ATAC, HELLPAR, IGHG1, IGHM, MIR19B1, RNU7-1, LINC00237, MIR2861, MIR4718, IGHV3-21, IGHV4-34, IGKC, KCNQ1OT1, MIR184, MIR96, H19, HYMAI, PCDHA9, UGT1A1, AFG3L2P1, DISC2, SNORA31, TRU-TCA1-1, PCDHGA4, TRAC, ECEL1P3, MIAT
ClinVar variants (ClinVar Dec 2022) with any pathogenic or likely pathogenic significance (and some drug responses that are affiliated with pathogenicity)
50K STR regions based on the DRAGEN 4.0 specification file

Table 1. Number of regions and total size per BED file

| BED file name | Number of lines | Size in bp |
| --- | --- | --- |
| GRC38_coding | 206635 | 44959430 |
| GRC38_clinical_regions | 237652 | 121694892 |
| GRC38_full_genes | 37793 | 2200286025 |
| GRC37_coding | 200113 | 44420909 |
| GRC37_clinical_regions | 230619 | 119594638 |
| GRC37_full_genes | 35776 | 2368701647 |

BED files

How do I analyze mtDNA variants?

To run mtDNA variant analysis on Emedgene, you need to upload:

a. FASTQ files, or

b. VCF file(s) created with DRAGEN / mity / MuTect / CNVkit using the rCRS mtDNA genome reference or a full genome reference that includes rCRS (e.g., hs37d5).

In case of an end-to-end analysis starting from FASTQ files, mtDNA SNV/indel variants will be mapped to rCRS and called by DRAGEN with improved variant quality calculation.

Note: the pipeline version required is 5.26+.

Can I use exome data for CNV detection?

CNV calling from exome FASTQ is done automatically! A prerequisite for this is defining a Panel of Normals (PON) for each enrichment kit you're using.

A PON helps establish a baseline coverage pattern and account for recurrent technical artifacts that are specific to your workflow. Depth of coverage for each sequenced region is averaged across PON samples; if a significant increase or decrease from this baseline is detected in a test sample, a CNV is called.
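The baseline idea can be sketched in a few lines. This is a deliberately simplified illustration, not the caller used by the platform: the function names are hypothetical, the ratio thresholds are arbitrary placeholders, and a real caller would also normalize for library size and other technical factors.

```python
from statistics import mean


def pon_baseline(pon_depths):
    """Average depth per region across PON samples.

    pon_depths: one list of per-region depths for each PON sample.
    """
    return [mean(region_depths) for region_depths in zip(*pon_depths)]


def flag_regions(test_depths, baseline, gain_ratio=1.3, loss_ratio=0.7):
    """Label each region DUP, DEL or None by comparing test depth with the baseline.

    gain_ratio and loss_ratio are placeholder thresholds for illustration only.
    """
    calls = []
    for depth, base in zip(test_depths, baseline):
        if base == 0:
            calls.append(None)  # no baseline coverage, nothing to compare against
            continue
        ratio = depth / base
        if ratio >= gain_ratio:
            calls.append("DUP")
        elif ratio <= loss_ratio:
            calls.append("DEL")
        else:
            calls.append(None)
    return calls
```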

Recommendations for creating a PON to call CNVs from exome data:

Samples for a PON should be derived from healthy individuals.
In our experience, a PON of at least 40-50 samples yields the best results. A smaller PON is better than nothing, but keep in mind that you may encounter more false positives.
You should aim at preparing samples for a PON in a unified manner to avoid the batch effect. Please log differences in library preparation (if any).

How does joint calling work on Emedgene?

Classic joint calling consists of calling variants "simultaneously across all sample BAMs, generating a single call set for the entire cohort." (GATK.broadInstitute.org)

When running from BAM or FASTQ samples on Emedgene, we do not apply classic joint calling but a BAM look-up methodology.

This methodology consists of retrieving coverage information from BAM during the VCF merging process. Thus, if a variant does not exist in a parental sample, the algorithm will check the coverage in that position using data from the BAM file. The position will be considered as "REF" allele if it is covered (depth > 3), and "No coverage" or "N/A" (./. in the VCF FORMAT/GT field), if it is below that threshold or has no coverage.

This process involves the creation of a “genome coverage” file as a separate preliminary step. The coverage file could also be provided via a BED or a gVCF file.

The BAM look-up approach differs slightly from the classic joint calling used by the joint calling option in DRAGEN and other variant callers, and therefore will not produce identical results.

However, the Emedgene platform also supports joint-called VCF files.

Remark: If a coverage file (i.e., BED, BAM, gVCF) is not provided, it is not possible to estimate the presence of the REF allele in empty positions. As a consequence, the "No_coverage" value will be assigned to those variants, which can affect the inheritance mode filters.
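The look-up decision for a position where a parental sample has no variant call can be sketched as follows. The depth threshold comes from the description above; the function name and the use of "0/0" to represent the REF call are assumptions made for illustration.

```python
MIN_DEPTH = 3  # a position counts as covered when depth > 3


def lookup_genotype(depth):
    """Return the genotype assigned to a sample lacking a call at this position.

    depth: read depth from the coverage data, or None when no coverage
    information is available for the sample.
    """
    if depth is not None and depth > MIN_DEPTH:
        return "0/0"  # assumption: REF is written as a homozygous-reference genotype
    return "./."  # below threshold or no coverage information
```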

What is the required format for a BED file defining a kit?

As part of the management settings (30.0+), users can upload BED files to be associated with their kits.

The format of the BED file (.bed) follows the description from the UCSC Genome Browser, with some modifications:

The file has to be a tab-delimited text file.
The file should not contain headers.
The number of fields per line must be consistent throughout any single set of data.
Zero-based index: Start and end positions are identified using a zero-based index.
There are three required fields:

Chromosome - The chromosome names have to be sorted in alpha-numeric order. Example: chr1, chr2, ..., chr12, ..., chr22, chrX, chrY, chrM.
Chromosome Start - The starting position of the feature in the chromosome. The start position has to be smaller than the end position. The data has to be sorted in numeric order.
Chromosome End - The ending position of the feature in the chromosome. The end position has to be greater than the start position. The data has to be sorted in numeric order.

Example:

chr1	40794900	42795220  
chr1	75796785	76799036  
chr2	21224573	21226251  
chr2	51227067	61227587  
chr3	10183418	10184013  
chr3	141327248	141327582  
chr5	33944704	33945028  
chr5	112173191	112179935  
chr6	31973380	31973922  
chr6	118880009	118880306  
chr10	43572669	43572871  
chr10	43595829	43596269  
chr11	532524	532797  
chr11	108225486	108225731  
chr20	10620075	10620661  
chr20	43058101	43058438  
chr21	35742663	35743177  
chr21	35821512	35822069  
chr22	21336576	21336967  
chr22	29133150	29133387  
chrX	9693690	9693958  
chrX	153599135	153599723  
chrY	2654931	2655693  
chrM	108	596

Newly uploaded BED files will become available for use only after a mandatory waiting period of one hour to ensure adequate time for synchronization to complete.
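A quick local check before uploading can catch most formatting problems. The sketch below is not the platform's validator; `validate_bed` is a hypothetical helper that checks the rules listed above (tab-delimited, at least three fields, consistent field count, no header, start smaller than end, starts sorted within each chromosome). It does not check the chromosome ordering itself.

```python
def validate_bed(lines):
    """Return a list of human-readable problems found in BED lines."""
    errors = []
    expected_fields = None
    last_start = {}  # chromosome -> last start position seen
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            errors.append(f"line {number}: fewer than three tab-separated fields")
            continue
        if expected_fields is None:
            expected_fields = len(fields)
        elif len(fields) != expected_fields:
            errors.append(f"line {number}: inconsistent number of fields")
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            errors.append(f"line {number}: start/end are not integers (header line?)")
            continue
        if start >= end:
            errors.append(f"line {number}: start must be smaller than end")
        if chrom in last_start and start < last_start[chrom]:
            errors.append(f"line {number}: start positions not sorted within {chrom}")
        last_start[chrom] = start
    return errors
```

An empty list means none of these checks failed.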

Which reference genomes can I use?

Both GRCh37/hg19 and GRCh38/hg38 are supported. To run cases using the GRCh38/hg38 reference genome, your pipeline version should be 5.24 or newer.

For DRAGEN versions < 4.0, we use the following files:

hs37d5.fa.gz for GRCh37/hg19.
Includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences. Downloadable here.
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz for GRCh38/hg38.
Contains the sequences of the chromosomes, the rCRS mitochondrial sequence, unlocalized scaffolds, and unplaced scaffolds. Downloadable here.

For DRAGEN version 4.0, we use the following files:

Multigenome Graph hg38+alt_masked+cnv+graph+hla+rna-8-r2.0 for GRCh38/hg38.
Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable here.
Multigenome Graph hs37d5-cnv.graph.hla.rna for GRCh37d5.
Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Available on demand.

For DRAGEN version 4.2, we use the following files:

Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-9-r3.0 for GRCh38/hg38.
Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable here.
Multigenome Graph hs37d5-cnv.graph.hla.rna-9-r3.0 for GRCh37d5.
Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable here.

For DRAGEN version 4.3, we use the following files:

Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1 for GRCh38/hg38.
Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable here.
Multigenome Graph hs37d5-cnv.graph.hla.rna-10-r4.0-1 for GRCh37d5.
Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable here.

Note: you can run cases with both reference genomes in the same organization.

Note: curated and historical data are automatically lifted over on the fly.

Don't miss our article on the matter!

Demystifying the Versions of GRCh38/hg38 Reference Genomes, How They are Used in DRAGEN and Their Impact on Accuracy

How do I move between organizations?

Click on your initials or profile picture on the top right to access the Settings.
In the My settings tab scroll down to the Organization section on the left. Under the title Organization, you'll see the name of the organization you're currently in.
Click on the Select Organization text box and select the organization from the dropdown menu or search for it by typing its name.
Press Save. You will be transferred to the requested account.
Refresh your page and that's it!

Note:

Starting from version 34.0, a user-friendly login screen is available, allowing you to confirm the organization you wish to log into.
Before version 34.0, using different URLs would log you into the organization where you last signed in using that URL.

How do I check the version of my environment?

Add /version to the URL if you'd like to check the version of your environment. For example, for demo.emedgene.com, use demo.emedgene.com/version.

If you want to learn more about the different versions available, check the release notes.

"Failed to generate report". What should I do?

You've done your job, the case is solved, and the only thing left is to generate a report.

Then you see "Failed to generate report". If you are experiencing issues with report preview or generation in a particular case, don't panic.

The error is most likely caused by an unsupported character, most commonly the "&" sign.

Look for it:

in Variant Interpretation paragraphs for variants selected for reporting;
in Case Interpretation (Interpretation notes, Gene interpretation, Recommendations).

Having found the culprit, replace it with "and" or another suitable alternative and try generating the report again. 🤞
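If the interpretation text is available outside the UI (for example, copied into a script), the search can be automated. A minimal sketch with a hypothetical helper name; the field names in the usage note are simply the ones listed above.

```python
def fields_with_ampersand(report_fields):
    """Return the names of report text fields that contain an '&' character."""
    return [name for name, text in report_fields.items() if "&" in text]
```

For example, `fields_with_ampersand({"Recommendations": "Cardiology & genetics follow-up"})` returns `["Recommendations"]`.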

Can't find any ampersands (& signs) or the report generation fails even after you got rid of them?

Time for a serious investigation. Don't hesitate to reach out!

But if you're curious, go on and check if you have added any PMIDs in the fields mentioned above. If the PMID links to a very old paper, it may be absent from our database used to populate citations in the report. This also causes a rendering error.

How do I prepare VCF files generated by Dragen STR (ExpansionHunter) to be used as input?

Each variant type (i.e., SNV/indel, CNV, SV, STR) ingested into the Emedgene platform requires a separate VCF. To ingest a DRAGEN VCF with STRs from DRAGEN versions < 4.2, add the following line to the header of the DRAGEN STR VCF:

##source=ExpansionHunterV4.2

Additionally, ensure that the contig lines are present in the VCF headers. If they are missing, please use the following ones:

GRCh37

##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##contig=<ID=chrY,length=59373566>
##contig=<ID=chrM,length=16571>

GRCh38

##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
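Both header fixes can be applied in one pass. This is a sketch under stated assumptions: the VCF is held as a list of lines without trailing newlines, `prepare_str_header` is a hypothetical name, and the new lines are placed directly before the `#CHROM` column header (the text above does not prescribe a position within the header).

```python
SOURCE_LINE = "##source=ExpansionHunterV4.2"


def prepare_str_header(lines, contig_lines):
    """Add the ExpansionHunter source line and, if absent, the contig lines."""
    additions = []
    if SOURCE_LINE not in lines:
        additions.append(SOURCE_LINE)
    if not any(line.startswith("##contig=") for line in lines):
        additions.extend(contig_lines)
    for index, line in enumerate(lines):
        if line.startswith("#CHROM"):
            # Meta-information lines must precede the column header line.
            return lines[:index] + additions + lines[index:]
    raise ValueError("no #CHROM column header line found")
```

Pass the GRCh37 or GRCh38 contig list above as `contig_lines`, matching the reference the VCF was called against.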

How does Emedgene Analyze prioritize transcripts?

Scope

Emedgene uses VEP and EFF for transcript annotations and in upcoming versions will be adding Illumina Connected Annotations.

Each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on different Emedgene pipeline versions.

Transcript prioritization in V37.0

Here is a list of ordered rules for transcript prioritization:

VEP transcripts are prioritized over EFF transcripts.
If the case is a virtual panel, prioritize transcripts from genes in the case gene list (but not for Boosted Genes type panels).
Prioritize RNA genes associated with disease (See appendix 1 for prioritized list RNA genes). Importantly this does not apply to upstream and downstream RNA variants.
De-prioritize biotype readthrough transcripts.
Prioritize based on impact in the following order: HIGH > MODERATE > LOW > MODIFIER.
Prioritize introns over UTR over upstream (Appendix 2: MODIFIER effects prioritization).
Prioritize organization canonical transcripts (Defined in Curate. Always applied, no settings needed).
Prioritize canonical transcripts (Based on Appris).
Prioritize transcripts from genes in the case gene list.
Prioritize genes without “-” in their name.

Transcript prioritization before V37.0

Here is a list of ordered rules for transcript prioritization:

VEP transcripts are prioritized over EFF transcripts.
If the case is a virtual panel, prioritize transcripts from genes in the case gene list (but not for Boosted Genes type panels).
Prioritize RNA genes associated with disease (See appendix 1 for prioritized list RNA genes). Importantly this does not apply to upstream and downstream RNA variants.
De-prioritize biotype readthrough transcripts.
Prioritize based on impact in the following order: HIGH > MODERATE > LOW > MODIFIER.
Prioritize introns over UTR over upstream (Appendix 2: MODIFIER effects prioritization).
Prioritize organization canonical transcripts (Defined in Curate. This parameter has to be implemented upon request.).
Prioritize canonical transcripts (Based on Appris).

Appendixes

Appendix 1: List of RNA genes associated with disease

ATXN8OS, GNAS-AS1, H19, HELLPAR, KCNQ1OT1, LINC00237, LINC00299, MEG3, MIAT, MIR137, MIR140, MIR184, MIR19B1, MIR204, MIR2861, MIR4718, MIR605, MIR96, MIR99A, RMRP, RNU12, RNU4-2*, RNU4ATAC, RNU7-1*, SNORD116-1, SNORD118, TERC, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-TM, MT-TW, MT-TA, MT-TN, MT-TC, MT-TY, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP.

*Added as part of pipeline V35.2

Appendix 2: MODIFIER effects prioritization

Order of modifier effects:

intron_variant
5_prime_utr_variant
3_prime_utr_variant
non_coding_transcript_exon_variant
non_coding_transcript_variant
upstream_gene_variant
downstream_gene_variant
All other effects
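One way to read the ordered V37.0 rules together with the Appendix 2 ordering is as a lexicographic sort key: each rule contributes one element, and earlier elements dominate later ones. The sketch below illustrates that reading only; the `Transcript` fields, the function names and the boolean encoding are all hypothetical and not the platform's data model. Callers are assumed to pass `virtual_panel=False` for Boosted Genes panels.

```python
from dataclasses import dataclass

IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]
MODIFIER_ORDER = [
    "intron_variant", "5_prime_utr_variant", "3_prime_utr_variant",
    "non_coding_transcript_exon_variant", "non_coding_transcript_variant",
    "upstream_gene_variant", "downstream_gene_variant",
]


@dataclass
class Transcript:
    name: str
    gene: str
    source: str  # "VEP" or "EFF"
    impact: str  # one of IMPACT_ORDER
    effect: str
    readthrough: bool = False
    org_canonical: bool = False
    canonical: bool = False
    rna_disease_gene: bool = False


def priority_key(t, gene_list=frozenset(), virtual_panel=False):
    """Smaller tuples sort first; False sorts before True."""
    in_list = t.gene in gene_list
    rna_rule = t.rna_disease_gene and t.effect not in (
        "upstream_gene_variant", "downstream_gene_variant")
    if t.impact == "MODIFIER" and t.effect in MODIFIER_ORDER:
        modifier_rank = MODIFIER_ORDER.index(t.effect)
    elif t.impact == "MODIFIER":
        modifier_rank = len(MODIFIER_ORDER)  # "all other effects"
    else:
        modifier_rank = 0
    return (
        t.source != "VEP",
        not (virtual_panel and in_list),
        not rna_rule,
        t.readthrough,
        IMPACT_ORDER.index(t.impact),
        modifier_rank,
        not t.org_canonical,
        not t.canonical,
        not in_list,
        "-" in t.gene,
    )


def main_transcript(transcripts, gene_list=frozenset(), virtual_panel=False):
    """Return the highest-priority transcript under the key above."""
    return min(transcripts, key=lambda t: priority_key(t, gene_list, virtual_panel))
```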

Keywords for search

How are transcripts prioritized?

How are transcripts selected/chosen?

Why is this gene selected for this variant?

Why can't I choose a transcript?

Why is my transcript missing for this variant?

My gene is missing in the transcript list.

Performance issue troubleshooting

Performance issues may be related to the limitations of your computer's hardware or software. The Emedgene platform needs 2 GB of free allocated memory to run properly.

If you are experiencing slower loading times or disruptions in service, please consider the following steps to improve your experience:

Check for Background Tasks:

Task Manager (Windows) or Activity Monitor (macOS): Open these tools to see if any background processes are consuming excessive memory. End any unnecessary tasks.

Free Up Memory:

Close Unnecessary Apps: Close other applications that might be using memory, even if they're not actively in use.
Clear Chrome's Cache and Data: Go to Chrome's Settings > Privacy and security > Clear browsing data. Choose a time range and select "Cached images and files" and "Cookies and other site data."

Disable Extensions:

Temporarily Disable: Go to Chrome's Settings > More tools > Extensions and disable all extensions. If the error disappears, re-enable extensions one by one to identify the culprit.

Disable "Embedded IGV":

Emedgene interface: Go to any Emedgene case > Analysis tools > Varpage > Side bar > Applications > "Embedded IGV" and disable it.

Update Chrome:

Check for Updates: Ensure you're using the latest version of Chrome, as updates often contain bug fixes and memory optimizations.

Restart Chrome and Your Device:

Close Chrome: Completely exit Chrome, including any background processes.
Restart Your Device: Shut down your computer or phone and restart it. This clears temporary files and refreshes system resources.

Consider Hardware Issues:

If none of these solutions work, consider potential hardware issues, such as faulty RAM or disk problems. Consult a technician for further diagnosis.

Adapted from Google Chrome help center.

How does Emedgene calculate variant effect and severity?

Variant effect

For each variant that is mapped to the reference genome, Emedgene uses Ensembl’s Variant Effect Predictor (VEP) and the RefSeq (NCBI) library of transcripts to calculate variant effect. VEP uses a set of consequence terms defined by the Sequence Ontology (SO), including immediately recognizable terms like “missense_variant” and “frameshift_variant” as well as some more esoteric ones like “non_coding_transcript_exon_variant”.

The full list of terms, along with detailed descriptions and severity impact categories can be found in the link below.

Importantly, each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on different Emedgene pipeline versions as described here.

Variant severity

Variant severity, also known as variant impact, is a subjective assessment of the severity of a variant consequence.

Severity is usually categorized as modifier, low, moderate or high:

Modifier severity is used for non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. Inter-genic and non-coding variants are classic examples.
Low severity is used for variants that are assumed to be mostly harmless or unlikely to change protein function. This includes synonymous variants.
Moderate severity is used for non-disruptive variants that might change protein effectiveness, such as missense variants and in-frame insertions/deletions.
High severity is used for variants that are assumed to have a disruptive impact on protein abundance, such as by causing protein truncation, loss of the open reading frame, and/or triggering nonsense-mediated decay.

Most of the time, variant effect and variant severity on Emedgene are consistent with VEP. However, genomics is a field defined by exceptions. There are key factors, outlined below, the Emedgene genetic team believes are critical to account for when assigning severity.

For small variants (SNV):

Splice prediction: A small variant will be upgraded to HIGH severity if its splicing prediction is high (dbscSNV > 0.6 or max spliceAI > 0.8), or to MODERATE if its splicing prediction is moderate (max spliceAI > 0.2 or dbscSNV > 0.5).
Conservation: Synonymous variants and splice region variants that are highly conserved (GERP score > 0.9 or PhastCons100 > 0.2) will be upgraded to MODERATE.
Non-coding RNA disease genes: The severity of a small variant will be upgraded to MODERATE if the variant is within a list of RNA genes known to be associated with disease. The current list of RNA genes is:

ATXN8OS, GNAS-AS1, H19, HELLPAR, KCNQ1OT1, LINC00237, LINC00299, MEG3, MIAT, MIR137, MIR140, MIR184, MIR19B1, MIR204, MIR2861, MIR4718, MIR605, MIR96, MIR99A, RMRP, RNU12, RNU4ATAC, SNORD116-1, SNORD118, TERC, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-TM, MT-TW, MT-TA, MT-TN, MT-TC, MT-TY, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP, RNU7-1*, RNU4-2*.

*Added in V35.2
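The three small-variant rules above can be combined into one function. The thresholds are the ones stated in the rules; everything else is an assumption made for illustration: the function and parameter names are hypothetical, missing scores are passed as None, the conservation rule is keyed on the effect names "synonymous_variant" and "splice_region_variant", and an "upgrade" is taken to mean the severity is never lowered.

```python
SEVERITY_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


def _above(value, threshold):
    """True when a score is present and strictly above the threshold."""
    return value is not None and value > threshold


def upgraded_severity(base, effect, dbscsnv=None, spliceai_max=None,
                      gerp=None, phastcons100=None, rna_disease_gene=False):
    """Return the highest severity among the base value and any applicable upgrade."""
    candidates = [base]
    # Splice prediction rule.
    if _above(dbscsnv, 0.6) or _above(spliceai_max, 0.8):
        candidates.append("HIGH")
    elif _above(spliceai_max, 0.2) or _above(dbscsnv, 0.5):
        candidates.append("MODERATE")
    # Conservation rule, limited to synonymous and splice region variants.
    if effect in ("synonymous_variant", "splice_region_variant") and (
            _above(gerp, 0.9) or _above(phastcons100, 0.2)):
        candidates.append("MODERATE")
    # Non-coding RNA disease gene rule.
    if rna_disease_gene:
        candidates.append("MODERATE")
    return max(candidates, key=SEVERITY_RANK.__getitem__)
```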

For CNV/SV:

VEP annotates CNVs with overlapping genomic features and designates them with the following effects: transcript amplification (DUP), feature elongation (DUP, INS), feature truncation (DEL), and transcript ablation (DEL). However, the severity assigned by VEP for CNVs does not reflect the complexity of CNV effects on protein function and in our experience is not suitable for genome analysis and filtering.

On Emedgene, each variant is annotated with regard to its overlap with three different types of regions: ‘coding regions’, ‘clinical regions’, and the ‘full gene’ region (see here for a more detailed description of the BED files used in the system).

The region annotation is then used to assess severity for CNVs and SVs as follows:

| Variant type | High | Moderate | Low | Modifier |
| --- | --- | --- | --- | --- |
| Deletion (DEL) | Coding regions | Clinical regions and not in coding regions | Full gene and not in clinical regions | No overlap with any BED |
| Gain (DUP) | Intragenic (coding regions but not entire gene region) | Coding regions / clinical regions, not intragenic | Full gene and not in clinical regions | No overlap with any BED |
| Insertion (INS) | Coding regions | Clinical regions and not in coding regions | Full gene and not in clinical regions | None |

Table 1: CNV/SV severity table. For each category of CNV/SV, the types of regions that overlap a given variant required to trigger the severity classification are shown.
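Table 1 can be read as a small decision function. The Python sketch below is one interpretation, not the platform's code: the function name and the `spans_entire_gene` flag are hypothetical, and two readings of the table are assumptions. First, a gain counts as intragenic when it overlaps coding regions without covering the entire gene. Second, an insertion with no overlap is returned as MODIFIER, although the table lists "None" in that cell.

```python
def cnv_sv_severity(sv_type, coding, clinical, full_gene, spans_entire_gene=False):
    """Return a severity label for a CNV/SV, following Table 1.

    coding, clinical, full_gene: True when the variant overlaps that
    type of BED region.
    spans_entire_gene: hypothetical flag, True when a gain covers the
    whole gene and is therefore not intragenic.
    """
    if sv_type == "DUP":
        if coding and not spans_entire_gene:
            return "HIGH"  # intragenic gain
        if coding or clinical:
            return "MODERATE"  # coding/clinical overlap, not intragenic
        if full_gene:
            return "LOW"
        return "MODIFIER"
    if sv_type in ("DEL", "INS"):
        if coding:
            return "HIGH"
        if clinical:
            return "MODERATE"
        if full_gene:
            return "LOW"
        # Assumption for INS: the table lists "None" in this cell.
        return "MODIFIER"
    raise ValueError(f"Unsupported variant type: {sv_type}")
```

Checking the most specific region first (coding, then clinical, then full gene) reproduces the "and not in" conditions of the table without having to spell them out.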

For STR variants:

Emedgene uses an internal annotation for STR variants. More details are available on request from techsupport@illumina.com.

Known limitations

The list of RNA genes known to be associated with disease is updated over time as part of pipeline updates.
Emedgene does not provide VEP annotation for non-coding regulatory data.
The “main effect” filter under the advanced “variant effect filters” in the Emedgene UI lists effects grouped by their severity. The following effects are currently missing from the advanced filter and will be added in V34.6 and V35.3.

| Effect name | Severity |
| --- | --- |
| splice_donor_5th_base_variant | LOW |
| splice_donor_region_variant | LOW |
| splice_polypyrimidine_tract_variant | LOW |
| coding_sequence_variant | MODIFIER |
| mature_miRNA_variant | MODIFIER |
| NMD_transcript_variant | MODIFIER |
| coding_transcript_variant | MODIFIER |
| regulatory_region_ablation | MODIFIER |
| regulatory_region_variant | MODIFIER |
| intergenic_variant | MODIFIER |
| sequence_variant | MODIFIER |

How do I prepare metrics files generated by DRAGEN to be used as input for Emedgene?

When creating Emedgene cases that start from VCF, you can create a browsable DRAGEN report from the DRAGEN metrics files:

Navigate to the local directory containing the metrics files for a specific sample
Define the sample name as a variable

samplename="NA12878"

Combine the find and tar commands to package the files into a tar.gz file with the extension *.metrics.tar.gz

Command to find files matching the required patterns

find . \( -name "*.csv" -o -name "*.tsv" -o -name "*.counts" -o -name "*.counts.gz" -o -name "*.counts.gc-corrected" -o -name "*.counts.gc-corrected.gz" -o -name "*.ploidy.vcf" -o -name "*.repeats.vcf" -o -name "*.ploidy.vcf.gz" -o -name "*.repeats.vcf.gz" \) | xargs tar -czf "${samplename}.metrics.tar.gz"

Upload the metrics.tar.gz file to the storage location used for creating cases
Add the metrics.tar.gz file to the case creation API JSON payload using the corresponding storage id. If the filename does not contain the .metrics.tar.gz extension (e.g. files from BaseSpace), make sure that "sample_type": "dragen-metrics" is set for that file within the JSON payload

{
    "test_data":
    {
        "consanguinity": false,
        "inheritance_modes":
        [],
        "sequence_info":
        {},
        "type": "Whole Genome",
        "notes": "",
        "samples":
        [
            {
                "bam_location": "",
                "fastq": "NA12878-PCRF450-1",
                "status": "uploaded",
                "directoryPath": "",
                "sampleFiles":
                [
                    {
                        "filename": "NA12878-PCRF450-1.metrics.tar.gz",
                        "sample_type": "dragen-metrics",
                        "path": "/analysis_output/demo_data_germline_v4_3_6_v2-DRAGEN_Germline_Whole_Genome_4-3-6-v2-75b081e8-a8aa-433e-862b-a20d2d65e492/NA12878-PCRF450-1/NA12878-PCRF450-1.metrics.tar.gz",
                        "size": 0,
                        "storage_id": 420,
                        "status": "uploaded",
                        "vcf_column_name": "NA12878-PCRF450-1",
                        "vcf_column_names":
                        [
                            "NA12878-PCRF450-1"
                        ],
                        "loadingSample": false
                    },
                    {
                        "filename": "NA12878-PCRF450-1.hard-filtered.vcf.gz",
                        "sample_type": "vcf",
                        "path": "/analysis_output/demo_data_germline_v4_3_6_v2-DRAGEN_Germline_Whole_Genome_4-3-6-v2-75b081e8-a8aa-433e-862b-a20d2d65e492/NA12878-PCRF450-1/NA12878-PCRF450-1.hard-filtered.vcf.gz",
                        "size": 0,
                        "storage_id": 420,
                        "status": "uploaded",
                        "vcf_column_name": "NA12878-PCRF450-1",
                        "vcf_column_names":
                        [
                            "NA12878-PCRF450-1"
                        ],
                        "loadingSample": false
                    }
                ],
                "storage_id": 420,
                "sampleType": "vcf"
            }
        ],
        "sample_type": "vcf",
        "patients":
        {
            "proband":
            {
                "fastq_sample": "NA12878-PCRF450-1",
                "gender": "Male",
                "healthy": false,
                "relationship": "Test Subject",
                "notes": "",
                "phenotypes":
                [
                    {
                        "id": "phenotypes/EMG_PHENOTYPE_0001324",
                        "name": "Muscle weakness"
                    }
                ],
                "detailed_ethnicity":
                {
                    "maternal":
                    [],
                    "paternal":
                    []
                },
                "zygosity": "",
                "quality": "",
                "dead": false,
                "ignore": false,
                "id": "proband"
            },
            "other":
            []
        },
        "diseases":
        [],
        "disease_penetrance": 100,
        "disease_severity": "",
        "boostGenes": false,
        "selected_preset_set": "",
        "incidental_findings": null,
        "labels":
        [],
        "gene_list":
        {
            "type": "all",
            "id": 1,
            "visible": false
        }
    },
    "should_upload": false,
    "sharing_level": 0
}

The DRAGEN report link is available once your case has been delivered
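For scripted workflows, the packaging step and the metrics entry of the payload can also be sketched in Python. This is a minimal sketch under stated assumptions, not an official client: `package_metrics` and `metrics_file_entry` are hypothetical helper names, the suffix list is copied from the find command above, and the entry fields simply mirror the example payload.

```python
import tarfile
from pathlib import Path

# Same file patterns as the find command above.
METRICS_SUFFIXES = (
    ".csv", ".tsv", ".counts", ".counts.gz",
    ".counts.gc-corrected", ".counts.gc-corrected.gz",
    ".ploidy.vcf", ".repeats.vcf", ".ploidy.vcf.gz", ".repeats.vcf.gz",
)


def package_metrics(sample_dir, samplename):
    """Bundle matching metrics files into <samplename>.metrics.tar.gz."""
    sample_dir = Path(sample_dir)
    archive = sample_dir / f"{samplename}.metrics.tar.gz"
    # Collect files before creating the archive, searching subdirectories too.
    files = sorted(
        p for p in sample_dir.rglob("*")
        if p.is_file() and p.name.endswith(METRICS_SUFFIXES)
    )
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            # Store paths relative to the sample directory.
            tar.add(path, arcname=path.relative_to(sample_dir).as_posix())
    return archive


def metrics_file_entry(filename, path, storage_id, vcf_column_name):
    """Build a sampleFiles entry shaped like the one in the example payload."""
    return {
        "filename": filename,
        # Set explicitly so the file is recognized even when the filename
        # lacks the .metrics.tar.gz extension.
        "sample_type": "dragen-metrics",
        "path": path,
        "size": 0,
        "storage_id": storage_id,
        "status": "uploaded",
        "vcf_column_name": vcf_column_name,
        "vcf_column_names": [vcf_column_name],
        "loadingSample": False,
    }
```

Calling `package_metrics(".", "NA12878")` from the sample directory would write `NA12878.metrics.tar.gz` there, and the dictionary returned by `metrics_file_entry` can be appended to the `sampleFiles` list before the payload is serialized with `json.dumps`.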