Implementation for Emedgene pipeline version < 34.0:
gnomAD data for GRCh38 exists for genomes only for version 3.1.
gnomAD exome data is only available for GRCh37 on previous versions.
In order to retain the wealth of exome data for our GRCh38 customers, Emedgene's GRCh38 small variant gnomAD data combines the following:
Liftover of GRCh37 exome data from gnomAD v2 to GRCh38
GRCh38 genome data from gnomAD v3
Known limitation:
This implementation contains some duplicate variants between v2 and v3.
Implementation for Emedgene pipeline version ≥ 34.0:
gnomAD has since released gnomAD v3.1.2 (non-v2), which includes only the v3 variants that are not present in v2 and therefore resolves the duplication issue. Emedgene supports this version starting in Emedgene pipeline 34.0.
In 32.0, we increased flexibility to support customer workflows by enabling the creation of large gene lists up to 10,000 genes. Previous limitation was 900 genes.
Gene lists can be created via the API and the UI. Very large gene lists (>6,700 genes) may return a false error message during creation, even though they were created successfully.
Yes, Emedgene supports the recommended interpretation flow for Illumina Complete Long Reads since version 32.0.
The Illumina Complete Long Reads interpretation flow in Emedgene includes:
Ingesting combined output of long-read/short-read callers for both SVs and SNVs, along with the long-read BAM files. Short-read callers from VCF (CNV read-depth and STR) can be additionally used starting from DRAGEN 4.2;
ICLR calling by the BSSH application;
Ingesting VCF and BAM files through the existing BSSH integration;
Explainable AI shortlisting of ICLR variants and automated ACMG classification, when relevant.
Current limitations:
Complex SV variants like inversions and translocations are not yet supported for interpretation.
ICLR cases must start from VCF files; Emedgene does not support accessioning a case from both VCF and FASTQ sources.
Every case is annotated with the attached table of resources, including the proprietary Illumina prediction scores PrimateAI-3D and SpliceAI. All annotations are versioned; the versions are recorded in a Versions tab and saved per case. Key variant significance and knowledge graph databases are updated monthly, so that the most up-to-date information is available during analysis.
Emedgene uses regional information to process genomic data in a variety of ways. Regional information influences the variants presented in gene panels, exomes, and genomes, variant quality and variant annotations.
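Case Type | Regions of Interest BED | Default Region of Interest BED | QC BED |
---|---|---|---|
Research Genome | Any customer-selected BED | None | Optional |
Whole Genome | Any customer-selected BED | Full Genes | Optional |
Exome | Any customer-selected BED | Clinical Regions | Optional |
Custom Panel | Any customer-selected BED | Clinical Regions | Optional |
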
V35.0+ Regions of Interest BEDs are applied as follows:
Research Genome: Customers can use any BED to restrict analysis. By default, there is no intersection and all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Customers can use any BED to restrict analysis. If no custom BED is selected, the "Full Genes" BED file is used by default (see below for more details), and only variants contained within this BED are presented. The exception is CNV variants, which are always fully present.
Exome: Customers can use any BED to restrict analysis. If no custom BED is selected, the output presents only the variants included in the Emedgene BED file defining the "Clinical Regions", as described below.
Custom Panel: Customers can use any BED to restrict analysis, and can also upload a separate BED file for QC. When the Panel BED file is provided in Gene Panels, only variants located within the ranges of the selected panel BED file are processed from the VCF/FASTQ. BED files are typically unique to each enrichment kit panel. If no kit BED is available, the 'Clinical Regions' BED is used.
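Restricting variants to a BED file amounts to an interval lookup. The following is a minimal Python sketch, not Emedgene's implementation, assuming standard UCSC BED coordinates (0-based start, end exclusive) and 1-based VCF positions; `build_index` and `in_regions` are hypothetical names. It checks single positions only, so it does not model the CNV exception for Whole Genome cases.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (0-based start, exclusive end), as in BED


def build_index(bed: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Group BED intervals by chromosome, sorted and merged so they never overlap."""
    index: Dict[str, List[Interval]] = {}
    for chrom, start, end in sorted(bed):
        merged = index.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return index


def in_regions(index: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position `pos` falls inside any BED interval."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based (tuples compare start first).
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and zero_based < intervals[i][1]
```

Under this sketch, a variant at VCF position 100 is kept by the BED line `chr1 99 200`, because BED start 99 is the 0-based form of position 100.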
All default BED files are available for download at the end of this article.
Custom BED files are managed from the BED files card on the Settings page.
<V35.0 Regions of Interest BEDs are applied as follows:
Research Genome: No intersection, all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Only variants contained within the "Full Genes" BED file will be displayed (see below for more details), with the exception of CNV variants, which are always fully present.
Exome: Variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.
Custom Panel: When the Panel BED file is provided in Gene Panels, variants processed from a VCF/FASTQ will be only those located within the ranges of the selected panel BED file. BED files are usually provided by the customer and are unique to each enrichment kit panel.
Following are the latest regional BEDs in Emedgene platform that were designed based on the indicated sources for GRCh37 and GRCh38:
Clinical Regions - available at bottom of page
This is a comprehensive BED file that includes every clinically relevant region. The following are included:
“RefSeq Curated” and “GENCODE” regions with 50bp flanking on each side, and 5'UTR and 3'UTR regions for protein-coding genes (based on RefSeq)
OMIM disease-related RNA genes (50bp flanking)
All Clinvar Pathogenic variants regions (flanking 50bp)
Promoters region (EPDnew human version 006, flanking 50bp)
Known STR regions (Dragen 4.0 specification file)
All microRNA genes (flanking 50bp based on HGNC)
Full mtDNA region
For consistency, the GRCh38 version includes the lifted over regions of GRCh37 (liftover using CrossMap).
Full Genes - available at bottom of page
A BED file covering a wide range of genomic regions. It contains:
"RefSeq ALL" transcripts and "GENCODE" full genes regions with 5Kbp upstream and 5Kbp downstream
Within this range, all “Clinical Regions” are included
All dosage regions (HI/TS sig level 1, 2 or 3)
In addition, lifted-over versions of the regions from both reference builds are included, for the current and previous range versions.
Sources:
Liftover done using CrossMap (v0.5.2), chain hg19ToHg38.over.chain.gz
NCBI RefSeq regions are based on the release 105 (hg19) and 110 (hg38)
Gencode regions are based on the release V19 (hg19) and V41 (hg38)
All microRNA genes based on HGNC miRNA definition December 2022
ClinGen Dosage region Dec 2022
Promoters from EPDnew human version V6
mtDNA CRS
RNA disease genes based on OMIM and HGNC (Dec 2022): ATXN8OS, TERC, IL12A-AS1, FAAHP1, NUTM2B-AS1, GAS8-AS1, RNU12, MIR204, IGHG2, SLC7A2-IT1, MIR99A, RMRP, XIST, MEG3, DIRC3, MIR17HG, GNAS-AS1, LRTOMT, LINC00299, DUX4L1, MIR137, MIR140, MIR605, SNORD118, RNU4ATAC, HELLPAR, IGHG1, IGHM, MIR19B1, RNU7-1, LINC00237, MIR2861, MIR4718, IGHV3-21, IGHV4-34, IGKC, KCNQ1OT1, MIR184, MIR96, H19, HYMAI, PCDHA9, UGT1A1, AFG3L2P1, DISC2, SNORA31, TRU-TCA1-1, PCDHGA4, TRAC, ECEL1P3, MIAT
ClinVar variants (ClinVar, Dec 2022) with any pathogenic or likely pathogenic significance (and some drug responses that are affiliated with pathogenicity)
50K STR regions based on the Dragen4.0 Specification file
Table 1. Number of regions and total size per bed file
During data processing, MNVs (multi-nucleotide variants) are divided into consecutive SNVs. The resulting SNVs include INFO and FORMAT fields, mirroring the original record.
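As an illustration of the MNV splitting described above, here is a minimal Python sketch. It assumes an MNV is a record whose REF and ALT have the same length (2 bases or more) and, as an assumption not stated in this article, that positions where REF and ALT agree produce no SNV. `VcfRecord` and `split_mnv` are hypothetical, not Emedgene or library types.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class VcfRecord:
    """Hypothetical minimal VCF record."""
    chrom: str
    pos: int                      # 1-based VCF POS
    ref: str
    alt: str
    info: Dict[str, object]
    format_fields: Dict[str, str]


def split_mnv(rec: VcfRecord) -> List[VcfRecord]:
    """Split an MNV into single-base SNVs that copy the original INFO/FORMAT fields."""
    if len(rec.ref) < 2 or len(rec.ref) != len(rec.alt):
        return [rec]  # SNVs and indels pass through unchanged
    snvs = []
    for offset, (ref_base, alt_base) in enumerate(zip(rec.ref, rec.alt)):
        if ref_base == alt_base:
            continue  # assumption: matching bases do not become SNVs
        snvs.append(replace(rec, pos=rec.pos + offset, ref=ref_base, alt=alt_base,
                            info=dict(rec.info), format_fields=dict(rec.format_fields)))
    return snvs
```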
In the platform's user interface, SNV variants occurring within a 2-nucleotide distance are marked with the 'Suspected MNP' flag.
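The 'Suspected MNP' flag can be approximated as follows. This sketch assumes that "within a 2-nucleotide distance" means the positions of two SNVs on the same chromosome differ by at most 2; `suspected_mnp` is a hypothetical name, not an Emedgene function.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Call = Tuple[str, int]  # (chromosome, 1-based position)


def suspected_mnp(snvs: Iterable[Call], max_distance: int = 2) -> Set[Call]:
    """Return the SNVs lying within `max_distance` bp of another SNV on the same chromosome."""
    by_chrom: DefaultDict[str, List[int]] = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)
    flagged: Set[Call] = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        # After sorting, each SNV's nearest neighbour is adjacent in the list.
        for left, right in zip(positions, positions[1:]):
            if right - left <= max_distance:
                flagged.update({(chrom, left), (chrom, right)})
    return flagged
```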
Even though no software is perfect, our team prides itself on helping you troubleshoot issues when they do occur. Sometimes we need to understand the problem in the context of a customer's storage, internet service, and local security infrastructure. In these cases, we are unable to reproduce the bug locally. You can help us to help you by collecting the developer tool logs. Here are the steps to follow (example for Google Chrome):
Open the platform and go to View > Developer > Developer tools.
The tools open in a panel, by default on the right-hand side of your screen. Of the tabs across the top (Elements, Console, Sources, Network, and others), select Network.
Under Network, check the box next to Preserve log.
Work on a case/cases as you normally would.
When you experience the issue, save the log: right-click any line in the Network request list, select “Save all as HAR with content”, and share the file with us. We will begin troubleshooting right away.
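Before sharing, you can optionally skim the saved HAR file yourself. HAR is a JSON format whose `log.entries` list holds one request/response pair per network call. This Python sketch (the helper name and file name are hypothetical) lists requests that returned an HTTP error, or status 0, which browsers typically record when no HTTP response was received.

```python
import json
from typing import List, Tuple


def failed_requests(har_text: str, min_status: int = 400) -> List[Tuple[str, str, int]]:
    """List (method, url, status) for HAR entries that errored or never completed."""
    har = json.loads(har_text)
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        # Status 0 usually means no HTTP response arrived (blocked, aborted, offline).
        if status == 0 or status >= min_status:
            request = entry.get("request", {})
            failures.append((request.get("method", "?"), request.get("url", "?"), status))
    return failures


# Usage (hypothetical file name):
# with open("emedgene_session.har", encoding="utf-8") as fh:
#     for method, url, status in failed_requests(fh.read()):
#         print(status, method, url)
```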
Both GRCh37/hg19 and GRCh38/hg38 are supported. To run cases using the GRCh38/hg38 reference genome, your pipeline version should be 5.24 or newer.
For Dragen version < 4.0, we are using the following files:
hs37d5.fa.gz for GRCh37/hg19.
Includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences. Downloadable.
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz for GRCh38/hg38.
Contains the sequences of the chromosomes, the rCRS mitochondrial sequence, unlocalized scaffolds, and unplaced scaffolds. Downloadable.
For Dragen version 4.0, we are using the following files:
Multigenome Graph hg38+alt_masked+cnv+graph+hla+rna-8-r2.0 for GRCh38/hg38.
Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable.
Multigenome Graph hs37d5-cnv.graph.hla.rna for GRCh37d5.
Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Available on demand.
For Dragen version 4.2, we are using the following files:
Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-9-r3.0 for GRCh38/hg38.
Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable.
Multigenome Graph hs37d5-cnv.graph.hla.rna-9-r3.0 for GRCh37d5.
Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Downloadable.
For Dragen version 4.3, we are using the following files:
Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1 for GRCh38/hg38.
Multigenome Graph hs37d5-cnv.graph.hla.rna-10-r4.0-1 for GRCh37d5.
Note: you can run cases with both reference genomes in the same organization.
Note: curated and historical data are automatically lifted over on the fly.
To run mtDNA variant analysis on Emedgene, you need to upload:
a. FASTQ files, or
b. VCF file(s) created with / / / using the mtDNA genome reference or a full genome reference that includes rCRS (e.g., hs37d5).
In case of an end-to-end analysis starting from FASTQ files, mtDNA SNV/indel variants will be mapped to rCRS and called by DRAGEN with improved variant quality calculation.
Note: the pipeline version required is 5.26+.
All variant types (i.e., SNV/indel, CNV, SV, STR) ingested into the Emedgene platform require separate VCF files. To ingest a Dragen Manta VCF for Dragen version < 4.2, the following line should be replaced within the header of the Dragen Manta VCF:
Replace
by
Example:
Replace
by
Note:
Variant types currently annotated and displayed in Emedgene are DEL, DUP and INS.
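The header replacement above can be scripted. This is a minimal Python sketch, assuming you paste the exact "Replace" and "by" lines from this article into the hypothetical placeholders `OLD_LINE` and `NEW_LINE`; only header lines (starting with `#`) are inspected, and data lines are copied unchanged.

```python
import gzip
from typing import IO, Iterable, Iterator

# Hypothetical placeholders: paste the exact "Replace" and "by" lines given above.
OLD_LINE = "##PLACEHOLDER_OLD_HEADER_LINE"
NEW_LINE = "##PLACEHOLDER_NEW_HEADER_LINE"


def rewrite_header(lines: Iterable[str], old: str = OLD_LINE, new: str = NEW_LINE) -> Iterator[str]:
    """Yield VCF lines, swapping any header line equal to `old` for `new`."""
    in_header = True
    for line in lines:
        if in_header and line.startswith("#"):
            if line.rstrip("\n") == old:
                line = new + "\n"
        else:
            in_header = False  # first data line reached: stop inspecting
        yield line


def open_vcf(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed VCF in text mode ("r" or "w")."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def fix_manta_vcf(src: str, dst: str) -> None:
    with open_vcf(src, "r") as fin, open_vcf(dst, "w") as fout:
        fout.writelines(rewrite_header(fin))
```

Note that `gzip` writes ordinary gzip rather than BGZF, so re-compress the output with `bgzip` if you need a tabix index.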
Classic joint calling consists of calling variants "simultaneously across all sample BAMs, generating a single call set for the entire cohort."
When running from BAM or FastQ samples on Emedgene, we do not apply a classic joint calling but a BAM look-up methodology.
This methodology consists of retrieving coverage information from BAM during the VCF merging process. Thus, if a variant does not exist in a parental sample, the algorithm will check the coverage in that position using data from the BAM file. The position will be considered as "REF" allele if it is covered (depth > 3), and "No coverage" or "N/A" (./. in the VCF FORMAT/GT field), if it is below that threshold or has no coverage.
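The BAM look-up rule can be expressed as a small decision function. This Python sketch is illustrative, not Emedgene's code: it assumes diploid samples (so a covered REF position is written as `0/0`) and that per-sample depths have already been read from the coverage file; the function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

MIN_COVERED_DEPTH = 3  # the article treats depth > 3 as covered


def lookup_genotype(called_gt: Optional[str], depth: Optional[int]) -> str:
    """Genotype for one sample at a site that was called in another family member."""
    if called_gt is not None:
        return called_gt  # the sample has its own call: keep it
    if depth is not None and depth > MIN_COVERED_DEPTH:
        return "0/0"      # covered but not called: treated as REF
    return "./."          # at or below the threshold, or no coverage data


def merge_site(calls: Mapping[str, str], depths: Mapping[str, int],
               samples: Iterable[str]) -> Dict[str, str]:
    """Fill in every sample's genotype for one merged VCF site."""
    return {s: lookup_genotype(calls.get(s), depths.get(s)) for s in samples}
```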
This process involves the creation of a “genome coverage” file as a separate preliminary step. The coverage file could also be provided via a BED or a gVCF file.
BAM look-up approach is slightly different from classic joint calling used by the joint calling option in DRAGEN and other variant callers, and therefore will not produce identical results.
Note that the Emedgene platform also supports joint-called VCF files.
Remark: If a coverage file (ie. BED, BAM, gVCF) is not provided, then it is not possible to estimate the presence of REF allele in empty positions. As a consequence, "No_coverage" value will be assigned to those variants, which can affect the .
Regions of Interest BEDs by case type (pipeline versions before 35.0):
Case Type | Regions of Interest BED | QC BED |
---|---|---|
Research Genome | None | Optional |
Whole Genome | Full Genes | Optional |
Exome | Clinical Regions | Optional |
Custom Panel | Clinical Regions OR custom BED uploaded to kit. If a custom BED was uploaded to a kit, it will be used. | Optional (same as regions of interest BED if one was used). |

Number of regions and total size per BED file (Table 1):
BED file name | Number of lines | Size in bp |
---|---|---|
GRC38_coding | 206635 | 44959430 |
GRC38_clinical_regions | 237652 | 121694892 |
GRC38_full_genes | 37793 | 2200286025 |
GRC37_coding | 200113 | 44420909 |
GRC37_clinical_regions | 230619 | 119594638 |
GRC37_full_genes | 35776 | 2368701647 |

All variant types (ie. SNV/indel, CNV, SV, STR) ingested into the Emedgene platform require separate VCFs. In order to ingest a DRAGEN VCF with STRs for DRAGEN versions <4.2, the following line should be added to the header of the Dragen STR VCF.
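In a VCF, meta-information lines (`##...`) must precede the `#CHROM` column header line, so a safe place to add the required line is immediately before `#CHROM`. This Python sketch assumes you paste the exact header line from this article into the hypothetical placeholder `STR_HEADER_LINE`; it never adds a duplicate if the line is already present.

```python
from typing import Iterable, Iterator

# Hypothetical placeholder: paste the exact header line given in this article.
STR_HEADER_LINE = "##PLACEHOLDER_STR_HEADER_LINE"


def add_header_line(lines: Iterable[str], new: str = STR_HEADER_LINE) -> Iterator[str]:
    """Yield VCF lines, adding `new` right before the #CHROM line if it is missing."""
    present = False
    for line in lines:
        if line.rstrip("\n") == new:
            present = True  # already in the header: never add a duplicate
        if line.startswith("#CHROM") and not present:
            yield new + "\n"
            present = True
        yield line


# Usage (hypothetical file names, uncompressed VCF):
# with open("sample.str.vcf", encoding="utf-8") as fin, \
#         open("sample.str.fixed.vcf", "w", encoding="utf-8") as fout:
#     fout.writelines(add_header_line(fin))
```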
CNV calling from exome FASTQ is done automatically! A prerequisite for this is defining a Panel of Normals (PON) per each enrichment kit you're using.
A PON helps set a baseline coverage pattern and accounts for recurrent technical artifacts that are specific to your workflow. Depth of coverage for each sequenced region is averaged across the PON samples; if a significant increase or decrease from this baseline is detected in a test sample, a CNV is called.
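The baseline idea can be illustrated with a z-score sketch in Python. This is not how DRAGEN or Emedgene calls CNVs; real callers add GC correction, segmentation across neighbouring regions, and more robust statistics. Median normalization, the cutoff of 3, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, median, stdev
from typing import Dict, List

Depths = Dict[str, float]  # region name -> read depth


def normalize(depths: Depths) -> Depths:
    """Scale a sample's depths by its median so samples with different totals compare."""
    m = median(depths.values())
    return {region: d / m for region, d in depths.items()}


def call_cnvs(test: Depths, pon: List[Depths], z_cutoff: float = 3.0) -> Dict[str, str]:
    """Return {region: "GAIN"/"LOSS"} where the test sample departs from the PON baseline."""
    test_n = normalize(test)
    pon_n = [normalize(sample) for sample in pon]
    calls = {}
    for region, value in test_n.items():
        baseline = [sample[region] for sample in pon_n if region in sample]
        if len(baseline) < 2:
            continue  # too few PON samples to estimate the spread
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue  # no variation in the PON: skip rather than divide by zero
        z = (value - mu) / sigma
        if z >= z_cutoff:
            calls[region] = "GAIN"
        elif z <= -z_cutoff:
            calls[region] = "LOSS"
    return calls
```

The toy example also hints at why a larger PON helps: with only a few samples, the estimated spread is unstable and borderline regions flip between called and not called.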
Samples for a PON should be derived from healthy individuals.
In our experience, a PON of at least 40-50 samples yields the best results. A smaller PON is better than nothing, but keep in mind that you may encounter more false positives.
Aim to prepare the PON samples in a unified manner to avoid batch effects, and log any differences in library preparation.
Performance issues may be related to the limitations of your computer's hardware or software. The Emedgene platform needs 2 GB of free memory to run properly.
If you are experiencing slower loading times or disruptions in service, please consider the following steps to improve your experience:
Check for Background Tasks:
Task Manager (Windows) or Activity Monitor (macOS): Open these tools to see if any background processes are consuming excessive memory. End any unnecessary tasks.
Free Up Memory:
Close Unnecessary Apps: Close other applications that might be using memory, even if they're not actively in use.
Clear Chrome's Cache and Data: Go to Chrome's Settings > Privacy and security > Clear browsing data. Choose a time range and select "Cached images and files" and "Cookies and other site data."
Disable Extensions:
Temporarily Disable: Open chrome://extensions and disable all extensions. If the error disappears, re-enable extensions one by one to identify the culprit.
Disable "Embedded IGV":
Emedgene interface: Go to any Emedgene case > Analysis tools > Varpage > Side bar > Applications > "Embedded IGV" and disable it.
Update Chrome:
Check for Updates: Ensure you're using the latest version of Chrome, as updates often contain bug fixes and memory optimizations.
Restart Chrome and Your Device:
Close Chrome: Completely exit Chrome, including any background processes.
Restart Your Device: Shut down your computer or phone and restart it. This clears temporary files and refreshes system resources.
Consider Hardware Issues:
If none of these solutions work, consider potential hardware issues, such as faulty RAM or disk problems. Consult a technician for further diagnosis.
Click on your initials or profile picture on the top right to access the Settings.
In the My settings tab, scroll down to the Organization section on the left. Under the title Organization, you'll see the name of the organization you're currently in.
Click on the Select Organization text box and select the organization from the dropdown menu or search for it by typing its name.
Press Save. You will be transferred to the requested account.
Refresh your page and that's it!
Note:
Starting from version 34.0, a user-friendly login screen is available, allowing you to confirm the organization you wish to log into.
Before version 34.0, using different URLs would log you into the organization where you last signed in using that URL.
You've done your job, the case is solved, and the only thing left is to generate a report.
Then you see "Failed to generate report". If you are experiencing issues with report preview or generation in a particular case, don't panic.
The error is most likely caused by an unsupported character, most commonly the "&" sign.
Look for it:
in Variant Interpretation paragraphs for variants selected for reporting;
in Case Interpretation (Interpretation notes, Gene interpretation, Recommendations).
Having found the culprit, replace it with "and" or another suitable alternative and try generating the report again. 🤞
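If you draft interpretation text outside the platform (for example, in a word processor), a small helper can catch ampersands before you paste. This Python sketch is only a convenience; the field names in the usage are plain dictionary keys chosen for illustration, not an Emedgene API.

```python
import re
from typing import Dict, List


def fields_with_ampersands(fields: Dict[str, str]) -> List[str]:
    """Names of the text fields that contain an '&' character."""
    return [name for name, text in fields.items() if "&" in text]


def replace_ampersands(text: str) -> str:
    """Replace each '&' (and the spaces around it) with ' and '."""
    return re.sub(r"\s*&\s*", " and ", text)
```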
Can't find any ampersands (& signs) or the report generation fails even after you got rid of them?
Time for a serious investigation. Don't hesitate to reach out!
But if you're curious, go on and check if you have added any PMIDs in the fields mentioned above. If the PMID links to a very old paper, it may be absent from our database used to populate citations in the report. This also causes a rendering error.
Add /version to the URL if you'd like to check the version of your environment. For example, for demo.emedgene.com, use demo.emedgene.com/version.
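If you check versions from a script, the URL can be built from any address of the environment. This Python sketch assumes HTTPS when no scheme is given; the helper names are hypothetical, the response format of /version is not documented here, and the page may require a signed-in browser session, in which case a script may receive a login page instead.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def version_url(environment: str) -> str:
    """Turn a host or any URL of an environment into its /version URL."""
    if "://" not in environment:
        environment = "https://" + environment  # assumption: served over HTTPS
    parts = urlsplit(environment)
    return urlunsplit((parts.scheme, parts.netloc, "/version", "", ""))


def fetch_version(environment: str, timeout: float = 10.0) -> str:
    """Return the raw /version response body as text."""
    with urlopen(version_url(environment), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```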
If you want to learn more about the different versions available, check the .
Emedgene uses VEP and EFF for transcript annotations and in upcoming versions will be adding Illumina Connected Annotations.
Each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on different Emedgene pipeline versions.
Here is a list of ordered rules for transcript prioritization:
VEP transcripts are prioritized over EFF transcripts.
If the case is a virtual panel, prioritize transcripts from genes in the case gene list (but not for Boosted Genes type panels).
Prioritize RNA genes associated with disease (See for prioritized list RNA genes). Importantly this does not apply to upstream and downstream RNA variants.
De-prioritize transcripts.
Prioritize based on in the following order: HIGH > MODERATE > LOW > MODIFIER.
Prioritize introns over UTR over upstream (.
Prioritize organization canonical transcripts (, this parameter has to be implemented upon request.).
Prioritize canonical transcripts (Based on ).
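The ordered rules above behave like a lexicographic sort key: each rule only breaks ties left by the previous ones. This Python sketch encodes them under loud assumptions: `deprioritized` is a hypothetical stand-in for the de-prioritized transcript class, transcript locations other than intron/UTR/upstream are assumed to rank last in rule 6, and the data class and function names are illustrative, not Emedgene's.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}
# "Introns over UTR over upstream"; any other location is assumed to rank last.
LOCATION_RANK = {"intron": 0, "utr": 1, "upstream": 2}


@dataclass(frozen=True)
class Transcript:
    source: str                  # "VEP" or "EFF"
    gene: str
    impact: str                  # HIGH / MODERATE / LOW / MODIFIER
    location: str                # e.g. "exon", "intron", "utr", "upstream", "downstream"
    disease_rna_gene: bool = False
    deprioritized: bool = False  # hypothetical stand-in for the de-prioritized class
    org_canonical: bool = False
    canonical: bool = False


def priority_key(t: Transcript, panel_genes: FrozenSet[str]) -> tuple:
    """Smaller keys win; False sorts before True, so each flag means 'loses if True'."""
    return (
        t.source != "VEP",                                   # 1. VEP over EFF
        bool(panel_genes) and t.gene not in panel_genes,     # 2. case gene list
        not (t.disease_rna_gene
             and t.location not in ("upstream", "downstream")),  # 3. disease RNA genes
        t.deprioritized,                                     # 4. de-prioritized class
        IMPACT_RANK[t.impact],                               # 5. impact
        LOCATION_RANK.get(t.location, len(LOCATION_RANK)),   # 6. intron > UTR > upstream
        not t.org_canonical,                                 # 7. organization canonical
        not t.canonical,                                     # 8. canonical
    )


def main_transcript(transcripts: Iterable[Transcript],
                    panel_genes: FrozenSet[str] = frozenset()) -> Transcript:
    # Pass an empty panel_genes for non-panel cases and for Boosted Genes panels.
    return min(transcripts, key=lambda t: priority_key(t, panel_genes))
```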
Variant effect
For each variant that is mapped to the reference genome, Emedgene uses Ensembl’s Variant Effect Predictor (VEP) and the RefSeq (NCBI) library of transcripts to calculate the variant effect. VEP uses a set of consequence terms defined by the , including immediately recognizable terms like “missense_variant” and “frameshift_variant” as well as more esoteric ones like “non_coding_transcript_exon_variant”.
The full list of terms, along with detailed descriptions and severity impact categories can be found in the below.
Importantly, each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on different Emedgene pipeline versions as described .
Variant severity
Variant severity, also known as variant impact, is a subjective assessment of the severity of a variant consequence.
Severity is usually categorized as modifier, low, moderate or high:
Modifier severity is used for non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. Intergenic and non-coding variants are classic examples.
Low severity is used for variants that are assumed to be mostly harmless or unlikely to change protein function. This includes synonymous variants.
Moderate severity is used for non-disruptive variants that might change protein effectiveness, such as missense variants and in-frame insertions/deletions.
High severity is used for variants that are assumed to have a disruptive impact on the protein or its abundance, such as causing protein truncation, loss of the open reading frame, and/or triggering nonsense-mediated decay.
Most of the time, variant effect and variant severity on Emedgene are consistent with VEP. However, genomics is a field defined by exceptions. There are key factors, outlined below, the Emedgene genetic team believes are critical to account for when assigning severity.
For small variants (SNV):
Splice prediction: Small variants are upgraded to HIGH severity if their splicing prediction is high (dbscSNV > 0.6 or max spliceAI > 0.8), or to MODERATE if their splicing prediction is moderate (max spliceAI > 0.2 or dbscSNV > 0.5).
Conservation: Synonymous variants and splice region variants that are highly conserved (GERP score > 0.9 or PhastCons100 > 0.2) will be upgraded to MODERATE.
Non-coding RNA disease genes: The severity of a small variant will be upgraded to MODERATE if the variant is within a list of RNA genes known to be associated with disease. The current list of RNA genes is:
ATXN8OS, GNAS-AS1, H19, HELLPAR, KCNQ1OT1, LINC00237, LINC00299, MEG3, MIAT, MIR137, MIR140, MIR184, MIR19B1, MIR204, MIR2861, MIR4718, MIR605, MIR96, MIR99A, RMRP, RNU12, RNU4ATAC, SNORD116-1, SNORD118, TERC, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-TM, MT-TW, MT-TA, MT-TN, MT-TC, MT-TY, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP, RNU7-1*, RNU4-2*.
*Added in V35.2
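The three small-variant rules can be read as "raise the severity to at least X" checks applied on top of the VEP severity. This Python sketch takes the thresholds from the list above; the assumptions are that upgrades never lower a severity, that "synonymous" and "splice region" variants correspond to the `synonymous_variant` and `splice_region_variant` consequence terms, and that the gene set is an abbreviated copy of the list above. Function names are hypothetical.

```python
from typing import AbstractSet, Optional

SEVERITY_ORDER = ["MODIFIER", "LOW", "MODERATE", "HIGH"]
# Abbreviated copy of the disease RNA gene list above; extend with the full list.
DISEASE_RNA_GENES = frozenset({"ATXN8OS", "H19", "MEG3", "RMRP", "RNU4ATAC", "TERC", "MT-TL1"})
CONSERVATION_EFFECTS = {"synonymous_variant", "splice_region_variant"}


def _at_least(current: str, floor: str) -> str:
    """Upgrade only: never lower a severity that is already higher."""
    return max(current, floor, key=SEVERITY_ORDER.index)


def _gt(score: Optional[float], cutoff: float) -> bool:
    return score is not None and score > cutoff


def adjusted_severity(vep_severity: str, effects: AbstractSet[str], gene: str,
                      dbscsnv: Optional[float] = None, max_spliceai: Optional[float] = None,
                      gerp: Optional[float] = None, phastcons100: Optional[float] = None) -> str:
    sev = vep_severity
    # Splice prediction
    if _gt(dbscsnv, 0.6) or _gt(max_spliceai, 0.8):
        sev = _at_least(sev, "HIGH")
    elif _gt(max_spliceai, 0.2) or _gt(dbscsnv, 0.5):
        sev = _at_least(sev, "MODERATE")
    # Conservation of synonymous / splice region variants
    if effects & CONSERVATION_EFFECTS and (_gt(gerp, 0.9) or _gt(phastcons100, 0.2)):
        sev = _at_least(sev, "MODERATE")
    # Non-coding RNA disease genes
    if gene in DISEASE_RNA_GENES:
        sev = _at_least(sev, "MODERATE")
    return sev
```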
For CNV/SV:
VEP annotates CNVs with overlapping genomic features and designates them with the following effects: transcript amplification (DUP), feature elongation (DUP, INS), feature truncation (DEL), and transcript ablation (DEL). However, the severity assigned by VEP for CNVs does not reflect the complexity of CNV effects on protein function and in our experience is not suitable for genome analysis and filtering.
The region annotation is then used to assess the severity of CNVs and SVs as follows:
Table 1: CNV/SV severity table. For each category of CNV/SV, the types of regions that overlap a given variant required to trigger the severity classification are shown.
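The CNV/SV severity table can be read as a decision procedure over three overlap flags, which you would compute by intersecting the variant with the Coding regions, Clinical Regions and Full Genes BEDs. In this Python sketch, `spans_whole_gene` is a hypothetical flag for a gain that covers the entire gene region (so it is not intragenic); the function name is also hypothetical.

```python
from typing import Optional


def cnv_sv_severity(sv_type: str, in_coding: bool, in_clinical: bool,
                    in_full_gene: bool, spans_whole_gene: bool = False) -> Optional[str]:
    """Severity from the overlap flags, following the CNV/SV severity table."""
    if sv_type in ("DEL", "INS"):
        if in_coding:
            return "HIGH"
        if in_clinical:
            return "MODERATE"  # clinical region, not coding
    elif sv_type == "DUP":
        if in_coding and not spans_whole_gene:
            return "HIGH"      # intragenic gain
        if in_coding or in_clinical:
            return "MODERATE"  # coding/clinical, not intragenic
    else:
        raise ValueError(f"unsupported type: {sv_type}")
    if in_full_gene:
        return "LOW"           # full gene, not clinical
    # No overlap with any BED: MODIFIER, except INS, whose table cell is "None".
    return None if sv_type == "INS" else "MODIFIER"
```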
For STR variants:
Emedgene is using an internal annotation for STR variants. More details can be provided by request to techsupport@illumina.com.
Known limitations
The list of RNA genes known to be associated with disease is updated over time as part of pipeline updates.
Emedgene does not provide VEP annotation for non-coding regulatory data.
The “main effect” filter under the advanced “variant effect filters” in Emedgene UI represents the effects categorized by the relevant severity. It should be noted that the following effects in the advanced filter are missing. They will be added in V34.6 and V35.3.
To fully enjoy all the features available on the Emedgene platform, we recommend using the latest version of .
As part of the management settings (30.0+), users can upload BED files to be associated with their kits.
The format of the BED file (.bed) follows the description from with some modifications:
the file has to be a tab-delimited text file;
the file should not contain headers;
The number of fields per line must be consistent throughout any single set of data.
Zero-based index: Start and end positions are identified using a zero-based index.
There are three required fields:
Chromosome - The name of the chromosome. Records have to be sorted by chromosome in natural order (numeric, then X, Y, M), for example: chr1, chr2, ..., chr12, ..., chr22, chrX, chrY, chrM.
Chromosome Start - The starting position of the feature in the chromosome. The start position has to be smaller than the end position. Within each chromosome, records have to be sorted by start position in numeric order.
Chromosome End - The ending position of the feature in the chromosome. The end position has to be greater than the start position. The data has to be sorted in numeric order.
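A quick local check for the rules above can save an upload round-trip. This is a minimal Python sketch, not Emedgene's validator: it assumes chr-prefixed names and only chr1-22, chrX, chrY and chrM, reports any other first column (including header lines) as a problem, and tolerates blank lines; `validate_bed` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Assumption: chr-prefixed names; only chr1-22, X, Y and M are expected, in this order.
CHROM_ORDER = {**{f"chr{i}": i for i in range(1, 23)}, "chrX": 23, "chrY": 24, "chrM": 25}


def validate_bed(lines: List[str]) -> List[str]:
    """Return human-readable problems found in BED lines (empty list = looks valid)."""
    errors: List[str] = []
    n_fields: Optional[int] = None
    previous: Optional[Tuple[int, int]] = None  # (chromosome rank, start)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if n_fields is None:
            n_fields = len(fields)
        elif len(fields) != n_fields:
            errors.append(f"line {lineno}: expected {n_fields} fields, found {len(fields)}")
            continue
        if len(fields) < 3:
            errors.append(f"line {lineno}: need chrom, start, end separated by tabs")
            continue
        chrom, start_s, end_s = fields[:3]
        if chrom not in CHROM_ORDER:
            errors.append(f"line {lineno}: unexpected chromosome {chrom!r} (header line?)")
            continue
        if not (start_s.isdigit() and end_s.isdigit()):
            errors.append(f"line {lineno}: start and end must be non-negative integers")
            continue
        start, end = int(start_s), int(end_s)
        if start >= end:
            errors.append(f"line {lineno}: start must be smaller than end")
        key = (CHROM_ORDER[chrom], start)
        if previous is not None and key < previous:
            errors.append(f"line {lineno}: not sorted by chromosome, then start")
        previous = key
    return errors
```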
Adapted from Google Chrome .
On Emedgene, variants are annotated with regard to their overlap with three types of regions: ‘coding regions’, ‘clinical regions’, and ‘full gene’ regions (see for a more detailed description of the BED files used in the system).
CNV/SV severity table (overlapping regions required for each severity, per variant type):
Variant type | High | Moderate | Low | Modifier |
---|---|---|---|---|
Deletion (DEL) | Coding regions | Clinical Regions and not in Coding regions | Full gene and not in Clinical Regions | No overlap with any BED |
Gain (DUP) | Intragenic (coding regions but not entire gene region) | Coding Regions / Clinical Regions not intragenic | Full gene and not in Clinical Regions | No overlap with any BED |
Insertion (INS) | Coding regions | Clinical Regions and not in Coding regions | Full gene and not in Clinical regions | None |

Effects missing from the advanced “main effect” filter, with their severity:
Effect name | Severity |
---|---|
splice_donor_5th_base_variant | LOW |
splice_donor_region_variant | LOW |
splice_polypyrimidine_tract_variant | LOW |
coding_sequence_variant | MODIFIER |
mature_miRNA_variant | MODIFIER |
NMD_transcript_variant | MODIFIER |
coding_transcript_variant | MODIFIER |
regulatory_region_ablation | MODIFIER |
regulatory_region_variant | MODIFIER |
intergenic_variant | MODIFIER |
sequence_variant | MODIFIER |