Genomic Regions by Case Type
Emedgene uses regional information to process genomic data in a variety of ways. Regional information influences which variants are presented in gene panels, exomes, and genomes, as well as variant quality and variant annotations.
Case Type | Regions of Interest BED | Default Region of Interest BED | QC BED |
---|---|---|---|
Research Genome | Any customer-selected BED | None | Optional |
Whole Genome | Any customer-selected BED | Full Genes | Optional |
Exome | Any customer-selected BED | Clinical Regions | Optional |
Custom Panel | Any customer-selected BED | Clinical Regions | Optional |
V35.0+ Regions of Interest BEDs are applied as follows:
Research Genome: Customer can utilize any BED to restrict analysis. By default, there is no intersection and all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Customer can utilize any BED to restrict analysis. If no custom BED is selected, the default BED selected will be the "Full Genes" BED file (see below for more details), and only variants contained within this BED will be presented. The exception is CNV variants, which are always fully present.
Exome: Customer can utilize any BED to restrict analysis. If no custom BED is selected, variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.
Custom Panel: Customer can utilize any BED to restrict analysis, and also upload a separate BED file for QC. When the Panel BED file is provided in Gene Panels, variants processed from a VCF/FASTQ will be only those located within the ranges of the selected panel BED file. Typically the BED files are unique to each enrichment kit panel. If no kit BED is available, the 'Clinical Regions' BED will be used.
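The selection rule described above (a customer-selected BED takes precedence, otherwise the case type's default applies) can be sketched as a small lookup. This is only an illustration of the rule as documented here, not Emedgene's implementation; the function name `resolve_roi_bed` and the string labels are hypothetical.

```python
from typing import Optional

# Default Regions of Interest BED per case type (V35.0+), taken from the table above.
# None means no intersection is applied and all variants remain viewable.
DEFAULT_ROI_BED = {
    "Research Genome": None,
    "Whole Genome": "Full Genes",
    "Exome": "Clinical Regions",
    "Custom Panel": "Clinical Regions",
}


def resolve_roi_bed(case_type: str, custom_bed: Optional[str] = None) -> Optional[str]:
    """Return the BED that restricts analysis (hypothetical helper).

    A customer-selected BED wins; otherwise the default for the case type is used.
    """
    if case_type not in DEFAULT_ROI_BED:
        raise ValueError(f"Unknown case type: {case_type!r}")
    if custom_bed:
        return custom_bed
    return DEFAULT_ROI_BED[case_type]
```

For example, `resolve_roi_bed("Exome")` returns `"Clinical Regions"`, while `resolve_roi_bed("Exome", "my_kit.bed")` returns the customer-selected file name.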
All default BED files are available for download at the end of this article.
Custom BED files are managed on the Settings page, in the BED files card.
Case Type | Regions of Interest BED | QC BED |
---|---|---|
Research Genome | None | Optional |
Whole Genome | Full Genes | Optional |
Exome | Clinical Regions | Optional |
Custom Panel | Clinical Regions OR custom BED uploaded to kit. If a custom BED was uploaded to a kit, it will be used. | Optional (same as regions of interest BED if one was used). |
Before V35.0, Regions of Interest BEDs are applied as follows:
Research Genome: No intersection, all variants are viewable (Note: data pipeline time will increase, and intergenic variants have very limited annotations).
Whole Genome: Only variants contained within the "Full Genes" BED file will be displayed (see below for more details). The exception is CNV variants, which are always fully present.
Exome: Variants presented in the output will be those included in the Emedgene BED file defining the "Clinical Regions" as described below.
Custom Panel: When the Panel BED file is provided in Gene Panels, variants processed from a VCF/FASTQ will be only those located within the ranges of the selected panel BED file. BED files are usually provided by the customer and are unique to each enrichment kit panel.
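To make the idea of intersecting variants with a BED file concrete, here is a minimal sketch in plain Python. It assumes variants are represented as hypothetical `(chrom, pos, kind)` tuples with a 1-based VCF position, and it tests only that single position against the BED intervals (BED coordinates are 0-based and half-open). How the platform handles variants that span a region boundary is not described in this article, so that case is not modeled. The `keep_cnvs` switch mirrors the Whole Genome note that CNV variants are always presented.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed_lines(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in regions.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or adjacent: extend the last interval
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions, chrom, pos):
    """True if the 1-based position `pos` falls inside a region on `chrom`."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_variants(variants, regions, keep_cnvs=True):
    """Keep variants inside the regions; optionally pass all CNVs through."""
    kept = []
    for chrom, pos, kind in variants:
        if (keep_cnvs and kind == "CNV") or in_regions(regions, chrom, pos):
            kept.append((chrom, pos, kind))
    return kept
```

With a BED containing `chr1 100 200`, a variant at VCF position 101 is kept (it maps to 0-based position 100), while one at position 100 is not.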
The following are the latest regional BEDs in the Emedgene platform, designed from the indicated sources for GRCh37 and GRCh38:
Clinical Regions - available at bottom of page
This is a comprehensive BED file that includes every clinically relevant region. The following are included:
“RefSeq Curated” and “GENCODE” regions with flanking areas of 50bp from each side
5'UTR and 3'UTR regions for protein coding genes (based on RefSeq)
OMIM disease-related RNA genes (50bp flanking)
All ClinVar Pathogenic variant regions (flanking 50bp)
Promoter regions (EPDnew human version 006, flanking 50bp)
Known STR regions (DRAGEN 4.0 specification file)
All microRNA genes (flanking 50bp based on HGNC)
Full mtDNA region
For consistency, the GRCh38 version includes the lifted over regions of GRCh37 (liftover using CrossMap).
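The 50bp flanking mentioned for several of the Clinical Regions sources amounts to extending each interval on both sides. A minimal sketch follows, assuming intervals on a single chromosome given as `(start, end)` pairs in BED coordinates. Merging of overlapping padded intervals is an assumption added for illustration (the article does not say whether the published files are merged), and clamping at the chromosome end is omitted because it needs chromosome lengths.

```python
def pad_and_merge(intervals, flank=50):
    """Extend each (start, end) by `flank` bp on both sides, clamp at 0, merge overlaps."""
    padded = sorted((max(0, start - flank), end + flank) for start, end in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, two exons at `(1000, 1100)` and `(1150, 1200)` become a single region `(950, 1250)` once each is padded by 50bp, because the padded intervals overlap.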
Full Genes - available at bottom of page
A BED file covering a wide range of genomic regions. It contains:
"RefSeq ALL" transcripts and "GENCODE" full genes regions with 5Kbp upstream and 5Kbp downstream
Within this range, all “Clinical Regions” are included
All dosage regions (HI/TS sig level 1, 2 or 3)
In addition, lifted-over versions of the regions from both references are included, for the current and previous versions of the ranges.
Sources:
Liftover done using CrossMap (v0.5.2), chain hg19ToHg38.over.chain.gz
NCBI RefSeq regions are based on the release 105 (hg19) and 110 (hg38)
Gencode regions are based on the release V19 (hg19) and V41 (hg38)
All microRNA genes based on HGNC miRNA definition December 2022
ClinGen Dosage region Dec 2022
Promoters from EPDnew human version V6
mtDNA CRS
RNA disease genes based on OMIM and HGNC (Dec 2022): ATXN8OS, TERC, IL12A-AS1, FAAHP1, NUTM2B-AS1, GAS8-AS1, RNU12, MIR204, IGHG2, SLC7A2-IT1, MIR99A, RMRP, XIST, MEG3, DIRC3, MIR17HG, GNAS-AS1, LRTOMT, LINC00299, DUX4L1, MIR137, MIR140, MIR605, SNORD118, RNU4ATAC, HELLPAR, IGHG1, IGHM, MIR19B1, RNU7-1, LINC00237, MIR2861, MIR4718, IGHV3-21, IGHV4-34, IGKC, KCNQ1OT1, MIR184, MIR96, H19, HYMAI, PCDHA9, UGT1A1, AFG3L2P1, DISC2, SNORA31, TRU-TCA1-1, PCDHGA4, TRAC, ECEL1P3, MIAT
ClinVar variants (ClinVar, Dec 2022) with any pathogenic or likely pathogenic significance (and some drug responses that are affiliated with pathogenicity)
50K STR regions based on the DRAGEN 4.0 specification file
Table 1. Number of regions and total size per BED file
BED file name | Number of lines | Size in bp |
---|---|---|
GRC38_coding | 206635 | 44959430 |
GRC38_clinical_regions | 237652 | 121694892 |
GRC38_full_genes | 37793 | 2200286025 |
GRC37_coding | 200113 | 44420909 |
GRC37_clinical_regions | 230619 | 119594638 |
GRC37_full_genes | 35776 | 2368701647 |
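Counts like those in Table 1 can be checked against a downloaded BED file with a short script. The sketch below assumes "Size in bp" is the plain sum of `end - start` over all region lines; whether the published figures were computed this way (rather than after merging overlaps) is not stated here, so treat it as an approximation. The function name `bed_stats` is hypothetical.

```python
def bed_stats(lines):
    """Return (number of region lines, summed size in bp) for BED-formatted lines."""
    n_lines = 0
    total_bp = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        n_lines += 1
        total_bp += end - start  # BED intervals are 0-based, half-open
    return n_lines, total_bp
```

To run it on a file, pass an open file handle, for example `with open(path) as handle: print(bed_stats(handle))`, where `path` points to one of the BED files downloaded from this article.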