
Emedgene

Get Started with Emedgene


Emedgene Analyze manual


Cases table navigation panel

The Cases table navigation panel provides several tools to help you customize your table view and manage cases. It includes the following components:

  • Filters menu: Narrow down the list of cases.

  • Group by menu: Organize your cases by case status.

  • Fields menu: Choose which columns are visible in the table and define their order.

  • Empty trash button: Permanently delete the cases currently in the trash. Use with caution, as this action cannot be undone.

Getting around the platform

Managing data storage

Get started with Emedgene

Welcome to Emedgene, where we unlock genomic insights for hereditary disease and streamline your tertiary analysis workflows.

So you've signed in and can't wait to get started? Here we will guide you through the platform architecture, case creation, and results review. You can dive a bit deeper by following the links and exploring manuals for the platform's applications:

  • Analyze—Genomic analysis workbench, where you can accession, interpret, curate and report on your cases, while also efficiently managing the lab workflow

  • Curate—A repository for all of your organizational curated knowledge

Look around

The platform is operated from the top navigation panel.

By clicking on the corresponding buttons, you can enter:

  • Cases tab

  • Add new case page

  • Emedgene applications menu

  • Help dropdown menu

To enter the Add new case flow, click on the namesake button on the top navigation panel. Here:

1. Select file type

2. Upload files

3. Create a family tree

1. Select a case to review on the Cases tab. You'll be directed to the individual case page that:

  • Showcases an AI-curated shortlist of variants suggested to be checked first, namely Most Likely Candidates and Candidates

  • Provides numerous customizable filters to help you explore the total list of genetic variants by yourself

Dashboard tab

The Dashboard tab depicts an overview of the user activity on the Emedgene platform and provides a glance at key performance indicators for an organization.

  • The Diagnostic Yield card shows the percentage of cases classified as Resolved out of the total number of cases of the same type.

  • The Status Diagram card shows the total number of cases submitted by the organization and the count of cases for each status.
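The two cards above boil down to simple counting. The following is a minimal sketch of that arithmetic, not the platform's implementation: the record fields (`type`, `status`, `outcome`) are hypothetical, and it assumes the yield counts cases whose outcome category is Resolved (this manual names the outcome categories Resolved / Not resolved).

```python
from collections import Counter


def diagnostic_yield(cases, case_type):
    """Percentage of cases of `case_type` whose outcome is 'Resolved'.

    `cases` is a list of dicts with hypothetical keys 'type' and 'outcome'.
    Returns None when there are no cases of that type.
    """
    same_type = [c for c in cases if c["type"] == case_type]
    if not same_type:
        return None
    resolved = sum(1 for c in same_type if c["outcome"] == "Resolved")
    return 100.0 * resolved / len(same_type)


def status_counts(cases):
    """Total number of cases and the count of cases for each status."""
    return len(cases), Counter(c["status"] for c in cases)


# Illustrative records only; these are not real platform data.
cases = [
    {"type": "Exome", "status": "Delivered", "outcome": "Resolved"},
    {"type": "Exome", "status": "In progress", "outcome": "Not resolved"},
    {"type": "Genome", "status": "Delivered", "outcome": "Resolved"},
    {"type": "Exome", "status": "Delivered", "outcome": "Resolved"},
]
print(diagnostic_yield(cases, "Exome"))
print(status_counts(cases))
```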

Case details

The Case details panel provides comprehensive information about a particular case.

The Case details panel is organized into three tabs:

  • Case info—displays technical, operational, and clinical information about the case

  • Family tree—shows a graphical pedigree and sample details for each family member

Family tree

The Family tree tab includes the following information:

  • Pedigree diagram. The pedigree legend can be found here.

  • Sample details for each family member:

    • Phenotypes. For family members other than the test subject, phenotypes are categorized as:

      • Related: directly match one of the proband’s phenotypes

      • Unrelated: do not match any of the proband’s phenotypes

    • Medical Condition. Indicates whether the individual is considered Healthy or Affected in the case

    • Sex. Specified by the user

    • Age. Automatically calculated in years based on the provided date of birth

    • Maternal and Paternal ethnicity. Ethnic background of the proband’s parents

    • BAM file location. Shown where relevant

How to open a case

To open a case:

A. Hover over the corresponding row in the Cases table and click the Open case link next to the Case ID in the first column

B. Alternatively, double-click the row

How to search for cases

You can use the Case search bar in the top navigation panel to search for cases by the Case ID or Proband ID.

    Storage providers

    Okta identity management

    The Emedgene platform utilizes the Okta Identity Management solution to control user access. This improves user management, enhances access and authentication security, and allows organizations to implement single sign-on for their users.

    ​

    Emedgene applications menu

    The Emedgene platform is divided into two applications:

    • Analyze—genomic analysis workbench

    • Curate—the knowledge management system

    To switch from Analyze to Curate:

    Go to the nine-dot app launcher icon located on the top navigation panel and select Curate from the dropdown menu.

    To switch from Curate to Analyze:

    Go to the nine-dot app launcher icon located on the Curate navigation panel and select Analyze from the dropdown menu.

    How can Emedgene help you solve a case?

    The AI-powered Emedgene platform utilizes machine learning throughout the analysis and interpretation workflow to deliver the fastest time from genomic data to decisions. We apply machine learning models that retrieve evidence-backed answers and provide exceptional decision support.

    • Using automated interpretation algorithms, Emedgene generates an accurate shortlist of up to 10 potential causative variants. In a joint study of 180 solved cases with Baylor Genetics, 96% of cases were successfully solved by the algorithm. See Meng et al., Genetics in Medicine, 2023, for more details.

    • The platform is not a black box: it overlays a layer of explainable AI (XAI), presenting supporting evidence from the literature and databases, which significantly reduces the time to interpret a case.

    • The algorithms use a proprietary Emedgene knowledge graph, which incorporates information extracted from the literature with Natural Language Processing as well as from public databases, and is updated on a monthly basis.

    • Dozens of additional algorithms are incorporated throughout the workflow.

    Overall, the system combines AI in a highly optimized and customizable workbench, in order to automate the most time-intensive aspects of genomic analysis and research.

    2. Family tree

    3. Case info

    How to sort cases

    You can sort cases by Creation date, Due date, or Quality.

    To sort cases:

    A. Hover over the column header and click the up or down arrow to sort in ascending or descending order

    B. Alternatively, click the column name and select Sort ascending or Sort descending from the dropdown menu

    The current sort direction is indicated by a single arrow icon next to the column name.

    Only one column can be used for sorting at a time.

    Creating multiple cases

    Launching analysis

    • The Stale Cases card highlights cases stalled at intermediate stages of analysis that haven't been finalized.

    • The Network Activities panel displays a timeline of user activities within the organization. This log includes activities such as creating a case, verifying a filter preset, changing a Case status, generating a report, and more.

    Lefthand panel

    Resolved

    Righthand panel

    Settings dropdown menu

    4. Annotate each sample with clinical information

    5. Specify analysis details

    6. Launch the analysis!

    • Documents all the case-related information, such as Case status, sample quality metrics, and versions of all the resources used during case analysis

    2. Investigate the evidence on the Variant page and assign appropriate tags to the variants of interest.

    3. When you're ready to finalize the case, indicate the end result of the analysis and the variants to be reported in the Case interpretation widget.

    Create a case

    Your case status will be In progress. You'll be notified when results are ready and the case is in status Delivered.

    Examine the analysis results


    Tertiary analysis pipeline

    Activity—provides a timeline of all actions taken within the case for audit and collaboration

    How to access the Case details panel

    From the Cases table

    Click on the row of the case you want to view. The Case details panel will pop up on the right. To close the panel, click the X icon in the top right corner.

    From an individual case page

    To expand the Case details panel, click the left-pointing arrow icon on the right edge of the screen. To collapse it, click the right-pointing arrow icon at the top left of the panel.

    Top navigation panel

    The top navigation panel serves as a guide to the platform. It includes:

    1. Case search bar

    2. Dashboard tab

    3. Cases tab

    4. Add new case button

    5. Emedgene applications dropdown menu to switch between Analyze and Curate

    6. Help dropdown menu under a question mark icon

    7. Settings dropdown menu activated by clicking the username or profile picture

    How to delete cases

    In order to prevent accidental data loss, deleting cases in Emedgene includes a staging step before permanent case deletion.

    1. Move a case to trash

    Update the case status to Move to trash (≤v37.0) or Trash bin (v38.0+).

    Once moved to trash, the case becomes inaccessible. This can be reversed by replacing Move to trash or Trash bin with a different status.

    2. Empty the trash

    Authorized users can permanently delete all items in the trash. To do this:

       1. Click Empty trash on the Cases table navigation panel.

       2. Review the warning message showing the number of cases pending deletion.

       3. Confirm to permanently delete all cases in the trash.

    How to label a case

    You can manage case labels at any time: create, add, or remove them directly in the Cases table.

    Labels let you quickly mark cases for specific purposes and easily filter a subset of cases on the Cases page.

    Cases tab

    The Cases tab provides an overview of genomic sequencing cases submitted by the organization, as well as individual case details.

    The Cases tab includes:

    1. Cases table: displays a list of cases along with key details

    2. Cases table navigation panel: enables customization of the table view, including grouping and filtering of cases

    3. Case details panel: opens when a case is selected, providing additional information

    How to group cases

    To organize cases by status, navigate to the Cases table, click on Group on the navigation panel, and select Status. To remove the grouping, select None.

    Help

    Click on the circle-question icon in the top navigation panel to open the Help dropdown menu.

    From there, you can access:

    • Help Center: Find feature guides, step-by-step instructions, and tips to help you get the most out of the platform.

    • Walkthroughs: View short interactive demos of workflows (in development).

    • Feature requests: Share your ideas and feedback.

    • What's new: Stay updated with the latest release notes.

    • About: View general information such as your organization name and platform version.

    Creating a family tree

    Build a pedigree via the visual tool.

    Ideally, the proband selected for case analysis is affected and has at least one disease phenotype.

    You can add a Father, a Mother, a Sibling, or a Child to any family member, starting with the Proband. To do this, choose their icon, then click on the Add family member button in the bottom right corner of the pedigree builder to select a family member.

    More information about the pedigree symbols can be found here.

    To delete a family member, choose their icon, then click on the Delete Subject button in the top right corner of the Add patient information panel.

    Note: There is no technical limit on the size or number of generations for a family tree.

    Supported reference genome assemblies

    Both GRCh37/hg19 and GRCh38/hg38 are supported. You can run cases with both reference genomes in the same organization.

    Note: Curated and historical data are automatically lifted over on the fly.

    For the DRAGEN v4.3 analysis, Emedgene utilizes:

    • GRCh38/hg38: Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1.tar.gz.

      Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Download here.

    • GRCh37d5: Multigenome Graph hs37d5-cnv.graph.hla.rna-10-r4.0-1.tar.gz.

      Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Download here.

    For the DRAGEN v4.2 analysis, Emedgene utilizes:

    • GRCh38/hg38: Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-9-r3.0.tar.gz.

      Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Download here.

    • GRCh37d5: Multigenome Graph hs37d5-cnv.graph.hla.rna-9-r3.0.tar.gz.

      Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Download here.

    For the DRAGEN v4.0 analysis, Emedgene utilizes:

    • GRCh38/hg38: Multigenome Graph hg38-alt_masked.cnv.graph.hla.rna-8-r2.0-1.tar.gz.

      Pre-built multigenome hash tables for hg38. The hash table builds include DNA, RNA, CNV, and HLA tables. Download here.

    • GRCh37d5: Multigenome Graph hs37d5-cnv.graph.hla.rna.tar.gz.

      Pre-built multigenome hash tables for GRCh37d5. The hash table builds include DNA, RNA, CNV, and HLA tables. Available on demand.

    For DRAGEN versions < 4.0, Emedgene utilizes:

    • GRCh38/hg38: GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz.

      Contains the sequences of the chromosomes, the rCRS mitochondrial sequence, unlocalized scaffolds, and unplaced scaffolds. Download here.

    • GRCh37/hg19: hs37d5.fa.gz.

      Includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1, and the concatenated decoy sequences. Download here.

    Activity

    The Activity tab offers a timeline of case actions and enables users to leave comments. It supports key functions that enhance case management and review:

    • Traceability—Maintains a complete, time-stamped history of case actions

    • Error recovery—Allows users to identify and trace changes, such as variant edits or disease associations, made in error

    Manage data storages

    To directly import files from your own storage, link it to an organization's storage in Emedgene.

    How to link your storage to Emedgene:

    1. Click on the user initials or profile picture at the rightmost corner of the top navigation panel and select Settings.

    2. Select the Management tab and proceed to the Storage card, which lists the currently linked storages.

    How to customize Cases table view

    1. Click Fields

    2. In the Fields menu, use the toggle switch next to each field name to show or hide columns based on your preferred view

    Adding a sample

    You can choose one of the following options:

    • Existing sample: Pick one of the samples already loaded on the platform

    • Upload new sample: Upload files from your PC and enter sample name

    Family tree legend

    While adding a new case, you will build a pedigree and annotate each of the samples with data required for analysis.

    After the case has been created, the family tree is available in the Case details panel (righthand panel of the Cases page).

    1. Icon fill color in other pedigree members indicates the presence or absence of the proband's phenotypes in the given individual (regardless of any additional unrelated phenotypes).

    Adding patient info for the non-proband samples

    1. Fill in the boxes:

    • Sex (*). Options: Male, Female, Unknown.

    • Relationship. Indicates the family relationship of a subject to the Proband, automatically inferred from the pedigree. Options: Father, Mother, Sibling, Child, Other.

    • Date of Birth. Expected format: mm/dd/yyyy.

    • Ignore Sample. Mark the checkbox if you want to exclude the sample from the AI Shortlist analysis and Inheritance filters while preserving genotype data.

    • Add Proband's phenotypes. If a sample shares some phenotypes with the Proband, you can copy them by checking this box. The Proband's phenotypes will appear in a newly created Related Phenotypes section. To remove any of the Proband's phenotypes not observed in the current individual, click the ☒ button next to the HPO term in the Related Phenotypes section.
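The date of birth is expected in mm/dd/yyyy format, and the Family tree tab shows an age calculated in whole years from it. The sketch below shows one conventional way to derive such an age; the platform's exact rule (for example, how it treats a February 29 birthday) is not documented here, so treat the function name and behavior as assumptions.

```python
from datetime import date, datetime


def age_in_years(date_of_birth, today=None):
    """Return completed years between a mm/dd/yyyy date of birth and `today`.

    Raises ValueError if the text does not match the mm/dd/yyyy format.
    """
    dob = datetime.strptime(date_of_birth, "%m/%d/%Y").date()
    today = today or date.today()
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_in_years("06/15/2010", today=date(2024, 6, 14)))
```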

    Sequencing information

    Select a coverage BED

    A coverage BED file is used to calculate quality control (QC) metrics for your case. This file defines the genomic regions that should meet coverage requirements during sequencing.

    After selecting a coverage BED file, the available reference sequences for this kit will be displayed.

    Specify sample preparation details

    Specify details such as laboratory name, sequencing machine used, sequencing reagent kit, and expected coverage.

    Batch case upload via CLI

    • Download and install the Node.js platform.
      Minimum version required: 16
      Upgrade existing installation: nvm install --lts

    1. Download the batch case create script. Replace my-domain with your Emedgene domain.
       Illumina cloud: my-domain.emg.illumina.com
       Legacy Emedgene cloud: my-domain.emedgene.com

    Case type and region of interest

    Select case type

    Select the case type to define the proper analysis of your case.

    Select region of interest

    You can use a custom region of interest (ROI) BED file to limit analysis results to variants within the designated regions. An ROI BED file determines which genomic regions are included in the variant analysis.

    If no custom ROI BED is selected, the system uses the default ROI BED file based on the case type.

    You can select any region of interest, regardless of the case type.

    When selecting a Custom BED as your region of interest, you must select a specific BED file that is already configured in your organization.
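To illustrate what "limiting results to variants within the designated regions" means, here is a minimal sketch of an ROI check. It relies only on the standard BED convention (0-based start, exclusive end) and assumes 1-based variant positions as in VCF; the function names are hypothetical and this is not how Emedgene implements the filter.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples, skipping header lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_roi(regions, chrom, pos):
    """True if a 1-based position falls inside any 0-based, half-open BED region."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


roi = parse_bed("chr1\t100\t200\nchr2\t5000\t5100\n")
variants = [("chr1", 101), ("chr1", 201), ("chr2", 5100), ("chr3", 10)]
print([v for v in variants if in_roi(roi, *v)])
```

A linear scan is enough for a sketch; a real ROI with many regions would typically use sorted intervals or an interval tree.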

    Select a gene list

    You can limit analysis to a gene list in the platform while creating a case. Choose between:

    1. All genes: No limitation of the analysis.

    2. Existing gene list: Select one of the previously added gene lists from a dropdown list.

    3. Create a new gene list: Generate a new virtual panel. Add a List title and then add all the gene symbols one by one (Selection mode) or in a batch (Batch mode).

    A new gene list can be created by combining configured gene lists and/or individual genes. Each gene list can be configured to contain up to 10,000 genes.
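Before pasting a long batch of gene symbols, it can help to pre-check it locally. The sketch below splits a comma-separated batch, drops duplicates, separates symbols that are not in a caller-supplied set of approved symbols, and enforces the 10,000-gene limit. The function and the `approved_symbols` set are hypothetical; the platform performs its own validation.

```python
MAX_GENES = 10_000  # limit stated in this manual


def build_gene_list(batch_text, approved_symbols, max_genes=MAX_GENES):
    """Split a comma-separated batch into (accepted, rejected) gene symbols.

    `approved_symbols` is a set supplied by the caller, for example loaded
    from a local copy of the approved nomenclature.
    """
    accepted, rejected, seen = [], [], set()
    for raw in batch_text.split(","):
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue  # skip blanks and duplicates
        seen.add(symbol)
        (accepted if symbol in approved_symbols else rejected).append(symbol)
    if len(accepted) > max_genes:
        raise ValueError(f"gene list holds {len(accepted)} genes; limit is {max_genes}")
    return accepted, rejected


accepted, rejected = build_gene_list("BRCA1, TP53, NOTAGENE, BRCA1", {"BRCA1", "TP53"})
print(accepted, rejected)
```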

    Select a preset group

    A preset group is a reusable set of filter presets applied for specific case types, as defined by your laboratory SOPs.

    Select a preset group in the Case info screen during case creation or editing.

    • The group selected for the case determines which presets appear in the Presets tab of the Filtering panel.

    • If no group is selected, the system automatically applies the default preset group defined in Lab workflow settings.

    Formatting DRAGEN MANTA VCFs for Emedgene

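    For DRAGEN versions earlier than 4.2, when ingesting a DRAGEN Manta VCF containing SVs of type INS, replace the following line in the VCF header:

    ##source=DRAGEN <version>

    with

    ##source=MANTA-DRAGEN <version>

    Example:

    Replace

    ##source=DRAGEN 05.121.645.4.0.3

    with

    ##source=MANTA-DRAGEN 05.121.645.4.0.3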
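The header change described in this section (a `##source=DRAGEN <version>` line becoming `##source=MANTA-DRAGEN <version>`) can be scripted. The following is a minimal sketch that works on an uncompressed VCF already read into a list of lines; the function name is hypothetical, and a gzipped VCF would need to be decompressed first.

```python
OLD_PREFIX = "##source=DRAGEN "
NEW_PREFIX = "##source=MANTA-DRAGEN "


def fix_manta_source(vcf_lines):
    """Return the lines with the ##source=DRAGEN header line rewritten.

    Only meta-information lines can start with "##", so variant records
    are never modified.
    """
    fixed = []
    for line in vcf_lines:
        if line.startswith(OLD_PREFIX):
            line = NEW_PREFIX + line[len(OLD_PREFIX):]
        fixed.append(line)
    return fixed


example = [
    "##fileformat=VCFv4.2\n",
    "##source=DRAGEN 05.121.645.4.0.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print("".join(fix_manta_source(example)))
```

Running the function twice is harmless, because an already corrected line no longer starts with the old prefix.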

    Empty trash folder (v37.0+)

    Warning: Once the trash folder is emptied, this action cannot be undone. Review the cases pending deletion before proceeding!

    After deletion is confirmed:

    • All cases marked Move to trash are permanently removed

    • An activity entry is recorded

    • Email notifications are sent to users who have opted in

    Note: BED files that define custom kits can be added in Organization settings > Kit management. When starting from FASTQ files and running CNV calling, the coverage BED file chosen during case creation is also linked to a Panel of Normals (PON) file.

    Reviewing a case

    Percentage of mapped reads

    Percentage of reads mapped to the reference sequence.

    Blue bars represent each of these parameters per sample, while a vertical line represents a general metric across all the samples of the same case type in the account.

    Sequencing error rate

    Sequencing error rate refers to the frequency at which incorrect base calls are made during the sequencing process.

    Blue bars represent each of these parameters per sample, while a vertical line represents a general metric across all the samples of the same case type in the account.

    Autosomal call rate

    The Autosomal call rate field displays the percentage of autosomal loci on the array for which a genotype call was successfully made.

    A high call rate indicates a high-quality sample and successful genotyping. Low call rates can signify problems with the DNA sample (poor quality or quantity) or issues during the array processing.

    Displayed to three decimal places.
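As a worked illustration of the definition above, the sketch below counts genotype calls on autosomes only and formats the result to three decimal places. The input shape (pairs of chromosome and call, with `None` for a no-call) is a hypothetical simplification, and the value is shown as a percentage as the text describes; the platform may scale or round it differently.

```python
AUTOSOMES = {str(n) for n in range(1, 23)}  # chromosomes 1 to 22


def autosomal_call_rate(genotypes):
    """Percentage of autosomal loci with a genotype call, as a 3-decimal string.

    `genotypes` is an iterable of (chromosome, call) pairs; `call` is None
    when no genotype was called. Returns None if there are no autosomal loci.
    """
    total = called = 0
    for chrom, call in genotypes:
        name = chrom[3:] if chrom.startswith("chr") else chrom
        if name not in AUTOSOMES:
            continue  # sex chromosomes and mitochondrial loci are excluded
        total += 1
        if call is not None:
            called += 1
    if total == 0:
        return None
    return f"{100.0 * called / total:.3f}"


loci = [("1", "AA"), ("chr2", None), ("X", "AB"), ("22", "BB")]
print(autosomal_call_rate(loci))
```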

    Note: Variant types currently annotated and displayed in Emedgene are DEL, DUP and INS.
    ##source=DRAGEN <version>
    ##source=MANTA-DRAGEN <version>
    ##source=DRAGEN 05.121.645.4.0.3
    ##source=MANTA-DRAGEN 05.121.645.4.0.3

    Batch upload via CLI (Command Line Interface)

    Prerequisites

    Node.js, version 16 or later, available from https://nodejs.org/en/download

    1. Download the batch case create script, replacing my-domain with your Emedgene domain:

       curl https://my-domain.emg.illumina.com/v2/js/batchCasesCreator.js --output batchCasesCreator.js

    2. Download the CSV template file:

       node batchCasesCreator.js saveTemplateFile

    3. Edit the downloaded batchCases.csv file. See CSV format requirements for more details.

    4. Execute the batch cases creator as JavaScript using the command below. Replace my-domain with your Emedgene domain and my-email with your user email. When the prompt for your Emedgene password appears, enter the password and press Enter.

       node batchCasesCreator.js create -h https://my-domain.emg.illumina.com -c batchCases.csv -u my-email -l

       • In case of validation errors in the input CSV, an output CSV called batchCases_results.csv will be created in the same location with detailed error results.

       • -l will create a log file in the same location.

    More information can be found by running:

       node batchCasesCreator.js --help
       node batchCasesCreator.js create --help
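When the batch upload is run for several domains or users, the commands shown in this section can be assembled programmatically. The sketch below only builds the argument lists; it does not run them. The helper names are hypothetical, and it assumes the legacy Emedgene cloud serves the script under the same /v2/js/ path, which this manual does not confirm.

```python
def host_for(domain, legacy=False):
    """Base URL for an Illumina cloud or legacy Emedgene cloud domain."""
    suffix = "emedgene.com" if legacy else "emg.illumina.com"
    return f"https://{domain}.{suffix}"


def batch_commands(domain, email, csv_path="batchCases.csv", legacy=False, log=True):
    """Return the download, template, and create commands as argument lists."""
    host = host_for(domain, legacy)
    create = ["node", "batchCasesCreator.js", "create",
              "-h", host, "-c", csv_path, "-u", email]
    if log:
        create.append("-l")  # also write a log file next to the CSV
    return [
        ["curl", f"{host}/v2/js/batchCasesCreator.js", "--output", "batchCasesCreator.js"],
        ["node", "batchCasesCreator.js", "saveTemplateFile"],
        create,
    ]


def node_version_ok(version_text, minimum=16):
    """Check the text printed by `node --version`, e.g. "v18.17.0"."""
    return int(version_text.strip().lstrip("v").split(".")[0]) >= minimum


for cmd in batch_commands("my-domain", "my-email"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` as is; the create command will still prompt for the password interactively.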
    • Real-time collaboration: Enables teams to monitor each other’s updates as they happen, ensuring transparency

    • Training & quality control: Helps identify patterns in variant interpretation and supports consistent application of evidence criteria

    • Audit compliance: Supports clinical and laboratory documentation standards (e.g., CAP/CLIA) by providing a verifiable action history

    Each activity entry includes:

    • Timestamp (date + time)

    • User name of the person who performed the action

    • Action description

    Activity logs are kept for at least six years for full traceability.

    The Activity tab logs the following actions, by category:

    • Case-related: Case created; Case status changed; Case participants updated; Case labels modified; Report created; Case moved to trash; Case data edited, no reanalysis initiated; Case data edited and reanalysis launched

    • Comments: Comments left in the Activity tab

    • Variant tagging: Variant tag updated. This log entry includes a link to the relevant variant page for immediate review

    Viewing activity logs

    In the Cases table, the Activity tab within the Case details panel displays only comments and case-related activities. To view the full list of all activities, open the Case details panel directly from the individual case page.

    1. Edits are permanent. Even if a change is undone, the original action remains recorded for traceability

    2. Logs are case-specific. Activity entries do not reflect changes made in other cases or in the Curate database
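The structure of an activity entry and the difference between the two views can be modeled in a few lines. This is an illustrative sketch only: the class, field, and function names are hypothetical, and the category names follow the ones used in this section.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: entries are never edited after creation
class ActivityEntry:
    timestamp: datetime
    user: str
    category: str  # "Case-related", "Comments", or "Variant tagging"
    description: str


# Categories shown when the panel is opened from the Cases table.
CASES_TABLE_CATEGORIES = {"Case-related", "Comments"}


def visible_entries(entries, opened_from_case_page):
    """Return entries in chronological order for the given view."""
    if opened_from_case_page:
        shown = list(entries)
    else:
        shown = [e for e in entries if e.category in CASES_TABLE_CATEGORIES]
    return sorted(shown, key=lambda e: e.timestamp)


log = [
    ActivityEntry(datetime(2024, 5, 2, 9, 30), "user-b", "Variant tagging", "Variant tag updated"),
    ActivityEntry(datetime(2024, 5, 1, 14, 0), "user-a", "Case-related", "Case created"),
    ActivityEntry(datetime(2024, 5, 3, 8, 15), "user-a", "Comments", "Comment added"),
]
print([e.description for e in visible_entries(log, opened_from_case_page=False)])
```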

    3. To add a new storage:

       1. Click Add Storage

       2. Choose a storage type from:

          • Azure Data Lake

          • Azure Blob

          • AWS S3

          • File Transfer Protocol (FTP)

          • Google Cloud

          • Secure File Transfer Protocol (SFTP)

          • Illumina BaseSpace Sequence Hub (BSSH)

          • Illumina Connected Analytics (ICA)

       3. Fill in the required credentials

       4. Click Add storage

    4. Check the connection to confirm that the storage is successfully linked.

       To do this, find the storage in the list and check the cloud icon status:

       • If it's green, the connection is set correctly

       • If it's red and struck through, something went wrong. Hover over the icon to see details

    How to edit storage information:

    Click Manage to the right of the storage details.

    How to remove a link to storage:

    Click Delete to the right of the storage details.

    Note: To manage data storage, you must have the Manager and Multiple Storage user roles.

    If data is deleted or moved from the customer's storage, it might adversely affect the case. To learn more about possible consequences, check out this table:

    How to select columns to be displayed

    A. Show or hide columns via the Fields menu

    Use the toggle switches in the Fields menu, as described under How to customize Cases table view.

    B. Hide a column directly from the Cases table

    1. In the Cases table, click the column title you want to hide

    2. From the dropdown menu, select Hide column

    How to change column order

    You can reorder columns in three ways.

    A. Drag and drop the column

    1. Hover over the column title

    2. Click the six-dot icon that appears to the left of the title

    3. Drag and drop the column

    B. Reorder columns via the Fields menu

    1. Click Fields in the Cases table navigation panel

    2. In the Fields menu, hover over the field name

    3. Click the six-dot icon that appears to the left of the name

    4. Drag and drop the field

    C. Move a column using a dropdown menu

    1. Click the column header

    2. From the dropdown menu, select Move left or Move right

    How to adjust column width

    1. Hover over the left or right border of the column header cell

    2. When the resize cursor appears, click and drag the border to your desired width

    • Choose from storage: Choose files from your cloud storage and enter sample name

    • No sample: Postpone uploading files but proceed with case creation, or skip uploading files for family members other than the Proband

    Notes:

    • The Add New Case flow does not validate that sample IDs are unique or that input files are uncorrupted. Please ensure that sample IDs are unique and that input files are valid before creating the case.

    • A case won't run if the Proband sample files are missing. However, sample files are not mandatory for the rest of the family members (although highly recommended).

       • Filled: The individual is affected by all of the proband's phenotypes.

       • Half-filled: The individual is affected by some of the proband's phenotypes.

       • Empty: The individual is not affected by any of the proband's phenotypes.

    2. Icon color intensity denotes whether sample files have been uploaded for the particular individual.

       • Full color: The sample has files loaded in the case.

       • Faded color: No sample files are available.

    3. Icon line type indicates whether the sample is considered or excluded during analysis (relevant to samples with uploaded files only).

       • Solid: The sample is included in the analysis.

       • Dashed: The sample is ignored by Inheritance filters and the AI Shortlist algorithm, but you can still explore its genotypes.
    The boxes to fill in are, in order: Sex (*), Relationship, Date of Birth, Ignore Sample, Add Proband's phenotypes, and Unrelated Phenotypes.

    • Unrelated Phenotypes. Phenotypes not shared with the Proband. They can be added one by one (Selection mode) or in batch (Batch mode).

      Selection mode

      Follow the steps below for each phenotype:

      • Enter an HPO term (e.g., Hypoplasia of the ulna), an HPO ID (e.g., HP:0003022), or a descriptive phenotype name (e.g., Underdeveloped ulna) in the search box

      • Select a matching term from the dropdown menu. Press Complete after you've added all the terms.

      Batch mode

      Paste a list of comma-separated HPO terms or HPO IDs in the search box and press Complete.

      Note: A popup notification will appear at the bottom of the page if any input HPO term or HPO ID is unknown.

    Note: The fields marked with (*) are mandatory.

    Note: Please omit the Patient ethnicities field for non-proband samples.

    2. Select Complete.
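Batch mode for phenotypes takes a comma-separated list of HPO terms or HPO IDs. A quick local pre-check can catch malformed IDs before pasting. The sketch below separates well-formed IDs (HP: followed by seven digits) from everything else; it cannot tell whether an ID or term actually exists in HPO, and it would split a term whose name itself contains a comma. The function name is hypothetical.

```python
import re

HPO_ID = re.compile(r"^HP:\d{7}$")  # e.g. HP:0003022


def split_hpo_batch(text):
    """Split comma-separated input into (hpo_ids, other_terms)."""
    ids, terms = [], []
    for raw in text.split(","):
        item = raw.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        (ids if HPO_ID.match(item) else terms).append(item)
    return ids, terms


ids, terms = split_hpo_batch("HP:0003022, Hypoplasia of the ulna, , HP:123")
print(ids, terms)
```

Items that land in the second list are either term names or mistyped IDs, so a short entry such as HP:123 is worth checking by eye.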

    Selection mode

    For each gene, enter a gene symbol in the search box in the right panel (Candidate Genes) and select a matching symbol from the dropdown menu.

    Batch mode

    After selecting Batch mode, paste a list of comma-separated gene symbols in the search box in the right panel (Candidate Genes).

    Note: Please use the up-to-date gene symbols approved by the HUGO Gene Nomenclature Committee (HGNC). When adding gene symbols in Batch mode, genes that do not comply with HGNC standards are automatically excluded from the gene list. These genes appear for 3 seconds in a black error box at the bottom of the screen.

    Gene list modes

    You can choose between two modes of the gene list feature:

    1. In silico panel

    Selected by default.

    AI Shortlist is limited to the selected gene panel; no variants in other genes are considered in the results. If this in silico panel is used for analysis of exome or genome data, the gene restriction may be lifted during manual analysis to open up the entire exome or genome for analysis.

    2. Boosted genes

    Analysis is performed for variants in all the genes. Variants in the targeted genes get upgraded scores during prioritization by the AI Shortlist algorithm.

    Where can I manage preset groups for my organization?

    To manage preset groups, go to Settings → Organization settings → Lab workflow. There, you can create and edit preset groups, hide or unhide them, download them, and set a default preset group for new cases.

    For the DRAGEN v4.2 analysis, Emedgene utilizes:

    For the DRAGEN v4.0 analysis, Emedgene utilizes:

    For DRAGEN versions < 4.0, Emedgene utilizes:


    Manage S3 credentials

    Whenever an organization is created, we automatically allocate bucket folders in AWS S3 cloud storage to it:

    • Path for upload

    <Region_Cloud>-emg-auto-samples/<org_name>/upload/

    Folder intended to store input case files.

    Authorized user has view and upload privileges.

    • Path for download

    <Region_Cloud>-emg-downloads/<org_name>/ 

    This folder contains a partially annotated (excluding results of proprietary algorithms) VCF file per case.

    Authorized user has view and download privileges.

    • Path for DRAGEN output

    <Region_Cloud>-emg-auto-results/<org_name>/

    This folder contains DRAGEN output files.

    Authorized user has view and download privileges.
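As a minimal sketch, the upload and download path templates above can be filled in programmatically before you pass them to an S3 client of your choice. The function name and the example region and organization values are hypothetical, and treating the first path component as the bucket name is an assumption rather than something this page states.

```python
def emedgene_s3_locations(region_cloud: str, org_name: str) -> dict:
    """Fill the documented path templates for one organization.

    Returns (bucket, prefix) pairs. Splitting at the first "/" assumes the
    first path component is the bucket name (an assumption, not confirmed here).
    """
    templates = {
        "upload": "{region}-emg-auto-samples/{org}/upload/",
        "download": "{region}-emg-downloads/{org}/",
    }
    locations = {}
    for purpose, template in templates.items():
        path = template.format(region=region_cloud, org=org_name)
        bucket, _, prefix = path.partition("/")
        locations[purpose] = (bucket, prefix)
    return locations


# Hypothetical values, for illustration only.
locations = emedgene_s3_locations("example-region", "example_org")
print(locations["upload"])
```

The returned bucket and prefix can then be used with the key pair described in this section.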

    To access your upload, download, and DRAGEN output folders, you need a key pair consisting of an access key ID and a secret access key. Creating, deactivating, activating, and deleting credentials is available to users with the Manager and Manage S3 Credentials roles.

    You can create and use up to two dynamic access keys at the same time.

    When you require technical support, you have the option to generate a new key pair specifically for the troubleshooting process. After the issue has been resolved, you can delete the credentials to ensure security of your system.

    The newly generated credentials will only be saved in AWS Identity and Access Management (IAM) and not in our database.

    1. In Settings > Management > S3 Credentials, click on Create Access Key.

    2. You can retrieve the secret access key only when you initially create the key pair. If you lose it, you have to create a new key pair. To immediately copy the secret access key to a secure location, use the Copy to clipboard button.

    In Settings > Management > S3 Credentials, click on Deactivate in the corresponding key pair card.

    In Settings > Management > S3 Credentials, click on Activate in the corresponding key pair card.

    In Settings > Management > S3 Credentials, click on Delete in the corresponding key pair card. Only inactive key pairs can be deleted.

    How to filter cases

    Available filters

    You can filter cases using most of the Cases table fields, as well as by the case outcome category (Resolved / Not resolved), which is the only filter not displayed as a column.

    Category
    Case filters

    Identification

    Case ID

    Sample ID (Proband ID)

    Case processing stage

    1

    Go to the Filters menu in the Cases table navigation panel.

    2

    Under Field, select the field you want to filter by.

    3

    Under List, select a value from the dropdown or manually enter one.

    1

    Go to the Filters menu in the Cases table navigation panel.

    2

    To remove a specific filter, click the icon next to it.

    In the Cases table navigation panel, click the icon next to the Filters menu.

    Case info

    The Case info tab includes the following information:

    • Case ID—a unique identifier assigned to each case by Emedgene, formatted as EMGXXXXXXXXX

    • Case type—the type of analysis performed:

      • Whole Genome

      • Exome

      • Custom Panel

      • Array

    • Sample type—the format of the sample files used in the case:

      • FASTQ: *.fastq.gz, *.fq.gz, *.bam, *.cram.

      • Project VCF: *.pvcf, *.vcf, *.vcf.gz, *.pvcf.gz

      • VCF: *.vcf, *.vcf.gz, *.targeted.json, *.gt_sample_summary.json

    • Gene list—indicates whether a gene list was used during analysis and how it was applied:

      • All genes—AI Shortlist was neither restricted to a specific gene list nor prioritized one

      • Virtual panel (In silico panel)—AI Shortlist was limited to only the genes in the gene list

    • Analysis type:

      • If the field is not present—carrier analysis was not performed

      • Carrier—carrier analysis was performed for the selected gene list

    • Human reference—the genome reference used during case analysis

    • Ordered by—the user who created the case and the case creation date

    • Signed by—the user who finalized the case

    • Related cases—the Case IDs of other cases that share one or more samples with the selected case

    • Patient Information—basic demographic details:

      • Sex. Specified by the user

      • Age. Automatically calculated in years based on the provided date of birth
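The sample type extensions listed above can be checked before case creation. The sketch below is illustrative only: the function name is hypothetical, and because `.vcf` and `.vcf.gz` appear under both Project VCF and VCF, it returns every sample type a file name could belong to rather than guessing one.

```python
# Extensions copied from the Sample type list above.
SAMPLE_TYPE_SUFFIXES = {
    "FASTQ": (".fastq.gz", ".fq.gz", ".bam", ".cram"),
    "Project VCF": (".pvcf", ".vcf", ".vcf.gz", ".pvcf.gz"),
    "VCF": (".vcf", ".vcf.gz", ".targeted.json", ".gt_sample_summary.json"),
}


def possible_sample_types(file_name: str) -> list:
    """Return all sample types whose listed extensions match the file name."""
    lowered = file_name.lower()
    return [
        sample_type
        for sample_type, suffixes in SAMPLE_TYPE_SUFFIXES.items()
        if lowered.endswith(suffixes)  # str.endswith accepts a tuple
    ]


print(possible_sample_types("proband.fastq.gz"))
print(possible_sample_types("family.vcf.gz"))
```

An empty list means the file name matches none of the listed extensions.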

    Additional case information can be added using custom fields, either via the API or by including extra columns in your CSV during batch case creation. This allows you to extend the case details panel with project-specific data. To enable this feature or learn more, please contact [email protected].

    Secondary findings

    While creating a new case, you can choose whether to include secondary findings for the proband. This option is available on the Family Tree screen → Create family tree panel → Show Secondary Findings.

    Secondary findings are genetic variants that are not related to the primary indication for testing but may have important medical implications. These variants are automatically assigned the Incidental tag when they meet American College of Medical Genetics and Genomics (ACMG)-defined criteria for reportable secondary findings.

    In Emedgene, the terms incidental findings and secondary findings both refer to ACMG-defined secondary findings. The platform continues to use the “incidental” label in certain places for technical consistency, though the modern clinical standard is “secondary findings.”

    Tagging criteria

    A variant is automatically tagged as a secondary finding if it meets all of the following criteria:

    1. Classification: Previously classified as pathogenic or likely pathogenic in ClinVar or Curate variant databases

    2. Zygosity: Heterozygous or homozygous (only homozygous for the HFE gene)

    3. Allele frequency: Less than 5%

    4. Read depth: 10× or higher

    5. Variant quality: Any value except LOW

    6. Affected gene: Listed in the ACMG SF v3.2 or 3.3 medically actionable gene list for reporting secondary findings in clinical exome and genome sequencing (PMID: 37347242, 40568962)

    ACTA2, ACTC1, ACVRL1, APC, APOB, ATP7B, BAG3, BMPR1A, BRCA1, BRCA2, BTD, CACNA1S, CALM1, CALM2, CALM3, CASQ2, COL3A1, DES, DSC2, DSG2, DSP, ENG, FBN1, FLNC, GAA, GLA, HFE, HNF1A, KCNH2, KCNQ1, LDLR, LMNA, MAX, MEN1, MLH1, MSH2, MSH6, MUTYH, MYBPC3, MYH11, MYH7, MYL2, MYL3, NF2, OTC, PALB2, PCSK9, PKP2, PMS2, PRKAG2, PTEN, RB1, RBM20, RET, RPE65, RYR1, RYR2, SCN5A, SDHAF2, SDHB, SDHC, SDHD, SMAD3, SMAD4, STK11, TGFBR1, TGFBR2, TMEM127, TMEM43, TNNC1, TNNI3, TNNT2, TP53, TPM1, TRDN, TSC1, TSC2, TTN, TTR, VHL, WT1.

    ACMG SF v3.3 includes all v3.2 genes plus three newly added genes:

    • PLN

    • ABCD1

    • CYP27A1

    This brings the total to 84 reportable genes.
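The six tagging criteria can be read as a single conjunction. The following sketch restates them in code to make the logic explicit; it is not the platform's implementation. The `Variant` fields, the string values for classification, zygosity, and quality, and the idea of passing the gene list in as a set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    classification: str      # assumed values: "pathogenic", "likely pathogenic", ...
    zygosity: str            # assumed values: "heterozygous", "homozygous"
    allele_frequency: float  # fraction, so 5% is 0.05
    read_depth: int
    quality: str             # assumed values: "HIGH", "MODERATE", "LOW"


def meets_secondary_finding_criteria(variant: Variant, actionable_genes: set) -> bool:
    """Return True only if all six documented criteria hold."""
    if variant.classification.lower() not in {"pathogenic", "likely pathogenic"}:
        return False
    # HFE is tagged only when homozygous; other genes allow either zygosity.
    allowed = {"homozygous"} if variant.gene == "HFE" else {"heterozygous", "homozygous"}
    if variant.zygosity.lower() not in allowed:
        return False
    if variant.allele_frequency >= 0.05:  # must be less than 5%
        return False
    if variant.read_depth < 10:           # 10x or higher
        return False
    if variant.quality.upper() == "LOW":  # any value except LOW
        return False
    return variant.gene in actionable_genes


# A small subset of the gene list, for illustration only.
genes = {"BRCA1", "HFE", "PLN"}
example = Variant("BRCA1", "pathogenic", "heterozygous", 0.0001, 35, "HIGH")
print(meets_secondary_finding_criteria(example, genes))
```

In practice `actionable_genes` would hold the full ACMG SF list that applies to your pipeline version.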

    Supported parental ethnicities

    The ethnicities of the proband's mother and father can be specified during the process of UI or API case creation. Please refer to the following list of supported ethnicities.

    A "Afghan Jews" "Afghani" "African" "African American" "Afro-Brazilian" "Alaska Native" "Algerian" "Algerian Jews" "Amish" "Anatolian" "Arab" "Argentinian/Paraguayan" "Armenian" "Ashkenazi Jews" "Asian" "Asian Brazilian" "Australian Native" "Azerbaijan Jews"

    B "Bedouin" "Bengali/Northeast Indian" "British/Irish" "Bulgarian Jews"

    C "Caribbean Australian" "Caucasus Jews" "Central African" "Central Asian" "Chilean" "Chinese" "Chinese Dai" "Christian Arab" "Circassian" "Colombia"

    D "Druze" "Dutch"

    E "East African" "East Asian" "East European" "Egyptian" "Egyptian Jews" "Emirates" "Ethiopia" "Ethiopian / Eritrean" "Ethiopian Jews" "Ethiopian Jews - Beta Israel" "European" "European American"

    F "Fijian Australian" "Filipino" "Filipino Austronesian" "Finnish" "French" "French Canadian"

    G "Georgian Jews" "Germans" "Ghanaian / Liberian / Sierra Leonean" "Greece Jews" "Greek Americans" "Greek / Balkan" "Guam/Chamorro"

    Batch case upload from platform

    If you're comfortable with scripting and API usage, you can upload multiple cases at once using those methods. But if you're not a technical expert, don't worry. There is a user-friendly alternative available—importing a CSV file directly through the user interface.

    Please follow the steps as described below.

    Caution: Please note that refreshing or leaving the page, exiting the Add new case tab, or power failure of your computer before you've completed a batch case upload will result in loss of the case creation progress.

    1. Prepare a CSV file

    CSV (Comma-Separated Values) is a simple file format used to store data in tabular form. A row represents a sample, and a column represents a data field.

    Start by downloading a CSV template with an example line and mandatory and non-mandatory fields from the Add new case page set to Batch mode (see step 2). Fill the file with your data according to CSV format requirements.

    2. Upload a CSV file

    1. Click on the + New case button on the top navigation panel.

    2. Click on the Switch to batch button in the top right corner. You'll be directed to the Select file page of the Batch upload flow. Note: Here you can download a CSV template in the valid format.

    3. Drag and drop a CSV file into the box or upload it from the file explorer. Wait for file upload and validation to finish.

    After validation is complete, you will be directed to the Batch validation page. It features validation results details for you to review:

    • File name

    • Number of rows in the file

    • Number of cases to be created

    • Number of errors found

    1. Click on Create. A progress bar will appear on the right as the cases are created (Cases creation page).

    2. If the cases have been created successfully, the Cases summary page will display the total number of cases that were created.

    3. If there were any errors during the batch case creation process, the Cases summary page will display a table indicating the number of cases that were successfully created and the number of cases that failed.

    You will have the option to download a CSV file containing two additional columns: Errors and Case ID. The Errors column will contain error messages for samples where case creation failed, while the Case ID column will contain the Case ID of a successfully created case for the lines where case creation was successful.
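If you want to process the downloaded results file in a script, a sketch along these lines could separate created cases from failed rows. Only the `Errors` and `Case ID` column names come from the description above; the `Sample` column, the error message, and the Case ID value in the example are hypothetical placeholders.

```python
import csv
import io


def split_batch_results(csv_text: str):
    """Split a results CSV into created Case IDs and (line number, error) pairs."""
    created, failed = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        error = (row.get("Errors") or "").strip()
        case_id = (row.get("Case ID") or "").strip()
        if error:
            failed.append((line_number, error))
        elif case_id:
            created.append(case_id)
    return created, failed


# Hypothetical file content, for illustration only.
example = (
    "Sample,Errors,Case ID\n"
    "S1,,EMG000000001\n"
    "S2,Example error message,\n"
)
created, failed = split_batch_results(example)
print(created)
print(failed)
```

The failed rows can then be corrected in the original CSV and uploaded again.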

    Manage ICA storage

    Prerequisites for managing ICA storage

    To manage ICA storage, the user must have:

    • The Storage Provider

    How to get your ICA credentials:

    1

    Log in to your Illumina private domain via URL in the following format: yourcompanyname.login.illumina.com. This opens the Connected Platform Home

    2

    In the left navigation panel: User > API keys

    3

    Name the key

    4

    Choose one of the following options:

    A. Grant access to all workgroups across the domain. If your domain includes multiple workgroups and you want the API key to apply universally, select "All current and future Workgroups and roles (Global API Key)".

    B. Grant access to specific workgroups. Select one or more workgroups from the list. For each selected workgroup, assign the following application roles:

    • Emedgene Has Access

    5

    Click Generate. Once the API key is generated, copy it to your clipboard or download it as a file.

    ⚠️ Important: The API key is only accessible while the API Key Generated popup window is open. After closing the window, the key cannot be retrieved. If you didn’t copy or download it, you’ll need to generate a new key.

    1

    Log into your Emedgene domain and go to the workgroup where you want to link ICA storage

    2

    Click on the user avatar and select Settings from the dropdown

    3

    Select the Management tab

    4

    Emedgene annotations and update frequency

    Every case is annotated with the resources listed in the attached table, including the proprietary Illumina prediction scores PrimateAI-3D and SpliceAI. All annotations are versioned; the versions are recorded in the Versions tab and saved per case. Key variant significance and knowledge graph databases are updated monthly, so that the most up-to-date information is available during analysis.

    Individual case page: Top bar

    The Top bar in the Individual case page indicates the Case ID and current Case status.

    • Change the Case status

    • Reanalyze the case

    • Finalize the case and write interpretation notes

    Most Likely Candidates and Candidates

    Most Likely Candidates: Variants that are most promising for solving the case. This list is limited to the 10 top-scored variants but may include more if more than one variant is tagged per gene (suggesting compound heterozygosity). The Most Likely Candidates limit can be changed upon request.

    Candidates: Several dozen highly scored variants worth considering.

    The ranking of variants by AI Shortlist considers:

    • SNVs

    • CNVs

    Lab tab

    The Lab tab shows sample and case-level quality metrics so you can check data reliability before starting interpretation.

    • Summary dashboard—highlights the key quality indicators, with more details provided in the subsequent sections

    • Sequencing lab information section—reports sequencing run technicalities

    Summary dashboard

    Summary dashboard provides a quick overview of key quality indicators at both the case and sample levels.

    • Displays the overall case quality status

    • Reflects sample quality status

    Sequencing lab information section

    Sequencing lab information section reports sequencing run technicalities as indicated during case creation:

    • Lab

    • Instrument

    • Reagents

    Case quality section

    The Case quality section summarizes the data quality of the case and highlights the results of validation checks:

    • Chromosome validation

      Confirms that each chromosome with at least 100 SNVs in defined enrichment kit or coding regions includes at least one high-quality variant

    • gnomAD validation

      Verifies that each chromosome with at least 100 SNVs in defined enrichment kit or coding regions includes at least one variant annotated with gnomAD
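Both checks above share one shape: every chromosome with at least 100 SNVs must contain at least one SNV that carries a given property (high quality, or a gnomAD annotation). The sketch below illustrates that logic only; the input format of `(chromosome, has_property)` pairs and the function name are assumptions, not the platform's data model.

```python
from collections import defaultdict


def failing_chromosomes(snvs, min_snvs: int = 100) -> list:
    """Return chromosomes that have >= min_snvs SNVs but none with the property.

    `snvs` is an iterable of (chromosome, has_property) pairs, where
    has_property is True for a high-quality or gnomAD-annotated SNV,
    depending on which validation is being illustrated.
    """
    total = defaultdict(int)
    with_property = defaultdict(int)
    for chromosome, has_property in snvs:
        total[chromosome] += 1
        if has_property:
            with_property[chromosome] += 1
    return sorted(
        chromosome
        for chromosome, count in total.items()
        if count >= min_snvs and with_property[chromosome] == 0
    )


# chr1: 100 SNVs, none flagged (fails); chr2: 99 SNVs (below threshold, exempt);
# chr3: 150 SNVs with one flagged (passes).
example = (
    [("chr1", False)] * 100
    + [("chr2", False)] * 99
    + [("chr3", False)] * 149
    + [("chr3", True)]
)
print(failing_chromosomes(example))
```

An empty result corresponds to the validation passing for the whole case.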

    NGS sample quality

    The overall sample quality indicator provides a quick assessment of sequencing reliability for each sample.

    Sample quality is evaluated using the following metrics:

    • Average depth of coverage Mean coverage across the target regions

    • % bases covered >20x Percentage of bases in the target regions covered at a depth greater than 20×, indicating reliable coverage

    NGS sex validation

    The Sex validation column indicates whether the biological sex inferred from genomic data matches the sex information provided during case creation. This helps identify potential sample mix-ups or metadata errors before interpretation begins.

    Sex validation results:

    • Pass

      Reported sex matches the estimated sex

    • Fail A mismatch was detected between reported and estimated sex.

    Coverage

    Coverage metrics for a target region defined by a QC BED file (or RefSeq coding regions if no kit is provided) included in the Sample quality section:

    • Average coverage Average depth of coverage for a target region

    • % Bases with coverage >10x percentage of a target region that is covered at a minimum depth of 10x

    • % Bases with coverage >20x percentage of a target region that is covered at a minimum depth of 20x
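To make the three coverage metrics concrete, here is a small sketch that computes them from a list of per-base depths over a target region. The input format and function name are assumptions. The thresholds are applied as strictly greater than 10 and 20, following the ">10x" and ">20x" labels; whether a depth of exactly 10 or 20 counts is not settled by the text above.

```python
def coverage_summary(depths: list) -> dict:
    """Compute average depth and percentage of bases above 10x and 20x."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    n = len(depths)
    return {
        "average_coverage": sum(depths) / n,
        # Strictly greater than the threshold (assumption, see lead-in).
        "pct_bases_gt_10x": 100.0 * sum(d > 10 for d in depths) / n,
        "pct_bases_gt_20x": 100.0 * sum(d > 20 for d in depths) / n,
    }


summary = coverage_summary([5, 15, 25, 35])
print(summary)
```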

    Array sample quality metrics

    Array sex validation

    The Sex validation column indicates whether the biological sex inferred from genomic data matches the sex information provided during case creation. This helps identify potential sample mix-ups or metadata errors before interpretation begins.

    Sex validation results:

    • Pass

      Reported sex matches the estimated sex

    • Fail A mismatch was detected between reported and estimated sex.

    DRAGEN QC report

    The DRAGEN sample-level quality control (QC) report is generated by the Illumina DRAGEN Bio-IT Platform and covers the entire analysis workflow, from raw sequencing reads to variant calls.

    • Interactive HTML summary: A visual summary that includes interactive plots of key quality metrics. This report can be accessed from the Sample quality section of the Lab tab.

    • CSV metric files: A set of detailed CSV files containing sample-level quality metrics. These files are downloadable and support in-depth review and documentation.

    Review interactive DRAGEN report

    When available, a DRAGEN report link appears below the sample name in the Sample quality section of the Lab tab. Clicking the link opens the detailed quality control metrics report in a new browser tab, so you can assess sequencing quality without leaving the Emedgene interface.

    Download DRAGEN QC metrics files

    Sample-level DRAGEN QC metric files for all samples in a case can be downloaded by clicking the download icon next to the Sample quality section title.

    For NGS cases, the report includes coverage and mapping statistics.

    For array cases, metrics include array QC values such as call rate, autosomal call rate, and Log R dev.

    Boosted gene list—AI Shortlist analyzed variants in all genes, but variants in the gene list were given higher priority
    Due Date—the user-defined deadline for finalizing the case. To enter or edit the Due Date, click the calendar icon in the Due Date section
  • Participants—Users involved in the case, whether in submission, analysis, finalization, or those subscribed to updates. To receive email notifications, click the Subscribe icon. To unsubscribe, hover over your avatar and click the X icon

  • Clinical Information:
    • Proband phenotypes—HPO terms used to describe clinical findings in the proband

    • Suspected disease—if provided, includes the suspected condition, penetrance (%), and severity (mild, moderate, severe, or profound)

    • Maternal and Paternal ethnicity—ethnic background of the proband’s parents

    • Parental consanguinity—indicates whether the parents are related by blood

    • Report secondary findings—specifies whether secondary findings analysis was requested

  • Clinical note—free-text notes provided at the time of launching the analysis

  • When choosing an existing file path, the samples used may be cached from the original run. For a top-up flow please use a new file path.
  • When you are loading sample files from your PC or choosing them from the storage, and there is more than one file per sample, please ensure that all the necessary files are simultaneously selected in the upload pop-up. You may only select one file type per case (i.e. you may not select both a .vcf and a .bam at the same time).

  • Kit type

  • Expected coverage

  • Protocol

  • ClinVar validation

    Ensures that each chromosome with at least 100 SNVs in defined enrichment kit or coding regions includes at least one variant annotated with ClinVar

  • AI Shortlist validation

    Checks that at least one variant is tagged by the AI Shortlist.

    • This validation is not applicable if the gene list contains fewer than 50 genes

    • If your workgroup uses a higher threshold, it is reflected in the Gene list threshold field

  • mtDNA reference validation Confirms that the rCRS reference is used for mitochondrial DNA

  • Error rate Sequencing error rate. Reflects general sequencing accuracy

  • % mapped reads Proportion of reads successfully mapped to the reference genome

  • Contamination check Detects mixed or low-quality samples that may affect interpretation

  • These metrics give an overall confidence level for whether the sequencing data can support accurate variant interpretation.

    Blue bars represent each of these parameters per sample, while a vertical line represents a general metric across all the samples of the same case type in the account.

    H "Hawaiian"

    I "Iberian" "India - Bene Israel Jews" "India - Cochin Jews" "Indian" "Indigenous Amazonian" "Indigenous peoples in Canada" "Indonesian" "Inuit" "Iranian" "Iranian Persian Jews" "Iraq" "Iraqi Jews" "Irish" "Italian" "Italian Americans" "Italian Jews"

    J "Japanese" "Japanese Brazilian" "Jordan"

    K "Kenyan" "Korean" "Kurdish" "Kurdish Jews"

    L "Latino/Hispanic Americans" "Lebanese Jews" "Levantine" "Libyan" "Libyan Jews"

    M "Maasai" "Malayali Indian" "Melanesian" "Mesoamerican and Andean" "Mexican American" "Middle Eastern" "Mongolian / Manchurian" "Mormon" "Moroccan" "Moroccan Jews" "Muslim Arab"

    N "Native American" "Nepali" "Nigerian" "North African" "North and West European" "Northern Asian" "Northern Indian"

    O "Other Pacific Islander"

    P "Pakistani" "Papuan" "Polynesian" "Portuguese in Northern Brazil" "Portuguese in Southern Brazil"

    R "Russian Jews" "Russians"

    S "Samaritan" "Samoan" "Sardinian" "Saudi" "Scandinavian" "Senegambian / Guinean" "Siberian" "Somali" "South African" "South Asian" "Southern East African / Congolese" "Southern European" "Southern Indian" "Southern Indian / Sri Lankan" "Southern South Asian" "Spaniards" "Spanish Jews" "Sub-Saharan African" "Sudanese" "Swedes" "Syrian Jews" "Syrian-Lebanese"

    T "Tajikistan Jews" "Thai / Cambodian / Vietnamese" "Tunisian" "Tunisian Jews" "Turkish" "Turkish / Anatolian" "Turkish Jews"

    U "Ukraine" "Ukraine Jews" "Uzbekistan/ Bukharan Jews"

    V "Venezuela"

    W "West African"

    Y "Yemenite" "Yemenite Jews"

    Sample quality (overall)

    Sex validation

    Autosomal call rate

    Call rate

    Log R deviation

    Time zone awareness. Timestamps follow the system’s configured time zone, which may differ from your local time—especially in international collaborations.

    Evidence notes

    Evidence notes updated — this log entry includes a link to the relevant variant page for immediate review

    Evidence pathogenicity

    Variant pathogenicity updated — this log entry includes a link to the relevant variant page for immediate review

    Evidence graph

    Evidence graph updated — this log entry includes a link to the relevant variant page for immediate review

    ACMG pathogenicity

    ACMG evidence updated (logs any changes made via the ACMG classification wizard) — this log entry includes a link to the relevant variant page for immediate review

    Transcript changes

    Reference transcript updated

    Important notes

    SNV + CNV compound heterozygotes

  • SVs

  • mtDNA variants

  • STRs

  • The AI Shortlist rates variants based on predicted variant effects, alternative allele frequency, familial segregation pattern, phenotypic match, in silico predictions, and other relevant information from scientific papers and databases.

    During the case review, you can untag variants selected by the AI Shortlist or manually tag ones not selected by the AI Shortlist.

    To streamline case review, the AI Shortlist pre-selects the list of variants likely to be causative for each case:

    Most Likely Candidates

    Candidates

    Illumina Connected Analytics - Has Access

  • Platform-home Workgroup Admin

  • In the Storage card, click Add Storage

    5

    Select Illumina Connected Analytics (not Illumina Connected Analytics V1!) from the Storage type dropdown

    6

    Fill the storage credentials:

    • "Api_key"—the API key generated before

    • "Project"—the name of the Project in ICA that contains and will contain the data you want to connect

    • "Path"—the folder within the project where the data is located. Use it to restrict access to the data within the specified folder. Entering only "/" allows access to all folders within your ICA project

    7

    Click Add Storage

    Upload and download permissions on the ICA project—either granted individually or via entire workgroup

    To connect ICA:

    user role
    Evaluation kit

    Specifies the QC BED kit used to evaluate coverage depth and breadth. If no kit is specified at analysis launch, NCBI RefSeqGene is used as the default reference

  • Custom gene coverage Indicates whether the coverage of genes in the selected panel meets the expected threshold, as defined by the QC BED

  • Pedigree status Displays the results of relationship validation, confirming whether the submitted pedigree aligns with genetic data

  • Included metrics:

    Case quality
    Sample quality

    DRAGEN QC report formats

    DRAGEN sample-level quality control (QC) report
    accessed
    downloadable
    , select a value from the dropdown or manually enter one.
    4

    Select Apply to activate the filter.

    5

    To add another filter, select Add new under the active filter and repeat steps 1-4.

    Status

    Resolved or Not resolved

    Cases with no selected result are not included in the Not resolved category.

    Case details

    Type

    Label

    Quality

    Participants

    Participants

    How to apply filters

    How to remove a filter

    How to clear all filters

    Status message

    • If no errors were detected, a success message will be displayed

    • If any errors were detected, an error message will be displayed.

      You will be given the option to download a file with error details to help you diagnose and correct any issues with the data. Once you've corrected the CSV file, reupload it.

    3. Review file validation results

    4. Create cases

    API/batch upload limitations

    • When using the API or batch upload, note that applying multiple gene lists can inadvertently exceed a combined limit of 10,000 genes across panels. The platform may not provide an explicit error message in such cases. Plan gene-panel combinations carefully.

    • Combining gene lists at case creation is available via the UI only and cannot be performed through API/batch upload.

    • API/batch upload cannot add phenotypes for an unaffected parent.

    • JSON files cannot be uploaded via API/batch upload.
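Because the platform may not report an error when the combined gene limit is exceeded, a pre-check on your side can help. The sketch below counts unique gene symbols across the lists you plan to use. Counting unique symbols rather than the plain sum of list sizes is an assumption, since the text above does not say how the combined limit of 10,000 genes is calculated; the function names are hypothetical.

```python
GENE_LIMIT = 10_000  # combined limit stated above


def combined_gene_count(gene_lists) -> int:
    """Count unique gene symbols across several gene lists."""
    unique_symbols = set()
    for genes in gene_lists:
        unique_symbols.update(g.strip().upper() for g in genes if g.strip())
    return len(unique_symbols)


def exceeds_gene_limit(gene_lists, limit: int = GENE_LIMIT) -> bool:
    """Return True if the combined unique gene count is above the limit."""
    return combined_gene_count(gene_lists) > limit


panels = [["BRCA1", "BRCA2", "TP53"], ["TP53", "PTEN"]]
print(combined_gene_count(panels))
print(exceeds_gene_limit(panels))
```

If the limit turns out to apply to the plain sum of list sizes, replace the set with a running total.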

    Illumina_Connected_Software_Emedgene_Annotation_Schema.pdf (PDF, 4 MB)
    DRAGEN report
    DRAGEN QC metric files

    Lab workflow settings

    Manage presets and preset groups.


    Default preset group

    Set the preset group applied automatically when no group is selected.


    Presets tab

    Use predefined combinations of filters that reflect your laboratory’s SOPs.


    ACMG SF v3.2 gene list

    ACMG SF v3.3 (2025 release; requires pipeline v100.39.0+)

    Historical note

    When Emedgene was first released, the term “incidental findings” was adopted in alignment with the clinical genomics standard at the time. The 2013 ACMG recommendations defined incidental findings as “the results of a deliberate search for pathogenic or likely pathogenic alterations in genes that are not apparently relevant to a diagnostic indication for which the sequencing test was ordered” (PMID: 23788249).

    As the field evolved, the ACMG and broader clinical community began to distinguish between “incidental findings” (unexpected, not actively sought) and “secondary findings” (intentionally analyzed and reportable). This shift was reflected in the updated 2016 ACMG guidance (PMID: 27854360).

    To reflect this change, Emedgene introduced the term “secondary findings” into the platform. However, “incidental findings” remains in use throughout the platform for technical consistency.

    Tips:

    • Enable secondary findings when clinically relevant — this ensures variants in actionable genes are surfaced automatically.

    • Always review findings in the context of patient consent and your institution’s reporting policies.

    Warnings:

    • Secondary findings are limited to the ACMG-defined gene lists. Variants outside these lists will not be tagged automatically.

    • Only variants with adequate sequencing depth and quality are tagged. Low-quality calls may require manual review.

    Array sample quality

    The Quality status provides a quick assessment of array data reliability for each sample:

    • High

      Call rate ≥ 0.99 and Log R dev ≤ 0.2

    • Low If either condition is not met

    • N/A

      If the QC file is not available

    Use the Quality status to quickly screen whether a sample meets minimal QC thresholds before starting detailed interpretation.
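The Quality status rule can be written as a short function. This is a sketch of the documented thresholds only; representing a missing QC file as `None` inputs is an assumption made for the example.

```python
from typing import Optional


def array_quality_status(call_rate: Optional[float], log_r_dev: Optional[float]) -> str:
    """Apply the documented thresholds: High, Low, or N/A."""
    if call_rate is None or log_r_dev is None:
        return "N/A"   # QC file not available (modeled here as None inputs)
    if call_rate >= 0.99 and log_r_dev <= 0.2:
        return "High"
    return "Low"       # either condition is not met


print(array_quality_status(0.995, 0.15))
print(array_quality_status(0.98, 0.15))
print(array_quality_status(None, None))
```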

    Log R deviation

    The Log R Deviation (or Log R Ratio standard deviation) quantifies the variability of the signal intensity for each SNP marker on an array, that is, the noise level.

    Log R deviation is one of the key metrics used to determine array sample quality, alongside call rate.

    Lower values indicate more consistent signal intensities. A high Log R Deviation can indicate a poor-quality sample or potential issues with CNV calling.

    Displayed to three decimal places.

    Call rate

    The Call rate field displays the percentage of loci on the array for which a genotype call was successfully made.

    Call rate is one of the key metrics used to determine array sample quality, alongside log R deviation.

    A high call rate indicates a high-quality sample and successful genotyping. Low call rates can signify problems with the DNA sample (poor quality or quantity) or issues during the array processing.

    Displayed to three decimal places.

    N/A QC file not available; validation could not be performed.

    Sex validation is performed by comparing the observed homozygous/heterozygous genotype ratio on the X chromosome with the expected ratios:

    • <2 for females

    • >2 for males

    Prerequisites:

    • Only high-quality SNVs from targeted regions—either kit-specific or RefSeq coding regions—are used for sex validation

    • A minimum of 50 variants is required to generate a reliable result. If this threshold is not met, sex validation cannot be performed, and no result is displayed

    If the sex was marked as unknown during case creation, the system will display the predicted sex instead of a validation status.
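The ratio rule and its prerequisites can be sketched as follows. This is an illustration of the description above, not the platform's code. The function names and the lowercase sex labels are hypothetical, the inputs are assumed to be counts of high-quality X-chromosome SNVs from targeted regions, and a ratio of exactly 2 is treated as undetermined because the text does not define that case.

```python
from typing import Optional


def estimate_sex(hom_count: int, het_count: int, min_variants: int = 50) -> Optional[str]:
    """Estimate sex from the homozygous/heterozygous ratio on chromosome X."""
    if hom_count + het_count < min_variants:
        return None          # too few variants: no result is displayed
    if het_count == 0:
        return "male"        # ratio is unbounded, so above 2
    ratio = hom_count / het_count
    if ratio < 2:
        return "female"
    if ratio > 2:
        return "male"
    return None              # exactly 2 is not defined in the text (assumption)


def sex_validation(reported_sex: str, hom_count: int, het_count: int) -> Optional[str]:
    """Return Pass/Fail, the predicted sex if reported as unknown, or None."""
    estimated = estimate_sex(hom_count, het_count)
    if estimated is None:
        return None
    if reported_sex == "unknown":
        return estimated     # predicted sex is shown instead of a status
    return "Pass" if reported_sex == estimated else "Fail"


print(sex_validation("female", 30, 40))
print(sex_validation("female", 90, 10))
print(sex_validation("unknown", 90, 10))
```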

    N/A QC file not available; validation could not be performed.

    If the sex was marked as unknown during case creation, the system will display the predicted sex instead of a validation status.

  • Case quality section—summarizes the data quality of the case
  • Sample quality section—highlights quality metrics for each sample

  • Pedigree section—displays the results of the relationship validation for each pair of samples in a family tree

  • Genes coverage section—highlights regions that may not have been adequately sequenced

  • The Lab tab includes:

    Summary dashboard
    Sequencing lab information section
    Case quality section
    Managing AWS S3 Lifecycle policy

    How to create a key pair

    How to deactivate a key pair

    How to activate an inactive key pair

    How to delete a key pair

    Creating
    deactivating
    activating
    deleting
    roles
    <Region_Cloud>-emg-auto-results/<org_name>/ 

    Edit the case info

  • Preview the case report

  • Options available through the Top bar:

    Case status
    Case status
    Finalize the case

    Cases table

    Cases table lists key details of all genomic sequencing cases submitted by the organization.

    You can customize the table by hiding, showing, rearranging fields, or adjusting column widths, except for Case ID, which is fixed as the first column and always visible.

    Cases table fields

    Identification

    Case ID

    A unique case identifier (EMGXXXXXXXXX).

    This field is fixed and cannot be hidden or repositioned in the table.

    Please provide this code to Tech Support when reporting any issues.

    Proband ID

    Identifier of the proband.

    For , this is the Sample Name; for , it is the BioSample Name of the test subject.

    Proband phenotypes

    Field
    Description

    Manage BaseSpace storage

    Log in to Emedgene and navigate to Settings in the upper right-hand corner of the page.

    Click on the Management tab and then on Add Storage.

    Choose Illumina BaseSpace storage type.

    Fill Client Key, Client Secret and App Token as provided from BaseSpace (a description on how to get this information is provided below) and click Add storage to complete the setup.

    Install BaseSpace CLI (Command Line Interface)

    Follow the instructions on the if needed. Be aware of the Basespace Regional Instance you are working on (us, euc1, aps2, euw2)

    On BSSH, log in to the workgroup you want to connect as storage.

    Once the BaseSpace CLI is installed, run the authentication command in the terminal.

    The command directs you to a link where you must log in.

    After authentication completes successfully, find the access token in the config file.

    The result should look like this:

    Populate the App_token with the accessToken value, and Server with the apiServer URL from the BSSH config file.

    Client_key will be displayed in subsequent menus, so a descriptive name such as the workgroup name can be used.

    Client_secret is unused when the App_token is available and can be set to "x".
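
The mapping from the BSSH config file to the Emedgene storage form can be sketched in Python. This is only an illustration: the function names are hypothetical, and it assumes the config file contains plain `key = value` lines such as `apiServer` and `accessToken`, as in the example output in this guide.

```python
def read_basespace_config(text):
    """Parse 'key = value' lines from the text of a BaseSpace CLI config file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key/value separator
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def to_emedgene_fields(settings, client_key):
    """Map config values to the storage form fields described above."""
    return {
        "Client_key": client_key,              # descriptive name, e.g. the workgroup
        "Client_secret": "x",                  # unused when an App_token is supplied
        "App_token": settings["accessToken"],  # accessToken from the config file
        "Server": settings["apiServer"],       # apiServer URL from the config file
    }
```

For example, pass the contents of `.basespace/default.cfg` to `read_basespace_config` and then call `to_emedgene_fields(settings, "integration1")` to get the four values to type into the Add Storage form.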

    Go to the BaseSpace developer portal and log in. Be aware of the BaseSpace Regional Instance you are working on (us, euc1, aps2, euw2).

    Go to My Apps and click Create a new Application.

    Fill details for the application and click on create an application.

    Fill details and press save.

    All requested fields must be filled in; you can enter “NA” in them.

    Go to My Apps and click on your new app. Then go to the Credentials tab.

    You will find the Client ID (Client Key), Client Secret, and App Token to enter in the Emedgene platform.

    1. Log in to the desired Emedgene organization.

    2. Go to Settings

    3. Go to Management tab

    4. Click on Add Storage

    1. Add the information from your “Credentials” of the App previously created in BSSH.

    Default region of interest kits

    A region of interest (ROI) BED file determines which genomic regions will be included in the variant analysis. It functions as a preprocessing filter, determining which variants proceed to annotation and interpretation.
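
As a rough illustration of what filtering by an ROI BED file means, the sketch below keeps only positions that fall inside a BED interval. It is not the platform's implementation; the function names are hypothetical, and it relies only on the standard BED convention of 0-based, half-open intervals combined with 1-based variant positions.

```python
def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = {}
    for line in lines:
        # Skip blank lines and the optional BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_roi(regions, chrom, pos):
    """True if a 1-based variant position falls inside any ROI interval."""
    # A 1-based position p maps to 0-based p - 1, so start <= p - 1 < end
    return any(start < pos <= end for start, end in regions.get(chrom, []))
```

With a BED line `chr1 100 200`, positions 101 through 200 on chr1 pass the filter, while position 100 and position 201 do not.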

    Default ROI kits by case type

    If no custom ROI BED kit is applied to a case, the system applies a default ROI BED file based on the case type. All default ROI BED files are available for download (see Default ROI kit details).

    Case type
    Default region of interest BED

    A wide range of genomic regions BED file. It contains:

    • "RefSeq ALL" transcripts and "GENCODE" full genes regions with 5Kbp upstream and 5Kbp downstream

    • Within this range, all “Clinical Regions” are included

    • All dosage regions (HI/TS sig level 1, 2 or 3)

    Moreover, liftover versions of both reference regions were included, for the current and previous range versions.

    • Liftover done using CrossMap (v0.5.2), chain hg19ToHg38.over.chain.gz

    • NCBI RefSeq regions are based on the release 105 (hg19) and 110 (hg38)

    • Gencode regions are based on the release V19 (hg19) and V41 (hg38)

    • All microRNA genes based on HGNC miRNA definition December 2022

    Download files used in v100.39.0+

    Download files used up to v38.0

    This is a BED file that includes every clinically relevant region. The following are included:

    • “RefSeq Curated” and “GENCODE” regions with flanking areas of 50bp from each side 5UTR and 3UTR region for protein coding genes (based on RefSeq)

    • OMIM disease-related RNA genes (flanking 50bp)

    • All Clinvar Pathogenic variants regions (flanking 50bp)

    For consistency, the GRCh38 version includes the lifted over regions of GRCh37 (liftover using CrossMap).

    Download files used in v100.39.0+

    Download files used up to v38.0

    Manage Google Cloud storage

    Google Cloud Storage Credentials update procedure

    How to get the client credentials?

    1. Go to the Google Cloud Console.

    2. Navigate to IAM & Admin - In the left sidebar, go to IAM & Admin > Service Accounts.

    1. Create a New Service Account: Click on the "Create Service Account" button at the top.

    1. Fill in the Service Account Details:

      • Service account name: Give your service account a name.

      • Service account ID: This will be automatically generated based on the name.

    1. Assign Roles to the Service Account:

      • In the Grant this service account access to project step, you’ll assign the necessary roles.

      • Grant these roles:

    • Add the above 3 values into the appropriate fields:

      • Client_credentials_base64: paste the output of step 8 (the Base64-encoded key).

      • Bucket: the bucket name.

    1. Download and install the Google Cloud SDK from the Google Cloud SDK Install page.

    2. Select Your Platform (Windows, macOS, or Linux), download and run.

    3. Initialize and Authenticate with Google Cloud: In the Cloud SDK Shell/terminal, run: gcloud init This will open a browser window to authenticate your Google account. Follow the instructions to log in and select your project.

    notice:

    • origin: if using Illumina cloud:

      https://host_name.emg.illumina.com

      else, Emedgene cloud:

      https://host_name.emedgene.com
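
If you prefer to generate cors.json rather than write it by hand, a minimal Python sketch is shown here. The function names and the `illumina_cloud` flag are hypothetical; the rule values (GET method, `emgauthorization` response header, 3600 seconds) mirror the example CORS rules in this guide, and the origin follows the two host patterns listed above.

```python
import json


def build_cors_config(host_name, illumina_cloud=True):
    """Build the CORS rules for one Emedgene host."""
    # Illumina cloud and Emedgene cloud use different domains
    domain = "emg.illumina.com" if illumina_cloud else "emedgene.com"
    return [
        {
            "origin": [f"https://{host_name}.{domain}"],
            "method": ["GET"],
            "responseHeader": ["emgauthorization"],
            "maxAgeSeconds": 3600,
        }
    ]


def write_cors_file(path, host_name, illumina_cloud=True):
    """Write the rules to a JSON file that can be passed to --cors-file."""
    with open(path, "w") as handle:
        json.dump(build_cors_config(host_name, illumina_cloud), handle, indent=2)
```

Calling `write_cors_file("cors.json", "host_name")` produces a file suitable for the gcloud command in the next step; replace `host_name` with your own host.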

    1. Apply the CORS configuration to your bucket by running the following command: gcloud storage buckets update gs://your-bucket-name --cors-file=cors.json

    2. Verify the CORS configuration: gcloud storage buckets describe gs://your-bucket-name

    Manage Azure Blob storage

    Before you proceed to this article, make sure you understand data storage management basics.

    Update Azure Blob Storage Credentials

    In Settings > Management Tab, add or edit the required credentials: CLIENT_ID, CLIENT_SECRET, TENANT_ID, and ACCOUNT_URL.

    See the table below to learn where to look for them in your Azure account.

    Emedgene setting
    Corresponding client (Azure) setting

    1. In Microsoft Entra ID, click on App registrations.

    1. Select New registration.

    2. Fill in the name of the application and press "Register".

    3. You are taken to the registered app page, where you can retrieve the Application ID (CLIENT_ID) and the Tenant ID (TENANT_ID). Both are marked in the screenshot.

    1. Press "Certificates & secrets".

    2. Press "New client secret".

    1. Fill in the "Description" and change "Expires" to 12 months (or according to your organization's policy), then press "Add".

    8. Get the CLIENT_SECRET from this page.

    1. Give this App registration roles and read access to the relevant Blob.

    1. Go to Azure Storage accounts

    1. Get into the relevant Storage account

    1. Press on "containers"

    1. Press on the relevant container

    2. Press on "Properties"

    3. Copy the ACCOUNT_URL


    Errors caused by bad connections can be found in CloudWatch, in the relevant FRY log stream.

    Search for: BlobApi, BlobFs, azure.

    1. Select sample type

    When creating a new case, the first step is to select the sample input type. This determines how your data will be processed and which quality metrics will be available later in the analysis.

    You can choose from the following supported formats: FASTQ, Project VCF, and VCF.

    FASTQ

    Use this option if you want the platform to perform secondary analysis and variant calling.

    Accepted file types:

    • .fastq.gz

    • .fq.gz

    • .bam

    • .cram. Make sure you understand the current limitation for using CRAM files by expanding the section below.

    Use when working with a joint VCF file containing multiple samples.

    Accepted file types:

    • .pvcf

    • .vcf

    • .pvcf.gz

    Use for cases where variants have already been called externally, or for cytogenetic array inputs.

    Accepted file types:

    • .vcf

    • .vcf.gz

    • .targeted.json
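
A quick pre-upload check of file names against the accepted extensions can be sketched as follows. The dictionary simply restates the three lists above; the function name and the case-insensitive comparison are assumptions made for illustration, not platform behavior.

```python
# Accepted extensions per sample input type, as listed in this section
ACCEPTED_EXTENSIONS = {
    "FASTQ": (".fastq.gz", ".fq.gz", ".bam", ".cram"),
    "Project VCF": (".pvcf", ".vcf", ".pvcf.gz"),
    "VCF": (".vcf", ".vcf.gz", ".targeted.json"),
}


def is_accepted(sample_type, filename):
    """True if the file name ends with an extension accepted for the sample type."""
    # Lower-casing the name is an assumption; the platform may be case-sensitive
    return filename.lower().endswith(ACCEPTED_EXTENSIONS[sample_type])
```

For instance, `is_accepted("FASTQ", "proband_R1.fastq.gz")` is true, while `is_accepted("FASTQ", "proband.vcf")` is false.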

    Joint calling in Emedgene

    Classic joint calling consists of calling variants "simultaneously across all sample BAMs, generating a single call set for the entire cohort." (GATK.broadInstitute.org)

    When running from BAM or FASTQ samples on Emedgene, we do not apply classic joint calling but a BAM look-up methodology.

    This methodology consists of retrieving coverage information from BAM during the VCF merging process. Thus, if a variant does not exist in a parental sample, the algorithm will check the coverage in that position using data from the BAM file. The position will be considered as "REF" allele if it is covered (depth > 3), and "No coverage" or "N/A" (./. in the VCF FORMAT/GT field), if it is below that threshold or has no coverage.
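
The look-up rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline code: the function names and the dictionary-based inputs are hypothetical, and only the depth threshold (covered when depth > 3) and the two outcome labels come from the description above.

```python
MIN_DEPTH = 3  # from the text: a position counts as covered when depth > 3


def fill_missing_genotype(depth):
    """Label a position where a family member has no variant call."""
    if depth is not None and depth > MIN_DEPTH:
        return "REF"   # covered, so the reference allele is assumed
    return "./."       # below the threshold or no coverage information


def merge_with_lookup(proband_variants, parent_calls, parent_depth):
    """Assign a parental genotype to every proband variant.

    proband_variants: iterable of (chrom, pos, ref, alt) keys
    parent_calls:     {(chrom, pos, ref, alt): genotype} from the parent's VCF
    parent_depth:     {(chrom, pos): depth} taken from the coverage file
    """
    merged = {}
    for key in proband_variants:
        if key in parent_calls:
            merged[key] = parent_calls[key]
        else:
            merged[key] = fill_missing_genotype(parent_depth.get(key[:2]))
    return merged
```

A proband variant that is absent from the parent's VCF is therefore labeled "REF" when the parent has, say, depth 25 at that position, and "./." when the depth is 2 or unknown.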

    This process involves the creation of a “genome coverage” file as a separate preliminary step. The coverage file could also be provided via a BED or a gVCF file.

    The BAM look-up approach differs slightly from the classic joint calling used by the joint calling option in DRAGEN and other variant callers, and therefore will not produce identical results.

    However, the Emedgene platform also supports joint-called VCF files.

    Remark: If a coverage file (i.e., BED, BAM, or gVCF) is not provided, it is not possible to estimate the presence of the REF allele in empty positions. As a consequence, the "No_coverage" value is assigned to those variants, which can affect the inheritance mode filters.

    Limitation: It should be noted that the current data pipeline has a limitation stemming from the way it merges variants from different samples into the same case (e.g., in a trio). Since it is based on bcftools, variants are identified by the chromosome number, start position, reference allele, and alternate allele. However, it does not take into account the size of the variant itself. As a result, this may sometimes lead to inaccurate merging of CNV-type variants that differ in size. That limitation is not present when joint calling is used.

    Integrating variant annotations from multiple sources

    The Emedgene pipeline prioritizes variant annotations based on the calling methodology rank order. The first appearance of a variant is annotated according to the following hierarchy:

    1. TARGETED

    2. STAR_ALLELE

    3. STR_REPEAT_EXPANSION

    4. MRJD

    5. FORCED_GENOTYPING

    6. SMALL_VARIANT

    7. CNV_READ_DEPTH

    8. SV_SPLIT_END

    9. UNKNOWN

    Variants are considered identical if they share the same:

    • Chromosome

    • Position

    • Reference allele (REF)

    • Alternate allele (ALT)

    When applied to Copy Number Variants (CNVs), this approach may merge variants even if they have different lengths.
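
To make the identity rule and the rank order concrete, here is a small Python sketch. It is an illustration only: records are plain dictionaries with hypothetical field names, and ties keep the first record seen, which is an assumption. It also shows why two CNVs with the same start but different lengths collapse into one entry.

```python
# Calling methodology rank order from the hierarchy above (first = highest priority)
METHOD_RANK = [
    "TARGETED", "STAR_ALLELE", "STR_REPEAT_EXPANSION", "MRJD",
    "FORCED_GENOTYPING", "SMALL_VARIANT", "CNV_READ_DEPTH",
    "SV_SPLIT_END", "UNKNOWN",
]


def variant_key(record):
    """Identity key: chromosome, position, REF, and ALT. Length is not part of it."""
    return (record["chrom"], record["pos"], record["ref"], record["alt"])


def pick_annotations(records):
    """Keep one record per identity key, preferring the higher-ranked method."""
    best = {}
    for record in records:
        key = variant_key(record)
        rank = METHOD_RANK.index(record["method"])
        if key not in best or rank < METHOD_RANK.index(best[key]["method"]):
            best[key] = record
    return best
```

Two `<DEL>` records at chr2:5000 with end positions 6000 and 9000 share the same key, so only one of them survives the merge.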

    Processing multi-nucleotide variants

    Unlike single-nucleotide variants (SNVs), a multi-nucleotide variant (MNV) represents a single event involving multiple consecutive bases. In Emedgene, small variants are recognized as those comprising an MNV if they are located within a 2-nucleotide distance.
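
One possible reading of the 2-nucleotide rule is sketched below. The function name is hypothetical, and the sketch assumes that all positions belong to the same chromosome and sample and that "within a 2-nucleotide distance" means a gap of at most 2 between neighboring positions; the platform's exact criteria may differ.

```python
def group_suspected_mnvs(positions, max_distance=2):
    """Group small-variant positions whose neighbors are at most max_distance apart."""
    groups, current = [], []
    for pos in sorted(positions):
        # Close the current run when the gap to the previous position is too large
        if current and pos - current[-1] > max_distance:
            if len(current) > 1:
                groups.append(current)
            current = []
        current.append(pos)
    if len(current) > 1:
        groups.append(current)
    return groups
```

Given positions 100, 101, 150, 152, and 300, the sketch returns two groups, `[100, 101]` and `[150, 152]`, and leaves 300 on its own.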

    Currently, Emedgene does not fully support MNV functionality. The following features are restricted:

    • Export to Curate: Blocked because Curate does not support MNVs.

    From v100.39.0 onward:

    Emedgene recognizes MNV as a distinct variant type and supports ingestion from VCF, annotation, and filtering.

    Each MNV is represented and annotated as:

    • An MNV itself (eg, AG>TC)

    • Individual SNVs derived from the MNV (eg, A>T and G>C), for compatibility with existing tools and workflows

    Both the MNV and its underlying SNVs display the "Suspected MNP" badge in the Clinical significance tab.

    During data processing, MNVs are split into consecutive SNVs. The resulting SNVs are annotated with INFO and FORMAT fields that mirror the original record.
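
Splitting an MNV into per-base SNVs can be illustrated with a short Python sketch. The function name is hypothetical, the copying of INFO and FORMAT fields is omitted, and skipping positions where REF and ALT carry the same base is an assumption rather than documented behavior.

```python
def split_mnv(chrom, pos, ref, alt):
    """Split an MNV into consecutive SNVs, one per differing base."""
    if len(ref) != len(alt):
        raise ValueError("an MNV needs REF and ALT alleles of equal length")
    return [
        (chrom, pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base  # assumption: identical bases yield no SNV
    ]
```

For the AG>TC example, `split_mnv("chr1", 100, "AG", "TC")` yields A>T at position 100 and G>C at position 101.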

    SNVs that comprise an MNV display the "Suspected MNP" badge in the Clinical significance tab.

    Annotations from organization databases

    Annotations from organization databases appear in various parts of the platform, each showing certain details.

    Historic and noise databases

    Variant table

    • Allele frequency—in "[Organization DB] AF (%)" column

    • Allele count—in "[Organization DB] AF (#)" column

    Variant page

    • Summary tab Population summary card

      • Allele count—in "Allele count" field

      • Hom/Hemi count—in "Hom/Hemi count" field

      • The last 10 samples—in "Last 10 samples" field

    • Population statistics tab Organization DBs

      • Allele frequency—in "Allele frequency" column

      • Allele count—in "Allele count" column

    • Visualization tab Population data "[Organization DB]" tracks display variants from organization databases. Left-click a variant in a track to review variant details:

      • Allele frequency

      • Allele count

    • Color-coded "[Organization DB]" badge based on pathogenicity in the Curated DB—"Known variants" column

    • Summary tab Clinical significance card Color-coded "[Organization DB]" badge based on pathogenicity in the Curated DB

    • Clinical significance tab Clinical significance card Color-coded "[Organization DB]" badge based on pathogenicity in the Curated DB

    Individual case page

    The user can enter a specific case from the Cases tab by clicking Full details in the corresponding row of the case table.

    The individual case page includes:

    1. Top bar—displays a Case ID and Case status and includes Case interpretation, Edit case info, and Report preview buttons

    2. Candidates tab—highlights a shortlist of variants, suggested to be reviewed first - Most Likely Candidates and Candidates

    3. Lab tab—illustrates quality metrics for the sequenced samples

    4. Genome view tab—provides an interactive overview of genomic structure, ideal for analyzing CNV and ROH/LOH events

    5. Analysis tools tab—provides numerous customizable filters to help you explore the total list of genetic variants in compliance with your organization's standard case review process. You can export shortlisted variants in .xlsx format

    6. Versions tab—documents versions of all the resources used during case analysis

    How to update a case status

    You can update the case status either from the individual case page or from the Cases table.

    Finalized case status can be applied only from the individual case page to prevent unintended case completion.

    On the individual case page

    1

    Open the case page.

    2

    In the top bar, select the dropdown icon next to the current case status.

    3

    Select the new status you want to apply.

    In the Cases table


    1

    Open the Cases tab.

    2

    In the Cases table, locate the relevant case row and select the case status.

    3

    From the dropdown menu, select the new status.

    Reviewing the Candidates tab

    To select variants with a particular tag, use the Filter candidates dropdown menu in the top right corner. You can select from Most Likely, Candidate, Incidental, Carrier, Not Reviewed, or any custom tags used in your organization.

    For each variant on the Candidates tab, you can explore the suggested diagnosis, gene symbol, main variant details, and variant tag.

    When a variant is found in a gene with no known association with a disease, the possible diagnosis cannot be indicated. Such variants are displayed under the Gene of Unknown Significance title.

    All the relevant Most Likelies and Candidates fitting a compound heterozygous mode of inheritance are presented together. This refers to both confirmed and assumed compound heterozygosity (cases with at least one parent and singleton cases, respectively).

    If you want to inspect the complete variant information, click on the variant bar to continue to the Evidence page. You can visualize evidence in text or graphical format (Click on the interactive text in the top left corner: Show evidence as text or Show evidence graph to toggle between the two).

    NGS sample quality metrics

    Sample quality (overall)

    Ploidy*

    Coverage metrics:

    • Average coverage

    • % Bases with coverage >10x

    • % Bases with coverage >20x

    *Available only for whole genome FASTQ cases.

    Ploidy

    The Ploidy column displays results from the DRAGEN Ploidy Estimator, which detects aneuploidies and infers the sex karyotype in whole genome cases.

    Ploidy values are derived from the *.ploidy_estimation_metrics.csv DRAGEN output file.

    Ploidy estimation results

    Pass

    • All autosomes fall within the expected ploidy range. No large‑scale autosomal copy number deviation is detected.

    Fail

    • At least one autosome has a median ploidy score below 0.9 or above 1.1. Hover over the result to see which chromosomes are problematic.

    N/A

    • Ploidy metrics are not available.
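
The Pass/Fail/N/A logic can be expressed as a short Python sketch. It is illustrative only: the function name and the input dictionary are hypothetical, and reading the values from the *.ploidy_estimation_metrics.csv file is deliberately left out because the file layout is not described here. Only the 0.9 and 1.1 bounds come from the text.

```python
def classify_ploidy(autosome_medians, low=0.9, high=1.1):
    """Return (result, flagged_chromosomes) from per-autosome median ploidy scores."""
    if not autosome_medians:
        return "N/A", []  # ploidy metrics are not available
    flagged = [
        chrom for chrom, score in autosome_medians.items()
        if score < low or score > high
    ]
    return ("Fail" if flagged else "Pass"), flagged
```

For example, `classify_ploidy({"1": 1.0, "21": 1.48})` returns `("Fail", ["21"])`, which corresponds to the chromosomes shown when you hover over a failed result.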

    • Check ploidy early in case review to quickly identify potential large‑scale chromosomal abnormalities.

    • Verify sex karyotype by confirming whether the sex karyotype inferred from ploidy matches the sex validation results to rule out sample swaps.

    • A failed ploidy result does not confirm a clinical abnormality. It indicates aberrant copy number estimation and must be interpreted in the context of other QC metrics and genomic visualization.

    Ploidy evaluation in the Lab tab is available only for whole genome FASTQ cases.

    In whole genome FASTQ cases, ploidy is shown in both the DRAGEN QC report and the Lab tab. The DRAGEN pipeline automatically generates the *.ploidy_estimation_metrics.csv file, which the platform uses to display ploidy results. Under this workflow, ploidy appears consistently across the interface.

    In whole genome Bring your own DRAGEN (BYOD) VCF cases, ploidy is shown only in the DRAGEN QC report, never in the Lab tab. When a case is created from VCF, the DRAGEN metrics supplied in a *.metrics.tar.gz archive are used exclusively to generate the DRAGEN report and are not ingested into the Lab tab. The platform displays N/A for Ploidy in the Lab tab.

    Contamination

    The Contamination column reports whether a sample shows signs of DNA contamination, helping ensure data reliability before interpretation.

    Be mindful that when contamination is suspected in sequencing data, it could stem from various sources, including true contamination, sample mix-up, library preparation issues, or technical artifacts.

    Always confirm the issue with other quality checks.

    Contamination is detected using Peddy calculations, which estimate the proportion of reads that do not match the expected genotype. This estimate is based on the idr_baf score.

    idr_baf stands for the interdecile range of the B-allele frequency—calculated as the difference between the 90th and 10th percentiles of the distribution of alt / (ref + alt) ratios across all variant sites.

    A larger idr_baf value indicates greater variability in allele balance, which may suggest sample contamination, particularly from another human DNA sample.
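
The definition of idr_baf translates into a few lines of Python. This is a sketch of the formula only, not Peddy's code: the function name is hypothetical, the way Peddy samples variant sites is not reproduced, and the percentile interpolation method (`inclusive`) is an assumption.

```python
from statistics import quantiles


def idr_baf(allele_counts):
    """Interdecile range of the B-allele frequency.

    allele_counts: iterable of (ref_count, alt_count) pairs, one per variant site.
    """
    bafs = [alt / (ref + alt) for ref, alt in allele_counts if ref + alt > 0]
    if len(bafs) < 2:
        raise ValueError("at least two covered sites are needed")
    # n=10 returns the nine decile cut points: index 0 is the 10th percentile,
    # index 8 is the 90th percentile
    deciles = quantiles(bafs, n=10, method="inclusive")
    return deciles[8] - deciles[0]
```

A sample whose alt / (ref + alt) ratios spread widely between 0 and 1 produces a larger value than a sample whose ratios cluster tightly, which is the behavior the metric relies on.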

    Contamination check results:

    • N/A No data is available (older cases or when idr_baf = 0.000).

    • No No contamination detected (idr_baf < 0.200).

    • Unlikely Possible contamination, but evidence is weak (0.200 ≤ idr_baf < 0.241).

    Prerequisites for accessing the DRAGEN QC report

    NGS case

    Option 1: FASTQ case

    1

    Run a FASTQ case in Emedgene.

    2

    Since DRAGEN analysis is integrated into Emedgene secondary analysis pipeline, QC reports are automatically generated in the system.

    Option 2: VCF case—Bring your own DRAGEN (BYOD)

    1

    Run DRAGEN analysis externally.

    2

    Prepare a TAR archive containing DRAGEN QC metrics files.

    3

    Upload the TAR archive and the sample VCF file to Emedgene.

    Array cases start from VCF input files. DRAGEN QC for array cases is supported on Emedgene v100.39.0 and later.

    1

    Run DRAGEN analysis externally using DRAGEN Array v1.3.0.

    2

    Upload the .annotated_cyto.json DRAGEN QC metrics file, the sample VCF file, and the .gt_sample_summary.json file to Emedgene.

    Bring Your Own Bucket

    If you have an Enterprise account and you would like the Emedgene-managed DRAGEN solution to save the DRAGEN output files in your own bucket, reach out to [email protected].

    Emedgene visualizes data in IGV directly from your AWS S3 bucket. To do so, enable CORS for the Emedgene application URLs.

    Case Type
    File Type
    Expected effect

    Adding patient info for the proband

    Options: Male, Female, Unknown.

    The default fixed value for Proband is Test Subject.

    Expected format: mm/dd/yyyy.

    Options: Affected, Healthy.

    The default value for Proband is Affected, but you may change it to Healthy.

    To add all relevant phenotypes for the Proband, use one of the following methods:

    Creating a single case

    This guide provides a step-by-step process for creating a new case via the user interface. Detailed instructions for each step are available in the corresponding pages of the .

    1. Click on the Add New Case button on the top navigation panel.

    2. At the page, choose the file type for your case analysis (FASTQ, gVCF, VCF, or Array).

    Candidates tab

    The Candidates tab displays all tagged variants, whether tagged by the AI Shortlist or manually by a user.

    Variants are automatically tagged as:

    • Most Likely Candidates and Candidates

      Variants prioritized by the AI Shortlist

    • Secondary findings

    Sample quality section

    The Sample quality section in the Lab tab gives you a quick view of the reliability of sequencing or array data used in your case.

    The metrics displayed in the Sample quality section and their underlying calculation vary depending on the case type (see below).

    NGS case

    Identical Variant Criteria

    Limitation

    Hom/Hemi count—in "# of Homozygotes" column
  • Allele number—in "Total" column

  • Allele number
  • Het count

  • Hom/Hemi count

  • The last 10 samples


    Curated databases

    Variant table

    Variant page

  • Likely Contamination suspected (0.241 ≤ idr_baf < 0.300).

  • Yes Contamination confirmed (idr_baf ≥ 0.300).

  • Hover over the value to display a tooltip showing the HET ratio (proportion of sites that are heterozygous) and the HET count (number of heterozygote calls in sampled sites).
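
The contamination thresholds can be summarized as a small Python function. The function name is hypothetical and the sketch only restates the documented ranges; treating a missing value the same as 0.000 is an assumption.

```python
def classify_contamination(idr_baf):
    """Map an idr_baf score to the label shown in the Contamination column."""
    if idr_baf is None or idr_baf == 0:
        return "N/A"       # no data (older cases or idr_baf = 0.000)
    if idr_baf < 0.200:
        return "No"        # no contamination detected
    if idr_baf < 0.241:
        return "Unlikely"  # possible contamination, weak evidence
    if idr_baf < 0.300:
        return "Likely"    # contamination suspected
    return "Yes"           # idr_baf >= 0.300
```

For instance, a score of 0.25 maps to "Likely" and a score of 0.15 maps to "No".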

    Tips:

    • Always review contamination results before starting interpretation to rule out technical issues that could explain unexpected variant calls.

    • Cross-check contamination results with other QC metrics (e.g., depth, ploidy, sex validation) for a more complete picture of sample quality.

    • For family cases, check that no contamination is flagged before relying on inheritance-based filters.

    Warnings:

    • Panels may be less reliable: For targeted panels, contamination estimates may be inaccurate due to the limited number of variants available for calculation. Use caution and cross-check with other QC metrics when interpreting these results.

    • Do not use in isolation: A "Likely" or "Yes" result should not immediately be considered diagnostic — review case setup, sequencing quality, and sample handling first.

    inheritance mode filters
    AI Shortlist: MNVs are not included in the AI shortlist.
  • ACMG Classification: Disabled for MNVs.

  • Up to v38.0:

    Limitations

    Clinical significance tab
    Clinical significance tab

    Best practices

    Ploidy availability by workflow

    sex validation
    Bring your own DRAGEN (BYOD) VCF cases

    This workflow is only supported for batch case upload via UI and CLI and API-based case creation.

    Array case

    VCF case—Bring your own DRAGEN (BYOD)

    This workflow is only supported for batch case upload via UI and CLI and API-based case creation.

    Prepare
    DRAGEN Array v1.3.0

    Error rate

    Sample quality
    Sex validation
    Ploidy
    Contamination
    Coverage metrics
    % Mapped reads
  • ClinGen Dosage region Dec 2022

  • Promoters from EPDnew human version V6

  • mtDNA CRS

  • RNA disease genes based on OMIM and HGNC (Dec 2022): ATXN8OS, TERC, IL12A-AS1, FAAHP1, NUTM2B-AS1, GAS8-AS1, RNU12, MIR204, IGHG2, SLC7A2-IT1, MIR99A, RMRP, XIST, MEG3, DIRC3, MIR17HG, GNAS-AS1, LRTOMT, LINC00299, DUX4L1, MIR137, MIR140, MIR605, SNORD118, RNU4ATAC, HELLPAR, IGHG1, IGHM, MIR19B1, RNU7-1, LINC00237, MIR2861, MIR4718, IGHV3-21, IGHV4-34, IGKC, KCNQ1OT1, MIR184, MIR96, H19, HYMAI, PCDHA9, UGT1A1, AFG3L2P1, DISC2, SNORA31, TRU-TCA1-1, PCDHGA4, TRAC, ECEL1P3, MIAT

  • ClinVar variants (ClinVar Dec 2022) with any pathogenic or likely pathogenic significance (and some drug responses that are affiliated with pathogenicity)

  • 50K STR regions based on the DRAGEN 4.0 Specification file

  • Promoters region (EPDnew human version 006, flanking 50bp)
  • Known STR regions (DRAGEN 4.0 specification file)

  • All microRNA genes (flanking 50bp based on HGNC)

  • Full mtDNA region

  • Research Genome

    None

    Whole Genome

    Full Genes

    Exome

    Clinical Regions

    Custom Panel

    Clinical Regions

    Default ROI kit details

    Full Genes

    Sources:

    CNV variants are not confined to regions of interest.

    Files

    Clinical Regions

    CNV variants are not confined to regions of interest.

    Files

    GRC38_full_genes.bed (876KB): GRCh38 Full Genes v100.39.0+
    GRC37_full_genes.bed (831KB): GRCh37 Full Genes v100.39.0+
    GRC38_full_genes (1).bed (887KB): GRCh38 Full Genes ≤v38.0
    GRC37_full_genes (1).bed (839KB): GRCh37 Full Genes ≤v38.0
    GRC38_clinical_regions.bed (6MB): GRCh38 Clinical Regions v100.39.0+
    GRC37_clinical_regions.bed (5MB): GRCh37 Clinical Regions v100.39.0+
    GRC38_clinical_regions (1).bed (5MB): GRCh38 Clinical Regions ≤v38.0
    GRC37_clinical_regions (1).bed (5MB): GRCh37 Clinical Regions ≤v38.0

    In the Cases table

    individual case page

    ACCOUNT_URL

    The account_url of the Azure account.

    Format: https://account_name.blob.core.windows.net/container_name

    CLIENT_ID

    application_id.

    Format: ########-####-####-####-############

    (letters/numbers)

    CLIENT_SECRET

    Value of the client_secret tuple (Value, Secret ID).

    Format: #####-#######-######-######

    (letters/digits/special chars)

    TENANT_ID

    ID of the tenant.

    Format: ########-####-####-####-############

    (letters/numbers)

    ACCOUNT_NAME

    An arbitrary name that the customer must supply to define the ACCOUNT_URL.

    Format: string

    CONTAINER_NAME

    An arbitrary name that the customer must supply to define the ACCOUNT_URL.

    Format: string
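
Before entering the values in Settings, you can sanity-check them with a short Python sketch. The function names are hypothetical; the URL pattern follows the ACCOUNT_URL format above, and the check for CLIENT_ID and TENANT_ID assumes the usual hyphenated 8-4-4-4-12 hexadecimal layout.

```python
import re

# Hyphenated 8-4-4-4-12 hexadecimal identifier (assumed layout of the two IDs)
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def build_account_url(account_name, container_name):
    """Compose ACCOUNT_URL from ACCOUNT_NAME and CONTAINER_NAME."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}"


def looks_like_guid(value):
    """True if the value matches the expected CLIENT_ID / TENANT_ID format."""
    return bool(GUID_PATTERN.fullmatch(value))
```

For example, `build_account_url("mylab", "cases")` gives `https://mylab.blob.core.windows.net/cases`, where `mylab` and `cases` are placeholder names.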

    Blob Integration Setup

    Create an App registration

    Azure Blob configuration

    For Internal support:

    Lab tab
    Genome view tab
    Versions tab
    Analysis tools tab
    Description: Optionally, provide a description for the service account.

    Click "Create and Continue".

    example:

    "storage object viewer" (read-only access)
  • Create the Service Account:

    • After assigning the roles, click "Done".

  • Generate and Download a Key:

    • Find your newly created service account, click the three dots on the right, and select "Manage Keys".

    • Click Add Key > Create New Key and choose the JSON format.

    • Download the key and store it securely, as it is used for authentication in your code or applications.

  • Encode the key in base 64:

    • Use the Python function: put this function and your JSON key file (here named json_file.json) in the same directory and run it.

    • Save the printed output.

  • Path: by default, fill with /. Otherwise, enter your path in the bucket. Separate directories with /.

    Set CORS Configuration via gcloud: Create a JSON file (cors.json) on your machine with the CORS rules. For example, it should look like:

    Add the storage provider to Emedgene platform:

    CORS - Visualisation

    LINK

    Select BaseSpace:

    Via Command Line

    Prerequisite

    Authenticate

    Via BaseSpace Developer Portal

    Adding BSSH account to your Emedgene account

    Via Command Line
    Via BaseSpace Developer Portal
    BaseSpace CLI Installation Page
    developer portal
    euc1
    aps2
    euw2
    Example - connect integration1 workgroup as storage.
    import json
    import base64
    
    
    def encode_json_to_base64(json_file):
        # Read JSON data from file
        with open(json_file, 'r') as file:
            json_data = json.load(file)
    
        # Convert the JSON data to a string
        json_str = json.dumps(json_data)
    
        # Encode the string to bytes, then to Base64
        json_bytes = json_str.encode('utf-8')
        base64_bytes = base64.b64encode(json_bytes)
    
        # Convert Base64 bytes back to a string
        base64_str = base64_bytes.decode('utf-8')
        # Print the Base64-encoded string
        print(base64_str)
    
    
    encode_json_to_base64('json_file.json')
    [
        {
          "origin": ["https://<host_name>.emg.illumina.com"],
          "method": ["GET"],
          "responseHeader": ["emgauthorization"],
          "maxAgeSeconds": 3600
        }
    ]
    # Linux
    $ wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs
    # Mac
    $ wget "https://launch.basespace.illumina.com/CLI/latest/amd64-osx/bs" -O $HOME/bin/bs
    # or
    $ brew tap basespace/basespace && brew install bs-cli
    # Windows
    $ wget "https://launch.basespace.illumina.com/CLI/latest/amd64-windows/bs.exe" -O bs.exe
    $ bs auth
    $ cat .basespace/default.cfg
    apiServer   = https://api.basespace.illumina.com
    accessToken = 

    .vcf.gz

    .gt_sample_summary.json (v37.0+, DRAGEN Array v1.2+)

  • .annotated_cyto.json (v100.39.0+, DRAGEN Array v1.3+)

  • Current limitation: CRAM input and reference compatibility

    Context

    Emedgene uses a specific (for example, hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1.tar.gz) for each DRAGEN version + genome reference (GRCh38 or GRCh37) combination. Both DRAGEN version and genome reference are configured per organization in Workbench & Pipeline settings.

    Key requirement

    When using CRAM files as input (instead of BAM), the same genome reference assembly file must be used during:

    Project VCF

    Make sure the proband sample is listed first to ensure correct downstream calculations.

    VCF

    Array results can be visualized in Genome View and Visualization tabs, and sample-level quality metrics are available in the Lab tab.

    Tips:

    • Choose the input type carefully — it cannot be changed after the case is created.

    • Keep file paths simple (avoid spaces, parentheses, or very long names >255 characters). This helps prevent errors during upload.
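
The path advice can be turned into a quick pre-upload check. This is a hypothetical helper, not a platform feature; it applies the 255-character limit to the file name portion of the path, which is an assumption about how the limit is meant.

```python
import os


def path_warnings(path, max_name_length=255):
    """Return a list of reasons a file path may cause upload problems."""
    warnings = []
    if " " in path:
        warnings.append("contains spaces")
    if "(" in path or ")" in path:
        warnings.append("contains parentheses")
    if len(os.path.basename(path)) > max_name_length:
        warnings.append(f"name longer than {max_name_length} characters")
    return warnings
```

A path such as `data/run1/sample_R1.fastq.gz` returns an empty list, while `data/GRC38_full_genes (1).bed` is flagged for both spaces and parentheses.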

    Warning:

    • If files are incomplete or corrupted, the case may still be created but will fail during processing. Double-check your files before uploading.

    • For large files (BAM/CRAM/FASTQ), browser upload is not recommended. Use , , or cloud-to-cloud transfer instead to avoid incomplete or truncated uploads.

    FASTQ

    CRAM (Output)

    Reanalysis will fail

    FASTQ

    VCFs

    Reanalysis will fail

    FASTQ

    CSV, etc

    Reanalysis will fail

    VCF

    BAM/CRAM (visualizations)

    Visualization will fail

    VCF

    VCF (input)

    Reanalysis will fail

    VCF

    CSV, etc

    Reanalysis will fail (will be fixed)

    This feature is only related to saving Dragen output files in your own bucket when using Dragen through Emedgene (without ICA).

    If you are looking to:

    • Import data from AWS S3 to Emedgene go to Manage data storages

    • Integrate any data storage with Emedgene go to Manage data storages

    • Download any data from Emedgene go to Manage S3 credentials



    Bring Your Own Bucket, also known as BYOB, enables you to control your DRAGEN file outputs.

    The Emedgene-managed DRAGEN solution saves the DRAGEN output files in a dedicated AWS S3 bucket that you can access using your S3 credentials.

    However, if you have an Enterprise account and you would like the Emedgene-managed DRAGEN solution to save the DRAGEN output files in your own bucket, reach out to [email protected] and follow these steps:

    Emedgene requires access to the root folder, so a dedicated bucket may be appropriate.

    Bucket policy should allow Emedgene user access to the bucket.

    Example bucket policy:

    Emedgene visualizes data in IGV directly from your AWS S3 bucket. In order to do it, you should enable CORS for the Emedgene application URLs.

    Example CORS policy:

    A test case must be run to validate that the managed DRAGEN pipeline finishes successfully and that all features are available in the platform.


    If you enable an AWS S3 Lifecycle policy to archive files or move them between S3 storage tiers, it might have an adverse effect on the platform.

    | Case Type | File Type | Expected effect |
    | --- | --- | --- |
    | FASTQ | FASTQ/BAM/CRAM (input) | Reanalysis will fail (will be fixed) |
    | FASTQ | CRAM (Output) | Reanalysis will fail |

    {
        Coming Soon
    }
    {
        Coming Soon
    }

    Bring Your Own Bucket is only available for Enterprise-level support accounts and requires Illumina support for setup.

    Bring Your Own Bucket

    1. Create an AWS bucket

    2. Edit Bucket policy

    3. Allow illumina.com and emedgene.com for CORS

    4. Test and validate the configuration with Illumina support
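    As a rough illustration of steps 2 and 3, the sketch below builds a bucket policy and a CORS rule set as JSON. Everything specific in it is an assumption: the bucket name, the principal ARN (a placeholder, not Emedgene's real identity), the list of allowed actions, and the wildcard origins derived from the illumina.com and emedgene.com domains named in step 3. Confirm the actual values with Illumina support during setup.

```python
import json

BUCKET = "my-dragen-output-bucket"  # hypothetical bucket name
# Placeholder principal; the real Emedgene identity must come from Illumina support.
EMEDGENE_PRINCIPAL = "arn:aws:iam::111122223333:user/emedgene-placeholder"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEmedgeneBucketListing",
            "Effect": "Allow",
            "Principal": {"AWS": EMEDGENE_PRINCIPAL},
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],  # assumed actions
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Sid": "AllowEmedgeneObjectAccess",
            "Effect": "Allow",
            "Principal": {"AWS": EMEDGENE_PRINCIPAL},
            "Action": ["s3:GetObject", "s3:PutObject"],  # assumed actions
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

# CORS rules in the JSON layout accepted by the S3 console.
cors_rules = [
    {
        "AllowedOrigins": ["https://*.illumina.com", "https://*.emedgene.com"],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedHeaders": ["*"],
        "ExposeHeaders": ["Content-Range", "Content-Length", "ETag"],
    }
]

print(json.dumps(bucket_policy, indent=2))
print(json.dumps(cors_rules, indent=2))
```

    The two statements are split because bucket-level actions apply to the bucket ARN while object-level actions apply to the `/*` resource.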

    With the BYOB solution, you manage your own data: if you accidentally delete or move the data, the integration with Emedgene might break. You are responsible for your disaster recovery plan (DRP) and data backup solutions.

    Managing AWS S3 Lifecycle policy

    One by one (Selection mode),
  • In a batch (Batch mode), or

  • Automatically infer disease-associated phenotypes (see Proband Suspected Disease Condition below).

  • Please follow the steps described below for each phenotype:

    1. Enter an HPO term (e.g., Hypoplasia of the ulna), an HPO ID (e.g., HP:0003022), or a descriptive phenotype name (e.g., Underdeveloped ulna) in the search box.

    2. Select a matching term from a dropdown menu and press Complete after you've added all the terms and additional patient information below.

    Paste a list of comma-separated HPO terms or HPO IDs in the search box and press Complete.
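    A comma-separated list can be sanity-checked before pasting it in Batch mode. The sketch below is illustrative only: the helper name `parse_batch_input` is hypothetical, HPO IDs are recognized by the `HP:` prefix followed by seven digits, and the limit of 100 phenotypes is the one stated in the notes on this page. It does not check that a term exists in HPO or belongs to the Phenotypic abnormality branch.

```python
import re

HPO_ID_PATTERN = re.compile(r"^HP:\d{7}$")
MAX_PHENOTYPES = 100  # maximum number of proband phenotypes


def parse_batch_input(text: str):
    """Split comma-separated input into (HPO IDs, free-text term names)."""
    entries = [part.strip() for part in text.split(",") if part.strip()]
    if len(entries) > MAX_PHENOTYPES:
        raise ValueError(
            f"{len(entries)} phenotypes given; the maximum is {MAX_PHENOTYPES}"
        )
    ids = [e for e in entries if HPO_ID_PATTERN.match(e)]
    names = [e for e in entries if not HPO_ID_PATTERN.match(e)]
    return ids, names
```

    Because the input is split on commas, a term name that itself contains a comma would be split in two; using HPO IDs avoids that ambiguity.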

    Enter the disease name in the search box, select a matching term from a dropdown menu and press Complete. All the associated phenotypes will be automatically added to the Proband Phenotypes.

    Selecting a disease only fetches its associated phenotypes for convenience—it does not affect downstream analysis. You can edit this list to match the proband’s clinical presentation. Only the phenotypes you keep or add influence analysis, not the disease selection itself.

    To remove any phenotype described for the disease but not observed in your patient, click the button next to the HPO term in the Proband Phenotypes list.

    Enter the suspected disease penetrance as a percentage.

    Select the appropriate category to indicate the severity of the disease symptoms observed in the patient: Mild, Moderate, Severe, Profound.

    Mark the checkbox if applicable.

    Paternal and Maternal. Enter the ethnicity name in the search box and select a matching term from a dropdown menu.

    1. Fill in the boxes:

    Note: The fields marked with (*) are mandatory.

    1. Sex (*)

    Handling a proband sample with unknown sex

    When a sample is user-assigned "Unknown" sex, the system assumes "Female". This affects CNV interpretation on sex chromosomes in case the genetic sex is actually male:

    • Chromosome X: CN = 2 is considered reference (REF) for a female genome, so CNVs with two copies are hidden by default. This may cause chromosome X duplications to be missed.

    • Chromosome Y: CN = 0 is considered reference (REF) for a female genome, so CNVs with zero copies are hidden by default. This may cause chromosome Y deletions to be missed.

    To include these variants in the analysis, enable the Include Reference Homozygosity and No Coverage Calls toggle in Workbench & Pipeline Settings.
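    The effect of the "Unknown" sex assumption can be sketched as a small lookup. This is an illustration of the behaviour described above, not platform code: the function names are hypothetical, pseudoautosomal regions are ignored, and the male reference values (one copy of X and one of Y) are a standard-genetics assumption rather than something stated on this page.

```python
def expected_copy_number(chromosome: str, sex: str) -> int:
    """Reference copy number for a chromosome, ignoring pseudoautosomal regions."""
    # "Unknown" is treated as "Female", as described above.
    effective_sex = "Female" if sex == "Unknown" else sex
    if chromosome == "X":
        return 2 if effective_sex == "Female" else 1
    if chromosome == "Y":
        return 0 if effective_sex == "Female" else 1
    return 2  # autosomes


def hidden_by_default(chromosome: str, copy_number: int, sex: str) -> bool:
    """A call equal to the reference copy number is treated as REF and hidden."""
    return copy_number == expected_copy_number(chromosome, sex)
```

    For a genetically male sample entered as "Unknown", `hidden_by_default("X", 2, "Unknown")` is true, which is the missed chromosome X duplication described above; with the sex entered as "Male" the same call is not hidden.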

    2. Relationship

    3. Date of Birth

    4. Medical Condition (*)

    5. Proband Phenotypes (*)

    Notes:

    • The maximum permissible number of proband phenotypes is 100.

    • Some diseases may not suggest phenotypes automatically if the source database does not provide them. You can add phenotypes manually in these cases.

    Warning: Select valid HPO phenotypic abnormality terms

    When adding patient phenotypes, ensure that all selected HPO terms originate from the “Phenotypic abnormality (HP:0000118)” branch of the HPO ontology. Terms outside this branch are not supported for case analysis, as they do not represent clinical phenotypes and may lead to incomplete or inaccurate downstream results.

    Selection mode

    Batch mode

    Notes:

    • A popup notification will appear at the bottom of the page if any input HPO term or HPO ID is unknown.

    • Only phenotypes from the 'Phenotypic abnormality' HPO branch are currently supported.

    6. Proband Suspected Disease Condition.

    Note:

    Searching for a disease name may return several entries with the same title.

    This happens because the disease appears in multiple gene–disease sources, each with its own identifiers and evidence associations. These entries are not merged automatically, so choosing different items may return different sets of phenotypes.

    7. Suspected Disease Penetrance

    8. Suspected Disease Severity

    9. Consanguinity

    Note: If consanguinity is identified in the proband's parents but this box was not selected during case creation, a discrepancy alert is shown in the Lab tab.

    10. Patient Ethnicities

    2. Select Complete.

    Click Next to proceed.

    The page is divided into two panels: Create family tree (left) and Add patient information (right).

    1. Use the visual tool to build the pedigree.

    2. Add Clinical Notes (optional) in free text.

    3. Select suspected Inheritance mode(s) (for record only; not used in the analysis).

    4. Decide whether to include Secondary findings in the proband for the AI Shortlist (checkbox).

    For each family member:

    1. Add a sample (use a unique file path unless reusing samples).

    1. Fill in a sample name (for VCF input, this must match the header in the file).

    2. Complete the required patient details: for a proband and for non-proband samples.

    Click Next to proceed to the Case info screen.

    Here you define how the analysis will run:

    1. Case type: Choose Array, Custom Panel, Exome, Whole Genome, or Other.

      • For Exome cases, variants outside exons ±50 bp are automatically filtered.

    2. Carrier Analysis: Optional checkbox. Requires a targeted gene list.

    3. Sequencing Information:

      1. Select an enrichment kit (if applicable) or "No kit".

      2. If provided, kit details (Lab, Machine, Reagents, Expected coverage) will be used to compare coverage depth and breadth.

      3. If no kit is provided, RefSeq coding regions will be used as reference.

    4. Gene list options:

      1. All genes

      2. Phenotype-based genes

      3. Existing gene list

    5. Preset group: Select the Preset group appropriate for this case type.

      • If none is selected, the default Preset group is applied automatically (marked as default).

    6. Consent: Confirm subject consent for extended sharing.

    Additional case info (optional):

    • Indication for testing (free text).

    • Labels (choose from predefined organization labels; these cannot be changed later).

    At the Summary stage, confirm case type, gene list, and other selections.

    After the case is created:

    • The Case ID is displayed.

    • You may add participants so colleagues receive notifications on status changes or updates.

    Caution: Refreshing or leaving the page, closing the Add new case tab, or a power failure on your computer before you have finished adding a new case will result in loss of the case creation progress.

    Step 1: Start a new case

    section
    Select sample type

    Step 2: Build the family tree and add patient information

    (left)

    Add patient information (right)

    • The Add New Case flow does not validate that sample IDs are unique or that input files are uncorrupted. Please ensure sample IDs are unique and that input files are valid before creating the case.

    • If a QC metrics file (metrics.tar.gz) is uploaded from BSSH, it will not be processed.


    Step 3: Case info screen

    Note: Combining/merging gene lists from the Add New Case UI is supported only via the UI — this is not available from the API or batch upload.

    Caution: Clicking Next here will finalize case creation. After delivery, only the proband’s phenotypes can be edited without reanalysis.

    Step 4: Done screen

    Note: In Illumina Cloud environments, users may still appear as available participants even after being removed from an IAM workgroup. These users do not have access to Emedgene, and accidentally adding them as participants to a case does not pose any security or access risk.

    Variants that meet ACMG-defined criteria for secondary findings and are automatically tagged with an Incidental tag (if enabled)
  • Carrier variants

    Variants identified by the carrier analysis pipeline (if enabled)

  • During review in the Candidates tab, additional tags can be applied to a variant alongside the original automatic tag.

    A set of the most promising variants based on scores calculated by the AI Shortlist. These variants are initially tagged by the system.

    Variant types assessed:

    • SNVs and indels

    • CNVs

    • SVs

    • mtDNA variants

    • STRs

    Secondary findings are variants that are automatically assigned the Incidental tag when they meet the criteria for secondary findings as defined by the American College of Medical Genetics and Genomics (ACMG).

    Tagging is applied only when the Secondary findings checkbox is selected during case creation.

    A variant is automatically tagged as an incidental (secondary) finding if it meets all of the following criteria:

    1. Classification: Previously classified as pathogenic or likely pathogenic in ClinVar or Curate variant databases

    2. Zygosity: Heterozygous or homozygous (only homozygous for the HFE gene)

    3. Allele frequency: Less than 5%

    4. Read depth: 10× or higher

    5. Variant quality: Any value but LOW

    6. Affected gene: Listed in the ACMG SF v3.2 medically actionable gene list for reporting secondary findings in clinical exome and genome sequencing (PMID: 37347242)

    ACTA2, ACTC1, ACVRL1, APC, APOB, ATP7B, BAG3, BMPR1A, BRCA1, BRCA2, BTD, CACNA1S, CALM1, CALM2, CALM3, CASQ2, COL3A1, DES, DSC2, DSG2, DSP, ENG, FBN1, FLNC, GAA, GLA, HFE, HNF1A, KCNH2, KCNQ1, LDLR, LMNA, MAX, MEN1, MLH1, MSH2, MSH6, MUTYH, MYBPC3, MYH11, MYH7, MYL2, MYL3, NF2, OTC, PALB2, PCSK9, PKP2, PMS2, PRKAG2, PTEN, RB1, RBM20, RET, RPE65, RYR1, RYR2, SCN5A, SDHAF2, SDHB, SDHC, SDHD, SMAD3, SMAD4, STK11, TGFBR1, TGFBR2, TMEM127, TMEM43, TNNC1, TNNI3, TNNT2, TP53, TPM1, TRDN, TSC1, TSC2, TTN, TTR, VHL, WT1.
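    The six tagging criteria can be read as one combined filter. The sketch below is a simplified illustration, not the platform's implementation: the `Variant` record and its field names are hypothetical, the gene set is a small subset of the ACMG SF v3.2 list above, and allele frequency is assumed to be stored as a fraction.

```python
from dataclasses import dataclass

# Small illustrative subset; the full ACMG SF v3.2 list is given above.
ACMG_SF_GENES = {"BRCA1", "BRCA2", "HFE", "LDLR", "TP53"}


@dataclass
class Variant:  # hypothetical record; field names are illustrative
    gene: str
    classification: str      # e.g. "pathogenic", "likely pathogenic", "benign"
    zygosity: str            # "heterozygous" or "homozygous"
    allele_frequency: float  # fraction, so 5% is 0.05
    read_depth: int
    quality: str             # e.g. "HIGH", "MODERATE", "LOW"


def is_secondary_finding(v: Variant) -> bool:
    """Apply all six criteria; every one of them must hold."""
    # HFE is tagged only when homozygous.
    allowed_zygosity = (
        {"homozygous"} if v.gene == "HFE" else {"heterozygous", "homozygous"}
    )
    return (
        v.classification.lower() in {"pathogenic", "likely pathogenic"}
        and v.zygosity.lower() in allowed_zygosity
        and v.allele_frequency < 0.05
        and v.read_depth >= 10
        and v.quality.upper() != "LOW"
        and v.gene in ACMG_SF_GENES
    )
```

    In the platform, tagging also depends on the Secondary findings checkbox being selected at case creation, which this sketch does not model.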

    Variants identified by the Carrier analysis pipeline. Carrier variants are automatically tagged only if you've selected the Carrier Analysis checkbox while creating a case. Analysis requirements and a list of targeted regions are specified by the organization's manager. This Carrier analysis flow is implemented by request.

    Variants that were manually selected to be reported.

    Variant tagging by the AI Shortlist

    Assigning variant tags during review

    The Candidates tab presents:

    (Secondary)*

    Tagging criteria

    ACMG SF v3.2 gene list

    *In Emedgene, the terms "incidental findings" and "secondary findings" both refer to secondary findings as defined by the ACMG, due to historical usage.

    When Emedgene was first released, the term “incidental findings” was adopted in alignment with the clinical genomics standard at the time. The 2013 ACMG recommendations defined incidental findings as “the results of a deliberate search for pathogenic or likely pathogenic alterations in genes that are not apparently relevant to a diagnostic indication for which the sequencing test was ordered” (PMID: 23788249).

    As the field evolved, the ACMG and broader clinical community began to distinguish between “incidental findings” (unexpected, not actively sought) and “secondary findings” (intentionally analyzed and reportable). This shift was reflected in the updated 2016 ACMG guidance (PMID: 27854360).

    To reflect this change, Emedgene introduced the term “secondary findings” into the platform. However, “incidental findings” remains in use throughout the platform for technical consistency.

    Carrier

    In Report and other custom

    *

    :

    • Average coverage

    • % Bases with coverage >10x

    • % Bases with coverage >20x

    *Available only for whole genome FASTQ cases.

    Array case

    (overall)

    For eligible cases, users can also review the results of DRAGEN QC: interactive DRAGEN QC report and DRAGEN QC metric files.

    Sample quality (overall)

    Sample quality metrics

    DRAGEN QC

    DRAGEN QC for array samples is available from version 100.39.0 onwards.

    Phenotypes

    Phenotypes submitted for the proband.

    Status

    Current case status.

    arrow-pointer You can update the status directly from the table.

    Type

    Case type: whole genome, exome, custom panel, or array.

    Label

    Custom case labels. arrow-pointer Click the pencil icon to add a new label, select an existing one, or remove a label from the case.

    Quality

    Overall case quality: Passed, Failed, or dash Not available.

    arrow-pointer Hover over the icon for a brief summary, or view detailed results in the Lab tab. sort Sortable (Passed > Failed > Not available).

    Creation date

    Date the analysis was initiated. sort Sortable.

    Due date

    Customizable due date.

    arrow-pointer Click the calendar icon to set a date. To change it, click the existing date and select a new one. Remove the date by clicking the cross icon. sort Sortable.

    Participants

    Users subscribed to case updates.

    arrow-pointer To receive email alerts for case updates, click the Subscribe icon. To unsubscribe, hover over your avatar and click the button.

    Lab directors and other authorized roles can assign cases directly to analysts, making workload management easier.

    User groups

    User groups defined in Settings; each group appears as its own column.

    Case processing stage

    Case details

    Dates

    Participants

    single case creation
    batch case creation

    Case status

    Case status reflects the current stage of case processing, either by the Emedgene platform or your team. Statuses enable case progress tracking and support a consistent, collaborative case review workflow.

    You can view and update the current case status in the Cases table and in the top bar of the individual case page.

    Cases table
    Case page

    Out-of-the-box and custom statuses

    Emedgene provides out-of-the-box statuses, as well as the option to create custom statuses to match your case review workflow. To create, remove, or reorder case statuses for your organization, go to Settings > Management > Case statuses.

    Case status history

    Each time a case status is updated, the change is logged and recorded in the case activity history.

    To review case status updates:

    1. In the relevant case, open the Case details panel.

    2. Select the Activity tab.

    3. Filter logs by selecting Case-related activities from the dropdown list.

    Result: The status history is shown, along with other case-level logs.

    Bring Your Own Key

    Bring Your Own Key (BYOK) is a security feature that allows organizations to use their own encryption keys to protect their data. This ensures that they maintain control over their encryption keys and, consequently, their data.

    Illumina integrates with leading Key Management Services (KMS), including Azure Key Vault and AWS KMS, so organizations can maintain full control over their encryption keys. These integrations combine Illumina’s Bring Your Own Key (BYOK) feature with your preferred KMS provider to deliver robust key management and enhanced data security.

    Azure Key Vault is a cloud service that provides a secure way to store and manage sensitive information like API keys, passwords, and certificates. It offers robust features for key management, including key generation, storage, and lifecycle management.

    AWS Key Management Service (KMS) allows you to create and control encryption keys used to encrypt your data across a wide range of AWS services and applications. It provides centralized management of encryption keys and integrates seamlessly with other AWS services.


    The API server encrypts the organization's information before storing it in the database and decrypts it when needed (e.g., during pipeline execution). The key vault is managed by the organization.

    Variant effect and severity calculation

    Variant effect

    For each variant that is mapped to the reference genome, Emedgene uses Ensembl’s Variant Effect Predictor (VEP) and the RefSeq (NCBI) library of transcripts to calculate variant effect. VEP uses a set of consequence terms defined by the Sequence Ontology (SO), including immediately recognizable terms like “missense_variant” and “frameshift_variant” as well as some more esoteric ones like “non_coding_transcript_exon_variant”.

    The full list of terms, along with detailed descriptions and severity impact categories, can be found in the link below.

    Importantly, each variant has a "main_effect" and "main_gene" chosen based on the most prioritized transcript for this variant. Transcript prioritization depends on many different parameters and on the Emedgene pipeline version, as described here.

    Variant severity

    Variant severity, also known as variant impact, is a subjective assessment of the severity of a variant consequence.

    Prepare DRAGEN QC metrics files to be included in a NGS VCF case

    When creating NGS cases that start from VCF, you can create a browsable DRAGEN QC report from the DRAGEN metrics files. Due to security restrictions, CSV files are not directly ingested, but they can be included when packaged in a TAR file.

    1. Navigate to local directory containing metrics files for a specific sample.

    2. Define sample name as a variable samplename="NA12878".

    Demystifying the versions of GRCh38/hg38 reference genomes, how they are used in DRAGEN and their impact on accuracy (www.illumina.com)


    Sex validation
    Ploidy
    Contamination
    Coverage metrics
    % Mapped reads
    Error rate
    Sample quality
    Sex validation
    Autosomal call rate
    Call rate
    Log R deviation
    Most Likely Candidates and Candidates
    Incidental
    variant tags
    • The mapping/alignment stage (which produces the CRAM file)

    • The variant calling stage (Emedgene secondary pipeline)

    A mismatch in reference genome assembly files prevents the system from decompressing the CRAM file, leading to case analysis failure.

    Best practices

    • Confirm reference compatibility with your organization settings before launching a run

    • If you receive CRAM files from an external lab, verify the specific reference genome file used to generate them

    • If the reference is unknown or incompatible, convert CRAM → BAM and upload the BAM file instead

    Batch Upload
    CLI
    genome reference assembly file
    Include Reference Homozygosity and No Coverage Calls toggle
    Lab tab

    Create a new gene list

    • You may combine multiple gene lists into one, or add specific genes to an existing list during case creation. The merged list behaves like any other list in the platform.

    Keep file names under 255 characters and avoid spaces or parentheses in file paths.

  • Always ensure sample IDs are unique to prevent case failure.

  • If using joint gVCF input, place the proband first for accurate insufficient region calculation.

  • The UI does not allow reusing the same gVCF file for multiple samples.

  • Sequencing Information
    Gene list
    Preset group
    Create family tree
    Logo

    To configure encryption in Emedgene, you need the following information from Azure Key Vault:

    Application tokens:

    • Client Id

    • Tenant Id

    • Client Secret

    The key information:

    • Key URL

    Create a new application

    1. Navigate to App registrations.

    2. Click Register to create a new application and fill in the required details.

    3. After registration, copy and save the Application (Client) ID and Directory (Tenant) ID.

    Add a client secret

    1. In the left menu, select Certificates & Secrets.

    2. Click New client secret. Copy and save the Value (Client Secret) immediately, as it is shown only once.

    Please note the expiration date. If the secret expires, encryption will fail.

    Create a new key

    1. Click New Key (Create key vault).

    2. Specify the key vault name, region (for example, East US), and pricing tier.

    3. Click Next to go to Access Policies.

    4. Select Add access policy, and set Key permissions:

       • Key Management Operations

       • Cryptographic Operations: Decrypt, Encrypt, Unwrap Key, Wrap Key

    5. Set Secret permissions:

       • Secret Permission: Get

       • Select Principal: select the application you created earlier

    6. Finish with Review + create.

    Find key details

    1. Navigate to the newly created Key vault.

    2. In the left menu, select Keys, and then select the key.

    3. Select the current version.

    4. Copy the Key Identifier (Key URL):

    Description is coming soon.


    The API server will encrypt the client's information before storing it in a database and decrypt that information when needed (e.g., running the pipeline). The key vault is managed by the client, and Emedgene will only be provided with access to encrypt/decrypt functions in that key vault. This guarantees that clients control access to the information.

    Illustration of data flow when creating a case in the Emedgene platform:

    Illustration of data flow when reading case data from the Emedgene platform:

    A preliminary step to this solution is having a key vault owned by the client, and a key that Emedgene is given access to.

    The client will create an access policy in the key vault of type “Application” and provide the matching key and secret to Emedgene. The access policy must contain permissions to perform encrypt and decrypt actions.

    In order for Emedgene to integrate with the key, depending on the key vault provider, the client needs to provide the following information:

    • Client Id

    • Client Secret

    • Tenant Id

    • Key vault name

    • Key name
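    The key vault name and key name in this list can be read off the Key Identifier (Key URL) copied during the Azure setup. The sketch below assumes the public Azure cloud format `https://<vault-name>.vault.azure.net/keys/<key-name>/<key-version>`; the helper name `parse_key_identifier` and the example URL used to exercise it are hypothetical.

```python
from urllib.parse import urlparse


def parse_key_identifier(key_url: str) -> dict:
    """Split an Azure Key Vault key identifier into vault name, key name, version."""
    parsed = urlparse(key_url)
    host = parsed.hostname or ""
    segments = [s for s in parsed.path.split("/") if s]
    # Expect <vault>.vault.azure.net and a path of the form /keys/<name>[/<version>].
    if (
        not host.endswith(".vault.azure.net")
        or len(segments) < 2
        or segments[0] != "keys"
    ):
        raise ValueError(f"not a Key Vault key identifier: {key_url}")
    return {
        "key_vault_name": host.split(".")[0],
        "key_name": segments[1],
        "key_version": segments[2] if len(segments) > 2 else None,
    }
```

    Sovereign or national Azure clouds use different host suffixes, so the host check would need adjusting there.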

    Because some platform search capabilities run directly on the database, encrypted data cannot be searched directly. To overcome this, a hash-based search will be implemented as follows:

    • The case data will still be fully encrypted in the database, as it is today.

    • For specific fields that the customer defines as searchable, a hash of the value will be saved alongside the encrypted data.

    • Hashing will use SHA-256 with a securely generated random salt of 32 characters, which is added to the value.

    • The salt is unique and will not be used anywhere else in the platform.

    • When the user enters a search string, it will be hashed with each of the salt values, and the resulting hashes will be searched.
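    The scheme in the list above can be sketched with the Python standard library. This is a toy model under stated assumptions, not the platform's code: the class `SearchIndex` is a hypothetical in-memory stand-in for the stored hash column, the salt is prepended to the value (the text only says it is added), and the 32-character salt is produced as 16 random bytes in hexadecimal.

```python
import hashlib
import secrets


def new_salt() -> str:
    """Random 32-character salt (16 random bytes rendered as hex)."""
    return secrets.token_hex(16)


def salted_hash(value: str, salt: str) -> str:
    """SHA-256 of salt + value; prepending the salt is an assumption."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


class SearchIndex:
    """Toy stand-in for the hashes stored next to the encrypted data."""

    def __init__(self) -> None:
        self._rows = []  # list of (case_id, salt, stored_hash)

    def add(self, case_id: str, field_value: str) -> None:
        salt = new_salt()
        self._rows.append((case_id, salt, salted_hash(field_value, salt)))

    def search(self, query: str) -> list:
        # Hash the query once per stored salt and compare with the stored hash.
        return [
            case_id
            for case_id, salt, stored in self._rows
            if secrets.compare_digest(salted_hash(query, salt), stored)
        ]
```

    Because only hashes are compared, this kind of search finds exact matches only; a partial string such as a prefix produces a different hash and returns nothing.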

    Illustration of data flow when searching in Emedgene platform:

    Illustration of data flow when creating a case with searchable field in Emedgene platform:

    Scope

    BYOK is only available for Enterprise-level support accounts.

    BYOK setup

    For versions earlier than v100.39.0, BYOK setup requires Illumina Support.

    For versions v100.39.0 and later, you can complete the setup from Organization settings.

    Supported Key Management Services

    Azure Key Vault

    AWS KMS

    Risk of losing a key

    Losing the encryption key means that all data encrypted with that key will be inaccessible. This can lead to permanent loss of access to crucial information.

    It is crucial to securely store and manage your keys to prevent such risks.

    Setup

    Azure Key Vault Setup

    Azure Key Vault
    AWS Key Management Service

    Create a new application

    Add a client secret

    Create a new key

    Find key details

    AWS Key Management Service (KMS) Setup

    Please reach out to [email protected] to get help with this setup.

    Architecture

    Searching Encrypted Fields

    Appendix

    Appendix: Control flows text

    Write:

    Read

    Write Searchable

    Read Searchable

    Creating a case in emedgene platform
    Reading a case data from emedgene platform
    Severity is usually categorized as modifier, low, moderate or high:
    • Modifier severity is used for non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. Intergenic and non-coding variants are classic examples.

    • Low severity is used for variants that are assumed to be mostly harmless or unlikely to change protein function. This includes synonymous variants.

    • Moderate severity is used for non-disruptive variants that might change protein effectiveness, such as missense variants and in-frame insertions/deletions.

    • High severity is used for variants that are assumed to have a disruptive impact on the protein or its abundance, for example by causing protein truncation, loss of the open reading frame, or nonsense-mediated decay.

    Most of the time, variant effect and variant severity on Emedgene are consistent with VEP. However, genomics is a field defined by exceptions. There are key factors, outlined below, that the Emedgene genetics team believes are critical to account for when assigning severity.

    For small variants (SNV):

    1. Splice prediction: A small variant is upgraded to HIGH severity if its splicing prediction is high, or to MODERATE if its splicing prediction is moderate.

    2. Conservation: Synonymous variants and splice region variants that are highly conserved are upgraded to MODERATE.

    3. Non-coding RNA disease genes: The severity of a small variant is upgraded to MODERATE if the variant is within a gene from the list of RNA genes known to be associated with disease.
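    The three upgrade rules can be expressed as a function that only ever raises severity. This is a simplified sketch: the function and parameter names are hypothetical, the splice prediction and conservation inputs are assumed to have been computed elsewhere, and the real pipeline may combine the rules differently.

```python
from typing import Optional

SEVERITY_ORDER = ["MODIFIER", "LOW", "MODERATE", "HIGH"]


def upgrade(current: str, candidate: str) -> str:
    """Return the more severe of two levels, so severity is never downgraded."""
    return max(current, candidate, key=SEVERITY_ORDER.index)


def adjusted_severity(
    vep_severity: str,
    effect: str,
    splice_prediction: Optional[str] = None,  # "high", "moderate" or None
    highly_conserved: bool = False,
    in_disease_rna_gene: bool = False,
) -> str:
    severity = vep_severity
    # Rule 1: splice prediction.
    if splice_prediction == "high":
        severity = upgrade(severity, "HIGH")
    elif splice_prediction == "moderate":
        severity = upgrade(severity, "MODERATE")
    # Rule 2: conservation, for synonymous and splice region variants only.
    if highly_conserved and effect in {"synonymous_variant", "splice_region_variant"}:
        severity = upgrade(severity, "MODERATE")
    # Rule 3: RNA genes known to be associated with disease.
    if in_disease_rna_gene:
        severity = upgrade(severity, "MODERATE")
    return severity
```

    Using `upgrade` for every rule means a variant that VEP already rates HIGH stays HIGH even when a rule would only assign MODERATE.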

    For CNV/SV:

    VEP annotates CNVs with overlapping genomic features and designates them with the following effects: transcript amplification (DUP), feature elongation (DUP, INS), feature truncation (DEL), and transcript ablation (DEL). However, the severity assigned by VEP for CNVs does not reflect the complexity of CNV effects on protein function and in our experience is not suitable for genome analysis and filtering.

    On Emedgene, variants are annotated according to their overlap with three different types of regions: ‘coding regions’, ‘clinical regions’, and the ‘full gene’ region (see here for a more detailed description of the BED files used in the system).

    The region annotation is then used to assess severity for CNVs and SVs as follows:

    | Variant type | High | Moderate | Low | Modifier |
    | --- | --- | --- | --- | --- |
    | Deletion (DEL) | Coding regions | Clinical Regions and not in Coding regions | Full gene and not in Clinical Regions | No overlap with any BED |

    Table 1: CNV/SV severity table. For each category of CNV/SV, the types of regions that overlap a given variant required to trigger the severity classification are shown.
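    For the deletion row of Table 1, the classification reduces to checking the region types from most to least specific. The sketch below covers deletions only, since that is the row shown here; the function name and the boolean inputs (one per BED region type) are hypothetical.

```python
def deletion_severity(
    overlaps_coding: bool,
    overlaps_clinical: bool,
    overlaps_full_gene: bool,
) -> str:
    """Severity of a deletion from its overlap with the three region types."""
    if overlaps_coding:
        return "HIGH"
    if overlaps_clinical:      # clinical region, but no coding overlap
        return "MODERATE"
    if overlaps_full_gene:     # within the gene, outside clinical regions
        return "LOW"
    return "MODIFIER"          # no overlap with any BED
```

    Checking the regions in this order reproduces the "and not in" conditions of the table without spelling them out.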

    For STR variants:

    Emedgene uses an internal annotation for STR variants. More details are available on request from [email protected].

    Known limitations

    • The list of RNA genes known to be associated with disease is updated over time as part of pipeline updates.

    • Emedgene does not provide VEP annotation for non-coding regulatory data.

    Sequence Ontology (SO)
    link
    here

    3. Combine the find and tar commands to package the files into a tar.gz archive whose name ends in .metrics.tar.gz. The following command finds the files matching the required patterns and archives them:

    find . -type f \( -name "*.csv" -o -name "*.tsv" -o -name "*.counts" -o -name "*.counts.gz" -o -name "*.counts.gc-corrected" -o -name "*.counts.gc-corrected.gz" -o -name "*.ploidy.vcf" -o -name "*.correlation.txt.gz" -o -name "*.correlation.txt" -o -name "*.repeats.vcf" -o -name "*.ploidy.vcf.gz" -o -name "*.repeats.vcf.gz" -o -name "*.annotated_cyto.json" \) -print0 | tar -czf "${samplename}.metrics.tar.gz" --null -T -

    Passing the file list to tar with -print0 and --null -T - keeps file names that contain spaces intact and writes a single archive regardless of how many files match.

    4. Upload the metrics.tar.gz file to the storage location used for creating cases.

    5. Add metrics.tar.gz to the case creation API JSON payload using the corresponding storage ID. If the file name does not contain the extension (e.g. files from BaseSpace), make sure that "sample_type": "dragen-metrics" is set within the JSON payload.

    6. The DRAGEN report link is available once your case has been delivered.
    Case creation API JSON

    Learn more

    Case statuses in a case lifecycle

    A visual overview of case status transitions and a reference description of each status.

    How to update a case status

    Update case status.

    Case status management

    Create custom case statuses and reorder them according to your workflow.

    updated
    Case details panel

    Transcript prioritization logic

    Each variant has a main_effect and main_gene chosen based on the most prioritized transcript for this variant. This selection influences how variants are displayed, interpreted and classified across the platform.

    From v39.0+, Emedgene introduces improvements to Curate transcript prioritization and updates the RNA gene prioritization logic.

    Transcript prioritization (v39.0+)

    Emedgene uses VEP and EFF for transcript annotations and supports organization-defined canonical and preferred transcripts from Curate.

    1. VEP transcripts are prioritized over EFF transcripts.

    2. If the case is a Virtual Panel, prioritize transcripts from genes in the case gene list (not applied for Boosted Genes panel types).

    3. Prioritize transcripts defined in Curate variants.

      • Curate variant-level preferred transcripts now receive high priority.

      • Requires the new organization setting, enabled by Illumina Bioinformatics support.

    4. Prioritize RNA genes associated with disease (see Appendix 1: Updated RNA gene list).

      • This rule does not apply to upstream or downstream RNA variants.

      • RNA gene prioritization has been refined in v39.0.

    5. De-prioritize readthrough biotype transcripts.

    6. Prioritize based on variant impact: HIGH → MODERATE → LOW → MODIFIER.

    7. Prioritize intronic > UTR > upstream effects (see Appendix 2 for MODIFIER effect prioritization).

    8. Prioritize organization canonical transcripts.

      • Defined in Curate

      • Always applied; no additional settings required

    9. Prioritize canonical transcripts based on APPRIS.

    10. Prioritize transcripts from genes in the case gene list.

    11. Prioritize genes without a " — " in their symbol.
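    An ordered rule list like this one maps naturally onto a tuple sort key, where earlier rules dominate later ones. The sketch below is an illustration only: the `Transcript` record and its field names are hypothetical, rules 7 and 11 are left out for brevity, and it follows the list literally without modelling the v39.0 refinement of RNA gene prioritization.

```python
from dataclasses import dataclass

IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


@dataclass
class Transcript:  # hypothetical record; field names are illustrative
    name: str
    source: str                      # "VEP" or "EFF"
    impact: str                      # HIGH / MODERATE / LOW / MODIFIER
    gene_in_case_list: bool = False
    curate_preferred: bool = False
    disease_rna_gene: bool = False
    readthrough: bool = False
    org_canonical: bool = False
    appris_canonical: bool = False


def priority_key(t: Transcript, virtual_panel: bool = False) -> tuple:
    # Tuples compare element by element and False sorts before True,
    # so "preferred" flags are negated to make them sort first.
    return (
        t.source != "VEP",                             # rule 1
        not (virtual_panel and t.gene_in_case_list),   # rule 2
        not t.curate_preferred,                        # rule 3
        not t.disease_rna_gene,                        # rule 4
        t.readthrough,                                 # rule 5
        IMPACT_RANK[t.impact],                         # rule 6
        not t.org_canonical,                           # rule 8
        not t.appris_canonical,                        # rule 9
        not t.gene_in_case_list,                       # rule 10
    )


def main_transcript(transcripts, virtual_panel: bool = False) -> Transcript:
    """Pick the transcript with the smallest key, i.e. the highest priority."""
    return min(transcripts, key=lambda t: priority_key(t, virtual_panel))
```

    A later rule only matters when all earlier rules tie, which is why a VEP transcript with LOW impact still ranks above an EFF transcript with HIGH impact in this sketch.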

    From version 39.0, Emedgene has changed how RNA genes are prioritized relative to protein-coding genes.

    Prior to v39.0, if a variant overlapped an RNA gene from the prioritized list, the RNA transcript was often chosen as the main_gene, even when a protein-coding gene had a more impactful variant.

    Starting with v39.0"

    • Protein-coding genes with stronger effects now take priority over RNA genes.

    • RNA genes are still considered, but no longer override coding transcripts with higher significance.

    This results in more clinically meaningful main_gene selection.

    Here is a list of ordered rules for transcript prioritization:

    1

    VEP transcripts are prioritized over EFF transcripts.

    2

    If the case is a virtual panel, prioritize transcripts from genes in the case gene list (but not for Boosted Genes type panels).

    3

    Prioritize RNA genes associated with disease (see Appendix 1 for the list of prioritized RNA genes). Importantly, this does not apply to upstream and downstream RNA variants.

    Here is a list of ordered rules for transcript prioritization:

    1

    VEP transcripts are prioritized over EFF transcripts.

    2

    If the case is a virtual panel, prioritize transcripts from genes in the case gene list (but not for Boosted Genes type panels).

    3

    Prioritize RNA genes associated with disease (see Appendix 1 for the list of prioritized RNA genes). Importantly, this does not apply to upstream and downstream RNA variants.

    Case statuses in a case lifecycle

    In Emedgene, case status indicates the current stage of a case—from data upload through analysis, review, and results finalization. Statuses are assigned either automatically by the system or by authorized users, depending on the workflow stage and user permissions.

    Different statuses require different IAM scopes/Emedgene roles for assignment and reassignment.

    Case statuses are grouped into three categories based on control type:

    • System-controlled: Assigned automatically by the system; cannot be reassigned by users.

    • User-controlled: Assigned and reassigned by authorized users.

    • System-assigned, user-reassignable: Assigned automatically by the system but can be reassigned by authorized users.

    Case statuses by origin:

    • Out-of-the-box: Default options provided by the platform.

    • Custom: User-configured to align with specific workflows.

    Each status represents a distinct stage in the case lifecycle. Figure 1 shows the possible transitions between statuses and the control type for each assignment, indicated by solid and dashed arrows. Table 1 provides an overview of case statuses.

    Table 1. Case status reference

    Case status
    Workflow stage
    Control type
    Origin

    Gain (DUP)

    Intragenic (coding regions but not entire gene region)

    Coding Regions / Clinical Regions not intragenic

    Full gene and not in Clinical Regions

    No overlap with any BED

    Insertion (INS)

    Coding regions

    Clinical Regions and not in Coding regions

    Full gene and not in Clinical regions

    None

    4

    De-prioritize biotype readthrough transcripts.

    5

    Prioritize based on impact in the following order: HIGH > MODERATE > LOW > MODIFIER.

    6

    Prioritize introns over UTR over upstream (Appendix 2: MODIFIER effects prioritization).

    7

    Prioritize organization canonical transcripts (Defined in Curate. Always applied, no settings needed).

    8

    Prioritize canonical transcripts (Based on Appris).

    9

    Prioritize transcripts from genes in the case gene list.

    10

    Prioritize genes without "-" in their name.

    4

    De-prioritize biotype readthrough transcripts.

    5

    Prioritize based on impact in the following order: HIGH > MODERATE > LOW > MODIFIER.

    6

    Prioritize introns over UTR over upstream (Appendix 2: MODIFIER effects prioritization).

    7

    Prioritize organization canonical transcripts (defined in Curate; this parameter is implemented upon request).

    8

    Prioritize canonical transcripts (Based on Appris).

    Note: Curate transcript prioritization is available only for organizations that request the setting through Illumina Bioinformatics Support in v39.0.

    RNA gene prioritization (v39.0+)

    Refined selection logic

    Transcript prioritization v37.0, 38.0

    Transcript prioritization before v37.0

    Appendixes

    Appendix 1: List of RNA genes associated with disease

    ATXN8OS, GNAS-AS1, H19, HELLPAR, KCNQ1OT1, LINC00237, LINC00299, MEG3, MIAT, MIR137, MIR140, MIR184, MIR19B1, MIR204, MIR2861, MIR4718, MIR605, MIR96, MIR99A, RMRP, RNU12, RNU4-2*, RNU4ATAC, RNU7-1*, SNORD116-1, SNORD118, TERC, MT-TF, MT-RNR1, MT-TV, MT-RNR2, MT-TL1, MT-TI, MT-TQ, MT-TM, MT-TW, MT-TA, MT-TN, MT-TC, MT-TY, MT-TS1, MT-TD, MT-TK, MT-TG, MT-TR, MT-TH, MT-TS2, MT-TL2, MT-TE, MT-TT, MT-TP.

    *Added as part of pipeline v35.2

    The prioritized RNA gene list has been updated in v39.0 to include newly supported genes and remove those with limited pathogenic evidence.

    Added RNA genes (v39.0): CHASERR, MIR17HG, RNU2-2, RNU5A-1, RNU5B-1, TRU-TCA1-1, SNORA31

    Removed RNA genes (v39.0): HELLPAR, LINC00237, GNAS-AS1, MEG3, LINC00299
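As a small worked example, the v39.0 list can be derived from the earlier list and the two change lists above. This sketch assumes the v39.0 list is exactly "earlier list minus removed genes plus added genes"; the variable and function names are illustrative.

```python
# Earlier prioritized list, transcribed from Appendix 1 (asterisks dropped).
PRE_V39 = (
    "ATXN8OS GNAS-AS1 H19 HELLPAR KCNQ1OT1 LINC00237 LINC00299 MEG3 MIAT "
    "MIR137 MIR140 MIR184 MIR19B1 MIR204 MIR2861 MIR4718 MIR605 MIR96 MIR99A "
    "RMRP RNU12 RNU4-2 RNU4ATAC RNU7-1 SNORD116-1 SNORD118 TERC "
    "MT-TF MT-RNR1 MT-TV MT-RNR2 MT-TL1 MT-TI MT-TQ MT-TM MT-TW MT-TA MT-TN "
    "MT-TC MT-TY MT-TS1 MT-TD MT-TK MT-TG MT-TR MT-TH MT-TS2 MT-TL2 MT-TE "
    "MT-TT MT-TP"
).split()

ADDED_V39 = "CHASERR MIR17HG RNU2-2 RNU5A-1 RNU5B-1 TRU-TCA1-1 SNORA31".split()
REMOVED_V39 = {"HELLPAR", "LINC00237", "GNAS-AS1", "MEG3", "LINC00299"}


def rna_genes_v39():
    """Earlier list without the removed genes, followed by the added genes."""
    return [g for g in PRE_V39 if g not in REMOVED_V39] + ADDED_V39


def is_prioritized_rna_gene(symbol):
    return symbol in set(rna_genes_v39())
```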

    Appendix 2: MODIFIER effects prioritization

    Order of modifier effects:

    • intron_variant

    • 5_prime_utr_variant

    • 3_prime_utr_variant

    • non_coding_transcript_exon_variant

    • non_coding_transcript_variant

    • upstream_gene_variant

    • downstream_gene_variant

    • All other effects
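The order above can be applied as a simple rank lookup. The following is a minimal sketch, assuming that any effect not named in the list falls into the final "all other effects" bucket; the function name is illustrative.

```python
MODIFIER_ORDER = [
    "intron_variant",
    "5_prime_utr_variant",
    "3_prime_utr_variant",
    "non_coding_transcript_exon_variant",
    "non_coding_transcript_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
]


def modifier_rank(effect):
    """Lower rank means higher priority; unlisted effects share the last rank."""
    try:
        return MODIFIER_ORDER.index(effect)
    except ValueError:
        return len(MODIFIER_ORDER)


effects = ["downstream_gene_variant", "intergenic_variant", "intron_variant"]
ordered = sorted(effects, key=modifier_rank)
```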

    Warning: If Curate preferred transcripts are enabled, transcript selection may differ from previous versions. This may change the displayed main gene/effect for some variants.

    Tip: To ensure consistent results across teams, confirm whether Curate transcript prioritization is enabled for your organization.

    {
        "test_data":
        {
            "consanguinity": false,
            "inheritance_modes":
            [],
            "sequence_info":
            {},
            "type": "Whole Genome",
            "notes": "",
            "samples":
            [
                {
                    "bam_location": "",
                    "fastq": "NA12878-PCRF450-1",
                    "status": "uploaded",
                    "directoryPath": "",
                    "sampleFiles":
                    [
                        {
                            "filename": "NA12878-PCRF450-1.metrics.tar.gz",
                            "sample_type": "dragen-metrics",
                            "path": "/analysis_output/demo_data_germline_v4_3_6_v2-DRAGEN_Germline_Whole_Genome_4-3-6-v2-75b081e8-a8aa-433e-862b-a20d2d65e492/NA12878-PCRF450-1/NA12878-PCRF450-1.metrics.tar.gz",
                            "size": 0,
                            "storage_id": 420,
                            "status": "uploaded",
                            "vcf_column_name": "NA12878-PCRF450-1",
                            "vcf_column_names":
                            [
                                "NA12878-PCRF450-1"
                            ],
                            "loadingSample": false
                        },
                        {
                            "filename": "NA12878-PCRF450-1.hard-filtered.vcf.gz",
                            "sample_type": "vcf",
                            "path": "/analysis_output/demo_data_germline_v4_3_6_v2-DRAGEN_Germline_Whole_Genome_4-3-6-v2-75b081e8-a8aa-433e-862b-a20d2d65e492/NA12878-PCRF450-1/NA12878-PCRF450-1.hard-filtered.vcf.gz",
                            "size": 0,
                            "storage_id": 420,
                            "status": "uploaded",
                            "vcf_column_name": "NA12878-PCRF450-1",
                            "vcf_column_names":
                            [
                                "NA12878-PCRF450-1"
                            ],
                            "loadingSample": false
                        }
                    ],
                    "storage_id": 420,
                    "sampleType": "vcf"
                }
            ],
            "sample_type": "vcf",
            "patients":
            {
                "proband":
                {
                    "fastq_sample": "NA12878-PCRF450-1",
                    "gender": "Male",
                    "healthy": false,
                    "relationship": "Test Subject",
                    "notes": "",
                    "phenotypes":
                    [
                        {
                            "id": "phenotypes/EMG_PHENOTYPE_0001324",
                            "name": "Muscle weakness"
                        }
                    ],
                    "detailed_ethnicity":
                    {
                        "maternal":
                        [],
                        "paternal":
                        []
                    },
                    "zygosity": "",
                    "quality": "",
                    "dead": false,
                    "ignore": false,
                    "id": "proband"
                },
                "other":
                []
            },
            "diseases":
            [],
            "disease_penetrance": 100,
            "disease_severity": "",
            "boostGenes": false,
            "selected_preset_set": "",
            "incidental_findings": null,
            "labels":
            [],
            "gene_list":
            {
                "type": "all",
                "id": 1,
                "visible": false
            }
        },
        "should_upload": false,
        "sharing_level": 0
    }
    https://<key-vault-name>.vault.azure.net/keys/<key-name>/<key-version>
    Client->Emedgene API: Add New Test Request
    note right of Emedgene API: Process Request
    Emedgene API->Key Vault: PHI
    note right of Key Vault: Encrypt
    Key Vault->Emedgene API: Encrypted PHI
    Emedgene API->Emedgene DB: Store Encrypted PHI

    Client->Emedgene API: Get Test Request
    Emedgene DB->Emedgene API: Encrypted PHI
    Emedgene API->Key Vault: Encrypted PHI
    note right of Key Vault: Decrypt
    Key Vault->Emedgene API: Decrypted PHI
    Emedgene API->Client: Decrypted PHI

    Client->Emedgene API: Add New Test Request
    note right of Emedgene API: Process Request
    Emedgene API->Key Vault: PHI
    note right of Key Vault: Encrypt
    Key Vault->Emedgene API: Encrypted PHI
    Emedgene API->Emedgene DB: Get Salt
    Emedgene API->Emedgene API: Hash Value using Salt
    Emedgene API->Emedgene DB: Store Encrypted PHI + Hashed value

    Client->Emedgene API: Search string
    Emedgene API->AWS Secrets: Get Salt
    Emedgene API->Emedgene API: Hash string using Salt
    Emedgene API->Emedgene DB: Search hashed string
    Emedgene DB->Emedgene API: Search results
    Emedgene API->Client: Search results
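The last two diagrams describe searching encrypted PHI by storing a salted hash next to the ciphertext and hashing the search string with the same salt. The sketch below illustrates that idea only. The use of HMAC-SHA256 is an assumption (the diagrams say only "Hash Value using Salt"), the `EncryptedStore` class is a hypothetical stand-in for the database, and encryption itself is not modeled.

```python
import hashlib
import hmac


def searchable_hash(value: str, salt: bytes) -> str:
    """Deterministic keyed hash, so equal strings map to the same token."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


class EncryptedStore:
    """Toy stand-in for the database: ciphertext stored next to its hashed value."""

    def __init__(self, salt: bytes):
        self._salt = salt
        self._rows = []  # list of (hashed_value, encrypted_phi)

    def add(self, plaintext: str, encrypted_phi: bytes) -> None:
        # The ciphertext is assumed to come from a separate encryption step.
        self._rows.append((searchable_hash(plaintext, self._salt), encrypted_phi))

    def search(self, query: str):
        token = searchable_hash(query, self._salt)
        return [enc for hashed, enc in self._rows if hmac.compare_digest(hashed, token)]
```

Because the hash is deterministic, only exact matches of the stored string are found; partial or case-insensitive matching would need additional normalization that the diagrams do not describe.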

    "Pending sequencing"
    Case created; awaiting sequencing data.
    System-controlled. Exception: user-reassignable to "Trash bin"
    Out-of-the-box

    "Issue reported"
    The case failed to run. Please check the integrity of the uploaded files and ensure that the variant caller used is on the Emedgene list of accepted variant callers.
    System-controlled. Exception: user-reassignable to "Trash bin"
    Out-of-the-box

    "Reanalysis"
    The system is re-running the AI Shortlist algorithm.
    System-controlled
    Out-of-the-box

    "Uploading"
    Data upload in progress.
    System-controlled
    Out-of-the-box

    "In progress"
    Analysis currently running.
    System-controlled
    Out-of-the-box

    "Delivered"
    Analysis completed; case ready for review.
    System-assigned, user-reassignable
    Out-of-the-box

    Custom status
    Indicates a custom case processing stage between "Delivered" and "Finalized".
    User-controlled
    Custom

    "Finalized"
    The analysis and review of the case by the analyst group have been completed. Typically, assignment and reassignment of the "Finalized" status is restricted to organization managers and/or lab directors.
    User-controlled
    Out-of-the-box

    "Trash bin" (v38.0+) or "Move to trash" (≤v37.0)
    Case marked for deletion; access restricted. Typically, assignment and reassignment of the "Trash bin"/"Move to trash" status is restricted to organization managers and/or lab directors.
    User-controlled
    Out-of-the-box

    Case lifecycle
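The control types in Table 1 can be expressed as a small lookup plus one rule. This is a rough sketch under stated assumptions: the function and constant names are hypothetical, "authorized" stands in for whatever IAM scope or Emedgene role is required, custom statuses are treated as user-controlled, and the actual allowed transitions are those shown in Figure 1 rather than anything inferred here.

```python
TRASH = "Trash bin"

# Control type per out-of-the-box status, transcribed from Table 1.
CONTROL = {
    "Pending sequencing": "system",
    "Issue reported": "system",
    "Reanalysis": "system",
    "Uploading": "system",
    "In progress": "system",
    "Delivered": "system-assigned, user-reassignable",
    "Finalized": "user",
    TRASH: "user",
}

# System-controlled statuses that Table 1 lists as reassignable to "Trash bin".
TRASH_EXCEPTIONS = {"Pending sequencing", "Issue reported"}


def user_can_reassign(current, target, authorized):
    """True if a user may move a case from `current` to `target` (control-type check only)."""
    if not authorized:
        return False
    # Users cannot assign a system-controlled status; unknown names count as custom.
    if CONTROL.get(target, "user") == "system":
        return False
    if CONTROL.get(current, "user") == "system":
        return current in TRASH_EXCEPTIONS and target == TRASH
    return True
```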

    CSV format requirements for case upload

    General CSV format requirements

    The following are the general format requirements for a CSV file used to create multiple cases:

    1. The file must have a .csv extension.

    2. The file must contain a [Data] header.

    3. The row after the [Data] header must include the field names identifying the data in each column. The column names are case-sensitive.

    4. The row after the column-name header and each subsequent row represents a sample.

    5. Each column represents a data field.

    6. There must be no empty rows between the [Data] header and the last sample row.

    7. The number of cases per file cannot exceed 50.
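The general requirements above lend themselves to a quick pre-upload check. The sketch below is an illustration only, not the platform's validator. The function name and error messages are made up, and counting cases as distinct Family Id values is an assumption.

```python
import csv
import io

MAX_CASES_PER_FILE = 50


def _is_blank(row):
    return not any(cell.strip() for cell in row)


def check_batch_csv(filename, text):
    """Return a list of problems found; an empty list means the checks passed."""
    errors = []
    if not filename.endswith(".csv"):
        errors.append("file must have a .csv extension")

    rows = list(csv.reader(io.StringIO(text)))
    data_at = next((i for i, r in enumerate(rows) if r and r[0].strip() == "[Data]"), None)
    if data_at is None:
        errors.append("missing [Data] header")
        return errors

    body = rows[data_at + 1:]
    while body and _is_blank(body[-1]):  # blank lines after the last sample are ignored
        body.pop()
    if not body:
        errors.append("no column-name row after [Data]")
        return errors
    if any(_is_blank(r) for r in body):
        errors.append("empty row between [Data] and the last sample row")

    header = [name.strip() for name in body[0]]
    samples = [r for r in body[1:] if not _is_blank(r)]
    if "Family Id" in header:  # assumption: one case per distinct Family Id
        col = header.index("Family Id")
        families = {r[col].strip() for r in samples if len(r) > col}
        if len(families) > MAX_CASES_PER_FILE:
            errors.append("more than 50 cases in one file")
    return errors
```

Column names are compared exactly as written, since the requirements state they are case-sensitive.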


    Must be present in the sample table at all times.

    1. Case Type;

    2. Family Id;

    3. Phenotypes OR Phenotypes Id.

    If these fields are left empty, it will result in the creation of an empty sample.

    1. BioSample Name;

    2. Files Names;

    3. Storage Provider Id;

    This field is mandatory if Files Names is empty:

    1. Sample Type.

    This field is required if the "auto" option is used for Files Names (only relevant for BSSH):

    1. Default Project.

    The sample table may include these supported optional columns.

    1. Boost Genes

    2. Clinical Notes

    3. Date Of Birth

    The sample table may contain custom columns to suit your specific needs and include any relevant information that is important for your workflow.

    Each custom field must be assigned a unique name without spaces. Data from custom columns is saved per case under the Additional information section of Case Info.

    Field (column) name
    Expected input
    Field details
    Example

    Mandatory (highlighted in red), Conditionally mandatory (highlighted in orange), and Optional fields should be filled in according to the following rules.

    Field (column) name
    Expected input
    Field details
    Example

    For BSSH, it is necessary to use the actual names (numbers) instead of aliases.

    In version 37, we introduced an enhancement to the batch upload process that allows you to provide a human-readable path in your batch CSV for BSSH files.

    When a batch CSV includes a human-readable path, the system performs the following validations for paths in BSSH storage:

    1. Single File in the Path:

      • If the provided path contains exactly one file or dataset, the batch upload proceeds successfully.

    2. Two Files in the Path:

    • Multiple QCPassed Datasets: If two datasets in the same path are marked as QCPassed, the batch upload will fail with a descriptive error indicating the conflict.

    • Excessive Files in the Path: If more than two files are found for the provided path, the batch upload will fail, instructing the user to provide a more specific or valid path.

    • Enables customers to use intuitive, human-readable paths in their workflows.

    • Automatically handles dataset selection based on quality control status.

    Due Date

  • Execute now

  • Gender. See an important note

  • Gene List Id

  • Kit Id

  • Intersect Bed Id (38.0+)

  • Label Id

  • Opt In

  • Relation

  • Selected Preset. See an important note

  • Visualization Files

  • 24-02-2022

    Sample_Type

    Free text

    Custom

    Amniotic Fluid

    TRUE

    Case Type

    1. "Whole Genome" 2. "Exome" 3. "Custom Panel" 4. Array

    5. Custom case type

    Mandatory. Only considered for proband.

    Whole Genome

    Clinical Notes

    Free text

    Optional

    A 14-year-old boy with a visual acuity of 20/200 in both eyes in whom hearing loss was first noted at 5 years of age on routine screening; audiometry revealed sensorineural hearing loss.

    Date Of Birth

    Date "YYYY-MM-DD"

    Optional

    2013-01-22

    Default Project

    Free text

    Conditionally mandatory. Must be filled in if the "auto" option is used for Files Names (only relevant for BSSH).

    GIAB

    Due Date

    Date "YYYY-MM-DD"

    Optional

    2023-05-03

    Execute now

    1. "TRUE" 2. "FALSE"

    Optional. Default value is "TRUE". Use "FALSE" if you don’t want to run the case upon uploading the file. Only considered for proband.

    FALSE

    Family Id

    Free text

    Mandatory

    RM8392

    Files Names

    1. Semicolon-separated list of paths to .fastq, .fastq.gz, .vcf, .vcf.gz, .bam, .cram, .gt_sample_summary.json, .annotated_cyto.json files without spaces 2. "existing" 3. "auto" (BSSH)

    Conditionally mandatory. An empty sample will be created if the field is left blank. The "existing" option automatically locates FASTQ files based on the BioSample Name. Note: If data files for an existing case were sourced from the customer’s external bucket and later removed, attempting to create a case from those files will result in an error.

    Learn about the current limitation for CRAM file input. With the "auto" option, BSSH users can automatically locate FASTQ files based on the BioSample Name and Default Project provided. When using BSSH without the "auto" option, ensure that your file path is formatted correctly.

    /GIAB_cases/1/NA24385.dragen.hard-filtered.gvcf.gz;/QA_cases/Other/NA24385.dragen.cnv.vcf.gz;/QA_cases/Other/NA24385.dragen.repeats.vcf;

    Gender

    1. "F" 2. "M" 3. "U"

    Optional. Default value is "U". See an important note.

    M

    Gene List Id

    integer

    Optional. Must be the id of a previously defined Gene List. Only considered for proband.

    12345

    Kit Id

    integer

    Optional.

    <38.0: ID of a Region of interest BED.

    38.0+: ID of a Coverage BED. Must be the id of a previously defined kit. Only considered for proband.

    23456

    Intersect Bed Id (38.0+)

    integer

    Optional. ID of a Region of interest BED. Must be the id of a previously defined kit. Only considered for proband.

    78957

    Label Id

    integer

    Optional. Must be the id of a previously defined Case Label. Only considered for proband.

    34567

    Opt In

    1. "TRUE" 2. "FALSE"

    Optional. Indicates whether the case subject consented to the extended sharing of data with your network(s). Default value is "TRUE".

    FALSE

    Phenotypes

    1. Semicolon-separated list of HPO phenotype terms

    2. "Unaffected" is used for non-affected family members.

    Mandatory for the proband sample if Phenotypes Id is empty. The list must contain fewer than 100 terms. Non-HPO terms can be included if Phenotypes Id is empty.

    Abnormal pupillary function;Orthotopic os odontoideum;

    Phenotypes Id

    Semicolon-separated list of HPO phenotype IDs

    Mandatory for proband sample if Phenotypes is empty.

    The list must contain fewer than 100 IDs.

    HP:0007686;HP:0025375;

    Relation

    1. "proband" 2. "mother" 3. "father" 4. "sibling"

    Optional. Default value is "proband". Values "proband", "father", "mother" can be only used once per Family ID. One sample with Relation "proband" is required per Family ID.

    Mother

    Sample Type

    1. "FASTQ" 2. "VCF"

    Conditionally mandatory. Required if Files Names is empty. Only considered for proband.

    FASTQ

    Selected Preset

    1. Free text 2. "Default"

    Optional. Must be the name of a previously defined preset group. The specified preset group appears in the Presets tab for the case.

    If set to default, the default preset group is used.

    If left empty, no preset is applied.

    See an important note.

    Exome trio

    Storage Provider Id

    Integer

    Conditionally mandatory. Required if Files Names is not empty. Must be from the configured storage provider ID list.

    208

    Visualization Files

    Semicolon-separated list of paths to sequence alignment data files of extension .bam, .cram; .tn.bw, .baf.bw, .roh.bed, .lrr.bedgraph, .baf.bedgraph

    Optional

    /giab_project/NA24385.bam
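Some of the field rules above span several rows, for example the Relation rules per Family Id. The sketch below checks only those rules, with rows represented as dictionaries keyed by column name. The function name and messages are illustrative, and comparing Relation values case-insensitively is an assumption prompted by the capitalized "Mother" in the example column.

```python
from collections import Counter, defaultdict

# Relation values that may appear at most once per Family Id.
UNIQUE_PER_FAMILY = ("proband", "mother", "father")


def check_relations(rows):
    """rows: list of dicts with 'Family Id' and an optional 'Relation' value."""
    counts = defaultdict(Counter)
    for row in rows:
        # An empty or missing Relation falls back to the documented default.
        relation = (row.get("Relation") or "proband").strip().lower()
        counts[row["Family Id"]][relation] += 1

    errors = []
    for family, seen in counts.items():
        if seen["proband"] == 0:
            errors.append(f"{family}: no proband sample")
        for relation in UNIQUE_PER_FAMILY:
            if seen[relation] > 1:
                errors.append(f"{family}: '{relation}' used {seen[relation]} times")
    return errors
```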

    If the path contains two files with the same name (for example, two pairs of fastqs in a dataset), the system will:
    • Select the dataset marked as QCPassed.

    • Fail the batch upload if both datasets are marked as QCPassed, as this indicates conflicting data.

  • More Than Two Files in the Path:

    • If the path contains more than two files or datasets, the system fails the batch upload, as the path is considered ambiguous or invalid.
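The dataset selection rules for a human-readable BSSH path can be summarized as a small decision function. This is a sketch of the described behavior, not the platform's code. The function name and the (dataset_id, qc_passed) tuples are hypothetical, and the handling of zero datasets or of two datasets with neither marked QCPassed is an assumption, since the text does not cover those cases.

```python
def select_dataset(datasets):
    """datasets: list of (dataset_id, qc_passed) tuples found under the given path."""
    if len(datasets) == 1:
        return datasets[0][0]

    if len(datasets) == 2:
        passed = [dataset_id for dataset_id, qc_passed in datasets if qc_passed]
        if len(passed) == 1:
            return passed[0]
        if len(passed) == 2:
            raise ValueError("conflict: both datasets are marked QCPassed")
        # Assumption: the text does not say what happens when neither passed QC.
        raise ValueError("neither dataset is marked QCPassed")

    # More than two datasets is documented as a failure; zero is treated the same here.
    raise ValueError("path is ambiguous or invalid; provide a more specific path")
```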

  • Institution

    Free text

    Custom

    GenoMed Solutions

    Sample_Received_Date

    Free text

    BioSample Name

    Free text

    Conditionally mandatory. An empty sample will be created if the field is left blank.

    NA24385

    Boost Genes

    1. "TRUE" 2. "FALSE"

    BSSH path with actual names (numbers): /projects/3824821/appresults/2319318/files/119675608
    BSSH path with aliases: /projects/ABC_DEF_2022-12-22_DEv395/appresults/ABC-GM58342-def/files/ABC-GM58342-def.hard-filtered.vcf.gz

    CSV schema

    1. Mandatory fields

    2. Conditionally mandatory fields

    3. Optional fields

    4. Custom fields

    Note: In cases with more than one sample, custom fields are only recognized and added to case information if their values appear within the same table row where the Relation field is equal to "proband".

    Custom field examples:

    Batch case .csv file validation rules

    Handling a proband sample with unknown sex

    When a sample is user-assigned "Unknown" sex, the system assumes "Female". This affects CNV interpretation on sex chromosomes if the genetic sex is actually male:

    • Chromosome X: CN = 2 is considered reference (REF) for a female genome, so CNVs with two copies are hidden by default. This may cause chromosome X duplications to be missed.

    • Chromosome Y: CN = 0 is considered reference (REF) for a female genome, so CNVs with zero copies are hidden by default. This may cause chromosome Y deletions to be missed.

    To include these variants in the analysis, enable the Include Reference Homozygosity and No Coverage Calls toggle in Workbench & Pipeline Settings.
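The effect of the "Unknown" to "Female" assumption can be illustrated with a reference copy-number lookup. In this sketch the female values (X = 2, Y = 0) come from the text above. The male values (X = 1, Y = 1), the autosomal value of 2, and the function names are assumptions added for illustration, and pseudoautosomal regions are ignored.

```python
def reference_copy_number(chrom, sex):
    """Expected copy number for `chrom` given sex 'F', 'M' or 'U'."""
    effective = "F" if sex == "U" else sex  # unknown sex is treated as female
    if chrom == "X":
        return 2 if effective == "F" else 1  # male value is an assumption
    if chrom == "Y":
        return 0 if effective == "F" else 1  # male value is an assumption
    return 2                                 # autosomes (assumption)


def hidden_by_default(chrom, copy_number, sex):
    """CNV calls matching the reference copy number are hidden by default."""
    return copy_number == reference_copy_number(chrom, sex)
```

For a genetically male sample entered as "U", a chromosome X call with two copies matches the female reference and is hidden, which is how a duplication can be missed.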

    Terminology clarification: “Selected Preset” field

    Despite its name, the "Selected Preset" field specifies a preset group used in the case, not an individual preset.

    Required BSSH file path format:

    Human-readable path for BSSH files in batch CSV

    Validations

    Error Scenarios

    Benefits


    Custom

    Optional. Indicates whether the Boost genes mode will be used. "TRUE" means that variants in the targeted genes will receive upgraded scores during prioritization by the AI Shortlist algorithm. Default value is "FALSE". Only considered for proband.

    Supported variant callers

    Emedgene provides the tightest integration with DRAGEN for germline variation analysis, providing accuracy, comprehensiveness, and efficiency, spanning variant calling through interpretation and report generation.

    Compatibility with DRAGEN and DRAGEN Array Variant Callers

    DRAGEN version
    Emedgene version
    Available callers

    4.5

    100.40.0+

    SNV, CNV, STR, SV (del/dup/ins), Targeted, MRJD, Ploidy; TruPath: SNV, SV, MRJD


    DRAGEN Array version
    Emedgene version
    Available callers

    The Emedgene platform supports a variety of variant callers and applies specific quality parameters for each. The quality assessment is an essential step in the Emedgene pipeline because variants with low quality will not be considered by the AI components.

    If the variant caller is not supported or not recognized, a default quality function is applied. The default parameters are based on genotype (GT), depth (DP), and allele bias (AB). These fields are mandatory; if any of them is absent, all variants are marked “Low quality”.
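Before submitting a VCF from an unrecognized caller, it can help to confirm that the three fields used by the default quality function are present. The sketch below covers only that presence check. The thresholds applied by the default quality function are not described here and are not modeled. The function names are illustrative, and treating "." as missing follows the usual VCF convention.

```python
REQUIRED_FIELDS = ("GT", "DP", "AB")


def missing_quality_fields(sample_fields):
    """sample_fields: dict of per-sample values, e.g. {'GT': '0/1', 'DP': '35'}."""
    return [key for key in REQUIRED_FIELDS if sample_fields.get(key) in (None, "", ".")]


def default_quality_precheck(sample_fields):
    """Return 'Low quality' when a mandatory field is absent, otherwise None."""
    return "Low quality" if missing_quality_fields(sample_fields) else None
```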

    The following variant callers are currently supported by the Emedgene pipeline, provided that a header containing the variant caller command line is present in the VCF headers.

    Additional callers can be supported on demand under license.

    Variant caller / VCF
    Supported versions
    Notes
    Calling methodology

    37.0+

    Cyto

    SmallVariant

    N/A

    SmallVariant

    1.38

    CNVReadDepth

    Clair3

    v37.0+

    SmallVariant

    N/A

    SmallVariant

    ClinSV

    N/A

    SVSplitEnd

    N/A

    CNVReadDepth

    CNVReporter

    0.01

    CNVReadDepth

    1.0

    CNVReadDepth

    N/A

    CNVReadDepth

    cuteSV

    2.02

    v37.0+

    SVSplitEnd

    Multi-Sample Viewer:1.0.0.71

    Unknown

    1.0.0

    SmallVariant

    N/A

    SVSplitEnd

    0.1

    CNVReadDepth

    ExomeDepthAM

    0.1

    Private fork of ExomeDepth

    CNVReadDepth

    N/A

    SmallVariant

    3, 3.4, 3.5, 2014, 4, 4.1

    SmallVariant

    GATK

    N/A

    SmallVariant

    Scramble

    Running: scramble2vcf.pl

    SmallVariant

    1.4

    SmallVariant

    4.x, 5.x and not: 5.12, 5.20

    SmallVariant

    CNV

    5.16

    CNVReadDepth

    2.2.0

    SVSplitEnd

    N/A

    SmallVariant

    2.X

    SmallVariant

    2.1.1

    SVSplitEnd

    2.2.4

    SmallVariant

    2.2.4

    SVSplitEnd

    2.X

    SVSplitEnd

    5.2.9

    SmallVariant

    201808, 201911, 202010

    SmallVariant

    201808.03

    SmallVariant

    2.0.6, 2.0.7, 2.5

    SVSplitEnd

    0.0.2

    SmallVariant

    2.0.1

    CNVReadDepth

    Spectre

    v37.0+

    CNVReadDepth

    2.4.5

    SmallVariant

    N/A

    SmallVariant

    N/A

    SVSplitEnd

    4.4 See full compatibility table

    100.39.0+

    SNV, CNV, STR, SV (del/dup/ins), Targeted, MRJD, Ploidy

    4.3 See full compatibility table

    36.0+

    SNV, CNV, STR, SV (del/dup/ins), Targeted, MRJD, JSON PGx*

    4.2 See full compatibility table

    All

    SNV, CNV, STR, SV (del/dup/ins), SMN, JSON PGx*

    4.2

    All

    SNV, CNV, STR, SV (del/dup/ins), SMN

    4.0

    All

    SNV, CNV, STR, SV (del/dup/ins)

    3.10

    All

    SNV, CNV, STR, SV (del/dup/ins)

    3.6-3.9

    All

    SNV

    1.4

    100.40.0+

    Cyto

    1.3

    100.39.0+

    Cyto

    AED CNV

    N/A

    Affymetrix Extensible Data, converted to VCF

    CNVReadDepth

    ION AMPLISEQ

    Extensive Compatibility with Additional Variant Callers

    Internally, this list is referred to as the Emedgenizers list. An Emedgenizer is a tool that normalizes VCF files to the system’s expected format for each variant caller.

    1.2

    5.12, 5.20

    Atlas-SNP2
    CanvasCNV
    QIAGEN CLC Genomics Workbench
    CNVKit
    CnvXhmm
    CNVnator
    CytoScanHDArray
    DeepVariant
    eKLIPse
    ExomeDepth
    Freebayes
    GATK
    Mutect
    GATKScramble
    GLNEXUSSNV
    IONTorrent
    IONTorrent
    MELT
    Mity
    NextGene
    cuteSV for ONT
    PAV
    PAVSV
    PBSV
    Pisces
    Sentieon
    SentieonDNAScope
    Sniffles
    Sophia
    SophiaCnv
    Starling
    Strelka
    Witty
    See full compatibility table