Workbench & Pipeline

AI ACMG module (v100.39.0+)

Users with a 'Manage AI ACMG' role have the ability to exclude PP5 and BP6 tags from ACMG classification. These tags rely on external assertions without independent evidence and may introduce bias (Biesecker et al. 2018).

This option is enabled by default.

  • When enabled: PP5 and BP6 tags will not be used in ACMG classification. If either tag is positive (by AI or manual input), a warning message will be displayed in the ACMG section of the variant page.

  • When disabled: The tags will be included, and no warning sign will be shown.
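The toggle's effect can be summarized as a small rule over the set of positive ACMG tags. The following is a minimal sketch, not Emedgene's implementation; the function name `apply_pp5_bp6_exclusion` and its inputs are hypothetical.

```python
# Hypothetical sketch: names are illustrative, not an Emedgene API.
EXTERNAL_ASSERTION_TAGS = frozenset({"PP5", "BP6"})


def apply_pp5_bp6_exclusion(positive_tags: set[str], exclude_enabled: bool = True) -> tuple[set[str], bool]:
    """Return (tags used for ACMG classification, whether to show the warning)."""
    if not exclude_enabled:
        # Option disabled: every positive tag is used and no warning is shown.
        return set(positive_tags), False
    flagged = positive_tags & EXTERNAL_ASSERTION_TAGS
    # Option enabled: drop PP5/BP6 and warn if either was positive (AI or manual).
    return positive_tags - EXTERNAL_ASSERTION_TAGS, bool(flagged)
```

For example, `{"PVS1", "PM2", "PP5"}` with the option enabled yields `{"PVS1", "PM2"}` plus a warning.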

Organization settings

The Organization Settings page is accessible via the dropdown menu under Settings. Note: The Organization Settings page is accessible only to users with a Manager role.

NGS quality

  • Set the Gene list threshold for your organization. Case validations are not applied to cases whose gene list contains fewer genes than the threshold.

  • The default value for the Gene list threshold is 50.

  • Only whole numbers above 0 are accepted.
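A minimal sketch of the threshold rule and its accepted values, assuming the gene list size is simply the number of genes in the case's gene list; the function names are hypothetical.

```python
DEFAULT_GENE_LIST_THRESHOLD = 50


def validate_gene_list_threshold(value) -> int:
    """Accept only whole numbers above 0 (bool is rejected even though it subclasses int)."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("Gene list threshold must be a whole number above 0")
    return value


def case_validations_apply(gene_list_size: int, threshold: int = DEFAULT_GENE_LIST_THRESHOLD) -> bool:
    """Case validations are skipped when the gene list has fewer genes than the threshold."""
    return gene_list_size >= validate_gene_list_threshold(threshold)
```

With the default of 50, a 49-gene list skips case validations, while a 50-gene list is validated.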

AI Shortlist module

The AI shortlist prioritizes variants, helping you focus on the most relevant candidates.

In the AI Shortlist module card, you can select an AI shortlist mode for rare disease analysis and one for carrier analysis to match your workflow. Once a mode is changed, it will apply to all new case analyses, including reanalysis.


To change the AI Shortlist modes for rare disease analysis and carrier analysis, users must have the Manage Auto Analysis Tier IAM scope.

In legacy cloud environments, this permission is granted through the EMG role.

Rare disease analysis

Emedgene offers two analysis modes for rare disease interpretation.

  • Focused Mode: The Most Likely Candidates and Candidates list includes only variants found in genes known to be associated with disease. Variants that are highly ranked but located in genes of unknown significance (GUSs) are labeled as Most likely GUS and Candidate GUS.

  • Discovery Mode: The Most Likely Candidates and Candidates list is not restricted to variants in known disease-associated genes. Variants in both known and unknown genes are evaluated and ranked together in the same list.

The focused mode of the AI shortlist for rare diseases is referred to as "Tier v0," and the discovery mode is referred to as "Tier v2" in the Versions tab.

Carrier analysis

For carrier analysis, the AI shortlist can prioritize variants by known pathogenicity, by severity, or by both. Choose one of the following modes:

  • Known Pathogenic: The AI shortlist will prioritize variants reported as pathogenic or likely pathogenic in any known variant databases.

  • High Severity: The AI shortlist will prioritize variants with high severity.

  • Both: The AI shortlist will prioritize variants reported as pathogenic or likely pathogenic in any known variant databases or variants with high severity.
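The three carrier modes differ only in which criterion is checked, with Both acting as a logical OR. A minimal sketch, assuming each variant already carries two hypothetical boolean flags (reported P/LP in a known database, and high severity):

```python
from enum import Enum


class CarrierMode(Enum):
    KNOWN_PATHOGENIC = "known_pathogenic"
    HIGH_SEVERITY = "high_severity"
    BOTH = "both"


def is_prioritized(known_plp: bool, high_severity: bool, mode: CarrierMode) -> bool:
    """known_plp: reported P/LP in a known variant database; high_severity: assumed precomputed flag."""
    if mode is CarrierMode.KNOWN_PATHOGENIC:
        return known_plp
    if mode is CarrierMode.HIGH_SEVERITY:
        return high_severity
    # Both: either criterion is enough.
    return known_plp or high_severity
```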

Analysis tools columns order

You can customize the default order of Variant table columns. Just rearrange the columns by dragging and dropping them to your preferred sequence.

Quality parameters

The Quality parameters section lets users configure quality thresholds for the organization. Users with the 'Manage Quality Parameters' role can edit the thresholds.

NGS quality & Array sample quality section in Quality Parameters

Configuring an organization database

1

Upon request, Illumina Support provisions the internal database VCF file from the customer’s cases based on supplied requirements.

Alternatively, the customer creates an external database VCF file and provides it to Illumina Support for upload.

2

Illumina Support stores the database in the customer’s secure bucket.
3

The user with appropriate permissions registers the database by selecting it from their bucket in Settings.

4

Once registered, the database appears in the Organization Databases list. From here, the customer can tune settings:

  • Enable or disable its use in variant annotation.

  • Enable or disable allele frequency usage by AI Shortlist for variant prioritization.


Integration (v100.39.0+)

Kit Management

Case settings

Removing PON files from storage

The PON file (combined counts file) is not stored within Emedgene but within the customer storage location, which means that the file is retrieved each time a new case using the associated test kit is processed.

NOTE: If the combined counts file is removed from the storage location, new cases created with the associated test kit, as well as reanalyses that include secondary analysis, will fail and move to the "Issue Reported" status.

If a PON file is deleted from storage but the customer now uses a different DRAGEN version or reference genome build that has its own PON, that new PON file is used and cases do not fail.
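The behavior above can be pictured as a lookup keyed by pipeline combination, with the file fetched from customer storage each time a case is processed. This is a minimal sketch under that assumption; the registry structure and `resolve_pon` are hypothetical.

```python
from pathlib import Path

# Hypothetical registry: each (DRAGEN version, genome build) pair maps to the storage
# path of its PON combined counts file.
PonRegistry = dict[tuple[str, str], str]


def resolve_pon(registry: PonRegistry, dragen_version: str, genome_build: str) -> Path:
    """Return the PON file for this pipeline combination, read from customer storage."""
    path = Path(registry[(dragen_version, genome_build)])  # KeyError if none is registered
    if not path.exists():
        # Documented outcome: the case errors and moves to "Issue Reported".
        raise FileNotFoundError(f"PON file not found in storage: {path} (case status: Issue Reported)")
    return path
```

Deleting the file registered for an older DRAGEN version only affects cases that still resolve to that combination.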

Array sample quality

  • Set the Array quality thresholds for your organization. If a sample’s quality values meet the criteria below, it will be classified as ‘High’; otherwise, it will be classified as ‘Low’.

  • Default thresholds are Call Rate >= 0.99 and LogR Dev <= 0.2.

  • Values from 0 to 1 are accepted.
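A minimal sketch of the High/Low rule with the default thresholds; the function name is hypothetical. Both criteria must be met for a sample to be classified as High.

```python
def array_sample_quality(call_rate: float, logr_dev: float,
                         min_call_rate: float = 0.99, max_logr_dev: float = 0.2) -> str:
    """Classify an array sample as 'High' or 'Low' quality."""
    for name, value in (("min_call_rate", min_call_rate), ("max_logr_dev", max_logr_dev)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    # High only when the call rate is high enough AND the LogR deviation is low enough.
    return "High" if call_rate >= min_call_rate and logr_dev <= max_logr_dev else "Low"
```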

Lab workflow

Environment

SV annotation thresholds

The Emedgene SV annotation pipeline integrates multiple structural variant databases, including allele frequency sources (e.g., gnomAD SV) and variant/region pathogenicity resources (e.g., ClinVar, ClinGen). Annotation is based on overlap thresholds tailored to the clinical significance of each database category.

Located under Workbench & Pipeline in Organization Settings, this module lets you configure the overlap thresholds used to annotate structural variants against external and internal databases.

Organization settings > Workbench & Pipeline > SV annotation threshold
Value constraint: a number between 0 and 1, with up to 2 decimal places.

One Side Pathogenic

The One Side Pathogenic setting sets the overlap threshold for identifying structural variants classified as pathogenic or likely pathogenic, such as those in the ClinGen Pathogenic, ClinVar Pathogenic, DDDSyndromes, and Curate Pathogenic databases. The default is 0.7, and values between 0 and 1 are accepted.

One Side Uncertain

The One Side Uncertain setting sets the overlap threshold for identifying structural variants classified as uncertain, such as those in the ClinGen VUS, ClinVar VUS, and Curate VUS databases. The default is 0.7, and values between 0 and 1 are accepted.

Two Sided

The Two Sided setting sets the overlap threshold for identifying structural variants classified as benign or likely benign, such as those in the ClinGen Benign, ClinVar Benign, DECIPHER, DGV, gnomAD SV, 1000 genomes, and Curate Benign databases. The default is 0.7, and values between 0 and 1 are accepted.
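The page does not define how overlap is measured. The sketch below assumes a common convention: "One Side" means the fraction of the database region covered by the case SV, and "Two Sided" means reciprocal overlap, where both regions must be covered by at least the threshold. Treat these definitions and all names as assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # zero-based, inclusive
    end: int    # exclusive; assumed end > start

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def one_side_match(case_sv: Interval, db_sv: Interval, threshold: float = 0.7) -> bool:
    # Assumed: only the database region's covered fraction is checked.
    return overlap_len(case_sv, db_sv) / db_sv.length >= threshold


def two_sided_match(case_sv: Interval, db_sv: Interval, threshold: float = 0.7) -> bool:
    # Assumed: reciprocal overlap, so both regions must be covered by >= threshold.
    shared = overlap_len(case_sv, db_sv)
    return shared / case_sv.length >= threshold and shared / db_sv.length >= threshold
```

Under these assumptions, a large case deletion that fully contains a small benign region would not be annotated as benign, because the case SV itself is mostly uncovered.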

Required format for a BED file defining a kit

As part of the management settings, users can upload BED files to be associated with their kits.

The BED file (.bed) format follows the description from the UCSC Genome Browser, with some modifications:

  • The file must be a tab-delimited text file.

  • The file should not contain headers.

  • The number of fields per line must be consistent throughout any single set of data.

  • Zero-based index: Start and end positions are identified using a zero-based index.

There are three required fields:

  1. Chromosome - Chromosomes must be sorted in alphanumeric order, for example: chr1, chr2, ..., chr12, ..., chr22, chrX, chrY, chrM.

  2. Chromosome Start - The starting position of the feature in the chromosome. The start position must be smaller than the end position. Regions must be sorted in numeric order.

  3. Chromosome End - The ending position of the feature in the chromosome. The end position must be greater than the start position. Regions must be sorted in numeric order.

To add a region name to a BED file, include it as an additional column. This name will be displayed in place of the actual region location for insufficient regions, making it easier to review these areas on the Lab page.

Example:

Newly uploaded BED files become available only after a mandatory one-hour waiting period, which allows time for synchronization to complete.
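Before uploading, a kit BED file can be checked locally against the rules listed above. This is a minimal sketch, not the validator Emedgene runs. It assumes the natural chromosome order shown in the rules (chr1 to chr22, then chrX, chrY, chrM) and also accepts plain names such as `1` or `MT`.

```python
# Assumed natural chromosome order: 1..22, then X, Y, and M/MT.
_SEX_AND_MITO = {"X": 23, "Y": 24, "M": 25, "MT": 25}


def chrom_rank(name: str) -> int:
    """Return the sort rank of a chromosome name, or raise ValueError if unknown."""
    core = name[3:] if name.startswith("chr") else name
    if core.isdigit():
        return int(core)
    if core in _SEX_AND_MITO:
        return _SEX_AND_MITO[core]
    raise ValueError(f"unrecognized chromosome name: {name!r}")


def validate_bed(text: str) -> list[str]:
    """Check BED text against the rules above; return a list of error messages."""
    errors: list[str] = []
    n_fields = None  # field count of the first data line
    prev = None      # (chromosome rank, start) of the previous region
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if n_fields is None:
            n_fields = len(fields)
        if len(fields) != n_fields:
            errors.append(f"line {lineno}: expected {n_fields} fields, found {len(fields)}")
            continue
        if len(fields) < 3:
            errors.append(f"line {lineno}: chromosome, start and end are required")
            continue
        try:
            rank = chrom_rank(fields[0])
            start, end = int(fields[1]), int(fields[2])
        except ValueError as exc:  # e.g. a "#chrom<TAB>start<TAB>end" header row
            errors.append(f"line {lineno}: {exc}")
            continue
        if start >= end:
            errors.append(f"line {lineno}: start must be smaller than end")
        if prev is not None and (rank, start) < prev:
            errors.append(f"line {lineno}: region is out of sort order")
        prev = (rank, start)
    return errors
```

An empty list means no rule violations were found. The optional fourth column (region name) is allowed as long as every line has it.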

Creating a database VCF file

Creating an internal database VCF file

The internal database VCF file is produced by our support team.

Please refer to the linked resource for recommendations on sample selection as well as the technical details and limitations involved in constructing internal historic and noise databases.

Creating an external database VCF file

Use the linked resource to prepare a VCF file formatted for use as a historic, noise, or curated database.

Panel of normals (PON)

Panel of Normals (PON) is a set of matched normal samples used to determine a baseline coverage pattern and account for recurrent technical artifacts that are specific to your workflow. Depth of coverage for each sequenced region is averaged across PON samples; if a significant increase or decrease from this baseline is detected in a test sample, a CNV is called.

A PON linked to a coverage BED kit is used by DRAGEN to increase accuracy of copy number variation (CNV) calling in targeted panel and exome cases.

A PON file is typically a text file listing absolute paths to 'target counts' files of individual matched normal samples. However, a PON can also be in the form of a combined counts file, which is a column-wise concatenation of individual target counts files (either GC-corrected or not).

The Illumina BaseSpace Baseline Builder App can generate this combined counts file (see "Creating a PON for Emedgene using BaseSpace Baseline Builder").
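The baseline idea can be illustrated with a toy calculation: average each region's depth across the PON samples, then compare a test sample's depth to that baseline. This is a simplified sketch, not DRAGEN's CNV model. It skips GC correction and normalization, and the ratio cut-offs are hypothetical midpoints between the expected ratios for 1, 2, and 3 copies.

```python
from statistics import mean

# Hypothetical cut-offs on sample/baseline depth ratio: midpoints between
# 2 copies (1.0) and 1 or 3 copies (0.5 and 1.5).
LOSS_RATIO = 0.75
GAIN_RATIO = 1.25


def pon_baseline(pon_depths: list[list[float]]) -> list[float]:
    """Average depth per region across PON samples (one inner list per sample)."""
    return [mean(region) for region in zip(*pon_depths)]


def call_cnvs(sample_depths: list[float], baseline: list[float]) -> list[tuple[int, str]]:
    """Flag regions whose depth departs significantly from the PON baseline."""
    calls = []
    for idx, (depth, base) in enumerate(zip(sample_depths, baseline)):
        if base <= 0:
            continue  # no baseline coverage: the region cannot be assessed
        ratio = depth / base
        if ratio >= GAIN_RATIO:
            calls.append((idx, "DUP"))
        elif ratio <= LOSS_RATIO:
            calls.append((idx, "DEL"))
    return calls
```

This also shows why batch effects matter. If PON samples are prepared differently from case samples, the baseline itself shifts and produces spurious calls.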

Modifying database settings

Users with appropriate permissions can tune database settings within their organization.

1

Click Edit next to the database you want to update.

2

Modify database settings as needed:

How do I check my platform version?

Add /version to the URL if you'd like to check the current version of your environment. For example, for demo.emedgene.com, use demo.emedgene.com/version.

If you want to learn more about the different versions available, check the release notes.
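To check the version from a script rather than a browser, the `/version` path can be built from any environment URL. This is a minimal sketch that assumes HTTPS and that the endpoint is reachable from your session. Authentication requirements and the response format are not documented here, so `fetch_version` returns the raw body unchanged.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def version_url(env_url: str) -> str:
    """Build the /version URL, e.g. demo.emedgene.com -> https://demo.emedgene.com/version."""
    if "://" not in env_url:
        env_url = "https://" + env_url  # assumption: the environment is served over HTTPS
    parts = urlsplit(env_url)
    return urlunsplit((parts.scheme, parts.netloc, "/version", "", ""))


def fetch_version(env_url: str, timeout: float = 10.0) -> str:
    """Return the raw response body of the /version page (format not documented here)."""
    with urlopen(version_url(env_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```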

Default page

The default page upon entering a case can be customized. You can choose between:

  1. - default option;

Default preset group

You can set a Preset group as default. The case is assigned to the default Preset group if no Preset group is selected or the default value is selected in the Preset group selection step.

  1. Click on the dropdown arrow;

  2. Select the Preset group;

Platform version

Select a platform version for your organization from the options available in your region. Changes may take 1–15 minutes to become active.

Note: This selection updates only the software version. It does not affect secondary or tertiary pipeline versions. To change pipeline versions, go to Organization settings > Workbench & Pipeline settings > Pipeline versions.

Organization DB management

Use the Organization DB Management card to manage organization databases, custom variant datasets that support variant interpretation. Here you can:

  • Review database details in a table:

    • Database name and type, file, genome reference, variant type, date last edited

Case identifier

By default, Emedgene displays the Case ID in the "EMGXXXXXXXXX" format. However, you have the choice to use the Proband ID instead.

Recommendations for creating a PON to call CNVs from exome data:
  1. Samples for a PON should be derived from healthy individuals.

  2. In our experience, a PON of at least 40-50 samples yields the best results. A smaller PON is better than nothing, but you may encounter more false positives.

  3. Prepare PON samples in a uniform manner to avoid batch effects, and log any differences in library preparation.


  • Active: Turn on to apply annotations from this database to variants in new and reanalyzed cases.

  • AI Shortlist: Turn on to include allele frequencies from this database when prioritizing variants. When enabled, variants with an allele frequency greater than 10% are excluded from the shortlist.

3

Click Save to apply the changes.

Editing an organization database is automatically recorded in the organization’s audit log for transparency and traceability.
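The AI Shortlist toggle acts as a frequency filter. A minimal sketch, assuming variants are identified by hypothetical keys and that a variant absent from the database is kept. Only frequencies strictly above 10% cause exclusion.

```python
AF_EXCLUSION_CUTOFF = 0.10


def shortlist_candidates(variants: list[str], org_db_af: dict[str, float],
                         ai_shortlist_enabled: bool = True) -> list[str]:
    """Drop variants whose allele frequency in the organization DB exceeds 10%."""
    if not ai_shortlist_enabled:
        return list(variants)  # database frequencies are not considered
    # Variants not found in the database default to AF 0.0 and are kept.
    return [v for v in variants if org_db_af.get(v, 0.0) <= AF_EXCLUSION_CUTOFF]
```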

Example of a kit BED file (tab-delimited; see Required format for a BED file defining a kit):

```
chr1	40794900	42795220
chr1	75796785	76799036
chr2	21224573	21226251
chr2	51227067	61227587
chr3	10183418	10184013
chr3	141327248	141327582
chr5	33944704	33945028
chr5	112173191	112179935
chr6	31973380	31973922
chr6	118880009	118880306
chr10	43572669	43572871
chr10	43595829	43596269
chr11	532524	532797
chr11	108225486	108225731
chr20	10620075	10620661
chr20	43058101	43058438
chr21	35742663	35743177
chr21	35821512	35822069
chr22	21336576	21336967
chr22	29133150	29133387
chrX	9693690	9693958
chrX	153599135	153599723
chrY	2654931	2655693
chrM	108	596
```
  • Lab

  • Candidates
    Analysis/filters
    Analysis/presets
    Click Save.

    Note: A default Preset group cannot be hidden and cannot be reverted.

    Add New Case
  • Active (On/Off): Indicates whether annotation from the database is applied to variants in new and reanalyzed cases.

  • AI Shortlist (On/Off): Indicates whether the AI Shortlist factors in allele frequencies from the database. When enabled, variants with an allele frequency above 10% are excluded from the list.

  • For CNV databases: Candidate (C) and annotation (A) overlap percentages used for comparing CNV variants in cases vs. the database.

  • Search: Use the search bar above the table to look up a database by its name.

  • Download: Click the Download icon on the right to export a database as a VCF file.

  • Configure databases (requires special permissions).

  • Modify database settings (requires special permissions).

Figure 1. Organization DB management card

    Organization URL (ILMN clouds)

    Associating URLs with your workgroup on ILMN Cloud

    On ILMN Cloud, customers can associate URLs with their current workgroup by selecting from a list of predefined URL patterns. These URLs are used to organize and access workgroup-specific resources.

    Key Points to Remember:

    Variant tags

    The Variant tags card allows organizations to add and delete custom variant tags.

    Only users with the Manager edit variant tag role can add and delete variant tags.

    Every time a tag is added or deleted, the system records it in the Activity log.

    Pipeline versions

    The Pipeline version card, located in the Workbench & Pipeline section of Organization Settings, allows you to set the applied versions for the sample pipeline, case pipeline, DRAGEN, and human reference. You can also configure whether to include reference homozygous genotype calls in cases.

    Sample

    Set the organization sample pipeline configurations. These configurations are only applicable for cases starting from FASTQ files.

    BED files

    Manage BED files in the BED files card.

    These files define your kits and determine which genomic regions are analyzed during case processing. BED files should follow the format described in Required format for a BED file defining a kit.

    Kit BED file applications

    General

    General settings and organization information

    1. Information

    Displays organization details such as Cloud region, Organization name, Organization URL, Organization ID and the Platform main version.


    URL Selection: You can choose from a set of predefined URL patterns.
  • Multiple Associations: A single URL can be associated with one or several workgroups.

  • Activation Time: After selecting the URL for your workgroup, please allow 1 to 15 minutes for the changes to take effect.

  • Need a Custom URL?

    If you would like to create a new custom URL pattern within your domain, please reach out to our technical support team at [email protected] for assistance.

    Add a variant tag

    1

    In the Variant tags card, click Add new.

    2

    Enter a name for the new tag.

    3

    Click Save to confirm.

    Delete a variant tag

    1

    Click the Delete icon next to the tag you want to remove.

    2

    Confirm by clicking Delete in the dialog box.


    Tags that are already in use and default tags provided by the system cannot be deleted.


  • Human Ref: Set the human reference genome build used by the pipeline for alignment and calling. Possible options: GRCh37 and GRCh38.

  • Secondary Analysis Pipeline: Set the secondary analysis pipeline for your organization. This pipeline applies to all FASTQ samples in the case.

  • DRAGEN Version: Set the DRAGEN version used in the secondary analysis pipeline.

    Variant caller mapping

    Select the variant callers enabled for your secondary analysis pipeline.

    Each caller is annotated with its sequencing compatibility (WGS, WES), methodology (e.g., CNV read-depth, small variant, SV split-end), and compatibility requirements across sample, DRAGEN, and case pipeline versions. By default, SNV and CNV callers are always enabled. Additional callers, such as SV, SMN, and STR, can be selectively activated to match evolving analysis needs. Changes apply only to cases processed after the caller selection is changed.

    Enabling a variant caller that is not compatible with the selected pipelines is not supported.

    Some variant callers may require an assigned PON file to work properly with the WES sequencing type.

    Note on analysis flow selection and expected outputs

    The analysis flow used for a case is automatically determined by the total number of sequencing reads:

    • If total reads exceed 600 million, the system runs the whole-genome (WGS) analysis flow.

    • If total reads are below 600 million, the system runs the exome analysis flow.

    This decision affects downstream outputs, particularly CNV detection:

    • WGS flow: CNVs are reported by default.

    • Exome flow: CNVs are only reported if a Panel of Normals (PON) file is available for the selected pipeline version.
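The flow selection and its effect on CNV reporting can be summarized in a few lines. A minimal sketch with hypothetical function names. The page does not say what happens at exactly 600 million reads, so this sketch treats that case as exome (an assumption).

```python
WGS_READ_CUTOFF = 600_000_000


def analysis_flow(total_reads: int) -> str:
    # Documented: > 600M reads -> WGS, < 600M -> exome.
    # Exactly 600M is unspecified; treated as exome here (assumption).
    return "WGS" if total_reads > WGS_READ_CUTOFF else "exome"


def cnvs_reported(total_reads: int, pon_available: bool) -> bool:
    """WGS reports CNVs by default; exome only with a PON for the pipeline version."""
    return analysis_flow(total_reads) == "WGS" or pon_available
```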

    Sample pipeline arguments

    View your Sample pipeline arguments for mapping and calling. To modify these arguments, contact technical support.

    If the arguments are not set, they will be displayed as ‘N/A’.

    Case

    Set the organization case pipeline configuration. This configuration is applicable for all cases in your organization.

    Changing the case pipeline changes the annotation sources and versions used for your cases and may affect the AI shortlist results.

    Include Reference Homozygosity (v36.0+)

    The Include Reference Homozygous & No Coverage calls option allows you to control whether reference homozygous and no coverage genotype calls are included for the proband in newly processed cases. When enabled, cases will include these calls; when disabled (default), they will not.


    Use case: A proband sample with unknown sex

    When a sample is set to "Unknown" sex, the system assumes "Female" for chromosome analysis. If the actual genetic sex is male, chrX duplications and chrY deletions will be hidden.

    Enabling Include Reference Homozygosity and No Coverage Calls before the case is run keeps these variants visible.

    • Region of interest BED determines which genomic regions will be included in the variant analysis. It acts as a filter before annotation and interpretation, defining which variants are processed and therefore the overall cost of the case analysis.

    • Coverage BED

      • Defines the target regions used to compute coverage: it ensures that the required depth of coverage is achieved for quality control purposes. Unlike the Region of interest use case, using a BED file to assess coverage does not affect which variants are analyzed.

      • Enables CNV calling for exome and panel FASTQ cases: CNV calling for exome or panel analysis from FASTQ files requires a panel of normals (PON). The PON is created based on a Coverage BED kit. Therefore, to trigger CNV calling, you must provide the Coverage BED during case creation.

    The table on this page lists all available BED files, including their kit name, genome build (GRCh37 or GRCh38), and ID. You can view, edit, or delete files as needed, and add new ones using the Add New button.

    Note: After uploading a kit BED file, please wait ~2 hours before using it to create a case.

    Note: These fields are for reference only and cannot be edited.

    2. Person of contact

    Allows users to manage the primary contact email for the organization in an editable contact email field, ensuring that the correct person receives important information and notifications.

    3. Report timezone

    In this card, users with the "Manage report time zone" role can update the default time zone used across reports. This setting applies to timestamps for report creation, signing, and due dates.

    How to Select a Time zone

    • A categorized dropdown menu allows users to choose a time zone by continent.

    • Major cities are listed alongside their GMT offset for easy reference.

    • A search bar is available to quickly find specific time zones.

    This ensures consistency in report timestamps across your organization.

    Creating an internal database VCF file

    Upon request, Illumina Support builds internal historic or noise database VCF files from the customer’s cases based on supplied requirements.

    What data is tracked for each variant?

    For each variant, the internal historic database records:

    • Allele frequency (AF): The ratio of the number of variant alleles to the total number of alleles in the dataset

    • Allele count (AC): The number of observed variant alleles

    • Allele number (AN): The total number of alleles assessed, derived from the number of individuals

    • Genotype counts: heterozygous (HET), homozygous (HOM), and hemizygous (HEMI)

    • Sample list (TEN): A short list of samples where the variant was found
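The relationship between these fields can be shown with a small record type. A minimal sketch: the class and helper names are hypothetical. The autosomal rule (AN = 2 × individuals, a homozygous sample adds two alleles) is an assumption consistent with diploid counting.

```python
from dataclasses import dataclass


@dataclass
class HistoricVariant:
    """Per-variant fields listed above (sample list omitted)."""
    ac: int  # allele count: observed variant alleles
    an: int  # allele number: total alleles assessed
    het: int = 0
    hom: int = 0
    hemi: int = 0

    @property
    def af(self) -> float:
        """Allele frequency = AC / AN (0.0 when no alleles were assessed)."""
        return self.ac / self.an if self.an else 0.0


def autosomal_variant(het: int, hom: int, n_individuals: int) -> HistoricVariant:
    # Assumption: autosomes are diploid, so AN = 2 x individuals and
    # each homozygous sample contributes two alleles to AC.
    return HistoricVariant(ac=het + 2 * hom, an=2 * n_individuals, het=het, hom=hom)
```

For example, 3 heterozygous and 1 homozygous carriers among 10 individuals give AC = 5, AN = 20, and AF = 0.25.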

    How are variants merged across cases?

    Historic database

    1

    Variant extraction

    Variants are extracted from the original input files without any filtering, ensuring that all calls are preserved.

    2

    Variant grouping and merging

    Variants are grouped by calling methodology and merged across cases based on matching information:

    Noise database

    1

    Variant extraction

    Variants are extracted from the original input files without any filtering, ensuring that all calls are preserved.

    2

    Variant grouping and merging

    Variants are grouped by calling methodology and merged across cases based on matching information:

    Important notes and limitations

    Variant quality

    No quality filters are applied; all variants are retained in the database regardless of confidence or quality.

    Duplication events

    CNV callers often report copy number (CN) without specifying zygosity. In Emedgene, the following assumptions are applied:

    • CN = 3 → Interpreted as heterozygous duplication

    • CN > 3 → Interpreted as homozygous duplication

    When samples contain different CN values for the same region, they are merged into a single DUP entry. The counts for this entry are then calculated based on the inferred zygosity rather than the exact copy number.
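The duplication rules can be expressed directly: copy number maps to an inferred zygosity, and samples with different CN values for the same region collapse into one DUP entry counted by zygosity. A minimal sketch with hypothetical names.

```python
from collections import Counter


def dup_zygosity(cn: int) -> str:
    """Infer zygosity from copy number as described above."""
    if cn == 3:
        return "HET"
    if cn > 3:
        return "HOM"
    raise ValueError("CN <= 2 is not a duplication")


def merge_dup_region(copy_numbers: list[int]) -> dict:
    """Merge samples with different CN values into a single DUP entry.

    Counts use the inferred zygosity, so CN 4 and CN 6 are both counted as HOM.
    """
    counts = Counter(dup_zygosity(cn) for cn in copy_numbers)
    return {"type": "DUP", "het": counts["HET"], "hom": counts["HOM"]}
```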

    Sex chromosomes

    Chromosome X

    Allele counts depend on the sample’s recorded sex:

    • Females: Diploid — contribute one allele to the allele count (AC) if heterozygous, or two alleles if homozygous

    • Males: Haploid — contribute one allele to AC

    • Mosaic chrX variants in males: Treated as heterozygous, contributing one allele to AC

    Chromosome Y

    Haploid; only samples recorded as male contribute to allele counting.

    Pseudo‑autosomal regions (PAR)

    No special handling is implemented yet.

    Mitochondrial DNA variants

    Heteroplasmy levels are not taken into account; mitochondrial DNA variants are treated as homoplasmic and therefore counted as haploid homozygous in the dataset.
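The per-sample AC rules for chrX, chrY, and mtDNA fit in one function. This is a minimal sketch with hypothetical genotype labels. The AN contributions are an assumption (2 for diploid regions, 1 for haploid regions) that the page implies but does not spell out. Non-male samples are treated as diploid on chrX, in line with the platform assuming "Female" for unknown sex.

```python
# Alternate-allele copies per genotype label for a diploid call (hypothetical labels).
DIPLOID_ALT = {"ref": 0, "het": 1, "hom": 2, "mosaic": 1}


def allele_contribution(chrom: str, sex: str, genotype: str) -> tuple[int, int]:
    """Return the (AC, AN) one sample adds for a variant, following the rules above.

    AN values are an assumption: diploid regions add 2, haploid regions add 1.
    """
    alt = DIPLOID_ALT[genotype]    # raises KeyError for unknown genotype labels
    carries = 1 if alt > 0 else 0  # haploid contribution
    contig = chrom.removeprefix("chr")
    if contig in ("M", "MT"):
        return carries, 1          # mtDNA: heteroplasmy ignored, counted as haploid
    if contig == "Y":
        return (carries, 1) if sex == "male" else (0, 0)
    if contig == "X" and sex == "male":
        return carries, 1          # haploid; mosaic chrX calls count as one allele
    return alt, 2                  # autosomes and non-male chrX (PAR not special-cased)
```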

    No data regions

    The issue

    The merging algorithm does not account for sequencing coverage and incorrectly interprets positions with no data as homozygous reference calls. As a result, the dataset becomes artificially enriched in reference alleles, leading to an underestimation of allele frequency (AF) for any variant located in regions with absent coverage.

    This applies to any region with absent coverage in any dataset (exomes only, genomes only, or mixed). However, it becomes particularly problematic in mixed exome–genome datasets, where exome samples systematically lack data outside the capture regions (see below).

    Recommendation

    Be cautious when evaluating variant rarity in regions that are not uniformly covered across samples in a dataset.

    Special case: Non-coding variant allele frequency in mixed exome–genome datasets

    When exome and genome samples are combined in a single dataset, AF values for non‑coding variants become systematically lower than their true population frequency. This occurs because exome samples have no coverage in non‑coding regions, and the merging algorithm incorrectly interprets these missing data points as homozygous reference. As a result, variants located outside the exome capture regions—such as common intronic or intergenic variants—are underrepresented in the aggregated dataset.

    The downward AF bias becomes especially pronounced when exome samples greatly outnumber genome samples:

    • Exomes outnumber genomes by ~20×: Common intronic variants may appear at an artificially low frequency (<0.05), despite being frequent in the population.

    • Exomes outnumber genomes by ~100× or more: The bias becomes so severe that common non‑coding variants may appear at <0.01 AF, leading to potential misclassification as rare variants.
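The size of this bias follows from simple arithmetic. Assume G genome samples that observe a non-coding variant at its true frequency p, and E exome samples with no data that are merged as homozygous reference. The observed frequency is then AF_obs = (2Gp) / (2(G + E)) = p · G / (G + E). For a common variant with p = 0.5, 20× more exomes gives about 0.5 / 21 ≈ 0.024, and 100× gives about 0.5 / 101 ≈ 0.005. This matches the ranges above. The function name is hypothetical.

```python
def observed_noncoding_af(true_af: float, n_genomes: int, n_exomes: int) -> float:
    """AF reported for a non-coding variant in a merged exome + genome database.

    Genomes sample the variant at its true frequency; exomes have no data here but
    are merged as homozygous reference, adding 2 alleles each to AN and none to AC.
    """
    ac = true_af * 2 * n_genomes
    an = 2 * (n_genomes + n_exomes)
    return ac / an
```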

    Recommendation

    • To prevent misleading AF estimates for non‑coding regions, maintain separate databases for exome and genome samples.

    Organization databases

    Organization databases (DBs) are custom variant datasets that enhance interpretation by adding population-specific frequencies, detecting technical artifacts, and referencing curated variants.

    Organization database types

    By purpose

    • Historic DBs: Filter out variants common in the population of interest

    • Noise DBs: Detect technical artifacts

    • Curated DBs: Reference previously curated variants

    Historic database

    • Serves as a private population frequency database, helping users evaluate variant frequencies within their population.

    • An internal historic database is created from cases processed in your organization’s Emedgene account.

    By origin

    • Internal DBs: Created from cases processed within your organization’s Emedgene account. Note: Historic and noise databases only.

    • External DBs: Created by the organization from other sources, such as:

      • Cases analyzed with different software

    By included variant types

    • SNV DBs: Store single nucleotide variants (SNVs)

    • CNV DBs: Store copy number variants (CNVs), insertions >50bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH)

    Creating a PON for Emedgene using BaseSpace Baseline Builder

    Emedgene users can supply their own panel of normals (PON) to enable CNV calling on gene panels and WES samples. This guide details the process using BaseSpace.

    The DRAGEN Baseline Builder application in BaseSpace can be used to build PONs that are compatible with Emedgene:

    Users without BaseSpace can receive a free trial BaseSpace account with compute (250 iCredits) and storage (1 TB) by registering at https://basespace.illumina.com/. The compute is allocated for 30 days from the start of the trial and is sufficient to generate a PON.

    Alternatively, Illumina Connected Analytics (ICA), DRAGEN servers and DRAGEN in cloud are also capable of generating Emedgene-compatible PONs.

    Requirements

    • About 50 normal samples with a 50/50 male:female split, originating from the same library prep protocol and ideally from the same sequencer.

    • The sample FASTQ files need to be in one or more Projects in your BaseSpace account.

    • iCredits

    Uploading a BED file:

    The BED file used must match the one uploaded to Emedgene; any discrepancy may cause a failure during case processing.

    Uploading a BED file to a project in BaseSpace can be done within the project:

    Note that hg19/GRCh37 BED files are not directly compatible with hs37d5 (the reference used within Emedgene) and require their region contigs to be renamed from "chr1", "chr2", "chr3", etc. to "1", "2", "3", etc.
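The renaming can be scripted. This is a minimal sketch that strips the `chr` prefix from the first column. It also maps `chrM` to `MT`, the mitochondrial contig name used in hs37d5. hg19 unplaced or random contigs (for example `chr1_gl000191_random`) have different names in hs37d5 and are not handled here.

```python
def to_hs37d5_contig(name: str) -> str:
    """Convert an hg19-style contig name to its hs37d5 equivalent (primary contigs only)."""
    if name == "chrM":
        return "MT"  # hs37d5 names the mitochondrial contig MT
    return name[3:] if name.startswith("chr") else name


def rename_bed_contigs(bed_text: str) -> str:
    """Rename the chromosome column of every non-blank BED line; other columns are untouched."""
    out = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        fields[0] = to_hs37d5_contig(fields[0])
        out.append("\t".join(fields))
    return "\n".join(out) + ("\n" if out else "")
```

Names that are already renamed pass through unchanged, so running the function twice is harmless.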

    Running the app

    1) Select the application "DRAGEN Baseline Builder" with the latest version that matches your Emedgene secondary analysis pipeline (4.3 in example) and click launch:

    2) Select output project and Baseline Mode "CNV":

    3) Select input FASTQ Biosamples to use.

    50 samples is a rough guideline as the degree of correlation between normals and case sample is more important than quantity.

    4) Select the reference genome that matches the one used in Emedgene.

    Choose the multigenome version of the genome build.

    For GRCh38/hg38:

    For GRCh37/hg19 choose the hs37d5 build (remember to rename contigs in BED file if using hs37d5):

    5) Select the BED file you uploaded to your BaseSpace project in the Uploading a BED file step above:

    6) Configure Emedgene-specific settings:

    Emedgene CNV calling has DRAGEN settings that must also be used during PON creation. In the app, this can be done in the advanced settings section at the bottom of the page:

    Tick "Ignore Duplicate Reads in CNV Baseline Files" and change "Generated Combined Counts file for CNV" to "GC Corrected".

    7) Tick the BaseSpace Labs App Acknowledgement box.

    8) Launch the app.

    Downloading Results

    Once the analysis is complete, open the output files and look for the *.combined.counts.txt.gz file. It may be within a folder called "pon".

    It can be downloaded from the analysis results manually (click the file and select "Download"), via the BaseSpace CLI (https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview), or, if the output project is connected to Emedgene, loaded directly into the platform.
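If you download the whole output folder, a recursive search finds the combined counts file wherever it sits. A minimal sketch using only the standard library, with a hypothetical function name.

```python
from pathlib import Path


def find_combined_counts(output_dir: str) -> list[Path]:
    """Recursively find *.combined.counts.txt.gz files (they may sit in a "pon" subfolder)."""
    return sorted(Path(output_dir).rglob("*.combined.counts.txt.gz"))
```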

    Encryption settings

    Emedgene supports data encryption with customer-managed keys through Bring Your Own Key (BYOK). This gives organizations full control over their encryption and helps meet compliance requirements for data protection regulations such as HIPAA and GDPR.

    Encryption is managed through a Key Management Service (KMS), a secure system that creates and controls cryptographic keys. Currently, Azure Key Vault is supported, and AWS Key Management Service (KMS) will be available soon.

    Starting in v100.39.0, users with appropriate permissions can configure encryption for their workgroup directly in the platform using a key from Azure Key Vault KMS.

    Manage encryption using your own key

    Use this card to set up data encryption and review its details.

    Important notes before you start

    • Encryption can be configured only once per workgroup, by users with appropriate permissions.

    Set up encryption

    1

    Click Add.

    2

    Select the KMS type (Azure Key Vault is the default).

    3

    Enter the required details:


    Encryption is active immediately after configuration.

    Once encryption is set up, you’ll see the status marked Enabled, plus the date added and the key URL (Azure Key Vault only).


    Client secret expiration is monitored. If the secret expires in less than 30 days:

    • A warning appears in Organization settings.

    • Weekly reminders are sent to your organization's point of contact until updated.

    Update the client secret for an existing Azure Key Vault configuration

    You can update the client secret of an active encryption configuration that uses an Azure Key Vault key. The client ID, tenant ID, and key URL can't be updated.

    1

    Click the Edit icon on the right.

    2

    Enter the new client secret.

    3

    Click Test and Save to validate the credentials.

    Attaching a PON to a Coverage BED kit

    Users must link a PON to an existing Coverage BED by specifying the kit ID. Once the PON is linked, you must provide the corresponding Coverage BED during case creation to trigger CNV calling.

    A PON Management section is available in the Organization Settings page under Kit management.

    This section includes a table displaying:

    • Kit Name

    • Kit ID

    • GC Corrected Status

    • Human Reference

    • DRAGEN Version

    • Maximum Interval Size

    Additionally, there is an ‘Add PON’ button to initiate the PON addition process.

    Prerequisites

    Before adding a PON, ensure the following requirements are met:

    • The PON file must be a combined counts file.

    • The file must be stored in a supported cloud storage (AWS S3, ICA, or BaseSpace Storage). Direct uploads are not allowed.

    • Users must have the “Manage PON” role.

    Compatibility

    To use the PON for CNV calling, the sample pipeline version must be set to 37.0 or higher. Otherwise, CNV calling will not use the newly added PON.

    Existing PONs

    Previously existing PONs will not appear in the PON table. However, a notification will indicate the presence of existing PONs within the workgroup.

    PON Migration

    Previously existing PONs will continue to function, and CNV calling will remain unaffected. If migration to the new PON table is required, please contact [email protected] or your bioinformatics support team.

    Adding a new PON

    1. Click ‘Add PON’ to open a pop-up window.

    2. Select values for the required fields:

      1. Coverage kit: Lists all unique kits within the organization (excluding common kits). To add a PON for a common Kit BED, create a separate Kit with the same Kit BED (refer to Kit Management for details).

    1. In the next window, select a combined counts file from a supported cloud storage service (AWS S3, ICA, or BaseSpace Storage). Ensure the file is available in storage before proceeding.

    2. Only one file with the extension “.combined.counts.txt.gz” can be selected.

    3. Click ‘Next’ to validate the file.

    Automated Validations

    Before adding a PON, the system performs the following checks:

    1. Maximum Interval Size Validation

    • The system inspects the first 1000 rows of the combined counts file to ensure that no target exceeds the threshold interval size.

    • Using a combined counts file with the expected maximum interval size is strongly recommended.

    2. GC Correction Validation

    • The system determines GC correction status by checking the cnv-enable-gcbias-correction field in the file headers:

      • 0: Non-GC corrected

      • 1: GC corrected
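The Maximum Interval Size validation can be approximated locally before adding a PON. This is a minimal sketch using a hypothetical helper, `check_max_interval`, and it assumes the combined counts file has `#`-prefixed header lines followed by tab-separated rows whose second and third columns are the target start and end, with interval size taken as end minus start. That layout and coordinate convention are assumptions, not taken from this page. Pass the threshold that matches your DRAGEN version (250 bp for DRAGEN 4.2 and below, 500 bp for DRAGEN 4.3 and above).

```shell
# Hypothetical helper approximating the Maximum Interval Size validation:
# inspect the first 1000 data rows and report any target longer than the threshold.
# Assumes '#'-prefixed headers and start/end in columns 2 and 3 (size = end - start).
check_max_interval() {
  gzip -dc "$1" | awk -v max="$2" '
    BEGIN { max += 0 }
    /^#/ { next }                                   # skip header lines
    {
      n++
      size = $3 - $2
      if (size > max) {
        printf "row %d: %s:%s-%s is %d bp (> %d bp)\n", n, $1, $2, $3, size, max
        bad = 1
      }
      if (n == 1000) exit                           # only the first 1000 rows are inspected
    }
    END { exit bad }'                               # non-zero status if any target is too long
}
```

Usage (file name is a placeholder): `check_max_interval sample.combined.counts.txt.gz 500 && echo "intervals OK"`.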

    Viewing Added PONs

    Once a PON is successfully added, it appears in the PON table with details based on the selected inputs. The Maximum Interval Size and GC Corrected Status values are inferred from the validation results.

    Deleting PON from table

    Users with the “Manage PON” role can delete any PON listed in the table. Deletion is a soft delete: the link between the combined counts file and the Kit BED is removed, and the combined counts file itself is not affected.

    Registering a database

    After a database VCF file is created by Illumina Support or provided by the customer, it is stored in a dedicated customer bucket. To activate the database, users with appropriate permissions need to register the database.

    1

    In the Organization DB Management card, click Add in the upper right corner.

    2

    Enter the database details:

    • DB name: Enter database name


    To prevent processing issues and maintain consistent naming across the platform, do not use underscores ("_") in a database name.

    • DB type: Select database type (Noise, Historic)

    • Variant type: Select variant type (SNV, CNV)

    • Human reference: Select genome reference (GRCh37, GRCh38)

    3

    Click Next.

    4

    Select the database file from your bucket using the file browser.

    5

    Click Save.

    Registering an organization database automatically logs the details in the organization's audit record, ensuring transparency and traceability.

    Preset groups

    Here you can review and manage Preset groups.

    Preset groups can be created by combining different Presets. Alternatively, you can upload a JSON file that defines the Presets in a Preset group. The file name and schema will be validated upon upload.

    The section has two tabs:

    • V2 (new)

    Here you can create (from Presets or from a JSON file), view, edit, hide/unhide, and download your organization's Preset groups.

    Presets

    Here you can review and manage your organization's filter Presets. The Presets table includes:

    • Preset name

    • Type

    • Last update (v100.39.0+): Displays the date of the most recent change

    The Coverage kit and its human reference BED must already exist. The BED file must match the one used to generate the PON.
  • Only one PON per unique combination of Coverage kit, human reference BED, and DRAGEN main version is allowed.

  • PONs cannot be added for DRAGEN sub-versions.

  • DRAGEN Version: Only versions 3.6 and later are supported.
  • Human Reference: Choose GRCh37 or GRCh38.

  • Based on the selected DRAGEN version, the system will display the expected maximum interval size:

    1. DRAGEN 4.2 and below: 250bp

    2. DRAGEN 4.3 and above: 500bp (default for DRAGEN 4.3)

  • Click Next to go to the file selection window.

  • If this field is missing, the system analyzes target value types:
    • Integer values: Non-GC corrected

    • Float values: GC corrected

  • Users must ensure the combined counts file has the intended GC correction status.
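The two-step GC correction check (header flag first, then the integer/float fallback) can be sketched as follows. `gc_status` is a hypothetical helper, not the platform's implementation. It assumes the flag appears in a `#` header line as `cnv-enable-gcbias-correction` followed by `=`, `:`, or whitespace and the value 0 or 1, and that count values start in the fifth column; both layout details are assumptions.

```shell
# Hypothetical helper mirroring the GC correction validation described above.
# Assumes '#' header lines and count values from column 5 onward (assumptions).
gc_status() {
  gzip -dc "$1" | awk '
    /^#/ {
      # 1) Header flag: 0 = non-GC corrected, 1 = GC corrected
      if (match($0, /cnv-enable-gcbias-correction[ \t=:]*[01]/)) {
        gc = substr($0, RSTART + RLENGTH - 1, 1); decided = 1
        exit
      }
      next
    }
    {
      # 2) Fallback: any float count value means GC corrected
      for (i = 5; i <= NF; i++)
        if ($i ~ /^-?[0-9]*\.[0-9]+$/) { gc = 1; decided = 1; exit }
    }
    END {
      if (!decided) gc = 0                         # only integer values seen
      print (gc == 1 ? "GC corrected" : "Non-GC corrected")
    }'
}
```

Running it on your file before upload gives a quick indication of whether it has the intended GC correction status.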

  • Active: Toggle to apply annotations from the database to variants in new and reanalyzed cases.
  • AI Shortlist: Toggle to have the AI Shortlist factor in allele frequencies from the database. When enabled, variants with an allele frequency above 10% are excluded from the shortlist.

  • Fields: Auto-populated
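To preview which database records the 10% AI Shortlist cut-off would affect, you could list records whose AF exceeds 0.1. This is a hypothetical sketch (`high_af_variants` is not an Emedgene tool), assuming a bgzipped database VCF with a single numeric AF value in the INFO field. It only illustrates the cut-off and is not the platform's own filtering code.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT and AF for records with AF > 0.1.
# Assumes a bgzipped (gzip-compatible) VCF with one AF value per record.
high_af_variants() {
  gzip -dc "$1" | awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                                   # skip VCF headers
    {
      info = ";" $8
      if (match(info, /;AF=[^;]*/)) {
        af = substr(info, RSTART + 4, RLENGTH - 4) + 0   # value after ";AF="
        if (af > 0.1) print $1, $2, $4, $5, af
      }
    }'
}
```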

  • Compatible BED file
    only once per workgroup.
  • Once encryption is set up, you can update the client secret, but you cannot disable encryption or change the KMS type.

  • Client ID

  • Tenant ID

  • Client secret

  • Key URL

  • 4

    Click Test and Save to validate the credentials.

    Emedgene checks that the KMS is accessible with the given credentials and that these credentials have the encrypt, decrypt, wrapKey, and unwrapKey permissions required for cryptographic operations.

    5

    Once validated, click Confirm to apply the update.

    4

    Once validated, click Confirm to apply the update.


  • Single nucleotide variants (SNVs): merged across cases when they share the same chromosome, position, reference allele, and alternate allele.

  • Copy number variants (CNVs), insertions >50bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH): merged across cases when they share the same chromosome, start position, end position, reference allele, and alternate allele.
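The merge keys above can be illustrated by counting records that share a key across several case VCFs. This is a minimal sketch with a hypothetical helper, `merge_key_counts`, assuming plain-text VCFs in which structural records carry their end position as `END=` in INFO; any record without `END` is treated as an SNV, which is a simplification.

```shell
# Illustrative only: count records sharing each merge key across case VCFs.
#   SNV key:              CHROM, POS, REF, ALT
#   CNV/INS/STR/ROH key:  CHROM, POS (start), END, REF, ALT
# Assumption: structural records carry END= in INFO; records without END are treated as SNVs.
merge_key_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                                   # skip VCF headers
    {
      info = ";" $8; end = ""
      if (match(info, /;END=[0-9]+/))
        end = substr(info, RSTART + 5, RLENGTH - 5) # digits after ";END="
      key = (end == "") ? ($1 OFS $2 OFS $4 OFS $5) : ($1 OFS $2 OFS end OFS $4 OFS $5)
      count[key]++
    }
    END { for (k in count) print k, count[k] }' "$@"
}
```

Output order is unspecified; pipe through `sort` if you need a stable listing.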


  • 3

    CNV and INS variants: Variant clustering

    1. The algorithm selects the most frequent variant at each locus and uses it as a pivot (reference) variant.

    2. The pivot is merged with other variants of the same type that reciprocally overlap it by at least 70%. Let P be the pivot (length $p$) and V another variant (length $v$), with an overlapping region of length $o$. They are merged if both conditions are satisfied:

    $$\frac{o}{p} \ge 0.7 \quad \text{and} \quad \frac{o}{v} \ge 0.7$$

      In other words, at least 70% of the pivot overlaps the variant, and at least 70% of the variant overlaps the pivot.
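The reciprocal-overlap condition can be checked for a single pivot/variant pair as below. `reciprocal_overlap_merge` is a hypothetical helper; it assumes both intervals are on the same chromosome and that length is computed as end minus start, and it does not cover pivot selection or the rest of the clustering.

```shell
# Hypothetical helper: apply the 70% reciprocal-overlap rule to one pivot/variant pair.
# Arguments: pivot start, pivot end, variant start, variant end, optional minimum fraction.
# Assumes both intervals are on the same chromosome and length = end - start.
reciprocal_overlap_merge() {
  awk -v ps="$1" -v pe="$2" -v vs="$3" -v ve="$4" -v min="${5:-0.7}" 'BEGIN {
    ps += 0; pe += 0; vs += 0; ve += 0; min += 0
    p = pe - ps                                     # pivot length
    v = ve - vs                                     # variant length
    o = (pe < ve ? pe : ve) - (ps > vs ? ps : vs)   # overlap length o
    if (o < 0) o = 0                                # disjoint intervals do not overlap
    if (p > 0 && v > 0 && o / p >= min && o / v >= min) print "merge"
    else print "keep"
  }'
}
```

For example, `reciprocal_overlap_merge 1000 2000 1100 2100` prints `merge`: the 900 bp overlap is 90% of each 1000 bp interval. A 1000 bp pivot inside a 3000 bp variant prints `keep`, because only a third of the variant overlaps the pivot.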

    4

    CNV and INS variants: Recalculation of allele counts

    After clustering, the allele count (AC) and the genotype counts are recomputed based on the new consensus variant.

    for exome and genome samples.
  • If this is not feasible, assess non‑coding variant rarity in mixed datasets with caution.

  • Typically includes all unique cases at the time of creation but can be tailored to include only specific cases.

  • Common variants are less likely to be pathogenic and can be filtered out.

    Noise database

    • Serves as a quality control database to identify recurring artifacts introduced by the sequencing technique, sequencing platform, and analysis pipeline.

    • Typically, a noise database is composed of samples from unaffected individuals, such as healthy parents.

    • If only patient data is available, the database remains useful for filtering out high-frequency artifacts. However, caution is necessary when filtering rare variants to avoid excluding true pathogenic ones.

    • Sample size recommendations:

      • ≥ 100 samples to filter out variants with > 5% allele frequency in the database

      • ≥ 500 samples to filter out variants with > 1% allele frequency in the database

    • Multiple noise database instances can be maintained to account for different assays and calling methodologies.

    • Common variants can be filtered out as likely artifacts.
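As a quick worked check of the sample-size recommendations, using $AN = 2 \times N$: in both cases the allele-frequency cut-off corresponds to about 10 alternate alleles observed in the database.

```latex
\begin{aligned}
N = 100 &: \quad AN = 2 \times 100 = 200, \quad AC = 0.05 \times 200 = 10 \\
N = 500 &: \quad AN = 2 \times 500 = 1000, \quad AC = 0.01 \times 1000 = 10
\end{aligned}
```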

    Curated database

    • Serves as a reference of previously curated variants.

    • A static curated variant database implemented upon request. Note: This is not the same as variants found in the dynamic database.

    • Filtering by known variants from your curated databases helps pinpoint significant variants, improves consistency, and speeds up interpretation.

    Research cohorts or legacy data

  • Publicly available datasets

  • Built automatically
    Created

    V1 (legacy)

    Here you can view and download legacy Preset groups.

    Migrating V1 Preset groups to the improved V2 methodology

    Legacy Preset groups can be migrated to the new methodology via two simple steps: downloading the Preset group JSON file in the V1 (legacy) tab, then uploading it on the V2 (new) tab.

    Create Preset group from existing Presets:

    1. Click on Add new;

    2. From the dropdown, select New;

    3. Enter a name for the Preset group. Note: The Preset group can't be renamed later!

    4. Select Presets to include in the Preset group. Click Add after selecting each Preset;

    5. Drag and drop Presets to change the order;

    6. Click on Save.

    Create Preset group from JSON file:

    This is the second step in the migration of the V1 (legacy) Preset group, after you have downloaded the Preset group JSON file.

    1. Click on Add new;

    2. From the dropdown, select From file;

    3. From the file browser, select a JSON file that defines a Preset group;

    4. The system will thoroughly validate the file;

    5. If validation is successful, a Preset group will be created. Any underlying Presets that are missing from your organization will be added as well.

    Review contents of the Preset group (V2 and V1)

    Click the downward arrow icon to the left of the Preset group's name.

    Edit Preset group

    1. Click on the ✏️Edit icon;

    2a. Add Presets to the group as needed. Select Presets to include in the Preset group from the dropdown. Click Add after selecting each Preset;

    2b. Remove Presets from the group as needed. Click on the Remove icon to the right of the Preset name;

    2c. Drag and drop Presets to change the order;

    3. Click Save.

    Hide/unhide Preset group

    When you hide a Preset group, it will no longer appear in the Preset groups list offered at case creation (Select preset group step).

    Click on the 👁️Hide/Unhide icon.

    Note: A default Preset group cannot be hidden.

    Download Preset group file (V2 and V1)

    This is the first step in the migration of the V1 (legacy) Preset group to V2 methodology. A Preset group file contains preset names and the filters used to define each Preset in JSON format.

    Click on the ⬇️Download icon.

    Revert Preset group (V1)

    If you revert a migrated V1 Preset group, the corresponding V2 Preset group is deleted.

    Click on the ↩️Revert icon to undo the migration.

    Note: If a migrated Preset group has been assigned as default, it cannot be reverted.

    The table is sortable by any column (available from version 100.39.0 onwards).

    Create Preset

    Clicking Add new redirects you to the Analysis tools page, where you can create a Preset from active filters.

    Review logic behind the Preset

    Click the downward arrow icon to the left of the Preset's name.

    Duplicate Preset (v100.39.0+)

    1. Locate the preset you would like to duplicate and click on the new Duplicate icon;

    2. When duplicating a preset, a new preset name must be entered. The naming and creation process follows the same limitations as creating a new preset from active filters.

    Note: the duplicated preset will inherit the original preset's configuration. Note: this function is available to users with the existing role permissions.

    Edit Preset

    1. Click on the ✏️Edit icon;

    2. Make the necessary changes. Editing the Preset requires basic understanding of JSON data format;

    3. Click Save. The software will perform schema validation on the edited Preset. Note: you can modify the Preset's content but not its name. Note: the Preset can't be edited if it's locked.

    Lock/unlock Preset

    Locking the Preset prevents any user from changing it.

    1. Click on the Lock/Unlock icon;

    2. Click on Lock/Unlock in the popup window.

    Delete Preset

    1. Click on the 🗑️Delete icon;

    2. Confirm your decision by clicking Delete in the popup window.

    Note: Only unused Presets can be deleted.


    Creating an external database VCF file

    Prerequisites

    Format: The database file must follow the VCF 4.2 specification.

    Tools:

    • Required:

      • awk

      • bgzip

      • tabix

    • Optional:

      • vcftools: useful for population frequency calculations.

    How to create a noise or historic database file

    1

    Calculate population statistics

    1. General: Allele Number (AN): Calculate the total number of alleles in your population by multiplying the number of individuals ($N$) by 2: $AN = 2 \times N$.

    2. For each variant within the dataset:

    Example historic DB VCF header and variant line

    How to create a curated database file

    1

    Create a VCF file with your variants

    1. In the INFO field, include the significance sub-field and assign its value to each variant based on Table 1. Only one significance value is allowed per variant. If a variant has multiple interpretations, list the variant in separate rows, each with a different significance value.
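Before sending the file, you may want to confirm that every record carries exactly one valid significance value. This is a hypothetical sketch (`check_significance` is not an Illumina tool), assuming an uncompressed, tab-separated VCF with the significance sub-field written as `significance=<value>` in INFO.

```shell
# Hypothetical pre-submission check for a curated DB VCF (uncompressed, tab-separated):
# each data line must carry exactly one significance value between 0 and 5 (see Table 1).
check_significance() {
  awk 'BEGIN { FS = "\t" }
    /^#/ { next }                                   # skip header lines
    {
      info = ";" $8; n = 0; val = ""
      while (match(info, /;significance=[^;]*/)) {  # find every significance sub-field
        n++
        val = substr(info, RSTART + 14, RLENGTH - 14)   # text after ";significance="
        info = substr(info, RSTART + RLENGTH)
      }
      if (n != 1 || val !~ /^[0-5]$/) {
        printf "line %d: invalid significance sub-field\n", NR
        bad = 1
      }
    }
    END { exit bad }' "$1"
}
```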

    Example curated DB VCF variant lines

    Small variant

    Copy number variant

    Next steps

    1

    Reach out to Illumina support

    Provide the VCF and TBI files to Illumina Support for upload to your organization's dedicated storage bucket, along with the following information:

    • Database name. Underscores ("_") in a database name are not allowed.


    Allele Count (AC): Determine the number of times the alternate allele of a variant appears across all individuals. This is inferred from the genotype counts ($n$). For a biallelic variant where allele A is the reference and allele B the alternate, the allele count of the alternate allele is:

    $$AC(B) = 2 \times n(BB) + n(AB)$$

  • Allele Frequency (AF): Calculate as the ratio of Allele Count to Allele Number: $AF(B) = AC(B) / AN$.
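The three formulas can be combined into a small calculator for one biallelic variant. `allele_stats` is a hypothetical helper: it takes the number of individuals $N$, the heterozygote count $n(AB)$, and the homozygous-alternate count $n(BB)$, and prints INFO-style values. Rounding AF to 5 significant digits is an arbitrary choice for display.

```shell
# Hypothetical helper: AN, AC and AF for one biallelic variant (A = reference, B = alternate).
#   AN    = 2 x N
#   AC(B) = 2 x n(BB) + n(AB)
#   AF(B) = AC(B) / AN
# Arguments: N (individuals), n(AB) (heterozygous count), n(BB) (homozygous alternate count).
allele_stats() {
  awk -v N="$1" -v het="$2" -v homalt="$3" 'BEGIN {
    N += 0; het += 0; homalt += 0
    an = 2 * N
    ac = 2 * homalt + het
    af = (an > 0) ? ac / an : 0                     # avoid dividing by zero for an empty cohort
    printf "AC=%d;AN=%d;AF=%.5g\n", ac, an, af      # AF rounded to 5 significant digits
  }'
}
```

For example, `allele_stats 6059 0 1` (one homozygous-alternate individual among 6059) prints `AC=2;AN=12118;AF=0.00016504`, which matches the AC and AN values in the example historic DB record.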

  • 2

    Create a VCF file with your variants

    1. In the INFO field, include the AF sub-field.

    2. Optionally, include AC, AN, and TEN (a list of up to 10 samples carrying the variant). You may add other fields, provided the field names do not contain underscores or hyphens.

    3. Specify the exact format of each INFO sub-field in ##INFO meta-information lines.

    See format example below.
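The header rules in this step can be checked before submission. This is a hypothetical sketch (`check_info_header` is not an Illumina tool) that reads the `##INFO` meta-information lines of an uncompressed VCF, confirms that AF is declared, and flags declared IDs containing underscores or hyphens. It does not check that every INFO key used in the records is declared.

```shell
# Hypothetical header check for a noise/historic DB VCF (uncompressed):
#  - an ##INFO line must declare AF (the only required sub-field)
#  - declared INFO IDs must not contain underscores or hyphens
check_info_header() {
  awk '
    /^##INFO=<ID=/ {
      id = $0
      sub(/^##INFO=<ID=/, "", id); sub(/,.*/, "", id)   # keep only the declared ID
      if (id == "AF") has_af = 1
      if (id ~ /[_-]/) { print "invalid INFO ID: " id; bad = 1 }
      next
    }
    /^#CHROM/ { exit }                              # end of meta-information lines
    END {
      if (!has_af) { print "missing ##INFO declaration for AF"; bad = 1 }
      exit bad
    }' "$1"
}
```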

    3

    Sort variants in the VCF based on chromosome and position with awk

    4

    Compress the VCF with bgzip

    5

    Create a TBI index file with tabix

    Table 1. Mapping of significance values to pathogenicity classes.

    Significance value in the VCF    Pathogenicity class in the UI
    0                                Unknown
    1                                Benign
    2                                Likely Benign
    3                                VUS
    4                                Likely Pathogenic
    5                                Pathogenic

    1. Optionally, include comment, category, or other fields to capture text or numerical values that are relevant to classification. You may add other fields, provided the field names do not contain underscores or hyphens.

    2. Specify the exact format of each INFO sub-field in ##INFO meta-information lines.

    See format example below.

    2

    Sort variants in the VCF based on chromosome and position with awk

    3

    Compress the VCF with bgzip

    4

    Create a TBI index file with tabix

    Database type (Noise, Historic, Curated)
  • Variant type (SNV, CNV)

  • Genome reference (GRCh37, GRCh38)

  • 2

    Register the database

    Once the database is uploaded, the user with appropriate permissions can register the database by selecting it from their bucket in Settings.

    # Sort data lines by chromosome and position; header lines (#) are printed first
    awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' <unsorted_db>.vcf > <your_db>.vcf
    # Compress with bgzip, producing <your_db>.vcf.gz
    bgzip <your_db>.vcf
    # Index with tabix, producing <your_db>.vcf.gz.tbi
    tabix -p vcf <your_db>.vcf.gz
    ##fileformat=VCFv4.2
    ##fileDate=20250906
    ##reference=ftp://ftp.ensembl.org/pub/hg19
    ##contig=<ID=chr1,length=249250621,assembly=hg19>
    ##contig=<ID=chr2,length=243199373,assembly=hg19>
    ##contig=<ID=chr3,length=198022430,assembly=hg19>
    ##contig=<ID=chr4,length=191154276,assembly=hg19>
    ##contig=<ID=chr5,length=180915260,assembly=hg19>
    ##contig=<ID=chr6,length=171115067,assembly=hg19>
    ##contig=<ID=chr7,length=159138663,assembly=hg19>
    ##contig=<ID=chr8,length=146364022,assembly=hg19>
    ##contig=<ID=chr9,length=141213431,assembly=hg19>
    ##contig=<ID=chr10,length=135534747,assembly=hg19>
    ##contig=<ID=chr11,length=135006516,assembly=hg19>
    ##contig=<ID=chr12,length=133851895,assembly=hg19>
    ##contig=<ID=chr13,length=115169878,assembly=hg19>
    ##contig=<ID=chr14,length=107349540,assembly=hg19>
    ##contig=<ID=chr15,length=102531392,assembly=hg19>
    ##contig=<ID=chr16,length=90354753,assembly=hg19>
    ##contig=<ID=chr17,length=81195210,assembly=hg19>
    ##contig=<ID=chr18,length=78077248,assembly=hg19>
    ##contig=<ID=chr19,length=59128983,assembly=hg19>
    ##contig=<ID=chr20,length=63025520,assembly=hg19>
    ##contig=<ID=chr21,length=48129895,assembly=hg19>
    ##contig=<ID=chr22,length=51304566,assembly=hg19>
    ##contig=<ID=chrX,length=155270560,assembly=hg19>
    ##contig=<ID=chrY,length=59373566,assembly=hg19>
    ##contig=<ID=chrM,length=16571,assembly=hg19>
    ##INFO=<ID=COUNT,Number=1,Type=Integer,Description="Number of occurrences of the variant in the dataset">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency of the variant in the dataset">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Allele Number in the dataset">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele Count of the variant in the dataset">
    ##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">
    ##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the structural variant">
    ##INFO=<ID=TEN,Number=.,Type=String,Description="Ten samples containing the variant">
    ##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
    chr1	849466	.	N	<DEL>	.	.	COUNT=1;AF=0.00017;AC=2;AN=12118;END=1073402;SVTYPE=DEL;SVLEN=223936;TEN=SAMPLE1
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
    16	89531962	.	G	T	.	PASS	significance=3;category=interpretations=1.COMMENT='in silico: 3/3 damaging (PP3)'.OBSERVATIONS=1.UPDATE='2020-12-12'
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
    chr1	145413388	.	N	<DUP>	.	.	significance=1;SVLEN=333881;END=145747269;SVTYPE=DUP;category=type=duplicate.COMMENT=proximal duplicate BP2-BP3 1q21.1. 1q21 microdeletion syndrome.REMARK=Only AR OMIM genes.
    

    Pathogenic