Users with a 'Manage AI ACMG' role have the ability to exclude PP5 and BP6 tags from ACMG classification. These tags rely on external assertions without independent evidence and may introduce bias (Biesecker et al. 2018).
This option is enabled by default.
When enabled: PP5 and BP6 tags will not be used in ACMG classification. If either tag is positive (by AI or manual input), a warning message will be displayed in the ACMG section of the variant page.
When disabled: The tags will be included, and no warning sign will be shown.
Organization settings
The Organization Settings page is accessible via the dropdown menu under Settings. Note: The Organization Settings page is accessible only for users having a Manager role.
NGS quality
Set the Gene list threshold for your organization. Case validations are not applied to cases whose gene list contains fewer genes than the threshold.
Default value for the Gene list threshold is 50.
Only whole numbers greater than 0 are accepted.
AI Shortlist module
The AI shortlist prioritizes variants, helping you focus on the most relevant candidates.
In the AI Shortlist module card, you can select an AI shortlist mode for rare disease analysis and one for carrier analysis to match your workflow. Once a mode is changed, it will apply to all new case analyses, including reanalysis.
To change the AI Shortlist modes for rare disease analysis and carrier analysis, users must have the Manage Auto Analysis Tier IAM scope.
In legacy cloud environments, this permission is granted through the EMG role.
Rare disease analysis
Emedgene offers two analysis modes for rare disease interpretation.
Focused Mode: The list includes only variants found in genes known to be associated with disease.
Variants that are highly ranked but located in genes of unknown significance (GUSs) are labeled as Most likely GUS and Candidate GUS.
Discovery Mode: The list is not restricted to variants in known disease-associated genes. Variants in both known and unknown genes are evaluated and ranked together in the same list.
The focused mode of the AI shortlist for rare diseases is referred to as "Tier v0," and the discovery mode is referred to as "Tier v2" in the .
Carrier analysis
The AI shortlist will prioritize variants reported as pathogenic or likely pathogenic in any known variant databases or variants with high severity.
Known Pathogenic: The AI shortlist will prioritize variants reported as pathogenic or likely pathogenic in any known variant databases.
High Severity: The AI shortlist will prioritize variants with high severity.
Both: The AI shortlist will prioritize variants reported as pathogenic or likely pathogenic in any known variant databases or variants with high severity.
Analysis tools columns order
You can customize the default order of Variant table columns. Just rearrange the columns by dragging and dropping them into your preferred sequence.
Quality parameters
The Quality parameters section allows users to configure quality thresholds for the organization. Users with the 'Manage Quality Parameters' role can edit the thresholds.
NGS quality & Array sample quality section in Quality Parameters
Configuring an organization database
1
Upon request, Illumina Support builds the internal database VCF file from the customer’s cases based on supplied requirements.
Alternatively, the customer creates an external database VCF file and provides it to Illumina Support for upload.
2
Illumina Support stores the database in the customer’s secure bucket.
3
The user with appropriate permissions registers the database by selecting it from their bucket in Settings.
4
Once registered, the database appears in the Organization Databases list. From here, the customer can tune settings:
The PON file (combined counts file) is not stored within Emedgene but within the customer storage location, which means that the file is retrieved each time a new case using the associated test kit is processed.
NOTE: If the combined counts file is removed from the storage location, new cases created with the associated test kit, and reanalyses that include secondary analysis, will fail and move to the "Issue Reported" status.
If the PON file is deleted from storage but the customer now uses a different DRAGEN version or reference genome build that has its own PON, that PON is used and cases do not fail.
Array sample quality
Set the Array quality thresholds for your organization. If a sample’s quality values meet the criteria below, it will be classified as ‘High’; otherwise, it will be classified as ‘Low’.
Default thresholds are Call Rate >= 0.99 and LogR Dev <= 0.2.
Values between 0 and 1 are accepted.
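The classification rule above can be sketched in a few lines. This is an illustration only, not Emedgene's implementation; the function and parameter names are hypothetical, and only the default thresholds and the High/Low outcome come from the text.

```python
def classify_array_sample(call_rate: float, logr_dev: float,
                          min_call_rate: float = 0.99,
                          max_logr_dev: float = 0.2) -> str:
    """Return 'High' when both criteria are met, otherwise 'Low'."""
    # Thresholds are expected to lie between 0 and 1, as stated above.
    for name, value in (("min_call_rate", min_call_rate),
                        ("max_logr_dev", max_logr_dev)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    if call_rate >= min_call_rate and logr_dev <= max_logr_dev:
        return "High"
    return "Low"


print(classify_array_sample(0.995, 0.15))  # meets both criteria
print(classify_array_sample(0.98, 0.15))   # call rate too low
```

A sample sitting exactly on both thresholds (Call Rate 0.99, LogR Dev 0.2) is classified as High, because the criteria are inclusive.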
Lab workflow
Environment
SV annotation thresholds
The Emedgene SV annotation pipeline integrates multiple structural variant databases, including allele frequency sources (e.g., gnomAD SV) and variant/region pathogenicity resources (e.g., ClinVar, ClinGen). Annotation is performed based on defined overlap thresholds tailored to the clinical significance of each database category.
Present under Workbench & Pipeline in Organization Settings, this module enables configuration of the overlap thresholds used to annotate structural variants against external and internal databases.
Value constraint: between 0 and 1, with up to 2 decimal places.
One Side Pathogenic
The One Side Pathogenic setting lets you set a threshold for identifying structural variants classified as pathogenic and likely pathogenic, such as those in ClinGen Pathogenic, ClinVar Pathogenic, DDDSyndromes, and Curate Pathogenic databases. The default is 0.7, and values between 0 and 1 are accepted.
One Side Uncertain
The One Side Uncertain setting lets you set a threshold for identifying structural variants classified as uncertain, such as those in ClinGen VUS, ClinVar VUS, and Curate VUS databases. The default is 0.7, and values between 0 and 1 are accepted.
Two Sided
The Two Sided setting lets you set a threshold for identifying structural variants classified as benign and likely benign, such as those in ClinGen Benign, ClinVar Benign, DECIPHER, DGV, gnomAD SV, 1000 genomes, and Curate Benign databases. The default is 0.7, and values between 0 and 1 are accepted.
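To make the difference between the two kinds of threshold concrete, here is a minimal sketch. It rests on assumptions the text does not confirm: that "one side" means the overlap is measured relative to one interval only (here, the database entry), that "two sided" means reciprocal overlap, and that a value equal to the threshold passes. All function names are hypothetical.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def one_sided_overlap(case: tuple, db: tuple) -> float:
    """Assumed definition: fraction of the database entry covered by the case variant."""
    return overlap_bp(*case, *db) / (db[1] - db[0])


def two_sided_overlap(case: tuple, db: tuple) -> float:
    """Assumed definition: reciprocal overlap, the smaller of the two covered fractions."""
    shared = overlap_bp(*case, *db)
    return min(shared / (case[1] - case[0]), shared / (db[1] - db[0]))


def passes(value: float, threshold: float = 0.7) -> bool:
    # 0.7 is the documented default; ">=" is an assumption.
    return value >= threshold


case_variant = (100, 300)   # a 200 bp call in the case
db_entry = (150, 250)       # a 100 bp database entry fully inside the call
print(one_sided_overlap(case_variant, db_entry), two_sided_overlap(case_variant, db_entry))
```

Under these assumptions the same pair of intervals passes a one-sided threshold of 0.7 but fails a two-sided one, which is why a two-sided rule is the stricter choice for benign and frequency databases.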
Required format for a BED file defining a kit
As part of the management settings, users can upload BED files to be associated with their kits.
The format of the BED file (.bed) follows the description from UCSC Genome Browser with some modifications:
The file has to be a tab-delimited text file;
The file should not contain headers;
The number of fields per line must be consistent throughout any single set of data.
Zero-based index: Start and end positions are identified using a zero-based index.
There are three required fields:
Chromosome - The name of the chromosome has to be sorted in alpha-numeric order. example: chr1, chr2, ..., chr12, ..., chr22, chrX, chrY, chrM.
Chromosome Start - The starting position of the feature in the chromosome. The start position has to be smaller than the end position. The data has to be sorted in numeric order.
Chromosome End - The ending position of the feature in the chromosome. The end position has to be greater than the start position. The data has to be sorted in numeric order.
To add a region name to a BED file, include it as an additional column. This name will be displayed in place of the actual region location for insufficient regions, making it easier to review these areas on the Lab page.
Example:
Newly uploaded BED files will become available for use only after a mandatory waiting period of one hour to ensure adequate time for synchronization to complete.
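Before uploading, a BED file can be checked locally against the format rules listed above. The following is a rough sketch, not the validation Emedgene performs: the function name is hypothetical, and it covers only tab-delimited fields, the three required columns, a consistent field count, start smaller than end, and start positions sorted within each chromosome. It does not check the order of the chromosomes themselves.

```python
from typing import Iterable, List


def validate_bed_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    errors = []
    expected_fields = None
    last_start = {}  # last start position seen per chromosome
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            errors.append(f"line {number}: fewer than 3 tab-delimited fields")
            continue
        if expected_fields is None:
            expected_fields = len(fields)
        elif len(fields) != expected_fields:
            errors.append(f"line {number}: inconsistent number of fields")
        chrom, start_text, end_text = fields[:3]
        if not (start_text.isdigit() and end_text.isdigit()):
            # A header row such as "chrom<TAB>start<TAB>end" lands here.
            errors.append(f"line {number}: start and end must be whole numbers (no headers)")
            continue
        start, end = int(start_text), int(end_text)
        if start >= end:
            errors.append(f"line {number}: start must be smaller than end")
        if chrom in last_start and start < last_start[chrom]:
            errors.append(f"line {number}: positions on {chrom} are not sorted")
        last_start[chrom] = start
    return errors


sample = ["chr1\t100\t200\texon1\n", "chr1\t300\t400\texon2\n", "chr2\t50\t80\texon3\n"]
print(validate_bed_lines(sample))
```

The fourth column in the sample lines is the optional region name described above.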
Creating a database VCF file
Creating an internal database VCF file
The internal database VCF file is produced by our support team.
Please refer to the linked resource for recommendations on sample selection as well as the technical details and limitations involved in constructing internal historic and noise databases.
Creating an external database VCF file
Use the linked resource to prepare a VCF file formatted for use as a historic, noise, or curated database.
Panel of normals (PON)
A Panel of Normals (PON) is a set of matched normal samples used to determine a baseline coverage pattern and account for recurrent technical artifacts that are specific to your workflow. Depth of coverage for each sequenced region is averaged across PON samples; if a significant increase or decrease from this baseline is detected in a test sample, a CNV is called.
A PON linked to a coverage BED kit is used by DRAGEN to increase accuracy of copy number variation (CNV) calling in targeted panel and exome cases.
A PON file is typically a text file listing absolute paths to 'target counts' files of individual matched normal samples. However, a PON can also be in the form of a combined counts file, which is a column-wise concatenation of individual target counts files (either GC-corrected or not).
The Illumina BaseSpace Baseline Builder App can generate this combined counts file (see "").
Modifying database settings
Users with appropriate permissions can tune database settings within their organization.
1
Click Edit next to the database you want to update.
2
Modify database settings as needed:
How do I check my platform version?
Add /version to the URL if you'd like to check the current version of your environment. For example, for demo.emedgene.com, use demo.emedgene.com/version.
If you want to learn more about the different versions available, check .
Default page
The default page upon entering a case can be customized. You can choose between:
- default option;
Default preset group
You can set a Preset group as default. The case is assigned to the default Preset group if no Preset group is selected or the default value is selected in the Preset group selection step.
Click on the dropdown arrow;
Select the Preset group;
Platform version
Select a platform version for your organization from the options available in your region. Changes may take 1–15 minutes to become active.
Note: This selection updates only the software version. It does not affect secondary or tertiary pipeline versions.
To change pipeline versions, go to Organization settings > Workbench & Pipeline settings > .
Organization DB management
Use the Organization DB Management card to manage organization databases: custom variant datasets that enhance interpretation. Here you can:
Review database details in a table:
Database name and type, file, genome reference, variant type, date last edited
Case identifier
By default, Emedgene displays the Case ID in the "EMGXXXXXXXXX" format. However, you have the choice to use the Proband ID instead.
Recommendations for creating a PON to call CNVs from exome data:
Samples for a PON should be derived from healthy individuals.
In our experience, a PON of at least 40-50 samples yields the best results. A smaller PON is better than nothing, but keep in mind that you may encounter more false positives.
You should aim at preparing samples for a PON in a unified manner to avoid the batch effect. Please log differences in library preparation (if any).
Active: Turn on to apply annotations from this database to variants in new and reanalyzed cases.
AI Shortlist: Turn on to include allele frequencies from this database when prioritizing variants. When enabled, variants with an allele frequency greater than 10% are excluded from the shortlist.
3
Click Save to apply the changes.
Editing an organization database is automatically recorded in the organization’s audit log for transparency and traceability.
Active (On/Off): Indicates whether annotation from the database is applied to variants in new and reanalyzed cases
AI Shortlist (On/Off): Indicates whether the AI Shortlist factors in allele frequencies from the database. When enabled, variants with an allele frequency above 10% are excluded from the list
For CNV databases: Candidate (C) and annotation (A) overlap percentages used for comparing CNV variants in cases vs. the database
Search: Use the search bar above the table to look up a database by its name
Download: Click the Download icon on the right to export a database as a VCF file
Associating URLs with your workgroup on ILMN Cloud
On ILMN Cloud, customers can easily associate URLs with their current workgroup by selecting from a list of predefined URL patterns. These URLs play a crucial role in organizing and accessing workgroup-specific resources.
Key Points to Remember:
Variant tags
The Variant tags card allows organizations to add and delete custom variant tags.
Only users with the Manager edit variant tag role can add and delete variant tags.
Every time a tag is added or deleted, the system records it in the
Pipeline versions
The Pipeline version card, located in the Workbench & Pipeline section of Organization Settings, allows you to set the applied versions for the sample pipeline, case pipeline, DRAGEN, and human reference. You can also configure whether to include reference homozygous genotype calls in cases.
Sample
Set the organization sample pipeline configurations. These configurations are only applicable for cases starting from FASTQ files.
BED files
Manage BED files in the BED files card.
These files define your kits and determine which genomic regions are analyzed during case processing. BED files should follow the format as described .
Kit BED file applications
General
General settings and organization information
1. Information
Displays organization details such as Cloud region, Organization name, Organization URL, Organization ID and the Platform main version.
URL Selection: You can choose from a set of predefined URL patterns.
Multiple Associations: A single URL can be associated with one or several workgroups.
Activation Time: After selecting the URL for your workgroup, please allow 1 to 15 minutes for the changes to take effect.
Need a Custom URL?
If you would like to create a new custom URL pattern within your domain, please reach out to our technical support team at [email protected] for assistance.
Add a variant tag
1
In the Variant tags card, click Add new.
2
Enter a name for the new tag.
3
Click Save to confirm.
Delete a variant tag
1
Click the Delete icon next to the tag you want to remove.
2
Confirm by clicking Delete in the dialog box.
Tags that are already in use and default tags provided by the system cannot be deleted.
Human Ref - Set the human reference genome build used by the pipeline for alignment and variant calling. Possible options: GRCh37 and GRCh38.
Secondary Analysis Pipeline - Set the secondary analysis pipeline for your organization. This pipeline applies to all FASTQ samples that are part of a case.
DRAGEN Version - Set the DRAGEN version used in the secondary analysis pipeline.
Variant caller mapping
Select the variant callers enabled for your secondary analysis pipeline.
Each caller is annotated with its sequencing compatibility (WGS, WES), methodology (e.g., CNV read-depth, small variant, SV split-end), and compatibility requirements across sample, DRAGEN, and case pipeline versions. By default, SNV and CNV callers are always enabled. Additional callers, such as SV, SMN, and STR, can be selectively activated to match evolving analysis needs. Changes made through this interface apply to cases processed after the change.
Enabling a variant caller that is not compatible with the selected pipelines is not supported.
Some variant callers may require an assigned PON file to work properly with the WES sequencing type.
Note on Analysis Flow Selection and Expected Outputs
The analysis flow used for a case is automatically determined by the total number of sequencing reads:
If total reads exceed 600 million, the system runs the whole-genome (WGS) analysis flow.
If total reads are below 600 million, the system runs the exome analysis flow.
This decision affects downstream outputs, particularly CNV detection:
WGS flow: CNVs are reported by default.
Exome flow: CNVs are only reported if a Panel of Normals (PON) file is available for the selected pipeline version.
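The decision logic above can be summarized as a short sketch. The function names are hypothetical and this is not the platform's code. The text does not say which flow applies when the read count is exactly 600 million; the sketch treats that case as exome, which is an assumption.

```python
WGS_READ_THRESHOLD = 600_000_000  # 600 million reads, as stated above


def select_analysis_flow(total_reads: int) -> str:
    """Return 'WGS' when total reads exceed the threshold, otherwise 'exome'."""
    # Exactly 600 million is treated as exome here (assumption).
    return "WGS" if total_reads > WGS_READ_THRESHOLD else "exome"


def cnvs_reported(total_reads: int, pon_available: bool) -> bool:
    """WGS flow reports CNVs by default; exome flow needs a PON for the pipeline version."""
    if select_analysis_flow(total_reads) == "WGS":
        return True
    return pon_available


print(select_analysis_flow(750_000_000), cnvs_reported(750_000_000, pon_available=False))
print(select_analysis_flow(90_000_000), cnvs_reported(90_000_000, pon_available=False))
```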
Sample pipeline arguments
View your Sample pipeline arguments for mapping and calling. In order to modify these arguments, you will need to contact technical support.
If the arguments are not set, they will be displayed as ‘N/A’.
Case
Set the organization case pipeline configuration. This configuration is applicable for all cases in your organization.
Changing the case pipeline affects the annotation sources and versions used for your cases and may change the AI shortlist outcome.
Include Reference Homozygosity (v36.0+)
The Include Reference Homozygous & No Coverage calls option allows you to control whether reference homozygous and no coverage genotype calls are included for the proband in newly processed cases. When enabled, cases will include these calls; when disabled (default), they will not.
Use case: A proband sample with unknown sex
When a sample is set to "Unknown" sex, the system assumes "Female" for chromosome analysis. If the actual genetic sex is male, chrX duplications and chrY deletions will be hidden.
Enabling Include Reference Homozygosity and No Coverage Calls before the case is run keeps these variants visible.
Pipeline versions card
Region of interest BED determines which genomic regions will be included in the variant analysis.
It acts as a filter before annotation and interpretation, defining which variants are processed and therefore the overall cost of the case analysis.
Coverage BED
Defines the target regions used to compute coverage
It ensures that the required depth of coverage is achieved for quality control purposes. Unlike the Region of interest use case, using a BED file to assess coverage does not affect which variants are analyzed.
Enables CNV calling for exome and panel FASTQ cases
CNV calling for exome or panel analysis from FASTQ files requires a Panel of Normals (PON). The PON is created based on a Coverage BED kit. Therefore, to trigger CNV calling, you must provide the Coverage BED during case creation.
The table on this page lists all available BED files, including their kit name, genome build (GRCh37 or GRCh38), and ID. You can view, edit, or delete files as needed, and add new ones using the Add New button.
Note: After uploading a kit BED file, please wait ~2 hours before using it to create a case.
Note: These fields are for reference only and cannot be edited.
2. Person of contact
Allows users to manage the primary contact email for the organization in an editable contact email field, ensuring that the correct person receives important information and notifications.
3. Report timezone
In this card, users with the "Manage report time zone" role can update the default time zone used across reports. This setting applies to timestamps for report creation, signing, and due dates.
How to Select a Time zone
A categorized dropdown menu allows users to choose a time zone by continent.
Major cities are listed alongside their GMT offset for easy reference.
A search bar is available to quickly find specific time zones.
This ensures consistency in report timestamps across your organization.
Creating an internal database VCF file
Upon request, Illumina Support builds internal historic or noise database VCF files from the customer’s cases based on supplied requirements.
What data is tracked for each variant?
For each variant, internal historic database records:
Allele frequency (AF): The ratio of the number of variant alleles to the total number of alleles in the dataset
Allele count (AC): The number of observed variant alleles
Allele number (AN): The total number of alleles assessed, derived from the number of individuals
Genotype counts: heterozygous (HET), homozygous (HOM), and hemizygous (HEMI)
Sample list (TEN): A short list of samples where the variant was found
How are variants merged across cases?
Historic database
1
Variant extraction
Variants are extracted from the original input files without any filtering, ensuring that all calls are preserved.
2
Variant grouping and merging
Variants are grouped by calling methodology and merged across cases based on matching information:
Noise database
1
Variant extraction
Variants are extracted from the original input files without any filtering, ensuring that all calls are preserved.
2
Variant grouping and merging
Variants are grouped by calling methodology and merged across cases based on matching information:
Important notes and limitations
Variant quality
No quality filters are applied; all variants are retained in the database regardless of confidence or quality.
Duplication events
CNV callers often report copy number (CN) without specifying zygosity. In Emedgene, the following assumptions are applied:
CN = 3 → Interpreted as heterozygous duplication
CN > 3 → Interpreted as homozygous duplication
When samples contain different CN values for the same region, they are merged into a single DUP entry. The counts for this entry are then calculated based on the inferred zygosity rather than the exact copy number.
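As an illustration of the duplication assumptions above, the sketch below maps copy number to an inferred zygosity and tallies genotype counts for one merged DUP entry. The function names are hypothetical; only the CN = 3 and CN > 3 rules come from the text.

```python
from collections import Counter
from typing import Dict, Iterable


def duplication_zygosity(copy_number: int) -> str:
    """Infer zygosity of a duplication from its copy number."""
    if copy_number == 3:
        return "HET"
    if copy_number > 3:
        return "HOM"
    raise ValueError("a duplication is expected to have a copy number of 3 or more")


def merged_dup_counts(copy_numbers: Iterable[int]) -> Dict[str, int]:
    """Genotype counts for a single merged DUP entry, based on inferred zygosity."""
    return dict(Counter(duplication_zygosity(cn) for cn in copy_numbers))


# Four samples with a duplication over the same region, at different copy numbers.
print(merged_dup_counts([3, 4, 5, 3]))
```

Samples with CN 4 and CN 5 both end up in the same homozygous count, so the exact copy number is not recoverable from the merged entry.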
Sex chromosomes
Chromosome X
Allele counts depend on the sample’s recorded sex:
Females: Diploid — contribute one allele to the allele count (AC) if heterozygous, or two alleles if homozygous
Males: Haploid — contribute one allele to AC
Mosaic chrX variants in males: Treated as heterozygous, contributing one allele to AC
Chromosome Y
Haploid; only samples recorded as male contribute to allele counting.
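The sex chromosome counting rules can be sketched as follows. The names are hypothetical and this is not the platform's merging code. The allele count (AC) rules follow the text; the allele number (AN) contributions (2 for females and 1 for males on chrX, 1 for males and 0 for females on chrY) are an assumption derived from the stated ploidy.

```python
from typing import Iterable, Tuple


def allele_contribution(chrom: str, sex: str, genotype: str) -> Tuple[int, int]:
    """Return (AC, AN) contributed by one sample. genotype: REF, HET, HOM or HEMI."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    carrier = genotype != "REF"
    if chrom == "X":
        if sex == "female":
            counts = {"REF": 0, "HET": 1, "HOM": 2}
            if genotype not in counts:
                raise ValueError("unexpected genotype for a female sample")
            return counts[genotype], 2
        # Males are haploid; a mosaic call recorded as HET still contributes one allele.
        return (1 if carrier else 0), 1
    if chrom == "Y":
        if sex == "female":
            return 0, 0  # only samples recorded as male are counted
        return (1 if carrier else 0), 1
    raise ValueError("this sketch covers chrX and chrY only")


def aggregate(chrom: str, samples: Iterable[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Sum contributions over (sex, genotype) pairs and return (AC, AN, AF)."""
    ac = an = 0
    for sex, genotype in samples:
        sample_ac, sample_an = allele_contribution(chrom, sex, genotype)
        ac += sample_ac
        an += sample_an
    return ac, an, (ac / an if an else 0.0)


cohort = [("female", "HET"), ("female", "HOM"), ("male", "HEMI"), ("male", "REF")]
print(aggregate("X", cohort))
```

Pseudo-autosomal regions are not treated differently here, in line with the note that no special handling exists for them.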
Pseudo‑autosomal regions (PAR)
No special handling is implemented yet.
Mitochondrial DNA variants
Heteroplasmy levels are not taken into account; mitochondrial DNA variants are treated as homoplasmic and therefore counted as haploid homozygous in the dataset.
No data regions
The issue
The merging algorithm does not account for sequencing coverage and incorrectly interprets positions with no data as homozygous reference calls. As a result, the dataset becomes artificially enriched in reference alleles, leading to an underestimation of allele frequency (AF) for any variant located in regions with absent coverage.
This applies to any region with absent coverage in any dataset (exomes only, genomes only, or mixed). However, it becomes particularly problematic in mixed exome–genome datasets, where exome samples systematically lack data outside the capture regions (see below).
Recommendation
Be cautious when evaluating variant rarity in regions that are not uniformly covered across samples in a dataset.
Special case: Non-coding variant allele frequency in mixed exome–genome datasets
When exome and genome samples are combined in a single dataset, AF values for non‑coding variants become systematically lower than their true population frequency. This occurs because exome samples have no coverage in non‑coding regions, and the merging algorithm incorrectly interprets these missing data points as homozygous reference. As a result, variants located outside the exome capture regions—such as common intronic or intergenic variants—are underrepresented in the aggregated dataset.
The downward AF bias becomes especially pronounced when exome samples greatly outnumber genome samples:
Exomes outnumber genomes by ~20×
Common intronic variants may appear at an artificially low frequency (<0.05), despite being frequent in the population.
Exomes outnumber genomes by ~100× or more
The bias becomes so severe that common non‑coding variants may appear at <0.01 AF, leading to potential misclassification as rare variants.
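A short calculation shows where these figures come from. It is a simplified model under stated assumptions: an autosomal variant outside the exome capture regions, genome samples carrying it at its true frequency, and every exome sample counted as homozygous reference. The function name is hypothetical.

```python
def observed_af(true_af: float, n_genomes: int, n_exomes: int) -> float:
    """Allele frequency seen in a mixed database for a variant only genomes can detect."""
    allele_count = 2 * n_genomes * true_af        # variant alleles, from genomes only
    allele_number = 2 * (n_genomes + n_exomes)    # exomes add reference alleles only
    return allele_count / allele_number


# A common intronic variant with a true allele frequency of 0.5.
print(round(observed_af(0.5, n_genomes=10, n_exomes=200), 4))    # exomes ~20x genomes
print(round(observed_af(0.5, n_genomes=10, n_exomes=1000), 4))   # exomes ~100x genomes
```

In this model the observed frequency is the true frequency scaled by the share of genome samples in the dataset, so even a variant present on half of all chromosomes falls below 0.05 at a 20-fold ratio and below 0.01 at a 100-fold ratio.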
Recommendation
To prevent misleading AF estimates for non‑coding regions, maintain separate databases for exome and genome samples.
Organization databases
Organization databases (DBs) are custom variant datasets that enhance interpretation by adding population-specific frequencies, detecting technical artifacts, and referencing curated variants.
Organization database types
By purpose
Historic DBs: Filter out variants common in the population of interest
Serves as a private population frequency database, helping users evaluate variant frequencies within their population.
An internal historic database is created from cases processed in your organization’s Emedgene account.
By origin
Internal DBs: Created from cases processed within your organization’s Emedgene account.
Note: Historic and noise databases only.
External DBs: Provided by the organization from other sources, such as:
Cases analyzed with different software
By included variant types
SNV DBs: Store single nucleotide variants (SNVs)
CNV DBs: Store copy number variants (CNVs), insertions >50bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH)
Creating a PON for Emedgene using BaseSpace Baseline Builder
Emedgene users can supply their own panel of normals (PON) to enable CNV calling on gene panels and WES samples. This guide details the process using BaseSpace.
The DRAGEN Baseline Builder application in BaseSpace can be used to build PONs that are compatible with Emedgene:
Users without BaseSpace can receive a free trial BaseSpace account along with compute (250 iCredits) and storage (1 TB) by registering here (https://basespace.illumina.com/). The compute is allocated for 30 days upon trial commencement and will be sufficient to generate a PON.
Alternatively, Illumina Connected Analytics (ICA), DRAGEN servers and DRAGEN in cloud are also capable of generating Emedgene-compatible PONs.
Requirements
Approximately 50 normal samples with a 50/50 male:female split, prepared with the same library prep protocol and ideally sequenced on the same sequencer.
The sample FASTQ files must be in one or more Projects in your BaseSpace account.
iCredits
Uploading a BED file:
The BED file used must match the one uploaded to Emedgene; any discrepancy may cause a failure during case processing.
Uploading a BED file to a project in BaseSpace can be done within the project:
Note that hg19/GRCh37 BED files are not directly compatible with hs37d5 (the reference used within Emedgene); their contigs must be renamed from "chr1", "chr2", "chr3", etc. to "1", "2", "3", etc.
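The contig renaming can be scripted. The sketch below is one possible approach, with a hypothetical function name; it removes the "chr" prefix from the first column and leaves everything else untouched. The optional rename_mito flag, which turns "chrM" into "MT", is an assumption that goes beyond the text; check the contig names of your reference before enabling it.

```python
def strip_chr_prefix(bed_text: str, rename_mito: bool = False) -> str:
    """Rename contigs in BED text from 'chr1' style to '1' style."""
    renamed = []
    for line in bed_text.splitlines():
        fields = line.split("\t")
        chrom = fields[0]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        if rename_mito and chrom == "M":
            chrom = "MT"  # assumption: mitochondrial contig naming in the target reference
        fields[0] = chrom
        renamed.append("\t".join(fields))
    trailing_newline = "\n" if bed_text.endswith("\n") else ""
    return "\n".join(renamed) + trailing_newline


print(strip_chr_prefix("chr1\t100\t200\nchrX\t5\t10\n"), end="")
```

For a file on disk, read its contents, pass them through the function, and write the result to a new file so the original BED file is preserved.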
Running the app
1) Select the application "DRAGEN Baseline Builder" with the latest version that matches your Emedgene secondary analysis pipeline (4.3 in example) and click launch:
2) Select output project and Baseline Mode "CNV":
3) Select input FASTQ Biosamples to use.
50 samples is a rough guideline as the degree of correlation between normals and case sample is more important than quantity.
4) Select correct reference genome, matching that used in Emedgene.
Choose the multigenome version of the genome build.
For GRCh38/hg38:
For GRCh37/hg19 choose the hs37d5 build (remember to rename contigs in BED file if using hs37d5):
5) Select the BED file you uploaded to your BaseSpace project in Step 2.1 above:
6) Configure Emedgene-specific settings:
Emedgene CNV calling has DRAGEN settings that must also be used during PON creation. In the app, this can be done in the advanced settings section at the bottom of the page:
Tick "Ignore Duplicate Reads in CNV Baseline Files" and change "Generated Combined Counts file for CNV" to "GC Corrected"
7) Tick the BaseSpace Labs App Acknowledgement box.
8) Launch the app.
Downloading Results
Once the analysis is complete, open the output files and look for the *.combined.counts.txt.gz file. It may be within a folder called "pon".
It can be downloaded manually from the analysis results by clicking the file and selecting "Download", retrieved via the BaseSpace CLI (https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview), or, if the output project is connected to Emedgene, loaded directly into the platform.
Encryption settings
Emedgene supports data encryption with customer-managed keys through Bring Your Own Key (BYOK). This gives organizations full control over their encryption and helps meet compliance requirements for data protection regulations such as HIPAA and GDPR.
Encryption is managed through a Key Management Service (KMS)—a secure system that creates and controls cryptographic keys. Currently, Azure Key Vault is supported, and AWS Key Management Service (KMS) will be available soon.
Starting in v100.39.0, users with appropriate permissions can configure encryption for their workgroup directly in the platform using a key from Azure Key Vault KMS.
Manage encryption using your own key
Use this card to set up data encryption and review its details.
Important notes before you start
Encryption can be configured by users with appropriate permissions.
Set up encryption
1
Click Add.
2
Select the KMS type (Azure Key Vault is the default).
3
Enter the required details:
Encryption is active immediately after configuration.
Once encryption is set up, you’ll see the status marked Enabled, plus the date added and the key URL (Azure Key Vault only).
Client secret expiration is monitored. If the secret expires in less than 30 days:
A warning appears in Organization settings.
Weekly reminders are sent to your organization's point of contact until updated.
Update the client secret for an existing Azure Key Vault configuration
You can update the client secret for an active encryption configuration that uses an Azure Key Vault key. The client ID, tenant ID, and key URL can't be updated.
1
Click the Edit icon on the right.
2
Enter the new client secret.
3
Click Test and Save.
Attaching a PON to a Coverage BED kit
Users must link a PON to an existing Coverage BED by specifying the kit ID. Once the PON is created, to trigger CNV calling, you must provide the corresponding Coverage BED during case creation.
A PON Management section is available in the Organization Settings page under Kit management.
This section includes a table displaying:
Kit Name
Kit ID
GC Corrected Status
Human Reference
DRAGEN Version
Maximum Interval Size
Additionally, there is an ‘Add PON’ button to initiate the PON addition process.
Prerequisites
Before adding a PON, ensure the following requirements are met:
The PON file must be a combined counts file.
The file must be stored in a supported cloud storage (AWS S3, ICA, or BaseSpace Storage). Direct uploads are not allowed.
Users must have the “Manage PON” role.
Compatibility
To use the PON for CNV calling, the sample pipeline version must be set to 37.0 or higher. Otherwise, CNV calling will not use the newly added PON.
Existing PONs
Previously existing PONs will not appear in the PON table. However, a notification will indicate the presence of existing PONs within the workgroup.
PON Migration
Previously existing PONs will continue to function, and CNV calling will remain unaffected. If migration to the new PON table is required, please contact [email protected] or your bioinformatics support team.
Adding a new PON
Click ‘Add PON’ to open a pop-up window.
Select values for the required fields:
Coverage kit: Lists all unique kits within the organization (excluding common kits). To add a PON for a common Kit BED, create a separate Kit with the same Kit BED (refer to Kit Management for details).
In the next window, select a combined counts file from a supported cloud storage service (AWS S3, ICA, or BaseSpace Storage). Ensure the file is available in storage before proceeding.
Only one file with the extension “.combined.counts.txt.gz” can be selected.
Click ‘Next’ to validate the file.
Automated Validations
Before adding a PON, the system performs the following checks:
1. Maximum Interval Size Validation
The system inspects the first 1000 rows of the combined counts file to ensure that no target exceeds the threshold interval size.
Using a combined counts file with the expected maximum interval size is strongly recommended.
2. GC Correction Validation
The system determines GC correction status by checking the cnv-enable-gcbias-correction field in the file headers:
0: Non-GC corrected
1: GC corrected
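The two checks can be approximated locally before adding a PON. This is a sketch under several assumptions that are not confirmed by the text: header lines start with "#", the cnv-enable-gcbias-correction field is followed by "=", ":" or whitespace and then 0 or 1, and data rows are tab-delimited with the contig, start and stop in the first three columns. The function names are hypothetical, and the expected maximum interval size must be supplied by the caller.

```python
import re
from typing import Iterable, List, Optional, Tuple

GC_FIELD = re.compile(r"cnv-enable-gcbias-correction[=:\s]+([01])\b")


def has_expected_extension(filename: str) -> bool:
    """Only files ending in .combined.counts.txt.gz can be selected."""
    return filename.endswith(".combined.counts.txt.gz")


def check_combined_counts(lines: Iterable[str], max_interval_bp: int,
                          rows_to_inspect: int = 1000
                          ) -> Tuple[Optional[bool], List[Tuple[str, int, int]]]:
    """Return (GC corrected status or None if not found, intervals above the limit)."""
    gc_corrected = None
    oversized = []
    inspected = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            match = GC_FIELD.search(line)
            if match:
                gc_corrected = match.group(1) == "1"
            continue
        if inspected >= rows_to_inspect:
            break
        fields = line.split("\t")
        try:
            start, stop = int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            continue  # e.g. a column-name row; not counted as a data row
        inspected += 1
        if stop - start > max_interval_bp:
            oversized.append((fields[0], start, stop))
    return gc_corrected, oversized


demo = [
    "#cnv-enable-gcbias-correction=1\n",
    "contig\tstart\tstop\tname\tsample1\n",
    "1\t100\t300\ttarget1\t10\n",
    "1\t400\t900\ttarget2\t12\n",
]
print(check_combined_counts(demo, max_interval_bp=250))
```

For a real file, the lines can be read with gzip.open(path, "rt") from the Python standard library and passed to check_combined_counts.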
Viewing Added PONs
Once a PON is successfully added, it appears in the PON table with details based on the selected inputs. Values for Maximum Interval Size and GC Corrected Status are inferred from validation results.
Deleting PON from table
A user with the “Manage PON” role can delete a PON listed in the table. This is a soft deletion: the link between the combined counts file and the Kit BED is removed, and the combined counts file itself is not affected.
Registering a database
After a database VCF file is created by Illumina Support or provided by the customer, it is stored in a dedicated customer bucket. To activate the database, users with appropriate permissions need to register the database.
1. In the Organization DB Management card, click Add in the upper right corner.
2. Enter the database details:
DB name: Enter database name
To prevent processing issues and maintain consistent naming across the platform, do not use underscores ("_") in a database name.
DB type: Select database type (Noise, Historic)
Variant type: Select variant type (SNV, CNV)
Human reference: Select genome reference (GRCh37, GRCh38)
3. Click Next.
4. Select the database file from your bucket using the file browser.
5. Click Save.
Registering an organization database automatically logs the details in the organization's audit record, ensuring transparency and traceability.
Preset groups
Here you can review and manage Preset groups.
Preset groups can be created by combining different Presets. Alternatively, you can upload a JSON file that defines the Presets in a Preset group. The file name and schema will be validated upon upload.
The section has two tabs:
V2 (new)
Here you can create (from Presets or from a JSON file), view, edit, hide/unhide, and download your organization's Preset groups.
Presets
Here you can review and manage filter Presets for your organization. The presets table includes:
Preset name
Type
Last update (v100.39.0+): Displays the date of the most recent change
The Coverage kit and its human reference BED must already exist. The BED file must match the one used to generate the PON.
Only one PON per unique combination of Coverage kit, human reference BED, and DRAGEN main version is allowed.
PONs cannot be added for DRAGEN sub-versions.
DRAGEN Version: Only versions 3.6 and later are supported.
Human Reference: Choose GRCh37 or GRCh38.
Based on the selected DRAGEN version, the system will display the expected maximum interval size:
DRAGEN 4.2 and below: 250bp
DRAGEN 4.3 and above: 500bp (default for DRAGEN 4.3)
Click Next to go to the file selection window:
If this field is missing, the system analyzes target value types:
Integer values: Non-GC corrected
Float values: GC corrected
Users must ensure the combined counts file has the intended GC correction status.
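The decision logic described for GC correction status can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function names are hypothetical, the header value is assumed to have already been extracted as a string (or None when the cnv-enable-gcbias-correction field is absent), and a target value is treated as a float whenever its text does not parse as an integer.

```python
from typing import Iterable, Optional


def _is_integer(text: str) -> bool:
    """True if the text parses as an integer (for example '12', but not '12.0')."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def infer_gc_corrected(header_flag: Optional[str], target_values: Iterable[str]) -> bool:
    """Infer GC correction status from the header flag, falling back to value types."""
    if header_flag is not None:
        # 0: Non-GC corrected, 1: GC corrected
        return header_flag.strip() == "1"
    # Field missing: integer values mean non-GC corrected, float values mean GC corrected.
    return any(not _is_integer(value) for value in target_values)
```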
Active: Toggle to apply annotations from the database to variants in new and reanalyzed cases.
AI Shortlist: Toggle to make AI Shortlist factor in allele frequencies from the database. When enabled, variants with an allele frequency above 10% are excluded from the list.
Fields: Auto-populated
Compatible BED file
only once per workgroup.
Once encryption is set up, you can update the client secret, but you cannot disable encryption or change the KMS type.
Client ID
Tenant ID
Client secret
Key URL
4. Click Test and Save to validate the credentials.
Emedgene checks KMS accessibility with the given credentials and ensures that it has encrypt, decrypt, wrapKey, and unwrapKey permissions for cryptographic operations.
5. Once validated, click Confirm to apply the update.
to validate the credentials.
4
Once validated, click Confirm to apply the update.
Single nucleotide variants (SNVs)
Merged across cases when they share the same chromosome, position, and reference and alternate alleles
Copy number variants (CNVs), insertions >50bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH)
Merged across cases when they share the same chromosome, start position, end position, and reference and alternate alleles
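To make the merging criteria concrete, the sketch below groups variants by the fields listed above: chromosome, position, and alleles for SNVs, with the end position added for the other variant types. The Variant type, its field names, and the grouping function are hypothetical and serve only to illustrate the keys; they do not reflect the platform's internal data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int                   # position (start position for non-SNV types)
    ref: str
    alt: str
    end: Optional[int] = None  # end position, used for CNV, INS, STR, and ROH
    kind: str = "SNV"


def merge_key(variant: Variant) -> Tuple:
    """Fields that must match for two variants to be merged across cases."""
    if variant.kind == "SNV":
        return (variant.chrom, variant.pos, variant.ref, variant.alt)
    return (variant.chrom, variant.pos, variant.end, variant.ref, variant.alt)


def group_across_cases(variants: Iterable[Variant]) -> Dict[Tuple, List[Variant]]:
    """Collect variants that share the same merge key."""
    groups: Dict[Tuple, List[Variant]] = defaultdict(list)
    for variant in variants:
        groups[merge_key(variant)].append(variant)
    return dict(groups)
```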
3. CNV and INS variants: Variant clustering
The algorithm selects the most frequent variant at each locus and uses it as a pivot (reference) variant.
The pivot is merged with other variants of the same type that reciprocally overlap it by at least 70%.
Let P be the pivot (length = p) and V be another variant (length = v), with an overlapping region of length o. They are merged if both conditions are satisfied: o/p ≥ 0.7 and o/v ≥ 0.7.
In other words, at least 70% of pivot overlaps variant, and at least 70% of variant overlaps pivot.
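The reciprocal overlap rule described above can be sketched in a few lines. This is an illustrative example, not the platform's code: the function name is hypothetical, variants are represented as (start, end) tuples on the same chromosome, and lengths are assumed to be end minus start and greater than zero.

```python
from typing import Tuple


def reciprocal_overlap_ok(
    pivot: Tuple[int, int],
    variant: Tuple[int, int],
    min_fraction: float = 0.70,
) -> bool:
    """True if the overlap covers at least `min_fraction` of both the pivot and the variant."""
    p_start, p_end = pivot
    v_start, v_end = variant
    overlap = min(p_end, v_end) - max(p_start, v_start)
    if overlap <= 0:
        return False  # the two intervals do not overlap at all
    pivot_length = p_end - p_start
    variant_length = v_end - v_start
    return (
        overlap / pivot_length >= min_fraction
        and overlap / variant_length >= min_fraction
    )
```

For instance, a pivot at 100-200 and a variant at 110-210 overlap by 90 bp, which is 90% of each, so they would be merged. A pivot at 100-200 and a variant at 100-400 would not, because the overlap covers only a third of the variant.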
4. CNV and INS variants: Recalculation of allele counts
After clustering, the allele count (AC) and the genotype counts are recomputed based on the new consensus variant.
for exome and genome samples.
If this is not feasible, assess non‑coding variant rarity in mixed datasets with caution.
Typically includes all unique cases at the time of creation but can be tailored to include only specific cases.
Common variants are less likely to be pathogenic and can be filtered out.
Noise database
Serves as a quality control database to identify recurring artifacts introduced by the sequencing technique, sequencing platform, and analysis pipeline.
Typically, a noise database is composed of samples from unaffected individuals, such as healthy parents.
If only patient data is available, the database remains useful for filtering out high-frequency artifacts. However, caution is necessary when filtering rare variants to avoid excluding true pathogenic ones.
Sample size recommendations:
≥ 100 samples to filter out variants with > 5% allele frequency in the database
≥ 500 samples to filter out variants with > 1% allele frequency in the database
Multiple noise database instances can be maintained to account for different assays and calling methodologies.
Common variants can be treated as likely artifacts.
Curated database
Serves as a reference of previously curated variants.
A static curated variant database implemented upon request.
Note: This is not the same as variants found in the dynamic database.
from your curated databases aids in pinpointing significant variants, consistency, and faster interpretation.
Here you can view and download legacy Preset groups.
Migrating V1 Preset groups to the improved V2 methodology
Legacy Preset groups can be migrated to the new methodology via two simple steps: downloading the Preset group JSON file in the V1 (legacy) tab, then uploading it on the V2 (new) tab.
Create Preset group from existing Presets:
Click on Add new;
From the dropdown, select New;
Enter a name for the Preset group. Note: The Preset group can't be renamed later!
Select Presets to include in the Preset group. Click Add after selecting each Preset;
Drag and drop Presets to change the order;
Click on Save.
Create Preset group from JSON file:
This is the second step in the migration of the V1 (legacy) Preset group, after you have downloaded the Preset group JSON file.
Click on Add new;
From the dropdown, select From file;
From the file browser, select a JSON file that defines a Preset group;
The system will thoroughly validate the file;
If validation is successful, a Preset group will be created. Any underlying Presets that are missing from your organization will be added as well.
Review contents of the Preset group (V2 and V1)
Click the downward arrow icon to the left of the Preset group's name.
Edit Preset group
Click on the ✏️Edit icon;
2a. Add Presets to the group as needed. Select Presets to include in the Preset group from the dropdown. Click Add after selecting each Preset;
2b. Remove Presets from the group as needed. Click on the Remove icon to the right of the Preset name;
2c. Drag and drop Presets to change the order;
Click Save.
Hide/unhide Preset group
When you hide a Preset group, it will no longer appear in the Preset groups list offered at case creation (Select preset group step).
Click on the 👁️Hide/Unhide icon.
Note: A default Preset group cannot be hidden.
Download Preset group file (V2 and V1)
This is the first step in the migration of the V1 (legacy) Preset group to V2 methodology. A Preset group file contains preset names and the filters used to define each Preset in JSON format.
Click on the ⬇️Download icon.
Revert Preset group (V1)
If a V1 Preset group that has undergone migration is set to revert, the corresponding V2 Preset group will be deleted.
Click on the ↩️Revert icon to undo the migration.
Note: If a migrated Preset group has been assigned as default, it cannot be reverted.
The table is sortable by any column (available from version 100.39.0 onwards).
Click the downward arrow icon to the left of the Preset's name.
Duplicate Preset (v100.39.0+)
Locate the preset you would like to duplicate and click on the new Duplicate icon;
When duplicating a preset, a new preset name must be entered. The naming and creation process follows the same limitations as creating a new preset from active filters.
Note: the duplicated preset will inherit the original preset's configuration.
Note: this function is available to users with the existing role permissions.
Edit Preset
Click on the ✏️Edit icon;
Make the necessary changes. Editing the Preset requires basic understanding of JSON data format;
Click Save. The software will perform schema validation on the edited Preset.
Note: you can modify the Preset's content but not its name.
Note: the Preset can't be edited if it's locked.
Lock/unlock Preset
Locking the Preset prevents any user from changing it.
Click on the Lock/Unlock icon;
Click on Lock/Unlock in the popup window.
Delete Preset
Click on the 🗑️Delete icon;
Confirm your decision by clicking Delete in the popup window.
vcftools: Useful for population frequency calculations.
How to create a noise or historic database file
1. Calculate population statistics
General: Allele Number (AN): Calculate the total number of alleles in your population by multiplying the number of individuals (N) by 2:
AN=2×N.
For each variant within the dataset:
Example historic DB VCF header and variant line
How to create a curated database file
1. Create a VCF file with your variants
In the INFO field, include the significance sub-field and assign its value to each variant based on Table 1.
Only one significance value is allowed per variant. If a variant has multiple interpretations, list the variant in separate rows, each with a different significance value.
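The one-value-per-row rule can be illustrated with a small helper that emits a separate VCF row for each interpretation of the same variant. This is a sketch under stated assumptions: the function name is hypothetical, the ID, QUAL, and FILTER columns are left as ".", and only the significance sub-field is written to INFO. Accepted values are the integers 0 through 5 listed in Table 1.

```python
from typing import Iterable, List


def curated_rows(chrom: str, pos: int, ref: str, alt: str,
                 significances: Iterable[int]) -> List[str]:
    """Build one tab-separated VCF row per significance value for a single variant."""
    rows = []
    for value in significances:
        if value not in range(0, 6):
            raise ValueError(f"significance must be between 0 and 5, got {value}")
        columns = [chrom, str(pos), ".", ref, alt, ".", ".", f"significance={value}"]
        rows.append("\t".join(columns))
    return rows
```

A variant interpreted as both VUS (3) and Likely Pathogenic (4) would therefore appear as two rows that differ only in the significance value.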
Example curated DB VCF variant lines
Small variant
Copy number variant
Next steps
1. Reach out to Illumina support
Provide the VCF and TBI files to Illumina support for upload to your organization's dedicated storage bucket, along with the following information:
Database name. Underscores ("_") in a database name are not allowed.
tabix
Allele Count (AC): Determine the number of times the alternate allele of a variant appears across all individuals. This is inferred from genotype counts (n). For a biallelic variant where allele A is the reference and allele B is the alternate, the allele count for the alternate allele is calculated as follows:
AC(B)=2×n(BB)+n(AB)
Allele Frequency (AF): Calculate as the ratio of Allele Count to Allele Number:
AF(B)=AC(B)/AN.
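The three statistics can be computed together from genotype counts, as in the following sketch. The function and parameter names are illustrative, and the calculation assumes diploid genotypes at a biallelic site, in line with the formulas above.

```python
from typing import Tuple


def allele_stats(n_individuals: int, n_het: int, n_hom_alt: int) -> Tuple[int, int, float]:
    """Return (AN, AC, AF) for the alternate allele of a biallelic variant."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    an = 2 * n_individuals       # AN: two alleles per individual
    ac = 2 * n_hom_alt + n_het   # AC(B) = 2 x n(BB) + n(AB)
    af = ac / an                 # AF(B) = AC(B) / AN
    return an, ac, af
```

For example, in a population of 6059 individuals with two heterozygous carriers and no homozygous carriers, AN is 12118, AC is 2, and AF is approximately 0.00017.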
2. Create a VCF file with your variants
In the INFO field, include the AF sub-field.
Optionally, include AC, AN, and TEN (a list of up to 10 samples carrying the variant). You may add other fields, provided the field names do not contain underscores or hyphens.
Specify the exact format of each INFO sub-field in ##INFO meta-information lines.
See format example below.
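Before submitting a file, it can help to confirm that no INFO field name contains an underscore or hyphen. The sketch below extracts the ID from each ##INFO meta-information line and returns the names that break this rule. The function name and the regular expression are illustrative assumptions, not a platform utility.

```python
import re
from typing import Iterable, List

# Captures the ID value from lines such as ##INFO=<ID=AF,Number=A,...>
INFO_ID_PATTERN = re.compile(r"##INFO=<ID=([^,>]+)")


def invalid_info_ids(header_lines: Iterable[str]) -> List[str]:
    """Return INFO field names that contain an underscore or a hyphen."""
    invalid = []
    for line in header_lines:
        match = INFO_ID_PATTERN.match(line)
        if match and ("_" in match.group(1) or "-" in match.group(1)):
            invalid.append(match.group(1))
    return invalid
```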
3. Sort variants in the VCF based on chromosome and position with awk
4. Compress the VCF with bgzip
5. Create a TBI index file with tabix
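The guide performs the sorting step with awk; as an alternative illustration, the Python sketch below keeps header lines in place and sorts variant records by chromosome and then position. The chromosome ordering used here (numeric chromosomes first, then X, Y, and M) and the function names are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def chrom_sort_key(chrom: str) -> Tuple[int, int, str]:
    """Order numeric chromosomes first, then X, Y, M/MT, then any other contig."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (special.get(name.upper(), 4), 0, name)


def record_key(line: str) -> Tuple[Tuple[int, int, str], int]:
    """Sort key for a tab-separated VCF record: (chromosome order, position)."""
    fields = line.split("\t")
    return (chrom_sort_key(fields[0]), int(fields[1]))


def sort_vcf_lines(lines: Iterable[str]) -> List[str]:
    """Return header lines unchanged, followed by records sorted by chromosome and position."""
    lines = list(lines)
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    return header + sorted(records, key=record_key)
```

After sorting, the file can be compressed with `bgzip my_db.vcf` and indexed with `tabix -p vcf my_db.vcf.gz`, where my_db.vcf is a placeholder file name.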
Table 1. Mapping of significance values to pathogenicity classes.
Significance value in the VCF | Pathogenicity class in the UI
0 | Unknown
1 | Benign
2 | Likely Benign
3 | VUS
4 | Likely Pathogenic
5 | Pathogenic
Optionally, include comment, category, or other fields to capture text or numerical values that are relevant to classification. You may add other fields, provided the field names do not contain underscores or hyphens.
Specify the exact format of each INFO sub-field in ##INFO meta-information lines.
See format example below.
2. Sort variants in the VCF based on chromosome and position with awk
3. Compress the VCF with bgzip
4. Create a TBI index file with tabix
Database type (Noise, Historic, Curated)
Variant type (SNV, CNV)
Genome reference (GRCh37, GRCh38)
2. Register the database
Once the database is uploaded, a user with appropriate permissions can register it by selecting it from their bucket in Settings.
##fileformat=VCFv4.2
##fileDate=20250906
##reference=ftp://ftp.ensembl.org/pub/hg19
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
##contig=<ID=chr7,length=159138663,assembly=hg19>
##contig=<ID=chr8,length=146364022,assembly=hg19>
##contig=<ID=chr9,length=141213431,assembly=hg19>
##contig=<ID=chr10,length=135534747,assembly=hg19>
##contig=<ID=chr11,length=135006516,assembly=hg19>
##contig=<ID=chr12,length=133851895,assembly=hg19>
##contig=<ID=chr13,length=115169878,assembly=hg19>
##contig=<ID=chr14,length=107349540,assembly=hg19>
##contig=<ID=chr15,length=102531392,assembly=hg19>
##contig=<ID=chr16,length=90354753,assembly=hg19>
##contig=<ID=chr17,length=81195000,assembly=hg19>
##contig=<ID=chr18,length=78077000,assembly=hg19>
##contig=<ID=chr19,length=59128000,assembly=hg19>
##contig=<ID=chr20,length=63025000,assembly=hg19>
##contig=<ID=chr21,length=48129000,assembly=hg19>
##contig=<ID=chr22,length=51304000,assembly=hg19>
##contig=<ID=chrX,length=155270000,assembly=hg19>
##contig=<ID=chrY,length=59373000,assembly=hg19>
##contig=<ID=chrM,length=16500,assembly=hg19>
##INFO=<ID=COUNT,Number=1,Type=Integer,Description="Number of occurrences of the variant in the dataset">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency of the variant in the dataset">
##INFO=<ID=AN,Number=A,Type=Float,Description="Allele Number in the dataset">
##INFO=<ID=AC,Number=A,Type=Float,Description="Allele Count of the variant in the dataset">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the structural variant">
##INFO=<ID=TEN,Number=.,Type=String,Description="Ten samples containing the variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 849466 . N <DEL> . . COUNT=1;AF=0.00017;AC=2;AN=12118;END=1073402;SVTYPE=DEL;SVLEN=223936;TEN=SAMPLE1
#CHROM POS ID REF ALT QUAL FILTER INFO
16 89531962 . G T . PASS significance=3;category=interpretations=1.COMMENT='in silico: 3/3 damaging (PP3)'.OBSERVATIONS=1.UPDATE='2020-12-12'
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 145413388 . N <DUP> . . significance=1;SVLEN=333881;END=145747269;SVTYPE=DUP;category=type=duplicate.COMMENT=proximal duplicate BP2-BP3 1q21.1. 1q21 microdeletion syndrome.REMARK=Only AR OMIM genes.