V36.1
October 16, 2024
Support for DRAGEN 4.3 for customers starting from VCF and FASTQ, including expanded multi-genome mapping, improved high-sensitivity and mosaic calling algorithms, support for the new MRJD caller, and higher targeted caller coverage.
Significant run time decrease for a genome trio from 12 hours to 5 hours.
New focused mode AI shortlist increases precision and recall, ranking solving variants higher and presenting a shorter ‘most likely’ list.
New self-serve organization settings for more granular control over organization set-up.
Emedgene customers can select their preferred version out of any of the past 5 releases. Customers on v31.0 and below should select an upgrade path at this time.
Patches | Date |
---|---|
DRAGEN 4.3 offers significant improvements in accuracy, comprehensiveness and efficiency, which are documented in detail in the DRAGEN release notes.
These release notes focus on the updates to the Emedgene workbench and pipeline that provide a clear path to interpreting DRAGEN 4.3 outputs.
Emedgene enables customers to run DRAGEN in Emedgene, or in Bring Your Own DRAGEN (BYOD) workflows on DRAGEN server, BSSH or ICA pipelines.
The following table summarizes the supported DRAGEN 4.3 callers:
The new mosaic calling ML model results in 4x fewer false positives (FPs) than DRAGEN 4.2 high sensitivity mode, and is both more accurate and faster than other mosaic callers.
In Emedgene, mosaic variants are now displayed with a ‘Potential Mosaic’ tag, and users can create preset filters with this tag.
When running DRAGEN through Emedgene, mosaic detection is activated by default with an AF filter threshold set to 0.2.
High sensitivity mode is turned on by default in DRAGEN 4.3. In Emedgene, these variants are displayed with a ‘Homology Region’ tag, and users can create preset filters with this tag.
Segmental duplication regions represent 5% of the genome and have poor mappability. The MRJD (Multi Region Joint Detection) caller performs haplotype-based de novo small variant calling on reads that potentially map to paralogous regions, enabling germline small variant calling in those regions. Learn more about the MRJD caller.
The MRJD caller provides coverage for the following genes:
Variants will be displayed with an ‘Ambiguous calling’ tag, and users will be able to filter for these variants using the Calling Methodology filter.
Since some of the variants output by the MRJD caller are ambiguously placed, and connected by a unique identifier called ‘JIDS’, Emedgene designed a new Connected Variants component to display the connected variants in one view.
When running DRAGEN through Emedgene, MRJD high sensitivity mode will be enabled by default.
A new “Connected Variants” section has been added to the Variant Page providing geneticists with crucial insights into variant relationships within the same case. This feature aids in the accurate assessment of variant pathogenicity by displaying related variants and their connections.
The Connected Variants section displays 3 types of variant connections:
MNVs: multi-nucleotide variants occurring within 2 nucleotides of each other.
Variants ambiguously called by DRAGEN, connected via the JIDS (Joined IDs) identifier.
Compound Heterozygote variants (Only available for trio cases where both parental samples are provided).
Connected Variants will appear both in the summary tab and in a new component located between the Population Statistics and Related Cases vertical tabs. It is available for SNV/indel, SV, CNV, MtDNA, and STR variant types. Only the first 50 connected variants will be displayed.
The Connected Variants tab is available in both simple and advanced mode. Simple mode displays all connected variants, and advanced mode allows you to filter on any of the 3 available connected variant types.
New Emedgene coverage for GBA, HBA and CYP21A2 was added with DRAGEN 4.3. Similar to the MRJD caller output, some targeted caller outputs are also ambiguously placed and connected via a JIDS identifier. The Connected Variants component supports the display of these variants. Variants will be displayed with an ‘Ambiguous calling’ tag, and users will be able to filter for these variants using the Calling Methodology filter.
In DRAGEN 4.3, Emedgene overall quality calculations are aligned with DRAGEN recommendations. See details in the table below.
In addition, new metrics are displayed in the variant page.
For CNVs called from exomes/panels (but not genomes), DRAGEN 4.3 has added a likelihood ratio score. The likelihood ratio is a Log10 likelihood ratio of ALT to REF.
For STRs, the ref and alt confidence interval is now displayed, and additional metrics will be added in V37.
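Because the CNV likelihood ratio score described above is on a log10 scale, each unit corresponds to a tenfold change in the ALT-to-REF likelihood ratio. A minimal Python sketch of that conversion follows; the function name is illustrative and is not part of DRAGEN or Emedgene.

```python
def log10_lr_to_ratio(log10_lr: float) -> float:
    """Convert a log10 likelihood ratio (ALT vs REF) into a plain ratio."""
    return 10.0 ** log10_lr


# A score of 2 means ALT is 100 times more likely than REF,
# 0 means both are equally likely, and negative scores favor REF.
for score in (2.0, 0.0, -1.0):
    print(score, log10_lr_to_ratio(score))
```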
It is highly recommended that users review the DRAGEN default quality criteria for each variant caller, as the quality displayed in Emedgene now aligns with DRAGEN.
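As an illustration of how the quality criteria read in practice, the sketch below applies the SNV and small indel rule from the quality calculations table in these notes (Low if FILTER is not PASS or QUAL is below 10, High if FILTER is PASS and QUAL is above 30, otherwise Moderate). It is a reading aid under those stated thresholds, not Emedgene's implementation, and the function name is hypothetical.

```python
def snv_quality(vcf_filter: str, qual: float) -> str:
    """Classify an SNV/small indel from its VCF FILTER and QUAL values."""
    # Low: FILTER is not "PASS" OR QUAL is less than 10.
    if vcf_filter != "PASS" or qual < 10:
        return "Low"
    # High: FILTER is "PASS" AND QUAL is greater than 30.
    if qual > 30:
        return "High"
    # Everything else (PASS with QUAL from 10 to 30 inclusive).
    return "Moderate"


print(snv_quality("PASS", 45.0))     # High
print(snv_quality("PASS", 30.0))     # Moderate
print(snv_quality("LowDepth", 99.0)) # Low
```

Note that the boundaries are strict: a PASS variant with QUAL exactly 10 or exactly 30 falls into Moderate under the rule as written.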
Limitations
Preset filters for mosaic and high sensitivity variants can only be created with the assistance of Illumina bioinformatics or technical support teams.
The Connected Variants Compound-Het variants will only be shown for trio cases with both parental samples.
Known Issues
When providing joint-called DRAGEN 4.3 SV files, the proband VCF FILTER field is copied to the parents.
For customers on DRAGEN 4.3 and on GRCh37, there is no DRAGEN ref validation on CYP21A2 variants from the targeted.vcf, which may cause the case to fail.
Some Connected Variants links search by Chromosome:Position and may display a different variant than the one in the original table.
This version greatly improved the case pipeline speed for a genome trio, and also reduced the high variability seen in genome trio processing times. Note that this pipeline is executed after DRAGEN secondary analysis.
The Emedgene focused AI model for shortlisting variants likely to solve the case has been retrained to increase both precision and recall.
As a reminder, the AI model is available in two modes:
Focused mode, which is trained to find a likely solving variant in known genes and will split the shortlist into known and unknown genes. This version of the focused mode will suggest very few genes of unknown significance.
Discovery mode, which is trained to suggest a shortlist of likely solving variants in both known genes and genes of unknown significance.
AI mode is defined in the organization settings.
In this version, only the focused mode AI was updated.
The model was evaluated with a proprietary Illumina dataset of 1375 cases.
V36 focused mode: 97% of cases were solved by the top 3 variants
V35 and under: 97% of cases were solved by the top 9 variants
45% of cases in the validation set had a shortlist of only 3 variants.
An additional 15% had a shortlist of 4 variants.
The median length of the shortlist is 4 variants.
These updates provide customers with the ability to view and edit information related to their organization directly, thus improving accessibility and reducing dependence on Illumina support. These new settings are available in Settings -> Organization Settings and require new roles per card. Any changes on this page are tracked with activities.
The Information Card displays the cloud region, organization name, ID and URL, and environment version. There is no edit option in this card.
The POC (Person of Contact) card displays the email of the primary point of contact for this Emedgene organization, and can be edited.
The Pipeline card enables users to select and manage pipelines for samples and cases, including the selection of Human Reference and DRAGEN version settings.
A new toggle "Include Reference Homozygous & No Coverage calls" has been added. Enable this setting to include reference homozygous genotype and no coverage calls in cases.
This card allows users to select a preferred AI shortlist version:
Focused mode, which is trained to find a likely solving variant in known genes and will split the shortlist into known and unknown genes. This version of the focused mode will suggest very few genes of unknown significance.
Discovery mode, which is trained to suggest a shortlist of likely solving variants in both known genes and genes of unknown significance.
Login | Emedgene does not support accents in User Names, even though the IAM console supports them. Users with accented User Names will not be able to log in to the software.
Add New Case | Input file path cannot contain spaces or parentheses.
Add New Case | API | Applying multiple panels has a limit of 10,000 genes in total, but no error message is shown when the limit is exceeded.
Add New Case | No validation that:
Sample IDs are unique
Input files are uncorrupted
Add New Case | BSSH integration does not discard QC failed samples.
Add New Case | For customers starting from Joint gVCF, please make sure the proband is first in order to see the correct insufficient region calculation.
Add New Case | Selecting a disease should automatically suggest phenotypes, however, some diseases available for selection are from sources without phenotypes, and in that case, no phenotypes will be suggested.
Add New Case | API/Batch/UI discrepancies:
Cannot add phenotypes for unaffected parent in batch upload.
Cannot use the same gVCF file for multiple samples from the UI.
Cannot upload JSON files in Batch upload.
Pipeline | The DRAGEN pedigree pipeline for customers starting from FASTQ is only available for customers who have their FASTQ files stored on ICA and who run only WGS in their organization.
Pipeline | Transcript selection sometimes prioritizes fusion genes over independent genes with the same severity.
Edit case/Reanalysis | Prohibited changes are not blocked from the user interface. Please refer to documentation on allowed changes. All other workflows require a new case.
Edit Case | Reanalysis | When applying a new gene list, the original gene list is not displayed in activity.
Cases Page | When the sample pipeline (DRAGEN) fails, there is no clear error message.
Lab Tab | Insufficient coverage export will not work via UI or API if an included gene does not have a start or end position in NCBI.
Lab Tab | When sample gender is unknown, results of sex validation can be confusing as default gender in Emedgene is Female, so predicted Male will show up as failed.
Analysis Tools | Search | When searching for a position range, the search does not consider END for CNVs.
Analysis Tools | Multi-Select | Does not have an aggregated activity report.
Analysis Tools | Manually Added Variants | Cannot be sorted on non-applicable column types e.g. AI rank.
Analysis Tools | Manually Added Variants | STRs | Format is not aligned with format of STRs on the software, e.g. missing variant length.
Candidates, Variant Page | After editing the evidence graph, phenotypic match strength indications are missing from the sidecar and variant page.
Variant Page | Evidence | ACMG automation will not be calculated for CNVs over 20MB.
Curate | CNVs over 10MB are not supported. Fix planned in Q4 2024.
Curate | When exporting a variant from Analyze to Curate, users may sometimes see an erroneous error message ‘None is not allowed for pathogenicity’. The actual error is an unsupported variant type.
Curate | Transcript versions will not be identical in variants created in Curate vs variants created in Analyze. Future enhancement will align transcript selection logic.
Curate | Discrepancy between Analyze and Curate HGVS parser may be experienced.
Network | GRCh37<-->GRCh38 liftover is not available for older components of the Network infrastructure. As a result, [Variant Page | Clinical Significance | Networks Classified] may remain erroneously empty while [Variant page | Related cases section] shows relevant information. The same gap applies to manually classified variants.
Export | Export to excel is limited to 32KB per cell, which may prevent exports with very large CNVs.
Reporting | PMIDs will only work if there is an author on the link. Books are not supported.
Webhooks | Cannot be triggered on internal software statuses such as ‘In Progress’ or ‘Reanalysis’.
Organization Settings | BED upload | There is no validation at all for API uploads. Validation on the UI component does not check the following:
All lines in the BED must contain the same number of columns.
No duplicate lines.
No trailing whitespaces.
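Since several of the limitations above are not validated by the software (input path characters, sample ID uniqueness, BED formatting), a client-side pre-check before upload can catch them early. The following is a minimal Python sketch under stated assumptions: all function names are hypothetical, BED columns are assumed to be tab-separated, and header lines starting with `#`, `track` or `browser` are assumed to be exempt. It is not part of the Emedgene API.

```python
import re
from collections import Counter
from typing import Iterable, List

# Characters the notes say an input file path cannot contain.
_FORBIDDEN_PATH_CHARS = re.compile(r"[ ()]")


def check_input_path(path: str) -> List[str]:
    """Flag paths containing spaces or parentheses."""
    if _FORBIDDEN_PATH_CHARS.search(path):
        return [f"path contains a space or parenthesis: {path!r}"]
    return []


def check_unique_sample_ids(sample_ids: Iterable[str]) -> List[str]:
    """Flag sample IDs that appear more than once."""
    counts = Counter(sample_ids)
    return [f"duplicate sample ID: {sid}" for sid, n in counts.items() if n > 1]


def check_bed_lines(lines: Iterable[str]) -> List[str]:
    """Check column count consistency, duplicate lines, trailing whitespace."""
    problems: List[str] = []
    seen = set()
    expected_columns = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and BED header lines are skipped.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        stripped = line.rstrip()
        if line != stripped:
            problems.append(f"line {number}: trailing whitespace")
        columns = len(stripped.split("\t"))
        if expected_columns is None:
            expected_columns = columns
        elif columns != expected_columns:
            problems.append(
                f"line {number}: {columns} columns, expected {expected_columns}"
            )
        if stripped in seen:
            problems.append(f"line {number}: duplicate line")
        seen.add(stripped)
    return problems


bed = ["chr1\t100\t200\n", "chr1\t100\t200\n", "chr2\t5\t10\tname\n"]
for problem in check_bed_lines(bed):
    print(problem)
```

Each function returns a list of problem descriptions, so an empty list means the input passed that check.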
Add New Case | Blocked the ability to run a case from ICA where upload status is incomplete.
Add New Case | Group FASTQ pairs by lane in Emedgene DRAGEN.
Edit Case | Multiple fixes
Improved robustness of reanalysis for old cases.
Fixed issue where reanalysis of carrier analysis cases unselected this analysis type.
Fixed issue where changing case status from finalized also automatically changed it from resolved to unresolved.
Variant Page | Updated default annotation to gnomAD 4.1-non UKB data.
Variant Page | Removed rounding of max AF% in order to accommodate very large gnomAD 4.1 data set.
Report/Export | Fixed an issue with populating ACMG tags to the report.
Management | Roles | Improved the addition of new user roles for new versions, so they are automatically available to assign.
Pipeline robustness | Improved error messages for very large cases.
Add New Case | API | When sending a due date, please use UTC time. The customer time zone is taken into account only through the UI, not with the API.
Add New Case | Storage | There is an issue loading BSSH VCFs from the UI, which will be hotfixed with v35.4/v36.1. Please use batch upload to load files from BSSH.
Edit Case | Reanalysis | If HPO terms were updated between analyses, the reanalysis will not automatically map previous HPO terms to new ones.
DRAGEN 4.3 | When providing joint-called DRAGEN 4.3 SV files, the proband VCF FILTER field is copied to the parents.
DRAGEN 4.3 | For customers on DRAGEN 4.3 and on GRCh37, there is no DRAGEN ref validation on CYP21A2 variants from the targeted.vcf, which may cause the case to fail.
Cases Page | Illumina Clouds | Users that have been removed from workgroups in IAM can still be added as participants to a case. They will not have access to the software, and there is no security/access risk.
Cases Page | Reupload fails for JSON files.
Candidates Page | Compound het SNV-CNV variants will not display the automated CNV classification. Workaround – view variants from analysis table.
Lab Tab | When a gene is removed from the knowledge graph, no coverage will be shown until the gene is removed from the gene list.
Analysis Tools | Filters | Not all variant effects are available in advanced mode, full documentation here.
Analysis Tools | Intermittent issues may occur when using variant type filters in combination with variant effects. These are resolved with a refresh.
Analysis Tools | ‘Last’ button on pagination does not work.
Variant Page | Summary | For mtDNA variants, the displayed max AF from gnomAD does not use all subpopulations, resulting in inaccuracy.
Variant Page | Summary | When ACMG tags are updated by users, the summary component is erroneously not updated.
Variant Page | Clinical Significance | DANN score showing as 0 on GRCh38 variants. DANN data is only available on GRCh37, and should not be visible for GRCh38 variants.
Variant Page | Clinical Significance | For reanalyzed cases, network classified variants may appear as N/A for cases on GRCh38.
Variant Page | Visualization | IGV component performs a mean calculation when zooming out, typically around 3Mbp. This will affect BAF and BigWig files visualization.
Variant Page | Visualization | Intermittent appearance of "cannot read properties of undefined" error due to IGV component limitation.
Variant Page | Visualization | Simple/Advanced selectors will not work for locally uploaded BAM files.
Variant Page | Related Cases | For organizations on Illumina clouds, workgroup ID is shown instead of workgroup alias.
Variant Page | Evidence | ACMG notes won’t save if tag was changed. As a workaround, please hit save after changing tag and before writing notes.
Variant Page | Evidence | For finalized cases, ACMG notes are not visible.
Variant Page | For chrY variants in haploid males, HOM zygosity is shown instead of HEMI.
Versions Tab | After a reanalysis, some input files may not appear in versions tab.
Report/Export | Cannot export regions of insufficient coverage to report.
Curate | Searching for a variant causes the related cases to disappear, even when the search is removed. The workaround is to refresh the page.
Curate | Orphanet links no longer work due to an Orphanet link structure update.
Activity | Editing interpretation paragraph yields an erroneous activity labeled reanalysis.
API | Marking over 5000 variants as viewed will not result in an error message, although this is the database limitation for saving viewed variants.
Dashboard | Diagnostic Yield includes Uncertain as Resolved.
HistoricalDB for mtDNA variants | Mislabels Hom/Hemi counts, contact support to fix.
Organization Settings | API Gene Lists | Does not support NCBI only export/import. This is supported from the UI.
Organization Settings | Set mandatory fields - does not work from the UI. Please contact support if you’d like to configure these fields for your account.
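For the API due date issue listed above, where the customer time zone is not taken into account, converting the local due date to UTC on the client side avoids an offset. The sketch below shows one way to do that in Python. The function name is hypothetical, and the ISO 8601 output format is an assumption, since the exact field format expected by the Emedgene API is not stated in these notes.

```python
from datetime import datetime, timedelta, timezone


def due_date_to_utc_iso(local_due: datetime) -> str:
    """Return a UTC timestamp string for a timezone-aware local due date."""
    if local_due.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted safely.
        raise ValueError("due date must be timezone-aware")
    return local_due.astimezone(timezone.utc).isoformat()


# Example: 17:00 on October 16, 2024 in a UTC-05:00 zone is 22:00 UTC.
local = datetime(2024, 10, 16, 17, 0, tzinfo=timezone(timedelta(hours=-5)))
print(due_date_to_utc_iso(local))  # 2024-10-16T22:00:00+00:00
```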
Supported DRAGEN 4.3 callers

Type | DRAGEN Output | FASTQ | VCF (BYOD) | Notes |
---|---|---|---|---|
BAM/CRAM | In: .BAM/.CRAM, Out: .CRAM | ✓ | ✓ | Requires .bai in same folder |
Small variants | In: vcf/gvcf, Out: hard-filtered.gvcf.gz | ✓ | ✓ | Targeted caller variants are removed and ingested via the targeted vcf. |
SV del/dup/ins | sv.vcf.gz | ✓ | ✓ | VNTR caller outputs are removed from the SV output and not supported on Emedgene yet. |
CNV | cnv.vcf.gz | - | ✓* | *The new CNV-SV merged file is also supported. Do not use both the CNV and CNV-SV file. |
CNV-SV | cnv_sv.vcf.gz | ✓ | ✓* | *The new CNV-SV merged file is also supported. Do not use both the CNV and CNV-SV file. |
STR | repeats.vcf.gz | ✓ | ✓ | Do not use the ExpansionHunter SMN caller, this will fail the case. |
MRJD | mrjd.hard-filtered.vcf.gz | ✓ | ✓ | |
Targeted Callers | Targeted.vcf | ✓ | ✓ | GBA, HBA, CYP21A2 w/o CNV, supported. SMN is supported from the targeted JSON. |
SMN 1/2 | Targeted.json | ✓ | ✓ | Supported from targeted.json starting in 4.3. |
Ploidy | ploidy_estimation_metrics.csv | ✓ | ✕* | Security requirements prevent the ingestion of csv files at this time, can be pushed in tar. |
Star Allele | Targeted.json | ✓ | ✓ | Star allele caller, CYP2D6 & CYP2B6 are supported. SMN is also supported. |
QC metrics | mapping_metrics.csv | ✓ | ✕* | Security requirements prevent the ingestion of csv files at this time, can be pushed in tar. |
QC metrics | bed_coverage_metrics.csv | ✓ | ✕* | Metrics file containing FASTQC information. |
QC metrics/TAR | *.metrics.tar.gz | ✕ | ✓* | DRAGEN report for customers starting from VCF. Only available via API. Tar file must contain one of the following: `METRICS_PATTERNS = [r'.csv$', r'.tsv$', r'.counts(.gc-corrected)?(.gz)?$', r'.(ploidy\|repeats).vcf(.gz)?$']` |
ROH Viz | roh.bed | ✓ | ✓ | |
BAF BigWig | hard-filtered.baf.bw | ✓ | ✓ | B-Allele frequency (BAF) output. |
TNS BigWig | tn.bw | ✓ | ✓ | Bigwig representation of the tangent normalized signal. |
Target Counts BigWig | target.counts.bw | ✓ | ✓ | BigWig representation of the target counts bins. |
ICLR small variants | combined_iclr_sbs.phased.vcf | ✕ | ✓ | Phased small variant VCF from combined short and long reads. |
ICLR SV | combined_iclr_sbs.sv.vcf | ✕ | ✓ | Structural variant VCF from combined short and long reads. |
ICLR BAM | LongRead.haplotyped.BAM | ✕ | ✓ | Aligned Illumina Complete Long Read in BAM format. |

MRJD caller gene coverage

Gene Name | Condition | Application | Emedgene |
---|---|---|---|
PMS2 | Lynch Syndrome | Pharmacogenomics | ✓ |
SMN1 (small variants) | Spinal Muscular Atrophy | Carrier screening | ✓ |
STRC | Nonsyndromic hearing loss | Carrier screening | ✓ |
NEB | Nemaline myopathy | Carrier screening | ✓ |
TTN | Cardiomyopathy | ACMG 2ndary, NBS | ✓ |
IKBKG | Incontinentia pigmenti, Hypohidrotic ectodermal dysplasia | Newborn screening | ✓ |

Targeted caller gene coverage

Gene Name | Condition | Application | Emedgene |
---|---|---|---|
CYP2D6 | NA | Pharmacogenomics | ✓ |
SMN1 | Spinal Muscular Atrophy | Carrier screening | ✓ |
GBA | Gaucher’s Disease | Carrier screening | ✓ |
CYP2B6 | NA | Pharmacogenomics | ✓ |
HBA1/2 | Alpha-thalassemia | Carrier Screening | ✓ |
CYP21A2 | Congenital Adrenal Hyperplasia | Carrier Screening | ✓ |
RHD/RHCE | RH blood type | Blood Typing | ✕ |
LPA | Cardiac disease risk | Cardiovascular Disease | ✕ |

Emedgene quality calculations

Variant Type/Caller | Emedgene Quality Calculations |
---|---|
SNV and small InDels | A variant is designated as Low quality if the VCF/FILTER is not "PASS" OR the VCF/QUAL is less than 10. A variant is designated as High quality if the VCF/FILTER is "PASS" AND the VCF/QUAL is greater than 30. All other variants are categorized as Moderate quality. The VCF Filter value will be presented in the Variant Page \| Quality tab. |
CNVs (called by the CNV_SV caller for genomes) | A variant will be designated as Low quality if the VCF/FILTER is not "PASS" AND INFO field SVCLAIM = D. A variant will be designated as High quality if the VCF/FILTER is "PASS" AND (INFO field SVCLAIM = D OR INFO field SVCLAIM = DJ) AND QUAL > 100. All other variants are categorized as Moderate quality. |
CNVs (called by read-depth caller) | A variant will be designated as Low quality if the VCF/FILTER is not "PASS". A variant will be designated as High quality if the VCF/FILTER is "PASS" AND VCF/QUAL is greater than 30. All other variants are categorized as Moderate quality. |
SVs | A variant will be designated as Low quality if the SVLEN is greater than 50 kb OR the VCF/FILTER is not "PASS". A variant will be designated as High quality if the VCF/FILTER is "PASS" AND VCF/QUAL is greater than 500. All other variants are categorized as Moderate quality. |
STR variants | A variant will be designated as Low quality if the VCF/FILTER is not "PASS". A variant will be designated as High quality if the VCF/FILTER is "PASS". Additional STR loci will always have low quality: ARX, HOXA13. |
MRJD caller variants | A variant will be designated as Low quality if the INFO field contains 'JIDS'. A variant will be designated as High quality if FILTER=PASS and the INFO field does not contain 'REGION_AMBIGUOUS'. All other variants are categorized as Moderate quality. |
Targeted caller variants | For variants with no JIDS ID: a variant will be designated as Low quality if Filter is not PASS, and as High quality if FILTER=PASS. For variants with a JIDS ID or a Recombinant flag: always Low. |

Case pipeline run times

Case Type | Avg Number of Variants | V36 Avg Run Time (hrs) |
---|---|---|
Genome Trio | 5 million | 5 |
Genome Singleton | 3 million | 4 |
Research Genome Trio | 7 million | 7 |
Focused mode AI shortlist evaluation

Metric | V35 & under | V36 NEW |
---|---|---|
Mean rank of solving variant | 2.45 | 1.31 |
% solving variants tagged by model | 97.16% | 98.76% |
Median length of short list | 8 | 4 |
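For the `*.metrics.tar.gz` entry in the supported callers table, the notes list `METRICS_PATTERNS` and state that the tar file must contain one of them. A quick local check of an archive's member names against those patterns could look like the Python sketch below. The patterns are copied verbatim from the notes; applying them with `re.search` to each member name is an assumption about how they are used, and the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Patterns copied verbatim from the notes for *.metrics.tar.gz.
METRICS_PATTERNS = [
    r'.csv$',
    r'.tsv$',
    r'.counts(.gc-corrected)?(.gz)?$',
    r'.(ploidy|repeats).vcf(.gz)?$',
]


def matching_members(member_names: Iterable[str]) -> List[str]:
    """Return the member names that match at least one metrics pattern."""
    return [
        name for name in member_names
        if any(re.search(pattern, name) for pattern in METRICS_PATTERNS)
    ]


names = ["s.mapping_metrics.csv", "readme.txt", "s.ploidy.vcf.gz"]
print(matching_members(names))  # ['s.mapping_metrics.csv', 's.ploidy.vcf.gz']
```

With a real archive, the member names can be read with the standard library, for example `tarfile.open(path).getnames()`, and an empty result would indicate that the archive contains none of the listed file types.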