Multi-Region Joint Detection
DRAGEN Multi-Region Joint Detection (MRJD) is a de novo germline small variant caller for paralogous regions. In DRAGEN v4.3, MRJD covers regions that include six clinically relevant genes: NEB, TTN, SMN1/2, PMS2, STRC, and IKBKG. MRJD is compatible with the hg38, hg19, and GRCh37 reference genomes. The table below lists the hg38 coordinates of the regions covered by MRJD.
Chromosome | Start | End | Description |
---|---|---|---|
chr2 | 151578759 | 151588523 | NEB exon 98-105 |
chr2 | 151589318 | 151599076 | NEB exon 90-97 |
chr2 | 151599871 | 151609628 | NEB exon 82-89 |
chr2 | 178653238 | 178654995 | TTN exon 172-180 |
chr2 | 178657498 | 178659255 | TTN exon 181-189 |
chr2 | 178661759 | 178663516 | TTN exon 190-198 |
chr5 | 70049522 | 70077596 | SMN2 |
chr5 | 70924940 | 70953013 | SMN1 |
chr7 | 5970924 | 5980896 | PMS2 exon 13-15 |
chr7 | 5980968 | 5987689 | PMS2 exon 11-12 |
chr7 | 6737007 | 6743712 | PMS2CL exon 2-3 |
chr7 | 6743880 | 6753867 | PMS2CL exon 4-6 |
chr15 | 43599563 | 43602630 | STRC exon 24-29 |
chr15 | 43602982 | 43611000 | STRC exon 14-23 |
chr15 | 43611040 | 43618800 | STRC exon 1-13 |
chr15 | 43699379 | 43702452 | STRCP1 exon 23-28 |
chr15 | 43702488 | 43710472 | STRCP1 exon 13-22 |
chr15 | 43710502 | 43718262 | STRCP1 exon 1-12 |
chrX | 154555884 | 154565047 | IKBKG exon 3-10 |
chrX | 154639390 | 154648553 | IKBKGP1 |
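To check whether a position of interest falls in a region handled by MRJD, the table can be encoded directly. The following is a minimal sketch, not part of DRAGEN: the function name `find_mrjd_region` is hypothetical, and the sketch assumes that Start and End are inclusive coordinates, which the table does not state.

```python
# hg38 MRJD regions copied from the table above.
# Assumption: Start and End are treated as inclusive coordinates.
MRJD_REGIONS_HG38 = [
    ("chr2", 151578759, 151588523, "NEB exon 98-105"),
    ("chr2", 151589318, 151599076, "NEB exon 90-97"),
    ("chr2", 151599871, 151609628, "NEB exon 82-89"),
    ("chr2", 178653238, 178654995, "TTN exon 172-180"),
    ("chr2", 178657498, 178659255, "TTN exon 181-189"),
    ("chr2", 178661759, 178663516, "TTN exon 190-198"),
    ("chr5", 70049522, 70077596, "SMN2"),
    ("chr5", 70924940, 70953013, "SMN1"),
    ("chr7", 5970924, 5980896, "PMS2 exon 13-15"),
    ("chr7", 5980968, 5987689, "PMS2 exon 11-12"),
    ("chr7", 6737007, 6743712, "PMS2CL exon 2-3"),
    ("chr7", 6743880, 6753867, "PMS2CL exon 4-6"),
    ("chr15", 43599563, 43602630, "STRC exon 24-29"),
    ("chr15", 43602982, 43611000, "STRC exon 14-23"),
    ("chr15", 43611040, 43618800, "STRC exon 1-13"),
    ("chr15", 43699379, 43702452, "STRCP1 exon 23-28"),
    ("chr15", 43702488, 43710472, "STRCP1 exon 13-22"),
    ("chr15", 43710502, 43718262, "STRCP1 exon 1-12"),
    ("chrX", 154555884, 154565047, "IKBKG exon 3-10"),
    ("chrX", 154639390, 154648553, "IKBKGP1"),
]


def find_mrjd_region(chrom, pos):
    """Return the description of the MRJD region containing pos, or None."""
    for region_chrom, start, end, description in MRJD_REGIONS_HG38:
        if chrom == region_chrom and start <= pos <= end:
            return description
    return None


print(find_mrjd_region("chr7", 5975000))  # PMS2 exon 13-15
print(find_mrjd_region("chr7", 6000000))  # None
```

These coordinates apply to hg38 only; hg19 and GRCh37 coordinates are not listed in the table.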
MRJD method
MRJD is a variant calling method that is designed to detect de novo germline small variants in paralogous regions of the genome. A conventional variant caller relies on the read aligner to determine which reads likely originated from a given location. This method works well when the region of interest does not resemble any other region of the genome over the span of a single read (or a pair of reads for paired-end sequencing). However, a significant fraction of the human genome does not meet this criterion. At least 5% of the human genome consists of segmental duplications. Many regions of the genome have near-identical copies elsewhere, and as a result, the true source location of a read might be subject to considerable uncertainty. If a group of reads is mapped with low confidence, a conventional variant caller might ignore the reads, even though they contain useful information. If a read is mismapped (i.e., the primary alignment is not the true source of the read), it can result in variant detection errors.
MRJD is designed to address the complexities raised by segmental duplications. Instead of considering each region in isolation, MRJD considers all locations from which a group of reads may have originated and attempts to detect the underlying sequences jointly across all paralogous regions in the sample of interest.
Below is a diagram showing the general workflow of MRJD in PMS2 and PMS2CL regions. MRJD takes primary alignments in all paralogous regions, regardless of mapping quality, builds and places haplotypes based on reads and prior knowledge, and computes joint genotypes to call small variants.
Figure 1. MRJD Caller workflow.
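The idea of an aggregated genotype across paralogous regions can be illustrated with a toy model. This is not the DRAGEN algorithm: it ignores haplotype building, haplotype placement, and prior knowledge, and only pools reads across two diploid paralogous regions (four copies in total) to estimate how many copies carry the ALT allele. The function name, the binomial read model, and the error rate are all assumptions made for illustration.

```python
import math


def aggregated_alt_copies(alt_reads, ref_reads, total_copies=4, error_rate=0.01):
    """Return the most likely number of ALT copies across all paralogous copies.

    Toy model (an assumption, not DRAGEN's method): reads from all paralogous
    regions are pooled, and each read shows the ALT allele with a probability
    set by the ALT copy fraction, adjusted for a symmetric error rate.
    """
    best_k, best_loglik = 0, -math.inf
    for k in range(total_copies + 1):
        frac = k / total_copies
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        loglik = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    return best_k


# 10 ALT reads out of 40 pooled reads is best explained by 1 ALT copy out of 4.
print(aggregated_alt_copies(alt_reads=10, ref_reads=30))  # 1
```

The pooled estimate says that one copy carries the variant, but not which region it belongs to. Resolving that placement is the step for which MRJD uses haplotypes and prior knowledge.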
Two modes of the MRJD Caller
As shown in the diagram above, there are two modes of the DRAGEN MRJD Caller, default mode and high sensitivity mode. Here are details on the differences between the two modes.
Default mode
With --enable-mrjd=true, the MRJD Caller will report the following two types of variants:
Uniquely placed variants, which means the variant is found and placed in one of the paralogous regions without ambiguity. See variants labeled with “type 1” in Figure 2.
Region-ambiguous variants. In this case, the aggregated genotype contains a variant allele with high confidence, but MRJD Caller is unable to place the variant allele in one of the paralogous regions with high confidence. The MRJD Caller will report the variant allele in all paralogous regions. See variants labeled with “type 2” in Figure 2.
High Sensitivity mode
With both --enable-mrjd=true and --mrjd-enable-high-sensitivity-mode=true, the MRJD Caller reports the same variants as in default mode, plus two other types of variants.
Positions where the reference alleles in all paralogous regions are not the same. It is well established that gene conversion, including reciprocal crossover, is a common event between paralogous regions (such as PMS2 and PMS2CL). When a reciprocal crossover event occurs, the prior model, without nearby phasing information, might place the converted haplotype in the source region instead of the destination region, so that no variant is called. The high sensitivity mode compensates for this by reporting the variant at the corresponding positions in all paralogous regions. See variants labeled with “type 3” in Figure 2.
Variants that have been placed uniquely in one of the paralogous regions, with no variant at the corresponding position in the other region. The high sensitivity mode also reports the variant in the rest of the paralogous regions. This compensates for the fact that the prior knowledge used to help place the variant is sometimes insufficient or estimated incorrectly. In those cases, the variant allele still exists but is placed in the wrong paralogous region. Reporting the variant in the other paralogous regions therefore helps maximize sensitivity even when the prior is inadequate. See variants labeled with “type 4” in Figure 2.
Figure 2. Different variant types reported by MRJD Caller default mode and high sensitivity mode.
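The relationship between the two modes and the four variant types can be summarized in a few lines. This is only a restatement of the descriptions above as a sketch; the function and constant names are hypothetical and do not correspond to anything in DRAGEN.

```python
# Variant types as labeled in Figure 2.
DEFAULT_MODE_TYPES = {1, 2}            # uniquely placed, region-ambiguous
HIGH_SENSITIVITY_EXTRA_TYPES = {3, 4}  # differing reference alleles, mirrored placements


def reported_variant_types(enable_mrjd, high_sensitivity=False):
    """Return the set of variant types reported for a given option combination."""
    if not enable_mrjd:
        return set()
    types = set(DEFAULT_MODE_TYPES)
    if high_sensitivity:
        types |= HIGH_SENSITIVITY_EXTRA_TYPES
    return types


print(sorted(reported_variant_types(True)))        # [1, 2]
print(sorted(reported_variant_types(True, True)))  # [1, 2, 3, 4]
```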
Running DRAGEN MRJD
The MRJD Caller is disabled by default and requires WGS data aligned to the hg38, hg19, or GRCh37 human reference genome.
Here is the list of options related to MRJD.
--enable-mrjd
If set to true, MRJD is enabled for the DRAGEN pipeline. Note that MRJD cannot run together with SNV caller in the current version of DRAGEN (default = ‘false’).
--mrjd-enable-high-sensitivity-mode
If set to true, MRJD high sensitivity mode is enabled for the DRAGEN pipeline. See previous section on what variant types are reported in MRJD default mode and high sensitivity mode (default = ‘false’).
The following command-line example uses FASTQ input and runs MRJD Caller with high sensitivity mode:
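A minimal sketch of such a command, assembled in Python so that the argument list can be inspected before it is run. Only `--enable-mrjd` and `--mrjd-enable-high-sensitivity-mode` come from the option list above; the remaining flags follow general DRAGEN command-line usage and should be checked against the documentation for the installed version. The function name and all paths and read-group values are placeholders.

```python
import shlex


def mrjd_fastq_command(ref_dir, fastq1, fastq2, rgid, rgsm, out_dir, prefix):
    """Build a DRAGEN command that maps FASTQ input and runs MRJD."""
    args = [
        "dragen",
        "-r", ref_dir,
        "--fastq-file1", fastq1,
        "--fastq-file2", fastq2,
        "--RGID", rgid,
        "--RGSM", rgsm,
        "--output-directory", out_dir,
        "--output-file-prefix", prefix,
        "--enable-map-align", "true",
        "--enable-mrjd", "true",
        "--mrjd-enable-high-sensitivity-mode", "true",
    ]
    return shlex.join(args)


# Placeholder values; replace with real paths and read-group names.
print(mrjd_fastq_command(
    "/path/to/reference", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "rg1", "sample1", "/path/to/output", "sample1",
))
```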
The following command-line example uses BAM input that has already been aligned and runs MRJD Caller with high sensitivity mode:
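A minimal sketch of the BAM-input case, again built as an argument list in Python. Map-align is switched off because the input is already aligned. As before, only the two MRJD options come from the option list above; `--bam-input` and the other flags follow general DRAGEN usage and should be verified for the installed version. The function name and paths are placeholders.

```python
import shlex


def mrjd_bam_command(ref_dir, bam_path, out_dir, prefix):
    """Build a DRAGEN command that runs MRJD on an already aligned BAM file."""
    args = [
        "dragen",
        "-r", ref_dir,
        "--bam-input", bam_path,
        "--output-directory", out_dir,
        "--output-file-prefix", prefix,
        "--enable-map-align", "false",  # input reads are already aligned
        "--enable-mrjd", "true",
        "--mrjd-enable-high-sensitivity-mode", "true",
    ]
    return shlex.join(args)


# Placeholder values; replace with real paths.
print(mrjd_bam_command("/path/to/reference", "sample1.bam", "/path/to/output", "sample1"))
```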
Recommended WGS workflow that includes MRJD
Note that MRJD cannot run together with the DRAGEN Small Variant Caller in this DRAGEN version. We recommend running the DNA Mapping and Small Variant Calling workflow first, and then running MRJD with the aligned BAM file generated by the DNA Mapping workflow as input. This workflow creates two VCF files: .hard-filtered.vcf.gz from the DRAGEN Small Variant Caller and .mrjd.hard-filtered.vcf.gz from DRAGEN MRJD. To provide a single VCF file for downstream analysis, a utility tool is available that replaces the DRAGEN Small Variant Caller output in the homology regions of the six medically relevant and challenging genes with the MRJD Caller output. The tool also annotates the calls made by MRJD with an "MRJD" tag in the INFO column. Refer to the DRAGEN Software Support Site page to download the utility tool.
Here are the example command lines to first run DNA Mapping and Small Variant Calling workflow using FASTQ files as input, and then run MRJD using BAM file generated by the DNA Mapping workflow as input.
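The two-step workflow can be sketched as a pair of commands built in Python. Several points are assumptions rather than documented behavior: that `--enable-map-align-output true` makes the first run write a BAM file, that the file is named `<prefix>.bam` in the output directory, and that the general-purpose flags match the installed DRAGEN version. The function name and all values are placeholders.

```python
import shlex


def two_step_commands(ref_dir, fastq1, fastq2, rgid, rgsm, out_dir, prefix,
                      high_sensitivity=False):
    """Return [mapping + small variant calling command, MRJD command]."""
    step1 = [
        "dragen", "-r", ref_dir,
        "--fastq-file1", fastq1,
        "--fastq-file2", fastq2,
        "--RGID", rgid,
        "--RGSM", rgsm,
        "--output-directory", out_dir,
        "--output-file-prefix", prefix,
        "--enable-map-align", "true",
        "--enable-map-align-output", "true",  # keep the aligned BAM for step 2
        "--enable-variant-caller", "true",
    ]
    # Assumption: step 1 writes its alignments to <out_dir>/<prefix>.bam.
    bam_path = f"{out_dir}/{prefix}.bam"
    step2 = [
        "dragen", "-r", ref_dir,
        "--bam-input", bam_path,
        "--output-directory", out_dir,
        "--output-file-prefix", prefix,
        "--enable-map-align", "false",
        "--enable-mrjd", "true",
    ]
    if high_sensitivity:
        step2 += ["--mrjd-enable-high-sensitivity-mode", "true"]
    return [shlex.join(step1), shlex.join(step2)]


for command in two_step_commands(
    "/path/to/reference", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "rg1", "sample1", "/path/to/output", "sample1",
):
    print(command)
```

The small variant caller and MRJD are kept in separate runs because they cannot be enabled together in this DRAGEN version.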
Output format
The MRJD Caller generates a .mrjd.hard-filtered.vcf.gz file in the output directory. The output file is a compressed VCFv4.2 formatted file that contains the VCF representation of the small variants from the identified genotype.
Uniquely placed call
The following is an example of the output format for a uniquely placed variant. The DRAGENHardQual filter is applied to a record if the variant has a QUAL < 3.00.
Figure 3. VCF output format example for uniquely placed call.
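The hard-filter rule can be written as a one-line check. This is a sketch of the stated threshold only; the function name is hypothetical, and the use of "PASS" for unfiltered records is an assumption based on the usual VCF convention.

```python
QUAL_THRESHOLD = 3.00  # from the text: records with QUAL < 3.00 are filtered


def mrjd_filter(qual):
    """Return the FILTER value implied by the QUAL threshold."""
    # Assumption: records that are not filtered are marked "PASS".
    return "DRAGENHardQual" if qual < QUAL_THRESHOLD else "PASS"


print(mrjd_filter(2.99))   # DRAGENHardQual
print(mrjd_filter(3.00))   # PASS
print(mrjd_filter(45.10))  # PASS
```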
Non-uniquely-placed call
For variants that are not uniquely placed (types 2-4 in Figure 2), the MRJD Caller also reports variants in diploid genotype format, which can be interpreted in the same way as a uniquely placed variant (the genotype is region-specific rather than an aggregate across all regions). In this format, QUAL is the Phred-scaled quality score for the assertion made in ALT (i.e., −10 log10 prob(GT==0/0)). Note that the QUAL score is less than or equal to 3 (if QUAL > 3, the call would be uniquely placed).
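The QUAL definition can be made concrete with a short calculation. The sketch below only evaluates the formula given above; the function name is hypothetical. It shows that a QUAL of about 3 corresponds to a homozygous-reference probability of about 0.5, meaning the variant is roughly as likely to be absent from the region as present.

```python
import math


def phred_qual(prob_hom_ref):
    """Return -10 * log10(prob(GT == 0/0)) for a probability in (0, 1]."""
    if not 0 < prob_hom_ref <= 1:
        raise ValueError("prob_hom_ref must be in (0, 1]")
    return -10 * math.log10(prob_hom_ref)


print(round(phred_qual(0.5), 2))   # 3.01
print(round(phred_qual(0.1), 2))   # 10.0
print(round(phred_qual(0.99), 2))  # 0.04
```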
QUAL, GT, GQ, and PL are reported in the same way as by the DRAGEN germline variant caller. To avoid losing information about the aggregated genotype across paralogous regions, the MRJD Caller also reports the genotype, the Phred-scaled quality score, and the Phred-scaled genotype likelihoods of the aggregated genotype as JGT, JQL, and JPL in the FORMAT column.
Figure 4. VCF output format example for non-uniquely-placed call.
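The region-specific and aggregated fields can be read out of a record by pairing the FORMAT keys with the sample values. The sketch below uses only the Python standard library. The record is hypothetical: its values, the order of the FORMAT keys, and the way JGT and JPL are encoded were made up to exercise the parser and are not real MRJD output.

```python
def parse_sample_fields(vcf_line):
    """Return a dict mapping FORMAT keys to the first sample's values."""
    columns = vcf_line.rstrip("\n").split("\t")
    format_keys = columns[8].split(":")
    sample_values = columns[9].split(":")
    return dict(zip(format_keys, sample_values))


# Hypothetical record for illustration only (values are not real MRJD output).
record = "\t".join([
    "chr7", "5975000", ".", "A", "G", "2.50", "DRAGENHardQual", ".",
    "GT:GQ:PL:JGT:JQL:JPL",
    "0/1:3:5,0,20:0/0/0/1:40:50,0,30,60,90",
])

fields = parse_sample_fields(record)
print(fields["GT"])   # region-specific diploid genotype
print(fields["JGT"])  # aggregated genotype across paralogous regions
print(fields["JQL"])  # Phred-scaled quality of the aggregated genotype
print(fields["JPL"].split(","))  # Phred-scaled aggregated genotype likelihoods
```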