DUX4 Rearrangement Caller
Overview
The DUX4 Rearrangement Caller identifies potential structural rearrangements between DUX4 and other genes (including IGH). It primarily supports the human reference genome hg38.
Functionality
The DUX4 Rearrangement Caller has the following features:
Calls DUX4 rearrangement events from genomic data in various formats, such as FASTQ, BAM, and CRAM.
Scans the whole genome to identify potential DUX4 rearrangement events.
Runs in parallel with the host DRAGEN software with minimal overhead.
Prerequisites
Sequencing dataset that is tumor-only, paired-end, and whole-genome
Sequencing dataset with mean coverage between 25X and 120X
Sequencing dataset with mean fragment length between 300 and 500 bp
Sequencing dataset with mean read length between 100 and 151 bp
A reference genome that is compatible with DRAGEN software. You can download prebuilt reference genomes from our website or build your own customized version with:
dragen --build-hash-table true --output-directory <HASHTABLE_DIR> --ht-reference <REF_FASTA> [options]
The DRAGEN DUX4 caller has been validated with a cohort of samples that fall within the parameters defined above. If your datasets do not comply with these parameters, you can bypass the requirements check by specifying --dux4-skip-santiy-check true to obtain experimental results.
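The prerequisites above can be checked before launching a run. The following is a minimal sketch, not part of DRAGEN: `SampleStats` and `prerequisite_violations` are hypothetical names, the statistics are assumed to come from your own QC step, and treating the range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

# Validated ranges quoted from the prerequisites list.
# Treating both bounds as inclusive is an assumption.
VALIDATED_RANGES = {
    "mean_coverage": (25.0, 120.0),          # X
    "mean_fragment_length": (300.0, 500.0),  # bp
    "mean_read_length": (100.0, 151.0),      # bp
}


@dataclass
class SampleStats:
    """Hypothetical container for per-sample QC values (not a DRAGEN type)."""
    mean_coverage: float
    mean_fragment_length: float
    mean_read_length: float
    tumor_only: bool = True
    paired_end: bool = True
    whole_genome: bool = True


def prerequisite_violations(stats):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    # Categorical requirements: tumor-only, paired-end, whole-genome.
    for flag in ("tumor_only", "paired_end", "whole_genome"):
        if not getattr(stats, flag):
            problems.append(f"{flag} is required")
    # Numeric requirements: each value must fall inside its validated range.
    for name, (low, high) in VALIDATED_RANGES.items():
        value = getattr(stats, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} outside validated range {low}-{high}")
    return problems


if __name__ == "__main__":
    print(prerequisite_violations(SampleStats(60.0, 400.0, 150.0)))
    print(prerequisite_violations(SampleStats(20.0, 400.0, 150.0)))
```

A sample that fails any of these checks falls outside the validated cohort, so results obtained by bypassing the requirements check should be treated as experimental.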
Basic usage
The basic syntax of the DRAGEN command line is:
dragen [global options] [pipeline options] [output options]
The global options are common to all pipelines and control the general behavior of DRAGEN, such as the input and output files/directories, the reference genome, and the license file.
The pipeline options are specific to each pipeline and control the parameters and features of the analysis, such as the variant callers, the filters and the annotations.
The output options control the format and content of the output files, such as the VCF, BAM, and the metrics files.
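As an illustration of this three-part layout, the sketch below assembles an argument list from three option groups. `build_dragen_command` is a hypothetical helper, the paths are placeholders, and placing `--output-directory` and `--output-file-prefix` in the global group is an assumption based on the description above.

```python
import shlex


def build_dragen_command(global_opts, pipeline_opts, output_opts):
    """Concatenate option groups in the order: global, pipeline, output.

    Each group is a list of (flag, value) pairs. This helper is hypothetical
    and only builds the argument list; it does not run DRAGEN.
    """
    argv = ["dragen"]
    for group in (global_opts, pipeline_opts, output_opts):
        for flag, value in group:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    cmd = build_dragen_command(
        # Placeholder paths; the grouping of these flags is an assumption.
        global_opts=[
            ("--output-directory", "/path/to/out"),
            ("--output-file-prefix", "sample1"),
        ],
        pipeline_opts=[],  # pipeline-specific flags would go here
        output_opts=[],    # output-format flags would go here
    )
    # shlex.join quotes arguments safely for display or logging.
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments, rather than one string, avoids quoting problems if it is later passed to `subprocess.run`.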
Input files and command line options
For the DUX4 caller, a simple example is:
In this example, DRAGEN takes sequencing data in FASTQ format (BAM, CRAM, and ORA are also accepted) and maps and aligns the reads to the reference genome. The mapped and sorted reads are then consumed by the DUX4 caller.
Alternatively, the DRAGEN DUX4 caller can start from BAM input and skip the map/align step (assuming the BAM file is sorted and duplicates are marked):
The DUX4 caller can also run in parallel with other variant callers:
Finally, the DUX4 VCF results are written to the directory specified by --output-directory, with the file prefix specified by --output-file-prefix.
Output format
The DUX4 VCF contains positive calls that represent translocation events across gene pairs. Each event is described by a set of four VCF breakend records. Each record contains PR:SR:SRPB tags that give the number of fragments supporting the event: PR is the number of spanning paired reads, SR is the number of spanning split reads, and SRPB is the number of supporting read pairs per billion reads processed. To optimize event detection, two sets of genomic target regions are predefined, "CoreDUX4" and "ExtendedDUX4", where the "CoreDUX4" regions are a subset of the "ExtendedDUX4" regions.
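The support tags can be extracted with a few lines of code. The sketch below assumes the tags appear as the standard VCF FORMAT column (column 9) with the values in the first sample column (column 10), and that SRPB may be fractional. `parse_support_tags` is a hypothetical helper, and the record is synthetic, written only for illustration; it is not real DRAGEN output.

```python
def parse_support_tags(vcf_line):
    """Extract PR, SR and SRPB from one tab-separated VCF data line.

    Assumes FORMAT is column 9 and the sample values are column 10.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated VCF columns")
    keys = fields[8].split(":")
    values = fields[9].split(":")
    tags = dict(zip(keys, values))
    return {
        "PR": int(tags["PR"]),         # spanning paired reads
        "SR": int(tags["SR"]),         # spanning split reads
        "SRPB": float(tags["SRPB"]),   # supporting read pairs per billion reads
    }


if __name__ == "__main__":
    # Synthetic breakend record for illustration only.
    example = "\t".join([
        "chr4", "1000", "bnd_1", "N", "N[chr14:2000[", ".", "PASS",
        "SVTYPE=BND", "PR:SR:SRPB", "12:7:3.5",
    ])
    print(parse_support_tags(example))
```

Looking tags up by name rather than by position keeps the parser working if additional FORMAT keys are present in a record.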
An output VCF example will look like this: