Repeat Expansion Detection


Last updated 2 days ago


Short tandem repeats (STRs) are regions of the genome consisting of repetitions of short DNA segments called repeat units. STRs can expand to lengths beyond the normal range and cause mutations called repeat expansions. Repeat expansions are responsible for many diseases, including Fragile X syndrome, amyotrophic lateral sclerosis, and Huntington's disease.

DRAGEN includes a repeat expansion detection tool for STRs, DRAGEN-STR. DRAGEN-STR performs sequence-graph-based realignment of reads that originate inside and around each target repeat. DRAGEN-STR then genotypes the length of the repeat in each allele based on these graph alignments.

DRAGEN-STR is designed for PCR-free whole genome samples. Repeats are only genotyped if the coverage at the locus is at least 10x, but a minimum of 30x is recommended. Sequencing reads must be paired-end with a minimum read length of 100 (2x100bp). DRAGEN-STR cannot be run on multiple FASTQ files that are assigned to different library IDs in the fastq_list.csv file.

DRAGEN-STR does not support somatic analysis.

NOTE:

DRAGEN-STR is based on the ExpansionHunter tool. For more information about implementation details and performance assessment, refer to these publications.

Repeat Expansion Detection Options

To enable DRAGEN repeat expansion detection, the following command-line options are required.

  • --repeat-genotype-enable=true

  • --repeat-genotype-specs=<path to specification file>

You can use the --sample-sex option to specify the sex of the sample. The following options are optional.

  • --repeat-genotype-region-extension-length=<length of region around repeat to examine> (default 1000 bp)

  • --repeat-genotype-min-baseq=<Minimum base quality for high confidence bases> (default 20)

For more information on the specification file specified by the --repeat-genotype-specs option, see Repeat Expansion Specification Files.

The main output of repeat expansion detection is a VCF file that contains the variants found via this analysis.
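As an illustration of how the options in this section combine, the following Python sketch assembles them into an argument list. It is a minimal sketch, not part of DRAGEN: the function name `repeat_genotype_args` and its parameter names are hypothetical, the specification path is a placeholder, and the rest of the DRAGEN command line (reference, inputs, output prefix) is deliberately left out.

```python
from typing import List, Optional


def repeat_genotype_args(
    specs_path: str,
    sample_sex: Optional[str] = None,
    region_extension_length: Optional[int] = None,
    min_baseq: Optional[int] = None,
) -> List[str]:
    """Build the repeat expansion detection options as a list of arguments.

    Hypothetical helper: only the option names come from this page.
    Optional settings are omitted when not given, so DRAGEN applies its
    documented defaults (1000 bp extension, base quality 20).
    """
    args = [
        "--repeat-genotype-enable=true",
        f"--repeat-genotype-specs={specs_path}",
    ]
    if sample_sex is not None:
        args.append(f"--sample-sex={sample_sex}")
    if region_extension_length is not None:
        args.append(
            f"--repeat-genotype-region-extension-length={region_extension_length}"
        )
    if min_baseq is not None:
        args.append(f"--repeat-genotype-min-baseq={min_baseq}")
    return args


# Placeholder path; substitute the specification file for your reference.
example_args = repeat_genotype_args("/path/to/specs.json", min_baseq=25)
```

The resulting list can be appended to the other arguments of a DRAGEN invocation before it is passed to a process launcher.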

Repeat Expansion Specification Files

The repeat-specification (also called variant catalog) JSON file defines the repeat regions for DRAGEN-STR to analyze. Default repeat-specification files for some pathogenic and polymorphic repeats are in the <INSTALL_PATH>/resources/repeat-specs/ directory, organized by the reference genome used with DRAGEN.

--repeat-genotype-specs is required for DRAGEN-STR. If the option is not provided, DRAGEN attempts to autodetect the applicable catalog file from <INSTALL_PATH>/resources/repeat-specs/ based on the reference provided.

Covered Repeat Regions

Repeat Expansion Detection Output Files

VCF Output File

The results of repeat genotyping are output as a separate VCF file, which provides the length of each allele at each callable repeat defined in the repeat-specification catalog file. The file is named <outputPrefix>.repeats.vcf (*.gz). Each record in the VCF output file begins with the following core fields.

Table 2 Core VCF Fields

| Field | Description |
| --- | --- |
| CHROM | Chromosome identifier |
| POS | Position of the first base before the repeat region in the reference |
| ID | Always . |
| REF | The reference base at position POS |
| ALT | List of repeat alleles in the format <STRn>, where n is the number of repeat units. If the allele matches the reference, the value is . |
| QUAL | Always . |
| FILTER | The LowDepth filter is applied when the overall locus depth is below 10x or the number of reads that span one or both breakends is below 5. |

Table 3 Additional INFO Fields

| Field | Description |
| --- | --- |
| END | Position of the last base of the repeat region in the reference |
| REF | Number of repeat units spanned by the repeat in the reference |
| RL | Reference length in bp |
| VARID | Variant ID from the variant catalog |
| RU | Repeat unit in the reference orientation |
| REPID | Repeat ID from the variant catalog |

Table 4 GENOTYPE (Per Sample) Fields

| Field | Description |
| --- | --- |
| GT | Genotype |
| SO | Type of reads that support the allele. Values can be SPANNING, FLANKING, or INREPEAT. These values indicate if the reads span, flank, or are fully contained in the repeat. |
| REPCN | Number of repeat units spanned by the allele |
| REPCI | Confidence interval for REPCN |
| ADSP | Number of spanning reads consistent with the allele |
| ADFL | Number of flanking reads consistent with the allele |
| ADIR | Number of in-repeat reads consistent with the allele |
| LC | Locus coverage |

For example, the following VCF entry describes the ATXN1 repeat in sample NA13537.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA13537
chr6	16327864	.	G	<STR33>,<STR58>	.	PASS	END=16327954;REF=30;RL=90;RU=TGC;VARID=ATXN1;REPID=ATXN1	GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC	1/2:SPANNING/INREPEAT:33/58:33-33/52-71:4/0:69/83:0/4:37.459459

In this example, the first allele spans 33 repeat units while the second allele spans 58 repeat units. The repeat unit is TGC (RU INFO field), so the sequence of the first allele is TGC x 33 and the sequence of the second allele is TGC x 58. The repeat spans 30 repeat units in the reference (REF INFO field).

The length of the short allele was estimated from spanning reads (SPANNING), while the length of the expanded allele was estimated from in-repeat reads (INREPEAT). The confidence interval for the size of the expanded allele is (52,71). There are 4 spanning and 69 flanking reads consistent with the repeat allele of size 33; that is, 4 reads fully contain the repeat of size 33, and 69 flanking reads overlap at most 33 repeat units. There are 83 flanking and 4 in-repeat reads consistent with the repeat allele of size 58. The average coverage of this locus is 37.46x.
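The interpretation of the ATXN1 entry can be reproduced programmatically. The following Python sketch splits one data line of the repeats VCF into the INFO and per-sample fields described in the tables and derives the length of each allele in bp. The function name `parse_repeat_record` and the output key names are hypothetical, and the sketch assumes a called genotype (no `.` placeholders in REPCN or REPCI) and tab-separated columns.

```python
from typing import Dict, List


def parse_repeat_record(line: str) -> Dict[str, object]:
    """Parse one data line of a repeats VCF into a plain dictionary."""
    cols = line.rstrip("\n").split("\t")
    # INFO is a ';'-separated list of KEY=VALUE pairs.
    info = dict(item.partition("=")[::2] for item in cols[7].split(";"))
    # FORMAT keys and sample values are ':'-separated and pair up by position.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    unit = info["RU"]
    keys = ("SO", "REPCN", "REPCI", "ADSP", "ADFL", "ADIR")
    per_allele = {key: sample[key].split("/") for key in keys}

    alleles: List[Dict[str, object]] = []
    for i, copies in enumerate(per_allele["REPCN"]):
        low, high = per_allele["REPCI"][i].split("-")
        alleles.append({
            "repeat_units": int(copies),
            "length_bp": int(copies) * len(unit),
            "read_type": per_allele["SO"][i],
            "confidence_interval": (int(low), int(high)),
            "spanning_reads": int(per_allele["ADSP"][i]),
            "flanking_reads": int(per_allele["ADFL"][i]),
            "in_repeat_reads": int(per_allele["ADIR"][i]),
        })
    return {
        "locus": info["VARID"],
        "filter": cols[6],
        "reference_units": int(info["REF"]),
        "coverage": float(sample["LC"]),
        "alleles": alleles,
    }


# The ATXN1 record from the example above, rebuilt with tab separators.
EXAMPLE = "\t".join([
    "chr6", "16327864", ".", "G", "<STR33>,<STR58>", ".", "PASS",
    "END=16327954;REF=30;RL=90;RU=TGC;VARID=ATXN1;REPID=ATXN1",
    "GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC",
    "1/2:SPANNING/INREPEAT:33/58:33-33/52-71:4/0:69/83:0/4:37.459459",
])
record = parse_repeat_record(EXAMPLE)
```

For this record, the two alleles have 33 and 58 repeat units of the 3 bp unit TGC, so `length_bp` is 99 and 174 respectively, compared with 30 units (90 bp, the RL value) in the reference.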

Additional Output Files

The sequence-graph alignments of reads in the targeted repeat regions are output in a BAM file. You can use a specialized GraphAlignmentViewer tool available on GitHub to visualize the alignments. Programs like Integrative Genomics Viewer (IGV) are not designed for displaying graph-aligned reads and cannot visualize these BAMs.

The BAMs store graph alignments in custom XG tags using the format <LocusName>,<StartPosition>,<GraphCIGAR>.

  • LocusName: A locus identifier that matches the corresponding entry in the repeat expansion specification file.

  • StartPosition: The starting alignment position of a read on the first graph node.

  • GraphCIGAR: The alignment of a read against the graph starting from that position. GraphCIGAR consists of a sequence of graph node identifiers and linear CIGARs describing the alignment of the read to each node.
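The three-part XG tag value can be split with a few lines of Python. This is a sketch under a stated assumption: the exact GraphCIGAR syntax is not given on this page, so the code assumes the `node[CIGAR]` encoding used by ExpansionHunter (for example `0[10M]1[3M]`). The names `GraphAlignment` and `parse_xg_tag` and the sample tag value are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class GraphAlignment(NamedTuple):
    locus: str                    # LocusName from the specification file
    start: int                    # StartPosition on the first graph node
    nodes: List[Tuple[int, str]]  # (node identifier, linear CIGAR) pairs


# Assumed GraphCIGAR syntax: a node id followed by its CIGAR in brackets.
_NODE_PATTERN = re.compile(r"(\d+)\[([^\]]*)\]")


def parse_xg_tag(value: str) -> GraphAlignment:
    """Split an XG tag value into locus, start position, and node CIGARs."""
    locus, start, graph_cigar = value.split(",", 2)
    nodes = [(int(node), cigar) for node, cigar in _NODE_PATTERN.findall(graph_cigar)]
    if not nodes:
        raise ValueError(f"unrecognized GraphCIGAR: {graph_cigar!r}")
    return GraphAlignment(locus, int(start), nodes)


# Made-up tag value for illustration only.
alignment = parse_xg_tag("ATXN1,12,0[10M]1[3M]1[3M]2[5M]")
```

In the assumed encoding, a node identifier that appears more than once in a row (node 1 here) corresponds to a read passing through the repeat node several times, once per repeat unit.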

Quality scores in the BAM file are binary. High-scoring bases are assigned a score of 40, and low-scoring bases are assigned a score of 0.

You can create specification files for new repeat regions by using one of the provided specification files as a template. See the catalog documentation for details on the format.
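When adapting a template, a quick structural check can catch incomplete entries before a run. The Python sketch below assumes the specification file follows the ExpansionHunter variant catalog layout, a JSON array of objects with the keys `LocusId`, `LocusStructure`, `ReferenceRegion`, and `VariantType`; this layout is not stated on this page, so confirm it against the catalog documentation. The function name `find_incomplete_entries` and the sample entries (including their coordinates) are hypothetical.

```python
import json
from typing import Dict, List

# Assumed required keys, following the ExpansionHunter variant catalog layout.
REQUIRED_KEYS = ("LocusId", "LocusStructure", "ReferenceRegion", "VariantType")


def find_incomplete_entries(catalog_text: str) -> Dict[str, List[str]]:
    """Map each incomplete catalog entry to the list of keys it is missing."""
    problems: Dict[str, List[str]] = {}
    for index, entry in enumerate(json.loads(catalog_text)):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            # Fall back to the entry position when LocusId itself is missing.
            name = entry.get("LocusId", f"<entry {index}>")
            problems[name] = missing
    return problems


# Hypothetical entries with placeholder coordinates, for illustration only.
SAMPLE_CATALOG = json.dumps([
    {
        "LocusId": "EXAMPLE1",
        "LocusStructure": "(CAG)*",
        "ReferenceRegion": "chr1:1000-1030",
        "VariantType": "Repeat",
    },
    {"LocusId": "EXAMPLE2", "LocusStructure": "(GCC)*"},
])
incomplete = find_incomplete_entries(SAMPLE_CATALOG)
```

This only checks that the expected keys are present; it does not verify that the coordinates match the selected reference genome.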

You can select one of the three default repeat-specification files packaged with DRAGEN by using the command-line option --repeat-genotype-use-catalog=<default|default_plus_smn|expanded>. The default catalog includes ~60 repeats. The default_plus_smn catalog includes the SMN repeat in addition to all the repeats in the default catalog. The expanded catalog includes ~174K repeats (see Covered Repeat Regions). If --repeat-genotype-use-catalog is not specified on the command line, the default catalog is used.

The repeat genotyping results will be incorrect if the selected reference genome is not compatible with the repeat specification file. When this occurs, many repeats may be marked as "LowDepth" in the VCF output file or estimated to have zero length. You can confirm the problem by visualizing read alignments with the REViewer visualization tool.
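A first-pass check for this symptom is to measure how many records in the repeats VCF are marked LowDepth or have an allele of zero repeat units. The Python sketch below does that for an iterable of VCF lines. The function name `suspicious_fraction` is hypothetical, and this page does not define how large a fraction counts as "many", so any cutoff you apply to the result is your own assumption. The sample lines are synthetic.

```python
from typing import Iterable


def suspicious_fraction(vcf_lines: Iterable[str]) -> float:
    """Return the fraction of records that are LowDepth or have a zero-length allele."""
    total = 0
    flagged = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # not a complete single-sample record
        total += 1
        filters = cols[6].split(";")
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
        copies = sample.get("REPCN", "").split("/")
        if "LowDepth" in filters or "0" in copies:
            flagged += 1
    return flagged / total if total else 0.0


# Synthetic records for illustration; only the columns used above matter.
SAMPLE_LINES = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tG\t<STR33>\t.\tPASS\tRU=TGC\tGT:REPCN\t0/1:30/33",
    "chr2\t200\t.\tA\t.\t.\tLowDepth\tRU=CAG\tGT:REPCN\t./.:./.",
    "chr3\t300\t.\tC\t<STR0>\t.\tPASS\tRU=GCC\tGT:REPCN\t1/1:0/0",
]
fraction = suspicious_fraction(SAMPLE_LINES)
```

A high value is only a hint to investigate further; a low-coverage sample can also produce many LowDepth records with a compatible reference.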

The default variant catalog contains specifications for disease-causing repeats located in the AFF2, AR, ARX_1, ARX_2, ATN1, ATXN1, ATXN10, ATXN2, ATXN3, ATXN7, ATXN8OS, BEAN1, C9ORF72, CACNA1A, CBL, CNBP, COMP, CSTB, DAB1, DIP2B, DMD, DMPK, EIF4A3, FMR1, FOXL2, FXN, GIPC1, GLS, HOXA13_1, HOXA13_2, HOXA13_3, HOXD13, HTT, JPH3, LRP12, MARCHF6, NIPA1, NOP56, NOTCH2NLC, NUTM2B-AS1, PABPN1, PHOX2B, PPP2R2B, PRDM12, PRNP, RAPGEF2, RFC1, RUNX2, SAMD12, SOX3, STARD7, TBP, TBX1, TCF4, TNRC6A, VWA1, XYLT1, YEATS2, ZIC2, and ZIC3 genes. More information about disease-causing repeats can also be found here.

In addition to the disease-causing repeats listed above, the expanded variant catalog contains ~174K polymorphic repeats. These repeats were initially detected using STR-Finder from the 1000 Genomes Project. The candidate repeats were then filtered with a customized quality control pipeline (see details here).

DRAGEN-STR can detect pathogenic expansions of the FXN, ATXN3, ATN1, AR, DMPK, HTT, FMR1, ATXN1, and C9ORF72 repeats with high accuracy (see publications). The pathogenicity status of some repeats might depend on the presence of sequence interruptions or motif changes that DRAGEN-STR does not call. To visually inspect the relevant read alignments, you can use the third-party Repeat Expansion Viewer (REViewer) tool.
