DRAGEN Amplicon Pipeline

Amplicon sequencing is a highly targeted approach that enables you to analyze genetic variation in specific genomic regions. The ultradeep sequencing of PCR products (amplicons) allows you to efficiently identify and characterize variants. This method uses oligonucleotide probes designed to target and capture regions of interest, followed by next-generation sequencing (NGS).

The Amplicon Pipeline supports both DNA and RNA data. The Amplicon Pipeline turns off duplicate marking because there are only a few unique start and end positions for fragments from an amplicon target due to the assay.

The DNA Amplicon Pipeline extends the DRAGEN DNA Pipeline with an additional step after mapping and aligning that soft-clips primers and rewrites alignments. If the target amplicon is found, DRAGEN tags each alignment with the target amplicon and soft-clips the primer sequences. DRAGEN performs tagging by adding an XN:Z:<amplicon name> tag to the output BAM/CRAM record. Soft-clipping ensures that the primer sequences do not contribute to the variant calls.

In the primer clipping step, the following poorly aligned reads are also marked as unaligned, with MAPQ set to 0:

- Alignments that don't consume any reference bases after soft-clipping.
- Off-target alignments overlapping target regions.
- Alignments with a substitution fraction above a threshold. The substitution fraction is the ratio of the mismatch count to the sum of the match and mismatch counts, and the probe regions are excluded from the calculation. The threshold is specified by --amplicon-max-substitution-fraction with a default of 0.04.
- Alignments whose read base count after soft-clipping is less than the short-read threshold and whose substitution fraction, including the probes, is above a threshold. The short-read threshold is specified by --amplicon-shortread-length-threshold with a default of 30. The probe regions are included in the calculation and soft-clipped bases are treated as mismatches. The substitution threshold is set by --amplicon-max-shortread-substitution-fraction with a default of 0.1.
- Alignments with a soft-clipping fraction above a threshold. The probe regions are excluded from the calculation and the threshold is set by --amplicon-max-softclip-fraction with a default of 0.1.
- Off-target alignments with a soft-clipping fraction above a threshold. The probe regions are included in the calculation and the threshold is set by --amplicon-max-offtarget-softclip-fraction with a default of 0.2.
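The substitution-fraction check in the list above can be sketched as follows. This is only an illustration of the arithmetic, not DRAGEN's implementation: the function names are hypothetical, and the match and mismatch counts are assumed to have already been computed with the probe regions excluded.

```python
# Default quoted from the description of --amplicon-max-substitution-fraction.
MAX_SUBSTITUTION_FRACTION = 0.04


def substitution_fraction(matches: int, mismatches: int) -> float:
    """Mismatches divided by (matches + mismatches); 0.0 if no bases were compared."""
    total = matches + mismatches
    return mismatches / total if total else 0.0


def exceeds_substitution_threshold(
    matches: int,
    mismatches: int,
    threshold: float = MAX_SUBSTITUTION_FRACTION,
) -> bool:
    """True when the fraction is strictly greater than the threshold (hypothetical helper)."""
    return substitution_fraction(matches, mismatches) > threshold


if __name__ == "__main__":
    # 5 mismatches in 100 compared bases gives 0.05, which is above the 0.04 default.
    print(exceeds_substitution_threshold(matches=95, mismatches=5))
```

Under this reading, an alignment with 5 mismatches in 100 compared bases would be unaligned, while one with 2 mismatches in 100 would be kept.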

The RNA Amplicon Pipeline uses the DRAGEN RNA Pipeline. Amplicon-specific parameters are set for fusion calling, including a fusion scoring model trained on RNA amplicon data. Small variant calling is not supported in RNA amplicon mode.

Amplicon BED File

The DRAGEN Amplicon Pipeline requires an amplicon BED file and all input files required by the DRAGEN DNA or RNA pipeline. Each row in an amplicon BED file describes an amplicon target. The fields are as follows.

| Field | Description |
| --- | --- |
| chrom | The name of the chromosome. |
| chromStart | The 0-based inclusive start position of the target, excluding the primer. |
| chromEnd | The 0-based exclusive end position of the target, excluding the primer. |
| name | The name of the amplicon target. |
| gene | [Optional] The gene ID. |
| targetType | [Optional] The target type. |
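As an illustration of the field layout, the following sketch parses one row of an amplicon BED file. It assumes tab-delimited columns, as in standard BED files; the `AmpliconTarget` type, the function name, and the example row (coordinates and target name) are hypothetical.

```python
from typing import NamedTuple, Optional


class AmpliconTarget(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, inclusive, primer excluded
    chrom_end: int    # 0-based, exclusive, primer excluded
    name: str
    gene: Optional[str] = None
    target_type: Optional[str] = None


def parse_amplicon_bed_line(line: str) -> AmpliconTarget:
    """Parse one tab-delimited row; the gene and targetType columns are optional."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least chrom, chromStart, chromEnd, name")
    gene = fields[4] if len(fields) > 4 and fields[4] else None
    target_type = fields[5] if len(fields) > 5 and fields[5] else None
    return AmpliconTarget(
        fields[0], int(fields[1]), int(fields[2]), fields[3], gene, target_type
    )


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    row = "chr7\t55241600\t55241750\tEGFR_target_1\tEGFR\tCNVtarget"
    target = parse_amplicon_bed_line(row)
    # With a half-open interval, the target length is chromEnd - chromStart.
    print(target.name, target.chrom_end - target.chrom_start)
```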

In copy number variant calling of DNA amplicon mode, the default segmentation mode is bed, which can be changed via --cnv-segmentation-mode. The CNV segmentation BED is gene-level and auto-generated based on the gene ID column in the amplicon BED file. In small gene panels, where regions for establishing a copy-neutral baseline are limited, the targetType (CTRL vs. CNVtarget) is used to identify control regions for CNV calling.

In RNA amplicon mode, targetType is used to identify fusion targets, which have a targetType of Fusion. The gene IDs for fusion targets are collected and written to an output file, and the default value of --rna-gf-enriched-genes is then set to this file. A candidate fusion is required to have both partner genes in the gene list.

Base-level and read-level coverage is calculated for each region in the amplicon BED file. It is recommended that the fusion targets are commented to avoid competition with gene expression targets.
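The fusion-gene handling described for RNA amplicon mode can be sketched as below. This is a simplified illustration rather than DRAGEN's code: the function names and the input shape (pairs of gene ID and targetType taken from the BED rows) are assumptions, and the gene names in the usage example are placeholders.

```python
from typing import Iterable, List, Optional, Set, Tuple


def fusion_gene_ids(rows: Iterable[Tuple[Optional[str], Optional[str]]]) -> List[str]:
    """Collect the unique gene IDs of rows whose targetType is exactly 'Fusion'."""
    genes = {gene for gene, target_type in rows if target_type == "Fusion" and gene}
    return sorted(genes)


def both_partners_listed(gene_a: str, gene_b: str, enriched: Set[str]) -> bool:
    """A candidate fusion passes only if both partner genes are in the gene list."""
    return gene_a in enriched and gene_b in enriched


if __name__ == "__main__":
    # Placeholder (gene, targetType) pairs, for illustration only.
    rows = [("GENE_A", "Fusion"), ("GENE_B", "Fusion"), ("GENE_C", None)]
    enriched = set(fusion_gene_ids(rows))
    print(both_partners_listed("GENE_A", "GENE_B", enriched))
    print(both_partners_listed("GENE_A", "GENE_C", enriched))
```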

DRAGEN DNA Amplicon Settings

To use the DNA amplicon pipeline, set --enable-dna-amplicon to true. Use --amplicon-target-bed to specify the path to your amplicon BED file.

To enable small variant calling, set --enable-variant-caller to true. The target BED input for small variant calling defaults to the amplicon BED file and can be changed via --vc-target-bed.

To enable copy number variant calling, set --enable-cnv to true. GC bias correction when generating target counts is enabled by default. Target counts for the normal samples should be generated with the same command-line options as the case sample under analysis. The CNV segmentation BED is auto-generated based on the gene ID column in the amplicon BED file and can be changed via --cnv-segmentation-bed. See CNV Targeted Segmentation (Segment BED) for more information.

To enable structural variant calling, set --enable-sv to true. Note that amplicon assays may have limited ability to detect large structural variants (SV) due to their design characteristics and restricted target region length.

The amplicon pipeline can be run in either germline or somatic mode. For somatic mode, specify a tumor-only or tumor-normal input. For more details about somatic mode, see Somatic Mode and Somatic Mode Options. In amplicon tumor-only somatic variant calling, potential germline variants can be annotated in the INFO field with the 'GermlineStatus' tag using population databases. Refer to Germline Tagging in the Tumor-Only Pipeline for details. For more information on the multicaller (germline & somatic) workflows, see Multicaller Workflows. If calling somatic small variants, we also recommend setting --vc-use-somatic-hotspots to false.

By default the maximum amplicon primer length is set to 50. You can specify a different value using --amplicon-primer-length. The parameter affects whether an alignment is assigned to an amplicon target. If an alignment starts inside the primer region of the amplicon target, the alignment is assigned to the amplicon. For a properly paired alignment, both the alignment and the mate must come from the same amplicon target. However, in order to detect deletion events that are close to the target boundaries, we now require only one of the reads to start in the primer region (--amplicon-allow-partial-target=true by default). For candidate deletions, we rewrite the CIGAR to make them candidates for columnwise detection (--amplicon-enable-deletion-realigner=true by default).

  |-- primer --|-- amplicon target --|-- primer --|
     ---------- read ----------------->
              <---------- read -----------------
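One plausible reading of the assignment rule illustrated above is sketched here. The coordinate conventions are assumptions, not documented behavior: the primer regions are taken to be the windows of up to --amplicon-primer-length bases immediately flanking the BED target, a forward read "starts" at its leftmost aligned base, and a reverse read "starts" at its rightmost aligned base. All function names are hypothetical.

```python
DEFAULT_PRIMER_LENGTH = 50  # default of --amplicon-primer-length


def starts_in_primer(
    target_start: int,   # BED chromStart (0-based, inclusive)
    target_end: int,     # BED chromEnd (0-based, exclusive)
    aln_start: int,      # leftmost aligned reference position (0-based, inclusive)
    aln_end: int,        # one past the rightmost aligned position (exclusive)
    is_reverse: bool,
    primer_length: int = DEFAULT_PRIMER_LENGTH,
) -> bool:
    """Check whether the read's start falls inside the flanking primer window."""
    if is_reverse:
        read_start = aln_end - 1  # a reverse read begins at its rightmost base
        return target_end <= read_start < target_end + primer_length
    return target_start - primer_length <= aln_start < target_start


def pair_assigned(read_ok: bool, mate_ok: bool, allow_partial: bool = True) -> bool:
    """With partial targets allowed, one read starting in a primer is enough."""
    return (read_ok or mate_ok) if allow_partial else (read_ok and mate_ok)


if __name__ == "__main__":
    # Hypothetical target at [1000, 1200) with a forward and a reverse read.
    fwd = starts_in_primer(1000, 1200, aln_start=980, aln_end=1130, is_reverse=False)
    rev = starts_in_primer(1000, 1200, aln_start=1070, aln_end=1220, is_reverse=True)
    print(pair_assigned(fwd, rev, allow_partial=False))
```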

The following is an example command line to run the DRAGEN DNA Amplicon Pipeline with copy number, structural variant and germline small variant calling.

dragen --enable-dna-amplicon true --enable-map-align=true --enable-sort=true --enable-map-align-output=true -r reference_genomes/Hsapiens/hg19_alt_aware/DRAGEN/8 --amplicon-target-bed=CancerHotSpot-v2.dna_manifest.20180509.bed --enable-variant-caller=true --enable-cnv=true --enable-sv=true --fastq-file1=read1.fastq.gz --fastq-file2=read2.fastq.gz --RGSM NA12878 --RGID 1 --output-directory=/staging/out --output-file-prefix=NA12878

DRAGEN RNA Amplicon Settings

To use the RNA amplicon pipeline, set --enable-rna-amplicon to true. Use --amplicon-target-bed to specify the path to your amplicon BED file.

We do not recommend enabling RNA quantification to produce the .sf quantification output files, because a panel-specific GTF file is usually not used. Use the .target_bed_read_cov_report.bed read-level coverage output file instead. This file is produced automatically when map/align output is enabled.

To enable RNA gene fusion calling, set --enable-rna-gene-fusion to true. Fusion calling parameters are automatically set in RNA amplicon mode but can be overridden in the command line. If fusion targets are not listed in the amplicon BED file, users can explicitly set --rna-gf-enriched-genes to a file containing fusion gene IDs or symbols.

The following is an example command line to run the DRAGEN RNA Amplicon Pipeline with gene fusion calling.

dragen --enable-rna-amplicon true --enable-map-align=true --enable-sort=true --enable-map-align-output=true -r reference_genomes/Hsapiens/hg19_alt_aware/DRAGEN/8 --amplicon-target-bed=Myeloid.rna_manifest.20201014.bed --enable-rna-gene-fusion=true --ann-sj-file=gencode.v19.annotation.gtf --output-format=BAM --fastq-file1=read1.fastq.gz --fastq-file2=read2.fastq.gz --RGSM Seraseq --RGID 1 --output-directory=/staging/out --output-file-prefix=Seraseq

Imbalance Ratio

The RNA amplicon pipeline includes an option for an additional metric, the imbalance ratio. This is a metric for measuring fusion events using amplicons that target the 3' and 5' ends of a gene and calculating the deviation in their coverages. The imbalance ratio for the gene is the difference between 3' and 5' coverage divided by the total counts from all genes.

To enable imbalance ratio calculation, use --amplicon-enable-imbalance-ratio=true in addition to any other amplicon pipeline arguments. If imbalance ratio is enabled, the gene and target type columns in the amplicon target BED file become mandatory. Target types can be "3prime", "5prime" or "control" (case sensitive). There can be multiple 3' or 5' targets for each gene. Any other target type (e.g., "none") will not be included in the imbalance ratio calculation, but will be included in other amplicon metrics. Imbalance ratio results are reported in a comma-delimited file with the suffix .imbalance_ratio.csv.
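The imbalance ratio definition above can be illustrated with a short sketch. Several details are assumptions rather than documented behavior: the sign convention (3' minus 5'), summing counts when a gene has multiple 3' or 5' targets, and taking "total counts from all genes" to mean the sum over all "3prime", "5prime" and "control" targets. The function name and input shape are hypothetical.

```python
from typing import Dict, Iterable, Tuple

INCLUDED_TYPES = {"3prime", "5prime", "control"}  # case sensitive


def imbalance_ratios(targets: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """targets: (gene, target_type, count) rows; returns gene -> (3' - 5') / total."""
    three: Dict[str, float] = {}
    five: Dict[str, float] = {}
    total = 0.0
    for gene, target_type, count in targets:
        if target_type not in INCLUDED_TYPES:
            continue  # e.g. "none" is ignored for this metric
        total += count
        if target_type == "3prime":
            three[gene] = three.get(gene, 0.0) + count
        elif target_type == "5prime":
            five[gene] = five.get(gene, 0.0) + count
    if total == 0:
        return {}
    genes = sorted(set(three) | set(five))
    return {g: (three.get(g, 0.0) - five.get(g, 0.0)) / total for g in genes}


if __name__ == "__main__":
    # Placeholder genes and counts, for illustration only.
    rows = [
        ("GENE_A", "3prime", 300),
        ("GENE_A", "5prime", 100),
        ("GENE_B", "control", 1600),
        ("GENE_C", "none", 500),
    ]
    print(imbalance_ratios(rows))
```

In this example GENE_A has 200 more 3' than 5' counts out of 2000 included counts, so its ratio under these assumptions is 0.1.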

DRAGEN Amplicon Panel Specific Settings

To support the varied designs of amplicon panels and the specific requirements of different analysis types (e.g., SNV, CNV, SV, MSI, RNA fusion, RNA splice variants, and RNA 3'/5' imbalance ratio), panel-specific parameter settings have been integrated into the command-line options. Each supported panel has a dedicated option, and the details for these are listed in the table below:

| Panel Name | Short Name | Panel Code | Sample Type | Default variant caller enabled | Command Line Options |
| --- | --- | --- | --- | --- | --- |
| oncoReveal BRCA1 & BRCA2 plus CNV | BRCA CNV | BR283 | DNA | SNV, CNV | --amplicon-enable-dna-brca |
| oncoReveal Lymphoid | Lymphoid | P-LYM-01 | DNA | SNV, SV | --amplicon-enable-dna-lymphoid |
| oncoReveal Essential MPN | MPN | MY7 | DNA | SNV | --amplicon-enable-dna-mpn |
| oncoReveal Fusion LBx | Fusion LBx | P-LBX-03 | cfRNA | RNA fusion, RNA splice-variant | --amplicon-enable-cfrna-lbxfusion |
| oncoReveal Multi-Cancer RNA Fusion v2 | Multi-Cancer with Fusion | SF-V2 | RNA | RNA fusion, RNA splice-variant, RNA 3'/5' imbalance-ratio | --amplicon-enable-rna-multicancer |
| oncoReveal Multi-Cancer v4 with CNV | Multi-Cancer with CNV | HS341 | DNA | SNV, CNV | --amplicon-enable-dna-multicancer |
| oncoReveal Myeloid | Myeloid | MY766 | DNA | SNV, SV | --amplicon-enable-dna-myeloid |
| oncoReveal Nexus 21 Gene | Nexus | P-CMC-01 | DNA | SNV, SV | --amplicon-enable-dna-nexus |
| oncoReveal Solid Tumor v2 | Solid Tumor v2 | P-ST-02 | DNA | SNV | --amplicon-enable-dna-solidtumor |
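For scripted runs, the panel table can be kept as a lookup from panel code to switch, as in the sketch below. The switch names are copied from the table; the helper function, the `=true` and `=false` value syntax for the panel switches, and the idea of appending caller overrides are assumptions for illustration, so check them against your DRAGEN version before use.

```python
from typing import Dict, List, Optional

# Panel code -> panel-specific switch, copied from the table above.
PANEL_SWITCHES: Dict[str, str] = {
    "BR283": "--amplicon-enable-dna-brca",
    "P-LYM-01": "--amplicon-enable-dna-lymphoid",
    "MY7": "--amplicon-enable-dna-mpn",
    "P-LBX-03": "--amplicon-enable-cfrna-lbxfusion",
    "SF-V2": "--amplicon-enable-rna-multicancer",
    "HS341": "--amplicon-enable-dna-multicancer",
    "MY766": "--amplicon-enable-dna-myeloid",
    "P-CMC-01": "--amplicon-enable-dna-nexus",
    "P-ST-02": "--amplicon-enable-dna-solidtumor",
}


def panel_args(panel_code: str, overrides: Optional[Dict[str, bool]] = None) -> List[str]:
    """Build the panel switch plus optional boolean overrides (hypothetical helper)."""
    if panel_code not in PANEL_SWITCHES:
        raise KeyError(f"unknown panel code: {panel_code}")
    args = [f"{PANEL_SWITCHES[panel_code]}=true"]  # value syntax is an assumption
    for option, enabled in (overrides or {}).items():
        args.append(f"{option}={'true' if enabled else 'false'}")
    return args


if __name__ == "__main__":
    # Example: the BRCA panel with the default CNV caller switched off.
    print(" ".join(panel_args("BR283", {"--enable-cnv": False})))
```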

When a panel-specific switch is enabled, the corresponding default variant callers are automatically activated. These defaults can be overridden via the command line if needed:

| Variant Callers | Command Line Options |
| --- | --- |
| SNV | --enable-variant-caller |
| CNV | --enable-cnv |
| SV | --enable-sv |
| MSI | --amplicon-enable-msi |
| RNA fusion | --enable-rna-gene-fusion |
| RNA splice-variant | --enable-rna-splice-variant |
| RNA 3'/5' imbalance-ratio | --amplicon-enable-imbalance-ratio |

All necessary resource files for each panel—such as the amplicon target BED file, SNV and SV systematic noise files, CNV Panel of Normals (PON), and MSI PON (if applicable)—are pre-packaged within DRAGEN. These resources are automatically detected when the corresponding panel-specific option is enabled. Users who prefer to supply custom resource files can do so through command-line options:

| Resource Files | Command Line Options |
| --- | --- |
| SNV systematic noise file | --vc-systematic-noise |
| SV systematic noise file | --sv-systematic-noise |
| CNV Panel of Normals (PON) | --cnv-combined-counts |
| MSI PON | --msi-ref-normal-input |

By default, CNV analysis does not use the pre-packaged Panel of Normals (PON). We recommend including in-run normal samples, matched in sample type and library preparation, in the same sequencing run to serve as the PON. If generating a custom PON is not feasible, the pre-packaged panel-specific PON can be used as a fallback. To enable this, set --amplicon-cnv-use-default-pon to true. The CNV component also uses the sixth column of the amplicon target BED file to identify regions annotated as "CTRL" (used for establishing baseline coverage) and "CNVtarget" (used for calling copy number variants).
