🧬Custom reference

In addition to the built-in options, DRAGEN Microbial Amplicon supports the use of custom reference genomes and primer definitions. These files must be uploaded to a BaseSpace Project before they can be used. See https://help.basespace.illumina.com/manage-data/import-data for more information about importing files into BaseSpace.

In the app input form, select the 'Custom' option for 'Amplicon Primer Set'. Then expand the 'Custom Reference' settings to provide the following:

  • Custom Reference FASTA for Consensus Generation (required)

  • Custom Reference BED (optional)

  • Custom PCR Primer Definitions (optional)

In general, keep file names short (<25 characters) and free of spaces; use underscores (_) or hyphens (-) instead. See the general guidelines for uploading data to BaseSpace for more details.
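The naming guideline above can be checked before upload with a few lines of Python. This is only an illustrative sketch: the function name `check_file_name` is hypothetical, and it assumes the 25-character limit applies to the whole file name including its extension, which this page does not state.

```python
import re

# Assumption: "<25 characters" counts the full file name, extension included.
MAX_NAME_LENGTH = 24


def check_file_name(name: str) -> list[str]:
    """Return a list of guideline problems for a file name (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name has {len(name)} characters; keep it under 25")
    if re.search(r"\s", name):
        problems.append("name contains whitespace; use '_' or '-' instead")
    return problems


# Hypothetical file names, for illustration only.
print(check_file_name("my_reference.fasta"))
print(check_file_name("my custom reference genome v2.fasta"))
```

An empty list means the name follows both guidelines; otherwise each entry describes one problem to fix before uploading.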

Custom reference FASTA

If the 'Custom' option is selected for 'Amplicon Primer Set', the user must provide a custom FASTA file containing one or more reference sequences as the target for read alignment (and as the basis for generating consensus sequences). See the following page on the format of this file:

📄Reference FASTA
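As a quick sanity check before upload, the sketch below lists the sequences in a FASTA file together with their lengths. It relies only on the generic FASTA convention (a header line starting with `>` followed by sequence lines); any app-specific requirements are described on the Reference FASTA page and are not checked here. The function name `read_fasta_lengths` and the example records are hypothetical.

```python
def read_fasta_lengths(lines):
    """Map each sequence name (first word after '>') to its length in bases."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line has no sequence name")
            name = fields[0]
            if name in lengths:
                raise ValueError(f"duplicate sequence name: {name}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[name] += len(line)
    return lengths


# Hypothetical two-sequence reference, for illustration only.
example = [">segA example segment", "ACGTAC", "GT", ">segB", "GGCC"]
for seq_name, seq_length in read_fasta_lengths(example).items():
    print(seq_name, seq_length)
```

To check a real file, pass an open file handle instead of the example list, for instance `read_fasta_lengths(open("my_reference.fasta"))`.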

Custom reference BED

Optionally, a reference BED file may be provided to add information about each reference sequence in the FASTA file, such as human-readable names to be used in the reports. For multi-segment genomes such as Influenza, this file assigns the segment name to each sequence, which allows the software to group individual segment sequences by genome. See the following page on the format of this file:

📄Reference BED

Custom PCR primer definitions

Optionally, a TSV file may be provided to define the primer sequences or binding locations, which are used for two purposes:

  1. Primer sequences are trimmed from reads, which separates sequence that may come from the primers themselves (unwanted) from sequence contributed by the biological sample (wanted). This reduces reference bias that can incorrectly lower the observed allele frequency of true sequence variants in primer binding sites.

  2. Primer locations are used to define the amplicons expected from PCR reactions. The read coverage within the unique (non-overlapping) amplicon regions is used to determine whether each amplicon is reliably detected. The percentage of detected amplicons is used to determine whether sufficient material exists to accurately call variants and generate consensus sequences from the sample.
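The second purpose can be illustrated with a small sketch. It is not the app's implementation: the half-open `(start, end)` amplicon coordinates, the use of mean depth as the coverage summary, and the `min_mean_depth` threshold are all assumptions made for this example, and both function names are hypothetical.

```python
def unique_regions(amplicons):
    """For each (start, end) amplicon, list positions covered by no other amplicon."""
    result = []
    for i, (start, end) in enumerate(amplicons):
        positions = set(range(start, end))
        for j, (other_start, other_end) in enumerate(amplicons):
            if i != j:
                positions -= set(range(other_start, other_end))
        result.append(sorted(positions))
    return result


def percent_detected(amplicons, depth, min_mean_depth=10.0):
    """Percentage of amplicons whose unique region reaches the assumed depth threshold."""
    if not amplicons:
        raise ValueError("no amplicons defined")
    detected = 0
    for positions in unique_regions(amplicons):
        if not positions:
            continue  # assumption: an amplicon with no unique region counts as undetected
        mean_depth = sum(depth[p] for p in positions) / len(positions)
        if mean_depth >= min_mean_depth:
            detected += 1
    return 100.0 * detected / len(amplicons)


# Hypothetical tiling of three overlapping amplicons on a 26-base reference.
amplicons = [(0, 10), (8, 18), (16, 26)]
# Hypothetical per-base read depth: the middle of the reference has no coverage.
depth = [50] * 10 + [0] * 8 + [30] * 8

print(unique_regions(amplicons)[1])
print(round(percent_detected(amplicons, depth), 1))
```

In this example the middle amplicon has no reads in its unique region, so two of the three amplicons count as detected. Restricting the check to unique regions matters because reads in an overlap could come from either neighbouring amplicon and so cannot show which one amplified.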

See the following pages for further information:

Special considerations for amplicon detection

📄PCR primer definition

Nextclade datasets

Optionally, one or more Nextclade datasets can be selected for phylogenetic analysis of the consensus sequences generated from the samples. Every selected dataset is applied to every consensus sequence generated in every sample. See the Nextclade documentation for more information.
