Custom genomes and primer sets
In addition to the built-in options, DRAGEN Targeted Microbial supports custom reference genomes and primer definitions. These files must be uploaded to a BaseSpace Project before they can be used; see https://help.basespace.illumina.com/manage-data/import-data for more information about importing files into BaseSpace. Custom files can be used with both Enrichment and Amplicon libraries by choosing the 'Custom' option for 'Enrichment Panel' or 'Amplicon Primer Set', respectively. Expand the 'Custom Reference' settings block to access the options for custom files. The following controls apply to each experiment type:
- Custom Enrichment Panel
  - Custom Reference FASTA for Consensus Generation (required)
  - Custom Reference BED (optional)
- Custom Amplicon Primer Set
  - Custom Reference FASTA for Consensus Generation (required)
  - Custom Reference BED (optional)
  - Custom PCR Primer Definitions (optional)
Custom genome references
The user may provide one or more reference genomes as the target for read alignment (and as the basis for generating consensus sequences). At a minimum, the user must provide a FASTA file containing the sequences of the reference genomes. The software will generate the required DRAGEN hash tables and other auxiliary files automatically, so there is no need to process the FASTA file with a separate app. Use the 'Custom Reference FASTA for Consensus Generation' control to select the previously-uploaded FASTA file containing the reference sequences.
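Before uploading, it can help to sanity-check the FASTA file locally. The following is a minimal sketch, not part of the app: the function names `read_fasta_names` and `check_reference_fasta` are hypothetical, and the checks (at least one sequence, no empty or duplicate names) are generic FASTA hygiene assumed here for illustration rather than the app's documented validation rules.

```python
def read_fasta_names(text):
    """Return the sequence names found in FASTA-formatted text, in order."""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            names.append(header[0] if header else "")
    return names


def check_reference_fasta(text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    names = read_fasta_names(text)
    if not names:
        problems.append("no sequences found")
    if "" in names:
        problems.append("a header line has no sequence name")
    seen = set()
    for name in names:
        # Assumption: each reference sequence should have a distinct name.
        if name and name in seen:
            problems.append(f"duplicate sequence name: {name}")
        seen.add(name)
    return problems
```

For example, `check_reference_fasta(open("reference.fasta").read())` returns an empty list when none of these generic problems are present; the file name is a placeholder.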
Optionally, a genome definition BED file may also be provided. It gives the software more information about each sequence, such as a human-readable common name to be used in the reports. For multi-segment genomes such as Influenza, the genome definition file provides the segment name of each sequence and indicates that all the segments of a single genome belong together. Use the 'Custom Reference BED' control to select the previously-uploaded BED file containing the genome definition. See the following page for a description of the format of the genome definition file:
Genome definition file formats
Custom primer sets
For amplicon experiments, the user may optionally provide a file that defines the primer sequences or locations. The primers defined in this file are used for two purposes:
- The primer binding locations are used to trim reads, which separates sequence data contributed by the primers themselves (unwanted) from sequence data contributed by the sample (wanted). This is important to avoid reference bias that can depress the observed allele frequency of sequence variants in primer binding sites.
- The primers are matched to define the boundaries of the amplicons expected from the PCR reaction. The read coverage within the unique (non-overlapping) regions of these amplicons is used to determine whether each amplicon is reliably observed. The fraction of observed amplicons depends on the concentration of the sample, and is used to determine whether the sample contains enough material to reliably and accurately call variants and generate a consensus sequence. See this page for a more in-depth discussion:
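
Separately from the linked discussion, the second purpose can be illustrated with a rough sketch. This is not the pipeline's implementation: the function names, the half-open `(start, end)` amplicon coordinates, the use of mean depth, and the `min_depth=10` threshold are all assumptions chosen only to show how coverage in the non-overlapping region of each amplicon could be turned into a fraction of observed amplicons.

```python
def unique_regions(amplicons):
    """For each (start, end) amplicon, list positions covered by no other amplicon."""
    regions = []
    for i, (start, end) in enumerate(amplicons):
        positions = [
            pos
            for pos in range(start, end)
            if not any(s <= pos < e for j, (s, e) in enumerate(amplicons) if j != i)
        ]
        regions.append(positions)
    return regions


def fraction_observed(amplicons, depth, min_depth=10):
    """Fraction of amplicons whose unique region has mean depth >= min_depth.

    `depth` is a per-position read depth list for a single reference sequence.
    Amplicons with no unique region count as not observed in this sketch.
    """
    if not amplicons:
        return 0.0
    observed = 0
    for positions in unique_regions(amplicons):
        if not positions:
            continue
        mean_depth = sum(depth[p] for p in positions) / len(positions)
        if mean_depth >= min_depth:  # hypothetical threshold
            observed += 1
    return observed / len(amplicons)
```

With three tiled amplicons `[(0, 10), (8, 20), (18, 30)]`, the unique region of the middle amplicon is positions 10 through 17, so reads that fall only in the overlaps with its neighbors would not count toward observing it.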
Use the 'Custom PCR Primer Definitions' control to select the previously-uploaded primer definition file. The allowed formats for this file are described here:
Primer definition file formats
Required custom input based on reference type for amplicon experiments
Reference | Example | Required input | Note |
---|---|---|---|
Single non-segmented genome | Zika | Primer set | |
Single segmented genome | All 8 segments from one Influenza A genome | Primer set | |
Multiple non-segmented genomes | Multiple genomes of Zika | Reference BED, Primer set | Reference BED must be provided to make it clear that the reference sequences are not segments in the same genome. Otherwise, the pipeline will assume this is a single segmented genome (above). If multiple genomes remain after reference selection, the genome with the best per-amplicon coverage will be considered for sample filtering. |
Multiple segmented genomes | A collection of Influenza A and B genomes | Reference BED, Primer set | Reference BED must be provided to specify which sequences belong to the same genome. Otherwise, the pipeline will assume this is a single segmented genome. If multiple genomes remain after reference selection, the genome with the best per-amplicon coverage will be considered for sample filtering. |
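
The table can also be read as a simple lookup. The sketch below only restates the four rows above; the names `REQUIRED_INPUTS` and `required_custom_inputs` are hypothetical and are not part of the app.

```python
# Keys are (more_than_one_genome, segmented); values mirror the
# "Required input" column of the table above.
REQUIRED_INPUTS = {
    (False, False): ["Primer set"],
    (False, True): ["Primer set"],
    (True, False): ["Reference BED", "Primer set"],
    (True, True): ["Reference BED", "Primer set"],
}


def required_custom_inputs(num_genomes, segmented):
    """Return the required custom inputs for an amplicon experiment."""
    if num_genomes < 1:
        raise ValueError("at least one reference genome is needed")
    return REQUIRED_INPUTS[(num_genomes > 1, segmented)]
```

Written this way, it is easy to see that the Reference BED requirement depends only on whether more than one genome is supplied, not on whether the genomes are segmented.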