PCR primer definition
Usage
If provided, the primer definition file is used to:
Trim primer sequences from input reads to minimize artifacts introduced by primer sequences (e.g. a false positive variant call caused by a mismatch between primer and reference sequences)
Compute per-amplicon coverage to infer if there is sufficient sample titer for variant calling and consensus sequence generation
Format
If primer coordinates are known (recommended)
A BED-like tab-separated value (TSV) file with no header row, with each row corresponding to a primer, and with 7 columns:
accession: reference sequence accession
start: primer start position
end: primer end position
primerName: unique primer name that includes amplicon name and direction tag (e.g. _LEFT)
pool: primer pool
strand: primer direction (+ or -)
sequence: primer sequence
While it is highly recommended to provide all 7 columns, the last three columns (pool, strand, sequence) are optional if both of the following conditions are met:
All amplicons have exactly one left primer and one right primer (i.e. no alternative primers). This is inferred from the primerName column.
At least one entry in the accession column appears in the custom reference FASTA.
If at least one of these conditions is not met, the pipeline attempts to align the primer sequences against all reference sequences and define primer coordinates on the fly, ignoring the coordinates provided in the BED file. It is therefore highly recommended to provide all 7 columns with all the above conditions met.
Column order must be maintained. If you wish to provide the strand column, for instance, all 5 columns before it must be provided.
Example 1: 5-column format with columns accession, start, end, primerName, pool
seqX 0 15 primer1_LEFT 1
seqX 1745 1760 primer1_RIGHT 1
seqY 0 15 primer2_LEFT 2
seqY 1015 1030 primer2_RIGHT 2
Example 2: 7-column format with columns accession, start, end, primerName, pool, strand, sequence
seqX 0 15 primer1_LEFT 1 + GGGCAAACCTAAAGG
seqX 1745 1760 primer1_RIGHT 1 - GTTATGTAAAGGTGC
seqY 0 15 primer2_LEFT 2 + GGGCGAAACTAAAGG
seqY 1015 1030 primer2_RIGHT 2 - GTTATGTAAAGGTGC
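The positional column layout shown in the two examples can be read with a few lines of Python. This is only a minimal sketch, not the pipeline's own parser: the `Primer` class and `parse_primer_row` function are hypothetical names, and the lower bound of 4 columns is an assumption based on the statement that the last three columns are optional.

```python
from dataclasses import dataclass
from typing import Optional

# Column order as documented; it must be maintained in the file.
COLUMNS = ("accession", "start", "end", "primerName", "pool", "strand", "sequence")


@dataclass
class Primer:
    accession: str
    start: int
    end: int
    primerName: str
    pool: Optional[str] = None      # optional trailing columns default to None
    strand: Optional[str] = None
    sequence: Optional[str] = None


def parse_primer_row(line):
    """Turn one BED-like row into a Primer (hypothetical helper)."""
    fields = line.split()
    # Assumption: 4 is the minimum because only the last three columns are optional.
    if not 4 <= len(fields) <= len(COLUMNS):
        raise ValueError(f"expected 4 to 7 columns, got {len(fields)}")
    # zip stops at the shorter input, so omitted trailing columns stay unset.
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    if record.get("strand") not in (None, "+", "-"):
        raise ValueError(f"strand must be + or -, got {record['strand']!r}")
    return Primer(**record)


five = parse_primer_row("seqX\t0\t15\tprimer1_LEFT\t1")
seven = parse_primer_row("seqX\t1745\t1760\tprimer1_RIGHT\t1\t-\tGTTATGTAAAGGTGC")
```

Because columns are assigned by position, a row cannot supply strand without also supplying pool, which mirrors the column-order rule above.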
If primer coordinates are unknown
Use one of the options below if primer coordinates are not known but primer sequences are expected to map well to reference sequences with minimal mismatches. Using either option triggers the pipeline to align the primer sequences against all reference sequences in the custom reference FASTA and define primer coordinates on the fly.
Option 1. One line per amplicon with 3 columns: ampliconName, forwardSequence, reverseSequence.
amplicon1 GGGCAAACCTAAAGG GTTATGTAAAGGTGC
amplicon2 GGGCGAAACTAAAGG GTTATGTAAAGGTGC
Option 2. One line per primer with 3 columns: primerName, sequence, pool.
primer1_LEFT GGGCAAACCTAAAGG 1
primer1_LEFT_alt GGGCGAAACTAAAGG 1
primer1_RIGHT GTTATGTAAAGGTGC 1
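To illustrate what defining primer coordinates on the fly could look like, the sketch below locates a primer in a reference by exact string search. This is a simplified stand-in, not the pipeline's method: the pipeline aligns primers and tolerates some mismatches, whereas this example only finds perfect matches. The function names are hypothetical, the reference is made up, and the sketch assumes primers are written 5' to 3', so a right primer matches the reverse complement of the reference.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_primer(reference, primer):
    """Return (start, end, strand) of the first exact match, or None.

    Coordinates are 0-based and half-open, as in the BED-like format.
    """
    start = reference.find(primer)
    if start != -1:
        return start, start + len(primer), "+"
    # Assumption: a right primer binds the opposite strand, so search for
    # its reverse complement on the forward reference sequence.
    start = reference.find(reverse_complement(primer))
    if start != -1:
        return start, start + len(primer), "-"
    return None


# Made-up reference: left primer site, a short spacer, right primer site.
reference = "GGGCAAACCTAAAGG" + "TTTTT" + reverse_complement("GTTATGTAAAGGTGC")
left = locate_primer(reference, "GGGCAAACCTAAAGG")
right = locate_primer(reference, "GTTATGTAAAGGTGC")
```

Here `left` is `(0, 15, "+")` and `right` is `(20, 35, "-")`. A primer with even one mismatch would return `None` in this sketch, which is why providing known coordinates is the more reliable route.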
Guidelines
General
All text is case sensitive.
Any line starting with '#' is ignored. This can be used to add a header line with column names.
Every line must have the same number of columns and format (except those starting with '#').
Any number of spaces can separate the columns. A value within a single column should not have any space.
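The general rules above translate into a small reader. This is a minimal sketch under stated assumptions: `read_rows` is a hypothetical name, and skipping blank lines is an assumption that the guidelines do not address.

```python
def read_rows(text):
    """Split primer-definition text into rows of column values."""
    rows = []
    for line in text.splitlines():
        # Lines starting with '#' are ignored (e.g. a header with column names).
        # Skipping blank lines is an assumption, not a documented rule.
        if not line.strip() or line.startswith("#"):
            continue
        # split() with no argument treats any run of whitespace as one separator.
        rows.append(line.split())
    # Every remaining line must have the same number of columns.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")
    return rows


text = (
    "#accession start end primerName pool\n"
    "seqX 0 15 primer1_LEFT 1\n"
    "seqX   1745 1760\tprimer1_RIGHT 1\n"
)
rows = read_rows(text)
```

Values are kept as strings and compared exactly, so the case-sensitivity rule holds without extra handling.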
BED format
Per standard BED conventions, sequence coordinates are given as 0-based, half-open intervals, such that the start field (2nd column) contains the first nucleotide in the primer binding site and the last nucleotide in the primer binding site is the value in the end field (3rd column) minus 1.
The accession field must contain a sequence identifier that matches the header of the FASTA file containing the sequence that the coordinates are relative to.
Multiple sequence identifiers (accession) are permitted within one file.
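The 0-based, half-open convention matches Python slicing, which makes it easy to check a row by hand. The sketch below uses a made-up reference sequence and a hypothetical helper name, `primer_site`.

```python
def primer_site(reference, start, end):
    """Return the bases covered by the 0-based, half-open interval [start, end)."""
    if not 0 <= start < end <= len(reference):
        raise ValueError("interval falls outside the reference sequence")
    return reference[start:end]


reference = "GGGCAAACCTAAAGGTTTT"  # made-up sequence for illustration
site = primer_site(reference, 0, 15)

first_base_index = 0        # the start field
last_base_index = 15 - 1    # the end field minus 1
primer_length = 15 - 0      # end minus start, with no +1 correction
```

With start 0 and end 15, the site covers indices 0 through 14 and is 15 bases long, so `site` equals `GGGCAAACCTAAAGG`.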
Primer name
primerName must be unique and encode the name of the amplicon for which the primer is designed, the direction tag indicating which side of the amplicon, left or right, the primer belongs to, and an optional indicator that the primer is an alternative primer for that amplicon.
In addition to _LEFT and _RIGHT, we permit _L and _R as direction tags in primerName. Any text after the direction tag should be separated by an underscore.
Text in primerName before the direction tag is considered to be an amplicon identifier. Ensure that the text of the amplicon identifier is unique for that amplicon and that the direction tag occurs only once in primerName.
Each amplicon must have at least one left and one right primer (including alternative primers) associated with it.
Alternative primers are used to bind to locations that avoid sequence variation in the default primer binding site that may disrupt hybridization. An amplicon may have an arbitrary number of alternative primers (as long as the primer name is unique), but most amplicons will have none. Alternative primers are indicated by the presence of _alt after the direction tag in primerName, followed by optional text to distinguish between different alternative primers, such as a number.
Examples of valid primer names:
MY_SEQUENCE_434_A_LEFT
virus1_L
amplicon_4934m_RIGHT_alt
amplicon_4934m_RIGHT_alt1
amplicon_4934m_R_altprimerB
Examples of invalid primer names:
LEFT_MY_SEQUENCE_434_A
virus1_l
amplicon_4934m_RIGHT_L
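The naming rules can be checked mechanically. The following is one possible interpretation, not the pipeline's actual validator: `parse_primer_name` is a hypothetical helper that splits the name on underscores, requires exactly one direction tag with a non-empty amplicon identifier before it, and treats any suffix beginning with `alt` as marking an alternative primer.

```python
# Case-sensitive direction tags, mapped to the side of the amplicon.
DIRECTION_TAGS = {"LEFT": "left", "L": "left", "RIGHT": "right", "R": "right"}


def parse_primer_name(name):
    """Split a primer name into amplicon, side, and alternative flag."""
    tokens = name.split("_")
    tag_positions = [i for i, tok in enumerate(tokens) if tok in DIRECTION_TAGS]
    if len(tag_positions) != 1:
        raise ValueError(f"{name!r}: expected exactly one direction tag")
    pos = tag_positions[0]
    amplicon = "_".join(tokens[:pos])
    if not amplicon:
        raise ValueError(f"{name!r}: no amplicon identifier before the tag")
    suffix = "_".join(tokens[pos + 1:])
    return {
        "amplicon": amplicon,
        "side": DIRECTION_TAGS[tokens[pos]],
        # Assumption: a suffix starting with "alt" marks an alternative primer.
        "alternative": suffix.startswith("alt"),
    }


def is_valid_primer_name(name):
    try:
        parse_primer_name(name)
    except ValueError:
        return False
    return True
```

Under this interpretation, all five valid examples above parse successfully and all three invalid examples are rejected: `LEFT_MY_SEQUENCE_434_A` has no amplicon identifier before the tag, `virus1_l` has no recognized tag because matching is case sensitive, and `amplicon_4934m_RIGHT_L` contains two direction tags.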