A: For many sample types, especially clinical, wastewater, or environmental samples, viral RNA or DNA makes up a tiny proportion of the total nucleic acids, with the remainder dominated by host or bacterial/archaeal material. Therefore, even with a dramatic increase in abundance over what you would obtain without targeted sequencing, the percentage of targeted reads can still be low.
A: The 10x threshold is applied per-nucleotide. Any positions below 10x coverage will be hard-masked with "N".
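As an illustration of per-nucleotide masking, the following is a minimal sketch and not the app's actual implementation. The function name `mask_low_coverage` and the example depth values are hypothetical; the only detail taken from the answer above is that positions below 10x coverage become "N".

```python
def mask_low_coverage(consensus: str, depth: list[int], min_depth: int = 10) -> str:
    """Return the consensus with every base below min_depth replaced by 'N'.

    Assumption: depth[i] is the read depth at consensus position i.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(consensus, depth)
    )


# Hypothetical 8-base consensus with per-position depth.
masked = mask_low_coverage("ACGTACGT", [25, 12, 9, 10, 0, 30, 3, 11])
print(masked)
```

Each position is evaluated on its own, so a single low-depth base inside an otherwise well-covered region is still masked.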
A: This message is to warn users that the sequence accession in the consensus genome does not necessarily reflect the true phylogeny of the organisms in the sample and should not be taken as such.
Because the app uses a limited set of reference sequences, the accession in the consensus sequence FASTA file headers (and coverage plots, etc) merely reflects the best match from that limited set. There may be sequences in RefSeq or elsewhere that are a closer match.
A: The denominator in the "Detected Amplicons" columns is based on the reference sequences selected based on de novo assembled contigs. Depending on the quality of the sample and/or reads, the assembler may not have enough data to generate a contig for some segments. Shorter segments are more likely to be missed. If only 7 segments are selected as reference for short read alignment, then we expect 7 amplicons in total. If you believe that the sample should contain all 8 segments, you can download the contig FASTA file from our report page and submit it to NCBI BLAST to see if all 8 segments are present in the contig sequences.
One known issue is that chimeric reads can be generated during library preparation, which can lead to chimeric contigs, where the contig sequence contains sequences from more than one segment. This can result in missing an entire segment in the reference selection stage. A workaround may be to filter out chimeric reads from your FASTQ files before running the app.
Alternatively, you can force the app to use all 8 segments of a particular Influenza genome by providing a custom reference FASTA file with all 8 segment sequences and a custom reference BED file with the genome column set to the same value for every segment (e.g. Influenza A). This way, the app skips assembly and uses all 8 segments as the reference sequences for short read alignment.
A: The "Detected Amplicons" column shows the number of detected amplicons over the total number of expected amplicons. The percentage of detected amplicons is used to infer whether the sample is of sufficient quality for variant calling. The "% callable bases" column shows the percentage of the selected reference genome whose bases are at or above the minimum read coverage depth for consensus sequence generation, and it is computed independently of amplicons.
Both metrics are useful to assess the quality of the sample, but the percentage of detected amplicons is used by the app after short read alignment to filter out low-titer samples and the percentage of callable bases is not.
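To make the difference between the two metrics concrete, here is a minimal sketch under stated assumptions. The function names, the per-amplicon "detected" flags, and the example numbers are hypothetical; how the app decides that an amplicon is detected is not described here, so the sketch simply takes that decision as input.

```python
def percent_callable_bases(depth: list[int], min_depth: int = 10) -> float:
    """Percentage of reference positions at or above min_depth.

    Computed over the whole reference, independently of amplicon boundaries.
    """
    if not depth:
        return 0.0
    callable_count = sum(1 for d in depth if d >= min_depth)
    return 100.0 * callable_count / len(depth)


def percent_detected_amplicons(detected: dict[str, bool]) -> float:
    """Percentage of expected amplicons that were flagged as detected.

    Assumption: `detected` maps each expected amplicon name to a flag
    produced elsewhere; the detection rule itself is not modeled.
    """
    if not detected:
        return 0.0
    return 100.0 * sum(detected.values()) / len(detected)


# Hypothetical sample: 3 of 4 amplicons detected, half the positions callable.
amplicons = {"amp_1": True, "amp_2": True, "amp_3": False, "amp_4": True}
depth = [10, 9, 20, 0]
print(percent_detected_amplicons(amplicons), percent_callable_bases(depth))
```

The two values can diverge, for example when most amplicons are detected but many positions within them fall just under the depth threshold.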
A: Not necessarily. Your virus of interest may be present in the sample, but the app may not have generated a consensus sequence for it for various reasons.
One reason could be that there are too few reads coming from that virus. Tools like DRAGEN Metagenomics can be used to characterize what is in the sample more broadly.
Another reason could be that the virus in your sample is too divergent from the reference sequences used in the app. In such cases, we recommend downloading the contig FASTA file (if available) from our report page and submitting it to NCBI BLAST. If you do see a genome that matches your virus of interest, you can provide that to the app as a custom reference genome.
A: De novo assembly is performed only if there are multiple candidate reference genomes, which is typically when there are multiple serotypes, strains, subtypes, or clades. This currently applies to the following Amplicon Primer Set options:
Dengue Virus All Serotypes, 400-bp DengueSeq primers
Influenza A, Universal primers
Influenza B, Universal primers
Influenza A and B, Universal primers
Mpox All Clades, 2500-bp ARTIC-INRB v1 primers
Respiratory Syncytial Virus (RSV), CDC primers
Respiratory Syncytial Virus (RSV), WCCRRI primers
If a custom reference FASTA file is provided, assembly is performed if there are multiple sequences in the file. If a custom reference BED file is also provided, assembly is performed if the BED file defines multiple genome-segment pairs (or multiple non-segmented full genomes). Otherwise, all sequences in the custom reference FASTA file are used as references for short read alignment.
A: In most cases, use the consensus sequence FASTA file. Contig sequences are useful if the reference sequences used for consensus sequence generation were not the best match. However, use them with caution, because contigs are not filtered by coverage or base-call quality the way consensus sequences are.
A: It is most likely that the custom database was not formatted correctly. Below are requirements for the Custom Reference FASTA For Consensus Generation:
Do not use spaces in the file name; use an underscore "_" instead
Do not exceed 25 characters in the file name
File extension must be .fasta or .fa
Do not exceed the file size limitation: 16GB for a single file or 25GB for multiple files
Do not have duplicate entries
If providing a Custom Reference BED and/or Custom PCR Primer Definitions in BED format, the names in the first column of the BED file (accession) must match the names that appear in the FASTA (text after > and before the first whitespace character).
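Several of the requirements above can be checked locally before uploading. The following is a minimal sketch, not an official validator. The function names are hypothetical, and it assumes that the 25-character limit applies to the full file name including the extension, that "duplicate entries" means repeated sequence names, and that the BED file is tab-delimited. The file size limit is not checked.

```python
def check_reference_filename(filename: str, max_len: int = 25) -> list[str]:
    """Return a list of problems found in the FASTA file name."""
    problems = []
    if " " in filename:
        problems.append("file name contains spaces")
    if len(filename) > max_len:  # assumption: limit includes the extension
        problems.append(f"file name is longer than {max_len} characters")
    if not filename.endswith((".fasta", ".fa")):
        problems.append("file extension is not .fasta or .fa")
    return problems


def fasta_names(fasta_text: str) -> list[str]:
    """Sequence names: text after '>' and before the first whitespace."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            names.append(parts[0] if parts else "")
    return names


def check_fasta_and_bed(fasta_text: str, bed_text: str) -> list[str]:
    """Check for duplicate FASTA names and BED names missing from the FASTA."""
    problems = []
    names = fasta_names(fasta_text)
    for name in sorted({n for n in names if names.count(n) > 1}):
        problems.append(f"duplicate FASTA entry: {name}")
    bed_names = {
        line.split("\t")[0]
        for line in bed_text.splitlines()
        if line.strip() and not line.startswith("#")
    }
    for name in sorted(bed_names - set(names)):
        problems.append(f"BED name not found in FASTA: {name}")
    return problems


# Hypothetical inputs: NC_3 appears in the BED file but not in the FASTA.
fasta = ">NC_1 example description\nACGT\n>NC_2\nGGCC\n"
bed = "NC_1\t0\t4\nNC_3\t0\t4\n"
print(check_reference_filename("my ref.fasta"))
print(check_fasta_and_bed(fasta, bed))
```

An empty list from both checks means none of the modeled problems were found; it does not guarantee that the app will accept the files.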
Please see this page on general guidelines to upload data to BaseSpace for more details. If you continue having issues, reach out to techsupport@illumina.com.