This section describes the report that can be viewed from the Summary link on the Reports tab of a completed analysis.
At the top of the report, after the app version display, is the Metrics by Sample table which provides a top-line summary of each of the analyzed samples.
The first element is a button that downloads a FASTA-formatted file containing all consensus sequences generated across all samples.
The "Download CSV" button downloads the contents of the table as a comma-separated value (CSV) text file. Fields with multiple entries are written as semicolon-separated lists in the corresponding fields of the CSV file.
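As a rough illustration of how the semicolon-separated fields might be handled downstream, the sketch below reads CSV text and splits selected columns into lists. The column headers and sample values are made-up examples, and the exact headers in the exported file are an assumption; adjust them to match the actual download.

```python
import csv
import io

def read_metrics_csv(text, multi_value_columns):
    """Parse CSV text and split the named columns on ';' into lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in multi_value_columns:
            raw = row.get(col) or ""
            # Multi-entry fields are exported as semicolon-separated lists.
            row[col] = [item.strip() for item in raw.split(";") if item.strip()]
        rows.append(row)
    return rows

# Hypothetical file content; headers and values are illustrative only.
example = (
    "Sample,Genomes generated,% callable bases\n"
    "sampleA,genome1;genome2,98.2;4.1\n"
    "sampleB,,\n"
)
rows = read_metrics_csv(example, ["Genomes generated", "% callable bases"])
```

Here each multi-entry field becomes a list with one item per genome, and an empty field becomes an empty list.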
Next is the table itself, which contains one row per sample. The various genomes generated for each sample are nested as sub-rows within this row.
The table contains the following columns:
Sample: The name of the BaseSpace sample analyzed. The sample name is a clickable link that will take you directly to the Result Report for that sample.
Detected Amplicons (only if 'Experiment Type' is set to 'Amplicon'): The number of amplicons detected over the total number of amplicons expected for that sequence. The percentage of amplicons detected is used to determine whether the sample is of sufficient quality for variant calling. See Special considerations for amplicon sequencing with IMAP protocols for more details.
Num genomes: The number of genomes chosen during the reference selection stage of the pipeline.
Genomes generated: The name of each genome chosen during the reference selection stage. If the percentage of callable bases for a genome is below the minimum percentage of consensus sequence generated to label as confident (5% by default), the cell is highlighted in yellow. This indicates that there is only marginal evidence that the genome is present in the sample, so the result should be treated with caution. Callable bases are genomic positions with read coverage above the minimum read coverage depth for consensus sequence generation (10x by default). For amplicon experiments, if the sample is considered to have insufficient titer for variant calling (VC) because the percentage of detected amplicons is below the minimum percentage required for reliable variant calling (80% by default), cells are highlighted in orange. For genomes for which a consensus sequence was generated, clicking the genome name downloads a FASTA file containing the consensus sequences of that genome only.
% callable bases: The percentage of the selected reference genome whose bases are considered "callable." Callable bases are those for which reliable variant calling can be performed and therefore for which the software can output a base call. They are defined as genomic positions with read coverage above the minimum read coverage depth for consensus sequence generation (10x by default). Genomic positions below the confidence threshold are hard-masked with "N" characters to avoid reference bias (inclusion of a reference base when the actual base cannot be accurately determined). This percentage is calculated over the lengths of the reference genome(s), not the reported consensus sequence(s).
Status: The overall outcome of the analysis for this virus. Possible values are:
Full analysis (consensus, VC) means that the sample analysis completed normally, that a sufficient number of amplicons were detected to ensure reliable variant calling (amplicon experiments only), and that the percentage of callable bases was above the minimum percentage of consensus sequence generated to label as confident (5% by default).
Low confidence means that there is at least one callable base but the overall percentage of callable bases was below the minimum percentage of consensus sequence generated to label as confident (5% by default).
No callable bases indicates that zero positions in the indicated reference genome were callable and no consensus sequence is therefore provided.
Insufficient titer for VC is only reported for amplicon experiments and indicates that the percentage of detected amplicons was below the minimum percentage required for reliable variant calling (80% by default). See Special considerations for amplicon sequencing with IMAP protocols for more details.
Consensus FASTA: This column contains links to download a FASTA-formatted text file containing all of the consensus genomes generated for a sample. If no consensus genomes were generated for a sample, this column contains "N/A."
Input read count: The number of reads (or read pairs / clusters for paired-end samples) in the sample.
Mapped read count: The number of reads that could be mapped to any reference genome.
Unmapped reads: Displays buttons that initiate downloads of gzipped FASTQ files containing reads that could not be mapped to any reference genomes.
Raw Contigs: Displays a button that initiates a download of a FASTA file containing all contigs generated during the de novo assembly step of the pipeline. If a contig could be mapped to a reference genome, the contig name contains information about the reference genome it aligned to.
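The relationship between the % callable bases and Status columns can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the function names and parameters are hypothetical, a position is treated as callable when its depth is at or above the minimum depth, and the order in which the status checks are applied is a guess, since the descriptions above do not specify precedence.

```python
def percent_callable(depths, min_depth=10):
    """Percentage of reference positions whose coverage meets min_depth.

    `depths` holds one read depth per reference position, so the
    percentage is taken over the reference length.
    Assumption: "above the minimum depth" is interpreted as >= min_depth.
    """
    if not depths:
        return 0.0
    callable_count = sum(1 for d in depths if d >= min_depth)
    return 100.0 * callable_count / len(depths)

def classify_status(pct_callable, detected_amplicons=None, expected_amplicons=None,
                    min_pct_confident=5.0, min_pct_amplicons=80.0):
    """Map the metrics to one of the four Status labels.

    Assumption: the check order below is illustrative only.
    Amplicon counts are passed only for amplicon experiments.
    """
    if pct_callable == 0:
        return "No callable bases"
    if detected_amplicons is not None and expected_amplicons:
        pct_amplicons = 100.0 * detected_amplicons / expected_amplicons
        if pct_amplicons < min_pct_amplicons:
            return "Insufficient titer for VC"
    if pct_callable < min_pct_confident:
        return "Low confidence"
    return "Full analysis (consensus, VC)"

# Four reference positions, two of them at or above 10x coverage.
pct = percent_callable([12, 10, 3, 0])
status = classify_status(pct)
```

The defaults mirror the thresholds quoted above (10x depth, 5% confident, 80% amplicons).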
This table contains the results of the Pangolin analysis performed on the generated consensus sequences across all samples. Pangolin is run if the "Enable Pangolin" box is checked on the input form and one of the following is true:
'Enrichment Panel' is set to a non-custom panel (e.g. VSP) and a consensus sequence was generated using SARS-CoV-2 as reference.
'Amplicon Primer Set' is set to a non-custom set with SARS-CoV-2 as reference (e.g. SARS-CoV-2, ARTIC v5.3.2 primers) and a valid consensus sequence was generated.
Either 'Enrichment Panel' or 'Amplicon Primer Set' is set to 'Custom'. In this case, Pangolin is applied to every consensus sequence generated for the sample since the software assumes all of them to be potentially SARS-CoV-2 sequences.
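The conditions above can be summarized in a short sketch. The function and parameter names are hypothetical, and the literal reference label "SARS-CoV-2" is an assumption about how references are named; the code only illustrates the decision logic described here, not the software's internals.

```python
def pangolin_targets(enable_pangolin, is_custom, consensus_refs):
    """Return the references whose consensus sequences would go to Pangolin.

    consensus_refs: names of the references for which a consensus
    sequence was generated for the sample (hypothetical labels).
    is_custom: True if the panel or primer set is 'Custom'.
    """
    if not enable_pangolin:
        return []
    if is_custom:
        # Custom input: every consensus is treated as potentially SARS-CoV-2.
        return list(consensus_refs)
    # Non-custom input: only consensus sequences built on SARS-CoV-2.
    return [ref for ref in consensus_refs if ref == "SARS-CoV-2"]

targets_panel = pangolin_targets(True, False, ["SARS-CoV-2", "Influenza A"])
targets_custom = pangolin_targets(True, True, ["refX", "refY"])
```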
The Pangolin report contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.
The table in the Pangolin report is derived from the output of the Pangolin software. Please see the Pangolin documentation for more details: https://cov-lineages.org/resources/pangolin/output.html. Sequences with a bad Pangolin QC status are highlighted in yellow.
This table contains the results of the NextClade analysis performed on the generated consensus sequences across all samples. NextClade is run if the "Enable NextClade" box is checked on the input form and one of the following is true:
'Enrichment Panel' is set to a non-custom panel (e.g. VSP) and a consensus sequence was generated using a reference with NextClade dataset available.
'Amplicon Primer Set' is set to a non-custom set with a reference with NextClade dataset available (e.g. SARS-CoV-2, ARTIC v5.3.2 primers) and a valid consensus sequence was generated.
Either 'Enrichment Panel' or 'Amplicon Primer Set' is set to 'Custom' and one or more NextClade datasets are selected under 'Custom Reference'. In this case, each of the selected NextClade datasets is applied to each consensus sequence generated for the sample. This may result in multiple NextClade results for each consensus sequence, some of which may not be meaningful (e.g. "flu_h1n1pdm_ha" dataset applied to a NA segment of an Influenza genome).
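For the custom case, the pairing of datasets and consensus sequences amounts to a cross product, which is why the number of results can grow quickly. The sketch below illustrates this; the function name, the consensus sequence names, and the second dataset name are hypothetical placeholders.

```python
from itertools import product

def nextclade_runs_custom(consensus_names, selected_datasets):
    """List every (consensus sequence, dataset) pair that would be analyzed."""
    return list(product(consensus_names, selected_datasets))

# Hypothetical consensus names; "other_dataset" is a placeholder.
runs = nextclade_runs_custom(
    ["HA_segment", "NA_segment"],
    ["flu_h1n1pdm_ha", "other_dataset"],
)
```

Two consensus sequences and two selected datasets give four results, including pairs such as ("NA_segment", "flu_h1n1pdm_ha") that may not be meaningful.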
The NextClade Report contains a button labeled "Group table by" with a drop-down menu allowing the user to group the results by various fields, including "None". The default is "Dataset" which means that all of the results for each NextClade dataset will be grouped together. For example, if a user is only interested in phylogenetic analysis performed on the HA segment of Influenza A H1N1, these results for each sample can be viewed together in the "flu_h1n1pdm_ha" collapsible group.
The NextClade report also contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.
The table in the NextClade report is derived from the output of the NextClade software. Please see the NextClade documentation for more details: https://docs.nextstrain.org/projects/nextclade/en/stable/user/output-files/04-results-tsv.html. Sequences with a bad NextClade QC status are highlighted in yellow.