
Illumina Infectious Disease Software

Illumina Infectious Disease and Microbiology Software

This page provides an overview of the software available on Illumina's cloud platforms

Infectious Disease and Microbiology software includes powerful bioinformatics tools to analyze NGS data ranging from single microbial genomes to complex microbial communities of thousands of viruses, bacteria, parasites, and fungi. This comprehensive suite of secondary analysis tools supports workflows ranging from target-specific methods, such as amplicon and hybrid capture enrichment sequencing, to generalized microbiology methods like small WGS, shotgun sequencing, or 16S amplicon sequencing. All tools are available on BaseSpace, and some are also available on board select Illumina sequencers.

Click the links below to learn more about our currently-available infectious disease software products:

DRAGEN Microbial Amplicon

DRAGEN Microbial Amplicon App Documentation

Overview

DRAGEN Microbial Amplicon is a software application designed to analyze sequencing data from amplicon library preps (both DNA and RNA) on microbiological samples, with an emphasis on viruses. Illumina sequencing reads are processed to generate consensus sequences that represent a best estimate of the population of viral sequences in each sample. Where appropriate, these consensus sequences are further analyzed by the phylogenetic analysis tools Nextclade and/or Pangolin to provide an identification of the clade or lineage of each sequence.

How to start

Launch the DRAGEN Microbial Amplicon BaseSpace application.
Choose the analysis name and destination project to save results to.
Choose either Biosample or Project as input method. Selecting Project will result in all biosamples in the selected project being analyzed.
Choose the appropriate Amplicon Primer Set that matches the primer design used to prepare your samples or choose Custom to provide your own. See the Custom reference section to learn more about the custom option.
If needed, uncheck the appropriate boxes to disable Pangolin and Nextclade analyses. These tools perform phylogenetic analysis and lineage assignment for SARS-CoV-2 (Pangolin) and other viruses (Nextclade). Depending on the chosen Amplicon Primer Set, these tools may not be applicable.
If needed, expand the Advanced Workflow Settings box to change default settings. Click on the "i" circle next to each setting for more information.
If needed, expand the Additional DRAGEN Command Line Arguments to provide additional arguments to default DRAGEN commands.
Click “Launch Application”.

Custom reference

In addition to the built-in options, DRAGEN Microbial Amplicon supports the use of custom reference genomes and primer definitions. These files must be uploaded to a BaseSpace Project before they can be used. See for more information about importing files into BaseSpace.

In the app input form, select the 'Custom' option for 'Amplicon Primer Set'. Then expand the 'Custom Reference' settings to provide the following:

Custom Reference FASTA for Consensus Generation (required)
Custom Reference BED (optional)

Reference FASTA

This is a required input file if the 'Custom' option is selected for 'Amplicon Primer Set'. The file contains one or more reference sequences that serve as targets for short read alignment and as the basis for generating consensus sequences. Not every provided sequence is necessarily used.

The file can contain sequences from multiple sources. For example, it can contain sequences of multiple segments from different Influenza genomes. In such cases, providing a corresponding Custom Reference BED file is highly recommended to inform the app which sequences belong together.

The app generates the required DRAGEN hash tables and other auxiliary files automatically, so there is no need to process the FASTA file with a separate app.

Guidelines

File name
- Keep the file name short (<25 characters).
- File extension must be .fasta or .fa (e.g. reference.fasta).
- Avoid using spaces and use underscores (_) or hyphens (-) instead (e.g. reference_01.fasta)
Header
- Header must start with > and contain a unique sequence name. If there is any space in the FASTA header, the part before the first space is assumed to be the sequence name.
- Use only the following in sequence names: letters, numbers, underscore (_), hyphen (-), parentheses ((, ))
Sequence
- Sequences must consist of the following characters: A, C, T, G, N, R, Y, S, W
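
The guidelines above can be checked before uploading a file. The following is a minimal sketch, not the validation the app itself performs. The function names `check_file_name` and `check_fasta_text` are hypothetical, and two points are assumptions: the 25-character limit is applied to the whole file name including the extension, and sequence lines are compared after upper-casing.

```python
import re
from pathlib import Path

# Characters listed in the guidelines for sequence names and sequences.
NAME_RE = re.compile(r"[A-Za-z0-9_()\-]+")
SEQ_RE = re.compile(r"[ACTGNRYSW]+")


def check_file_name(path):
    """Return a list of guideline problems with the file name (empty if none)."""
    name = Path(path).name
    problems = []
    # Assumption: the limit applies to the full name, extension included.
    if len(name) >= 25:
        problems.append("file name is not shorter than 25 characters")
    if not name.endswith((".fasta", ".fa")):
        problems.append("file extension is not .fasta or .fa")
    if " " in name:
        problems.append("file name contains spaces")
    return problems


def check_fasta_text(text):
    """Return a list of guideline problems found in FASTA content."""
    problems = []
    seen_names = set()
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The part before the first space is taken as the sequence name.
            name = line[1:].split(" ", 1)[0]
            if not NAME_RE.fullmatch(name):
                problems.append(f"line {line_no}: unexpected characters in sequence name {name!r}")
            if name in seen_names:
                problems.append(f"line {line_no}: duplicate sequence name {name!r}")
            seen_names.add(name)
        # Assumption: lower-case bases are tolerated, so compare upper-cased.
        elif not SEQ_RE.fullmatch(line.upper()):
            problems.append(f"line {line_no}: unexpected characters in sequence")
    return problems
```

An empty list from both functions means no guideline violation was found by this sketch; it does not guarantee that the app accepts the file.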

Reference BED

A BED-like tab-separated value (TSV) file with no header row and with 4 or 5 columns:

accession: each sequence accession as it appears in Custom Reference FASTA header
start: start position (always set to 0)
end

PCR primer definition

Usage

If provided, the primer definition file is used to:

Trim primer sequences from input reads to minimize artifacts introduced by primer sequences (e.g. false positive variant called from a mismatch between primer and reference sequences)

Understanding the BaseSpace reports

Once the analysis completes, the "REPORTS" tab on BaseSpace enables users to view the Summary of the entire analysis, which summarizes results from all input samples, as well as individual Sample Report for each sample.

📄Summary 📄Sample Report

Sample Report

The Sample Report contains at most four tabs: Sample QC, Virus Metrics, Nextclade Report, and Pangolin Report.

Sample QC

This tab contains tables and plots summarizing the sample.

Sample Summary Metrics

This table reports summary metrics for the sample, such as Status and Detected Amplicons. See for their definitions.

Pre-processing Metrics

This plot displays counts of reads that fall into different categories. See for their definitions.

Sequence Alignment

This plot displays the number of reads that mapped to each reference sequence. If there is a single reference sequence (e.g. SARS-CoV-2), one bar is shown.

Sequence Alignment Metrics

This table provides the number of reads that mapped to each reference sequence along with the genome and segment names of the reference sequence. The "Download CSV" button enables downloading the contents of the table as a text comma-separated value (CSV) file.

Virus Metrics

Metrics by Virus

This table summarizes results for each viral genome generated in the sample with each row corresponding to a single viral genome. For segmented viruses like Influenza, a row will summarize information across multiple sequences generated for a single viral genome.

At the top is the "Download CSV" button, which enables downloading the contents of the table as a text comma-separated value (CSV) file.

The table itself contains rows for every viral genome with at least one sequence generated in the sample with the following columns:

Virus: Name of the viral genome
- For custom references, this will be the part of the FASTA header before the first whitespace character for the corresponding reference sequence if no custom genome definition file is provided. If a custom genome definition file is provided, this will be the value of the genomeName column
% Callable: Percentage of bases in the reference sequence with coverage above the minimum read coverage depth for consensus sequence generation (10x by default). This is computed across all sequences belonging to the viral genome. See for more information.
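
As a rough illustration of the % Callable metric, the sketch below computes the percentage of positions that reach the minimum depth. It is not the app's implementation. The function names are hypothetical, a depth equal to the threshold is assumed to count as callable, and the genome-level value is assumed to pool the bases of all sequences belonging to the genome.

```python
def percent_callable(depths, min_depth=10):
    """Percentage of positions whose read depth reaches min_depth."""
    if not depths:
        return 0.0
    # Assumption: a depth exactly equal to min_depth is callable.
    callable_bases = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * callable_bases / len(depths)


def percent_callable_genome(depths_by_sequence, min_depth=10):
    """Pool per-base depths of all sequences (e.g. segments) of one genome."""
    pooled = [d for depths in depths_by_sequence.values() for d in depths]
    return percent_callable(pooled, min_depth)


# Hypothetical two-segment genome: 3 of 6 bases reach 10x.
example = {"segment_1": [10, 10], "segment_2": [0, 0, 0, 20]}
print(percent_callable_genome(example))
```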

Metrics by Sequence

This table summarizes the results for each sequence generated in the sample. For segmented viruses like Influenza, there are typically multiple rows with the same virus name. Otherwise, this table contains similar information as the Metrics By Virus table.

At the top is the "Download CSV" button, which enables downloading the contents of the table as a text comma-separated value (CSV) file.

Virus: Name of the virus genome
Segment: Name of the segment to which the reference sequence corresponds. For non-segmented viruses, this is typically set to "Full".
Accession: Unique identifier of the reference sequence (text before first space in FASTA header if custom reference FASTA was provided)
% Callable: Percentage of bases in the reference sequence with coverage above the minimum read coverage depth for consensus sequence generation (10x by default). See

Consensus Coverage

Displays a trace of read coverage over each reference genome. On the top right is a drop-down menu that allows users to switch between genomes. The blue line represents the read coverage, with the coverage depth (log10 of the number of reads) on the y-axis and the genomic position in the reference genome on the x-axis.

For segmented viruses like Influenza, coverage values for each segment are displayed in a horizontally stacked fashion. Grey blocks at the top show their boundaries.

Nextclade Report (optional)

This tab contains tables reporting the results of the Nextclade analysis performed on the generated consensus sequences in the sample. See for more details.

Pangolin Report (optional)

This tab contains tables reporting the results of the Pangolin analysis performed on the generated consensus sequences in this sample. See for more details.

Special considerations for amplicon detection

Amplicon sequencing, especially of RNA viruses, requires additional bioinformatics processing to ensure maximum quality of the resulting data.

In RT-PCR, a reverse transcriptase enzyme first generates cDNA molecules using the RNA molecules in the sample as templates, before amplifying the cDNA sequences using a DNA polymerase enzyme during PCR. These amplified cDNA sequences are then further processed to generate the sequencing libraries. Both of these enzymes can potentially introduce an incorrect base into a sequence, generating a position where the resulting sequence does not match the sequence in the sample -- that is, an error.

Reverse transcriptases exhibit error rates that are multiple orders of magnitude higher than those of high-fidelity DNA polymerases.

When large numbers of nucleic acid molecules are present in a reaction, these individual misincorporation errors are largely uncorrelated and appear at very low frequencies, so that they are typically ignored by variant callers.

However, when there is a small number of incoming nucleic acid molecules, such as for a low-titer sample, an error that occurs during the RT step or early in the PCR reaction can, as a result of sampling noise, be amplified to high frequencies in the resulting sequencing libraries. The variant caller may treat this error as a sequence variant, since it is a true sequence variant in the context of the library provided to the instrument. As a result, these artifactual sequence variants often have high allele frequency and quality scores, which makes them very difficult to detect, and appear in the final consensus sequence. While less common, it is also possible for a true sequence variant to have its allele frequency depressed by this same process (if the error results in a reversion to the reference sequence).
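
The effect of input quantity can be illustrated with a deliberately idealized calculation. The sketch below assumes that one template molecule picks up an error at the RT step and that every template is then amplified equally, so the error ends up at a frequency of one over the number of templates. Real reactions amplify unevenly and add sampling noise on top, so this is only an order-of-magnitude picture; the function name is hypothetical.

```python
def artifact_allele_frequency(template_molecules, erroneous_molecules=1):
    """Expected frequency of an early error under perfectly uniform amplification."""
    if template_molecules <= 0:
        raise ValueError("template_molecules must be positive")
    return erroneous_molecules / template_molecules


# The same single error is negligible with many templates and prominent with few.
for templates in (10_000, 100, 10):
    frequency = artifact_allele_frequency(templates)
    print(f"{templates} templates: {frequency:.2%}")
```

With 10,000 templates the error sits at 0.01% and is easy to ignore, while with 10 templates it reaches 10% before any sampling noise is considered.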

DRAGEN Targeted Microbial

How to set up and run an analysis

Launch the DRAGEN Targeted Microbial BaseSpace application.
After choosing a name and destination project for the Analysis, choose either “Biosample” or “Project” as input type. Selecting “Project” will result in all biosamples in the selected project being analyzed.
Next, choose between Enrichment and Amplicon for Experiment Type. Libraries prepared with IMAP should be run as “Amplicon” experiments. Choose the appropriate Amplicon Primer Set that matches the primer design used to prepare your samples or choose “Custom” to provide your own genome references and primer designs. Note that all provided files must first be uploaded to a BaseSpace project before they can be selected in the software.
To use a custom reference and primer design, click the “Custom Reference” block to expand it.
At a minimum, the user must provide a custom genome reference containing one or more target sequences (to be used for alignment, variant calling and consensus generation) in the form of a FASTA file.
1. Optionally, the user may provide a BED file that assigns human-readable names and segment numbers (if applicable) to each sequence in the provided FASTA file. Note that the accessions in the genome definition file must match the first part (before whitespace) of the FASTA headers. See the pages for “Genome Definition File Format Specification” in the “Supporting Information” section of this user guide for information on the required format of this file.
2. Optionally, the user may provide a file containing the locations or sequences of the primers used to prepare this sample. These primer definitions are important to guide the trimming of primer sequence from reads that overlap the binding sites, as well as to define the boundaries of the amplicons whose coverage is used to determine if the sample has sufficient viral material to reliably call variants and generate consensus sequence.
Check the appropriate boxes to enable or disable Pangolin and/or Nextclade analysis if desired. These tools perform phylogenetic analysis and lineage assignment for SARS-CoV-2 (Pangolin) and other viruses (Nextclade). Depending on the chosen Amplicon Primer Set, not all of these options may be applicable.
Click “Launch Application” to begin the Analysis.

Genome definition file formats

A BED-like tab-separated value (TSV) file with no header row, consisting of the following columns:

chrom: each sequence name as it appears in Custom Reference FASTA
chromStart: start position (always set to 0)
chromEnd

Understanding the BaseSpace Reports

Brief description of Summary and Result reports and an explanation of their contents

The app produces a summary report as well as result reports for each of the samples analyzed. See the links below for a description of each.

📄Summary Report 📄Result Reports

Special considerations for amplicon sequencing with IMAP protocols

Amplicon sequencing, especially of RNA viruses, requires additional bioinformatics processing to ensure maximum quality of the resulting data.

Reverse transcriptases exhibit error rates that are multiple orders of magnitude higher than those of high-fidelity DNA polymerases.

However, when the number of incoming nucleic acid molecules is small, such as for a low-titer virus sample, an error that occurs during the RT step or early in the PCR reaction can, as a result of sampling noise, be amplified to high frequencies in the resulting sequencing libraries. When the variant caller encounters such a position, it will be treated as a sequence variant, since it is a true sequence variant in the context of the library provided to the instrument. As a result, these artifactual sequence variants often have high allele frequency and very good quality scores, which makes them very difficult to detect and remove. This can result in a false positive variant call that, at a sufficiently high allele frequency, will also be incorporated into the consensus sequence. It is also possible for a true sequence variant to have its allele frequency depressed by this same process (if the error results in a reversion to the reference sequence), but this is much less common.

Known issues

For version 1.1 and later, consensus FASTA files generated for each sample, virus, and reference sequence incorrectly contain soft-masked sequences instead of hard-masked sequences. To get hard-masked sequences, use {sample_name}.consensus_hard_masked_sequence.fa or convert lowercase nucleotides to "N".
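
The second workaround, converting lowercase nucleotides to "N", can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that header lines start with ">" and must be left untouched.

```python
def hard_mask(sequence):
    """Replace every lowercase (soft-masked) base with N."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked and headers unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else hard_mask(line)


# Hypothetical soft-masked record.
for out_line in hard_mask_fasta([">sample_consensus\n", "ACgtTNa\n"]):
    print(out_line)
```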

DRAGEN Microbial Enrichment Plus

Custom reference FASTA and BED files

Custom reference FASTA file:

A custom reference FASTA file containing one or more reference sequences is required to run the custom reference sequence analysis. In the FASTA file, sequence names must be unique and should not contain any spaces. If there is any space in the FASTA header, the part before the first space is assumed to be the sequence name. It is recommended to use only the following in sequence names: letters, numbers, underscore (_), hyphen (-), parentheses ((, )), and period (.). Otherwise, the sequence names may appear different in the output. An example custom reference FASTA file is provided in the link below.

To upload a custom reference FASTA file, go to the "Projects" tab and click on the folded paper icon (representing File) to reveal a dropdown menu. Click on "Upload" and select "Files". Within the upload page, select "Other" format for FASTA files, and upload the file as a Biosample. Within the DRAGEN Microbial Enrichment Plus app, under "Custom panel specification" use the "Custom reference FASTA for consensus generation" control to select the uploaded FASTA file.

Custom reference BED file (optional):

Optionally, a custom reference BED file may also be provided. Sequence names must match between the FASTA file and BED file, and the same set of sequences must appear in both files. If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.

The BED file controls how sequences are grouped and labeled in the output. If the custom reference FASTA file includes sequences from multiple segments of a viral genome, it is recommended to provide a BED file so that the segments are included under the results of that microorganism.

The BED file must be tab-delimited with at least 4 columns:

chrom: the sequence name as it appears in the FASTA
chromStart: start position (always set to 0)
chromEnd: end position (sequence length)
genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)
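
Because chromStart is always 0 and chromEnd is the sequence length, a four-column BED file can be derived from the FASTA file plus a mapping from sequence name to genome name. The following is a minimal sketch under that reading. The helper names `fasta_lengths` and `bed_rows`, the sequence names, and the genome label are all hypothetical, and the input is assumed to be well-formed FASTA.

```python
def fasta_lengths(text):
    """Map each sequence name (header text before the first space) to its length."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def bed_rows(lengths, genome_names):
    """Build tab-delimited rows: chrom, chromStart (0), chromEnd, genomeName."""
    return [
        "\t".join([name, "0", str(length), genome_names[name]])
        for name, length in lengths.items()
    ]


# Hypothetical two-segment reference grouped under one genome name.
fasta_text = ">segA example\nACGT\nAC\n>segB\nGGG\n"
rows = bed_rows(fasta_lengths(fasta_text), {"segA": "Influenza A", "segB": "Influenza A"})
print("\n".join(rows))
```

Giving both segments the same genomeName groups them under one genome in this sketch, while distinct genomes would each need their own unique name.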

Example custom reference BED file:

To upload a custom reference BED file, go to the "Projects" tab and click on the folded paper icon (representing File) to reveal a dropdown menu. Click on "Upload" and select "Files". Within the upload page, select "Other" format for BED files, and upload the file as a Biosample. Within the DRAGEN Microbial Enrichment Plus App, under "Custom panel specification" use the "Custom reference BED (optional)" dropdown to select the uploaded BED file.

Pangolin custom analysis behavior:

For Custom Panel analyses, Pangolin is enabled and will run on custom reference sequences with at least 3% coverage that meet these naming conventions:

If only a FASTA file is provided, Pangolin will run on sequences that have a header containing either SARS-CoV-2 or NC_045512
If both a FASTA and BED file are provided, Pangolin will run on sequences where the first column (chrom) contains NC_045512 or the fourth column (genomeName) contains SARS-CoV-2
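
The two naming rules can be summarized as a small predicate. This is a sketch of the rules as stated above, not the app's code. The function name is hypothetical, matching is assumed to be case-sensitive, and the coverage value is assumed to be a percentage between 0 and 100.

```python
def pangolin_would_run(fasta_header, coverage_percent, bed_row=None):
    """Apply the stated naming and coverage rules to one custom reference sequence.

    bed_row, if given, is the list of BED columns for the sequence:
    [chrom, chromStart, chromEnd, genomeName].
    """
    if coverage_percent < 3.0:
        return False
    if bed_row is None:
        # FASTA only: look at the header.
        return "SARS-CoV-2" in fasta_header or "NC_045512" in fasta_header
    # FASTA and BED: look at the first and fourth BED columns.
    chrom, genome_name = bed_row[0], bed_row[3]
    return "NC_045512" in chrom or "SARS-CoV-2" in genome_name


print(pangolin_would_run(">NC_045512.2 Wuhan-Hu-1", 95.0))
print(pangolin_would_run(">my_ref", 95.0, ["my_ref", "0", "29903", "SARS-CoV-2"]))
print(pangolin_would_run(">NC_045512.2", 1.5))
```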

Nextclade custom analysis behavior:

For Custom Panel analyses, Nextclade is disabled and will not be run. Do not enable Nextclade.

Output files

Note: Some files may not be generated depending on the selected analysis options and analysis results

Sample-level output files

Filename

Type

Test information

Enrichment panel

RVOP/RVEK

Abbreviation

Definition

VSP

Abbreviation

Definition

Custom Panel

Abbreviation

Definition

Scientific evidence

Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP)

Application note:

Technical note:

DRAGEN 16S Plus

DRAGEN 16S Plus App Documentation

Overview

Summary:

The DRAGEN 16S Plus application is a rapid, kmer-based informatics solution designed for microbial classification and community profiling from mixed flora and metagenomic sample types. The app delivers easy-to-use, powerful secondary analysis of Illumina 16S sequencing data, with workflows for read QC (optional), taxonomic classification, result filtering (optional), and reporting. It also supports custom database analysis.

Input files:

FASTQ files
(if applicable)

Demo Data:

The includes 21 samples prepared using the . An example custom reference sequence FASTA file is also included.

Analysis Pipeline:

Read QC (optional)
Taxonomic classification
Result filters (optional read count threshold)
Reporting

Output files:

Analysis-level outputs: XLSX, CSV, HTML, ZIP
Sample-level outputs: JSON, CSV, HTML, TXT.GZ

Important Notes:

DRAGEN 16S Plus is a secondary analysis tool for research use only. Further interpretation, statistical analysis, and downstream analysis of results may be necessary.

For questions on this application, please contact Illumina Technical Support.

For Research Use Only. Not for use in diagnostic procedures.

How to set up and run an analysis

Launch the DRAGEN 16S Plus BaseSpace app, which can be found in the "Dragen" and "Infectious Disease + Microbiology" app collections.
Enter a name for the Analysis.
Choose either "Project" or “Biosample list” as input type. When a Project is selected, the app will attempt to find all FASTQ files in that Project and run analyses on them.

Custom database FASTA file format

Custom database FASTA files:

A custom database FASTA file containing up to 500 million basepairs of reference sequence may be specified using the exact FASTA header format defined below. In the FASTA file, the SequenceID should not contain any spaces. All sequences must have seven canonical taxonomic rank prefixes specified: k__;p__;c__;o__;f__;g__;s__. However these can all be left blank except for (k)ingdom and (s)pecies designations, which are required.
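
The rank requirement can be checked on the taxonomy string alone. The sketch below is an assumption-laden illustration: it only validates a semicolon-separated lineage with the seven prefixes in the order given above, and it does not know where that string sits in the FASTA header, since the exact header layout is not reproduced here. The function name `parse_lineage` is hypothetical.

```python
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def parse_lineage(lineage):
    """Split a lineage string into ranks, enforcing the seven-prefix rule."""
    fields = lineage.split(";")
    if len(fields) != len(RANK_PREFIXES):
        raise ValueError("expected exactly seven rank fields")
    ranks = {}
    for prefix, field in zip(RANK_PREFIXES, fields):
        if not field.startswith(prefix):
            raise ValueError(f"field {field!r} does not start with {prefix!r}")
        ranks[prefix[0]] = field[len(prefix):]
    # Only kingdom and species must be filled in; other ranks may be blank.
    for required in ("k", "s"):
        if not ranks[required]:
            raise ValueError(f"rank {required!r} must not be blank")
    return ranks


print(parse_lineage("k__Bacteria;p__;c__;o__;f__;g__;s__Escherichia coli"))
```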

To upload a custom database FASTA file, go to the "Projects" tab and click on the folded paper icon (representing File) to reveal a dropdown menu. Click on "Upload" and select "Files". Within the upload page, select "Other" format for FASTA files, and upload the file as a Biosample. Within the DRAGEN 16S Plus app, under "Custom database specification" use the "Custom reference for taxonomic classification" control to select the uploaded FASTA file.

Understanding the BaseSpace HTML reports

Summary results

Sample Information

The Sample Information table summarizes sample QC metrics for all samples in the analysis. Further details on each metric can be found by hovering over each column header.

Pipeline logic

Pipeline steps

Step: Read QC
Description: Low-quality bases are trimmed from the ends of each read. After trimming, the read is discarded if fewer than 50% of its bases have a quality score greater than or equal to Q20, the read is shorter than 32 bp, or the read has 5 or more ambiguous bases. For paired-end data, both read1 and read2 must pass QC filtering for the read pair to be used for analysis. It is assumed that appropriate adapter trimming has already been performed.
Notes: Optional
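
The three discard rules of the Read QC step can be written as a small filter. This sketch covers only the post-trimming checks; the trimming algorithm itself is not described here and is not implemented. The function names are hypothetical, and ambiguous bases are assumed to mean N.

```python
def passes_read_qc(bases, quals, min_q=20, min_fraction=0.5,
                   min_length=32, max_ambiguous=5):
    """Return True if an already-trimmed read survives the three discard rules."""
    if not bases or len(bases) < min_length:
        return False  # shorter than 32 bp
    # Assumption: "ambiguous" means N.
    if bases.upper().count("N") >= max_ambiguous:
        return False  # 5 or more ambiguous bases
    good = sum(1 for q in quals if q >= min_q)
    # Discard when fewer than 50% of the bases reach Q20.
    return good / len(quals) >= min_fraction


def pair_passes_read_qc(read1, read2):
    """For paired-end data, both mates must pass; each mate is (bases, quals)."""
    return passes_read_qc(*read1) and passes_read_qc(*read2)


good_read = ("A" * 40, [30] * 40)
short_read = ("A" * 31, [30] * 31)
print(pair_passes_read_qc(good_read, good_read))
print(pair_passes_read_qc(good_read, short_read))
```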

Test information

For Research Use Only. Not for use in diagnostic procedures.

Release notes

DRAGEN 16S Plus app version 1.0.0

Initial release.

Component versions

Reference database: (full-length NCBI RefSeq 16S rRNA database supplemented by RDP)

Frequently Asked Questions (FAQ)

Q: After processing my sample with an enrichment panel, the majority of my reads are removed in preprocessing and/or I only have a small amount of viral reads. Is the enrichment panel working?

A: The enrichment protocols can increase the abundance of the targeted viral species several thousand fold. However, it is important to keep in mind that in many sample types, especially clinical, wastewater, or environmental samples, viral RNA or DNA makes up a tiny proportion of the total nucleic acids present, with the remainder dominated by host (human) or bacterial and archaeal material. So even with a dramatic enrichment over what you would obtain without enrichment, the percentage of viral reads can still be low. For example, you may have a sample with only 2% viral reads, but without enrichment you might have obtained only 0.1% viral reads. If a virus is at low abundance after enrichment, it was likely at extremely low abundance prior to enrichment.

Q: The contig default min coverage is 10x, but is that across the entire contig, median, average.. or?

A: The 10x threshold is applied per-nucleotide. Any positions below 10x coverage will be hard-masked with "N".

Q: In the demo data, I downloaded the consensus sequence FASTA file and each sequence line would say “Panel reference sequences are not necessarily comprehensive and should not be used for strain typing.” Does that mean even though VSP provide full-genome resolution of all 66 viruses, the app can only strain-type the strains listed because of the reference sequences the app uses?

A: Correct, we only align to a limited number of reference sequences for each virus type, so the sequence accession in the consensus genomes (and coverage plots, etc) merely reflects the best match chosen from that subset. There could be sequences in RefSeq that are a closer match. Furthermore, strain typing is not necessarily as simple as choosing the closest matching genome; there are further complexities that can go into it, and we have not systematically developed or tested any strain typing capability to date. The noted message is to warn users that the sequence accession in the consensus genome does not necessarily reflect the true phylogeny of the organisms in the sample and should not be taken as such.

Q: Why am I seeing some segments from one Influenza A subtype and some from another subtype? Does my sample contain both subtypes?

A: For each de novo assembled contig, we aim to find the best matching reference sequence rather than an entire reference genome. If the best match for one contig is a reference sequence from one subtype and the best match for another contig is a reference sequence from another subtype, then we will report them as such. This is not necessarily indicative of a mixed infection, reassortment, or error. It is usually reflective of how similar certain segments can be across different subtypes.

Influenza A viruses are classified into different subtypes based on the hemagglutinin (HA) and neuraminidase (NA) proteins, which are encoded by segments 4 and 6, respectively. Therefore, we recommend focusing on those segments to infer the subtype. If there is a sequence generated from segment 8 of an H3N2 genome but all the rest of the consensus sequences are generated from reference sequences from an H1N1 genome (indicating H1 and N1 subtypes), then the sample likely contains H1N1, not H3N2. One possible explanation is that segment 8 from H1N1 and segment 8 from H3N2 were both good matches for a particular contig but the one from H3N2 was a slightly better match and therefore chosen as final reference. Similarly, if there is a sequence generated from segment 4 of an H1N1 genome (indicating H1 subtype) and a sequence generated from segment 6 of an H5N1 genome (indicating N1 subtype), then the sample likely contains H1N1, not H5N1.
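
The recommendation to read the subtype from segments 4 and 6 can be expressed as a short helper. This is only a sketch of the reasoning in the answer above. The function name is hypothetical, and the input is assumed to map each segment number to the subtype label (such as "H1N1") of the reference sequence chosen for it.

```python
import re


def infer_subtype(reference_subtype_by_segment):
    """Combine the H type from segment 4 (HA) with the N type from segment 6 (NA)."""
    ha_source = reference_subtype_by_segment.get(4)
    na_source = reference_subtype_by_segment.get(6)
    if ha_source is None or na_source is None:
        return None  # cannot infer without both HA and NA segments
    ha_match = re.fullmatch(r"(H\d+)(N\d+)", ha_source)
    na_match = re.fullmatch(r"(H\d+)(N\d+)", na_source)
    if ha_match is None or na_match is None:
        return None
    return ha_match.group(1) + na_match.group(2)


# Segment 8 matched an H3N2 reference, but HA and NA both point to H1N1.
print(infer_subtype({4: "H1N1", 6: "H1N1", 8: "H3N2"}))
# HA from an H1N1 reference and NA from an H5N1 reference still give H1N1.
print(infer_subtype({4: "H1N1", 6: "H5N1"}))
```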

Q: Why does the "Detected Amplicons" column show 7 in the denominator for an Influenza genome when there should be 8 amplicons in total?

A: The denominator in the "Detected Amplicons" columns is based on the reference sequences selected based on de novo assembled contigs. Depending on the quality of the sample and/or reads, the assembler may not have enough data to generate a contig for some segments. Shorter segments are more likely to be missed. If only 7 segments are selected as final reference for short read alignment, then we expect 7 amplicons in total. If you believe that the sample should contain all 8 segments, you can download the contig FASTA file from our report page and submit it to to see if all 8 segments are present in the contig sequences.

One known issue is that chimeric reads can be generated during library preparation, which can lead to chimeric contigs, where the contig sequence contains sequences from more than one segment. This can result in missing an entire segment in the reference selection stage. A workaround may be to filter out chimeric reads from your FASTQ files before running the app.

Alternatively, you can force the app to use all 8 segments of a particular Influenza genome by providing a custom reference FASTA file with all 8 segment sequences and a custom reference BED file with the genomeName column set to the same value (e.g. Influenza A) for every segment. This way, the app will not perform assembly and will use all 8 segments as the reference sequences for short read alignment.

Q: What is the difference between the "Detected Amplicons" and "% callable bases" columns?

A: The "Detected Amplicons" column shows the number of amplicons detected over the total number of amplicons expected for that genome. The percentage of amplicons detected is used to infer if the sample is of sufficient quality for variant calling. The "% callable bases" column shows the percentage of the selected reference genome whose bases are above the minimum read coverage depth for consensus sequence generation, which is computed independent of amplicon coordinates. Both metrics are useful to assess the quality of the sample, but the percentage of detected amplicons is used by the app after short read alignment to filter out low-titer samples and the percentage of callable bases is not.

Q: I don't see the virus I'm interested in listed in the "Genomes generated" column. Does that mean the virus is not present in my sample?

A: Not necessarily. Your virus of interest may be present in the sample, but the app may not have generated a consensus sequence for it for various reasons. One reason could be that there are too few reads coming from that virus. Tools like DRAGEN Metagenomics can be used to characterize what is in the sample more broadly. Another reason could be that the virus in your sample is too divergent from the reference sequences used in the app. In such cases, we recommend downloading the contig FASTA file from our report page and submitting it to . If you do see a sequence that matches your virus of interest, you can provide that sequence to the app as a custom reference genome.

Q: I am using the custom analysis workflow and my analysis aborted or shows an error, why?

A: An analysis can fail for several reasons, but one of the most common is a custom reference file that is not formatted correctly. The requirements for the Custom Reference FASTA For Consensus Generation are:

Do not use spaces in the file name; use an underscore "_" instead
Do not exceed 25 characters in the file name
File extension must be .fasta or .fa
Do not exceed the file size limitation: 16GB for a single file or 25GB for multiple files

Output files

Note: Some files may not be generated depending on user inputs and pipeline outcome

For each sample, the pipeline generates a directory named after the sample. This directory contains the following subdirectories:

consensus/
- {sample_name}_sample_consensus.fasta : Contains all hard-masked consensus sequences for the sample. Regions with coverage below minimum coverage depth for consensus sequence generation (10x by default) are considered not callable and are therefore “hard-masked” with letter N. Variant calling is not applied to these regions. If the user selected specific VSP or RVOP organisms to be reported, this file excludes consensus sequences that are generated but do not belong to the selected organisms.
- {sample_name}.consensus_hard_masked_sequence.fa : Identical to {sample_name}_sample_consensus.fasta, except for sequence headers. Moreover, even if the user selected specific VSP or RVOP organisms to be reported, this file contains all consensus sequences, including those that do not belong to the selected organisms
- {sample_name}.consensus_soft_masked_sequence.fa : Identical to {sample_name}.consensus_hard_masked_sequence.fa, except low-coverage regions are “soft-masked” with lower-case letters that match the reference. Variant calling is not performed in these regions.
- {sample_name}_{virus_name}_virus_consensus.fasta : Contains hard-masked consensus sequences generated for a particular virus. If the virus is not segmented (i.e. one reference sequence for the virus), this file contains a single sequence and is identical to {sample_name}_{virus_name}_{segment_name}_{sequence_accession}_consensus.fasta.
- {sample_name}_{virus_name}_{segment_name}_{sequence_accession}_consensus.fasta : Contains a hard-masked consensus sequence generated with a particular reference sequence.
- {sample_name}.consensus_from_vcf.log : Log file generated during consensus sequence generation.
contig/
- {sample_name}_sample_contig.fasta : Contains all de novo assembled contigs generated for the sample.
- {sample_name}_unmapped_contig.fasta : Contains de novo assembled contigs that could not be mapped to any reference sequence. Because de novo assembly is reference-free, these contigs may correspond to sequences that are too diverged from those in the reference database or sequences not included in the database.
map_align/
- {sample_name}_unfiltered_tumor.bam or {sample_name}_unfiltered_tumor_primertrim_with_unmapped_reads.bam : Short reads mapped to all selected reference sequences. If a primer set is available and properly mapped to the reference sequences, {sample_name}_unfiltered_tumor_primertrim_with_unmapped_reads.bam is provided as output. Its reads have primer sequences trimmed based on primer binding site coordinates.
variant_calling/
- {sample_name}.consensus_filtered_variants.vcf.gz : Contains variant calls that passed consensus filter and were therefore applied to consensus sequences. They are a subset of variants listed in {sample_name}.hard-filtered.vcf.gz.
- {sample_name}.consensus_filtered_variants_vcf_stats.txt : Summarizes all variant calls in {sample_name}.consensus_filtered_variants.vcf.gz. Generated by bcftools stats.
metrics/
- {sample_name}_num_reads.tsv : Reports number of input reads, reads filtered out at each pre-processing step, reads mapped to each selected reference sequence, etc.
- {sample_name}_metadata.json : Reports parameters, read counts, amplicon counts, analysis results, and other metadata.
amplicon/
- {sample_name}_processed_non_overlapping_amplicon.bed : Lists all non-overlapping amplicon regions (i.e. covered with exactly one amplicon). If a custom primer set is provided, this file also lists selected reference sequences lacking amplicons. While amplicons are defined based on primer binding sites, for viruses like Influenza, reference sequences often lack primer binding sites, which are located at sequence ends. This results in defining fewer or sometimes no amplicons for an entire viral genome. To avoid this, each reference sequence without any amplicons defined is considered an amplicon and is listed in this file. All regions in this file are used for amplicon detection to infer sample concentration and determine if it is sufficient to apply variant calling and consensus sequence generation.
tertiary/
- nextclade_{sample_name}_{sequence_accession}_{dataset_name}.csv : CSV output file generated by NextClade given a consensus sequence (generated with the specified sequence as reference) and a NextClade dataset.
- nextclade_{sample_name}_{sequence_accession}_{dataset_name}.tsv : Same as nextclade_{sample_name}_{sequence_accession}_{dataset_name}.csv except in TSV format.
reference/
- reference.bed : Describes all reference sequences. If a custom reference was provided, sequence names may appear different in this BED file.
- reference.json : Same as reference.bed but with more detail. If any of the sequences were renamed during the pipeline, this file provides the mapping between the original and renamed versions.
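The difference between the hard-masked and soft-masked consensus files listed under consensus/ can be illustrated with a minimal Python sketch. It is not the pipeline's implementation: the function name `mask_consensus` is hypothetical, and it assumes the consensus and reference are the same length (no insertions or deletions) with one depth value per position.

```python
def mask_consensus(consensus, reference, depth, min_depth=10):
    """Return (hard_masked, soft_masked) strings for one sequence.

    Assumption for illustration: consensus, reference, and depth are
    position-aligned and of equal length (no indels).
    """
    if not (len(consensus) == len(reference) == len(depth)):
        raise ValueError("consensus, reference, and depth must have equal length")
    hard, soft = [], []
    for cons_base, ref_base, d in zip(consensus, reference, depth):
        if d < min_depth:
            # Not callable: hard-mask with N, soft-mask with the
            # lower-case reference base.
            hard.append("N")
            soft.append(ref_base.lower())
        else:
            hard.append(cons_base.upper())
            soft.append(cons_base.upper())
    return "".join(hard), "".join(soft)


hard, soft = mask_consensus("ACGTTA", "ACGTAA", [12, 15, 3, 2, 30, 11])
print(hard)  # ACNNTA
print(soft)  # ACgtTA
```

In this toy input, positions 3 and 4 fall below the 10x default, so they become N in the hard-masked string and lower-case reference bases in the soft-masked string.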

Result Reports

This section describes the reports that can be viewed from the individual sample links on the left side of the reports tab or by clicking on sample names in the Metrics by sample table.

Version information

At the top of the report is version information for the App and any third-party components.

FASTA downloads

Two buttons provide the ability to download relevant FASTA-formatted text files for this sample. The "Consensus" button initiates a download of a FASTA file containing all consensus sequences generated for this sample. The "Contig" button initiates a download of a FASTA file containing all assembled contigs for this sample.

Metrics by virus

The metrics by virus table contains information about each viral genome generated. Each row summarizes all sequences assigned to that virus. In the case of multi-segment viruses like Influenza, a row will summarize information across all segment sequences generated for a single viral genome. It contains buttons to download the contents of the table as a CSV, JSON or PDF file. The table itself contains rows for every virus with at least one generated genome in the sample. It contains the following columns:

Virus: The name of the virus. For custom references, this will be the part of the FASTA header before the first whitespace character for the corresponding reference sequence if no custom genome definition file is provided. If a custom genome definition file is provided, this will be the value of the genomeName column that corresponds to the selected reference (matched by the value in the chrom column of the genome definition file and the part of the FASTA header before the first whitespace character).
Detected Amplicons (only if 'Experiment Type' is set to 'Amplicon'): The number of amplicons detected over the total number of amplicons expected for that sequence. The percentage of amplicons detected is used to determine if the sample is of sufficient quality for variant calling. See for more details.

Metrics by sequence

This table summarizes the results for each sequence generated for the sample. For multi-segment viruses such as Influenza, there may be multiple sequences detected for a given virus. For single-segment viruses there will typically be only one sequence per virus. It contains buttons to download the contents of the table as a CSV, JSON or PDF file. The table itself contains rows for every sequence. It contains the following columns:

Virus: The name of the virus. For custom references, this will be the part of the FASTA header before the first whitespace character for the corresponding reference sequence if no custom genome definition file is provided. If a custom genome definition file is provided, this will be the value of the genomeName column that corresponds to the selected reference (matched by the value in the accession column of the genome definition file and the part of the FASTA header before the first whitespace character).
Segment: The name of the genome segment to which the sequence belongs. For viruses with a single segment, the name of the segment will typically be "Full".
Accession: The accession number or other short unique identifier for the sequence. If using a custom genome definition BED, this value is taken from the first column (chrom

Pre-processing metrics

This stacked bar plot contains information about the outcome of the pre-processing steps (read QC, trimming, de-hosting) as well as the alignment step. It contains counts of reads that fall into the following categories:

Removed in QC: Reads that failed to meet the minimum quality thresholds and were excluded from further processing.
Removed in trimming: Reads that were removed in the initial sequence-based primer trimming step and were excluded from further processing.
Removed in de-hosting: Reads that were removed in the de-hosting step and excluded from further processing. De-hosting is the process of removing reads that may originate from the host organism. Currently only human hosts are supported. De-hosting reads improves the quality of downstream analysis and helps ensure that human sequences are not included in the output BAM files.

Alignment metrics

A column plot displaying the numbers and percentages of all reads that aligned to each reference sequence with at least one mapped read. The columns are labeled by both virus and segment name (if available) on the x-axis, and the y-axis is the read count for each sequence.

Coverage

Displays a trace of the read coverage over each reference sequence. The drop-down menu in the upper left allows the user to switch between viruses. If multiple segment sequences are generated for a single virus, their corresponding coverage plots will be displayed in a vertically stacked fashion. The black trace represents the read coverage, with the coverage depth in number of reads on the left y-axis and the position in the reference sequence on the x-axis.

The minimum read coverage depth for consensus sequence generation (default 10x) is plotted as a dashed orange line across the plot, to easily visualize locations where coverage drops below the threshold (which will be masked in the consensus sequence) and where the coverage is above the threshold (which will be reported in the consensus sequence).

The median coverage is plotted as a dashed teal line across the plot.
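As a rough illustration of what the two dashed lines in the plot represent, the sketch below finds the intervals where depth drops below the consensus threshold and computes the median depth. The helper name `low_coverage_intervals` and the per-position depth list are assumptions for the example, not part of the app.

```python
from statistics import median


def low_coverage_intervals(depth, min_depth=10):
    """Return 0-based, half-open (start, end) intervals with depth < min_depth."""
    intervals = []
    start = None
    for pos, d in enumerate(depth):
        if d < min_depth and start is None:
            start = pos  # a low-coverage run begins
        elif d >= min_depth and start is not None:
            intervals.append((start, pos))  # the run ended before pos
            start = None
    if start is not None:
        intervals.append((start, len(depth)))
    return intervals


depth = [0, 4, 25, 40, 38, 9, 8, 50]
print(low_coverage_intervals(depth))  # [(0, 2), (5, 7)]
print(median(depth))                  # 17.0
```

The reported intervals correspond to the regions that would be masked in the consensus sequence at the default 10x threshold.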

By default, sequence variants representing differences between the consensus sequence and the reference sequence are also plotted, with allele frequency on the right y-axis. The colors and symbols represent different sequence variant types. See the figure legend for details.

The "Show log-scale" toggle switch allows the user to switch between logarithmic and linear scales for the coverage (left) y-axis.

The "Show Median" toggle switch allows the user to turn the median coverage line on and off.

The "Show Sequence Variants" toggle switch allows the user to turn the plotting of sequence variants on and off.

Pangolin Report (optional)

This table contains the results of the Pangolin analysis performed on the generated consensus sequences. Pangolin is run if the "Enable Pangolin" box is checked on the input form and one of the following is true:

'Enrichment Panel' is set to a non-custom panel (e.g. VSP) and a consensus sequence was generated using SARS-CoV-2 as reference
'Amplicon Primer Set' is set to a non-custom set with SARS-CoV-2 as reference (e.g. SARS-CoV-2, ARTIC v5.3.2 primers) and a valid consensus sequence was generated
Either 'Enrichment Panel' or 'Amplicon Primer Set' is set to 'Custom'. In this case, Pangolin is applied to every consensus sequence generated for the sample since the software assumes all of them to be potentially SARS-CoV-2 sequences.
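The three conditions above can be summarized as a small decision function. This is only a reading of the documented behavior, with hypothetical parameter names; it does not reflect how the app is actually implemented.

```python
def should_run_pangolin(enable_pangolin, selection_is_custom,
                        sars_cov_2_consensus_generated,
                        any_consensus_generated):
    """Decide whether Pangolin would run for a sample.

    selection_is_custom: True if 'Enrichment Panel' or 'Amplicon Primer Set'
    is set to 'Custom' (hypothetical flag for this sketch).
    """
    if not enable_pangolin:
        return False
    if selection_is_custom:
        # Custom: every generated consensus is treated as potentially SARS-CoV-2.
        return any_consensus_generated
    # Non-custom panel or primer set: a SARS-CoV-2 consensus is required.
    return sars_cov_2_consensus_generated


print(should_run_pangolin(True, True, False, True))   # True
print(should_run_pangolin(True, False, False, True))  # False
```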

The Pangolin report contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.

The table in the Pangolin report is derived from the output of the Pangolin software. Please see the Pangolin documentation for more details. Sequences with a bad Pangolin QC status are highlighted in yellow.

NextClade Report (optional)

This table contains the results of the NextClade analysis performed on the generated consensus sequences. NextClade is run if the "Enable NextClade" box is checked on the input form and one of the following is true:

'Enrichment Panel' is set to a non-custom panel (e.g. VSP) and a consensus sequence was generated using a reference with NextClade dataset available.
'Amplicon Primer Set' is set to a non-custom set with a reference with NextClade dataset available (e.g. SARS-CoV-2, ARTIC v5.3.2 primers) and a valid consensus sequence was generated.
Either 'Enrichment Panel' or 'Amplicon Primer Set' is set to 'Custom' and one or more NextClade datasets are selected under 'Custom Reference'. In this case, each of the selected NextClade datasets is applied to each consensus sequence generated for the sample. This may result in multiple NextClade results for each consensus sequence, some of which may not be meaningful (e.g. "flu_h1n1pdm_ha" dataset applied to a NA segment of an Influenza genome).

The NextClade Report contains a button labeled "Group table by" with a drop-down menu allowing the user to group the results by various fields, including "None". The default is "Dataset" which means that all of the results for each NextClade dataset will be grouped together. For example, if a user is only interested in phylogenetic analysis performed on the HA segment of Influenza A H1N1, these results for each sample can be viewed together in the "flu_h1n1pdm_ha" collapsible group.

The NextClade report also contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.

The table in the NextClade report is derived from the output of the NextClade software. Please see the NextClade documentation for more details. Sequences with a bad NextClade QC status are highlighted in yellow.

Frequently Asked Questions (FAQ)

General

Q: Which Illumina Infectious Disease and Microbiology target-capture enrichment panel kits are compatible with the DRAGEN Microbial Enrichment Plus app?

A: RPIP, UPIP, RVOP/RVEK, VSP, VSP V2, and Custom infectious disease and microbiology enrichment panels. To analyze the Pan-Coronavirus (Pan-CoV) panel, a custom coronavirus reference sequence database may be specified. The DME+ app is not intended for use with non-infectious disease enrichment panels (such as human exome).

Q: Can I analyze the Pan-Coronavirus (Pan-CoV) panel here?

A: The only infectious disease and microbiology enrichment panel without a pre-set DME+ database is the Pan-CoV panel. To analyze Pan-CoV enriched data with the DME+ app, select "Custom Panel" under the "Enrichment Panel" drop-down list and specify a custom coronavirus reference sequence database. Alternatively, we recommend using the DRAGEN Targeted Microbial app.

Q: What does it cost to analyze samples using the DRAGEN Microbial Enrichment Plus app?

A: A Basic BaseSpace Sequence Hub (BSSH) user account is required to access the DME+ app. However, there is no subscription cost for a Basic BSSH account and no compute cost to run the DME+ app. A Basic BSSH account provides 1 TB of free storage. Additional storage may require iCredits.

Q: Where do I upload my custom reference FASTA and/or BED file?

A: Upload these files to a BSSH project before launching the DME+ app. It will then be possible to select these files in the "Select Dataset File(s)" browser in the app. Please see and reach out to [email protected] with any unresolved upload issues.

Panel Content & Design

Q: Is my viral subtype of interest captured by the VSP V2 panel?

A: See the "Virus Types Captured" column of the "Microorganisms" table in the .

Q: Was VSP V2 designed using contemporary viral genomes or against traditional reference strains only?

A: The VSP V2 viral genome sourcing approach aimed at being as inclusive and comprehensive as possible for the 200 targeted human viruses. All viral genomes passing quality filters available as of June 2023 were included in the design, including recombinant and vaccine strains.

Q: How much of the genome is targeted by the RPIP, UPIP, RVOP/RVEK, VSP, and VSP V2 panels?

A: The full viral genome is targeted for all RVOP/RVEK, VSP, and VSP V2 viruses. For RPIP viruses, see the "Percent Genome Targeted" column of the "Microorganisms" table in the . No more than ~1% of bacterial, fungal, and parasitic genomes are targeted by RPIP or UPIP.

Analysis Options & Settings

Q: I am using the "Custom panel specification" option and my custom analysis aborted or shows an error, why?

A: While there are many possible reasons, one of the most common causes is that the custom database was not formatted correctly. Below are requirements for the custom reference FASTA and (optional) BED file:

Do not exceed the file size limitation: 10 million bases
Do not include duplicate entries
Do not use spaces in the file name; instead use an underscore "_"
File extension must be .fasta or .fa for custom reference FASTA file and .bed for custom reference BED file

See for further details.
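Before uploading, the FASTA requirements listed above can be checked locally with a short script. The sketch below is an informal pre-flight check, not an Illumina tool. It assumes that "duplicate entries" means repeated sequence identifiers (the header text before the first whitespace) and that the extension check is case-sensitive; the function name is hypothetical.

```python
from pathlib import Path

MAX_TOTAL_BASES = 10_000_000  # documented limit: 10 million bases


def check_custom_reference(file_name, fasta_text):
    """Return a list of problems found in a custom reference FASTA."""
    problems = []
    if " " in file_name:
        problems.append("file name contains spaces")
    if Path(file_name).suffix not in {".fasta", ".fa"}:
        problems.append("extension must be .fasta or .fa")
    seen = set()
    total_bases = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if seq_id in seen:
                problems.append(f"duplicate entry: {seq_id}")
            seen.add(seq_id)
        else:
            total_bases += len(line)
    if total_bases > MAX_TOTAL_BASES:
        problems.append("more than 10 million bases")
    return problems


print(check_custom_reference("my_ref.fasta", ">a desc\nACGT\n>b\nAC\n"))  # []
```

An empty list means none of the checked requirements were violated; it does not guarantee that the analysis will succeed.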

Q: I am using the "User-defined specification" option. I am not seeing the microorganisms I expect to be there AND/OR I am seeing microorganisms that I do not want to see.

A: Ensure that the correct microorganism reporting file was uploaded and used. We recommend saving the updated microorganism reporting file with a new name. Rows with microorganism names that are not of interest can be deleted, but do not add any new columns or delete any columns from the provided template. Similarly, do not change or remove any text from the header row. Also, please note that the "kmer_read_count" metric is only valid with the UPIP panel. See for further details.

Q: What read QC (Quality Control) is performed by the DRAGEN Microbial Enrichment Plus app?

A: If enabled, low-quality bases are trimmed from the ends of each read. After trimming, the read is discarded if fewer than 50% of its bases have a quality score greater than or equal to Q20, the read is shorter than 32 bp, or the read has 5 or more ambiguous bases. It is assumed that appropriate adapter trimming has already been performed.
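The post-trimming discard rules can be written as a small predicate. This is a sketch of the stated thresholds only: it assumes ambiguous bases are represented as N, takes Phred scores as a list of integers, and does not model the end-trimming step itself.

```python
def passes_read_qc(sequence, qualities):
    """Return True if an already-trimmed read survives the documented QC rules."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have equal length")
    if len(sequence) < 32:
        return False  # shorter than 32 bp
    if sequence.upper().count("N") >= 5:
        return False  # 5 or more ambiguous bases (assumed to be N)
    high_quality = sum(1 for q in qualities if q >= 20)
    # Discard if fewer than 50% of bases have quality >= Q20.
    return 2 * high_quality >= len(sequence)


print(passes_read_qc("ACGT" * 10, [30] * 40))              # True
print(passes_read_qc("ACGT" * 10, [30] * 19 + [10] * 21))  # False
```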

Q: What does "Read classification sensitivity" mean in the settings for RVOP/RVEK, VSP, and VSP V2?

A: This setting is used as a pre-alignment filtering step for all viral whole-genome sequencing (WGS) panels. The default setting of 5 means that if fewer than 5 reads classify to the set of reference sequences belonging to a given virus, that virus will not be reported. On the other hand, if 5 or more reads classify to the set of reference sequences belonging to a given virus, read alignment will proceed and alignment-based thresholds will be used to determine whether that virus is reported. The read classification sensitivity can be set as low as 1 or as high as 1000. Lowering the read classification sensitivity threshold below 5 may significantly increase computational run time and is not recommended for most use cases.

Q: When is a Pangolin analysis run?

A: Pangolin is currently enabled for all enrichment panels except UPIP. For Custom Panel analyses, Pangolin is enabled and will run on custom reference sequences with at least 3% coverage that meet these naming conventions:

If only a FASTA file is provided, Pangolin will run on sequences that have a header containing either SARS-CoV-2 or NC_045512
If both a FASTA and BED file are provided, Pangolin will run on sequences where the first column (chrom) contains NC_045512 or the fourth column (genomeName) contains SARS-CoV-2
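To check ahead of time whether a custom reference would be picked up by Pangolin, the naming rules above can be expressed as follows. The function name and arguments are hypothetical, and the sketch assumes a case-sensitive substring match and a coverage value expressed as a percentage.

```python
def pangolin_applies(coverage_pct, fasta_header, bed_fields=None):
    """Apply the documented naming rules for Custom Panel analyses.

    bed_fields: the columns of the matching BED row, or None when only a
    FASTA file is provided (column 1 = chrom, column 4 = genomeName).
    """
    if coverage_pct < 3.0:
        return False  # requires at least 3% coverage
    if bed_fields is None:
        return "SARS-CoV-2" in fasta_header or "NC_045512" in fasta_header
    chrom, genome_name = bed_fields[0], bed_fields[3]
    return "NC_045512" in chrom or "SARS-CoV-2" in genome_name


print(pangolin_applies(45.0, "NC_045512.2 Wuhan-Hu-1"))                        # True
print(pangolin_applies(50.0, "seq1", ["seq1", "0", "29903", "SARS-CoV-2"]))  # True
```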

Q: When is a Nextclade analysis run?

A: When enabled, a Nextclade analysis using the specified dataset(s) is run for the following microorganisms, as applicable:

Microorganism | Nextclade Dataset | Type of Nextclade Dataset

Q: What Internal Control (IC) options are supported and what additional information does using an IC provide?

A: The RPIP, UPIP, and VSP V2 enrichment panels contain probes targeting commercially available Internal Controls. See the table below for Internal Control options compatible with RPIP, UPIP, and VSP V2. It is recommended to spike each sample prior to extraction with Enterobacteria phage T7 at 1.21 x 10^7 copies/mL of sample.

Internal Control | RPIP | UPIP | VSP V2 | Process control | Enrichment factor calculation | Microorganism absolute quantification* | Notes

*Quantitative Internal Control concentration must be provided

Q: What are the DRAGEN Microbial Enrichment Plus app settings related to consensus sequence generation and variant calling?

A: See the table below. Consensus sequence bases without aligned read support are indicated by "N" bases.

Setting | Value

Reporting

Q: I don't see the microbe I'm interested in listed in the reported microorganism summary. Does that mean my microbe of interest is not present?

A: Not necessarily. The microbe of interest may be present in the sample, but the DME+ app may not have reported it because the detection metrics fell below the default reporting thresholds. If it is suspected that this may be the case, select the "Report microorganisms and/or AMR markers that are below threshold" option. A user-defined microorganism reporting file can also be specified on a microorganism-by-microorganism basis using multiple parameters should more sensitive reporting be required for a given use case. See for further details.

Q: What is the default reporting threshold for a microorganism to be "predicted present" and make it into reports?

A: Multiple parameters are used to determine whether the sequencing data for a given microorganism is sufficient for a positive call. These may include the horizontal coverage, median read depth, normalized read count, average nucleotide identity, etc., of the microorganism and/or other genetically related microorganisms. The default reporting thresholds are different for different microorganisms, as microorganisms with close genetic neighbors generally require more stringent reporting thresholds than genetically distinct microorganisms. As with most tests and prediction algorithms, the default reporting thresholds are intended to balance the trade-off between analytical sensitivity and specificity. Should a given use case require more sensitive or specific reporting, a user-defined microorganism reporting file can be specified on a microorganism-by-microorganism basis using multiple parameters. See for further details. Additionally, the "Report microorganisms and/or AMR markers that are below threshold" option can be enabled.

Q: Are low coverage, median depth 0 microorganisms actually in the sample or are they artifacts?

A: Mathematically, any result with a horizontal coverage of <50% will have a median depth of 0 (50% or more of the nucleotide positions have a depth of 0). Low coverage results could represent true low positives (the most likely reason) or non-specific results, contamination, etc. If maximum confidence is required for a given use case, stricter microorganism reporting thresholds can be specified on a microorganism-by-microorganism basis using multiple parameters. See for further details.
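The relationship between horizontal coverage and median depth is easy to verify on a toy depth vector. The sketch below assumes horizontal coverage is the fraction of positions with a depth of at least 1; the values are made up for illustration.

```python
from statistics import median


def horizontal_coverage(depth):
    """Fraction of positions covered by at least one read (assumed definition)."""
    return sum(1 for d in depth if d > 0) / len(depth)


# 10 positions, only 4 of which are covered (40% horizontal coverage).
depth = [0] * 6 + [12, 30, 8, 5]
print(horizontal_coverage(depth))  # 0.4
print(median(depth))               # 0.0
```

Because more than half of the positions have zero depth, the middle of the sorted depth values is 0 regardless of how deep the covered positions are.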

Q: What is tiered reporting logic, which viruses are reported as part of a tiered reporting group, and why should I care?

A: See the "Has Tiered Reporting" and "Reporting Tier" columns of the "Microorganisms" table in the for RPIP, RVOP/RVEK, VSP, and VSP V2 to see which viruses are reported as part of a tiered reporting group. Membership in a tiered reporting group means that a hierarchical relationship is pre-built into the database and the most granular tier level passing reporting thresholds is reported. For example, if Influenza B virus (B/Victoria/2/87-like) or Influenza B virus (B/Yamagata/16/88-like) are reported in a sample then the less granular Influenza B virus reporting name will NOT be reported. Tiered reporting group membership is especially relevant when specifying a user-defined microorganism reporting file, as including the entire tiered reporting group is necessary to preserve tiered reporting logic.
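The tiered reporting behavior can be pictured with a two-level sketch. This is a simplified illustration under stated assumptions: the function name is hypothetical, only two tiers are modeled, and the set of names passing thresholds is taken as given rather than computed.

```python
def apply_tiered_reporting(parent, children, passing):
    """Report the most granular tier that passes reporting thresholds.

    parent: less granular reporting name
    children: more granular reporting names in the same tiered group
    passing: set of reporting names that passed their thresholds
    """
    granular = [name for name in children if name in passing]
    if granular:
        return granular  # parent is suppressed when a child is reported
    return [parent] if parent in passing else []


group_children = [
    "Influenza B virus (B/Victoria/2/87-like)",
    "Influenza B virus (B/Yamagata/16/88-like)",
]
passing = {"Influenza B virus", "Influenza B virus (B/Victoria/2/87-like)"}
print(apply_tiered_reporting("Influenza B virus", group_children, passing))
# ['Influenza B virus (B/Victoria/2/87-like)']
```

If a user-defined reporting file dropped the granular rows, the function would never see them in `children`, which is why keeping the entire group together matters.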

Q: How can I evaluate DRAGEN Microbial Enrichment Plus microorganism absolute quantification results?

A: To evaluate microorganism absolute quantification results, it is recommended to perform experiments using the relevant sample type and full sequencing workflow (including extraction) and to compare results obtained from the DME+ app with those from droplet digital PCR (ddPCR) and/or quantitative PCR (qPCR) assays. A per-microorganism absolute quantification correction factor can be applied to DME+ results as needed.

Q: I noticed some antimicrobials listed that do not usually get used in clinical environments - is this expected?

A: Yes. Not all antimicrobials and drug classes that are listed may be relevant. Detected AMR markers may also confer resistance to antimicrobials and drug classes that are not listed. Linkage between bacterial AMR marker, antimicrobial, and drug class is based on the Comprehensive Antibiotic Resistance Database (CARD, version 3.2.8) from McMaster University, ResFinder (version 2.2.1), NCBI Reference Gene Catalog (version 2023-09-26.1), EUCAST expert rules on indicator agents (2019-2023), and CLSI Performance Standards for Antimicrobial Susceptibility Testing (M100 34th Edition). Linkage between viral AMR marker, antimicrobial, and drug class is based on the publications provided in the JSON report - see the PubMed IDs (pmids) field.

Q: Some of the reported bacterial AMR markers in my sample have an “ESBL” flag, a “Carbapenemase” flag, or both. How are these flags determined?

A: Extended-spectrum beta-lactamase (ESBL) and Carbapenemase flags are assigned based on the antimicrobials and drug classes associated with each bacterial AMR marker. An ESBL flag is reported if a 3rd, 4th, or 5th generation cephalosporin OR a beta-lactam + beta-lactamase inhibitor combination is contained in the list of associated antimicrobials or drug classes. A carbapenemase flag is reported if a carbapenem is contained in the list of associated antimicrobials or drug classes. The logic for each of these flags is decoupled, such that a marker can be reported with both flags if the associated antimicrobial or drug class metadata indicates both ESBL and carbapenemase activity.
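The decoupled flag logic can be sketched as two independent membership tests. The drug lists below are small illustrative examples chosen for this sketch; they are not the app's actual lookup tables, and the exact strings used in the app's metadata are unknown.

```python
# Illustrative examples only, not the lists used by the app.
ESBL_TERMS = {
    "ceftriaxone", "cefotaxime", "ceftazidime",        # 3rd generation cephalosporins
    "cefepime",                                        # 4th generation
    "ceftaroline",                                     # 5th generation
    "amoxicillin-clavulanic acid", "piperacillin-tazobactam",  # inhibitor combinations
}
CARBAPENEM_TERMS = {"carbapenem", "imipenem", "meropenem", "ertapenem", "doripenem"}


def amr_flags(associated):
    """Return ESBL and carbapenemase flags for one marker's associated drugs."""
    names = {name.lower() for name in associated}
    return {
        "esbl": bool(names & ESBL_TERMS),
        "carbapenemase": bool(names & CARBAPENEM_TERMS),
    }


print(amr_flags(["Meropenem", "Ceftazidime"]))
# {'esbl': True, 'carbapenemase': True}
```

Because each flag is evaluated on its own, a marker associated with both a cephalosporin and a carbapenem receives both flags.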

Results & Output Files

Q: Most of my reads are untargeted reads. Is enrichment working?

A: For complex samples, or samples in which most of the nucleic acid is host or otherwise untargeted, enrichment is expected to yield 100-1000X more targeted reads (and correspondingly higher sensitivity) than a shotgun (pre-enrichment) library. Even so, targeted reads will typically still represent only a minority of the overall sequencing reads. Notably, RPIP, UPIP, and VSP V2 support various Internal Control options that can be spiked into samples prior to extraction to enable automated calculation of an enrichment factor sample QC metric.

Q: Is any typing information included for my virus of interest?

A: See the "Has Tiered Reporting" and "Lineage/Clade Prediction" columns of the "Microorganisms" table in the for RPIP, RVOP/RVEK, VSP, and VSP V2. Consensus sequence and best match reference accession are also provided for RPIP, RVOP/RVEK, VSP, and VSP V2 viruses. Subtype information may be possible to infer from the consensus sequence (e.g. by BLAST) or from the best match reference accession (if annotated in NCBI). The consensus sequence can also be used as input to downstream viral typing tools.

Q: The % Targeted Microbial Reads is not exactly equal to the sum of microorganism Aligned Read Count values, why?

A: The % Targeted Microbial Reads is calculated using a kmer-based classification approach that is intended to give a quick, high-level overview of sample composition. The Aligned Read Count values for microorganisms are calculated in a separate pipeline step using microorganism-specific reference sequence alignment as opposed to broad, categorical, kmer-based classification. Reads that were unclassified or that were classified as low-complexity or ambiguous may actually align to reference sequences. It is also possible for a read to align to a reference sequence of more than one microorganism, for example in a conserved region.

Q: How can I verify or compare results of the DRAGEN Microbial Enrichment Plus app to previously used apps (such as DRAGEN Targeted Microbial)?

A: FASTQ files previously run through other apps can be re-analyzed using the DME+ app. Results from other apps may not be identical to results from the DME+ app, most notably because of the expanded databases used in DME+.

Q: The Reference Coverage section of the HTML report only shows coverage plots for viral genomes. Why doesn't it show the plots for bacterial genomes and/or for viral targeted regions?

A: Viral genomes are orders of magnitude smaller and thus computationally much "cheaper" to align to than bacterial, fungal, and parasitic genomes. In the case of RVOP/RVEK, VSP, and VSP V2, the full viral genome is targeted for all viruses. For RPIP viruses, see the "Percent Genome Targeted" column of the "Microorganisms" table in the . While not visualized in the HTML report at this time, the DME+ app report JSON does contain coverage depth vector information for all microorganism targeted regions (viruses, bacteria, fungi, and parasites). See: .targetReport.microorganisms[].condensedDepthVector[], which is the read depth across the targeted microorganism reference sequences, condensed (if needed) into 256 bins.
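For microorganisms without a coverage plot, the depth vectors can be pulled out of the report JSON and plotted or summarized with any tool. The sketch below follows the .targetReport.microorganisms[].condensedDepthVector[] path quoted above; the helper name and the tiny inline JSON are made up for the example, and no other field names are assumed.

```python
import json


def depth_vectors(report):
    """Return one condensed depth vector per microorganism entry.

    Entries without a condensedDepthVector field yield an empty list.
    """
    microorganisms = report.get("targetReport", {}).get("microorganisms", [])
    return [m.get("condensedDepthVector", []) for m in microorganisms]


# Toy stand-in for a real report.json; in practice use json.load() on the file.
report = json.loads(
    '{"targetReport": {"microorganisms": ['
    '{"condensedDepthVector": [0, 3, 7]}, {}]}}'
)
for index, vector in enumerate(depth_vectors(report)):
    covered = sum(1 for d in vector if d > 0)
    print(index, len(vector), covered)
# 0 3 2
# 1 0 0
```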

Report JSON format

The DRAGEN Microbial Enrichment Plus app outputs a comprehensive sample-level report.json file containing general metadata, version information, sample QC, microorganism, and AMR marker results, as well as detailed test information. The additional convenience file formats generated by the DRAGEN Microbial Enrichment Plus app do not contain novel content.

(*) indicates results generated by the application layer as opposed to the DRAGEN secondary analysis pipeline

Top-Level Node

The top-level section of the report JSON contains general metadata and version information.

Field | Description

.accession | Identifier used for the sample

.qcReport.sampleQc Node

This section contains information about sample quality control (QC). The fields are relative to .qcReport.sampleQc

Field | Description

.qcReport.enrichmentFactor Node

This section contains information about the enrichment factor calculation and is relevant to RPIP, UPIP, and VSP V2 only. Detection of an appropriate Internal Control is required. The fields are relative to .qcReport.enrichmentFactor

Field | Description

.qcReport.sampleComposition Node

This section contains information about the composition of the sample and is provided for RPIP, UPIP, and VSP V2 only. The fields are relative to .qcReport.sampleComposition

Field | Description

.qcReport.internalControls Node

This section contains information about internal control detection and is relevant to RPIP, UPIP, and VSP V2 only. The value of the .qcReport.internalControls field is an array of objects containing name and RPKM information for each Internal Control. See the code block below for an example:

.userOptions Node

This section gives information about analysis options specified by the user. The fields are relative to .userOptions

Field | Description

.targetReport.microorganisms[] Node

The value of the .targetReport.microorganisms[] field is an array of objects containing information about detected microorganisms. The following table describes one .targetReport.microorganisms[] object. The fields are relative to .targetReport.microorganisms[]

Field | Description

.targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] Node

The value of the .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] field is an array of objects containing information about genetically related microorganisms. The following table describes one .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] object. The fields are relative to .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[]

Field | Description

.targetReport.microorganisms[].variants[] Node

The value of the .targetReport.microorganisms[].variants[] field is an array of objects containing information about viral variants. Variants are reported for all RVOP/RVEK, VSP, and VSP V2 viruses; for RPIP, they are reported for SARS-CoV-2 and Influenza A/B/C only. The following table describes one .targetReport.microorganisms[].variants[] object. The fields are relative to .targetReport.microorganisms[].variants[]

Field | Description

.targetReport.microorganisms[].pangoLineage[] Node

The value of the .targetReport.microorganisms[].pangoLineage[] field is an array of objects containing information about SARS-CoV-2 Pango lineage prediction results. The following table describes one .targetReport.microorganisms[].pangoLineage[] object. The fields are relative to .targetReport.microorganisms[].pangoLineage[].

.targetReport.microorganisms[].nextclade[] Node

The value of the .targetReport.microorganisms[].nextclade[] field is an array of objects containing information about viral clade assignment results for applicable viruses. The following table describes one .targetReport.microorganisms[].nextclade[] object. The fields are relative to .targetReport.microorganisms[].nextclade[].

.targetReport.amrMarkers[] Node

The value of the .targetReport.amrMarkers[] field is an array of objects containing information about detected bacterial AMR markers. The following table describes one .targetReport.amrMarkers[] object. The fields are relative to .targetReport.amrMarkers[]

Field | Description

.targetReport.amrMarkers[].variants[] Node

The value of the .targetReport.amrMarkers[].variants[] field is an array of objects containing information about variants for bacterial AMR markers with "protein variant" or "rRNA variant" model types. The following table describes one .targetReport.amrMarkers[].variants[] object. The fields are relative to .targetReport.amrMarkers[].variants[].

Field

Description

.targetReport.customReferences[] Node

This section contains information about custom reference detection results and is only present for custom database analyses. When only a custom reference FASTA file is provided (no BED file), each .targetReport.customReferences[] object contains information for a single reference sequence. When both a FASTA file and a BED file are provided, each .targetReport.customReferences[] object contains information for a single genome/microorganism, which can be a collection of one or more reference sequences. The fields are relative to .targetReport.customReferences[].

Field

Description

.targetReport.customReferences[].consensusSequences[] Node

The value of the .targetReport.customReferences[].consensusSequences[] field is an array of objects, each containing majority consensus sequence information for a single custom reference sequence. When only a FASTA file is provided (no BED file), there will be only one object in the array. When both a FASTA file and a BED file are provided, there may be more than one object in the array. The fields are relative to .targetReport.customReferences[].consensusSequences[].

Field

Description
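The relationship between the input files and the size of consensusSequences[] can be checked programmatically. The following is a minimal sketch, assuming the report has been loaded into a Python dictionary. The function name and the bed_provided parameter are hypothetical; the caller is expected to know whether a BED file was supplied for the analysis.

```python
from typing import Any, Dict, List


def references_with_unexpected_consensus_count(
    report: Dict[str, Any], bed_provided: bool
) -> List[int]:
    """Return indices of customReferences[] objects whose consensusSequences[]
    length does not match what this reference describes.

    FASTA only (bed_provided=False): exactly one object is expected.
    FASTA and BED (bed_provided=True): one or more objects may be present,
    so no count is treated as unexpected.
    """
    if bed_provided:
        return []
    references = report.get("targetReport", {}).get("customReferences") or []
    return [
        index
        for index, reference in enumerate(references)
        if len(reference.get("consensusSequences") or []) != 1
    ]
```

An empty result means every custom reference matches the documented shape for the given inputs. A non-empty result lists the positions in .targetReport.customReferences[] that merit a closer look.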

.targetReport.customReferences[].variants[] Node

The value of the .targetReport.customReferences[].variants[] field is an array of objects, each containing information about a single detected variant. The fields are relative to .targetReport.customReferences[].variants[].

Field

Description

.targetReport.customReferences[].pangoLineage[] Node

The value of the .targetReport.customReferences[].pangoLineage[] field is an array of objects containing information about SARS-CoV-2 Pango lineage prediction results. The following table describes one .targetReport.customReferences[].pangoLineage[] object. The fields are relative to .targetReport.customReferences[].pangoLineage[].

.additionalInformation[] Node

The value of the .additionalInformation[] field is an array of objects containing additional information about the test and data analysis solution. The fields are relative to .additionalInformation[].

Field

Description