To set up and run an analysis

Launch the DRAGEN Microbial Enrichment Plus BaseSpace application. The application is found in the "DRAGEN" dropdown and the "Infectious Disease and Microbiology" dropdown in BaseSpace.
The DRAGEN Microbial Enrichment Plus app supports only Biosample inputs.
After choosing a name for the Analysis, choose either “Biosample” or “Project” as input type. Selecting “Project” will result in all biosamples in the selected project being analyzed.
Select the enrichment panel library from the dropdown. Only one panel can be selected per analysis. There is no need to open the "Custom panel specification" tab if selecting one of the pre-set panels. Note that the user must first select the Enrichment Panel for the appropriate downstream analysis options to populate.
Under "Enrichment Panel Microorganism Reporting List" the default option is "All microorganisms". Optional: the microorganism reporting list can be subset from the full set of organisms targeted by the selected panel if desired. If the user is analyzing with the RPIP or UPIP panel, they can select "Pre-defined specification" or "User-defined specification". For VSP, VSP V2 or RVOP/RVEK, there is no "Pre-defined specification" option and only "User-defined specification" is available. If selecting "User-defined specification", the following steps must be taken. We recommend pre-loading your "User-defined specification" file before starting the analysis.
- Download the template with all the microorganisms as a starting point.
- Remove any rows of organisms that are not wanted in the analysis/report
- DO NOT remove any columns or alter their names
- DO NOT delete the header row or alter the names of the header row
- Optional but recommended: Rename the file to indicate that it has been altered. Note that BaseSpace treats the text before a period as the file name.
- Upload the file to a BaseSpace Project
- Once the data file is uploaded, select the "Dataset File(s)" under the "User-defined specification" tab. Note: once uploaded to a project, the same "User-defined specification" sheet can be reused across different analyses.
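The row-removal steps above can also be scripted. The following is a minimal sketch, not part of the application: it assumes the downloaded template is comma-delimited and that the first column holds the reporting name, and the function name and sample rows are hypothetical placeholders. The header row and every kept row are copied through verbatim, so no column is removed or renamed.

```python
import csv


def subset_reporting_list(template_text, keep_names, delimiter=","):
    """Keep the header row plus the rows whose first field is in keep_names."""
    lines = template_text.splitlines(keepends=True)
    if not lines:
        return ""
    kept = [lines[0]]  # header row is copied through unchanged
    for line in lines[1:]:
        # Parse only to read the first field; the original line is what gets kept.
        fields = next(csv.reader([line], delimiter=delimiter), [])
        if fields and fields[0] in keep_names:
            kept.append(line)
    return "".join(kept)


# Placeholder content (assumption): real templates contain more columns and rows.
template = (
    "Reporting Name,Prediction Logic,Coverage\n"
    "Organism A,placeholder,0\n"
    "Organism B,placeholder,0\n"
)
print(subset_reporting_list(template, {"Organism A"}), end="")
```

If the template turns out to be tab-delimited, pass delimiter="\t".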
Analysis Options:

Perform read QC (Quality Control)
- If checked, reads are pre-processed using quality metrics before analysis.
- If unchecked, read quality metrics are calculated, but reads are not trimmed or filtered before analysis.
Report bacterial AMR markers only
- If checked, only bacterial AMR markers but no microorganisms are reported
- This option is disabled if RVOP, VSP, VSP V2 or Custom Panel is selected
- This option is disabled if the 'Report bacterial AMR markers only when an associated microorganism is reported' option is enabled
Report bacterial AMR markers only when an associated microorganism is reported
- If checked, detected bacterial AMR markers are reported if the bacterial AMR marker passes a minimum reporting threshold and one or more associated microorganisms are also detected and reported
- If unchecked, detected bacterial AMR markers are reported if the bacterial AMR marker passes a minimum reporting threshold
- This option is disabled if RVOP, VSP, VSP V2 or Custom Panel is selected
Report microorganisms and/or AMR markers that are below threshold
- If checked, microorganisms and/or AMR markers below reporting thresholds are included in reports
- If unchecked, only microorganisms and/or AMR markers above reporting thresholds are included in reports
- This option is disabled if Custom Panel is selected
- This option is disabled if the 'Report bacterial AMR markers only when an associated microorganism is reported' option is enabled
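The enable/disable rules listed above can be summarized as a small lookup. This is only an illustrative sketch of the rules as stated on this page; the option keys (amr_only, amr_only_with_organism, report_below_threshold) and the function name are hypothetical and do not correspond to identifiers in the app.

```python
# Panels for which both AMR reporting options are disabled, as listed above.
AMR_OPTIONS_DISABLED_FOR = {"RVOP", "VSP", "VSP V2", "Custom Panel"}


def disabled_options(panel, amr_only_with_organism=False):
    """Return the set of analysis options that are disabled for this selection."""
    disabled = set()
    if panel in AMR_OPTIONS_DISABLED_FOR:
        disabled.update({"amr_only", "amr_only_with_organism"})
    if panel == "Custom Panel":
        disabled.add("report_below_threshold")
    # The "only when an associated microorganism is reported" option can only
    # take effect if it is itself available for the selected panel.
    if amr_only_with_organism and "amr_only_with_organism" not in disabled:
        disabled.update({"amr_only", "report_below_threshold"})
    return disabled


print(sorted(disabled_options("RPIP", amr_only_with_organism=True)))
```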

The 'Read classification sensitivity' default value is 5. This is a pre-filtering step only valid for VSP, VSP V2 or RVOP/RVEK. Decreasing the value below the default may considerably slow the run time.
Pangolin is by default Enabled, except if analyzing the UPIP panel. Note that Pangolin only runs if SARS-CoV-2 is detected.
Nextclade is by default Disabled. This can be Enabled. Note that Nextclade only runs if the viruses listed below are detected:

Human immunodeficiency virus 1 (HIV-1)
Human respiratory syncytial virus A (HRSV-A)
Human respiratory syncytial virus B (HRSV-B)
Influenza A virus (H1N1pdm09)
Influenza A virus (H3N2)
Influenza A virus (H5N1)
Influenza A virus (H5N6)
Influenza A virus (H5N8)
Influenza B virus (B/Victoria/2/87-like)
Influenza B virus (B/Yamagata/16/88-like)
Monkeypox virus (MPV)
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

Select an Internal Control (IC). The default is none. Note that RVOP/RVEK and VSP do not have internal controls in their panel. VSP V2 can be analyzed with the following ICs: Enterobacteria Phage T7, Escherichia virus T4, Escherichia Virus MS2, or Armored RNA Quant Internal Process Control. The IC must be added to the workflow before library preparation; otherwise, select "None" for IC. The Internal Control concentration needs to be normalized and in the correct format.
Select the Project where the Analysis Output should be saved.

Running analysis on Custom enrichment Panels or Custom reference databases

Users may define their own reference database through the Custom Panel analysis option. This option can be used to analyze samples enriched with standard Illumina Infectious Disease and Microbiology panels (RPIP, UPIP, VSP, VSP V2, RVOP/RVEK) if the user wishes to use a specific set of references rather than the built-in databases for each of these panels. This option also enables users to analyze samples enriched with other targeted enrichment kits through the DRAGEN Microbial Enrichment Plus app.

Launch the DRAGEN Microbial Enrichment Plus BaseSpace application. The application is found in the "DRAGEN" dropdown and the "Infectious Disease and Microbiology" dropdown in BaseSpace.
The DRAGEN Microbial Enrichment Plus app supports only Biosample inputs.
After choosing a name for the Analysis, choose either “Biosample” or “Project” as input type. Selecting “Project” will result in all biosamples in the selected project being analyzed.
Select the "Custom Panel" option in the Enrichment Panel dropdown.
A custom reference FASTA must be uploaded to BaseSpace before launching the analysis. See https://help.basespace.illumina.com/manage-data/import-data for more information about importing files into BaseSpace. Information on formatting the FASTA file is provided here.
A BED file is optional. If provided, the BED file must also be uploaded to BaseSpace before launching the analysis. Information on formatting the BED file is provided here.
The only Analysis Option for a custom panel is the ability to turn read QC on or off.

Perform read QC (Quality Control)
- If checked, reads are pre-processed using quality metrics before analysis.
- If unchecked, read quality metrics are calculated, but reads are not trimmed or filtered before analysis.

Pangolin will run on custom reference sequences with at least 3% coverage that meet these naming conventions:

If only a FASTA file is provided, Pangolin will run on sequences that have a header containing either SARS-CoV-2 or NC_045512
If both a FASTA and BED file are provided, Pangolin will run on sequences where the first column (chrom) contains NC_045512 or the fourth column (genomeName) contains SARS-CoV-2
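The two naming conventions above can be checked before upload with a short script. This is a sketch under stated assumptions: the matching is treated as a case-sensitive substring test, which this page does not confirm, and the function names are hypothetical. The 3% coverage requirement depends on the sequencing data and is not checked here.

```python
def pangolin_eligible_fasta(header):
    """FASTA-only case: the header must contain SARS-CoV-2 or NC_045512."""
    return "SARS-CoV-2" in header or "NC_045512" in header


def pangolin_eligible_bed(bed_line):
    """FASTA + BED case: column 1 (chrom) must contain NC_045512, or
    column 4 (genomeName) must contain SARS-CoV-2."""
    cols = bed_line.rstrip("\n").split("\t")
    if len(cols) < 4:
        return False
    return "NC_045512" in cols[0] or "SARS-CoV-2" in cols[3]


print(pangolin_eligible_fasta(">NC_045512.2"))
print(pangolin_eligible_bed("NC_012532.1\t0\t10794\tZika\tFull"))
```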

Nextclade DOES NOT work for custom panel analysis - if this is "Enabled" it will not run.
For Custom Analysis, the only option for "Internal Control" is "NONE".
Select the Project where the Analysis Output should be saved.

Custom references and BED files

Custom reference FASTA file:

A custom reference FASTA file is required to run the custom panel analysis. Sequence names in the custom reference FASTA file must be unique and should not contain any spaces. If there is any space in the FASTA header, the part before the first space is assumed to be the sequence name. It is recommended to use only the following characters in sequence names: letters, numbers, underscore (_), hyphen (-), parentheses (( and )), and period (.). Otherwise, the sequence names may appear different in the output. An example custom reference FASTA file is provided in the link below.
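The naming rules above lend themselves to a quick pre-upload check. The following is a minimal sketch, not a tool shipped with the app; the function name and the issue labels are hypothetical, and "letters" is assumed to mean ASCII letters.

```python
import re

# Recommended characters: letters, numbers, underscore, hyphen, parentheses, period.
RECOMMENDED_NAME = re.compile(r"[A-Za-z0-9_().-]+")


def check_fasta_names(fasta_text):
    """Return (sequence name, issue) pairs for headers that break the rules."""
    problems = []
    seen = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:]
        name = header.split(" ", 1)[0]  # text before the first space
        if " " in header:
            problems.append((name, "space_in_header"))
        if not RECOMMENDED_NAME.fullmatch(name):
            problems.append((name, "unrecommended_characters"))
        if name in seen:
            problems.append((name, "duplicate"))
        seen.add(name)
    return problems


example = ">seq_1\nACGT\n>seq_1\nACGT\n>bad|name extra text\nAC\n"
for name, issue in check_fasta_names(example):
    print(name, issue)
```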

The user may provide one or more reference genomes as the target for read alignment (and as the basis for generating consensus sequences). At a minimum, the user must provide a FASTA file containing the sequences of the reference genomes. To upload the reference FASTA file, go to the "Projects" tab and click on the folded paper icon (representing File), which will reveal a dropdown menu. Click on "Upload" and select "Files". Within the upload page, select "Other" format for FASTA files, and upload the file as a BioSample. Within the DRAGEN Microbial Enrichment Plus App, under 'Custom panel specification' use the 'Custom reference FASTA for consensus generation' control to select the uploaded FASTA file containing the reference sequences. The software will generate the required DRAGEN hash tables and other auxiliary files automatically, so there is no need to process the FASTA file with a separate app.

Custom reference BED file (optional):

Optionally, a genome definition BED file may also be provided. The BED file gives the software more information about each sequence in the FASTA file, such as a human-readable common name to be used in the reports. For multi-segment genomes such as Influenza, the genome definition BED file provides the segment name of each sequence and indicates that all the segments of a single genome belong together. To upload the BED file, go to the "Projects" tab and click on the folded paper icon (representing File), which will reveal a dropdown menu. Click on "Upload" and select "Files". Within the upload page, select "Other" format for BED files, and upload the file as a BioSample. Within the DRAGEN Microbial Enrichment Plus App, under 'Custom panel specification' use the 'Custom reference BED' dropdown to select the uploaded BED file containing the genome definition. See the following page for a description of the format of the genome definition BED file:

The file must be tab-delimited with at least 4 columns:

chrom: the sequence name as it appears in the FASTA
chromStart: start position (always set to 0)
chromEnd: end position (sequence length)
genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)
segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome

Sequence names must match between the FASTA file and BED file (as included in the "chrom" column), and the same set of sequences must appear in both files. If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.
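The column rules and the FASTA/BED matching requirement above can be verified together before upload. This is a sketch under stated assumptions: both files are read as plain text, the helper names are hypothetical, and the messages are illustrative rather than anything the app produces.

```python
def fasta_lengths(fasta_text):
    """Map each sequence name (text before the first space) to its length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].split(" ", 1)[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


def check_bed(bed_text, lengths):
    """Return a list of human-readable problems found in the BED text."""
    problems = []
    seen = set()
    for number, line in enumerate(bed_text.splitlines(), start=1):
        if not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            problems.append(f"line {number}: fewer than 4 tab-delimited columns")
            continue
        chrom, start, end = cols[0], cols[1], cols[2]
        seen.add(chrom)
        if chrom not in lengths:
            problems.append(f"line {number}: {chrom} is not in the FASTA")
            continue
        if start != "0":
            problems.append(f"line {number}: chromStart must be 0")
        if end != str(lengths[chrom]):
            problems.append(f"line {number}: chromEnd must equal the sequence length")
    for name in lengths:
        if name not in seen:
            problems.append(f"{name} is in the FASTA but not in the BED")
    return problems


lengths = fasta_lengths(">segA\nACGT\nAC\n>segB\nACG\n")
print(check_bed("segA\t0\t6\tVirus X\tSegment 1\nsegB\t0\t3\tVirus X\tSegment 2\n", lengths))
```

The check that different viruses carry different names in the fourth column is left to the user, since it requires knowing which sequences belong to which virus.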

The BED file controls how sequences are labeled in the output JSON. If the custom reference FASTA file includes sequences from multiple segments, it is recommended to provide a BED file so that the segments are included under the results of that microorganism. Otherwise, each segment will be treated independently and not all of them may be used as reference.

Example genome definition BED file

NC_012532.1	0	10794	Zika	Full
KJ609203.1	0	2292	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 1 (PB2)
KJ609204.1	0	2304	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 2 (PB1)
KJ609205.1	0	2168	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 3 (PA+PA-X)
KJ609206.1	0	1727	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 4 (HA)
KJ609207.1	0	1530	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 5 (NP)
KJ609208.1	0	1441	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 6 (NA)
KJ609209.1	0	1001	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 7 (M1+M2)
KJ609210.1	0	866	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 8 (NS1+NEP)
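To see how the fourth and fifth columns tie segments to one genome, the example can be read into a mapping from genome name to its sequences. This is only an illustration of the file layout, using three rows from the example above; it is not how the app itself processes the file, and the function name is hypothetical.

```python
from collections import defaultdict


def group_segments(bed_text):
    """Group (sequence name, segment name) pairs by genomeName (column 4)."""
    genomes = defaultdict(list)
    for line in bed_text.splitlines():
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # skip blank or malformed lines
        segment = cols[4] if len(cols) > 4 else None  # segmentName is optional
        genomes[cols[3]].append((cols[0], segment))
    return dict(genomes)


bed = (
    "NC_012532.1\t0\t10794\tZika\tFull\n"
    "KJ609203.1\t0\t2292\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 1 (PB2)\n"
    "KJ609204.1\t0\t2304\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 2 (PB1)\n"
)
for genome, sequences in group_segments(bed).items():
    print(genome, sequences)
```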

Microorganism reporting file templates

Panel | Template File

Editing this file

Do not delete or rename columns, and do not add additional columns.
To create a user-defined subset of the organisms targeted by the panel, do not change the prediction logic; instead, delete the rows of the unwanted microorganisms. For example, here is a reporting table of three bacteria with default report thresholds.

Reporting Name | Prediction Logic | Coverage | Median Depth | ANI | Aligned Read Count | RPKM | Kmer Read Count

Note that if sub-selecting to keep Influenza or Enterovirus (including Coxsackievirus, Poliovirus, and Echovirus) this analysis may require additional support.
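The editing rules in this section can be checked by comparing the edited file against the original template. This is a hedged sketch: it assumes a comma-delimited file whose first column is the reporting name, and the function name, messages, and sample rows are hypothetical placeholders.

```python
import csv


def check_edited_list(template_text, edited_text, delimiter=","):
    """Compare an edited reporting list with the template it was derived from."""
    template = list(csv.reader(template_text.splitlines(), delimiter=delimiter))
    edited = list(csv.reader(edited_text.splitlines(), delimiter=delimiter))
    problems = []
    # Columns must not be deleted, renamed, or added.
    if not edited or edited[0] != template[0]:
        problems.append("header row differs from the template")
    known = {row[0] for row in template[1:] if row}
    for number, row in enumerate(edited[1:], start=2):
        if not row:
            continue
        if len(row) != len(template[0]):
            problems.append(f"line {number}: column count differs from the header")
        if row[0] not in known:
            problems.append(f"line {number}: '{row[0]}' is not in the template")
    return problems


# Placeholder content (assumption): real templates contain more columns and rows.
template = (
    "Reporting Name,Prediction Logic,Coverage\n"
    "Organism A,placeholder,0\n"
    "Organism B,placeholder,0\n"
)
edited = "Reporting Name,Prediction Logic,Coverage\nOrganism B,placeholder,0\n"
print(check_edited_list(template, edited))
```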