App Settings

Describes the controls on the Input Form and their functions.

Item name

Description

Choices

Default

Required

Project to run the analysis in

This app can accept samples or a project as input.

Samples: Select up to 60 individual samples, from any project(s)
Project: Select a single project containing up to 1536 samples. The app will analyze every FASTQ sample in that project (FASTQ datasets with QcStatus=QcFailed will be excluded)

Select one or more samples to analyze. Either Input Samples or an Input Project can be selected, but not both.

Required if Input Type is set to 'Samples'

Select a Project containing up to 1536 samples to be analyzed. The analysis will process all samples from that project (FASTQ datasets with QcStatus=QcFailed will be excluded). There is currently no way to select a subset of samples from a project. If the project contains more than 1536 Biosamples, the app will appear to launch but will then exit immediately.

Required if Input Type is set to 'Project'
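The input rules above can be summarized as a small check. Below is a minimal Python sketch, not the app's own code: the FastqDataset type, the resolve_inputs function, and any QC status value other than QcFailed are hypothetical. It assumes the 1536 limit counts distinct Biosamples in the project before QC filtering.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SAMPLES = 60             # limit for Input Type 'Samples'
MAX_PROJECT_SAMPLES = 1536   # limit for Input Type 'Project'


@dataclass
class FastqDataset:
    sample: str      # Biosample name
    qc_status: str   # e.g. "QcFailed"; other values are hypothetical


def resolve_inputs(input_type: str,
                   samples: Optional[List[str]] = None,
                   project: Optional[List[FastqDataset]] = None) -> List[str]:
    """Hypothetical check of the input rules; returns sample names to analyze."""
    if samples and project:
        raise ValueError("Select Input Samples or an Input Project, not both")
    if input_type == "Samples":
        if not samples:
            raise ValueError("Input Samples is required")
        if len(samples) > MAX_SAMPLES:
            raise ValueError(f"Select at most {MAX_SAMPLES} samples")
        return list(samples)
    if input_type == "Project":
        if not project:
            raise ValueError("Input Project is required")
        # Assumption: the limit counts Biosamples before QC filtering.
        if len({d.sample for d in project}) > MAX_PROJECT_SAMPLES:
            raise ValueError(
                f"Project contains more than {MAX_PROJECT_SAMPLES} samples")
        # Every FASTQ dataset is used except those with QcStatus=QcFailed;
        # dict.fromkeys removes repeated sample names while keeping order.
        return list(dict.fromkeys(
            d.sample for d in project if d.qc_status != "QcFailed"))
    raise ValueError(f"Unknown Input Type: {input_type}")
```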

This app can analyze samples generated from enrichment or amplicon sequencing experiments. Select one experiment type, not both.

Enrichment, Amplicon

Select the enrichment panel used to generate the data. This determines the set of reference genomes the app uses. Different selections will produce different results. Choose 'Custom' to provide your own reference genomes below.

Viral Surveillance Panel (VSP)
Pan-Coronavirus Panel (Pan-Cov)
Respiratory Virus Oligo Panel (RVOP)
Custom

Required if Experiment Type is set to 'Enrichment'

Select the virus genome to align to and the primer set used to generate the data. Primer locations determine primer trimming locations and amplicon definitions. If processing SARS-CoV-2 data from a non-amplicon protocol, choose 'SARS-CoV-2, no primers'. Different selections will produce different results. Choose 'Custom' to provide your own reference genomes and primer set below.

SARS-CoV-2, ARTIC v5.3.2 primers
SARS-CoV-2, ARTIC v4.1 primers
SARS-CoV-2, ARTIC v4 primers
SARS-CoV-2, ARTIC v3 primers
SARS-CoV-2, no primers
Influenza A, Universal primers
Influenza B, Universal primers
Influenza A and B, Universal primers
Chikungunya Virus, Grubaugh Lab primers
Chikungunya Virus, Illumina primers
Dengue Virus Serotype 1 (DENV1), 400-bp DengueSeq primers
Dengue Virus Serotype 1 (DENV1), Illumina primers
Monkeypox Virus (MPXV) Clade II, Grubaugh Lab primers
Respiratory Syncytial Virus (RSV), CDC primers
Respiratory Syncytial Virus (RSV), WCCRRI primers
Zika Virus, Grubaugh Lab primers
Custom

Required if Experiment Type is set to 'Amplicon'

Custom Reference: Custom Reference FASTA For Consensus Generation

Provide a custom reference FASTA to use for consensus generation. Either Enrichment Panel or Amplicon Primer Set must be set to Custom to enable this field.

Sequence names must be unique and must not contain spaces. If a FASTA header contains a space, the part before the first space is taken as the sequence name.
Sequence names should use only letters, numbers, underscores (_), hyphens (-), parentheses ( and ), and periods (.). Otherwise, the sequence names may appear different in the output.
Keep sequence names short (e.g. NC_045512.2). If needed, full names can be provided in the genomeName column of the Reference BED below.
The FASTA file name must not include spaces, must not exceed 25 characters, and must use the extension .fasta or .fa.

Required if either Enrichment Panel or Amplicon Primer Set is set to 'Custom'
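The FASTA rules above can be checked before upload. Below is a minimal Python sketch, not part of the app; the function names are hypothetical, and it assumes the 25-character limit counts the whole file name including the extension. Violations of the required rules are reported as errors, and names outside the recommended character set as warnings.

```python
import re
from typing import List, Tuple

# Recommended characters for sequence names: letters, digits, _ - ( ) .
RECOMMENDED_NAME = re.compile(r"[A-Za-z0-9_\-().]+")


def sequence_names(fasta_text: str) -> List[str]:
    """Sequence name = header text after '>' up to the first space."""
    return [line[1:].split(" ", 1)[0]
            for line in fasta_text.splitlines() if line.startswith(">")]


def check_custom_reference(file_name: str,
                           fasta_text: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a candidate Custom Reference FASTA."""
    errors: List[str] = []
    warnings: List[str] = []
    if " " in file_name:
        errors.append("file name contains a space")
    if len(file_name) > 25:  # assumption: the limit includes the extension
        errors.append("file name exceeds 25 characters")
    if not file_name.endswith((".fasta", ".fa")):
        errors.append("file name must end in .fasta or .fa")
    seen = set()
    for name in sequence_names(fasta_text):
        if not name:
            errors.append("empty sequence name")
        elif name in seen:
            errors.append(f"duplicate sequence name: {name}")
        elif not RECOMMENDED_NAME.fullmatch(name):
            warnings.append(f"non-recommended characters in sequence name: {name}")
        seen.add(name)
    return errors, warnings
```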

Custom Reference: Custom Reference BED

Provide a custom reference BED file describing each sequence in the Custom Reference FASTA. See Genome definition BED file format.

Optional if Enrichment Panel or Amplicon Primer Set is set to 'Custom'. Otherwise not applicable

Custom Reference: Custom PCR Primer Definitions

Provide a file defining the primers used in amplicon sequencing. See Primer definition file formats.

Optional if Amplicon Primer Set is set to 'Custom'. Otherwise not applicable

Custom Reference: NextClade Datasets

Select one or more available NextClade Datasets from the drop-down menu below. Hold the Ctrl/Command key to select multiple datasets or to deselect one.

Optional if either Enrichment Panel or Amplicon Primer Set is set to 'Custom'. Otherwise not applicable

Run Pangolin on applicable consensus genomes

Optional if any Enrichment Panel is selected, any SARS-CoV-2 Amplicon Primer Set is selected, or 'Custom' is selected for Enrichment Panel or Amplicon Primer Set. Otherwise not applicable

Run NextClade on applicable consensus genomes. If providing a Custom Reference, select NextClade Datasets above to enable this option.

Optional if any Enrichment Panel is selected, if a genome with NextClade dataset available is selected for Amplicon Primer Set, or if 'Custom' is selected for Enrichment Panel or Amplicon Primer Set. Otherwise not applicable

Advanced Workflow Settings: Dehost

If checked, all human reads are removed from the input FASTQs before the Map/Align stage, so that the output BAM includes only viral reads.

Advanced Workflow Settings: Trim Consensus Sequences

Remove any leading and trailing masked nucleotides from the resulting consensus sequences. Does not affect internal masked regions.
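The trimming rule can be illustrated in a few lines. This Python sketch assumes masked positions are written as N (or n); it is an illustration, not the app's implementation, and trim_consensus is a hypothetical name. Only the ends are trimmed, so internal masked runs survive.

```python
def trim_consensus(seq: str, mask_chars: str = "Nn") -> str:
    """Remove leading and trailing masked bases; internal Ns are kept."""
    return seq.strip(mask_chars)


print(trim_consensus("NNNACGTNNNACGTNN"))  # ACGTNNNACGT
```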

Advanced Workflow Settings: Minimum percentage of amplicons with at least 90% coverage ≥ 1x to enable variant calling and consensus sequence generation

At low input concentrations, errors produced by the reverse transcriptase enzyme can propagate to high frequencies, leading to false positive sequence variants. Therefore, we attempt to infer the sample concentration from the amplicon coverage using this metric. If you wish to adjust this threshold, we advise conducting internal studies of variant call reproducibility between replicates to determine a threshold that produces acceptable quality levels for your application. Only applicable to amplicon sequencing where primers are defined. See Special considerations for amplicon sequencing with IMAP protocols.

Required if Experiment Type is set to 'Amplicon'
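The metric in this setting's name can be written out as follows. Below is a minimal Python sketch under stated assumptions, not the app's code: per-position depths are given as a 0-based list, amplicons are 0-based half-open (start, end) intervals, whether primer regions are included in an amplicon is not specified here, and a value exactly equal to the threshold is assumed to pass. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pct_amplicons_covered(depth: Sequence[int],
                          amplicons: List[Tuple[int, int]],
                          min_depth: int = 1,
                          min_fraction: float = 0.90) -> float:
    """Percentage of amplicons with >= 90% of positions at depth >= 1x.

    depth: per-position read depth along the reference (0-based).
    amplicons: (start, end) intervals, 0-based and half-open.
    """
    if not amplicons:
        return 0.0
    passing = 0
    for start, end in amplicons:
        length = end - start
        covered = sum(1 for d in depth[start:end] if d >= min_depth)
        if length > 0 and covered / length >= min_fraction:
            passing += 1
    return 100.0 * passing / len(amplicons)


def variant_calling_enabled(depth: Sequence[int],
                            amplicons: List[Tuple[int, int]],
                            min_pct_amplicons: float) -> bool:
    """Gate variant calling and consensus generation on the metric above."""
    return pct_amplicons_covered(depth, amplicons) >= min_pct_amplicons
```

For example, with one fully covered amplicon and one uncovered amplicon, the metric is 50%, so a threshold of 60 would disable variant calling and consensus generation for that sample.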

Advanced Workflow Settings: Minimum read coverage depth for consensus sequence generation

Genomic positions with read coverage below this threshold will be considered indeterminate and hard-masked in the final consensus sequence.
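Hard-masking can be pictured as replacing each low-depth base with N. This Python sketch is a simplification, not the app's implementation: it assumes the consensus and the depth list are aligned one-to-one with the reference (no insertions or deletions) and that N is the mask character. hard_mask is a hypothetical name.

```python
from typing import Sequence


def hard_mask(consensus: str, depth: Sequence[int], min_depth: int) -> str:
    """Replace bases whose read depth is below min_depth with 'N'."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(base if d >= min_depth else "N"
                   for base, d in zip(consensus, depth))


print(hard_mask("ACGTA", [10, 2, 10, 0, 10], min_depth=3))  # ANGNA
```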

Advanced Workflow Settings: Minimum percentage of consensus sequence generated to label as confident

Consensus sequences with a percentage of callable bases below this threshold will be considered 'low confidence'. Callability is defined by the Minimum read coverage depth for consensus sequence generation setting (above).
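The confidence label can be sketched as follows. This is a minimal Python illustration under assumptions, not the app's code: a base is treated as callable when its depth meets the minimum coverage depth, the denominator is the full length of the reference, and a value exactly at the threshold is labeled confident. consensus_confidence is a hypothetical name.

```python
from typing import Sequence, Tuple


def consensus_confidence(depth: Sequence[int],
                         min_depth: int,
                         min_pct_callable: float) -> Tuple[float, str]:
    """Return (percent callable bases, confidence label) for one consensus."""
    if not depth:
        return 0.0, "low confidence"
    callable_bases = sum(1 for d in depth if d >= min_depth)
    pct = 100.0 * callable_bases / len(depth)
    label = "confident" if pct >= min_pct_callable else "low confidence"
    return pct, label
```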

Additional DRAGEN Command Line Arguments: Additional DRAGEN Map/Align Command Line Arguments

USE AT YOUR OWN RISK. This field allows the user to add any DRAGEN command line argument, which can cause DRAGEN to:

Crash/fail/hang
Run for a very long time
Generate unexpected or invalid results

The app appends this input text to the DRAGEN command line after removing invalid characters (valid characters are alphanumeric plus ._-"'). Note that there is no validation of the contents. If you use this field and the appsession aborts, the output*.log appsession log file may help to understand the cause of the failure.
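The character filtering described above can be illustrated as follows. This Python sketch is not the app's code: it keeps ASCII letters, digits, and the listed characters ._-"', and it additionally keeps the space character, which is an assumption (otherwise separate arguments would run together). The function names are hypothetical. Note that only characters are removed; the arguments themselves are not checked.

```python
import string

# Valid characters per the description above. Keeping the space character
# is an assumption so that separate arguments stay separated.
ALLOWED = set(string.ascii_letters + string.digits + "._-\"' ")


def sanitize_extra_args(text: str) -> str:
    """Drop every character outside the allowed set; contents are not validated."""
    return "".join(ch for ch in text if ch in ALLOWED)


def append_extra_args(base_cmd: str, extra: str) -> str:
    """Append the cleaned text to the command line, if anything is left."""
    cleaned = sanitize_extra_args(extra).strip()
    return f"{base_cmd} {cleaned}" if cleaned else base_cmd


print(append_extra_args("dragen", "--opt value; $HOME"))  # dragen --opt value HOME
```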

Additional DRAGEN Command Line Arguments: Additional DRAGEN Variant Calling (Somatic) Command Line Arguments

USE AT YOUR OWN RISK. This field allows the user to add any DRAGEN command line argument, which can cause DRAGEN to:

Crash/fail/hang
Run for a very long time
Generate unexpected or invalid results

Organisms to Report (VSP)

Only the checked organisms will be reported (consensus sequences and metrics). This will not affect the underlying bioinformatics pipeline, only which outputs are provided.

Optional if Enrichment Panel is set to 'VSP'. Otherwise, not applicable

Organisms to Report (RVOP)

Only the checked organisms will be reported (consensus sequences and metrics). This will not affect the underlying bioinformatics pipeline, only which outputs are provided.

Optional if Enrichment Panel is set to 'RVOP'. Otherwise, not applicable