Analysis Launch on Standalone DRAGEN Server

Start the DRAGEN TruSight Oncology 500 Analysis Software with the DRAGEN_TSO500.sh Bash script. The script is installed in the /usr/local/bin directory. The Bash script is executed on the command line and runs the software with Docker (or Apptainer if specified).

For arguments, refer to Command-Line Options. You can start from BCL files or from the FASTQ folder produced by BCL Convert. The following requirements apply for both methods:

  • Path to the sequencing run or FASTQ folder. Copy the run or FASTQ folder to the DRAGEN server into the staging folder with the following recommended organization: /staging/runs/{RunID}. You can copy the run folder onto the DRAGEN server using Linux commands such as rsync. The sample sheet within the run folder is used unless otherwise specified through the command line.

  • The run folder must be intact. Refer to Starting from BCL Files for input requirements.

  • If the analysis output folder path is different from the default, provide the analysis output folder path. Refer to Command-Line Options.

Before running the analysis, confirm that the output directory for the software to write to is empty and does not include results of previous analyses.
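The check above can be scripted before each launch. The following is a minimal sketch, not part of the software: the function name is_clean_output_dir and the example path /staging/MyAnalysis are hypothetical, and treating a folder that does not exist yet as acceptable is an assumption.

```shell
#!/bin/sh
# Return 0 if the output directory is safe to use: it either does not exist
# yet (assumption: the software can create it) or exists and is empty.
is_clean_output_dir() {
    dir=$1
    # A missing directory cannot hold results of a previous analysis.
    [ -e "$dir" ] || return 0
    # An existing path that is not a directory is not a usable output folder.
    [ -d "$dir" ] || return 1
    # ls -A lists every entry except . and .. so empty output means empty dir.
    [ -z "$(ls -A "$dir")" ]
}

ANALYSIS_DIR="/staging/MyAnalysis"   # hypothetical example path
if is_clean_output_dir "$ANALYSIS_DIR"; then
    echo "OK: $ANALYSIS_DIR is empty or not yet created"
else
    echo "WARNING: $ANALYSIS_DIR already contains files" >&2
fi
```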

Storage Requirements

For optimal performance, run analysis on data stored locally on the DRAGEN server. Analysis of data stored on network-attached storage (NAS) can take longer, and performance can be less reliable.

The DRAGEN server provides an NVMe SSD in the /staging directory to use as the software output directory. Network-attached storage is required for long-term storage.

When running the DRAGEN TruSight Oncology 500 Analysis Software, use the default settings or set the --analysisFolder command-line option to a directory in /staging so that the DRAGEN server processes read and write data on the NVMe SSD.

Before beginning analysis, develop a strategy to copy data from the DRAGEN server to network-attached storage. Delete output data on the DRAGEN server as soon as possible.

Run and analysis output sizes depend on the sequencing system. For each sequencing system, the storage table in this guide lists the run folder output, the analysis output, and the minimum disk space per 101 bp, in Gb.

When launching the analysis, the software checks that the minimum disk space required is available. If the minimum disk space is not available, the software shows an error message and prevents analysis from starting. If disk space is exhausted during a run, the run shows an error and stops analyzing.
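The software performs this check itself, but a similar check can be run earlier, for example before copying a run to the server. The following is a rough sketch under stated assumptions: the function names free_gb and has_min_space are hypothetical, and the Gb values in the storage table are interpreted here as gigabytes of 1024 × 1024 KB, which may differ from how the software measures space.

```shell
#!/bin/sh
# Print the free space, in whole GB, on the filesystem that holds path $1.
free_gb() {
    # df -P forces the one-line-per-filesystem POSIX format and -k reports
    # 1024-byte blocks; column 4 of line 2 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Return 0 if at least $2 GB are free at path $1.
has_min_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Example use (800 is the minimum listed for NovaSeq X 1.5B):
#   has_min_space /staging 800 || echo "Not enough free space" >&2
```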

Moving or modifying files during an analysis may cause the analysis to fail or provide incorrect results.

Run on Multiple DRAGEN Servers

DRAGEN TruSight Oncology 500 Analysis Software can run subsets of samples on different DRAGEN servers to decrease overall processing time. This uses a three-stage process called scatter/gather, which consists of demultiplexing, analysis, and result gathering.

The first stage is demultiplexing. Demultiplexing runs once on the entire run folder, generates FASTQ files for each sample in the run, and then separates sample files into respective folders. Once complete, note the output directory containing the sample directories holding the FASTQ files.

The process for scattering the analysis on multiple DRAGEN servers is as follows:

  1. Determine how many DRAGEN servers are available to run.

  2. Run demultiplexing on a single DRAGEN server.

Note: To analyze a run sequenced with the NovaSeq 6000 XP workflow on multiple DRAGEN servers, you can instead modify the sample sheet to include a subset of the lanes. For example, on an S2 flow cell, create two modified sample sheets, one containing the samples from lane 1 and the other containing the samples from lane 2. This way, only the sample sheet is modified instead of copying files between servers. This strategy uses the start-from-run-folder commands without the --demultiplexOnly option. The entire run folder must be copied to each analysis server because demultiplexing is performed once per server.

  3. Transfer the FASTQ folder output from the original DRAGEN server to the additional servers. The FASTQ folder is Logs_Intermediates/FastqGeneration in the demultiplexing analysis folder.

  4. Run the analysis software using the --fastqFolder option on both the original and additional DRAGEN servers. Use one of the following options to select the samples for each server:

    • Option 1: Copy the original SampleSheet.csv to each server. Then use the --sampleOrPairIDs option to pass the Bash script on each DRAGEN server the subset of samples/pairs to run.

    • Option 2: Copy the SampleSheet.csv to each DRAGEN server and modify it to contain only the samples/pairs to run. The software verifies that all samples in the sample sheet are contained within the FASTQ folders unless the --sampleOrPairIDs command-line option is present in the analysis launch. Failure to account for these checks results in an error.

  5. Copy the results from demultiplexing and each analysis run onto a single server, and then generate the final /Results directory, which contains the aggregated results. Enter the --gather command followed by the output directories of the demultiplexing step and each individual analysis run.

Commands for Multinode Analysis

| Step | Command |
| --- | --- |
| Demultiplexing | DRAGEN_TSO500.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --runFolder /staging/{RunFolderName} --analysisFolder /staging/{DemultiplexAnalysisFolderName} --demultiplexOnly --sampleSheet /staging/illumina/{SampleSheetName} |
| Analysis (one server) | DRAGEN_TSO500.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --fastqFolder /staging/{DemultiplexAnalysisFolderName}/Logs_Intermediates/FastqGeneration/ --analysisFolder /staging/{Node1AnalysisFolderName} --sampleSheet /staging/illumina/{SampleSheetName} --sampleOrPairIDs Pair_1,Pair_2 |
| Analysis (additional servers) | DRAGEN_TSO500.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --fastqFolder /staging/{DemultiplexAnalysisFolderName}/Logs_Intermediates/FastqGeneration/ --analysisFolder /staging/{Node1AnalysisFolderName} --sampleSheet /staging/illumina/{SampleSheetName} --sampleOrPairIDs Pair_3 |
| Gather | DRAGEN_TSO500.sh --analysisFolder /Gathered_Results --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --runFolder /staging/{RunFolderName}/ --sampleSheet /staging/illumina/{SampleSheetName} --gather /Demultiplex_Output /Node1_Output /Node2_Output |

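Storage requirements: run and analysis output sizes for each sequencing system per 101 bp.

| Sequencing System | Run Folder Output (Gb) | Analysis Output (Gb) | Minimum Disk Space (Gb) |
| --- | --- | --- | --- |
| NextSeq 500/550/550Dx (RUO) HO flow cell | 32-55 | 82-85 | 150 |
| NextSeq 1000/2000 | 41 | 107 | 150 |
| NovaSeq 6000/6000Dx (RUO) SP Flow Cell | 85-100 | 250-374 | 300 |
| NovaSeq 6000/6000Dx (RUO) S1 Flow Cell | 164-200 | 360-665 | 800 |
| NovaSeq 6000/6000Dx (RUO) S2 Flow Cell | 290-460 | 890-1600 | 1500 |
| NovaSeq 6000/6000Dx (RUO) S4 Flow Cell | 800-1200 | 2700-4100 | 3000 |
| NovaSeq X 1.5B | 213 | 352 | 800 |
| NovaSeq X 10B | 1100 | 1800 | 3000 |
| NovaSeq X 25B | 1800 | 3300 | 4000 |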

Command-Line Options

You can use the following command-line options with DRAGEN TruSight Oncology 500 Analysis Software.

To learn more about the input requirements, use the --help command-line option.

| Option | Required | Description |
| --- | --- | --- |
| --help | No | Displays a help screen with available command line options. |

Note:

  • Use full paths when specifying the file paths in the command line.

  • Avoid special characters such as &, *, #, and spaces.

  • When starting from BCL files, only the run folder needs to be specified. The immediate parent directory containing the BCL files does not need to be specified.
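The first two points in the note can be checked with a small helper before a path is passed to the script. This is only a sketch: the function name is_safe_path is hypothetical, and it tests only for the characters named above (&, *, #, and spaces), whereas the note says "such as", so other special characters may also need to be avoided.

```shell
#!/bin/sh
# Return 0 if $1 is a full (absolute) path that contains none of the
# characters named in the note: & * # or a space.
is_safe_path() {
    case $1 in
        /*) ;;             # starts with /, so it is a full path
        *)  return 1 ;;    # relative paths are rejected
    esac
    # Inside a bracket expression, & * # and space are all literal.
    if printf '%s\n' "$1" | grep -q '[&*# ]'; then
        return 1
    fi
    return 0
}

# Example use:
#   is_safe_path "/staging/runs/Run_01" && echo "path looks fine"
```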

When running the analysis software over SSH, Illumina recommends using additional software, such as screen or tmux, to prevent unexpected termination of the analysis.

  1. Wait for any running DRAGEN TruSight Oncology 500 Analysis Software containers to complete before launching a new analysis. Run the following command to generate a list of running containers: docker ps

  2. Select one of the following options:

  • Start from BCL files in the run folder with the sample sheet included in the run folder.
    DRAGEN_TSO500.sh \
      --runFolder /staging/{RunFolderName} \
      --analysisFolder /staging/{AnalysisFolderName}

  • Start from BCL files in the run folder with the sample sheet located in a folder other than the run folder.
    DRAGEN_TSO500.sh \
      --runFolder /staging/{RunFolderName} \
      --analysisFolder /staging/{AnalysisFolderName} \
      --sampleSheet /staging/{SampleSheetName}.csv

Starting from BCL Files

For data generated by NextSeq 1000/2000 and NovaSeq X, the analysis can be started only from FASTQ files and not from BCL files.

If starting from BCL (*.bcl) files, DRAGEN TruSight Oncology 500 Analysis Software requires the run folder to contain certain files and folders. These inputs are required for Docker.

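The run folder contains data from the sequencing run. Make sure that the folder contains the Config, Data, Interop, and RTALogs folders, the RunInfo.xml and RunParameters.xml files, and a sample sheet (SampleSheet.csv, unless a different sample sheet is provided by full path). The Images and Logs folders are optional.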
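A quick pre-flight check of the run folder can catch an incomplete copy before launch. The sketch below is illustrative only: the function name check_run_folder is hypothetical, and the folder names are spelled as they appear in this guide, so adjust them if the names on disk differ in capitalization. A missing SampleSheet.csv is not necessarily a problem when the sample sheet is supplied with --sampleSheet.

```shell
#!/bin/sh
# Print each required entry that is missing from run folder $1.
# Returns 0 if everything is present and 1 otherwise.
check_run_folder() {
    run=$1
    missing=0
    # Required folders (Images and Logs are optional and not checked).
    for d in Config Data Interop RTALogs; do
        [ -d "$run/$d" ] || { echo "missing folder: $d"; missing=1; }
    done
    # Required files.
    for f in RunInfo.xml RunParameters.xml SampleSheet.csv; do
        [ -f "$run/$f" ] || { echo "missing file: $f"; missing=1; }
    done
    return "$missing"
}

# Example use:
#   check_run_folder "/staging/runs/{RunID}" || echo "Run folder is incomplete" >&2
```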

Starting from FASTQ Files

The following inputs are required for running the DRAGEN TruSight Oncology 500 Analysis Software using FASTQ (*.fastq) files. The requirements apply to Docker.

  • Full path to an existing FASTQ folder.

  • The FASTQ folder structure conforms to the folder structure in FASTQ File Organization.

  • The sample sheet is in the FASTQ folder path, or you can set the path to the sample sheet with the --sampleSheet override command line option.

Make sure there is sufficient disk space for the analysis to complete. Refer to the --help command line argument details for disk space requirements.

Use BCL Convert to produce FASTQ files for DRAGEN TruSight Oncology 500 Analysis Software. Using bcl2fastq does not produce the same results and is discouraged.

Make sure that BCL Convert is set to write UMI sequences to the read headers in the FASTQ files.

FASTQ File Organization

Store FASTQ files in individual subfolders that each correspond to a specific Sample_ID. Keep file pairs together in the same folder. Alternatively, store all FASTQ files in one flat folder.

The DRAGEN TruSight Oncology 500 Analysis Software requires separate FASTQ files per sample. Do not merge FASTQ files.

The instrument generates two FASTQ files (Read 1 and Read 2) per flow cell lane, so a sample sequenced across four lanes has eight FASTQ files. FASTQ file names follow this pattern:

Sample1_S1_L001_R1_001.fastq.gz

  • Sample1 represents the Sample ID.

  • The S in S1 means sample, and the 1 in S1 is based on the order of samples in the sample sheet, so S1 is the first sample.

  • L001 represents the flow cell lane number.
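The naming convention can be unpacked programmatically, for example to group files by sample or lane. The following sketch is based only on the single example name shown above. The function name parse_fastq_name is hypothetical, and the pattern, including the fixed _001 suffix, is an assumption inferred from that example rather than a full specification of BCL Convert output.

```shell
#!/bin/sh
# Print "sample_id sample_number lane read" for a FASTQ file name.
# Returns 1 (and prints nothing) if the name does not match the pattern.
parse_fastq_name() {
    name=$(basename "$1")
    # sed prints the four captured fields only when the whole name matches;
    # the trailing grep turns "no output" into a non-zero exit status.
    printf '%s\n' "$name" | sed -n -E \
        's/^(.+)_S([0-9]+)_L([0-9]+)_R([0-9]+)_001\.fastq\.gz$/\1 \2 \3 \4/p' |
        grep .
}

parse_fastq_name "Sample1_S1_L001_R1_001.fastq.gz"
```

For the example name, the call prints Sample1 1 001 1, that is, Sample ID Sample1, sample number 1, lane 001, and Read 1.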

The launch options also include the following. To use a sample sheet located outside the run folder when starting from BCL files, add --sampleSheet /staging/{SampleSheetName}.csv to the launch command.

  • Start from BCL files in the run folder with a different sample sheet and demultiplexing only.
    DRAGEN_TSO500.sh \
      --runFolder /staging/{RunFolderName} \
      --analysisFolder /staging/{AnalysisFolderName} \
      --sampleSheet /staging/{SampleSheetName}.csv \
      --demultiplexOnly

  • Start from FASTQ with the sample sheet included in the FASTQ folder and with different resources and hash table folders.
    DRAGEN_TSO500.sh \
      --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources \
      --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable \
      --fastqFolder /staging/{FastqFolderName} \
      --analysisFolder /staging/{AnalysisFolderName}

  • Start from the FASTQ folder with the sample sheet included in the FASTQ folder and a subset of samples or pairs.
    DRAGEN_TSO500.sh \
      --fastqFolder /staging/{FastqFolderName} \
      --analysisFolder /staging/{AnalysisFolderName} \
      --sampleOrPairIDs "Pair_1,Pair2"
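The launch variants differ mainly in whether the input is a run folder or a FASTQ folder. As an illustration, the sketch below assembles the basic command for either case and prints it instead of running it, so that it can be reviewed first. The function name build_launch_cmd and the example folder names are hypothetical. Only the --runFolder, --fastqFolder, and --analysisFolder options from this guide are used.

```shell
#!/bin/sh
# Print (do not run) a basic launch command.
#   $1  input type: "bcl" or "fastq"
#   $2  full path to the run folder or FASTQ folder
#   $3  full path to the analysis folder
build_launch_cmd() {
    case $1 in
        bcl)   input_opt=--runFolder ;;
        fastq) input_opt=--fastqFolder ;;
        *)     echo "unknown input type: $1" >&2; return 1 ;;
    esac
    printf 'DRAGEN_TSO500.sh %s %s --analysisFolder %s\n' "$input_opt" "$2" "$3"
}

# Hypothetical folder names, for illustration only.
build_launch_cmd fastq /staging/MyFastqFolder /staging/MyAnalysis
```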

Note: In FASTQ file names such as Sample1_S1_L001_R1_001.fastq.gz, the R in R1 means Read, so R1 refers to Read 1.

Run folder contents required when starting from BCL files:

| Folder/File | Description |
| --- | --- |
| Config folder | Configuration files. |
| Data folder | *.bcl files. |
| Images folder | [Optional] Raw sequencing image files. |
| Interop folder | Interop metric files. |
| Logs folder | [Optional] Sequencing system log files. |
| RTALogs folder | Real-Time Analysis (RTA) log files. |
| RunInfo.xml file | Run information. |
| RunParameters.xml file | Run parameters. |
| SampleSheet.csv file | Sample information. If you want to use a sample sheet that is not in the run folder or a sample sheet named something other than SampleSheet.csv, provide the full path. |

Command-line options for the DRAGEN TruSight Oncology 500 Analysis Software:

| Option | Required | Description |
| --- | --- | --- |
| --analysisFolder | No | Path to the local analysis folder. The default location is /staging/DRAGEN_TSO500_Analysis_{timestamp}. If not using the default location, provide the full path to the local analysis folder. The folder must have sufficient space and must be on an NVMe SSD drive, for example, the /staging directory on the DRAGEN server. Refer to the table in Storage Requirements for minimum disk space requirements. |
| --resourcesFolder | No | Path to the resource folder location. The default location is /staging/illumina/DRAGEN_TSO500/resources. If not using the default location, enter the full path to the resource folder. |
| --runFolder | Yes | Required when --fastqFolder is not specified. Provide the full path to the local run folder. |
| --fastqFolder | Yes | Required when --runFolder is not specified. Provide the full path to the local FASTQ folder. Analysis starts at this location. |
| --user | No | Optional for Docker. Specify the user ID to be used within the Docker container. |
| --version | No | Displays the version of the software. |
| --sampleSheet | No | Provide the full path, including file name, if the sample sheet is not provided as SampleSheet.csv in the run folder. |
| --sampleOrPairIDs | No | Provide the comma-delimited sample or pair IDs that should be processed on this node, with no spaces. For example, Pair_1,Pair_2,Sample_1. |
| --demultiplexOnly | No | Demultiplex to generate FASTQ only, without additional analysis. |
| --gather | No | Follow this option with the directories whose results should be gathered into a single Results folder. |
| --hashtableFolder | No | Defaults to the DRAGEN hash table location created upon install. If not using the default location, enter the hash table location. |
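Because --sampleOrPairIDs expects a comma-delimited list with no spaces, a malformed list is an easy mistake to make. The following sketch checks the shape of such a list before launch. The function name is_valid_id_list is hypothetical, and the check covers only the formatting rule stated for the option. It does not confirm that the IDs exist in the sample sheet.

```shell
#!/bin/sh
# Return 0 if $1 looks like a comma-delimited ID list with no spaces
# and no empty entries, for example Pair_1,Pair_2,Sample_1.
is_valid_id_list() {
    case $1 in
        '')     return 1 ;;   # empty list
        *' '*)  return 1 ;;   # contains a space
        ,*|*,)  return 1 ;;   # leading or trailing comma
        *,,*)   return 1 ;;   # empty entry between two commas
    esac
    return 0
}

# Example use:
#   is_valid_id_list "Pair_1,Pair_2,Sample_1" || echo "Fix the ID list" >&2
```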