DRAGEN TruSight Oncology 500 Analysis Software can run subsets of samples on different DRAGEN servers to decrease overall processing time. This is done with a three-stage process called scatter/gather, which consists of demultiplexing, analysis, and result gathering.
The first stage is demultiplexing. Demultiplexing runs once on the entire run folder, generates FASTQ files for each sample in the run, and then separates sample files into respective folders. Once complete, note the output directory containing the sample directories holding the FASTQ files.
The process for scattering the analysis on multiple DRAGEN servers is as follows:
Determine how many DRAGEN servers are available to run.
Run demultiplexing on a single DRAGEN server.
Moving or modifying files during an analysis may cause the analysis to fail or provide incorrect results.
To analyze runs on multiple DRAGEN servers using the NovaSeq 6000 XP workflow, modify the sample sheet to include a subset of the lanes. For example, on an S2 flow cell, create two modified sample sheets, one containing the samples from lane 1 and the other containing the samples from lane 2. This way, only the sample sheet is modified instead of files being copied between servers. This strategy uses the start from run folder commands without the --demultiplexOnly option. The entire run folder must be copied to each analysis server because demultiplexing is performed once per server.
Transfer the FASTQ folder output from the original DRAGEN server to additional servers. The FASTQ folder is located in the demultiplexing analysis folder under Logs_Intermediates/FastqGeneration.
Run the analysis software using the --fastqFolder option on both the original and additional DRAGEN servers.
Option 1: Copy the original SampleSheet.csv to each server. Then provide a subsetted list of the intended samples/pairs to the Bash script on each DRAGEN server.
Option 2: Copy SampleSheet.csv to each DRAGEN server and modify each copy to contain only the samples/pairs to run on that server.
The software verifies that all samples in the sample sheet are contained within the FASTQ folders unless the --sampleOrPairIDs command-line option is present in the analysis launch. Failure to account for these checks results in an error.
Copy the results from demultiplexing and each analysis run onto a single server, and then generate the final /Results directory, which contains the aggregated results. Use the --gather option followed by the output directories of the demultiplexing step and each individual analysis run.
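One way to prepare the subsetted lists used in Option 1 is sketched below. It assumes the sample or pair IDs are already known as a comma-delimited string and divides them round-robin across servers. The round-robin split and the function name split_ids are illustrative assumptions, not something the software prescribes.

```shell
# Hypothetical helper (not part of the analysis software): split a
# comma-delimited list of sample or pair IDs into N comma-delimited groups,
# assigned round-robin, and print one group per line.
split_ids() {
  printf '%s\n' "$1" | awk -F',' -v n="$2" '
    {
      for (i = 1; i <= NF; i++) {
        slot = (i - 1) % n
        groups[slot] = (groups[slot] == "" ? "" : groups[slot] ",") $i
      }
    }
    END { for (s = 0; s < n; s++) print groups[s] }'
}

# Example: three pairs divided across two servers.
split_ids "Pair_1,Pair_2,Pair_3" 2
```

Each printed line can then be passed to --sampleOrPairIDs on one server.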
Step | Command |
---|---|
Start the DRAGEN TruSight Oncology 500 Analysis Software with the DRAGEN_TSO500-2.6.0.sh Bash script. The script is installed in the /usr/local/bin directory. The Bash script is executed on the command line and runs the software with Docker (or Apptainer if specified).
For arguments, refer to Command-Line Options. You can start from BCL files or from the FASTQ folder produced by BCL Convert. The following requirements apply for both methods:
Path to the sequencing run or FASTQ folder. Copy the run or FASTQ folder into the staging folder on the DRAGEN server with the following recommended organization: /staging/runs/{RunID}. You can copy the run folder onto the DRAGEN server using Linux commands such as rsync. The sample sheet within the run folder is used unless otherwise specified through the command line.
Run folder must be intact. Refer to Starting from BCL Files for input requirements.
If the analysis output folder path is different from the default, provide the analysis output folder path. Refer to Command-Line Options.
Before running the analysis, confirm that the output directory for the software to write to is empty and does not include results of previous analyses.
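The empty-output-directory requirement can be checked before launch with a small shell function. This is a minimal sketch: the function name output_dir_is_clean is hypothetical, and treating a directory that does not exist yet as clean is an assumption.

```shell
# Hypothetical pre-flight check (not part of the analysis software).
# Succeeds if the path does not exist yet (assumed acceptable) or is an
# empty directory; fails if it is a non-empty directory or not a directory.
output_dir_is_clean() {
  [ ! -e "$1" ] && return 0
  [ -d "$1" ] || return 1
  [ -z "$(ls -A "$1")" ]
}

# Example usage:
#   output_dir_is_clean /staging/{AnalysisFolderName} || echo "Output folder is not empty"
```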
For optimal performance, run analysis on data stored locally on the DRAGEN server. Analysis of data stored on NAS can take longer and performance can be less reliable.
The DRAGEN server provides an NVMe SSD in the /staging directory to use as the software output directory. Network-attached storage is required for long-term storage.
When running the DRAGEN TruSight Oncology 500 Analysis Software, use the default settings or set the --analysisFolder command-line option to a directory in /staging so that the DRAGEN server processes read and write data on the NVMe SSD.
Before beginning analysis, develop a strategy to copy data from the DRAGEN server to network-attached storage. Delete output data on the DRAGEN server as soon as possible.
The following are the run and analysis output sizes for each sequencing system per 101 bp:
Sequencing System | Run Folder Output (Gb) | Analysis Output (Gb) | Minimum Disk Space (Gb) |
---|---|---|---|
When launching the analysis, the software checks that the minimum disk space required is available. If the minimum disk space is not available, the software shows an error message and prevents analysis from starting. If disk space is exhausted during a run, the run shows an error and stops analyzing.
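The software performs its own disk space check, but a similar check can be run by hand before launch. The sketch below reads the available space with df and compares it with a minimum value taken from the disk space table. The function name has_min_disk_space is hypothetical, and interpreting the table values as gigabytes of 1024^3 bytes is an assumption.

```shell
# Hypothetical pre-flight check (not the software's built-in check).
# $1 = directory on the target file system, $2 = minimum free space in GB.
# Assumes 1 GB = 1024 * 1024 KB. The body runs in a subshell so the
# avail_kb variable does not leak into the calling shell.
has_min_disk_space() (
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
)

# Example usage (150 is the NextSeq 500/550 minimum from the table):
#   has_min_disk_space /staging 150 || echo "Not enough free space in /staging"
```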
Moving or modifying files during an analysis may cause the analysis to fail or provide incorrect results.
You can use the following command-line options with DRAGEN TruSight Oncology 500 Analysis Software.
To learn more about the input requirements, use the --help command-line option.
Option | Required | Description |
---|
Note:
Use full paths when specifying the file paths in the command line.
Avoid special characters such as &, *, #, and spaces.
When starting from BCL files, only the run folder needs to be specified. The immediate parent directory containing the BCL files does not need to be specified.
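The path rules in the note above can be checked before launching an analysis. The sketch below accepts only absolute paths made of letters, digits, and the characters / . _ and -. This allowlist is stricter than the note, which names &, *, #, and spaces only as examples, so treat it as an assumption. The function name is_safe_path is hypothetical.

```shell
# Hypothetical path check (not part of the analysis software).
# Succeeds only for full (absolute) paths that contain nothing other than
# letters, digits, '/', '.', '_' and '-'.
is_safe_path() {
  case $1 in
    /*) ;;            # full path: continue to the character check
    *) return 1 ;;    # relative or empty path: reject
  esac
  case $1 in
    *[!A-Za-z0-9/._-]*) return 1 ;;   # any other character: reject
  esac
  return 0
}

# Example usage:
#   is_safe_path "/staging/runs/Run_01" && echo "path looks safe"
```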
When running the analysis software over SSH, Illumina recommends using additional software, such as screen or tmux, to prevent unexpected termination of the analysis.
Wait for any running DRAGEN TruSight Oncology 500 Analysis Software containers to complete before launching a new analysis. Run the following command to generate a list of running containers: docker ps
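The wait can also be scripted. The following sketch polls docker ps -q, which prints only the IDs of running containers, until it returns nothing. It waits for every running container, not only those started by the analysis software, which is broader than strictly needed. The function name and the DOCKER_CMD and POLL_SECONDS variables are hypothetical and are not settings of the software.

```shell
# Hypothetical helper (not part of the analysis software): block until
# `docker ps -q` reports no running containers.
# DOCKER_CMD   - command to invoke (default: docker); illustrative only.
# POLL_SECONDS - seconds between checks (default: 60); illustrative only.
wait_for_no_containers() (
  docker_cmd=${DOCKER_CMD:-docker}
  poll_seconds=${POLL_SECONDS:-60}
  while [ -n "$("$docker_cmd" ps -q)" ]; do
    sleep "$poll_seconds"
  done
)

# Example usage:
#   wait_for_no_containers && echo "No containers running"
```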
Select from one of the following options:
Start from BCL files in the run folder with the sample sheet included in the run folder.
DRAGEN_TSO500-2.6.0.sh \
--runFolder /staging/{RunFolderName} \
--analysisFolder /staging/{AnalysisFolderName}
Start from BCL files in the run folder with the sample sheet located in a folder other than the run folder.
DRAGEN_TSO500-2.6.0.sh \
--runFolder /staging/{RunFolderName} \
--analysisFolder /staging/{AnalysisFolderName} \
--sampleSheet /staging/{SampleSheetName}.csv
Start from BCL files in the run folder with a different sample sheet and demultiplexing only.
DRAGEN_TSO500-2.6.0.sh \
--runFolder /staging/{RunFolderName} \
--analysisFolder /staging/{AnalysisFolderName} \
--sampleSheet /staging/{SampleSheetName}.csv \
--demultiplexOnly
Start from FASTQ with the sample sheet included in the FASTQ folder and with different resources and hash table folders.
DRAGEN_TSO500-2.6.0.sh \
--resourcesFolder /staging/illumina/DRAGEN_TSO500/resources \
--hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable \
--fastqFolder /staging/{FastqFolderName} \
--analysisFolder /staging/{AnalysisFolderName}
Start from FASTQ folder with sample sheet included in the FASTQ folder and subset of samples or pairs.
DRAGEN_TSO500-2.6.0.sh \
--fastqFolder /staging/{FastqFolderName} \
--analysisFolder /staging/{AnalysisFolderName} \
--sampleOrPairIDs "Pair_1,Pair_2"
If starting from BCL (*.bcl) files, DRAGEN TruSight Oncology 500 Analysis Software requires the run folder to contain certain files and folders. These inputs are required for Docker.
The run folder contains data from the sequencing run. Make sure that the folder contains the following files and folders:
The following inputs are required for running the DRAGEN TruSight Oncology 500 Analysis Software using FASTQ (*.fastq) files. The requirements apply to Docker.
Full path to an existing FASTQ folder.
The sample sheet is in the FASTQ folder path, or you can set the path to the sample sheet with the --sampleSheet override command-line option.
Make sure there is sufficient disk space for the analysis to complete. Refer to the --help command-line argument details for disk space requirements.
Use BCL Convert to produce FASTQ files for DRAGEN TruSight Oncology 500 Analysis Software. Using bcl2fastq does not produce the same results and is discouraged.
Make sure that BCL Convert is set to write UMI sequences to the read headers in the FASTQ files.
Store FASTQ files in individual subfolders that each correspond to a specific Sample_ID, and keep file pairs together in the same folder. Alternatively, store all FASTQ files in a single flat folder.
The DRAGEN TruSight Oncology 500 Analysis Software requires separate FASTQ files per sample. Do not merge FASTQ files.
The instrument generates two FASTQ files per sample for each flow cell lane, so a four-lane flow cell produces eight FASTQ files per sample.
Sample1_S1_L001_R1_001.fastq.gz
Sample1 represents the Sample ID.
The S in S1 means sample, and the 1 in S1 is based on the order of samples in the sample sheet, so S1 is the first sample.
L001 represents the flow cell lane number.
The R in R1 means Read, so R1 refers to Read 1.
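The naming convention above can be illustrated by splitting a file name into its parts. This is a minimal sketch that assumes every file name ends in _001.fastq.gz, as in the example, and it does not handle index read files. The function name parse_fastq_name and the output field names are hypothetical.

```shell
# Hypothetical helper (not part of the analysis software): print the Sample
# ID, sample number, lane, and read encoded in a FASTQ file name.
# Prints nothing if the name does not match the assumed pattern.
parse_fastq_name() {
  printf '%s\n' "$1" | sed -n -E \
    's/^(.+)_S([0-9]+)_L([0-9]{3})_(R[0-9])_001\.fastq\.gz$/sample_id=\1 sample_number=\2 lane=\3 read=\4/p'
}

# Example:
parse_fastq_name "Sample1_S1_L001_R1_001.fastq.gz"
# prints: sample_id=Sample1 sample_number=1 lane=001 read=R1
```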
Folder/File | Description |
---|
The FASTQ folder structure conforms to the folder structure in
Scatter/gather commands for each step:
Step | Command |
---|---|
Demultiplexing | DRAGEN_TSO500-2.6.0.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --runFolder /staging/{RunFolderName} --analysisFolder /staging/{DemultiplexAnalysisFolderName} --demultiplexOnly --sampleSheet /staging/illumina/{SampleSheetName} |
Analysis (one server) | DRAGEN_TSO500-2.6.0.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --fastqFolder /staging/{DemultiplexAnalysisFolderName}/Logs_Intermediates/FastqGeneration/ --analysisFolder /staging/{Node1AnalysisFolderName} --sampleSheet /staging/illumina/{SampleSheetName} --sampleOrPairIDs Pair_1,Pair_2 |
Analysis (additional servers) | DRAGEN_TSO500-2.6.0.sh --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --hashtableFolder /staging/illumina/DRAGEN_TSO500/ref_hashtable --fastqFolder /staging/{DemultiplexAnalysisFolderName}/Logs_Intermediates/FastqGeneration/ --analysisFolder /staging/{Node2AnalysisFolderName} --sampleSheet /staging/illumina/{SampleSheetName} --sampleOrPairIDs Pair_3 |
Gather | DRAGEN_TSO500-2.6.0.sh --analysisFolder /Gathered_Results --resourcesFolder /staging/illumina/DRAGEN_TSO500/resources --runFolder /staging/{RunFolderName}/ --sampleSheet /staging/illumina/{SampleSheetName} --gather /Demultiplex_Output /Node1_Output /Node2_Output |
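Run and analysis output sizes for each sequencing system:
Sequencing System | Run Folder Output (Gb) | Analysis Output (Gb) | Minimum Disk Space (Gb) |
---|---|---|---|
NextSeq 500/550/550Dx (RUO) HO flow cell | 32-55 | 82-85 | 150 |
NovaSeq 6000/6000Dx (RUO) SP Flow Cell | 85-100 | 250-374 | 300 |
NovaSeq 6000/6000Dx (RUO) S1 Flow Cell | 164-200 | 360-665 | 800 |
NovaSeq 6000/6000Dx (RUO) S2 Flow Cell | 290-460 | 890-1600 | 1500 |
NovaSeq 6000/6000Dx (RUO) S4 Flow Cell | 800-1200 | 2700-4100 | 3000 |
NovaSeq X 1.5B | 213 | 352 | 800 |
NovaSeq X 10B | 1100 | 1800 | 3000 |
NovaSeq X 25B | 1800 | 3300 | 4000 |
NextSeq 1000/2000 | 41 | 107 | 150 |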
Folder/File | Description |
---|---|
Config folder | Configuration files |
Data folder | *.bcl files |
Images folder | [Optional] Raw sequencing image files. |
Interop folder | Interop metric files. |
Logs folder | [Optional] Sequencing system log files. |
RTALogs folder | Real-Time Analysis (RTA) log files. |
RunInfo.xml file | Run information. |
RunParameters.xml file | Run parameters. |
SampleSheet.csv file | Sample information. If you want to use a sample sheet that is not in the run folder or a sample sheet named something other than |
Option | Required | Description |
---|---|---|
--help | No | Displays a help screen with available command line options. |
--analysisFolder | No | Path to the local analysis folder. The default location is /staging/DRAGEN_TSO500_2.6.0_Analysis_{timestamp}. If not using the default location, provide the full path to the local analysis folder. The folder must have sufficient space and must be on an NVMe SSD drive, for example the /staging directory on the DRAGEN server. Refer to the minimum disk space table for disk space requirements. |
--resourcesFolder | No | Path to the resource folder location. The default location is |
--runFolder | Yes | Required when |
--fastqFolder | Yes | Required when |
 | No | Optional for Docker. Specify the user ID to be used within the Docker container. |
 | No | Displays the version of the software. |
--sampleSheet | No | Provide the full path, including file name, if not provided as |
--sampleOrPairIDs | No | Provide the comma-delimited sample or pair IDs that should be processed on this node with no spaces. For example, |
--demultiplexOnly | No | Demultiplex to generate FASTQ only without additional analysis. |
--gather | No | Follow this option with any directories whose results should be gathered into a single Results folder. |
--hashtableFolder | No | Defaults to the DRAGEN hash table location created upon install. If not using the default location, enter the hash table location. |