Warning Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.
The DRAGEN ORA Helper Suite Software is a software suite for Linux distributions that transparently integrates compressed FASTQ.ORA files into secondary analysis in a local or remote environment.
In Illumina® DRAGEN™ Software v3.8 and later, the handling of FASTQ.ORA files is integrated within the system. DRAGEN ORA Helper Suite is intended for processing FASTQ.ORA files in environments where a secondary analysis is not performed on the DRAGEN Software or in environments that do not have access to DRAGEN v3.8 or later. If you are running secondary analysis pipelines on a DRAGEN v3.7 platform, only the oraFuse software is supported. Refer to oraFuse Software.
With the DRAGEN ORA Helper Suite, FASTQ.ORA files can be used:
without full decompression to the disk
with no overhead
with no change to the existing Linux command (*.fastq is used instead of *.fastq.ora as input)
The DRAGEN ORA Helper Suite Software package contains four software components. Three of them process FASTQ.ORA files; the fourth is an interactive user guide. You can use the FASTQ.ORA processing software in downstream analysis. The DRAGEN ORA Helper Suite Software v2.0 includes:
oraFuse v2.0
ora-ldpreload v2.0
orad v2.7.0
oraHelper
The oraFuse software uses the FUSE filesystem to create a virtual ora subdirectory in the current directory, with ora symbolic links pointing to a virtual uncompressed FASTQ.ORA file. The virtual FASTQ file does not impact the storage footprint.
Info Root privileges are required to install the oraFuse software. If you do not have root privileges, use the ora LD-Preload software or the orad software.
The ora LD-Preload software uses the LD_PRELOAD variable of the dynamic linker to intercept system calls. The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file. The virtual FASTQ files do not impact the storage footprint.
Info
The ora LD-Preload software only works with a dynamically compiled binary. If you do not know if the intended downstream bioinformatics software has been dynamically compiled, run the oraHelper command for guidance.
The auto-completion Linux feature does not work on virtual files.
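The dynamic-linking requirement above can also be checked by hand. The following is a minimal sketch and is not part of the DRAGEN ORA Helper Suite: the function name `is_dynamic` is hypothetical, and it assumes the standard `ldd` utility is installed and exits with a nonzero status for files that are not dynamically linked. Run `ldd` only on binaries you trust. oraHelper remains the documented way to get this guidance.

```shell
# Hypothetical helper: report whether a command on PATH is dynamically linked.
# Returns 0 if dynamic, 1 if not, 2 if no executable file is found for the name.
is_dynamic() {
    bin_path=$(command -v "$1") || return 2
    case "$bin_path" in
        /*) ;;                # resolved to a file on disk
        *) return 2 ;;        # shell builtin, alias, or function
    esac
    if ldd "$bin_path" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

For example, `is_dynamic bwa && echo "bwa is dynamically linked"` prints the message only when the check succeeds.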
orad is the DRAGEN ORA Decompression software. Although it does not operate in a fully transparent manner, it works in most situations. The orad software can be used in combination with a pipe or process substitution to reduce reads and writes to the disk.
oraHelper is an interactive user guide that provides the following guidance:
Which of the ORA Helper Suite software is the best fit.
The proper command line to use with each of the ORA Helper Suite software.
Information about the input file. For example, if a FASTQ.ORA file is a paired read interleaved file or which reference has been used to compress a FASTQ.ORA file.
If the input FASTQ.ORA file is an interleaved paired read file, it can be used as-is with downstream bioinformatics software that provides options to handle interleaved files. Make sure you run the downstream bioinformatics software with the proper interleaved options on the interleaved virtual FASTQ file.
In the following example, after the oraFuse software has been mounted, the bwa command is used with the -p option to specify that the input contains interleaved paired reads.
bwa mem reference.fasta -p <interleaved file>.fastq -o result.sam
To decompress an interleaved FASTQ.ORA file into two separate files, use the orad software.
orad <interleaved file>.fastq.ora
To decompress into a single file with interleaved paired reads, use the --interleave option.
In the following example, the orad software is used with the bwa command, and the --interleave option is used to decompress into a single stream containing interleaved data.
bwa mem reference.fasta -p <( orad -c --raw --interleave <interleaved file>.fastq.ora ) > result.sam
In a local environment, FASTQ.ORA files are accessible on the compute node via the file system. Cloud instances with local data, or data available via a network file system, are also considered local environments. Local environment examples include the following:
The compute node is a laptop and files are stored on the local hard disk.
The compute node is a cloud instance and files are copied to its local storage.
In a remote environment, FASTQ.ORA files are stored in the cloud on either AWS S3 or Azure Blob Storage.
The DRAGEN ORA Helper Suite Software reuses the AWS Command Line Interface (AWS CLI) configuration. The location of a file is passed by specifying the file name as follows:
s3://bucket/<file name>
To make sure that the AWS CLI is configured to access your file, use the following command:
$ aws s3 ls s3://bucket/<remote file name>.fastq.ora
Info Remote use on AWS S3 with DRAGEN v3.7 requires a specific syntax. Refer to oraFuse Software for more information.
The DRAGEN ORA Helper Suite Software reuses the URI set up for your Azure Blob Storage. The location of a file is passed by specifying the file name as follows:
https://StorageAccountName.blob.core.windows.net/<my container>/<my blob>.fastq.ora
To make sure that the Azure CLI is configured to access your blob file, use the following command:
$ export AZURE_STORAGE_CONNECTION_STRING=<yourURI>
$ az storage blob list --container-name <myContainer>
Info Remote case on Azure Blob Storage is not supported on DRAGEN v3.7.
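The local and remote location forms described above can be told apart by their prefix. The following sketch uses a hypothetical function name, `ora_location_type`. It only pattern-matches the string and does not contact AWS or Azure or check that the file exists.

```shell
# Hypothetical helper: classify a FASTQ.ORA location by its form.
ora_location_type() {
    case "$1" in
        s3://*)                             echo "aws-s3" ;;      # s3://bucket/<file name>
        https://*.blob.core.windows.net/*)  echo "azure-blob" ;;  # Azure Blob Storage URL
        *)                                  echo "local" ;;       # anything else is a file system path
    esac
}
```

A wrapper script could use the result to decide whether AWS CLI or Azure credentials need to be configured before calling the ORA Helper Suite software.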
The oraHelper interactive user guide helps you to determine which of the DRAGEN ORA Helper Suite software to use with your intended downstream bioinformatics software. The oraHelper interactive user guide also displays the correct syntax to use.
Info To get the relevant information about the intended downstream bioinformatics software, the bioinformatics software must be in the PATH variable.
Use the oraHelper interactive user guide as follows:
1. Open a terminal window.
2. Add the oraHelperSuite directory to your PATH as follows:
PATH=<oraHelperSuite Directory>:$PATH
3. Enter oraHelper followed by the intended downstream software command and the input *.fastq.ora file name.
$ oraHelper <command> <file name>.fastq.ora
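The PATH assignment in step 2 only affects the current shell session. As a minimal sketch, the hypothetical function `add_to_path` below prepends a directory only if it is not already listed, so it can be called repeatedly, for example from a shell startup file. The directory in the usage note is a placeholder.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already present.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already listed, nothing to do
        *) PATH="$1:$PATH" ;;      # prepend so this directory is searched first
    esac
    export PATH
}
```

For example, `add_to_path "$HOME/oraHelperSuite"` makes `oraHelper` available if the suite was installed in that directory.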
The following example shows oraHelper used with the bwa command.
$ oraHelper bwa -i <filename>.fastq.ora
The output will print the proper syntax for each of the DRAGEN ORA Helper Suite software.
The following example shows the output of oraHelper used with the head command.
$ oraHelper head <file name>.fastq.ora
The oraFuse software requires root privileges during installation. Refer to Installation Requirements.
The oraFuse software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.
Info Make sure you indicate a *.fastq file extension as input.
1. The oraFuse software runs with the following dependencies: fuse, fuse-libs, libcurl, and openssl. If the dependencies were not installed during the DRAGEN ORA Helper Suite Software installation, run the following command.
$ sudo yum install fuse fuse-libs libcurl openssl
2. Add the oraHelperSuite directory to your PATH as follows.
PATH=<oraHelperSuite DIR>:$PATH
3. Enter the following command to mount the oraFuse software to the directory where the FASTQ.ORA files are located.
$ oraFuse
4. Run the desired command on the virtual *.fastq file.
5. When finished, enter the following command to unmount the oraFuse software from the directory.
$ oraFuse --unmount
The following examples show the results of the oraFuse software with the ls command.
Before oraFuse has been mounted to the current directory:
r10K_1.fastq.ora
After oraFuse has been mounted to the directory:
https: r10K_1.fastq r10K_1.fastq.ora s3:
After oraFuse has been mounted to the current directory with ls -l:
The following example shows the head command on a virtual FASTQ file after oraFuse has been mounted.
$ head <file name>.fastq
The following example shows the bwa command on a virtual FASTQ file after oraFuse has been mounted.
$ bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq
The oraFuse software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The AWS and Azure Blob Storage account configurations and credentials are used for authentication. Refer to the Remote and Local Usage section.
The steps to use the oraFuse software in the remote environment are the same as those used in the local environment.
Info Azure Blob Storage is not supported if you are using DRAGEN v3.7 as the downstream software.
The following command shows the remote and virtual files on S3.
$ ls -l s3://bucket/<path>
The following example runs BWA on S3 files.
$ bwa mem -t 8 -M <FASTA> -o <SAM> s3://bucket/path/<file name>.fastq
When using DRAGEN v3.7 as the downstream software, you can use oraFuse in the remote environment if you pass the file location as follows:
.virtual/ora/s3:/bucket/<file name>.fastq
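The mapping from an S3 URI to the path form above can be sketched with shell parameter expansion. The function name `s3_to_virtual_path` is hypothetical, and the sketch only rewrites the string. It assumes the input follows the `s3://bucket/<file name>.fastq.ora` form. Whether nested key prefixes map the same way is an assumption.

```shell
# Hypothetical helper: map an S3 URI of a FASTQ.ORA file to the path form
# shown above for DRAGEN v3.7 (.virtual/ora/s3:/bucket/<file name>.fastq).
s3_to_virtual_path() {
    case "$1" in
        s3://*.fastq.ora) ;;
        *) echo "expected s3://<bucket>/<name>.fastq.ora" >&2; return 1 ;;
    esac
    rest=${1#s3://}       # strip the scheme: bucket/<file name>.fastq.ora
    rest=${rest%.ora}     # virtual files use the *.fastq extension
    printf '.virtual/ora/s3:/%s\n' "$rest"
}
```

For example, `s3_to_virtual_path s3://bucket/sample.fastq.ora` prints `.virtual/ora/s3:/bucket/sample.fastq`.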
The following is an example of using DRAGEN v3.7 on a single FASTQ.ORA located on AWS S3:
Info Multiple users of the same file are not supported.
If you receive an error message while using the oraFuse software, use the following command to get more information.
$ cat .virtual/ora/.error
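In a script, the same check can be wrapped so that a missing or empty error file is not reported as a failure of `cat`. This is a sketch under assumptions: the function name `show_orafuse_error` is hypothetical, the `.virtual/ora/.error` path is taken relative to the directory where oraFuse was mounted, and an absent or empty file is treated as "no error recorded".

```shell
# Hypothetical helper: print the oraFuse error file of a directory, if any.
# Returns 0 when an error message was printed, 1 when there is none.
show_orafuse_error() {
    err_file="${1:-.}/.virtual/ora/.error"   # default to the current directory
    if [ -s "$err_file" ]; then              # -s: file exists and is not empty
        cat "$err_file"
        return 0
    fi
    return 1
}
```

For example, `show_orafuse_error /data/run1 || echo "no oraFuse error recorded"`, where `/data/run1` is a placeholder directory.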
The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.
Info Make sure you indicate a *.fastq file extension as input.
1. Add the oraHelperSuite directory to your PATH as follows.
PATH=<oraHelperSuite Directory>:$PATH
2. Run the command with the ora LD-Preload shared library on the *.fastq file as follows.
$ LD_PRELOAD=<oraHelperSuite directory>/ora-ldpreload.so <command> <input file name>.fastq
The following is an example of the ora LD-Preload software with the 'bwa' command.
$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq
The following is an example of the ora LD-Preload software with the ls command.
$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so ls
The following shows the output of the ora LD-Preload software with the ls command.
<file name>.fastq.ora <file name>.fastq (virtual file)
The ora LD-Preload software works on FASTQ.ORA files located in AWS S3 or Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.
The steps to use the ora LD-Preload software in the remote environment are the same as those used in the local environment. The features are the same, with the following exceptions:
In the remote environment, the ora LD-Preload software does not work on statically compiled bioinformatics software.
In the remote environment, the Linux shell auto-completion feature does not work on virtual files.
orad is the executable of the DRAGEN ORA decompression software. It uses a pipe or process substitution to reduce reads and writes to the disk. If the downstream bioinformatics software does not work with pipes or process substitution, and neither the oraFuse software nor the ora LD-Preload software can be used, then fully decompressed temporary files are required. Refer to the Troubleshooting section for more information.
The steps are the same for local and remote environments.
1. Add the oraHelperSuite directory to your PATH as follows.
PATH=<oraHelperSuite DIR>:$PATH
2. Enter the command with orad and the pipe or process substitution on the *.fastq.ora files. See the following examples.
Example with pipe:
When the downstream command or bioinformatics software can read from the standard input, the pipe process is used. The following is an example with the head command. The -c option decompresses to standard output.
$ orad <file name>.fastq.ora -c --raw | head
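The pipe pattern above extends to any tool that reads standard input. As a minimal sketch, the hypothetical function `count_reads` counts the reads in a FASTQ.ORA file without writing a decompressed copy to disk. It assumes four lines per FASTQ record (no wrapped sequence lines). `ORAD_BIN` is a hypothetical variable, not an orad feature, that lets the command name be overridden and defaults to `orad` on the PATH.

```shell
# Count the reads in a FASTQ.ORA file by streaming the decompressed data.
# ORAD_BIN is a hypothetical override; it defaults to the orad executable on PATH.
count_reads() {
    # -c --raw: decompress to standard output, as in the head example above
    "${ORAD_BIN:-orad}" "$1" -c --raw | awk 'END { print NR / 4 }'
}
```

For example, `count_reads <file name>.fastq.ora` prints the number of records in the file.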
Example with process substitution:
When the downstream bioinformatics software cannot read from the standard input, for example md5sum, process substitution is used. The following example shows the command with process substitution.
$ md5sum <( orad -c --raw "<file name>.fastq.ora" )
The orad software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.
Command | Required | Description |
---|---|---|
-v --version | No | Print the version of the DRAGEN ORA Helper Suite Software along with the versions of the orad, oraFuse, and ora LD-Preload software. |
--unmount | No | Unmount oraFuse from the directory. |
With DRAGEN v4.3, compression of FASTQ files derived from supported non-human species and from human bisulfite data (methylated DNA application) is available. DRAGEN ORA Helper Suite handles such files transparently, provided that the correct reference has been downloaded and is available locally. Refer to the sections Install the DRAGEN ORA Decompression References and Supported References.
The command oraHelper <file.fastq.ora> provides information on:
which reference was used to compress the file
the steps to follow to install the correct reference. Note: the default human reference refbin is installed with the Software and does not need to be downloaded again.
The following information provides the requirements and steps to install the DRAGEN ORA Helper Suite Software.
The following table lists the minimum requirements for the DRAGEN ORA Helper Suite Software.
Component | Minimum Requirement |
---|---|
Info The oraFuse software uses a FUSE file system. Because the FUSE file system requires root privileges, the oraFuse software requires root privileges during installation.
Download the installer from the ORA Support Site, and follow the installation steps below to install the DRAGEN ORA Helper Suite Software.
Installation Steps:
1. Open a terminal window.
2. If the installation is for an online system, then move the installer to the directory where the software should be installed. If the installation is for an offline system, then the installer does not need to be moved.
3. Navigate to the installer location.
4. Run the following command:
sh <file name>.sh
5. To install oraFuse, install the dependencies when prompted. The following dependencies will automatically be installed on an online system:
Info If the installation is for an offline system, then make sure the dependencies are installed on the offline system.
The installer creates a folder named oraHelperSuite. If the installation is for an offline system, then move the oraHelperSuite folder to the offline system. The oraHelperSuite folder contains the following:
orad executable
oraFuse executable
ora-ldpreload.so library
oraHelper wrapper executable
the human reference genome used to decompress: refbin. The pre-installed refbin is the default human reference. If the FASTQ.ORA file has been compressed with a different reference, make sure the correct reference is available locally or download it from the ORA Support Site.
To decompress FASTQ.ORA files derived from specific species or models, download the specific reference from the ORA Support Site. The whole reference database can also be downloaded.
Info For default human reference, there is no need to add this extra step. The default human reference is already included in the DRAGEN ORA Decompression Software.
1. Move the downloaded archive to the oraHelperSuite directory
mv yourdownloadpath/<genus_specificname>.tar.gz yourpath/oraHelperSuite
2. Navigate to the oraHelperSuite directory
cd yourpath/oraHelperSuite
3. Extract the archive file using the following command:
When the extraction of the specific reference is completed, the oraHelperSuite folder on a Linux environment should be structured as follows, shown here for the Gallus_gallus reference:
oraHelperSuite
|__orad
|__orafuse
|__ora-ldpreload.so
|__oraHelper
|__refbin
|__Gallus_gallus
   |__refbin
   |__readme_gallus_gallus
1. Move the downloaded archive to the oraHelperSuite directory
mv yourdownloadpath/oradata.tar.gz yourpath/oraHelperSuite
2. Navigate to the oraHelperSuite directory
cd yourpath/oraHelperSuite
3. Extract the archive file using the following command:
When the extraction of the full database is completed, the oraHelperSuite folder on a Linux environment should be structured as follows:
oraHelperSuite
|__orad
|__orafuse
|__ora-ldpreload.so
|__oraHelper
|__refbin
|__oradata
   |__Homo_sapiens
      |__refbin
      |__readme_homo_sapiens
   |__Homo_sapiens_bisulfite
      |__refbin
      |__readme_homo_sapiens_bisulfite
   |__Genus_specificname
      |__refbin
      |__readme_Genus_specificname
Component | Minimum Requirement |
---|---|
System memory | 8 GB RAM |
Free disk space | From 2 GB to 15 GB, depending on how many specific references are needed for decompression. Specific references are available for download from the ORA Support Site. For human data, 2 GB is required. The default human reference is pre-installed with the Software and does not need to be downloaded. |
Compatible Linux distributions | CentOS 7 or later, Ubuntu 14 or later, Oracle 8 or later |
Internet connection | An internet connection is required to install the DRAGEN ORA Helper Suite. Once installed, the internet connection is not required unless the DRAGEN ORA Helper Suite is used in a remote environment. |

Distribution | Dependencies |
---|---|
CentOS | fuse, fuse-libs, libcurl, openssl |
Ubuntu | fuse, openssl, libcurl3-gnutls |
Oracle | fuse, fuse-libs, fuse-devel, libcurl |
You can use the oraHelper interactive user guide for assistance with common Linux commands, and to help troubleshoot errors encountered when using the different software included in the DRAGEN ORA Helper Suite. For details, refer to the oraHelper Interactive User Guide section.
The following table shows common Linux commands that you can use with the DRAGEN ORA Helper Suite software. This list is not exhaustive.
Command | orad | oraFuse | ora LD-Preload |
---|---|---|---|
ls | Not applicable | Yes | Yes |
cat | Yes | Yes | Yes |
head | Yes | Yes | Yes |
md5sum | Yes | Yes | Yes |
grep | Yes | Yes | No |
cp | Copy *.fastq.ora | Copy uncompressed FASTQ or *.fastq.ora | No (use the cat command) |
awk | Yes | Yes | Yes |
tail | Yes | Yes | Yes |
wc | Yes | Yes | Yes |
uniq | Yes | Yes | No |
string | Yes | Yes | Yes |
sort | Yes | Yes | No |

The DRAGEN ORA Helper Suite software works with some commonly used bioinformatics software. If the bioinformatics software is supported, you can use the oraHelper interactive user guide to help troubleshoot errors with your command. The following table lists the bioinformatics software supported with the DRAGEN ORA Helper Suite. The list does not limit the use of other third-party software.
Bioinformatics software | orad | oraFuse | ora LD-Preload |
---|---|---|---|
BWA v0.7.17 | Yes | Yes | Yes |
Bowtie2 v2.4.2 | Yes | Yes | Yes |
STAR v2.7.8a | Yes | Yes | Yes |
Minimap2 v2.17 | Yes | Yes | Yes |
DRAGMAP v1.3 | Yes | Yes | No |
FASTQC v0.11.9 | Use standard input | Yes | Use standard input |
Cutadapt v3.7 | Yes | Yes | No |
SPAdes v3.15.4 | Yes | Yes | No |
LAST v1.25.7 | Yes | Yes | Yes |
Seqkit v0.16.1 | Yes | Yes | No |
Seqtk v1.3 | Yes | Yes | Yes |
Fastp v0.23.1 | Yes | Yes | No |