DRAGEN ORA Helper Suite v2.0

Warning Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.

The DRAGEN ORA Helper Suite Software is a suite of software for Linux distributions, designed to integrate compressed FASTQ.ORA files transparently into secondary analysis in a local or remote environment.

In Illumina® DRAGEN™ Software v3.8 and later, the handling of FASTQ.ORA files is integrated within the system. DRAGEN ORA Helper Suite is intended for processing FASTQ.ORA files in environments where a secondary analysis is not performed on the DRAGEN Software or in environments that do not have access to DRAGEN v3.8 or later. If you are running secondary analysis pipelines on a DRAGEN v3.7 platform, only the oraFuse software is supported. Refer to oraFuse Software.

With the DRAGEN ORA Helper Suite, FASTQ.ORA files can be used:

without full decompression to the disk
with no overhead
with no change to the existing Linux command (*.fastq is used instead of *.fastq.ora as input)
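
The last point means that a downstream command is given the name of the virtual file, which is the FASTQ.ORA name without its .ora suffix. The following is a minimal sketch of that mapping, assuming a POSIX shell. The function name virtual_fastq_name is hypothetical and is not part of the suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DRAGEN ORA Helper Suite):
# derive the virtual FASTQ name that replaces a *.fastq.ora input.
virtual_fastq_name() {
    case "$1" in
        *.fastq.ora) printf '%s\n' "${1%.ora}" ;;   # strip only the trailing .ora
        *)           printf '%s\n' "$1" ;;          # leave other names unchanged
    esac
}

virtual_fastq_name "r10K_1.fastq.ora"   # prints r10K_1.fastq
```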

Software quick overview

The DRAGEN ORA Helper Suite Software package contains four software tools. Three of them process FASTQ.ORA files; the fourth is an interactive user guide. You can use the FASTQ.ORA processing software in downstream analysis. The DRAGEN ORA Helper Suite Software v2.0 includes:

oraFuse v2.0
ora-ldpreload v2.0
orad v2.7.0
oraHelper

oraFuse Software

The oraFuse software uses the FUSE file system to create a virtual ora subdirectory in the current directory, with symbolic links pointing to a virtual uncompressed FASTQ file for each FASTQ.ORA file. The virtual FASTQ file does not impact the storage footprint.

Info Root privileges are required to install the oraFuse software. If you do not have root privileges, use the ora LD-Preload software or the orad software.

ora LD-Preload Software

The ora LD-Preload software uses the LD_PRELOAD variable of the dynamic linker to intercept system calls. The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file. The virtual FASTQ files do not impact the storage footprint.

Info

The ora LD-Preload software only works with a dynamically compiled binary. If you do not know if the intended downstream bioinformatics software has been dynamically compiled, run the oraHelper command for guidance.
The auto-completion Linux feature does not work on virtual files.
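
oraHelper is the suite's own way to check whether a tool is dynamically compiled. As a rough independent pre-check, the sketch below uses ldd, which on glibc-based Linux systems exits with a non-zero status for static executables. The function name is hypothetical, bwa is only an example, and a tool that is started through a wrapper script is reported as not dynamic even if the underlying binary is.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of the suite): report whether a program
# found on PATH is dynamically linked, which LD_PRELOAD interception requires.
is_dynamically_linked() {
    bin_path=$(command -v "$1" 2>/dev/null) || return 2   # 2: program not found
    [ -f "$bin_path" ] || return 2                        # builtins and functions have no file
    # Assumption: glibc ldd, which fails for static executables and scripts.
    ldd "$bin_path" >/dev/null 2>&1
}

if is_dynamically_linked bwa; then
    echo "bwa is dynamically linked: ora LD-Preload is a candidate"
else
    echo "bwa is static, a script, or not found: consider oraFuse or orad"
fi
```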

orad Software

orad is the DRAGEN ORA Decompression software. Although it does not operate in a fully transparent manner, it works in most situations. The orad software can be combined with a pipe or with process substitution to reduce reads and writes to the disk.

oraHelper Interactive User Guide

oraHelper is an interactive user guide that provides the following guidance:

Which of the ORA Helper Suite software is the best fit.
The proper command line to use with each of the ORA Helper Suite software.
Information about the input file. For example, whether a FASTQ.ORA file is an interleaved paired-read file, or which reference was used to compress a FASTQ.ORA file.

Remote and Local Usage

Local vs. remote environment

In a local environment, FASTQ.ORA files are accessible on the compute node via the file system. Cloud instances with local data, or data available via a network file system, are also considered local environments. Local environment examples include the following:

The compute node is a laptop and files are stored on the local hard disk.
The compute node is a cloud instance and files are copied to its local storage.

In a remote environment, FASTQ.ORA files are stored in the cloud on either AWS S3 or Azure Blob Storage.

Remote using AWS S3

The DRAGEN ORA Helper Suite Software reuses the AWS Command Line Interface (AWS CLI) configuration. The location of a file is passed by specifying the file name as follows:

s3://bucket/<file name>

To make sure that the AWS CLI is configured to access your file, use the following command:

$ aws s3 ls s3://bucket/<remote file name>.fastq.ora

Info Remote case on AWS S3 with DRAGEN v3.7 uses specific syntax. Refer to oraFuse Software for more information.

Remote Using Azure Blob Storage

The DRAGEN ORA Helper Suite Software reuses the URI set up for your Azure Blob Storage. The location of a file is passed by specifying the file name as follows:

https://StorageAccountName.blob.core.windows.net/<my container>/<my blob>.fastq.ora

To make sure that the Azure CLI is configured to access your blob file, use the following command:

$ export AZURE_STORAGE_CONNECTION_STRING=<yourURI>
$ az storage blob list --container-name <myContainer>

Info Remote case on Azure Blob Storage is not supported on DRAGEN v3.7.
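
The three kinds of location described in this section (local path, AWS S3 URI, Azure Blob Storage URL) can be told apart by their prefix. The following sketch shows one way to do that in a wrapper script. The function name and the returned labels are hypothetical; only the URI forms come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: classify an input location by the URI forms
# described in this section.
classify_location() {
    case "$1" in
        s3://*)                            echo "aws-s3" ;;
        https://*.blob.core.windows.net/*) echo "azure-blob" ;;
        *)                                 echo "local" ;;
    esac
}

for location in "s3://bucket/sample.fastq.ora" "/data/sample.fastq.ora"; do
    printf '%s -> %s\n' "$location" "$(classify_location "$location")"
done
```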

Installation Software and References

The following information provides the requirements and steps to install the DRAGEN ORA Helper Suite Software.

Installation Requirements

The following table lists the minimum requirements for the DRAGEN ORA Helper Suite Software.

Component

Minimum Requirement

Info The oraFuse software uses a FUSE file system. Because the FUSE file system requires root privileges, the oraFuse software requires root privileges during installation.

Install the DRAGEN ORA Helper Suite Software

Download the installer from the ORA Support Site, and follow the installation steps below to install the DRAGEN ORA Helper Suite Software. Installation Steps:

1. Open a terminal window.

2. If the installation is for an online system, then move the installer to the directory where the software should be installed. If the installation is for an offline system, then the installer does not need to be moved.

3. Navigate to the installer location.

4. Run the following command:

sh <file name>.sh

5. To install oraFuse, install the dependencies when prompted. The following dependencies will automatically be installed on an online system:

Info If the installation is for an offline system, then make sure the dependencies are installed on the offline system.

The installer creates a folder named oraHelperSuite. If the installation is for an offline system, then move the oraHelperSuite folder to the offline system. The oraHelperSuite folder contains the following:

orad executable
oraFuse executable
ora-ldpreload.so library
oraHelper wrapper executable
the human reference genome used to decompress: refbin. The pre-installed refbin is the default human reference. If the FASTQ.ORA file has been compressed with a different reference, make sure the correct reference is available locally, or download it from the ORA Support Site.
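
The list above can be turned into a quick post-install check. The sketch below is hypothetical: the entry names are assumptions derived from the component list (in particular, that the reference is an entry named refbin directly inside the folder), so adjust them to match the actual installation.

```shell
#!/bin/sh
# Hypothetical post-install check. The entry names are assumptions taken
# from the component list; they are not confirmed file names.
check_ora_suite() {
    suite_dir=$1
    missing=0
    for entry in orad oraFuse ora-ldpreload.so oraHelper refbin; do
        if [ ! -e "$suite_dir/$entry" ]; then
            echo "missing: $suite_dir/$entry" >&2
            missing=1
        fi
    done
    return "$missing"    # 0 when every entry exists, 1 otherwise
}

check_ora_suite "./oraHelperSuite" || echo "installation looks incomplete"
```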

Install the DRAGEN ORA Decompression References

To decompress FASTQ.ORA files derived from specific species or models, download the specific reference from the ORA Support Site. The whole reference database can also be downloaded.

Info For default human reference, there is no need to add this extra step. The default human reference is already included in the DRAGEN ORA Decompression Software.

Extract a specific reference

1. Move the downloaded archive to the oraHelperSuite directory

mv yourdownloadpath/<genus_specificname>.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command:

tar -xzvf <genus_specificname>.tar.gz (Linux)
tar -xzvf <genus_specificname>.tar.gz (Mac)
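
The three steps above (move, navigate, extract) can be wrapped in one function so that they are applied the same way for every downloaded reference archive. This is a sketch only; the function name is hypothetical and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper around the three steps above.
# Arguments: <downloaded archive> <oraHelperSuite directory>
# The body runs in a subshell, so cd does not change the caller's directory.
install_ora_reference() (
    archive=$1
    suite_dir=$2
    [ -f "$archive" ]   || { echo "archive not found: $archive" >&2; exit 1; }
    [ -d "$suite_dir" ] || { echo "directory not found: $suite_dir" >&2; exit 1; }
    mv "$archive" "$suite_dir"/ || exit 1      # step 1: move the archive
    cd "$suite_dir" || exit 1                  # step 2: navigate
    tar -xzvf "$(basename "$archive")"         # step 3: extract
)

# Usage (placeholder paths):
#   install_ora_reference yourdownloadpath/<genus_specificname>.tar.gz yourpath/oraHelperSuite
```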

When the extraction of the specific reference is complete, the oraHelperSuite folder in a Linux environment should be structured as follows (example with the gallus_gallus reference):

Extract the full database

1. Move the downloaded archive to the oraHelperSuite directory

mv yourdownloadpath/oradata.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command:

tar -xzvf oradata.tar.gz (Linux)
tar -xzvf oradata.tar.gz (Mac)

When the extraction of the full database is complete, the oraHelperSuite folder in a Linux environment should be structured as follows:

Commands for the ORA Helper Suite software

oraHelper Interactive User Guide

The oraHelper interactive user guide helps you to determine which of the DRAGEN ORA Helper Suite software to use with your intended downstream bioinformatics software. The oraHelper interactive user guide also displays the correct syntax to use.

Info To get the relevant information about the intended downstream bioinformatics software, the bioinformatics software must be in the PATH variable.

Use the oraHelper interactive user guide as follows:

1. Open a terminal window.

2. Add the oraHelperSuite directory to your PATH as follows:

PATH=<oraHelperSuite Directory>:$PATH

3. Enter oraHelper followed by the intended downstream software command and the input *.fastq.ora file name.

$ oraHelper <command> <file name>.fastq.ora
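
The PATH assignment in step 2 adds the directory again every time it is run. If the line is placed in a shell startup file, a guarded version avoids duplicate entries. The following is a minimal sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed, so that repeated calls leave PATH unchanged.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present: nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Usage (placeholder path):
#   prepend_to_path "<oraHelperSuite Directory>"
```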

Examples

The following example shows oraHelper used with the bwa command.

$ oraHelper bwa -i <filename>.fastq.ora

The output will print the proper syntax for each of the DRAGEN ORA Helper Suite software.

The following example shows oraHelper used with the head command.

$ oraHelper head <file name>.fastq.ora

The following example shows the output of oraHelper used with the head command.

Command line Options

oraFuse Software

The oraFuse software requires root privileges during installation. Refer to Installation Requirements.

The oraFuse software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.

Local Environment

1. The oraFuse software runs with the following dependencies: fuse, fuse-libs, libcurl, and openssl. If the dependencies were not installed during the DRAGEN ORA Helper Suite Software installation, run the following command.

$ sudo yum install fuse fuse-libs libcurl openssl

2. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

3. Enter the following command to mount the oraFuse software to the directory where the FASTQ.ORA files are located.

$ oraFuse

4. Run the desired command on the virtual *.fastq file.

5. When finished, enter the following command to unmount the oraFuse software from the directory.

$ oraFuse --unmount
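
Steps 3 to 5 can be combined so that the unmount is not forgotten when the command in step 4 fails. The sketch below is a hypothetical wrapper that uses only the two oraFuse invocations shown above (oraFuse and oraFuse --unmount). It assumes that oraFuse is on the PATH and that it mounts the current directory, as described in step 3.

```shell
#!/bin/sh
# Hypothetical wrapper: mount oraFuse in a directory, run one command on the
# virtual *.fastq files, then unmount, whether or not the command succeeded.
# Arguments: <directory with FASTQ.ORA files> <command> [arguments...]
# The body runs in a subshell, so cd does not affect the caller.
with_orafuse() (
    dir=$1
    shift
    cd "$dir" || exit 1
    oraFuse || exit 1            # step 3: mount
    status=0
    "$@" || status=$?            # step 4: run the command, remember its status
    oraFuse --unmount            # step 5: unmount in every case
    exit "$status"
)

# Usage (placeholder names):
#   with_orafuse /data/run1 head "<file name>.fastq"
```

The wrapper returns the exit status of the wrapped command, so it can be used in scripts that stop on the first error.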

Examples

The following examples show the results of the oraFuse software with the ls command.

Before oraFuse has been mounted to the current directory:

r10K_1.fastq.ora

After oraFuse has been mounted to the directory:

https: r10K_1.fastq r10K_1.fastq.ora s3:

After oraFuse has been mounted to the current directory with ls -l:

The following example shows the head command on a virtual FASTQ file after oraFuse has been mounted.

$ head <file name>.fastq

The following example shows the bwa command on a virtual FASTQ file after oraFuse has been mounted.

$ bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

Remote Environment

The oraFuse software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The AWS and Azure Blob Storage account configurations and credentials are used for authentication. Refer to the Remote and Local Usage section.

The steps to use the oraFuse software in the remote environment are the same as those used in the local environment.

Info Azure Blob Storage is not supported if you are using DRAGEN v3.7 as downstream software.

The following command shows the remote and virtual files on S3.

$ ls s3 -l s3://bucket/<path>

Examples

The following example runs BWA on S3 files.

$ bwa mem -t 8 -M <FASTA> -o <SAM> s3://bucket/path/<file name>.fastq

When using DRAGEN v3.7 as downstream software, you can use oraFuse in the remote environment if the location of the file is specified as follows:

.virtual/ora/s3:/bucket/<file name>.fastq

The following is an example of using DRAGEN v3.7 on a single FASTQ.ORA located on AWS S3:

$ dragen -f \
--ref-dir=<path to hash table directory on /ephemeral> \
--1 .virtual/ora/s3:/bucket/<remote file>.fastq \
--output-directory=<path to output directory on /ephemeral> \
--output-file-prefix=<prefix name> \
--output-format BAM
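
The DRAGEN v3.7 syntax differs from a regular S3 URI in two ways: the .virtual/ora/ prefix and the single slash after s3:. The following sketch derives that form from an s3:// URI. The function name is hypothetical, and applying the same rule to keys that contain subdirectories is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: rewrite s3://bucket/<key>.fastq.ora into the
# .virtual/ora/s3:/bucket/<key>.fastq form shown above for DRAGEN v3.7.
dragen37_virtual_path() {
    case "$1" in
        s3://*) ;;
        *) echo "not an S3 URI: $1" >&2; return 1 ;;
    esac
    rest=${1#s3://}       # bucket/<key>.fastq.ora
    rest=${rest%.ora}     # bucket/<key>.fastq
    printf '.virtual/ora/s3:/%s\n' "$rest"
}

dragen37_virtual_path "s3://bucket/sample_R1.fastq.ora"
# prints .virtual/ora/s3:/bucket/sample_R1.fastq
```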

Command Line Options

Info Multiple users of the same file are not supported.

Error Messages

If you receive an error message while using the oraFuse software, use the following command to get more information.

$ cat .virtual/ora/.error
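
In a script, the same check can be made conditional so that it only prints when an error has actually been recorded. This is a sketch under the assumption that the .virtual directory is located in the directory where oraFuse was mounted; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the oraFuse error file of a mount directory
# (default: current directory) if it exists and is not empty.
show_orafuse_error() {
    error_file="${1:-.}/.virtual/ora/.error"
    if [ -s "$error_file" ]; then
        cat "$error_file"
    else
        echo "no oraFuse error recorded in ${1:-.}" >&2
        return 1
    fi
}

# Usage:
#   show_orafuse_error /data/run1
```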

ora LD-Preload Software

The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite Directory>:$PATH

2. Run the command with the ora LD-Preload shared library on the *.fastq file as follows.

$ LD_PRELOAD=<oraHelperSuite directory>/ora-ldpreload.so <command> <input file name>.fastq

Examples

The following is an example of the ora LD-Preload software with the 'bwa' command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

The following is an example of the ora LD-Preload software with the ls command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so ls

The following shows the output of the ora LD-Preload software with the ls command.

<file name>.fastq.ora <file name>.fastq (virtual file)
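
If the path to the library is wrong, the dynamic linker typically only prints a warning and runs the command without the virtual files. The sketch below checks for the library first and then sets LD_PRELOAD for that one command only. The function name is hypothetical; the library file name is the one used in the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with the ora LD-Preload library,
# after checking that the library file exists.
# Arguments: <oraHelperSuite directory> <command> [arguments...]
ora_preload_run() {
    ora_lib="$1/ora-ldpreload.so"
    shift
    if [ ! -f "$ora_lib" ]; then
        echo "library not found: $ora_lib" >&2
        return 1
    fi
    # env sets the variable for this single command, not for the whole shell.
    env LD_PRELOAD="$ora_lib" "$@"
}

# Usage (placeholder names):
#   ora_preload_run "<oraHelperSuite DIR>" head "<file name>.fastq"
```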

Remote Environment

The ora LD-Preload software works on FASTQ.ORA files located in AWS S3 or Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.

The steps to use the ora LD-Preload software in the remote environment are the same as those used in the local environment. The features are the same, with the following exceptions:

In the remote environment, the ora LD-Preload software does not work on statically compiled bioinformatics software.
In the remote environment, the Linux shell auto-completion feature does not work on virtual files.

orad Software

orad is the executable of the DRAGEN ORA decompression software. It is used with a pipe or with process substitution to reduce reads and writes to the disk. If the downstream bioinformatics software does not work with pipes or process substitution, and neither the oraFuse software nor the ora LD-Preload software can be used, then fully decompressed temporary files are required. Refer to the Troubleshooting section for more information.

The steps are the same for local and remote environments.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

2. Enter the command with orad and the pipe or process substitution on the *.fastq.ora files. See the following examples.

Examples

Example with pipe:

When the downstream command or bioinformatics software can read from the standard input, a pipe is used. The following is an example with the head command. The -c option decompresses to standard output.

$ orad <file name>.fastq.ora -c --raw | head

Example with process substitution:

When the downstream bioinformatics software cannot read from the standard input, for example md5sum, process substitution is used. The following example shows the command with process substitution.

$ md5sum <( orad -c --raw "<file name>.fastq.ora" )
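
Process substitution is a feature of bash, zsh, and ksh. Where only a POSIX shell is available, a named pipe gives the downstream tool a file path to read in the same way, still without writing the decompressed data to the disk. The following is a minimal sketch with md5sum as the consumer. The function name is hypothetical, and the orad invocation in the usage comment is the one from the example above.

```shell
#!/bin/sh
# Hypothetical helper: checksum the output of a producer command through a
# named pipe, as a POSIX-shell alternative to  md5sum <( producer ).
# Arguments: <producer command> [arguments...]
md5_of_stream() (
    tmp_dir=$(mktemp -d) || exit 1
    fifo="$tmp_dir/stream.fastq"
    mkfifo "$fifo" || { rm -rf "$tmp_dir"; exit 1; }
    "$@" > "$fifo" &                  # producer writes into the pipe in the background
    status=0
    md5sum "$fifo" || status=$?       # consumer opens the pipe like a regular file
    wait                              # wait for the producer to finish
    rm -rf "$tmp_dir"
    exit "$status"
)

# Usage (placeholder file name):
#   md5_of_stream orad -c --raw "<file name>.fastq.ora"
```

The checksum line printed by md5sum contains the temporary pipe path instead of a file name; only the hash is meaningful.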

Remote environment

The orad software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.

Commands for Interleaved fastq.ora files

If the input FASTQ.ORA file is an interleaved paired-read file, it can be used as-is with downstream bioinformatics software that provides options to handle interleaved files. Make sure you use the downstream bioinformatics software with the proper interleaved options on the interleaved virtual FASTQ file.

In the following example, after the oraFuse software has been mounted, the bwa command is used with the -p option to specify that the input contains interleaved paired reads.

bwa mem reference.fasta -p <interleaved file>.fastq -o result.sam

To decompress an interleaved FASTQ.ORA file into two separate files, use the orad software.

orad <interleaved file>.fastq.ora

To decompress into a single file with interleaved paired reads, use the --interleave option.

In the following example, the orad software is used with the bwa command, and the --interleave option is used to decompress into a single stream containing interleaved data.

bwa mem reference.fasta -p <( orad -c --raw --interleave <interleaved file>.fastq.ora ) > result.sam
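
For readers unfamiliar with the layout: an interleaved FASTQ file stores each read pair as two consecutive four-line records, read 1 followed by read 2. The sketch below illustrates that layout by splitting an already decompressed interleaved stream with awk. It is an illustration only, since orad performs this split itself as described above. The function name and output file arguments are hypothetical, and the sketch assumes strict four-line records.

```shell
#!/bin/sh
# Illustration only: split an interleaved FASTQ stream on standard input into
# two files. Lines 1-4 of every 8-line group are read 1, lines 5-8 are read 2.
# Arguments: <output file for read 1> <output file for read 2>
deinterleave_fastq() {
    awk -v out1="$1" -v out2="$2" '
        { if ((NR - 1) % 8 < 4) print > out1; else print > out2 }
    '
}

# Usage (placeholder names):
#   orad -c --raw --interleave "<interleaved file>.fastq.ora" | deinterleave_fastq R1.fastq R2.fastq
```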

Handling of fastq.ora files compressed with reference other than default human reference

With DRAGEN v4.3, compression of FASTQ files derived from supported non-human species and from human bisulfite data (methylated DNA application) is available. DRAGEN ORA Helper Suite handles such files transparently, provided the correct reference has been downloaded and is available locally. Refer to the sections Install the DRAGEN ORA Decompression References and Supported References.

The command oraHelper <file name>.fastq.ora provides information on:

which reference has been used to compress
the steps to follow to install the correct reference. Note: the default human reference refbin is installed by default with the Software and does not need to be downloaded again.

Troubleshooting

You can use the oraHelper interactive user guide for assistance with common Linux commands, and to help troubleshoot errors encountered when using the different software included in the DRAGEN ORA Helper Suite. For details, refer to the oraHelper Interactive User Guide section.

Common Linux Commands

The following table shows common Linux commands that you can use with the DRAGEN ORA Helper Suite software. This list is not exhaustive.

Command

orad

oraFuse

ora LD-Preload

Bioinformatics Software

The DRAGEN ORA Helper Suite software works with some commonly used bioinformatics software. If the bioinformatics software is supported, you can use the oraHelper interactive user guide to help troubleshoot errors with your command. Below is the list of bioinformatics software supported by the DRAGEN ORA Helper Suite. The list does not limit the use of other third-party software.
