ORA

Introduction

Illumina® DRAGEN™ ORA Decompression Software Family

Warning This user guide is intended for users who are working with FASTQ.ORA files AND who are not using the DRAGEN secondary analysis software.

DRAGEN v3.8 introduced ORA Compression, a fully lossless compression algorithm that produces FASTQ.ORA files up to 5 times smaller than the equivalent FASTQ.GZ files. FASTQ.ORA files can be used as-is within the DRAGEN ecosystem. For users who are not using the DRAGEN secondary analysis software, FASTQ.ORA files can be either:

used as-is, in a seamless manner, with most common downstream software, using DRAGEN ORA Helper Suite standalone software; or
decompressed, using DRAGEN ORA Decompression standalone software.

Refer to the relevant product guide for instructions on using each software.

Product Guides

DRAGEN ORA Decompression v2.7

Warning Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.

DRAGEN ORA Decompression Software decompresses fastq.ora files into fastq.gz files. Fastq.ora files are generated using lossless compression technology as part of DRAGEN. Orad is the executable file that runs the DRAGEN ORA Decompression Software, which is a standalone piece of software.

The DRAGEN ORA Decompression Software is available as three separate executables, one for each of the following operating systems:

Linux
Mac
Windows

Decompression of FASTQ.ORA stored on local storage is supported on Linux, Mac, and Windows. Decompression of FASTQ.ORA stored on AWS S3 is only supported on Linux.

Decompression of FASTQ.ORA stored on Azure Blob storage is only supported on Linux.

When FASTQ.ORA is located on AWS S3 or Azure Blob storage, the decompression runs in streaming mode: the FASTQ.ORA file does not need to be fully transferred before decompression can start.

Software Installation

Installation Requirements

The following are the minimum requirements for the DRAGEN ORA Decompression Software:

Component

Minimum Requirement

System memory

8 GB RAM

Free disk space

Compatible Linux distributions

CentOS 7 and later
Ubuntu 14 and later
Oracle 8 and later
Fedora 26
Debian 8

Compatible Mac distribution

macOS 10.15 and later (Apple silicon and Intel processors)

Compatible Windows distribution

Windows 10 and later

Installing the DRAGEN ORA Decompression Software

Linux or Mac

1. Extract the archive files using the following command:

tar -xzvf orad.2.7.0.linux.tar.gz    # Linux
tar -xzvf orad.2.7.0.mac.tar.gz      # Mac

2. Navigate to the Orad directory as follows:

cd orad.2.7.0.linux

3. Move the executable to your preferred location as follows:

mv orad your_preferred_location/

4. Add orad to your PATH as follows:

echo 'PATH=$PATH:your_preferred_location/' >> ~/.bashrc
source ~/.bashrc

5. Move the oradata folder into your home directory as follows:

mv oradata ~

To store the folder in a different location, use the following command:

mv oradata ~/otherlocation/

When oradata has been moved to another location, you can:

either point to the reference by using the ORA_REF_PATH environment variable as follows:

export ORA_REF_PATH=~/otherlocation/oradata/

or pass the following option at decompression:

--ora-reference ~/otherlocation/

Windows

1. Extract the downloaded archive with software that can handle gzipped tarballs, such as 7-Zip. Right-click on the archive and select Extract with. The following two files are extracted:

orad.exe
refbin

The following steps use C:\Users\user1 as an example location. Change C:\Users\user1 to the location where you extracted the archive.

2. Open the Command Prompt application.

3. Set the environment variables to use the orad.exe and the refbin file with the set command or the setx command. The set command configures the variables temporarily (for the current console window) while the setx command configures the variables permanently.

4. Add the location of orad.exe to the PATH environment variable as follows:

set PATH=%PATH%;C:\Users\user1

setx PATH "%PATH%;C:\Users\user1"

5. Set the ORA_REF_PATH environment variable to the location of the refbin file as follows:

set ORA_REF_PATH=C:\Users\user1

setx ORA_REF_PATH "C:\Users\user1"

References Installation

Info For the default human reference, there is no need to add this extra step. The default human reference is already included in the DRAGEN ORA Decompression Software.

Linux or Mac

Extract a specific reference

1. Move the downloaded archive to the oradata directory

mv yourdownloadpath/<genus_specificname>.tar.gz yourpath/oradata

2. Navigate to the oradata directory

cd yourpath/oradata

3. Extract the archive file using the following command:

tar -xzvf <genus_specificname>.tar.gz    # same command on Linux and Mac

When the extraction of the specific reference is complete on Linux, the orad.2.7.0.linux folder should be structured as follows (example with the gallus_gallus reference):

orad.2.7.0.linux
|___orad
|___oradata
    |___refbin
    |___Gallus_gallus
        |___refbin
        |___readme_gallus_gallus

Note The oradata folder can be moved to another location but should keep its structure.

Extract the full database

1. Move the downloaded archive to the orad.2.7.0.linux directory

mv yourdownloadpath/oradata.tar.gz yourpath/orad.2.7.0.linux

2. Navigate to the orad.2.7.0.linux directory

cd yourpath/orad.2.7.0.linux

3. Delete existing oradata folder

rm -r oradata

4. Extract the archive file using the following command:

tar -xzvf oradata.tar.gz    # same command on Linux and Mac

When the extraction of the full database is complete, the orad.2.7.0.linux folder should be structured as follows:

orad.2.7.0.linux
|___orad
|___oradata
    |___Homo_sapiens
        |___refbin
        |___readme_homo_sapiens
    |___Homo_sapiens_bisulfite
        |___refbin
        |___readme_homo_sapiens_bisulfite
    |___<Genus_specificname>
        |___refbin
        |___readme_<Genus_specificname>

Note The oradata folder can be moved to another location but should keep its structure.

Windows

Extract the downloaded archive with software that can handle gzipped tarballs, such as 7-Zip. Right-click on the archive and select Extract with.

When a specific reference is downloaded, a folder named <Genus_specificname> is extracted. This folder contains the corresponding refbin and readme files. Move this folder to the location where orad and the default human refbin file were extracted during installation of the DRAGEN ORA Decompression Software.

When the full database is downloaded, a folder named oradata is extracted. This folder contains one subfolder per species, each of which contains the corresponding refbin and readme files. Move this oradata folder to the location where orad and the default human refbin file were extracted during installation of the DRAGEN ORA Decompression Software.

Commands

Required commands

Use the following commands to decompress the files:

orad FILE [args]

or

orad [args] FILE

A folder that contains FASTQ.ORA files can also be provided, for batch decompression of the FASTQ.ORA files at the top level of that folder:

orad ./pathtofolder/ [args]

On Windows, replace orad with orad.exe:

orad.exe FILE [args]

To decompress a FASTQ.ORA file that was compressed with a reference other than the default human reference, make sure the specific reference is available locally.

The command line does not change. The decompression software automatically detects which species/model was used.

Command Line Options

Command

Description

-C --check

Checks the integrity of the specified ORA file. This option decompresses the file in memory and verifies that the checksum of the decompressed data and the checksum of the original data are identical. The decompressed file is not saved.

--raw

Decompresses the ORA file into an uncompressed FASTQ file. By default, the DRAGEN ORA Decompression Software decompresses to gzip format.

--rm

Deletes the input file after successful execution. By default, the input file is not deleted. This option is not supported for files in AWS S3 or Azure Blob Storage.

-t --threads

Sets the maximum number of threads the software is allowed to use. The default value is 8.

-f --force

Overwrites the output file without prompting. By default, if the output file exists, the software exits without overwriting.

-h --help

Prints help and exits.

-v --version

Prints software version.

-i --info

Prints information about the compressed ORA file. The following information is included:

Software version used to compress the file.
Total number of sequences in the file.
Total number of bases in the file.
If the file contains interleaved data.
The original name of the file if it was saved in the fastq.ora file.
The name of the species/reference it has been compressed with, and the related xxhash checksum of the reference. When nothing is specified the FASTQ.ORA file has been compressed with the default human reference.

This option is not supported for files in AWS S3 or Azure Blob Storage. Although the ORA file format supports concatenation of fastq.ora files, using this command on a concatenated fastq.ora file prints erroneous information.

-c --stdout

Prints the decompressed file to the default standard output stdout. This is useful to share the results with another application without writing the decompressed file to disk.

Reads an input fastq.ora file from the default standard input stdin. This option is not supported for Windows OS.

-P --path

Sets the path location of the output file. The default file name is used. If a path is not specified, the file is created in the same location as the input file. This option overwrites the path if it is used with the -o option.

-o --out

Sets the name of the output file and the path when used with -P. The default is the name of the input fastq.ora file.

-N --name

Restores the original name saved in the fastq.ora file, at the time it was compressed to a fastq.ora file.

-I --interleave

Decompresses the output file into a single interleaved file. By default, when the input is a single interleaved fastq.ora file, the decompression automatically decompresses into two separate paired read files. If the interleaved fastq.ora file was generated with DRAGEN ORA Decompression v.4.0 or later, -interleaved is included in the file name.

--ora-reference

Changes the directory of the ORA reference file refbin. By default, the software looks for the reference file in the following locations:

$HOME/oradata/refbin
$HOME/oradata/Genus_specificname/refbin
./oradata/refbin
./oradata/Genus_specificname/refbin

If the ORA_REF_PATH environment variable is set, the software also looks in that location. For example, set it with export ORA_REF_PATH=/some/path/.

--check-ora-reference-path

Verifies if the ORA reference file refbin is accessible and prints the refbin path. Decompression does not occur when this option is added.

--quiet

Sets decompression to quiet mode. In quiet mode, nothing is written to the standard output and standard error. This mode is ignored when used with -c (--stdout) or -C (--check).

--empty-third-line

Outputs the third line in the FASTQ format (that is, the line that starts with +) as an empty line. By default, this line is preserved.

-r --repeat-header

Adds the read header to the third line in the FASTQ format (that is, the line that starts with +).
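When a reference is not found, it can help to see which of the default locations listed under --ora-reference actually contain a refbin file. The following is a minimal shell sketch of such a check. The function name list_refbin_candidates is hypothetical, the order in which locations are printed is not a statement about the precedence orad applies, and looking for a species subfolder under ORA_REF_PATH is an assumption based on the oradata folder structure. The built-in --check-ora-reference-path option remains the authoritative check.

```shell
# Illustrative helper (not part of orad): print every default location
# named in this guide where a refbin file actually exists.
# $1 is an optional species folder name such as Gallus_gallus.
list_refbin_candidates() {
    species=${1:-}
    for base in "$HOME/oradata" "./oradata" "${ORA_REF_PATH:-}"; do
        [ -n "$base" ] || continue      # skip ORA_REF_PATH when it is unset
        base=${base%/}                  # tolerate a trailing slash
        if [ -n "$species" ]; then
            # Assumption: species folders sit directly under the base folder.
            candidate="$base/$species/refbin"
        else
            candidate="$base/refbin"
        fi
        if [ -f "$candidate" ]; then
            echo "$candidate"
        fi
    done
}
```

Calling list_refbin_candidates with no argument checks for the default refbin file; calling it with a species folder name checks the species-specific locations instead. Empty output means none of the listed locations holds the file.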

Command Examples

On Windows, replace orad with orad.exe. For example: orad.exe myfile.fastq.ora --check

Command

Description

orad myfile.fastq.ora --check

Checks the integrity of an ORA file.

orad myfile.fastq.ora

Decompresses myfile.fastq.ora to myfile.fastq.gz

orad myfile.fastq.ora --info

Prints an information summary of an ORA file.

orad -c --raw myfile.fastq.ora | head

Prints the first lines of the corresponding FASTQ file in the terminal.

orad --check-ora-reference-path

Verifies the accessibility of the ORA reference file and prints its path.
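The --check example above can be applied to every FASTQ.ORA file in a folder before archiving or deleting the originals. The following is a minimal sketch, not a feature of the software. The function name check_all_ora is hypothetical, and the sketch assumes that orad returns a non-zero exit status when the integrity check fails, which this guide does not state.

```shell
# Hypothetical helper: run the integrity check on each *.fastq.ora file
# at the top level of a directory and report a status per file.
check_all_ora() {
    dir=$1
    failed=0
    for f in "$dir"/*.fastq.ora; do
        [ -e "$f" ] || continue         # the directory had no matching files
        # Assumption: a failed check makes orad exit with a non-zero status.
        if orad "$f" --check >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            failed=1
        fi
    done
    return "$failed"
}
```

The function returns 0 only when every file passes, so it can gate a later step in a script.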

Combine with downstream analysis

For semi-transparent use of fastq.ora files with third-party bioinformatics software, use DRAGEN ORA Decompression with a pipe or with process substitution. Compared with a full decompression step, this method improves system performance by reducing reads from and writes to the disk.

If the analysis software can read from the standard input, such as BWA, use the following command:

orad file.fastq.ora -c --raw | bwa mem humanref.fasta - > resu.sam

The -c option decompresses to standard output. The result is piped (|) to BWA, which uses the dash argument (-) to read from standard input. This also works for interleaved paired reads: use the -p option of BWA to specify that the input contains interleaved paired reads.

If the analysis software cannot read from the standard input, you can use process substitution (available in shells such as Bash):

bwa mem humanref.fasta <(orad file.fastq.ora -c --raw) > resu.sam

For the file name, use the <( ) syntax containing the command that writes the file to standard output; in this case, orad with the -c option, as in the command above. This method does not work when the third-party software checks the input file name or when the third-party software does not read the file sequentially.

Info On Windows, replace orad with orad.exe.

Check Losslessness

DRAGEN ORA Compression is a lossless compression.

To verify that no data was lost during compression to the fastq.ora file, compare the MD5 checksum of the decompressed fastq.ora file with the MD5 checksum of the decompressed original fastq.gz file.

1. Compute the MD5 checksum of the decompressed FASTQ.ORA content as follows:

md5sum <(orad myfile.fastq.ora --raw -c )

2. Compute the MD5 checksum of the decompressed FASTQ.GZ content as follows:

md5sum <(gzip -d -c myfile.fastq.gz)
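The two steps above can be combined so that the checksums are compared automatically rather than by eye. The following is a minimal sketch, assuming md5sum and gzip are installed and orad is on the PATH. The function names stream_md5 and same_content are hypothetical. The sketch uses pipes instead of process substitution so that it also runs in shells without that feature.

```shell
# Print only the hex MD5 digest of whatever arrives on standard input.
stream_md5() {
    md5sum | cut -d ' ' -f 1
}

# Hypothetical helper: compare the decompressed content of a FASTQ.ORA file
# with the decompressed content of the original FASTQ.GZ file.
same_content() {
    ora_file=$1
    gz_file=$2
    ora_sum=$(orad "$ora_file" --raw -c | stream_md5)
    gz_sum=$(gzip -d -c "$gz_file" | stream_md5)
    if [ "$ora_sum" = "$gz_sum" ]; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

For example, same_content myfile.fastq.ora myfile.fastq.gz prints identical when both streams have the same checksum. Note that a decompression command that fails partway still produces a checksum of the partial stream, so a different result can also indicate a failed decompression.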

DRAGEN ORA Helper Suite v2.0

Warning Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.

The DRAGEN ORA Helper Suite Software is a suite of software for Linux distributions, designed to integrate compressed FASTQ.ORA files transparently into secondary analysis in a local or remote environment.

With the DRAGEN ORA Helper Suite, FASTQ.ORA files can be used:

without full decompression to the disk
with no overhead
with no change in the current Linux command (*.fastq is used instead of *.fastq.ora at input)

Software quick overview

The DRAGEN ORA Helper Suite Software package contains four programs. Three of them process FASTQ.ORA files and can be used in downstream analysis; the fourth is an interactive user guide. The DRAGEN ORA Helper Suite Software v2.0 includes:

oraFuse v2.0
ora-ldpreload v2.0
orad v2.7.0
oraHelper

oraFuse Software

The oraFuse software uses the FUSE filesystem to create a virtual ora subdirectory in the current directory, with ora symbolic links pointing to a virtual uncompressed FASTQ.ORA file. The virtual FASTQ file does not impact the storage footprint.

Info Root privileges are required to install the oraFuse software. If you do not have root privileges, use the ora LD-Preload software or the orad software.

ora LD-Preload Software

The ora LD-Preload software uses the LD_PRELOAD variable of the dynamic linker to intercept system calls. The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file. The virtual FASTQ files do not impact the storage footprint.

Info

The ora LD-Preload software only works with a dynamically compiled binary. If you do not know if the intended downstream bioinformatics software has been dynamically compiled, run the oraHelper command for guidance.
The auto-completion Linux feature does not work on virtual files.

orad Software

orad is the DRAGEN ORA Decompression software. Although it does not operate in a fully transparent manner, it works in most situations. The orad software can be combined with a pipe or with process substitution to reduce reads and writes to the disk.

oraHelper Interactive User Guide

oraHelper is an interactive user guide that provides the following guidance:

Which of the ORA Helper Suite software is the best fit.
The proper command line to use with each of the ORA Helper Suite software.
Information about the input file. For example, if a FASTQ.ORA file is a paired read interleaved file or which reference has been used to compress a FASTQ.ORA file.

Remote and Local Usage

Local vs. remote environment

In a local environment, FASTQ.ORA files are accessible on the compute node via the file system. Cloud instances with local data, or data available via a network file system, are also considered local environments. Local environment examples include the following:

The compute node is a laptop and files are stored on the local hard disk.
The compute node is a cloud instance and files are copied to its local storage.

In a remote environment, FASTQ.ORA files are stored in the cloud on either AWS S3 or Azure Blob Storage.

Remote using AWS S3

The DRAGEN ORA Helper Suite Software reuses the AWS Command Line Interface (AWS CLI) configuration. The location of a file is passed by specifying the file name as follows:

s3://bucket/<file name>

To make sure that the AWS CLI is configured to access your file, use the following command:

$ aws s3 ls s3://bucket/<remote file name>.fastq.ora

Remote Using Azure Blob Storage

The DRAGEN ORA Helper Suite Software reuses the URI set up for your Azure Blob Storage. The location of a file is passed by specifying the file name as follows:

https://StorageAccountName.blob.core.windows.net/<my container>/<my blob>.fastq.ora

To make sure that the Azure CLI is configured to access your blob file, use the following command:

export AZURE_STORAGE_CONNECTION_STRING=<yourURI>
az storage blob list --container-name <myContainer>

Info Remote use with Azure Blob Storage is not supported with DRAGEN v3.7.

Installation Software and References

The following information provides the requirements and steps to install the DRAGEN ORA Helper Suite Software.

Installation Requirements

The following table lists the minimum requirements for the DRAGEN ORA Helper Suite Software.

Component

Minimum Requirement

System memory

8 GB RAM

Free disk space

Compatible Linux distributions

CentOS 7 or later
Ubuntu 14 or later
Oracle 8 or later

Internet connection

An internet connection is required to install the DRAGEN ORA Helper Suite. Once installed, the internet connection is not required unless the DRAGEN ORA Helper Suite is used in a remote environment.

Info The oraFuse software uses a FUSE file system. Because the FUSE file system requires root privileges, the oraFuse software requires root privileges during installation.

Install the DRAGEN ORA Helper Suite Software

1. Open a terminal window.

2. If the installation is for an online system, then move the installer to the directory where the software should be installed. If the installation is for an offline system, then the installer does not need to be moved.

3. Navigate to the installer location.

4. Run the following command:

sh <file name>.sh

5. To install oraFuse, install the dependencies when prompted. The following dependencies will automatically be installed on an online system:

Distribution

Dependencies

CentOS

fuse
fuse-libs
libcurl
openssl

Ubuntu

fuse
openssl
libcurl3-gnutls

Oracle

fuse
fuse-libs
fuse-devel
libcurl

Info If the installation is for an offline system, then make sure the dependencies are installed on the offline system.

The installer creates a folder named oraHelperSuite. If the installation is for an offline system, then move the oraHelperSuite folder to the offline system. The oraHelperSuite folder contains the following:

orad executable
oraFuse executable
ora-ldpreload.so library
oraHelper wrapper executable

Install the DRAGEN ORA Decompression References

Info For default human reference, there is no need to add this extra step. The default human reference is already included in the DRAGEN ORA Decompression Software.

Extract a specific reference

1. Move the downloaded archive to the oraHelperSuite directory

mv yourdownloadpath/<genus_specificname>.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command:

tar -xzvf <genus_specificname>.tar.gz

When the extraction of the specific reference is complete, the oraHelperSuite folder on a Linux environment should be structured as follows (example with the gallus_gallus reference):

Extract the full database

1. Move the downloaded archive to the oraHelperSuite directory

mv yourdownloadpath/oradata.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command:

tar -xzvf oradata.tar.gz

When the extraction of the full database is complete, the oraHelperSuite folder on a Linux environment should be structured as follows:

Commands for the ORA Helper Suite software

oraHelper Interactive User Guide

The oraHelper interactive user guide helps you to determine which of the DRAGEN ORA Helper Suite software to use with your intended downstream bioinformatics software. The oraHelper interactive user guide also displays the correct syntax to use.

Info To get the relevant information about the intended downstream bioinformatics software, the bioinformatics software must be in the PATH variable.

Use the oraHelper interactive user guide as follows:

1. Open a terminal window.

2. Add the oraHelperSuite directory to your PATH as follows:

PATH=<oraHelperSuite Directory>:$PATH

3. Enter oraHelper followed by the intended downstream software command and the input *.fastq.ora file name.

$ oraHelper <command> <file name>.fastq.ora

Examples

The following example shows oraHelper used with the bwa command.

$ oraHelper bwa -i <filename>.fastq.ora

The output shows the proper syntax for each of the DRAGEN ORA Helper Suite software.

The following example shows oraHelper used with the head command.

$ oraHelper head <file name>.fastq.ora

Command line Options

Command

Required

Description

-v --version

Print the version of the DRAGEN ORA Helper Suite Software along with the versions of the orad, oraFuse, and ora LD-Preload software.

oraFuse Software

The oraFuse software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.

Local Environment

1. The oraFuse software runs with the following dependencies: fuse, fuse-libs, libcurl, and openssl. If the dependencies were not installed during the DRAGEN ORA Helper Suite Software installation, install them. For example, on a yum-based distribution such as CentOS, run the following command.

$ sudo yum install fuse fuse-libs libcurl openssl

2. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

3. Enter the following command to mount the oraFuse software to the directory where the FASTQ.ORA files are located.

$ oraFuse

4. Run the desired command on the virtual *.fastq file.

5. When finished, enter the following command to unmount the oraFuse software from the directory.

$ oraFuse --unmount
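Steps 3 to 5 above can be wrapped so that the unmount always runs, even when the command in step 4 fails. The following is a minimal sketch rather than a feature of the suite. The function name with_orafuse is hypothetical, and the sketch assumes, as in the steps above, that oraFuse acts on the current working directory and that the oraHelperSuite directory is already in the PATH.

```shell
# Hypothetical wrapper: mount oraFuse in a directory, run one command there,
# and unmount afterwards regardless of how the command ends.
with_orafuse() {
    dir=$1
    shift
    (
        cd "$dir" || exit 1
        oraFuse || exit 1                 # step 3: mount on the current directory
        trap 'oraFuse --unmount' EXIT     # step 5: runs when the subshell exits
        "$@"                              # step 4: the command on the virtual file
    )
}
```

For example, with_orafuse /data/run1 head sample.fastq (both names are placeholders) mounts oraFuse in /data/run1, prints the first lines of the virtual file, and then unmounts. The subshell keeps the directory change and the trap from leaking into the calling shell.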

Examples

The following examples show the results of the oraFuse software with the ls command.

Before oraFuse has been mounted to the current directory:

r10K_1.fastq.ora

After oraFuse has been mounted to the directory:

https: r10K_1.fastq r10K_1.fastq.ora s3:

After oraFuse has been mounted to the current directory with ls -l:

The following example shows the head command on a virtual FASTQ file after oraFuse has been mounted.

$ head <file name>.fastq

The following example shows the bwa command on a virtual FASTQ file after oraFuse has been mounted.

$ bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

Remote Environment

The steps to use the oraFuse software in the remote environment are the same as those used in the local environment.

Info Azure Blob Storage is not supported if you are using DRAGEN v3.7 as the downstream software.

The following command shows the remote and virtual files on S3.

$ ls s3 -l s3://bucket/<path>

Examples

The following example runs BWA on S3 files.

$ bwa mem -t 8 -M <FASTA> -o <SAM> s3://bucket/path/<file name>.fastq

When using DRAGEN v3.7 as the downstream software, you can use oraFuse in the remote environment by passing the location of the file as follows:

.virtual/ora/s3:/bucket/<file name>.fastq

The following is an example of using DRAGEN v3.7 on a single FASTQ.ORA located on AWS S3:

$ dragen -f \
--ref-dir=<path to hash table directory on /ephemeral> \
-1 .virtual/ora/s3:/bucket/<remote file>.fastq \
--output-directory=<path to output directory on /ephemeral> \
--output-file-prefix=<prefix name> \
--output-format BAM

Command Line Options

Command

Required

Description

--unmount

To unmount oraFuse from the directory

Info Multiple users of the same file are not supported.

Error Messages

If you receive an error message while using the oraFuse software, use the following command to get more information.

$ cat .virtual/ora/.error

ora LD-Preload Software

The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file created has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite Directory>:$PATH

2. Run the command with the ora LD-Preload shared library on the *.fastq file as follows.

$ LD_PRELOAD=<oraHelperSuite directory>/ora-ldpreload.so <command> <input file name>.fastq

Examples

The following is an example of the ora LD-Preload software with the 'bwa' command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

The following is an example of the ora LD-Preload software with the ls command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so ls

The following shows the output of the ora LD-Preload software with the ls command.

<file name>.fastq.ora <file name>.fastq (virtual file)

Remote Environment

The steps to use the ora LD-Preload software in the remote environment are the same as those used in the local environment. The features are the same, with the following exceptions:

In the remote environment, the ora LD-Preload software does not work on statically compiled bioinformatics software.
In the remote environment, the Linux shell auto-completion feature does not work on virtual files.

orad Software

The steps are the same for local and remote environments.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

2. Enter the command with orad and the pipe or process substitution on the *.fastq.ora files. See the following examples.

Examples

Example with pipe:

When the downstream command or bioinformatics software can read from the standard input, a pipe is used. The following is an example with the head command. The -c option decompresses to standard output.

$ orad <file name>.fastq.ora -c --raw | head

Example with process substitution:

When the downstream bioinformatics software cannot read from the standard input, for example md5sum, process substitution is used. The following example shows the command with process substitution.

$ md5sum <( orad -c --raw "<file name>.fastq.ora" )

Remote environment

Commands for Interleaved fastq.ora files

If the input FASTQ.ORA file is an interleaved paired read file, it can be used as-is with downstream bioinformatics software which provide options to handle interleaved files. Make sure you are using the downstream bioinformatics software with proper interleaved options on the interleaved virtual FASTQ.

In the following example, after the oraFuse software has been mounted, the bwa command is used with the -p option to specify that the input contains interleaved paired reads.

bwa mem reference.fasta -p <interleaved file>.fastq -o result.sam

To decompress an interleaved FASTQ.ORA file into two separate files, use the orad software.

orad <interleaved file>.fastq.ora

To decompress into a single file with interleaved paired reads, use the --interleave option.

In the following example, the orad software is used with the bwa command, and the --interleave option is used to decompress into a single stream containing interleaved data.

bwa mem reference.fasta -p <( orad -c --raw --interleave <interleaved file>.fastq.ora ) > result.sam

Handling of fastq.ora files compressed with reference other than default human reference

The command oraHelper <file.fastq.ora> provides information on:

which reference has been used to compress
the steps to follow to install the correct reference. Note: the default human reference refbin is installed by default with the Software and does not need to be downloaded again.

Troubleshooting

Common Linux Commands

The following table shows common Linux commands that you can use with the DRAGEN ORA Helper Suite software. This list is not exhaustive.

Bioinformatics Software

The DRAGEN ORA Helper Suite software works with some commonly used bioinformatics software. If the bioinformatics software is supported, you can use the oraHelper interactive user guide to help troubleshoot errors with your command. Below is the list of bioinformatics software supported with the DRAGEN ORA Helper Suite. The list does not limit the use of other third-party software.

Reference

Supported References

Latin Name                Common Name             Size (GB)
Homo_sapiens              Human                   1.3
Homo_sapiens_bisulfite    Human_bisulfite         2.4
Sus_scrofa                Pig                     0.80
Gallus_gallus             Chicken                 0.34
Oryza_sativa              Rice                    0.15
Arabidopsis_thaliana      Arabidopsis thaliana    0.04
Triticum_aestivum         Wheat                   4.7
Bos_taurus                Cattle                  1.1
Glycine_max               Soybean                 0.31
Rattus_norvegicus         Rat                     1.1
Zea_mays                  Maize                   0.71
Danio_rerio               Zebrafish               0.54
Mus_musculus              Mouse                   1.1
Caenorhabditis_elegans    Roundworm               0.03

Resources

Terms

This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed, or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.

The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read and understood prior to using such product(s).

FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND DAMAGE TO OTHER PROPERTY, AND WILL VOID ANY WARRANTY APPLICABLE TO THE PRODUCT(S).

ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S) DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE).

All trademarks are the property of Illumina, Inc. or their respective owners. For specific trademark information, refer to www.illumina.com/company/legal.html

Revision History

Revision history of the DRAGEN ORA standalone Software family product documentation.

Commands for the ORA Helper Suite software

oraHelper Interactive User Guide

Info To get the relevant information about the intended downstream bioinformatics software, the bioinformatics software must be in the PATH variable.

Use the oraHelper interactive user guide as follows:

1. Open a terminal window.

2. Add the oraHelperSuite directory to your PATH as follows:

PATH=<oraHelperSuite Directory>:$PATH

3. Enter oraHelper followed by the intended downstream software command and the input *.fastq.ora file name.

$ oraHelper <command> <file name>.fastq.ora
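Steps 2 and 3 can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the DRAGEN ORA Helper Suite: the function names `prepend_path` and `check_on_path` and the directory `/opt/oraHelperSuite` are hypothetical. It prepends the suite directory to PATH only once, then confirms that both oraHelper and the downstream software resolve on PATH before oraHelper is invoked.

```shell
# Prepend a directory to PATH unless it is already listed (hypothetical helper).
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Report whether each named command resolves on the current PATH.
# Returns non-zero if at least one command is missing.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the directory is a placeholder for your oraHelperSuite directory):
#   prepend_path /opt/oraHelperSuite
#   check_on_path oraHelper bwa && oraHelper bwa <file name>.fastq.ora
```

Checking both commands first is useful because, as noted above, oraHelper can only report on downstream software that is in the PATH variable.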

Examples

The following example shows oraHelper used with the bwa command.

$ oraHelper bwa -i <filename>.fastq.ora

The output prints the proper syntax for each component of the DRAGEN ORA Helper Suite software.

The following example shows the output of oraHelper used with the head command.

$ oraHelper head <file name>.fastq.ora


Command Line Options

Command | Required | Description
-v --version | | Print the version of the DRAGEN ORA Helper Suite Software along with the versions of the orad, oraFuse, and ora LD-Preload software.

oraFuse Software

The oraFuse software requires root privileges during installation. Refer to .

The oraFuse software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. Each virtual file has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.
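The naming rule above can be sketched as a small helper that lists the virtual file names to expect for a directory. This is an illustration only: the function name `virtual_fastq_names` is hypothetical, and the sketch assumes the virtual name is the FASTQ.ORA name with the trailing `.ora` removed, as in the `r10K_1.fastq.ora` example in this guide. It does not call oraFuse.

```shell
# Print the virtual *.fastq name expected for each *.fastq.ora file in a
# directory (default: current directory). Hypothetical helper.
virtual_fastq_names() {
    dir=${1:-.}
    for f in "$dir"/*.fastq.ora; do
        [ -e "$f" ] || continue       # the glob matched nothing
        name=${f##*/}                 # strip the directory part
        printf '%s\n' "${name%.ora}"  # r10K_1.fastq.ora -> r10K_1.fastq
    done
}

# Usage:
#   virtual_fastq_names /data/run1
```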

Local Environment

1. Install the required packages as follows.

$ sudo yum install fuse fuse-libs libcurl openssl

2. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

3. Enter the following command to mount the oraFuse software to the directory where the FASTQ.ORA files are located.

$ oraFuse

4. Run the desired command on the virtual *.fastq file.

5. When finished, enter the following command to unmount the oraFuse software from the directory.

$ oraFuse --unmount
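Steps 3 through 5 can be wrapped so that the unmount always runs, even when the downstream command fails. The following is a minimal sketch under stated assumptions: the function name `with_orafuse` and the `ORAFUSE` variable are hypothetical, and only the two invocations documented above (`oraFuse` and `oraFuse --unmount`) are used. It assumes the current directory is the one containing the FASTQ.ORA files.

```shell
# Name of the oraFuse executable; override ORAFUSE to point elsewhere.
ORAFUSE=${ORAFUSE:-oraFuse}

# Mount oraFuse, run the given command, then unmount.
# Returns the exit status of the command that was run.
with_orafuse() {
    "$ORAFUSE" || return 1            # step 3: mount
    rc=0
    "$@" || rc=$?                     # step 4: run the desired command
    "$ORAFUSE" --unmount              # step 5: unmount, even after a failure
    return "$rc"
}

# Usage (from the directory that holds the FASTQ.ORA files):
#   with_orafuse head <file name>.fastq
```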

Examples

The following examples show the results of the oraFuse software with the ls command.

Before oraFuse has been mounted to the current directory:

r10K_1.fastq.ora

After oraFuse has been mounted to the directory:

https: r10K_1.fastq r10K_1.fastq.ora s3:

After oraFuse has been mounted to the current directory with ls -l:

The following example shows the head command on a virtual FASTQ file after oraFuse has been mounted.

$ head <file name>.fastq

The following example shows the bwa command on a virtual FASTQ file after oraFuse has been mounted.

$ bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

Remote Environment

The oraFuse software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The AWS and Azure Blob Storage account configurations and credentials are used for authentication. Refer to section.

The steps to use the oraFuse software in the remote environment are the same as those used in the local environment.

Info Azure Blob is not supported if you are using DRAGEN v3.7 as the downstream software.

The following command shows the remote and virtual files on S3.

$ ls -l s3://bucket/<path>

Examples

The following example runs BWA on S3 files.

$ bwa mem -t 8 -M <FASTA> -o <SAM> s3://bucket/path/<file name>.fastq

When using DRAGEN v3.7 as the downstream software, you can use oraFuse in the remote environment by passing the file location in the following form:

.virtual/ora/s3:/bucket/<file name>.fastq

The following is an example of using DRAGEN v3.7 on a single FASTQ.ORA located on AWS S3:

$ dragen -f \
--ref-dir=<path to hash table directory on /ephemeral> \
-1 .virtual/ora/s3:/bucket/<remote file>.fastq \
--output-directory=<path to output directory on /ephemeral> \
--output-file-prefix=<prefix name> \
--output-format BAM
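The path form used for DRAGEN v3.7 can be derived from an S3 URI mechanically. The sketch below is an assumption-based illustration: the function name `ora_virtual_path` is hypothetical, and the mapping (drop one slash after `s3:`, drop the trailing `.ora`, prefix `.virtual/ora/`) is inferred only from the `.virtual/ora/s3:/bucket/<file name>.fastq` form shown above.

```shell
# Convert s3://bucket/path/file.fastq.ora into the documented virtual form
# .virtual/ora/s3:/bucket/path/file.fastq (hypothetical helper).
ora_virtual_path() {
    uri=$1
    case $uri in
        s3://*) rest=${uri#s3://} ;;  # keep "bucket/path/file.fastq.ora"
        *)
            echo "expected an s3:// URI, got: $uri" >&2
            return 1
            ;;
    esac
    rest=${rest%.ora}                 # virtual files end in .fastq
    printf '.virtual/ora/s3:/%s\n' "$rest"
}

# Usage:
#   ora_virtual_path s3://bucket/sample.fastq.ora
#   # prints .virtual/ora/s3:/bucket/sample.fastq
```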

Command Line Options

Command | Required | Description
--unmount | | Unmount oraFuse from the directory.

Info Multiple users of the same file are not supported.

Error Messages

If you receive an error message while using the oraFuse software, use the following command to get more information.

$ cat .virtual/ora/.error

ora LD-Preload Software

The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. Each virtual file has a *.fastq file extension instead of a *.fastq.gz file extension.

Info Make sure you indicate a *.fastq file extension as input.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite Directory>:$PATH

2. Run the command with the ora LD-Preload shared library on the *.fastq file as follows.

$ LD_PRELOAD=<oraHelperSuite directory>/ora-ldpreload.so <command> <input file name>.fastq

Examples

The following is an example of the ora LD-Preload software with the 'bwa' command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

The following is an example of the ora LD-Preload software with the ls command.

$ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so ls

The following shows the output of the ora LD-Preload software with the ls command.

<file name>.fastq.ora <file name>.fastq (virtual file)
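The LD_PRELOAD prefix in the examples above can be wrapped in a function so that the library path is checked before the command runs. This is a minimal sketch: the function name `ora_preload` is hypothetical, and only the `ora-ldpreload.so` file name comes from this guide. LD_PRELOAD is set for the single command only, so the rest of the shell session is unaffected.

```shell
# Run a command with the ora LD-Preload shared library loaded.
# $1 is the oraHelperSuite directory; the remaining arguments are the command.
ora_preload() {
    lib="$1/ora-ldpreload.so"
    shift
    if [ ! -f "$lib" ]; then
        echo "ora-ldpreload.so not found at $lib" >&2
        return 1
    fi
    # The assignment applies only to the environment of this one command.
    LD_PRELOAD="$lib" "$@"
}

# Usage (the directory is a placeholder):
#   ora_preload /opt/oraHelperSuite bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq
```

Failing early on a missing library avoids a confusing "file not found" error from the downstream software when the virtual *.fastq file never appears.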

Remote Environment

The ora LD-Preload software works on FASTQ.ORA files located in AWS S3 or Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to section.

The steps to use the ora LD-Preload software in the remote environment are the same as those used in the local environment. The features are the same, with the following exceptions:

In the remote environment, the ora LD-Preload software does not work on statically compiled bioinformatics software.
In the remote environment, the Linux shell auto-completion feature does not work on virtual files.

orad Software

Orad is the executable of the DRAGEN ORA decompression software. It uses pipes or process substitution to reduce reads from and writes to the disk. If the downstream bioinformatics software does not work with pipes or process substitution, and neither the oraFuse software nor the ora LD-Preload software can be used, then fully decompressed temporary files are required. Refer to section for more information.

The steps are the same for local and remote environments.

Local Environment

1. Add the oraHelperSuite directory to your PATH as follows.

PATH=<oraHelperSuite DIR>:$PATH

2. Enter the command with orad and the pipe or process substitution on the *.fastq.ora files. See the following examples.

Examples

Example with pipe:

$ orad <file name>.fastq.ora -c --raw | head

Example with process substitution:

When the downstream bioinformatics software cannot read from the standard input, for example md5sum, process substitution is used. The following example shows the command with process substitution.

$ md5sum <( orad -c --raw "<file name>.fastq.ora" )
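The pipe pattern above streams decompressed records directly into the downstream command, so no intermediate FASTQ file is written. The sketch below illustrates the same pattern without requiring orad: `fake_orad_stream` is a hypothetical stand-in that emits two FASTQ records in place of `orad <file name>.fastq.ora -c --raw`, and `count_reads` is a hypothetical downstream consumer that reads standard input.

```shell
# Stand-in for `orad <file name>.fastq.ora -c --raw` (assumption: emits
# plain FASTQ text on standard output). Two records, four lines each.
fake_orad_stream() {
    printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n'
}

# Count reads in a FASTQ stream on standard input (4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }'
}

# The producer writes to the pipe and the consumer reads from it;
# nothing is stored on disk in between.
fake_orad_stream | count_reads    # prints 2
```

With orad installed, replacing `fake_orad_stream` with the documented `orad <file name>.fastq.ora -c --raw` gives the same data flow.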

Remote Environment

The orad software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The software reuses the AWS or Azure Blob account configuration and credentials. Refer to section.