
Handling of fastq.ora files compressed with reference other than default human reference

With DRAGEN v4.3, compression of FASTQ files derived from supported non-human species and from human bisulfite data (methylated DNA application) is available. The DRAGEN ORA Helper Suite handles such files transparently, provided that the correct reference has been downloaded and is available locally. Refer to the sections Install the DRAGEN ORA Decompression References and Supported References.

The command oraHelper <file name>.fastq.ora provides information on:

  • which reference was used to compress the file

  • the steps to follow to install the correct reference. Note: the default human reference refbin is installed by default with the Software and does not need to be downloaded again.
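After oraHelper has reported which reference a file needs, you can check whether that reference is already available locally before decompressing. The sketch below is a hypothetical helper (find_refbin is not part of the suite). It assumes the two directory layouts shown under Install the DRAGEN ORA Decompression References: <oraHelperSuite>/<Genus_specificname>/refbin for a single extracted reference, and <oraHelperSuite>/oradata/<Genus_specificname>/refbin for the full database.

```shell
# Hypothetical helper: print the path of the refbin for a reference name
# (for example Gallus_gallus), checking the single-reference layout first
# and then the full-database (oradata) layout.
find_refbin() {
  local suite=$1 ref=$2 candidate
  for candidate in "$suite/$ref/refbin" "$suite/oradata/$ref/refbin"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "reference $ref not found under $suite; download it from the ORA Support Site" >&2
  return 1
}
```

For example, find_refbin ~/oraHelperSuite Gallus_gallus prints the refbin path, or returns a non-zero status when the reference still needs to be downloaded.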

DRAGEN ORA Helper Suite v2.0

Warning: Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.

The DRAGEN ORA Helper Suite Software is a suite of tools for Linux distributions, designed to integrate compressed FASTQ.ORA files transparently into secondary analysis in a local or remote environment.

In Illumina® DRAGEN™ Software v3.8 and later, the handling of FASTQ.ORA files is integrated within the system. The DRAGEN ORA Helper Suite is intended for processing FASTQ.ORA files in environments where secondary analysis is not performed with the DRAGEN Software, or in environments that do not have access to DRAGEN v3.8 or later. If you are running secondary analysis pipelines on a DRAGEN v3.7 platform, only the oraFuse software is supported. Refer to

With the DRAGEN ORA Helper Suite, FASTQ.ORA files can be used:

  • without full decompression to disk

  • with no overhead

  • with no change to the current Linux command (*.fastq is used as input instead of *.fastq.ora)

Software quick overview

The DRAGEN ORA Helper Suite Software package contains four tools. Three of them process FASTQ.ORA files; the fourth is an interactive user guide. You can use the FASTQ.ORA processing tools in downstream analysis. The DRAGEN ORA Helper Suite Software v2.0 includes:

  • oraFuse v2.0

  • ora-ldpreload v2.0

  • orad v2.7.0

  • oraHelper

oraFuse Software

The oraFuse software uses the FUSE file system to create a virtual ora subdirectory in the current directory, with ora symbolic links pointing to a virtual uncompressed version of each FASTQ.ORA file. The virtual FASTQ file does not add to the storage footprint.

Info: Root privileges are required to install the oraFuse software. If you do not have root privileges, use the ora LD-Preload software or the orad software.

ora LD-Preload Software

The ora LD-Preload software uses the LD_PRELOAD environment variable of the dynamic linker to intercept system calls. The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file. The virtual FASTQ files do not add to the storage footprint.

Info:

  • The ora LD-Preload software only works with dynamically linked binaries. If you do not know whether the intended downstream bioinformatics software is dynamically linked, run the oraHelper command for guidance.

  • The Linux auto-completion feature does not work on virtual files.

orad Software

orad is the DRAGEN ORA Decompression software. Although it does not operate in a fully transparent manner, it works in most situations. The orad software can be combined with a pipe or with process substitution to reduce reads and writes to disk.

oraHelper Interactive User Guide

oraHelper is an interactive user guide that provides the following guidance:

  • Which of the ORA Helper Suite software is the best fit.

  • The proper command line to use with each of the ORA Helper Suite software.

  • Information about the input file, for example whether a FASTQ.ORA file is an interleaved paired-read file, or which reference was used to compress it.

Remote and Local Usage

Local vs. remote environment

In a local environment, FASTQ.ORA files are accessible on the compute node via the file system. Cloud instances with local data, or with data available via a network file system, are also considered local environments. Local environment examples include the following:

  • The compute node is a laptop and files are stored on the local hard disk.

  • The compute node is a cloud instance and files are copied to its local storage.

In a remote environment, FASTQ.ORA files are stored in the cloud on either AWS S3 or Azure Blob Storage.

Remote Using AWS S3

The DRAGEN ORA Helper Suite Software reuses the AWS Command Line Interface (AWS CLI) configuration. To pass the location of a file, specify the file name as follows:

s3://bucket/<file name>

To make sure that the AWS CLI is configured to access your file, use the following command:

$ aws s3 ls s3://bucket/<remote file name>.fastq.ora
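In a script, this check can be run before handing an S3 location to any of the suite's tools. A minimal sketch with two hypothetical helpers (not part of the suite): one checks that the argument has the s3://bucket/<file name>.fastq.ora shape, the other then relies on the exit status of aws s3 ls to confirm that the AWS CLI configuration can see the object.

```shell
# Hypothetical preflight helpers for FASTQ.ORA files on AWS S3.

# Succeed if the argument has the form s3://<bucket>/<key>.fastq.ora
is_s3_ora_uri() {
  case "$1" in
    s3://?*/?*.fastq.ora) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the URI shape first, then ask the AWS CLI to list the object.
check_s3_access() {
  if ! is_s3_ora_uri "$1"; then
    echo "not an s3://bucket/<file name>.fastq.ora URI: $1" >&2
    return 2
  fi
  aws s3 ls "$1" >/dev/null
}
```

Usage: check_s3_access s3://bucket/run1/sample_R1.fastq.ora && echo "AWS CLI can see the file".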

Info: The remote case on AWS S3 with DRAGEN v3.7 uses a specific syntax. Refer to for more information.

Remote Using Azure Blob Storage

The DRAGEN ORA Helper Suite Software reuses the URI set up for your Azure Blob Storage. The location of a file is passed by specifying the file name as follows:

https://StorageAccountName.blob.core.windows.net/<my container>/<my blob>.fastq.ora
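The Azure CLI check uses the container name, which is the first path segment of this URL. A hypothetical helper (not part of the suite) that splits a blob URL of exactly this form into storage account, container and blob name, one per line:

```shell
# Hypothetical helper: split
#   https://<account>.blob.core.windows.net/<container>/<blob>
# into account, container and blob name (printed one per line).
split_blob_url() {
  local url=$1 rest host
  case "$url" in
    https://*.blob.core.windows.net/*/*) ;;
    *) echo "unexpected Azure Blob URL: $url" >&2; return 1 ;;
  esac
  rest=${url#https://}   # <account>.blob.core.windows.net/<container>/<blob>
  host=${rest%%/*}       # <account>.blob.core.windows.net
  rest=${rest#*/}        # <container>/<blob>
  printf '%s\n' "${host%%.*}" "${rest%%/*}" "${rest#*/}"
}
```

For example, container=$(split_blob_url "$url" | sed -n 2p) extracts the value to pass to --container-name.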

To make sure that the Azure CLI is configured to access your blob file, use the following commands:

export AZURE_STORAGE_CONNECTION_STRING=<yourURI>
az storage blob list --container-name <myContainer>

Info: Remote case on Azure Blob Storage is not supported on DRAGEN v3.7.


Installation Software and References

The following information provides the requirements and steps to install the DRAGEN ORA Helper Suite Software.

Installation Requirements

The following table lists the minimum requirements for the DRAGEN ORA Helper Suite Software.

| Component | Minimum Requirement |
|---|---|
| System memory | 8 GB RAM |
| Free disk space | From 2 GB to 15 GB, depending on how many specific references are needed for decompression. Specific references are available for download from the ORA Support Site. For human data, 2 GB are required. The default human reference is pre-installed with the Software and does not need to be downloaded. |
| Compatible Linux distributions | CentOS 7 or later; Ubuntu 14 or later; Oracle 8 or later |
| Internet connection | Required to install the DRAGEN ORA Helper Suite. Once installed, an internet connection is only required when the DRAGEN ORA Helper Suite is used in a remote environment. |

Info: The oraFuse software uses a FUSE file system. Because the FUSE file system requires root privileges, the oraFuse software requires root privileges during installation.

Install the DRAGEN ORA Helper Suite Software

Download the installer from the ORA Support Site, and follow the installation steps below to install the DRAGEN ORA Helper Suite Software.

Installation Steps:

1. Open a terminal window.

2. If the installation is for an online system, move the installer to the directory where the software should be installed. If the installation is for an offline system, the installer does not need to be moved.

3. Navigate to the installer location.

4. Run the following command:

sh <file name>.sh

5. To install oraFuse, install the dependencies when prompted. The following dependencies are installed automatically on an online system:

| Distribution | Dependencies |
|---|---|
| CentOS | fuse, fuse-libs, libcurl, openssl |
| Ubuntu | fuse, openssl, libcurl3-gnutls |
| Oracle | fuse, fuse-libs, fuse-devel, libcurl |

Info: If the installation is for an offline system, make sure the dependencies are installed on the offline system.

The installer creates a folder named oraHelperSuite. If the installation is for an offline system, move the oraHelperSuite folder to the offline system. The oraHelperSuite folder contains the following:

  • orad executable

  • oraFuse executable

  • ora-ldpreload.so library

  • oraHelper wrapper executable

  • refbin: the human reference genome used for decompression. The pre-installed refbin is the default human reference. If the FASTQ.ORA file was compressed with a different reference, make sure the correct reference is available locally or download it from the ORA Support Site.

Install the DRAGEN ORA Decompression References

To decompress FASTQ.ORA files derived from specific species or models, download the specific reference from the ORA Support Site. You can also download the whole reference database.

Info: For the default human reference, this extra step is not needed. The default human reference is already included in the DRAGEN ORA Decompression Software.

Extract a specific reference

1. Move the downloaded archive to the oraHelperSuite directory.

mv yourdownloadpath/<genus_specificname>.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory.

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command (the same command applies on Linux and Mac):

tar -xzvf <genus_specificname>.tar.gz

When the extraction of the specific reference is complete, the oraHelperSuite folder on Linux should be structured as follows (example with the Gallus_gallus reference):

oraHelperSuite
   |__orad
   |__orafuse
   |__ora-ldpreload.so
   |__oraHelper
   |__refbin
   |__Gallus_gallus
        |__refbin
        |__readme_gallus_gallus

Extract the full database

1. Move the downloaded archive to the oraHelperSuite directory.

mv yourdownloadpath/oradata.tar.gz yourpath/oraHelperSuite

2. Navigate to the oraHelperSuite directory.

cd yourpath/oraHelperSuite

3. Extract the archive file using the following command (the same command applies on Linux and Mac):

tar -xzvf oradata.tar.gz

When the extraction of the full database is complete, the oraHelperSuite folder on Linux should be structured as follows:

oraHelperSuite
   |__orad
   |__orafuse
   |__ora-ldpreload.so
   |__oraHelper
   |__refbin
   |__oradata
     |__Homo_sapiens
       |__refbin
       |__readme_homo_sapiens
     |__Homo_sapiens_bisulfite
       |__refbin
       |__readme_homo_sapiens_bisulfite
     |__Genus_specificname
       |__refbin
       |__readme_Genus_specificname

Commands for Interleaved fastq.ora files

If the input FASTQ.ORA file is an interleaved paired-read file, it can be used as-is with downstream bioinformatics software that provides options to handle interleaved files. Make sure you run the downstream bioinformatics software with the proper interleaved options on the interleaved virtual FASTQ.

In the following example, after oraFuse has been mounted, the bwa command is used with the -p option to specify that the input contains interleaved paired reads.

bwa mem reference.fasta -p <interleaved file>.fastq -o result.sam

To decompress an interleaved FASTQ.ORA file into two separate files, use the orad software.

orad <interleaved file>.fastq.ora

To decompress into a single file with interleaved paired reads, use the --interleave option.

In the following example, the orad software is used with the bwa command, and the --interleave option is used to decompress into a single stream containing interleaved data.

bwa mem reference.fasta -p <( orad -c --raw --interleave <interleaved file>.fastq.ora ) > result.sam

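Before passing an interleaved stream to a tool's interleaved option, such as bwa mem -p, it can help to confirm that the records really alternate between mates. oraHelper already reports whether a FASTQ.ORA file is interleaved; the sketch below is a hypothetical extra check (check_pairing is not part of the suite). It assumes uncompressed 4-line FASTQ records and mate names that match once an optional /1 or /2 suffix and anything after the first space are dropped.

```shell
# Hypothetical check: read an interleaved FASTQ stream on stdin and fail if any
# pair (lines 1-4 = mate 1, lines 5-8 = mate 2) has mismatched read names,
# or if the stream ends in the middle of a pair.
check_pairing() {
  awk '
    NR % 8 == 1 { n1 = $1; sub(/\/1$/, "", n1) }   # mate 1 name without /1
    NR % 8 == 5 { n2 = $1; sub(/\/2$/, "", n2)     # mate 2 name without /2
                  if (n1 != n2) bad++ }
    END { if (NR % 8 != 0) bad++; exit (bad > 0) }'
}
```

Usage: orad -c --raw --interleave <interleaved file>.fastq.ora | check_pairing && echo "mates alternate as expected".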
    Commands for the ORA Helper Suite software

    oraHelper Interactive User Guide

    The oraHelper interactive user guide helps you to determine which of the DRAGEN ORA Helper Suite software to use with your intended downstream bioinformatics software. The oraHelper interactive user guide also displays the correct syntax to use.

    Info: To get the relevant information about the intended downstream bioinformatics software, the bioinformatics software must be in the PATH variable.

    Use the oraHelper interactive user guide as follows:

    1. Open a terminal window.

    2. Add the oraHelperSuite directory to your PATH as follows:

    PATH=<oraHelperSuite Directory>:$PATH

    3. Enter oraHelper followed by the intended downstream software command and the input *.fastq.ora file name.

    $ oraHelper <command> <file name>.fastq.ora
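Step 2 only affects the current shell session. If you put it in a shell start-up file instead, a hypothetical helper like the following (not part of the suite) avoids adding the same directory to PATH more than once:

```shell
# Hypothetical helper: prepend the oraHelperSuite directory to PATH only if it
# is not already there, so running a setup script twice does not duplicate it.
add_suite_to_path() {
  local dir=${1%/}             # drop a trailing slash before comparing
  case ":$PATH:" in
    *":$dir:"*) ;;             # already on PATH: nothing to do
    *) PATH="$dir:$PATH"; export PATH ;;
  esac
}
```

The same helper can replace the PATH=<oraHelperSuite DIR>:$PATH step used for oraFuse, ora LD-Preload and orad.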

    Examples

    The following example shows oraHelper used with the bwa command.

    $ oraHelper bwa -i <filename>.fastq.ora

    The output prints the proper syntax for each of the DRAGEN ORA Helper Suite software.

    The following example shows oraHelper used with the head command.

    $ oraHelper head <file name>.fastq.ora

    The following example shows the output of oraHelper used with the head command.

    oraHelper output with head command

    Command Line Options

    | Option | Required | Description |
    |---|---|---|
    | -v, --version | No | Print the version of the DRAGEN ORA Helper Suite Software along with the versions of the orad, oraFuse, and ora LD-Preload software. |

    oraFuse Software

    The oraFuse software requires root privileges during installation. Refer to Installation Requirements.

    The oraFuse software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file has a *.fastq file extension instead of a *.fastq.ora file extension.

    Info: Make sure you specify the *.fastq file as input.

    Local Environment

    1. The oraFuse software runs with the following dependencies: fuse, fuse-libs, libcurl and openssl. If the dependencies were not installed during the DRAGEN ORA Helper Suite Software installation, run the following command (shown for yum-based distributions such as CentOS).

    $ sudo yum install fuse fuse-libs libcurl openssl

    2. Add the oraHelperSuite directory to your PATH as follows.

    PATH=<oraHelperSuite DIR>:$PATH

    3. From the directory where the FASTQ.ORA files are located, enter the following command to mount the oraFuse software.

    $ oraFuse

    4. Run the desired command on the virtual *.fastq file.

    5. When finished, enter the following command to unmount the oraFuse software from the directory.

    $ oraFuse --unmount
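Steps 3 to 5 can be wrapped so that oraFuse is unmounted even when the downstream command fails. A minimal sketch: run_with_orafuse and the ORAFUSE override are hypothetical, and the wrapper assumes oraFuse and oraFuse --unmount behave as described in steps 3 and 5 when run from the directory containing the FASTQ.ORA files.

```shell
# Hypothetical wrapper: mount oraFuse in the current directory, run one command
# on the virtual *.fastq files, then unmount, returning the command's status.
# ORAFUSE overrides the executable name (default: oraFuse).
run_with_orafuse() {
  local status
  ${ORAFUSE:-oraFuse} || return          # stop if the mount fails
  "$@"
  status=$?
  ${ORAFUSE:-oraFuse} --unmount
  return "$status"
}
```

Usage: cd <directory with FASTQ.ORA files> && run_with_orafuse bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq. The wrapper does not unmount if the shell itself is interrupted; a trap would be needed to cover that case.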

    Examples

    The following examples show the results of the oraFuse software with the ls command.

    • Before oraFuse has been mounted to the current directory:

    r10K_1.fastq.ora

    • After oraFuse has been mounted to the directory:

    https: r10K_1.fastq r10K_1.fastq.ora s3:

    • After oraFuse has been mounted to the current directory with ls -l:

    oraFuse output with ls -l command

    The following example shows the head command on a virtual FASTQ file after oraFuse has been mounted.

    $ head <file name>.fastq

    The following example shows the bwa command on a virtual FASTQ file after oraFuse has been mounted.

    $ bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq
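The virtual file name is the FASTQ.ORA name without the .ora suffix. When building commands for many files, a small hypothetical helper (not part of the suite) can derive the virtual names:

```shell
# Hypothetical helper: print the virtual FASTQ name exposed for each FASTQ.ORA
# argument, i.e. the same name without the trailing ".ora".
virtual_fastq() {
  local f
  for f in "$@"; do
    case "$f" in
      *.fastq.ora) printf '%s\n' "${f%.ora}" ;;
      *) echo "not a FASTQ.ORA file: $f" >&2; return 1 ;;
    esac
  done
}
```

For paired files: bwa mem -t 8 -M <FASTA> -o <SAM> $(virtual_fastq sample_R1.fastq.ora sample_R2.fastq.ora). The unquoted substitution assumes file names without spaces.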

    Remote Environment

    The oraFuse software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The AWS and Azure Blob Storage account configurations and credentials are used for authentication. Refer to the Remote and Local Usage section.

    The steps to use the oraFuse software in the remote environment are the same as those used in the local environment.

    Info: Azure Blob Storage is not supported if you are using DRAGEN v3.7 as downstream software.

    The following command shows the remote and virtual files on S3.

    $ ls -l s3://bucket/<path>

    Examples

    The following example runs BWA on S3 files.

    $ bwa mem -t 8 -M <FASTA> -o <SAM> s3://bucket/path/<file name>.fastq

    When using DRAGEN v3.7 as downstream software, you can use oraFuse in the remote environment by passing the file location in the following form:

    .virtual/ora/s3:/bucket/<file name>.fastq
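The mapping from an S3 URI to this form can be scripted. A hypothetical helper (not part of the suite), assuming oraFuse is mounted in the current directory and the object key ends in .fastq.ora:

```shell
# Hypothetical helper: convert an S3 URI of a FASTQ.ORA file into the path form
# used with DRAGEN v3.7 when oraFuse is mounted in the current directory:
#   s3://bucket/dir/sample.fastq.ora  ->  .virtual/ora/s3:/bucket/dir/sample.fastq
dragen37_virtual_path() {
  case "$1" in
    s3://*.fastq.ora) ;;
    *) echo "expected s3://.../*.fastq.ora, got: $1" >&2; return 1 ;;
  esac
  local key=${1#s3://}
  printf '.virtual/ora/s3:/%s\n' "${key%.ora}"
}
```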

    For an example of using DRAGEN v3.7 on a single FASTQ.ORA file located on AWS S3, see the dragen command at the end of the orad Software section.

    Command Line Options

    | Option | Required | Description |
    |---|---|---|
    | --unmount | No | Unmount oraFuse from the directory. |

    Info: Multiple users of the same file are not supported.

    Error Messages

    If you receive an error message while using the oraFuse software, use the following command to get more information.

    $ cat .virtual/ora/.error
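In a script, you may want to surface this file automatically after a failed command. A hypothetical helper (not part of the suite); it assumes, without confirmation from this guide, that the error file is absent or empty when no error has been recorded.

```shell
# Hypothetical helper: print the oraFuse error file for a directory where oraFuse
# is mounted (default: current directory). Returns 1 when the file has content.
show_orafuse_error() {
  local f="${1:-.}/.virtual/ora/.error"
  if [ -s "$f" ]; then
    cat "$f"
    return 1
  fi
  return 0
}
```

Usage: bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq || show_orafuse_error.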

    ora LD-Preload Software

    The ora LD-Preload software creates a virtual FASTQ file for each FASTQ.ORA file in the directory. The virtual file has a *.fastq file extension instead of a *.fastq.ora file extension.

    Info: Make sure you specify the *.fastq file as input.

    hashtag
    Local Environment

    1. Add the oraHelperSuite directory to your PATH as follows.

    PATH=<oraHelperSuite Directory>:$PATH

    2. Run the command with the ora LD-Preload shared library on the *.fastq file as follows.

    $ LD_PRELOAD=<oraHelperSuite directory>/ora-ldpreload.so <command> <input file name>.fastq

    Examples

    The following is an example of the ora LD-Preload software with the 'bwa' command.

    $ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq

    The following is an example of the ora LD-Preload software with the ls command.

    $ LD_PRELOAD=<oraHelperSuite DIR>/ora-ldpreload.so ls

    The following shows the output of the ora LD-Preload software with the ls command.

    <file name>.fastq.ora <file name>.fastq (virtual file)
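Typing the full LD_PRELOAD prefix for every command is error-prone. A minimal sketch of a wrapper, assuming a hypothetical ORA_SUITE_DIR variable that holds the oraHelperSuite directory; with_ora and ora_preload_value are not part of the suite. The wrapper keeps any libraries already listed in LD_PRELOAD, since the glibc dynamic linker accepts a colon-separated list.

```shell
# Print the LD_PRELOAD value to use: the ORA library first, then any
# libraries that were already being preloaded (second argument, optional).
ora_preload_value() {
  local lib="$1/ora-ldpreload.so" existing="${2:-}"
  if [ -n "$existing" ]; then
    printf '%s:%s\n' "$lib" "$existing"
  else
    printf '%s\n' "$lib"
  fi
}

# Run a command with the ORA library preloaded.
with_ora() {
  local dir=${ORA_SUITE_DIR:-}
  if [ -z "$dir" ]; then
    echo "set ORA_SUITE_DIR to the oraHelperSuite directory" >&2
    return 1
  fi
  if [ ! -f "$dir/ora-ldpreload.so" ]; then
    echo "ora-ldpreload.so not found in $dir" >&2
    return 1
  fi
  LD_PRELOAD="$(ora_preload_value "$dir" "${LD_PRELOAD:-}")" "$@"
}
```

Usage: ORA_SUITE_DIR=<oraHelperSuite DIR>; with_ora bwa mem -t 8 -M <FASTA> -o <SAM> <file name>.fastq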

    Remote Environment

    The ora LD-Preload software works on FASTQ.ORA files located in AWS S3 or Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.

    The steps to use the ora LD-Preload software in the remote environment are the same as those used in the local environment. The features are the same, with the following exceptions:

    • In the remote environment, the ora LD-Preload software does not work on statically compiled bioinformatics software.

    • In the remote environment, the Linux shell auto-completion feature does not work on virtual files.
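oraHelper is the documented way to find out whether a downstream tool can be used with ora LD-Preload. As a quick manual alternative, ldd reports whether an executable is dynamically linked; on glibc systems it prints "not a dynamic executable" and exits non-zero for static binaries. The helper below is a hypothetical sketch (not part of the suite), and ldd should not be run on untrusted executables.

```shell
# Hypothetical helper: succeed if the named program resolves to a dynamically
# linked executable, as required by the ora LD-Preload library.
is_dynamically_linked() {
  local bin
  bin=$(command -v "$1") || { echo "not found: $1" >&2; return 2; }
  ldd "$bin" >/dev/null 2>&1
}
```

Usage: is_dynamically_linked bwa || echo "use oraFuse or orad instead".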

    orad Software

    orad is the executable of the DRAGEN ORA Decompression software. It can be combined with a pipe or with process substitution to reduce reads and writes to disk. If the downstream bioinformatics software does not work with pipes or process substitution, and neither the oraFuse software nor the ora LD-Preload software can be used, then fully decompressed temporary files are required. Refer to the Troubleshooting section for more information.

    The steps are the same for local and remote environments.

    Local Environment

    1. Add the oraHelperSuite directory to your PATH as follows.

    PATH=<oraHelperSuite DIR>:$PATH

    2. Enter the command with orad and the pipe or process substitution on the *.fastq.ora files. See the following examples.

    Examples

    Example with pipe:

    When the downstream command or bioinformatics software can read from standard input, use a pipe. The following is an example with the head command. The -c option decompresses to standard output.

    $ orad <file name>.fastq.ora -c --raw | head

    Example with process substitution:

    When the downstream bioinformatics software expects a file name rather than reading from standard input, use process substitution. The following example shows process substitution with the md5sum command.

    $ md5sum <( orad -c --raw "<file name>.fastq.ora" )
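The pipe pattern also supports quick checks that never write decompressed data to disk. For example, a hypothetical helper (not part of the suite) that compares the number of reads in the two mates of a paired run, assuming 4-line FASTQ records. DECOMPRESS is an assumed override so the function can be tried on plain FASTQ files with cat.

```shell
# Hypothetical helper: count FASTQ records on stdin (4 lines per record).
count_reads() {
  awk 'END { print NR / 4 }'
}

# Hypothetical helper: succeed if two FASTQ.ORA mates contain the same number
# of reads. DECOMPRESS defaults to "orad -c --raw" (left unquoted on purpose
# so that it splits into a command and its options).
same_read_count() {
  local n1 n2
  n1=$(${DECOMPRESS:-orad -c --raw} "$1" | count_reads)
  n2=$(${DECOMPRESS:-orad -c --raw} "$2" | count_reads)
  [ "$n1" = "$n2" ]
}
```

Usage: same_read_count sample_R1.fastq.ora sample_R2.fastq.ora || echo "mates differ in read count".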

    Remote environment

    The orad software works on FASTQ.ORA files located in AWS S3 or in Azure Blob Storage. The software reuses the AWS or Azure Blob Storage account configuration and credentials. Refer to the Remote and Local Usage section.

    The following example, referenced from the oraFuse Remote Environment section, uses DRAGEN v3.7 on a single FASTQ.ORA file located on AWS S3, with oraFuse mounted in the current directory:

    $ dragen -f \
        --ref-dir=<path to hash table directory on /ephemeral> \
        -1 .virtual/ora/s3:/bucket/<remote file>.fastq \
        --output-directory=<path to output directory on /ephemeral> \
        --output-file-prefix=<prefix name> \
        --output-format BAM

    Troubleshooting

    You can use the oraHelper interactive user guide for assistance with common Linux commands, and to help troubleshoot errors encountered when using the different software included in the DRAGEN ORA Helper Suite. For details, refer to oraHelper Interactive User Guide section.

    Common Linux Commands

    The following table shows common Linux commands that you can use with the DRAGEN ORA Helper Suite software. This list is not exhaustive.

    | Command | orad | oraFuse | ora LD-Preload |
    |---|---|---|---|
    | ls | Not applicable | Yes | Yes |
    | cat | Yes | Yes | Yes |
    | head | Yes | Yes | Yes |
    | md5sum | Yes | Yes | Yes |
    | grep | | | |
    | cp | Copy *.fastq.ora | Copy uncompressed FASTQ or *.fastq.ora | No (use the cat command) |
    | awk | Yes | Yes | Yes |
    | tail | Yes | Yes | Yes |
    | wc | Yes | Yes | Yes |
    | uniq | Yes | Yes | No |
    | string | Yes | Yes | Yes |
    | sort | Yes | Yes | No |

    Bioinformatics Software

    The DRAGEN ORA Helper Suite software works with several commonly used bioinformatics tools. If the bioinformatics software is supported, you can use the oraHelper interactive user guide to help troubleshoot errors with your command. The following table lists the bioinformatics software supported with the DRAGEN ORA Helper Suite. It does not limit the use of other third-party software.

    | Bioinformatics software | orad | oraFuse | ora LD-Preload |
    |---|---|---|---|
    | BWA v0.7.17 | Yes | Yes | Yes |
    | Bowtie2 v2.4.2 | Yes | Yes | Yes |
    | STAR v2.7.8a | Yes | Yes | Yes |
    | Minimap2 v2.17 | Yes | Yes | Yes |
    | DRAGMAP v1.3 | Yes | Yes | No |
    | FASTQC v0.11.9 | Use std:input command | Yes | Use std:input command |
    | Cutadapt v3.7 | Yes | Yes | No |
    | SPAdes v3.15.4 | Yes | Yes | No |
    | LAST v1.25.7 | Yes | Yes | Yes |
    | Seqkit v0.16.1 | Yes | Yes | No |
    | Seqtk v1.3 | Yes | Yes | Yes |
    | Fastp v0.23.1 | Yes | Yes | No |