For fully transparent use of fastq.ora files with third-party bioinformatics software (no changes to the command, no overhead, no additional footprint), the DRAGEN ORA Helper Suite Software is recommended. It is available for download on the ORA Support Site and is supported only on Linux.
For semi-transparent use of fastq.ora files with third-party bioinformatics software, use DRAGEN ORA Decompression with a pipe or with process substitution. Compared with a full decompression step, this method improves system performance by reducing disk reads and writes.
If the analysis software can read from standard input, as BWA can, use the following command:
orad file.fastq.ora -c --raw | bwa mem humanref.fasta - > resu.sam
The -c option decompresses to standard output, and the pipe (|) sends the result to BWA, which reads from standard input when given the dash (-) as its input file. This also works for paired reads in a single interleaved file: use the -p option of BWA to specify that the input contains interleaved paired reads.
If the analysis software cannot read from standard input, you can use process substitution:
bwa mem humanref.fasta <(orad file.fastq.ora -c --raw) > resu.sam
In place of the file name, use the <( ) syntax around the command that writes the file to standard output, in this case orad with the -c option, as in the previous command. This method does not work when the third-party software checks the input file name or does not read the file sequentially.
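When paired reads are stored as two separate fastq.ora files, process substitution can be applied to each mate file, because bwa mem accepts a second FASTQ file for mate reads. The following is a minimal sketch, assuming a shell with process substitution such as Bash; the helper name align_pair and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align paired reads stored in two separate fastq.ora
# files without writing the decompressed FASTQ to disk.
set -euo pipefail

align_pair() {
  local ref="$1" r1="$2" r2="$3" out="$4"
  # Each <( ) runs orad in the background and exposes its standard output
  # as a path (for example /dev/fd/63) that bwa mem reads like a file.
  # bwa mem takes the mate-1 file first and the mate-2 file second.
  bwa mem "$ref" <(orad "$r1" -c --raw) <(orad "$r2" -c --raw) > "$out"
}

# Example (hypothetical file names):
# align_pair humanref.fasta sample_R1.fastq.ora sample_R2.fastq.ora resu.sam
```

Both orad processes stream their output, so bwa mem reads each mate file sequentially, which is the access pattern process substitution requires.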
Info
On Windows, replace orad with orad.exe.
DRAGEN ORA Compression is lossless.
To verify that no data was lost during compression to fastq.ora, compare the MD5 checksum of the decompressed fastq.ora content with the MD5 checksum of the decompressed fastq.gz content.
1. Compute the MD5 checksum of the decompressed FASTQ.ORA content as follows:
md5sum <(orad myfile.fastq.ora --raw -c)
2. Compute the MD5 checksum of the decompressed FASTQ.GZ content as follows:
md5sum <(gzip -d -c myfile.fastq.gz)
3. Confirm that the two checksums are identical.
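The two checksums can also be computed and compared in one step. The following is a minimal Bash sketch, assuming orad, gzip, and GNU md5sum are on the PATH; the function name verify_lossless and the file names are hypothetical. Piping into md5sum and keeping only the first field compares the checksums alone, ignoring the file-name column that md5sum prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the MD5 of the decompressed .fastq.ora content
# with the MD5 of the decompressed original .fastq.gz content.
set -euo pipefail

verify_lossless() {
  local ora="$1" gz="$2" ora_md5 gz_md5
  # --raw -c: decompress to uncompressed FASTQ on standard output.
  ora_md5=$(orad "$ora" --raw -c | md5sum | cut -d' ' -f1)
  gz_md5=$(gzip -d -c "$gz" | md5sum | cut -d' ' -f1)
  if [ "$ora_md5" = "$gz_md5" ]; then
    echo "match $ora_md5"
  else
    echo "MISMATCH ora=$ora_md5 gz=$gz_md5" >&2
    return 1
  fi
}

# Example (hypothetical file names):
# verify_lossless myfile.fastq.ora myfile.fastq.gz
```

The non-zero return status on a mismatch makes the helper usable in scripts that check many file pairs.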
Use either of the following command forms to decompress a file:
orad FILE [args]
orad [args] FILE
You can also provide a folder to decompress, in one batch, all FASTQ.ORA files located at the top level of that folder:
orad ./pathtofolder/ [args]
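Folder input covers the FASTQ.ORA files at the top level of the folder. If files are also stored in subfolders, a loop over find results is one option. The following is a minimal Bash sketch; decompress_tree is a hypothetical helper, and any extra arguments (for example --raw) are passed to orad unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run orad on every *.fastq.ora file under a folder,
# including subfolders. Extra arguments are passed to orad unchanged.
set -euo pipefail

decompress_tree() {
  local root="$1"
  shift
  # -print0 / read -d '' handle file names containing spaces or newlines.
  find "$root" -type f -name '*.fastq.ora' -print0 |
    while IFS= read -r -d '' f; do
      # </dev/null keeps orad from consuming the file list on stdin.
      orad "$f" "$@" < /dev/null
    done
}

# Example: decompress every file to uncompressed FASTQ next to its input.
# decompress_tree ./pathtofolder/ --raw
```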
On Windows, replace orad with orad.exe, for example:
orad.exe FILE [args]
To decompress a FASTQ.ORA file compressed with a reference other than the default human reference, make sure that the specific reference is available locally.
The command line does not change: the decompression software automatically detects which species/model was used.
On Windows, replace orad with orad.exe, for example: orad.exe myfile.fastq.ora --check
To decompress FASTQ.ORA files derived from specific species/models, download the reference that was used at compression from the . You can also download the whole reference database.
Info This extra step is not needed for the default human reference, which is already included in the DRAGEN ORA Decompression Software.
On Linux, to install a specific reference:
1. Move the downloaded archive to the oradata directory.
2. Navigate to the oradata directory.
3. Extract the archive file using the following command:
When the extraction of the specific reference is complete on Linux, the orad.2.7.0.linux folder should be structured as follows (example with the Gallus_gallus reference):
orad.2.7.0.linux
|___orad
|___oradata
    |___refbin
    |___Gallus_gallus
        |___refbin
        |___readme_gallus_gallus
Note
The oradata folder can be moved to another location but should keep its structure.
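The three steps for a specific reference can be sketched in Bash as follows. This assumes the downloaded reference is a gzipped tarball (the format named in the Windows instructions) whose top-level entry is the <Genus_specificname> folder, and that oradata lives in $HOME. install_species_ref and the archive name are hypothetical, and tar -xzf is standard tar usage rather than the exact command from the official documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install one species-specific ORA reference.
# Assumptions: the archive is a .tar.gz whose top-level entry is the
# <Genus_specificname> folder; oradata defaults to $HOME/oradata.
set -euo pipefail

install_species_ref() {
  local archive="$1" oradata="${2:-$HOME/oradata}"
  # Step 1: move the downloaded archive into the oradata directory.
  mv "$archive" "$oradata/"
  # Step 2: navigate to oradata (in a subshell, so the caller's working
  # directory is unchanged).
  (
    cd "$oradata"
    # Step 3: extract the gzipped tarball in place.
    tar -xzf "$(basename "$archive")"
  )
}

# Example (hypothetical archive name):
# install_species_ref ~/Downloads/gallus_gallus.tar.gz
```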
On Linux, to install the full reference database:
1. Move the downloaded archive to the orad.2.7.0.linux directory.
2. Navigate to the orad.2.7.0.linux directory.
3. Delete the existing oradata folder.
4. Extract the archive file using the following command:
When the extraction of the full database is complete, the orad.2.7.0.linux folder should be structured as follows:
orad.2.7.0.linux
|___orad
|___oradata
    |___Homo_sapiens
    |   |___refbin
    |   |___readme_homo_sapiens
    |___Homo_sapiens_bisulfite
    |   |___refbin
    |   |___readme_homo_sapiens_bisulfite
    |___<Genus_specificname>
        |___refbin
        |___readme_<Genus_specificname>
Note
The oradata folder can be moved to another location but should keep its structure.
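The full-database steps can be sketched the same way, assuming a gzipped tarball whose top-level entry is the oradata folder. install_full_db and the archive name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the oradata folder with the full reference
# database. Assumption: the archive is a .tar.gz whose top-level entry is
# the oradata folder.
set -euo pipefail

install_full_db() {
  local archive="$1" orad_dir="$2"
  # Step 1: move the downloaded archive into the orad.2.7.0.linux directory.
  mv "$archive" "$orad_dir/"
  (
    # Step 2: navigate to that directory (subshell keeps the caller's cwd).
    cd "$orad_dir"
    # Step 3: delete the existing oradata folder.
    rm -rf -- ./oradata
    # Step 4: extract the archive, which recreates oradata with one
    # subfolder per species.
    tar -xzf "$(basename "$archive")"
  )
}

# Example (hypothetical archive name):
# install_full_db ~/Downloads/oradata_full.tar.gz ./orad.2.7.0.linux
```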
On Windows, extract the downloaded archive with software that can handle gzipped tarballs, such as 7-Zip: right-click the archive and select Extract with.
When a specific reference is downloaded, a folder named <Genus_specificname> is extracted. This folder contains the corresponding refbin and readme files. Move this folder to the location where orad and the default human refbin file were extracted during installation of the DRAGEN ORA Decompression Software.
When the full database is downloaded, a folder named oradata is extracted. This folder contains a subfolder for each species, each with the corresponding refbin and readme files. Move this oradata folder to the location where orad and the default human refbin file were extracted during installation of the DRAGEN ORA Decompression Software.
Warning Do not use this documentation for decompression on the DRAGEN Server. The DRAGEN Server has its own integrated DRAGEN ORA Decompression component.
DRAGEN ORA Decompression Software decompresses fastq.ora files into fastq.gz files. Fastq.ora files are generated using lossless compression technology as part of DRAGEN. Orad is the executable file that runs the DRAGEN ORA Decompression Software, which is a standalone piece of software.
With DRAGEN v4.3, compression is available for FASTQ files derived from supported non-human species and from human bisulfite data (methylated DNA applications). DRAGEN ORA Decompression v2.7 can decompress FASTQ.ORA files compressed with any supported reference. Refer to the and sections.
The DRAGEN ORA Decompression Software is available as three separate executables, one for each of the following operating systems:
Linux
Mac
Windows
Decompression of FASTQ.ORA files on local storage is supported on Linux, Mac, and Windows. Decompression of FASTQ.ORA files stored on AWS S3 or Azure Blob Storage is supported only on Linux.
When a FASTQ.ORA file is located on AWS S3 or Azure Blob Storage, decompression runs in streaming mode: the file does not need to be fully transferred before decompression starts.
The minimum requirements for the DRAGEN ORA Decompression Software are listed in the requirements table at the end of this section.
After DRAGEN ORA Decompression has been downloaded from the , use the following steps to install the DRAGEN ORA Decompression Software:
1. Extract the archive files using the following command:
2. Navigate to the Orad directory as follows:
3. Move the executable to your preferred location as follows:
4. Add orad to your PATH as follows:
5. Move the oradata folder into your home directory as follows:
To store the folder in a different location, use the following command:
If oradata has been moved to another location, you can either:
point to the reference by using the ORA_REF_PATH environment variable, as follows:
or use the following command at decompression:
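The installation steps above can be sketched as a single Bash function. This assumes a gzipped tarball that extracts to a folder such as orad.2.7.0.linux containing orad and oradata. install_orad, ~/bin, and /data/oradata are hypothetical, explicit paths replace the cd of step 2, and the form of the --ora-reference argument is an assumption (check orad --help).

```shell
#!/usr/bin/env bash
# Hypothetical helper: Linux installation steps in one function.
# Assumption: the archive is a .tar.gz that extracts, in the current
# directory, to a folder containing the orad executable and oradata.
set -euo pipefail

install_orad() {
  local archive="$1" extracted_dir="$2" bin_dir="${3:-$HOME/bin}"
  # 1. Extract the archive into the current directory.
  tar -xzf "$archive"
  # 2-3. Move the executable from the extracted folder to the chosen location.
  mkdir -p "$bin_dir"
  mv "$extracted_dir/orad" "$bin_dir/"
  # 4. Add that location to PATH for the current shell session. Append the
  #    same export line to ~/.bashrc (or equivalent) to make it permanent.
  export PATH="$bin_dir:$PATH"
  # 5. Move the oradata folder into the home directory, where orad looks
  #    for $HOME/oradata/refbin by default.
  mv "$extracted_dir/oradata" "$HOME/"
}

# Example (hypothetical names):
# install_orad orad.2.7.0.linux.tar.gz orad.2.7.0.linux "$HOME/bin"
#
# If oradata is stored elsewhere (hypothetical path /data/oradata), either:
#   export ORA_REF_PATH=/data/oradata
# or pass the location at decompression (argument form assumed):
#   orad --ora-reference /data/oradata myfile.fastq.ora
```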
On Windows, install the software as follows:
1. Extract the downloaded archive with software that can handle gzipped tarballs, such as 7-Zip: right-click the archive and select Extract with. The following two files are extracted:
orad.exe
refbin
The following steps use C:\Users\user1 as an example location. Change C:\Users\user1 to the location where you extracted the archive.
2. Open the Command Prompt application.
3. Set the environment variables for orad.exe and the refbin file with either the set command or the setx command. The set command configures the variables temporarily (for the current console window only), while the setx command configures them permanently (for new console windows; it does not change the current one).
4. Add the location of orad.exe to the PATH environment variable as follows:
or
5. Set the ORA_REF_PATH environment variable to the location of the refbin file as follows:
or
Option | Description |
---|---|
-C, --check | Checks the integrity of the specified ORA file. This option decompresses the file in memory and verifies that the checksum of the decompressed data is identical to the checksum of the original data. The decompressed file is not saved. |
--raw | Decompresses the ORA file into an uncompressed FASTQ file. By default, the DRAGEN ORA Decompression Software decompresses to gzip format. |
--rm | Deletes the input file after successful execution. By default, the input file is not deleted. This option is not supported for files in AWS S3 or Azure Blob Storage. |
-t, --threads | Sets the maximum number of threads to use. The default is 8. |
-f, --force | Overwrites the output file without prompting. By default, if the output file exists, the software exits without overwriting it. |
-h, --help | Prints help and exits. |
-v, --version | Prints the software version. |
-i, --info | Prints information about the compressed ORA file: the software version used to compress the file; the total number of sequences; the total number of bases; whether the file contains interleaved data; the original file name, if it was saved in the fastq.ora file; and the name of the species/reference used for compression, with the xxhash checksum of that reference (if no reference is listed, the file was compressed with the default human reference). This option is not supported for files in AWS S3 or Azure Blob Storage. Although the ORA file format supports concatenation of fastq.ora files, using this option on a concatenated fastq.ora file prints erroneous information. |
-c, --stdout | Writes the decompressed file to standard output (stdout). This is useful for passing the results to another application without writing the decompressed file to disk. |
- | Reads the input fastq.ora file from standard input (stdin). This option is not supported on Windows. |
-P, --path | Sets the directory of the output file; the default file name is used. If no path is specified, the output file is created in the same location as the input file. When used with -o, this option overrides the path given with -o. |
-o, --out | Sets the name of the output file, including its path unless -P is also used. The default is derived from the name of the input fastq.ora file. |
-N, --name | Restores the original file name that was saved in the fastq.ora file when it was compressed. |
-I, --interleave | Decompresses into a single interleaved output file. By default, a single interleaved fastq.ora input file is decompressed into two separate paired-read files. If the interleaved fastq.ora file was generated with DRAGEN v4.0 or later, -interleaved is included in its file name. |
--ora-reference | Sets the directory of the ORA reference file refbin. By default, the software looks for the reference file in the following locations: $HOME/oradata/refbin, $HOME/oradata/Genus_specificname/refbin, ./oradata/refbin, and ./oradata/Genus_specificname/refbin. If the ORA_REF_PATH environment variable is set (for example, with export ORA_REF_PATH=/some/path/), the software also looks in that location. |
--check-ora-reference-path | Verifies that the ORA reference file refbin is accessible and prints its path. No decompression occurs when this option is used. |
--quiet | Enables quiet mode, in which nothing is written to standard output or standard error. This mode is ignored when used with -c (--stdout) or -C (--check). |
--empty-third-line | Outputs the third line of each FASTQ record (the line that starts with +) as an empty line. By default, this line is preserved. |
-r, --repeat-header | Adds the read header to the third line of each FASTQ record (the line that starts with +). |
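To see which of the default reference locations listed for --ora-reference actually contain a refbin file, a small Bash check can help. This is a hypothetical diagnostic (list_ora_refs is not part of orad): it only lists existing files, does not reproduce orad's own search order, and assumes that species subfolders under ORA_REF_PATH follow the oradata layout. orad --check-ora-reference-path remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print every refbin file found in the documented
# default locations ($HOME/oradata and ./oradata) and, when set, in
# ORA_REF_PATH. It does not reproduce orad's own search order.
set -euo pipefail

list_ora_refs() {
  local base f
  # ${ORA_REF_PATH:+...} adds the variable only when it is set and non-empty.
  for base in "$HOME/oradata" "./oradata" ${ORA_REF_PATH:+"$ORA_REF_PATH"}; do
    # Default human reference directly under the base folder.
    if [ -f "$base/refbin" ]; then
      echo "$base/refbin"
    fi
    # Species-specific references in <Genus_specificname> subfolders.
    for f in "$base"/*/refbin; do
      if [ -f "$f" ]; then
        echo "$f"
      fi
    done
  done
}

# Usage:
# list_ora_refs
# orad --check-ora-reference-path
```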
Command | Description |
---|---|
orad myfile.fastq.ora --check | Checks the integrity of an ORA file. |
orad myfile.fastq.ora | Decompresses myfile.fastq.ora to myfile.fastq.gz. |
orad myfile.fastq.ora --info | Prints a summary of information about an ORA file. |
orad -c --raw myfile.fastq.ora \| head | Prints the first lines of the corresponding FASTQ file in the terminal. |
orad --check-ora-reference-path | Verifies that the ORA reference file is accessible and prints its path. |
Minimum requirements for the DRAGEN ORA Decompression Software:
Component | Minimum Requirement |
---|---|
System memory | 8 GB RAM |
Free disk space | From 2 GB to 15 GB, depending on how many specific references are needed for decompression. Specific references are available for download in the . For human data, 2 GB is required; the default human reference is pre-installed with the software and does not need to be downloaded. |
Compatible Linux distributions | |
Compatible Mac distributions | macOS 10.15 and later (Apple silicon and Intel processors) |
Compatible Windows distributions | Windows 10 and later |