Getting Started
DRAGEN provides tests you can run to make sure that your DRAGEN system is properly installed and configured. Before running the tests, make sure that the DRAGEN server has adequate power and cooling, and is connected to a network that is fast enough to move your data to and from the machine with adequate performance.
Please refer to the Server Site Prep & Installation Guide when installing a new system.
On-premises Installation
The software can be installed on an on-premises server by executing the .run installer for the desired version. Installers are made available for all releases at the DRAGEN Software Support Site page.
Installation procedure:
Download the desired installer from the support website and unzip the package
The archive integrity can be checked using:
./<dragen .run file> --check
Install the appropriate release based on your Linux OS with the command:
sudo sh <dragen .run file>
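The two steps above can be chained so that installation only proceeds when the integrity check succeeds. The following is a minimal bash sketch, not an official script: `install_dragen` and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the sketch assumes the installer accepts `--check` when invoked through `sh`, in the same way as when it is run directly.

```shell
# Sketch: verify a DRAGEN .run installer, then install it.
# install_dragen and DRY_RUN are hypothetical (not part of DRAGEN).
# With DRY_RUN=1 the commands are printed instead of executed.
install_dragen() {
    local installer="$1"
    if [ -z "$installer" ]; then
        echo "usage: install_dragen <dragen .run file>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sh $installer --check"
        echo "sudo sh $installer"
        return 0
    fi
    # Integrity check first; stop if the archive is damaged.
    if ! sh "$installer" --check; then
        echo "integrity check failed: $installer" >&2
        return 1
    fi
    # Root privileges are required for the installation itself.
    sudo sh "$installer"
}
```

Running `DRY_RUN=1 install_dragen <dragen .run file>` prints the two commands without touching the system, which can be useful for reviewing them before a real installation.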
The .run file includes a script that handles uninstallation of any existing DRAGEN software, integrity checking of the package and files, and installation of the new DRAGEN software version. The DRAGEN software is installed in part through the Linux RPM Package Manager (rpm), and several RPM packages make up the installation of a single DRAGEN software version. The RPM packages also configure the system for dragen, for example by raising user ulimits, and the .run script starts the services needed for operation, such as the licensing daemon, dragen_licd, and the hugepages daemon, dragend_hp.
NOTE: Root privileges are required for the installation.
Single Version Installation
Up to DRAGEN Software v4.2, only one version of the DRAGEN software can be installed at a time. Executing the .run file will remove any existing installed version and (re)install the new version.
After installation, the application and associated files are available at /opt/edico.
The single version installer adds /opt/edico to the Linux $PATH, so that the user can call dragen without specifying the full path.
Multi-Version Installation
Starting with DRAGEN Software v4.3, multiple compatible versions of the DRAGEN software can be installed at the same time. Executing the .run file adds the new version to the system.
After installation, the application files are available at /opt/dragen/{version} and FPGA files are located at /opt/bitstream/{bitstream version}.
The multi-version installer does NOT add /opt/dragen/{version} to the Linux $PATH, since multiple versions can be present at a given time. Users should manage the paths themselves so that they point to the specific version they want to run. The command line examples in this guide assume that the Linux $PATH is set to the correct dragen version and refer simply to dragen <options>.
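One way to manage the path is a small shell function that selects a version for the current session. The following is a minimal bash sketch under stated assumptions: `use_dragen_version` and `DRAGEN_ROOT` are hypothetical names, and the choice between the version directory and a `bin` subdirectory inside it is a guess that should be checked against the actual layout on your system.

```shell
# Sketch: put one installed DRAGEN version on PATH for this shell session.
# DRAGEN_ROOT and use_dragen_version are hypothetical names for this example.
DRAGEN_ROOT="${DRAGEN_ROOT:-/opt/dragen}"

use_dragen_version() {
    local version="$1"
    local dir="$DRAGEN_ROOT/$version"
    if [ -z "$version" ] || [ ! -d "$dir" ]; then
        echo "DRAGEN version directory not found: $dir" >&2
        return 1
    fi
    # Assumption: if a bin/ subdirectory exists, the executable lives there;
    # otherwise the version directory itself is added.
    if [ -d "$dir/bin" ]; then
        dir="$dir/bin"
    fi
    export PATH="$dir:$PATH"
}
```

After calling `use_dragen_version {version}`, a plain `dragen <options>` resolves to that version for the rest of the session. The change is not persistent, so a different version can be selected in another shell.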
Notes on multi-version installation:
Installers released for DRAGEN v4.2 and earlier are single version packages
Single version packages and multi-version packages cannot be mixed
Installation of a prior single version package will remove all the multi-version packages
Installation of a multi-version package will remove any installed single version package
After installing a multi-version package, see a list of installed versions at any time by running /usr/bin/dragen_versions
To remove any multi-version package, call yum remove on its path
Example:
Location of dragen and resource files

DRAGEN Version | on-premises server | cloud instance |
---|---|---|
4.3 and later | | |
4.2 and earlier | | |
Throughout this guide, <INSTALL_PATH> refers to whichever of the locations above applies to your system.
Running the System Check
After turning on the server, you can make sure that your DRAGEN server is functioning properly by running <INSTALL_PATH>/self_test/self_test.sh, which does the following:
Automatically indexes chromosome M from the hg19 reference genome
Loads the reference genome and index
Maps and aligns a set of reads
Saves the aligned reads in a BAM file
Asserts that the alignments exactly match the expected results
Each server ships with the test input FASTQ data for this script, which is located in <INSTALL_PATH>/self_test. The system check takes approximately 25-30 minutes.
The following example shows how to run the script and shows the output from a successful test.
If the output BAM file does not match expected results, then the last line of the above text is as follows:
SELF TEST RESULT : FAIL
If you experience a FAIL result after running this test script immediately after turning on your DRAGEN server, contact Illumina Technical Support.
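When the self-test runs unattended, the result can be checked from a saved log. The following is a minimal bash sketch: `self_test_failed` is a hypothetical helper, and it assumes the result line is the final line of the captured output. Only the FAIL text is taken from this guide.

```shell
# Sketch: detect a failed self-test from a saved log file.
# self_test_failed is a hypothetical helper, not part of DRAGEN.
self_test_failed() {
    local log="$1"
    # Returns 0 (true) when the last line of the log reports a failure.
    tail -n 1 "$log" | grep -q 'SELF TEST RESULT : FAIL'
}

# Example usage (INSTALL_PATH must point at the installed version):
#   "$INSTALL_PATH"/self_test/self_test.sh 2>&1 | tee self_test.log
#   if self_test_failed self_test.log; then
#       echo "Self-test failed; contact Illumina Technical Support." >&2
#   fi
```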
Running Your Own Test
When you are satisfied that your DRAGEN system is performing as expected, you are ready to run some of your own data through the machine, as follows:
Load the reference table for the reference genome
Determine location of input and output files
Process input data
Loading the Reference Genome
Before a reference genome can be used with DRAGEN, it must be converted from FASTA format into a custom binary format for use with the DRAGEN hardware. For more information, see Prepare a Reference Genome.
The reference hash table specified on the command line is automatically loaded onto the board the first time you process data with a pipeline. You can manually load the hash table for your reference genome by using the following command:
dragen -r <reference_hash-table_directory>
Make sure that the reference hash table directory is on the fast file IO drive.
The default location for the hash table for hg19 is as follows.
/staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
The command to load reference genome hg19 from the default location is as follows.
dragen -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
This command loads the binary reference genome into memory on the DRAGEN board, where it is used for processing any number of input data sets. You do not need to reload the reference genome unless you restart the system or need to switch to a different reference genome. It can take up to a minute to load a reference genome.
DRAGEN checks whether the specified reference genome is already resident on the board. If it is, then the upload of the reference genome is automatically skipped. You can force reloading of the same reference genome using the force-load-reference (-l) command line option.
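The load and force-reload variants can be sketched as a helper that prints the command to run. This is an illustration only: `build_load_cmd` is a hypothetical function, the sketch assumes that -l is simply added to the load command, and the /staging check reflects the recommendation to keep the hash table on the fast file system.

```shell
# Sketch: print the command that loads a reference hash table.
# build_load_cmd is a hypothetical helper; it prints and does not run dragen.
build_load_cmd() {
    local ref_dir="$1" force="${2:-}"
    # Warn when the hash table is not on the fast file system.
    case "$ref_dir" in
        /staging/*) ;;
        *) echo "warning: $ref_dir is not under /staging" >&2 ;;
    esac
    if [ "$force" = "force" ]; then
        # Assumption: -l (force-load-reference) is combined with -r.
        echo "dragen -l -r $ref_dir"
    else
        echo "dragen -r $ref_dir"
    fi
}
```

For instance, `build_load_cmd /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 force` prints the forced-reload form of the hg19 load command shown earlier.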
The command to load the reference genome prints the software and hardware versions to standard output. For example:
After the reference genome has been loaded, the following message is printed to standard output:
Determine Input and Output File Locations
The DRAGEN Pipeline is very fast, which requires careful planning for the locations of the input and output files. If the input or output files are on a slow file system, then the overall performance of the system is limited by the throughput of that file system. It is recommended that inputs and outputs are streamed directly from/to a mounted external storage system.
The DRAGEN system is preconfigured with at least one fast file system consisting of a set of fast SSD disks grouped with RAID-0 for performance. This file system is mounted at /staging. This name was chosen to emphasize the fact that this area was built to be large and fast, but is not redundant. Failure of any of the file system's constituent disks leads to the loss of all data stored there.
During processing, DRAGEN generates and reads back temporary files. It is highly recommended to always direct temporary files to the fast SSD file system (/staging) by using the --intermediate-results-dir option. If the --intermediate-results-dir option is not provided, temporary files are written to the --output-directory. Streaming inputs and outputs through a mounted external storage system is recommended.
Process Your Input Data
To analyze FASTQ data, use the dragen command. For example, the following command can be used to analyze a single-ended FASTQ file:
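A minimal bash sketch of assembling such a command is shown here. It is an illustration under stated assumptions, not an official recipe: `build_fastq_cmd` is a hypothetical helper, the default `/staging/tmp` directory is made up for the example, and the `-1`, `--output-file-prefix`, `--RGID`, and `--RGSM` options should be verified against DRAGEN Host Software before use.

```shell
# Sketch: print a dragen command line for one single-ended FASTQ file.
# build_fastq_cmd is a hypothetical helper; it prints and does not run dragen.
build_fastq_cmd() {
    local ref_dir="$1" fastq="$2" out_dir="$3" prefix="$4"
    # Hypothetical default; temporary files belong on the fast file system.
    local tmp_dir="${5:-/staging/tmp}"
    local cmd="dragen -r $ref_dir -1 $fastq"
    cmd="$cmd --output-directory $out_dir --output-file-prefix $prefix"
    cmd="$cmd --intermediate-results-dir $tmp_dir"
    # Assumption: the read group ID and sample name reuse the output prefix.
    cmd="$cmd --RGID $prefix --RGSM $prefix"
    echo "$cmd"
}
```

Printing the command first makes it easy to confirm that the reference, input, output, and temporary paths all sit on the intended file systems before starting a run.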
For detailed information on the command line options, see DRAGEN Host Software.
For recommended command lines in typical use cases, see DRAGEN Recipes.