Reference Databases

A reference database is required to run DME+ in DRAGEN. The databases are stored remotely and must be downloaded prior to running an analysis. A shell script is provided to facilitate the download.

Directory Setup

Before downloading the databases, create a directory dedicated to storing them, preferably on a disk with at least 150 GB of free space. The path to this directory is passed as the -d parameter when the download script is run in the subsequent steps. The examples below use "databases/".
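The setup step can be sketched as follows. This is a minimal sketch, assuming a POSIX shell with df and awk available. The variable names are illustrative, and the 150 GB recommendation is interpreted here as 150 x 1024 x 1024 KiB.

```shell
# Illustrative names; "databases" matches the directory used in the examples.
DB_DIR="databases"
REQUIRED_KB=$((150 * 1024 * 1024))   # assumption: 150 GB expressed in 1 KiB blocks

mkdir -p "$DB_DIR"

# df -P prints a stable POSIX layout; with -k, column 4 is the available space in KiB.
avail_kb=$(df -Pk "$DB_DIR" | awk 'NR==2 {print $4}')

if [ "$avail_kb" -lt "$REQUIRED_KB" ]; then
  echo "Warning: only $((avail_kb / 1024 / 1024)) GB free under $DB_DIR; at least 150 GB is recommended" >&2
else
  echo "$DB_DIR has enough free space"
fi
```

The check only warns; it does not stop you from continuing on a smaller disk.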

Obtaining the Download Script

Download and management of the reference databases is handled by a shell script. The script can be downloaded with the following command:

wget -O explify-dbs.sh https://illumina-databases.s3.us-east-1.amazonaws.com/explify-dbs.sh
chmod +x explify-dbs.sh

Seeing What Databases are Available for Download

The search subcommand lists the databases that are available for download:

$ ./explify-dbs.sh search -d databases/
4 database(s) found meeting those criteria:
- Custom-1.0.0
- RPIP-6.7.0
- UPIP-8.8.0
- VSPv2-2.9.0

  • The -d argument is the base directory used for storage of the databases

  • Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel

  • Optionally, setting the -n argument will filter the search to databases that have not already been downloaded
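The optional arguments can be combined in one call. The sketch below assumes that -p takes the panel name without its version and that -n is a bare flag, which is inferred from the descriptions above rather than confirmed. The parse_search_output helper is hypothetical: it splits each result line of the form shown in the sample output into a panel name and a version, which is convenient when passing the result on to the download subcommand.

```shell
# Hypothetical helper: turn lines such as "- RPIP-6.7.0" into "RPIP 6.7.0".
# The header line ("4 database(s) found ...") does not match and is dropped.
parse_search_output() {
  sed -n 's/^[[:space:]]*- \(.*\)-\([0-9][0-9.]*\)$/\1 \2/p'
}

if [ -x ./explify-dbs.sh ]; then
  # Assumption: -p RPIP limits results to one panel, -n hides already-downloaded databases.
  ./explify-dbs.sh search -d databases/ -p RPIP -n | parse_search_output
fi
```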

Downloading a Database

The download subcommand is used to download the database files for a test panel:

  • The -d argument is the base directory used for storage of the databases

  • The -p argument is the test panel name

  • The -v argument is the test panel version

  • The -n argument is the number of CPUs that can be used to download the files (defaults to 1)

Additional notes:

  • When a database such as UPIP-8.8.0 is downloaded, additional required files are also downloaded to a subdirectory named "common"

  • After the files are downloaded, their checksums will be automatically checked

  • Due to the size of some of the files, this command will take some time. It is best to run it via screen or nohup so that it keeps running if the terminal session is disconnected
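A download of UPIP 8.8.0 might be started as sketched below. The subcommand and arguments are those described above; passing the panel and version as separate values ("UPIP" and "8.8.0"), the CPU count of 4, and the log file name are assumptions made for illustration.

```shell
DB_DIR="databases/"
PANEL="UPIP"      # assumed form of the panel name
VERSION="8.8.0"   # assumed form of the panel version
CPUS=4            # illustrative; the default is 1
LOG="explify-download.log"   # hypothetical log file name

if [ -x ./explify-dbs.sh ]; then
  # nohup keeps the download running after logout; output is captured in the log file.
  nohup ./explify-dbs.sh download -d "$DB_DIR" -p "$PANEL" -v "$VERSION" -n "$CPUS" > "$LOG" 2>&1 &
  echo "Download started in the background (PID $!); follow progress with: tail -f $LOG"
else
  echo "explify-dbs.sh not found or not executable in $(pwd)" >&2
fi
```

Running the same command inside a screen session instead of nohup works equally well.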

Listing Downloaded Databases

The list subcommand is used to view the databases that have already been downloaded:

  • The -d argument is the base directory used for storage of the databases

  • Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel
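For instance, the two forms of the list subcommand might look like this. It is a sketch using only the arguments described above; the panel name "RPIP" is an illustrative value, and no output is shown because its exact format is not documented here.

```shell
DB_DIR="databases/"
PANEL="RPIP"   # illustrative panel name

if [ -x ./explify-dbs.sh ]; then
  # All downloaded databases under the base directory
  ./explify-dbs.sh list -d "$DB_DIR"
  # Only the downloaded versions of one panel
  ./explify-dbs.sh list -d "$DB_DIR" -p "$PANEL"
else
  echo "explify-dbs.sh not found or not executable in $(pwd)" >&2
fi
```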

Checking Database Integrity

The download subcommand will automatically check the file checksums after download. The check subcommand can also be used on its own to check the files:

  • The -d argument is the base directory used for storage of the databases

  • The -p argument is the test panel name

  • The -v argument is the test panel version

  • The -n argument is the number of CPUs that can be used to check the files (defaults to 1)
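A standalone integrity check could be run as in the sketch below. The argument values are illustrative. The sketch also assumes that the script exits with a non-zero status when a checksum does not match; that behavior is not confirmed here, so read the script's own output before relying on the exit status.

```shell
DB_DIR="databases/"
PANEL="RPIP"      # illustrative panel name
VERSION="6.7.0"   # illustrative panel version
CPUS=4            # illustrative; the default is 1

if [ -x ./explify-dbs.sh ]; then
  # Assumption: a failed checksum is reported through a non-zero exit status.
  if ./explify-dbs.sh check -d "$DB_DIR" -p "$PANEL" -v "$VERSION" -n "$CPUS"; then
    echo "Check finished without a reported error for $PANEL $VERSION"
  else
    echo "Check reported a problem for $PANEL $VERSION; consider downloading it again" >&2
  fi
else
  echo "explify-dbs.sh not found or not executable in $(pwd)" >&2
fi
```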

Using the Databases with the DME+ Pipeline

The database files should be organized under a root directory first by test panel type, then by test panel version. Assuming the root directory is databases/, its organization should look like this:

To run an analysis with RPIP 6.7.0, for example, the following inputs would be needed:

The DME+ pipeline will use these inputs to navigate to the specified database location, namely databases/RPIP/6.7.0.
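The way the location is derived from the root directory, panel, and version can be sketched in shell. The db_path helper is hypothetical and only mirrors the layout described above; it is not part of the download script or the pipeline.

```shell
# Hypothetical helper: build <root>/<panel>/<version>, tolerating a trailing slash on the root.
db_path() {
  printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
}

target=$(db_path "databases/" "RPIP" "6.7.0")
echo "$target"   # databases/RPIP/6.7.0

# Optional sanity check before starting an analysis
if [ ! -d "$target" ]; then
  echo "Missing $target; download the database first" >&2
fi
```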

If the databases are stored on a normal (disk-backed) file system, it is recommended that you set --explify-load-db-ram=true, which tells the pipeline to load the databases into memory for faster analysis. The databases can also be stored on a RAM disk, which reduces load time over many pipeline runs. In that case, it is recommended to set --explify-load-db-ram=false.
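The choice between the two settings can be sketched as a small helper. It assumes GNU stat on Linux, where "stat -f -c %T" prints the file system type, and it treats tmpfs and ramfs as RAM disks. The function name is hypothetical.

```shell
# Hypothetical helper: map a file system type to the recommended option value.
load_db_ram_for_fstype() {
  case "$1" in
    tmpfs|ramfs) echo "false" ;;   # RAM disk: the databases already reside in memory
    *)           echo "true"  ;;   # disk-backed: let the pipeline load them into RAM
  esac
}

# Assumption: GNU stat; fall back to "unknown" (treated as disk-backed) if the call fails.
fs_type=$(stat -f -c %T databases/ 2>/dev/null || echo unknown)
echo "--explify-load-db-ram=$(load_db_ram_for_fstype "$fs_type")"
```

The printed option can then be added to the pipeline command line.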
