DRAGEN on AWS Batch

The DRAGEN on AWS Batch deployment guide was created by Illumina in collaboration with Amazon Web Services (AWS) to help users deploy DRAGEN on AWS according to AWS best practices. If you are unfamiliar with AWS Batch stacks, refer to the AWS Batch User Guide.

This deployment guide provides instructions for deploying Illumina DRAGEN in the AWS Cloud using AWS Batch. AWS Batch is a fully managed service that simplifies running and scaling batch computing workloads in the AWS Cloud. If you prefer to run DRAGEN analyses on an Amazon EC2 virtual machine instead, use the DRAGEN Complete Suite.

Important: This template is provided as a starting point. Users are expected to tailor the CloudFormation configuration, input data, and parameters to meet their specific workflow and requirements.

Requirements

To use DRAGEN on AWS Batch, the following are required:

  • Subscription to the DRAGEN AMI (Marketplace or private)

  • An S3 bucket

  • Sufficient EC2 service quota for f1 or f2 instances (see the check below)
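As a quick sanity check, you can query your account's F-instance quota with the AWS CLI. This is a minimal sketch: the quota code L-74FC7D96 is believed to correspond to "Running On-Demand F instances", but verify it for your account before relying on it.

# Check the "Running On-Demand F instances" quota
# (quota code L-74FC7D96 is an assumption; confirm it with:
#  aws service-quotas list-service-quotas --service-code ec2)
aws service-quotas get-service-quota \
    --service-code ec2 \
    --quota-code L-74FC7D96 \
    --region <aws_region>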

Supported Regions

DRAGEN on AWS Batch is available with Marketplace AMIs in the following regions for f1 and f2 instances:

  • us-east-1

  • us-west-2

  • eu-central-1

  • eu-west-1

  • ap-southeast-2

BYOL users deploying a private AMI must specify the appropriate AMI ID for their region in the template and may also need to remove the SupportedRegionRule. One way to look up a private AMI ID is sketched below.
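As a hedged illustration, the AWS CLI can list AMIs owned by your account. The *dragen* name filter is an assumption; adjust it to match how your private AMI is actually named.

# List private AMIs owned by this account whose name contains "dragen"
aws ec2 describe-images \
    --owners self \
    --filters "Name=name,Values=*dragen*" \
    --query 'Images[].[ImageId,Name]' \
    --output table \
    --region <aws_region>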

Deployment Steps

  1. Clone the DRAGEN on AWS Batch deployment guide repository (include submodules as needed).

  2. Modify the CloudFormation template to fit your needs (e.g., specify your private AMI).

  3. Upload the modified templates to your S3 bucket, e.g., s3://your-bucket-name/DRAGEN-on-AWS-Batch/.

  4. In the AWS Console, go to CloudFormation > Stacks > Create stack and choose "With existing template".

  5. Provide the S3 URL of the template you want to deploy:

    • Create a new VPC and deploy DRAGEN on AWS Batch

      • https://your-bucket-name.s3.<aws_region>.amazonaws.com/DRAGEN-on-AWS-Batch/templates/dragen-main.template.yaml

    • Deploy DRAGEN on AWS Batch in an existing VPC

      • https://your-bucket-name.s3.<aws_region>.amazonaws.com/DRAGEN-on-AWS-Batch/templates/dragen.template.yaml

  6. Configure stack settings:

    • Stack name: Provide a name for your stack

    • Availability Zones: Select two Availability Zones

    • Key pair: Select the key pair to use for SSH access to the instances

    • Instance type: f2.6xlarge

    • Max vCPU: 24

    • Genomics Data Bucket: s3://your-bucket-genomic-data/

    • Quick Start S3 region: <aws_region>

    • Quick Start S3 Key Prefix: DRAGEN-on-AWS-Batch/

  7. Click Submit to launch the stack. (An equivalent AWS CLI sketch follows this list.)
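For scripted deployments, the console steps above can be approximated with the AWS CLI. This is a minimal sketch under stated assumptions: the repository URL is a placeholder, CAPABILITY_IAM is assumed because the stack likely creates IAM resources, and the parameter keys (AvailabilityZones, KeyPairName, and so on) are illustrative; use the keys actually defined in dragen-main.template.yaml.

# Step 1: clone the repository with submodules (<org>/<repo> is a placeholder)
git clone --recurse-submodules https://github.com/<org>/<repo>.git
cd <repo>

# Step 3: upload the modified templates to your S3 bucket
aws s3 cp --recursive templates/ s3://your-bucket-name/DRAGEN-on-AWS-Batch/templates/

# Steps 5-7: launch the stack from the uploaded main template
# (parameter keys are illustrative; \, escapes the comma in the CLI shorthand value)
aws cloudformation create-stack \
    --stack-name dragen-on-aws-batch \
    --template-url https://your-bucket-name.s3.<aws_region>.amazonaws.com/DRAGEN-on-AWS-Batch/templates/dragen-main.template.yaml \
    --capabilities CAPABILITY_IAM \
    --parameters \
        "ParameterKey=AvailabilityZones,ParameterValue=<az1>\,<az2>" \
        ParameterKey=KeyPairName,ParameterValue=<key_pair> \
        ParameterKey=InstanceType,ParameterValue=f2.6xlarge \
        ParameterKey=MaxvCpus,ParameterValue=24 \
        ParameterKey=GenomicsS3Bucket,ParameterValue=your-bucket-genomic-data \
        ParameterKey=QSS3KeyPrefix,ParameterValue=DRAGEN-on-AWS-Batch/ \
    --region <aws_region>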

Simple DRAGEN Run Example with AWS Batch

After successful deployment, you can initiate a sample run:

cat > e2e-test.json << EOF
{
    "jobName": "e2e-job",
    "jobQueue": "dragen-queue",
    "jobDefinition": "dragen",
    "containerOverrides": {
        "vcpus": 24,
        "memory": 240000,
        "command": [
            "-f", "-r", "s3://your-bucket-genomic-data/ref/hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-2.tar.gz",
            "-1", "s3://your-bucket-genomic-data/input/NA24385-AJ-Son-R1-NS_S33_L001_R1_001.fastq.gz",
            "-2", "s3://your-bucket-genomic-data/input/NA24385-AJ-Son-R1-NS_S33_L001_R2_001.fastq.gz",
            "--RGID", "1",
            "--RGSM", "HG002",
            "--enable-bam-indexing", "true",
            "--enable-map-align-output", "true",
            "--enable-sort", "true",
            "--output-file-prefix", "RGMS",
            "--enable-map-align", "true",
            "--output-format", "BAM",
            "--output-directory", "s3://your-bucket-genomic-data/output/",
            "--enable-variant-caller", "true",
            "--lic-server", https://<ID>:<PASSWD>@license.dragen.illumina.com # requires for BYOL users
        ]
    },
    "retryStrategy": {
        "attempts": 1
    }
}
EOF
 
aws batch submit-job --cli-input-json file://e2e-test.json --region <aws_region>

Note:

  • Input files from the S3 bucket are automatically copied to the instance.

  • The job runs using local copies of the S3 input files along with the specified parameters.

  • Output files are first saved locally, then transferred to the designated S3 output folder.

  • Job status can be monitored via the AWS Batch console or the AWS CLI (see the sketch below).

  • Logs are available in both CloudWatch and the S3 output folder.

  • The --lic-server option in the example above is required for BYOL users.
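As a small example of CLI monitoring, the job ID returned by submit-job can be captured and polled with describe-jobs; the commands and --query expressions below are standard AWS CLI, while the input file name comes from the example above.

# Capture the job ID when submitting
JOB_ID=$(aws batch submit-job \
    --cli-input-json file://e2e-test.json \
    --region <aws_region> \
    --query jobId --output text)

# Check the job status (SUBMITTED, RUNNABLE, RUNNING, SUCCEEDED, FAILED, ...)
aws batch describe-jobs \
    --jobs "$JOB_ID" \
    --region <aws_region> \
    --query 'jobs[0].status' --output text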
