
DRAGEN Heme WGS Tumor Only Pipeline

Overview

The DRAGEN Heme WGS Tumor Only Pipeline, hereafter referred to as the Heme pipeline, is a comprehensive and unbiased whole genome sequencing (WGS) solution intended to replace conventional cytogenetic and panel sequencing approaches for detecting all types of mutation from a limited amount of DNA. Using Heme samples, it can detect clinically actionable cancer mutations across a wide range of genomic events, including structural variants (SV), copy number alterations (CNA), small variants (SNV, insertion, deletion, delins), internal tandem duplications (ITD), and DUX4 variants.

The Heme pipeline includes a DNA-only workflow designed to analyze whole genome sequencing data generated on supported instruments. It may be run as a local off-instrument solution installable on a DRAGEN server or accessible through the Illumina Connected Analytics (ICA) cloud environment. The Heme pipeline is for Research Use Only (RUO).

Features

Superb performance based on the DRAGEN BioIT platform, Release 4.4.4.
Supports starting the analysis from BCL, FASTQ, BAM, or CRAM inputs.
Flexible, custom-configurable options on top of well-established DRAGEN recipes for Heme WGS analysis.
Available on local DRAGEN servers and Illumina Connected Analytics (ICA).
Seamless integration with Illumina Connected Insights (ICI) for tertiary interpretation.

Supported Library Prep Kits (LPKs)

Illumina DNA PCR Free Prep Kit
Illumina DNA Prep Kit
Custom LPKs

Supported Sequencing Instruments

NovaSeq 6000 or 6000Dx in RUO mode
NovaSeq X or NovaSeq X Plus

Note: Data from unsupported instruments can still be analyzed, but a warning is generated.

Supported Flow Cells

NovaSeq 6000 or 6000Dx S4
NovaSeq X or NovaSeq X Plus 10B, 25B

Sample Sheet Requirements

The pipeline has fields that are required in addition to the general sample sheet requirements. Follow the steps below to create a valid sample sheet.

Standard Sample Sheet Requirements

The following sample sheet requirements describe required and optional fields for the pipeline. Depending on the deployment (standalone DRAGEN server, ICA with auto-launch, ICA with manual launch), certain sections and required values can deviate from the standard requirements. These deviations are noted in the information below.

The analysis fails if the sample sheet requirements are not met.

Use the following steps to create a valid sample sheet.

Download the sample sheet v2 template that matches the instrument & assay run.
In the Sequencing Settings section, enter the following required parameters:

[Sequencing_Settings] Section

| Sample Parameter | Required | Details |
| --- | --- | --- |
| LibraryPrepKits | Required | Accepted values are: IlluminaDNAPrep or IlluminaDNAPCRFree |

In the BCL Convert Settings section, enter the following required parameters:

[BCLConvert_Settings] Section

| Sample Parameter | Required | Details |
| --- | --- | --- |
| SoftwareVersion | Required | The DRAGEN component software version. The pipeline requires 4.4.4 or higher. To ensure you are using the latest compatible version, refer to the software release notes. |
| AdapterRead1 | Required | If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC. Analysis fails if incorrect adapter sequences are used. |
| AdapterRead2 | Required | If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTGACGCTGCCGACGA. Analysis fails if incorrect adapter sequences are used. |
| AdapterBehavior | Optional | Enter trim. This indicates that the BCL Convert software trims the specified adapter sequences from each read. |
| MinimumTrimmedReadLength | Optional | Enter 35. Reads with a length trimmed below this point are masked. |
| MaskShortReads | Optional | Enter 35. Reads with a length trimmed below this point are masked. |
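As an illustration, the [BCLConvert_Settings] requirements above could be checked before a run is launched. The sketch below is not part of the pipeline: the function name `check_bclconvert_settings` and its dictionary input are hypothetical, and it assumes 10 bp indexes with UDP and a dot-separated numeric version string.

```python
# Adapter sequences listed above for 10 bp indexes with UDP.
EXPECTED_ADAPTERS = {
    "AdapterRead1": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
    "AdapterRead2": "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
}
MIN_VERSION = (4, 4, 4)


def check_bclconvert_settings(settings):
    """Return a list of problems found in a dict of [BCLConvert_Settings] values."""
    problems = []
    version = settings.get("SoftwareVersion", "")
    try:
        # Assumption: versions look like "4.4.4" (numeric parts separated by dots).
        parsed = tuple(int(part) for part in version.split("."))
    except ValueError:
        parsed = None
    if parsed is None or parsed < MIN_VERSION:
        problems.append(f"SoftwareVersion {version!r} is not 4.4.4 or higher")
    for key, expected in EXPECTED_ADAPTERS.items():
        if settings.get(key, "").strip().upper() != expected:
            problems.append(f"{key} does not match the expected adapter sequence")
    return problems
```

An empty list means the three required values passed these checks; it does not guarantee that the analysis accepts the sample sheet.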

In the BCL Convert Data section, enter the following parameters for each sample.

[BCLConvert_Data] Section

| Sample Parameter | Required | Details |
| --- | --- | --- |
| Sample_ID | Required | Must match a Sample_ID listed in the [Heme_Data] section. |
| Index | Required | Index 1 sequence valid for the Index_ID assigned to the matching Sample_ID in the [Heme_Data] section. |
| Index2 | Required | Index 2 sequence valid for the Index_ID assigned to the matching Sample_ID in the [Heme_Data] section. |
| Lane | Only for NovaSeq 6000 XP, NovaSeq 6000Dx, or NovaSeq X workflows | Indicates which lane corresponds to a given sample. Enter a single numeric value per row. Cannot be empty, i.e., the analysis fails if the Lane column is present without a value in each row. |
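The Lane rule above can be sketched as a small pre-flight check. This is only an illustration: the function name `check_bclconvert_rows` is hypothetical, rows are assumed to be dictionaries keyed by column name (for example from `csv.DictReader`), and the duplicate check on lane, sample ID, and both index sequences is an assumption about how uniqueness is evaluated.

```python
def check_bclconvert_rows(rows):
    """Return a list of problems found in [BCLConvert_Data] rows (a list of dicts)."""
    problems = []
    seen = set()
    # The Lane rule applies only when the Lane column is present.
    has_lane = any("Lane" in row for row in rows)
    for number, row in enumerate(rows, start=1):
        lane = (row.get("Lane") or "").strip()
        if has_lane and not (lane.isascii() and lane.isdigit()):
            problems.append(f"row {number}: Lane must be a single numeric value")
        # Assumption: uniqueness is checked on lane, sample ID, and both index sequences.
        key = (lane, row["Sample_ID"].lower(), row["Index"].upper(), row["Index2"].upper())
        if key in seen:
            problems.append(f"row {number}: duplicate lane/sample/index combination")
        seen.add(key)
    return problems
```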

In the [Heme_Data] section, enter the following parameters:

The [Heme_Data] section header changes depending on the deployment:

Standalone DRAGEN Server and ICA with Manual Launch: Heme_Data
ICA with Auto-launch: Cloud_Heme_Data

[Heme_Data] Section

| Sample Parameter | Required | Details |
| --- | --- | --- |
| Sample_ID | Required | The unique ID that identifies a sample. The sample ID is included in the output file names. Sample IDs are not case sensitive. Sample IDs must meet the following requirements: unique for the run; 1–70 characters; no spaces; alphanumeric characters with underscores and dashes (if you use an underscore or dash, enter an alphanumeric character before and after it, e.g., Sample1-T5B1_022515); cannot be called all, default, none, unknown, undetermined, stats, or reports; must match a Sample_ID listed in the [BCLConvert_Data] section. Each sample must have a unique combination of Lane (if applicable), sample ID, and index ID, or the analysis fails. |
| Sample_Type | Optional | Enter DNA. |
| Case_ID | Optional | A unique ID that links the biological samples from the same individual. It is used for variant interpretation in downstream software such as Illumina Connected Insights. |
| Sample_Description | Optional | The sample description must meet the following requirements: 1–50 characters; alphanumeric characters with underscores, dashes, and spaces (if you enter an underscore, dash, or space, enter an alphanumeric character before and after it, e.g., heme-WGS_213). |
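The Sample_ID character rules lend themselves to a regular-expression check. The following is a minimal sketch rather than the pipeline's own validation: the name `is_valid_sample_id` is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and per-run uniqueness is not checked here.

```python
import re

# Names that a sample ID cannot take (compared case-insensitively,
# because sample IDs are not case sensitive).
RESERVED = {"all", "default", "none", "unknown", "undetermined", "stats", "reports"}

# Alphanumeric runs joined by single underscores or dashes, so every
# separator has an alphanumeric character directly before and after it.
SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*")


def is_valid_sample_id(sample_id):
    """Check length, reserved names, and allowed characters for one sample ID."""
    if not 1 <= len(sample_id) <= 70:
        return False
    if sample_id.lower() in RESERVED:
        return False
    return SAMPLE_ID_RE.fullmatch(sample_id) is not None
```

With this pattern, `Sample1-T5B1_022515` passes, while IDs containing spaces, leading or trailing separators, or consecutive separators are rejected.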

To ensure a successful analysis, follow these guidelines:

Avoid any blank lines at the end of the sample sheet; these can cause the analysis to fail.
When running a local analysis from the command line, save the sample sheet in the sequencing run folder with the default name SampleSheet.csv, or choose a different name and specify its path in the command-line options.
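The first guideline can be checked with a few lines of code before the sample sheet is copied to the run folder. This is a sketch only: the function name `has_trailing_blank_lines` is hypothetical, and treating a final row that holds only commas or whitespace as blank is an assumption.

```python
def has_trailing_blank_lines(text):
    """Return True if the sample sheet text ends with one or more blank lines."""
    lines = text.splitlines()
    # Assumption: a row holding only commas or whitespace counts as blank.
    return bool(lines) and lines[-1].strip(" ,\t") == ""
```

A single newline terminating the last data row is not reported; only an additional empty row after it is.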

ICA with Auto-launch: Sample Sheet Requirements

Refer to the following requirements to create sample sheets for running the analysis on ICA with Auto-launch. For sample sheet requirements common between deployments, see Standard Sample Sheet Requirements. Sample sheets can be created using the BaseSpace Run Planning Tool or manually by downloading and editing a sample sheet template.

To auto-launch analysis from the sequencer run folder, ensure the StartsFromFastq and SampleSheetRequested fields are set to FALSE. To auto-launch analysis from FASTQs after BCL Convert auto-launch, the StartsFromFastq and SampleSheetRequested fields must be set to TRUE.

[Cloud_Heme_Data] Section

Refer to [Heme_Data] Section for this section's requirements.

[Cloud_Heme_Settings] Section

| Parameters | Required | Details |
| --- | --- | --- |
| SoftwareVersion | Not Required | The Heme pipeline software version. |
| StartsFromFastq | Required | Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE. |
| SampleSheetRequested | Required | Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE. |
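Because StartsFromFastq and SampleSheetRequested are described with the same two launch modes, a quick consistency check can catch a mixed setting. The sketch below is illustrative: the function name `launch_mode` and the returned labels `"bcl"` and `"fastq"` are hypothetical, and values are compared strictly against the uppercase TRUE and FALSE given above.

```python
def launch_mode(settings):
    """Return "bcl" or "fastq" for a dict of [Cloud_Heme_Settings] values."""
    starts = settings.get("StartsFromFastq", "").strip()
    requested = settings.get("SampleSheetRequested", "").strip()
    if starts == "FALSE" and requested == "FALSE":
        return "bcl"  # auto-launch from BCL files
    if starts == "TRUE" and requested == "TRUE":
        return "fastq"  # auto-launch from FASTQ files after BCL Convert
    raise ValueError(
        "StartsFromFastq and SampleSheetRequested must both be TRUE or both be FALSE"
    )
```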

[Cloud_Data] Section

| Parameters | Required | Details |
| --- | --- | --- |
| Sample_ID | Not Required | The same sample ID used in the Cloud_Heme_Data section. |
| ProjectName | Not Required | The BaseSpace project name. |
| LibraryName | Not Required | Combination of sample ID and index values in the following format: sampleID_Index_Index2 |
| LibraryPrepKitName | Required | The Library Prep Kit used. |
| IndexAdapterKitName | Not Required | The Index Adapter Kit used. |
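For reference, the LibraryName format in the table above is a plain underscore join of three values. The helper name `library_name` is hypothetical, and the index sequences in the usage line are placeholders rather than real kit indexes.

```python
def library_name(sample_id, index, index2):
    """Build a LibraryName in the sampleID_Index_Index2 format."""
    return f"{sample_id}_{index}_{index2}"


# Placeholder values for illustration only.
example = library_name("Sample1", "ACGTACGTAC", "TGCATGCATG")
```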

[Cloud_Settings] Section

| Parameter | Required | Details |
| --- | --- | --- |
| GeneratedVersion | Not Required | The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet. |
| CloudWorkflow | Not Required | Ica_workflow_1 |
| Cloud_Heme_Pipeline | Required | This value is a universal record number (URN). The valid values are described in the Release Information. |
| BCLConvert_Pipeline | Required | The value is a URN in the following format: urn:ilmn:ica:pipeline:<pipeline-ID>#<pipeline-name> |
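The URN format stated for BCLConvert_Pipeline can be split into its two parts with basic string handling. This is a hedged sketch: the function name `parse_pipeline_urn` is hypothetical, and it assumes the pipeline ID itself contains no `#` character. Valid Cloud_Heme_Pipeline values should still be taken from the Release Information.

```python
PIPELINE_URN_PREFIX = "urn:ilmn:ica:pipeline:"


def parse_pipeline_urn(urn):
    """Split a pipeline URN into (pipeline_id, pipeline_name)."""
    if not urn.startswith(PIPELINE_URN_PREFIX):
        raise ValueError(f"URN must start with {PIPELINE_URN_PREFIX!r}")
    # Assumption: the first "#" separates the pipeline ID from the pipeline name.
    pipeline_id, separator, pipeline_name = urn[len(PIPELINE_URN_PREFIX):].partition("#")
    if not separator or not pipeline_id or not pipeline_name:
        raise ValueError("URN must look like urn:ilmn:ica:pipeline:<pipeline-ID>#<pipeline-name>")
    return pipeline_id, pipeline_name
```

A stray space after the final colon, for instance, would be kept as part of the pipeline ID, so values are best pasted without added whitespace.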

Templates

Description

Sample Sheet templates for the Heme pipeline for standalone DRAGEN server and ICA manual launch analysis can be found in the table below. For auto-launch compatible sample sheets, use BaseSpace Run Planner.

The Heme pipeline is compatible with several instruments and assay workflows (standard, XP), each of which has implications for the sample sheet.

Templates

Sample sheet templates contain all required fields, including index sequences in the proper orientation for all indexes from a given library prep kit. The templates are provided as a starting point for creating a sample sheet manually when launching analysis on a standalone DRAGEN server or on ICA using manual launch.

*Lane numbers cannot exceed what is supported by the flow cell in use.

Advanced Topics

Demultiplex only option

To break up the workflow, the software can be run with the demultiplex-only option. In this mode, the pipeline performs only FASTQ generation, using the default settings or the settings specified in the sample sheet. Subsequent analysis can then start from the FASTQ files.

CRAM input

When CRAM is used as input, the reference genome that was used to generate the CRAM files is required. It can be provided through the custom configuration file.

Custom Config Support

Local App Setup

Overview

This document describes how to use the Custom Configuration Support feature for the pipeline software. This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.

Customization with `customConfig` and `customResourceDir`

Users can customize pipeline behavior and file inputs using:

--customConfig : path to a custom configuration file listing customized parameter values.
--customResourceDir : path to a directory containing custom resource files.

Both options should be used together if file-based overrides are required.

Important note for using File Parameters

For file parameters (parameters that require a file), users must specify relative paths in the customConfig file. The software will join customResourceDir and the relative path to form the full file path.
Additionally, the value assigned to a file parameter must be enclosed in single quotes ('').
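The path-joining rule can be pictured as follows. How the software actually performs the join is not documented here, so this is only a sketch under stated assumptions: the function name `resolve_resource` is hypothetical, the surrounding single quotes are removed first, and a leading slash in the configured value is dropped so that the value is treated as relative to customResourceDir.

```python
from pathlib import PurePosixPath


def resolve_resource(custom_resource_dir, config_value):
    """Join customResourceDir with a quoted, relative file parameter value."""
    relative = config_value.strip().strip("'")
    # Assumption: a leading "/" still means "relative to customResourceDir".
    relative = relative.lstrip("/")
    return str(PurePosixPath(custom_resource_dir) / relative)
```

For example, a resource directory of `/data/custom_resources_Heme_dir` and a value of `'/snv/somatic_hotspots_GRCh38.vcf.gz'` would resolve to `/data/custom_resources_Heme_dir/snv/somatic_hotspots_GRCh38.vcf.gz` under these assumptions.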

Examples

Command Line

run_Heme_WGS_TO_{version}.sh \
  --inputType bcl \
  --inputFolder /heme_input_bcl \
  --customConfig /path/heme_custom_param.config \
  --customResourceDir custom_resources_Heme_dir

`heme_custom_param.config` Content

# custom parameters
vc_output_evidence_bam = false
qc_detect_contamination = true
aligner_clip_pe_overhang = 0

# custom reference files
vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
sv_systematic_noise = '/sv/WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz'
vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'

`custom_resources_Heme_dir` Folder Structure

custom_resources_Heme_dir/
├── snv
│   ├── WGS_hg38_v1.0_systematic_noise.snv.bed.gz
│   └── somatic_hotspots_GRCh38.vcf.gz
└── sv
    └── WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz

`customConfig` Template (with default value)

#vc_systematic_noise = ''
#enable_map_align = true
#sv_systematic_noise = ''
#vc_output_evidence_bam = false
#qc_detect_contamination = true
#vc_somatic_hotspots = ''
#sv_somatic_ins_tandup_hotspot_regions_bed = ''
#cram_reference = ''
#aligner_clip_pe_overhang = 0
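A configuration file in the `name = value` layout shown above can be read with a few lines of code, for example to review which overrides are active. This is an illustrative reader, not the pipeline's parser: the function name `parse_custom_config` is hypothetical, lines starting with `#` are assumed to be comments, and values are kept as raw strings, including the single quotes around file parameters.

```python
def parse_custom_config(text):
    """Parse `name = value` lines into a dict, skipping blanks and # comments."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected 'name = value', got: {raw_line!r}")
        params[name.strip()] = value.strip()
    return params
```

Applied to the unmodified template, where every line is commented out, the function returns an empty dictionary, meaning no defaults are overridden.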

Supported Parameters

| Display Name | Parameter Name | Component | Allowed Values | Default Value | Optional |
| --- | --- | --- | --- | --- | --- |
| VC Systematic Noise File | vc_systematic_noise | Variant Caller | file | included | Yes |
| VC Somatic Hotspots File | vc_somatic_hotspots | Variant Caller | file | included | Yes |
| CRAM Input Reference Genome | cram_reference | Mapper | file | included | Yes |
| Aligner Clip Paired End Reads Overhang | aligner_clip_pe_overhang | Mapper | 0,1,2 | | Yes |
| Enable Map Align | enable_map_align | Mapper | true / false | true | Yes |
| SV Somatic Hotspot BED File | sv_somatic_ins_tandup_hotspot_regions_bed | Structural VC | file | included | Yes |
| SV Systematic Noise File | sv_systematic_noise | Structural VC | file | included | Yes |
| Output SNV Evidence BAM | vc_output_evidence_bam | Debug | true / false | false | Yes |
| QC Detect Contamination | qc_detect_contamination | | true / false | true | Yes |

ℹ️ Note: For CRAM Input Reference Genome, a list of commonly used human reference FASTA files can be downloaded from the Illumina support site: Illumina DRAGEN Product Files.
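The supported parameters and their allowed values can be turned into a simple pre-flight check of a custom configuration. The sketch below is not the pipeline's own validation: the function name `validate_custom_params` is hypothetical, and it assumes the parameters arrive as a dictionary of raw strings (file values still wrapped in single quotes).

```python
FILE_PARAMS = {
    "vc_systematic_noise",
    "vc_somatic_hotspots",
    "cram_reference",
    "sv_somatic_ins_tandup_hotspot_regions_bed",
    "sv_systematic_noise",
}
BOOL_PARAMS = {"enable_map_align", "vc_output_evidence_bam", "qc_detect_contamination"}
ENUM_PARAMS = {"aligner_clip_pe_overhang": {"0", "1", "2"}}


def validate_custom_params(params):
    """Return a list of problems for a dict of customConfig names and raw values."""
    problems = []
    for name, value in params.items():
        if name in FILE_PARAMS:
            # File parameters must be enclosed in single quotes.
            if not (len(value) >= 2 and value.startswith("'") and value.endswith("'")):
                problems.append(f"{name}: file path must be enclosed in single quotes")
        elif name in BOOL_PARAMS:
            if value not in {"true", "false"}:
                problems.append(f"{name}: expected true or false")
        elif name in ENUM_PARAMS:
            if value not in ENUM_PARAMS[name]:
                problems.append(f"{name}: expected one of 0, 1, 2")
        else:
            problems.append(f"{name}: not a supported parameter")
    return problems
```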
