# Launching a DRAGEN Pipeline

This guide covers launching, monitoring, and debugging DRAGEN pipelines, using the DRAGEN Germline Whole Genome pipeline as an example.

## Prerequisites

Before launching the pipeline, ensure you have the following in place:

* **ICA Project** — You must have an existing project in ICA. If you need to create a new project, follow the instructions in [Projects](/connected-analytics/home/h-projects.md#create-new-project).
* **DRAGEN Bundle** — The DRAGEN bundle must be linked to your project. See [Linking Bundles](/connected-analytics/home/h-bundles.md#linking-an-existing-bundle-to-a-project). This provides bundled references, pipelines, and demo data.
* **Input Data** — Upload your sequencing data (FASTQ, ORA, BAM, or CRAM files) to your [project](/connected-analytics/project/p-data.md), or use the demo data provided in "Illumina DRAGEN Germline Demo Data."

## Launching via the ICA GUI

{% stepper %}
{% step %}

### Start a New Analysis

1. Navigate to **Projects > your\_project > Flow > Pipelines**.
2. Select **DRAGEN\_Germline\_Whole\_Genome**.
3. (Optional) Read the **Pipeline Documentation** page to learn more about the pipeline, including its changelog and additional resources.
4. Click **Start analysis**.
5. Enter a **User Reference** (a meaningful name for this analysis run) and select a **Subscription** from the Pricing drop-down.
   {% endstep %}

{% step %}

### Configure Inputs

Select your **Input Type** (FASTQ GZ, FASTQ ORA, BAM, or CRAM) and provide your input files:

* **FASTQs / ORAs** — Select your sequencing files. Multiple samples may be provided. The pipeline automatically parses filenames to determine FASTQ pairs and sample groupings. The RGSM is taken from the filename up to `_SX` and the suffix after `_RX`. `_LXXX` denotes the lane number and `_RX` denotes the read number.

{% hint style="info" %}
To override the auto-detected sample names, you can provide a **FASTQ List** CSV containing the filenames with your own specified RGSM values.
{% endhint %}

* **BAMs / CRAMs** — Select your alignment files. Map/Align can be turned off if realignment is not desired.

Select a **Reference** genome from the drop-down. The default is **Homo sapiens \[1000 Genomes] hg38 v6 Pangenome**. Expand the drop-down for the full list of bundled references, or select **Custom** and provide your own reference hash table.
{% endstep %}

{% step %}

### Configure Analysis Options

The input form provides options that vary by pipeline. Common sections include Map/Align, Variant Calling, CNV, SV, Variant Annotation, and Advanced Settings. Some pipelines also expose sections such as UMI, HLA Typing, Methylation, Fingerprint Checking, Targeted Callers, or Beta Features. Each field includes built-in help text describing its purpose and valid values.

For a standard WGS germline run, the defaults are suitable for most use cases — see [Analysis Settings](#analysis-settings). Review and adjust any options as needed for your experiment.
{% endstep %}

{% step %}

### Launch

Review your settings and click **Start analysis**.
{% endstep %}
{% endstepper %}

## Launching the Pipeline via CLI

If you have the [CLI](/connected-analytics/command-line-interface/cli-indexcommands.md) installed on your system, you can also use the commands below to work with pipelines. If you do not have an active CLI, please follow [these instructions](/connected-analytics/command-line-interface/cli-installation.md) first.

{% hint style="info" %}
For any `icav2` CLI command, you can append `--help` to see a list of available optional settings.
{% endhint %}

### Accessing your Pipeline

1. List your projects with

```bash
icav2 projects list
```

2. Enter your project context (replace `<your_project_name>` with the name of your project as listed):

```bash
icav2 projects enter "<your_project_name>"
```

3. List the pipelines in your project with

```bash
icav2 projectpipelines list
```

4. List the analysis inputs using the pipeline UUID (not the pipeline name):

```bash
icav2 projectpipelines input <your_pipeline_uuid>
```

### Minimal Example

JSON Pipelines are started with the following command:

```bash
icav2 projectpipelines start nextflowjson <your_pipeline_uuid> --pipeline_parameters
```

To retrieve the ICA file IDs for your input files, use:

```bash
icav2 projectdata list --file-name "<my_file_name>"
```

If you do not know the exact filename, you can search for files in your project with the command

```bash
icav2 projectdata list --file-name "<part_of_the_filename>" --match-mode fuzzy
```

The command below launches a germline analysis with FASTQ inputs and the default reference, relying on form defaults for all other settings:

```bash
icav2 projectpipelines start nextflowjson \
  <your_pipeline_uuid> \
  --user-reference "my-germline-run" \
  --storage-size medium \
  --field-data fastqs:<fastq-file-id-1>,<fastq-file-id-2> \
  --field reference:"hg38_alt_masked_graph_v6"
```
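The `--field-data` value in the command above is the field ID followed by a colon and a comma-separated list of ICA file IDs. When the IDs are collected by a script, a small helper can assemble that value. The sketch below is illustrative only: `field_data_arg` is a hypothetical helper name, not part of `icav2`, and it only builds the string without contacting ICA.

```bash
# Hypothetical helper (not part of icav2): build "<field-id>:<id1>,<id2>,..."
# for use as the value of --field-data.
# Usage: field_data_arg <field-id> <file-id> [<file-id> ...]
field_data_arg() {
  if [ "$#" -lt 2 ]; then
    echo "usage: field_data_arg <field-id> <file-id>..." >&2
    return 1
  fi
  field_id=$1
  shift
  # Append a comma after every ID, then strip the trailing comma.
  ids=$(printf '%s,' "$@")
  printf '%s:%s\n' "$field_id" "${ids%,}"
}
```

For example, `--field-data "$(field_data_arg fastqs <fastq-file-id-1> <fastq-file-id-2>)"` would expand to the same argument used in the minimal example.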

### Key CLI Parameters

<table><thead><tr><th width="132.06640625">Field ID</th><th width="265.7109375">Example Value</th><th>Notes</th></tr></thead><tbody><tr><td>fastqs</td><td>&#x3C;file-id></td><td>Provide ICA file IDs for FASTQ inputs.</td></tr><tr><td>reference</td><td>"hg38_alt_masked_graph_v6"</td><td>Expand the drop-down in the UI for available values.</td></tr></tbody></table>

Omitted fields with defaults (e.g., enable\_map\_align, enable\_variant\_caller, enable\_cnv, enable\_sv, output\_format, enable\_dragen\_reports) are automatically applied from the form definition.

{% hint style="info" %}
The `icav2 projectpipelines input` command does not necessarily return all available fields. To discover the full set of available parameters, view the input form JSON from the pipeline's UI page in ICA.
{% endhint %}

{% hint style="info" %}
Some older or less commonly used pipelines use an XML-based input definition rather than `nextflowjson`. To launch these pipelines via the CLI, use the `nextflow` subcommand instead. Run `icav2 projectpipelines start nextflow --help` for usage details, as the parameter conventions differ. One key difference is that XML pipelines take a `ref_tar` input for the reference, where the user must provide the reference hash table as a file included in the DRAGEN bundle. See [Analysis Settings](#analysis-settings) for more details.
{% endhint %}

## Monitoring and Viewing Outputs

### Monitoring Analysis Status

1. Navigate to **Projects > your\_project > Flow > Analyses**.
2. Click the refresh button to update the status.
3. Click on a run to view details. The **Details** tab shows configuration, the **Nextflow execution** tab shows workflow progress, and the **Steps** tab shows logs (enable "Show technical steps" for additional log files).

The analysis status can also be monitored via the CLI:

```bash
icav2 projectanalyses get <id>
```

The `id` corresponds to the `id` field returned by the `projectpipelines start` command.
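For unattended runs, the status check can be wrapped in a polling loop. The sketch below is a minimal outline under stated assumptions: the terminal status names (`SUCCEEDED`, `FAILED`, `FAILED_FINAL`, `ABORTED`) are assumed and should be checked against the Analysis Lifecycle page, and `fetch_status` is a hypothetical helper you would write yourself to print the current status for an analysis ID, for example by parsing the output of `icav2 projectanalyses get <id>`.

```bash
# Return 0 if the status is terminal. The status names are assumptions;
# confirm them against the Analysis Lifecycle documentation.
is_terminal_status() {
  case "$1" in
    SUCCEEDED|FAILED|FAILED_FINAL|ABORTED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until the analysis reaches a terminal status, then print that status.
# $1: analysis id; $2: seconds between polls (default 60).
# fetch_status is a hypothetical, user-defined function that prints the
# current status string for the given analysis id.
# Returns 0 on SUCCEEDED, 1 on any other terminal status, 2 if the
# status could not be fetched.
wait_for_analysis() {
  while :; do
    state=$(fetch_status "$1") || return 2
    if is_terminal_status "$state"; then
      printf '%s\n' "$state"
      [ "$state" = "SUCCEEDED" ]
      return $?
    fi
    sleep "${2:-60}"
  done
}
```

Once `fetch_status` is defined, `wait_for_analysis <id> 120` would check every two minutes and return a non-zero exit code for any terminal status other than success.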

For more details on analysis states, see [Analysis Lifecycle](/connected-analytics/project/p-flow/f-analyses.md#lifecycle).

{% hint style="info" %}
If the analysis failed, see the [Debugging section](#debugging) for troubleshooting steps.
{% endhint %}

### Viewing Outputs

Analysis outputs can be viewed by navigating to the [analysis](/connected-analytics/project/p-flow/f-analyses.md#starting-analyses) page in the GUI.

#### Report Tab

Most DRAGEN pipelines show an analysis report in the **Report** tab, unless it is disabled. The left-hand panel contains a **Summary** section with the overall `report.html`, as well as a **Samples** section listing individual per-sample reports. Selecting the summary report displays an interactive DRAGEN Reports page with tabs for key metrics, such as Summary, Enrichment, Trimmer, QC, Mapping, Coverage, and Variants. Selecting a sample report shows the same breakdowns for that sample.

<figure><img src="/files/g3TCiuC6qdpijLGRtq1Y" alt=""><figcaption></figcaption></figure>

#### Output Files Tab

The **Output files** tab lists all files produced by the analysis. Smaller files can be downloaded directly from the browser, while larger files such as BAMs and VCFs should be downloaded via the CLI.

**Output JSON**

The output includes an `output.json` file with two top-level sections:

* **summary** — Counts of completed, failed, and total samples for the run.
* **samples** — A per-sample map keyed by sample name. Each entry includes the sample's processing status and analysis info, such as the reference genome and other DRAGEN options used.

This file is useful for reproducing the analysis, auditing the parameters that were applied, or programmatically checking which samples succeeded or failed.
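As a sketch of the programmatic check mentioned above, the function below lists failed samples from a downloaded `output.json` using `jq`. The exact schema is an assumption: it presumes each entry under `samples` has a `status` key whose value is the string "Failed" for failed samples. Inspect a real `output.json` and adjust the key names before relying on it. `failed_samples` is a hypothetical helper name.

```bash
# Hypothetical helper: print the names of samples marked as failed.
# Assumes output.json looks like {"samples": {"<name>": {"status": "..."}}};
# the "status" key name is an assumption, not a documented schema.
# Requires jq to be installed.
failed_samples() {
  jq -r '.samples
         | to_entries[]
         | select((.value.status // "" | tostring | ascii_downcase) == "failed")
         | .key' "$1"
}
```

Running `failed_samples output.json` prints one sample name per line, which can then be used to drive re-runs or to select logs to inspect.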

**Analysis File Outputs**

Typical analysis file outputs might include:

* Alignment files (BAM/CRAM) with indexes
* VCF/GVCF files for small variants, CNVs, SVs, and STRs, where applicable and enabled
* Targeted caller reports (if enabled)
* QC metrics and coverage reports (if enabled)
* `report.html` (if DRAGEN Reports is enabled)

{% hint style="info" %}
If you have failed samples, you may notice that they do not appear in the report, have no output files, or have a status of "Failed" in the output.json. Refer to the [Debugging](#debugging) section for how to debug failed samples.
{% endhint %}

## Debugging

When an analysis fails, start with the "Error" field on the main analysis page in the UI, which usually indicates the kind of error encountered. For multi-sample analyses, the output.json provides summary counts and the status of each sample. If this information is insufficient, dig deeper into the process and pipeline runner logs.

### Finding the Failing Process

After identifying a failed analysis in **Projects > your\_project > Flow > Analyses**, navigate to the **Steps** tab of the analysis. A failing process will be marked with a non-zero exit code.

<figure><img src="/files/wrJam3jBZFr6nhtFb2hb" alt=""><figcaption></figcaption></figure>

### Finding the DRAGEN Command

Knowing the exact DRAGEN command that was executed is useful for debugging as well as for reproducing an analysis outside of the pipeline.

For DRAGEN processes, executed commands are logged in the **stderr**. DRAGEN commands start with `/opt/edico/bin/dragen`. The command can also be found in the **stdout** with the format:

```
Command Line: /opt/edico/bin/dragen ...
```
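If you have saved a process stdout log to a local file, the `Command Line:` entries can be pulled out with standard tools. The sketch below assumes the log lines follow the format shown above. `dragen_commands` is a hypothetical helper name.

```bash
# Hypothetical helper: print each DRAGEN command found in a saved stdout log,
# prefixed with the line number where it appears.
# $1: path to a locally saved stdout log.
dragen_commands() {
  grep -n 'Command Line: /opt/edico/bin/dragen' "$1" \
    | sed -E 's/^([0-9]+):.*Command Line: /\1: /'
}
```

The line numbers help locate the surrounding log output when several samples ran in the same process.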

{% hint style="info" %}
Multiple samples may run in a single process depending on the `samples_per_node` input. A failed sample may not terminate the process, so failures can appear in the middle of the stdout log rather than at the end.
{% endhint %}

### Finding the Pipeline Runner Log

If no failing processes are visible, click the **Show technical steps** checkbox to reveal additional steps, including the pipeline runner stdout. Expand the `pipeline_runner.0` stdout to see Nextflow's own log messages, which will indicate which process failed and why.

<figure><img src="/files/9kfnYYGyGtqpFnL3lLJ3" alt=""><figcaption></figcaption></figure>

## Analysis Settings

For most DRAGEN pipelines, the defaults are a good place to start. For more information on any of the input parameters beyond the parameter description, refer to the [DRAGEN User Guide](https://help.dragen.illumina.com/).

For a given published pipeline version, using the same set of parameters with a set of input data will give identical results, ensuring analyses are reproducible. Across different DRAGEN and/or pipeline versions, however, results may differ due to algorithmic improvements or the addition or removal of features. The best way to ensure the analysis is performed as similarly as possible across different versions is to check the DRAGEN options, which can be found in the [output.json](#output-files-tab) or the stdout of the DRAGEN process. Refer to the [Debugging](#debugging) section for more details on where to locate the stdout.

### Common Fields

**Input FASTQs / ORAs** — Pipelines will attempt to parse sample IDs from FASTQ filenames. To ensure that files are matched to the correct sample ID, users may optionally supply a FASTQ list to specify the structure. See the description in the input field for more details.
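To preview how filenames might be grouped before launching, the sketch below extracts the sample name, lane, and read number from a filename. It assumes the common Illumina naming pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz` (or `.fastq.ora`). The function names are hypothetical, and the pipeline's own parser may handle cases this sketch does not.

```bash
# Hypothetical helpers for previewing FASTQ filename parsing.
# Assumed pattern: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz (or .fastq.ora)

# Print the sample name: everything before the _S<n> token.
fastq_sample() {
  basename "$1" \
    | sed -E 's/_S[0-9]+(_L[0-9]+)?_R[12](_[0-9]+)?\.fastq(\.gz|\.ora)?$//'
}

# Print the lane number (digits after _L), or nothing if absent.
fastq_lane() {
  printf '%s\n' "$1" \
    | sed -nE 's/.*_L([0-9]+)_R[12](_[0-9]+)?\.fastq(\.gz|\.ora)?$/\1/p'
}

# Print the read number (1 or 2), or nothing if absent.
fastq_read() {
  printf '%s\n' "$1" \
    | sed -nE 's/.*_R([12])(_[0-9]+)?\.fastq(\.gz|\.ora)?$/\1/p'
}
```

If the names printed by `fastq_sample` are not the sample IDs you expect, supplying a FASTQ list is the safer option.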

**samples\_per\_node** — This setting can be tweaked to optimize analysis runtime. For WGS samples, it is recommended to keep this at 1 sample per node. For exome or smaller panel samples, users can set it to 5 or higher.

**Storage Size** — Select a storage size equivalent to 2x the size of your input FASTQs (assuming BAM outputs).
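As a worked example of the 2x guideline, the sketch below sums the sizes of FASTQ files that are still available locally (for example, before upload) and doubles the total. `required_storage_bytes` is a hypothetical helper name. Mapping the result to a specific storage size option is left to the reader, since the capacity of each option is not stated here.

```bash
# Hypothetical helper: estimate required storage as 2x the total size
# of the given local FASTQ files. Prints the estimate in bytes.
required_storage_bytes() {
  total=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read $f" >&2
      return 1
    fi
    # wc -c prints the byte count; tr removes padding some platforms add.
    size=$(wc -c < "$f" | tr -d ' ')
    total=$((total + size))
  done
  echo $((total * 2))
}
```

For instance, two FASTQ files of 40 GB each would give an estimate of 160 GB.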

**ref\_tar** — When supplying a custom reference to a JSON pipeline or any reference to an XML pipeline, ensure that the DRAGEN hash table version matches the DRAGEN version used by the pipeline. A mismatched hash table will cause the analysis to fail.

## Additional Resources

* [DRAGEN User Guide](https://help.dragen.illumina.com/)
* [DRAGEN Release Notes](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/downloads.html)
* [ICA End-to-End Tutorial](/connected-analytics/tutorials/end-to-end-1.md)
* [ICA Pricing](/connected-analytics/reference/r-pricing.md)


