# DRAGEN QC Metrics

DRAGEN generates multiple pipeline-specific metrics, including:

* Mapping and Aligning metrics
* Variant calling metrics
* Biomarker metrics
* Coverage (or enrichment) metrics and reports
* Duration (or run time) metrics

Figure 10: Generation of Metrics and Reports

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-b9858b6cd923766fc439ad83f88a19516828de5c%2Fqc-metrics-reporting.GenerationOfMetricAndReports.png?alt=media)

The QC metrics are printed to standard output. In addition, CSV files are written to the run output directory:

* \<output prefix>.mapping\_metrics.csv
* \<output prefix>.vc\_metrics.csv
* \<output prefix>.\<coverage region prefix>\_coverage\_metrics.csv
* \<output prefix>.time\_metrics.csv
* \<output prefix>.\<other coverage reports>.csv

Each CSV file has five columns: Section, Subsection (for example, read group or sample), Metric, Value 1 (Count/Ratio/Minutes), and Value 2 (Percentage/Seconds).
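As a minimal sketch of reading this layout, the following Python snippet parses metrics CSV text into named fields. The `MetricRow` type and `parse_metrics_csv` function are hypothetical helpers, not part of DRAGEN. The sample rows are copied from the coverage metrics example later on this page, where some rows carry only Value 1, so the sketch treats Value 2 as optional.

```python
import csv
import io
from typing import NamedTuple, Optional


class MetricRow(NamedTuple):
    section: str
    subsection: str
    metric: str
    value1: str
    value2: Optional[str]  # None when the row reports a single value


def parse_metrics_csv(text: str) -> list[MetricRow]:
    """Parse metrics CSV text into rows of four or five columns."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        section, subsection, metric, value1 = fields[:4]
        value2 = fields[4] if len(fields) > 4 else None
        rows.append(MetricRow(section, subsection, metric, value1, value2))
    return rows


# Rows copied from the _coverage_metrics.csv example on this page.
sample = (
    "COVERAGE SUMMARY,,Aligned bases in genome,148169295474,100.00\n"
    "COVERAGE SUMMARY,,Average alignment coverage over genome,46.08\n"
)
rows = parse_metrics_csv(sample)
```

Values are kept as strings because the columns mix counts, ratios, and NA entries. Convert them per metric as needed.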

### Mapping and Aligning Metrics

DRAGEN computes mapping and aligning metrics similar to Samtools Flagstat.

Mapping metrics have the following properties:

* They are available both at an aggregate level and at a per read group level.
* In germline and somatic tumor-only modes, only one set of mapping metrics is available.
* In somatic tumor-normal mode, the mapping and aligning metrics are generated separately for the tumor and normal samples, with each line beginning with TUMOR or NORMAL to indicate the sample. The metrics for the tumor sample are output first, followed by the metrics for the normal sample. Metrics per read group are also separated into tumor and normal read groups.
* Unless explicitly stated, the metrics are counted in reads (not in pairs).

Definitions:

* **Total input reads**---Total number of reads in the input FASTQ files.
* **Number of duplicate marked reads**---Reads marked as duplicates as a result of the `--enable-duplicate-marking` option being set to true.
* **Number of duplicate marked and mate reads removed**---Reads marked as duplicates, along with any mate reads, that are removed when the `--remove-duplicates` option is set to true.
* **Number of unique reads**---Total number of reads minus the duplicate marked reads.
* **Reads with mate sequenced**---Number of reads with a mate.
* **Reads without mate sequenced**---Total number of reads minus number of reads with mate sequenced.
* **QC-failed reads**---Reads that did not pass platform/vendor quality checks (SAM flag 0x200).
* **Mapped reads**---Total number of mapped reads.
* **Mapped reads R1**---Total number of R1 mapped reads.
* **Mapped reads R2**---Total number of R2 mapped reads.
* **Mapped reads to pop-alt insertions (PAI)**---Number of reads mapped to pop-alt insertions.
* **Mapped reads to non-ref decoys (NRD)**---Number of reads mapped to non-reference decoys.
* **Mapped reads to ref-external sequences (PAI or NRD)**---Number of reads mapped to reference external sequences such as pop-alt insertions or non-reference decoys.
* **Mapped reads (RNA) to rRNA and filtered**---Number of reads mapped to the rRNA filter contig.
* **Mapped reads (RNA) to chrM and exluded from metrics**---Number of reads mapped to the excluded mitochondrial contig.
* **Mapped reads including ref-external or filtered or excluded**---Total number of mapped reads plus reads mapped to reference external sequences plus reads mapped to the rRNA filter contig plus reads mapped to the excluded mitochondrial contig.
* **Unmapped reads**---Total number of reads that could not be mapped. It includes reads mapped to reference external sequences, reads mapped to the rRNA filter contig or reads mapped to the excluded mitochondrial contig.
* **Unmapped reads minus ref-external mappings**---Total number of unmapped reads minus reads mapped to reference external sequences.
* **Unmapped reads minus filtered mappings**---Total number of unmapped reads minus reads mapped to the rRNA filter contig.
* **Unmapped reads minus excluded mappings**---Total number of unmapped reads minus reads mapped to the excluded RNA mitochondrial contig.
* **Unmapped reads minus ref-external or filtered or excluded**---Total number of unmapped reads minus reads mapped to reference external sequences minus reads mapped to the rRNA filter contig minus reads mapped to the excluded mitochondrial contig.
* **Singleton reads**---Number of reads where the read could be mapped, but the paired mate could not be mapped.
* **Paired reads**---Count of reads in which both reads in the pair are mapped.
* **Properly paired reads**---Both reads in the pair are mapped and fall within an acceptable range from each other based on the estimated insert length distribution.
* **Not properly paired reads (discordant)**---The number of paired reads minus the number of properly paired reads.
* **Paired reads mapped to different chromosomes**---The number of reads with a mate, where the mate was mapped to a different chromosome.
* **Paired reads mapped to different chromosomes (MAPQ >= 10)**---The number of reads with a MAPQ >= 10 and with a mate, where the mate was mapped to a different chromosome.
* **Reads with indel R1**---The percentage of R1 reads containing at least 1 indel.
* **Reads with indel R2**---The percentage of R2 reads containing at least 1 indel.
* **Soft-clipped bases R1**---The percentage of bases in R1 reads that are soft-clipped.
* **Soft-clipped bases R2**---The percentage of bases in R2 reads that are soft-clipped.
* **Mismatched bases R1**---The number of mismatched bases on R1, which is the sum of SNP count and indel lengths. The metric does not count anything within soft clipping or RNA introns. The metric also does not count a mismatch if either the reference base or read base is N.
* **Mismatched bases R2**---The number of mismatched bases on R2, which is the sum of SNP count and indel lengths. The metric does not count anything within soft clipping or RNA introns. The metric also does not count a mismatch if either the reference base or read base is N.
* **Mismatched bases R1 (excluding indels)**---The number of mismatched bases on R1. Indel lengths are ignored. The metric does not count anything within soft clipping or RNA introns. The metric also does not count a mismatch if either the reference base or read base is N.
* **Mismatched bases R2 (excluding indels)**---The number of mismatched bases on R2. Indel lengths are ignored. The metric does not count anything within soft clipping or RNA introns. The metric also does not count a mismatch if either the reference base or read base is N.
* **Q30 Bases**---The total number of bases with a BQ >= 30. Includes mapped & unmapped reads & bases. Excludes duplicate marked reads & secondary alignments.
* **Q30 Bases R1**---The total number of bases on R1 with a BQ >= 30.
* **Q30 Bases R2**---The total number of bases on R2 with a BQ >= 30.
* **Q30 Bases (excluding dups and clipped bases)**---The number of bases on non-duplicate and non-clipped bases with a BQ >= 30.
* **Histogram of reads map qualities**
  * Reads with MAPQ \[40:inf)
  * Reads with MAPQ \[30:40)
  * Reads with MAPQ \[20:30)
  * Reads with MAPQ \[10:20)
  * Reads with MAPQ \[0:10)
* **Total alignments**---Total number of loci that reads aligned to with a quality > 0.
* **Secondary alignments**---Number of secondary alignment loci.
* **Supplementary (chimeric) alignments**---A chimeric read is split over multiple loci (possibly due to structural variants). One alignment is referred to as the representative alignment. The others are supplementary.
* **Estimated read length**---Total number of input bases divided by the number of reads.
* **Insert length: mean**---Mean insert size estimated for the read group.
* **Insert length: median**---Median insert size estimated for the read group.
* **Insert length: standard deviation**---Standard deviation of insert size estimated for the read group.

Note: The insert length metrics reported above are computed using high quality (MAPQ >= 20) properly paired read pairs, considering all the read pairs for the read group. They may differ from the values in the standard output log reported during insert stats sampling, which covers only the first \~2M read pairs for DNA (\~100K read pairs for RNA).

Whole read group insert length estimation for RNA datasets is currently not supported. For RNA runs, the reported insert length metrics are computed using up to the first \~100K high quality read pairs for the read group from the input FASTQ/BAM/CRAM file.

* **Input bases divided by reference genome size**---Raw coverage as computed by summing all read lengths (including duplicate marked reads, but excluding secondary and supplementary alignments) and dividing by the reference genome size.
* **Input bases divided by target bed size**---Raw coverage as computed by summing all read lengths (including duplicate marked reads, but excluding secondary and supplementary alignments) and dividing by the target bed size.
* **Estimated sample contamination**---The estimated fraction of reads in a sample that may be from another human source. For more detail see [Contamination Detection](https://help.connected.illumina.com/dragen/product-guides/dragen-v4.5/qc-metrics-reporting/contamination-detection).
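Several of the definitions above are simple differences of other metrics. The following sketch restates three of them in Python as a consistency check. The function name and the input counts are hypothetical and purely illustrative; they are not DRAGEN output.

```python
def derived_mapping_metrics(total_reads, duplicate_marked_reads,
                            reads_with_mate, paired_reads,
                            properly_paired_reads):
    """Recompute metrics that the definitions express as differences."""
    return {
        # Total number of reads minus the duplicate marked reads.
        "Number of unique reads": total_reads - duplicate_marked_reads,
        # Total number of reads minus reads with mate sequenced.
        "Reads without mate sequenced": total_reads - reads_with_mate,
        # Paired reads minus properly paired reads.
        "Not properly paired reads (discordant)":
            paired_reads - properly_paired_reads,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = derived_mapping_metrics(
    total_reads=1000,
    duplicate_marked_reads=100,
    reads_with_mate=1000,
    paired_reads=980,
    properly_paired_reads=960,
)
```

Comparing such recomputed values against the reported ones can help catch parsing mistakes when loading `mapping_metrics.csv` into downstream tools.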

### Coverage and Callability Reports

DRAGEN supports a number of reports dedicated to coverage and GVCF-based callability metrics, as listed in the table below. The coverage reports do not require the mapper or variant callers; however, they are not compatible with `--enable-sort=false`. The GVCF-based callability reports require the small variant caller to be run in GVCF mode. DRAGEN coverage reports are generated by default over the whole genome and, if provided, also over a target region. DRAGEN also supports specifying custom regions and report types of interest.

| Report Name                | Output File              | Notes                                                                                |
| -------------------------- | ------------------------ | ------------------------------------------------------------------------------------ |
| Coverage metrics           | \_coverage\_metrics.csv  | Important coverage summary statistics. On by default.                                |
| Fine histogram coverage    | \_fine\_hist.csv         | Detailed coverage histogram. On by default.                                          |
| Histogram coverage         | \_hist.csv               | Binned coverage histogram. On by default.                                            |
| Overall mean coverage      | \_overall\_mean\_cov.csv | Redundant subset of information available in \_coverage\_metrics.csv. On by default. |
| Per contig mean coverage   | \_contig\_mean\_cov.csv  | On by default.                                                                       |
| Read-level coverage report | \_read\_cov\_report.bed  | On by default.                                                                       |
| Basepair full resolution   | \_full\_res.bed          | Optionally enabled with custom reports.                                              |
| Per BED region cov\_report | \_cov\_report.bed        | Optionally enabled with custom reports.                                              |
| GVCF callability           | \_callability.bed        | Optionally enabled with custom reports.                                              |

#### Tumor-Normal Support

In somatic tumor-normal mode, DRAGEN generates separate reports for the tumor and normal samples. Each report is labeled according to the sample type. Tumor sample reports include `tumor` at the end of the file name, and normal sample reports include `normal` at the end of the file name. To include both tumor and normal sample results in one file, set the `--vc-enable-separate-t-n-metrics` option to false. DRAGEN then reports on the aggregate of both samples.

The coverage reports do not require the mapper or variant callers; however, they are not compatible with `--enable-sort=false`.

#### Custom Regions

The following command shows an example use case for specifying custom coverage reports:

```
dragen ... \
--qc-coverage-region-1 <bed file 1> \
--qc-coverage-reports-1 full_res \
--qc-coverage-region-2 <bed file 2> \
--qc-coverage-region-3 <bed file 3> \
--qc-coverage-reports-3 full_res cov_report
```

The settings `--qc-coverage-region-i` and `--qc-coverage-reports-i` work as a pair (i can be 1, 2, or 3). The former setting specifies the region, while the latter specifies the report types for that region. The number `i` links the settings. Up to 3 such region and report pairs may be specified.

* The `--qc-coverage-region-i` option requires a BED file argument (i can be 1, 2, or 3).
* Regions in each BED file can be optionally padded using `--qc-coverage-region-padding-i` option (by default 0 padding is applied).
* A set of default reports is generated for each region.
* Additional reports can be specified for each region by using the `--qc-coverage-reports-i` option.

| Optional Report type       | Enabled with                                                               |
| -------------------------- | -------------------------------------------------------------------------- |
| Basepair full resolution   | --qc-coverage-region-1=BED\_FILE\_PATH --qc-coverage-reports-1 full\_res   |
| Per BED region cov\_report | --qc-coverage-region-1=BED\_FILE\_PATH --qc-coverage-reports-1 cov\_report |
| GVCF callability           | --qc-coverage-region-1=BED\_FILE\_PATH --qc-coverage-reports-1 callability |

If multiple report types are selected per region, they should be space-separated, e.g. `--qc-coverage-reports-1 callability full_res`.
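The pairing rules above can be sketched as a small Python helper that assembles the options for one region. The function `coverage_region_args` is a hypothetical illustration, not part of DRAGEN. It uses only the option names shown in this section and assumes that the three optional report types in the table are the only accepted values.

```python
OPTIONAL_REPORTS = {"full_res", "cov_report", "callability"}


def coverage_region_args(index, bed_path, reports=(), padding=None):
    """Build the command-line arguments for one custom coverage region."""
    if index not in (1, 2, 3):
        raise ValueError("region index must be 1, 2, or 3")
    unknown = set(reports) - OPTIONAL_REPORTS
    if unknown:
        raise ValueError(f"unknown report types: {sorted(unknown)}")

    args = [f"--qc-coverage-region-{index}", bed_path]
    if padding is not None:
        args += [f"--qc-coverage-region-padding-{index}", str(padding)]
    if reports:
        # Multiple report types are passed space-separated.
        args += [f"--qc-coverage-reports-{index}", *reports]
    return args


# Placeholder BED file name, used only for illustration.
region3 = coverage_region_args(3, "regions3.bed", ["full_res", "cov_report"])
```

Joining the returned list with spaces gives the same form as the command-line example above, with the region index tying each option to its BED file.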

#### Rules for including reads and bases in the coverage calculations

The following default settings are used for all DRAGEN coverage reports:

* Duplicate reads are ignored.
* Soft and/or hard-clipped bases are ignored.
* Overlapping mates are double-counted.
* Reads with MAPQ > 0 are included. MAPQ=0 reads are filtered.
* Bases with BQ >= 0 are included, so no base quality filter is applied by default.

#### Optional settings for including or excluding reads and bases in the coverage calculations

| Filtering rules              | Description                                                                                                                                                                                                                                                                                                                                                                                              |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Handling of overlapping mates | By default overlapping mates are double counted. Set `--qc-coverage-ignore-overlaps=true` to resolve all of the alignments for each fragment and avoid double-counting any overlapping bases. This might result in marginally longer run times. This option also requires setting `--enable-map-align=true`. `--qc-coverage-ignore-overlaps` is a global setting and updates all qc-coverage reports.    |
| Soft-clipped bases           | By default soft-clipped bases are not counted towards coverage. Set `--qc-coverage-count-soft-clipped-bases=true` to also include those bases in the coverage calculations. `--qc-coverage-count-soft-clipped-bases` is a global setting and updates all qc-coverage reports.                                                                                                                            |
| MAPQ and BQ filters          | The `--qc-coverage-filters-i` setting can be used to override the min MAPQ and BQ filters. A coverage filter is enabled by using one of the `--qc-coverage-filters-i` options (where i is 1, 2, or 3), in combination with the associated `--qc-coverage-region-i` option. The default value for `--qc-coverage-filters-i` is `mapq<1,bq<0`. The default includes all BQ, but filters reads with MAPQ=0. |

As an example, the following options enable full (basepair) resolution coverage output with more stringent MAPQ and BQ filtering:

```
--qc-coverage-region-1 <custom_regions.bed>
--qc-coverage-filters-1 'mapq<10,bq<30'
--qc-coverage-reports-1 full_res
```

* The argument syntax `mapq<value,bq<value` means that reads with a mapping quality below the specified value, or bases with a base call quality below the specified value, are ignored.
* Valid filter arguments are `mapq` and `bq` only. Either, or both, can be specified.
* Only the `<` operator is supported. `<=`, `>`, `>=`, and `=` are not supported.
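To make the filter semantics concrete, the following Python sketch parses a filter string and applies it to a single base. Both functions are hypothetical helpers that mirror the rules described above; they are not DRAGEN code.

```python
def parse_coverage_filter(spec):
    """Parse a string such as 'mapq<10,bq<30' into minimum thresholds."""
    thresholds = {}
    for term in spec.split(","):
        name, operator, value = term.strip().partition("<")
        if operator != "<" or name not in ("mapq", "bq"):
            raise ValueError(f"unsupported filter term: {term!r}")
        if not value.isdigit():
            # Also rejects '<=' because the value would start with '='.
            raise ValueError(f"invalid threshold in term: {term!r}")
        thresholds[name] = int(value)
    return thresholds


def base_is_counted(read_mapq, base_bq, thresholds):
    """Return True if a base passes both the MAPQ and BQ filters."""
    return (read_mapq >= thresholds.get("mapq", 0)
            and base_bq >= thresholds.get("bq", 0))


default_filter = parse_coverage_filter("mapq<1,bq<0")
strict_filter = parse_coverage_filter("mapq<10,bq<30")
```

With the default `mapq<1,bq<0`, a base on a MAPQ=0 read is dropped while any base quality is accepted, which matches the default rules listed earlier in this section.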

#### Key Differences from other DRAGEN Metrics and External Tools

Coverage estimates will differ based on rules for filtering reads, which regions are considered, and how CIGAR strings are processed. Some other DRAGEN components (including the mapper and aligner, ploidy caller, and variant callers) also emit coverage-related metrics. These metrics will usually not exactly match the metrics in the DRAGEN coverage reports and are not recommended for general coverage QC.

DRAGEN metrics that may differ from the main DRAGEN coverage reports:

| DRAGEN output                                       | Description |
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| DRAGEN SNV VCF INFO DP field                        | Computed after excluding unmapped reads, secondary alignments, BQ<10, bad quality reads (badly mated reads, and reads with bad CIGARs). It will generally be equal to or lower than the coverage reported in the `fine_hist` or other coverage reports. It is also expected to be lower than the unfiltered coverage track reported in IGV. |
| DRAGEN SNV VCF FORMAT DP field                      | Similar to the `INFO DP` field, but additionally also excludes non-informative reads that match more than one haplotype equally well. In general, the following pattern is expected: `FORMAT DP` <= `INFO DP` <= `full_res` report.                                                                                                                                                                                                               |
| Input bases divided by reference genome size.       | Available in `mapping_metrics.csv`. This metric is a useful indication of the raw sequencing coverage. All primary reads (including duplicates) are considered. Secondary and supplementary alignments are ignored, but no other filters are applied.                                                                                                                                                                                             |
| Autosomal Median Coverage                           | Found in `ploidy_estimation_metrics.csv`. This is an internal development metric that makes various assumptions about which regions will be treated as callable or not. This metric will **not** be consistent with "Median autosomal coverage over genome" in "wgs\_coverage\_metrics.csv". It is not recommended for any QC.                                                                                                                    |
| CNV SUMMARY, Average alignment coverage over genome | Computed in the CNV module and available in `cnv_metrics.csv`. It has its own set of filters (low MAPQ and duplicate marked reads are removed by default) and is based on the intervals used for CNV. Bases in read mates that are overlapping will always be double counted. This is an internal development metric that is not intended to be consistent with "\_coverage\_metrics.csv". It is not recommended as a general QC coverage metric. |

DRAGEN coverage metrics use default rules that differ from Picard tools such as HsMetrics and WgsMetrics. The table below lists the key default settings.

| Tool/Metrics                | Min BQ | Min MAPQ | Ignore overlapping bases paired reads |
| --------------------------- | ------ | -------- | ------------------------------------- |
| Picard HsMetrics/WgsMetrics | 20     | 20       | TRUE                                  |
| DRAGEN Coverage Metrics     | 0      | 1        | FALSE                                 |

To override the DRAGEN defaults and make the behavior more similar to Picard, specify the following optional settings: `--qc-coverage-region-1 <custom_regions.bed> --qc-coverage-filters-1 'mapq<20,bq<20' --qc-coverage-ignore-overlaps=true`

#### Main Coverage Report

DRAGEN emits a `_coverage_metrics.csv` file for each available region type, including the full genome, target region, and any additionally specified QC regions. This report summarizes the main coverage metrics and is generally the most commonly used.

The first column of the output file contains the section name COVERAGE SUMMARY and the second column (the subsection) is empty for all entries in the file.

The following metrics are calculated:

* **Aligned bases in region**---Number of uniquely mapped bases to region and the percentage relative to the number of uniquely mapped bases to the genome.
* **Average alignment coverage over region**---Number of uniquely mapped bases to region divided by the number of sites in region.
* **Uniformity of coverage (PCT > 0.2\*mean) over region**---Percentage of sites with coverage greater than 20% of the mean coverage in region.
* **PCT of region with coverage \[ix, inf)**---Percentage of sites in region with at least ix coverage, where i can equal 100, 50, 20, 15, 10, 3, 1 and 0.
* **PCT of region with coverage \[ix, jx)**---Percentage of sites in region with at least ix but less than jx coverage, where (i, j) can equal (50, 100), (20, 50), (15, 20), (10, 15), (3, 10), (1, 3) and (0, 1).
* **Median chr X coverage (ignore 0x regions) over region**---Median alignment coverage over the chr X loci in region ignoring loci with 0x coverage. If there is no chromosome X in the reference genome or the region does not intersect chromosome X, this metric shows as NA.
* **Median chr Y coverage (ignore 0x regions) over region**---Median alignment coverage over the chr Y loci in region ignoring loci with 0x coverage. If there is no chromosome Y in the reference genome or the region does not intersect chromosome Y, this metric shows as NA.
* **Average mitochondrial coverage over region**---Total number of bases that aligned to the intersection of the mitochondrial chromosome with region divided by the total number of loci in the intersection of the mitochondrial chromosome with region. If there is no mitochondrial chromosome in the reference genome or the region does not intersect mitochondrial chromosome, this metric shows as NA.
* **Average autosomal coverage over region**---Total number of bases that aligned to the autosomal loci in region divided by the total number of loci in the autosomal loci in region. If there is no autosome in the reference genome, or the region does not intersect autosomes, this metric shows as NA.
* **Median autosomal coverage over region**---Median alignment coverage over the autosomal loci in region. If there is no autosome in the reference genome or the region does not intersect autosomes, this metric shows as NA.
* **Mean/Median autosomal coverage ratio over region**---Mean autosomal coverage in region divided by the median autosomal coverage in region. If there is no autosome in the reference genome or the region does not intersect autosomes, this metric shows as NA.
* **Aligned reads in region**---Number of uniquely mapped reads to region and percentage relative to the number of uniquely mapped reads to the genome. Only reads with MAPQ ≥ 1 are included. Secondary and supplementary alignments are ignored.

The following is an example of the contents of the `_coverage_metrics.csv` file:

```
COVERAGE SUMMARY,,Aligned bases,148169295474
COVERAGE SUMMARY,,Aligned bases in genome,148169295474,100.00
COVERAGE SUMMARY,,Average alignment coverage over genome,46.08
COVERAGE SUMMARY,,Uniformity of coverage (PCT > 0.2*mean) over genome,91.01
COVERAGE SUMMARY,,PCT of genome with coverage [100x: inf),0.25
COVERAGE SUMMARY,,PCT of genome with coverage [ 50x: inf),50.01
COVERAGE SUMMARY,,PCT of genome with coverage [ 20x: inf),89.46
COVERAGE SUMMARY,,PCT of genome with coverage [ 15x: inf),90.51
COVERAGE SUMMARY,,PCT of genome with coverage [ 10x: inf),91.01
COVERAGE SUMMARY,,PCT of genome with coverage [ 3x: inf),91.69
COVERAGE SUMMARY,,PCT of genome with coverage [ 1x: inf),92.10
COVERAGE SUMMARY,,PCT of genome with coverage [ 0x: inf),100.00
COVERAGE SUMMARY,,PCT of genome with coverage [ 50x:100x),49.76
COVERAGE SUMMARY,,PCT of genome with coverage [ 20x: 50x),39.45
COVERAGE SUMMARY,,PCT of genome with coverage [ 15x: 20x),1.04
COVERAGE SUMMARY,,PCT of genome with coverage [ 10x: 15x),0.51
COVERAGE SUMMARY,,PCT of genome with coverage [ 3x: 10x),0.67
COVERAGE SUMMARY,,PCT of genome with coverage [ 1x: 3x),0.42
COVERAGE SUMMARY,,PCT of genome with coverage [ 0x: 1x),7.90
COVERAGE SUMMARY,,Average chr X coverage over genome,24.70
COVERAGE SUMMARY,,Average chr Y coverage over genome,20.96
COVERAGE SUMMARY,,Average mitochondrial coverage over genome,20682.19
COVERAGE SUMMARY,,Average autosomal coverage over genome,47.81
COVERAGE SUMMARY,,Median autosomal coverage over genome,48.62
COVERAGE SUMMARY,,Mean/Median autosomal coverage ratio over genome,0.98
COVERAGE SUMMARY,,XAvgCov/YAvgCov ratio over genome,1.18
COVERAGE SUMMARY,,XAvgCov/AutosomalAvgCov ratio over genome,0.52
COVERAGE SUMMARY,,YAvgCov/AutosomalAvgCov ratio over genome,0.44
COVERAGE SUMMARY,,Aligned reads,1477121058
COVERAGE SUMMARY,,Aligned reads in genome,1477121058,100.00
```
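In the example above, the binned percentages agree with differences of the cumulative percentages, and the reported ratio agrees with the mean divided by the median. The following Python sketch checks both using values copied from the example. The bin relationship is inferred from the interval notation rather than stated as a formula, and the reported values are rounded to two decimals, so small differences of about 0.01 are expected.

```python
# PCT of genome with coverage [ix, inf), copied from the example above.
cumulative = {100: 0.25, 50: 50.01, 20: 89.46, 15: 90.51,
              10: 91.01, 3: 91.69, 1: 92.10, 0: 100.00}

# PCT of genome with coverage [ix, jx), copied from the example above.
reported_bins = {(50, 100): 49.76, (20, 50): 39.45, (15, 20): 1.04,
                 (10, 15): 0.51, (3, 10): 0.67, (1, 3): 0.42, (0, 1): 7.90}


def bin_pct(cumulative_pct, low, high):
    """Percentage of sites with at least `low`x but less than `high`x."""
    return cumulative_pct[low] - cumulative_pct[high]


# Largest disagreement between recomputed and reported bins.
max_bin_difference = max(
    abs(bin_pct(cumulative, low, high) - pct)
    for (low, high), pct in reported_bins.items()
)

# Mean/Median autosomal coverage ratio from the two reported averages.
mean_median_ratio = round(47.81 / 48.62, 2)
```

A check like this is a cheap sanity test after parsing the file: a large value of `max_bin_difference` usually points to misread rows rather than a problem in the data.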

#### Fine Histogram Coverage Report

The fine histogram report outputs a `_fine_hist.csv` file, which contains two columns: `Depth` and `Count`. The value in the `Depth` column ranges from 0 to 2000+ and the `Count` column indicates the number of loci covered at the corresponding depth.

Masked regions in the FASTA are ignored and no depth is reported for these regions.
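As a sketch of how the `Depth` and `Count` columns can be summarized, the following Python snippet computes a mean and a lower median depth from histogram text. The helper names and the sample data are hypothetical. The sketch assumes that the last bin is written as `2000+` and treats it as depth 2000, so the computed mean is a lower bound when that bin is non-empty.

```python
import csv
import io


def read_fine_hist(text):
    """Return (depth, count) pairs, skipping a header row if present."""
    hist = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue  # header, blank, or malformed line
        depth = int(row[0].strip().rstrip("+"))  # '2000+' becomes 2000
        hist.append((depth, int(row[1])))
    return hist


def mean_depth(hist):
    """Count-weighted mean depth over all loci in the histogram."""
    total_loci = sum(count for _, count in hist)
    if total_loci == 0:
        raise ValueError("histogram has no loci")
    return sum(depth * count for depth, count in hist) / total_loci


def lower_median_depth(hist):
    """Smallest depth at which at least half of the loci are covered."""
    total_loci = sum(count for _, count in hist)
    running = 0
    for depth, count in sorted(hist):
        running += count
        if running * 2 >= total_loci:
            return depth
    raise ValueError("histogram is empty")


# Hypothetical histogram used only to illustrate the calculation.
hist = read_fine_hist("Depth,Count\n0,2\n1,3\n2,5\n2000+,0\n")
```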

#### Histogram Coverage Report

The histogram report outputs a \_hist.csv file, which provides the percentage of bases in the coverage BED/target BED/WGS region that fall within a certain range of coverage. Duplicate reads are ignored if DRAGEN is run with `--enable-duplicate-marking` set to true.

The following ranges are used: `"[100x:inf)", "[1x:3x)", "[0x:1x)"`

Masked regions in the FASTA are ignored and no depth is reported for these regions.

#### Overall Mean Coverage Report

The Overall Mean Coverage report generates an \_overall\_mean\_cov.csv file, which contains the average alignment coverage over the coverage BED/target BED/WGS, as applicable.

The following is an example of the contents of the \_overall\_mean\_cov.csv file:

`Average alignment coverage over target_bed,80.69`

Masked regions in the FASTA are ignored and no depth is reported for these regions.

#### Per Contig Mean Coverage Report

The Contig Mean Coverage report generates a \_contig\_mean\_cov.csv file, which contains the estimated coverage for all contigs and an autosomal estimated coverage. The file includes the following three columns:

| Column 1    | Column 2 | Column 3 |
| ----------- | -------- | -------- |
| Contig name | Number of bases aligned to that contig, which excludes bases from duplicate marked reads, reads with MAPQ=0, and clipped bases. | Estimated coverage, as follows: \<number of bases aligned to the contig (i.e., Column 2)> divided by \<length of the contig or (if a target BED is used) the total length of the target region spanning that contig>. |

Masked regions in the FASTA are ignored and no depth is reported for these regions.

#### Full Res Report

The Full Res Report outputs a \_full\_res.bed file in tab-delimited format. The first three columns are the standard BED fields, and the fourth column is the depth. Each record in the file is for a given interval that has a constant depth. If the depth changes, then a new record is written to the file. Alignments that have a mapping quality value of 0, duplicate reads, and clipped bases are not counted towards the depth.

Only base positions that fall under the user-specified coverage-region bed regions are present in the \_full\_res.bed output file.

The \_full\_res.bed file structure is similar to the output file of `bedtools genomecov -bg`. The contents are identical if the bedtools command line is executed after filtering out alignments with mapping quality value of 0, and possibly filtering by a target BED (if specified).

The following is an example of the contents of the \_full\_res.bed file:

```
chr1 121483984 121483985 10
chr1 121483985 121483986 9
chr1 121483986 121483989 8
chr1 121483989 121483991 7
chr1 121483991 121483992 6
chr1 121483992 121483993 4
chr1 121483993 121483994 2
chr1 121483994 121484039 1
chr1 121484039 121484043 2
chr1 121484043 121484048 3
```
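Because each record covers an interval of constant depth, summary statistics must weight the depth by the interval length (`end - start`). The following Python sketch computes the number of covered positions and the mean depth for the example records above. The function name is a hypothetical helper. It splits on any whitespace so that it handles both the tab-delimited file and the space-separated example shown here.

```python
def summarize_full_res(lines):
    """Return (positions, mean_depth) for constant-depth BED records."""
    positions = 0
    depth_sum = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        start, end, depth = int(fields[1]), int(fields[2]), int(fields[3])
        length = end - start  # BED intervals are half-open
        positions += length
        depth_sum += depth * length
    mean = depth_sum / positions if positions else 0.0
    return positions, mean


# Records copied from the _full_res.bed example above.
example_records = """\
chr1 121483984 121483985 10
chr1 121483985 121483986 9
chr1 121483986 121483989 8
chr1 121483989 121483991 7
chr1 121483991 121483992 6
chr1 121483992 121483993 4
chr1 121483993 121483994 2
chr1 121483994 121484039 1
chr1 121484039 121484043 2
chr1 121484043 121484048 3
""".splitlines()

positions, mean = summarize_full_res(example_records)
```

For these ten records the intervals span 64 positions with a summed depth of 137, so the length-weighted mean is 137 / 64.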

Coverage is reported for all positions specified by `--qc-coverage-region-i`. Masked regions in the FASTA are not ignored.

When `--enable-metrics-compression` is set to true, the 1 bp resolution coverage metrics output BED file (`_full_res.bed`) is compressed to bigWig format.

#### Per Region Coverage Report

The cov\_report report generates a \_cov\_report.bed file in a tab-delimited format. This report includes summary coverage statistics per BED region. The first three columns are standard BED fields. The DRAGEN Amplicon pipeline includes a fourth column for name and a fifth column for gene\_id. The remaining columns are statistics calculated over the interval region specified on the same record line.

The following list describes the appended columns.

* total\_cvg---The total coverage value.
* mean\_cvg---The mean coverage value.
* Q1\_cvg---The lower quartile (25th percentile) coverage value.
* median\_cvg---The median coverage value.
* Q3\_cvg---The upper quartile (75th percentile) coverage value.
* min\_cvg---The minimum coverage value.
* max\_cvg---The maximum coverage value.
* pct\_above\_X---Indicates the percentage of bases over the specified interval region that had a depth coverage greater than X.

By default, if an interval has a total coverage of 0, then the record is written to the output file. To filter out intervals with zero coverage, set `vc-emit-zero-coverage-intervals` to false in the configuration file.

If `--qc-coverage-region-i-thresholds` is not set, the thresholds default to 5, 15, 20, 30, 50, 100, 200, 300, 400, 500, 1000.

The following is an example of the contents of the \_cov\_report.bed file:

```
chrom start end total_cvg mean_cvg Q1_cvg median_cvg Q3_cvg min_cvg max_cvg pct_above_5 ...
chr5 34190121 34191570 76636 52.89 44.00 54.00 60.00 32 76 100.00 ...
chr5 34191751 34192380 39994 63.58 57.00 61.00 69.00 50 85 100.00 ...
chr5 34192440 34192642 10074 49.87 47.00 49.00 51.00 44 62 100.00 ...
chr9 66456991 66457682 31926 46.20 39.00 45.00 52.00 27 65 100.00 ...
chr9 68426500 68426601 4870 48.22 42.00 48.00 54.00 39 58 100.00 ...
chr17 41465818 41466180 24470 67.60 4.00 66.00 124.00 2 153 66.30 ...
chr20 29652081 29652203 5738 47.03 40.00 49.00 52.00 34 58 100.00 ...
chr21 9826182 9826283 4160 41.19 23.00 52.00 58.00 5 60 99.01 ...
```
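
The per-interval statistics can be approximated from per-base depths with the following Python sketch. It is an illustration under assumptions, not the DRAGEN implementation: the function name `interval_stats` is hypothetical, and the quartile interpolation method that DRAGEN uses is not stated here, so the `inclusive` method of `statistics.quantiles` is an arbitrary choice.

```python
import statistics

def interval_stats(depths, thresholds=(5, 15, 20, 30, 50, 100)):
    """Summarize per-base depths (at least two values) for one BED interval."""
    n = len(depths)
    # Quartile method is an assumption; DRAGEN may interpolate differently.
    q1, median, q3 = statistics.quantiles(depths, n=4, method="inclusive")
    stats = {
        "total_cvg": sum(depths),
        "mean_cvg": sum(depths) / n,
        "Q1_cvg": q1,
        "median_cvg": median,
        "Q3_cvg": q3,
        "min_cvg": min(depths),
        "max_cvg": max(depths),
    }
    for x in thresholds:
        # Percentage of bases with depth strictly greater than X.
        stats[f"pct_above_{x}"] = 100.0 * sum(d > x for d in depths) / n
    return stats

print(interval_stats([2, 4, 6, 8, 10]))
```

The first example row above is consistent with mean\_cvg being total\_cvg divided by the interval length: 76636 / (34191570 - 34190121) is approximately 52.89.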

#### Read Coverage Report

The read\_cov\_report report generates a \_read\_cov\_report.bed file in a tab-delimited format. The first five columns are `chrom`, `start`, `end`, `name`, and `gene_id` BED fields. The following additional columns represent statistics that are calculated over the interval region specified on the same record line.

* total\_cvg---The total coverage value.
* read1\_cvg---The total Read 1 coverage value.
* read2\_cvg---The total Read 2 coverage value.

If an alignment overlaps more than one region, the alignment is counted toward the region with the largest overlap. If the alignment overlaps equally with more than one region, the alignment is counted toward the first intersecting region.
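
The assignment rule can be sketched in Python as follows. The function name `assign_region` is hypothetical, coordinates are assumed to be 0-based and half-open, and "first intersecting region" is assumed to mean the first region in input order.

```python
def assign_region(aln_start, aln_end, regions):
    """Return the index of the region an alignment is counted toward.

    `regions` is a list of (start, end) tuples in input order.
    Returns None if the alignment overlaps no region.
    """
    best_index, best_overlap = None, 0
    for i, (r_start, r_end) in enumerate(regions):
        overlap = min(aln_end, r_end) - max(aln_start, r_start)
        # Strict '>' keeps the earliest region when overlaps are equal.
        if overlap > best_overlap:
            best_index, best_overlap = i, overlap
    return best_index

regions = [(100, 200), (200, 300)]
print(assign_region(150, 250, regions))  # equal overlap with both regions
print(assign_region(180, 260, regions))  # larger overlap with the second region
```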

The following shows the contents of the \_read\_cov\_report.bed file:

```
#chrom    start    end    name    gene_id    total_cvg    read1_cvg    read2_cvg
chr21    10033000    10034919            48    24    24
chr21    10034919    10034920            0    0    0
chr21    10034920    10034921            0    0    0
```

#### Callability

Callability is defined as the fraction of positions in the genome or target region having a GVCF PASSing genotype call. The callability report can be interpreted as the fraction of sites in the genome or target bed where the small variant caller had sufficient information (enough good quality reads) to confidently either call a variant or a HOM-REF region.

The callability report requires DRAGEN to be run in gVCF mode. When gVCF mode is enabled, DRAGEN will automatically generate a callability report as part of variant caller metrics.

The following criteria are used to calculate callability metrics:

* Callability is calculated over all positions included in the gVCF.
* Decoy contigs are ignored.
* Unplaced and unlocalized contigs are ignored.
* Masked regions in the FASTA (bases set to N) are ignored.
* For regions where no variant calling was performed, callability is 0.
* A homozygous deletion counts as a PASSing genotype call for all the reference positions spanned by the deletion.
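
As a rough illustration of the definition, the following Python sketch computes callability as the fraction of region positions covered by PASSing intervals. The function name `callability` and the interval inputs are hypothetical. The sketch assumes that ignored contigs and masked bases have already been removed from `regions`, and that positions spanned by a homozygous deletion are already included in `pass_intervals`.

```python
def callability(regions, pass_intervals):
    """Fraction of region positions that fall inside PASSing intervals.

    Both arguments are lists of non-overlapping, half-open (start, end)
    tuples on a single contig. Positions without a PASSing call,
    including positions where no variant calling was performed,
    contribute 0.
    """
    total = sum(end - start for start, end in regions)
    if total == 0:
        return 0.0
    covered = 0
    for r_start, r_end in regions:
        for p_start, p_end in pass_intervals:
            covered += max(0, min(r_end, p_end) - max(r_start, p_start))
    return covered / total

print(callability([(0, 100)], [(10, 60), (80, 120)]))
```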

If the `--vc-target-bed` option is specified, the output is a `target_bed_callability.bed` file that contains the overall and autosome callability over the input target bed region. The padding size specified by the `--vc-target-bed-padding` option is used and overlapping regions are merged.

Callability can also be output over custom regions. If the `--qc-coverage-region-i` option is used with `--qc-coverage-reports-i` (where i is 1, 2, or 3), callability can be added as a report type for that region. The output is a `qc-coverage-region-i_callability.bed` file. For each specified `qc-coverage-region-i` file, the average callability is reported in the variant calling metrics file. The padding size specified by the `--qc-coverage-region-padding-i` is used and overlapping regions are merged.
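
Padding followed by merging of overlapping regions can be sketched in Python as below. The function name `pad_and_merge` is hypothetical. The sketch assumes half-open coordinates on a single contig, does not clip to the contig length, and also merges directly adjacent regions, which may differ from DRAGEN behavior.

```python
def pad_and_merge(regions, padding=0):
    """Extend each (start, end) region by `padding` bases, then merge overlaps."""
    padded = sorted((max(0, start - padding), end + padding)
                    for start, end in regions)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(pad_and_merge([(100, 200), (250, 300), (500, 600)], padding=30))
```

Merging before computing callability avoids counting a position twice when padded regions overlap.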

The optional min MAPQ and min BQ filters only influence read and base counting and do not influence the callability reports. The callability reports depend only on the gVCF PASS variants.

#### Coverage/Callability Reports Use Cases and Expected Output

The following table shows which outputs are generated depending on whether the target bed option (`--vc-target-bed`) and the optional coverage region options (`--qc-coverage-region-i`) are used.

| <p>--vc-target-bed<br>specified? Y/N</p> | <p>--qc-coverage-region-i<br>(i equal to 1, 2, or 3)<br>specified? Y/N</p> | Expected Output Files                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| ---------------------------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| N                                        | N                                                                          | <p>wgs\_coverage\_metrics.csv<br>wgs\_fine\_hist.csv<br>wgs\_hist.csv<br>wgs\_overall\_mean\_cov.csv<br>wgs\_contig\_mean\_cov.csv</p> |
| N                                        | Y                                                                          | <p>wgs\_coverage\_metrics.csv<br>wgs\_fine\_hist.csv<br>wgs\_hist.csv<br>wgs\_overall\_mean\_cov.csv<br>wgs\_contig\_mean\_cov.csv<br><br>For each coverage region specified by the user:<br>qc-coverage-region-i\_coverage\_metrics.csv<br>qc-coverage-region-i\_fine\_hist.csv<br>qc-coverage-region-i\_hist.csv<br>qc-coverage-region-i\_overall\_mean\_cov.csv<br>qc-coverage-region-i\_contig\_mean\_cov.csv<br>qc-coverage-region-i\_full\_res.bed if full\_res report type is requested for qc-coverage-region-i<br>qc-coverage-region-i\_cov\_report.bed if cov\_report report type is requested for qc-coverage-region-i<br>qc-coverage-region-i\_callability.bed if GVCF mode is enabled and the callability or exome-callability report type is requested</p> |
| Y                                        | N                                                                          | <p>wgs\_coverage\_metrics.csv<br>wgs\_fine\_hist.csv<br>wgs\_hist.csv<br>wgs\_overall\_mean\_cov.csv<br>wgs\_contig\_mean\_cov.csv<br><br>target\_bed\_coverage\_metrics.csv<br>target\_bed\_fine\_hist.csv<br>target\_bed\_hist.csv<br>target\_bed\_overall\_mean\_cov.csv<br>target\_bed\_contig\_mean\_cov.csv<br>target\_bed\_callability.bed if GVCF mode is enabled</p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| Y                                        | Y                                                                          | <p>wgs\_coverage\_metrics.csv<br>wgs\_fine\_hist.csv<br>wgs\_hist.csv<br>wgs\_overall\_mean\_cov.csv<br>wgs\_contig\_mean\_cov.csv<br><br>target\_bed\_coverage\_metrics.csv<br>target\_bed\_fine\_hist.csv<br>target\_bed\_hist.csv<br>target\_bed\_overall\_mean\_cov.csv<br>target\_bed\_contig\_mean\_cov.csv<br>target\_bed\_callability.bed if GVCF mode is enabled<br><br>For each coverage region specified by the user:<br>qc-coverage-region-i\_coverage\_metrics.csv<br>qc-coverage-region-i\_fine\_hist.csv<br>qc-coverage-region-i\_hist.csv<br>qc-coverage-region-i\_overall\_mean\_cov.csv<br>qc-coverage-region-i\_contig\_mean\_cov.csv<br><br>qc-coverage-region-i\_full\_res.bed if full\_res report type is requested for qc-coverage-region-i<br>qc-coverage-region-i\_cov\_report.bed if cov\_report report type is requested for qc-coverage-region-i<br>qc-coverage-region-i\_callability.bed if GVCF mode is enabled and the callability or exome-callability report type is requested</p> |

### GC Bias Report

The GC bias report provides information on GC content and the associated read coverage across a genome. The DRAGEN GC bias metric is modeled after the Picard implementation and adapted to preexisting internal measures. The DRAGEN GC bias correction module attempts to correct these biases following the target count stage. For more information, see [GC Bias Correction](https://help.connected.illumina.com/dragen/product-guides/dragen-dna-pipeline/cnv-overview/cnv-reference#gc-bias-correction).

The GC bias metric is computed as follows.

1. Calculates GC content using a 100 bp wide, per-base rolling window over all chromosomes in the reference genome, excluding any decoys and alternate contigs. Windows containing more than four masked (N) bases in the reference are discarded.
2. Calculates the average coverage for each window, excluding any non-PF, duplicate, secondary, and supplementary reads.
3. Calculates the average global coverage across the whole genome.
4. Groups valid windows based on the percentage of GC content, both at individual percentages and five 20% ranges as summary.
5. Calculates the normalized coverage for each group by dividing the average coverage for the bin by the global average coverage across the genome. Values below 1.0 indicate a lower than expected coverage at the given GC percent or range. Coverages significantly deviating from 1.0 at greater GC values are an expected result.
6. Calculates dropout metrics as the sum of all positive values of (percentage of windows at GC X minus percentage of aligned reads at GC X), taken over each GC ≤ 50% for AT dropout and over each GC > 50% for GC dropout.
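
The steps above can be sketched on a toy reference in Python. This is a simplified illustration, not the DRAGEN implementation: the function name `gc_bias` is hypothetical, the input is a per-base coverage list instead of filtered reads, GC content is computed as a fraction of the full window, and each group's share of summed window coverage stands in for the percentage of aligned reads at that GC value.

```python
from collections import defaultdict

def gc_bias(reference, coverage, window=100, max_masked=4):
    """Toy GC bias metrics for one contig.

    reference: string of bases; coverage: per-base depths, same length.
    """
    cov_sum = defaultdict(float)  # GC percent -> summed window mean coverage
    win_count = defaultdict(int)  # GC percent -> number of valid windows
    for start in range(len(reference) - window + 1):
        seq = reference[start:start + window].upper()
        if seq.count("N") > max_masked:
            continue  # discard windows with too many masked bases
        gc_pct = round(100 * (seq.count("G") + seq.count("C")) / window)
        cov_sum[gc_pct] += sum(coverage[start:start + window]) / window
        win_count[gc_pct] += 1

    n_windows = sum(win_count.values())
    total_cov = sum(cov_sum.values())
    if n_windows == 0 or total_cov == 0:
        raise ValueError("no valid windows or no coverage")
    global_mean = total_cov / n_windows

    # Mean coverage of each GC group divided by the global mean coverage.
    normalized = {gc: cov_sum[gc] / win_count[gc] / global_mean
                  for gc in win_count}

    at_dropout = gc_dropout = 0.0
    for gc in win_count:
        win_pct = 100 * win_count[gc] / n_windows
        read_pct = 100 * cov_sum[gc] / total_cov  # proxy for aligned reads
        shortfall = max(0.0, win_pct - read_pct)
        if gc <= 50:
            at_dropout += shortfall
        else:
            gc_dropout += shortfall
    return normalized, at_dropout, gc_dropout

# Toy example: coverage falls to zero over the GC-rich half.
norm, at_drop, gc_drop = gc_bias("AAAAGGGG", [10, 10, 10, 10, 0, 0, 0, 0], window=4)
print(norm, at_drop, gc_drop)
```

In the toy example, the GC-rich windows receive less than their share of coverage, so only the GC dropout value is nonzero.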

By default, the GC bias metric report is not calculated. To enable GC bias calculations, set the `--gc-metrics-enable` command line option to true. The following is an example command:

`$ dragen -b <BAM file> -r <reference genome> --gc-metrics-enable=true`

The GC metrics report generates a gc\_metrics.csv file. The file is structured as follows.

```
GC BIAS DETAILS,,Windows at GC [0-100],<number of windows>,<fraction of all windows>
GC BIAS DETAILS,,Normalized coverage at GC [0-100],<average coverage of all windows at given GC divided by average coverage of whole genome>
GC METRICS SUMMARY,,Window size,<window size in bases, typically 100>
GC METRICS SUMMARY,,Number of valid windows,<total number of windows used in calculations>
GC METRICS SUMMARY,,Number of discarded windows,<total number of windows discarded due to more than 4 masked bases>
GC METRICS SUMMARY,,Average reference GC,<average GC content over all valid windows>
GC METRICS SUMMARY,,Mean global coverage,<average genome coverage over all valid windows>
GC METRICS SUMMARY,,Normalized coverage at GCs <GC range>,<average coverage of all windows at given GC range divided by average coverage of whole genome>
GC METRICS SUMMARY,,AT Dropout,<Calculated AT dropout value>
GC METRICS SUMMARY,,GC Dropout,<Calculated GC dropout value>
```
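
A file with this layout can be read with the Python standard library, as in the following sketch. The function name `read_metrics` is hypothetical, and the sample rows use made-up values for illustration only.

```python
import csv
import io

def read_metrics(handle):
    """Map (section, metric name) to the list of value columns."""
    metrics = {}
    for row in csv.reader(handle):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        section, _subsection, name, *values = row
        metrics[(section, name)] = values
    return metrics

# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "GC BIAS DETAILS,,Windows at GC 40,1000,0.05\n"
    "GC METRICS SUMMARY,,Window size,100\n"
    "GC METRICS SUMMARY,,AT Dropout,1.25\n"
)
metrics = read_metrics(sample)
print(metrics[("GC METRICS SUMMARY", "Window size")])
```

Values are kept as strings because rows carry either one or two value columns.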

The GC bias report also includes the following command line options, but they are not recommended.

| Setting                    | Description                                          |
| -------------------------- | ---------------------------------------------------- |
| `--gc-metrics-window-size` | Overrides the default rolling window size of 100 bp. |
| `--gc-metrics-num-bins`    | Overrides the number of summary bins.                |

**Note** Currently, the GC bias report for WES germline and targeted applications does not restrict assessment to the target region. Instead, it only considers sites with >0x coverage. Exome calculations could include alignments from spillover coverage in padded and off-target regions.

### Somatic Callable Regions Report

In somatic mode, DRAGEN automatically generates a somatic callable regions report as a bed file. The somatic callable regions report includes all regions with tumor coverage at least as high as the tumor threshold and (if applicable) normal coverage at least as high as the normal threshold. If only the tumor sample is provided, then the report includes all regions with tumor coverage at least as high as the tumor threshold. Each line in the bed output file is formatted as follows.

`chromosome region_start region_end`

You can specify the threshold values using the `--vc-callability-tumor-thresh` and `--vc-callability-normal-thresh` options. The default value for the tumor threshold is 50. The default value for the normal threshold is 5. For more information on each option, see Somatic Mode Options.
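
The thresholding logic can be sketched in Python as follows. The function name `callable_regions` and the per-base depth inputs are hypothetical. The sketch assumes 0-based, half-open output coordinates and does not describe how DRAGEN computes depth internally.

```python
def callable_regions(chrom, tumor_depth, normal_depth=None,
                     tumor_thresh=50, normal_thresh=5):
    """Return (chrom, start, end) runs where the depth thresholds are met.

    If `normal_depth` is None (tumor-only), only the tumor threshold applies.
    """
    regions = []
    run_start = None
    for pos, t_depth in enumerate(tumor_depth):
        ok = t_depth >= tumor_thresh and (
            normal_depth is None or normal_depth[pos] >= normal_thresh
        )
        if ok and run_start is None:
            run_start = pos
        elif not ok and run_start is not None:
            regions.append((chrom, run_start, pos))
            run_start = None
    if run_start is not None:
        regions.append((chrom, run_start, len(tumor_depth)))
    return regions

tumor = [60, 55, 40, 50, 70]
normal = [5, 4, 9, 9, 9]
print(callable_regions("chr1", tumor, normal))  # tumor-normal
print(callable_regions("chr1", tumor))          # tumor-only
```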

If the target bed or the `--qc-coverage-region-i` option (where i is 1, 2, or 3) is included in the run, DRAGEN generates corresponding somatic callable regions bed files in addition to the whole genome somatic callable regions bed file.

### Germline Tagging Counts in Tumor-Only Pipeline

In tumor-only mode, the DRAGEN somatic small variant caller includes both germline and somatic variants in the output and [can tag potential germline variants](https://help.connected.illumina.com/dragen/product-guides/dragen-dna-pipeline/small-variant-calling/somatic-mode#germline-tagging-in-the-tumor-only-pipeline). When germline tagging is enabled, DRAGEN adds a GermlineStatus tag to the INFO field of the VCF. It also creates a headerless CSV file named `*.vc_germline_tagging_metrics.csv`, which indicates the overall count of germline and somatic variants. It follows the general convention for QC metrics reporting in DRAGEN, and it contains four fields:

1. Always "Variant Germline Status"
2. Always empty
3. The category of variants: "Germline\_DB", "Somatic", or "Unknown"
4. The number of variants in the VCF in that category

### Somatic Allele Transition Noise Metrics File

The DRAGEN somatic small variant caller performs [nucleotide error bias estimation, which can be especially important for FFPE samples](https://help.connected.illumina.com/dragen/product-guides/dragen-dna-pipeline/small-variant-calling/somatic-mode#sample-specific-ntd-error-bias-estimation). DRAGEN estimates the rate of all possible transitions and transversions in the sample, and produces a headerless CSV file named `*.allele-transition-noise-metrics.csv`. It follows the general convention for QC metrics reporting in DRAGEN, and it contains four fields:

1. Always "Allele Nucleotide Transition"
2. Always empty
3. The transition or transversion, e.g. "A->C"
4. The error rate as a phred-scaled integer

Because the error rate is phred-scaled, the higher the error rate of a particular change, the lower the phred-scaled value.
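
The standard phred conversion, error rate = 10^(-Q/10), makes this relationship concrete. The following Python sketch applies it; the function name `phred_to_error_rate` is hypothetical, and the Q values are examples, not DRAGEN output.

```python
def phred_to_error_rate(q):
    """Convert a phred-scaled value Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)

for q in (20, 30, 40):
    print(q, phred_to_error_rate(q))
```

A change reported at Q20 therefore has an estimated error rate ten times higher than a change reported at Q30.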

### Duration Metrics

The duration metrics section includes a breakdown of the run duration for each process. For example, the following metrics are generated for the mapper and variant caller pipeline:

* Time loading reference
* Time aligning reads
* Time sorting and marking duplicates
* Time DRAGStr calibration
* Time partial reconfiguration
* Time variant calling
* Total run time
