
DNA Expanded Metrics

DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

Metric

Description

Troubleshooting


Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

Primarily driven by the data output of the sequencer, the quality of the library, and the balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are lower, this may suggest non-specific enrichment.

Low values for all samples indicate a poor-quality run, possibly with low cluster numbers or low Q30 and PF% values.

A low value for an individual sample indicates poor pooling of this library into the final pool.

A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families.

The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.

As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good-quality DNA, we usually observe values <10.

A UMI family size of 1 is not ideal because it is harder to correct for errors.

UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
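The relationship between read count, unique molecules, and family size can be illustrated with a minimal sketch. It assumes reads are grouped by UMI string alone; a real pipeline would typically also consider alignment position, and the function name and input format here are hypothetical, not part of the analysis software.

```python
from collections import Counter


def mean_family_size(read_umis):
    """Mean number of reads per UMI family.

    Assumption: each element of read_umis is the UMI barcode of one read,
    and reads sharing the same barcode form one family.
    """
    families = Counter(read_umis)  # UMI barcode -> number of reads in that family
    if not families:
        raise ValueError("no reads supplied")
    return sum(families.values()) / len(families)


# Three unique molecules sequenced with 6 reads in total.
shallow = ["AAC", "AAC", "AAC", "GGT", "GGT", "TTA"]
# The same three molecules sequenced twice as deeply (oversequencing).
deep = shallow * 2

print(mean_family_size(shallow))
print(mean_family_size(deep))
```

In this toy input, doubling the reads for the same three molecules doubles the mean family size, while adding reads from new molecules would pull it back down.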

Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.

Chimeric reads occur when one sequencing read aligns to two distinct portions of the genome with little or no overlap. The metric is reported as a proportion of the total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

While this can be indicative of large-scale structural rearrangement of the genome, values that are elevated above the usual baseline may indicate enrichment probe contamination during library preparation. A suggested upper specification limit (USL) for this metric is 8%; samples with higher values might see decreased performance in small variant and TMB scores.

Percentage of exon bases with 100X fragment coverage. Calculated against all regions in the manifest containing _exon in the name.

Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.

As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.

Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.
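As a rough illustration of how a skewed coverage distribution separates the mean from the median, the sketch below compares the two on a list of per-position depths. The function name, the input format, and the use of a simple ratio are assumptions made for this example only; no threshold is implied.

```python
from statistics import mean, median


def mean_to_median_ratio(depths):
    """Ratio of mean to median depth; values far from 1 suggest a skewed distribution.

    Assumption: depths holds one coverage value per target position.
    """
    med = median(depths)
    if med == 0:
        raise ValueError("median depth is zero; ratio is undefined")
    return mean(depths) / med


even = [90, 100, 110]                  # symmetric coverage
skewed = [100, 100, 100, 100, 1000]    # one heavily over-covered position

print(mean_to_median_ratio(even))
print(mean_to_median_ratio(skewed))
```

The single over-covered position raises the mean well above the median, which is the pattern the comparison of the two metrics is meant to expose.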

Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.

This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed by taking 2.0 times the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 minus the allele frequency if it is greater than or equal to 0.5.

If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning, as it will be driven by statistical noise (e.g., the few variants that naturally fall outside an expected interval around 0.5 due to random chance).
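The calculation described above can be sketched as follows. The sketch assumes the variants have already been selected and takes only their allele frequencies as input; how variants are selected is not described here, and the function names are hypothetical.

```python
def adjusted_allele_frequency(af):
    """Return af if it is below 0.5, otherwise 1 - af."""
    return af if af < 0.5 else 1.0 - af


def contamination_estimate(allele_frequencies):
    """2.0 times the average adjusted allele frequency of the selected variants.

    Assumption: allele_frequencies contains only the variants already
    selected for the estimate, each as a fraction between 0 and 1.
    """
    if not allele_frequencies:
        raise ValueError("no variants supplied")
    adjusted = [adjusted_allele_frequency(af) for af in allele_frequencies]
    return 2.0 * sum(adjusted) / len(adjusted)


# Hypothetical variants sitting slightly off 0 and 1.
selected = [0.02, 0.98, 0.03, 0.97]
print(round(contamination_estimate(selected), 4))
```

For this toy input the adjusted frequencies are 0.02, 0.02, 0.03, and 0.03, giving an average of 0.025 and an estimate of about 0.05.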

High contamination estimates may be due to any of the following:

Inter-sample contamination caused by mixing of samples during extraction or library preparation.

Intra-sample contamination, due to mixing of clonally different cell populations during extraction.

Large-scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.

Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).

Provides an indication of the uniformity of coverage of the target regions in the manifest file. When trended over time, reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.
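A minimal sketch of this uniformity calculation is shown below. It assumes coverage is available as one depth value per target position, which is a simplification for illustration; the function name and the input format are hypothetical.

```python
def pct_above_fraction_of_mean(depths, fraction=0.4):
    """Percentage of positions whose depth is greater than fraction * mean depth.

    Assumption: depths holds one coverage value per target position.
    """
    if not depths:
        raise ValueError("no depths supplied")
    threshold = fraction * (sum(depths) / len(depths))
    passing = sum(1 for d in depths if d > threshold)
    return 100.0 * passing / len(depths)


uniform = [100, 100, 100, 100, 100]
dropout = [100, 100, 100, 100, 0]   # one position with no coverage

print(pct_above_fraction_of_mean(uniform))
print(pct_above_fraction_of_mean(dropout))
```

In the second input the mean depth is 80, so the threshold is 32 and four of the five positions pass, which gives 80%.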

Soft-clipped reads are used as part of the downstream analysis for small variant calling. A higher-than-expected number could indicate a low-quality enrichment step.

Average percentage of bases ≥ Q30. The Q-score is a prediction of the probability of an incorrect base call.

An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering.
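For reference, the standard Phred relationship between a Q-score and the error probability is Q = -10 * log10(p), so Q30 corresponds to a 1 in 1,000 chance of an incorrect base call. The sketch below applies that relationship and computes the percentage of bases at or above Q30 from a list of per-base scores; the function names and the input format are hypothetical.

```python
def error_probability(q_score):
    """Probability of an incorrect base call for a Phred-scaled Q-score."""
    return 10 ** (-q_score / 10)


def pct_q30(q_scores):
    """Percentage of bases with a Q-score of 30 or higher.

    Assumption: q_scores holds one Phred-scaled quality value per base.
    """
    if not q_scores:
        raise ValueError("no quality scores supplied")
    return 100.0 * sum(1 for q in q_scores if q >= 30) / len(q_scores)


print(error_probability(30))
print(pct_q30([20, 30, 35, 40]))
```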

Proprietary Myriad Genetics estimate of b-allele dosage based on the b-allele noise/signal ratio. B-allele noise is correlated with coverage; lower-coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with a lower tumor fraction and a higher amount of noise (or lower coverage) will have a higher Allele Dosage Ratio. The upper limit of the score is 50; therefore, any sample with an Allele Dosage Ratio of 50 can be assumed to have a tumor fraction close to zero and typically has a GIS = 0.

Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap.