The MetricsOutput.tsv file contains the following quality control metrics for all samples:
DNA library QC metrics for:
Small variant calling
TMB
MSI
CNV
[HRD] GIS
RNA library QC metrics
Run QC metrics, analysis status, and contamination
This TSV file also includes expanded DNA library QC metrics per sample, based on total reads, collapsed reads, chimeric reads, and on-target reads. Analysis using RNA samples also produces RNA library QC metrics and expanded RNA library QC metrics per sample based on total reads and coverage.
The MetricsOutput.tsv file is a final combined metrics report with sample status, key analysis metrics, and metadata. Sample metrics within the report include suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run.
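As an illustration of how a suggested LSL and USL might be applied to a single metric value, here is a minimal sketch. The function name `within_spec` is hypothetical, and treating both limits as inclusive is an assumption that this document does not confirm.

```python
from typing import Optional


def within_spec(value: float,
                lsl: Optional[float] = None,
                usl: Optional[float] = None) -> bool:
    """Return True when value lies inside the suggested limits.

    Assumption: limits are inclusive, and a missing limit (None)
    is simply not enforced.
    """
    if lsl is not None and value < lsl:
        return False
    if usl is not None and value > usl:
        return False
    return True


# Hypothetical values for illustration only.
print(within_spec(5.0, lsl=1.0, usl=10.0))
print(within_spec(0.5, lsl=1.0))
```

Some metrics carry only one of the two limits, which is why both arguments are optional in this sketch.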
For troubleshooting information, refer to Troubleshooting.
DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.
TOTAL_PF_READS (count)
Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
Primarily driven by the data output of the sequencer, the quality of the library, and the balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are not, this may suggest non-specific enrichment.
Low values for all samples indicate a poor quality run with possible low cluster numbers or low numbers of Q30 and PF%.
A low value for an individual sample indicates poor pooling of this library into the final pool.
MEAN_FAMILY_SIZE (count)
A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families.
The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.
As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good quality DNA, we usually observe values <10.
UMI family size = 1 is not ideal as it is harder to correct for errors.
UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
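The family-size definition above can be sketched as follows. This is a simplified illustration that groups reads by UMI string alone. How the analysis software actually assembles families is not described here, so the grouping rule and the function name `mean_family_size` are assumptions.

```python
from collections import Counter


def mean_family_size(read_umis):
    """Mean number of reads per UMI family.

    Assumption: a family is every read sharing the same UMI string.
    """
    families = Counter(read_umis)  # UMI -> number of reads in that family
    if not families:
        raise ValueError("no reads supplied")
    return sum(families.values()) / len(families)


# Six reads in three families (sizes 3, 2, and 1).
reads = ["AAC", "AAC", "AAC", "GTT", "GTT", "CCA"]
print(mean_family_size(reads))  # 6 reads / 3 families = 2.0
```

Sequencing the same three molecules more deeply would add reads without adding families, which raises the mean. Adding more unique molecules at the same read count would lower it.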
MEDIAN_TARGET_COVERAGE (count)
Median depth across all the unique loci occurring in all regions of the manifest file.
Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.
PCT_CHIMERIC_READS (%)
Chimeric reads occur when one sequencing read aligns to two distinct portions of the genome with little or no overlap. The metric is the proportion of chimeric reads among the total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.
While this can be indicative of large-scale structural rearrangement of the genome, values that are elevated above the usual baseline may indicate enrichment probe contamination during library preparation. A suggested metric USL is 8% (samples with higher values might see decreased performance in small variant calling and TMB scores).
PCT_EXON_100X (%)
Percentage of exon bases with 100X fragment coverage. Calculated against all regions in manifest containing _exon in name.
Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.
PCT_READ_ENRICHMENT (%)
Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest.
Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.
PCT_USABLE_UMI_READS (%)
Percentage of reads that have valid UMI sequences associated with them.
As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.
MEAN_TARGET_COVERAGE (count)
Mean depth across all the unique loci defined in the manifest file.
Lower mean target coverage may be due to poor sample input/quality, library preparation issues, or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.
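To illustrate how a skewed coverage distribution separates the mean from the median, here is a minimal sketch using hypothetical per-locus depths. The function name `coverage_summary` and the mean-to-median ratio are illustrative choices and are not metrics reported in the file.

```python
from statistics import mean, median


def coverage_summary(depths):
    """Return (mean, median, mean/median ratio) for per-locus depths."""
    mean_cov = mean(depths)
    median_cov = median(depths)
    # A ratio far from 1 hints at a skewed coverage distribution.
    ratio = mean_cov / median_cov if median_cov else float("inf")
    return mean_cov, median_cov, ratio


# Hypothetical depths: four typical loci and one heavily over-covered locus.
depths = [100, 110, 120, 130, 1540]
mean_cov, median_cov, ratio = coverage_summary(depths)
print(mean_cov, median_cov, round(ratio, 2))  # 400 120 3.33
```

The single outlier pulls the mean to more than three times the median, while the median still reflects the typical locus.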
PCT_ALIGNED_READS (%)
Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.
PCT_CONTAMINATION_EST (%)
This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed as 2.0 times the average of the adjusted allele frequencies of all selected variants. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 minus the allele frequency if it is greater than or equal to 0.5.
If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning, as it will be driven by statistical noise (e.g., the few variants that naturally fall outside an expected interval around 0.5 due to random chance).
High contamination estimates may be due to any of the following:
Inter-sample contamination caused by mixing of samples during extraction or library preparation.
Intra-sample contamination, due to mixing of clonally different cell populations during extraction.
Large-scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.
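The contamination estimate described above (2.0 times the average adjusted allele frequency) can be written out as a short sketch. How variants are selected is not described here, so the input list is hypothetical. The function names are illustrative, and the result is returned as a fraction rather than a percentage.

```python
def adjusted_allele_frequency(af: float) -> float:
    """AF itself when below 0.5, otherwise 1 - AF."""
    return af if af < 0.5 else 1.0 - af


def contamination_estimate(allele_frequencies) -> float:
    """2.0 times the mean adjusted allele frequency of the selected variants.

    Assumption: the caller has already selected the variants;
    the selection criteria are not covered by this sketch.
    """
    if not allele_frequencies:
        raise ValueError("no variants supplied")
    adjusted = [adjusted_allele_frequency(af) for af in allele_frequencies]
    return 2.0 * sum(adjusted) / len(adjusted)


# Hypothetical allele frequencies near 0 and 1.
print(round(contamination_estimate([0.02, 0.98, 0.05, 0.97]), 4))  # 0.06
```

Folding frequencies around 0.5 treats a variant at 0.98 the same as one at 0.02, so both contribute the same distance from a homozygous call.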
PCT_TARGET_0.4X_MEAN (%)
Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).
Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.
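As a rough illustration of this uniformity calculation, the sketch below takes a hypothetical list of per-position depths and reports the percentage that exceed 0.4 times their mean. Working on per-position depths and the function name are assumptions. The software's exact counting unit is not specified here.

```python
def pct_above_fraction_of_mean(depths, fraction=0.4):
    """Percentage of positions whose depth exceeds fraction * mean depth."""
    if not depths:
        raise ValueError("no depths supplied")
    threshold = fraction * (sum(depths) / len(depths))
    above = sum(1 for d in depths if d > threshold)
    return 100.0 * above / len(depths)


# Hypothetical depths: mean is 80, so the threshold is 32.
print(pct_above_fraction_of_mean([100, 100, 100, 100, 0]))  # 80.0
```

A perfectly uniform target set would score 100%, and each region that drops well below the mean lowers the value.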
PCT_TARGET_50X (%)
Percentage of target bases with 50X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_100X (%)
Percentage of target bases with 100X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
PCT_TARGET_250X (%)
Percentage of target bases with 250X fragment coverage. Calculated against all regions in manifest file.
Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.
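The three PCT_TARGET thresholds can be read together as a coarse coverage profile. The sketch below computes them from hypothetical per-base depths. It assumes "with 50X coverage" means a depth of at least 50, which this document does not state explicitly, and the function name is illustrative.

```python
def pct_target_at_depth(depths, thresholds=(50, 100, 250)):
    """Percentage of target bases at or above each depth threshold.

    Assumption: a base counts toward a threshold when depth >= threshold.
    """
    if not depths:
        raise ValueError("no depths supplied")
    total = len(depths)
    return {t: 100.0 * sum(1 for d in depths if d >= t) / total
            for t in thresholds}


# Hypothetical per-base depths.
print(pct_target_at_depth([30, 60, 120, 300]))
# {50: 75.0, 100: 50.0, 250: 25.0}
```

A steep drop between adjacent thresholds shows where most of the target coverage sits.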
PCT_SOFT_CLIPPED_BASES (%)
Percentage of bases that were not used for alignment but were retained as part of the alignment file.
Soft clipped reads are used as a part of the downstream analysis for small variants calling. A higher-than-expected number could indicate a low-quality enrichment step.
PCT_Q30_BASES (%)
Average percentage of bases ≥ Q30. The Q-score is a prediction of the probability of an incorrect base call.
An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering.
ALLELE_DOSAGE_RATIO (with HRD add-on)
Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and a higher amount of noise (or lower coverage) will have a higher Allele Dosage Ratio. The upper limit of the score is 50; therefore, any sample with an Allele Dosage Ratio of 50 can be assumed to have a tumor fraction close to zero and typically has a GIS = 0.
MEDIAN_TARGET_HRD_COVERAGE (with HRD add-on)
Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap.
The Illumina DRAGEN TruSight Oncology 500 Analysis Software allows for analysis of sequencing data generated from the TruSight Oncology 500 HRD assay. When HRD samples are analyzed, new results and metrics are included in the CombinedVariantOutput and MetricsOutput files, respectively. The following tables detail how these scores and QC metrics are derived.
GIS Score*
Proprietary Genomic Instability Score (GIS) indicating the level of genomic instability in the sample genome. It is a combination of Loss of Heterozygosity (LOH), Telomeric Allelic Imbalance, and Large-scale State Transitions (LST) scores. The GIS scores provided by TruSight Oncology 500 HRD show good correlation (R² = 0.98) with Myriad Genetics GIS; however, they are not identical (refer to TruSight Oncology 500 HRD Product Data Sheet Doc# M-GL-00748 for more details). GIS from alternative HRD assays should not be considered equivalent to Illumina/Myriad GIS.
*The GIS algorithm within the TSO500 pipeline (which does not have a cell line mode due to the TSO500 pipeline being non-configurable) is only intended for FFPE samples. Cell line samples will not accurately report GIS results as the tumor fraction (>90%) is too high to reliably distinguish tumor vs germline variants.
HRD Metrics Added to Metrics Output File
PCT_TARGET_HRD_50X
Percentage of the HRD probe SNP panel covered at 50X or greater.
DNA Library QC Metrics for GIS
EXCESSIVE_TF
EXCESSIVE_TF indicates whether there is excessive tumor content in the sample. Troubleshooting: Samples with a tumor fraction >90% are outside the design for GIS estimation (this includes pure tumor cell lines).
DNA Library QC Metrics for GIS
ALLELE_DOSAGE_RATIO
Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and a higher amount of noise (or lower coverage) will have a higher Allele Dosage Ratio. The upper limit of the score is 50; therefore, any sample with an Allele Dosage Ratio of 50 can be assumed to have a tumor fraction close to zero and typically has a GIS = 0.
DNA Expanded Metrics
MEDIAN_TARGET_HRD_COVERAGE
Median target fragment coverage across all target positions in the genome. Coverage is the total number of non-duplicate pair alignments that overlap.
DNA Expanded Metrics
RNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.
PCT_CHIMERIC_READS (%)
Percentage of reads that are aligned as two segments which map to nonconsecutive regions in the genome.
PCT_ON_TARGET_READS (%)
Percentage of reads that cross any part of the target region versus total reads. A read that partially maps to a target region is counted as on target.
SCALED_MEDIAN_GENE_COVERAGE (count)
Median of median base coverage of genes scaled by length. An indication of median coverage depth of genes in the panel.
TOTAL_PF_READS (count)
Total number of reads passing filter.
GENE_MEDIAN_COVERAGE (count)
The median coverage depth of all genes in the panel.
GENE_ABOVE_MEDIAN_CUTOFF (count)
Number of genes above the median coverage cutoff.
PER_GENE_MEDIAN_COVERAGE (count)
Median deduped coverage across each gene (available in Logs_Intermediates only).
PCT_SOFT_CLIPPED_BASES (%)
Percentage of bases that were not used for alignment but were retained as part of the alignment file.
RNA_PCT_Q30_BASES (%)
Average percentage of bases ≥ Q30. The Q-score is a prediction of the probability of an incorrect base call.
An indicator of sequencing run quality. Low Q30 across all samples on a run could be the result of run overclustering.
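The PCT_ON_TARGET_READS rule above, where a read that only partially overlaps a target still counts as on target, can be sketched as an interval-overlap test. This is a simplified illustration on a single chromosome with half-open coordinates. The function names and coordinate convention are assumptions and do not describe the software's implementation.

```python
def overlaps(read_start, read_end, target_start, target_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return read_start < target_end and target_start < read_end


def pct_on_target_reads(reads, targets):
    """Percentage of reads overlapping any target, even partially."""
    if not reads:
        raise ValueError("no reads supplied")
    on_target = sum(
        1 for r_start, r_end in reads
        if any(overlaps(r_start, r_end, t_start, t_end)
               for t_start, t_end in targets)
    )
    return 100.0 * on_target / len(reads)


# Hypothetical coordinates: one target and four reads.
targets = [(100, 200)]
reads = [(50, 120), (150, 180), (190, 260), (300, 400)]
print(pct_on_target_reads(reads, targets))  # 75.0
```

The first and third reads only partly overlap the target but are still counted, so three of the four reads are on target.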