RNA Expanded Metrics

RNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

| Metric | Description | Units |
| --- | --- | --- |
| PCT_CHIMERIC_READS | Percentage of reads that are aligned as two segments which map to nonconsecutive regions in the genome. | % |
| PCT_ON_TARGET_READS | Percentage of reads that cross any part of the target region versus total reads. A read that partially maps to a target region is counted as on target. | % |
| SCALED_MEDIAN_GENE_COVERAGE | Median of median base coverage of genes scaled by length. An indication of median coverage depth of genes in the panel. | Count |
| TOTAL_PF_READS | Total number of reads passing filter. | Count |
| GENE_MEDIAN_COVERAGE | The median coverage depth of all genes in the panel. | Count |
| GENE_ABOVE_MEDIAN_CUTOFF | Number of genes above the median coverage cutoff. | Count |
| PER_GENE_MEDIAN_COVERAGE | Median deduped coverage across each gene (available in Logs_Intermediates only). | Count |
| PCT_SOFT_CLIPPED_BASES | Percentage of bases that were not used for alignment but retained as part of the alignment file. | % |
| RNA_PCT_Q30_BASES | Average percentage of bases ≥ Q30. A prediction of the probability of an incorrect base call (Q‑score). Troubleshooting: An indicator of sequencing run quality; low Q30 across all samples on a run could be the result of run overclustering. | % |
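The PCT_ON_TARGET_READS definition counts a read as on target even when it only partially overlaps a target region. The following is a minimal sketch of that overlap rule, not the DRAGEN implementation. The half-open coordinate convention, the single-chromosome simplification, and the function names `overlaps` and `pct_on_target` are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def overlaps(read: Interval, target: Interval) -> bool:
    """Return True if the read shares at least one base with the target."""
    return read[0] < target[1] and target[0] < read[1]


def pct_on_target(reads: List[Interval], targets: List[Interval]) -> float:
    """Percentage of reads that overlap any target, even partially."""
    if not reads:
        return 0.0
    on_target = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    return 100.0 * on_target / len(reads)


# Hypothetical example data: two target regions and four reads.
targets = [(100, 200), (500, 600)]
reads = [(90, 110), (150, 180), (200, 250), (590, 700)]
print(pct_on_target(reads, targets))  # 75.0
```

The read at (200, 250) starts exactly where the first target ends, so under the half-open convention it shares no base with the target and is counted as off target.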

Quality Control

The software calculates several quality control metrics for runs and samples.

These metrics and guidelines apply to DRAGEN TSO 500 v2.1 and above.

Run QC

The Run Metrics section of the metrics output report provides sequencing run quality metrics along with suggested values to determine if they are within an acceptable range. The overall percentage of reads passing filter is compared to a minimum threshold. For Read 1 and Read 2, the average percentage of bases ≥ Q30, which gives a prediction of the probability of an incorrect base call (Q‑score), is also compared to a minimum threshold. The following tables show run metric and quality threshold information for different systems.

The values in the Run Metrics section are listed as NA in the following situations:

  • If the analysis was started from FASTQ files.

  • If the analysis was started from BCL files and the InterOp files are missing or corrupt.
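To make the Q30 metrics more concrete, the sketch below uses the standard Phred relationship, in which a quality score Q corresponds to an estimated error probability of 10^(-Q/10), and computes the percentage of bases at or above Q30 from a list of per-base scores. The function names and the example scores are illustrative assumptions and are not taken from the DRAGEN software.

```python
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def pct_q30(quals: Sequence[int]) -> float:
    """Percentage of bases whose quality score is at least 30."""
    if not quals:
        return 0.0
    return 100.0 * sum(1 for q in quals if q >= 30) / len(quals)


# Hypothetical per-base quality scores for a short read.
example_quals = [35, 30, 12, 38]
print(pct_q30(example_quals))  # 75.0
print(phred_to_error_prob(30))  # about 1 error in 1000 base calls
```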

NextSeq 500/550 or NextSeq 550Dx (RUO)

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 80.0 | All |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 80.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 80.0 | All |

NovaSeq 6000 or NovaSeq 6000Dx (RUO)

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 55.0 | All |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 80.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 80.0 | All |

NextSeq 1000/2000

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 85.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 85.0 | All |

NovaSeq X

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 85.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 85.0 | All |

DNA Sample QC

DRAGEN TruSight Oncology 500 uses QC metrics to assess the validity of analysis for DNA libraries that pass contamination quality control. If the library fails one or more quality metrics, then the corresponding variant type or biomarker is not reported, and the associated QC category in the report header displays FAIL. Additionally, a companion diagnostic result may not be available if it relies on QC passing for one or more of the following QC categories.

DNA library QC results are available in the MetricsOutput.tsv file.

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| CONTAMINATION_SCORE | The contamination score is based on VAF distribution of SNPs. | ≤ 1457 | All |
| MEDIAN_EXON_COVERAGE | Median exon fragment coverage across all exon bases. | ≥ 150 | Small variant TMB |
| PCT_CHIMERIC_READS | Proportion of total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence. | ≤ 8 | Small variant TMB |
| PCT_EXON_50X | Percent exon bases with 50x fragment coverage. | ≥ 90.0 | Small variant TMB |
| MEDIAN_INSERT_SIZE | The median fragment length in the sample. | ≥ 70 | Small variant TMB |
| USABLE_MSI_SITES | The number of MSI sites usable for MSI calling. | ≥ 40 | MSI |
| MEDIAN_BIN_COUNT_CNV_TARGET | The median raw bin count per CNV target. | ≥ 1.0 | CNV |
| PCT_TARGET_HRD_50X (HRD samples) | Percent of HRD probe SNP panel covered by at least 50X coverage. | ≥ 50 | GIS |
| EXCESSIVE_TF (HRD samples) | EXCESSIVE TF indicates if there is excessive tumor content in sample. Troubleshooting: Samples with pure tumor fraction >90% are outside the design for GIS estimation (this includes pure tumor cell lines). | = 0 (= 1 indicates Excessive TF) | GIS |

RNA Sample QC

The input for RNA Library QC is RNA alignment. Metrics and guideline thresholds can be found in the MetricsOutput.tsv file.

| Metric | Description | Recommended Guideline Quality Threshold | Variant Classes |
| --- | --- | --- | --- |
| MEDIAN_CV_GENE_500X | The median CV for all genes with median coverage > 500x. Genes with median coverage > 500x are likely to be highly expressed. Higher CV median > 500x indicates an issue with library preparation (poor sample input and/or probes pulldown issue). | ≤ 0.93 | Fusion, Splice |
| MEDIAN_INSERT_SIZE | The median fragment length in the sample. | ≥ 80 | Fusion, Splice |
| TOTAL_ON_TARGET_READS* | The total number of reads that map to the target regions. | ≥ 9,000,000 (v1); ≥ 2,500,000 (v2) | Fusion, Splice |
| GENE_MEDIAN_COVERAGE** | The median deduped coverage across all genes in the RNA panel (55 genes). | N/A | Fusion, Splice |

*TOTAL_ON_TARGET_READS is the only QC metric with guidelines specific to chemistry (v1 vs. v2 assay); all other guidelines are applicable to both.

** To avoid failing RNA samples unnecessarily, Illumina does not recommend a universal threshold for GENE_MEDIAN_COVERAGE to determine RNA sample quality. RNA expression varies significantly across tissue types and a small panel size (55 genes), which makes normalization challenging. Tissue-specific thresholds could be considered for normalization.
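As an illustration of how guideline thresholds like those above can be applied, the sketch below compares a set of DNA sample metric values against a subset of the DNA Sample QC limits. It does not parse MetricsOutput.tsv, whose layout is not described here. The dictionary `DNA_GUIDELINES`, the function `evaluate_dna_qc`, and the sample values are hypothetical.

```python
import operator
from typing import Callable, Dict, Optional, Tuple

# (comparison, limit) pairs copied from the DNA Sample QC guideline thresholds.
DNA_GUIDELINES: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "CONTAMINATION_SCORE": (operator.le, 1457),
    "MEDIAN_EXON_COVERAGE": (operator.ge, 150),
    "PCT_EXON_50X": (operator.ge, 90.0),
    "MEDIAN_INSERT_SIZE": (operator.ge, 70),
    "USABLE_MSI_SITES": (operator.ge, 40),
    "MEDIAN_BIN_COUNT_CNV_TARGET": (operator.ge, 1.0),
}


def evaluate_dna_qc(metrics: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Label each guideline metric PASS, FAIL, or NA (value not available)."""
    results = {}
    for name, (compare, limit) in DNA_GUIDELINES.items():
        value = metrics.get(name)
        if value is None:
            results[name] = "NA"
        else:
            results[name] = "PASS" if compare(value, limit) else "FAIL"
    return results


# Hypothetical metric values for one DNA library.
sample = {"CONTAMINATION_SCORE": 1200, "MEDIAN_EXON_COVERAGE": 140, "PCT_EXON_50X": 95.2}
results = evaluate_dna_qc(sample)
for metric, status in results.items():
    print(f"{metric}: {status}")
```

Storing the comparison operator with each limit keeps upper limits (such as CONTAMINATION_SCORE) and lower limits (such as MEDIAN_EXON_COVERAGE) in one table.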

DNA Expanded Metrics

DNA expanded metrics are provided for information only. They can be informative for troubleshooting but are provided without explicit specification limits and are not directly used for sample quality control. For additional guidance, contact Illumina Technical Support.

Metric
Description
Troubleshooting

TOTAL_PF_READS (count)

Total number of non-supplementary, non-secondary, and passing QC reads after alignment to the whole genome sequence.

Primarily driven by data output of the sequencer, quality of the library, and balancing of the library in the library pool. If TOTAL_PF_READS is in line with other samples but coverage metrics are lower, this may suggest non-specific enrichment.

Low values for all samples indicate a poor quality run with possible low cluster numbers or low numbers of Q30 and PF%.

A low value for an individual sample indicates poor pooling of this library into the final pool.

MEAN_FAMILY_SIZE (count)

A UMI family is a group of reads that all have the same UMI barcode. The family size is the number of reads in the family. MEAN_FAMILY_SIZE is the mean of the entire population of reads assembled into UMI families. In V1 chemistry only the TSO500 manifest is considered, while in V2 the TSO500 and HRD manifests are both considered.

The mean UMI family size decreases with increased unique read numbers, and more input DNA leads to more unique reads. Conversely, oversequencing a fixed population of unique DNA molecules leads to increased family size.

As a guide, for a good run with optimal cluster density, passing specs, even sample pooling, and good quality DNA we usually observe values <10.

UMI family size = 1 is not ideal as it is harder to correct for errors.

UMI family size of 2 to 5 enables efficient error correction without wasting sequencing capacity on high percentages of duplicate reads.
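The family-size definition above can be sketched as follows. This is a simplified illustration that groups reads by UMI barcode only, as the description states. Any additional grouping criteria used by the actual pipeline are not covered, and the function name and example barcodes are assumptions.

```python
from collections import Counter
from typing import Iterable


def mean_family_size(umis: Iterable[str]) -> float:
    """Mean number of reads per UMI family, grouping reads by barcode only."""
    families = Counter(umis)  # barcode -> number of reads in that family
    if not families:
        return 0.0
    return sum(families.values()) / len(families)


# Hypothetical UMI barcodes, one per read: three families of sizes 3, 2, and 1.
read_umis = ["AAC", "AAC", "AAC", "GGT", "GGT", "TCA"]
print(mean_family_size(read_umis))  # 2.0
```

Sequencing the same molecules more deeply adds reads to existing families and raises the mean, while adding more unique input molecules creates new families and lowers it.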

Contamination

The contamination score evaluates presence of sample-to-sample contamination. The algorithm uses common germline SNPs in the homozygous state expected to have variant allele frequencies (VAF) at 0% and 100%. In contaminated samples, the VAFs shift away from the expected values allowing the detection of sample-to-sample contamination.

The contamination score can detect sample-to-sample contamination greater than or equal to 2% (more than 2% of DNA input is coming from the non-source sample)

Contamination Score Calculation

The contamination score is calculated using the SNP error file and Pileup file that are generated during the small variant calling, as well as the TMB trace file. The algorithm includes the following steps:

  • All positions that overlap with a pre-defined set of common SNPs and have variant allele frequencies of < 25% or > 75% are collected (only SNPs are considered; indels are excluded)

  • Variants in CNV events are removed using a clustering method

  • The likelihood that the positions are an error or a real mutation is calculated by:

CONTAMINATION_SCORE = sum(log10(P(vi is False Positive)))

Contamination Score Interpretation

  • The contamination score is output in the metrics output file, MetricsOutput.tsv

  • If a contamination score is equal to or below 1457 (the upper specification limit provided in the "USL Guideline" field in the metrics output file, see the Metrics Output page), the sample has less than 2% sample-to-sample contamination.

  • If a contamination score is above 1457, the sample has more than 2% sample-to-sample contamination. In this case, an estimation of the contamination can be obtained from the PCT_CONTAMINATION_EST metric; see the DNA Expanded Metrics page for more details. As noted, PCT_CONTAMINATION_EST is not valid unless the contamination score exceeds 1457.

Samples with highly rearranged genomes (HRD samples) can have variants with VAFs that shift away from the expected frequencies due to genomic rearrangement, which can lead to false-positive contamination scores

  • Visual examination can help determine if a shift of VAFs is due to true contamination
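The interpretation rules above can be summarized in a small helper. This is a sketch under stated assumptions: the function name, the returned wording, and the optional `pct_est` argument are hypothetical, and the helper does not account for the HRD caveat, which still calls for visual review.

```python
from typing import Optional

CONTAMINATION_USL = 1457  # upper specification limit quoted in the text above


def interpret_contamination(score: float, pct_est: Optional[float] = None) -> str:
    """Summarize a contamination score using the guideline limit."""
    if score <= CONTAMINATION_USL:
        # PCT_CONTAMINATION_EST is not valid at or below the limit, so it is ignored.
        return "PASS: below the 2% sample-to-sample contamination guideline"
    if pct_est is None:
        return "FAIL: above the guideline, no estimate supplied"
    return f"FAIL: above the guideline, estimated contamination {pct_est:.1f}%"


print(interpret_contamination(1457, pct_est=3.2))
print(interpret_contamination(2000, pct_est=3.2))
```

The first call reports a pass and deliberately ignores the supplied estimate, because the estimate is only meaningful when the score exceeds the limit.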

How to build a VAF plot for visual examination

  1. To build a VAF plot, use the {Sample_ID}.tmb.trace.csv file. Filter to only germline variants (for example, by using tags "Germline_DB" and "Germline_Proxi" in the column "Status") and use values in the VAF column.

  2. Select Scatter from the Charts menu

  3. Review the plot as described above, checking whether variants are scattered or clustered around 50% and 100% VAF
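For readers who prefer scripting to a spreadsheet, the filtering in step 1 can be sketched with the Python standard library. The "Status" and "VAF" column names and the two germline tags come from the steps above. The comma delimiter, the exact-match comparison on Status, the other columns, and the inline example rows are assumptions, so check them against a real {Sample_ID}.tmb.trace.csv file before relying on this.

```python
import csv
import io
from typing import List, TextIO

GERMLINE_TAGS = {"Germline_DB", "Germline_Proxi"}


def germline_vafs(handle: TextIO) -> List[float]:
    """Return the VAF values of rows whose Status is one of the germline tags."""
    reader = csv.DictReader(handle)
    return [float(row["VAF"]) for row in reader if row["Status"] in GERMLINE_TAGS]


# Hypothetical file content; a real trace file contains more columns and rows.
example = io.StringIO(
    "Chromosome,Position,VAF,Status\n"
    "chr1,1000,0.49,Germline_DB\n"
    "chr1,2000,0.07,Somatic\n"
    "chr2,3000,0.98,Germline_Proxi\n"
)
vafs = germline_vafs(example)
print(vafs)  # [0.49, 0.98]
```

The resulting list can be passed to any plotting tool to draw the scatter chart described in step 2.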

MEDIAN_TARGET_COVERAGE (count)

Median depth across all the unique loci occurring in all regions of the manifest file.

Lower median target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output.

PCT_EXON_100X (%)

Percentage of exon bases with 100X fragment coverage. Calculated against all regions in manifest containing _exon in name.

Can be used in combination with other PCT_EXON metrics to understand under or over coverage of exons.

PCT_READ_ENRICHMENT (%)

Percentage of reads that have overlapping sequence with the target regions defined in the sample manifest. In V1 chemistry only the TSO500 manifest is considered while in V2 the TSO500 and HRD manifests are both considered.

Indicative of general enrichment performance. Reduced proportions of enriched reads may indicate issues with the enrichment portion of the library preparation.

PCT_USABLE_UMI_READS (%)

Percentage of reads that have valid UMI sequences associated with them.

As UMI reads are sequenced at the start of each read, loss of valid UMI sequence may be caused by sequencing issues impacting the quality of base calling in this portion of the sequencing read.

MEAN_TARGET_COVERAGE (count)

Mean depth across all the unique loci defined in the manifest file.

Lower mean target coverage may be due to poor sample input/quality, library preparation issues or low sequencing output. Large differences between the median and mean target coverage values may indicate a skewed distribution of target coverage.

PCT_ALIGNED_READS (%)

Proportion of aligned reads that are non-supplementary, non-secondary and pass QC versus aligned reads that are non-supplementary, non-secondary, mapped and pass QC.

PCT_CONTAMINATION_EST (%)

This metric should only be evaluated if the CONTAMINATION_SCORE metric exceeds the USL. This metric estimates the amount of contamination in a sample. The contamination level is computed by taking 2.0 times the average of the adjusted allele frequencies of all variants that were selected. The adjusted allele frequency is either the actual allele frequency of the variant if it is less than 0.5, or 1 minus the allele frequency if it is greater than or equal to 0.5.

If the sample does not fail the CONTAMINATION_SCORE, this metric has no intended meaning, as it will be driven by statistical noise (e.g. the few variants that naturally fall outside an expected interval around 0.5 due to random chance).

High contamination estimates may be due to any of the following:

  • Inter-sample contamination caused by mixing of samples during extraction or library preparation.

  • Intra-sample contamination, due to mixing of clonally different cell populations during extraction.

  • Large scale genomic rearrangements that cause unexpected VAFs for large numbers of variants.

PCT_TARGET_0.4X_MEAN (%)

Percentage of target (all locations in manifest) reads that have a coverage depth greater than 0.4x the mean target coverage depth (see definition above).

Provides an indication of uniformity of coverage of the target regions in the manifest file. When trended over time reductions in this metric may indicate an issue with the enrichment process resulting in coverage bias.

PCT_TARGET_50X (%)

Percentage of target bases with 50X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_100X (%)

Percentage of target bases with 100X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_TARGET_250X (%)

Percentage of target bases with 250X fragment coverage. Calculated against all regions in manifest file.

Can be used in combination with other PCT_TARGET metrics to understand under or over coverage of targets.

PCT_SOFT_CLIPPED_BASES (%)

Percentage of bases that were not used for alignment but retained as part of the alignment file.

Soft clipped reads are used as a part of the downstream analysis for small variant calling. A higher-than-expected number could indicate a low-quality enrichment step.

PCT_Q30_BASES (%)

Average percentage of bases ≥ Q30. A prediction of the probability of an incorrect base call (Q‑score).

An indicator of sequencing run quality; low Q30 across all samples on a run could be the result of run overclustering.

ALLELE DOSAGE_RATIO (HRD samples)

Proprietary Myriad Genetics estimate of b-allele dosage based on b-allele noise/signal ratio. B-Allele noise is correlated with coverage; lower coverage samples will have higher noise. B-allele signal is also correlated with tumor fraction; a higher tumor fraction produces a higher signal for b-allele sites. Samples with lower tumor fraction and higher amount of noise (or lower coverage) will have higher Allele Dosage Ratio.

The upper limit of the score is 50, therefore any sample with 50 Allele Dosage Ratio can be assumed to have tumor fraction close to zero and typically has a GIS = 0.

MEDIAN TARGET HRD (HRD samples)

Median target fragment coverage across all target positions in the HRD manifest. Coverage is the total number of non-duplicate pair alignments that overlap.
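The uniformity idea behind PCT_TARGET_0.4X_MEAN in the table above can be illustrated with a short sketch. It assumes per-position depth values are already available as a list and is not the DRAGEN calculation. The function name and the example depths are hypothetical.

```python
from typing import Sequence


def pct_target_above_fraction_of_mean(depths: Sequence[float], fraction: float = 0.4) -> float:
    """Percentage of target positions with depth greater than fraction x mean depth."""
    if not depths:
        return 0.0
    cutoff = fraction * (sum(depths) / len(depths))
    covered = sum(1 for d in depths if d > cutoff)
    return 100.0 * covered / len(depths)


# Hypothetical depths at five target positions; the mean is 100, so the cutoff is 40.
example_depths = [100, 120, 80, 20, 180]
print(pct_target_above_fraction_of_mean(example_depths))  # 80.0
```

Only the position with depth 20 falls below the cutoff, so four of the five positions count toward the metric.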

Estimating the error rate per sample

  • Counting mutation support

  • Counting total depth

  • The contamination score is calculated as the sum of all the log likelihood scores across the pre-defined SNP positions whose minor allele frequency is <25% in the sample and not likely due to CNV events:

  • Metrics Output page
    DNA Expanded Metrics page
    Visual investigation of VAFs across the genome can help determine if a shift of VAFs is due to true contamination
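The PCT_CONTAMINATION_EST description in the DNA expanded metrics gives an explicit formula: 2.0 times the average of the adjusted allele frequencies. The sketch below follows that wording directly. How variants are selected is not reproduced here, and returning the result as a fraction rather than a percentage is an assumption. The function names and the example frequencies are hypothetical.

```python
from typing import Sequence


def adjusted_af(af: float) -> float:
    """Allele frequency folded toward 0: af if below 0.5, otherwise 1 - af."""
    return af if af < 0.5 else 1.0 - af


def contamination_fraction(afs: Sequence[float]) -> float:
    """2.0 times the mean adjusted allele frequency of the selected variants."""
    if not afs:
        return 0.0
    return 2.0 * sum(adjusted_af(af) for af in afs) / len(afs)


# Hypothetical allele frequencies of selected homozygous SNP positions.
selected_afs = [0.02, 0.97, 0.04, 0.99]
print(round(contamination_fraction(selected_afs), 4))
```

As the metric description states, the value is only meaningful when CONTAMINATION_SCORE exceeds the USL; otherwise it mostly reflects statistical noise.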