Quality Control

Contamination

The contamination score evaluates the presence of sample-to-sample contamination. The algorithm uses common germline SNPs in the homozygous state, which are expected to have variant allele frequencies (VAFs) of 0% or 100%. In contaminated samples, the VAFs shift away from these expected values, which allows sample-to-sample contamination to be detected.

The contamination score can detect sample-to-sample contamination greater than or equal to 0.4% (that is, when 0.4% or more of the DNA input comes from the contaminant).
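The shift in VAF can be illustrated with simple mixing arithmetic. The sketch below is not the product's algorithm: it assumes that the sample and the contaminant contribute reads in proportion to their share of the DNA input and ignores copy number and capture effects. The function name `contaminated_vaf` is hypothetical.

```python
def contaminated_vaf(sample_vaf, contaminant_vaf, contamination_fraction):
    """Expected VAF at one site when a fraction of the DNA input is contaminant.

    Assumption: a simple linear mixture of the two genotypes.
    """
    c = contamination_fraction
    return (1.0 - c) * sample_vaf + c * contaminant_vaf


# Sample is homozygous reference (VAF 0.0), contaminant is homozygous alternate (VAF 1.0).
print(contaminated_vaf(0.0, 1.0, 0.02))

# Sample is homozygous alternate (VAF 1.0), contaminant is heterozygous (VAF 0.5).
print(contaminated_vaf(1.0, 0.5, 0.02))
```

Under this assumption, 2% contamination moves a homozygous reference site from 0% to about 2% VAF, and a homozygous alternate site from 100% to about 99% VAF when the contaminant is heterozygous.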

Contamination Score Calculation

The contamination score is calculated using the SNP error file and Pileup file that are generated during small variant calling, as well as the TMB trace file. The algorithm includes the following steps:

  • All positions that overlap with a pre-defined set of common SNPs and have variant allele frequencies of < 25% or > 75% are collected (only SNPs are considered; indels are excluded)

  • Variants in CNV events are removed using a clustering method

  • The likelihood that the positions are an error or a real mutation is calculated by:

CONTAMINATION_SCORE = sum(log10(P(v_i is False Positive)))
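The position selection and the summation in the formula can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names are hypothetical, the per-position false-positive probabilities are taken as given inputs (the source does not specify the error model), and CNV filtering is omitted. The source also does not describe any sign or scaling convention, so the value computed here is not expected to match the reported metric.

```python
import math


def select_homozygous_like(vafs, low=0.25, high=0.75):
    """Keep VAFs (as fractions) that are < 25% or > 75%, as in the first step."""
    return [v for v in vafs if v < low or v > high]


def sum_log10(false_positive_probs):
    """Sum log10(P(v_i is False Positive)) over the selected positions."""
    return sum(math.log10(p) for p in false_positive_probs)


# Hypothetical VAFs at common SNP positions.
print(select_homozygous_like([0.01, 0.48, 0.99, 0.30]))

# Hypothetical false-positive probabilities for the retained positions.
print(sum_log10([0.1, 0.01]))
```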

Contamination Score Interpretation

  • The contamination score is output in the metrics output file, MetricsOutput.tsv

  • If the contamination score is equal to or below 1227 (the upper specification limit provided in the "USL Guideline" field in the metrics output file; see the Metrics Output page), the sample has less than 0.4% sample-to-sample contamination.

  • If the contamination score is above 1227, the sample has more than 0.4% sample-to-sample contamination. In this case, an estimate of the contamination can be obtained from the PCT_CONTAMINATION_EST metric; see the DNA Expanded Metrics page for more details. PCT_CONTAMINATION_EST is not valid unless the contamination score exceeds 1227.
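The interpretation rules above can be expressed as a small helper. This is a sketch only: the function name and the returned dictionary layout are hypothetical, and the values are assumed to have already been read from MetricsOutput.tsv.

```python
USL = 1227  # upper specification limit quoted in the text


def interpret_contamination(score, pct_contamination_est=None, usl=USL):
    """Apply the threshold rule; the estimate is only used when the score exceeds the USL."""
    if score <= usl:
        # PCT_CONTAMINATION_EST is not valid at or below the limit, so it is ignored.
        return {"contaminated": False, "estimate": None}
    return {"contaminated": True, "estimate": pct_contamination_est}


print(interpret_contamination(900, pct_contamination_est=0.1))
print(interpret_contamination(2500, pct_contamination_est=1.3))
```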

Samples with highly rearranged genomes (HRD samples) can have variants whose VAFs shift away from the expected frequencies because of genomic rearrangement, which can lead to false-positive contamination scores.

  • Visual examination can help determine if a shift of VAFs is due to true contamination

How to build a VAF plot for visual examination

  1. To build a VAF plot, use the {Sample_ID}.tmb.trace.csv file. Filter to only germline variants (for example, by using tags "Germline_DB" and "Germline_Proxi" in the column "Status") and use values in the VAF column.

  2. Select Scatter from the Charts menu

  3. Review the plot as described above, checking whether the variants are scattered or clustered around 50% and 100% VAF
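As an alternative to a spreadsheet, the filtering in step 1 can be sketched in Python. The sketch assumes that the trace file is comma-separated with a header row, that the germline tags appear as substrings of the "Status" column, and that the "VAF" column parses as a number. The function name `germline_vafs` is hypothetical.

```python
import csv
import io

GERMLINE_TAGS = ("Germline_DB", "Germline_Proxi")


def germline_vafs(csv_file):
    """Return the VAF values of rows whose Status contains a germline tag."""
    vafs = []
    for row in csv.DictReader(csv_file):
        status = row.get("Status") or ""
        if any(tag in status for tag in GERMLINE_TAGS):
            vafs.append(float(row["VAF"]))
    return vafs


# Small in-memory stand-in for {Sample_ID}.tmb.trace.csv (made-up rows).
example = io.StringIO(
    "Status,VAF\nGermline_DB,0.49\nSomatic,0.12\nGermline_Proxi,0.98\n"
)
print(germline_vafs(example))
```

For a real file, the same function can be called on a handle opened with `open(path, newline="")`, and the returned values can then be drawn as a scatter plot in any plotting tool.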

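The likelihood calculation used for the contamination score (see Contamination Score Calculation) involves the following:

  • Estimating the error rate per sample

  • Counting mutation support

  • Counting total depth

The contamination score is then calculated as the sum of the log likelihood scores across the pre-defined SNP positions whose minor allele frequency is < 25% in the sample and that are not likely due to CNV events, as given by the CONTAMINATION_SCORE formula.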


Run QC

The Run Metrics section of the metrics output report provides sequencing run quality metrics along with suggested values to determine whether they are within an acceptable range. The overall percentage of reads passing filter is compared to a minimum threshold. For Read 1 and Read 2, the average percentage of bases with a quality score of Q30 or higher is also compared to a minimum threshold; the quality score (Q-score) is a prediction of the probability of an incorrect base call. The following tables show run metric and quality threshold information for different systems.

The values in the Run Metrics section are listed as NA in the following situations:

  • If the analysis was started from FASTQ files.

  • If the analysis was started from BCL files and the InterOp files are missing or corrupt.
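For reference, the relationship between a Q-score and the predicted probability of an incorrect base call follows the standard Phred definition, Q = -10 * log10(P). The sketch below shows the conversion; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Predicted probability of an incorrect base call for Q-score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Q-score that corresponds to error probability p."""
    return -10 * math.log10(p)


print(phred_to_error_probability(30))
print(error_probability_to_phred(0.001))
```

A Q-score of 30 therefore corresponds to a predicted error probability of 1 in 1000.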

NovaSeq 6000 or NovaSeq 6000Dx (RUO)

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_PF_READS (%) | Total percentage of reads passing filter. | ≥ 55.0 | All |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 80.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 80.0 | All |

NovaSeq X

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| PCT_Q30_R1 (%) | Percentage of Read 1 reads with quality score ≥ 30. | ≥ 85.0 | All |
| PCT_Q30_R2 (%) | Percentage of Read 2 reads with quality score ≥ 30. | ≥ 85.0 | All |

There is no PCT_PF_READS value in NovaSeq X Plus runs, so the PCT_PF_READS value is always NA.
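The run metric thresholds listed above can be checked programmatically. The following is a minimal sketch: the dictionary layout, the function name, and the PASS/FAIL/NA labels are illustrative choices, the metric keys omit the "(%)" suffix, and the metric values are assumed to have already been read from the metrics output report.

```python
# Minimum thresholds per system, taken from the run metric tables.
RUN_THRESHOLDS = {
    "NovaSeq 6000": {"PCT_PF_READS": 55.0, "PCT_Q30_R1": 80.0, "PCT_Q30_R2": 80.0},
    "NovaSeq X": {"PCT_Q30_R1": 85.0, "PCT_Q30_R2": 85.0},
}


def check_run_metrics(system, metrics):
    """Compare each metric to its minimum; missing or NA values are reported as NA."""
    results = {}
    for name, minimum in RUN_THRESHOLDS[system].items():
        value = metrics.get(name, "NA")
        if value == "NA":
            results[name] = "NA"
        else:
            results[name] = "PASS" if float(value) >= minimum else "FAIL"
    return results


print(check_run_metrics("NovaSeq 6000", {"PCT_PF_READS": 72.4, "PCT_Q30_R1": 91.0, "PCT_Q30_R2": 78.5}))
print(check_run_metrics("NovaSeq X", {"PCT_Q30_R1": "NA", "PCT_Q30_R2": 88.0}))
```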

Sample QC

DRAGEN TruSight Oncology 500 uses QC metrics to assess the validity of analysis for DNA libraries that pass contamination quality control. If the library fails one or more quality metrics, then the corresponding variant type or biomarker is not reported, and the associated QC category in the report header displays FAIL.

DNA library QC results are available in the MetricsOutput.tsv file. Refer to Metrics Output for details.

| Metric | Description | Recommended Guideline Quality Threshold | Variant Class |
| --- | --- | --- | --- |
| CONTAMINATION_SCORE | The contamination score is based on VAF distribution of SNPs. | ≤ 1227 | All |
| MEDIAN_EXON_COVERAGE | Median exon fragment coverage across all exon bases. | ≥ 1300 | Small variant, TMB, Fusion, MSI |
| PCT_EXON_1000X | Percent exon bases with 1000x fragment coverage. | ≥ 80.0 | Small variant, TMB |
| GENE_SCALED_MAD | The median of absolute deviations normalized by gene fold change. | ≤ 0.059* | CNV |
| MEDIAN_BIN_COUNT_CNV_TARGET | The median raw bin count per CNV target. | ≥ 6.0 | CNV |
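The mapping from a failed sample metric to the affected variant classes can be sketched as follows. This is an illustration only: the table layout and function name are hypothetical, the metric values are assumed to have already been read from MetricsOutput.tsv, and the footnote attached to the GENE_SCALED_MAD threshold in the source is not modeled.

```python
import operator

# (metric, comparison that must hold, limit, variant classes affected on failure)
SAMPLE_QC = [
    ("CONTAMINATION_SCORE", operator.le, 1227, ["All"]),
    ("MEDIAN_EXON_COVERAGE", operator.ge, 1300, ["Small variant", "TMB", "Fusion", "MSI"]),
    ("PCT_EXON_1000X", operator.ge, 80.0, ["Small variant", "TMB"]),
    ("GENE_SCALED_MAD", operator.le, 0.059, ["CNV"]),
    ("MEDIAN_BIN_COUNT_CNV_TARGET", operator.ge, 6.0, ["CNV"]),
]


def affected_variant_classes(metrics):
    """Return the sorted variant classes whose QC metrics fail their guideline."""
    affected = set()
    for name, passes, limit, classes in SAMPLE_QC:
        if name in metrics and not passes(metrics[name], limit):
            affected.update(classes)
    return sorted(affected)


# Hypothetical library with low high-depth coverage and a noisy CNV profile.
print(affected_variant_classes({
    "CONTAMINATION_SCORE": 310,
    "MEDIAN_EXON_COVERAGE": 1500,
    "PCT_EXON_1000X": 70.0,
    "GENE_SCALED_MAD": 0.08,
    "MEDIAN_BIN_COUNT_CNV_TARGET": 9.0,
}))
```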