# TMB

Tumor mutational burden (TMB) is the total number of somatic mutations present within the cancer genome.

To calculate TMB, the algorithm performs the following steps.

## Small variant calling

Refer to [Small Variants](https://github.com/illumina-swi/tso500/blob/prod-tso500/docs/dragen-tso-500-ctdna-guides/dragen-tso-500-ctdna-v2.6/analysis-methods/broken-reference/README.md) on how small variants are called.

## Eligible region detection

TMB is computed over protein-coding regions with sufficient coverage, excluding low-confidence regions (blocklist regions). In the case of the DRAGEN TSO 500 ctDNA analysis software, the total coding region with coverage ≥ 1000X is used.
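The following is a minimal sketch of how an eligible region size could be derived from per-interval coverage. The `CodingInterval` record, its fields, and the use of a single coverage value per interval are assumptions made for illustration; they do not describe the pipeline's internal data structures.

```python
from typing import Iterable, NamedTuple

MIN_COVERAGE = 1000  # coverage threshold stated above for TSO 500 ctDNA


class CodingInterval(NamedTuple):
    """Hypothetical record: a half-open coding interval [start, end)."""
    chrom: str
    start: int
    end: int
    coverage: float    # one depth value per interval (assumption)
    blocklisted: bool  # True if the interval is a low-confidence region


def eligible_region_mbp(intervals: Iterable[CodingInterval],
                        min_coverage: float = MIN_COVERAGE) -> float:
    """Sum the lengths of intervals that pass both checks, in megabases."""
    eligible_bp = sum(
        iv.end - iv.start
        for iv in intervals
        if not iv.blocklisted and iv.coverage >= min_coverage
    )
    return eligible_bp / 1_000_000


intervals = [
    CodingInterval("chr1", 0, 600_000, 1500, False),        # counted
    CodingInterval("chr1", 700_000, 900_000, 800, False),   # below 1000X
    CodingInterval("chr2", 0, 400_000, 2000, True),         # blocklisted
    CodingInterval("chr2", 500_000, 900_000, 1000, False),  # counted, threshold is inclusive
]
print(eligible_region_mbp(intervals))  # 1.0
```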

## Germline variant identification

To exclude germline variants from TMB calculation, the algorithm includes two methods for predicting germline variant origin.

### **1. Database filter**

Variants with a population allele count ≥ 10 in either the 1000 Genomes or gnomAD database are marked as germline and assigned the tag *Germline\_DB* in the “tmb.trace.tsv” and “hard-filtered.vcf” files.
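As an illustration, the threshold check can be sketched as follows. The function name and the handling of missing counts (treated as 0) are assumptions; only the ≥ 10 threshold comes from the description above.

```python
from typing import Optional

GERMLINE_DB_MIN_ALLELE_COUNT = 10  # threshold stated above


def is_germline_by_database(*allele_counts: Optional[int],
                            threshold: int = GERMLINE_DB_MIN_ALLELE_COUNT) -> bool:
    """Return True if any database count reaches the threshold.

    Missing counts (None) are treated as 0, which is an assumption.
    """
    max_count = max((c or 0 for c in allele_counts), default=0)
    return max_count >= threshold


# Counts could come from, for example, the gnomAD exome, gnomAD genome,
# and 1000 Genomes columns of the TMB trace file.
print(is_germline_by_database(3, None, 12))  # True
print(is_germline_by_database(9, 9, None))   # False
```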

### **2. Proxi filter**

In the TSO 500 ctDNA pipeline, the proxi filter uses a probabilistic approach. For a target variant, it estimates the expected germline allele frequency using the surrounding germline variants. It then tests whether the allele frequency of the target variant is similar to the expected germline allele frequency. If the allele frequency is similar to expected, a tag *Germline\_Proxi* is assigned to the target variant in the “tmb.trace.tsv” and “hard-filtered.vcf” files.

{% hint style="info" %}
The proxi filter does not work well for 100% pure cell lines or for mixed or contaminated samples, because these samples do not have clear germline variant allele frequency distributions.
{% endhint %}
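The idea behind the proxi filter can be illustrated with a deliberately simplified toy. It is not the pipeline's probabilistic model: here the expected germline allele frequency is taken as the median VAF of nearby germline variants, and "similar" means within a fixed absolute tolerance. Both choices, the function name, and the `tolerance` parameter are assumptions made for illustration only.

```python
from statistics import median
from typing import Sequence


def looks_germline_by_proximity(target_vaf: float,
                                neighbor_germline_vafs: Sequence[float],
                                tolerance: float = 0.05) -> bool:
    """Toy stand-in for the proxi filter (not the actual algorithm).

    Expected germline VAF = median VAF of nearby germline variants (assumption).
    Similarity = absolute difference within `tolerance` (assumption).
    """
    if not neighbor_germline_vafs:
        return False  # no local germline signal to compare against
    expected = median(neighbor_germline_vafs)
    return abs(target_vaf - expected) <= tolerance


neighbors = [0.47, 0.50, 0.52]  # made-up VAFs of surrounding germline variants
print(looks_germline_by_proximity(0.48, neighbors))  # True
print(looks_germline_by_proximity(0.02, neighbors))  # False
```

The toy also shows why samples without a clear germline allele frequency distribution are problematic: when the neighboring VAFs do not cluster, the expected value is not meaningful.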

## Clonal hematopoiesis (CH) variant identification

Clonal hematopoiesis (CH) is characterized by the overrepresentation of blood cells derived from a single clone. CH is common and increases in prevalence with age. For the accurate determination of TMB, the CH variants need to be excluded.

The TSO 500 ctDNA pipeline uses two methods to tag variants as CH variants.

### 1. CH genes whitelist

Some of the most commonly mutated genes in clonal hematopoiesis (DNMT3A, TET2, PPM1D, and ASXL1) are included in the CH genes whitelist. If a variant is in one of these genes, the tag *Somatic\_Putative\_CH* is assigned to the variant in the “tmb.trace.tsv” and “hard-filtered.vcf” files.

### 2. cfDNA fragment size analysis

CH-derived cfDNA fragments are generally longer than tumor-derived cfDNA fragments. This difference is used to identify CH variants based on the fragment size of the reads supporting each variant call. Non-germline variants supported by the longer fragments are tagged as *Somatic\_Putative\_CH* in the “tmb.trace.tsv” and “hard-filtered.vcf” files.

Only variants with a sufficient number of supporting reads, that is, a variant allele count (VAC) > 50, are tested for a fragment size difference between the reads supporting the reference allele and the reads supporting the variant allele. Non-germline variants with a lower VAC, or without enough statistical power for the size difference test, remain tagged as *Somatic* in the “tmb.trace.tsv” and “hard-filtered.vcf” files.

<figure><img src="https://3845108255-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F7XRWgkRPkhoHXVslBqXD%2Fuploads%2Fgit-blob-6ed816cd3ee317852c7e0446c04d7d31d35f5771%2Fimage%20(1)%20(2).png?alt=media" alt=""><figcaption><p>Germline variant and clonal hematopoiesis (CH) variant identification in the TMB algorithm.</p></figcaption></figure>
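The tagging logic described in the two sections above can be summarized with a short sketch. The order of the checks, the function signature, and the boolean `long_fragment_signal` (standing in for the outcome of the fragment size test, which is not reproduced here) are assumptions; the gene list, the VAC threshold, and the tag names come from the text.

```python
CH_GENES = {"DNMT3A", "TET2", "PPM1D", "ASXL1"}  # CH genes whitelist listed above
MIN_VAC_FOR_FRAGMENT_TEST = 50                    # VAC must exceed this to be tested


def assign_status(genes, germline_db, germline_proxi, vac, long_fragment_signal):
    """Return one of the status tags described above.

    The check order and `long_fragment_signal` are illustrative assumptions.
    """
    if germline_db:
        return "Germline_DB"
    if germline_proxi:
        return "Germline_Proxi"
    if CH_GENES & set(genes):
        return "Somatic_Putative_CH"
    if vac > MIN_VAC_FOR_FRAGMENT_TEST and long_fragment_signal:
        return "Somatic_Putative_CH"
    return "Somatic"


print(assign_status(["TP53"], False, False, 120, False))  # Somatic
print(assign_status(["TET2"], False, False, 12, False))   # Somatic_Putative_CH
print(assign_status(["ATM"], False, False, 30, True))     # Somatic (VAC too low to test)
```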

## Tumor driver variant identification

Excluding tumor driver variants helps reduce bias in the bTMB calculation that could result from the targeted enrichment of the gene panel. Variants with a count ≥ 50 in the COSMIC database are treated as tumor driver variants and excluded from the calculation.

## Nonsynonymous variant identification

Nonsynonymous variants are defined as described in the [DRAGEN user guide](https://help.connected.illumina.com/dragen/product-guides/dragen-v4.4/dragen-dna-pipeline/biomarkers/biomarker-tmb). Only nonsynonymous variants are used to calculate Nonsynonymous TMB.

## TMB calculation

The TMB is calculated using the following equations:

$$TMB = {Eligible\ Variants \over Effective\ Panel\ Size}$$

$$Nonsynonymous\ TMB = {Filtered\ Nonsynonymous\ Variants \over Eligible\ Region\ Size\ (Mbp)}$$

The eligible variants and effective panel size of the TMB calculation are summarized in the following table:

| Calculation Value                  | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Eligible variants (numerator)      | <ul><li>Variants in the coding region (RefSeq Cds)</li><li>Variant frequency ≥ 0.2%</li><li>Coverage ≥ 1000X</li><li>SNVs and Indels (MNVs excluded)</li><li>Nonsynonymous and synonymous variants. Only nonsynonymous variants are used for Nonsynonymous TMB.</li><li>Variants with count ≥ 50 in the COSMIC database are excluded</li><li>Mutations in ASXL1, DNMT3A, PPM1D, and TET2 are excluded</li><li>Fragment-size based potential clonal hematopoiesis (CH) variants are excluded</li></ul> |
| Effective panel size (denominator) | Total coding region with coverage ≥ 1000X.                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
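
The criteria in the table can be combined into a small worked example. The `Variant` record, its field names, and the sample values are hypothetical; the sketch also assumes that germline and CH predictions are already encoded in a `status` field, so that only variants with status `Somatic` are counted.

```python
from dataclasses import dataclass

CH_GENES = {"ASXL1", "DNMT3A", "PPM1D", "TET2"}


@dataclass
class Variant:
    """Hypothetical, simplified variant record."""
    gene: str
    variant_type: str      # "SNV", "insertion", "deletion", or "MNV"
    vaf: float             # fraction, so 0.2% is 0.002
    depth: int
    coding: bool
    nonsynonymous: bool
    status: str            # "Somatic", "Germline_DB", "Somatic_Putative_CH", ...
    max_cosmic_count: int = 0


def is_eligible(v: Variant) -> bool:
    """Apply the numerator criteria from the table above."""
    return (
        v.coding
        and v.vaf >= 0.002
        and v.depth >= 1000
        and v.variant_type != "MNV"
        and v.status == "Somatic"      # excludes germline and CH predictions
        and v.max_cosmic_count < 50    # excludes tumor driver variants
        and v.gene not in CH_GENES
    )


def tmb(variants, eligible_region_mbp):
    """Return (TMB, Nonsynonymous TMB) in variants per megabase."""
    eligible = [v for v in variants if is_eligible(v)]
    nonsyn = [v for v in eligible if v.nonsynonymous]
    return len(eligible) / eligible_region_mbp, len(nonsyn) / eligible_region_mbp


variants = [
    Variant("TP53", "SNV", 0.012, 2400, True, True, "Somatic", 12),    # eligible
    Variant("KRAS", "SNV", 0.030, 3100, True, True, "Somatic", 5000),  # COSMIC count >= 50
    Variant("EGFR", "deletion", 0.004, 1800, True, True, "Somatic"),   # eligible
    Variant("BRCA2", "SNV", 0.495, 2000, True, False, "Germline_DB"),  # germline
    Variant("TET2", "SNV", 0.010, 1500, True, True, "Somatic_Putative_CH"),  # CH
    Variant("ATM", "SNV", 0.003, 1200, True, False, "Somatic"),        # eligible, synonymous
    Variant("ATM", "SNV", 0.001, 1200, True, True, "Somatic"),         # VAF below 0.2%
    Variant("NF1", "MNV", 0.020, 1500, True, True, "Somatic"),         # MNV excluded
]
print(tmb(variants, eligible_region_mbp=0.5))  # (6.0, 4.0)
```

With three eligible variants, two of them nonsynonymous, and an eligible region of 0.5 Mbp, the example gives a TMB of 6.0 and a Nonsynonymous TMB of 4.0.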

## TMB Output Files

The TMB algorithm outputs results in several files:

1. Combined Variant Output File, `{SampleID}_CombinedVariantOutput.tsv`
2. TMB Metrics CSV file, `{Sample_ID}.tmb.metrics.csv`
3. TMB Trace TSV file, `{Sample_ID}.tmb.trace.tsv`
4. TMB Max Somatic VAF file, `{Sample_ID}.tmb.msaf.csv`

### 1. Combined Variant Output File

File name: `{SampleID}_CombinedVariantOutput.tsv`

The TMB results are output in the section \[TMB] and include:

* The TMB value
* Coding Region Size in Megabases (the denominator of the [TMB formula](#tmb-calculation))
* Number of Passing Eligible Variants (the numerator of the [TMB formula](#tmb-calculation))

### 2. TMB Metrics CSV

File name: `{Sample_ID}.tmb.metrics.csv`

The TMB metrics file contains the TMB and Nonsynonymous TMB calculation results and the values used to calculate them for each DNA sample.

| Column                                  | Description                                                                                                                                                                                                   |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Total Input Variant Count               | Total number of variants considered by the algorithm                                                                                                                                                          |
| Total Input Variant Count in TMB region | Total number of variants considered by the algorithm in the TMB eligible region                                                                                                                               |
| Filtered Variant Count                  | Variants remaining after filtering, see [TMB algorithm page](https://help.connected.illumina.com/tso500/dragen-tso-500-ctdna-guides/dragen-tso-500-ctdna-v2.6/analysis-methods/tmb) for details               |
| Filtered Nonsyn Variant Count           | Nonsynonymous variants remaining after filtering, see [TMB algorithm page](https://help.connected.illumina.com/tso500/dragen-tso-500-ctdna-guides/dragen-tso-500-ctdna-v2.6/analysis-methods/tmb) for details |
| Eligible Region (MB)                    | The eligible region, in megabases, that meets the minimum coverage threshold.                                                                                                                                 |
| TMB                                     | TMB value for the sample                                                                                                                                                                                      |
| Nonsyn TMB                              | Nonsynonymous TMB value for the sample                                                                                                                                                                        |

### 3. TMB Trace File

The TMB trace file provides comprehensive information on how the TMB value is calculated for a given sample. All passing small variants from the small variant filtering step are included in this file. To view eligible variants for TMB calculation, set the filter for the column IncludedInTMBNumerator to TRUE.

{% hint style="danger" %}
Variant statuses (somatic, germline, clonal hematopoiesis (CH) variant) are predictions intended for TMB calculation. Use caution if using them separately as their performance has not been tested outside of the TMB algorithm.
{% endhint %}

| Column                   | Description                                                                                                                                                                                                                                                                                                                                                                              |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Chromosome               | Chromosome                                                                                                                                                                                                                                                                                                                                                                               |
| Position                 | Position of variant                                                                                                                                                                                                                                                                                                                                                                      |
| RefCall                  | Reference base                                                                                                                                                                                                                                                                                                                                                                           |
| AltCall                  | Alternate base                                                                                                                                                                                                                                                                                                                                                                           |
| VAF                      | Variant allele frequency                                                                                                                                                                                                                                                                                                                                                                 |
| Depth                    | Coverage of position                                                                                                                                                                                                                                                                                                                                                                     |
| CytoBand                 | Cytoband of variant                                                                                                                                                                                                                                                                                                                                                                      |
| GeneName                 | Name of gene if applicable. A semicolon delimited list is used for multiple genes.                                                                                                                                                                                                                                                                                                       |
| VariantType              | Type of the variant: SNV, insertion, deletion, MNV                                                                                                                                                                                                                                                                                                                                       |
| CosmicIDs                | Cosmic IDs, if multiple concatenated by “;”                                                                                                                                                                                                                                                                                                                                              |
| MaxCosmicCount           | Maximum COSMIC study count                                                                                                                                                                                                                                                                                                                                                               |
| ClinVarIDs               | Reference ClinVar Variation IDs (RCV IDs)                                                                                                                                                                                                                                                                                                                                                |
| ClinVarSignificance      | Variant Classification in ClinVar database                                                                                                                                                                                                                                                                                                                                               |
| AlleleCountsGnomadExome  | Variant allele count in gnomAD exome database                                                                                                                                                                                                                                                                                                                                            |
| AlleleCountsGnomadGenome | Variant allele count in gnomAD genome database                                                                                                                                                                                                                                                                                                                                           |
| AlleleCounts1000Genomes  | Variant allele count in 1000 Genomes database                                                                                                                                                                                                                                                                                                                                            |
| MaxDatabaseAlleleCounts  | Maximum variant allele count over the three databases                                                                                                                                                                                                                                                                                                                                    |
| GermlineFilterDatabase   | TRUE if variant was filtered by the database filter                                                                                                                                                                                                                                                                                                                                      |
| GermlineFilterProxi      | TRUE if variant was filtered by the proxi filter                                                                                                                                                                                                                                                                                                                                         |
| CodingVariant            | TRUE if variant is in the coding region                                                                                                                                                                                                                                                                                                                                                  |
| Nonsynonymous            | TRUE if variant has any transcript annotations with nonsynonymous consequences                                                                                                                                                                                                                                                                                                           |
| IncludedinTMBNumerator   | TRUE if variant is used in the TMB calculation                                                                                                                                                                                                                                                                                                                                           |
| Status                   | *Germline\_DB* or *Germline\_Proxi* if the variant was filtered by [the Database or the Proxi filter](#id-3.-germline-variant-identification), respectively. *Somatic\_Putative\_CH* if the variant was predicted to be associated with [clonal hematopoiesis (CH)](#id-4.-clonal-hematopoiesis-ch-variant-identification). *Somatic* if the variant was not determined to be germline or CH. |
| ProteinChange            | p.HGVS                                                                                                                                                                                                                                                                                                                                                                                   |
| CDSChange                | c.HGVS                                                                                                                                                                                                                                                                                                                                                                                   |
| Exons                    | Exon, where the variant is located                                                                                                                                                                                                                                                                                                                                                       |
| Consequence              | Variant consequence                                                                                                                                                                                                                                                                                                                                                                      |
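
As a sketch of the filtering step described above, the trace file can be read with Python's standard `csv` module. The helper name `eligible_rows` is hypothetical, and the code assumes a tab-separated file with a single header row. The flag column is matched case-insensitively so that the exact capitalization of the column name in the file does not matter.

```python
import csv
import io


def eligible_rows(tsv_handle, flag_column="IncludedInTMBNumerator"):
    """Yield rows of a TMB trace TSV whose flag column is TRUE.

    Assumes one header row; the column name is matched case-insensitively.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    # Map lower-cased header names back to the spelling used in the file.
    lookup = {name.lower(): name for name in (reader.fieldnames or [])}
    column = lookup[flag_column.lower()]
    for row in reader:
        if row[column].strip().upper() == "TRUE":
            yield row


# Tiny in-memory stand-in for a trace file (made-up values, subset of columns).
example = io.StringIO(
    "Chromosome\tPosition\tGeneName\tStatus\tIncludedinTMBNumerator\n"
    "chr17\t7674220\tTP53\tSomatic\tTRUE\n"
    "chr4\t105236000\tTET2\tSomatic_Putative_CH\tFALSE\n"
)
kept = list(eligible_rows(example))
print([row["GeneName"] for row in kept])  # ['TP53']
```

For a real file, the same function can be given a handle opened with `open(path, newline="")`.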

### 4. TMB Max Somatic VAF file

The file reports the variant with the [Max Somatic VAF](https://help.connected.illumina.com/tso500/dragen-tso-500-ctdna-guides/dragen-tso-500-ctdna-v2.6/analysis-methods/max-somatic-vaf), using the same file format as the [TMB Trace File](#tmb-trace-file).
