
# MSI

The DRAGEN microsatellite instability status (MSI) module assesses microsatellite sites for evidence of microsatellite instability relative to a set of baseline normal samples. The algorithm is designed for robust performance on liquid biopsy samples, which frequently have a low tumor fraction and a high noise level. Compared to other technologies for MSI detection, the algorithm uses a higher number of microsatellite sites (>2,300) and shorter sites (6-7 bp), which reduces error rates and decreases the potential false positives commonly found in homopolymer sequencing.

<table><thead><tr><th width="166.98828125">Parameter</th><th width="264.18359375">TSO 500 ctDNA MSI algorithm</th><th width="266.984375">Other technologies</th></tr></thead><tbody><tr><td>Site size</td><td>6 – 7 bp</td><td>10+ bp</td></tr><tr><td>Number of sites</td><td>> 2,300</td><td><p>PCR: 5 – 7</p><p>Tissue NGS: ~20-150</p><p>Other ctDNA NGS: ~50-200</p></td></tr><tr><td>Output</td><td>SumJSD (aggregated signature across unstable sites)</td><td>% unstable sites</td></tr></tbody></table>

{% hint style="info" %}
bMSI, or blood-based microsatellite instability, is a term often used when analyzing liquid biopsy samples to distinguish it from MSI, tissue-based microsatellite instability. DRAGEN TSO 500 ctDNA Analysis Software uses the term MSI to maintain alignment with DRAGEN TSO 500 Analysis Software and DRAGEN MSI algorithms; however, the algorithm is designed specifically for bMSI determination in liquid biopsy samples.
{% endhint %}

### MSI Algorithm

The MSI algorithm uses the following steps:

1. **MSI Allele Counting.** The algorithm takes as input the BAM file from the DNA alignment and read collapsing step. For each of the 2,343 selected MSI sites, the number of spanning duplex collapsed reads is calculated. Only reads spanning the entire MSI site are included to ensure accuracy.
2. **MSI Site Classification.** For MSI sites with 500 or more spanning duplex collapsed reads, the Jensen-Shannon distance (JSD), an information entropy metric, is calculated between the test sample and the baseline normal samples, and between any two baseline normal samples. If the JSD is significantly higher for the test sample vs baseline normal comparisons (p-value ≤ 0.01) and the JSD difference is ≥ 0.02, the site is considered unstable.
3. **Final Score Calculation.** The final MSI score, SumJSD, aggregates JSD scores across all unstable sites.
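The steps above can be illustrated with a minimal Python sketch. It is not the DRAGEN implementation: the function names, the histogram representation (repeat length mapped to read count), and the choice of a base-2 logarithm are all assumptions made for illustration. The documentation does not name the statistical test behind the p-value, so the sketch takes the p-value as an input rather than computing it, and it assumes that the per-site score summed into SumJSD is supplied by the caller.

```python
import math
from itertools import combinations
from statistics import mean

# Thresholds quoted in the steps above.
MIN_SPANNING_READS = 500
MAX_P_VALUE = 0.01
MIN_JSD_DIFFERENCE = 0.02


def _normalize(counts):
    """Turn a {repeat_length: read_count} histogram into probabilities."""
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance (square root of the base-2 JS divergence)."""
    p, q = _normalize(p_counts), _normalize(q_counts)
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    divergence = 0.5 * kl(p) + 0.5 * kl(q)
    return math.sqrt(max(divergence, 0.0))


def jsd_difference(test_counts, baseline_counts):
    """Mean test-vs-baseline JSD minus mean baseline-vs-baseline JSD.

    Requires at least two baseline histograms (assumed layout).
    """
    test_vs_base = [js_distance(test_counts, b) for b in baseline_counts]
    base_vs_base = [js_distance(a, b) for a, b in combinations(baseline_counts, 2)]
    return mean(test_vs_base) - mean(base_vs_base)


def is_unstable(spanning_reads, p_value, difference):
    """Apply the coverage, p-value, and JSD-difference thresholds to one site."""
    return (
        spanning_reads >= MIN_SPANNING_READS
        and p_value <= MAX_P_VALUE
        and difference >= MIN_JSD_DIFFERENCE
    )


def sum_jsd(site_scores):
    """Sum per-site scores over unstable sites; input is (score, unstable) pairs."""
    return sum(score for score, unstable in site_scores if unstable)
```

With a base-2 logarithm the distance is bounded between 0 (identical repeat-length distributions) and 1 (no shared repeat lengths), which makes per-site values directly comparable before they are summed.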

<figure><img src="/files/CP7ujrba9qCtjPH38B5t" alt=""><figcaption><p>MSI Algorithm in TruSight Oncology 500 ctDNA v2 assay</p></figcaption></figure>

### MSI Output Files

The MSI algorithm outputs results in the following files:

* Combined Variant Output File, `{SampleID}_CombinedVariantOutput.tsv`
* MSI Output JSON file, `{Sample_ID}.microsat_output.json`

#### 1. Combined Variant Output File

File name: `{SampleID}_CombinedVariantOutput.tsv`

The MSI results are output in the section \[MSI] and include:

* `SUM_JSD` value, a sum of JSD scores across unstable sites.

#### 2. MSI Output JSON File

File name: `{Sample_ID}.microsat_output.json`

The file is output in the intermediate files (under `Logs_Intermediates/DragenCaller`) and includes the following information:

| Field                            | Description                                                                                                                                                                                       |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Settings                         | A set of parameters including p-value and spanning reads coverage threshold.                                                                                                                      |
| TotalMicrosatelliteSitesAssessed | Total number of sites with sufficient coverage (500 or more spanning duplex collapsed reads).                                                                                                     |
| TotalMicrosatelliteSitesUnstable | Total number of sites with p-value ≤ 0.01 and mean JSD difference ≥ 0.02.                                                                                                                         |
| PercentageUnstableSites          | <p>Percent of unstable sites, calculated as TotalMicrosatelliteSitesUnstable</p><p>divided by TotalMicrosatelliteSitesAssessed and multiplied by 100.</p>                                         |
| ResultIsValid                    | `True` or `False`. The ResultIsValid field is false if the total number of assessed sites is less than or equal to 20.                                                                            |
| ResultMessage                    | If the results are not valid, this field will have a message indicating if the number of the assessed sites is lower than the threshold or the coverage of the sites is lower than the threshold. |
| SumDistance                      | A sum of JSD scores across unstable sites.                                                                                                                                                        |
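As a minimal sketch of how these fields might be consumed, the Python helper below summarizes a parsed MSI JSON file. It assumes the field names in the table appear as top-level keys and that `ResultIsValid` may be stored either as a JSON boolean or as the string `True`/`False`; neither detail is confirmed here, and the function names are hypothetical.

```python
import json


def load_msi_json(path):
    """Read a {Sample_ID}.microsat_output.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_msi(data):
    """Summarize the documented fields (top-level key layout is an assumption)."""
    assessed = data["TotalMicrosatelliteSitesAssessed"]
    unstable = data["TotalMicrosatelliteSitesUnstable"]
    # Recompute the percentage as described in the table above.
    percent_unstable = 100.0 * unstable / assessed if assessed else 0.0
    # Accept either a JSON boolean or the strings "True"/"False".
    is_valid = str(data["ResultIsValid"]).strip().lower() == "true"
    return {
        "is_valid": is_valid,
        "message": data.get("ResultMessage", ""),
        "percent_unstable": percent_unstable,
        "sum_distance": data["SumDistance"],
    }


# Illustrative values only, not real assay output.
example = {
    "TotalMicrosatelliteSitesAssessed": 2000,
    "TotalMicrosatelliteSitesUnstable": 50,
    "ResultIsValid": True,
    "ResultMessage": "",
    "SumDistance": 0.12,
}
summary = summarize_msi(example)
```

Checking `is_valid` before reading `sum_distance` avoids interpreting a score from a sample with too few assessed sites.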

### MSI result interpretation

Based on an internal cohort of 294 data points from 136 unique cfDNA samples (127 unique MSS/normal samples and 9 unique MSI-H samples), we developed an empirical cut-off for the SumJSD score to identify MSI-H samples: SumJSD > 0.08. This value balances false positives (in normal samples) against false negatives (in cancer samples). We recommend that users test and adjust the cut-off as needed for their application. Refer to the page [Commercial Control Use with TSO 500 ctDNA](/tso500/performance-testing/commercial-control-use-with-tso-500-ctdna.md) for recommendations on sample use to test MSI analytical performance.
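The cut-off can be applied and re-evaluated with a few lines of Python. This is a sketch under stated assumptions: the function names and the status labels `"MSI-H"`, `"not MSI-H"`, and `"invalid"` are hypothetical and are not produced by the software. Only the 0.08 default and the strict greater-than comparison come from the text above.

```python
DEFAULT_SUM_JSD_CUTOFF = 0.08  # empirical cut-off quoted above


def call_msi_status(sum_jsd, result_is_valid=True, cutoff=DEFAULT_SUM_JSD_CUTOFF):
    """Return a status label for one sample (labels are illustrative)."""
    if not result_is_valid:
        return "invalid"
    return "MSI-H" if sum_jsd > cutoff else "not MSI-H"


def count_errors(samples, cutoff=DEFAULT_SUM_JSD_CUTOFF):
    """Count false positives and false negatives at a candidate cut-off.

    `samples` is a list of (sum_jsd, truly_msi_high) pairs from a
    user-supplied validation cohort.
    """
    false_positives = sum(1 for s, truth in samples if s > cutoff and not truth)
    false_negatives = sum(1 for s, truth in samples if s <= cutoff and truth)
    return false_positives, false_negatives
```

Running `count_errors` over a range of candidate cut-offs on a labeled validation cohort is one simple way to check whether the default suits a given application.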

### Microsatellite site selection

For additional background, microsatellite site selection for the algorithm was guided by the need to optimize detection for low tumor fraction samples, which are common in liquid biopsy testing. The gold standard, the Promega MSI-PCR assay, uses 5-10 polyA/T sites with 25+ bp repeats. The TSO 500 ctDNA panel includes 7 of the canonical MSI-PCR sites, but there is often insufficient spanning read coverage for these sites in liquid biopsy samples due to their longer repeat length and high AT dropout rates.

The need for a higher number of shorter microsatellite sites was additionally driven by experience with the previous version of the MSI algorithm in the TSO 500 ctDNA v1 assay, which used 76 sites of 10+ bp.

Based on that experience, we evaluated sites with various repeat lengths. For the TSO 500 ctDNA MSI algorithm, we selected all sites within the scope of the panel that have a size of 6-7 bp and the best signal-to-noise ratio.

