LogoLogo
Illumina KnowledgeIllumina SupportSign In
  • Home
  • Overview
    • Illumina® DRAGEN™ Secondary Analysis
    • DRAGEN Applications
    • Deployment Options
  • Product Guides
    • DRAGEN v4.4
      • Getting Started
      • DRAGEN Host Software
        • DRAGEN Secondary Analysis
      • Clinical Research Workflows
        • DRAGEN Heme WGS Tumor Only Pipeline
          • Quick Start
          • Sample Sheets
            • Introduction
            • Requirements
            • Templates
          • Run Planning
            • Sample Sheet Creation in BaseSpace
            • Custom Config Support
          • DRAGEN Server App
            • Getting Started
            • Launching Analysis
            • Command Line Options
            • Output
            • Advanced Topics
              • Custom Workflow
              • Custom Config Support
              • Illumina Connected Insights
          • ICA Cloud App
            • Getting Started
            • Launching Analysis
            • Advanced Topics
              • Custom Workflow
              • Custom Config Support
              • Post Processing
              • Illumina Connected Insights
          • Analysis Output
          • Analysis Methods
          • Troubleshooting
        • DRAGEN Solid WGS Tumor Normal Pipeline
          • Quick Start
          • Sample Sheets
            • Introduction
            • Requirements
            • Templates
          • Run Planning
            • Sample Sheet Creation in BaseSpace
            • Custom Config Support
          • DRAGEN Server App
            • Quick Start
            • Getting Started
            • Launching Analysis
            • Command Line Options
            • Output
            • Advanced Topics
            • Custom Workflow
              • Custom Config Support
            • Illumina Connected Insights
          • ICA Cloud App
            • Getting Started
            • Launching Analysis
            • Output
            • Advanced Topics
              • Custom Workflow
              • Custom Config Support
              • Post Processing
              • Illumina Connected Insights
          • Analysis Output
          • Analysis Methods
          • Troubleshooting
      • DRAGEN Recipes
        • DNA Germline Panel UMI
        • DNA Germline Panel
        • DNA Germline WES UMI
        • DNA Germline WES
        • DNA Germline WGS UMI
        • DNA Germline WGS
        • DNA Somatic Tumor-Normal Solid Panel UMI
        • DNA Somatic Tumor-Normal Solid Panel
        • DNA Somatic Tumor-Normal Solid WES UMI
        • DNA Somatic Tumor-Normal Solid WES
        • DNA Somatic Tumor-Normal Solid WGS UMI
        • DNA Somatic Tumor-Normal Solid WGS
        • DNA Somatic Tumor-Only Heme WGS
        • DNA Somatic Tumor-Only Solid Panel UMI
        • DNA Somatic Tumor-Only Solid Panel
        • DNA Somatic Tumor-Only Solid WES UMI
        • DNA Somatic Tumor-Only Solid WES
        • DNA Somatic Tumor-Only Solid WGS UMI
        • DNA Somatic Tumor-Only Solid WGS
        • DNA Somatic Tumor-Only ctDNA Panel UMI
        • Illumina scRNA
        • Other scRNA prep
        • RNA Panel
        • RNA WTS
      • DRAGEN Reference Support
        • Prepare a Reference Genome
      • DRAGEN DNA Pipeline
        • DNA Mapping
        • Read Trimming
        • DRAGEN FASTQC
        • Sorting and Duplicate Marking
        • Small Variant Calling
          • ROH Caller
          • B-Allele Frequency Output
          • Somatic Mode
          • Pedigree Analysis
          • De Novo Small Variant Filtering
          • Autogenerated MD5SUM for VCF Files
          • Force Genotyping
          • Machine Learning for Variant Calling
          • Evidence BAM
          • Mosaic Detection
          • VCF Imputation
          • Multi-Region Joint Detection
        • Copy Number Variant Calling
          • Available pipelines
            • Germline CNV Calling (WGS/WES)
            • Germline CNV Calling ASCN (WGS)
            • Multisample Germline CNV Calling
            • Somatic CNV Calling ASCN (WGS)
            • Somatic CNV Calling WES
            • Somatic CNV Calling ASCN (WES)
          • Additional documentation
            • CNV Input
            • CNV Preprocessing
            • CNV Segmentation
            • CNV Output
            • CNV ASCN module
            • CNV with SV Support
            • Cytogenetics Modality
        • Repeat Expansion Detection
          • De Novo Repeat Expansion Detection
        • Targeted Caller
          • CYPDB6 Caller
          • CYP2D6 Caller
          • CYP21A2 Caller
          • GBA Caller
          • HBA Caller
          • LPA Caller
          • Rh Caller
          • SMN Caller
        • Structural Variant Calling
          • Structural Variant De Novo Quality Scoring
          • Structural Variant IGV Tutorial
        • VNTR Calling
        • Population Genotyping
        • Filter Duplicate Variants
        • Ploidy Calling
          • Ploidy Estimator
          • Ploidy Caller
        • Multi Caller
        • QC Metrics Reporting
        • JSON Metrics Reporting
        • HLA Typing
        • Biomarkers
          • Tumor Mutational Burden
          • Microsatellite Instability
          • Homologous Recombination Deficiency
          • BRCA Large Genomic Rearrangment
          • DRAGEN Fragmentomics
        • Downsampling
          • Fractional (Raw Reads) Downsampling
        • Unique Molecular Identifiers
        • Indel Re-aligner (Beta)
        • Star Allele Caller
        • High Coverage Analysis
        • CheckFingerprint
        • Population Haplotyping (Beta)
        • DUX4 Rearrangement Caller
      • DRAGEN RNA Pipeline
        • RNA Alignment
        • Gene Fusion Detection
        • Gene Expression Quantification
        • RNA Variant Calling
        • Splice Variant Caller
      • DRAGEN Single Cell Pipeline
        • Illumina PIPseq scRNA
        • Other scRNA Prep
        • scATAC
        • Single-Cell Multiomics
      • DRAGEN Methylation Pipeline
      • DRAGEN MRD Pipeline
      • DRAGEN Amplicon Pipeline
      • Explify Analysis Pipeline
        • Kmer Classifier
        • Kmer Classifier Database Builder
      • BCL conversion
      • Illumina Connected Annotations
      • ORA Compression
      • Command Line Options
        • Docker Requirements
      • DRAGEN Reports
      • Tools and Utilities
    • DRAGEN v4.3
      • Getting Started
      • DRAGEN Host Software
        • DRAGEN Secondary Analysis
      • DRAGEN Reference Support
        • Prepare a Reference Genome
      • DRAGEN DNA Pipeline
        • DNA Mapping
        • Read Trimming
        • DRAGEN FASTQC
        • Sorting and Duplicate Marking
        • Small Variant Calling
          • ROH Caller
          • B-Allele Frequency Output
          • Somatic Mode
          • Joint Analysis
          • De Novo Small Variant Filtering
          • Autogenerated MD5SUM for VCF Files
          • Force Genotyping
          • Machine Learning for Variant Calling
          • Evidence BAM
          • Mosaic Detection
          • VCF Imputation
          • Multi-Region Joint Detection
        • Copy Number Variant Calling
          • CNV Output
          • CNV with SV Support
          • Multisample CNV Calling
          • Somatic CNV Calling WGS
          • Somatic CNV Calling WES
          • Allele Specific CNV for Somatic WES CNV
        • Repeat Expansion Detection
          • De Novo Repeat Expansion Detection
        • Targeted Caller
          • CYPDB6 Caller
          • CYP2D6 Caller
          • CYP21A2 Caller
          • GBA Caller
          • HBA Caller
          • LPA Caller
          • Rh Caller
          • SMN Caller
        • Structural Variant Calling
          • Structural Variant De Novo Quality Scoring
        • VNTR Calling
        • Filter Duplicate Variants
        • Ploidy Calling
          • Ploidy Estimator
          • Ploidy Caller
        • Multi Caller
        • QC Metrics Reporting
        • HLA Typing
        • Biomarkers
          • Tumor Mutational Burden
          • Microsatellite Instability
          • Homologous Recombination Deficiency
          • BRCA Large Genomic Rearrangment
          • DRAGEN Fragmentomics
        • Downsampling
          • Fractional (Raw Reads) Downsampling
          • Effective Coverage Downsampling
        • Unique Molecular Identifiers
        • Indel Re-aligner (Beta)
        • Star Allele Caller
        • High Coverage Analysis
        • CheckFingerprint
        • Population Haplotyping (Beta)
        • DUX4 Rearrangement Caller
      • DRAGEN RNA Pipeline
        • RNA Alignment
        • Gene Fusion Detection
        • Gene Expression Quantification
        • RNA Variant Calling
        • Splice Variant Caller
      • DRAGEN Single-Cell Pipeline
        • scRNA
        • scATAC
        • Single-Cell Multiomics
      • DRAGEN Methylation Pipeline
      • DRAGEN Amplicon Pipeline
      • Explify Analysis Pipeline
        • Kmer Classifier
        • Kmer Classifier Database Builder
      • DRAGEN Recipes
        • DNA Germline Panel UMI
        • DNA Germline Panel
        • DNA Germline WES UMI
        • DNA Germline WES
        • DNA Germline WGS UMI
        • DNA Germline WGS
        • DNA Somatic Tumor-Normal Solid Panel UMI
        • DNA Somatic Tumor-Normal Solid Panel
        • DNA Somatic Tumor-Normal Solid WES UMI
        • DNA Somatic Tumor-Normal Solid WES
        • DNA Somatic Tumor-Normal Solid WGS UMI
        • DNA Somatic Tumor-Normal Solid WGS
        • DNA Somatic Tumor-Only Heme WGS
        • DNA Somatic Tumor-Only Solid Panel UMI
        • DNA Somatic Tumor-Only Solid Panel
        • DNA Somatic Tumor-Only Solid WES UMI
        • DNA Somatic Tumor-Only Solid WES
        • DNA Somatic Tumor-Only Solid WGS UMI
        • DNA Somatic Tumor-Only Solid WGS
        • DNA Somatic Tumor-Only ctDNA Panel UMI
        • RNA Panel
        • RNA WTS
      • BCL conversion
      • Illumina Connected Annotations
      • ORA Compression
      • Command Line Options
      • DRAGEN Reports
      • Tools and Utilities
  • Reference
    • DRAGEN Server
    • DRAGEN Multi-Cloud
      • DRAGEN on AWS
      • DRAGEN on AWS Batch
      • DRAGEN on Microsoft Azure
        • Run DRAGEN VM on Azure
      • DRAGEN on Microsoft Azure Batch
        • Azure Batch Run Modes
    • DRAGEN Licensing
      • DRAGEN Server Licensing
      • DRAGEN Cloud Licensing
    • DRAGEN Application Manager
    • Support
    • Resource Files
      • Noise Baselines
    • Supplementary Information
    • Troubleshooting
    • Citing DRAGEN software
    • Release Notes
    • Revision History
Powered by GitBook
On this page
  • Command-Line Options
  • Assay Specific Settings
  • Default Microsatellite sites files
  • Custom Microsatellite files
  • Normal references of miscrosatellite repeat distribution
  • MSI Output
  • MSI Debugging Output
  • MSI Verbose Output
  • MSI Algorithm

Was this helpful?

Export as PDF
  1. Product Guides
  2. DRAGEN v4.3
  3. DRAGEN DNA Pipeline
  4. Biomarkers

Microsatellite Instability

Microsatellites are genomic regions of short DNA motifs that are repeated 5–50 times and are associated with high mutation rates. Microsatellite Instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can be used as a critical biomarker to predict immunotherapy responses in multiple tumor types.

DRAGEN MSI supports running in tumor-normal and tumor-only modes. Tumor-normal is generally expected to generate more accurate results. The tumor-only mode will require a panel of normals. The panel of normals will be generated using the collect-evidence mode.

Command-Line Options

The following is an example command for tumor-normal mode. Default resource files are available for WES and WGS. Please note that the WES and WGS tumor-normal modes are fully supported and tested. Custom panels may require more extensive validation and possibly require generating a new sites file.

dragen \
--msi-command tumor-normal \
--msi-coverage-threshold 60 \ 
--msi-microsatellites-file ${microsatellite_file} \ # See section: Default Microsatellite sites files
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
--enable-map-align true \
--RGID=read_group_ID \ 
--RGSM=read_group_sample \
--ref-dir ${reference_directory} \
--enable-map-align-output true \
--enable-sort true \
--enable-duplicate-marking true \
--tumor-fastq1 ${tumor_fq1} \
--tumor-fastq2 ${tumor_fq2} \
--fastq-file1 ${fq1} \
--fastq-file2 ${fq2}

The following is an example command for the tumor-only mode. Please note that the WES and WGS tumor-only modes are not as extensively tested as the tumor-normal modes. The TSO500 panels do not have normal controls, and are only tested and validated in tumor-only mode.

dragen \
--msi-command tumor-only \
--msi-coverage-threshold 60 \ 
--msi-microsatellites-file ${microsatellite_file} \  # See section: Default Microsatellite sites files
--msi-ref-normal-dir ${normal_reference_directory} \ # See section: Normal references of miscrosatellite repeat distribution 
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
--enable-map-align=true \
--RGID=read_group_ID \ 
--RGSM=read_group_sample \
--ref-dir ${reference_directory} \
--enable-map-align-output=true \
--enable-sort true \
--enable-duplicate-marking=true \
--tumor-fastq1 ${tumor_fq1} \
--tumor-fastq2 ${tumor_fq2}
Option
Description

msi-command tumor-only/tumor-normal/collect-evidence

Mode of execution: tumor-only, tumor-normal, or collect-evidence.

msi-microsatellites-file

Specify the file containing the microsatellites. You can generate this file by scanning the genome for microsatellites using an MSI-sensor. DRAGEN has tested with ≥ 10 bp homopolymers for solid samples, and 6-7 bp homopolymers for liquid samples.

msi-ref-normal-dir

Full name of directory containing files with normal reference repeat length distribution. Used only in tumor-only mode. These files can be generated by running collect-evidence on each normal sample. At least 20 normal samples are required.

msi-coverage-threshold

Specify the minimum spanning read coverage for a microsatellite. Microsatellites that do not meet the specified threshold are not included in analysis. DRAGEN recommends using 60 as the value for solid samples. For TSO500 liquid, a value of 500 is recommended.

msi-distance-threshold

Threshold for distance distributions to be considered different. Default is 0.1. For liquid samples, a value of 0.02 is recommended.

Assay Specific Settings

TSO500 Solid microsatellite instability is defined as all samples with "PercentageUnstableSites >= 20". It is generally recommended to use "PercentageUnstableSites" as metric for determining the MSI status. This metric is normalized, and is expected to be more consistent for different pipelines and with different input site files. The exact thresholds for other assays may still depend on the sample noise characteristics (PCR / UMI etc) and may need some empirical calibration.

Sample Type
Assay
Microsatelitte file
Specific Settings
PercentageUnstableSites Threshold

Solid

TSO500

Part of TSO500 resource bundle. Repeats 10 - 50. 130 sites.

msi-distance-threshold=0.1

20

Heme

TSO500

N/A

N/A

N/A

Liquid (cfDNA)

TSO500

Part of TSO500 resource bundle. Repeats 6,7. 2344 sites.

msi-distance-threshold=0.02

TBD

Solid, Heme

WES

Available for download. Repeats 10 - 50. Approx. 3.5K sites.

msi-distance-threshold=0.1

TBD

Liquid (cfDNA)

WES

Available for download. Repeats 10 - 50. Approx. 3.5K sites.

msi-distance-threshold=0.02

TBD

Solid, Heme

WGS

Available for download. Repeats 10 - 50.Approx. 1 mil sites.

msi-distance-threshold=0.1

TBD

Liquid (cfDNA)

WGS

Available for download. Repeats 10 - 50. Approx. 1 mil sites.

msi-distance-threshold=0.02

TBD

Default Microsatellite sites files

The following is an example of a microsatellite file:

#chromosome     location        repeat_unit_length      repeat_unit_binary      repeat_times    left_flank_binary       right_flank_binary      repeat_unit_bases       left_flank_bases    right_flank_bases
chr1	985443	1	2	15	676	992	G	GGGCA	TTGAA
chr1	7980985	1	0	10	231	1020	A	ATGCT	TTTTA
chr1	8022800	1	3	19	13	41	T	AAATC	AAGGC
chr1	8029500	1	2	10	39	0	G	AAGCT	AAAAA
chr1	9146447	1	3	15	887	248	T	TCTCT	ATTGA
chr1	9767837	1	3	12	704	195	T	GTAAA	ATAAT

For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the panel of interest. This will avoid using any off-target reads in the MSI analysis.

Custom Microsatellite files

Custom Microsatellite site files may be required if a small panel is targeted and the default site files do not have sufficient overlapping sites.

Custom Microsatellite site files can be generated by using msi-sensor [https://github.com/xjtu-omics/msisensor-pro/wiki/Best-Practices].

msisensor-pro scan -d /path/to/reference.fa -o ${microsatellite_file}

A subsequent post-processing step is recommended:

  • only keep microsatellites sites with a repeat unit of length 1

  • keep sites with 10 - 50bp repeats (a max length of 100bp repeats is supported)

  • remove any sites containing Ns in the left or right anchors

  • downsample the remaining sites to contain at least 2000 sites, but no more than 1 million sites (to avoid excessive run time)

Please note an error would occur if long (>100bp) microsatellite sites are present in the file.

Normal references of miscrosatellite repeat distribution

Normal reference files can be generated by running collect-evidence mode on a panel of normal samples.

dragen -f \
--msi-command collect-evidence \
--ref-dir ${reference_directory} \
--msi-microsatellites-file ${microsatellite_file} \
--msi-coverage-threshold 60 \
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
-1 ${normal_fq1} \
-2 ${normal_fq2}

Please note:

  • The collect-evidence mode MUST be run in DRAGEN germline mode.

  • The --msi-microsatellites-file and --msi-coverage-threshold settings used in collect-evidence mode must be consistent with the settings used during tumor-only MSI calling.

  • At least 20 normal samples are required.

MSI Output

The output containing MSI score (PecentageUnstableSites) are stored in <output prefix>.microsat_output.json.

"TotalMicrosatelliteSitesAssessed": "20020",
"TotalMicrosatelliteSitesUnstable": "4374",
"PecentageUnstableSites": "21.850000000000001",
"ResultIsValid": "true",
"ResultMessage": "",
"SumDistance": "1214.174" 

The "SumDistance" is the sum of Jensen-Shannon distance of all unstable sites based on distances of T vs N distributions. The "sumDistace" depends on the size of microsatellite file, and is not normalized. In general it is recommended to set MSI thresholds based on "PecentageUnstableSites" rather than "SumDistance".

In TSO500, Solid microsatellite instability is defined as all samples with "PercentageUnstableSites >= 20". The exact thresholds for other assays with different site files and noise characteristics may need some empirical calibration.

MSI Debugging Output

There are two other output files (*_diffs.txt and *.dist) that are useful for debugging.

Here is an example of *_diffs.txt file

#Chromosome	Start	RepeatUnit	Assessed	Distance	PValue	PassFilter
chr1	69106	T	true	0.04105300052	0.4786448589	true
chr1	69116	TC	false	0	0	false

The fourth column (Assessed) is the coverage filter. Any site with coverage >= 60 is true for this column

The sixth column (PassFilter) is an internal flag used for left allele filter. It removes low quality sites that has no coverage and helps to increase prediction accuracy. It's true when the following conditions are met.

1. read count for [reference repeat length] > 0
2. read count for [reference repeat length - 1] >= 0
3. the ratio of read count for [reference repeat length - 1] and [reference repeat length] >= 0

The *.dist file stores the read counts for each repeat length of the microsatellite site

#chromosome	location	repeat_unit_bases	reference_allele	covered	length_distribution
chr1	69106	T	5	true	0,0,0,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

The coverage of the site can be obtained by summing up all counts in the last column

MSI Verbose Output

    1. <output prefix>.microsat_output.json (described above)

    1. <output prefix>.microsat_tumor.dist. This file contains the repeat length array for every microsatellite.

#chromosome     location        repeat_unit_bases       reference_allele        covered length_dis
chr1    16200729        T       10      true    0,0,0,0,0,0,0,0,0,118,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
chr1    40361307        A       10      true    0,0,0,0,0,0,2,0,2,95,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Column length_dis is the repeat length array.

    1. <output prefix>.microsat_diffs.txt. This file contains the distance metrics for every microsatellite between tumor/normal or tumor/reference normals.

#Chromosome     Start   RepeatUnit      Assessed        Distance        PValue  PassFilter
chr1    16200729        T       true    0.03348841224   0.002411104562  true
chr1    40361307        A       true    0.0406985608    0.0006306633961 true
chr1    156842471       T       false   0       0       true
chr1    239881908       T       true    0.003136536956  0.5983661726    true

Column Assessed indicates if a site passes the coverage filter (msi-coverage-threshold). Column PassFilter is an internal metric and currently is not used for filtering microsatellites.

MSI Algorithm

The MSI algorithm performs the following steps:

  1. Tabulates tumor and normal counts from the read alignments for each microsatellite site.

  2. Calculates Jensen-Shannon distance of tumor and normal distribution for each microsatellite site (tumor-normal mode), or Jensen-Shannon distance of two normal baseline samples (tumor-only mode).

  3. Determines unstable sites by performing chi-square testing of tumor and normal distribution. Unstable sites have repeat length distributions that are significantly shifted between tumor and normal measured by Jensen-Shannon distance (tumor-normal mode). In tumor-only mode, JSD is calculated for each pair of tumor and normal reference samples, as well as each pair of normal-normal samples. Then the two sets of JSD is compared to derive a mean distance difference and p-value calculated from student t-test. Microsatellite instability is called if the mean distance difference is greater than or equal to the distance threshold (default 0.1) and p-value less than or equal to the p-value threshold (default 0.01).

  4. Produces a report given assessed site count, unstable site count, the percentage of unstable sites in all assessed sites and the sum of the Jensen-Shannon distance of all the unstable sites.

PreviousTumor Mutational BurdenNextHomologous Recombination Deficiency

Last updated 11 months ago

Was this helpful?

Default WES and WGS Microsatellite site files can be downloaded here:

DRAGEN Software Support Site page