# Compare clusters

## What is Compare clusters?

Compare clusters is a tool for identifying the optimal number of clusters for K-means clustering using the Davies-Bouldin index. The Davies-Bouldin index is a measure of cluster quality in which a lower value indicates better clustering: points within each cluster lie close together (tight clusters) and the clusters themselves lie far apart (distinct clusters).
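To make the "tight and distinct" idea concrete, here is a minimal sketch of the textbook Davies-Bouldin index in plain Python. It assumes Euclidean distance and centroid-based scatter; the function names are illustrative, and the exact computation used by the *Compare clusters* task may differ.

```python
from collections import defaultdict
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(points, labels):
    """Textbook Davies-Bouldin index (assumes Euclidean distance).

    For each cluster, take the worst (largest) ratio of combined
    within-cluster scatter to between-centroid distance against any
    other cluster, then average those worst-case ratios.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    worst_ratios = []
    for i in clusters:
        # Note: raises ZeroDivisionError if two centroids coincide.
        worst_ratios.append(
            max(
                (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                for j in clusters
                if j != i
            )
        )
    return sum(worst_ratios) / len(worst_ratios)


# Two well-separated pairs of points, labeled two different ways.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
tight_labels = [0, 0, 1, 1]   # groups nearby points together
mixed_labels = [0, 1, 0, 1]   # groups distant points together
print(davies_bouldin(points, tight_labels) < davies_bouldin(points, mixed_labels))
```

The labeling that groups nearby points together yields the lower index, which is why a lower value is read as better clustering.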

## Running Compare clusters

We recommend normalizing your data prior to running *Compare clusters*, but the task will run on any counts data node.

* Click the counts data node
* Click the **Exploratory analysis** section of the toolbox
* Click **Compare clusters**
* Configure the parameters
* Click **Finish** to run

<div align="left"><figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-0749185fd12f9e4f97bacc53be8fb79cd4d60525%2Fimage%20(165).png?alt=media" alt=""><figcaption></figcaption></figure></div>

The parameters for *Compare clusters* are the same as for [*K-means clustering*](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/k-means-clustering).

## Compare clusters task report

The *Compare clusters* task report is a line chart with the number of clusters on the x-axis and the Davies-Bouldin index on the y-axis.

<figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-baf8b328ea3528ceb44ca01e8adce9620a329e5c%2Fimage%20(166).png?alt=media" alt=""><figcaption></figcaption></figure>

The *Compare clusters* task report can be used to run *K-means clustering*.

* Click a point on the plot to select it, or type the number of clusters in the *Partition data into clusters* text box

Selecting a point sets the number of clusters into which the data will be partitioned. By default, the number of clusters with the lowest Davies-Bouldin index value is selected.
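The default selection rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the index values are made up for the example, and resolving ties toward the smaller number of clusters is an assumption rather than documented behavior.

```python
def default_cluster_count(index_by_k):
    """Return the number of clusters with the lowest Davies-Bouldin index.

    `index_by_k` maps a candidate number of clusters to its index value.
    Ties go to the smaller number of clusters (an assumption made here).
    """
    if not index_by_k:
        raise ValueError("no candidate cluster counts were provided")
    # Iterating over sorted keys makes min() return the smallest k on a tie.
    return min(sorted(index_by_k), key=index_by_k.__getitem__)


# Made-up index values, purely for illustration.
example_scores = {2: 1.4, 3: 0.9, 4: 1.1, 5: 1.3}
print(default_cluster_count(example_scores))
```

With these illustrative values, the lowest index belongs to three clusters, so that is the value the sketch would preselect.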

* Click **Generate clusters** to run *K-means clustering* with the selected number of clusters

A *K-means clustering* task node and a *Clustering result* data node are produced. See the [*K-means clustering*](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/k-means-clustering) documentation for more details.
