# Correlation

Correlation analysis tests the relationship between two numeric variables and determines the strength and direction of their association. Connected Multiomics provides linear correlation (Pearson's correlation) and rank correlation (Spearman's rank correlation and Kendall's tau correlation).

Correlation can be computed in four formats:

* [Feature many-to-one correlation](#many-to-one)
* [Similarity matrix](#similarity-matrix)
* [Correlation across assays](#correlation-across-assays)
* [Sample correlation plot](#sample-correlation-plot)

We recommend normalizing your data prior to running *Correlation analysis*, but it can be invoked on any counts data node.

* Click the counts data node
* Click the **Statistics** section in the toolbox
* Click **Correlation**
* Choose the method to use for correlation analysis

<figure><img src="/files/ICjnaTHY05KA0G1btyOp" alt=""><figcaption></figcaption></figure>

## Feature many-to-one correlation <a href="#many-to-one" id="many-to-one"></a>

When you select the *Feature many-to-one correlation* option and click **Next**, the task correlates each selected numeric attribute and/or feature with every feature in the input data node, one pair at a time.

<figure><img src="/files/WcYJxFhXQ2ov1zKp0HRO" alt=""><figcaption></figcaption></figure>

When multiple numeric factors are added, each factor is correlated with each feature in the data node independently. If you are interested in particular features, use the **Search features** box to add one or more.

* Select factor(s) or feature(s) and click **Add factors** to include them in the statistical test
* Click **Next**
* Optionally, apply a lowest coverage filter or configure the advanced settings
* Click **Finish** to run

*Correlation analysis* produces a *Correlation* data node; double-click it to open the task report. The report is similar to the [ANOVA/LIMMA-trend/LIMMA-voom](/icm/analyses/analysis-functionality/task-menu/statistics/differential-analysis/anova-limma-trend-limma-voom.md) and [GSA](/icm/analyses/analysis-functionality/task-menu/statistics/differential-analysis/gsa.md) task reports and includes a table with features on rows and statistical results on columns.

For each numeric attribute, the table includes a p-value column, adjusted p-value columns (FDR step up and/or Storey q-value, if included), and a partial correlation value.

<figure><img src="/files/M04RRpvfSo2Xnfcxnhcf" alt=""><figcaption></figcaption></figure>

#### Correlation analysis advanced options

<div align="left"><figure><img src="/files/MlnbDlC41j52S2HULc7F" alt=""><figcaption></figcaption></figure></div>

#### Multiple test correction

Multiple test correction can be performed on the p-values of each comparison, with **FDR step-up** being the default. If you check *Storey q-value*, an extra column with q-values is added to the report.
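To make the adjustment concrete, here is a minimal sketch of a step-up FDR correction, assuming that **FDR step-up** refers to the Benjamini-Hochberg procedure. The function name `fdr_step_up` and the example p-values are illustrative only; the product's exact computation may differ.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values, one per feature
adjusted = fdr_step_up(p)
print(adjusted)
```

A feature is then called significant when its adjusted p-value falls below the chosen FDR threshold.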

#### Use only reliable estimation results

Sometimes a model estimation procedure does not fail outright but still encounters difficulties. In such cases, a p-value and fold change may still be generated for the comparisons, but they are not reliable and can be misleading. Therefore, *Use only reliable estimation results* is set to **Yes** by default.

#### Correlation type

Sets the type of correlation used to calculate the correlation coefficient and p-value. Options are *Pearson (linear)*, *Spearman (rank)*, and *Kendall (tau)*. Default is **Pearson (linear)**.
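The difference between the three types is easiest to see on data that is monotonic but not linear. The sketch below uses textbook definitions written in plain Python; the helper names are hypothetical, the Kendall variant shown is tau-a (no tie adjustment), and none of this is claimed to match the product's internal implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman (rank) correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # y = x squared: monotonic, not linear
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

On this toy input, both rank-based coefficients reach 1 because the ordering of `y` matches the ordering of `x` exactly, while the Pearson coefficient stays slightly below 1 because the relationship is curved.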

## Similarity matrix <a href="#similarity-matrix" id="similarity-matrix"></a>

The *Similarity matrix* task is only available on bulk count matrix data nodes. It computes the correlation of every sample (or feature) with every other sample (or feature). The result is a matrix with the same set of samples (or features) on rows and columns; each value in the matrix is the correlation coefficient, r.

<div align="left"><figure><img src="/files/qEKm9TeNUBhFBlvcVXln" alt=""><figcaption></figcaption></figure></div>

Select whether the computation is performed on samples or features, then select the correlation method:

Pearson: linear correlation: ![](/files/1EQrrc9dLKY7ZUBgpDqI)

Spearman: rank correlation: ![](/files/lj9CW1cqz0kusypP68V7)

Kendall: rank correlation: ![](/files/z171VNV0pM7DMRiO67iH)

Click **Finish** to run the task. The output report of this task can be displayed as a heatmap and/or table in the data viewer.

## Correlation across assays

*Correlation across assays* should be used to perform correlation analysis across different modalities (e.g. ATAC-Seq enriched regions vs. RNA-Seq expression) for multiomics data analysis. It correlates every feature in one assay with every feature in the other assay. To reduce computation, we recommend filtering both count matrix data nodes so that they include only the features of interest.

* Select the data node to be compared to the node that the task has been invoked from using the **Select data node** button
* Modify any parameters
* Click **Finish**

<figure><img src="/files/GoBhwNLsyd1vgxzVxnGw" alt=""><figcaption></figcaption></figure>

#### Correlation and similarity measures

*Features within same chromosome*: this option restricts the comparison to pairs of features located on the same chromosome

*All features in one data node vs all features in the other data node*: this option will perform the comparison using all combinations without location constraint
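The two options differ only in which feature pairs are considered, which is why the same-chromosome option can cut the amount of computation considerably. The sketch below enumerates candidate pairs under each option; the feature names, chromosome annotations, and the `candidate_pairs` helper are hypothetical.

```python
from itertools import product

# Hypothetical annotations: feature name -> chromosome, one dict per data node.
atac_peaks = {"peak1": "chr1", "peak2": "chr2"}
rna_genes = {"geneA": "chr1", "geneB": "chr1", "geneC": "chr2"}


def candidate_pairs(first, second, same_chromosome_only):
    """List the (feature, feature) pairs to correlate across two data nodes."""
    return [
        (a, b)
        for a, b in product(first, second)
        if not same_chromosome_only or first[a] == second[b]
    ]


all_pairs = candidate_pairs(atac_peaks, rna_genes, same_chromosome_only=False)
chrom_pairs = candidate_pairs(atac_peaks, rna_genes, same_chromosome_only=True)
print(len(all_pairs), "pairs without constraint")
print(len(chrom_pairs), "pairs within the same chromosome:", chrom_pairs)
```

With 2 peaks and 3 genes, the unconstrained option yields all 6 combinations, while the same-chromosome option keeps only the 3 pairs that share a chromosome.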

#### Report correlation pairs

*P-value*: select a cutoff value for significance; only those pairs that meet the criterion will be reported

*abs(Correlation coefficient)*: select a cutoff for reporting the absolute value of the correlation coefficient (represented by the symbol r) where a perfect relationship is 1 and no relationship is 0
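The effect of the two reporting cutoffs can be sketched as a simple filter over computed pairs. The pair list, the `report_pairs` helper, and the cutoff values below are hypothetical; the sketch also assumes that a pair must satisfy both cutoffs and that the comparisons are inclusive, which the text above does not state.

```python
# Hypothetical precomputed results: (feature 1, feature 2, r, p-value).
pairs = [
    ("peak1", "geneA", 0.92, 0.001),
    ("peak1", "geneB", -0.85, 0.004),
    ("peak2", "geneC", 0.10, 0.700),
    ("peak2", "geneD", -0.60, 0.080),
]


def report_pairs(pairs, max_p_value, min_abs_r):
    """Keep pairs meeting both cutoffs (assumed: combined with AND, inclusive)."""
    return [
        pair
        for pair in pairs
        if pair[3] <= max_p_value and abs(pair[2]) >= min_abs_r
    ]


reported = report_pairs(pairs, max_p_value=0.05, min_abs_r=0.5)
for first, second, r, p_value in reported:
    print(first, second, r, p_value)
```

Because the cutoff applies to the absolute value of r, the strongly negative pair (`peak1`, `geneB`) is reported alongside the strongly positive one.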

*Correlation across assays* produces a *Correlation pair list* data node; double-click to open the table. The table can be sorted and filtered using the column titles.

<figure><img src="/files/JAB3n8rsOK8ERZxaueIl" alt=""><figcaption></figcaption></figure>

Click <img src="/files/mhG6Lhc3jNGN0mCFb8fK" alt="" data-size="line"> *View correlation plot* to open the correlation plot for each feature pair.

## Sample correlation plot

The sample correlation plot is a data visualization used to compare the values of all features between two samples. Sample correlation can be performed on any count matrix data node, whether it contains raw counts or normalized counts. When the *Sample correlation* page opens, you will be asked to select two samples for comparison. The sample in the top box will be shown on the X-axis, while the sample in the bottom box will be shown on the Y-axis. Click on the sample names to select a different sample and then click **Apply**.

<figure><img src="/files/z6sMZrYpsKzfwz48FZrj" alt=""><figcaption></figcaption></figure>

A scatterplot is displayed on the right. Each dot is a feature (gene/transcript/protein), and its expression values in the two samples can be read off the coordinate axes, in the same units as present in the data node. The Pearson correlation coefficient and regression slope are displayed in the upper-right corner of the plot.
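For readers who want to reproduce the two summary numbers outside the viewer, the sketch below computes a Pearson coefficient and a regression slope for two samples. It assumes the slope comes from an ordinary least-squares fit of the Y-axis sample on the X-axis sample, which the page does not specify; the function name and values are hypothetical.

```python
from math import sqrt


def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y regressed on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical expression values for the same four features in two samples.
sample_x = [1.0, 2.0, 3.0, 4.0]  # sample chosen for the X-axis
sample_y = [2.1, 3.9, 6.2, 7.8]  # sample chosen for the Y-axis

r, slope = pearson_and_slope(sample_x, sample_y)
print(f"r = {r:.4f}, slope = {slope:.2f}")
```

The coefficient describes how tightly the dots follow a line, while the slope describes how values in the Y-axis sample scale relative to the X-axis sample.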

