Correlation

Correlation analysis tests the relationship between two numeric variables and determines the strength and direction of their association. Connected Multiomics provides linear correlation (Pearson's correlation) and rank correlation (Spearman's rank correlation and Kendall's tau correlation).

Correlation can be computed in four formats: Feature many-to-one correlation, Similarity matrix, Correlation across assays, and Sample correlation plot.

We recommend normalizing your data before running Correlation analysis, but the task can be invoked on any counts data node.

  • Click the counts data node

  • Click the Statistics section in the toolbox

  • Click Correlation

  • Choose the method to use for correlation analysis

Feature many-to-one correlation

When you select the Feature many-to-one correlation option and click Next, the task correlates each selected numeric attribute and/or feature with every feature in the input data node, one pair at a time.

When multiple numeric factors are added, each factor is correlated with each feature in the data node independently. If you are interested in particular features, use the Search features box to add one or more.

  • Select factor(s) or feature(s) and click Add factors to include in the statistical test.

  • Click Next

  • Optionally, apply a lowest coverage filter or configure the advanced settings

  • Click Finish to run

Correlation analysis produces a Correlation data node; double-click it to open the task report, which is similar to the ANOVA/LIMMA-trend/LIMMA-voom and GSA task reports and includes a table with features on rows and statistical results on columns.

For each numeric attribute, the report includes a p-value column, adjusted p-value columns (FDR step-up and/or Storey q-value, if included), and a partial correlation value.
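The many-to-one pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Connected Multiomics implementation: the attribute `age`, the feature names, and the values are all hypothetical, the coefficient is a plain Pearson correlation rather than a partial correlation, and p-values are omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical numeric attribute: one value per sample.
age = [30, 40, 50, 60, 70]

# Hypothetical normalized counts: feature name -> one value per sample,
# in the same sample order as the attribute.
counts = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with age
    "GENE_B": [9.0, 7.0, 5.0, 3.0, 1.0],  # falls with age
    "GENE_C": [2.0, 5.0, 1.0, 5.0, 2.0],  # no linear trend
}

# Many-to-one: the single attribute is paired with each feature in turn.
results = {feature: pearson(age, values) for feature, values in counts.items()}

for feature, r in results.items():
    print(f"{feature}\t{r:+.3f}")
```

Each feature gets its own row of results, which mirrors the layout of the task report (features on rows, statistics on columns).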

Correlation analysis advanced options

Multiple test correction

Multiple test correction can be performed on the p-values of each comparison, with FDR step-up being the default. If you check Storey q-value, an extra column with q-values is added to the report.
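To make the adjustment concrete, here is a minimal sketch of a step-up FDR correction. It assumes that FDR step-up refers to the Benjamini-Hochberg procedure; the function name `fdr_step_up` and the example p-values are made up for illustration, and the software's own implementation may differ in detail.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices of the p-values, from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # rank n for the largest p-value, 1 for the smallest
        candidate = p_values[index] * n / rank
        # Taking the running minimum keeps adjusted values monotone in rank.
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values, one per feature.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = fdr_step_up(p_values)

for raw, adj in zip(p_values, adjusted):
    print(f"raw={raw:.3f}\tadjusted={adj:.3f}")
```

An adjusted p-value is never smaller than the raw p-value it was derived from, so a feature that fails the raw cutoff also fails the adjusted one.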

Use only reliable estimation results

There are situations when a model estimation procedure does not fail outright but still encounters some difficulties. In this case, it can still generate p-values and fold changes for the comparisons, but they are not reliable, i.e. they can be misleading. Therefore, Use only reliable estimation results is set to Yes by default.

Correlation type

Sets the type of correlation used to calculate the correlation coefficient and p-value. Options are Pearson (linear), Spearman (rank), and Kendall (tau). The default is Pearson (linear).
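The difference between the three options is easiest to see on data that is monotone but not linear. The sketch below uses textbook definitions and is not taken from the software: Spearman is computed as Pearson on ranks, and Kendall is the simple tau-a variant without tie correction, which is an assumption because the source does not state which Kendall variant is used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs; no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)

# Hypothetical data: y increases with x, but along a cubic curve.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_kendall = kendall_tau_a(x, y)
print(r_pearson, r_spearman, r_kendall)
```

Because the ordering of `y` matches the ordering of `x` exactly, both rank methods report a perfect association, while Pearson falls short of 1 because the relationship is not a straight line.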

Similarity matrix

The Similarity matrix task is only available on bulk count matrix data nodes. It computes the correlation of every sample or feature against every other sample or feature. The result is a matrix with the same set of samples or features on rows and columns; each value in the matrix is the correlation coefficient r.

Select whether the computation is on samples or features, and choose the correlation method:

  • Pearson: linear correlation

  • Spearman: rank correlation

  • Kendall: rank correlation

Click Finish to run the task. The output report of this task can be displayed as a heatmap and/or table in the data viewer.
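As a rough illustration of what a sample-by-sample similarity matrix contains, the following sketch builds one with Pearson correlation. The sample names and counts are hypothetical, and the code only demonstrates the shape of the result, not how the task is implemented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical bulk counts: sample name -> one value per feature,
# with the features in the same order for every sample.
samples = {
    "S1": [10.0, 20.0, 30.0, 40.0],
    "S2": [12.0, 19.0, 33.0, 41.0],
    "S3": [40.0, 30.0, 20.0, 10.0],
}

names = list(samples)

# Same set of samples on rows and columns; each cell is r for that pair.
matrix = [[pearson(samples[a], samples[b]) for b in names] for a in names]

print("\t" + "\t".join(names))
for name, row in zip(names, matrix):
    print(name + "\t" + "\t".join(f"{r:+.2f}" for r in row))
```

The matrix is symmetric with 1 on the diagonal, since every sample correlates perfectly with itself. Computing on features instead of samples only requires transposing the input.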

Correlation across assays

Use Correlation across assays to perform correlation analysis across different modalities (e.g. ATAC-Seq enriched regions vs. RNA-Seq expression) in multiomics data analysis. It correlates every feature in one assay with every feature in the other assay. We recommend filtering the two count matrix data nodes to include only the features of interest, which reduces the computation.

  • Select the data node to be compared to the node that the task has been invoked from using the Select data node button

  • Modify any parameters

  • Click Finish

Correlation and similarity measures

Features within same chromosome: this option restricts the comparison to pairs of features located on the same chromosome

All features in one data node vs all features in the other data node: this option will perform the comparison using all combinations without location constraint

Report correlation pairs

P-value: select a cut-off value for significance; only pairs that meet the criterion will be reported

abs(Correlation coefficient): select a cutoff for reporting the absolute value of the correlation coefficient (represented by the symbol r) where a perfect relationship is 1 and no relationship is 0
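The effect of these options can be sketched as a nested loop over the two assays. Everything here is hypothetical: the peak and gene names, the values, the function `correlate_assays`, and its parameters `same_chromosome` and `min_abs_r`. The p-value cutoff is left out to keep the example short, and the two assays are assumed to list their values in the same sample order.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical assays: feature name -> (chromosome, one value per sample).
peaks = {
    "peak_1": ("chr1", [1.0, 2.0, 3.0, 4.0]),
    "peak_2": ("chr2", [4.0, 3.0, 2.0, 1.0]),
}
genes = {
    "GENE_A": ("chr1", [2.0, 4.0, 6.0, 8.0]),
    "GENE_B": ("chr1", [1.0, 3.0, 2.0, 3.0]),
    "GENE_C": ("chr2", [1.0, 2.0, 3.0, 4.0]),
}

def correlate_assays(assay_a, assay_b, same_chromosome=True, min_abs_r=0.8):
    """Return (feature_a, feature_b, r) for every pair that passes the filters."""
    pairs = []
    for name_a, (chrom_a, values_a) in assay_a.items():
        for name_b, (chrom_b, values_b) in assay_b.items():
            # Optional location constraint: skip pairs on different chromosomes.
            if same_chromosome and chrom_a != chrom_b:
                continue
            r = pearson(values_a, values_b)
            # Report only pairs whose absolute correlation meets the cutoff.
            if abs(r) >= min_abs_r:
                pairs.append((name_a, name_b, r))
    return pairs

within = correlate_assays(peaks, genes, same_chromosome=True)
everything = correlate_assays(peaks, genes, same_chromosome=False)

for name_a, name_b, r in within:
    print(f"{name_a}\t{name_b}\t{r:+.2f}")
```

The number of comparisons grows with the product of the two feature counts, which is why filtering both data nodes to the features of interest beforehand is worthwhile.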

Correlation across assays produces a Correlation pair list data node; double-click it to open the table. The table can be sorted and filtered using the column titles.

Click View correlation plot to open the correlation plot for each feature pair.

Sample correlation plot

Sample correlation plot is a data visualization that compares all the features between two samples. Sample correlation can be performed on any count matrix data node, whether it contains raw or normalized counts. When the Sample correlation page opens, you will be asked to select two samples for comparison. The sample in the top box will be shown on the X-axis, while the sample in the bottom box will be shown on the Y-axis. Click on the sample names to select a different sample, then click Apply.

A scatterplot is displayed on the right. Each dot is a feature (gene/transcript/protein), and its expression values in the two samples can be read off the coordinate axes, in the same units as present in the data node. The Pearson correlation coefficient and regression slope are displayed in the upper-right corner of the plot.
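The two statistics shown on the plot can be reproduced by hand for a pair of samples. The sketch below assumes the slope comes from an ordinary least-squares regression of the Y-axis sample on the X-axis sample, which the source does not state explicitly; the sample values and the function name `pearson_and_slope` are hypothetical.

```python
import math

def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y regressed on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope

# Hypothetical expression values: one entry per feature, same feature order.
sample_x = [1.0, 2.0, 3.0, 4.0, 5.0]   # sample chosen in the top box (X-axis)
sample_y = [3.0, 5.0, 7.0, 9.0, 11.0]  # sample chosen in the bottom box (Y-axis)

r, slope = pearson_and_slope(sample_x, sample_y)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```

The coefficient r describes how tightly the dots follow a line, while the slope describes how steep that line is, so two samples can be highly correlated and still differ in overall scale.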
