Correlation

Partek Flow offers a wide variety of tools to help you explore your data. Which tools are available depends on the type of data node selected.

Correlation analysis
Sample Correlation
Similarity matrix

Correlation analysis

What is Correlation analysis?
Running Correlation analysis
Feature many-to-one correlation
- Correlation analysis advanced options
Correlation across assays
- Correlation across assays analysis options

What is Correlation analysis?

Correlation analysis is a statistical test that lets you rank features by their correlation with numeric attributes using Pearson (linear), Spearman (rank), or Kendall (tau) correlation.
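As a rough illustration of what the three coefficients measure, the sketch below computes them with the Python standard library and ranks features by the absolute value of their Pearson correlation with one numeric attribute. The function names, the toy expression values, and the age attribute are hypothetical. This is not Partek Flow's implementation, and the Kendall variant shown (tau-a) assumes there are no tied values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman (rank) correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs, no tie handling."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


# Hypothetical data: one numeric attribute and three features over five samples.
age = [21, 34, 45, 52, 60]
features = {
    "geneA": [1.0, 2.1, 2.9, 4.2, 5.0],  # rises with age
    "geneB": [9.0, 7.5, 6.1, 3.9, 2.0],  # falls with age
    "geneC": [3.0, 1.0, 4.0, 1.5, 3.2],  # no clear trend
}

# Rank features from most to least correlated with the attribute.
ranked = sorted(features, key=lambda g: abs(pearson(features[g], age)), reverse=True)
```

Swapping pearson for spearman or kendall_tau_a in the last line changes the ranking criterion without changing the rest of the sketch.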

Running Correlation analysis

We recommend normalizing your data prior to running Correlation analysis, but it can be invoked on any counts data node.

Click the counts data node
Click the Statistics section in the toolbox
Click Correlation
Choose the method to use for correlation analysis (Figure 1)

Feature many-to-one correlation

When multiple numeric factors are added, the correlation analysis correlates each factor with each feature in the data node independently. If you are interested in particular features, use the Search features box to add one or more.

Select the factors and interactions to include in the statistical test (Figure 2).

Click Next
Optionally, apply a lowest coverage filter or configure the advanced settings
Click Finish to run

Correlation analysis produces a Correlation data node. Double-click it to open the task report (Figure 3), which is similar to the ANOVA/LIMMA-trend/LIMMA-voom and GSA task reports and includes a table with features on rows and statistical results on columns.

For each numeric attribute, the report includes a p-value column, adjusted p-value columns (FDR step up and/or Storey q-value, if included), and a partial correlation value. Each interaction has p-value and adjusted p-value columns (FDR step up and/or Storey q-value, if included).
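How the partial correlation value is computed is not described here. As an illustration only, the first-order partial correlation between a feature x and an attribute y, controlling for one other attribute z, can be derived from the three pairwise Pearson coefficients. The function names and data below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x and y both follow z, plus unrelated noise.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [2, -1, 6, 3]

plain = pearson(x, y)                   # driven by the shared dependence on z
adjusted = partial_correlation(x, y, z)  # close to 0 once z is accounted for
```

In this toy data the plain correlation is positive only because both variables track z, which is the situation a partial correlation is meant to expose.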

Correlation analysis advanced options

Low value filter

The low value filter lets you specify criteria for excluding features that do not meet the requirements for the calculation. If a filter features task was performed in the upstream analysis, the filter defaults to None; otherwise, the default is Lowest average coverage set to 1.

Lowest average coverage: the computation will exclude a feature if its geometric mean across all samples is below the specified value

Lowest maximum coverage: the computation will exclude a feature if its maximum across all samples is below the specified value

Minimum coverage: the computation will exclude a feature if its sum across all samples is below the specified value

None: include all features in the computation
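The four options can be sketched as a single filtering function. The function names, option keys, and toy counts are hypothetical, and treating any zero count as giving a geometric mean of 0 is an assumption; how Partek Flow handles zeros is not stated here.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean; assumption: any zero count makes the result 0."""
    if any(v == 0 for v in values):
        return 0.0
    return exp(sum(log(v) for v in values) / len(values))


# Statistic computed across all samples for each filter option.
FILTERS = {
    "lowest_average_coverage": geometric_mean,
    "lowest_maximum_coverage": max,
    "minimum_coverage": sum,
}


def apply_low_value_filter(counts, criterion=None, threshold=1.0):
    """Return the features that pass; criterion=None keeps all features."""
    if criterion is None:
        return dict(counts)
    statistic = FILTERS[criterion]
    # A feature is excluded only if its statistic is below the threshold.
    return {f: v for f, v in counts.items() if statistic(v) >= threshold}


# Hypothetical counts: feature -> values across four samples.
counts = {
    "geneA": [12, 30, 25, 18],
    "geneB": [0, 1, 0, 2],
    "geneC": [1, 1, 1, 1],
}
kept = apply_low_value_filter(counts, "lowest_average_coverage", threshold=1.0)
```

With the default-like setting shown, geneB is dropped and the other two features are kept.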

Multiple test correction

Multiple test correction can be performed on the p-values of each comparison, with FDR step-up being the default. If you check the Storey q-value, an extra column with q-values will be added to the report.
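FDR step-up is commonly understood to mean the Benjamini-Hochberg procedure. Assuming that is what is meant here, the adjusted p-values can be sketched as follows; the function name is hypothetical and the Storey q-value is not shown.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease
    # as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four features.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = fdr_step_up(p_values)
```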

Use only reliable estimation results

There are situations when a model estimation procedure does not fail outright but still encounters some difficulties. In such cases, it may still generate p-values and fold changes for the comparisons, but they are not reliable, i.e. they can be misleading. Therefore, Use only reliable estimation results defaults to Yes.

Correlation type

Sets the type of correlation used to calculate the correlation coefficient and p-value. Options are Pearson (linear), Spearman (rank), Kendall (tau). Default is Pearson (linear).

Correlation across assays

Correlation across assays should be used to perform correlation analysis across different modalities (e.g. ATAC-Seq enriched regions vs. RNA-Seq expression) for multiomics data analysis.

Use the Select data node button to choose the data node to compare with the node from which the task was invoked
Modify any parameters (Figure 4)
Click Finish

Correlation across assays analysis options

Correlation and similarity measures

Features within same chromosome: this option restricts comparisons to pairs of features located on the same chromosome

All features in one data node vs all features in the other data node: this option will perform the comparison using all combinations without location constraint
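The difference between the two options can be sketched as a choice of which feature pairs are generated for comparison. The feature records and field names below are hypothetical, and only chromosome identity is checked.

```python
from itertools import product


def feature_pairs(node_a, node_b, same_chromosome_only=True):
    """Yield (name_a, name_b) pairs to correlate across two data nodes."""
    for a, b in product(node_a, node_b):
        if same_chromosome_only and a["chrom"] != b["chrom"]:
            continue  # skip pairs on different chromosomes
        yield a["name"], b["name"]


# Hypothetical features: ATAC-Seq regions and RNA-Seq genes.
regions = [
    {"name": "peak1", "chrom": "chr1"},
    {"name": "peak2", "chrom": "chr2"},
]
genes = [
    {"name": "geneA", "chrom": "chr1"},
    {"name": "geneB", "chrom": "chr2"},
    {"name": "geneC", "chrom": "chr2"},
]

restricted = list(feature_pairs(regions, genes, same_chromosome_only=True))
everything = list(feature_pairs(regions, genes, same_chromosome_only=False))
```

The unrestricted option grows with the product of the two feature counts, so the same-chromosome option can reduce the number of comparisons considerably.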

Report correlation pairs

P-value: select a cut-off value for significance and only those pairs that meet the criteria will be reported

abs(Correlation coefficient): select a cutoff for reporting the absolute value of the correlation coefficient (represented by the symbol r) where a perfect relationship is 1 and no relationship is 0
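Applied to a list of pairs whose statistics are already computed, the two reporting cutoffs amount to a simple filter. The record layout, the cutoff values, and the choice to require both criteria at once are assumptions made for illustration.

```python
def report_pairs(pairs, max_p_value=0.05, min_abs_r=0.5):
    """Keep pairs that are significant and strongly correlated in either direction."""
    return [
        p for p in pairs
        if p["p_value"] <= max_p_value and abs(p["r"]) >= min_abs_r
    ]


# Hypothetical results for four region/gene pairs.
pairs = [
    {"pair": ("peak1", "geneA"), "r": 0.82, "p_value": 0.001},
    {"pair": ("peak2", "geneB"), "r": -0.74, "p_value": 0.010},
    {"pair": ("peak2", "geneC"), "r": 0.20, "p_value": 0.300},
    {"pair": ("peak1", "geneC"), "r": 0.55, "p_value": 0.080},
]
reported = report_pairs(pairs)
```

Because the cutoff is on the absolute value of r, the strongly negative pair is reported alongside the strongly positive one.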

Correlation across assays produces a Correlation pair list data node; double-click to open the table (Figure 5). The table can be sorted and filtered using the column titles.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Sample Correlation

A sample correlation plot is a data visualization used to compare a number of variables across two samples. A hypothesis underlying many gene expression experiments (next generation sequencing or microarray) is that most genes/transcripts are not differentially regulated between the conditions, so most of the data points fall on the diagonal (i.e. a regression line with a slope of 1). If that is not the case, a normalization method should be applied before the statistical analysis. You may therefore want to run sample correlation plots on your data set before and after normalization.

Sample correlation in Partek Flow can be performed after quantification by selecting a Gene counts or Transcript counts data node, or on a Normalized counts node if you want to assess the effect of normalization on the data. The Sample correlation option is in the Correlation section of the task menu (Figure 1). The task has no setup dialog and creates no task node; it launches immediately.

When the Sample correlation page opens, you will be asked to select two samples for comparison (Figure 2). The sample in the left box will be shown on the horizontal axis, while the sample in the right box will be shown on the vertical axis. Click on the sample names and then hit OK to proceed.

An example of the resulting scatterplot is in Figure 3. Each dot is a feature (gene/transcript) while the expression values in the two samples can be read off the coordinate axes, in the same units as present in the data node. For instance, if you normalized your RNA-seq data by transcripts per million (TPM), the coordinate axis will give you expression in TPMs. Pearson’s correlation coefficient and the slope of the regression line are in the upper left corner of the plot.
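The two numbers shown on the plot can be reproduced in a few lines. The sketch below assumes an ordinary least-squares fit with an intercept, which is not confirmed here, and uses hypothetical TPM values and function names.

```python
from math import sqrt


def correlation_and_slope(x, y):
    """Pearson's r and the least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical TPM values for the same five features in two samples.
sample_x = [1.0, 5.0, 10.0, 50.0, 100.0]
sample_y = [2.0, 10.0, 20.0, 100.0, 200.0]
r, slope = correlation_and_slope(sample_x, sample_y)
```

In this toy case the samples are perfectly correlated but the slope is 2 rather than 1, the kind of systematic scaling difference that normalization is meant to remove.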

To visualize a different pair of samples, select another sample from the X axis or Y axis list on the left and push Apply.

Similarity matrix

The Similarity matrix task is only available on bulk RNA-seq count matrix data nodes. It computes the correlation of every sample (or feature) with every other sample (or feature). The result is a matrix with the same set of samples (or features) on rows and columns; each value in the matrix is a correlation coefficient, r.

Click the Similarity matrix task in the Correlation section of the menu (Figure 1)

When the dialog opens, you will be asked to select whether the calculation is performed on samples or features (Figure 2). There are three correlation method options.

Click Finish to run the task. The output report of this task can be displayed as a heatmap and/or a table in the data viewer.
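The structure of the resulting matrix can be sketched as below for the sample-versus-sample case. Pearson correlation is used only as an example of one method, and the sample names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def similarity_matrix(columns):
    """Correlate every entry with every other; rows and columns share names."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical counts: sample -> values for the same four features.
samples = {
    "S1": [10.0, 20.0, 30.0, 40.0],
    "S2": [12.0, 19.0, 33.0, 41.0],
    "S3": [40.0, 30.0, 20.0, 10.0],
}
matrix = similarity_matrix(samples)
```

The matrix is symmetric with 1 on the diagonal; passing a feature-keyed dictionary instead would give the feature-versus-feature version.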

Correlation analysis

Feature many-to-one correlation

Each feature includes chromosome view, dot plot, correlation plot, and extra details buttons in the View column.

Correlation across assays

Correlation and similarity measures

Pearson: linear correlation

Spearman: rank correlation
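For reference, the standard textbook definitions of these two coefficients are given below. The notation (paired observations x_i and y_i over n samples, with means \bar{x} and \bar{y}) is chosen here for illustration and is not taken from Partek Flow; the Spearman coefficient is simply the Pearson coefficient computed on the ranks of the values.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
\rho = r\bigl(\operatorname{rank}(x),\ \operatorname{rank}(y)\bigr)
```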

Correlation across assays produces a Correlation pair list data node; double-click to open the table (Figure 5). The table can be sorted and filtered using the column titles.

Click View correlation plot to open the correlation plot for each comparison. Scroll to the bottom of the table to download the full table report.
