Pre-analysis tools

This section covers tools that are useful for preparing single cell data for downstream analysis, such as multi-sample comparison or multi-omics analysis. To invoke the Pre-analysis tools, click any Single cell counts data node. The section includes the following tasks:

Generate group cell counts
Pool cells
Split matrix
Hashtag demultiplexing
Merge matrices
Descriptive statistics
Spot clean

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Generate group cell counts

If a single cell data node contains cell attribute information, e.g., clustering results, classifications, or imported attributes, a counts-type data node containing the number of cells from each attribute group for each sample can be generated and used for downstream analysis.

To invoke Generate group cell counts:

Click a Single cell counts data node with cell-level attribute information
Click Pre-analysis tools in the toolbox
Click Generate group cell counts
Select the attribute to group the cells from the Group by drop-down menu (Figure 1)
Click Finish

A group cell counts node will be generated. The data node contains a matrix of cell counts in each sample for each group. You can view the counts results in the Group cell counts report (Figure 2).

The Group cell counts data node is a counts-type data node, so downstream analysis tasks, such as normalization, PCA, and ANOVA, can be used to analyze the group cell counts data.
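
The counting step can be sketched in a few lines of Python. This is only an illustration of the idea, not Partek Flow's implementation; the cell records, sample names, group names, and the group_cell_counts helper are all hypothetical.

```python
from collections import Counter

# Hypothetical cell-level attributes: one (sample, group) pair per cell.
cells = [
    ("Sample1", "T cells"),
    ("Sample1", "T cells"),
    ("Sample1", "B cells"),
    ("Sample2", "T cells"),
    ("Sample2", "B cells"),
    ("Sample2", "B cells"),
]


def group_cell_counts(cells):
    """Return {sample: {group: number of cells}}, with zeros filled in."""
    counts = Counter(cells)
    samples = sorted({sample for sample, _ in cells})
    groups = sorted({group for _, group in cells})
    return {s: {g: counts[(s, g)] for g in groups} for s in samples}


matrix = group_cell_counts(cells)
for sample, row in matrix.items():
    print(sample, row)
```

Each row of the result corresponds to a sample and each column to an attribute group, which mirrors the matrix described above.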

Pool cells

Running Pool cells
Output of Pool cells

Pool cells combines RNA-Seq data from all cells of a particular cell type classification for each sample. In essence, Pool cells creates virtual bulk RNA-Seq data from single cell RNA-Seq data. Because it is virtual bulk RNA-Seq data, all the same tasks that can be performed on bulk RNA-Seq gene counts data in Partek Flow can be performed on the output of Pool cells.

Pool cells makes it easy to compare gene expression for a cell type of interest between experimental groups.

Running Pool cells

Before running Pool cells, you must classify the cells. To run Pool cells, select the data node with your classified cells and select Pool cells from the QA/QC section of the task menu (Figure 1).

Options for Pool cells are Sum, Maximum, Mean, and Median. Expression values for cells from the same sample with the same cell type classification will be merged using the chosen pooling method (Figure 2). Sum is selected by default. After choosing a pooling method, select Finish to run the Pool cells task.
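
As a rough sketch of what pooling means, the Python below merges expression values for cells that share a sample and a cell type classification. It assumes a simple in-memory representation; the cell records, gene names, and the pool_cells helper are hypothetical and are not part of Partek Flow.

```python
from collections import defaultdict
from statistics import mean, median

# Pooling methods named in the text; Sum is the default.
POOLING_METHODS = {"Sum": sum, "Maximum": max, "Mean": mean, "Median": median}

# Hypothetical input: one record per cell with its sample, cell type
# classification, and expression value per gene.
cells = [
    {"sample": "S1", "cell_type": "T cell", "expr": {"CD3E": 4, "MS4A1": 0}},
    {"sample": "S1", "cell_type": "T cell", "expr": {"CD3E": 6, "MS4A1": 1}},
    {"sample": "S1", "cell_type": "B cell", "expr": {"CD3E": 0, "MS4A1": 9}},
    {"sample": "S2", "cell_type": "T cell", "expr": {"CD3E": 5, "MS4A1": 0}},
]


def pool_cells(cells, method="Sum"):
    """Pool expression per (cell type, sample) with the chosen method."""
    pool = POOLING_METHODS[method]
    grouped = defaultdict(lambda: defaultdict(list))
    for cell in cells:
        key = (cell["cell_type"], cell["sample"])
        for gene, value in cell["expr"].items():
            grouped[key][gene].append(value)
    return {
        key: {gene: pool(values) for gene, values in genes.items()}
        for key, genes in grouped.items()
    }


pooled = pool_cells(cells)  # default method: Sum
print(pooled[("T cell", "S1")])
```

With Sum, the two S1 T cells collapse into a single virtual bulk profile; choosing Mean, Median, or Maximum changes only the function applied to each list of values.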

Output of Pool cells

Pool cells generates a counts data node for each classified cell type in the data set (Figure 3).

Each counts data node is equivalent to simulated bulk RNA-Seq counts data for a cell type. The same tasks that can be performed on bulk RNA-Seq counts data can be performed on Pool cells output data nodes, including normalization, filtering, PCA, and differential expression analysis.

The counts data of a cell type for each sample can be downloaded by clicking the counts data node and selecting Download data from the task menu. The counts data text file lists each sample and its pooled counts values (sum, maximum, mean, or median) for each feature (gene/transcript) in alphabetical order (Figure 4).

Split matrix

Split matrix can be invoked on any counts data node with more than one feature type. For example, a CITE-Seq experiment would have Gene Expression counts and Antibody Capture counts in the single cell counts data node. Datasets generated by 10X Genomics' Feature Barcoding experiments also utilize this task to split different feature measurements for downstream analysis.

There are no parameters to configure. To run Split matrix:

Click the counts data node you want to split
Click the Pre-analysis tools section of the toolbox
Click Split matrix

The Split matrix task will run and generate output data nodes for each of the feature types. For example, if there are Antibody Capture and Gene Expression feature types in the input, Split matrix will generate two data nodes (Figure 1). Every sample is included in both matrices.
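
Conceptually, the split partitions the feature columns by feature type while keeping every cell in each output. The sketch below illustrates this under assumed data structures; the feature IDs, the feature_types mapping, and the split_matrix helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical feature annotation: feature ID -> feature type.
feature_types = {
    "CD3E": "Gene Expression",
    "MS4A1": "Gene Expression",
    "CD3_TotalSeqB": "Antibody Capture",
}

# Hypothetical counts: one row per cell, covering all feature types.
counts = {
    "cell_1": {"CD3E": 3, "MS4A1": 0, "CD3_TotalSeqB": 120},
    "cell_2": {"CD3E": 0, "MS4A1": 7, "CD3_TotalSeqB": 4},
}


def split_matrix(counts, feature_types):
    """Return one matrix per feature type; every cell appears in each."""
    split = defaultdict(dict)
    all_types = set(feature_types.values())
    for cell, row in counts.items():
        by_type = defaultdict(dict)
        for feature, value in row.items():
            by_type[feature_types[feature]][feature] = value
        for ftype in all_types:
            split[ftype][cell] = dict(by_type[ftype])
    return dict(split)


matrices = split_matrix(counts, feature_types)
for ftype, matrix in matrices.items():
    print(ftype, matrix)
```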

Hashtag demultiplexing

Cell Hashing enables sample multiplexing and super-loading in single cell RNA-Seq isolation by labeling each sample with a sample-specific oligo-tagged antibody against a ubiquitously expressed cell surface protein.

The Hashtag demultiplexing task is an implementation of the algorithm used in Stoeckius et al. 2018 [1] for demultiplexing cell hashing data. The task adds the cell-level attributes Sample of origin and Cells in droplet.

Prerequisites for running Hashtag demultiplexing

To run Hashtag demultiplexing, your data must meet the following criteria:

The data node contains fewer features than observations
The data node is the output of a normalization task (the recommended normalization method for hashtag data is CLR)
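
For readers unfamiliar with CLR (centered log-ratio), the sketch below shows a textbook form of the transform applied to one cell's hashtag counts. It assumes a pseudocount of 1 to handle zeros and normalizes across hashtags within a cell; the exact formula and direction used by the normalization task in Partek Flow may differ, and the clr helper and example counts are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's hashtag counts.

    Each value becomes log(x + pseudocount) minus the mean of the
    log-transformed values, i.e. the log-ratio to the geometric mean.
    """
    logs = [math.log(x + pseudocount) for x in counts]
    center = sum(logs) / len(logs)
    return [value - center for value in logs]


# Hypothetical hashtag counts for one cell (three hashtags).
cell = [250, 3, 1]
normalized = clr(cell)
print([round(v, 3) for v in normalized])
```

After the transform the values sum to zero, and the dominant hashtag stands out as the largest positive value.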

If you are processing your FASTQ files in Partek Flow, be sure to specify a different Data type for your Cell Hashing FASTQ files on import than the FASTQ files for your gene expression and any other antibody data.

If you are processing your FASTQ files using Cell Ranger, be sure to specify a different feature_type for your Cell Hashing antibodies than any other antibodies in the Feature Reference CSV File.

If you want to specify sample IDs instead of using hashtag feature IDs as the sample IDs, you will need to prepare a tab-delimited text file (.txt) with hashtag feature IDs in the first column and the corresponding sample IDs in the second column (Figure 1). A header row is required.
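
A minimal sketch of the expected file layout, and of how such a file maps hashtag feature IDs to sample IDs, is shown below. The column names, hashtag IDs, and sample IDs are placeholders chosen for illustration only.

```python
import csv
import io

# Hypothetical file contents: a header row, then one line per hashtag with
# the hashtag feature ID and the sample ID separated by a tab.
sample_id_file = io.StringIO(
    "Hashtag\tSample ID\n"
    "HTO_1\tControl_1\n"
    "HTO_2\tTreated_1\n"
)

reader = csv.reader(sample_id_file, delimiter="\t")
header = next(reader)  # the header row is required, so read it separately
hashtag_to_sample = {row[0]: row[1] for row in reader if row}
print(hashtag_to_sample)
```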

Running Hashtag demultiplexing

Click the Normalized counts data node for your cell hashing data
Click Hashtag demultiplexing in the Pre-analysis tools section of the toolbox
Click Browse to select your Sample ID file (Optional)
Click Finish to run

The output is a Demultiplexed counts data node (Figure 2).

Two cell-level attributes, Cells in droplet and Sample of origin, are added by this task and are available for use in downstream tasks. You can download the attribute values for each cell by clicking the Demultiplexed counts data node, clicking Download, and choosing to download Attributes only.

We recommend using Annotate cells to transfer the new attributes to other sections of your project after downloading the attributes text file.

It is also possible to use the Merge matrices task to combine your data types and attributes.

References

[1] Stoeckius, M., Zheng, S., Houck-Loomis, B., Hao, S., Yeung, B.Z., Mauck, W.M., Smibert, P. and Satija, R., 2018. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biology, 19(1), p.224.

Merge matrices

In complex projects, different data matrices (e.g. with observations on rows and features on columns) sometimes need to be merged to achieve the analysis goals. For example, if two cell populations were identified on separate branches of the analysis pipeline, their expression matrices have to be combined before any joint downstream steps. Alternatively, if two assays (e.g. gene expression and protein expression) were performed on the same cells, the expression matrices have to be merged for joint analysis.

The Merge matrices task is located in the Pre-analysis tools section of the toolbox and handles two scenarios: Merge cells/samples and Merge features (Figure 1). To start, select the first data node on the pipeline (e.g. Single cell counts) and then select the Merge matrices task.

Merge Cells/Samples

To use the Merge cells option, the data matrices (one or more) that are to be merged with the currently selected one should have the same features (e.g. genes), but distinct cells. Push the Select data nodes button and Partek Flow will display a preview of the pipeline; the data nodes that can be merged are shown in the color of their branch, while other data nodes are disabled (greyed out). Left-click the data node that you want to merge with the current one and push the Select button. You can select multiple data nodes to merge. The selected node(s) will be shown under the Select data nodes button (Figure 2). If you made a mistake, use the Clear selection icon. Push Finish to proceed.
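
The requirement of identical features and distinct cells can be illustrated with a small Python sketch. It is not Partek Flow's implementation; the merge_cells helper and the example matrices are hypothetical.

```python
def merge_cells(*matrices):
    """Stack matrices (cell -> {feature: value}) that share the same features."""
    merged = {}
    features = None
    for matrix in matrices:
        for cell, row in matrix.items():
            if features is None:
                features = set(row)
            if set(row) != features:
                raise ValueError(f"{cell} has a different feature set")
            if cell in merged:
                raise ValueError(f"duplicate cell ID: {cell}")
            merged[cell] = dict(row)
    return merged


# Hypothetical matrices from two branches of a pipeline.
branch_a = {"cell_1": {"CD3E": 3, "MS4A1": 0}}
branch_b = {"cell_2": {"CD3E": 0, "MS4A1": 7}}
merged_cells = merge_cells(branch_a, branch_b)
print(merged_cells)
```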

Merge Features

To use the Merge features option, the data matrices (one or more) that are to be merged with the currently selected one should have the same cells (or samples), but distinct features (e.g. gene and protein expression). Push the Select data nodes button and Partek Flow will display a preview of the pipeline; the data nodes that can be merged are shown in the color of their branch, while others are disabled (greyed out). Left-click the data node that you want to merge with the current one and push the Select button. The selected node will be shown under the Select data nodes button. Repeat the procedure if you would like to merge additional nodes. If you made a mistake, use the Clear selection icon. Push Finish to proceed.
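
The complementary case, identical cells with distinct features, can be sketched the same way. The merge_features helper and the example gene and protein matrices below are hypothetical and only illustrate the constraint described above.

```python
def merge_features(*matrices):
    """Join matrices (cell -> {feature: value}) that share the same cells."""
    cells = set(matrices[0])
    merged = {cell: {} for cell in matrices[0]}
    for matrix in matrices:
        if set(matrix) != cells:
            raise ValueError("matrices must contain the same cells")
        for cell, row in matrix.items():
            overlap = merged[cell].keys() & row.keys()
            if overlap:
                raise ValueError(f"duplicate features: {sorted(overlap)}")
            merged[cell].update(row)
    return merged


# Hypothetical gene and protein matrices measured on the same cells.
genes = {"cell_1": {"CD3E": 3}, "cell_2": {"CD3E": 0}}
proteins = {"cell_1": {"CD3_TotalSeqB": 120}, "cell_2": {"CD3_TotalSeqB": 4}}
merged_features = merge_features(genes, proteins)
print(merged_features)
```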

Task Output

The output of the Merge matrices task is a Merged counts data node (Figure 3).

For a practical example using Merge matrices, please see our tutorial on Analyzing CITE-Seq Data.

Alternative paths

Depending on your goal, you may want to consider a different approach. For example, data matrices based on two different assays (e.g. gene and protein expression) can be combined using Find multimodal neighbors. Instead of merging two (or more) cell populations by using Merge cells, you may want to use filtering (Filtering > Filter groups) to exclude the populations that you do not consider relevant or to keep only the populations of interest.

Descriptive statistics

The Descriptive statistics task can be invoked on any matrix data node, e.g. a Gene counts or Normalized counts data node in a bulk RNA-Seq analysis pipeline, or a Single cell counts data node. It calculates measures of central tendency and variability for the observations or features of the matrix data.

Running Descriptive statistics

Click on a counts data node
Choose Descriptive statistics in the Statistics section of the toolbox (Figure 1)

This will open the configuration dialog; use it to specify which calculation(s) will be performed on cells (or samples for a bulk analysis data node) or features (Figure 2).

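The available statistics are listed in the left panel. Suppose $x_1, x_2, \ldots, x_n$ represent an array of numbers:

Coefficient of variation (CV): $CV = s / \bar{x}$, where $s$ represents the standard deviation and $\bar{x}$ the mean
Geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$
Max: $\max(x_1, x_2, \ldots, x_n)$
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: with the values sorted in ascending order as $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$, when n is odd, the median is $x_{((n+1)/2)}$; when n is even, the median is $\left(x_{(n/2)} + x_{(n/2+1)}\right)/2$
Median absolute deviation: $\mathrm{MAD} = \mathrm{median}(|x_i - \tilde{x}|)$, where $\tilde{x}$ is the median of $x_1, x_2, \ldots, x_n$
Min: $\min(x_1, x_2, \ldots, x_n)$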
Number of cells: Available when Calculate for is set to Features. Reports the number of cells whose value satisfies the selected comparison (<, <=, =, !=, >, or >=; select one from the drop-down list) against the cutoff value entered in the text box. The cutoff is applied to the values present in the input data node, i.e. if the task is invoked on a non-normalized data node, the values are raw counts. For instance, use this option if you want to know the number of cells in which each feature was detected; a possible filter is Number of cells whose value > 0.0
Percent of cells: Available when Calculate for is set to Features. Reports the percentage of cells whose value satisfies the selected comparison (<, <=, =, !=, >, or >=; select one from the drop-down list) against the cutoff value entered in the text box.
Number of features: Available when Calculate for is set to Cells. Reports the number of features whose value satisfies the selected comparison (<, <=, =, !=, >, or >=; select one from the drop-down list) against the cutoff value entered in the text box. The cutoff is applied to the values present in the input data node, i.e. if the task is invoked on a non-normalized data node, the values are raw counts. For example, use this option if you want to know the number of detected genes in each cell; a possible filter is Number of features whose value > 0.0
Percent of features: Available when Calculate for is set to Cells. Reports the percentage of features whose value satisfies the selected comparison (<, <=, =, !=, >, or >=; select one from the drop-down list) against the cutoff value entered in the text box.
Q1: 25th percentile
Q3: 75th percentile
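Range: $x_{\max} - x_{\min}$
Standard deviation: $s = \sqrt{s^2}$, where $s^2$ is the variance
Sum: $\sum_{i=1}^{n} x_i$
Variance: $s^2$, calculated from the squared deviations $(x_i - \bar{x})^2$ of the values from the mean $\bar{x}$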
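
For orientation, the statistics in this list can be reproduced with Python's standard library, as in the sketch below. Two choices here are assumptions rather than documented Partek Flow behavior: the standard deviation and variance use the sample formulas (dividing by n - 1), and the quartiles use the default convention of statistics.quantiles. The describe helper is hypothetical.

```python
import statistics


def describe(x):
    """Summary statistics for a list of numbers.

    Assumes positive values (needed for the geometric mean) and at least
    two values (needed for the sample variance).
    """
    mean = statistics.mean(x)
    median = statistics.median(x)
    stdev = statistics.stdev(x)  # assumption: sample formula (n - 1)
    q1, _, q3 = statistics.quantiles(x, n=4)  # one of several conventions
    return {
        "Coefficient of variation": stdev / mean,
        "Geometric mean": statistics.geometric_mean(x),
        "Max": max(x),
        "Mean": mean,
        "Median": median,
        "Median absolute deviation": statistics.median(
            [abs(v - median) for v in x]
        ),
        "Min": min(x),
        "Q1": q1,
        "Q3": q3,
        "Range": max(x) - min(x),
        "Standard deviation": stdev,
        "Sum": sum(x),
        "Variance": statistics.variance(x),
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```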

Left-click a measurement and drag it to the right panel, one at a time, or mouse over a measurement and click the green plus button to move it to the right panel. When Sample (Cell) is selected, the calculation will be performed across all the features in the input matrix for each sample (or cell). When Feature is selected, the calculation will be performed across all the samples (cells) in the input matrix for each feature.

In addition, when Feature is selected, there is an extra Group by option (Figure 3).

From the drop-down list, choose a categorical attribute to calculate the descriptive statistics on all the subgroups for each feature.
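
The effect of Group by, together with the cutoff-based statistics, can be sketched for a single feature as follows. The cell IDs, the cluster attribute, and the feature_stats_by_group helper are hypothetical; the comparison operators are the ones listed in the drop-down list.

```python
import operator
from collections import defaultdict
from statistics import mean

# Comparison operators offered in the drop-down list.
COMPARISONS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">": operator.gt, ">=": operator.ge,
}

# Hypothetical values of one feature across cells, plus a categorical
# cell-level attribute to group by.
values = {"cell_1": 0, "cell_2": 3, "cell_3": 5, "cell_4": 0, "cell_5": 2}
cluster = {"cell_1": "A", "cell_2": "A", "cell_3": "B", "cell_4": "B", "cell_5": "B"}


def feature_stats_by_group(values, groups, comparison=">", cutoff=0.0):
    """Per-group mean, number of cells, and percent of cells passing the cutoff."""
    compare = COMPARISONS[comparison]
    grouped = defaultdict(list)
    for cell, value in values.items():
        grouped[groups[cell]].append(value)
    result = {}
    for group, vals in grouped.items():
        n_pass = sum(compare(v, cutoff) for v in vals)
        result[group] = {
            "Mean": mean(vals),
            "Number of cells": n_pass,
            "Percent of cells": 100.0 * n_pass / len(vals),
        }
    return result


by_group = feature_stats_by_group(values, cluster)
print(by_group)
```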

The output of the task is a matrix: Cell stats (result of Calculate for Cells) or Feature stats (result of Calculate for Features) (Figure 4). The results can be visualized in the Data Viewer.

Spot clean

The term spot swapping describes an artifact in spatial transcriptomics data in which mRNA bleeding from nearby spots causes substantial contamination of UMI counts [1]. Spot clean in Flow is a task that aims to improve estimates of expression by correcting for spot swapping.

The task can only be invoked from the Space Ranger task output data node since it takes the raw count matrix as input. To run the Spot clean task in Flow:

Click the Single cell counts data node output by Space Ranger (Figure 1)
Click Pre-analysis tools in the toolbox
Click Spot clean
Click Finish to run the task with default settings

Another Single cell counts node will be generated. The data node contains a matrix of decontaminated gene expression counts (Figure 2). Downstream analysis tasks, such as normalization, PCA, and ANOVA, can be performed on the new Single cell counts node.

Parameters in this task that you can adjust include:

Gene cutoff: Filter out genes whose average expression across tissue spots is less than or equal to this cutoff. Default: 0.1.

Max iteration: Maximum number of iterations for the EM parameter updates. Default: 10. Set a smaller number to save computation time.
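
To make the Gene cutoff rule concrete, the sketch below drops genes whose average count over tissue spots is less than or equal to the cutoff. It illustrates only this filtering rule, not the Spot clean decontamination model; the gene names, spot IDs, and the apply_gene_cutoff helper are hypothetical.

```python
from statistics import mean

# Hypothetical raw counts: gene -> {spot: UMI count}.
counts = {
    "GeneA": {"spot_1": 5, "spot_2": 3, "spot_3": 0, "bg_1": 1},
    "GeneB": {"spot_1": 0, "spot_2": 0, "spot_3": 0, "bg_1": 4},
}
tissue_spots = {"spot_1", "spot_2", "spot_3"}  # bg_1 is a background spot


def apply_gene_cutoff(counts, tissue_spots, gene_cutoff=0.1):
    """Drop genes whose average count over tissue spots is <= gene_cutoff."""
    kept = {}
    for gene, row in counts.items():
        average = mean([row.get(spot, 0) for spot in tissue_spots])
        if average > gene_cutoff:
            kept[gene] = row
    return kept


filtered = apply_gene_cutoff(counts, tissue_spots)
print(sorted(filtered))
```

In this example GeneB is removed even though it has counts in a background spot, because only tissue spots contribute to the average.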

References

[1] Ni, Z., Prasad, A., Chen, S. et al. SpotClean adjusts for spot swapping in spatial transcriptomics data. Nat Commun 13, 2971 (2022). https://doi.org/10.1038/s41467-022-30587-y
