When a project contains multiple libraries, the data may contain variability due to technical differences (e.g., sequencing machine or library prep kit) in addition to biological differences (e.g., treatment or genotype). Removing batch effects reduces this technical noise so that biological variation is easier to detect.
This method is based on a general linear model. Much like ANOVA in reverse, it calculates the variation attributed to the factor(s) being removed and then adjusts the original values to remove that variation.
By including batch in the differential analysis model, the variability due to the batch effect is accounted for when calculating p-values. In this sense, batch effects are best handled as part of the differential analysis model. However, clustering data or visualizing biological effects can be very difficult if batch effects are present in the original data. We transform the original values to remove the batch effect using this tool.
We recommend normalizing your data prior to removing batch effects, but the task will run on any counts data node.
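The idea of fitting a linear model and then subtracting the batch term can be illustrated with a minimal NumPy sketch. This is not the code Flow runs; the function name `remove_batch_effect`, the sum-to-zero coding of the factors, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def remove_batch_effect(values, batch, condition):
    """Fit values ~ condition + batch per feature, then subtract the batch term.

    values: (n_samples, n_features) array of normalized expression values.
    batch, condition: label sequences of length n_samples.
    Illustrative sketch only, not Flow's implementation.
    """
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    condition = np.asarray(condition)

    def sum_to_zero(labels):
        # One column per level except the last; the last level is coded -1,
        # so the estimated effects of a factor sum to zero across its levels.
        levels = np.unique(labels)
        cols = np.zeros((len(labels), len(levels) - 1))
        for j, lev in enumerate(levels[:-1]):
            cols[labels == lev, j] = 1.0
        cols[labels == levels[-1], :] = -1.0
        return cols

    B = sum_to_zero(batch)
    C = sum_to_zero(condition)
    X = np.hstack([np.ones((len(batch), 1)), C, B])   # intercept, biology, batch
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    batch_coef = coef[1 + C.shape[1]:]                # rows belonging to batch columns
    return values - B @ batch_coef                    # keep intercept, biology, residuals

# Toy data: condition B is 4 units higher than A, batch b2 is 2 units higher than b1.
values = np.array([[1.0], [3.0], [5.0], [7.0]])
corrected = remove_batch_effect(values, ["b1", "b2", "b1", "b2"], ["A", "A", "B", "B"])
print(corrected.ravel())
```

In this toy example the corrected values are approximately 2, 2, 6, 6: the batch shift is gone while the difference between conditions A and B is preserved.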
Click the counts data node
Click the Batch removal section of the toolbox
Click General linear model
The batch effect removal dialog is similar to the dialog for ANOVA. To set up the model, you need to choose which attributes should be considered. Here, you should include the batch attribute, any attributes that interact with batch, and the interactions between these attributes.
For example, if you have different cell types and batch may affect each cell type differently, you would need to include batch, cell type, and the interaction between batch and cell type. Here, batch is the attribute Version and cell type is the attribute Classification.
Click Version and Classification
Click Add factors
Click Version and Classification
Click Add interaction (Figure 1)
To remove the batch effect and its interaction with cell type, we can click the Remove checkbox for both Version and Version*Classification.
Click the Remove checkbox for Version and Version*Classification
Click Finish to run (Figure 2)
The output is a new data node, Batch effect adjusted counts. This data node contains the batch-effect-corrected values and can be used as the input for downstream tasks such as clustering and UMAP (Figure 3).
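Removing both Version and Version*Classification corresponds to subtracting the batch main effect and the batch-by-cell-type interaction from a fitted model that also contains Classification. A minimal NumPy sketch follows; the function names, the sum-to-zero coding, and the toy values are illustrative assumptions, not Flow's implementation.

```python
import numpy as np

def sum_to_zero(labels):
    # Sum-to-zero coding: one column per level except the last, which is coded -1.
    labels = np.asarray(labels)
    levels = np.unique(labels)
    cols = np.zeros((len(labels), len(levels) - 1))
    for j, lev in enumerate(levels[:-1]):
        cols[labels == lev, j] = 1.0
    cols[labels == levels[-1], :] = -1.0
    return cols

def remove_batch_and_interaction(values, version, classification):
    """Fit values ~ Classification + Version + Version*Classification,
    then subtract the Version and Version*Classification terms.
    Assumes each factor has at least two levels. Illustrative sketch only.
    """
    values = np.asarray(values, dtype=float)
    V = sum_to_zero(version)
    C = sum_to_zero(classification)
    # Interaction columns: every Version column multiplied by every Classification column.
    VC = np.hstack([V[:, [i]] * C for i in range(V.shape[1])])
    X = np.hstack([np.ones((len(values), 1)), C, V, VC])
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    removed_terms = np.hstack([V, VC]) @ coef[1 + C.shape[1]:]
    return values - removed_terms

# Toy data: the batch shift is +1 for cell type T but +4 for cell type M.
values = np.array([[1.0], [2.0], [5.0], [9.0]])
version = ["v1", "v2", "v1", "v2"]
classification = ["T", "T", "M", "M"]
print(remove_batch_and_interaction(values, version, classification).ravel())
```

Here the two T cells end up at approximately 1.5 and the two M cells at approximately 7. A model with batch alone could not remove the cell-type-specific part of the shift, which is why the interaction term is included.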
The advanced options for Remove batch effect are shared by ANOVA/LIMMA-trend/LIMMA-voom.
Seurat v3[1] introduced new methods for integrating multiple single-cell datasets, whether they were collected from different individuals, experimental conditions, or technologies. The Seurat3 integration method uses a subset of the data as a reference for the integration analysis and integrates all other data with that reference subset. The subset can be one sample or a subgroup of samples defined by the factor attribute.
Seurat3 integration in Flow can be invoked from the Batch removal section when a Normalized counts data node is selected (Figure 1).
To run Seurat3 integration,
Click a Normalized counts data node
Click the Batch removal section in the toolbox
Click Seurat3 Integration
You will be prompted to pick the attribute(s) for analysis. The first Seurat3 integration dialog contains a drop-down list of the factors available for data integration. To set up the model, choose which attribute should be considered. For example, in a dataset that has different cell types from multiple technologies (Tech), each technology may affect different cell types differently. Hence, the attribute Tech should be used as the batch factor. The attribute celltype represents the different cell types in this dataset (Figure 2).
To integrate data with default settings,
Select Tech from the dropdown list
Click Finish
The output of Seurat3 integration is a new data node - Integrated counts (Figure 1). We can then use this new integrated matrix for downstream analysis and visualization (Figure 3).
Users can click Configure to change the default settings in Advanced options (Figure 4).
Use reference to find anchors: when this box is checked, the first group of the selected attribute is used as the reference to find anchors. To use a different group as the reference, change the order of the attribute's subgroups on the attribute management page of the Data tab. When the box is unchecked, anchors are identified by comparing all pairs of subgroups; this option is very computationally intensive.
Perform L2 normalization: Perform L2 normalization on the CCA cell embeddings after dimensional reduction.
Pick anchors: How many neighbors (k) to use when picking anchors.
Filter anchors: How many neighbors (k) to use when filtering anchors.
Score anchors: How many neighbors (k) to use when scoring anchors.
Nearest neighbor finding methods: Method for nearest neighbor finding. Options include: rann, annoy.
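To give a feel for what the neighbor count (k) controls when picking anchors, the sketch below L2-normalizes two embeddings and keeps the pairs of cells that appear in each other's k nearest neighbors. It is a simplified illustration under stated assumptions: the function names are hypothetical, the search is brute force rather than rann or annoy, and the CCA step, anchor filtering, and anchor scoring performed by the actual method are omitted.

```python
import numpy as np

def l2_normalize(emb):
    # Scale each cell (row) to unit length; all-zero rows are left unchanged.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)

def knn_indices(src, dst, k):
    # Brute-force k nearest neighbors in dst for every row of src.
    d = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d, axis=1)[:, :k]

def find_anchors(ref, query, k=5):
    """Return (ref_index, query_index) pairs that are mutual k nearest neighbors.
    Illustrative sketch only, not the Seurat implementation.
    """
    ref = l2_normalize(np.asarray(ref, dtype=float))
    query = l2_normalize(np.asarray(query, dtype=float))
    ref_to_query = knn_indices(ref, query, k)
    query_to_ref = knn_indices(query, ref, k)
    anchors = []
    for i in range(ref.shape[0]):
        for j in ref_to_query[i]:
            if i in query_to_ref[j]:      # mutual neighbors become an anchor
                anchors.append((i, int(j)))
    return anchors

# Toy embeddings: the query cells point in the same directions as the
# reference cells but have different magnitudes.
ref = [[1.0, 0.0], [0.0, 1.0]]
query = [[2.0, 0.1], [0.1, 3.0]]
print(find_anchors(ref, query, k=1))
```

With k=1 the toy example pairs reference cell 0 with query cell 0 and reference cell 1 with query cell 1. Larger values of k admit more candidate pairs.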
Stuart T, Butler A, Hoffman P, et al. Comprehensive integration of single-cell data. Cell; 2019. https://doi.org/10.1016/j.cell.2019.05.031
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Analyzing scRNA-seq data is challenging, particularly when the samples are assayed with different technologies, because biological and technical differences are interspersed. Harmony[1] is an algorithm that projects cells into a shared embedding where cells group by cell type rather than by dataset-specific conditions. Harmony can simultaneously account for multiple experimental and biological factors while integrating different datasets.
Harmony in Flow can be invoked from the Batch removal section only if:
the data has some categorical attributes (only categorical attributes can be included in the model)
a PCA data node is selected (Figure 1).
To run Harmony,
Click a PCA data node
Click the Batch removal section in the toolbox
Click Harmony
You will be prompted to pick some attribute(s) for analysis. The Harmony dialog is similar to the General linear model batch removal. To set up the model, you need to choose which attributes should be considered. For example, in the case of one dataset that has different cell types from multiple batches, the batch may have divergent impacts on different cell types. Here, batch is the attribute Sample name and cell type is the attribute Cell type (Figure 2).
To remove batch effects with default settings,
Click Sample name
Click Add factors
Click Finish
The output of Harmony is a new data node. This data node contains the Harmony-corrected values and can be used as the input for downstream tasks such as Graph-based clustering, UMAP, and t-SNE (Figure 3).
Users can click Configure to change the default settings in Advanced options (Figure 4).
Diversity clustering penalty (theta): Default theta=2. A higher penalty gives stronger correction, which results in better mixing. A penalty of zero means no correction. The value can range from 0 to positive infinity.
Number of clusters (nclust): Number of clusters in the model. Set this to the distinct count of cell types. nclust=1 is equivalent to simple linear regression. Use 0 to enable Seurat’s RunHarmony() default setting.
Width of soft kmeans clusters (sigma): Default sigma=0.1. The value can range from 0 to positive infinity. When it is set to 0, each observation is assigned to exactly one cluster (hard clustering). When the value is greater than 0, an observation can belong to multiple clusters (soft clustering, or fuzzy clustering). Sigma scales the distance from a cell to the cluster centroids. Larger values of sigma result in observations being assigned to more clusters. Smaller values of sigma make the soft kmeans clustering approach hard clustering.
Ridge regression penalty (lambda): Default lambda=1. Lambda must be strictly positive. Smaller values result in more aggressive correction.
Random seed: Use the same random seed to reproduce the results.
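The effect of sigma on soft clustering can be seen in a small NumPy sketch. It assumes the common soft k-means form in which the weight of a cell for a cluster is proportional to exp(-distance / sigma). The function name `soft_assign` is hypothetical, and the diversity penalty (theta) that Harmony applies on top of this is omitted, so treat it as an illustration rather than Harmony's actual code.

```python
import numpy as np

def soft_assign(cells, centroids, sigma=0.1):
    """Soft cluster membership: weight proportional to exp(-squared distance / sigma).

    sigma must be greater than 0 in this sketch; hard clustering is the
    limiting behavior as sigma approaches 0. Illustrative only.
    """
    cells = np.asarray(cells, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance from every cell to every centroid.
    d = ((cells[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d / sigma
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)   # each row sums to 1

# One cell lying slightly closer to the first of two centroids.
cell = [[0.4, 0.0]]
centroids = [[0.0, 0.0], [1.0, 0.0]]
print(soft_assign(cell, centroids, sigma=0.1))    # mostly the first cluster
print(soft_assign(cell, centroids, sigma=10.0))   # split almost evenly
```

With the small sigma the cell belongs mostly to the nearer cluster, while with the large sigma its membership is spread almost evenly across both clusters, matching the description of sigma above.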
Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh P-r, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods; 2019. https://doi.org/10.1038/s41592-019-0619-0.