Importing Feature Barcoding Data

  • Create a new Project

  • Import data

Create a new Project

Let's start by creating a new project.

  • On the Home page, click New project (Figure 1)

  • Give the project a name

  • Click Create project

Import data

  • In the Analyses tab, click Add data

  • Click 10x Genomics Cell Ranger counts h5 (Figure 2)

  • Choose the filtered HDF5 file for the MALT sample produced by Cell Ranger

Transfer the .h5 file to your Partek Flow server using the Transfer files button on the Home page, then browse to its location.

Note that Partek Flow also supports the feature-barcode matrix output (barcodes.tsv, features.tsv, matrix.mtx) from Cell Ranger. The import steps for a feature-barcode matrix are identical to those in this tutorial.

  • Click Next

  • Name the sample MALT (the default is the file name)

  • Specify the annotation used for the gene expression data (here, we choose Homo sapiens (human) - hg38 and Ensembl Transcripts release 109). If Ensembl 109 is not available from the drop-down list, choose Add annotation and download it.

  • Check Features with non-zero values across all samples in the Report section

  • Click Finish (Figure 3)

A Single cell counts data node will be created under the Analyses tab after the file has been imported. We can now move on to processing the data.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Figure 1. Create a new project and give it a meaningful name (e.g. CITE-Seq tutorial)
    Figure 2. Import options for CITE-Seq tutorial data
    Figure 3. Import options for CITE-Seq tutorial data
    Figure 4. File format options for MALT data set

    Dimensionality Reduction and Clustering

    • PCA

    • Graph-based clustering

    • UMAP

    • Notes on Performing Exploratory Analysis with Protein or Gene Expression Data Only

    PCA

    Next, we will perform exploratory analysis on the merged mRNA and protein expression data and visualize it in preparation for identifying cell populations. Because the merged count matrix has thousands of features, it is a good idea to reduce the dimensionality of the data for more efficient downstream processing.
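    The Scree plot used in this section plots the eigenvalue of each PC, and an eigenvalue divided by the sum of the eigenvalues is the share of variance that PC captures. As a rough illustration only, the Python sketch below converts eigenvalues into variance shares and applies one simple, hypothetical cut-off rule; the eigenvalues and the 2% threshold are made-up values, and this is not how Partek Flow computes PCA or chooses the number of PCs.

```python
def variance_explained(eigenvalues):
    """Fraction of variance captured by each PC, relative to the sum of the
    eigenvalues supplied (with only the top 100 PCs this is a relative share,
    not a share of the total variance in the data)."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def choose_num_pcs(eigenvalues, min_fraction=0.02):
    """Hypothetical elbow rule: keep leading PCs while each one explains at
    least `min_fraction` of the variance; stop at the first PC that does not."""
    n_pcs = 0
    for fraction in variance_explained(eigenvalues):
        if fraction < min_fraction:
            break
        n_pcs += 1
    return n_pcs


# Toy eigenvalues (hypothetical), sorted from largest to smallest as on a Scree plot.
toy_eigenvalues = [40.0, 25.0, 12.0, 8.0, 5.0, 3.0, 2.5, 1.5, 1.0, 1.0, 0.5, 0.5]
print(choose_num_pcs(toy_eigenvalues))  # 7
```

    In the tutorial itself, the cut-off (15 PCs) is chosen by eye from where the Scree plot levels off; a fixed threshold like `min_fraction` is just one way to automate that judgment.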

    • Click the Merged counts data node

    • Click Exploratory analysis in the toolbox

    • Click PCA

    • Click Finish to run the PCA with default settings (Figure 1)

    A PCA task node will be added to the pipeline under the Analyses tab and a circular PCA output data node will be produced (Figure 2).

    Once the task completes, we will inspect the results to decide the optimal number of principal components (PCs) to use in downstream analyses. To do this, we will use a Scree plot.

    • Double-click the PCA data node to open the task report

    The PCA plot will open in a new data viewer session. A 3D scatter plot will be displayed on the canvas (Figure 3).

    • Click and drag the Scree plot from New plot under Setup on the left onto the canvas

    • Drop it over the Replace option (Figure 4)

    • Select PCA as the data for the new Scree plot (Figure 5)

    The Scree plot (Figure 6) shows the eigenvalue (y-axis) of each of the 100 PCs (x-axis). The higher the eigenvalue, the more variance that PC explains. Typically, after an initial set of highly informative PCs, each additional component explains very little extra variance. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps such as graph-based clustering and UMAP.

    • Click and drag over the first set of PCs to zoom in (Figure 7)

    • Mouse over the Scree plot to identify the point where additional PCs offer little additional information (Figure 8)

    In this data set, a reasonable cut-off could be set anywhere between about 10 and 30 PCs. We will use 15 in downstream steps.

    Graph-based clustering

    We can use Graph-based clustering to group similar cells together in an unsupervised manner.

    • Click the project name near the top to go back to the Analyses tab

    • Click the circular PCA data node

    • Click Exploratory analysis in the toolbox

    • Click Graph-based clustering

    • Set the number of principal components to 15 (Figure 9)

    • Select Compute biomarkers

    • Click Configure under Advanced options and change the Resolution to 1.0

    • Click Finish to run the task

    A Graph-based clustering task node will be added to the pipeline under the Analyses tab and a circular Graph-based clusters output data node will be produced (Figure 10).

    UMAP

    Once the graph-based clustering task has completed, we can visualize the results with a UMAP plot. You could follow the same steps to generate a t-SNE plot instead. For this tutorial, we will use UMAP, as it is faster on several thousand cells.

    • Click the circular PCA data node

    • Click Exploratory analysis in the toolbox

    • Click UMAP

    • Set the number of principal components to 15 (Figure 11)

    • Click Finish to run the task

    A UMAP task node will be added to the pipeline under the Analyses tab and a circular UMAP output data node will be produced (Figure 12).

    Notes on Performing Exploratory Analysis with Protein or Gene Expression Data Only

    In this tutorial, we have performed exploratory analysis on merged protein and gene expression data, and we will perform classification on the merged data in the next step.

    It can be interesting to perform exploratory analysis on the two feature types separately. For example, you might want to see how the clustering of the same cells differs between protein and gene expression profiles.

    To perform exploratory analysis on the two feature types separately, select the Merged counts data node, click Pre-analysis tools, and then click Split by feature type in the toolbox. A new task, Split by feature type, will be added to the pipeline, producing two output data nodes: Antibody Capture (protein data) and Gene Expression (mRNA data). Both contain the same high-quality cells.

    Exploratory analysis of the gene expression data follows the same steps as for the merged counts. Because there is a large number of genes, you will need to reduce the dimensionality with PCA, choose an optimal number of PCs, and perform downstream clustering and visualization (e.g. graph-based clustering and UMAP/t-SNE). Exploratory analysis of the protein data is different: there is no need to reduce the dimensionality, as there are only a handful of features (17 proteins in this case), so you can proceed straight to downstream clustering and visualization. Figure 13 shows an example of how the pipeline might look if the data is split and analyzed separately.

    You can then use the Data viewer to bring together multiple plots for comparison (Figure 14).

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Figure 1. Run PCA with default settings
    Figure 2. PCA task run on the merged counts data node
    Figure 3. Each dot is a different cell. Cells are clustered based on how similar their expression profiles are across the combined mRNA and protein data
    Figure 4. Click and drag the Scree plot to replace the PCA plot on the canvas
    Figure 5. The PCA data node contains the data to draw the Scree plot
    Figure 6. Scree plot shows the amount of variation explained by each principal component
    Figure 7. Click and drag on the Scree plot to zoom in and see the first set of principal components
    Figure 8. Identifying the optimal number of PCs
    Figure 9. Graph-based clustering task set up. Reduce the number of PCs to 15
    Figure 10. Graph-based clustering task and output data nodes
    Figure 11. UMAP task set up. Reduce the number of PCs to 15.
    Figure 12. UMAP task and output data node
    Figure 13. Example of how the pipeline might look if you split the merged counts and perform exploratory analysis for protein and gene expression data separately
    Figure 14. Comparison of 2D UMAP plots for the same cells clustered on protein, mRNA and merged data. All cells are colored by their expression of the CD3D gene (in blue). Note that the plots in this figure may differ from the default UMAP plots because these are 2D plots; default UMAP plots are in 3D.

    Analyzing CITE-Seq Data

    • Importing Feature Barcoding Data

    • Data Processing

    • Dimensionality Reduction and Clustering

    • Classifying Cells

    • Differentially Expressed Proteins and Genes

    This tutorial presents an outline of the basic series of steps for analyzing a 10x Genomics Gene Expression with Feature Barcoding (antibody) data set in Partek Flow, starting with the output of Cell Ranger.

    If you have Cell Hashing data, please see our documentation on Hashtag demultiplexing.

    This tutorial includes only one sample, but the same steps apply when analyzing multiple samples. For notes on a few aspects specific to multi-sample analysis, please see our Single Cell RNA-Seq Analysis (Multiple Samples) tutorial.

    If you are new to Partek Flow, please see Getting Started with Your Partek Flow Hosted Trial for information about data transfer and import, and Creating and Analyzing a Project for information about the Partek Flow user interface.

    Data set

    The data set for this tutorial is a demonstration data set from 10x Genomics. The sample includes cells from a dissociated Extranodal Marginal Zone B-Cell Tumor (MALT: Mucosa-Associated Lymphoid Tissue) stained with BioLegend TotalSeq-B antibodies. We are starting with the Feature / cell matrix HDF5 (filtered) file produced by Cell Ranger. Before you begin, transfer this file to your Partek Flow server using the Transfer files button on the Home page.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Data Processing

    • Split matrix

    • Filter low-quality cells

    • Normalization

    • Merge Protein and mRNA data

    • Collapsing tasks to simplify the pipeline

    • References

    Split matrix

    The Single cell counts data node contains two different types of data, mRNA expression and protein expression. So that we can process these two different types of data separately, we will split the data by data type.

    • Click the Single cell counts data node

    • Click Pre-analysis tools in the toolbox

    • Click Split by feature type

    A rectangular task node will be created along with two circular data nodes, one for each data type (Figure 1). The labels for these data types are determined by the features.csv file used when processing the data with Cell Ranger. Here, our data is labeled Gene Expression for the mRNA data and Antibody Capture for the protein data.
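    Conceptually, Split by feature type partitions the rows (features) of the count matrix by their feature type label, and every output keeps the same cells (columns). A minimal Python sketch using plain dictionaries; the function name and the toy features and counts are hypothetical:

```python
from collections import defaultdict


def split_by_feature_type(matrix, feature_types):
    """Split a feature x cell count matrix into one sub-matrix per feature type.

    matrix: dict of feature name -> list of per-cell counts (same cell order for all)
    feature_types: dict of feature name -> feature type label
    """
    parts = defaultdict(dict)
    for feature, counts in matrix.items():
        parts[feature_types[feature]][feature] = counts
    return dict(parts)


# Hypothetical toy matrix: three cells, two genes and one antibody-derived feature.
matrix = {
    "CD3E": [0, 3, 1],
    "MS4A1": [5, 0, 0],
    "CD3_TotalSeqB": [12, 250, 180],
}
feature_types = {
    "CD3E": "Gene Expression",
    "MS4A1": "Gene Expression",
    "CD3_TotalSeqB": "Antibody Capture",
}
split = split_by_feature_type(matrix, feature_types)
print(sorted(split))  # ['Antibody Capture', 'Gene Expression']
```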

    Filter low-quality cells

    An important step in analyzing single cell RNA-Seq data is to filter out low-quality cells. A few examples of low-quality cells are doublets, cells damaged during cell isolation, or cells with too few counts to be analyzed. In a CITE-Seq experiment, protein aggregation in the antibody staining reagents can cause a cell to have a very high number of counts. These are low-quality cells that can be excluded. Additionally, if all cells in a data set are expected to show a baseline level of expression for one of the antibodies used, it may be appropriate to filter out cells with very low counts or a low number of detected features. You can do this in Partek Flow using the Single cell QA/QC task.

    We will start with the protein data.

    • Click the Antibody Capture data node

    • Click QA/QC in the toolbox

    • Click Single Cell QA/QC

    This produces a Single-cell QA/QC task node (Figure 2).

    • Double-click the Single cell QA/QC task node to open the task report

    The Single cell QA/QC report opens in a new data viewer session. There are interactive violin plots showing the most commonly used quality metrics for each cell: the total count per cell and the number of detected features per cell (Figure 3). Each point on the plots is a cell and the violins illustrate the distribution of values for the y-axis metric.

    For this analysis, we will set a maximum counts threshold to exclude potential protein aggregates and, because we expect every cell to be bound by several antibodies, we will also set a minimum counts threshold.
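    Each threshold set in the QA/QC steps of this section is a per-cell range check: 500 to 20000 counts for the protein data and, for the mRNA data, 1500 to 15000 counts, 400 to 4000 detected features, and 0% to 20% mitochondrial counts. A minimal Python sketch of that logic, assuming inclusive bounds and hypothetical metric names and cell values (Partek Flow applies these filters interactively; this is not its implementation):

```python
def passes_qc(cell_metrics, thresholds):
    """Return True if every metric lies within its inclusive [low, high] range."""
    return all(low <= cell_metrics[name] <= high
               for name, (low, high) in thresholds.items())


# Thresholds from this tutorial; the metric names are hypothetical.
PROTEIN_THRESHOLDS = {"counts": (500, 20000)}
GENE_THRESHOLDS = {
    "counts": (1500, 15000),
    "detected_features": (400, 4000),
    "pct_mitochondrial": (0.0, 20.0),
}

# Hypothetical per-cell protein metrics.
protein_metrics = {
    "cell_1": {"counts": 4200},
    "cell_2": {"counts": 35000},  # very high: possible protein aggregate
    "cell_3": {"counts": 120},    # too few antibody counts
}
kept = [cell for cell, m in protein_metrics.items()
        if passes_qc(m, PROTEIN_THRESHOLDS)]
print(kept)  # ['cell_1']

# Hypothetical mRNA metrics for one cell: fails on mitochondrial percentage.
gene_cell = {"counts": 6000, "detected_features": 1800, "pct_mitochondrial": 24.5}
print(passes_qc(gene_cell, GENE_THRESHOLDS))  # False
```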

    • Select one of the plots on the canvas

    • In the Select & Filter icon on the left under Tools, set the Counts threshold to keep cells between 500 and 20000 (Figure 4)

    • Click the icon under Filter on the right

    • Click Apply observation filter...

    • Select the Antibody Capture data node as input in the pipeline preview (Figure 5)

    • Click Select

    You will see a message telling you a new task has been enqueued.

    • Click OK to dismiss the message

    • Click the project name at the top to go back to the Analyses tab

    • Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab

    A new task, Filter counts, is added to the Analyses tab. This task produces a new Filtered counts data node.

    Next, we can repeat this process for the Gene Expression data node.

    • Click the Gene Expression data node

    • Click the QA/QC section in the toolbox

    • Click Single Cell QA/QC

    This produces a Single cell QA/QC task node.

    • Double-click the Single cell QA/QC task node to open the task report

    The task report shows four violin plots: the number of counts per cell, the number of detected features per cell, the percentage of mitochondrial counts per cell, and the percentage of ribosomal counts per cell (Figure 6). For this analysis, we will set minimum and maximum thresholds for total counts and detected genes, to exclude cells with too few counts as well as potential doublets, and a maximum threshold for the percentage of mitochondrial counts, to exclude potentially dead or dying cells. There is no need to filter on the percentage of ribosomal counts in this tutorial.

    • In the Selection card on the right, set the Counts threshold to keep cells between 1500 and 15000

    • Set the Detected features to keep cells between 400 and 4000

    • Set the % Mitochondrial counts to keep cells between 0% and 20% (Figure 6)

    • Click the icon under Filter on the right

    • Click Apply observation filter...

    • Select the Gene Expression data node as input in the pipeline preview

    • Click Select

    • Click OK to dismiss the message about the task being enqueued

    • Click the project name at the top to go back to the Analyses tab

    • Your browser may warn you that any unsaved changes to the data viewer session will be lost. Ignore this message and proceed to the Analyses tab

    A new task, Filter counts, is added to the Analyses tab. This task produces a new Filtered counts data node (Figure 7).

    Normalization

    After excluding low-quality cells, we can normalize the data.

    We will start with the protein data.

    • Click the Filtered counts data node produced by filtering the Antibody Capture data node

    • Click Normalization and scaling in the toolbox

    • Click Normalization

    • Click the button to load the recommended normalization

    The recommended normalization for protein data includes the following steps: Add 1, Divide by geometric mean, Add 1, Log base 2. This is a variant of centered log-ratio (CLR) normalization, which was used to normalize antibody-derived protein counts in the paper that introduced CITE-Seq [1] and in subsequent publications on similar assays [2, 3]. CLR normalization includes the following steps: Add 1, Divide by geometric mean, Add 1, Log base e. Log-transforming the protein data with base 2 instead of base e allows for better integration with the gene expression data further downstream. If you would prefer to use CLR, click and drag CLR from the panel on the left to the right. If you do choose CLR, we recommend making sure the gene expression data is also normalized using log base e, to allow for smoother integration downstream.

    • Click Finish to run (Figure 8)

    Normalization produces a Normalized counts data node on the Antibody Capture branch of the pipeline.

    Next, we can normalize the mRNA data. We will use the recommended normalization method in Partek Flow, which accounts for differences in library size (the total number of UMI counts per cell), adds 1, and log2-transforms the data.

    • Click the Filtered counts data node produced by filtering the Gene Expression data node

    • Click the Normalization and scaling section in the toolbox

    • Click Normalization

    • Click the button to load the recommended normalization

    • Click Finish to run (Figure 9)

    Normalization produces a Normalized counts data node on the Gene Expression branch of the pipeline (Figure 10).

    Merge Protein and mRNA data

    We needed to keep the two data types separate for quality filtering and normalization, as the processing steps were distinct. For downstream analysis, we want to analyze the protein and mRNA data together. To bring the two data types back together, we will merge the two Normalized counts data nodes.

    • Click the Normalized counts data node on the Antibody Capture branch of the pipeline

    • Click Pre-analysis tools in the toolbox

    • Click Merge matrices

    • Click Select data node to launch the data node selector

    Data nodes that can be merged with the Normalized counts data node on the Antibody Capture branch are shown in color (Figure 11).

    • Click the Normalized counts data node on the Gene Expression branch of the pipeline (Figure 11)

    • Click Select

    • Click Finish to run the task

    The output is a Merged counts data node (Figure 12). This data node contains the normalized counts of our protein and mRNA data. Only the intersection of cells from the two input data nodes is retained, so the Merged counts data node includes only cells that passed the quality filters for both the protein and the mRNA data.

    Collapsing tasks to simplify the pipeline

    To simplify the appearance of the pipeline, we can group task nodes into a single collapsed task. Here, we will collapse the filtering and normalization steps.

    • Right-click the Split by feature type task node

    • Choose Collapse tasks from the pop-up menu (Figure 13)

    Tasks that can be selected as the beginning and end of the collapsed section of the pipeline are highlighted in purple (Figure 14). We have chosen the Split by feature type task as the start, and we can choose Merge matrices as the end of the collapsed section.

    • Click the Merge matrices task to choose it as the end of the collapsed section

    • Name the collapsed task Data processing

    • Click Save (Figure 15)

    The new collapsed task, Data processing, appears as a single rectangular task node (Figure 16).

    To view the tasks in Data processing, we can expand the collapsed task.

    • Double-click Data processing to expand it, or right-click it and choose Expand collapsed task

    When expanded, the collapsed task is shown as a shaded section of the pipeline with a title bar (Figure 17).

    To re-collapse the task, double-click the title bar or click the icon in the title bar. To remove the collapsed task, click its remove icon. Please note that this removes only the grouping, not the tasks themselves.

    • Double-click the Data processing title bar to re-collapse it

    References

    [1] Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., ... & Smibert, P. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nature Methods, 14(9), 865.

    [2] Stoeckius, M., Zheng, S., Houck-Loomis, B., Hao, S., Yeung, B. Z., Mauck, W. M., ... & Satija, R. (2018). Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biology, 19(1), 224.

    [3] Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Satija, R. (2018). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay. bioRxiv, 466466.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
    Figure 1. Split by feature type produces two data nodes, one for each data type
    Figure 2. Single cell QA/QC produces a task node
    Figure 3. Each cell is shown as a point on the plot.
    Figure 4. Filtering low quality cells based on protein expression data
    Figure 5. After you click the Apply filter button, you will be presented with a preview of your pipeline. Select the data node to apply the filter to; in this case, the Antibody Capture data node
    Figure 6. Filtering low quality cells based on gene expression data
    Figure 7. Antibody Capture and Gene Expression data have been filtered to remove low quality cells
    Figure 8. Recommended normalization for protein count data
    Figure 9. Recommended normalization for single cell gene expression data
    Figure 10. The two normalization tasks produce Normalized counts data nodes
    Figure 11. Select the normalized gene expression counts to merge with the protein counts
    Figure 12. Merged counts output
    Figure 13. Choosing the first task node to generate a collapsed task
    Figure 14. Tasks that can be the start or end of a collapsed task are shown in purple
    Figure 15. Naming the collapsed task
    Figure 16. Collapsed tasks are represented by a single task node
    Figure 17. Expanding a collapsed task to show its components
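    The normalization recipes and the Merge matrices behavior described in the Data Processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Partek Flow's code: the geometric mean is assumed to be taken per cell across its antibody features, the library-size step is assumed to be counts per million, and the cell names and counts are hypothetical.

```python
import math


def protein_normalize(counts, log_base=2.0):
    """Steps described above for one cell's antibody counts:
    Add 1, Divide by geometric mean, Add 1, Log (base 2 by default)."""
    shifted = [c + 1 for c in counts]
    # Assumption: geometric mean across this cell's features, after adding 1.
    geo_mean = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    return [math.log(v / geo_mean + 1, log_base) for v in shifted]


def rna_normalize(counts, scale=1e6):
    """Library-size normalization, assumed here to be counts per million,
    followed by Add 1 and log2. Assumes the cell has at least one count."""
    total = sum(counts)
    return [math.log2(c / total * scale + 1) for c in counts]


def merge_cells(protein_by_cell, rna_by_cell):
    """Keep only cells present in both inputs (the intersection) and
    concatenate each cell's protein and mRNA values."""
    shared = [cell for cell in protein_by_cell if cell in rna_by_cell]
    return {cell: protein_by_cell[cell] + rna_by_cell[cell] for cell in shared}


# Hypothetical toy data: cell_2 failed the mRNA filter, cell_3 failed the protein filter.
protein = {"cell_1": [10, 200, 0], "cell_2": [5, 50, 3]}
rna = {"cell_1": [0, 4, 16], "cell_3": [2, 2, 0]}

protein_norm = {c: protein_normalize(v) for c, v in protein.items()}
rna_norm = {c: rna_normalize(v) for c, v in rna.items()}
merged = merge_cells(protein_norm, rna_norm)
print(sorted(merged))         # ['cell_1']
print(len(merged["cell_1"]))  # 6 (3 proteins + 3 genes)
```

    Passing `log_base=math.e` to `protein_normalize` gives the natural-log (CLR) variant of the same steps, which the tutorial recommends pairing with gene expression data normalized using log base e.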

    Classifying Cells

    • Exploratory Analysis Results

    • T cells

    • B cells

    We will now examine the results of our exploratory analysis and use a combination of techniques to classify different subsets of T and B cells in the MALT sample.

    Exploratory Analysis Results

    • Double-click the merged UMAP data node

    • Under Configure on the left, click Style, select the Graph-based cluster node, and color by the Graph-based attribute (Figure 1)

    The 3D UMAP plot opens in a new data viewer session (Figure 2). Each point is a different cell and they are clustered based on how similar their expression profiles are across proteins and genes. Because a graph-based clustering task was performed upstream, a biomarker table is also displayed under the plot. This table lists the proteins and genes that are most highly expressed in each graph-based cluster. The graph-based clustering found 11 clusters, so there are 11 columns in the biomarker table.

    • Click and drag the 2D scatter plot icon from New plot onto the canvas (Figure 2)

    • Drop the 2D scatter plot to the right of the UMAP plot

    • Click Merged counts to use as data for the 2D scatter plot (Figure 3)

    A 2D scatter plot has been added to the right of the UMAP plot. The points in the 2D scatter plot are the same cells as in the UMAP, but they are positioned along the x- and y-axes according to their expression level for two protein markers: CD3_TotalSeqB and CD4_TotalSeqB, respectively (Figure 4).

    • In Select & Filter, click Criteria to change the selection mode

    • Click the blue circle next to the Add rule drop-down menu (Figure 5)

    • Click Merged counts to change the data source

    • Choose CD3_TotalSeqB from the drop-down list (Figure 6)

    • Click and drag the slider on the CD3_TotalSeqB selection rule to include the CD3 positive cells (Figure 7)

    As you move the slider up and down, the corresponding points on both plots will dynamically update. The cells with a high expression for the CD3 protein marker (a marker for T cells) are highlighted and the deselected points are dimmed (Figure 8).

    • Click Merged counts in Get data on the left under Setup

    • Click and drag CD8a_TotalSeqB onto the 2D scatter plot (Figure 9)

    • Drop CD8a_TotalSeqB onto the x-axis configuration option

    The CD3 positive cells are still selected, but now you can see how they separate into CD4 and CD8 positive populations (Figure 10).
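    Selecting CD3 positive cells with the slider and then separating them on the CD4 vs. CD8a scatter plot is essentially threshold gating, as in flow cytometry. A minimal Python sketch of that idea; the thresholds and normalized values are hypothetical (in the tutorial you set the cut-off visually), and cells positive for both or neither of CD4 and CD8a are simply set aside:

```python
def gate_t_cells(cells, cd3_min, cd4_min, cd8_min):
    """Label CD3-positive cells as CD4+ or CD8+ using fixed protein thresholds.

    cells: dict of cell ID -> dict of normalized protein values.
    CD3-negative cells are left out; CD3-positive cells that are positive for
    both or neither of CD4 and CD8a are labeled 'CD3+ other'.
    """
    labels = {}
    for cell, expr in cells.items():
        if expr["CD3_TotalSeqB"] < cd3_min:
            continue  # not a putative T cell
        is_cd4 = expr["CD4_TotalSeqB"] >= cd4_min
        is_cd8 = expr["CD8a_TotalSeqB"] >= cd8_min
        if is_cd4 and not is_cd8:
            labels[cell] = "CD4+ T"
        elif is_cd8 and not is_cd4:
            labels[cell] = "CD8+ T"
        else:
            labels[cell] = "CD3+ other"
    return labels


# Hypothetical normalized values for three cells.
cells = {
    "a": {"CD3_TotalSeqB": 9.0, "CD4_TotalSeqB": 8.5, "CD8a_TotalSeqB": 1.0},
    "b": {"CD3_TotalSeqB": 8.7, "CD4_TotalSeqB": 1.2, "CD8a_TotalSeqB": 7.9},
    "c": {"CD3_TotalSeqB": 1.5, "CD4_TotalSeqB": 1.0, "CD8a_TotalSeqB": 1.1},
}
print(gate_t_cells(cells, cd3_min=5.0, cd4_min=5.0, cd8_min=5.0))
# {'a': 'CD4+ T', 'b': 'CD8+ T'}
```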

    The simplest way to classify cell types is to look for the expression of key marker genes or proteins. This approach is more effective with CITE-Seq data than with gene expression data alone, as the protein expression data has a better dynamic range and is less sparse. Additionally, many cell types have expected cell surface marker profiles established using other technologies such as flow cytometry or CyTOF. Let's compare the resolving power of the CD4 and CD8A gene expression markers with that of their protein counterparts.

    • Click the duplicate plot icon above the 2D scatter plot (Figure 11)

    • Click Merged counts in the Get Data icon under Setup

    • Search for the CD4 gene

    • Click and drag CD4 onto the duplicated 2D scatter plot

    The second 2D scatter plot has the CD8A and CD4 mRNA markers on the x- and y-axis, respectively (Figure 12). The protein expression data has a better dynamic range than the gene expression data, making it easier to identify sub-populations.

    • On the first 2D scatter plot (with protein markers), click the selection icon in the top right corner

    • Manually select the cells with high expression of the CD4_TotalSeqB protein marker (Figure 13)

    More than 2000 cells show positive expression for the CD4 cell surface protein.

    Let's perform the same test on the gene expression data.

    • Click the pointer icon in the top right of the plot to switch back to pointer mode

    • Click on a blank spot on the plot to clear the selection

    • On the second 2D scatter plot (with mRNA markers), click the selection icon in the top right corner

    This time, only 500 cells show positive expression for the CD4 marker gene. This shows that the protein data is less sparse (i.e., there are fewer zero counts), which makes it easier to reliably detect sub-populations.
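    The comparison above comes down to two per-feature summaries: how many cells exceed an expression cut-off, and what fraction of cells have a zero value. A minimal Python sketch of both; the CD4 values below are hypothetical and are not taken from the MALT data set:

```python
def detection_summary(values, threshold=0.0):
    """Return (number of cells above `threshold`, fraction of cells with a zero value)."""
    n_positive = sum(1 for v in values if v > threshold)
    zero_fraction = sum(1 for v in values if v == 0) / len(values)
    return n_positive, zero_fraction


# Hypothetical CD4 values for the same six cells.
protein_cd4 = [5.1, 4.8, 0.0, 6.2, 3.9, 5.5]
rna_cd4 = [0.0, 1.3, 0.0, 0.0, 0.0, 2.1]

print(detection_summary(protein_cd4)[0])  # 5 cells above zero
print(detection_summary(rna_cd4)[0])      # 2 cells above zero
```

    Raising `threshold` mimics selecting only the cells with high expression, as done manually on the scatter plots.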

    T cells

    Based on the exploratory analysis above, most of the CD3 positive cells are in the group of cells on the right side of the UMAP plot. This is likely to be a group of T cells. We will now examine this group in more detail to identify T cell sub-populations.

    • Click the close icon in the top right corner of both 2D scatter plots to remove them from the canvas

    • Click the lasso icon in the top right corner of the 3D UMAP plot

    • Draw a lasso around the group of putative T cells (Figure 15)

    • Click the include icon in the Select & Filter tool to include the selected points

    • Click the pointer icon in the top right of the plot to switch back to pointer mode

    • Click and drag the plot to rotate it around

    This group of putative T cells predominantly consists of cells assigned to graph-based clusters 3, 4, and 6, as indicated by the colors. Examining the biomarker table for these clusters can help us infer the different types of T cells.

    • Add the Biomarkers table using the Table option in the New plot menu. You can drag and reposition the table using the button in the top left corner of the plot.

    • Click and drag the bar between the UMAP plot and the biomarker table to resize the table and see more of it (Figure 17)

    If you need to create more space on the canvas, hide the panel on the left using the arrow.

    Cluster 6 has several interesting biomarkers. The top biomarker is CXCL13, a gene expressed by T follicular helper (Tfh) cells. Another biomarker is the PD-1 protein, which is also expressed by Tfh cells. This protein promotes self-tolerance and is a target for immunotherapy drugs. The TIGIT protein is also expressed in cluster 6; it is another immunotherapy drug target that promotes self-tolerance.

    Cluster 4 expresses several marker genes associated with cytotoxicity (e.g. NKG7 and GZMA) as well as both the CD3 and CD8 proteins. Thus, these are likely to be cytotoxic T cells.

    We can visually confirm these expression patterns and assess the specificity of these markers by coloring the cells on the UMAP plot based on their expression of these markers.

    • Click the duplicate plot icon above the UMAP plot

    We will color the cells on the duplicate by their expression of marker genes, while keeping the original plot colored by graph-based cluster assignment.

    • Click and drag the CXCL13 gene from the biomarker table onto the duplicate UMAP plot

    • Drop the CXCL13 gene onto the Green (feature) option (Figure 18)

    • Click and drag the NKG7 gene from the biomarker table onto the duplicate UMAP plot

    • Drop the NKG7 gene onto the Red (feature) option

    The cells with higher CXCL13 and NKG7 expression are now colored green and red, respectively. By looking at the two UMAP plots side by side, you can see these two marker genes are localized in graph-based clusters 6 and 4, respectively (Figure 19).

    • In Select & Filter, remove the CD3_TotalSeqB filtering rule

    • Click the blue circle next to the Add criteria drop-down list

    • Search for Graph to find the graph-based clustering data source

    • In the Graph-based filtering rule, click All to deselect all cells

    • Click cluster 6 to select all cells in cluster 6

    • Using the Classify tool, click Classify selection

    • Click the exclude icon in Select & Filter to exclude the cluster 6 (Tfh) cells

    • Click cluster 4 to select all cells in cluster 4

    • In the Classify icon, click Classify selection

    We can classify the remaining cells as helper T cells, as they predominantly express the CD4 protein marker.

    • Click on the invert selection icon in either of the UMAP plots (Figure 22)

    • In Classify, click Classify selection

    • Label the cells as Helper T cells

    • Click Save
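    The T cell classification above follows a simple rule: within the putative T cell group, cluster 6 and cluster 4 receive their own labels and all remaining cells are labeled Helper T cells. A minimal Python sketch of that rule; the label strings for clusters 6 and 4 and the cell IDs are hypothetical:

```python
def classify_t_cells(cluster_of, cluster_labels, default_label):
    """Assign a cell type to each putative T cell based on its graph-based cluster.

    cluster_of: dict of cell ID -> cluster number (putative T cells only)
    cluster_labels: dict of cluster number -> cell type label
    default_label: label for cells in any other cluster
    """
    return {cell: cluster_labels.get(cluster, default_label)
            for cell, cluster in cluster_of.items()}


# Hypothetical labels for clusters 6 and 4; "Helper T cells" is the tutorial's label.
labels = classify_t_cells(
    {"c1": 6, "c2": 4, "c3": 3, "c4": 6},
    cluster_labels={6: "Tfh cells", 4: "Cytotoxic T cells"},
    default_label="Helper T cells",
)
print(labels)
# {'c1': 'Tfh cells', 'c2': 'Cytotoxic T cells', 'c3': 'Helper T cells', 'c4': 'Tfh cells'}
```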

    Let's look at our progress so far, before we classify subsets of B-cells.

    • Click the Clear filters link in Select & Filter

    • Select the duplicate UMAP plot (with the cells colored by marker genes)

    • Under Configure on the left, open Style and color the cells by New classifications (Figure 23)

    B cells

    In addition to T cells, we would expect to see B lymphocytes, at least some of which are malignant, in a MALT tumor sample. We can color the plot by expression of a B cell marker to locate these cells on the UMAP plot.

    • In the Get data icon on the left, click Merged counts

    • Scroll down or use the search bar to find the CD19_TotalSeqB protein marker

    • Click and drag the CD19_TotalSeqB marker over to the UMAP plot on the right

    The cells in the UMAP plot are now colored from grey to blue according to their expression level for the CD19 protein marker (Figure 24). The CD19 positive cells correspond to several graph-based clusters. We can filter to these cells to examine them more closely.

    • Click the lasso icon in the top right corner of the UMAP plot

    • Lasso around the CD19 positive cells (Figure 25)

    • Click the include icon in Select & Filter to include the selected points

    The plots will rescale to include the selected points. The CD19 positive cells include cells from graph-based clusters 1, 2 and 7 (Figure 26).

    • Find the CD3_TotalSeqB protein marker in the biomarker table

    • Click and drag the CD3_TotalSeqB onto the UMAP plot on the right

    • Drop the CD3_TotalSeqB protein marker onto the Color configuration option on the plot (Figure 27)

    Some of these cells express the CD3 T cell marker. Because they also group closely with other putative B cells and express the B cell marker CD19, these cells are likely to be doublets.

    • Select either of the UMAP plots

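    • Open Select & Filter

    • Find the CD3_TotalSeqB protein marker in the biomarker table

    • Click and drag CD3_TotalSeqB onto the Add criteria drop-down list in Select & Filter (Figure 28)

    • Set the minimum threshold to 3 in the CD3_TotalSeqB selection (Figure 29)

    • Click the Classify icon, then click Classify selection

    • Label the cells as Doublets

    • Click Save

    • In Select & Filter, click the exclude icon to exclude the selected points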
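    The doublet rule above is a simple threshold on one protein marker. The sketch below expresses it in Python under stated assumptions. The cell IDs and CD3 values are hypothetical, and a cell sitting exactly at the minimum threshold is assumed to be included in the selection.

```python
# Hypothetical normalized CD3_TotalSeqB values for cells already filtered to the CD19+ group.
cd3 = {"B01": 0.4, "B02": 3.8, "B03": 1.1, "B04": 5.2, "B05": 2.9}

MIN_CD3 = 3.0  # minimum threshold set on the CD3_TotalSeqB criterion


def select_at_least(values, minimum):
    """Select cells whose value is >= minimum (the inclusive bound is an assumption)."""
    return {cell for cell, value in values.items() if value >= minimum}


doublets = select_at_least(cd3, MIN_CD3)  # these cells are classified as Doublets
remaining = set(cd3) - doublets           # cells left after excluding the selection

print("Doublets:", sorted(doublets))
print("Remaining B cells:", sorted(remaining))
```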

    The biomarkers for clusters 1 and 2 also show an interesting pattern. Cluster 1 lists IGHD as its top biomarker, while cluster 2 lists IGHA1 as the fourth most significant. Both IGHD (Immunoglobulin Heavy Constant Delta) and IGHA1 (Immunoglobulin Heavy Constant Alpha 1) encode classes of the immunoglobulin heavy chain constant region. IGHD is part of IgD, which is expressed by mature B cells, and IGHA1 is part of IgA1, which is expressed by activated B cells. We can color the plot by both of these genes to visualize their expression.

    • Click, drag and drop IGHD from the biomarker table onto the Green (feature) configuration option on the UMAP plot on the right

    • Click, drag and drop IGHA1 from the biomarker table onto the Red (feature) configuration option on the UMAP plot on the right (Figure 30)

    We can use the lasso tool to select and classify these populations.

    • Click the lasso icon in the top right corner of the UMAP plot

    • Lasso around the IGHD positive cells (Figure 31)

    • In the Classify icon on the left, click Classify selection

    • Label the cells as Mature B cells

    • Click Save

    • Lasso around the IGHA1 positive cells (Figure 32)

    • In the Classify icon on the left, click Classify selection

    • Label the cells as Activated B cells

    • Click Save

    We can now visualize our classifications.

    • Click the Clear filters link in the Select & Filter icon on the left

    • Select the duplicate UMAP plot (with the cells colored by marker genes)

    • Under Configure on the left, click the Style icon and color the cells by New classifications (Figure 33)

    • Click Apply classifications in the Classify icon

    • Name the attribute Cell type

    • Click Run

    • Click OK to close the message about a classification task being enqueued

    Optionally, you may wish to save this data viewer session if you need to go back and reclassify cells later. To save the session, click the save icon on the left and name the session.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Differentially Expressed Proteins and Genes

    • Filter Groups

    • Re-split the Matrix

    • Differential Analysis and Visualization - Protein Data

    • Differential Analysis, Visualization, and Pathway analysis - Gene Expression Data

    Next, we will filter out doublets and unclassified cells, then re-split the data. Re-splitting is useful if you want to run differential and downstream analyses separately for proteins and genes. For your own analyses, re-splitting is optional; you could also continue with differential analysis on the merged data.

    Filter Groups

    Because we have classified our cells, we can now filter based on those classifications. This can be used to focus on a single cell type for re-clustering and sub-classification or to exclude cells that are not of interest for downstream analysis.

    • Click the Merged counts data node

    • Click Filtering

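    • Click Filter cells

    • Set the first filter to exclude Cell type is Doublets using the drop-down menus

    • Click OR

    • Set the second filter to exclude Cell type is N/A using the drop-down menus

    • Click Finish to apply the filter (Figure 1)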

    This produces a Filtered counts data node (Figure 2).
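    Conceptually, the filter drops any cell whose Cell type matches one of the excluded values. The minimal Python sketch below shows that rule using hypothetical cell IDs and labels. It illustrates the logic only and is not Partek Flow's implementation.

```python
# Hypothetical Cell type attribute produced by Apply classifications.
cell_type = {
    "c1": "Helper T cells",
    "c2": "Doublets",
    "c3": "N/A",            # cells that were never classified
    "c4": "Mature B cells",
    "c5": "Activated B cells",
}

EXCLUDED = {"Doublets", "N/A"}


def filter_cells(labels, excluded):
    """Keep a cell only if its label matches none of the excluded values
    (exclude 'Doublets' OR exclude 'N/A')."""
    return {cell: label for cell, label in labels.items() if label not in excluded}


filtered = filter_cells(cell_type, EXCLUDED)
print(sorted(filtered))
```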

    Re-split the Matrix

    • Click the Filtered counts data node

    • Click Pre-analysis tools

    • Click Split by feature type

    This will produce two data nodes, one for each data type (Figure 3). The split data nodes will both retain cell classification information.
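    Splitting by feature type partitions the rows (features) of the merged matrix. Every cell, and therefore every cell attribute, is kept in both outputs. The Python sketch below illustrates this with a hypothetical `CountMatrix` class and made-up counts. It is not how Partek Flow stores its data.

```python
from dataclasses import dataclass


@dataclass
class CountMatrix:
    features: list          # feature (row) names
    feature_types: list     # e.g. "Gene Expression" or "Antibody Capture"
    counts: list            # one list of per-cell counts per feature
    cell_attributes: dict   # per-cell annotations such as Cell type


def split_by_feature_type(merged):
    """Return one CountMatrix per feature type; each keeps all cells and their attributes."""
    parts = {}
    for name, ftype, row in zip(merged.features, merged.feature_types, merged.counts):
        if ftype not in parts:
            parts[ftype] = CountMatrix([], [], [], dict(merged.cell_attributes))
        part = parts[ftype]
        part.features.append(name)
        part.feature_types.append(ftype)
        part.counts.append(row)
    return parts


# Hypothetical merged matrix with two cells (columns).
merged = CountMatrix(
    features=["CD19", "CD19_TotalSeqB", "IGHD", "CD3_TotalSeqB"],
    feature_types=["Gene Expression", "Antibody Capture", "Gene Expression", "Antibody Capture"],
    counts=[[1, 0], [57, 2], [3, 0], [4, 88]],
    cell_attributes={"cellA": "Mature B cells", "cellB": "Helper T cells"},
)

parts = split_by_feature_type(merged)
for ftype, part in parts.items():
    print(ftype, part.features, part.cell_attributes)
```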

    Differential Analysis and Visualization - Protein Data

    Once we have classified our cells, we can use this information to perform comparisons between cell types or between experimental groups for a cell type. In this project, we only have a single sample, so we will compare cell types.

    • Click the Antibody Capture data node

    • Click Statistics

    • Click Differential analysis

    • Click ANOVA, then click Next

    The first step is to choose which attributes we want to consider in the statistical test.

    • Click Cell type

    • Click Add factor

    • Click Next

    Next, we will set up the comparison we want to make. Here, we will compare the Activated and Mature B cells.

    • Drag Activated B cells into the top panel

    • Drag Mature B cells into the bottom panel

    • Click Add comparison

    The comparison should appear in the table as Activated B cells vs. Mature B cells.

    • Click Finish to run the statistical test (Figure 4)

    The ANOVA task produces an ANOVA data node.

    • Double-click the ANOVA data node to open the task report

    The report lists each feature tested, giving p-value, false discovery rate adjusted p-value (FDR step up), and fold change values for each comparison (Figure 5).
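    If you want to check the report columns by hand, the sketch below shows two calculations in plain Python. It makes three assumptions: "FDR step up" corresponds to the Benjamini-Hochberg step-up procedure; group means are on a log2 scale; and fold changes below 1 are reported as negative values (a ratio of 0.5 shown as -2). The p-values and means are hypothetical.

```python
def fdr_step_up(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values never increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_fold_change(mean_a_log2, mean_b_log2):
    """Linear ratio A/B from log2-scale means; ratios below 1 are reported as -1/ratio."""
    ratio = 2 ** (mean_a_log2 - mean_b_log2)
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical p-values for four features in one comparison
pvals = [0.01, 0.04, 0.03, 0.2]
print([round(q, 4) for q in fdr_step_up(pvals)])

# Activated vs. Mature B cells, hypothetical log2-scale group means
print(signed_fold_change(5.0, 4.0))  # 2.0
print(signed_fold_change(4.0, 5.0))  # -2.0
```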

    In addition to the listed information, we can access dot and violin plots for each gene or protein from this table.

    • Click the dot plot icon in the CD45RA_TotalSeqB row

    This opens a dot plot in a new data viewer session, showing CD45RA expression for cells in each of the classifications (Figure 6). First, we exclude Doublets and N/A cells from the plot:

    • Open Select & Filter and select Criteria

    • Drag "Cell type" from the legend title to the Add criteria box

    • Uncheck Doublets and N/A

    • Click the include icon to include the selected points

    We can use the Configuration panel on the left to edit this plot.

    • Open the Style icon

    • Switch on Violins under Summary

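    • Switch on Overlay under Summary

    • Switch on Colored under Summary

    • Select the Graph-based clustering node in the Color by section

    • Color by Graph-based clusters under Color and use the slider to decrease the Opacity

    • Open the Axes icon

    • Select the Graph-based clustering node in the X axis section

    • Change the X axis data to Graph-based clusters

    • Use the slider to increase the Jitter on the X axis (Figure 7)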

    • Click the project name to return to the Analyses tab

    To visualize all of the proteins at the same time, we can make a hierarchical clustering heat map.

    • Click the ANOVA data node

    • Click Exploratory analysis in the toolbox

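    • Click Hierarchical clustering/heatmap

    • In the Cell order section, choose Graph-based clusters from the Assign order drop-down list

    • Click Finish to run with the other default settings

    • Double-click the Hierarchical clustering task node to open the heatmap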

    The heatmap can easily be customized using the tools on the left.

    • Open the Axes icon

    • Switch off Show Row labels

    • Increase the Font to 16 (Figure 8)

    • Activate the Transpose switch to swap the rows and columns, so that the row labels are now shown (Figure 9)

    • Open the Dendrograms icon

    • Choose Row color By cluster and change Row clusters to 4

    • Change Row dendrogram size to 80 (Figure 10)

    • Open the Heatmap icon

    • Navigate to Range under Color

    • Set the Min and Max to -1.2 and 1.2, respectively

    • Change the Shape to Circle (Figure 11)

    • Switch the Shape back to Rectangle

    • Change the Color Palette by clicking on the color squares and selecting colors from the rainbow. Click outside of the selection box to exit this selection. The color options can be dragged along the Palette to highlight value differences (Figure 12).
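    Setting Min and Max defines the value range that the color palette spans. The sketch below assumes that values outside that range are drawn with the end colors. It uses a hypothetical three-color blue-white-red palette in place of whatever colors you choose.

```python
def palette_position(value, vmin=-1.2, vmax=1.2):
    """Map a heatmap value to a 0..1 position on the palette; values outside [vmin, vmax] saturate."""
    clamped = max(vmin, min(vmax, value))
    return (clamped - vmin) / (vmax - vmin)


def blend(color_a, color_b, t):
    """Linear interpolation between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(color_a, color_b))


def heatmap_color(value, low=(0, 0, 255), mid=(255, 255, 255), high=(255, 0, 0)):
    """Color for one heatmap cell using a hypothetical blue-white-red palette."""
    pos = palette_position(value)
    if pos <= 0.5:
        return blend(low, mid, pos / 0.5)
    return blend(mid, high, (pos - 0.5) / 0.5)


for v in (-3.0, -1.2, 0.0, 0.6, 1.2, 2.5):
    print(v, heatmap_color(v))
```

    Narrowing the range gives more of the palette to the values near zero. Any value beyond -1.2 or 1.2 simply takes the end color.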

    Feel free to explore the other tool options on the left to customize the plot further.

    Differential Analysis, Visualization, and Pathway analysis - Gene Expression Data

    We can use a similar approach to analyze the gene expression data.

    • Click the project name to return to the Analyses tab

    • Click the Gene Expression data node

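    • Click Statistics

    • Click Differential analysis

    • Click ANOVA, then click Next

    • Click Cell type

    • Click Add factor

    • Click Next

    • Drag Activated B cells into the top panel

    • Drag Mature B cells into the bottom panel

    • Click Add comparison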

    The comparison should appear in the table as Activated B cells vs. Mature B cells.

    • Click Finish to run the statistical test

    As before, this will generate an ANOVA task node and an ANOVA data node.

    • Double-click the ANOVA task node to open the task report (Figure 13)

    Because more than 20,000 genes have been analyzed, a volcano plot is a useful way to get an overview of the changes.

    • Click the volcano plot icon in the top right corner of the table to open a volcano plot

    The Volcano plot opens in a new data viewer session, in a new tab in the web browser. It shows each gene as a point with cutoff lines set for P-value (y-axis) and fold-change (x-axis). By default, the P-value cutoff is set to 0.05 and the fold-change cutoff is set at |2| (Figure 14).

    The plot can be configured using various tools on the left. For example, the Style icon can be used to change the appearance of the points. The X and Y-axes can be changed in the Axes icon. The Statistics icon can be used to set different fold-change and P-value thresholds for coloring up- and down-regulated genes. The in-plot controls can be used to transpose the volcano plot (Figure 14).
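    The coloring rule behind the volcano plot can be written as a small function. This sketch assumes a signed fold change (negative for down-regulation), inclusive cutoffs, and a -log10 P-value on the y-axis. The default thresholds mirror those described above, and the gene names and values are hypothetical.

```python
import math


def volcano_point(fold_change, p_value, p_cut=0.05, fc_cut=2.0):
    """Return (x, y, category) for one gene on a volcano plot.

    x is the signed fold change and y is -log10(p_value); cutoffs are treated as inclusive.
    """
    y = -math.log10(p_value)
    significant = p_value <= p_cut
    if significant and fold_change >= fc_cut:
        category = "up-regulated"
    elif significant and fold_change <= -fc_cut:
        category = "down-regulated"
    else:
        category = "not significant"
    return fold_change, y, category


# Hypothetical genes: (name, signed fold change, p-value)
genes = [("GENE_A", 3.0, 0.001), ("GENE_B", -2.5, 0.01), ("GENE_C", 1.5, 0.001), ("GENE_D", 4.0, 0.2)]
for name, fc, p in genes:
    print(name, volcano_point(fc, p))
```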

    • Click the ANOVA report tab in your web browser to return to the full report

    We can filter the full set of genes to include only the significantly different genes using the filter panel on the left.

    • Click FDR step up

    • Type 0.05 for the cutoff and press Enter on your keyboard

    • Click Fold change

    • Set the fold change filter to From -2 to 2 and press Enter on your keyboard

    The number at the top of the filter will update to show the number of included genes (Figure 15).

    • Click to create a new data node including only these significantly different genes
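    The two filters combine with AND: a gene must pass the FDR cutoff and fall outside the -2 to 2 fold-change band. The sketch below assumes that reading of the fold-change filter, which is consistent with the |2| cutoff used by the volcano plot. The gene names and values are hypothetical.

```python
# Hypothetical report rows: (gene, FDR step up, signed fold change)
rows = [
    ("GENE_A", 1e-5, -4.0),
    ("GENE_B", 0.001, 2.5),
    ("GENE_C", 0.30, 3.0),
    ("GENE_D", 0.01, 1.4),
    ("GENE_E", 0.04, -2.6),
]


def significant_genes(rows, fdr_cut=0.05, fc_low=-2.0, fc_high=2.0):
    """Keep genes with FDR <= fdr_cut AND a fold change outside the (fc_low, fc_high) band."""
    return [gene for gene, fdr, fc in rows
            if fdr <= fdr_cut and (fc <= fc_low or fc >= fc_high)]


kept = significant_genes(rows)
print(len(kept), kept)  # 3 ['GENE_A', 'GENE_B', 'GENE_E']
```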

    A task, Differential analysis filter, will run and generate a new Filtered feature list data node. We can get a better idea about the biology underlying these gene expression changes using gene set or pathway enrichment. Note that you need to have the Pathway toolkit enabled to perform the next steps.

    • Click the Filtered feature list data node

    • Click Biological interpretation in the toolbox

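    • Click Pathway enrichment

    • Make sure that Homo sapiens is selected in the Species drop-down menu

    • Click Finish to run

    • Double-click the Pathway enrichment task node to open the task report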

    The pathway enrichment results list KEGG pathways, giving an enrichment score and p-value for each (Figure 16).
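    One common way to score pathway enrichment is a one-sided Fisher's exact test on the overlap between the input gene list and each pathway, with the enrichment score taken as the negative natural log of the p-value. We assume that approach here; treat the sketch as an illustration, not a reproduction of Partek Flow's exact computation. The counts in the usage example are toy numbers.

```python
from math import comb, log


def pathway_enrichment(k, n_list, k_pathway, n_background):
    """One-sided Fisher's exact test (hypergeometric upper tail) for pathway overlap.

    k: genes in both the input list and the pathway
    n_list: genes in the input list
    k_pathway: genes annotated to the pathway
    n_background: genes in the background
    Returns (p_value, enrichment_score) with enrichment_score = -ln(p_value).
    """
    upper = min(n_list, k_pathway)
    hits = sum(comb(k_pathway, i) * comb(n_background - k_pathway, n_list - i)
               for i in range(k, upper + 1))
    p_value = hits / comb(n_background, n_list)  # exact integer ratio, rounded once
    return p_value, -log(p_value)


# Toy example: 2 of 3 listed genes fall in a 4-gene pathway, out of 10 background genes.
p, score = pathway_enrichment(2, 3, 4, 10)
print(p, score)
```

    With realistic gene counts the p-value can underflow to 0.0, in which case the natural log fails. A log-space computation would then be needed.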

    To get a better idea about the changes in each enriched pathway, we can view an interactive KEGG pathway map.

    • Click path:hsa05202 in the Transcriptional misregulation in cancer row

    The KEGG pathway map shows up-regulated genes from the input list in red and down-regulated genes from the input list in green (Figure 17).

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Drop the CD4 gene onto the y-axis option

  • Search for the CD8A gene

  • Click and drag CD8A onto the duplicated 2D scatter plot

  • Drop the CD8A gene onto the x-axis option

  • Manually select the cells with high expression of the CD4 gene marker (Figure 14)

    Select Graph-based clustering (derived from the Merged counts > PCA data nodes)

  • Click the Add criteria drop-down list and choose Graph-based to add a selection rule (Figure 20)

  • Label the cells as Tfh cells (Figure 21)

  • Click Save

    Figure 1. Color the cells in the UMAP plot by their graph-based cluster assignment
    Figure 2. Add a 2D scatter plot and place it to the right of the UMAP plot
    Figure 3. Choose Merged counts data to draw the 2D scatter plot
    Figure 4. The canvas now has a 2D scatter plot next to the UMAP
    Figure 5. Click the blue circle to change the data source for the rule selector
    Figure 6. Choose the CD3_TotalSeqB protein marker as a selection rule
    Figure 7. Use the slider to select cells with positive expression for the CD3 protein marker
    Figure 8. CD3+ cells are selected on both plots
    Figure 9. Change the feature plotted on the x-axis to CD8_TotalSeqB
    Figure 10. 2D scatter plot with CD4_TotalSeqB and CD8_TotalSeqB features on the axes
    Figure 11. Click the duplicate plot icon to make a copy of the 2D scatter plot
    Figure 12. The second 2D scatter plot (bottom) has the CD8 and CD4 genes plotted against each other
    Figure 13. Draw a lasso to manually select CD4+ cells, based on protein expression
    Figure 14. Draw a lasso to manually select CD4+ (mRNA) cells
    Figure 15. Select the group of putative T cells
    Figure 16. Group of putative T-cells
    Figure 17. Resize plots to see more of the biomarker table
    Figure 18. Click and drag the gene from the biomarker table onto the plot
    Figure 19. The cells in the UMAP plot on the right are colored by their expression of CXCL13 (green) and NKG7 (red) marker genes. These cells belong to graph-based clusters 6 and 4, respectively, shown in the plot on the left
    Figure 20. Change the data source to Graph-based clustering and choose Graph-based from the drop-down list
    Figure 21. Select all cluster 6 cells and classify them as Tfh cells
    Figure 22. Invert the selection to select all remaining cells
    Figure 23. Color by New classifications (T cell subsets)
    Figure 24. Cells in UMAP plot colored by their expression of CD19 protein
    Figure 25. Lasso around CD19 positive cells
    Figure 26. Filtered CD19 positive cells
    Figure 27. Some cells within the CD19 positive clusters show signs of expressing T cell markers
    Figure 28. Click and drag the CD3 protein marker directly onto the Add criteria drop-down list to create a selection criterion
    Figure 29. Select the remaining CD3 positive doublet cells
    Figure 30. The B cells colored by IGHD (green) and IGHA1 (red) gene expression
    Figure 31. Lasso around the IGHD positive cells
    Figure 32. Select IGHA1 positive cells
    Figure 33. UMAP with cells colored by cell types
    Figure 1. Set up the Filter groups task to exclude Doublets and cells that are not classified
    Figure 2. Filter groups output
    Figure 3. It is possible to re-split the merged matrix once again
    Figure 4. Setting up a comparison for differentially expressed proteins
    Figure 5. GSA report for protein expression data
    Figure 6. CD45RA dot plot for all cells
    Figure 7. Configure the dot plot using the tools on the left
    Figure 8. Heatmap showing altered Axes labels
    Figure 9. Transpose the Heatmap to switch the columns and rows
    Figure 10. Configure the Dendrograms settings
    Figure 11. Configure the Heatmap icon
    Figure 12. Heatmap showing expression of protein markers after changing the Heatmap settings further
    Figure 13. GSA report for the gene expression data
    Figure 14. The volcano plot can be configured using the icons on the left and the in-plot controls
    Figure 15. Use the panel on the left to filter the list for significant genes
    Figure 16. Results of pathway enrichment test
    Figure 17. Transcriptional misregulation in cancer pathway with significant genes highlighted in green and red
    Figure 18. Final CITE-Seq pipeline