Cells transition from one state to another during development, in disease, and throughout life. Because these changes can be gradual, trajectory analysis describes progress through a biological process as a position along a path. Because biological processes are often complex, trajectory analysis builds branching trajectories, in which different paths can be taken at different points along the trajectory. The progress of a cell along a trajectory from the starting point, or root, can be quantified as a numeric value called pseudotime.
Partek Flow offers Monocle 2 and Monocle 3 methods.
Major updates in Monocle 3 (compared to Monocle 2) include:
Monocle 3 learns the principal trajectory graph in the UMAP space;
the principal graph is smoothed and small branches are excluded;
support for principal graphs with loops and convergence points;
support for multiple root nodes.
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
In Partek Flow, we use tools from Monocle 2 [1] to build trajectories, identify states and branch points, and calculate pseudotime values. The output of the Trajectory analysis task includes an interactive scatter plot for viewing the trajectory and setting the root state (the starting point of the trajectory), and it adds a categorical cell-level attribute, State. From the Trajectory analysis task report, you can run a second task, Calculate pseudotime, which adds a numeric cell-level attribute, Pseudotime, calculated using the chosen root state. Using the State and Pseudotime attributes, you can perform downstream analysis to identify genes that change over pseudotime and to characterize branch points.
Note that trajectory analysis will only work on data with <600,000,000 observations (number of cells × number of features). If your data set exceeds this limit, the Trajectory analysis task will not appear in the toolbox. Prior to performing trajectory analysis, you should:
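The size limit above can be checked before importing or filtering data. The following is a minimal sketch in Python, not part of Partek Flow; the function names are hypothetical, and the only fact taken from the text is the 600,000,000 threshold on cells × features.

```python
# Limit quoted in the text above: cells x features must be below 600,000,000.
MAX_OBSERVATIONS = 600_000_000


def within_trajectory_limit(n_cells: int, n_features: int) -> bool:
    """Return True when cells x features is under the documented limit."""
    return n_cells * n_features < MAX_OBSERVATIONS


def max_features_for(n_cells: int) -> int:
    """Largest feature count that keeps a data set of n_cells under the limit."""
    return (MAX_OBSERVATIONS - 1) // n_cells


# Hypothetical data set sizes, for illustration only.
print(within_trajectory_limit(50_000, 20_000))  # 1,000,000,000 observations
print(within_trajectory_limit(50_000, 3_000))   # 150,000,000 observations
print(max_features_for(50_000))
```

A check like this shows how much gene filtering (step 3 below) is needed before the task becomes available for a given number of cells.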
1) Normalize the data
Trajectory analysis requires normalized counts as the input data. We recommend our default "CPM, Add 1, Log 2" normalization for most scRNA-Seq data. For alternative normalization methods, see our Normalization documentation.
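To make the "CPM, Add 1, Log 2" recipe concrete, here is a minimal Python sketch for a single cell. It assumes the three steps are applied in the order the name suggests (scale counts to counts per million, add an offset of 1, take log2); the exact Partek Flow implementation is not described in this document, and the function name is hypothetical.

```python
import math


def cpm_add1_log2(counts):
    """Normalize one cell's raw gene counts: log2(counts per million + 1)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no counts")
    return [math.log2(c / total * 1_000_000 + 1) for c in counts]


# Hypothetical raw counts for three genes in one cell.
cell = [0, 250_000, 750_000]
for raw, norm in zip(cell, cpm_add1_log2(cell)):
    print(raw, round(norm, 3))
```

Under this reading of the recipe, the offset of 1 keeps genes with no counts defined after the log transform.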
2) Filter to cells that belong in the same trajectory
Trajectory analysis will build a single branching trajectory for all input cells. Consequently, only cells that share the biological process being studied should be included. For example, a trajectory describing progression through T cell activation should not include monocytes that do not undergo T cell activation. To learn more about filtering, please see our Filter groups (samples or cells) documentation.
3) Filter to genes that characterize the trajectory
The trajectory should be built using a set of genes that increase or decrease as a function of progression through the biological processes being modeled. One example is using differentially expressed genes between cells collected at the beginning of the process to cells collected at the end of the process. If you have no prior knowledge about the process being studied, you can try identifying genes that are differentially expressed between clusters of cells or genes that are highly variable within the data set. Generally, you should try to filter to 1,000 to 3,000 informative genes prior to performing trajectory analysis. The list manager functionality in Partek Flow is useful for creating a list of genes to use in the filter. To learn more, please see our documentation on Lists.
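When no prior knowledge is available, one of the options mentioned above is to keep the most variable genes. The sketch below illustrates that idea in plain Python; it is not how Partek Flow ranks genes internally, and the gene names, values, and function name are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, n_genes):
    """Return the n_genes with the highest variance across cells.

    expression maps a gene name to its normalized values, one per cell.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:n_genes]


# Hypothetical normalized expression for three genes across four cells.
expression = {
    "GeneA": [1.0, 1.0, 1.0, 1.0],  # constant, carries no information
    "GeneB": [0.0, 2.0, 4.0, 6.0],  # changes steadily across cells
    "GeneC": [1.0, 2.0, 1.0, 2.0],  # small fluctuation
}
print(top_variable_genes(expression, 2))
```

On a real data set, n_genes would be in the 1,000 to 3,000 range suggested above, and the resulting names could be saved as a gene list for the filter.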
Dimensionality of the reduced space
While the trajectory is always visualized in a 2D scatter plot, the underlying structure of the trajectory may be more complex and better represented by more than two dimensions.
Scaling
You can choose to scale the genes prior to building the trajectory. Scaling removes any differences in variability between genes, while not scaling allows more variable genes to have a greater weight in building the trajectory.
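The effect of scaling can be illustrated with a small Python sketch. It assumes that scaling means converting each gene to zero mean and unit variance (a Z score), as described for the Monocle 3 option later in this document; the gene values and function name are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale one gene's values across cells to zero mean and unit variance."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant gene has no spread to rescale; return zeros.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Two hypothetical genes with very different variability.
quiet_gene = [5.0, 5.1, 4.9, 5.0]
variable_gene = [0.0, 4.0, 8.0, 12.0]

for name, values in [("quiet", quiet_gene), ("variable", variable_gene)]:
    print(name,
          "spread before:", round(pstdev(values), 3),
          "after:", round(pstdev(zscore(values)), 3))
```

After scaling, both genes have the same spread, so neither dominates the trajectory; without scaling, the more variable gene would contribute far more.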
Click the task report to open a 2D scatter plot in the Data Viewer (Figure 1).
The trajectory is drawn as a black line. Branch points are indicated by numbers in black circles. By default, cells are colored by state. You can use the control panel on the left to color, size, and shape by genes and attributes to help identify which state is the root of the trajectory.
To calculate pseudotime, you must choose a root state. The tip of the root state branch will have a value of 0 for pseudotime. Click any cell belonging to that state to select the state. The selected state will be highlighted while unselected cells are dimmed (Figure 2). Choose Calculate pseudotime in the Additional actions on the left control panel.
The Calculate pseudotime task generates a new Pseudotime result data node, which contains a Pseudotime annotation for each cell (Figure 3).
Open the Pseudotime result report to display a 2D scatter plot in the Data Viewer, colored by Pseudotime by default (Figure 4).
[1] Xiaojie Qiu, Qi Mao, Ying Tang, Li Wang, Raghav Chawla, Hannah Pliner, and Cole Trapnell. Reversed graph embedding resolves complex single-cell developmental trajectories. Nature methods, 2017.
In Partek Flow, we use tools from Monocle 3 (1) to build trajectories, identify states and branch points, and calculate pseudotime values. The output of Trajectory analysis task includes an interactive 2D/3D visualization for viewing the trajectory trees and setting the root states (starting points of the trajectories). From the Trajectory analysis report, you can run a second task, Calculate pseudotime, which adds a numeric cell-level attribute, Pseudotime, calculated using the chosen root states.
Trajectory analysis by Monocle 3 requires data normalization and preprocessing. For normalization, we suggest first using the Normalization and scaling section of the toolbox to normalize by counts per million (CPM), add an offset of 1, and log2 transform. After that, launch Trajectory analysis on the Normalized counts node; this input node cannot have zero values.
According to the Monocle 3 authors, you may want to keep only the top 5,000 genes with the highest variance (2,000 genes for data sets with fewer than 5,000 cells, and 300 genes for data sets with fewer than 1,000 cells) (1). These numbers should be used as guidance for a first-pass analysis and may need to be optimized, depending on the project at hand and the biological question.
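The first-pass guidance above can be written as a small lookup. This is a minimal Python sketch with a hypothetical function name; the thresholds are the ones quoted in the text and are starting points, not fixed rules.

```python
def suggested_gene_count(n_cells: int) -> int:
    """First-pass number of high-variance genes to keep, per the guidance above."""
    if n_cells < 1_000:
        return 300
    if n_cells < 5_000:
        return 2_000
    return 5_000


# Hypothetical data set sizes, for illustration only.
for n_cells in (800, 3_500, 40_000):
    print(n_cells, "cells ->", suggested_gene_count(n_cells), "genes")
```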
To run the Trajectory analysis tool, select the Normalized counts data node (or equivalent) and go to the toolbox: Exploratory analysis > Trajectory analysis.
The configuration dialog presents four options (Figure 1).
Dimensionality of reduced space. This option specifies the number of UMAP dimensions that the original data are reduced to in order to learn the trajectory tree (the dimensionality of the original data equals the number of genes). The default is two, meaning that the trajectory plot will be drawn in two dimensions. To get a 3D trajectory plot, increase this option to 3.
Scaling. Normalized expression values can be further transformed by scaling to unit variance and zero mean (i.e. converting to Z score). The use of this option is recommended (1).
Data is logged. Select this option if the data have already been log-transformed upstream. When selected, Monocle 3 will skip the log2 step on the input data (see below).
Programmatically calculate default root nodes. If this option is not selected (default), the user has to specify the root nodes of the trajectory tree manually. Depending on the available metadata, Monocle 3 may be able to pick the root nodes programmatically (see below for details).
Under the hood, Monocle 3 will perform a log2 transformation of the gene count matrix (if Data is logged was not selected), scale the matrix (if Scaling was selected), and project the gene count matrix onto the top 50 principal components. Next, dimensionality reduction is performed by UMAP (using the default settings of the reduce_dimension command).
The result of running Trajectory analysis in Partek Flow is the Trajectory result data node. Double-clicking the node opens a Data Viewer window with the trajectory plot (Figure 2). The cell trajectory graph shows the position of each cell (blue dot) with respect to the UMAP coordinates (axes). Cell trajectories (one or more, depending on the data set) are depicted as black lines. Gray circles are trajectory nodes (i.e. cell communities).
To show or hide the cell trajectory tree and trajectory nodes, select Axes in the Configure section and use the Extra data drop-down options in the upper-right corner of the dialog (Figure 2).
To perform pseudotime analysis, you need to point to the cells at the beginning of the biological process you are interested in, for example, cells at the earliest stage of a differentiation sequence. There are two ways to perform pseudotime analysis in Partek Flow, depending on how the root nodes (the cells at the beginning of pseudotime) are defined.
Manual selection of root node. The user has to specify the root nodes (one or more).
Automatic selection of the root node. The root node is picked by the algorithm.
If you want to manually pick the root nodes, leave the option Programmatically calculate default root nodes unselected when setting up the Trajectory analysis.
To start, select the root cell nodes (gray circles in trajectory tree) by left-clicking. If the trajectory result consists of more than one trajectory tree, you can specify more than one root node, e.g. one root node per trajectory tree (ctrl & click). If no root node is specified for a tree, that tree will not be included in the pseudotime calculation. Figure 4 shows an example where seven root nodes were identified.
Once you have identified all the root nodes, click the Additional button in the Tools section on the left panel, then push the Calculate pseudotime button in the dialog (Figure 5).
As a result, the cells will be annotated by pseudotime, using a green-to-red gradient (start and end, respectively) (Figure 6). If no root node has been identified for a particular tree, the cells of that tree will be omitted from the pseudotime calculation and colored black (Figure 8).
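The behavior described above (a value of 0 at the roots, increasing values along each tree, and no value for trees without a root) can be illustrated with a small graph example. The Python sketch below assumes that pseudotime behaves like the shortest-path distance along the trajectory graph from the nearest root node; this is an illustrative assumption, not a description of the Partek Flow or Monocle 3 code, and the graph, node names, and function name are hypothetical.

```python
import heapq
import math


def pseudotime_from_roots(graph, roots):
    """Shortest-path distance from the nearest root to every node.

    graph maps a node to a list of (neighbor, edge_length) pairs.
    Nodes that cannot be reached from any root keep a value of infinity.
    """
    dist = {node: math.inf for node in graph}
    heap = []
    for root in roots:
        dist[root] = 0.0
        heap.append((0.0, root))
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Two hypothetical trajectory trees: A-B-C/D (root chosen) and X-Y (no root).
graph = {
    "A": [("B", 1.0)],
    "B": [("A", 1.0), ("C", 2.0), ("D", 1.5)],
    "C": [("B", 2.0)],
    "D": [("B", 1.5)],
    "X": [("Y", 1.0)],
    "Y": [("X", 1.0)],
}
pseudotime = pseudotime_from_roots(graph, ["A"])
for node, value in pseudotime.items():
    print(node, "omitted" if math.isinf(value) else value)
```

In this sketch, nodes X and Y belong to a tree without a root and therefore receive no finite value, mirroring the cells that are left out of the calculation.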
The pseudotime plot displays the structure of the graph using black lines. The circles with numbers (cell nodes) on the black lines represent special points. There are three types of cell nodes:
Root node (white). Root nodes are start points of the pseudotime and were defined by the user in the previous step (e.g. node 4 in Figure 7).
Branch node (black). Branch nodes indicate where the trajectory tree forks out; i.e. each branch represents a different cell fate or different trajectory (e.g. nodes 3-6, and 8 in Figure 7).
Leaf (light gray). Leaves correspond to different cell fates / different trajectory outcomes (e.g. nodes 5, 9, and 12 in Figure 7). The leaves correspond to cell states of Monocle 2.
The numbers within the circles are provided for reference purposes only. The intermediate nodes from the previous step have been removed.
If suitable meta-data are available, it is possible to automatically select the root node. For example, you may know which cells were harvested from the earliest time points. The cells need to be annotated by that information (Annotate Cells task) before running Trajectory analysis. The annotation will, in turn, be available in the Trajectory analysis setup dialog, upon selecting the Programmatically calculate default root nodes option.
Attribute for root nodes. The drop down list will show the available cell-level attributes. Specify the one which should be used to identify the root nodes.
Attribute value for root nodes. The drop-down list will show the content of the attribute selected under Attribute for root nodes. Specify the entry that corresponds to the earliest time point.
Once the options have been set, Monocle 3 will first group the cells according to which trajectory node they are nearest to. It then calculates the fraction of the cells from the earliest time point at each trajectory node. Finally, it picks the node with the highest prevalence of the early cells and treats it as the root node.
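The three steps above can be sketched in plain Python. This is an illustration of the logic as described in the text, not the Monocle 3 code itself; the cell names, node identifiers, time-point labels, and function name are hypothetical.

```python
from collections import Counter


def pick_root_node(nearest_node, attribute, early_value):
    """Pick the trajectory node with the highest fraction of early cells.

    nearest_node maps each cell to its closest trajectory node.
    attribute maps each cell to its annotation (e.g. time point).
    Returns the chosen node and the per-node fractions.
    """
    totals = Counter()
    early = Counter()
    for cell, node in nearest_node.items():
        totals[node] += 1
        if attribute[cell] == early_value:
            early[node] += 1
    fractions = {node: early[node] / totals[node] for node in totals}
    # On a tie, max() keeps the first node encountered.
    return max(fractions, key=fractions.get), fractions


# Hypothetical cells, nearest trajectory nodes, and time-point annotation.
nearest_node = {"c1": "Y_1", "c2": "Y_1", "c3": "Y_2",
                "c4": "Y_2", "c5": "Y_2", "c6": "Y_3"}
time_point = {"c1": "E9.5", "c2": "E9.5", "c3": "E9.5",
              "c4": "E11.5", "c5": "E11.5", "c6": "E13.5"}

root, fractions = pick_root_node(nearest_node, time_point, "E9.5")
print(root, fractions)
```

Here the attribute and its earliest value play the roles of the Attribute for root nodes and Attribute value for root nodes options.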
(1) Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C, Shendure J. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019 Feb;566(7745):496-502. doi: 10.1038/s41586-019-0969-x. Epub 2019 Feb 20. PMID: 30787437; PMCID: PMC6434952.