Pseudobulk

The Pseudobulk task combines expression values from all cells of a given cell type classification within each sample. In essence, it creates virtual bulk-level data from single-cell data. Because the output is virtual bulk data, any task that can be performed on bulk-level gene count data can also be performed on the output of the Pseudobulk task.

Pseudobulk makes it easy to compare gene or protein expression for a cell type of interest between experimental groups.

Before running Pseudobulk, you must classify the cells. To run Pseudobulk, select the data node containing your classified cells and invoke the task.

Select how you would like to group the cells. By default, sample name is selected; this option pools all the cells in a sample to generate sample-level expression. You can add other attributes from the drop-down list, e.g. cell type; cells are then pooled by cell type within each sample, producing one expression profile per cell type per sample across all features.
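To make the grouping choice concrete, here is a minimal Python sketch of how the selected attributes determine which cells are pooled together. It is an illustration only, not the task's actual implementation; the cell records, the attribute names (`sample`, `cell_type`), and the `group_cells` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-cell annotations; attribute names are illustrative only.
cells = [
    {"cell_id": "c1", "sample": "S1", "cell_type": "T cell"},
    {"cell_id": "c2", "sample": "S1", "cell_type": "B cell"},
    {"cell_id": "c3", "sample": "S1", "cell_type": "T cell"},
    {"cell_id": "c4", "sample": "S2", "cell_type": "T cell"},
]


def group_cells(cells, attributes):
    """Map each unique combination of attribute values to its cell IDs."""
    groups = defaultdict(list)
    for cell in cells:
        key = tuple(cell[attr] for attr in attributes)
        groups[key].append(cell["cell_id"])
    return dict(groups)


# Default behaviour described above: one pool per sample.
by_sample = group_cells(cells, ["sample"])

# Adding cell type: one pool per cell type within each sample.
by_sample_and_type = group_cells(cells, ["sample", "cell_type"])
```

With the toy data above, grouping by sample alone yields two pools, while grouping by sample and cell type yields three, because sample S1 contains two cell types.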

The Aggregation method options are Sum, Maximum, Mean, and Median. Expression values for cells from the same sample with the same cell type classification are merged using the chosen aggregation method. If the input data node contains raw counts, Sum is recommended; if it contains normalized counts, Mean or Median is more appropriate.
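The effect of the four aggregation methods can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the task's actual implementation: the `pseudobulk` function, the `AGGREGATORS` table, the gene names, and the toy counts are all hypothetical, and every cell is assumed to report a value for every feature.

```python
from collections import defaultdict
from statistics import mean, median

# The four aggregation methods named above, mapped to Python callables.
AGGREGATORS = {"sum": sum, "maximum": max, "mean": mean, "median": median}


def pseudobulk(cells, method="sum"):
    """Pool per-cell expression by (sample, cell type).

    cells: iterable of (sample, cell_type, {feature: value}) tuples.
    Returns {(sample, cell_type): {feature: aggregated value}}.
    """
    aggregate = AGGREGATORS[method]
    pooled = defaultdict(lambda: defaultdict(list))
    for sample, cell_type, expression in cells:
        for feature, value in expression.items():
            pooled[(sample, cell_type)][feature].append(value)
    return {
        group: {feature: aggregate(values) for feature, values in features.items()}
        for group, features in pooled.items()
    }


# Hypothetical raw counts for four cells across two samples.
raw_counts = [
    ("S1", "T cell", {"GeneA": 3, "GeneB": 0}),
    ("S1", "T cell", {"GeneA": 5, "GeneB": 2}),
    ("S1", "B cell", {"GeneA": 1, "GeneB": 7}),
    ("S2", "T cell", {"GeneA": 4, "GeneB": 1}),
]

summed = pseudobulk(raw_counts, method="sum")    # suited to raw counts
averaged = pseudobulk(raw_counts, method="mean")  # suited to normalized values
```

Here the two S1 T cells collapse into a single profile: Sum gives GeneA = 8 and GeneB = 2, while Mean gives GeneA = 4 and GeneB = 1. Sum preserves the total count for a group, which is what count-based bulk methods expect, whereas Mean and Median keep normalized values on their original scale regardless of how many cells each group contains.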

After you click Finish, a Pseudobulk data node is generated, which contains the bulk-level expression data.
