Single-cell QA/QC

The Single-cell QA/QC task in Connected Multiomics enables you to visualize several useful metrics that will help you include only high-quality cells. To invoke the Single-cell QA/QC task:

  • Click a Single cell counts data node

  • Click the QA/QC section of the task menu

  • Click Single cell QA/QC

By default, all samples are used together to perform QA/QC. To perform QA/QC separately for each sample, choose Split by sample under the Grouping option.

The Single cell QA/QC configuration dialog prompts you to choose the genome assembly and annotation file.

Note that the task can still be run without specifying an annotation file, but mitochondrial counts cannot then be detected. The annotation file should be the same one used in the upstream analysis.

The Single cell QA/QC task report opens in a new data viewer session. Four dot and violin plots on the canvas show the value for every cell: counts per cell, detected features per cell, the percentage of mitochondrial counts per cell (when the annotation file contains genes on the MT chromosome), and the percentage of ribosomal counts per cell (human and mouse only).

If your cells do not express any mitochondrial genes or an appropriate annotation file was not specified, the plot for the percentage of mitochondrial counts per cell will be non-informative.

Mitochondrial genes are defined as genes located on a mitochondrial chromosome in the gene annotation file. The mitochondrial chromosome is identified in the gene annotation file by having "M" or "MT" in its chromosome name. If the gene annotation file does not follow this naming convention for the mitochondrial chromosome, Connected Multiomics will not be able to identify any mitochondrial genes.
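The naming convention above can be illustrated with a minimal sketch. It assumes that an optional "chr" prefix is stripped and that the remainder is compared case-insensitively against "M" and "MT"; the exact matching rule used by Connected Multiomics is not documented here, and the function name is hypothetical.

```python
def is_mito_chromosome(chrom_name: str) -> bool:
    """Return True if a chromosome name looks mitochondrial.

    Assumption: an optional "chr" prefix is ignored and the rest must
    equal "M" or "MT" (case-insensitive). This is an illustration, not
    the product's actual matching logic.
    """
    name = chrom_name.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    return name.upper() in {"M", "MT"}


# Names as they might appear in different annotation files.
examples = ["chrM", "MT", "chrMT", "chr1", "NC_012920.1"]
flags = {name: is_mito_chromosome(name) for name in examples}
```

Under this assumption, an annotation that labels the mitochondrial chromosome with an accession such as "NC_012920.1" would yield no mitochondrial genes, which is the situation described above.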

Ribosomal genes are defined as genes that code for proteins in the large and small ribosomal subunits. Ribosomal genes are identified by searching their gene symbol against a list of 89 L & S ribosomal genes taken from HGNC. The search is case-insensitive and includes all known gene name aliases from HGNC. Identifying ribosomal genes is performed independent of the gene annotation file specified.
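A case-insensitive symbol lookup of this kind might look like the sketch below. The table holds only two real ribosomal protein symbols rather than the full list of 89, and the alias shown is a made-up placeholder, not an actual HGNC alias; all function and variable names are hypothetical.

```python
# Approved symbol -> aliases. Illustrative subset only; the alias is a
# placeholder invented for this example, not real HGNC data.
RIBO_SYMBOLS = {
    "RPL3": ["placeholder_alias_a"],
    "RPS6": [],
}


def build_lookup(symbols):
    """Collect approved symbols and aliases into one case-folded set."""
    lookup = set()
    for symbol, aliases in symbols.items():
        lookup.add(symbol.casefold())
        lookup.update(alias.casefold() for alias in aliases)
    return lookup


def is_ribosomal(gene_symbol, lookup):
    """Case-insensitive membership test against the lookup set."""
    return gene_symbol.casefold() in lookup


lookup = build_lookup(RIBO_SYMBOLS)
```

Because the comparison is case-insensitive, a mouse-style symbol such as "Rps6" matches the same entry as the human-style "RPS6".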

Total counts are calculated as the sum of the counts for all features in each cell from the input data node. The number of detected features is calculated as the number of features in each cell with greater than zero counts. The percentage of mitochondrial counts is calculated as the sum of counts for known mitochondrial genes divided by the sum of counts for all features and multiplied by 100. The percentage of ribosomal counts is calculated as the sum of counts for known ribosomal genes divided by the sum of counts for all features and multiplied by 100.
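The four calculations can be reproduced on a toy matrix as a sanity check. This is a minimal sketch in plain Python, not the task's implementation: the function name, the data layout (one list of counts per cell, aligned with a feature list), and the choice to report 0.0 for cells with zero total counts are all assumptions.

```python
def qc_metrics(features, counts, mito_genes, ribo_genes):
    """Compute per-cell QC metrics.

    features:   feature names, one per column
    counts:     one list of counts per cell, aligned with `features`
    mito_genes: set of feature names treated as mitochondrial
    ribo_genes: set of feature names treated as ribosomal
    """
    mito_idx = [i for i, f in enumerate(features) if f in mito_genes]
    ribo_idx = [i for i, f in enumerate(features) if f in ribo_genes]
    metrics = []
    for row in counts:
        total = sum(row)
        detected = sum(1 for c in row if c > 0)
        mito = sum(row[i] for i in mito_idx)
        ribo = sum(row[i] for i in ribo_idx)
        metrics.append({
            "total_counts": total,
            "detected_features": detected,
            # Assumption: cells with no counts get 0.0 rather than an error.
            "pct_mito": 100.0 * mito / total if total else 0.0,
            "pct_ribo": 100.0 * ribo / total if total else 0.0,
        })
    return metrics


features = ["MT-CO1", "RPL3", "ACTB", "GAPDH"]
counts = [[10, 20, 70, 0],   # cell 1
          [0, 0, 0, 0]]      # cell 2: empty
result = qc_metrics(features, counts, {"MT-CO1"}, {"RPL3"})
```

For the first cell, the total is 100 counts across 3 detected features, with 10% mitochondrial and 20% ribosomal counts.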

Each point on the plots is a cell. All cells from all samples are shown on the plots. The overlaid violins illustrate the distribution of cell values for the y-axis metric.

The appearance of a plot can be configured by selecting a plot and adjusting the Configure settings in the panel on the left. Here are some suggestions, but feel free to explore the other options available:

  • Open Axes and change the Y-axis scale to Logarithmic. This can be helpful to view the range of values better, although it is usually better to keep the Ribosomal counts plot in linear scale.

  • Within Style switch on Summary Box & Whiskers. Inspecting the median, Q1, Q3, upper 90%, and lower 10% quantiles of the distributions can be helpful in deciding appropriate thresholds.
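For readers who want to reproduce the box-and-whiskers summary outside the viewer, the sketch below computes the same five statistics with Python's standard library. The interpolation method ("inclusive") is an assumption; the method used by the plot is not stated here, so values may differ slightly.

```python
import statistics


def summarize(values):
    """Return the 10%, Q1, median, Q3, and 90% quantiles of `values`."""
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "p10": deciles[0],       # lower 10% quantile
        "q1": quartiles[0],
        "median": quartiles[1],
        "q3": quartiles[2],
        "p90": deciles[8],       # upper 90% quantile
    }


# Toy metric values for eleven cells.
summary = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
```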

High-quality cells can be selected using Select & Filter, which is pre-loaded with selection criteria, one for each quality metric.

Hovering the mouse over one of the selection criteria reveals a histogram showing the frequency distribution of the respective quality metric. The minimum and maximum thresholds can be adjusted by clicking and dragging the sliders or by typing directly into the text boxes for each selection criterion.

Alternatively, use Pin histogram to view all of the distributions at once, which makes it easier to determine thresholds.

Adjusting the selection criteria will select and deselect cells in all plots simultaneously. Depending on your settings, the deselected points will either be dimmed or gray. The filters are combined: a cell is selected only if it passes every filter, so the selection is the intersection of the individual filters. The number of cells selected is shown in the figure legend of each plot.
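The way the filters combine can be sketched as follows: a cell is kept only when every metric falls within its minimum and maximum threshold. The metric names, the threshold values, and the use of inclusive bounds are assumptions made for illustration.

```python
def select_cells(cells, thresholds):
    """Return the ids of cells that pass every (min, max) threshold.

    cells:      list of dicts with an "id" and one value per metric
    thresholds: metric name -> (minimum, maximum), bounds inclusive
    """
    selected = []
    for cell in cells:
        if all(lo <= cell[metric] <= hi
               for metric, (lo, hi) in thresholds.items()):
            selected.append(cell["id"])
    return selected


cells = [
    {"id": "c1", "total_counts": 5000, "detected_features": 1800, "pct_mito": 4.0},
    {"id": "c2", "total_counts": 300, "detected_features": 150, "pct_mito": 3.0},
    {"id": "c3", "total_counts": 6000, "detected_features": 2000, "pct_mito": 35.0},
]
# Hypothetical thresholds; appropriate values depend on the data set.
thresholds = {
    "total_counts": (500, 20000),
    "detected_features": (200, 6000),
    "pct_mito": (0.0, 20.0),
}
kept = select_cells(cells, thresholds)
```

Here only "c1" is kept: "c2" fails the counts and features filters, and "c3" fails the mitochondrial filter even though it passes the other two.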

To filter the high-quality cells, click the include selected cells icon in Filter in the top right of Select & Filter, and click Apply observation filter...

Select the input data node for the filtering task and click Select.

A new data node, Filtered counts, will be generated under the Analyses tab.

Double-click the Filtered counts data node to view the task report. The report includes a summary of the count distribution across all features for each sample; a detailed breakdown of the number of cells included in the filter for each sample; and the minimum and maximum values for each quality metric (expressed genes, total counts, etc.) across the included cells for each sample.
