Cell barcode QA/QC

The Cell barcode QA/QC task lets you determine whether a given cell barcode is associated with a cell. This is an important QC step in all droplet-based single cell RNA-seq experiments, where every barcode is sequenced, including barcodes from droplets that did not capture a cell.

To invoke Cell barcode QA/QC:

  • Click a Single cell counts data node

  • Click the QA/QC section of the task menu

  • Click Cell barcode QA/QC

The task can be performed with or without the EmptyDrops method enabled.

Cell Barcode QA/QC without EmptyDrops

To perform the task without the EmptyDrops method enabled, leave the checkbox unchecked and click Finish.

Note: This option is recommended for data imported from DRAGEN results, because barcodes with 0 counts have already been filtered out.

The Cell barcode QA/QC task report is a plot. The X-axis shows the barcodes ranked by their UMI counts, and the Y-axis shows the UMI count of each barcode. This type of plot is often referred to as a knee plot.

The knee plot is used to choose a cutoff point between barcodes that correspond to cells and barcodes that do not. This applies when the imported data are raw counts with no barcode filtering performed upstream. Connected Multiomics automatically calculates an inflection point, shown by the vertical line on the graph. Barcodes designated as cells are shown in blue, while barcodes designated as background (no cell) are shown in grey.
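The ranking behind a knee plot, and one simple way to place a cutoff on it, can be sketched as follows. This is only an illustration: the function names are hypothetical, and the "steepest drop on the log-log curve" heuristic is an assumption, not the algorithm Connected Multiomics uses to calculate its inflection point. On real data the per-step slope is noisy and would normally be smoothed first.

```python
import math


def rank_barcodes(umi_counts):
    """Return non-zero UMI counts sorted in descending order (rank 1 = largest)."""
    return sorted((c for c in umi_counts if c > 0), reverse=True)


def estimate_cutoff_rank(ranked_counts):
    """Return how many top-ranked barcodes to keep as cells.

    Illustrative heuristic (an assumption, not the product's method):
    place the cutoff just before the steepest drop of log10(count)
    versus log10(rank).
    """
    best_rank, steepest = len(ranked_counts), 0.0
    for i in range(1, len(ranked_counts)):
        # Step from rank i to rank i + 1 on both log axes.
        dx = math.log10(i + 1) - math.log10(i)
        dy = math.log10(ranked_counts[i]) - math.log10(ranked_counts[i - 1])
        slope = dy / dx
        if slope < steepest:
            steepest, best_rank = slope, i
    return best_rank


# Toy data: five barcodes with thousands of UMIs, the rest near background.
ranked = rank_barcodes([30, 5000, 0, 4800, 25, 4500, 20, 4200, 10, 4000, 5, 3, 1])
print(estimate_cutoff_rank(ranked))
```

Barcodes ranked at or before the returned position would be drawn as cells, and the remainder as background.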

The cutoff can be adjusted by dragging the vertical line across the graph or by using the text fields in the Filter panel on the left-hand side of the plot. Using the Filter panel, you can specify the number of cells or the percentage of counts in cells, and the cutoff point will be adjusted to match your criteria. Conversely, the number of cells and the percentage of counts in cells are updated as the cutoff point is changed. To return to the automatically calculated cutoff, click Reset sample filter.

The percentage of counts in cells and median counts per cell are useful technical quality metrics that can be consulted when optimizing sample handling, cell isolation techniques, and library preparation.
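The relationship between the cutoff and these metrics can be sketched in a few lines. The function names and the dictionary keys below are hypothetical, and the exact definitions used in the report may differ; the sketch assumes the barcodes are already sorted by UMI count in descending order.

```python
from statistics import median


def cell_metrics(ranked_counts, n_cells):
    """Summarize a cutoff that keeps the top n_cells barcodes as cells."""
    cells = ranked_counts[:n_cells]
    total = sum(ranked_counts)
    return {
        "cells": len(cells),
        "pct_counts_in_cells": 100.0 * sum(cells) / total if total else 0.0,
        "median_counts_per_cell": median(cells) if cells else 0,
    }


def cells_for_percentage(ranked_counts, target_pct):
    """Smallest number of top barcodes whose counts reach target_pct of all counts."""
    total = sum(ranked_counts)
    if total == 0:
        return 0
    running = 0
    for n, count in enumerate(ranked_counts, start=1):
        running += count
        # Compare without division to avoid floating-point rounding.
        if running * 100 >= target_pct * total:
            return n
    return len(ranked_counts)


ranked = [400, 300, 200, 60, 30, 10]
print(cell_metrics(ranked, 3))
print(cells_for_percentage(ranked, 90))
```

Specifying a number of cells corresponds to calling cell_metrics directly, while specifying a percentage corresponds to first finding the matching number of cells.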

One knee plot is generated for each sample. In projects with multiple samples, Next and Back buttons appear at the top left of the plot to enable navigation between sample knee plots. Manual filters must be set separately for each sample. A manual filter is typically used when a certain number of cells is expected, as in experiments where droplets were loaded with a predefined number of cells.

To return to the knee plot view, click Back to filter. To apply the filter and run the Filter barcodes task, click Apply filter. A Filtered counts data node will be generated.

Cell Barcode QA/QC with EmptyDrops

Note: If your data has already been filtered to remove barcodes with low total counts, this method will not be suitable. This method requires empty barcodes to be present in the single cell count matrix, in order to estimate the ambient RNA profile.

The EmptyDrops method (1) uses a statistical test to distinguish barcodes that correspond to real cells from barcodes that correspond to empty droplets. An ambient RNA expression profile is estimated from barcodes at or below a specified total UMI count threshold, using the Good-Turing algorithm. The expression profile of each barcode above this low-count threshold is then tested for deviations from the ambient profile. Real cells are expected to have a low p-value, indicating a significant deviation from the expected background noise level. False discovery rate (FDR) correction is applied to all the p-values, and barcodes whose corrected value is equal to or below the specified FDR level are called as real cells. This can allow for the detection of additional cells that would otherwise be discarded due to a low total UMI count.
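The FDR step can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure; the text above does not state which correction Connected Multiomics applies, so treat that choice, and the function names, as assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_cells(p_values, fdr_threshold):
    """Flag barcodes whose adjusted p-value is equal to or below the threshold."""
    return [q <= fdr_threshold for q in benjamini_hochberg(p_values)]


# Four tested barcodes; the last one looks like ambient RNA.
print(call_cells([0.001, 0.01, 0.03, 0.5], fdr_threshold=0.05))
```

Raising fdr_threshold flags more barcodes as cells, at the cost of more potential false positives.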

In addition, a knee point threshold will be calculated to identify cells with a very high total UMI count. It's possible that some barcodes with a high total UMI count will not pass the EmptyDrops significance test. This could be due to biases in the ambient RNA profile, leading to a non-significant difference between a barcode's expression profile and the ambient profile. To protect against this issue, it is advisable to use the EmptyDrops results in conjunction with the knee point filter, on the assumption that barcodes with a very high total UMI count will always correspond to real cells. Note that the knee point will be more conservative than the inflection point calculated by Connected Multiomics when the EmptyDrops method is not enabled.

To perform the task with the EmptyDrops method, check the checkbox, configure the additional options, and click Finish.

Ambient count threshold

Barcodes with a total UMI count equal to or below this threshold will be used to create the ambient RNA expression profile to estimate background noise. The default is set to 100, which is reasonable for most data.
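How the threshold selects barcodes for the ambient profile can be sketched as below. This is a simplification under stated assumptions: the Good-Turing estimate used by EmptyDrops is replaced here with a plain pseudocount so that no gene gets a zero proportion, and the function name and input layout (a dict of barcode to per-gene counts) are hypothetical.

```python
def ambient_profile(count_matrix, threshold=100, pseudocount=1.0):
    """Estimate per-gene ambient proportions from low-count barcodes.

    count_matrix maps each barcode to a list of per-gene UMI counts.
    NOTE: pseudocount smoothing is a stand-in for Good-Turing estimation.
    """
    n_genes = len(next(iter(count_matrix.values())))
    pooled = [0] * n_genes
    for counts in count_matrix.values():
        # Barcodes at or below the threshold are treated as empty droplets.
        if sum(counts) <= threshold:
            pooled = [p + c for p, c in zip(pooled, counts)]
    smoothed = [p + pseudocount for p in pooled]
    total = sum(smoothed)
    return [s / total for s in smoothed]


matrix = {
    "AAAC": [900, 50, 50],  # high total count, excluded from the profile
    "AAAG": [3, 1, 0],
    "AAAT": [2, 0, 1],
    "AACA": [0, 0, 0],
}
print(ambient_profile(matrix, threshold=100))
```

If barcodes with low total counts were removed upstream, nothing falls at or below the threshold and the profile carries no information, which is why the method needs empty barcodes in the matrix.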

FDR threshold

Barcodes with an FDR-corrected p-value equal to or below this threshold show a significant deviation from the ambient profile and can therefore be considered real cells. Increasing this value will result in more cells, but will also increase the number of potential false positives.

Random generator seed

The seed initializes the random number generator used for the Monte Carlo simulations that determine p-values. To reproduce results, use the same random seed for all runs.
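The role of the seed can be seen in a small Monte Carlo sketch. It is deliberately simplified: it tests a barcode against a plain multinomial model of the ambient profile, which is an assumption made for brevity and not necessarily the model used by the task. The function names are hypothetical, and the profile is assumed to contain only positive proportions.

```python
import math
import random


def multinomial_log_prob(counts, profile):
    """Log-probability of counts under a multinomial with the given proportions."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, profile):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return log_p


def monte_carlo_p_value(counts, profile, n_iter=1000, seed=0):
    """Share of simulated ambient barcodes at least as unlikely as the observed one."""
    rng = random.Random(seed)  # a fixed seed gives identical simulations
    genes = list(range(len(profile)))
    total = sum(counts)
    observed = multinomial_log_prob(counts, profile)
    as_extreme = 0
    for _ in range(n_iter):
        simulated = [0] * len(profile)
        for g in rng.choices(genes, weights=profile, k=total):
            simulated[g] += 1
        if multinomial_log_prob(simulated, profile) <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (as_extreme + 1) / (n_iter + 1)


profile = [0.6, 0.2, 0.2]
print(monte_carlo_p_value([20, 20, 60], profile, seed=42))  # far from ambient
print(monte_carlo_p_value([60, 20, 20], profile, seed=42))  # matches ambient
```

Calling the function twice with the same seed returns the same p-value, while a different seed can shift it slightly. The smallest p-value this sketch can return is 1 / (n_iter + 1).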

There are additional metrics on the left of the plot in the report.

The numbers of cells detected by the EmptyDrops test and by the knee point filter are shown above the Venn diagram on the left. In the example plot, 3,189 barcodes are above the knee point filter (represented by the vertical blue line on the plot) and 2,657 barcodes passed the significance test in EmptyDrops. The overlap between these sets of barcodes is represented by the Venn diagram. In this example, 1,583 barcodes pass the significance test in EmptyDrops and also have a high total UMI count above the knee point filter; 1,606 barcodes have a very high total UMI count but no significant difference from the ambient profile in EmptyDrops; and 1,074 barcodes fall below the knee point but are still significantly different from the ambient profile.

The number of cells included by the knee point filter can be adjusted either by clicking on the plot to change the position of the vertical blue line or by typing a different number of cells into the text box on the left.

The total number of cells is shown in the text box on the left. By default, this will be all of the cells detected by the knee point filter plus the extra cells detected by EmptyDrops. In the example, there are 3,189 cells with a high total UMI count plus the additional 1,074 cells from EmptyDrops (total = 4,263).

Different sections of the Venn diagram can be selected or deselected to include or exclude barcodes. For example, clicking the '1,606' section of the Venn diagram will deselect those barcodes. The only cells that pass the filter are then the ones that are significant in EmptyDrops.
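The Venn diagram arithmetic is ordinary set logic, sketched here with placeholder barcode IDs sized to match the example. The function names and section labels are hypothetical and only illustrate how selecting or deselecting a section changes the final cell list.

```python
def venn_sections(knee_cells, emptydrops_cells):
    """Split barcodes into the three sections of the Venn diagram."""
    knee, ed = set(knee_cells), set(emptydrops_cells)
    return {
        "both": knee & ed,
        "knee_only": knee - ed,
        "emptydrops_only": ed - knee,
    }


def selected_barcodes(sections, include=("both", "knee_only", "emptydrops_only")):
    """Union of the sections that are currently selected."""
    selected = set()
    for name in include:
        selected |= sections[name]
    return selected


# Integer IDs stand in for real barcodes.
knee = range(0, 3189)            # 3,189 barcodes above the knee point
emptydrops = range(1606, 4263)   # 2,657 barcodes significant in EmptyDrops
sections = venn_sections(knee, emptydrops)

print({name: len(members) for name, members in sections.items()})
print(len(selected_barcodes(sections)))  # default: all sections selected
print(len(selected_barcodes(sections, include=("both", "emptydrops_only"))))
```

With all sections selected, the total is the union of the two sets (3,189 + 1,074 = 4,263). Deselecting the knee-only section leaves the 1,583 + 1,074 = 2,657 barcodes that are significant in EmptyDrops.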

References

  1. Lun, A., Riesenfeld, S., Andrews, T. et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 2019; 20: 63.
