ScType

ScType is "a fully-automated and ultra-fast cell-type identification based solely on a given scRNA-seq data, along with a comprehensive cell marker database as background information" [1]. It enables accurate cell typing in both scRNA-seq and spatial transcriptomics data [2]. Our implementation builds on the original tool but improves performance on larger datasets. Two versions of the task are currently available: a generic version and a version optimised for Illumina Spatial Transcriptome Prep. Both versions give consistent results for identical inputs.

ScType cell classification

This version of the task can be used with any single-cell or spatial data. It can be launched from any non-normalised counts node, from the Classification menu on the right.

  • Click on the Counts node.

  • Click on Classification > ScType cell classification in the toolbox.

  • Select the marker database from the SC Type database drop-down. The original ScType database is provided by default.

  • Use the checkboxes to select the appropriate tissues. The options are automatically identified from the marker database provided.

  • The task uses an existing attribute for the classification. Select the node that contains the attribute to use for cell clustering, then choose the specific attribute from the Choose attribute drop-down list.

  • Click on Configure in Advanced options to modify the advanced parameters.

  • Select the significance threshold in the Advanced options. This number indicates a fraction of cells; higher numbers indicate a less strict significance threshold (see [1] for method details).

  • The task allows for selection of positive and/or negative biomarkers.

  • Click Finish to run the task.

The task outputs an annotation for each cell in the dataset. If a cell does not pass the significance threshold, it is classified as N/A. The annotation is saved as 'sctype' and is available from the node. To make the attribute available at the Analysis level, use Annotation/Metadata > Publish cell attributes to project in the toolbox.

Scanpy ScType (Illumina Spatial Transcriptome Prep)

This version of the task has been optimised for use with Illumina Spatial Transcriptome Prep data. The task can be called from any non-normalised counts node, in the Classification menu on the right.

  • Click on the Counts node.

  • Click on Classification > Scanpy ScType cell classification in the toolbox.

  • Select the marker database from the SC Type database drop-down.

  • Use the Tissue drop-down to select the appropriate tissue. The options are automatically identified from the marker database provided.

  • The task can use an existing attribute for the classification. Select the node that contains the attribute to use for cell clustering, then choose the specific attribute from the Choose attribute drop-down list.

  • You can choose to recompute the clustering attribute within the task by selecting Recompute.

  • Select the appropriate resolution parameter for the Leiden clustering.

  • Click Finish to run the task.

The task outputs an annotation for each cell in the dataset. If a cell does not pass the significance threshold, it is classified as N/A. The annotation is saved as 'sctype_classification' and is available from the node. To make the attribute available at the Analysis level, use Annotation/Metadata > Publish cell attributes to project in the toolbox.

ScType marker database

Our implementation of ScType allows the user to provide their own marker databases for classification. Please ensure that the gene names used in the marker database match the names in the annotation used for the dataset. In principle, this can be used to classify cells from any species.
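Because gene names must match, it can help to check a custom marker list against the dataset's gene names before uploading it. The Python sketch below is not part of the task: `find_unmatched_markers` is a hypothetical helper, and it assumes matching is exact and case-sensitive, which is an assumption rather than documented task behaviour.

```python
def find_unmatched_markers(marker_genes, dataset_genes):
    """Return marker genes that do not appear verbatim among the dataset's gene names.

    Matching is exact and case-sensitive (an assumption), so "CD3E" and "Cd3e" differ.
    """
    available = set(dataset_genes)
    return sorted(set(marker_genes) - available)


# Hypothetical example: human-style symbols checked against a mouse-style annotation
markers = ["CD3E", "CD19", "MS4A1"]
dataset_genes = ["Cd3e", "Cd19", "MS4A1", "Actb"]
print(find_unmatched_markers(markers, dataset_genes))  # ['CD19', 'CD3E']
```

An empty list means every marker name was found in the dataset.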

Database formatting

The task expects a marker database file in either .txt or .tsv format. If you are using the original ScType marker database as a template, please save the resulting file in either of these formats.
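If you assemble a marker database programmatically, Python's standard `csv` module can write it as tab-separated text using the five columns described in this section. This is a minimal sketch: the helper name `write_marker_db`, the output file name and the example row (a B-cell entry with illustrative markers) are hypothetical.

```python
import csv

# Column order as described for the ScType marker database file
COLUMNS = ["tissueType", "cellName", "geneSymbolmore1", "geneSymbolmore2", "shortName"]


def write_marker_db(rows, path):
    """Write marker database rows (dicts keyed by COLUMNS) as a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical row: marker lists stay comma separated inside a single tab-delimited field
rows = [
    {"tissueType": "Immune system", "cellName": "Naive B cells",
     "geneSymbolmore1": "MS4A1,CD19,CD79A", "geneSymbolmore2": "CD3E",
     "shortName": "NaiveB"},
]
write_marker_db(rows, "my_markers.tsv")
```

Commas are not the delimiter here, so the writer leaves the marker lists unquoted.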

The file should have 5 columns: tissueType, cellName, geneSymbolmore1, geneSymbolmore2, shortName.

  • tissueType: Contains the tissue for each entry. This column is parsed to determine the Tissue options in the task.

  • cellName: Contains the name of each cell type in each tissue. If a cell type is found in more than one tissue, it will be found more than once in this column.

  • geneSymbolmore1: Contains the positive markers for the cell type in that tissue, comma separated.

  • geneSymbolmore2: Contains the negative markers for the cell type in that tissue, comma separated.

  • shortName: Contains the cell type short name.
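The layout above can be read back with a few lines of Python, for example to check a file before uploading it. This is a minimal sketch assuming a header row with exactly these column names and tab separators; `load_marker_db`, `tissue_options` and the sample rows are hypothetical and do not reproduce the task's own parser.

```python
import csv


def split_markers(field):
    """Split a comma-separated marker field into gene symbols, dropping blanks."""
    return [gene.strip() for gene in (field or "").split(",") if gene.strip()]


def load_marker_db(path):
    """Parse a tab-separated ScType-style marker database.

    Returns {tissue: {cell_name: {"positive": [...], "negative": [...], "short": str}}}.
    A repeated (tissue, cell name) pair keeps the last row read.
    """
    db = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            db.setdefault(row["tissueType"], {})[row["cellName"]] = {
                "positive": split_markers(row["geneSymbolmore1"]),
                "negative": split_markers(row["geneSymbolmore2"]),
                "short": row["shortName"],
            }
    return db


def tissue_options(db):
    """Distinct tissue names, i.e. the choices a tissue selector could offer."""
    return sorted(db)


# Hypothetical sample file; the Liver row has no negative markers
sample = (
    "tissueType\tcellName\tgeneSymbolmore1\tgeneSymbolmore2\tshortName\n"
    "Immune system\tNaive B cells\tMS4A1,CD19,CD79A\tCD3E\tNaiveB\n"
    "Liver\tHepatocytes\tALB,APOA1\t\tHep\n"
)
with open("example_markers.tsv", "w", encoding="utf-8") as handle:
    handle.write(sample)

db = load_marker_db("example_markers.tsv")
print(tissue_options(db))  # ['Immune system', 'Liver']
print(db["Liver"]["Hepatocytes"]["negative"])  # []
```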

Adding a database

To add a ScType marker database, first upload it to your ICA project as described here. After creating your Study and linking it to the correct ICA project:

  • Click on the Add Data button.

  • Then Select from ICA project.

  • Select Library file > ScType from the drop-down menu.

  • Navigate to the file location and select the checkbox to the left of the file name.

  • Click Add selected data at the bottom right of the page.

The newly added gene marker database will now be available from the drop-down menu in the task setup page.

ScType outputs

Opening the task report displays the tissue and cell type for each location. Each cell (ID) in the report is assigned the cell type (sctype) with the highest typescore.

  • Each row in the ID column represents a cell.

  • The tissue column shows the tissue (biological system) used for cell typing.

  • The rows in the sctype column represent the assigned cell type for the cell.

  • The typescore is a confidence score reflecting how well the cell's gene expression profile matches the marker genes for the assigned cell type. A higher score indicates a stronger match.
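For intuition about the typescore, the sketch below follows the per-cell scoring described for the original ScType method [1]: each gene is z-scaled across cells, positive markers are weighted by how specific they are to a cell type, and a cell type's score is the scaled sum over its positive markers minus the scaled sum over its negative markers. Whether this implementation computes typescore in exactly the same way is not documented here, so treat it as an illustration; all function names and the toy three-cell dataset are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_rows(expr):
    """Z-scale each gene across cells; expr maps gene -> list of values, one per cell."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)  # sample SD, as in R's scale()
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def marker_sensitivity(positive_sets):
    """Weight positive markers by specificity: 1 if used by one cell type, 0 if used by all."""
    counts = {}
    for genes in positive_sets.values():
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    n_types = len(positive_sets)
    if n_types < 2:
        return {gene: 1.0 for gene in counts}
    return {gene: (n_types - c) / (n_types - 1) for gene, c in counts.items()}


def sctype_scores(expr, markers):
    """Per-cell score for every cell type.

    markers maps cell type -> {"positive": [...], "negative": [...]}.
    Score = sum(z of positive markers) / sqrt(n_pos) - sum(z of negative markers) / sqrt(n_neg),
    where z rows of all positive markers are first multiplied by their sensitivity weight.
    Marker genes absent from expr are skipped; an empty side contributes 0.
    """
    z = zscore_rows(expr)
    weights = marker_sensitivity({ct: m["positive"] for ct, m in markers.items()})
    for gene, w in weights.items():  # weight rows globally before scoring
        if gene in z:
            z[gene] = [v * w for v in z[gene]]
    n_cells = len(next(iter(z.values())))
    scores = {}
    for cell_type, m in markers.items():
        pos = [g for g in m["positive"] if g in z]
        neg = [g for g in m["negative"] if g in z]
        row = []
        for j in range(n_cells):
            s = sum(z[g][j] for g in pos) / math.sqrt(len(pos)) if pos else 0.0
            s -= sum(z[g][j] for g in neg) / math.sqrt(len(neg)) if neg else 0.0
            row.append(s)
        scores[cell_type] = row
    return scores


def assign_cells(scores, cell_ids):
    """Report rows (ID, sctype, typescore): the highest-scoring cell type for each cell."""
    rows = []
    for j, cell_id in enumerate(cell_ids):
        best = max(scores, key=lambda ct: scores[ct][j])
        rows.append((cell_id, best, scores[best][j]))
    return rows


# Hypothetical toy data: three cells, each expressing one marker
expr = {"MS4A1": [5, 0, 0], "CD3E": [0, 4, 0], "ALB": [0, 0, 6]}
markers = {
    "B cells": {"positive": ["MS4A1"], "negative": ["CD3E"]},
    "T cells": {"positive": ["CD3E"], "negative": []},
    "Hepatocytes": {"positive": ["ALB"], "negative": []},
}
for row in assign_cells(sctype_scores(expr, markers), ["cell_1", "cell_2", "cell_3"]):
    print(row)  # cell_1 -> B cells, cell_2 -> T cells, cell_3 -> Hepatocytes
```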

Overall, the task assigns cell type labels (based on marker genes) to clusters rather than to individual cells. Scores for every cell type are computed for each cluster, and the cell type with the highest score is assigned to the whole cluster. Because of this, multiple clusters may be assigned the same cell type (e.g. if the data was overclustered).
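In the original ScType R workflow, the cluster-level score for a cell type is the sum of that type's per-cell scores over all cells in the cluster, and the cell type with the largest total is assigned. The Python sketch below assumes the same rule for illustration; `assign_clusters` and the example scores are hypothetical.

```python
from collections import defaultdict


def assign_clusters(scores, clusters):
    """Sum each cell type's per-cell scores within every cluster and keep the top type.

    scores: {cell_type: [score per cell]}; clusters: [cluster label per cell].
    Returns {cluster: (cell_type, summed_score, n_cells)}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for j, cluster in enumerate(clusters):
        sizes[cluster] += 1
        for cell_type, row in scores.items():
            totals[cluster][cell_type] += row[j]
    result = {}
    for cluster, per_type in totals.items():
        best = max(per_type, key=per_type.get)
        result[cluster] = (best, per_type[best], sizes[cluster])
    return result


# Hypothetical per-cell scores for four cells in three clusters
scores = {"B cells": [2.0, 1.5, -1.0, 1.8], "T cells": [-0.5, 0.0, 2.5, 0.1]}
print(assign_clusters(scores, ["0", "0", "1", "2"]))
```

Clusters 0 and 2 both come out as B cells here, which is how an overclustered dataset ends up with repeated labels.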

A cluster can also end up with an unknown cell type, for example when a custom ScType database uses non-canonical gene names that do not match those in the dataset, or when the cluster does not meet the significance threshold.
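The original ScType R code labels a cluster as unknown when its winning score is below one quarter of the number of cells in the cluster. The sketch below applies that rule with a hypothetical `cutoff_divisor` parameter (default 4) and the N/A label this task uses; how the task's Advanced options threshold maps onto this cutoff is not documented here. If none of a cell type's marker genes are found in the dataset, its score carries no signal (treated here as 0), so such a cluster falls below the cutoff.

```python
def apply_significance(cluster_calls, cutoff_divisor=4, unknown="N/A"):
    """Relabel clusters whose winning score is below n_cells / cutoff_divisor.

    cluster_calls: {cluster: (cell_type, summed_score, n_cells)}.
    cutoff_divisor is hypothetical; 4 mirrors the original ScType R code.
    """
    labels = {}
    for cluster, (cell_type, score, n_cells) in cluster_calls.items():
        labels[cluster] = unknown if score < n_cells / cutoff_divisor else cell_type
    return labels


# Hypothetical calls: cluster 2 has a score of 0 (e.g. no matching marker genes)
calls = {"0": ("B cells", 3.5, 2), "1": ("T cells", 2.5, 1), "2": ("B cells", 0.0, 12)}
print(apply_significance(calls))  # {'0': 'B cells', '1': 'T cells', '2': 'N/A'}
```

Raising `cutoff_divisor` lowers the cutoff, which makes the check less strict.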

References

[1] Ianevski, A., Giri, A. K., & Aittokallio, T. (2022). Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nature Communications, 13(1), 1246.

[2] Nader, K., Tasci, M., Ianevski, A., Erickson, A., Verschuren, E. W., Aittokallio, T., & Miihkinen, M. (2024). ScType enables fast and accurate cell type identification from spatial transcriptomics data. Bioinformatics, 40(7), btae426.
