# ScType

ScType is "a fully-automated and ultra-fast cell-type identification based solely on a given scRNA-seq data, along with a comprehensive cell marker database as background information" \[1]. It allows accurate cell typing in scRNA-seq and spatial transcriptomics data \[2]. Our implementation builds on the original tool, but allows for improved performance on larger datasets.

### ScType cell classification

The task can be called from any non-normalised counts node, in the *Classification* menu on the right.

{% hint style="warning" %}
A known bug currently prevents running ScType on multiple samples in the same node if the cell IDs are not unique. Please use the '**Split by attribute**' function before running the task on each node individually.
{% endhint %}
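If you have access to the cell IDs (barcodes) before import, a quick check for duplicates can tell you whether the workaround above is needed. The snippet below is a minimal sketch run outside the application; the function name and the example barcodes are hypothetical and are not part of the task.

```python
from collections import Counter


def duplicated_cell_ids(cell_ids):
    """Return the cell IDs that occur more than once, in first-seen order."""
    counts = Counter(cell_ids)
    return [cell_id for cell_id, n in counts.items() if n > 1]


# Hypothetical barcodes from two samples that reuse the same barcode.
sample_a = ["AAAC-1", "AAAG-1"]
sample_b = ["AAAC-1", "TTTG-1"]

dupes = duplicated_cell_ids(sample_a + sample_b)
print(dupes)
```

An empty list means the cell IDs are unique across the samples checked.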

* Click on the Counts node.
* Click on *Classification > Scanpy ScType cell classification* in the toolbox.
* Select the marker database from the **SC Type database** drop-down.
* Use the **Tissue** drop-down to select the appropriate tissue. The options are automatically identified from the marker database provided.
* You can choose **Classify all samples together** if you want the task to consider all your observations as one sample, for example if the clustering attribute was computed on all samples at once.
* The task can use an existing attribute for the classification. Select the node that contains the attribute to use for cell clustering, then choose the specific attribute from the *Choose attribute* drop-down list.

<figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-ff2a3d26901eb16b0a8cad1428c03df8c6f35244%2Fsctype_new_2.png?alt=media" alt=""><figcaption></figcaption></figure>

* You can choose to recompute the clustering attribute within the task by selecting **Recompute**.
* Select the appropriate resolution parameter for the Leiden clustering.

<div align="left"><figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-34074c76bbe13e28b841b439d72b69a756a35662%2FScreenshot%202025-07-30%20at%2012.15.38.png?alt=media" alt=""><figcaption></figcaption></figure></div>

* Click **Finish** to run the task.

The task outputs an annotation for each cell in the dataset. If a cell does not pass the significance threshold it will be classified as N/A. The annotation is saved as 'sctype\_classification', and is available from the node. You can use *Annotation/Metadata>Publish cell attributes to project* in the toolbox to make the attribute available at the Analysis level.

### ScType marker database

Our implementation of ScType allows the user to provide their own marker databases for classification. Please ensure that the gene names used in the marker database match the names in the annotation used for the dataset. In principle, this can be used to classify cells from any species. The marker database present by default supports human and mouse only.
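Because mismatched gene names are easy to miss, it can help to check how many marker genes are actually present in the dataset before uploading a custom database. The following is a minimal sketch, assuming you have the marker genes and the dataset gene names as plain lists; `marker_overlap` is a hypothetical helper and the comparison is an exact, case-sensitive match.

```python
def marker_overlap(marker_genes, dataset_genes):
    """Return the fraction of marker genes found in the dataset and the missing ones."""
    markers = set(marker_genes)
    dataset = set(dataset_genes)
    missing = sorted(markers - dataset)
    fraction = (len(markers) - len(missing)) / len(markers) if markers else 0.0
    return fraction, missing


# Hypothetical example: the dataset uses mouse-style capitalisation for one gene.
markers = ["CD3E", "CD4", "MS4A1"]
dataset = ["CD3E", "Cd4", "MS4A1", "GAPDH"]

fraction, missing = marker_overlap(markers, dataset)
print(f"{fraction:.0%} of markers found; missing: {missing}")
```

A low fraction usually points to a naming mismatch (for example symbols versus IDs, or different capitalisation) rather than to genuinely absent genes.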

#### Database formatting

The task expects a marker database file in either .txt or .tsv format. If you are using the original ScType marker database as a template, please save the resulting file in either of these formats.

The file should have 5 columns: tissueType, cellName, geneSymbolmore1, geneSymbolmore2, shortName.

* tissueType: Contains the tissues being typed. This column will be parsed to determine the Tissue options in the task.
* cellName: Contains the name of each cell type in each tissue. If a cell type is found in more than one tissue, it will be found more than once in this column.
* geneSymbolmore1: Contains the positive markers for the cell type in that tissue, comma separated.
* geneSymbolmore2: Contains the negative markers for the cell type in that tissue, comma separated.
* shortName: Contains the cell type short name.

<figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-deaff0ad2ef122c0db7f43aff0fecb5d14eb1447%2FScreenshot%202025-07-30%20at%2014.31.07.png?alt=media" alt=""><figcaption></figcaption></figure>
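A custom database can be checked against the column layout described above before it is uploaded. The sketch below is one way to do this with the Python standard library; it assumes the file is tab-delimited (the delimiter is not stated here), and `read_marker_db`, `split_markers` and the example rows are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "tissueType", "cellName", "geneSymbolmore1", "geneSymbolmore2", "shortName",
]


def split_markers(cell):
    """Split a comma-separated marker cell into a list, ignoring blanks."""
    return [gene.strip() for gene in (cell or "").split(",") if gene.strip()]


def read_marker_db(handle):
    """Parse a marker database and raise if an expected column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumption: tab-delimited
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        {
            "tissue": row["tissueType"].strip(),
            "cell_type": row["cellName"].strip(),
            "positive": split_markers(row["geneSymbolmore1"]),
            "negative": split_markers(row["geneSymbolmore2"]),
            "short_name": row["shortName"].strip(),
        }
        for row in reader
    ]


# Hypothetical two-row database; the second row has no negative markers.
example = (
    "tissueType\tcellName\tgeneSymbolmore1\tgeneSymbolmore2\tshortName\n"
    "Immune system\tNaive B cells\tCD19,MS4A1\tCD3E\tNaive B\n"
    "Brain\tAstrocytes\tGFAP,AQP4\t\tAstro\n"
)
rows = read_marker_db(io.StringIO(example))
tissues = sorted({row["tissue"] for row in rows})
print(tissues)
```

To check a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.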

#### Adding a database

To add a ScType marker database, first upload it to your ICA project as described [here](https://help.ica.illumina.com/project/p-data#ui-upload). After creating your Study and linking it to the correct ICA project:

* Click on the **Add Data** button.
* Then **Select from ICA project**.
* Select **Library file > ScType** from the drop-down menu.

<figure><img src="https://580316046-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWMxqQAMFOJtu98OBk9KN%2Fuploads%2Fgit-blob-7e05682cbaf9fba45a6503cf8f4b0d598536b5d6%2FScreenshot%202025-07-30%20at%2014.48.10.png?alt=media" alt=""><figcaption></figcaption></figure>

* Navigate to the file location and select the checkbox to the left of the file name.
* Click **Add selected data** at the bottom right of the page.

The newly added gene marker database will now be available from the drop-down menu in the task setup page.

### ScType outputs

Overall, the task works by assigning cell type labels to clusters based on marker genes. For each cluster, a score is computed for every cell type in the marker database from the type scores of the cells within that cluster, and the highest-scoring cell type is assigned to the cluster. Because of this, multiple clusters may be assigned the same cell type (e.g. if the data was overclustered).

It is possible to get a cluster with an unknown cell type, for example when a custom ScType database uses non-canonical gene names that do not match those in the dataset, or when the cluster does not meet the significance threshold.
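The cluster-level assignment described in this section can be illustrated with a small sketch. It assumes that per-cell scores are already available for each cell type and that they are summed within each cluster, as in the original ScType workflow. The function name, the `min_score_per_cell` parameter and its value are hypothetical stand-ins for the significance threshold, which is not specified here, so this is an illustration rather than the task's implementation.

```python
from collections import defaultdict


def assign_cluster_labels(cell_scores, cell_clusters, min_score_per_cell=0.25):
    """Label each cluster with its highest-scoring cell type, or "N/A".

    cell_scores: {cell_id: {cell_type: score}}
    cell_clusters: {cell_id: cluster_id}
    min_score_per_cell: hypothetical threshold, scaled by cluster size.
    """
    totals = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for cell_id, scores in cell_scores.items():
        cluster = cell_clusters[cell_id]
        sizes[cluster] += 1
        for cell_type, score in scores.items():
            totals[cluster][cell_type] += score  # sum scores within the cluster

    labels = {}
    for cluster, type_totals in totals.items():
        best_type, best_score = max(type_totals.items(), key=lambda kv: kv[1])
        if best_score < min_score_per_cell * sizes[cluster]:
            labels[cluster] = "N/A"  # low-confidence cluster
        else:
            labels[cluster] = best_type
    return labels


# Hypothetical per-cell scores for two clusters of two cells each.
cell_scores = {
    "c1": {"T cells": 2.0, "B cells": -0.5},
    "c2": {"T cells": 1.5, "B cells": 0.1},
    "c3": {"T cells": 0.1, "B cells": 0.2},
    "c4": {"T cells": 0.0, "B cells": 0.1},
}
cell_clusters = {"c1": "0", "c2": "0", "c3": "1", "c4": "1"}

labels = assign_cluster_labels(cell_scores, cell_clusters)
print(labels)
```

In this example cluster `0` is labelled from its clear top score, while cluster `1` has only weak scores for every cell type and falls below the illustrative threshold.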

The results of the ScType task are consistent with the Python implementation of ScType, given the same input and the defaults described. The ScType web app has different default methods and parameters, so its results may not be directly comparable.

{% hint style="warning" %}
[Publish the cell attributes](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/annotation-metadata/publish-cell-attributes-to-project) to the project so that you can use the classification throughout the analysis. You can also download this table.
{% endhint %}

## References

\[1] Ianevski, A., Giri, A. K., & Aittokallio, T. (2022). Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nature Communications, 13(1), 1246.

\[2] Nader, K., Tasci, M., Ianevski, A., Erickson, A., Verschuren, E. W., Aittokallio, T., & Miihkinen, M. (2024). ScType enables fast and accurate cell type identification from spatial transcriptomics data. Bioinformatics, 40(7), btae426.
