Illumina Connected Multiomics

Illumina Connected Multiomics (ICM) is available for further tertiary analysis of single-cell and other multiomic data.

Getting Started

Default Single Cell Analysis

The steps below describe the default single-cell analysis that ICM launches automatically when single-cell data is imported. Instructions are also included for launching each step manually if input parameters need to be adjusted from the default settings.

Normalize Counts

Because different cells have different total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization steps: 1. CPM (counts per million), 2. Add 1, 3. Log2. This accounts for differences in total UMI counts per cell and log-transforms the data, which makes the data easier to visualize.

Click the counts node you wish to normalize
Click Normalization and scaling in the context-sensitive task menu on the right
Click Normalization
Click Use recommended to add the recommended normalization scheme

This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in the order listed, from top to bottom.

Click Finish to apply the normalization
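
The recommended scheme (CPM, Add 1, Log2) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ICM's implementation; the function name and the toy count matrix are hypothetical.

```python
import math

def normalize_cpm_log2(counts_per_cell):
    """Apply CPM, add 1, then log2 to each cell.

    counts_per_cell: list of cells, each a list of raw counts (one per gene).
    """
    normalized = []
    for cell in counts_per_cell:
        total = sum(cell)
        if total == 0:
            # A cell with no counts stays at zero after log2(0 + 1)
            normalized.append([0.0] * len(cell))
            continue
        normalized.append(
            [math.log2(count / total * 1_000_000 + 1) for count in cell]
        )
    return normalized

# Hypothetical data: two cells with the same composition but 10x different depth
cells = [
    [10, 0, 90],    # 100 total counts
    [100, 0, 900],  # 1000 total counts
]
norm = normalize_cpm_log2(cells)
```

After normalization the two cells have matching values for every gene, which is the point of correcting for total counts per cell. Genes with zero counts remain at zero because log2(0 + 1) = 0.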

Note that in the default single cell analysis pipeline, ICM runs the normalization step a second time to scale the data by subtracting each feature's mean and dividing by its standard deviation.
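
The scaling described in this note is a per-feature z-score. A minimal sketch follows, assuming the sample standard deviation is used (whether ICM uses the sample or population form is not stated here); the function name and values are hypothetical.

```python
import statistics

def scale_feature(values):
    """Subtract the feature's mean and divide by its standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(value - mean) / sd for value in values]

# Hypothetical normalized values for one gene across three cells
gene_across_cells = [2.0, 4.0, 6.0]
scaled = scale_feature(gene_across_cells)
```

Each scaled feature has mean 0 and unit standard deviation, so highly expressed genes do not dominate downstream steps purely because of their magnitude.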

Filter Features

A common task in single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative, and the ideal gene filtering criteria depend on your experimental design and research question, ICM has a wide variety of flexible filtering options. The Filter features step can be performed either before or after normalization.

Select a data node containing the count matrix
Click Filtering in the task menu
Click Filter features
Select the Filter type and Filter criteria desired

Four categories of filter are available: noise reduction, statistics-based, feature metadata, and feature list.

The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics-based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The metadata, saved list, and manual list filters allow you to filter your data set to include or exclude particular genes.

For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file. To do so:

Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes
Click Finish to apply the filter

The default single cell pipeline uses the statistics-based filter to filter for the top 10% of features with the highest variance.
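
The two filters discussed above can be sketched as follows. This is an illustration under stated assumptions, not ICM's code: the function names, the dictionary layout, and the use of population variance as the ranking metric are all hypothetical choices.

```python
import statistics

def noise_reduction_filter(matrix, max_value=0, min_fraction=0.999):
    """Drop features whose value is <= max_value in at least min_fraction of cells.

    matrix: dict mapping feature name -> list of values across cells.
    """
    kept = {}
    for feature, values in matrix.items():
        low = sum(1 for value in values if value <= max_value)
        if low / len(values) < min_fraction:
            kept[feature] = values
    return kept

def top_variance_filter(matrix, fraction=0.10):
    """Keep the given fraction of features with the highest variance."""
    ranked = sorted(
        matrix, key=lambda f: statistics.pvariance(matrix[f]), reverse=True
    )
    n_keep = max(1, round(len(ranked) * fraction))
    return {feature: matrix[feature] for feature in ranked[:n_keep]}

# Hypothetical counts for three genes across four cells
matrix = {
    "GeneA": [0, 0, 0, 0],  # never expressed
    "GeneB": [0, 5, 0, 9],  # highly variable
    "GeneC": [1, 1, 2, 1],  # nearly constant
}
expressed = noise_reduction_filter(matrix)
most_variable = top_variance_filter(expressed, fraction=0.5)
```

Here GeneA is removed by the noise reduction filter, and the variance filter then keeps GeneB. The fraction of 0.5 is used only because the toy matrix is tiny; the default pipeline's top 10% corresponds to `fraction=0.10`.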

PCA

Principal components (PC) analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.

Select a data node containing the normalized and filtered count matrix
Click Exploratory analysis in the task menu
Click PCA from the drop-down list
Select the number of features to include
Select the number of PCs to calculate

You can choose Features contribute equally to standardize the genes prior to PCA, or choose by variance to allow more variable genes to have a larger effect on the PCA. By default, we take variance into account and focus on the most variable genes.

If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option.

Click Finish to run

A new PCA task node will be produced on the task graph for the analysis. When complete, double-click the PCA task node to open the 3D PCA scatter plot in data viewer.

Besides the PCA coordinates of the cells, the PCA task report also includes the Scree plot, the component loadings table, and the PC projections table.

The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an eigenvalue. The higher the eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by additional PCs is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP and t-SNE.
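
To make the relationship between eigenvalues and explained variance concrete, the sketch below converts a set of eigenvalues into fractions of total variance. It then picks a number of PCs using a cumulative-variance threshold, which is a simple stand-in heuristic rather than the visual "levels off" judgment described above. The eigenvalues, function names, and the 0.85 threshold are hypothetical.

```python
def explained_variance(eigenvalues):
    """Fraction of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [eigenvalue / total for eigenvalue in eigenvalues]

def pcs_for_cumulative(eigenvalues, target=0.85):
    """Smallest number of leading PCs whose cumulative fraction reaches target."""
    cumulative = 0.0
    for n_pcs, fraction in enumerate(explained_variance(eigenvalues), start=1):
        cumulative += fraction
        if cumulative >= target:
            return n_pcs
    return len(eigenvalues)

# Hypothetical eigenvalues read off a Scree plot, largest first
eigenvalues = [50.0, 25.0, 15.0, 5.0, 3.0, 2.0]
fractions = explained_variance(eigenvalues)
n_pcs = pcs_for_cumulative(eigenvalues, target=0.85)
```

In this toy example the first three PCs account for most of the variance and the remaining PCs each add very little, which mirrors the flattening you would look for on the Scree plot.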

Graph-based Clustering

Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.

Click the PCA data node
Click Exploratory analysis in the task menu
Click Graph-based clustering

Clustering can be performed on each sample individually or on all samples together.

Select the Clustering algorithm to use. The default Single-Cell analysis uses the Louvain algorithm.
Check Compute biomarkers to compute the features that are highly expressed in each cluster compared with the other clusters
Select the number of PCs to use
Click Configure to access the Advanced options

The Number of principal components can be set based on your examination of the Scree plot and component loadings table. The default value is likely exhaustive for most data sets; altering this value may introduce noise that influences the number of clusters that are distinguished.

Click Finish to run the task

New Graph-based clusters and Biomarkers data nodes will be generated along with the task nodes.

Double-click the Graph-based clusters node to see the cluster results and statistics. The Graph-based clustering result lists the Total number of clusters and what proportion of cells fall into each cluster.
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option. The Biomarkers node includes the top features for each graph-based cluster. It displays the top 10 genes that distinguish each cluster from the others. Use Download at the top left of the table to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are at least 1.5-fold upregulated, and sorting by ascending p-value. This ensures that the top 10 genes of each cluster are highly and disproportionately expressed in that cluster.
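
The filter-then-sort logic behind the biomarker table can be sketched as below. The ANOVA itself is not reproduced; the sketch assumes per-gene fold changes and p-values are already available, and the record layout and function name are hypothetical.

```python
def top_biomarkers(results, min_fold_change=1.5, top_n=10):
    """Keep upregulated genes, then return the top_n by ascending p-value."""
    upregulated = [r for r in results if r["fold_change"] >= min_fold_change]
    return sorted(upregulated, key=lambda r: r["p_value"])[:top_n]

# Hypothetical test results for one cluster versus all other cells
results = [
    {"gene": "GeneA", "fold_change": 3.2, "p_value": 1e-8},
    {"gene": "GeneB", "fold_change": 1.2, "p_value": 1e-12},
    {"gene": "GeneC", "fold_change": 2.0, "p_value": 1e-10},
]
markers = [r["gene"] for r in top_biomarkers(results)]
```

GeneB has the smallest p-value but is excluded because its fold change is below 1.5, which shows why the fold-change filter is applied before sorting.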

UMAP

Uniform Manifold Approximation and Projection (UMAP) is a dimensional reduction technique. UMAP aims to preserve the essential high-dimensional structure and present it in a low-dimensional representation. UMAP is particularly useful for visually identifying groups of similar samples or cells in large high-dimensional data sets such as single cell RNA-Seq.

Click the Graph-based clusters or PCA node
Click Exploratory analysis in the task menu
Click UMAP
Select the number of PCs to use
Click Configure to access the Advanced options
Click Finish to run

If you have multiple samples, you can choose to run UMAP for each sample individually or for all samples together using the Split cells by sample option.

Like Graph-based clustering, UMAP takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for UMAP that you used for Graph-based clustering.

A new UMAP task node will be produced. When complete, double-click the UMAP node to open the UMAP task report. Use the panel on the left to modify the plot or add more plots to this Data viewer session.

The UMAP scatter plot is interactive and can be viewed in 2D or 3D. The UMAP plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel. You can pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot by clicking New plot, selecting 2D Scatter plot, and selecting UMAP as the source of the data.

Other Single-Cell Analysis Tasks

QA/QC

The Single-cell QA/QC task in ICM enables you to visualize several useful metrics that will help you include only high-quality cells. To invoke the Single-cell QA/QC task:

Click a Single cell counts data node
Click the QA/QC section of the task menu
Click Single cell QA/QC

By default, all samples are used together to perform QA/QC. You can instead choose to split by sample and perform QA/QC separately for each sample.

The Single cell QA/QC configuration dialog prompts you to choose the genome assembly and annotation file. Ideally, these closely match the references used in the DRAGEN secondary analysis. It is still possible to run the task without specifying an annotation file, but in that case mitochondrial counts cannot be detected.

t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.

Click the Graph-based clusters or PCA node
Click Exploratory analysis in the task menu
Click t-SNE
Select the number of PCs to use
Click Configure to access the Advanced options
Click Finish to run

The t-SNE scatter plot visualization has the same functionality and style elements as the UMAP plot described above.

Differential Analysis

A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in ICM.

Click the Normalized counts results node
Click Statistics in the toolbox
Click Differential Analysis
Select ANOVA as the Method to use for differential analysis and click Next (other models suggested for single cell data include the Hurdle model and Wilcoxon, but you are not limited to these)
Select and add the categorical and/or numeric factors for analysis
Click Next

The differential analysis tool can be used to compare one group of cells to another group of cells to identify genes or features that distinguish cells. Common examples include determining distinguishing genes between one cell type and all others, two cell types, or the same cell type between two experimental conditions.

The comparison builder can be used to create any of these tests. The top panel is the numerator for fold-change calculations so usually the experimental or test groups are selected in the top panel. The bottom panel is the denominator for fold-change calculations so the control group is often selected in the bottom panel.

Add attributes/classifications to the numerator
Add attributes/classifications to the denominator
Select Combine for a single comparison or Pairwise for a factorial set of comparisons
Select Add comparison
Optionally select the checkbox to Apply lowest average coverage filter to exclude a feature if the geometric average of its values over all samples is less than the specified value. This can be useful if no noise reduction filter has already been applied in the pipeline.
Click Configure to access the Advanced options, which include other Multiple test correction options.
Click Finish to run

When completed, double-click the newly generated data node to open the ANOVA task report. The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns. Genes are listed in ascending order by the p-value of the first comparison so the most significant gene is listed first.
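
The numerator and denominator convention of the comparison builder determines the direction of the reported fold change. The sketch below illustrates this with a plain ratio of group means, assuming linear-scale values and arithmetic means; how ICM estimates group means inside its statistical models is not described here, so treat this purely as an illustration. The function name and values are hypothetical.

```python
import math
import statistics

def fold_change(numerator_values, denominator_values):
    """Ratio of the numerator group mean to the denominator group mean."""
    return statistics.fmean(numerator_values) / statistics.fmean(denominator_values)

# Hypothetical linear-scale expression of one gene in two groups of cells
treated = [8.0, 12.0, 10.0]  # top panel (numerator)
control = [4.0, 6.0, 5.0]    # bottom panel (denominator)

fc = fold_change(treated, control)
log2_fc = math.log2(fc)
swapped = fold_change(control, treated)
```

With the treated group in the numerator the gene appears two-fold higher; swapping the panels reports the same difference as a ratio of 0.5, so it is worth deciding on the test and control groups before adding the comparison.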

Filter for Significant Genes

Using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison using the p-value and/or multiple test correction value (FDR step-up by default). The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.
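
FDR step-up is commonly known as the Benjamini-Hochberg procedure. A minimal sketch of how adjusted values are derived from raw p-values is shown below, assuming that standard procedure; the function name and p-values are hypothetical, and this is not taken from ICM's code.

```python
def fdr_step_up(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values for four genes
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = fdr_step_up(p_values)
significant = [q <= 0.05 for q in q_values]
```

In this example three genes have raw p-values below 0.05, but only the first remains below 0.05 after correction. This is why filtering on the corrected value usually leaves fewer genes than filtering on the raw p-value.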

Click Generate filtered node to generate a filtered version of the table for downstream analysis. This runs a task in the Analyses pipeline that produces a filtered Feature list data node. To find it, close the ANOVA report and navigate to the Analyses pipeline; the filtered list can then be used for downstream tasks.

Gene set enrichment

While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

Click the Feature list data node produced by the Differential analysis filter
Click Biological interpretation in the task menu
Click Gene set enrichment
Select the Database to use. ICM distributes the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.
Choose the latest assembly available from the Gene set drop-down
Click Finish

When completed, double-click the Gene set enrichment task node to open the task report.

The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not. Clicking the Gene set ID links to the geneontology.org or KEGG page for the gene set.
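
Over-representation of a gene set in a gene list is often assessed with a one-sided hypergeometric (Fisher's exact) test. The sketch below assumes that test and, as a further assumption, expresses the enrichment score as the negative natural log of the p-value; neither detail is confirmed here for ICM. The function name and the counts are hypothetical.

```python
import math

def enrichment_p_value(in_list_in_set, list_size, set_size, background_size):
    """Upper-tail hypergeometric probability of seeing at least this overlap."""
    max_overlap = min(list_size, set_size)
    total = math.comb(background_size, list_size)
    tail = sum(
        math.comb(set_size, k)
        * math.comb(background_size - set_size, list_size - k)
        for k in range(in_list_in_set, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 3 of the 5 significant genes fall in a 4-gene set,
# out of a background of 20 genes
p = enrichment_p_value(
    in_list_in_set=3, list_size=5, set_size=4, background_size=20
)
score = -math.log(p)  # assumed definition of an enrichment score
```

The four arguments correspond to the counts shown in the task report: genes in the set that were in the input list, the size of the list, the size of the set, and the size of the background.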

Hierarchical clustering / heatmap

Since we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap or bubble map.

Click the Filtered feature list data node produced by the Differential analysis filter
Click Exploratory analysis in the toolbox
Click Hierarchical clustering / heatmap

This task is used to generate the heatmap or bubble map; choose Heatmap as the plot type. In the Ordering section, you can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order. This performs hierarchical clustering and produces a dendrogram, which is useful for determining relationships. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest (e.g. drag and drop to order the attribute in a way that makes sense). Both ordering methods help to make the heatmap more comprehensible.

Select Feature order
Select Cell order
Optionally add any additional Filtering
Click Configure to access the Advanced options
Click Finish to run

Cell Typing with ScType

ScType allows automated cell-type identification based on scRNA-seq data along with a comprehensive cell marker database as background information.

Click the data node containing the non-normalized count matrix
Click on Classification > Single cell type in the toolbox
Select the marker database from the drop-down menu; the original full ScType database is provided by default
Select categorical attributes to Categorize by (e.g. graph-based clusters)
Optionally Filter tissue types
Select the SC Type algorithm to use
Click Configure to access and change any Advanced options
Click Finish to run

A new scType classification task node will be produced. When complete, double-click the Single cell type node to open the results of the cell-type identification. For each cell, the tissue, sctype result, and typescore are reported.
