NovaSeq 6000/6000Dx, NextSeq 2000, or NovaSeq X Series
Illumina Single Cell 3' RNA Prep Kit
T2: 1 library per sample, 1 FASTQ pair
T10: 1 library per sample, 1 FASTQ pair
T20: 1 library per sample, 1 FASTQ pair
T100: 4 libraries per sample, 1 FASTQ pair
T1000: 8 libraries per sample, 8 FASTQ pairs
For more information about the library preparation kit, refer to the .
A cloud account with a valid subscription. For information on registering your BaseSpace Sequence Hub or Illumina Connected Analytics account, refer to .
The sample sheet includes a list of samples and their index sequences, along with additional information required to run DRAGEN Single Cell RNA software. Appropriate index adapter sequences are determined by the assay used to perform analysis.
When running analysis on ICA, a valid sample sheet can be created by:
Use the steps below to create a DRAGEN Single Cell RNA run with the BaseSpace Run Planning tool. To get to the Run Planning tool, open BaseSpace Sequence Hub and navigate to the Runs page by using the navigation bar or by opening the menu on the left-hand side. From the New Run dropdown menu select Run Planning.
Once all details are captured and pass validation, review the run information and choose the Edit option to correct any information.
For NovaSeq 6000/6000Dx and NextSeq 1000/2000, Export the sample sheet to be uploaded to the instrument.
For NovaSeq X Series, the run can be saved as a draft or as a planned run (via Save as Draft and Save as Planned buttons respectively). Either selection will save the run to the Planned Runs screen on BaseSpace. Once a run is saved as Planned, it will appear on the NovaSeq X Series instrument where it can be selected for sequencing.
A sample sheet is required for each analysis with DRAGEN Single Cell RNA software. A sample sheet is a comma-separated value (*.csv) file format used by Illumina instruments, platforms, and analysis pipelines to store settings and data for sequencing and analysis. The DRAGEN Single Cell RNA software is compatible with the v2 sample sheet. For general information on the v2 sample sheet, refer to .
BaseSpace Run Planner (preferred), see for details
Downloading and modifying a sample sheet template following the requirements, see for details
The BaseSpace Sequence Hub Run Planning tool is used to generate a valid sample sheet in v2 format for use on a supported sequencer. Filling out the form on the user interface will produce a sample sheet with the required fields filled in that can be used to auto-launch a DRAGEN Single Cell RNA analysis. Refer to for more information about the Run Planning workflow and auto-launch.
For more information about the auto-launch, refer to . For additional information on run planning, refer to .
Run Name
Required
Run Name can contain up to 255 characters (alphanumeric characters, dashes, underscores, periods, and spaces) and must start with an alphanumeric character, a dash, or an underscore.
Run Description
Optional
Run Description can contain up to 255 characters, excluding square brackets, asterisks, and commas.
Instrument Platform
Required
Choose from DRAGEN Single Cell RNA software supported instruments:
NovaSeq X Series
NovaSeq 6000/6000Dx
NextSeq 1000/2000
Secondary Analysis
Required
Select BaseSpace / Illumina Connected Analytics.
Read 1
Required only for Instrument Platform NovaSeq X Series
45 for DRAGEN Single Cell RNA analysis. May be different if running multiple applications in a single run.
Index 1
Required only for Instrument Platform NovaSeq X Series
10 for DRAGEN Single Cell RNA analysis. May be different if running multiple applications in a single run.
Index 2
Required only for Instrument Platform NovaSeq X Series
10 for DRAGEN Single Cell RNA analysis. May be different if running multiple applications in a single run.
Read 2
Required only for Instrument Platform NovaSeq X Series
72 for DRAGEN Single Cell RNA analysis. May be different if running multiple applications in a single run.
Sample Container ID
Optional
Unique identifier for the container that holds the sample.
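The Run Name and Run Description rules above can be checked before the form is filled in. The following is a minimal sketch, assuming that 255 is a maximum length and that "alphanumeric" means ASCII letters and digits; the function names are hypothetical and are not part of any Illumina tool.

```python
import re

# Assumption: 255 is the maximum length and "alphanumeric" means ASCII letters and digits.
# First character: alphanumeric, dash, or underscore.
# Remaining characters: alphanumeric, dash, underscore, period, or space.
RUN_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_. -]{0,254}")

# Characters the Run Description rule excludes.
FORBIDDEN_DESCRIPTION_CHARS = set("[]*,")


def is_valid_run_name(name: str) -> bool:
    """Return True if the name satisfies the Run Name rule as interpreted above."""
    return RUN_NAME_PATTERN.fullmatch(name) is not None


def is_valid_run_description(description: str) -> bool:
    """Return True if the description is at most 255 characters and has no excluded character."""
    if len(description) > 255:
        return False
    return not any(ch in FORBIDDEN_DESCRIPTION_CHARS for ch in description)
```

For example, `is_valid_run_name("scRNA_run.01")` passes under these assumptions, while a name that starts with a period or a space does not.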
The DRAGEN Single Cell RNA application is a secondary analysis tool that can process multiplexed single cell RNA-Seq data in binary base call (BCL) files produced by NovaSeq 6000/6000Dx, NextSeq 1000/2000, and NovaSeq X Series sequencing systems to a cell-by-gene expression matrix.
You can perform secondary analysis in the cloud via BaseSpace Sequence Hub or Illumina Connected Analytics (ICA). When performing secondary analysis in the cloud, the analysis application launches automatically in BaseSpace Sequence Hub or ICA after the sequencing workflow completes.
Application
Required
Select DRAGEN Single Cell RNA - 4.4.4
Description
Optional
Optional Text Field
Library Prep Kit
Required
Select Illumina Single Cell 3’ RNA Prep
Index Adapter Kit
Required
Select a supported index adapter kit:
Illumina Single Cell UD 8 Indexes
Illumina Single Cell UD Indexes Set A
Reference Genome
Required only for Instrument Platform NovaSeq X Series
Select the appropriate genome reference for the sample type.
Description
Optional
Optional Text Field
Library Prep Kit
Required
Auto-populated from previous step
Index Adapter Kit
Required
Auto-populated from previous step
Index Reads
Required
Defaults to 2 indexes
Read Type
Required
Defaults to Paired End
Read Lengths
Required
Defaults to 45:10:10:72. May be different if running multiple applications in a single run. The default is compatible with 150 cycle SBS kits. If using a larger kit, the Read 2 cycle information can be increased.
There are diminishing returns for increased read lengths as the insert will read through the cDNA sequence into the poly-A region with longer read lengths.
Read 1 should not be updated as it contains the cell barcode and binning index. Longer read lengths will need to be trimmed. Shorter read lengths will impact cell barcode identification.
Sample Table
Required
Fill out the Lanes, Sample ID, and Index ID according to how the sample is prepared with the library preparation kit. See the library preparation kit documentation for more information. The optional Project field specifies the BaseSpace Project that receives the output data. If left empty, Project defaults to the Project name derived from the Experiment/Run name.
Override Cycles
Required
Defaults to U45;I10;I10;Y72. May be different if running multiple applications in a single run.
Reference Genome
Required
Select the appropriate genome reference for the sample type.
RNA Annotation File
Optional
For custom references, use this field to select the corresponding GTF file to use for annotation.
For built in references, use this field to override default annotations. The following list shows the default GTFs being used for annotation.
GENCODE v19
Homo sapiens [UCSC] hg19 v5
Homo sapiens [UCSC] hg19 v5 Pangenome
Homo sapiens [NCBI] hs37d5 v5
Homo sapiens [NCBI] hs37d5 v5 Pangenome
GENCODE v44
Homo sapiens [1000 Genomes] hg38 v5
Homo sapiens [1000 Genomes] hg38 v5 Pangenome
GENCODE vM23
Mus musculus [UCSC] mm10
ENSEMBL 98
Rattus norvegicus [UCSC] rn6
Configuration Type
Optional
Defaults to Illumina Single Cell 3’ RNA
Barcode Read
Required
Defaults to Read 1
RNA Library Type
Required
Defaults to Stranded Forward
Barcode Position
Required
Defaults to 0_7+11_16+20_25+31_38
UMI/BI Position
Required
Defaults to 39_41
For data processed with the Illumina Single Cell 3’ RNA Prep Kit, each read in R1 includes a cellular barcode sequence followed by a 3-base binning index (BI) sequence. R2 includes the sequences of the cDNA constructs created from the captured mRNA, which contain random cut sites that serve as intrinsic molecular identifiers (IMIs) and are used for molecular counting.
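To make the Read 1 layout concrete, the sketch below parses a position string such as the default Barcode Position and UMI/BI Position values and slices the matching bases out of a read. It assumes that each segment is `<start>_<end>` with 0-based, inclusive indexes (the convention stated for Feature Barcode UMI Position) and that `+` joins segments. The helper names are hypothetical and this is not DRAGEN's implementation.

```python
def parse_positions(spec: str) -> list[tuple[int, int]]:
    """Parse '0_7+11_16' into [(0, 7), (11, 16)] (0-based, inclusive ranges)."""
    segments = []
    for part in spec.split("+"):
        start, end = part.split("_")
        segments.append((int(start), int(end)))
    return segments


def extract(read: str, spec: str) -> str:
    """Concatenate the bases of `read` covered by every segment in `spec`."""
    return "".join(read[start:end + 1] for start, end in parse_positions(spec))


# Synthetic 45-base Read 1, used only to illustrate the slicing.
read1 = ("ACGT" * 12)[:45]
cell_barcode = extract(read1, "0_7+11_16+20_25+31_38")  # 8 + 6 + 6 + 8 = 28 bases
binning_index = extract(read1, "39_41")                 # 3 bases
```

With the default values, the four barcode segments add up to 28 bases and the binning index covers 3 bases, which is consistent with the 45-cycle Read 1.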
When selecting human, it is recommended to use linear references for RNA analysis. See for more information.
For more information about FASTQ processing refer to the detailing the DRAGEN PIPseq scRNA Pipeline.
The DRAGEN Single Cell RNA analysis can be manually launched to analyze previously generated FASTQ files by using a BaseSpace App.
Use the steps below to manually launch the DRAGEN Single Cell RNA app in BaseSpace. To get to the app, open BaseSpace Sequence Hub and navigate to the Apps page by using the navigation bar or by opening the menu on the left-hand side. Select or search for the DRAGEN Single Cell RNA app from the list of available apps. Select Launch Application to provide details for your analysis. Detailed steps are provided below.
Analysis Name
Required
Name of the analysis
Save Results To
Required
Select the project that will store the analysis results.
Biosample(s)
Required
Browse and select the biosamples to be analyzed.
Reference
Required
Select the reference genome to use in the analysis. The app provides support for common human, mouse, and rat genomes in addition to supporting custom references built by the DRAGEN Reference Builder app.
Custom Reference Files
Optional
Ensure "Include RNA Data in Reference" is enabled
Gene Annotation File
Optional
For custom references, select the corresponding GTF file to use. For built in references, the following list shows the default GTFs being used. This can be overridden for custom annotations by using this field.
GENCODE v19
Homo sapiens [UCSC] hg19 v5
Homo sapiens [UCSC] hg19 v5 Pangenome
Homo sapiens [NCBI] hs37d5 v5
Homo sapiens [NCBI] hs37d5 v5 Pangenome
GENCODE v44
Homo sapiens [1000 Genomes] hg38 v5
Homo sapiens [1000 Genomes] hg38 v5 Pangenome
GENCODE vM23
Mus musculus [UCSC] mm10
ENSEMBL 98
Rattus norvegicus [UCSC] rn6
Map/Align Output
Required
Select whether to output the alignments in BAM or CRAM format.
Library Kit
Required
Select your Illumina Single Cell 3' RNA Prep Kit.
Barcode Position
Required
Defaults to 0_7+11_16+20_25+31_38 for Illumina Single Cell 3' RNA Prep Kits.
UMI Position
Required
Defaults to 39_41 for Illumina Single Cell 3' RNA Prep Kits.
Barcode/UMI Read
Required
Defaults to Read 1 for Illumina Single Cell 3' RNA Prep Kits.
Barcode/UMI Source
Required
Select the appropriate setting that matches how FASTQ files were generated.
FASTQ Header – the FASTQ files were generated with the OverrideCycles sample sheet setting writing the R1 sequence to the FASTQ header
Barcode/UMI Read - the Read 1 FASTQ files were created without setting OverrideCycles in the sample sheet so the Read 1 FASTQ file contains the full sequencing read.
Barcode Sequence List File
Optional
Specify a file containing valid cell barcode sequences. Maps to --single-cell-barcode-sequence-whitelist in command line arguments. Not required for Illumina Single Cell 3' RNA Prep Kits.
RNA Library Type
Required
Auto-populated for Illumina Single Cell 3' RNA Prep Kits.
Poly-A Trimming
Optional
Select the poly-A trimming method. Disabled for Illumina Single Cell 3' RNA Prep Kits.
Demultiplexing Method
Optional
Select genotype-based or genotype-free sample demultiplexing.
Sample VCF
Optional
Specify a VCF file for genotype-based demultiplexing. Maps to --single-cell-demux-sample-vcf in command line arguments.
Reference VCF
Optional
Specify a VCF file for genotype-free demultiplexing. Maps to --single-cell-demux-reference-vcf in command line arguments.
Number of Samples
Optional
Specify the number of samples for genotype-free demultiplexing. Maps to --single-cell-demux-number-samples in command line arguments.
Detect Doublets
Optional
Enable doublet detection in sample demultiplexing. Maps to --single-cell-demux-detect-doublets in command line arguments.
Cell Hashing and Feature Counting
Optional
Use the checkboxes to enable cell hashing and feature counting using feature barcode UMI.
Feature Barcode UMI Position
Optional
The feature barcode UMI position uses the format <start index>_<end index>. For example, 11_18 specifies an 8 bp sequence from positions 11 to 18 (inclusive). The first position is 0.
Cell Hashing Reference
Optional
Specify a CSV or FASTA cell-hashing reference file that contains sample-specific oligo-tags. Maps to --single-cell-cell-hashing-reference in command line arguments.
Detect Doublets
Optional
Select the checkbox to enable doublet detection in cell-hashing sample demultiplexing. Maps to --single-cell-demux-detect-doublets in command line arguments.
Feature Barcode Reference
Optional
Specify a CSV or FASTA feature reference file that contains feature barcodes. Maps to --single-cell-feature-barcode-reference in command line arguments.
Expected Number of Cells
Optional
Specify the expected number of cells. The DRAGEN default is used if not set. Adjust only if the expected number of cells is so far from the default that DRAGEN does not call the correct cell filtering threshold automatically.
Thresholding Method
Optional
Specify the method for determining the count threshold value.
Ratio: DRAGEN estimates the count threshold as max(Te, Tm). Tm is 10% of the count seen in the cell at the 10th percentile of the expected cells. Te is 50% of the count seen in the least abundant expected cell.
Inflection: DRAGEN estimates the count threshold by analyzing inflection points in the cumulative distribution of counts.
Fixed: The count threshold is set to force the expected number of cells.
Maps to --single-cell-threshold in command line arguments.
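The Ratio method described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not DRAGEN's implementation. It assumes that barcodes are ranked by count in descending order, that "the cell at the 10th percentile of the expected cells" is the barcode ranked at 10% of the expected cell number from the top, and that "the least abundant expected cell" is the barcode whose rank equals the expected cell number. The function name is hypothetical.

```python
import math


def ratio_threshold(counts: list[int], expected_cells: int) -> float:
    """Estimate the count threshold as max(Te, Tm) under the assumptions above."""
    ranked = sorted(counts, reverse=True)
    n = min(expected_cells, len(ranked))
    if n == 0:
        raise ValueError("need at least one barcode and one expected cell")
    # Assumed reading of "10th percentile of the expected cells": rank ceil(0.10 * n) from the top.
    top_rank = max(1, math.ceil(n / 10))
    tm = ranked[top_rank - 1] / 10  # Tm: 10% of that barcode's count
    te = ranked[n - 1] / 2          # Te: 50% of the least abundant expected cell's count
    return max(te, tm)


counts = [1000, 900, 800, 700, 600, 500, 400, 300, 200, 100, 5, 4, 3]
threshold = ratio_threshold(counts, expected_cells=10)
passing = sum(1 for c in counts if c >= threshold)
```

In this toy input, Tm is 100 and Te is 50, so the threshold is 100 and the ten high-count barcodes pass while the three low-count barcodes do not.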
Use the Additional Arguments section to define any custom settings. Below are some commonly used additional arguments.
--annotation-file-ignore-biotypes=none
When selecting the Illumina Single Cell 3’ RNA Library Prep Kit, the pipeline will automatically ignore pseudogenes, shortRNA, and rRNA biotypes during mapping. This behavior can be disabled by adding "--annotation-file-ignore-biotypes=none".
Accept the BaseSpace Labs disclaimer and Launch Application to begin your analysis.
The DRAGEN Single Cell RNA app only supports Biosample inputs. For more information on Biosamples refer to the .
Custom references can be generated from a FASTA file and optionally a GTF file with the DRAGEN Reference Builder app. For more information, refer to .
Within each barcode and gene combination, IMIs are grouped in one of 64 bins, based on the 3-base binning index. For each bin, all identical IMIs are collapsed into a single count, since they are likely PCR duplicates of the same fragment generated during library prep.
Any barcode and gene combination that has ten or fewer unique binning indexes is assigned the number of unique binning indexes as its final count estimate. The pipeline then totals the number of IMIs associated with each remaining barcode and gene combination, and divides that number by the IPM correction factor, which accounts for the additional copies generated from a single captured molecule during five amplification cycles. The final count is the maximum between the floor of this value and the number of unique binning indexes for this barcode and gene.
Because all IMIs from the same parent molecule share a binning index, the number of unique binning indexes observed within a specific barcode and gene is determined by the number of molecules and is not impacted by the number of IMIs that were produced by the molecules. This means that the probabilistic relationship between the number of unique bins and the true number of molecules in a barcode and gene combination is constant and is the result of random sampling from the 64 possible bin indexes when each molecule is captured. For the subset of barcode and gene combinations with between 5 and 32 unique bin indexes, dividing the total number of IMIs by the average number of molecules expected based on the number of unique bin indexes gives you the estimated average IMIs per molecule (IPM).
The estimated molecular count for a barcode and gene is the total number of IMIs divided by the IPM, rounded down. The more true molecules a barcode and gene combination has, the more closely its true average IMIs per molecule should approach the average IPM of the sample. For barcode and gene combinations with very few molecules, the number of unique bins is expected to be a better predictor of the molecular count than the number of IMIs, because the small number of molecules in each individual combination makes the variance in the true IMIs per molecule high. For this reason, IPM correction is applied only to barcode and gene combinations with more than 10 unique bin indexes; otherwise, the corrected count equals the number of unique bin indexes.
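The counting rules above can be sketched in a few functions. This is an illustration only, under two assumptions that the text does not spell out. First, the "average number of molecules expected" for a given number of unique bins is taken from inverting the occupancy expectation u = 64(1 - (63/64)^m) for random sampling from 64 bins. Second, the IPM is computed as a pooled ratio over all combinations with 5 to 32 unique bins. The IMI values and function names are hypothetical.

```python
import math
from collections import defaultdict

NUM_BINS = 64  # a 3-base binning index gives 4**3 possible bins


def summarize(observations):
    """Collapse identical IMIs within each bin for one barcode and gene combination.

    observations: iterable of (bin_index, imi) pairs.
    Returns (unique_bins, total_imis) after collapsing.
    """
    per_bin = defaultdict(set)
    for bin_index, imi in observations:
        per_bin[bin_index].add(imi)
    return len(per_bin), sum(len(imis) for imis in per_bin.values())


def expected_molecules(unique_bins: int) -> float:
    """Assumed estimate: invert u = 64 * (1 - (63/64)**m) to get m from u."""
    return math.log(1 - unique_bins / NUM_BINS) / math.log(1 - 1 / NUM_BINS)


def estimate_ipm(summaries) -> float:
    """Pooled IMIs per molecule over combinations with 5 to 32 unique bins."""
    usable = [(u, n) for u, n in summaries if 5 <= u <= 32]
    if not usable:
        raise ValueError("no combination has between 5 and 32 unique bins")
    total_imis = sum(n for _, n in usable)
    total_molecules = sum(expected_molecules(u) for u, _ in usable)
    return total_imis / total_molecules


def final_count(unique_bins: int, total_imis: int, ipm: float) -> int:
    """Apply the bin-count rule or the IPM-corrected rule described in the text."""
    if unique_bins <= 10:
        return unique_bins
    return max(math.floor(total_imis / ipm), unique_bins)
```

For example, a combination with 20 unique bins and 100 IMIs at an IPM of 4.0 is counted as 25 molecules, while a combination with 8 unique bins is counted as 8 regardless of its IMI total.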
For more information about Transcript Counting refer to the detailing the DRAGEN PIPseq scRNA Pipeline.
For more information about Error Correction refer to the detailing the DRAGEN PIPseq scRNA Pipeline.
DRAGEN also supports processing samples from Illumina's CRISPR Single Cell kits using PIPseq technology. Setting --scrna-enable-pipseq-crispr-mode to true activates this mode.
Activating PIPseq CRISPR mode automatically configures DRAGEN for processing feature reads containing the CRISPR guide RNA (gRNA) sequences. This includes handling offsets in the cell-barcode position for the gRNA reads, transforming the gRNA cell-barcodes to match the gene expression ones, utilizing the "hook and grab" approach for identifying the gRNA reads, and counting the gRNA reads (disregarding BIs and IMIs). Both gene expression and gRNA reads are processed in the same single cell workflow, so extra steps are added to identify the hook sequence of gRNA reads. Note: unmapped reads do not contribute to gene expression read counts but are still included in gRNA counts if they match the hook sequence.
The “hook and grab” method is a targeted approach for identifying CRISPR perturbation reads. It leverages a conserved sequence within the guide RNA structural region as a “hook” to locate the guide RNA and then “grabs” the specific guide by mapping it to a database of known sequences based on their displacement from the hook.
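The idea can be sketched as a string search followed by a lookup. The example below is a simplified illustration, not DRAGEN's implementation: it assumes exact matching of both the hook and the guide, and the hook sequence, offset, guide length, and guide table are all hypothetical placeholders.

```python
def hook_and_grab(read, hook, offset, guide_length, known_guides):
    """Locate the hook in the read, then look up the guide at a fixed displacement.

    offset is the displacement of the guide start relative to the hook start
    (negative when the guide lies upstream of the hook).
    Returns the guide name, or None if the hook or a known guide is not found.
    """
    hook_start = read.find(hook)
    if hook_start < 0:
        return None  # no hook: the read is not treated as a gRNA read
    start = hook_start + offset
    if start < 0 or start + guide_length > len(read):
        return None  # the guide would fall outside the read
    return known_guides.get(read[start:start + guide_length])


# Hypothetical placeholder sequences for illustration only.
known_guides = {"ACGTACGTACGTACGTACGT": "guide_A"}
hook = "GTTTTAGAGC"
read = "TT" + "ACGTACGTACGTACGTACGT" + hook + "AA"
match = hook_and_grab(read, hook, offset=-20, guide_length=20, known_guides=known_guides)
```

A production implementation would likely tolerate mismatches in the hook and the guide; exact matching is used here only to keep the sketch short.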
For more information about PIPseq CRISPR mode refer to the detailing the DRAGEN PIPseq scRNA Pipeline.
Every run of Illumina DRAGEN Single Cell software produces a DRAGEN report in HTML format which includes QC metrics for trimming, mapping, and single cell analysis as well as a barcode rank plot. Below is a description of metrics and plots on each tab of the DRAGEN report.
Input Reads
Total number of input reads to DRAGEN
Max Read Length
Maximum detected input read length
Average Input Read Length
Average input read length to DRAGEN, after any adapter trimming by BCL Convert
Masked
Total number of 3’ poly-G bases masked from mapping
Trimmed
Total number of reads trimmed by DRAGEN
Filtered
Total number of reads removed from the input by DRAGEN
Fixed-Length
Total number of fixed-length trimmed reads
Adapter
Total number of adapter trimmed reads
Total input reads
Total number of input reads
QC-failed reads
Total number of reads failing one or more quality checks
% QC-failed
Percentage of reads failing one or more quality checks
Unique reads
Total number of unique reads
% Unique
Percentage of reads that are unique
Mapped reads
Total number of mapped reads, adjusted for filtered and excluded targets
% Mapped
Percentage of reads that are mapped, adjusted for filtered and excluded targets
Total Bases
Total number of input bases
Mapped R1
Total number of mapped bases on R1
% Mapped R1
Percentage of R1 bases mapped
% Q30 R1
Percentage of R1 bases with phred quality score >=30
The deduplication bar chart reflects the ratio of unique and deduplicated reads.
The read MAPQs bar chart reflects the percent of reads in various categories of phred quality score (Q0-Q10, Q10-Q20, Q20-Q30, Q30-Q40, Q40+).
The UMAP plot allows visualization of individual cells in 2D space to capture the similarities between cells. Clustering is performed using the Louvain method.
The Top Marker Genes table includes the top 10 marker genes for each cluster of the UMAP. For each gene, the gene name, ENSEMBL gene ID, log2 fold change, and pValue are shown. The contents of the table are also available in CSV format by selecting Download CSV.
Base Quality by Position
Phred-scale quality value for bases at a given location
Mean Base Quality by Position
Average Phred-scale quality value of bases with a specific nucleotide and at a given location in the read
Read Length Distribution
Total number of reads with each observed length
Read Quality Distribution
Total number of reads with each observed average Phred-scale quality score
%GC Content
Percentage of sequences with each GC content across the whole length of each sequence compared to a modelled normal distribution of GC content
Read Quality by %GC Content
Average Phred-scale read mean quality for reads with each GC content percentile between 0% and 100%
Ambiguous Base Content by Position
Percent ambiguous bases at a given location
Base Content by Position
Percent of bases of each specific nucleotide at given locations in the read
Adapter Content by Position
Cumulative percentage of the library in which each adapter sequence has been seen at each position. Once an adapter sequence has been seen in a read, it is counted as present through to the end of the read, so the percentages only increase with position in the read.
The DRAGEN Single Cell RNA software has optional and required fields in addition to general sample sheet requirements. Below is a description of the fields in each section.
LibraryPrepKits
Required
Accepted values are: IlluminaSingleCell3RNAPrep
SoftwareVersion
Required
The DRAGEN component software version. DRAGEN Single Cell RNA software requires 4.4.0.
NoLaneSplitting
Required
TRUE for DRAGEN Single Cell RNA software
TrimUMI
Required
0 for DRAGEN Single Cell RNA software
OverrideCycles
Required
U45;I10;I10;Y72 for DRAGEN Single Cell RNA software. May be different if running multiple applications in a single run.
FastqCompressionFormat
Required
gzip
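As a quick sanity check on an OverrideCycles value such as U45;I10;I10;Y72, the sketch below splits the string into per-read elements and totals the cycles. It assumes the common form in which reads are separated by semicolons and each element is a single letter followed by a cycle count; the function names are hypothetical.

```python
import re


def parse_override_cycles(value: str) -> list[list[tuple[str, int]]]:
    """Split 'U45;I10;I10;Y72' into one list of (type, cycles) elements per read."""
    reads = []
    for read in value.split(";"):
        elements = [(kind, int(count)) for kind, count in re.findall(r"([A-Za-z])(\d+)", read)]
        reads.append(elements)
    return reads


def total_cycles(value: str) -> int:
    """Total number of sequencing cycles described by the OverrideCycles value."""
    return sum(count for read in parse_override_cycles(value) for _, count in read)


default_total = total_cycles("U45;I10;I10;Y72")  # 45 + 10 + 10 + 72
```

The default value totals 137 cycles, which is consistent with the note that the default read lengths fit a 150 cycle kit.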
Sample_ID
Required
Must match a Sample_ID listed in the [Cloud_DragenSingleCellRna_Data] and [Cloud_Data] sections.
Index
Required
Index 1 sequence
Index2
Required
Index 2 sequence
Lane
Only for NovaSeq 6000/6000Dx workflow
Indicates which lane corresponds to a given sample. Enter a single numeric value per row. Cannot be empty, i.e. the analysis fails if the Lane column is present without a value in each row.
SoftwareVersion
Required
The DRAGEN component software version. DRAGEN Single Cell RNA software requires 4.4.0.
EnablePipseqMode
Required
TRUE for Illumina Single Cell 3’ RNA kit. Maps to --scrna-enable-pipseq-mode in command line arguments.
ReferenceGenomeDir
Required
Location of reference genome TAR containing a DRAGEN hash table and optionally a GTF.
BarcodeRead
Required
Read1 for Illumina Single Cell 3’ RNA kit
RnaLibraryType
Required
SF for Illumina Single Cell 3’ RNA kit (stranded forward). Maps to --rna-library-type in command line arguments.
BarcodePosition
Required
0_7+11_16+20_25+31_38 for Illumina Single Cell 3’ RNA kit. Maps to --scrna-barcode-position in command line arguments.
UmiPosition
Required
39_41 for Illumina Single Cell 3’ RNA kit. Maps to --scrna-umi-position in command line arguments.
Sample_ID
Required
Must match a Sample_ID listed in the [BCLConvert_Data] and [Cloud_Data] sections.
GeneratedVersion
Not Required
The cloud version used to create the sample sheet. Optional if manually updating a sample sheet. (ex: 1.17.0.202411192008).
Cloud_Workflow
Not Required
ica_workflow_1
BCLConvert_Pipeline
Required
The value is a universal record number (URN). The valid value is:
urn:ilmn:ica:pipeline:730df76f-715a-45bf-9500-e6e0ce1ab224#BclConvert_v4_3_13
Cloud_DragenSingleCellRna_Pipeline
Required
The value is a URN in the following format:
urn:ilmn:ica:pipeline:b3c5ab5f-2853-4873-93c4-61a807f844a7#DRAGEN_Single_Cell_RNA_4-4-2_-_Sequencer_Integration_Only
Sample_ID
Required
Must match a Sample_ID listed in the [BCLConvert_Data] and [Cloud_DragenSingleCellRna_Data] sections.
ProjectName
Not Required
The BaseSpace Sequence Hub project name
LibraryName
Not Required
Combination of sample ID and index values in the following format: sampleID_Index_Index2.
LibraryPrepKit
Required
The Library Prep Kit used
IndexAdapterKitName
Required
The Index Adapter Kit used
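When a sample sheet is edited by hand, the Sample_ID cross-references between the three data sections are easy to break. The sketch below checks them and builds a LibraryName in the documented sampleID_Index_Index2 format. It assumes the data sections have already been parsed into lists of row dictionaries keyed by column name; the parsing step and the function names are hypothetical.

```python
def sample_id_mismatches(sections: dict[str, list[dict[str, str]]]) -> dict[str, list[str]]:
    """Return, per section, the Sample_ID values that appear elsewhere but are missing there."""
    ids_by_section = {
        name: {row["Sample_ID"] for row in rows} for name, rows in sections.items()
    }
    all_ids = set().union(*ids_by_section.values())
    return {
        name: sorted(all_ids - ids)
        for name, ids in ids_by_section.items()
        if all_ids - ids
    }


def library_name(sample_id: str, index: str, index2: str) -> str:
    """Build a LibraryName as sampleID_Index_Index2."""
    return f"{sample_id}_{index}_{index2}"


# Toy rows for illustration; the index sequences are placeholders.
sections = {
    "BCLConvert_Data": [{"Sample_ID": "S1", "Index": "AAAAAAAAAA", "Index2": "CCCCCCCCCC"}],
    "Cloud_DragenSingleCellRna_Data": [{"Sample_ID": "S1"}],
    "Cloud_Data": [{"Sample_ID": "S1"}, {"Sample_ID": "S2"}],
}
problems = sample_id_mismatches(sections)
```

An empty result means every Sample_ID appears in all three sections. In the toy input above, S2 is reported as missing from the two sections that do not list it.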
The preferred method for creating sample sheets is to use
Total Input Reads
Total number of reads for the sample
% Mapped
Percentage of reads that are mapped to the genome
Total Barcoded Reads
Number of reads with barcodes that match the whitelist
Passing Cells
Total unique barcodes that pass the count threshold for a passing cell
% Reads in Passing Cells
Number of reads in passing barcodes divided by the total number of reads in all barcodes
Mean Reads per Cell*
Mean reads per cell (Total gene reads / Passing cells)
Median Reads per Cell
The median number of reads per passing barcode
Median Molecules per Cell
The median number of captured RNAs per passing barcode
Median Genes per Cell
The median number of unique genes identified per passing barcode
% Sequencing Saturation
Percentage chance that if you sequenced an additional read, it would have already been observed at least once
*Only available for analyses with PIPseq CRISPR mode enabled
The barcode rank plot (often referred to as the “knee plot”) orders barcodes based on the number of transcripts associated with them. Typically, the cell barcodes are concentrated at the top of the rank plot, whereas the background barcodes are concentrated in the lower portion of the plot. The purpose of cell calling is to find a point in the first “knee” area that separates the cells from the background.
This section is only available for analyses with PIPseq CRISPR mode enabled.
Input Gene Expression Reads
Total input gene expression reads
Input Feature Reads
Total input feature reads
Barcoded Gene Expression Reads
Total barcoded gene expression reads
Barcoded Feature Reads
Total barcoded feature reads
Unique Exon Reads
Unique exon matching reads
Unique Intron Reads
Unique intron matching reads
Filtered Antisense Reads
Filtered antisense reads
Filtered Ambiguous Reads
Filtered ambiguous reads
Filtered Low MAPQ Reads
Filtered low MAPQ reads
Filtered Non-Matching Reads
Filtered non-matching reads
Mitochondrial Reads
Mitochondrial reads
Total Gene Reads
Total gene reads
Counted Gene Reads
Total counted genes
Molecules
Total molecules
Genes Detected
Total genes detected
Cell Type
Barcode classification category. “PASS” for passing cells or “LOW” for background cells.
Total Gene Reads
Total number of genic reads in the category
Molecules
Total number of molecules in the category
Genes
Total number of unique genes detected for each cell in the category
Mitochondrial Reads
Total number of mitochondrial reads in the category. This is based on gene names that include prefixes like “MT-” and may not work with every mapping reference.
Feature Molecules*
Total number of CRISPR or other target molecules in the category
Features*
Total number of unique CRISPR sequences or other targets detected in the category
*Only available for analyses with PIPseq CRISPR mode enabled
This section is only available for analyses with PIPseq CRISPR mode enabled.
Input Feature Reads
Total input feature reads
Barcoded Feature Reads
Total barcoded feature reads
Feature Matching Reads
Feature matching reads
Feature Reads
Total Feature Reads
Feature Molecules
Total feature molecules
Median Feature Reads per Cell
Median feature reads per passing cell
Feature Molecules per Cell
Median feature molecules per passing cell
Features per Cell
Median features per passing cell
Features Detected
Total features detected
CRISPR Reads
Total CRISPR reads matching known barcodes
% Mapped Features
Percentage of CRISPR tags mapped
% Features in Cells
Percentage of valid CRISPR tags in cells
% Cells with Features
Percentage passing cells with CRISPR reads
Illumina Connected Multiomics (ICM) is available for further tertiary analysis of single-cell and other multiomic data.
Below are explanations of the steps that are run in the default single-cell analysis that is automatically launched on import of single-cell data in ICM. Also included below are the instructions to launch each step manually if input parameters need to be adjusted from the default settings.
Because different cells will have a different number of total counts, it is important to normalize the data prior to downstream analysis. For droplet-based single cell isolation and library preparation methods that use a 3' counting strategy, where only the 3' end of each transcript is captured and sequenced, we recommend the following normalization sequence: (1) CPM (counts per million), (2) Add 1, (3) Log2. This accounts for differences in total UMI counts per cell and log transforms the data, which makes the data easier to visualize.
Click the counts node you wish to normalize
Click Normalization and scaling in the context-sensitive task menu on the right
Click Normalization
Click Use recommended to add the recommended normalization scheme
This adds CPM (counts per million), Add 1, and Log2 to the Normalization order panel. Normalization steps are performed in descending order.
Click Finish to apply the normalization
Note: In the default single cell analysis pipeline, ICM runs the normalization step a second time to scale the data by subtracting the mean of each feature and dividing by its standard deviation.
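The recommended normalization (CPM, Add 1, Log2) is simple enough to express directly. The sketch below applies it to a small count matrix held as plain Python lists, with one inner list per cell. It only illustrates the arithmetic and is not ICM's implementation; the function name and the handling of cells with zero total counts are assumptions.

```python
import math


def log2_cpm_plus_one(counts: list[list[int]]) -> list[list[float]]:
    """Apply CPM, add 1, then log2 to each cell's raw gene counts."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Assumption: a cell with no counts stays at zero after normalization.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log2(c / total * 1_000_000 + 1) for c in cell])
    return normalized


# Two toy cells with different sequencing depth but the same gene proportions.
result = log2_cpm_plus_one([[1, 3], [10, 30]])
```

Because CPM divides by each cell's total, the two toy cells end up with the same normalized values even though one has ten times the depth.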
A common task in single-cell RNA-Seq analysis is to filter the data to include only informative genes (features). Because there is no gold standard for what makes a gene informative and the ideal gene filtering criteria depend on your experimental design and research question, ICM has a wide variety of flexible filtering options. The Filter features step can be performed before or after normalization.
Select a data node containing the count matrix
Click Filtering in the task menu
Click Filter features
Select the Filter type and Filter criteria desired
There are four categories of filter available - noise reduction, statistics-based, feature metadata, and feature list.
The noise reduction filter allows you to exclude genes considered background noise based on a variety of criteria. The statistics-based filter is useful for focusing on a certain number or percentile of genes based on a variety of metrics, such as variance. The metadata, saved list, and manual list filters allow you to filter your data set to include or exclude particular genes.
For example, you can use a noise reduction filter to exclude genes that are not expressed by any cell in the data set, but were included in the matrix file. To do so:
Click the Noise reduction filter check box
Set the Noise reduction filter to Exclude features where value <= 0 in at least 99.9% of cells using the drop-down menus and text boxes
Click Finish to apply the filter
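The noise reduction rule in this example ("exclude features where value <= 0 in at least 99.9% of cells") can be written as a short filter. The sketch below works on a cells-by-genes matrix held as plain Python lists and returns the indexes of the genes that are kept. It illustrates the rule only; the function name and parameters are hypothetical and do not correspond to ICM settings.

```python
def noise_reduction_keep(matrix, max_value=0.0, min_fraction=0.999):
    """Return indexes of genes NOT excluded by the noise reduction rule.

    A gene is excluded when its value is <= max_value in at least
    min_fraction of the cells. matrix holds one row of gene values per cell.
    """
    n_cells = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    kept = []
    for gene in range(n_genes):
        low_cells = sum(1 for cell in matrix if cell[gene] <= max_value)
        if low_cells / n_cells < min_fraction:
            kept.append(gene)
    return kept


# Three cells, three genes: only the middle gene is expressed in any cell.
kept_genes = noise_reduction_keep([[0, 1, 0], [0, 0, 0], [0, 2, 0]])
```

In the toy matrix, the first and last genes are zero in every cell and are excluded, so only the middle gene remains.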
The default single cell pipeline uses the statistics-based filter to filter for the top 10 features with the highest variance.
Principal components (PC) analysis (PCA) is an exploratory technique that is used to describe the structure of high dimensional data by reducing its dimensionality. Because PCA is used to reduce the dimensionality of the data prior to clustering as part of a standard single cell analysis workflow, it is useful to examine the results of PCA for your data set prior to clustering.
Select a data node containing the normalized and filtered count matrix
Click Exploratory analysis in the task menu
Click PCA from the drop-down list
Select the number of features to include
Select the number of PCs to calculate
You can choose Features contribute equally to standardize the genes prior to PCA or allow more variable genes to have a larger effect on the PCA by choosing by variance. By default, we take variance into account and focus on the most variable genes.
If you have multiple samples, you can choose to run PCA for each sample individually or for all samples together by selecting or not selecting the Split by sample option.
Click Finish to run
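The difference between the two feature scaling choices can be seen on a single gene. The sketch below is illustrative only and assumes that Features contribute equally corresponds to scaling each gene to unit variance, while by variance corresponds to centering without rescaling; the function names are hypothetical.

```python
from statistics import mean, pstdev


def center(row):
    """Subtract the mean only, so more variable genes keep a larger spread."""
    average = mean(row)
    return [value - average for value in row]


def standardize(row):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    average = mean(row)
    spread = pstdev(row)
    if spread == 0:
        return [0.0 for _ in row]  # a constant gene carries no signal
    return [(value - average) / spread for value in row]


high_variance_gene = [0, 4, 0, 4]
low_variance_gene = [1, 2, 1, 2]

# After standardizing, both genes have the same scale.
standardized_high = standardize(high_variance_gene)
standardized_low = standardize(low_variance_gene)

# After centering only, the high-variance gene still dominates.
centered_high = center(high_variance_gene)
centered_low = center(low_variance_gene)
```

In this toy case both standardized genes become identical, whereas the centered values differ by a factor of four, which is why the more variable gene would have a larger effect on the PCA.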
A new PCA task node will be produced on the task graph for the analysis. When complete, double-click the PCA task node to open the 3D PCA scatter plot in data viewer.
Besides the PCA coordinates of the cells, the PCA task report also includes the Scree plot, the component loadings table, and the PC projections table.
The Scree plot lists PCs on the x-axis and the amount of variance explained by each PC on the y-axis, measured as an eigenvalue. The higher the eigenvalue, the more variance is explained by the PC. Typically, after an initial set of highly informative PCs, the amount of variance explained by each additional PC is minimal. By identifying the point where the Scree plot levels off, you can choose an optimal number of PCs to use in downstream analysis steps like graph-based clustering, UMAP, and t-SNE.
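If you export the eigenvalues, the proportion of variance explained by each PC is its eigenvalue divided by the sum of all eigenvalues. The sketch below uses a cumulative-variance threshold as a simple numeric stand-in for reading the point where the plot levels off; the threshold rule, the function names, and the eigenvalues are assumptions for illustration, not how the software selects PCs.

```python
def variance_explained(eigenvalues):
    """Fraction of total variance explained by each PC."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def pcs_for_cumulative_variance(eigenvalues, threshold=0.85):
    """Smallest number of leading PCs whose cumulative variance
    explained reaches the threshold (a heuristic, assumed here)."""
    cumulative = 0.0
    for count, fraction in enumerate(variance_explained(eigenvalues), start=1):
        cumulative += fraction
        if cumulative >= threshold:
            return count
    return len(eigenvalues)


# Illustrative eigenvalues, sorted from the first PC to the last.
eigenvalues = [50.0, 25.0, 15.0, 5.0, 3.0, 2.0]
n_pcs = pcs_for_cumulative_variance(eigenvalues, threshold=0.85)
```

Here the first three PCs explain 90% of the variance, so three PCs satisfy the 85% threshold. Visual inspection of the Scree plot remains the approach described in this guide.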
Graph-based clustering identifies groups of similar cells using PC values as the input. By including only the most informative PCs, noise in the data set is excluded, improving the results of clustering.
Click the PCA data node
Click Exploratory analysis in the task menu
Click Graph-based clustering
Clustering can be performed on each sample individually or on all samples together.
Select the Clustering algorithm to use. The default Single-Cell analysis uses the Louvain algorithm.
Check Compute biomarkers to compute the features that are highly expressed in each cluster compared to all other clusters
Select the number of PCs to use
Click Configure to access the Advanced options
The Number of principal components can be set based on your examination of the Scree plot and component loadings table. The default value of 100 is likely exhaustive for most data sets, but may introduce noise that reduces the number of clusters that can be distinguished.
Click Finish to run the task
A new Graph-based clusters data node and a new Biomarkers data node will be generated along with the task nodes
Double-click the Graph-based clusters node to see the cluster results and statistics. The Graph-based clustering result lists the Total number of clusters and the proportion of cells that fall into each cluster. It also reports Maximum modularity, a measurement of the quality of the clustering result, where the optimal modularity is 1.
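To make the modularity statistic more concrete, the sketch below computes the standard Newman modularity for an unweighted, undirected graph and a given cluster assignment. It is an assumption that the reported Maximum modularity follows this definition; the function name and the toy graph are hypothetical.

```python
def modularity(edges, membership):
    """Newman modularity of a clustering on an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> cluster.
    Q = sum over clusters of (internal_edges / m) - (degree_sum / (2 * m)) ** 2
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cluster_u, cluster_v = membership[u], membership[v]
        degree_sum[cluster_u] = degree_sum.get(cluster_u, 0) + 1
        degree_sum[cluster_v] = degree_sum.get(cluster_v, 0) + 1
        if cluster_u == cluster_v:
            internal[cluster_u] = internal.get(cluster_u, 0) + 1
    score = 0.0
    for cluster, degrees in degree_sum.items():
        score += internal.get(cluster, 0) / m - (degrees / (2 * m)) ** 2
    return score


# Two triangles of cells joined by a single edge (toy graph).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_cluster = {node: 1 for node in "abcdef"}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Splitting the toy graph into its two triangles gives a modularity of 5/14 (about 0.36), while placing every node in a single cluster gives 0, showing that a higher value reflects a better-separated clustering.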
Double-click the Biomarkers node to see the computed biomarkers if you have selected this option. The Biomarkers node includes the top features for each graph-based cluster. It displays the top-10 genes that distinguish each cluster from the others. Download at the bottom right of the table can be used to view and save more features. These are calculated using an ANOVA test comparing the cells in each group to all the other cells, filtering to genes that are 1.5 fold upregulated, and sorting by ascending p-value. This ensures that the top-10 genes of each cluster are highly and disproportionately expressed in that cluster.
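The ranking described above (keep genes that are at least 1.5-fold upregulated, then sort by ascending p-value) can be sketched on an exported results table. The record layout, the function name, the inclusive threshold, and the values below are assumptions for illustration; the statistics themselves would come from the software.

```python
def top_biomarkers(results, min_fold_change=1.5, n_top=10):
    """Return the top gene names for one cluster.

    results: list of dicts with hypothetical keys 'gene', 'fold_change',
    and 'p_value' for the comparison of one cluster against all other cells.
    """
    upregulated = [r for r in results if r["fold_change"] >= min_fold_change]
    upregulated.sort(key=lambda r: r["p_value"])
    return [r["gene"] for r in upregulated[:n_top]]


# Illustrative values only.
results = [
    {"gene": "geneA", "fold_change": 3.2, "p_value": 1e-8},
    {"gene": "geneB", "fold_change": 1.2, "p_value": 1e-12},  # below 1.5 fold
    {"gene": "geneC", "fold_change": 2.0, "p_value": 1e-10},
    {"gene": "geneD", "fold_change": 1.6, "p_value": 0.03},
]
markers = top_biomarkers(results, n_top=2)
```

Note that geneB has the smallest p-value but is dropped by the fold-change filter, which is how the ranking favors genes that are disproportionately expressed in the cluster.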
Uniform Manifold Approximation and Projection (UMAP) is a dimensional reduction technique. UMAP aims to preserve the essential high-dimensional structure and present it in a low-dimensional representation. UMAP is particularly useful for visually identifying groups of similar samples or cells in large high-dimensional data sets such as single cell RNA-Seq.
Click the Graph-based clusters or PCA node
Click Exploratory analysis in the task menu
Click UMAP
Select the number of PCs to use
Click Configure to access the Advanced options
Click Finish to run
If you have multiple samples, you can choose to run UMAP for each sample individually or for all samples together using the Split cells by sample option.
Like Graph-based clustering, UMAP takes PC values as its input and further reduces the data down to two or three dimensions. For consistency, you should use the same number of PCs as the input for UMAP that you used for Graph-based clustering.
A new UMAP task node will be produced. When complete, double-click the UMAP node to open the UMAP task report. Use the panel on the left to modify the plot or add more plots to this Data viewer session.
The UMAP scatter plot is interactive and can be viewed in 2D or 3D. The UMAP plot is 3D by default. You can rotate the 3D plot by left-clicking and dragging your mouse or using Control under Configure. You can zoom in and out using your mouse wheel. You can pan by right-clicking and dragging your mouse. You can use Style to modify color, shape, size, and labeling (e.g. add a fog effect to improve depth perception on the plot). Add a 2D plot by clicking New plot, selecting 2D Scatter plot, and selecting UMAP as the source of the data.
The Single-cell QA/QC task in ICM enables you to visualize several useful metrics that will help you include only high-quality cells. To invoke the Single-cell QA/QC task:
Click a Single cell counts data node
Click the QA/QC section of the task menu
Click Single cell QA/QC
By default, all samples are used to perform QA/QC. You can choose to split the sample and perform QA/QC separately for each sample.
If your Single cell counts data node has been annotated with a gene/transcript annotation, the task will run without a task configuration dialog. However, if you imported a single cell counts matrix without specifying a gene/transcript annotation file, you will be prompted to choose the genome assembly and annotation file by the Single cell QA/QC configuration dialog. Note, it is still possible to run the task without specifying an annotation file. If you choose not to specify an annotation file, the detection of mitochondrial counts will not be possible.
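The mitochondrial metric depends on knowing which features are mitochondrial genes, which is why an annotation is needed. As a rough sketch only, the example below computes the percentage of mitochondrial counts for one cell by assuming that mitochondrial genes can be recognized by an "MT-" gene symbol prefix (a human gene naming convention); the software relies on the annotation file instead, and the function name and counts are hypothetical.

```python
def percent_mitochondrial(gene_names, cell_counts, prefix="MT-"):
    """Percentage of a cell's counts that come from mitochondrial genes.

    Mitochondrial genes are identified by a symbol prefix, which is an
    assumption of this sketch rather than the software's method.
    """
    total = sum(cell_counts)
    if total == 0:
        return 0.0
    mito = sum(
        count
        for gene, count in zip(gene_names, cell_counts)
        if gene.upper().startswith(prefix)
    )
    return 100.0 * mito / total


# Toy counts for a single cell.
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
cell = [5, 5, 60, 30]
pct_mito = percent_mitochondrial(genes, cell)
```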
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensional reduction technique that prioritizes local relationships to build a low-dimensional representation of the high-dimensional data that places objects that are similar in high-dimensional space close together in the low-dimensional representation. This makes t-SNE well suited for analyzing high-dimensional data when the goal is to identify groups of similar objects, such as cell types in single cell RNA-Seq data.
Click the Graph-based clusters or PCA node
Click Exploratory analysis in the task menu
Click t-SNE
Select the number of PCs to use
Click Configure to access the Advanced options
Click Finish to run
The t-SNE scatter plot visualization has the same functionality and style elements as the UMAP plot described above.
A common goal in single cell analysis is to identify genes that distinguish a cell type. To do this, you can use the differential analysis tools in ICM.
Click the Normalized counts results node
Click Statistics in the toolbox
Click Differential Analysis
Select ANOVA as the Method to use for differential analysis and click Next
Select and add the categorical and numeric factors for analysis
Click Next
The differential analysis tool can be used to compare one group of cells to another group of cells to identify genes or features that distinguish cells. Common examples include determining distinguishing genes between one cell type and all others, two cell types, or the same cell type between two experimental conditions.
The comparison builder can be used to create any of these tests. The top panel is the numerator for fold-change calculations so the experimental or test groups should be selected in the top panel. The bottom panel is the denominator for fold-change calculations so the control group should be selected in the bottom panel.
Add classifications to the numerator
Add classifications to the denominator
Select Combine for a single comparison or Pairwise for a factorial set of comparisons
Select Add comparison
Optionally select the checkbox to Apply lowest average coverage filter to exclude a feature if the geometric average of its values over all samples is less than the specified value.
Click Configure to access the Advanced options
Click Finish to run
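The roles of the numerator and denominator, and the effect of the lowest average coverage filter, can be sketched as follows. This is a simplification under stated assumptions: fold change is shown as a ratio of arithmetic group means (the software's model-based estimate may differ), features with any non-positive value are given a geometric average of 0, and all function names and values are hypothetical.

```python
import math


def fold_change(numerator_values, denominator_values):
    """Ratio of the test group mean (top panel) to the control group
    mean (bottom panel). A simplified stand-in for the reported value."""
    numerator = sum(numerator_values) / len(numerator_values)
    denominator = sum(denominator_values) / len(denominator_values)
    return numerator / denominator


def geometric_average(values):
    """Geometric mean; returns 0.0 if any value is zero or negative."""
    if any(value <= 0 for value in values):
        return 0.0
    return math.exp(sum(math.log(value) for value in values) / len(values))


def passes_coverage_filter(values, min_average=1.0):
    """Keep a feature only if its geometric average reaches the cutoff."""
    return geometric_average(values) >= min_average


# Toy expression values for one gene.
test_group = [8.0, 12.0]
control_group = [4.0, 6.0]
fc = fold_change(test_group, control_group)
keep_gene = passes_coverage_filter(test_group + control_group, min_average=1.0)
```

Swapping the two groups would invert the fold change (0.5 instead of 2.0), which is why the experimental group belongs in the top panel and the control group in the bottom panel.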
When completed, double click the newly generated data node to open the ANOVA task report. The ANOVA task report lists genes on rows and the results of the statistical test (p-value, fold change, etc.) on columns. Genes are listed in ascending order by the p-value of the first comparison so the most significant gene is listed first.
Using the filter control panel on the left, we can filter to just the genes that are significantly different for the comparison. The number of genes at the top of the filter control panel updates to indicate how many genes are left after the filters are applied.
Click Generate filtered node to generate a filtered version of the table for downstream analysis. The ANOVA report will close and a new task, the Filter list task, will run and generate a filtered feature list data node.
While a long list of significantly different genes is important information about a cell type, it can be difficult to identify what the biological consequences of these changes might be just by looking at the genes one at a time. Using enrichment analysis, you can identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.
Click the Feature list data node produced by the Differential analysis filter
Click Biological interpretation
Click Gene set enrichment
Select the Database to use. ICM distributes the gene sets from the Gene Ontology Consortium, but Gene set enrichment can work with any custom or public gene set database.
Choose the latest assembly available from the Gene set drop-down
Click Finish
When completed, double-click the Gene set enrichment task node to open the task report.
The Gene set enrichment task report lists gene sets on rows with an enrichment score and p-value for each. It also lists how many genes in the gene set were in the input gene list and how many were not. Clicking the Gene set ID links to the geneontology.org or KEGG page for the gene set.
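The counts in the report (genes in the set that are in the input list versus not in the list) are the ingredients of an over-representation test. The sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis, and defines the enrichment score as the negative natural log of the p-value. Both choices are assumptions for illustration and may not match how ICM computes its values; the function names and numbers are hypothetical.

```python
from math import comb, log


def enrichment_p_value(hits, list_size, set_size, background_size):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `hits` gene set members in a list of
    `list_size` genes drawn from `background_size` genes, of which
    `set_size` belong to the gene set.
    """
    upper = min(list_size, set_size)
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(background_size, list_size)


def enrichment_score(p_value):
    """Negative natural log of the p-value (assumed definition)."""
    return -log(p_value)


# Toy numbers: 20 background genes, a 5-gene set, and a 4-gene
# significant list that contains 3 members of the set.
p = enrichment_p_value(hits=3, list_size=4, set_size=5, background_size=20)
score = enrichment_score(p)
```

With these toy numbers the p-value is 155/4845, roughly 0.032, so the set would be considered over-represented at a 0.05 cutoff.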
Once we have filtered to a list of significantly different genes, we can visualize these genes by generating a heatmap.
Click the Filtered feature list data node produced by the Differential analysis filter
Click Exploratory analysis in the toolbox
Click Hierarchical clustering / heatmap
The hierarchical clustering task will generate the heatmap; choose Heatmap as the plot type. You can choose to Cluster features (genes) and cells (samples) under Feature order and Cell order in the Ordering section. You will almost always want to cluster features as this generates the clear blocks of color that make heatmaps comprehensible. For single cell data sets, you may choose to forgo clustering the cells in favor of ordering them by the attribute of interest.
Select Feature order
Select Cell order
Optionally add any additional Filtering
Click Configure to access the Advanced options
Click Finish to run
ScType allows automated cell-type identification based on scRNA-seq data along with a comprehensive cell marker database as background information.
Click the data node containing the non-normalized count matrix
Click on Classification > Single cell type in the toolbox
Select the marker database from the drop-down menu. The original ScType database is provided by default.
Select categorical attributes to Categorize by
Optionally Filter tissue types
Select the SC Type algorithm to use
Click Configure to access the Advanced options
Click Finish to run
A new scType classification task node will be produced. When complete, double-click the Single cell type node to open the results of the cell-type identification. For each cell, the tissue, sctype result, and typescore are reported.
The following table describes the files created during secondary analysis:
<Sample_ID>.scRNA.bam: Binary Alignment Map (BAM) file containing information about all reads in the input FASTQ files that were mapped to the reference genome
<Sample_ID>.scRNA.bam.bai: Index file for the BAM for use by downstream applications
<Sample_ID>.scRNA.barcodeCounts.txt: Text file containing the counts per barcode
<Sample_ID>.scRNA.barcodeSummary.tsv: Summary of barcode statistics
<Sample_ID>.scRNA_metrics.csv: Single cell metrics summary with assay sensitivity and quality metrics
<Sample_ID>.scRNA.matrix.mtx.gz: Sparse matrix with rows that represent features and genes detected, and columns that consist of all barcodes that were detected
<Sample_ID>.scRNA.features.tsv.gz: Information about the features corresponding to the rows of the sparse matrix
<Sample_ID>.scRNA.barcodes.tsv.gz: List of barcodes corresponding to the columns of the sparse matrix
<Sample_ID>.scRNA.filtered.matrix.mtx.gz: Filtered sparse matrix with rows that represent features and genes detected, and columns that consist of all barcodes that were detected
<Sample_ID>.scRNA.filtered.features.tsv.gz: Information about the features corresponding to the rows of the filtered sparse matrix
<Sample_ID>.scRNA.filtered.barcodes.tsv.gz: List of barcodes corresponding to the columns of the filtered sparse matrix
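If you want to inspect a sparse matrix file outside the software, a minimal reader can be written with the Python standard library. The sketch below assumes the .mtx.gz files are gzip-compressed Matrix Market files in coordinate format with 1-based row (feature) and column (barcode) indices; that format is inferred from the file extension rather than stated in this guide, and the function name is hypothetical.

```python
import gzip


def read_matrix_market(path):
    """Parse a gzip-compressed Matrix Market coordinate file.

    Returns (n_rows, n_cols, entries), where entries maps zero-based
    (row, column) pairs to values. Rows are features and columns are
    barcodes, following the file descriptions above.
    """
    entries = {}
    with gzip.open(path, "rt") as handle:
        header = handle.readline()
        if not header.startswith("%%MatrixMarket"):
            raise ValueError("not a Matrix Market file")
        line = handle.readline()
        while line.startswith("%"):  # skip comment lines
            line = handle.readline()
        n_rows, n_cols, _n_entries = (int(field) for field in line.split())
        for line in handle:
            if not line.strip():
                continue
            row, col, value = line.split()
            entries[(int(row) - 1, int(col) - 1)] = float(value)
    return n_rows, n_cols, entries
```

The row and column indices returned by this reader line up with the line order of the matching features.tsv.gz and barcodes.tsv.gz files, so entry (0, 0) would be the count for the first listed feature in the first listed barcode.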
For information on tracking and viewing run and analysis results in BaseSpace Sequence Hub, refer to .
To view results on ICA, you may either click on "View Files in ICA" in the top right corner of your BSSH Analysis page, or directly access the analysis in ICA. It will be in a BSSH managed project with the same name as your BSSH workgroup. For information on viewing analysis results on your Illumina Connected Analytics account, refer to .