Partek Flow offers a number of tools developed by 10x Genomics for analyzing single cell RNA-seq Gene Expression (GEX), Targeted Gene Expression, Feature Barcode, and Cell Multiplexing data produced by the 10x Chromium Platform. These include Cell Ranger - Gene Expression, Cell Ranger - ATAC, Space Ranger...
If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
Cell Ranger is a set of analysis pipelines that process Chromium single cell data to align reads, generate feature-barcode matrices, and perform clustering and gene expression analysis for 10X Genomics Chromium Technology[1].
The Cell Ranger - ATAC task in Partek Flow includes two different wrappers. For single cell ATAC-Seq datasets, the 'cellranger-atac count' pipeline from Cell Ranger ATAC v2.0[2] has been wrapped in Flow. It takes FASTQ files from 'cellranger-atac mkfastq' and performs ATAC analysis including read filtering and alignment, barcode counting, identification of transposase cut sites, peak and cell calling, and count matrix generation. Its outputs then become the starting point for downstream analysis of scATAC-Seq data. To process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data, ‘cellranger-arc count’ v2.0[3] has been wrapped to generate a variety of analyses pertaining to gene expression, chromatin accessibility, and their linkage.
When importing raw reads for processing with the Cell Ranger - ATAC task for scATAC-Seq data, the user is not required to specify the data type. If you are importing 10x multiome ATAC + Gene Expression data, first select the ATAC files, choose the data type ATAC-Seq, and complete the import. Once the import task has run successfully, add the gene expression reads to each of the samples. Remember to specify mRNA as the data type during import.
To run the Cell Ranger - ATAC task for scATAC-Seq data, select the Unaligned reads data node, then select Cell Ranger - ATAC in the 10x Genomics section (top panel, Figure 1). For 10x multiome ATAC + Gene Expression data, there will be two data nodes once the FASTQ files have been imported into Flow properly: ATAC-Seq and mRNA (bottom panel, Figure 1). Users should select the ATAC-Seq data node to trigger the Cell Ranger - ATAC task.
Similar to the Cell Ranger - Gene Expression task, a first-time user will be asked to create a Reference assembly. In Partek Flow, Cell Ranger ARC 2.0.0 is used to create the Reference assembly for all 10x Genomics analysis pipelines. Please refer to our Cell Ranger - Gene Expression task manual for how to build or use a Reference assembly.
Once the right assembly has been chosen/provided, simply press the Finish button to run the task with default settings. The reference assembly of ‘Homo sapiens (human) - hg38’ has been used as an example here (Figure 2).
The interface will be different for Single Cell Multiome ATAC + Gene Expression sequencing data because the gene expression data from the very same cell has to be paired with the ATAC-Seq data (Figure 3).
After the task has finished successfully, a new data node named Single cell counts will be displayed (Figure 4). For ATAC-Seq data, this data node contains a filtered peak-barcode count matrix; for multiomic data, it contains a unified feature-barcode matrix with gene expression counts alongside ATAC-Seq peak counts for each cell barcode. To open the task report when the task is finished, double click the output data node, or single click the data node and select Task report in the Task results section. The task report (Figure 5) is the same as the ‘Summary HTML’ from the Cell Ranger ATAC output.
The Library Complexity section in the Data Quality report plots the observed per cell complexity, measured as median unique fragments per cell, as a function of mean reads per cell (Figure 6). The Mapping section displays the Insert Size Distribution plot and metrics derived from it. Single Cell ATAC read pairs produce detailed information about nucleosome packing and positioning, and the fragment length distribution captures the nucleosome positioning periodicity. The Targeting section profiles the chromatin accessibility of the library at epigenetically relevant regions in the genome. The Enrichment around TSS plot helps assess the signal-to-noise ratio of the library, because Transcriptional Start Sites (TSSs) and the promoter regions around them are known to have a higher degree of chromatin accessibility than other regions of the genome. The Peaks targeting plot presents the variation in the number of on-target fragments, that is, fragments that overlap peaks, within each barcode group. Cell-associated barcodes are expected to have a higher percentage of fragments overlapping peaks.
The task report for multiomic data analysis is more complicated. It contains summary metrics at three levels: ATAC, gene expression, and joint. The Joint view is the default view shown when the summary is first rendered and can be accessed by clicking "Joint" at the top left corner. Metrics that are specific to the Chromatin Accessibility library appear in the ATAC tab, and metrics that are specific to the gene expression library appear in the Gene Expression tab (Figure 7). For details, please refer to the 10x Genomics webpage[4].
Other adjustable parameters in this task (Figure 2) include:
Subsample percentile: Downsample to preserve this fraction of reads.
Users can also click Configure to change the default settings in Advanced options (Figure 2).
Override peak caller: To override the peak caller, supply a 3-column BED file specifying the peaks to use in downstream analyses. The supplied peaks file must be sorted by position and must not contain overlapping peaks; comment lines beginning with `#` are allowed.
Force cells: Define the top N barcodes with the most fragments overlapping peaks as cells and override the cell calling algorithm. N must be a positive integer <= 20,000. Use this option if the number of cells estimated by Cell Ranger - ATAC is not consistent with the barcode rank plot.
Memory limit (GB): Restricts Cell Ranger - ATAC to the specified amount of memory (in GB) when executing pipeline stages.
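Before supplying a custom peaks file to Override peak caller, it can help to check the stated requirements locally. The following is a minimal sketch, not part of Partek Flow or Cell Ranger ATAC: the function name `check_peaks_bed` is hypothetical, blank lines are assumed to be ignorable, and "sorted by position" is interpreted here as ascending start coordinate within each chromosome, with each chromosome's peaks kept together. The pipeline itself may apply stricter rules.

```python
def check_peaks_bed(lines):
    """Return a list of problems found in the lines of a 3-column BED file."""
    problems = []
    finished = set()   # chromosomes whose block of peaks has already ended
    current = None     # chromosome of the previous peak
    prev_start = prev_end = None

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank lines (assumption) and comment lines are skipped
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(f"line {number}: expected 3 tab-separated columns")
            continue
        chrom, start_text, end_text = fields
        if not (start_text.isdigit() and end_text.isdigit()):
            problems.append(f"line {number}: start and end must be integers")
            continue
        start, end = int(start_text), int(end_text)
        if start >= end:
            problems.append(f"line {number}: start must be smaller than end")
            continue
        if chrom != current:
            # A new chromosome block begins; the previous one is closed.
            if chrom in finished:
                problems.append(f"line {number}: {chrom} peaks are not contiguous")
            if current is not None:
                finished.add(current)
            current, prev_start, prev_end = chrom, None, None
        if prev_start is not None:
            if start < prev_start:
                problems.append(f"line {number}: not sorted by start position")
            elif start < prev_end:
                # BED intervals are half-open, so touching peaks do not overlap.
                problems.append(f"line {number}: overlaps the previous peak")
        prev_start, prev_end = start, end
    return problems


example = ["# custom peaks\n", "chr1\t100\t200\n", "chr1\t150\t300\n"]
for problem in check_peaks_bed(example):
    print(problem)
```

An empty list means none of the checked rules were violated. To check a file on disk, pass the open file object, for example `check_peaks_bed(open("peaks.bed"))`.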
If users have processed FASTQ files outside of Partek Flow, the resulting count matrix can be imported along with additional files (Figure 8A). The files that Flow needs to complete the import include the following:
filtered_feature_bc_matrix.h5
per_barcode_metrics.csv (may instead be named singlecell.csv)
peaks.bed
fragments.tsv.gz.tbi
fragments.tsv.gz
These five files can usually be found in the outs/ subdirectory within the pipeline output directory (Figure 8B). Five files are necessary per sample because scATAC-seq is more complicated than RNA-seq. If peak calling was performed on each sample/dataset independently, the peaks are unlikely to be exactly the same, so all of the samples/datasets need to be merged to create a common set of peaks. This merging is performed during data import, so all of the samples/datasets need to be imported at one time, not separately. To add samples, click the green + button (Figure 8A).
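Since every sample needs all five files, a quick check of each outs/ directory before starting the import can save a failed attempt. The sketch below is an illustration only: the helper name `missing_import_files` and the example path are hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED = [
    "filtered_feature_bc_matrix.h5",
    "peaks.bed",
    "fragments.tsv.gz",
    "fragments.tsv.gz.tbi",
]
# The per-barcode metrics file may carry either of these two names.
BARCODE_METRICS = ["per_barcode_metrics.csv", "singlecell.csv"]


def missing_import_files(outs_dir):
    """Return the names of required files that are absent from outs_dir."""
    outs = Path(outs_dir)
    missing = [name for name in REQUIRED if not (outs / name).is_file()]
    if not any((outs / name).is_file() for name in BARCODE_METRICS):
        missing.append(" or ".join(BARCODE_METRICS))
    return missing


# Hypothetical path; replace with the outs/ directory of one sample.
print(missing_import_files("sample1/outs"))
```

An empty list means all five files were found for that sample.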
Although the index files (I1 or I2) are optional, we encourage users to include all of the FASTQs in the table (Figure 9) while importing data for Cell Ranger - ATAC.
STARsolo is a versatile tool developed for single-cell RNA sequencing analysis. It takes raw FASTQ read files as input and performs mapping, demultiplexing, and quantification for single cell RNA-seq data. It follows the Cell Ranger logic for cell barcode whitelisting and UMI deduplication, and produces nearly identical gene counts in the same format. STARsolo also runs much faster than Cell Ranger[1].
Partek Flow wraps STARsolo v2.7.11a and focuses on assays from 10X Genomics.
To run the STARsolo task for scRNA-Seq data, select the Unaligned reads data node, then select STARsolo in the 10x Genomics section (Figure 1).
Once the task has been picked, a first-time user will be asked to create an index file (Figure 2). Clicking the big blue Create star2.7.8a index button opens a new window that lists the fields users need to fill in (right panel, Figure 2).
Once the right options have been chosen, simply press the Create button to finish. The index ‘hg38 - Ensembl 99’ has been created as an example here (Figure 3).
Once the index has been added, the main task menu for the STARsolo task will be refreshed as shown above (Figure 3). Users can then select the Assay type before clicking the Finish button to run the task. This version of the STARsolo task handles only gene expression data, regardless of its source.
A new data node named Single cell counts will be displayed in Flow if the task has completed successfully (Figure 4). Downstream analysis such as QA/QC, normalization, dimension reduction, clustering, differential analysis, etc. starts from this data node.
The task report is sample based. Users can use the dropdown list on the top left to switch samples. Under the sample name, there are two tabs on each report: Summary report and Data Quality report (Figure 5). Important information on the Estimated Number of Cells, Median high-quality fragments per cell, and Fraction of high-quality fragments overlapping peaks, as well as information on Sample, Sequencing, Cells, and Cell Clustering, is summarized in different panels. The Barcode Rank plot and the Fragment Distribution plot are also included in the Cells section of the Summary report (Figure 5). Descriptions of the metrics in the following sections can also be found by clicking the icon next to the section header in the Summary HTML file itself.
Importing ATAC FASTQ files is as straightforward as importing (sc)RNA-Seq data. However, for 10x multiome ATAC + Gene Expression data, the two data types need to be associated with each other. This takes two steps: 1) Import the ATAC FASTQ files in the new page that opens after clicking "Automatically create samples from files" in the Data tab (Figure 10A), and select ATAC-Seq as the data type from the dropdown list (Figure 10B). 2) Move back to the Data tab and display all of the files by clicking the Show data files button at the bottom left of the Sample name table. Then click the green + button (Figure 10C) to add the RNA FASTQs to the same sample. Similarly, select the data type (mRNA) from the dropdown before finishing the import by clicking the Associate file button (Figure 10D).
Space Ranger is a set of analysis pipelines that process Visium Spatial Gene Expression data with brightfield and fluorescence microscope images (1).
Space Ranger 2.0.1 has been wrapped in Partek Flow as the Space Ranger task. The task takes .fastq and .jpeg/.tiff files as input and performs alignment, filtering, barcode counting, and UMI counting. The output is a gene expression count matrix in .h5 format (both raw and filtered are available for download via Task Details), as well as a .zip file with spatial files (image). Note that the Space Ranger task in Partek Flow does not include all the options and use cases covered by the Space Ranger pipelines of 10x Genomics.
Note: when using the Space Ranger task in Partek Flow, sample names are more restricted: a sample name can only contain letters, digits, underscores, and dashes. Please edit the sample names on the Data tab in Partek Flow to remove any other characters, such as spaces.
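The naming rule can be checked before renaming samples on the Data tab. The sketch below is an illustration, not a Partek Flow feature: both function names are hypothetical, "letters" is assumed to mean ASCII letters, and replacing disallowed characters with an underscore is just one possible convention.

```python
import re

# Letters, digits, underscores, and dashes only (ASCII letters assumed).
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_sample_name(name):
    """Return True if the name uses only the allowed characters."""
    return VALID_NAME.fullmatch(name) is not None


def suggest_sample_name(name):
    """Replace each run of disallowed characters with one underscore."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())


for name in ["Sample_1-A", "Sample 1", "tumor.rep2"]:
    print(name, is_valid_sample_name(name), suggest_sample_name(name))
```

Two different original names can collapse to the same suggestion (for example "Sample 1" and "Sample.1"), so the suggested names should be checked for duplicates before use.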
To run Space Ranger in Partek Flow, select the Unaligned Reads node on the Analysis tab and then select the Space Ranger task in the toolbox (Figure 1).
Select the 10x assay type. Choose CytAssist gene expression if you are using the Visium CytAssist gene expression library.
For both Space Ranger and Cell Ranger tasks, a Reference assembly is required (Figure 2). To define the Reference assembly, first select the Genome build for the organism of interest, then select the annotation Index. In this manner, custom libraries can be created (e.g. keep the Genome build but change the annotation Index).
To add a new species genome, choose New assembly from the drop-down for Genome build. This opens a new window with configuration options to edit; then click Create (Figure 3).
The sample table under Input options has one row per sample (Figure 2). An image file is required; it is a single hematoxylin and eosin brightfield image in either .jpg or .tiff format. Click the Browse button under Browse image file to open the file browser, point to the image file, and click Continue. Formalin-fixed paraffin-embedded (FFPE) samples require the Probe set file; otherwise it is optional. It is a .csv file specifying the probe set used (the target panel).
If you want to specify the sample's slide and area information, check the Use slide serial number file box under Advanced options and then click Browse to point to the file. The file should be tab-delimited with one sample per row. The first column is the sample name, the second column is the slide name, and the third column is the slide area.
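The three-column layout described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `write_slide_rows` is hypothetical, the slide serial numbers and areas are placeholders rather than real values, and no header row is written because the text does not mention one.

```python
import csv
import io


def write_slide_rows(handle, rows):
    """Write (sample name, slide name, slide area) rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample, slide, area in rows:
        writer.writerow([sample, slide, area])


# Placeholder values for illustration only; use the serial number and
# capture area that belong to your own slides.
example_rows = [
    ("Sample_1", "V00X00-000", "A1"),
    ("Sample_2", "V00X00-000", "B1"),
]

buffer = io.StringIO()
write_slide_rows(buffer, example_rows)
print(buffer.getvalue(), end="")
```

To create a file for upload, pass an open file instead of the in-memory buffer, for example `open("slides.tsv", "w", newline="")`, where the file name is arbitrary.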
If the Slide serial number is not available for CytAssist samples, the Slide parameter should be specified, where visium-2 corresponds to a 6.5 x 6.5 mm capture area and visium-2-large corresponds to an 11 x 11 mm capture area.
If necessary, click the Configure link in the Advanced options section to open the Advanced Options dialog (Figure 4). Use R1 length to hard trim the R1 reads to the specified length; use R2 length to hard trim the R2 reads to the specified length.
The result of the Space Ranger task is the Single cell counts data node, which contains the gene expression data. Double click the Single cell counts node to open the task report (Figure 5), which is the same as the ‘Summary HTML’ from the original 10x Genomics pipelines. The task report is sample based. You can use the dropdown list in the top left to switch samples (not shown in Figure 6). Each report consists of two pages: Summary and Analysis. For details, please visit the 10x Genomics web page.
After creating the Single cell counts node, the next step is to associate the microscopy image with the expression data. To start, select the Single cell counts data node and then go to Annotate Visium image in the toolbox (Figure 6).
The setup page shows the sample table (one sample per row; Figure 7). Click on the Browse button to open the file browser and point to the file _spatial.zip, created by the Space Ranger task. After that, click on Finish to launch the Annotate task.
You can find the location of the _spatial.zip file using the following steps. Select the Space Ranger task node (i.e. the rectangle) and then click Task Details in the toolbox. Click the Output files link to open the page listing the files created by the Space Ranger task. Mouse over any of the files to see the directory in which the file is located. Figure 8 shows the path to the .zip file that is required for Annotate Visium image.
A new data node, Annotated counts, will be generated (Figure 9).
The Annotated counts node is Split by sample. This means that any tasks performed from this node will also be split by sample. Invoke tasks from the Single cell counts node to combine samples for analyses.
Double click the Annotated counts node created by the Annotate Visium image task to open the Data Viewer, which shows the data points overlaid on top of the microscopy image (Figure 10).
Cell Ranger is a set of analysis pipelines that process Chromium single cell data to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis for 10X Genomics Chromium Technology[1].
The 'cellranger count' pipeline from Cell Ranger v6.0.0[2] has been wrapped in Partek Flow as the Cell Ranger - Gene Expression task. It does not yet cover all of the options and analysis cases Cell Ranger can handle, but it takes FASTQ files from 'cellranger mkfastq' and performs alignment, filtering, barcode counting, and UMI counting for single cell gene expression and Feature Barcode data. The output gene expression count matrix in .h5 format (both raw and filtered are available for download on the output page of Task details) then becomes the starting point for downstream analysis of scRNA-seq data in Flow. For Feature Barcode data, Flow outputs a unified feature-barcode matrix that contains gene expression counts alongside Feature Barcode counts for each cell barcode.
Note: when using the Cell Ranger - Gene Expression task in Partek Flow, sample names are more restricted: a sample name can only contain letters, digits, underscores, and dashes. Please edit the sample names on the Data tab in Partek Flow to remove any other characters, such as spaces.
To run the Cell Ranger - Gene Expression task for scRNA-seq data in Flow, select the Unaligned reads data node, then select Cell Ranger - Gene Expression in the 10x Genomics section (left panel, Figure 1). For Feature Barcode data, there will be two data nodes once the FASTQ files have been imported into Flow properly: mRNA and protein (right panel, Figure 1). Users should select the mRNA data node to trigger the Cell Ranger - Gene Expression task.
Once the task has been picked, users will be asked to create a Reference assembly if this is the first time they run the Cell Ranger - Gene Expression task (Figure 2). In Partek Flow, Cell Ranger ARC 2.0.0 is used to create the reference assembly for all 10x Genomics analysis pipelines. To create and use a reference assembly, Cell Ranger ARC requires a reference genome sequence (FASTA file) and gene annotations (GTF file). The details are described below.
Clicking the big grey Create Cell Ranger ARC 2.0.0 reference button opens a new window that lists the fields users need to fill in (Figure 3). To create the same reference genomes (2020-A) that are provided with Cell Ranger by default, use the transcriptome annotations GENCODE v32 for human and vM23 for mouse, which are equivalent to Ensembl 98[3]. If the dropdown lists are empty, users can click Add annotation model (GTF file) for Index, or New assembly... (FASTA file) for Assembly, and upload the files.
Once the right options have been chosen/provided, simply press the Create button to finish. The reference assembly ‘Homo sapiens (human) - hg38’ has been created as an example here (Figure 4).
Once references have been added, the main task menu for gene expression data will be refreshed as shown above (Figure 4). Users can then click the Finish button to run the task with default settings.
For Feature Barcode data, more information is needed besides the reference assembly. An additional Protein section is added to the interface if Single cell gene expression + Cell surface protein has been selected for Feature Barcode data (Figure 5). Users first need to click the Select data node button and select the correct data node for the antibody capture or protein feature in the new pop-up window (top right, Figure 5). Then users need to upload the feature reference file (.csv) prepared for their datasets. A Feature Reference CSV file declares the molecule structure and unique Feature Barcode sequence of each feature present in the experiment. It should include at least six columns: id, name, read, pattern, sequence and feature_type. An example of a TotalSeq™-B Feature Reference CSV has been linked here. Users can download it by clicking the link and use it as a template for their own data. For more details, please refer to the 10x Genomics webpage[4].
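A Feature Reference CSV can be checked for the six required columns before it is uploaded. The sketch below only inspects the header row. The function name `missing_feature_columns` is hypothetical, and the data row in the example contains placeholder values (including the pattern and sequence) that are not taken from any real antibody panel.

```python
import csv
import io

# The six columns named in the text above.
REQUIRED_COLUMNS = ("id", "name", "read", "pattern", "sequence", "feature_type")


def missing_feature_columns(csv_text):
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {column.strip() for column in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


# Placeholder row for illustration only; real values come from the
# reagent vendor's template or your own panel design.
example = (
    "id,name,read,pattern,sequence,feature_type\n"
    "AB1,AB1_example,R2,PATTERN_PLACEHOLDER,ACGTACGTACGTACG,Antibody Capture\n"
)
print(missing_feature_columns(example))
```

An empty list means all six columns are present. Extra columns are ignored, since the text only states the minimum set.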
A new data node named Single cell counts will be displayed in Flow if the task has finished successfully (Figure 6). For gene expression data, this data node contains a filtered feature-barcode count matrix; for Feature Barcode data, it contains a unified feature-barcode matrix with gene expression counts alongside Feature Barcode counts for each cell barcode. To open the task report when the task is finished, double click the output data node, or single click the data node and select Task report in the Task results section. The task report (Figure 7) is the same as the ‘Summary HTML’ from the Cell Ranger output.
The task report is sample based. Users can use the dropdown list on the top left to switch samples. Under the sample name, there are two tabs on each report: Summary report and Analysis report (Figure 7). Important information on Estimated Number of Cells, Mean Reads per Cell, and Median Genes per Cell, as well as information on Sequencing, Mapping, and Sample, is summarized in different panels. The Barcode Rank Plot is also included in the Cells panel of the Summary report (Figure 7).
Two more plots, Sequencing Saturation and Median Genes per Cell, each plotted against Mean Reads per Cell, are included in the Analysis report because they are important metrics of library complexity and sequencing depth (Figure 8).
Apart from two additional panels that summarize information for Antibody Sequencing and Antibody Application, the task report for Feature Barcode data is the same as the scRNA-seq data report.
Users can click Configure to change the default settings in Advanced options (Figure 4).
Include introns: Count reads mapping to intronic regions. This may improve sensitivity for samples with a significant amount of pre-mRNA molecules, such as nuclei.
Expected cells: Expected number of recovered cells. Default: 3,000 cells.
Force cells: Force pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger is not consistent with the barcode rank plot.
Memory limit (GB): Restricts Cell Ranger - Gene Expression to the specified amount of memory (in GB) when executing pipeline stages.
Clicking the icon expands the panel and shows the details. In the example below, the Median Genes per Cell plot has been expanded while the Sequencing Saturation plot has not (Figure 9).